AMD Says Instinct MI325X Bests Nvidia H200, Vows Huge Uplift With MI350

While AMD says its forthcoming Instinct MI325X GPU can outperform Nvidia’s H200 for large language model inference, the chip designer is teasing that its next-generation MI350 series will deliver a far bigger leap in inference performance in the second half of next year.

AMD said its forthcoming 256-GB Instinct MI325X GPU can outperform Nvidia’s 141-GB H200 processor on AI inference workloads and vowed that its next-generation MI350 accelerator chips will improve AI inference performance by as much as 35-fold.

When it comes to training AI models, AMD said the MI325X is on par with or slightly better than the H200, the successor to Nvidia’s popular and powerful H100 GPU.

[Related: Intel Debuts AI Cloud With Gaudi 3 Chips, Inflection AI Partnership]

The Santa Clara, Calif.-based chip designer was expected to make the claims at its Advancing AI event in San Francisco, where the company planned to lay out its strategy for taking on AI computing giant Nvidia with Instinct chips, EPYC CPUs, networking chips, an open software stack and data center design expertise.

“AMD continues to deliver on our roadmap, offering customers the performance they need and the choice they want, to bring AI infrastructure, at scale, to market faster,” said Forrest Norrod, head of AMD’s Data Center Solutions business group, in a statement.

The MI325X is a follow-up to the Instinct MI300X with greater memory capacity and bandwidth. The MI300X launched last December and put AMD on the map as a worthy competitor to Nvidia in delivering powerful AI accelerator chips. The MI325X is also part of AMD’s new strategy to release Instinct chips every year instead of every two years, a cadence adopted explicitly to keep up with Nvidia’s accelerated chip release schedule.

The MI325X is set to arrive in systems from Dell Technologies, Lenovo, Supermicro, Hewlett Packard Enterprise, Gigabyte, Eviden and several other server vendors starting in the first quarter of next year, according to AMD.

Instinct MI325X Specs And Performance Metrics

Whereas the Instinct MI300X features 192GB of HBM3 high-bandwidth memory and 5.3 TB/s of memory bandwidth, the MI325X—which is based on the same CDNA 3 GPU architecture as the MI300X—comes with 256GB of HBM3e memory and can reach 6 TB/s of memory bandwidth thanks to the newer memory format.

In terms of throughput, the MI325X has the same capabilities as the MI300X: 2.6 petaflops of peak 8-bit floating point (FP8) performance and 1.3 petaflops of 16-bit floating point (FP16) performance.

When comparing AI inference performance to the H200 at a chip level, AMD said the MI325X provides 40 percent faster throughput with a Mixtral 8x7B mixture-of-experts model; 30 percent lower latency with a 7-billion-parameter Mixtral model; and 20 percent lower latency with a 70-billion-parameter Llama 3.1 model.

The MI325X will fit into the eight-chip Instinct MI325X platform, which will serve as the foundation for servers launching early next year.

With eight MI325X GPUs connected over AMD’s Infinity Fabric at a bandwidth of 896 GB/s, the platform will feature 2TB of HBM3e memory, 48 TB/s of memory bandwidth, 20.8 petaflops of FP8 performance and 10.4 petaflops of FP16 performance, AMD said.

According to AMD, this means the MI325X platform has 80 percent higher memory capacity, 30 percent greater memory bandwidth and 30 percent faster FP8 and FP16 throughput than Nvidia’s H200 HGX platform, which comes with eight H200 GPUs and started shipping earlier this year as the foundation for H200-based servers.
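For readers who want to sanity-check the platform math, the short sketch below reproduces the arithmetic: the eight-GPU figures are roughly eight times the per-GPU specs, and the memory capacity comparison follows from the H200’s 141GB of HBM3e per GPU. This is a rough illustration of how the numbers line up, not AMD’s stated methodology.

```python
# Quick sanity check of the platform figures quoted above (a rough sketch of
# the arithmetic, not AMD's methodology): the eight-GPU MI325X platform specs
# are roughly eight times the per-GPU numbers, and the "80 percent higher
# memory capacity" claim follows from the H200's 141GB of HBM3e per GPU.
per_gpu = {"hbm_gb": 256, "bw_tb_s": 6.0, "fp8_pflops": 2.6, "fp16_pflops": 1.3}

platform = {spec: value * 8 for spec, value in per_gpu.items()}
print(platform)  # ~2,048 GB HBM3e, 48 TB/s, 20.8 petaflops FP8, 10.4 petaflops FP16

h200_hgx_hbm_gb = 141 * 8  # eight H200 GPUs
print(platform["hbm_gb"] / h200_hgx_hbm_gb)  # ~1.8, i.e. roughly 80 percent more capacity
```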

Comparing inference performance to the H200 HGX platform, AMD said the MI325X platform provides 40 percent faster throughput with a 405-billion-parameter Llama 3.1 model and 20 percent lower latency with a 70-billion-parameter Llama 3.1 model.

When it comes to training a 7-billion-parameter Llama 2 model on a single GPU, AMD said the MI325X is 10 percent faster than the H200. The MI325X platform, on the other hand, is on par with the H200 HGX platform when training a 70-billion-parameter Llama 2 model across eight GPUs, the company added.

AMD Teases 35-Fold Inference Boost For MI350 Chips

AMD said its next-generation Instinct MI350 accelerator chip series is on track to launch in the second half of next year and teased that it will provide up to a 35-fold improvement in inference performance compared to the MI300X.

The company said this is a projection based on engineering estimates for an eight-GPU MI350 platform running a 1.8-trillion-parameter Mixture of Experts model.

Based on AMD’s next-generation CDNA 4 architecture and using a 3-nanometer manufacturing process, the MI350 series will include the MI355X GPU, which will feature 288GB of HBM3e memory and 8 TB/s of memory bandwidth.

With the MI350 series supporting new 4-bit and 6-bit floating point formats (FP4, FP6), the MI355X is capable of achieving 9.2 petaflops with those formats, according to AMD. For FP8 and FP16, the MI355X is expected to reach 4.6 petaflops and 2.3 petaflops, respectively.

This means the next-generation Instinct chip is expected to provide 77 percent faster performance with the FP8 and FP16 formats than the MI325X or MI300X.
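That figure lines up with the peak throughput numbers themselves. A minimal check, assuming the comparison is simply between the peak figures quoted above:

```python
# Rough check of the "77 percent faster" claim (assuming it compares the peak
# throughput figures quoted above for the MI355X and the MI325X/MI300X).
mi355x = {"fp8_pflops": 4.6, "fp16_pflops": 2.3}
mi325x = {"fp8_pflops": 2.6, "fp16_pflops": 1.3}

for fmt in ("fp8_pflops", "fp16_pflops"):
    uplift = mi355x[fmt] / mi325x[fmt] - 1
    print(fmt, round(uplift * 100))  # ~77 percent for both formats
```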

Featuring eight MI355X GPUs, the Instinct MI355X platform is expected to offer 2.3TB of HBM3e memory, 64 TB/s of memory bandwidth, 18.5 petaflops of FP16 performance, 37 petaflops of FP8 performance and 74 petaflops of FP6 and FP4 performance.

With the 74 petaflops of FP6 and FP4 performance, the MI355X platform is expected to be 7.4 times faster than the FP16 capabilities of the MI300X platform, according to AMD.

The MI355X platform’s 50 percent greater memory capacity means it can support models with up to 4.2 trillion parameters on a single system, six times more than what was possible with the MI300X platform.
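One plausible way to read that figure, offered here as a back-of-the-envelope assumption rather than AMD’s stated methodology, is that a model held entirely in the new FP4 format needs roughly half a byte per weight:

```python
# Assumed reading of the 4.2-trillion-parameter claim (not AMD's stated math):
# weights stored in FP4 take roughly 0.5 bytes each, so a 4.2-trillion-parameter
# model would occupy about 2.1 TB, within the platform's 2.3TB of HBM3e.
params = 4.2e12
bytes_per_weight_fp4 = 0.5
weight_footprint_tb = params * bytes_per_weight_fp4 / 1e12
print(weight_footprint_tb)  # 2.1 TB of weights vs. 2.3 TB of HBM3e
```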

After AMD debuts the MI355X in the second half of next year, the company plans to introduce the Instinct MI400 series in 2026 with a next-generation CDNA architecture.

New Features In AMD’s Open Software Stack

AMD said it is introducing new features and capabilities to its AMD ROCm open software stack, including new algorithms, new libraries and expanded platform support.

The company said ROCm now supports the “most widely used AI frameworks, libraries and models including PyTorch, Triton, Hugging Face and many others.”

“This work translates to out-of-the-box performance and support with AMD Instinct accelerators on popular generative AI models like Stable Diffusion 3, Meta Llama 3, 3.1 and 3.2 and more than one million models at Hugging Face,” AMD said.

With the new 6.2 version of ROCm, AMD will support the new FP8 format, Flash Attention 3 and Kernel Fusion, among other things. These additions will translate into a 2.4-fold improvement in inference performance and 80 percent better training performance for a variety of large language models, according to the company.
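As a concrete illustration of what that out-of-the-box support looks like in practice, the sketch below runs a Hugging Face model through a ROCm build of PyTorch. The model name and prompt are placeholders, and this is a minimal example of ours rather than AMD’s reference code.

```python
# Minimal sketch (our example, not AMD's reference code): running a Hugging Face
# model on an Instinct GPU via a ROCm build of PyTorch. ROCm reuses PyTorch's
# torch.cuda API, so no HIP-specific calls are needed. The model ID is illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

print(torch.version.hip)          # non-None on a ROCm build of PyTorch
print(torch.cuda.is_available())  # True if an Instinct accelerator is visible

model_id = "meta-llama/Llama-3.1-8B-Instruct"  # placeholder; any supported model works
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # FP16 weights; FP8 paths need additional tooling
    device_map="auto",          # place the model on the GPU (requires accelerate)
)

prompt = "Explain HBM3e memory in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```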