AMD has unveiled its next-generation accelerators for artificial intelligence (AI) inference and training, among other workloads, claiming that the new Instinct MI350 Series delivers up to four times the raw compute performance of its last-generation parts and a whopping 35 times the inference performance. Existing Instinct users, meanwhile, can benefit from performance gains simply by upgrading to ROCm 7.0.
“The world of AI isn’t slowing down — and neither are we,” claims AMD’s Vamsi Boppana, senior vice president of the company’s artificial intelligence group. “At AMD, we’re not just keeping pace, we’re setting the bar. Our customers are demanding real, deployable solutions that scale, and that’s exactly what we’re delivering with the AMD Instinct MI350 Series. With cutting-edge performance, massive memory bandwidth, and flexible, open infrastructure, we’re empowering innovators across industries to go faster, scale smarter, and build what’s next.”
AMD has unveiled its Instinct MI350 Series of AI accelerators, power-hungry petaflop-scale parts for AI training and inference. (📷: AMD)
AMD has confirmed two models at launch: the Instinct MI350X and the Instinct MI355X. The former features 288GB of High Bandwidth Memory 3E (HBM3E) with 8TB/s of bandwidth, enough, AMD says, to run large language models (LLMs) and other AI models with up to 520 billion parameters on-device, and delivers 72 tera-floating-point operations per second (TFLOPS) at FP64 precision, rising to 18.45 peta-floating-point operations per second (PFLOPS) at FP6 or FP4 precision with structured sparsity. The latter pairs the same memory specifications with higher compute performance: 78.6 TFLOPS at FP64 and 20.1 PFLOPS at FP6/FP4 with structured sparsity.
The company isn’t expecting users to buy just a single card, though. With the ever-growing power demands of both training and running next-generation models in mind, AMD also offers the Instinct MI350X Platform and Instinct MI355X Platform, bundles of eight cards offering a combined total of 2.3TB of HBM3E memory and peak performance of 147.6 PFLOPS and 161 PFLOPS respectively at FP6/FP4 with structured sparsity. Anyone looking to run such a system will need plenty of power and cooling on hand: each MI350X has a thermal design power of an eyebrow-raising 1kW, with the MI355X upping that to 1.4kW.
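For those keeping score, the platform figures are straight multiples of the single-card specifications. A quick back-of-the-envelope check in Python, using only the numbers quoted above and the eight-card count per platform, bears them out:

```python
# Sanity check of AMD's eight-card platform figures,
# using the single-card specifications from the announcement.
CARDS_PER_PLATFORM = 8

cards = {
    "MI350X": {"hbm3e_gb": 288, "fp4_sparse_pflops": 18.45},
    "MI355X": {"hbm3e_gb": 288, "fp4_sparse_pflops": 20.1},
}

for name, card in cards.items():
    total_mem_tb = CARDS_PER_PLATFORM * card["hbm3e_gb"] / 1000
    total_pflops = CARDS_PER_PLATFORM * card["fp4_sparse_pflops"]
    print(f"{name} Platform: {total_mem_tb:.1f}TB HBM3E, {total_pflops:.1f} PFLOPS")

# MI350X Platform: 2.3TB HBM3E, 147.6 PFLOPS
# MI355X Platform: 2.3TB HBM3E, 160.8 PFLOPS (AMD rounds this to 161)
```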
Both new and existing Instinct users will also be able to benefit from AMD’s upcoming ROCm 7.0 release, the company says, which is claimed to deliver a tripling of training performance and more than 3.5 times the inference performance of its predecessor. “This achievement stems from advances in usability, performance, and support for lower precision data types like FP4 and FP6,” says AMD’s Anush Elangovan of the upcoming release. “Further enhancements in communication stacks have [also] optimized GPU utilization and data movement.”
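For developers wondering what taking advantage of that looks like in practice, ROCm builds of PyTorch expose AMD GPUs through the standard torch.cuda interface, so running inference at reduced precision takes only a few lines. The sketch below is illustrative rather than an AMD example: it uses bfloat16 as the reduced-precision stand-in, since the FP4 and FP6 paths ROCm 7.0 targets are reached through quantization toolkits rather than native PyTorch dtypes.

```python
import torch

# On ROCm builds of PyTorch, AMD GPUs appear through the torch.cuda
# API (HIP is mapped onto the CUDA device namespace).
assert torch.cuda.is_available(), "No ROCm/HIP device found"
device = torch.device("cuda")

# A stand-in model; any nn.Module is handled the same way.
model = torch.nn.Sequential(
    torch.nn.Linear(4096, 4096),
    torch.nn.GELU(),
    torch.nn.Linear(4096, 4096),
).to(device)

x = torch.randn(8, 4096, device=device)

# Run inference inside a reduced-precision autocast region; matrix
# multiplies execute in bfloat16 rather than full FP32.
with torch.inference_mode(), torch.autocast(device_type="cuda", dtype=torch.bfloat16):
    y = model(x)

print(y.dtype)  # torch.bfloat16
```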
The company also promises impressive performance gains for existing Instinct users, through the new ROCm 7.0 and its support for lower precision. (📷: AMD)
Finally, the most patient and heavily funded AI experimenters may want to hang fire until the new year, with Boppana teasing the next-next-generation Instinct MI400 Series due for release in 2026. “The AMD Instinct MI400 Series will represent a dramatic generational leap in performance enabling full rack level solutions for large scale training and distributed inference,” he says, revealing models offering up to 432GB of HBM4 memory with 19.6TB/s of bandwidth and performance of up to 40 PFLOPS at FP4 with structured sparsity. These will be offered, he says, in an “AI Rack” system dubbed “Helios,” which combines the cards with AMD EPYC “Venice” CPUs and Pensando “Vulcano” AI network cards to form an all-in-one platform for training and inference workloads.
Instinct MI350 Series cards, meanwhile, will be made available on demand through cloud providers and for on-premises use from original equipment manufacturers including Dell, HPE, and Supermicro. More information is available on the AMD website.