NVIDIA Blackwell Delivers Breakthrough Performance in Latest MLPerf Training Results


NVIDIA is working with companies worldwide to build out AI factories, speeding the training and deployment of next-generation AI applications that draw on the latest advances in training and inference.

The NVIDIA Blackwell architecture is built to meet the heightened performance requirements of these new applications. In the latest round of MLPerf Training — the 12th since the benchmark’s introduction in 2018 — the NVIDIA AI platform delivered the highest performance at scale on every benchmark and powered every result submitted on the benchmark’s toughest large language model (LLM)-focused test: Llama 3.1 405B pretraining.

The NVIDIA platform was the only one that submitted results on every MLPerf Training v5.0 benchmark — underscoring its exceptional performance and versatility across a wide array of AI workloads, spanning LLMs, recommendation systems, multimodal LLMs, object detection and graph neural networks.

The at-scale submissions used two AI supercomputers powered by the NVIDIA Blackwell platform: Tyche, built using NVIDIA GB200 NVL72 rack-scale systems, and Nyx, based on NVIDIA DGX B200 systems. In addition, NVIDIA collaborated with CoreWeave and IBM to submit GB200 NVL72 results using a total of 2,496 Blackwell GPUs and 1,248 NVIDIA Grace CPUs.
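As a quick sanity check on those counts, the figures imply two Blackwell GPUs for every Grace CPU, which is consistent with the GB200 design pairing one Grace CPU with two Blackwell GPUs. The snippet below is illustrative arithmetic only; the variable names are assumptions made for this example and do not come from any NVIDIA tool.

```python
# Counts quoted for the joint CoreWeave/IBM GB200 NVL72 submission.
blackwell_gpus = 2496
grace_cpus = 1248

# Ratio of GPUs to CPUs implied by the quoted totals.
gpus_per_cpu = blackwell_gpus / grace_cpus
print(f"GPUs per Grace CPU: {gpus_per_cpu:g}")
```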

On the new Llama 3.1 405B pretraining benchmark, Blackwell delivered 2.2x the performance of the previous-generation architecture at the same scale.

On the Llama 2 70B LoRA fine-tuning benchmark, NVIDIA DGX B200 systems, each powered by eight Blackwell GPUs, delivered 2.5x the performance of a submission using the same number of GPUs in the prior round.
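MLPerf Training scores a system by the time it takes to train a model to a target quality, so a speedup factor can be read as a shorter time to train. The sketch below converts the two quoted factors into a fraction of the baseline time. It assumes the quoted factors are simple ratios of time to train, and the function name is a hypothetical helper for this example, not part of any MLPerf tooling.

```python
def relative_time_to_train(speedup: float) -> float:
    """Fraction of the baseline time-to-train that remains after a speedup."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return 1.0 / speedup


# Speedup factors quoted in the article (assumed to be time-to-train ratios).
quoted = [
    ("Llama 3.1 405B pretraining", 2.2),
    ("Llama 2 70B LoRA fine-tuning", 2.5),
]

for name, speedup in quoted:
    frac = relative_time_to_train(speedup)
    print(f"{name}: {frac:.0%} of baseline time ({1 - frac:.0%} shorter)")
```

Under that assumption, a 2.2x speedup corresponds to roughly 45% of the baseline training time, and a 2.5x speedup to 40%.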

These performance leaps highlight advancements in the Blackwell architecture, including high-density liquid-cooled racks, 13.4TB of coherent memory per rack, fifth-generation NVIDIA NVLink and NVIDIA NVLink Switch interconnect technologies for scale-up and NVIDIA Quantum-2 InfiniBand networking for scale-out. Plus, innovations in the NVIDIA NeMo Framework software stack raise the bar for next-generation multimodal LLM training, critical for bringing agentic AI applications to market.

These agentic AI-powered applications will one day run in AI factories — the engines of the agentic AI economy. These new applications will produce tokens and valuable intelligence that can be applied to almost every industry and academic domain.

The NVIDIA data center platform includes GPUs, CPUs, high-speed fabrics and networking, as well as a vast array of software like NVIDIA CUDA-X libraries, the NeMo Framework, NVIDIA TensorRT-LLM and NVIDIA Dynamo. This highly tuned ensemble of hardware and software technologies empowers organizations to train and deploy models more quickly, dramatically accelerating time to value.

The NVIDIA partner ecosystem participated extensively in this MLPerf round. Beyond the submission with CoreWeave and IBM, other compelling submissions came from ASUS, Cisco, Dell Technologies, Giga Computing, Google Cloud, Hewlett Packard Enterprise, Lambda, Lenovo, Nebius, Oracle Cloud Infrastructure, Quanta Cloud Technology and Supermicro.

Learn more about MLPerf benchmarks.

By admin
