This post has been republished via RSS; it originally appeared at: Microsoft Tech Community - Latest Blogs.
by Hugo Affaticati (Technical Program Manager - Microsoft), Sonal Doomra (Technical Program Manager 2 - Microsoft), Tri Dao (PhD Candidate - Hazy Research), Reza Soroushmehr (Senior Product Manager - Microsoft), and Jon Shelley (Principal TPM Manager - Microsoft)
The latest MLPerf Training v2.1 submission demonstrates the performance gains that can be achieved on Azure by leveraging Hazy Research's cutting-edge software optimization for BERT together with NVIDIA accelerated computing. This collaborative submission showcases the potential of combining optimized heavy workloads with Azure's innovative infrastructure.
On the hardware side, we benchmarked 1, 8, and 16 virtual machines of the NDm A100 v4-series, each featuring eight NVIDIA A100 Tensor Core GPUs, on Azure. The cluster used the latest versions of the software stack (Ubuntu 20.04-HPC marketplace image and PyTorch NVIDIA release 22.09) and resources (CycleCloud 8.2 and Slurm 2.6.5). The NDm A100 v4-series instance is what we and our Azure customers turn to for large-scale AI and ML training workloads.
On the software side, our submissions benefit from an algorithmic improvement to the self-attention module at the heart of transformer models. Existing implementations of self-attention tend to be slow and memory-hungry on long sequences. FlashAttention instead leverages tiling and recomputation techniques to reduce the GPU memory reads/writes of attention, without any approximation. Hazy Research's implementation yields speedups in training and inference of large language models and image-generative models.
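The tiling idea can be illustrated in a few lines of NumPy. The sketch below is not the actual FlashAttention CUDA kernel (which also fuses recomputation into the backward pass); it is a minimal, assumed-for-illustration version of the online-softmax trick that lets attention be computed block by block, so the full N×N score matrix is never materialized in memory:

```python
import numpy as np

def naive_attention(Q, K, V):
    # Standard attention: materializes the full N x N score matrix.
    S = Q @ K.T / np.sqrt(Q.shape[-1])
    P = np.exp(S - S.max(axis=-1, keepdims=True))
    P /= P.sum(axis=-1, keepdims=True)
    return P @ V

def tiled_attention(Q, K, V, block=64):
    # Illustrative tiled attention: process K/V in blocks, keeping a
    # running row-max (m) and softmax denominator (l) per query so only
    # an N x block tile of scores exists at any time.
    N, d = Q.shape
    scale = 1.0 / np.sqrt(d)
    O = np.zeros((N, d))
    m = np.full(N, -np.inf)   # running row-max for numerical stability
    l = np.zeros(N)           # running softmax normalizer
    for j in range(0, K.shape[0], block):
        S = Q @ K[j:j + block].T * scale        # scores for this tile only
        m_new = np.maximum(m, S.max(axis=-1))
        alpha = np.exp(m - m_new)               # rescale previous accumulators
        P = np.exp(S - m_new[:, None])
        l = l * alpha + P.sum(axis=-1)
        O = O * alpha[:, None] + P @ V[j:j + block]
        m = m_new
    return O / l[:, None]
```

Because each tile's contribution is rescaled exactly as the running maximum is updated, the blocked result matches the naive computation to floating-point precision, which is what "without any approximation" refers to above.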
The main highlight of our submission is that Azure-HazyResearch is the only submitter to break the 2-minute mark for BERT training on 16 virtual machines.
These training benchmark results demonstrate Azure's commitment to providing our customers with the most efficient and scalable offerings, available on demand in the cloud, so they can exceed on-premises performance for their AI workloads.
More about MLPerf™ from MLCommons®
MLCommons® is an open engineering consortium of AI leaders from academia, research labs, and industry whose mission is to “build fair and useful benchmarks” that provide unbiased evaluations of training and inference performance for hardware, software, and services, all conducted under prescribed conditions. MLPerf™ benchmarks consist of real-world compute-intensive AI workloads to best simulate customers’ needs. Tests are transparent and objective, so technology decision makers can rely on the results to make informed buying decisions.
Special thanks to NVIDIA for providing the guidance and containers to run these benchmarks.
To recreate MLPerf Training v2.1 results in Azure, please see here.