A quick start guide to benchmarking AI models in Azure: Llama 2 from MLPerf Inference v4.0

This post has been republished via RSS; it originally appeared at: New blog articles in Microsoft Community Hub.

By: Mark Gitau, Software Engineer, and Hugo Affaticati, Technical Program Manager 2 


Useful resources: 

New NC H100 v5-series: Microsoft NC H100 v5-series 

Thought leadership article: Aka.ms/Blog/MLPerfInfv4 

Azure results for MLPerf Inference: MLPerf Inference V4.0  

Submission to GitHub: mlcommons/inference_results_v4.0 


Microsoft Azure has delivered industry-leading AI inference results among cloud service providers in the most recent MLPerf Inference benchmarks published by MLCommons. The Azure results were achieved with the new NC H100 v5 Virtual Machines (VMs) and reinforce Azure's commitment to designing AI infrastructure that is optimized for both training and inference in the cloud. This document walks through the steps to reproduce the Llama 2 results from MLPerf Inference v4.0 on the new NC H100 v5 virtual machines.



Step 1: Deploy and set up a virtual machine on Azure. 
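The original post gives no commands for this step. As a hedged sketch only, a deployment with the Azure CLI might look like the following; the resource group, VM name, region, and image are placeholders (assumptions), and the NC H100 v5-series size name should be checked against the SKUs available in your subscription:

```shell
# Hypothetical deployment sketch with the Azure CLI -- every name below is
# a placeholder, not taken from the original post.
RESOURCE_GROUP=mlperf-demo-rg
VM_NAME=mlperf-nc-h100-v5
VM_SIZE=Standard_NC80adis_H100_v5   # an NC H100 v5-series size (verify availability)

if command -v az >/dev/null 2>&1; then
  # Requires an authenticated az session; creates real resources.
  az group create --name "$RESOURCE_GROUP" --location southcentralus
  az vm create \
    --resource-group "$RESOURCE_GROUP" \
    --name "$VM_NAME" \
    --size "$VM_SIZE" \
    --image Ubuntu2204 \
    --admin-username azureuser \
    --generate-ssh-keys
fi
echo "Target VM size: $VM_SIZE"
```

Once the VM is running, connect over SSH and continue with the steps below.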

Step 2: Mount the NVMe disks

cd /mnt
sudo vi nvme.sh

Copy and paste the following mounting script:


NVME_DISKS_NAME=$(ls /dev/nvme*n1)
NVME_DISKS=$(ls /dev/nvme*n1 | wc -l)

echo "Number of NVMe Disks: $NVME_DISKS"

if [ "$NVME_DISKS" == "0" ]; then
    exit 0
else
    mkdir -p /mnt/resource_nvme
    # Needed in case something did not unmount as expected. This will delete any data that may be left behind
    mdadm --stop /dev/md*
    mdadm --create /dev/md128 -f --run --level 0 --raid-devices $NVME_DISKS $NVME_DISKS_NAME
    mkfs.xfs -f /dev/md128
    mount /dev/md128 /mnt/resource_nvme
fi

chmod 1777 /mnt/resource_nvme

Run the script to mount the disks:

sudo sh nvme.sh
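As a quick illustration of what the script's counting step does, the same logic can be run against dummy file names in a temporary directory, so it touches no real devices:

```shell
# Mimic the script's disk-count logic on dummy files instead of /dev nodes.
tmp=$(mktemp -d)
touch "$tmp/nvme0n1" "$tmp/nvme1n1"
NVME_DISKS=$(ls "$tmp"/nvme*n1 | wc -l)
echo "Number of NVMe Disks: $NVME_DISKS"
rm -rf "$tmp"
```

On the real VM, each matched `/dev/nvme*n1` namespace becomes one member of the RAID 0 array `/dev/md128`.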

Step 3: Set up Docker

Update the Docker root directory in the docker daemon configuration file

sudo vi /etc/docker/daemon.json

Paste the following lines, which relocate Docker's data directory onto the NVMe mount (the path shown assumes the mount point created above; adjust it if yours differs):

{
    "data-root": "/mnt/resource_nvme/data"
}

Verify the previous steps and enable docker

docker --version
sudo systemctl restart docker
sudo systemctl enable docker

Register your user for Docker

sudo groupadd docker
sudo usermod -aG docker $USER
newgrp docker

You should not have any permission issues when running

docker info
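To confirm the data-root change took effect, Docker's root directory can be read back from `docker info`; the sketch below falls back to a plain message when the daemon is not reachable (for example, in a sandbox):

```shell
# Print Docker's data root if the daemon is reachable; otherwise note it.
if command -v docker >/dev/null 2>&1 && docker info >/dev/null 2>&1; then
  ROOT=$(docker info --format '{{ .DockerRootDir }}')
else
  ROOT="(daemon not reachable here)"
fi
echo "Docker Root Dir: $ROOT"
```

On a correctly configured VM this should print the path set in daemon.json.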


Set up the environment:

Once your machine is deployed and configured, create a folder for the scripts and get the scripts from MLPerf Inference v4.0 repository. 

cd /mnt/resource_nvme
git clone https://github.com/mlcommons/inference_results_v4.0.git
cd inference_results_v4.0/closed/Azure

Set the scratch path and create the folders for the data and model:

export MLPERF_SCRATCH_PATH=/mnt/resource_nvme/scratch
mkdir -p $MLPERF_SCRATCH_PATH/data $MLPERF_SCRATCH_PATH/models $MLPERF_SCRATCH_PATH/preprocessed_data

To download the model and the preprocessed dataset, please follow the steps in code/llama2-70b/tensorrt/README.md (a license is required). 

Prebuild the container on the instance. 

make prebuild

The system name is saved under code/common/systems/custom_list.py and the configuration files are located in configs/[benchmark]/[scenario]/custom.py.  

You can finally build the container: 

make build


Run the benchmark 

Finally, run the benchmark with the make run command below. The performance results should match Azure's official MLPerf Inference v4.0 submission.

make run RUN_ARGS="--benchmarks=llama2-70b --scenarios=offline,server --config_ver=high_accuracy" 
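After a run completes, LoadGen writes summary files under the harness's build directory. The exact layout is an assumption based on common NVIDIA MLPerf harness conventions, demonstrated below on a stand-in directory tree rather than real results:

```shell
# Stand-in tree mirroring where the harness typically drops LoadGen logs
# (the build/logs layout is an assumption, not from the original post).
tmp=$(mktemp -d)
mkdir -p "$tmp/build/logs/demo_run/llama2-70b/Offline"
touch "$tmp/build/logs/demo_run/llama2-70b/Offline/mlperf_log_summary.txt"
SUMMARIES=$(find "$tmp/build/logs" -name mlperf_log_summary.txt)
echo "$SUMMARIES"
rm -rf "$tmp"
```

The `mlperf_log_summary.txt` for each scenario contains the measured throughput to compare against the published numbers.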

