This post has been republished via RSS; it originally appeared at: Azure Compute Blog articles.
The Azure HPC team is pleased to announce the availability of optimized and pre-configured VM images for HPC and AI workloads. These VM images (VMIs) are based on the base CentOS and Ubuntu marketplace VM images. They are pre-configured with Nvidia Mellanox OFED for InfiniBand, Nvidia GPU drivers (on select distros/versions today), popular MPI libraries, vendor-tuned HPC libraries, and recommended performance optimizations.
NOTE: Users running traditional HPC applications on the HPC VMs typically prefer CentOS, while users running AI/ML applications on the GPU VMs tend to prefer Ubuntu. Hence, currently only the Ubuntu-HPC and CentOS-HPC 7.9 VM OS images contain the Nvidia GPU drivers and the GPU-specific software stack.
Supported VM Sizes
Refer to the HPC VM image documentation for the latest H- and N- series VM size support matrix.
The latest Nvidia Mellanox InfiniBand (IB) OFED versions on the VM images support ConnectX-5 and newer IB NICs. This implies the following VM size support matrix for the IB drivers in these VMIs:
Note that all the above VM sizes support "Gen 2" VMs (though some older ones also support "Gen 1").
As mentioned above, only select Linux distros and versions of the HPC VMIs come pre-configured with the Nvidia GPU drivers and the GPU compute software stack (CUDA, NCCL, etc.). This implies only the following N-series VMs are enabled with GPU driver support in these VMIs: NDv2, NDv4
Supported Linux Distros and Versions
These derivative HPC VMIs are based on the base CentOS and Ubuntu marketplace VMIs. Refer to the azhpc-images repo at GitHub for the scripts and recipes used for these VMIs as well as additional distros/versions (which are not published as VMIs today).
While the CentOS and Ubuntu based HPC VMIs share a lot of common packages and configuration, currently only the Ubuntu-HPC and CentOS-HPC 7.9 VM OS images contain the Nvidia GPU drivers as well as GPU specific software stack.
Follow the links below for a detailed list of package versions in each of the VM images:
- CentOS 8.1 HPC VMI
- CentOS 7.9 HPC VMI
- CentOS 7.8 HPC VMI
- CentOS 7.7 HPC VMI
- CentOS 7.6 HPC VMI
- Ubuntu 18.04 HPC VMI
- Ubuntu 20.04 HPC VMI
Configuration & Optimization
Refer to the azhpc-images repo at GitHub for the latest details on which packages and configurations are included in each VM image. Here is a short overview of the configuration and packages in these optimized VM images. The included configurations are based on optimization recommendations from vendors and partners, as well as lessons learned from common HPC workloads and usage practices in traditional HPC systems.
- Azure Linux Agent (WAAgent)
- Limit waagent's (VM agent running on every Azure Linux VM) usage of CPU/memory resources.
- Optionally, for CPU-sensitive workloads, consider stopping waagent at the beginning of your job script and restarting it at the end, as follows:
    sudo systemctl stop waagent
    <HPC job>
    sudo systemctl restart waagent
- Higher Memory Limits
- Set max-locked-memory limit to unlimited
- Set number of open files limit to 65535
- Zone Reclaim mode
- Set zone_reclaim_mode to 1
- Disable firewall daemon to help MPI job launchers
- Mellanox OFED
- Pre-configured IPoIB (IP-over-InfiniBand)
- Popular InfiniBand based MPI Libraries
- MVAPICH2, MVAPICH2-X
- Communication Runtimes
- Optimized libraries
- AMD Blis
- AMD FFTW
- AMD Flame
- Intel MKL
- Platform recommended GCC version
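Several of the settings listed above can be inspected from inside a job script, and the optional waagent pause can be wrapped so that the agent is restarted even when the job fails. The following is a minimal sketch rather than anything shipped in the images: `SYSTEMCTL`, `report_limits`, and `run_with_agent_paused` are hypothetical names introduced here for illustration, and the expected values in the comments are simply the ones listed above.

```shell
#!/usr/bin/env bash
# Sketch only: SYSTEMCTL, report_limits and run_with_agent_paused are
# hypothetical names, not part of the HPC VM images.
SYSTEMCTL="${SYSTEMCTL:-sudo systemctl}"   # override to dry-run without sudo

report_limits() {
    # Print the current values of the settings the images are described as tuning.
    echo "max locked memory: $(ulimit -l)"   # listed above as unlimited
    echo "open files: $(ulimit -n)"          # listed above as 65535
    if [ -r /proc/sys/vm/zone_reclaim_mode ]; then
        echo "zone_reclaim_mode: $(cat /proc/sys/vm/zone_reclaim_mode)"   # listed above as 1
    fi
}

run_with_agent_paused() {
    # Stop waagent, run the job passed as arguments, then restart the agent
    # whether or not the job succeeded, and return the job's exit status.
    $SYSTEMCTL stop waagent
    "$@"
    rc=$?
    $SYSTEMCTL restart waagent
    return "$rc"
}
```

A job script could call `report_limits` once for the log and then launch the workload as `run_with_agent_paused <HPC job>`. This sketch does not cover a job script that is itself killed by a signal; a `trap` handler would be needed for that case.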
MPI libraries and software packages are available as environment modules. To load an MPI library/package, just do:
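With Environment Modules, loading a package generally means finding its name with `module avail` and passing that name to `module load`. The sketch below wraps this in a small helper; `load_mpi` is a hypothetical name, and the module name it receives is a placeholder to be taken from the `module avail` output on the image in use, not a name confirmed by this post.

```shell
load_mpi() {
    # $1: a module name exactly as printed by `module avail` (placeholder)
    if ! command -v module >/dev/null 2>&1; then
        echo "module command not found; is Environment Modules initialized in this shell?" >&2
        return 1
    fi
    # Load the requested module, then show what is currently loaded.
    module load "$1" && module list
}
```

Typical use is to run `module avail` first and then call `load_mpi <name>` with one of the listed entries.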
Deploying HPC VM Images
The HPC VM images are available from Azure Marketplace and can be deployed through a variety of deployment vehicles (CycleCloud, Batch, ARM templates, etc.). The AzureHPC scripts provide an easy way to quickly deploy an HPC cluster using these HPC VM images.
Refer to the following guidance to find and deploy the HPC VMIs.
CentOS-HPC: All VMIs are "Gen 2". Some also have "Gen 1" versions.
Portal: Search for "CentOS-HPC" by publisher "OpenLogic".
Ubuntu-HPC: All VMIs are "Gen 2".
Portal: Search for "Ubuntu-HPC" by publisher "Microsoft-DSVM".
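The same search can also be done from the command line with the Azure CLI's `az vm image list` command. The sketch below assumes that the marketplace offer names match the Portal search terms above, which this post does not state explicitly, and that the Azure CLI is installed and logged in. `AZ` and `list_hpc_images` are hypothetical names introduced for illustration.

```shell
AZ="${AZ:-az}"   # hypothetical override so the command can be dry-run

list_hpc_images() {
    # $1: publisher, $2: offer (assumed to match the Portal search terms)
    # --all queries the full marketplace catalog rather than the offline list.
    $AZ vm image list --publisher "$1" --offer "$2" --all --output table
}
```

For example, `list_hpc_images OpenLogic CentOS-HPC` or `list_hpc_images microsoft-dsvm ubuntu-hpc` should list the available SKUs and versions if the assumption about offer names holds.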