Azure HPC VM Images

This post has been republished via RSS; it originally appeared at: Azure Compute Blog articles.

The Azure HPC team is pleased to announce the availability of optimized and pre-configured VM images for HPC and AI workloads. These VM images (VMIs) are based on the base CentOS and Ubuntu marketplace VM images. They come pre-configured with Nvidia Mellanox OFED for InfiniBand, Nvidia GPU drivers (on select distros/versions today), popular MPI libraries, vendor-tuned HPC libraries, and recommended performance optimizations.

NOTE: We typically find that users of the HPC VMs running traditional HPC applications prefer CentOS as their OS, while users of AI/ML applications running on the GPU VMs tend to prefer Ubuntu. Hence, currently only the Ubuntu-HPC and CentOS-HPC 7.9 VM images contain the Nvidia GPU drivers and the GPU-specific software stack.

Supported VM Sizes

Refer to the HPC VM image documentation for the latest H- and N-series VM size support matrix.

The latest Nvidia Mellanox InfiniBand (IB) OFED versions on the VM images support ConnectX-5 and newer IB NICs, so the IB drivers in these VMIs support only VM sizes with ConnectX-5 or newer NICs (see the support matrix in the HPC VM image documentation).

Note that all of these VM sizes support "Gen 2" VMs (though some older ones also support "Gen 1").

As mentioned above, only select Linux distros and versions of the HPC VMIs come pre-configured with the Nvidia GPU drivers and the GPU compute software stack (CUDA, NCCL, etc.). As a result, GPU driver support in these VMIs is currently enabled only for the following N-series VMs: NDv2 and NDv4.

Supported Linux Distros and Versions

These derivative HPC VMIs are based on the base CentOS and Ubuntu marketplace VMIs. Refer to the azhpc-images repo on GitHub for the scripts and recipes used to build these VMIs, as well as for additional distros/versions (which are not published as VMIs today).

While the CentOS- and Ubuntu-based HPC VMIs share many common packages and configurations, currently only the Ubuntu-HPC and CentOS-HPC 7.9 VM images contain the Nvidia GPU drivers and the GPU-specific software stack.

Refer to the azhpc-images repo on GitHub for a detailed list of package versions in each of the VM images.

Configuration & Optimization

Refer to the azhpc-images repo on GitHub for the latest details on which packages and configurations are included in each VM image. Here is a short overview of the configuration and packages in these optimized VM images. The included configurations are based on optimization recommendations from vendors and partners, as well as learnings from common HPC workloads and usage practices in traditional HPC systems.

  • Azure Linux Agent (WAAgent)
    • Limit the CPU/memory usage of waagent (the VM agent that runs on every Azure Linux VM).
    • For CPU-sensitive workloads, optionally consider stopping waagent at the beginning of your job script and restarting it at the end, as follows (on Ubuntu the service is named walinuxagent instead of waagent):
      sudo systemctl stop waagent
      <HPC job>
      sudo systemctl restart waagent
  • Higher Memory Limits
    • Set max-locked-memory limit to unlimited
    • Set number of open files limit to 65535
  • Zone Reclaim mode
    • Set zone_reclaim_mode to 1
  • Disable the firewall daemon so that it does not interfere with MPI job launchers
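
The limits and the kernel setting above can be checked from a shell on a deployed VM. The script below is a minimal sketch, not part of the images: `report_setting` and `check_hpc_settings` are hypothetical helper names, the expected values are simply the ones from the list above, and the firewall check assumes the daemon is firewalld (as on CentOS). It only reports values and changes nothing.

```shell
#!/usr/bin/env bash
# Minimal sketch: report the limits and kernel settings listed above.
# report_setting and check_hpc_settings are hypothetical helper names;
# the expected values mirror the list above. Nothing is modified.

# Print "name=actual (expected X) OK|DIFFERS".
report_setting() {
  local name="$1" actual="$2" expected="$3" status="OK"
  if [ "$actual" != "$expected" ]; then
    status="DIFFERS"
  fi
  printf '%s=%s (expected %s) %s\n' "$name" "$actual" "$expected" "$status"
}

check_hpc_settings() {
  # Per-process limits of the current shell (ulimit -l is in KiB or "unlimited").
  report_setting memlock "$(ulimit -l)" unlimited
  report_setting nofile "$(ulimit -n)" 65535

  # NUMA zone reclaim mode; the file may be absent on some kernels/containers.
  local zr="n/a"
  if [ -r /proc/sys/vm/zone_reclaim_mode ]; then
    zr="$(cat /proc/sys/vm/zone_reclaim_mode)"
  fi
  report_setting zone_reclaim_mode "$zr" 1

  # Assumption: the firewall daemon is firewalld. Without systemd or
  # firewalld the reported state may be "unknown" or "inactive".
  local fw
  fw="$(systemctl is-active firewalld 2>/dev/null | head -n 1 || true)"
  report_setting firewalld "${fw:-unknown}" inactive
}

check_hpc_settings
```

Each line ends in OK or DIFFERS, which makes it easy to spot a setting that was changed after deployment.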

Software packages

  • Mellanox OFED
  • Pre-configured IPoIB (IP-over-InfiniBand)
  • Popular InfiniBand based MPI Libraries
    • HPC-X 
    • IntelMPI 
    • MVAPICH2, MVAPICH2-X
    • OpenMPI 
  • Communication Runtimes
    • Libfabric
    • OpenUCX
  • Optimized libraries
    • AMD Blis 
    • AMD FFTW 
    • AMD Flame
    • Intel MKL 
  • Platform-recommended GCC version
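
A quick way to see which parts of this stack are present on a particular VM is to look for the commands each component installs. This is a hypothetical check (`report_component` is an illustrative name), assuming that `ibv_devinfo` comes with Mellanox OFED, that `nvidia-smi` comes with the Nvidia GPU driver (so it is expected only on the GPU-enabled images), and that the IPoIB interface is named `ib0`.

```shell
#!/usr/bin/env bash
# Hypothetical sanity check for the software stack listed above.
# Assumptions: ibv_devinfo comes from Mellanox OFED, nvidia-smi from the
# Nvidia GPU driver (GPU images only), and ib0 is the IPoIB interface.

# Print "<label>: found" if the command is on PATH, else "<label>: missing".
report_component() {
  local label="$1" cmd="$2"
  if command -v "$cmd" >/dev/null 2>&1; then
    echo "$label: found"
  else
    echo "$label: missing"
  fi
}

report_component "Mellanox OFED (ibv_devinfo)" ibv_devinfo
report_component "Nvidia GPU driver (nvidia-smi)" nvidia-smi

# IPoIB: Linux lists network interfaces under /sys/class/net.
if [ -e /sys/class/net/ib0 ]; then
  echo "IPoIB interface ib0: found"
else
  echo "IPoIB interface ib0: missing"
fi
```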

MPI libraries and software packages are available as environment modules. To load an MPI library/package, just do:

module load <package-name>
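
In a non-interactive job script, the `module` command may need its init script sourced first. The sketch below assumes that init script lives at `/etc/profile.d/modules.sh` and uses `mpi/hpcx` only as an illustrative module name; run `module avail` on your image to see the actual names. `load_mpi_module` and the `MPI_MODULE` variable are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: load an MPI library via environment modules in a job script.
# Assumptions: "mpi/hpcx" is an illustrative module name (check "module avail"),
# and /etc/profile.d/modules.sh is the modules init script.

load_mpi_module() {
  local mod="$1"
  # Non-interactive shells may not have the "module" function defined yet.
  if ! type module >/dev/null 2>&1 && [ -r /etc/profile.d/modules.sh ]; then
    . /etc/profile.d/modules.sh
  fi
  if ! type module >/dev/null 2>&1; then
    echo "environment modules are not available in this shell" >&2
    return 1
  fi
  module load "$mod" || return 1
  # Show which MPI launcher is now first on PATH.
  command -v mpirun || { echo "mpirun not found after loading $mod" >&2; return 1; }
}

load_mpi_module "${MPI_MODULE:-mpi/hpcx}" || echo "could not load MPI module" >&2
```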

Deploying HPC VM Images

The HPC VM images are available from Azure Marketplace and can be deployed through a variety of deployment vehicles (CycleCloud, Batch, ARM templates, etc.). The AzureHPC scripts provide an easy way to quickly deploy an HPC cluster using these HPC VM images.

Refer to the following guidance to find and deploy the HPC VMIs.

CentOS-HPC VMI

Every version is available as a "Gen 2" VMI (the SKUs with a gen2 suffix); some versions also have "Gen 1" VMIs.

CLI:

az vm image list --publisher openlogic --offer centos-hpc --output table --all

Offer       Publisher  Sku       Urn                                           Version
----------  ---------  --------  --------------------------------------------  --------------
CentOS-HPC  OpenLogic  7.6       OpenLogic:CentOS-HPC:7.6:7.6.2021022200       7.6.2021022200
CentOS-HPC  OpenLogic  7_6gen2   OpenLogic:CentOS-HPC:7_6gen2:7.6.2021022201   7.6.2021022201
CentOS-HPC  OpenLogic  7_7-gen2  OpenLogic:CentOS-HPC:7_7-gen2:7.7.2021022401  7.7.2021022401
CentOS-HPC  OpenLogic  7_8       OpenLogic:CentOS-HPC:7_8:7.8.2021020400       7.8.2021020400
CentOS-HPC  OpenLogic  7_8-gen2  OpenLogic:CentOS-HPC:7_8-gen2:7.8.2021020401  7.8.2021020401
CentOS-HPC  OpenLogic  7_9       OpenLogic:CentOS-HPC:7_9:7.9.2021052400       7.9.2021052400
CentOS-HPC  OpenLogic  7_9-gen2  OpenLogic:CentOS-HPC:7_9-gen2:7.9.2021052401  7.9.2021052401
CentOS-HPC  OpenLogic  8_1       OpenLogic:CentOS-HPC:8_1:8.1.2021020400       8.1.2021020400
CentOS-HPC  OpenLogic  8_1-gen2  OpenLogic:CentOS-HPC:8_1-gen2:8.1.2021020401  8.1.2021020401
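
Any of the URNs above can be passed to `az vm create` through `--image`, with `latest` in place of the version to pick the newest one. This is a minimal sketch: the resource group, VM name, admin user, and the `Standard_HB120rs_v2` size are illustrative assumptions (a Gen 2 SKU needs a VM size that supports Gen 2), and `deploy_hpc_vm` and `DRY_RUN` are hypothetical names. By default it only prints the command.

```shell
#!/usr/bin/env bash
# Sketch: create a single CentOS-HPC VM from one of the URNs listed above.
# Hypothetical values (replace with your own): resource group "my-hpc-rg",
# VM name "hpc-node-0", admin user "azureuser". DRY_RUN=1 (the default)
# only prints the az command; set DRY_RUN=0 to actually run it.

deploy_hpc_vm() {
  local image="$1" size="$2"
  local -a cmd
  cmd=(az vm create
    --resource-group "${RESOURCE_GROUP:-my-hpc-rg}"
    --name "${VM_NAME:-hpc-node-0}"
    --image "$image"
    --size "$size"
    --admin-username "${ADMIN_USER:-azureuser}"
    --generate-ssh-keys)
  if [ "${DRY_RUN:-1}" = "1" ]; then
    # Print the command instead of running it.
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}

# "latest" resolves to the newest published version of the SKU.
deploy_hpc_vm "OpenLogic:CentOS-HPC:7_9-gen2:latest" "Standard_HB120rs_v2"
```

Keeping the command in an array means the same argument list is used whether it is printed or executed.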

Portal: Search for "CentOS-HPC" by publisher "OpenLogic".

Ubuntu-HPC VMI

All VMIs are "Gen 2".

CLI:

az vm image list --publisher microsoft-dsvm --offer ubuntu-hpc --output table --all

Offer       Publisher       Sku   Urn                                              Version
----------  --------------  ----  -----------------------------------------------  ----------------
ubuntu-hpc  microsoft-dsvm  1804  microsoft-dsvm:ubuntu-hpc:1804:18.04.2021042201  18.04.2021042201
ubuntu-hpc  microsoft-dsvm  1804  microsoft-dsvm:ubuntu-hpc:1804:18.04.2021051701  18.04.2021051701
ubuntu-hpc  microsoft-dsvm  2004  microsoft-dsvm:ubuntu-hpc:2004:20.04.2021051401  20.04.2021051401
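
Because a SKU can carry several versions (1804 has two in the listing above), a script may want to pick the newest one explicitly. The sketch below is a hypothetical helper, `latest_version_for_sku`, that reads URNs (Publisher:Offer:Sku:Version) on stdin and prints the highest version for one SKU, relying on GNU `sort -V` for version ordering. On a machine with a logged-in Azure CLI, the URNs can be fed from `az vm image list ... --all --query "[].urn" --output tsv`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: given URNs on stdin (Publisher:Offer:Sku:Version),
# print the newest version published for the SKU passed as $1.
# Requires GNU sort for -V (version sort).

latest_version_for_sku() {
  local sku="$1"
  # Field 3 is the SKU, field 4 the version.
  awk -F: -v sku="$sku" '$3 == sku { print $4 }' | sort -V | tail -n 1
}

# Example input: the Ubuntu-HPC URNs from the listing above.
printf '%s\n' \
  microsoft-dsvm:ubuntu-hpc:1804:18.04.2021042201 \
  microsoft-dsvm:ubuntu-hpc:1804:18.04.2021051701 \
  microsoft-dsvm:ubuntu-hpc:2004:20.04.2021051401 |
  latest_version_for_sku 1804

# Against live data (requires a logged-in Azure CLI):
# az vm image list --publisher microsoft-dsvm --offer ubuntu-hpc --all \
#   --query "[].urn" --output tsv | latest_version_for_sku 2004
```

With the three Ubuntu-HPC URNs above as input, `latest_version_for_sku 1804` prints `18.04.2021051701`.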

Portal: Search for "Ubuntu-HPC" by publisher "Microsoft-DSVM".
