Add a new Partition to a running CycleCloud SLURM cluster

This post has been republished via RSS; it originally appeared at: Microsoft Tech Community - Latest Blogs.

Overview

Azure CycleCloud (CC) is a user-friendly platform that orchestrates High-Performance Computing (HPC) environments on Azure, enabling administrators to set up infrastructure, job schedulers, and filesystems, and to scale resources efficiently at any size. It's designed for HPC administrators deploying environments with specific schedulers.

 

SLURM, a widely used HPC job scheduler, is notable for its open-source, scalable, fault-tolerant design, suitable for Linux clusters of any scale. SLURM manages user resources, workloads, accounting, and monitoring, supports parallel/distributed computing, and organizes compute nodes into partitions.

 

This blog explains how to add a new partition to an active SLURM cluster in CycleCloud, without terminating or restarting the entire cluster.


 

Requirements/Versions:

  • CycleCloud server (CC version used is 8.6.2)
  • CycleCloud CLI initialized on the CycleCloud VM
  • A running Slurm cluster
    • CycleCloud project used is 3.0.7
    • Slurm version used is 23.11.7-1
  • SSH and HTTPS access to the CycleCloud VM

High Level Overview

  1. Git clone the CC SLURM repo (not required if you already have a slurm template file)
  2. Edit the Slurm template to add a new partition
  3. Export parameters from the running SLURM cluster
  4. Import the updated template file to the running cluster
  5. Activate the new nodearray(s)
  6. Update the cluster settings (VM size, core count, image, etc.)
  7. Scale the cluster to create the nodes

 

Step 1: Git clone the CC SLURM repo

SSH into the CC VM and run the following commands:

sudo yum install -y git 
git clone https://github.com/Azure/cyclecloud-slurm.git
cd cyclecloud-slurm/templates
ls -l


 

 

Step 2: Edit the SLURM template to add new partition(s)

Use your editor of choice (e.g. vi, vim, nano, VS Code Remote) to copy the “slurm.txt” template file and edit the copy:

cp slurm.txt slurm-part.txt
vim slurm-part.txt

 

In the template file, a nodearray is the CC configuration unit that maps to a SLURM partition. There are 3 nodearrays defined in the default template:

 

  • hpc: tightly coupled MPI workloads with InfiniBand (slurm.hpc = true)
  • htc: massively parallel throughput jobs without InfiniBand (slurm.hpc = false)
  • dynamic: enables multiple VM types in the same partition

 

Choose the nodearray type for the new partition (hpc or htc) and duplicate the [[nodearray …]] config section.  For example, to create a new nodearray named “GPU” based on the hpc nodearray (NOTE: the hpc nodearray config is included for reference):

 

    [[nodearray hpc]]
    Extends = nodearraybase
    MachineType = $HPCMachineType
    ImageName = $HPCImageName
    MaxCoreCount = $MaxHPCExecuteCoreCount
    Azure.MaxScalesetSize = $HPCMaxScalesetSize
    AdditionalClusterInitSpecs = $HPCClusterInitSpecs
    EnableNodeHealthChecks = $EnableNodeHealthChecks

        [[[configuration]]]
        slurm.default_partition = true
        slurm.hpc = true
        slurm.partition = hpc

    [[nodearray GPU]]
    Extends = nodearraybase
    MachineType = $GPUMachineType
    ImageName = $GPUImageName
    MaxCoreCount = $MaxGPUExecuteCoreCount
    Azure.MaxScalesetSize = $HPCMaxScalesetSize
    AdditionalClusterInitSpecs = $GPUClusterInitSpecs
    EnableNodeHealthChecks = $EnableNodeHealthChecks

        [[[configuration]]]
        slurm.default_partition = false
        slurm.hpc = true
        slurm.partition = gpu
        slurm.use_pcpu = false

 

NOTE:  only one nodearray may set “slurm.default_partition = true”, and by default that is the hpc nodearray.  Leave the new nodearray’s value as false, or if you set it to true, change the hpc nodearray’s value to false.
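A quick way to verify this after editing is to count how many nodearrays set slurm.default_partition = true. A minimal sketch (run here against an inline sample; in practice, point grep at your slurm-part.txt):

```shell
# Count nodearrays claiming to be the default partition; exactly one is expected.
# The here-doc stands in for the real slurm-part.txt edited above.
cat > /tmp/slurm-part-sample.txt <<'EOF'
        [[[configuration]]]
        slurm.default_partition = true
        slurm.partition = hpc
        [[[configuration]]]
        slurm.default_partition = false
        slurm.partition = gpu
EOF
count=$(grep -c 'slurm.default_partition = true' /tmp/slurm-part-sample.txt)
echo "default partitions: $count"
```

A count other than 1 means the default-partition flags need another look before importing.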

 

The “variables” in the nodearray config (e.g. $GPUMachineType) are referred to as “Parameters” in CC.  Parameters are attributes exposed in the CC GUI to enable per-cluster customization.  Further down in the template file, the parameter definitions begin with the [parameters About] section.  We need to add a configuration block in this section for each Parameter referenced by the new nodearray (e.g. $GPUMachineType).

 

Add a GPUMachineType parameter, modeled on the existing HPCMachineType (shown for reference):

        [[[parameter HPCMachineType]]]
        Label = HPC VM Type
        Description = The VM type for HPC execute nodes
        ParameterType = Cloud.MachineType
        DefaultValue = Standard_F2s_v2

        [[[parameter GPUMachineType]]]
        Label = GPU VM Type
        Description = The VM type for GPU execute nodes
        ParameterType = Cloud.MachineType
        DefaultValue = Standard_F2s_v2

 

Add a MaxGPUExecuteCoreCount parameter, modeled on the existing MaxHPCExecuteCoreCount:

 

        [[[parameter MaxHPCExecuteCoreCount]]]
        Label = Max HPC Cores
        Description = The total number of HPC execute cores to start
        DefaultValue = 100
        Config.Plugin = pico.form.NumberTextBox
        Config.MinValue = 1
        Config.IntegerOnly = true

        [[[parameter MaxGPUExecuteCoreCount]]]
        Label = Max GPU Cores
        Description = The total number of GPU execute cores to start
        DefaultValue = 100
        Config.Plugin = pico.form.NumberTextBox
        Config.MinValue = 1
        Config.IntegerOnly = true

 

Add a GPUImageName parameter, modeled on the existing HPCImageName:

 

        [[[parameter HPCImageName]]]
        Label = HPC OS
        ParameterType = Cloud.Image
        Config.OS = linux
        DefaultValue = almalinux8
        Config.Filter := Package in {"cycle.image.centos7", "cycle.image.ubuntu20", "cycle.image.ubuntu22", "cycle.image.sles15-hpc", "almalinux8"}

        [[[parameter GPUImageName]]]
        Label = GPU OS
        ParameterType = Cloud.Image
        Config.OS = linux
        DefaultValue = almalinux8
        Config.Filter := Package in {"cycle.image.centos7", "cycle.image.ubuntu20", "cycle.image.ubuntu22", "cycle.image.sles15-hpc", "almalinux8"}

 

Add a GPUClusterInitSpecs parameter, modeled on the existing HPCClusterInitSpecs:

        [[[parameter HPCClusterInitSpecs]]]
        Label = HPC Cluster-Init
        DefaultValue = =undefined
        Description = Cluster init specs to apply to HPC execute nodes
        ParameterType = Cloud.ClusterInitSpecs

        [[[parameter GPUClusterInitSpecs]]]
        Label = GPU Cluster-Init
        DefaultValue = =undefined
        Description = Cluster init specs to apply to GPU execute nodes
        ParameterType = Cloud.ClusterInitSpecs

 

NOTE:  You can customize the "DefaultValue" of each parameter to suit your requirements, or change the values later in the CycleCloud GUI.

Save the template file and exit (e.g. :wq in vi/vim).

 

Step 3: Export parameters from the running SLURM cluster

You now have an updated SLURM template file that adds a new GPU partition.  The template needs to be “imported” into CycleCloud to overwrite the existing cluster definition.  Before doing that, however, export all of the cluster’s current GUI parameter settings to a local JSON file for use in the import.  Without this JSON file, the import resets all cluster settings to the defaults specified in the template, overwriting any customizations applied to the cluster in the GUI.

 

From the CycleCloud VM, run a command of the following form:

cyclecloud export_parameters cluster_name > file_name.json

 

For my cluster the specific command is:

cyclecloud export_parameters jm-slurm-test > jm-slurm-test-params.json
cat jm-slurm-test-params.json
{
  "UsePublicNetwork" : false,
  "configuration_slurm_accounting_storageloc" : null,
  "AdditionalNFSMountOptions" : null,
  "About shared" : null,
  "NFSSchedAddress" : null,
  "loginMachineType" : "Standard_D8as_v4",
  "DynamicUseLowPrio" : false,
  "configuration_slurm_accounting_password" : null,
  "Region" : "southcentralus",
  "MaxHPCExecuteCoreCount" : 240,
  "NumberLoginNodes" : 0,
  "HTCImageName" : "cycle.image.ubuntu22",
  "MaxHTCExecuteCoreCount" : 10,
  "AdditionalNFSExportPath" : "/data",
  "DynamicClusterInitSpecs" : null,
  "About shared part 2" : null,
  "HPCImageName" : "cycle.image.ubuntu22",
  "SchedulerClusterInitSpecs" : null,
  "SchedulerMachineType" : "Standard_D4as_v4",
  "NFSSchedDiskWarning" : null,
  …<truncated>
}
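Since the import step consumes this file, it's worth a quick sanity check that the export is well-formed JSON. A minimal sketch using Python's stdlib json.tool (shown against a small sample file; substitute your real export, e.g. jm-slurm-test-params.json):

```shell
# Verify that a parameter file parses as JSON before using it with import_cluster.
# The sample file stands in for the real "cyclecloud export_parameters" output.
cat > /tmp/params-sample.json <<'EOF'
{
  "Region" : "southcentralus",
  "MaxHPCExecuteCoreCount" : 240,
  "HPCImageName" : "cycle.image.ubuntu22"
}
EOF
python3 -m json.tool /tmp/params-sample.json > /dev/null && echo "valid JSON"
```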

 

If the cyclecloud command does not work, you may need to initialize the CLI tool as described in the docs:  https://learn.microsoft.com/en-us/azure/cyclecloud/how-to/install-cyclecloud-cli?view=cyclecloud-8#initialize-cyclecloud-cli

 

Step 4: Import the updated template file to the running cluster

To import the updated template into the running cluster, run a command of the following form:

cyclecloud import_cluster <cluster_name> -c Slurm -f <template_file_name> -p <parameter_file_name> --force

 

For my cluster the specific command is:

cyclecloud import_cluster jm-slurm-test -c Slurm -f slurm-part.txt -p jm-slurm-test-params.json --force


 

In the CycleCloud GUI we can now see that the “gpu” nodearray has been added. Click the “Arrays” tab in the middle panel to view it.


 

The gpu nodearray has been added to the cluster, but it is not yet “Activated,” which means it is not yet available for use.

 

Step 5: Activate the new nodearray(s)

The cyclecloud start_cluster command activates the new nodearray; run it in the following form:

cyclecloud start_cluster <cluster_name>

 

For my cluster the command is:

cyclecloud start_cluster jm-slurm-test


 

In the CycleCloud GUI, the gpu nodearray status will move to “Activation” and finally “Activated.”


 

Step 6: Update the cluster settings

Edit the cluster settings in the CycleCloud GUI to pick the “GPU VM Type” and “Max GPU Cores” in the “Required Settings” section.


 

Update the “GPU OS” and “GPU Cluster-Init” as needed in the “Advanced Settings” section.


 

Step 7:  Scale the cluster to create the nodes

At this point we have added the new nodearray to CycleCloud, but SLURM does not yet know about the new GPU partition.  You can confirm this from the scheduler VM with the sinfo command, which will not yet list a gpu partition.


 

 

The final step is to “scale” the cluster, which pre-defines the compute nodes as SLURM expects.  The azslurm scale command, run as root on the scheduler VM, accomplishes this.
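On the scheduler VM, the sequence looks roughly like this (a sketch assuming the cyclecloud-slurm 3.x tooling used in this post):

```shell
# Run on the SLURM scheduler VM, not the CycleCloud VM.
sudo azslurm scale   # regenerate SLURM node/partition definitions as root
sinfo                # the new "gpu" partition should now appear in the output
```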


 

Your cluster is now ready to use the new GPU partition.

 

Summary

Adding a new partition to SLURM with Azure CycleCloud is a flexible and efficient way to update your cluster and leverage different types of compute nodes. You can follow the steps outlined in this article to create a new nodearray, configure the cluster settings, and scale the cluster to match the SLURM partition. By using CycleCloud and SLURM, you can optimize your cluster performance and resource utilization.

 

References:
CycleCloud Documentation

CycleCloud-SLURM GitHub repository

Microsoft Training for SLURM on Azure CycleCloud

SLURM documentation
