This post has been republished via RSS; it originally appeared at: New blog articles in Microsoft Community Hub.
Would you like to have a single script to quickly provision High Performance Computing (HPC) clusters with access to several ready-to-use HPC applications (WRF, GROMACS, OpenFOAM, and many more) so you can focus on solving hard scientific and engineering challenges? Then this blog may be relevant to you.
To achieve this goal, we rely on a cluster provisioning script described in a previous blog post (see here), which is based on the command-line interfaces (CLIs) of both Azure and Azure CycleCloud. We then extended this script to automatically set up EESSI (European Environment for Scientific Software Installations); see a previous blog post on EESSI here. To make things concrete, we describe how we put all of this together using the WRF (Weather Research & Forecasting) Model as a use case.
TL;DR
- We provide a script that creates a SLURM cluster via Azure CycleCloud, ready for you to submit (MPI) jobs, using only the command-line interface (CLI);
- We leverage EESSI to have access to ready-to-use applications which are mounted into the cluster nodes;
- EESSI is set up via two CycleCloud concepts, projects and cluster templates, so you will learn a bit about both here. You can reuse the projects and the cluster template even when CycleCloud was provisioned in other ways, such as via the marketplace in the Azure Portal;
- We use WRF as an example, so the provisioned cluster will have both the WRF application and the Conus 2.5km and Conus 12km benchmark data automatically available for job submission;
- This tutorial shows how you could do this for different applications; it is not intended to describe optimized ways to run WRF in a production system. For that, there is plenty of material out there, including previous blog posts (see here and here).
Here is the git repository that contains the script:
- git folder: git folder with automation script and cluster templates + cyclecloud projects for EESSI support
- cyclecloud_cli.sh: script itself to automate CycleCloud+SLURM installation using Azure/CycleCloud CLI
- setvars.sh: helper script that sets the variables used to customize the deployment
Assumptions
- Deployment relies only on PRIVATE IP addresses;
- Private and public ssh keys available;
- We use Ubuntu for all resources: CycleCloud VM, scheduler, and cluster nodes.
- Azure CLI must be set up on the machine that triggers the script call (i.e., az login should work with the subscription used for deployment). The jq command must also be available.
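The tooling assumptions above can be checked before launching the automation. The following is a minimal sketch, not part of the original script: the function names require_commands and azure_logged_in are hypothetical, and the only Azure CLI call used is az account show, which fails when there is no active login.

```shell
# Hypothetical helper: return 0 if every named command is on PATH,
# otherwise report each missing command on stderr and return 1.
require_commands() {
    missing=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "missing required command: $cmd" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Hypothetical helper: return 0 if the Azure CLI has an active login.
azure_logged_in() {
    az account show >/dev/null 2>&1
}

# Example use before running the automation script:
#   require_commands az jq ssh && azure_logged_in || echo "fix prerequisites first"
```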
1. How to run the script
Download the repository where the script is hosted:
Customize the variables in setvars.sh, including resource group, storage account, and keyvault, among others, and source the file:
CCPASSWORD and CCPUBKEY are set outside setvars.sh. When running the automation script, you will be asked for their values if you haven’t already set them using:
If the variables below are set, the script will automatically check for you when the cluster is ready for job submission. Otherwise, you can check the cluster creation yourself using Azure Bastion; the automation script will show you the IP address of the CycleCloud VM.
Provision the resources (resource group, vnet, keyvault, cyclecloud, etc.):
Example of execution output:
Azure CycleCloud UI with the newly provisioned cluster and with the template to create similar clusters (SlurmEESSI):
2. Run your WRF job
Once you are in the cluster scheduler via ssh or Azure Bastion, just:
The benchmark data is in the azureuser home directory, together with a couple of SLURM batch script examples that you can work with depending on the SKU, network, and data you want to use.
Here is an example of an sbatch script that you can run using, for instance, the HB SKU with an InfiniBand network and the Conus 2.5km benchmark data. For other applications, instead of module load WRF, you could load GROMACS, OpenFOAM, or other applications supported by EESSI (the list keeps growing as the project evolves).
Here we source the pilot EESSI repository, which provides WRF 3.9. For other applications, you could explore the most up-to-date repository. See the EESSI pages listed in the references for details and up-to-date information on applications being onboarded to EESSI. Once you source the EESSI bash script, you have access to many other apps and libraries, including GROMACS and OpenFOAM.
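To illustrate the shape of such a job script, here is a minimal sketch that writes one to a file. It is not the batch script shipped with the cluster. The helper name write_wrf_job is hypothetical, the EESSI pilot init path is our assumption about where the repository is mounted, and the node count, ranks per node, and partition are placeholders to adapt to your SKU.

```shell
# Hypothetical helper: write a Slurm batch script for WRF to the file in $1.
# $2 = number of nodes, $3 = MPI ranks per node, $4 = Slurm partition.
write_wrf_job() {
    cat > "$1" <<EOF
#!/bin/bash
#SBATCH --job-name=wrf
#SBATCH --nodes=$2
#SBATCH --ntasks-per-node=$3
#SBATCH --partition=$4
#SBATCH --exclusive

# Make the EESSI software stack visible (pilot repository path is an assumption).
source /cvmfs/pilot.eessi-hpc.org/latest/init/bash
module load WRF

# Run from the submit directory, which should hold namelist.input
# and the benchmark input files.
cd "\$SLURM_SUBMIT_DIR"
mpirun wrf.exe
EOF
}

# Example use (placeholder values), from the directory with the Conus data:
#   write_wrf_job wrf_conus.sbatch 2 120 hpc && sbatch wrf_conus.sbatch
```

Generating the file from a function keeps the SKU-dependent values (nodes, ranks, partition) in one place, so the same sketch can be reused for other EESSI modules by changing the module load line.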
Now let’s go behind the scenes, in case you want to learn how this was done or want to modify or expand the current automation.
EESSI gives you quick access to various applications. Alternatively, you could modify the steps below to build applications from source code, or to use frameworks such as SPACK or EasyBuild (see references for details).
3. Behind the scenes: CycleCloud cluster templates, projects, cloud-init
When we provision a CycleCloud cluster, we can choose the job scheduler that manages the cluster resources; options include SLURM, PBS, and LSF. Such clusters have a pre-defined list of job queues. If we want to provision a cluster with customizations, such as pre-downloading an application, changing job queues and resource types, or adding start-up tasks, we can use what are called cluster templates, projects, and cloud-init.
Cluster templates
Cluster templates define cluster configurations. You can specify the VM types of cluster nodes, storage options, deployment region, network ports to access a scheduler node, cluster partitions/queues, etc. All these can also be parameterized, so a template can be used for multiple use cases.
1. CycleCloud cluster template docs (reference 2)
2. CycleCloud cluster templates docs (reference 3)
Here is an example of a cluster template for a SLURM cluster: see reference 7.
These cluster templates follow the INI format.
Cluster projects
As mentioned above, a cluster template defines the configuration of the overall cluster. Inside the template, you can also define configurations for nodes; these are provided by what CycleCloud calls projects. Projects contain specs. When a node starts, CycleCloud configures it by processing and running a sequence of specs, which can be Python, shell, or PowerShell scripts. They are executed once nodes are ready (unlike cloud-init, which runs before any CycleCloud process is executed on the node).
Projects are used in the cluster templates with the following syntax:
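As a small illustration, a project is attached to a node section through a cluster-init line of the form project:spec:version. The sketch below only prints such a line; the helper name cluster_init_ref is hypothetical, and the version 1.0.0 is an assumption (use the version recorded in your project).

```shell
# Hypothetical helper: print the template line that attaches
# <project>:<spec>:<version> to a node or nodearray section.
cluster_init_ref() {
    printf '[[[cluster-init %s:%s:%s]]]\n' "$1" "$2" "$3"
}

# Project cc_eessi, spec "default", version 1.0.0 (version is an assumption).
cluster_init_ref cc_eessi default 1.0.0
# prints: [[[cluster-init cc_eessi:default:1.0.0]]]
```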
Here is a simplified view of a CycleCloud project:
- templates directory: holds cluster templates
- specs: the specifications defining your project
- scripts: scripts executed in lexicographical order on the node
- files: raw data files that will be put on the node
- tests: tests executed when a cluster is started in testing mode
For how to create a project, and for additional functionality of cluster projects, see reference 4.
Cloud-init
CycleCloud also supports cloud-init. The configurations can be executed at the first boot a VM performs, before any other CycleCloud specific configuration occurs on the VM (such as installation of HPC schedulers). Cloud-init can be used for configuring things such as networking, yum/apt mirrors, etc.
Further details can be found in reference 8.
Making WRF available on cluster nodes using EESSI
We can make WRF available through EESSI, the European Environment for Scientific Software Installations (pronounced “easy”). Certain steps must be executed on the cluster nodes to make WRF available for execution. We will use cluster template and CycleCloud project files to get there.
All of the steps below have been added to the CycleCloud CLI automation script.
There are several ways of doing so; here is one that uses CycleCloud projects (we could alternatively use cloud-init). We assume you are on an existing CycleCloud VM.
Creating and uploading a CycleCloud project
Copy the new template to the user home directory:
Create the file cc_eessi/specs/default/cluster-init/scripts/00_setup_eessi.sh with this content:
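As a rough idea of what such a script can contain, here is a minimal sketch for Ubuntu nodes. It follows the EESSI native installation guide (reference 10) as we understand it: install the CernVM-FS client, install the EESSI configuration package, and run the CVMFS setup. The function names are hypothetical, the package URLs and the cache quota are assumptions to verify against that guide, and the script is assumed to run with root privileges.

```shell
#!/bin/bash
# Hypothetical helper: write a minimal CVMFS client configuration to $1
# (normally /etc/cvmfs/default.local). The quota value (MB) is an assumption.
write_cvmfs_client_config() {
    cat > "$1" <<'EOF'
CVMFS_CLIENT_PROFILE="single"
CVMFS_QUOTA_LIMIT=10000
EOF
}

# Hypothetical helper: install CernVM-FS and the EESSI configuration on Ubuntu.
# URLs are taken from our reading of the EESSI guide; verify before use.
setup_eessi() {
    apt-get update
    apt-get install -y lsb-release wget
    wget -q https://ecsft.cern.ch/dist/cvmfs/cvmfs-release/cvmfs-release-latest_all.deb
    dpkg -i cvmfs-release-latest_all.deb
    apt-get update
    apt-get install -y cvmfs
    wget -q https://github.com/EESSI/filesystem-layer/releases/download/latest/cvmfs-config-eessi_latest_all.deb
    dpkg -i cvmfs-config-eessi_latest_all.deb
    write_cvmfs_client_config /etc/cvmfs/default.local
    cvmfs_config setup
}

# When used as the cluster-init script, call the function at the end:
#   setup_eessi
```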
Upload the project (in case you want to test it on an existing CycleCloud environment):
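A minimal sketch of this step, assuming the CycleCloud CLI is already initialized on the VM: cyclecloud project upload is run from inside the project directory and takes the name of the target locker. The helper name upload_project is hypothetical; the locker name depends on your installation, and cyclecloud locker list shows the available ones.

```shell
# Hypothetical helper: upload the project in directory $1 to the locker $2.
# The body runs in a subshell so the caller's working directory is unchanged.
upload_project() (
    cd "$1" || exit 1
    cyclecloud project upload "$2"
)

# Example use (locker name is a placeholder):
#   upload_project "$HOME/cc_eessi" "<your-locker-name>"
```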
Let’s create a second project so the scheduler downloads the WRF benchmark data once the scheduler is provisioned.
Create the file cc_wrfconus/specs/default/cluster-init/scripts/00_get_conus.sh with this content:
Upload this second project (again, in case you want to test it on an existing CycleCloud environment):
Getting the original SLURM template
Now we need a way to use these CycleCloud projects, and we will do this by customizing a CycleCloud cluster template.
In your $HOME directory inside the CycleCloud VM:
You can also get the template from git:
Or:
If you diff these NEW_TEMPLATE files, the content should be exactly the same, assuming you got the right release ID from your current CycleCloud installation.
Updating and uploading a CycleCloud cluster template
We modified $NEW_TEMPLATE in three places.
We first changed the cluster template name from Slurm to SlurmEESSI:
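One way to make this first change is a one-line substitution on the template header. The sketch below assumes the template starts with a [cluster Slurm] line, as the stock Slurm template does; the helper name rename_cluster_template is hypothetical.

```shell
# Hypothetical helper: rename the cluster in template file $1 from $2 to $3.
# Writes to a temporary file first, so it does not rely on sed -i.
rename_cluster_template() {
    sed "s/^\[cluster $2]/[cluster $3]/" "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}

# Example use:
#   rename_cluster_template "$NEW_TEMPLATE" Slurm SlurmEESSI
```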
Second, we made sure EESSI could be used in all nodes, including the scheduler:
Third, we added the WRF benchmark data project to be executed in the scheduler node:
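The second and third changes both amount to adding a cluster-init line to a section of the template. The sketch below appends the line at the end of the chosen section, so that the section's existing attributes are not captured by the new subsection. The helper name add_cluster_init is hypothetical, the section names node defaults and node scheduler are assumed from the stock Slurm template, and the project version 1.0.0 is an assumption.

```shell
# Hypothetical helper: append "[[[cluster-init <ref>]]]" at the end of the
# section named $2 (e.g. "node defaults") in template file $1.
# $3 is the project reference, e.g. "cc_eessi:default:1.0.0".
add_cluster_init() {
    awk -v section="$2" -v ref="$3" '
        function emit() { print "        [[[cluster-init " ref "]]]"; print "" }
        # A one- or two-bracket header ends the current section;
        # three-bracket subsections such as [[[configuration]]] do not.
        /^[ \t]*\[\[?[^[]/ {
            if (inside) { emit(); inside = 0 }
            header = $0
            gsub(/^[ \t]*\[\[|\]\][ \t]*$/, "", header)
            if (header == section) inside = 1
        }
        { print }
        END { if (inside) emit() }
    ' "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}

# Example use for the two changes described above:
#   add_cluster_init "$NEW_TEMPLATE" "node defaults"  "cc_eessi:default:1.0.0"
#   add_cluster_init "$NEW_TEMPLATE" "node scheduler" "cc_wrfconus:default:1.0.0"
```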
Upload the cluster template.
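A minimal sketch of the upload, assuming the CycleCloud CLI is initialized on the VM: cyclecloud import_template reads the template from the file given with -f. The helper name import_cluster_template is hypothetical.

```shell
# Hypothetical helper: import the cluster template stored in file $1.
import_cluster_template() {
    if [ ! -f "$1" ]; then
        echo "template not found: $1" >&2
        return 1
    fi
    cyclecloud import_template -f "$1"
}

# Example use:
#   import_cluster_template "$NEW_TEMPLATE"
```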
With this, you are ready to play with the new template and projects on an existing CycleCloud VM. In our case, we uploaded these files to git so they can be consumed by the automation script or by other CycleCloud VMs created in different ways.
References
1. Azure CycleCloud: https://learn.microsoft.com/en-us/azure/cyclecloud/overview
2. CycleCloud cluster templates (link 1): https://learn.microsoft.com/en-us/training/modules/customize-clusters-azure-cyclecloud/2-describe-templates
3. CycleCloud cluster templates (link 2): https://learn.microsoft.com/en-us/azure/cyclecloud/how-to/cluster-templates?view=cyclecloud-8
4. CycleCloud projects: https://learn.microsoft.com/en-us/azure/cyclecloud/how-to/projects?view=cyclecloud-8
5. CycleCloud projects: https://learn.microsoft.com/en-us/training/modules/customize-clusters-azure-cyclecloud/5-customize-software-installations
6. CycleCloud core concepts: https://learn.microsoft.com/en-us/azure/cyclecloud/concepts/core?view=cyclecloud-8
7. SLURM cluster template: https://github.com/Azure/cyclecloud-slurm/blob/master/templates/slurm.txt
8. CycleCloud cloud-init: https://learn.microsoft.com/en-us/azure/cyclecloud/how-to/cloud-init?view=cyclecloud-8
9. EESSI Website: https://www.eessi-hpc.org/
10. EESSI Getting Access: https://www.eessi.io/docs/getting_access/native_installation/
11. EESSI+WRF on Azure: https://easybuild.io/eum22/013_eum22_WRF_Azure_EESSI.pdf
12. SPACK Website: https://spack.io/
13. EasyBuild: https://easybuild.io/