This post has been republished via RSS; it originally appeared at: Microsoft Tech Community - Latest Blogs.
Why a 'Learning Circle'?
It is no secret that, since the dawn of time (or thereabouts), circles have signified unity, safety and equality. Several of our Azure HPC customer contacts, from a variety of academic and industrial organisations, work with Azure CycleCloud on a daily basis. They had expressed an interest in coming together with Microsoft's engineers and developers to deepen their understanding of this enterprise-friendly tool for orchestrating HPC workloads on Azure. Thus the Azure CycleCloud ‘Learning Circle’ workshop series was born.
Run by Microsoft's Azure HPC and AI Product Engineering Teams, these targeted workshop sessions took place in the first two weeks of November 2021 and covered a number of topics proposed in advance by the participants. To make the most efficient use of time, and to accommodate participants across the Americas and EMEA time zones, it was agreed to run the Learning Circle as two shorter sessions rather than a single day-long event.
The first session, held on 5th November 2021, was an interactive workshop combining discussion, demonstration and Q&A, led by Microsoft’s Andy Howard, Dan Harris and Doug Clayton.
Topics of discussion for Session 1 included an in-depth look at cluster template parameters, the steps to provision a cluster, interactions with the Azure API, customization, resilience, metrics and software updates.
While the session was closely tailored to the questions proposed in advance by the participants, it also allowed a large degree of flexibility, which attendees commented on very positively in their feedback:
“The flexibility of the staff who attended from the Microsoft side to go off the planned topics and explore specific questions was helpful and, despite this, I felt they were still sensitive to ensuring we covered everything that was wanted by the attendees.”
The second session took place the following week, on 10th November 2021. It was delivered in a similar format, but this time with a Slurm focus. The session was led by Microsoft’s Andy Howard and Ryan Hamel, who were also pleased to welcome Nick Ihli, Director of Sales Engineering at SchedMD.
The session opened with a recap of Session 1, after which participants had an opportunity to follow up with additional questions that had arisen over the course of the week.
Since the same week had seen the public announcement of the HBv3 Milan-X preview, this was also a talking point. Discussion covered how the newly announced HBv3 VMs, enhanced with 3rd Gen AMD EPYC processors with 3D V-Cache (codenamed “Milan-X”), could be integrated seamlessly into customers’ production systems. The good news was confirmed that, because the Azure API is updated on a daily basis, the Milan-X CPUs will be available in CycleCloud as soon as they appear in the API.
The session also went into depth on CycleCloud’s management of Slurm configurations and parameters. Instructor Ryan gave an advance preview of the upcoming CycleCloud 8.2.1 release (launched 12th November 2021) and explained some of the main changes. These include support for Slurm job accounting (see Slurm Cluster Updates) and Improved Cost Tracking, which shows approximate ongoing cluster costs and provides a REST API for fetching cost data programmatically.
Conclusion & Feedback
In summary, this ‘Learning Circle’ series was one of the first of its kind. It brought together Microsoft software engineers and developers, Azure HPC specialists and Microsoft partner SchedMD and, most importantly, gave some of our Azure HPC customers who run production HPC systems in Azure an opportunity to learn from each other and share ideas and insights.
Feedback from the sessions indicated that they had provided a positive and useful experience for all involved:
“Both sessions were useful and informative. As well as being able to speak directly with the technical staff, doing so in a forum of peers who were also using CycleCloud with SLURM was extremely valuable as we discussed questions and challenges that affect us all as well as sharing ideas.”
“Would love to see another deep dive session on CycleCloud in the near future!!”
“Very useful and a valuable use of time; friendly, relaxed and respectful; well organised and managed by our hosts.”
It is not often that we have the privilege of running a session with Azure HPC customers from such a broad range of industry verticals, each supporting different HPC end-users and different scientific applications. A 'Learning Circle' forum that allows open discussion and Q&A between peers and subject-matter experts from different organisations therefore provides a great opportunity for shared learning and development.
The Azure HPC team looks forward very much to running more of these deep dive sessions in the future.