This post has been republished via RSS; it originally appeared at: Microsoft Tech Community - Latest Blogs.
Azure Managed Lustre delivers the time-tested Lustre file system as a first-party managed service on Azure. Long-time users of Lustre on-premises can now leverage the benefits of a complete HPC solution, including compute and high-performance storage, delivered on Azure.
There is a known behaviour in Lustre: if a VM that has the Lustre file system mounted is evicted or deleted as part of a workflow without releasing its file system lock, Lustre holds that lock for the next 10 to 15 minutes before releasing it (the lock release timeout is roughly 10 minutes). During that time, the other VMs (Lustre clients) that mount the same Lustre file system might experience intermittent hung mounts.
This blog discusses how to use Azure Scheduled Events to unmount Azure Managed Lustre cleanly on a VMSS instance or a Spot VM and so avoid the issue described above.
Scale set instances can opt in to receive instance termination notifications and set a pre-defined delay timeout on the Terminate operation. The termination notification is sent through Azure Metadata Service – Scheduled Events, which provides notifications for, and allows delaying of, impactful operations such as reboots and redeployments.
Refer to Terminate notification for Azure Virtual Machine Scale Set instances for more information.
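As a minimal sketch, terminate notifications can be enabled on an existing scale set with the Azure CLI. The function name is hypothetical, the resource group and scale set names are placeholders, and the 10-minute default delay is an assumed value chosen for illustration; check the flags against your installed CLI version.

```shell
#!/bin/bash
# Sketch: enable terminate notifications on an existing scale set.
# $1: resource group (placeholder)
# $2: scale set name (placeholder)
# $3: delay in minutes before termination proceeds (assumed default: 10)
enable_terminate_notification() {
    local resource_group="$1"
    local vmss_name="$2"
    local delay_minutes="${3:-10}"

    az vmss update \
        --resource-group "$resource_group" \
        --name "$vmss_name" \
        --enable-terminate-notification true \
        --terminate-notification-time "$delay_minutes"
}
```

It could be called as `enable_terminate_notification myResourceGroup myScaleSet 10`.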
With Azure Scheduled Events, your application can discover when maintenance will occur and trigger tasks to limit its impact.
Many applications can benefit from time to prepare for VM maintenance. The time can be used to perform application-specific tasks that improve availability, reliability, and serviceability, including:
- Checkpoint and restore.
- Connection draining.
- Primary replica failover.
- Removal from a load balancer pool.
- Event logging.
- Graceful shutdown.
The scheduled event can be checked using the following command.
curl -H Metadata:true http://169.254.169.254/metadata/scheduledevents?api-version=2020-07-01 | jq
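For a more compact view than the raw JSON, the response can be reduced to one line per event. The following is a sketch that assumes jq is installed; the helper name summarize_events is hypothetical, and it reads the JSON document from standard input.

```shell
#!/bin/bash
# Sketch: print one tab-separated line per scheduled event with its
# EventType, EventStatus, NotBefore, and the affected resources.
summarize_events() {
    jq -r '.Events[] | [.EventType, .EventStatus, .NotBefore, (.Resources | join(","))] | @tsv'
}
```

It could be used as `curl -s -H Metadata:true "http://169.254.169.254/metadata/scheduledevents?api-version=2020-07-01" | summarize_events`. Nothing is printed when no events are pending.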
The following scheduled event types are supported:
- Freeze: The Virtual Machine is scheduled to pause for a few seconds. CPU and network connectivity may be suspended, but there's no impact on memory or open files.
- Reboot: The Virtual Machine is scheduled for reboot (non-persistent memory is lost).
- Redeploy: The Virtual Machine is scheduled to move to another node (ephemeral disks are lost).
- Preempt: The Spot Virtual Machine is being deleted (ephemeral disks are lost). This event is made available on a best-effort basis.
- Terminate: The virtual machine is scheduled to be deleted.
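One way to act on these types is to unmount only for events that restart, move, or delete the VM. The sketch below treats Freeze as not requiring an unmount because open files are unaffected; that policy is an assumption for illustration (the script later in this post unmounts on any event), and the function name is hypothetical.

```shell
#!/bin/bash
# Sketch: decide whether an event type warrants unmounting Lustre.
# Returns 0 (true) when the VM will be restarted, moved, or deleted.
event_requires_unmount() {
    case "$1" in
        Reboot|Redeploy|Preempt|Terminate) return 0 ;;
        *) return 1 ;;   # Freeze and unrecognized types: leave the mount alone
    esac
}
```

A caller could then write `if event_requires_unmount "$EVENT_TYPE"; then ...; fi`, passing the event type without its JSON quotes.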
Here is sample output from Scheduled Events, first for a Reboot event and then for a Terminate event:
[root@almavmssn000000 ~]# curl -H Metadata:true http://169.254.169.254/metadata/scheduledevents?api-version=2020-07-01 | jq
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 706 100 706 0 0 114k 0 --:--:-- --:--:-- --:--:-- 114k
{
"DocumentIncarnation": 4,
"Events": [
{
"EventId": "1F438EDF-C2E4-4291-80F5-642637520764",
"EventStatus": "Scheduled",
"EventType": "Reboot",
"ResourceType": "VirtualMachine",
"Resources": [
"almavmss_2"
],
"NotBefore": "Fri, 01 Sep 2023 10:26:13 GMT",
"Description": "Virtual machine is going to be restarted as requested by authorized user.",
"EventSource": "User",
"DurationInSeconds": -1
}
]
}
[vinil@almavmssn000004 ~]$ curl -H Metadata:true http://169.254.169.254/metadata/scheduledevents?api-version=2020-07-01 | jq
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 305 100 305 0 0 50833 0 --:--:-- --:--:-- --:--:-- 50833
{
"DocumentIncarnation": 1,
"Events": [
{
"EventId": "32A12BE7-5935-49CE-980A-1270F672BD0E",
"EventStatus": "Scheduled",
"EventType": "Terminate",
"ResourceType": "VirtualMachine",
"Resources": [
"almavmss_4"
],
"NotBefore": "Mon, 04 Sep 2023 02:49:34 GMT",
"Description": "",
"EventSource": "Platform",
"DurationInSeconds": -1
}
]
}
[vinil@almavmssn000004 ~]$
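The NotBefore field in the output above is the earliest time at which the event may start, so it bounds how long the VM has to unmount. As a rough sketch, the remaining time can be computed with GNU date; the function name is hypothetical, and the -d option assumes GNU coreutils.

```shell
#!/bin/bash
# Sketch: seconds left before an event's NotBefore time.
# $1: NotBefore string, e.g. "Mon, 04 Sep 2023 02:49:34 GMT"
# $2: current time in epoch seconds (defaults to now; a parameter for testing)
seconds_until_event() {
    local not_before="$1"
    local now="${2:-$(date +%s)}"
    local deadline
    # Convert the NotBefore timestamp to epoch seconds; fail if it cannot be parsed
    deadline=$(date -u -d "$not_before" +%s) || return 1
    echo $((deadline - now))
}
```

A negative result means the NotBefore time has already passed.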
Here is a script that unmounts the Lustre file system when a scheduled event targets the VM. It is for demonstration purposes, so modify it to suit your requirements. The script was added as a cron job to poll for events, and it should work on most Linux distributions that have curl and jq installed.
NOTE: Update the MOUNTPOINT variable according to your environment.
#!/bin/bash
#Author - Vinil Vadakepurakkal, Microsoft
#Date - 24/08/2023
#Using Azure Scheduled Events to unmount Lustre filesystems
#This script is intended to be run as a cron job on the Lustre client nodes
#The script will check for scheduled events and if it finds one, it will unmount the Lustre filesystem
#Event Types are below:
#Freeze: The Virtual Machine is scheduled to pause for a few seconds. CPU and network connectivity may be suspended, but there's no impact on memory or open files.
#Reboot: The Virtual Machine is scheduled for reboot (non-persistent memory is lost). This event is made available on a best effort basis
#Redeploy: The Virtual Machine is scheduled to move to another node (ephemeral disks are lost). This event is delivered on a best effort basis.
#Preempt: The Spot Virtual Machine is being deleted (ephemeral disks are lost).
#Terminate: The virtual machine is scheduled to be deleted.
MOUNTPOINT=/pfsvinilv
EVENTS_URL="http://169.254.169.254/metadata/scheduledevents?api-version=2020-07-01"
INSTANCE_URL="http://169.254.169.254/metadata/instance?api-version=2021-02-01"
#Query each metadata endpoint once so that every check below sees the same data
EVENTS_JSON=$(curl -s -H Metadata:true --noproxy "*" "$EVENTS_URL")
INSTANCE_JSON=$(curl -s -H Metadata:true --noproxy "*" "$INSTANCE_URL")
INSTANCE_NAME=$(echo "$INSTANCE_JSON" | jq .compute.name)
OS_HOSTNAME=$(echo "$INSTANCE_JSON" | jq -r .compute.osProfile.computerName)
HOSTNAME=$(hostname)
#Count the events; the loop below is skipped when there are none
NO_OF_EVENTS=$(echo "$EVENTS_JSON" | jq '.Events | length')
for i in $(seq 0 $((NO_OF_EVENTS - 1)))
do
    RESOURCE_NAME=$(echo "$EVENTS_JSON" | jq ".Events[$i].Resources[0]")
    EVENT_TYPE=$(echo "$EVENTS_JSON" | jq ".Events[$i].EventType")
    echo "$INSTANCE_NAME"
    echo "$RESOURCE_NAME"
    #Both names are still JSON-quoted here, so they compare like for like
    if [ "$RESOURCE_NAME" = "$INSTANCE_NAME" ]
    then
        echo "$OS_HOSTNAME has a scheduled event of type $EVENT_TYPE" | logger
        echo "unmounting Lustre filesystem $MOUNTPOINT from $HOSTNAME" | logger
        #Kill processes that hold the mount point, then lazily unmount it
        /usr/bin/fuser -ku "$MOUNTPOINT"
        /usr/bin/sleep 5
        /usr/bin/umount -l "$MOUNTPOINT"
        echo "Lustre filesystem unmounted from $HOSTNAME" | logger
    fi
done
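The syslog excerpts later in this post show the script firing once a minute, which matches a cron schedule of `* * * * *`. A minimal sketch for installing such an entry follows; the script path, the cron file name, and the use of a cron.d directory are all assumptions rather than details from the original setup.

```shell
#!/bin/bash
# Sketch: write a cron.d entry that runs the unmount script every minute as root.
# $1: path to the unmount script (hypothetical location)
# $2: cron directory (defaults to /etc/cron.d; a parameter so it can be tested)
install_unmount_cron() {
    local script_path="$1"
    local cron_dir="${2:-/etc/cron.d}"
    printf '* * * * * root %s\n' "$script_path" > "$cron_dir/lustre-unmount"
}
```

For example, `install_unmount_cron /usr/local/sbin/lustre_unmount.sh` run as root would create /etc/cron.d/lustre-unmount.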
To test the functionality, I used a setup in which Lustre is mounted on the /pfsvinilv mount point.
[root@almavmssn000001 ~]# df
Filesystem 1K-blocks Used Available Use% Mounted on
devtmpfs 16421280 0 16421280 0% /dev
tmpfs 16458648 0 16458648 0% /dev/shm
tmpfs 16458648 66080 16392568 1% /run
tmpfs 16458648 0 16458648 0% /sys/fs/cgroup
/dev/sda2 30416376 22340904 8075472 74% /
/dev/sda1 506528 254660 251868 51% /boot
/dev/sda15 506600 5952 500648 2% /boot/efi
10.222.1.17@tcp:/lustrefs 17010128952 1264 16151959984 1% /pfsvinilv
tmpfs 3291728 0 3291728 0% /run/user/1000
I then invoked a terminate operation on the VM from the Azure portal, which created a Terminate event in Scheduled Events.
[vinil@almavmssn000001 ~]$ curl -H Metadata:true http://169.254.169.254/metadata/scheduledevents?api-version=2020-07-01 | jq
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 305 100 305 0 0 50833 0 --:--:-- --:--:-- --:--:-- 50833
{
"DocumentIncarnation": 1,
"Events": [
{
"EventId": "32A12BE7-5935-49CE-980A-1270F672BD0E",
"EventStatus": "Scheduled",
"EventType": "Terminate",
"ResourceType": "VirtualMachine",
"Resources": [
"almavmss_1"
],
"NotBefore": "Mon, 04 Sep 2023 02:49:34 GMT",
"Description": "",
"EventSource": "Platform",
"DurationInSeconds": -1
}
]
}
After a couple of minutes, the script in the cron job unmounted the Lustre file system, which avoids the intermittent hangs on the other Lustre clients.
The following output shows that the Lustre mount point was unmounted before the VM was terminated.
[root@almavmssn000001 ~]# df
Filesystem 1K-blocks Used Available Use% Mounted on
devtmpfs 16421280 0 16421280 0% /dev
tmpfs 16458648 0 16458648 0% /dev/shm
tmpfs 16458648 66084 16392564 1% /run
tmpfs 16458648 0 16458648 0% /sys/fs/cgroup
/dev/sda2 30416376 22340904 8075472 74% /
/dev/sda1 506528 254660 251868 51% /boot
/dev/sda15 506600 5952 500648 2% /boot/efi
tmpfs 3291728 0 3291728 0% /run/user/1000
tmpfs 3291728 0 3291728 0% /run/user/0
[root@almavmssn000001 ~]#
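Rather than reading df output by eye, the mount state can also be checked in a script. The sketch below looks for a lustre entry in /proc/mounts; the function name is hypothetical, and the second argument exists only so that the check can be exercised against a sample file.

```shell
#!/bin/bash
# Sketch: succeed if a Lustre file system is mounted at the given mount point.
# $1: mount point, e.g. /pfsvinilv
# $2: mounts table to read (defaults to /proc/mounts)
is_lustre_mounted() {
    local mountpoint="$1"
    local mounts_file="${2:-/proc/mounts}"
    # Fields in /proc/mounts: device, mount point, file system type, options, ...
    awk -v mp="$mountpoint" \
        '$2 == mp && $3 == "lustre" { found = 1 } END { exit !found }' \
        "$mounts_file"
}
```

A call such as `is_lustre_mounted /pfsvinilv || echo "not mounted"` could be used to skip the unmount step on later cron runs.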
The script also writes messages about the event to syslog.
[root@almavmssn000001 ~]# grep Reboot /var/log/messages
Sep 1 10:15:01 almavmssn000001 root[64930]: almavmssn000001 has a scheduled event of type "Reboot"
Sep 1 10:16:01 almavmssn000001 root[65056]: almavmssn000001 has a scheduled event of type "Reboot"
[root@almavmssn000001 ~]#
[root@almavmssn000001 ~]# grep pfsvinilv /var/log/messages
Sep 1 10:15:01 almavmssn000001 root[64932]: unmounting Lustre filesystem /pfsvinilv from almavmssn000001
Sep 1 10:15:06 almavmssn000001 systemd[1]: pfsvinilv.mount: Succeeded.
Sep 1 10:16:01 almavmssn000001 root[65058]: unmounting Lustre filesystem /pfsvinilv from almavmssn000001
[root@almavmssn000001 ~]#
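The log shows the script running again on the next cron tick while the VM waits for the event to start. Scheduled Events also accepts a POST with a StartRequests body that names the EventId, which approves the event so that the platform does not have to wait until NotBefore. The sketch below is an assumption about how that could be wired in after a successful unmount; it is not part of the original script, and the function names are hypothetical.

```shell
#!/bin/bash
# Sketch: approve a scheduled event once the unmount has completed.
EVENTS_URL="http://169.254.169.254/metadata/scheduledevents?api-version=2020-07-01"

# Build the JSON body for an approval request. $1: EventId
start_request_body() {
    printf '{"StartRequests":[{"EventId":"%s"}]}' "$1"
}

# POST the approval to the metadata service. $1: EventId
approve_event() {
    curl -s -H Metadata:true --noproxy "*" -X POST \
        -d "$(start_request_body "$1")" "$EVENTS_URL"
}
```

The EventId would come from the same Events entry that matched the instance, for example `jq -r ".Events[$i].EventId"`.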
References:
Azure Managed Lustre File System documentation
Terminate notification for Azure Virtual Machine Scale Set instances
Azure Metadata Service: Scheduled Events for Linux VMs