How to unmount Azure Managed Lustre filesystem using Azure Scheduled Events

This post has been republished via RSS; it originally appeared at: Microsoft Tech Community - Latest Blogs.

Azure Managed Lustre delivers the time-tested Lustre file system as a first-party managed service on Azure. Long-time users of Lustre on-premises can now leverage the benefits of a complete HPC solution, including compute and high-performance storage, delivered on Azure.

 

Lustre has a known behaviour: if a VM that has the filesystem mounted is evicted or deleted as part of a workflow without releasing its filesystem locks, Lustre holds those locks until a timeout of roughly 10 to 15 minutes expires. During that time, the other VMs (Lustre clients) using the same filesystem may experience intermittent hung mounts.

 

This blog discusses how to use Azure Scheduled Events to unmount Azure Managed Lustre cleanly on a VMSS instance or a Spot VM, to avoid the issue described above.

 

Scale set instances can opt in to receive instance termination notifications and set a pre-defined delay timeout for the Terminate operation. The termination notification is sent through Azure Metadata Service – Scheduled Events, which provides notifications for, and delaying of, impactful operations such as reboots and redeployments.

Refer to "Terminate notification for Azure Virtual Machine Scale Set instances" for more information.
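
As a rough illustration, terminate notifications can be enabled on an existing scale set with the Azure CLI. The sketch below wraps the call in a hypothetical helper, `enable_terminate_notification`. The resource group and scale set names are placeholders, and the 10-minute default is an assumption chosen for illustration; check the documentation referenced above for the allowed range and for how existing instances pick up the change.

```shell
# Hypothetical helper: enable terminate notifications on an existing scale set.
# AZ can be overridden (for example AZ=echo) to print the command instead of running it.
enable_terminate_notification() {
    local resource_group="$1" vmss_name="$2" minutes="${3:-10}"
    "${AZ:-az}" vmss update \
        --resource-group "$resource_group" \
        --name "$vmss_name" \
        --enable-terminate-notification true \
        --terminate-notification-time "$minutes"
}

# Example (placeholder names):
# enable_terminate_notification my-resource-group almavmss 10
```

Setting `AZ=echo` before calling the helper gives a dry run, which is handy for checking the arguments before touching a production scale set.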

 

With Azure Scheduled Events, your application can discover when maintenance will occur and trigger tasks to limit its impact.

 

Many applications can benefit from time to prepare for VM maintenance. The time can be used to perform application-specific tasks that improve availability, reliability, and serviceability, including:

  • Checkpoint and restore.
  • Connection draining.
  • Primary replica failover.
  • Removal from a load balancer pool.
  • Event logging.
  • Graceful shutdown.

The scheduled event can be checked using the following command.

 

curl -H Metadata:true "http://169.254.169.254/metadata/scheduledevents?api-version=2020-07-01" | jq

 

The following event types are supported:

 

  1. Freeze: The Virtual Machine is scheduled to pause for a few seconds. CPU and network connectivity may be suspended, but there's no impact on memory or open files.
  2. Reboot: The Virtual Machine is scheduled for reboot (non-persistent memory is lost).
  3. Redeploy: The Virtual Machine is scheduled to move to another node (ephemeral disks are lost).
  4. Preempt: The Spot Virtual Machine is being deleted (ephemeral disks are lost). This event is made available on a best-effort basis.
  5. Terminate: The virtual machine is scheduled to be deleted.
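
The script in this post unmounts on any event type. As a possible refinement, the event type can be used to decide whether an unmount is needed at all. The sketch below uses a hypothetical helper, `needs_unmount`, and assumes that a Freeze event can be skipped because, per the list above, it has no impact on memory or open files. Whether that holds for your Lustre workload is something to verify.

```shell
# Return success (0) if the event type means the VM will restart or go away,
# so the Lustre mount should be released first.
# Assumption: Freeze is skipped because it leaves memory and open files intact.
needs_unmount() {
    case "$1" in
        Reboot|Redeploy|Preempt|Terminate) return 0 ;;
        *) return 1 ;;
    esac
}

# Example:
# needs_unmount "Terminate" && echo "unmount before the event starts"
```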

Here is sample output from Scheduled Events:

 

[root@almavmssn000000 ~]# curl -H Metadata:true http://169.254.169.254/metadata/scheduledevents?api-version=2020-07-01 | jq
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   706  100   706    0     0   114k      0 --:--:-- --:--:-- --:--:--  114k
{
  "DocumentIncarnation": 4,
  "Events": [
    {
      "EventId": "1F438EDF-C2E4-4291-80F5-642637520764",
      "EventStatus": "Scheduled",
      "EventType": "Reboot",
      "ResourceType": "VirtualMachine",
      "Resources": [
        "almavmss_2"
      ],
      "NotBefore": "Fri, 01 Sep 2023 10:26:13 GMT",
      "Description": "Virtual machine is going to be restarted as requested by authorized user.",
      "EventSource": "User",
      "DurationInSeconds": -1
    }
  ]
}
[vinil@almavmssn000004 ~]$ curl -H Metadata:true http://169.254.169.254/metadata/scheduledevents?api-version=2020-07-01 | jq
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   305  100   305    0     0  50833      0 --:--:-- --:--:-- --:--:-- 50833
{
  "DocumentIncarnation": 1,
  "Events": [
    {
      "EventId": "32A12BE7-5935-49CE-980A-1270F672BD0E",
      "EventStatus": "Scheduled",
      "EventType": "Terminate",
      "ResourceType": "VirtualMachine",
      "Resources": [
        "almavmss_4"
      ],
      "NotBefore": "Mon, 04 Sep 2023 02:49:34 GMT",
      "Description": "",
      "EventSource": "Platform",
      "DurationInSeconds": -1
    }
  ]
}
[vinil@almavmssn000004 ~]$

 

The following script unmounts the Lustre filesystem when a scheduled event targets the VM. It is for demonstration purposes; modify it to suit your requirements. In this setup it was added as a cron job to poll for events. It should work on most Linux distributions, provided curl and jq are installed.

 

NOTE: Update the MOUNTPOINT variable according to your environment.

 

#!/bin/bash
# Author - Vinil Vadakepurakkal, Microsoft
# Date - 24/08/2023

# Uses Azure Scheduled Events to unmount a Lustre filesystem.
# Intended to run as a cron job on the Lustre client nodes: it checks for
# scheduled events and, if one targets this VM, unmounts the filesystem.

# Event types:
# Freeze: The Virtual Machine is scheduled to pause for a few seconds. CPU and network connectivity may be suspended, but there's no impact on memory or open files.
# Reboot: The Virtual Machine is scheduled for reboot (non-persistent memory is lost).
# Redeploy: The Virtual Machine is scheduled to move to another node (ephemeral disks are lost).
# Preempt: The Spot Virtual Machine is being deleted (ephemeral disks are lost). This event is made available on a best-effort basis.
# Terminate: The virtual machine is scheduled to be deleted.

MOUNTPOINT=/pfsvinilv
IMDS="http://169.254.169.254/metadata"

# Query each metadata endpoint once and reuse the JSON for every field.
EVENTS=$(curl -s -H Metadata:true --noproxy "*" "$IMDS/scheduledevents?api-version=2020-07-01")
INSTANCE=$(curl -s -H Metadata:true --noproxy "*" "$IMDS/instance?api-version=2021-02-01")
INSTANCE_NAME=$(jq -r .compute.name <<< "$INSTANCE")
OS_HOSTNAME=$(jq -r .compute.osProfile.computerName <<< "$INSTANCE")
HOSTNAME=$(hostname)

# An empty or missing count is treated as zero by the arithmetic loop below.
NO_OF_EVENTS=$(jq '.Events | length' <<< "$EVENTS")
for ((i = 0; i < NO_OF_EVENTS; i++))
do
    RESOURCE_NAME=$(jq -r ".Events[$i].Resources[0]" <<< "$EVENTS")
    EVENT_TYPE=$(jq ".Events[$i].EventType" <<< "$EVENTS")
    echo "$INSTANCE_NAME"
    echo "$RESOURCE_NAME"
    # Act only on events whose first listed resource is this VM.
    if [ "$RESOURCE_NAME" = "$INSTANCE_NAME" ]
    then
        echo "$OS_HOSTNAME has a scheduled event of type $EVENT_TYPE" | logger
        echo "unmounting Lustre filesystem $MOUNTPOINT from $HOSTNAME" | logger
        # Kill processes using the mount point, then lazily unmount it.
        /usr/bin/fuser -ku "$MOUNTPOINT"
        /usr/bin/sleep 5
        /usr/bin/umount -l "$MOUNTPOINT"
        echo "Lustre filesystem unmounted from $HOSTNAME" | logger
    fi
done
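
The post runs the script from cron but does not show the cron entry. A minimal sketch of one way to install it follows. The helper name `write_cron_entry`, the cron file name, and the script path are all assumptions for illustration; the every-minute schedule mirrors the one-minute spacing of the syslog entries shown later in this post.

```shell
# Hypothetical helper: write a cron.d-style entry that runs the script
# every minute as root. Both arguments are placeholders.
write_cron_entry() {
    local cron_file="$1" script_path="$2"
    printf '* * * * * root %s\n' "$script_path" > "$cron_file"
}

# Example (run as root, paths are assumptions):
# write_cron_entry /etc/cron.d/lustre-unmount /usr/local/sbin/lustre-unmount.sh
```

Because cron's finest granularity is one minute, the notification window configured on the scale set needs to be comfortably longer than that for the unmount to happen in time.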

 

To test the functionality: in my setup, Lustre is mounted at the /pfsvinilv mount point.

 

[root@almavmssn000001 ~]# df
Filesystem                  1K-blocks     Used   Available Use% Mounted on
devtmpfs                     16421280        0    16421280   0% /dev
tmpfs                        16458648        0    16458648   0% /dev/shm
tmpfs                        16458648    66080    16392568   1% /run
tmpfs                        16458648        0    16458648   0% /sys/fs/cgroup
/dev/sda2                    30416376 22340904     8075472  74% /
/dev/sda1                      506528   254660      251868  51% /boot
/dev/sda15                     506600     5952      500648   2% /boot/efi
10.222.1.17@tcp:/lustrefs 17010128952     1264 16151959984   1% /pfsvinilv
tmpfs                         3291728        0     3291728   0% /run/user/1000

 

I invoked a terminate operation on the VM from the Azure portal, which created a Terminate event in Scheduled Events.

 

[vinil@almavmssn000001 ~]$ curl -H Metadata:true http://169.254.169.254/metadata/scheduledevents?api-version=2020-07-01 | jq
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   305  100   305    0     0  50833      0 --:--:-- --:--:-- --:--:-- 50833
{
  "DocumentIncarnation": 1,
  "Events": [
    {
      "EventId": "32A12BE7-5935-49CE-980A-1270F672BD0E",
      "EventStatus": "Scheduled",
      "EventType": "Terminate",
      "ResourceType": "VirtualMachine",
      "Resources": [
        "almavmss_1"
      ],
      "NotBefore": "Mon, 04 Sep 2023 02:49:34 GMT",
      "Description": "",
      "EventSource": "Platform",
      "DurationInSeconds": -1
    }
  ]
}
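
The NotBefore field tells you how long the VM has before the event can start. As a sketch, the remaining time can be computed in the shell. The helper name `seconds_until` is hypothetical, and the code assumes GNU date, whose `-d` option can parse this timestamp format; other date implementations may not.

```shell
# Print the number of seconds remaining until an event's NotBefore time.
# $1: NotBefore string as returned by Scheduled Events
# $2: current time as a Unix epoch (defaults to now)
# Assumption: GNU date is available (its -d option parses this format).
seconds_until() {
    local not_before="$1" now="${2:-$(date +%s)}"
    local target
    [ -n "$not_before" ] || return 1
    target=$(date -u -d "$not_before" +%s) || return 1
    echo $(( target - now ))
}

# Example:
# seconds_until "Mon, 04 Sep 2023 02:49:34 GMT"
```

A negative result means the NotBefore time has already passed.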

 

After a couple of minutes, the script in the cron job unmounted the Lustre filesystem. This avoids the intermittent filesystem hangs on the other Lustre clients.

The following output shows that the Lustre mount point was unmounted before the VM was terminated.

 

[root@almavmssn000001 ~]# df
Filesystem     1K-blocks     Used Available Use% Mounted on
devtmpfs        16421280        0  16421280   0% /dev
tmpfs           16458648        0  16458648   0% /dev/shm
tmpfs           16458648    66084  16392564   1% /run
tmpfs           16458648        0  16458648   0% /sys/fs/cgroup
/dev/sda2       30416376 22340904   8075472  74% /
/dev/sda1         506528   254660    251868  51% /boot
/dev/sda15        506600     5952    500648   2% /boot/efi
tmpfs            3291728        0   3291728   0% /run/user/1000
tmpfs            3291728        0   3291728   0% /run/user/0
[root@almavmssn000001 ~]#
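
Instead of reading the df output by eye, the same check can be scripted. The sketch below uses a hypothetical helper, `is_lustre_mounted`, that looks for an entry of filesystem type `lustre` at the given mount point. The second argument exists only so the function can be pointed at a file other than /proc/mounts.

```shell
# Succeed if a Lustre filesystem is mounted at the given mount point.
# $1: mount point
# $2: mounts table to read (defaults to /proc/mounts)
is_lustre_mounted() {
    local mount_point="$1" mounts_file="${2:-/proc/mounts}"
    awk -v mp="$mount_point" \
        '$2 == mp && $3 == "lustre" { found = 1 } END { exit !found }' \
        "$mounts_file"
}

# Example:
# is_lustre_mounted /pfsvinilv || echo "Lustre is not mounted"
```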

 

The script also logs messages about the event to syslog.

 

[root@almavmssn000001 ~]# grep Reboot /var/log/messages
Sep  1 10:15:01 almavmssn000001 root[64930]: almavmssn000001 has a scheduled event of type "Reboot"
Sep  1 10:16:01 almavmssn000001 root[65056]: almavmssn000001 has a scheduled event of type "Reboot"
[root@almavmssn000001 ~]#
[root@almavmssn000001 ~]# grep pfsvinilv /var/log/messages
Sep  1 10:15:01 almavmssn000001 root[64932]: unmounting Lustre filesystem /pfsvinilv from almavmssn000001
Sep  1 10:15:06 almavmssn000001 systemd[1]: pfsvinilv.mount: Succeeded.
Sep  1 10:16:01 almavmssn000001 root[65058]: unmounting Lustre filesystem /pfsvinilv from almavmssn000001
[root@almavmssn000001 ~]#
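
Scheduled Events also lets a VM acknowledge an event by posting its EventId back to the same endpoint, which signals that Azure may proceed without waiting for the NotBefore time. The script in this post does not do this. The sketch below shows what it could look like, with hypothetical helper names `start_request_body` and `acknowledge_event`. Note that acknowledging an event lets it proceed for every resource listed in the event, so only do this once the unmount has completed.

```shell
# Build the JSON body that acknowledges one scheduled event.
start_request_body() {
    printf '{"StartRequests":[{"EventId":"%s"}]}' "$1"
}

# Hypothetical helper: POST the acknowledgement to the Scheduled Events endpoint.
# Only works from inside an Azure VM.
acknowledge_event() {
    curl -s -H Metadata:true --noproxy "*" -X POST \
        -d "$(start_request_body "$1")" \
        "http://169.254.169.254/metadata/scheduledevents?api-version=2020-07-01"
}

# Example (EventId taken from the scheduled events output):
# acknowledge_event "32A12BE7-5935-49CE-980A-1270F672BD0E"
```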

 

References:

Azure Managed Lustre File System documentation
Terminate notification for Azure Virtual Machine Scale Set instances
Azure Metadata Service: Scheduled Events for Linux VMs
