This post has been republished via RSS; it originally appeared at: New blog articles in Microsoft Tech Community.
This article provides an in-depth analysis of the available configuration settings for the ephemeral OS disk in Azure Kubernetes Service (AKS). With ephemeral OS disks, you see lower read/write latency on the OS disk of AKS agent nodes, since the disk is locally attached. You also get faster cluster operations, such as scale or upgrade, thanks to faster re-imaging and boot times. You can use the Bicep modules in this GitHub repository to deploy an AKS cluster and repeat the tests described in this article.
Ephemeral OS disks for Azure VMs
Ephemeral OS disks are created on the local virtual machine (VM) storage and not saved to the remote Azure Storage, as when using managed OS disks. For more information on the performance of a managed disk, see Disk allocation and performance. Ephemeral OS disks work well for stateless workloads, where applications are tolerant of individual VM failures but are more affected by VM deployment time or reimaging of individual VM instances. With Ephemeral OS disks, you get lower read/write latency to the OS disk and faster VM reimage. The key features of ephemeral disks are the following:
- Ideal for stateless applications and workloads.
- Supported by the Azure Marketplace, custom images, and Azure Compute Gallery.
- Ability to fast reset or reimage virtual machines and scale set instances to the original boot state.
- Lower latency, similar to a temporary disk.
- Ephemeral OS disks are free; you incur no storage cost for OS disks.
- Available in all Azure regions.
The following table summarizes the main differences between persistent and ephemeral OS disks:
| | Persistent OS Disk | Ephemeral OS Disk |
|---|---|---|
| Size limit for OS disk | 2 TiB | Cache size or temp size for the VM size or 2040 GiB, whichever is smaller. For the cache or temp size in GiB, see DS, ES, M, FS, and GS |
| VM sizes supported | All | VM sizes that support Premium storage such as DSv1, DSv2, DSv3, Esv3, Fs, FsV2, GS, M, Mdsv2, Bs, Dav4, Eav4 |
| Disk type support | Managed and unmanaged OS disk | Managed OS disk only |
| Region support | All regions | All regions |
| Data persistence | Data written to the OS disk is stored in Azure Storage | Data written to the OS disk is stored on local VM storage and isn't persisted to Azure Storage |
| Stop-deallocated state | VMs and scale set instances can be stop-deallocated and restarted from the stop-deallocated state | Not supported |
| Specialized OS disk support | Yes | No |
| OS disk resize | Supported during VM creation and after VM is stop-deallocated | Supported during VM creation only |
| Resizing to a new VM size | OS disk data is preserved | Data on the OS disk is deleted; OS is reprovisioned |
| Redeploy | OS disk data is preserved | Data on the OS disk is deleted; OS is reprovisioned |
| Stop/start of VM | OS disk data is preserved | Not supported |
| Page file placement | For Windows, page file is stored on the resource disk | For Windows, page file is stored on the OS disk (for both OS cache placement and temp disk placement) |
| Maintenance of VM/VMSS using healing | OS disk data is preserved | OS disk data is not preserved |
| Maintenance of VM/VMSS using Live Migration | OS disk data is preserved | OS disk data is preserved |
Placement options for Ephemeral OS disks
You can store ephemeral OS disks on the virtual machine's OS cache disk or temporary storage SSD (also known as resource disk). When deploying a virtual machine or a virtual machine scale set, you can use the DiffDiskPlacement property to specify where to place the Ephemeral OS disk, whether in the cache or resource disk.
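For example, here is a minimal sketch of deploying a single VM with an ephemeral OS disk via the Azure CLI; the resource names and image alias are placeholders, and `CacheDisk` can be swapped for `ResourceDisk` to place the disk on the temporary storage SSD instead:

```shell
# Create a VM with an ephemeral OS disk placed on the VM cache disk.
# Resource names and the image alias are placeholders.
az vm create \
  --resource-group myResourceGroup \
  --name myVm \
  --image Ubuntu2204 \
  --size Standard_DS3_v2 \
  --ephemeral-os-disk true \
  --ephemeral-os-disk-placement CacheDisk \
  --admin-username azureuser \
  --generate-ssh-keys
```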
Size requirements
As mentioned above, you can choose to deploy ephemeral OS disks on the VM cache or the VM temp disk. The image OS disk's size should be less than or equal to the cache/temp size of the chosen VM size. For example, if you want to opt for OS cache placement, the Standard Windows Server images from the Marketplace are about 127 GiB, meaning that you need a VM size with a cache equal to or larger than 127 GiB. The Standard_DS3_v2 has a cache size of 172 GiB, which is large enough. In this case, the Standard_DS3_v2 is the smallest size in the DSv2 series that you can use with this image.
If you want to opt for temp disk placement, consider that the Standard Ubuntu Server image from the Marketplace is about 30 GiB. The temp disk size must be equal to or larger than 30 GiB to enable an ephemeral OS disk on the temporary storage. Standard_B4ms has a temporary storage size of 32 GiB, which can fit the 30 GiB OS disk. Upon creation of the VM, the remaining temp disk space would be 2 GiB.
If you place the ephemeral OS disk in the temporary storage disk, the final size of the temporary disk will equal its initial size minus the OS image size. In addition, the ephemeral OS disk will share the IOPS with the temporary storage disk, as per the VM size you selected. Ephemeral OS disks also require a VM size that supports Premium storage; such sizes usually have an s in the name, like DSv2 and Esv3. See Azure VM sizes for details about which sizes support Premium storage.
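The sizing rules above can be sketched with a small, hypothetical helper (not part of any Azure SDK; the numbers come from the examples in this section):

```python
def fits_in_cache(os_disk_gib: int, cache_gib: int) -> bool:
    """An ephemeral OS disk fits in the VM cache only if the OS disk
    size is less than or equal to the cache size of the VM size."""
    return os_disk_gib <= cache_gib


def remaining_temp_gib(temp_gib: int, os_image_gib: int) -> int:
    """With temp disk placement, the space left on the temporary disk
    is its initial size minus the OS image size."""
    if os_image_gib > temp_gib:
        raise ValueError("OS image does not fit in the temporary disk")
    return temp_gib - os_image_gib


# Standard_DS3_v2: a 172 GiB cache fits a 127 GiB Windows Server image.
print(fits_in_cache(127, 172))     # True
# Standard_B4ms: a 30 GiB Ubuntu image on a 32 GiB temp disk leaves 2 GiB.
print(remaining_temp_gib(32, 30))  # 2
```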
Note
Ephemeral disks will not be accessible through the portal. You will receive a "Resource not Found" or "404" error when trying to access an ephemeral OS disk.
Unsupported features
Ephemeral OS disks do not support the following features:
- Capturing VM images
- Disk snapshots
- Azure Disk Encryption
- Azure Backup
- Azure Site Recovery
- OS Disk Swap
AKS and Ephemeral OS disks
Azure automatically replicates data stored in the managed OS disk of a virtual machine to Azure Storage to avoid data loss in case the virtual machine needs to be relocated to another host. Generally speaking, containers are not designed to persist local state to the OS disk, hence this behavior offers limited value to AKS-hosted workloads while introducing some drawbacks, including slower node provisioning and higher read/write latency. There are, however, a few exceptions where Kubernetes pods may need to persist data to the local storage of the OS disk:
- EmptyDir: an emptyDir volume is created when a pod is assigned to an agent node and exists as long as that pod is running on that node. As the name says, the emptyDir volume is initially empty. All containers in the pod can read and write the same files in the emptyDir volume, even if the volume can be mounted at the same or different paths in each container. When a pod is removed from a node, the data in the emptyDir is deleted permanently. EmptyDir volumes can be used in the following scenarios:
- Checkpointing long computation or data sorting for recovery from crashes
- Temporary storage area for application logs
Depending on your environment, emptyDir volumes are stored on whatever medium backs the agent node, such as a managed disk, a temporary storage SSD, or network storage. As we will see in the remainder of this article, AKS provides options to store emptyDir volumes on the OS disk or on the temporary disk of an agent node.
- HostPath: a hostPath volume mounts a file or directory from the host agent node's filesystem into a pod. HostPath volumes present many security risks, and it is a best practice to avoid using this kind of volume whenever possible. When a hostPath volume must be used, it should be scoped to only the required file or directory and mounted as read-only. Here are a few situations where a hostPath volume can be necessary:
- Running a container that needs access to Docker internals; use a hostPath of /var/lib/docker
- Running cAdvisor in a container; use a hostPath of /sys
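As a sketch of the cAdvisor case above, here is what a minimally scoped, read-only hostPath volume might look like; the pod name and image tag are illustrative:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cadvisor-example        # illustrative name
spec:
  containers:
  - name: cadvisor
    image: gcr.io/cadvisor/cadvisor:v0.47.0   # illustrative tag
    volumeMounts:
    - name: sys
      mountPath: /sys
      readOnly: true            # best practice: mount hostPath read-only
  volumes:
  - name: sys
    hostPath:
      path: /sys                # scoped to only the required directory
      type: Directory
```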
Ephemeral OS disks are stored only on the host machine, hence they provide lower read/write latency, along with faster node scaling and cluster upgrades.
When a user does not explicitly request managed OS disks (e.g., using the --node-osdisk-type Managed parameter in an az aks create or az aks nodepool add command), AKS defaults to ephemeral OS disks whenever possible for a given node pool configuration. The first prerequisite for using ephemeral OS disks is choosing a VM series that supports this feature; the second is making sure that the OS disk can fit in the VM cache or temporary storage SSD. Let's look at a couple of examples with two different VM series:
DSv2-series
The general purpose DSv2-series provides the following feature support:
- Premium Storage: Supported
- Premium Storage caching: Supported
- Live Migration: Supported
- Memory Preserving Updates: Supported
- VM Generation Support: Generation 1 and 2
- Accelerated Networking: Supported
- Ephemeral OS Disks: Supported
- Nested Virtualization: Not Supported
This VM series supports both a VM cache and a temporary storage SSD. High-scale VMs like the DSv2-series that leverage Azure Premium Storage have a multi-tier caching technology called BlobCache. BlobCache uses a combination of the host RAM and local SSD for caching. This cache is available for the Premium Storage persistent disks and VM local disks. The VM cache can be used for hosting an ephemeral OS disk. When a VM series supports the VM cache, its size depends on the VM series and VM size. In the following table, the VM cache size is indicated in parentheses next to the IO throughput ("cache size in GiB").
| Size | vCPU | Memory: GiB | Temp storage (SSD) GiB | Max data disks | Max cached and temp storage throughput: IOPS/MBps (cache size in GiB) | Max uncached disk throughput: IOPS/MBps | Max NICs | Expected network bandwidth (Mbps) |
|---|---|---|---|---|---|---|---|---|
| Standard_DS1_v2 | 1 | 3.5 | 7 | 4 | 4000/32 (43) | 3200/48 | 2 | 750 |
| Standard_DS2_v2 | 2 | 7 | 14 | 8 | 8000/64 (86) | 6400/96 | 2 | 1500 |
| Standard_DS3_v2 | 4 | 14 | 28 | 16 | 16000/128 (172) | 12800/192 | 4 | 3000 |
| Standard_DS4_v2 | 8 | 28 | 56 | 32 | 32000/256 (344) | 25600/384 | 8 | 6000 |
| Standard_DS5_v2 | 16 | 56 | 112 | 64 | 64000/512 (688) | 51200/768 | 8 | 12000 |
Using the AKS default VM size Standard_DS2_v2 with the default OS disk size of 100 GiB as an example: this VM size supports ephemeral OS disks but only has 86 GiB of cache. This configuration would default to managed OS disks if the user does not explicitly specify an OS disk type; if the user explicitly requested ephemeral OS disks, they would receive a validation error.
If a user requests the same Standard_DS2_v2 with a 60 GiB OS disk, this configuration would default to ephemeral OS disks: the requested size of 60 GiB is smaller than the maximum cache size of 86 GiB.
Using Standard_D8s_v3 with a 100 GiB OS disk: this VM size supports ephemeral OS disks and has 200 GiB of VM cache space. If a user does not specify the OS disk type, the node pool would receive an ephemeral OS disk by default.
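As an illustration, such a configuration might be requested as follows; the resource names are placeholders:

```shell
# Explicitly request an ephemeral OS disk small enough to fit
# in the 86 GiB cache of the Standard_DS2_v2 VM size.
az aks create \
  --resource-group myResourceGroup \
  --name myAksCluster \
  --node-vm-size Standard_DS2_v2 \
  --node-osdisk-type Ephemeral \
  --node-osdisk-size 60
```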
When using the Azure CLI to create an AKS cluster or add a node pool to an existing cluster, ephemeral OS requires at least version 2.15.0 of the Azure CLI.
Ebdsv5-series
The memory-optimized Ebdsv5-series supports the following features:
- Premium Storage: Supported
- Premium Storage caching: Supported
- Live Migration: Supported
- Memory Preserving Updates: Supported
- VM Generation Support: Generation 1 and Generation 2
- Accelerated Networking: Supported (required)
- Ephemeral OS Disks: Supported
- Nested virtualization: Supported
The latest-generation VM series do not have both a VM cache and temporary storage; they only have a single, larger temporary storage, as shown in the following table.
| Size | vCPU | Memory: GiB | Temp storage (SSD) GiB | Max data disks | Max temp storage throughput: IOPS/MBps | Max uncached storage throughput: IOPS/MBps | Max burst uncached disk throughput: IOPS/MBps | Max NICs | Network bandwidth (Mbps) |
|---|---|---|---|---|---|---|---|---|---|
| Standard_E2bds_v5 | 2 | 16 | 75 | 4 | 9000/125 | 5500/156 | 10000/1200 | 2 | 10000 |
| Standard_E4bds_v5 | 4 | 32 | 150 | 8 | 19000/250 | 11000/350 | 20000/1200 | 2 | 10000 |
| Standard_E8bds_v5 | 8 | 64 | 300 | 16 | 38000/500 | 22000/625 | 40000/1200 | 4 | 10000 |
| Standard_E16bds_v5 | 16 | 128 | 600 | 32 | 75000/1000 | 44000/1250 | 64000/2000 | 8 | 12500 |
| Standard_E32bds_v5 | 32 | 256 | 1200 | 32 | 150000/1250 | 88000/2500 | 120000/4000 | 8 | 16000 |
| Standard_E48bds_v5 | 48 | 384 | 1800 | 32 | 225000/2000 | 120000/4000 | 120000/4000 | 8 | 16000 |
| Standard_E64bds_v5 | 64 | 512 | 2400 | 32 | 300000/4000 | 120000/4000 | 120000/4000 | 8 | 20000 |
Using the Standard_E2bds_v5 with the default OS disk size of 100 GiB as an example: this VM size supports ephemeral OS disks but only has 75 GiB of temporary storage. This configuration would default to managed OS disks if the user does not explicitly specify an OS disk type; if the user explicitly requested ephemeral OS disks, they would receive a validation error.
If a user requests the same Standard_E2bds_v5 with a 60 GiB OS disk, this configuration would default to ephemeral OS disks: the requested size of 60 GiB is smaller than the maximum temporary storage of 75 GiB.
Using Standard_E4bds_v5 with a 100 GiB OS disk: this VM size supports ephemeral OS disks and has 150 GiB of temporary storage. If a user does not specify the OS disk type, the node pool would receive an ephemeral OS disk by default.
Use Ephemeral OS on new clusters
You can configure an AKS cluster to use ephemeral OS disks at provisioning time. For example, when creating a new cluster with the Azure CLI, you can use the --node-osdisk-type Ephemeral parameter in an az aks create command, as shown below:
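A minimal sketch of such a command; the resource names, VM size, and disk size are placeholders:

```shell
# Create a cluster whose default node pool uses ephemeral OS disks.
az aks create \
  --resource-group myResourceGroup \
  --name myAksCluster \
  --node-vm-size Standard_DS3_v2 \
  --node-osdisk-type Ephemeral \
  --node-osdisk-size 100
```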
If you want to create a regular cluster using managed OS disks, you can do so by specifying --node-osdisk-type Managed.
Use Ephemeral OS on existing clusters
You can configure a new node pool to use ephemeral OS disks at provisioning time. For example, when adding a node pool to an existing cluster with the Azure CLI, you can use the --node-osdisk-type Ephemeral parameter in an az aks nodepool add command, as shown below:
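A minimal sketch of adding such a node pool; the resource and pool names are placeholders:

```shell
# Add a node pool with ephemeral OS disks to an existing cluster.
az aks nodepool add \
  --resource-group myResourceGroup \
  --cluster-name myAksCluster \
  --name ephemeralpool \
  --node-vm-size Standard_DS3_v2 \
  --node-osdisk-type Ephemeral \
  --node-osdisk-size 100
```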
osDiskType and kubeletDiskType
As we have seen so far, when creating a new AKS cluster or adding a new node pool to an existing cluster, you can use the osDiskType parameter to specify the OS disk type:
- Ephemeral (default): the OS disk is created as an ephemeral OS disk in the VM cache or temporary storage, depending on the selected VM series and size.
- Managed: the OS disk is created as a network-attached managed disk.
Another setting that you can specify is kubeletDiskType. This parameter determines the placement of emptyDir volumes, container runtime data root, and Kubelet ephemeral storage.
- OS (default): emptyDir volumes, the container runtime data root, and kubelet ephemeral storage are hosted by the OS disk, whether this is managed or ephemeral.
- Temporary: emptyDir volumes, the container runtime data root, and kubelet ephemeral storage are hosted by the temporary storage.
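For illustration, a node pool that keeps the OS on a managed disk and moves kubelet and containerd data to the temporary storage might be created as follows; the resource names are placeholders, and at the time of writing the --kubelet-disk-type parameter may require the aks-preview Azure CLI extension:

```shell
# Host the OS on a managed disk and kubelet/containerd data
# on the local temporary storage SSD.
az aks nodepool add \
  --resource-group myResourceGroup \
  --cluster-name myAksCluster \
  --name temppool \
  --node-vm-size Standard_E16bds_v5 \
  --node-osdisk-type Managed \
  --kubelet-disk-type Temporary
```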
I conducted some tests, trying out all the possible combinations of values for the kubeletDiskType and osDiskType parameters to understand how the location of container images, emptyDir volumes, and container logs varies depending on the current selection.
I created four different AKS clusters with the Standard_D4s_v3 VM size and four AKS clusters with the Standard_E16bds_v5 VM size and conducted some tests. Here are the results.
Dsv3-series
The Dsv3-series provides the following feature support:
- Premium Storage: Supported
- Premium Storage caching: Supported
- Live Migration: Supported
- Memory Preserving Updates: Supported
- VM Generation Support: Generation 1 and 2
- Accelerated Networking: Supported
- Ephemeral OS Disks: Supported
- Nested Virtualization: Supported
As you can see in the following table, the Standard_D4s_v3 VM size has a temporary storage of 32 GiB and a VM cache of 100 GiB.
| Size | vCPU | Memory: GiB | Temp storage (SSD) GiB | Max data disks | Max cached and temp storage throughput: IOPS/MBps (cache size in GiB) | Max burst cached and temp storage throughput: IOPS/MBps | Max uncached disk throughput: IOPS/MBps | Max burst uncached disk throughput: IOPS/MBps | Max NICs/Expected network bandwidth (Mbps) |
|---|---|---|---|---|---|---|---|---|---|
| Standard_D2s_v3 | 2 | 8 | 16 | 4 | 4000/32 (50) | 4000/200 | 3200/48 | 4000/200 | 2/1000 |
| Standard_D4s_v3 | 4 | 16 | 32 | 8 | 8000/64 (100) | 8000/200 | 6400/96 | 8000/200 | 2/2000 |
| Standard_D8s_v3 | 8 | 32 | 64 | 16 | 16000/128 (200) | 16000/400 | 12800/192 | 16000/400 | 4/4000 |
| Standard_D16s_v3 | 16 | 64 | 128 | 32 | 32000/256 (400) | 32000/800 | 25600/384 | 32000/800 | 8/8000 |
| Standard_D32s_v3 | 32 | 128 | 256 | 32 | 64000/512 (800) | 64000/1600 | 51200/768 | 64000/1600 | 8/16000 |
| Standard_D48s_v3 | 48 | 192 | 384 | 32 | 96000/768 (1200) | 96000/2000 | 76800/1152 | 80000/2000 | 8/24000 |
| Standard_D64s_v3 | 64 | 256 | 512 | 32 | 128000/1024 (1600) | 128000/2000 | 80000/1200 | 80000/2000 | 8/30000 |
Here are the results of my tests with the Standard_D4s_v3 VM size. Let's look at all the possible combinations of values for the kubeletDiskType and osDiskType parameters and how the location of container images, emptyDir volumes, and container logs varies for each selection.
osDiskType = Managed, kubeletDiskType = OS
The root directory / is hosted by the managed disk. This includes the /var/lib/kubelet directory, which contains kubelet data, and the /var/lib/containerd directory, which contains container images. The managed disk therefore hosts the OS, emptyDir volumes, writable layers, container images, and logs. You can run the lsblk command to list the block devices attached to the agent node VM. The sda device is the managed disk, while the sdb device is the local temporary storage SSD. You can run the ls -alF /mnt command to list the files and directories under the temporary storage SSD.
The emptyDir volume for a pod is in a directory under /var/lib/kubelet/pods/{podid}/volumes/kubernetes.io~empty-dir/ on the managed disk sda. The total size for the kubelet data (including emptyDir volumes) and containerd data (e.g., container images) is equal to the total size of the managed disk, 100 GiB in this test, minus the space occupied by the Linux OS and other packages.
osDiskType = Managed, kubeletDiskType = Temporary
The root directory / is hosted by the managed disk. You can run the lsblk command to list the block devices attached to the agent node VM. The sda device is the managed disk, while the sdb device is the local temporary storage SSD. You can run the ls -alF /mnt command to list the files and directories under the temporary storage SSD, and install and run the tree command to see the files and directories under the /mnt/aks directory in the temporary storage.
The /var/lib/kubelet directory is a bind mount of the /mnt/aks/kubelet directory. Likewise, /var/lib/containerd is a bind mount of the /mnt/aks/containers directory. The emptyDir volume for any pod is located in a directory under /var/lib/kubelet/pods/{podid}/volumes/kubernetes.io~empty-dir/ that is a mount point of the /mnt/aks/kubelet/pods/{podid}/volumes/kubernetes.io~empty-dir/ directory on the sdb device hosted by the local temporary storage. The total size for the kubelet data (including emptyDir volumes) and containerd data (including container images) is equal to the total size of the sdb device hosted by the temporary storage, which is ~32 GiB in this test.
osDiskType = Ephemeral, kubeletDiskType = OS
The root directory / is hosted by the ephemeral disk in the VM cache. This includes the /var/lib/kubelet directory, which contains the kubelet data, and the /var/lib/containerd directory, which contains container images. Hence the OS, emptyDir volumes, writable layers, container images, and logs are hosted by the ephemeral disk. You can run the lsblk command to list the block devices attached to the agent node VM. The sda device is the ephemeral disk in the VM cache, while the sdb device is the temporary storage SSD. You can run the ls -alF /mnt command to list the files and directories under the temporary storage SSD.
The emptyDir volume for a pod is in a directory under /var/lib/kubelet/pods/{podid}/volumes/kubernetes.io~empty-dir/ on the ephemeral disk sda hosted in the VM cache. The total size for the kubelet data (including emptyDir volumes) and containerd data (e.g., container images) is equal to the total size of the ephemeral disk, 100 GiB in this test, minus the space occupied by the Linux operating system and other packages.
osDiskType = Ephemeral, kubeletDiskType = Temporary
The root directory / is hosted by the ephemeral disk in the VM cache. You can run the lsblk command to list the block devices attached to the agent node VM. The sda device is the ephemeral disk in the VM cache, while the sdb device is the temporary storage SSD. You can run the ls -alF /mnt command to list the files and directories under the temporary storage SSD, and install and run the tree command to see the files and directories under the /mnt/aks directory in the temporary storage.
The /var/lib/kubelet directory is a bind mount of the /mnt/aks/kubelet directory. Likewise, /var/lib/containerd is a bind mount of the /mnt/aks/containers directory. The emptyDir volume for any pod is located in a directory under /var/lib/kubelet/pods/{podid}/volumes/kubernetes.io~empty-dir/ that is a mount point of the /mnt/aks/kubelet/pods/{podid}/volumes/kubernetes.io~empty-dir/ directory on the sdb device hosted by the local temporary storage. The total size for the kubelet data (including emptyDir volumes) and containerd data (including container images) is equal to the total size of the sdb device hosted by the temporary storage, which is ~32 GiB in this test.
Observations
- When setting the value of kubeletDiskType equal to OS, the operating system, container images, emptyDir volumes, writable container layers, and container logs are all hosted in the OS disk, no matter if the OS disk is managed or ephemeral.
- When setting the value of kubeletDiskType equal to Temporary, the operating system is hosted by the OS disk, no matter if the OS disk is managed or ephemeral, while container images, emptyDir volumes, and container logs are hosted by the temporary storage.
- When setting kubeletDiskType to Temporary, kubelet data (e.g., pod logs and emptyDir volumes) and containerd data (e.g., container images) are moved to the temporary storage SSD, whose size depends on the VM size. This configuration has a caveat: a disruptive host failure that leads to the loss of the temporary storage disk would also delete the kubelet data, which would require AKS to remediate the node (probably by reimaging it).
- Temporary storage and VM cache have the same performance characteristics, but for a given Dsv3 VM size the VM cache is larger than the temporary storage, so N GiB of temporary storage effectively costs more than N GiB of VM cache/ephemeral OS disk. Hence, deploying a cluster with osDiskType equal to Ephemeral and kubeletDiskType equal to OS, and setting the osDiskSize equal to the maximum VM cache size, is the recommended approach when you need a lot of ephemeral disk space for container images and emptyDir volumes.
Ebdsv5-series
The Standard_E16bds_v5 VM size has a temporary storage of 600 GiB and no VM cache. Let's look at all the possible combinations of values for the kubeletDiskType and osDiskType parameters and how the location of container images, emptyDir volumes, and container logs varies for each selection.
osDiskType = Managed, kubeletDiskType = OS
The root directory / is hosted by the managed disk. This includes the /var/lib/kubelet directory, which contains kubelet data, and the /var/lib/containerd directory, which contains container images. Hence the OS, emptyDir volumes, writable layers, container images, and logs are hosted by the managed disk. You can run the lsblk command to list the block devices attached to the agent node VM. The total size of the temporary storage for the Standard_E16bds_v5 VM is 600 GiB, while the OS disk size (osDiskSize) configured for the node pool is 100 GiB: the lsblk command returns a size of 100 GiB for the managed OS disk (sda) and 600 GiB for the temporary storage (sdb).
The emptyDir volume for a pod is in a directory under /var/lib/kubelet/pods/{podid}/volumes/kubernetes.io~empty-dir/ on the managed disk sda. The total size for the kubelet data (including emptyDir volumes) and containerd data (container images) is equal to the total size of the managed disk, 100 GiB in this test, minus the space occupied by the Linux operating system and other packages.
osDiskType = Managed, kubeletDiskType = Temporary
The root directory / is hosted by the managed disk. You can run the lsblk command to list the block devices attached to the agent node VM. The total size of the temporary storage for the Standard_E16bds_v5 VM is 600 GiB, while the OS disk size (osDiskSize) configured for the node pool is 100 GiB: the lsblk command returns a size of 100 GiB for the managed OS disk (sda) and 600 GiB for the temporary storage (sdb). You can install and run the tree command to see the files and directories under the /mnt/aks directory in the temporary storage.
The /var/lib/kubelet directory is a bind mount of the /mnt/aks/kubelet directory. Likewise, /var/lib/containerd is a bind mount of the /mnt/aks/containers directory. The emptyDir volume for any pod is located in a directory under /var/lib/kubelet/pods/{podid}/volumes/kubernetes.io~empty-dir/ that is a mount point of the /mnt/aks/kubelet/pods/{podid}/volumes/kubernetes.io~empty-dir/ directory on the sdb device hosted by the local temporary storage. The total size for the kubelet data (including emptyDir volumes) and containerd data (including container images) is equal to the total size of the sdb device hosted by the temporary storage, which is ~600 GiB in this test.
osDiskType = Ephemeral, kubeletDiskType = OS
The root directory / is hosted by the ephemeral disk. This includes the /var/lib/kubelet directory, which contains the kubelet data, and the /var/lib/containerd directory, which contains container images. Hence the OS, emptyDir volumes, writable layers, container images, and logs are hosted by the ephemeral disk in the local temporary storage. You can run the lsblk command to list the block devices attached to the agent node VM. The total size of the temporary storage for the Standard_E16bds_v5 VM is 600 GiB, while the OS disk size configured for the node pool is 100 GiB: the lsblk command returns a size of 100 GiB for the sda device, which holds the ephemeral OS disk, and 500 GiB for the sdb device. Since 100 + 500 = 600, we can conclude that both the sda and sdb devices are hosted in the local temporary storage.
The emptyDir volume for a pod is in a directory under /var/lib/kubelet/pods/{podid}/volumes/kubernetes.io~empty-dir/ on the ephemeral OS disk sda hosted by the temporary storage. The total size for the kubelet data (including emptyDir volumes) and containerd data (container images) is equal to the total size of the ephemeral OS disk, 100 GiB in this test, minus the space occupied by the Linux operating system and other packages.
osDiskType = Ephemeral, kubeletDiskType = Temporary
The root directory / is hosted by the ephemeral disk in the local temporary storage. You can run the lsblk command to list the block devices attached to the agent node VM. The total size of the temporary storage for the Standard_E16bds_v5 VM is 600 GiB, while the OS disk size configured for the node pool is 100 GiB: the lsblk command returns a size of 100 GiB for the sda device, which holds the ephemeral OS disk, and 500 GiB for the sdb device. Since 100 + 500 = 600, we can conclude that both the sda and sdb devices are hosted in the local temporary storage. You can install and run the tree command to see the files and directories under the /mnt/aks directory in the temporary storage.
The /var/lib/kubelet directory is a bind mount of the /mnt/aks/kubelet directory. Likewise, /var/lib/containerd is a bind mount of the /mnt/aks/containers directory. The emptyDir volume for any pod is located in a directory under /var/lib/kubelet/pods/{podid}/volumes/kubernetes.io~empty-dir/ that is a mount point of the /mnt/aks/kubelet/pods/{podid}/volumes/kubernetes.io~empty-dir/ directory on the sdb device hosted by the local temporary storage. The total size for the kubelet data (including emptyDir volumes) and containerd data (container images) is about 500 GiB, that is, the total size of the local temporary storage, 600 GiB in this test, minus the space occupied by the ephemeral OS disk (sda), 100 GiB in this test.
Observations
- Using a recent VM series such as the Ebdsv5-series gives you a single, larger temporary storage rather than a smaller, separate temporary storage and VM cache as in the Dsv3-series.
- You can set osDiskType equal to Ephemeral, kubeletDiskType equal to OS, and the osDiskSize equal to the maximum temporary storage size.
- Alternatively, you can set osDiskType equal to Managed to host the operating system on a premium SSD, whose size and performance tier depend on the OS disk size, and dedicate all the temporary disk to the kubelet data (including emptyDir volumes) and containerd data (including container images) by setting kubeletDiskType equal to Temporary.
Show osDiskType and kubeletDiskType for an existing cluster
You can run the following Azure CLI command to find out the osDiskType and kubeletDiskType for each node pool of an existing AKS cluster:
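A sketch of such a query; the resource group and cluster names are placeholders:

```shell
# List the OS disk type and kubelet disk type for every node pool.
az aks nodepool list \
  --resource-group myResourceGroup \
  --cluster-name myAksCluster \
  --query "[].{name:name, osDiskType:osDiskType, kubeletDiskType:kubeletDiskType}" \
  --output table
```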
The command returns the osDiskType and kubeletDiskType for each node pool of the selected AKS cluster.
Show osDiskType and kubeletDiskType for a node pool
You can run the following Azure CLI command to find out the osDiskType and kubeletDiskType for a given node pool of an existing AKS cluster:
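A sketch of such a query; the resource group, cluster, and node pool names are placeholders:

```shell
# Show the OS disk type and kubelet disk type for a single node pool.
az aks nodepool show \
  --resource-group myResourceGroup \
  --cluster-name myAksCluster \
  --name nodepool1 \
  --query "{name:name, osDiskType:osDiskType, kubeletDiskType:kubeletDiskType}"
```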
The command returns the osDiskType and kubeletDiskType for the specified node pool.
EmptyDir Test
You can use the following YAML manifest to create a pod with three containers, each mounting the same emptyDir volume using a different mount path and writing a separate file to the same directory.
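A sketch of such a manifest; the pod name, image, and file names are illustrative:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: emptydir-test          # illustrative name
spec:
  containers:
  - name: writer-one
    image: busybox:1.36
    command: ["/bin/sh", "-c", "echo one > /data/one.txt && sleep 3600"]
    volumeMounts:
    - name: shared             # same volume, mounted at /data
      mountPath: /data
  - name: writer-two
    image: busybox:1.36
    command: ["/bin/sh", "-c", "echo two > /cache/two.txt && sleep 3600"]
    volumeMounts:
    - name: shared             # same volume, mounted at /cache
      mountPath: /cache
  - name: writer-three
    image: busybox:1.36
    command: ["/bin/sh", "-c", "echo three > /scratch/three.txt && sleep 3600"]
    volumeMounts:
    - name: shared             # same volume, mounted at /scratch
      mountPath: /scratch
  volumes:
  - name: shared
    emptyDir: {}
```

All three containers see the same directory, so after the pod starts, each mount path contains all three files.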
Next Steps
- Configurations for local ephemeral storage | Kubernetes Documentation
- KubeletDiskType | Azure REST API Documentation
- Microsoft.ContainerService/managedClusters/agentPools Resource Provider | Azure Documentation
Thanks
Thanks for reading this article! If you have any feedback, please write a comment below or submit an issue or a pull request on GitHub. If you found this article and companion sample useful, please like the article below and give a star to the project on GitHub, thanks.
Conclusion
The recommended configuration is using the osDiskType equal to Ephemeral, kubeletDiskType equal to OS, and the osDiskSize equal to the maximum VM cache or temporary storage size, depending on the VM series selected for the agent nodes.