This post has been republished via RSS; it originally appeared at: Microsoft Tech Community - Latest Blogs.
This article contains a few recommendations for reducing the total cost of ownership (TCO) of your Azure Kubernetes Service (AKS) cluster. If you want to minimize the number of unused cores, you can use the following general guidelines to improve the density of your workloads and reduce the number of VMs to the bare minimum.
- Use the cluster autoscaler to scale the number of nodes, and Kubernetes Event-driven Autoscaling (KEDA) and the Horizontal Pod Autoscaler (HPA) to scale the number of pods, in and out based on traffic conditions.
- Make sure to properly set requests and limits for pods to avoid assigning too much CPU and memory to user-defined workloads and to improve application density. You can observe the average and maximum CPU and memory consumption using Prometheus or Container Insights, and then configure appropriate limits and quotas in the YAML manifests, Helm charts, or Kustomize manifests for your deployments. For more information, see Best practices for application developers to manage resources in Azure Kubernetes Service (AKS). There are third-party tools, such as Densify, that gather granular container data from frameworks like Prometheus, learn activity patterns, and apply policies to suggest requests and limits for each pod container, optimizing overall density.
- Use ResourceQuota objects to set quotas on the total amount of memory and CPU that can be used by all pods running in a given namespace. This helps prevent or reduce the likelihood of the noisy neighbor issue, improves application density, and reduces the number of agent nodes and hence the total cost of ownership. Likewise, use LimitRange objects to configure default CPU and memory requests and limits for pods running in a namespace. Azure Policy integrates with AKS through built-in policies to apply at-scale enforcements and safeguards on your cluster in a centralized, consistent manner. Follow the documentation to enable the Azure Policy add-on on your cluster and apply the Ensure CPU and memory resource limits policy, which ensures CPU and memory resource limits are defined on containers in an Azure Kubernetes Service cluster.
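As a minimal sketch of the quota and default-request pattern described above (the namespace name and all resource values here are illustrative, not recommendations):

```yaml
# Hypothetical example: cap aggregate CPU/memory for all pods in the "team-a" namespace
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-a-quota
  namespace: team-a
spec:
  hard:
    requests.cpu: "4"
    requests.memory: 8Gi
    limits.cpu: "8"
    limits.memory: 16Gi
---
# Default requests/limits applied to any container in the namespace that omits them
apiVersion: v1
kind: LimitRange
metadata:
  name: team-a-defaults
  namespace: team-a
spec:
  limits:
  - type: Container
    defaultRequest:
      cpu: 250m
      memory: 256Mi
    default:
      cpu: 500m
      memory: 512Mi
```

With both objects in place, a pod that omits its resources section still gets scheduled with sane defaults, while the namespace as a whole cannot exceed the quota.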
Use the Vertical Pod Autoscaler (VPA), based on the open-source Kubernetes project, to analyze and set the CPU and memory resources required by your pods. Instead of running tests to calculate the optimal CPU and memory requests and limits for the containers in your pods, you can configure the VPA to provide recommended values that you apply manually, or let it update the values automatically. When configured to do so, the VPA automatically sets resource requests and limits on containers per workload based on past usage. This ensures pods are scheduled onto nodes that have the required CPU and memory resources, and it helps fix the slack issue (the gap between requested and used CPU) explained in this article: Kubernetes Resource Use and Management in Production. Also, see the Kubernetes documentation on autoscaling and the Vertical Pod Autoscaler.
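A minimal VPA manifest for this pattern might look like the following (the Deployment name and the min/max bounds are hypothetical; `updateMode: "Off"` produces recommendations only, while `"Auto"` lets the VPA apply them):

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app          # hypothetical workload name
  updatePolicy:
    updateMode: "Off"     # recommendation-only; switch to "Auto" to apply automatically
  resourcePolicy:
    containerPolicies:
    - containerName: "*"
      minAllowed:
        cpu: 100m
        memory: 128Mi
      maxAllowed:
        cpu: "1"
        memory: 1Gi
```

With `updateMode: "Off"`, you can inspect the recommendations with `kubectl describe vpa my-app-vpa` and copy the suggested values into your manifests.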
- Choose the right VM size for the node pools of your AKS cluster based on the CPU and memory needs of your workloads. Azure offers many instance types that match a wide range of use cases, with different combinations of CPU, memory, storage, and networking capacity.
- Create multiple node pools with different VM sizes for specific purposes and workloads, and use Kubernetes node labels, node selectors, pod and node affinity and anti-affinity, taints and tolerations, and pod topology spread constraints to place applications on specific node pools. This avoids noisy neighbor issues and improves pod density. Keep node resources available for the workloads that require them, and don't allow other workloads to be scheduled on those nodes. Using different VM sizes for different node pools can also help optimize costs. For more information, see Use multiple node pools in Azure Kubernetes Service (AKS) - Azure Kubernetes Service | Microsoft Docs.
- The larger the VM SKU, the higher the number of vCores and the higher the chance of unused cores. Even assuming that the Kubernetes scheduler and cluster autoscaler do a good job of consolidating pods onto a set of nodes, there will always be a fraction of unused vCores.
- The higher the number of node pools, the higher the chance of unused vCores, as each node pool scales separately.
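As an illustrative sketch of pinning a workload to a dedicated node pool, as described above (this assumes a pool named `batchpool` created with a hypothetical `workload=batch:NoSchedule` taint; the image and resource values are placeholders):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: batch-worker
spec:
  replicas: 2
  selector:
    matchLabels:
      app: batch-worker
  template:
    metadata:
      labels:
        app: batch-worker
    spec:
      nodeSelector:
        agentpool: batchpool        # AKS labels each node with its pool name
      tolerations:                  # tolerate the pool's taint so pods can land there
      - key: workload
        operator: Equal
        value: batch
        effect: NoSchedule
      containers:
      - name: worker
        image: myregistry.azurecr.io/batch-worker:1.0   # hypothetical image
        resources:
          requests:
            cpu: 500m
            memory: 512Mi
          limits:
            cpu: "1"
            memory: 1Gi
```

The taint keeps other workloads off the dedicated pool, while the nodeSelector and toleration together steer this workload onto it.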
Here are some more general recommendations to reduce the TCO of an AKS cluster in addition to the previous guidelines and considerations:
- Review the Cost optimization section of the Azure Well-Architected Framework for AKS.
- Use Azure Advisor to monitor and release unused resources. Find and release any resource not used by your AKS cluster, such as public IPs, managed disks, etc. For more information, see Find and delete unattached Azure managed and unmanaged disks.
- Use Microsoft Cost Management budgets and reviews to keep track of expenditures.
- Use Azure Reservations to reduce the cost of the agent nodes. Azure Reservations help you save money by committing to one-year or three-year plans for multiple products; committing lets you get a discount on the resources you use, reducing costs by up to 72% compared to pay-as-you-go prices. Reservations provide a billing discount and don't affect the runtime state of your resources. After you purchase a reservation, the discount automatically applies to matching resources. You can purchase reservations from the Azure portal, APIs, PowerShell, and the Azure CLI.
- Add one or more spot node pools to your AKS cluster. A spot node pool is backed by a spot Virtual Machine Scale Set (VMSS). Using spot VMs for the nodes of your AKS cluster lets you take advantage of unutilized capacity in Azure at significant cost savings. The amount of available unutilized capacity varies based on many factors, including node size, region, and time of day. When you deploy a spot node pool, Azure allocates the spot nodes if there's capacity available, but there's no SLA for spot nodes: the spot scale set that backs the node pool is deployed in a single fault domain and offers no high-availability guarantees, and when Azure needs the capacity back, the Azure infrastructure evicts the spot nodes. When you create a spot node pool, you can define the maximum price you want to pay per hour and enable the cluster autoscaler, which is recommended for spot node pools. Based on the workloads running in your cluster, the cluster autoscaler scales the number of nodes in the node pool out and in; for spot node pools, it scales out the number of nodes after an eviction if additional nodes are still needed. For more information, see Add a spot node pool to an Azure Kubernetes Service (AKS) cluster.
- System node pools must contain at least one node, while user node pools may contain zero or more nodes. Hence, you can configure a user node pool to automatically scale from 0 to N nodes. Using the Horizontal Pod Autoscaler based on CPU and memory, or Kubernetes Event-driven Autoscaling (KEDA) based on the metrics of an external system such as Apache Kafka, RabbitMQ, or Azure Service Bus, you can configure your workloads to scale out and in.
- Your AKS workloads may not need to run continuously, for example, a development cluster with node pools running specific workloads. To optimize your costs, you can completely turn off an AKS cluster or stop one or more node pools in your AKS cluster, allowing you to save on compute costs. For more information, see:
- Deploy and manage containerized applications with Azure Kubernetes Service (AKS) running on Ampere Altra Arm-based processors. For more information, see Azure Virtual Machines with Ampere Altra Arm-based processors.
- Migrate application workloads written for the full .NET Framework, which must run in Windows containers, to .NET Standard or later. Migrated workloads run in Linux containers with a smaller container-image footprint and hence provide better density, which decreases the number of agent nodes required to host and run your applications.
- For multitenant solutions, physical isolation is more costly and adds management overhead. Logical isolation requires more Kubernetes experience and increases the surface area for changes and security threats, but it lets tenants share the cluster's costs.
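Returning to the spot node pool recommendation above: AKS taints spot nodes with `kubernetes.azure.com/scalesetpriority=spot:NoSchedule`, so a workload must explicitly tolerate that taint to run on them. A minimal sketch for an interruption-tolerant workload (the deployment name and image are placeholders):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: interruptible-job
spec:
  replicas: 3
  selector:
    matchLabels:
      app: interruptible-job
  template:
    metadata:
      labels:
        app: interruptible-job
    spec:
      # Tolerate the taint AKS applies to spot nodes
      tolerations:
      - key: kubernetes.azure.com/scalesetpriority
        operator: Equal
        value: spot
        effect: NoSchedule
      # Prefer (but don't require) spot nodes, so pods can fall back to regular nodes
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 1
            preference:
              matchExpressions:
              - key: kubernetes.azure.com/scalesetpriority
                operator: In
                values: ["spot"]
      containers:
      - name: job
        image: myregistry.azurecr.io/interruptible-job:1.0   # hypothetical image
```

Because spot nodes can be evicted at any time, this pattern is best suited to stateless or batch workloads that can be restarted elsewhere.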
Don't hesitate to write a comment below if you want to suggest additional recommendations to reduce the total cost of ownership of an AKS cluster. I'll include your observations in this article. Thanks!