AKS Design Review Series – Part 1.1: Networking – Ingress / Egress

This post has been republished via RSS; it originally appeared at: Microsoft Tech Community - Latest Blogs.

AKS Architecture

Azure Kubernetes Service (AKS) is a managed service, meaning that Microsoft manages the control plane components (i.e., API Server, Controller Manager, etcd DB & Scheduler) for you and offloads significant operational overhead from you as an administrator.

There are also agent nodes that live inside the cluster, which run your application workloads and are customer managed. They are placed into your own secure Virtual Network (VNET) in Azure and are exposed only through private IP addresses.

While AKS is a managed service, it is also worth mentioning that it is not a Platform as a Service (PaaS) component, like for example an Azure App Service. PaaS services are generally fully managed by Microsoft, and you don't have to log in or RDP into a Virtual Machine (VM) to perform any kind of operation.

Because AKS is a managed service, the control plane of the cluster, on the left-hand side of the above image, is managed by Microsoft. On the right-hand side, the agent nodes where you deploy your workloads are managed by you as a customer. You are responsible for things like patching and rebooting the VMs. You are also responsible for upgrading Kubernetes whenever a new version is available. Finally, you are responsible for deploying your applications and making sure they are highly available and scalable. In short, responsibility is shared between Microsoft and you as a customer.

Baseline Architecture for AKS Cluster

Many of the best practices for AKS presented in this post series are included in what is called the baseline architecture for AKS. It is a good starting point, because it is a reference implementation of a cluster with those best practices already built in.

Networking

One of the things that's easy to do in Kubernetes is to deploy a pod, set up a “LoadBalancer” type service, use a label selector to attach it to the right pod(s) and expose that out to the internet. With that, you tell Azure to put a public-facing load balancer in front of your service, and you get a public IP address to access it. This is not recommended. You don't want to use public-facing load balancers at all.

To address that from a governance perspective, you can apply one of the most powerful Azure Policies, which forces any cluster you create within your Azure subscription or management group to allow only internal load balancers. With this policy in place, you can no longer create an externally facing load balancer service and attach it to a pod. Instead, you have to use an ingress controller, which is the proper way to get Layer 7 ingress isolation in front of your service and to control what comes into your cluster.
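As a minimal sketch of what "internal load balancers only" means at the Service level, the snippet below builds a LoadBalancer Service manifest carrying the `service.beta.kubernetes.io/azure-load-balancer-internal` annotation and runs a simple check against it. The function names, the service name and the `app` label are hypothetical, and the check is only an illustrative stand-in for the kind of rule the policy enforces, not the policy's actual implementation.

```python
import json

# Annotation that asks the Azure cloud provider for an internal load balancer.
INTERNAL_LB_ANNOTATION = "service.beta.kubernetes.io/azure-load-balancer-internal"


def internal_lb_service(name, app_label, port=80, target_port=8080):
    """Build a LoadBalancer Service manifest that requests an internal LB."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {
            "name": name,
            "annotations": {INTERNAL_LB_ANNOTATION: "true"},
        },
        "spec": {
            "type": "LoadBalancer",
            # The label selector attaches the service to the matching pods.
            "selector": {"app": app_label},
            "ports": [{"port": port, "targetPort": target_port}],
        },
    }


def is_internal_only(service):
    """Return True if the Service would not create a public load balancer."""
    if service.get("spec", {}).get("type") != "LoadBalancer":
        return True  # ClusterIP and similar types never get a public LB
    annotations = service.get("metadata", {}).get("annotations", {})
    return annotations.get(INTERNAL_LB_ANNOTATION) == "true"


# Hypothetical names, printed as JSON (kubectl accepts JSON manifests).
print(json.dumps(internal_lb_service("orders-internal", "orders"), indent=2))
```

A Service built this way gets a private frontend IP from the cluster's VNET, which is what an ingress controller would typically sit behind.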

You might also want to have a Web Application Firewall (WAF) like Microsoft provides with Azure Front Door or with Azure Application Gateway, to control the ingress traffic going into your cluster.

Generally, in an AKS cluster you can run two types of workloads:

1. Processes that expose an external endpoint (e.g., REST APIs)
2. Processes that run inside the cluster and are not exposed externally (e.g., internal services, daemon sets, etc.)

Ingress

Standard recommended architecture for AKS ingress path

Let’s focus on the first scenario and see what it would look like in a standard recommended way for the ingress path. To start with, you would have a VNET in Azure, which is the AKS network where the cluster is deployed (usually a Spoke VNET, if following a Hub and Spoke network topology).

For public users, ideally, you would deploy an Application Gateway resource inside the AKS VNET, in its own dedicated subnet. It exposes a public IP, and your public users connect to that IP over the internet. You would also enable the Web Application Firewall (WAF) capabilities of the Application Gateway resource.

Then, also inside the VNET you would have the regular networking infrastructure for AKS, which consists of:

The Standard Load Balancer (SLB) that gets created along with the AKS cluster in its dedicated subnet, which is the entry point of the Kubernetes cluster.
The Kubernetes Subnet, where, ideally, you would have something like an ingress controller deployed in it (e.g., nginx), to route the traffic to the internal Kubernetes services.

This use case is for a regional deployment of the AKS cluster, because every Azure service used is deployed within a particular Azure region. This means that users who connect to the AKS cluster from that same region will have a much better User Experience (UX) than users who reach it from an entirely different region, especially a faraway one.

Distributed Denial of Service (DDOS) Protection

In terms of networking, the Public IP that the Application Gateway exposes gets “Basic Distributed Denial of Service (DDOS)” protection by default. This is the case for every public IP that you expose in Azure.

If you want, you can get another SKU of the DDOS protection in Azure, which is called the “Standard DDOS”. The difference between the two SKUs is that the Basic one is not tailored to the specific requirements of your workload, so Microsoft is not actually monitoring your workloads. Instead, based on the usage patterns of your public IPs, Microsoft may decide that a DDOS attack is going on at the data center / regional level and apply countermeasures to mitigate it.

On the other hand, if that does not fully suit your needs, and you need specific triggering for the counter measures of a potential DDOS attack, based on your workload requirements, then you should consider using the “Standard DDOS” SKU. In this SKU, Microsoft measures the usage patterns of your public IPs and knows over time what to expect from them. If there is a usage pattern in a particular public IP that deviates from what Microsoft expected from it, then it may decide to run counter measures for this potential DDOS attack.

In “Standard DDOS”, Microsoft also covers costs incurred by autoscaling in the event of a DDOS attack. For example, if you are experiencing a DDOS attack, start receiving a lot of requests to the public IP exposed by the Application Gateway, and have configured autoscaling for the Application Gateway component, then the cost of the extra instances deployed because of the attack will be covered by Microsoft.

Keep in mind that the Standard DDOS SKU is something that is usually applied at the organizational level and not just on an individual workload, because it is an expensive SKU. With this you can protect over 100 of your public IPs.

Typical Ingress Path – External / Public users

A typical ingress path for a request coming from external users into your Kubernetes cluster, when deployed in the above way would be:

1. Public end users (the client) send HTTPS requests through the public internet to a specific domain name (e.g., X.contoso.com).
   - That name is associated through a DNS A record to the public IP address of the Azure Application Gateway component.
   - This traffic is encrypted to make sure that the traffic between the client browser and gateway cannot be inspected or changed.
2. Requests arrive at the Public IP exposed by the Application Gateway, where the Azure platform applies Basic or Standard DDOS attack protection.
3. Requests now get routed to the Application Gateway component. Application Gateway has an integrated web application firewall (WAF) component and negotiates the TLS handshake for X.contoso.com, allowing only secure ciphers.
   - Application Gateway is a TLS termination point, as it's required to process WAF inspection rules, and execute routing rules that forward the traffic to the configured backend.
   - The TLS certificate is stored in Azure Key Vault and it's accessed using a user-assigned managed identity integrated with Application Gateway.
   - You can also get the initiating client IP and host name of the original request using the x-forwarded-for and x-original-host headers.
4. As traffic moves from Application Gateway to the backend, it's encrypted again with another TLS certificate (e.g., wildcard for *.aks-ingress.contoso.com) as it's forwarded to the internal Standard Load Balancer (SLB), which will ideally have a private IP.
   - This re-encryption makes sure traffic that is not secure doesn't flow into the cluster subnet.
   - You can also apply NSG rules at the AKS subnet level to make sure you allow only requests coming from the Application Gateway to arrive at the AKS subnet.
5. From the SLB, encrypted traffic is then routed to the Ingress Controller deployed inside the AKS cluster and this is where Kubernetes networking will take place.
   - The Ingress Controller is another TLS termination point for *.aks-ingress.contoso.com and forwards the traffic to the workload pods over HTTP.
   - The certificates are stored in Azure Key Vault and mounted into the cluster using the Container Storage Interface (CSI) driver.
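The x-forwarded-for header mentioned in the path above can be read by a workload to recover the initiating client IP. The sketch below assumes the header arrives as a comma-separated list whose entries may be either a bare IP or `IP:port` (the format Application Gateway is documented to use); the function name and sample addresses are hypothetical.

```python
import ipaddress


def client_ip_from_xff(header_value):
    """Return the first (client-side) IP from an X-Forwarded-For value.

    Accepts entries such as "203.0.113.7", "203.0.113.7:51234" or
    "[2001:db8::1]:443". Raises ValueError if no valid IP is found.
    """
    first = header_value.split(",")[0].strip()

    # Case 1: the entry is already a bare IPv4 or IPv6 address.
    try:
        return str(ipaddress.ip_address(first))
    except ValueError:
        pass

    # Case 2: the entry is "host:port"; split on the last colon.
    host, sep, port = first.rpartition(":")
    if sep and port.isdigit():
        return str(ipaddress.ip_address(host.strip("[]")))

    raise ValueError(f"no client IP found in {header_value!r}")


print(client_ip_from_xff("203.0.113.7:51234, 10.0.0.4:443"))
```

Only treat this value as trustworthy when the request can only have reached the pod through the gateway, since clients can send their own x-forwarded-for header.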

Note: There are many network hops with this approach and a possible overlap of functionality between Application Gateway and Ingress Controller (chained Layer 7 Load Balancers). If that is a problem for you, consider using Application Gateway Ingress Controller (AGIC) instead (details for this below). WAF capabilities can also be applied to the Ingress Controller.

Typical Ingress Path – Internal / Corporate Users

The second interesting flow covers internal corporate users who need to connect to the AKS cluster.

In this case, you could have something like the following scenario:

You are probably going to have a Hub network, considering you are following a Hub and Spoke model for your network topology architecture.
Your AKS cluster will be deployed in a Spoke network.
The two above networks will be connected to each other using a VNET peering.
Your internal corporate users will be connecting to the Hub network, and this will be done through either an ExpressRoute or a VPN Gateway component.
With the use of the VNET peering between the Hub and the Spoke VNET, they will be able to connect to the AKS cluster, ideally to the private IP of the Application Gateway component.

Ingress Paths Summary

To summarize, typical ingress paths in AKS look like one of the following:

Application Gateway + Azure Load Balancer (Internal) + Ingress controller
- This is the recommended one for a regional deployment of AKS.
Azure Front Door + Azure Load Balancer (Public)
- For public endpoints you can use Azure Front Door to connect your users to those endpoints:
  - Azure Front Door can also do WAF, so it could replace Application Gateway.
  - It is also a global service which can be used in multi-cluster scenarios.
- Current Azure Front Door service requires public backends:
  - Traffic in Azure Load Balancer should be filtered to allow only outbound IPs from AFD service:
    - IP Filters can be applied at NSGs.
    - X-Azure-FDID header filter needs to be applied at Layer 7.
Azure Front Door (Premium) + Azure Load Balancer (Internal) + Ingress controller
- The new version of Azure Front Door will allow us to have private endpoints as backends of the service.
  - Currently it's not a great experience to create private link services from AKS Load Balancer type services, because the load balancer resource resides inside the MC_{resourceGroupName}_{aksClusterName}_{aksClusterLocation} resource group and is managed by Azure.
  - There is work under way to let you decorate services in AKS with metadata so that private endpoints are created automatically, but this is not yet available.
Application Gateway Ingress Controller (AGIC)
- Many people already use Application Gateway as a WAF in the ingress path of AKS. Since it is a Layer 7 reverse proxy that can implement the same routing capabilities as a Kubernetes Ingress Controller, it makes sense to use the Application Gateway alone instead.
  - Fewer network hops, no Ingress Controller running in AKS required, and the Load Balancer is not used.
    - AGIC directly routes requests to pods.
  - Need to choose between add-on and Helm deployment:
    - Recommendation would be to use the add-on if possible.
    - For multi-tenant scenarios (the Application Gateway component needs to be shared between different AKS clusters or between clusters and non-AKS resources), Helm deployment is required.
    - Because of the way AGIC works, there is usually some downtime / impact between service updates in AKS and routing changes being propagated to AGIC.
    - There are some recommended mitigations to minimize downtime with AGIC.
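For the Azure Front Door + public Load Balancer path listed above, the Layer 7 filter on the X-Azure-FDID header could look roughly like the following. This is a sketch under stated assumptions: the function name is hypothetical, the all-zero ID is a placeholder for your own Front Door ID, and in practice the check would usually live in the ingress controller or WAF configuration, not in application code.

```python
def is_from_expected_front_door(headers, expected_fdid):
    """Return True only if the request carries the expected X-Azure-FDID value.

    `headers` is a plain mapping of header names to values. HTTP header
    names are case-insensitive, so names are normalized before the lookup.
    """
    normalized = {name.lower(): value for name, value in headers.items()}
    received = normalized.get("x-azure-fdid", "")
    return received.strip().lower() == expected_fdid.strip().lower()


# Placeholder ID for illustration only; use your own Front Door ID.
EXPECTED_FDID = "00000000-0000-0000-0000-000000000000"

print(is_from_expected_front_door({"X-Azure-FDID": EXPECTED_FDID}, EXPECTED_FDID))
print(is_from_expected_front_door({"Host": "x.contoso.com"}, EXPECTED_FDID))
```

The header check complements the NSG IP filter: the IP filter only proves the traffic came from the Front Door service, while the header ties it to your specific Front Door instance.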

Egress

Regarding the egress traffic, which is the traffic that originates from within the AKS cluster and goes outside the cluster, it is recommended to also have some sort of egress traffic control in place for your cluster.

Default Traffic Setup

By default, when you create an AKS cluster, you get a Public IP associated with the Standard Load Balancer (SLB) component. This IP is used for outbound traffic. If a pod sends a request from inside the cluster out to the Internet, the connection has to be SNAT-ed, because outbound traffic needs a public IP, not a private one, to reach destinations outside the cluster. By default, that is the public IP associated with the SLB component. This setup is not recommended, because you cannot define any rules to control which IPs you are allowed to egress to.

Standard recommended architecture for AKS egress path

Normally, if you follow the hub and spoke network topology model, you should have an Azure Firewall, or any other network virtual appliance (NVA) of your choice, deployed inside the Hub network. Then you should have a custom User Defined Route (UDR) table associated with the AKS subnet, which instructs all outgoing network traffic that originates from the AKS subnet to flow first through the Firewall (NVA) IP and then go out to the Internet. This UDR would contain the following rule:

0.0.0.0/0: This prefix matches all egress traffic that has no more specific route.
- Next Hop: Firewall Private IP (which is inside the Hub)
- Type: Virtual Appliance
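To see why a single 0.0.0.0/0 rule is enough, here is a small sketch of longest-prefix-match route selection, which is how Azure picks among routes with overlapping prefixes. The address ranges and the firewall IP are made-up examples, and the two-entry table is a simplification of a real effective route table (tie-breaking between route sources is not modeled).

```python
import ipaddress

# (address prefix, next hop type, next hop address). All values are illustrative.
ROUTES = [
    ("10.1.0.0/16", "VnetLocal", None),             # spoke VNET address space
    ("0.0.0.0/0", "VirtualAppliance", "10.0.1.4"),  # UDR to the firewall private IP
]


def select_route(destination, routes=ROUTES):
    """Pick the matching route with the longest prefix, or None if none match."""
    dest = ipaddress.ip_address(destination)
    matches = [r for r in routes if dest in ipaddress.ip_network(r[0])]
    if not matches:
        return None
    return max(matches, key=lambda r: ipaddress.ip_network(r[0]).prefixlen)


# Traffic to a public address only matches 0.0.0.0/0, so it goes to the firewall.
print(select_route("52.1.2.3"))
# Traffic that stays inside the spoke VNET matches the more specific prefix.
print(select_route("10.1.4.7"))
```

In this model, only traffic without a more specific route is sent to the firewall, which is why pod-to-pod and node-to-node traffic inside the VNET is unaffected by the UDR.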

If you are forwarding all outbound traffic through the Firewall device in the Hub network, meaning you are essentially SNAT-ting all connectivity to the Public IP of the Firewall device, then you obviously do not need the public IP associated with the SLB, so it is recommended to disable that public IP when you create the AKS cluster.

To do this using the az aks create command, set the --outbound-type parameter to userDefinedRouting. The default is loadBalancer, which creates the public IP, associates it with the SLB component, and routes all egress traffic to the internet through this public IP.

If you choose the recommended userDefinedRouting value, AKS validates that a UDR is created and attached to the AKS subnet, and it does not create a public IP associated with the SLB. This means that the subnet your worker nodes run in has a UDR that pushes the traffic over to the firewall device. If there is no outbound route set on the subnet when you try to create the cluster, creation fails with a clear message, because by using --outbound-type userDefinedRouting you stated that a UDR must exist there. Now you'll be able to fully control egress out of the cluster properly through your firewall device.
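As a rough illustration, the snippet below assembles the az aks create invocation described above as an argument list, for example to pass to `subprocess.run` in a provisioning script. The resource group, cluster name and subnet ID are placeholders, and a real deployment would need further parameters that are omitted here.

```python
import shlex


def aks_create_command(resource_group, cluster_name, subnet_id,
                       outbound_type="userDefinedRouting"):
    """Return the argument list for `az aks create` with an explicit outbound type."""
    return [
        "az", "aks", "create",
        "--resource-group", resource_group,
        "--name", cluster_name,
        # The subnet must already have the UDR attached when this runs.
        "--vnet-subnet-id", subnet_id,
        "--outbound-type", outbound_type,
    ]


# Placeholder values for illustration only.
cmd = aks_create_command("rg-spoke", "aks-demo", "<aks-subnet-resource-id>")
print(shlex.join(cmd))
```

Passing `outbound_type="loadBalancer"` reproduces the default behavior, with the public IP on the SLB.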

Azure Firewall - SNAT port exhaustion issues

An important thing to note here is that when you use the Firewall for all egress traffic from your cluster, you also need to make sure that enough public IPs are associated with the Firewall, so that there are enough ephemeral ports for SNAT-ting outbound connections. Microsoft currently recommends attaching 20 public IPs to the firewall to limit SNAT port exhaustion issues. The right number depends a lot on the scenario and on how many workloads use the Firewall.
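A back-of-the-envelope calculation shows how the number of public IPs translates into SNAT capacity. The figure of 2,496 SNAT ports per public IP per firewall backend instance, and the minimum of two instances, are assumptions taken from the Azure Firewall documentation at the time of writing. Verify them before relying on the result. The function name is hypothetical.

```python
# Assumption: SNAT ports per public IP per firewall backend instance.
PORTS_PER_IP_PER_INSTANCE = 2496


def firewall_snat_ports(public_ips, instances=2,
                        ports_per_ip_per_instance=PORTS_PER_IP_PER_INSTANCE):
    """Estimate the total SNAT ports available on the firewall.

    This is a simple product and does not model port reuse across
    different destinations.
    """
    return public_ips * instances * ports_per_ip_per_instance


for ips in (1, 5, 20):
    print(ips, "public IPs ->", firewall_snat_ports(ips), "SNAT ports")
```

Under these assumptions, going from 1 to 20 public IPs raises the estimate from 4,992 to 99,840 ports, which is the reasoning behind attaching many IPs to a firewall shared by several workloads.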

Egress traffic rules to enable at the Firewall device for cluster nodes in AKS

For this architecture to work properly, there are some traffic rules that need to be enabled at the Firewall device. Microsoft has a list of all the required FQDNs that need to be allowed at the Firewall. All those sites are owned by Microsoft. Some of them are mandatory for all clusters. Others depend on which AKS features you decide to enable in your cluster (e.g., use of Azure Monitor for Containers for monitoring purposes, use of Microsoft Defender for Containers, etc.). Always make sure that you review all the required FQDNs that need to be allowed for the components that you decide to enable in your AKS cluster.

Egress Paths Summary

To summarize, typical egress paths in AKS look like one of the following:

Firewall
- This is what we already saw above, and it is the recommended one.
Standard Load Balancer
- No Firewall device provisioned so no egress traffic inspection is possible:
  - The Azure Load Balancer attached to the AKS cluster is used for outbound traffic to the internet. Public IPs are attached to the SLB at cluster creation time.
  - Set the cluster outbound type at creation time to loadBalancer (this is actually the default one).
  - Be careful with SNAT Port exhaustion:
    - Configure the number of IPs associated with the SLB and the number of ports allocated to each node to minimize this.
NAT Gateway
- Use NAT Gateway with AKS cluster to get outbound IPs for internet connections:
  - This allows better scalability than Load Balancers and better port utilization (without any per node reservations).
  - If NAT Gateway is used, no public IPs are required for the Load Balancer anymore.
  - AKS support for NAT Gateway is in preview:
    - Managed NAT Gateway
    - SNAT Port reuse
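For the Standard Load Balancer path above, the trade-off between outbound IPs and ports per node can be sketched with simple arithmetic. The figure of 64,000 SNAT ports per outbound IP is an assumption based on the AKS load balancer documentation, and the function name is hypothetical.

```python
# Assumption: SNAT ports provided by each outbound public IP on the SLB.
PORTS_PER_OUTBOUND_IP = 64000


def max_nodes(outbound_ips, ports_per_node,
              ports_per_ip=PORTS_PER_OUTBOUND_IP):
    """Largest node count that the given IPs and per-node allocation can serve."""
    return (outbound_ips * ports_per_ip) // ports_per_node


# More ports per node means fewer nodes can share the same outbound IPs.
print(max_nodes(outbound_ips=1, ports_per_node=1024))
print(max_nodes(outbound_ips=7, ports_per_node=4000))
```

If the planned node count (including surge nodes during upgrades) exceeds this number, either add outbound IPs or lower the per-node port allocation.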