How to deploy gMSA on AKS with Terraform


The other day I posted a blog on how to deploy an AKS cluster ready for Windows workloads using Terraform. Today, I want to expand that to include gMSA, a highly requested feature from Windows customers running containers on AKS. Naturally, this grows the complexity of the Terraform template quite a bit, so this blog post provides the details of what is needed to make it work.

 

gMSA requirements and items outside of Terraform scope

Before diving into the Terraform template, it's important to review the gMSA prerequisites and what is out of scope for Terraform when deploying the Azure resources:

  • Azure resources: As part of the gMSA environment, we need several Azure resources: an AKS cluster, an Azure Virtual Network, an Azure Key Vault, an Azure Managed Identity, access for the Managed Identity to the Azure Key Vault, a secret in the Azure Key Vault containing the standard user that retrieves the gMSA, and a Domain Controller. All of these are created by the Terraform template.
  • Non-Azure resources: To use gMSA, you will need to manually configure Active Directory on the Domain Controller VM. This includes installing the AD role, creating a new forest with a root domain, and enabling gMSA in AD via the KDS feature (see the PowerShell sketch after this list). You also need to install the gMSA credential spec on your AKS cluster. Both operations are environment-sensitive, and the credential spec in particular needs to be configured to match your environment.
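
To give you an idea of what that manual AD preparation involves, here is a minimal PowerShell sketch. The domain name contoso.com and NetBIOS name CONTOSO are placeholders; use the values you pass to the Terraform template:

#Run on the DC VM once the Terraform deployment completes
#Install the Active Directory Domain Services role and management tools
Install-WindowsFeature -Name AD-Domain-Services -IncludeManagementTools

#Promote the VM to a domain controller in a new forest (the VM reboots when this finishes)
Install-ADDSForest -DomainName "contoso.com" -DomainNetbiosName "CONTOSO" -InstallDns -Force -SafeModeAdministratorPassword (Read-Host -AsSecureString "Safe Mode password")

#After the reboot, create the KDS root key that gMSA requires
#Backdating it 10 hours makes it usable immediately; on a production DC, create it with no EffectiveTime and wait the default 10 hours for replication
Add-KdsRootKey -EffectiveTime ((Get-Date).AddHours(-10))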

A few notes on the Terraform template:

  1. The template deploys a Domain Controller. If your environment has a Domain Controller with Active Directory configured, you can remove this section of the Terraform template. Keep in mind that your AKS cluster needs to be configured with the IP address of the DC, so you will need to change that in the template. Also, make sure you read my other blog post with networking and AD considerations for gMSA on AKS.
  2. The script uses the same username and password for the Windows nodes on AKS and for the Domain Controller. This is only to keep the deployment simple; there's no need to share credentials, and you can update the template to use different ones.
  3. The standard user account stored in Azure Key Vault doesn't exist in AD at the time this script runs, because the DC itself is being created by the script. Make sure you later create the user account with the same username and password you provided when deploying the template, as in the sketch below.
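
For reference, creating that standard user on the DC could look like the following in PowerShell. The account name gmsauser is a placeholder; it must match the gmsa_username and gmsa_userpassword values you supplied to the template:

#Create the standard domain user whose credentials are stored in the Key Vault secret
New-ADUser -Name "gmsauser" -Enabled $true -PasswordNeverExpires $true -AccountPassword (Read-Host -AsSecureString "gmsa_userpassword value")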

Since this is a more complex Terraform template, I invite you to collaborate on it; if you see an opportunity for improvement, please send your suggestions!

 

gMSA on AKS Terraform template

The Terraform deployment has two files. The main.tf file contains the resources to be deployed. The variables.tf file contains the variables used during the deployment. Note that some of the variables' values are not set in the file, both because you need to define them for your own deployment and because some are sensitive, such as passwords.

Here is the main.tf file:

terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "=3.55.0"
    }
  }
}

data "azurerm_client_config" "current" {}
data "azurerm_subscription" "current" {}

provider "azurerm" {
  features {
    key_vault {
      purge_soft_delete_on_destroy    = true
      recover_soft_deleted_key_vaults = false
    }
  }
}

#Creates Azure Resource Group
resource "azurerm_resource_group" "rg" {
  name     = var.resource_group
  location = var.location
}

#Creates Azure User Assigned Managed Identity
resource "azurerm_user_assigned_identity" "managed_identity" {
  location            = azurerm_resource_group.rg.location
  resource_group_name = azurerm_resource_group.rg.name
  name                = "gmsami"
}

#Creates Azure Key Vault
resource "azurerm_key_vault" "akv" {
  name                       = "viniapgmsatest"
  location                   = azurerm_resource_group.rg.location
  resource_group_name        = azurerm_resource_group.rg.name
  tenant_id                  = data.azurerm_client_config.current.tenant_id
  soft_delete_retention_days = 90
  purge_protection_enabled   = false
  sku_name                   = "standard"
}

#Assigns the Reader role to the MI on Azure Key Vault
resource "azurerm_role_assignment" "mi_akv_reader" {
  scope                = azurerm_key_vault.akv.id
  role_definition_name = "Reader"
  principal_id         = azurerm_user_assigned_identity.managed_identity.principal_id
}

#Defines the AKV access policy for the MI
resource "azurerm_key_vault_access_policy" "akvpolicy" {
  key_vault_id = azurerm_key_vault.akv.id
  tenant_id    = data.azurerm_client_config.current.tenant_id
  object_id    = azurerm_user_assigned_identity.managed_identity.principal_id

  secret_permissions = [
    "Get"
  ]
}

#Defines AKV access for the Terraform session
resource "azurerm_key_vault_access_policy" "tfpolicy" {
  key_vault_id = azurerm_key_vault.akv.id
  tenant_id    = data.azurerm_client_config.current.tenant_id
  object_id    = data.azurerm_client_config.current.object_id

  secret_permissions = [
    "Get", "List", "Set"
  ]
}

#Creates the secret on Azure Key Vault (careful: this is the standard user on your AD)
resource "azurerm_key_vault_secret" "gmsa_secret" {
  name         = "gmsasecret"
  value        = "${var.netbios_name}\\${var.gmsa_username}:${var.gmsa_userpassword}"
  key_vault_id = azurerm_key_vault.akv.id
}

#Creates Azure Virtual Network
resource "azurerm_virtual_network" "vnet" {
  name                = "gmsavnet"
  location            = azurerm_resource_group.rg.location
  resource_group_name = azurerm_resource_group.rg.name
  address_space       = ["10.0.0.0/16", "10.1.0.0/26"]
}

#Creates the gMSA subnet - both pods and Domain Controller will use this subnet
resource "azurerm_subnet" "gmsasubnet" {
  name                 = "gmsasubnet"
  resource_group_name  = azurerm_resource_group.rg.name
  virtual_network_name = azurerm_virtual_network.vnet.name
  address_prefixes     = ["10.0.0.0/16"]
}

#Optional: Creates the Azure Bastion subnet for RDP into DC01
resource "azurerm_subnet" "AzureBastionSubnet" {
  name                 = "AzureBastionSubnet"
  resource_group_name  = azurerm_resource_group.rg.name
  virtual_network_name = azurerm_virtual_network.vnet.name
  address_prefixes     = ["10.1.0.0/26"]
}

#Creates a vNIC for the DC VM - remove this if you have an existing DC
resource "azurerm_network_interface" "dc01_nic" {
  name                = "dc01_nic"
  location            = azurerm_resource_group.rg.location
  resource_group_name = azurerm_resource_group.rg.name

  ip_configuration {
    name                          = "dc01_nic"
    subnet_id                     = azurerm_subnet.gmsasubnet.id
    private_ip_address_allocation = "Dynamic"
  }
}

#Creates the DC VM - remove this if you have an existing VM
#You need to connect to this VM and finish the Active Directory configuration
resource "azurerm_windows_virtual_machine" "dc01" {
  name                = "DC01"
  resource_group_name = azurerm_resource_group.rg.name
  location            = azurerm_resource_group.rg.location
  size                = "Standard_D4s_v3"
  admin_username      = var.win_username
  admin_password      = var.win_userpass

  network_interface_ids = [
    azurerm_network_interface.dc01_nic.id
  ]

  os_disk {
    caching              = "ReadWrite"
    storage_account_type = "Standard_LRS"
  }

  source_image_reference {
    publisher = "MicrosoftWindowsServer"
    offer     = "WindowsServer"
    sku       = "2022-Datacenter"
    version   = "latest"
  }
}

#Creates AKS cluster with Windows profile and gMSA enabled, and uses the existing vNet
#This depends on the DC01 VM, as we need its IP as the primary DNS for the Windows nodes
resource "azurerm_kubernetes_cluster" "aks" {
  name                = "ContosoCluster"
  location            = azurerm_resource_group.rg.location
  resource_group_name = azurerm_resource_group.rg.name
  dns_prefix          = "contosocluster"

  default_node_pool {
    name           = "lin"
    node_count     = var.node_count_linux
    vm_size        = "Standard_D2_v2"
    vnet_subnet_id = azurerm_subnet.gmsasubnet.id
  }

  windows_profile {
    admin_username = var.win_username
    admin_password = var.win_userpass

    gmsa {
      dns_server  = "10.0.0.4"
      root_domain = var.Domain_DNSName
    }
  }

  network_profile {
    network_plugin = "azure"
    service_cidr   = "10.240.0.0/16"
    dns_service_ip = "10.240.0.10"
  }

  identity {
    type = "SystemAssigned"
  }

  depends_on = [
    azurerm_windows_virtual_machine.dc01
  ]
}

#Creates Windows node pool on AKS cluster
resource "azurerm_kubernetes_cluster_node_pool" "win" {
  name                  = "wspool"
  kubernetes_cluster_id = azurerm_kubernetes_cluster.aks.id
  vm_size               = "Standard_D4s_v3"
  node_count            = var.node_count_windows
  os_type               = "Windows"
}

output "kube_config" {
  value     = azurerm_kubernetes_cluster.aks.kube_config_raw
  sensitive = true
}

#Assigns the user assigned Managed Identity to the Windows node pool
resource "null_resource" "identity_assign" {
  provisioner "local-exec" {
    command = "az vmss identity assign -g MC_${azurerm_resource_group.rg.name}_${azurerm_kubernetes_cluster.aks.name}_${azurerm_resource_group.rg.location} -n aks${azurerm_kubernetes_cluster_node_pool.win.name} --identities /subscriptions/${data.azurerm_subscription.current.subscription_id}/resourcegroups/${azurerm_resource_group.rg.name}/providers/Microsoft.ManagedIdentity/userAssignedIdentities/${azurerm_user_assigned_identity.managed_identity.name}"
  }

  depends_on = [
    azurerm_kubernetes_cluster_node_pool.win
  ]
}

#Updates the VMSS instances
resource "null_resource" "vmss_update" {
  provisioner "local-exec" {
    command = "az vmss update-instances -g MC_${azurerm_resource_group.rg.name}_${azurerm_kubernetes_cluster.aks.name}_${azurerm_resource_group.rg.location} -n aks${azurerm_kubernetes_cluster_node_pool.win.name} --instance-ids *"
  }

  depends_on = [
    null_resource.identity_assign
  ]
}

#Optional: Creates a public IP address for the Azure Bastion host
resource "azurerm_public_ip" "bastion_ip" {
  name                = "bastionip"
  location            = azurerm_resource_group.rg.location
  resource_group_name = azurerm_resource_group.rg.name
  allocation_method   = "Static"
  sku                 = "Standard"
}

#Optional: Creates a Bastion host to connect to the DC VM via RDP
resource "azurerm_bastion_host" "gmsa_dc_bastion" {
  name                = "gmsabastion"
  location            = azurerm_resource_group.rg.location
  resource_group_name = azurerm_resource_group.rg.name

  ip_configuration {
    name                 = "configuration"
    subnet_id            = azurerm_subnet.AzureBastionSubnet.id
    public_ip_address_id = azurerm_public_ip.bastion_ip.id
  }
}

Here is the variables.tf file:

variable "resource_group" { type = string description = "Resource group name" default = "58TestRG" } variable "location" { type = string description = "RG and resources location" default = "East US" } variable "node_count_linux" { type = number description = "Linux nodes count" default = 1 } variable "node_count_windows" { type = number description = "Windows nodes count" default = 2 } variable "win_username" { description = "Windows node username" type = string sensitive = false } variable "win_userpass" { description = "Windows node password" type = string sensitive = true } variable "Domain_DNSName" { description = "FQDN for the Active Directory forest root domain" type = string sensitive = false } variable "netbios_name" { description = "NETBIOS name for the AD domain" type = string sensitive = false } variable "SafeModeAdministratorPassword" { description = "Password for AD Safe Mode recovery" type = string sensitive = true } variable "gmsa_username" { description = "Username for the standard domain account" type = string sensitive = false } variable "gmsa_userpassword" { description = "Password for standard domain account" type = string sensitive = true }

With the two files in the same folder, you can run:

az login
az account set --subscription <subscription ID>
terraform init
terraform apply

I did not include the -auto-approve flag, as you probably want to confirm that everything will run as you expect. Once you have reviewed the plan for the deployment, type yes to continue.

Now, let me go over the details of this template:

We start by creating a resource group. The name and location for the RG come from the variables.tf file.

Next, we create the auxiliary Azure services (Key Vault and user-assigned managed identity). You could reuse the identity the AKS cluster gets once it's deployed; I decided to create a new one for testing and learning purposes. We then assign the managed identity the Reader role on the Azure Key Vault and give it the "Get" permission for secrets. This is what allows the managed identity to read the standard user account used to connect to AD. We then create the secret in Key Vault. Note that we also give the Terraform session itself Get, List, and Set permissions on the Key Vault, so it can write the value of the standard user account into the vault's secrets.
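
If you want to double-check that the secret landed in the vault after the apply, something like the following should work. The vault name viniapgmsatest comes from main.tf; adjust it if you renamed the vault:

az keyvault secret show --vault-name viniapgmsatest --name gmsasecret --query value -o tsv

The output should have the shape NETBIOS\username:password, matching the value expression in main.tf.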

Moving on, we create the Azure virtual network and two subnets: one for the AKS cluster and the Domain Controller VM, and another for Azure Bastion. The latter is optional, as you might not need it, but I added it just in case.

To create the Domain Controller VM, we create a network interface associated with the gMSA subnet, and then create the Windows VM on Azure with that vNIC attached. Here you can change the size and disk of the VM depending on your environment and cost limitations. The image used is Windows Server 2022; while that's the recommended version, this deployment also works with Windows Server 2019. Keep in mind that you need to RDP/connect into this VM to finish the Active Directory configuration, which is outside the scope of this template.

We then finally create the AKS cluster. This is a standard AKS cluster with a simple default node pool of Linux nodes. Note that the subnet associated with it is the gMSA subnet created earlier. We also add a Windows profile to this cluster and configure gMSA right away. IMPORTANT: at this point, you must indicate the gMSA DNS server and the FQDN of the AD root domain. If you have an existing DC that is a DNS server, you should pass in the internal IP address of that machine; this is just like adding a primary (and secondary) DNS server in the IP configuration of a Windows instance. However, if you are using this template to deploy your DC, do not change the DNS server here. Since the DC VM is the first resource created in the subnet, it gets the first available IP address, which in this case is 10.0.0.4, hence the value in the template. For that ordering to hold, I set the depends_on argument on this resource, so the AKS cluster is created only after the DC VM.

Next, the Windows node pool is created with standard configurations. Here you can change the number of Windows nodes and the VM size.
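
To confirm the gMSA settings took effect once the cluster is up, you can query the cluster's Windows profile. Note that 58TestRG is just the default resource group name from variables.tf:

az aks show --resource-group 58TestRG --name ContosoCluster --query windowsProfile.gmsaProfile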

The final steps in the template assign the managed identity to the Virtual Machine Scale Set (VMSS) behind the Windows node pool and then update its instances. Since the managed identity has access to the Azure Key Vault, and we're associating it with the VMSS, all nodes in that VMSS will be able to access the secret and authenticate with AD.
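
To verify the assignment, you can inspect the identities attached to the VMSS. The names below follow the patterns used in the template (MC_<resource group>_<cluster>_<location> for the node resource group and aks<pool name> for the VMSS), so adjust them to your values:

az vmss identity show --resource-group MC_58TestRG_ContosoCluster_eastus --name akswspool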

 

Post installation steps

The template does the heavy lifting of creating the Azure resources needed for gMSA to work. As mentioned before, there are additional steps, so let me go over them once again:

  • Finish the AD preparation on the DC VM.
    • This includes deploying Active Directory itself and configuring the KDS service.
    • You need to create the gMSA account which will be used in the credential spec.
    • You also need to create the standard user account to be stored in the Azure Key Vault.
  • Deploy the credential spec.
    • This is environment and application specific. Just keep in mind that some parameters used in the Terraform template are also needed in the credential spec; the sketch after this list shows the AD side.
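
To make those bullets concrete, here is a hedged PowerShell sketch of the gMSA account creation, again with placeholder names (gmsa1 for the gMSA and gmsauser for the standard account) that must match what you put in the credential spec and in the Key Vault secret:

#Create the gMSA and allow the standard user to retrieve its managed password
New-ADServiceAccount -Name "gmsa1" -DNSHostName "gmsa1.contoso.com" -ServicePrincipalNames "host/gmsa1", "host/gmsa1.contoso.com" -PrincipalsAllowedToRetrieveManagedPassword "gmsauser"

#Sanity check that the account exists in AD
Get-ADServiceAccount -Identity "gmsa1"

For the credential spec itself, follow the AKS gMSA documentation for your scenario; the domain FQDN, NetBIOS name, gMSA name, and the managed identity and Key Vault details used in this template all need to be reflected in it before you apply it to the cluster.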

 

Conclusion

It is possible to deploy an application that uses gMSA on Windows containers on an AKS cluster. Automating the process reduces the chance of errors and lets you set up a CI/CD pipeline. This blog post covered the Terraform deployment of the Azure resources needed for gMSA on AKS to work. It deploys and configures all the Azure resources, while some environment-specific actions are still needed.

I hope this is helpful. No doubt you'll need to modify the template for your environment. Luckily, you can leverage the ITOpsTalk repo to do that, and even give us feedback by submitting a PR. Let us know what you think!

 

 
