TDE High availability with customer-managed key for Azure SQL Databases

This post has been republished via RSS; it originally appeared at: New blog articles in Microsoft Community Hub.

When using a customer-managed key (CMK) to protect data at rest, customers are responsible for, and in full control of, key lifecycle management (key creation, upload, rotation, deletion). The key used to encrypt the Database Encryption Key (DEK), called the TDE protector, is a customer-managed asymmetric key stored in a customer-owned and customer-managed Azure Key Vault (AKV).

 

If the server loses access to the TDE protector stored in AKV, then within up to 10 minutes the database starts denying all connections with a corresponding error message and changes its state to Inaccessible. The only action allowed on a database in the Inaccessible state is deleting it. For more information, see Inaccessible TDE protector.
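One way to see which key is currently the TDE protector, and whether any database has moved to the Inaccessible state, is to query the server from Az PowerShell. This is a minimal sketch, assuming the Az.Sql module is installed and you have signed in with Connect-AzAccount; the placeholder values are hypothetical and must be replaced with your own.

```powershell
# Placeholder values (assumptions): replace with your own names.
$rg     = '<SQLDatabaseResourceGroupName>'
$server = '<LogicalServerName>'

# Show the server's current TDE protector (Type is AzureKeyVault when CMK is in use).
Get-AzSqlServerTransparentDataEncryptionProtector -ResourceGroupName $rg -ServerName $server

# List every database on the server together with its current status.
Get-AzSqlDatabase -ResourceGroupName $rg -ServerName $server |
    Select-Object DatabaseName, Status
```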

 

Losing access to the key vault, and what happens when the TDE protector becomes inaccessible:

There is a portal experience to revalidate key permissions and trigger a workflow that makes the database available again. There is no published SLA; the time needed to bring the database back online depends on its size and ranges from approximately a few minutes to hours. For the first 30 minutes, the database is kept in place with external connections disabled, so that the customer can notice that the database has lost key access and fix the issue immediately. After 30 minutes, the database is moved to a stable state to avoid issues in the system, and it stays inaccessible because the key is gone or revoked.
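If access was lost because the server's identity no longer has permissions on the key, those permissions can be granted again before revalidating the key. The following is only a sketch under several assumptions: the vault uses the access policy permission model (not Azure RBAC), the server uses a system-assigned managed identity, and the placeholder names are hypothetical. It restores the key permissions only; it does not replace the revalidation workflow described above.

```powershell
# Placeholder values (assumptions): replace with your own names.
$rg     = '<SQLDatabaseResourceGroupName>'
$server = '<LogicalServerName>'
$vault  = '<KeyVaultName>'

# Object ID of the server's system-assigned managed identity.
$principalId = (Get-AzSqlServer -ResourceGroupName $rg -ServerName $server).Identity.PrincipalId

# Grant the key permissions that TDE with CMK relies on: get, wrapKey, unwrapKey.
Set-AzKeyVaultAccessPolicy -VaultName $vault -ObjectId $principalId `
    -PermissionsToKeys get, wrapKey, unwrapKey
```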

 

When a database has been inaccessible for more than 30 minutes, its service tier plays an important role in how long recovery takes. For the Standard/General Purpose service tiers, recovery is much faster than for the Premium/Business Critical service tiers. Standard/General Purpose uses remote storage and retrieves the data through an attach/detach process, while Premium/Business Critical triggers a restore process on the backend. That restore operation has an RTO of up to 12 hours (most database restores finish in less than 12 hours), depending on factors such as the size of the database, its compute size, the number of transaction logs involved, and network bandwidth. More information here: Recovery time.

 

To ensure high availability, it is highly recommended to configure the server to use two different key vaults, in two different regions, that hold the same key material.
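One possible way to get the same key material into both vaults is to back up the key from the first vault and restore it into the second. This is a hedged sketch, not the only approach: Key Vault backup and restore only works between vaults in the same subscription and the same Azure geography, so if your vaults do not meet that condition you would instead import the same externally generated key into each vault. The vault names, key name, and backup file path are hypothetical placeholders.

```powershell
# Placeholder values (assumptions): replace with your own names.
$primaryVault   = '<PrimaryKeyVaultName>'
$secondaryVault = '<SecondaryKeyVaultName>'
$keyName        = '<KeyName>'
$backupFile     = Join-Path ([System.IO.Path]::GetTempPath()) 'tde-key.backup'

# Export an encrypted backup blob of the key from the primary vault.
Backup-AzKeyVaultKey -VaultName $primaryVault -Name $keyName -OutputFile $backupFile

# Restore the same key (same name and key material) into the secondary vault.
Restore-AzKeyVaultKey -VaultName $secondaryVault -InputFile $backupFile

# Remove the local backup file once the restore has succeeded.
Remove-Item -Path $backupFile
```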

 

AKV1.jpg

PowerShell example:

 

# Add the key from Key Vault to the server (run once for each key vault key)

Add-AzSqlServerKeyVaultKey -ResourceGroupName <SQLDatabaseResourceGroupName> -ServerName <LogicalServerName> -KeyId <KeyVaultKeyId>

 

# Confirm that the keys were added to the server

Get-AzSqlServerKeyVaultKey -ResourceGroupName <SQLDatabaseResourceGroupName> -ServerName <LogicalServerName>

 

Note: The key in the secondary key vault in the other region should not be set as the TDE protector; doing so is not allowed.
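For completeness, the key in the primary vault can be set as the TDE protector as sketched below, while the key from the secondary vault is only added to the server with Add-AzSqlServerKeyVaultKey. The placeholder values are hypothetical, and the sketch assumes the primary key has already been added to the server.

```powershell
# Placeholder values (assumptions): replace with your own names.
$rg           = '<SQLDatabaseResourceGroupName>'
$server       = '<LogicalServerName>'
$primaryKeyId = '<PrimaryKeyVaultKeyId>'

# Set the key from the primary vault as the server's TDE protector.
Set-AzSqlServerTransparentDataEncryptionProtector -ResourceGroupName $rg -ServerName $server `
    -Type AzureKeyVault -KeyId $primaryKeyId

# Confirm which key is now the TDE protector.
Get-AzSqlServerTransparentDataEncryptionProtector -ResourceGroupName $rg -ServerName $server
```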

 

AKV2.jpg

 

  • After ensuring that the SQL server has access to both keys in the two Azure key vaults, you can test the setup by disabling public access on the primary Azure Key Vault, which holds the TDE protector key, and waiting 10 minutes.
    The server is expected to switch automatically to the linked key with the same thumbprint in the secondary key vault.
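The test above could be scripted roughly as follows. This is a sketch under assumptions: it blocks network access to the primary vault with a firewall rule set (one way to approximate disabling public access), the placeholder names are hypothetical, and the final step resets the firewall to allow all networks, which may differ from your original configuration. Try it on a non-production server first.

```powershell
# Placeholder values (assumptions): replace with your own names.
$rg           = '<SQLDatabaseResourceGroupName>'
$server       = '<LogicalServerName>'
$db           = '<DatabaseName>'
$primaryVault = '<PrimaryKeyVaultName>'

# Both server keys should report the same thumbprint before the test starts.
Get-AzSqlServerKeyVaultKey -ResourceGroupName $rg -ServerName $server |
    Select-Object Uri, Thumbprint

# Block network access to the primary vault, including trusted Azure services.
Update-AzKeyVaultNetworkRuleSet -VaultName $primaryVault -DefaultAction Deny -Bypass None

# Check the database status once a minute for 15 minutes.
1..15 | ForEach-Object {
    Get-AzSqlDatabase -ResourceGroupName $rg -ServerName $server -DatabaseName $db |
        Select-Object DatabaseName, Status
    Start-Sleep -Seconds 60
}

# Re-open the primary vault after the test (adjust to your original settings).
Update-AzKeyVaultNetworkRuleSet -VaultName $primaryVault -DefaultAction Allow -Bypass AzureServices
```

If the failover works as described, the database status should remain Online throughout the polling window.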

 

Thank you!
