Internals of TDE Encryption scan

This post has been republished via RSS; it originally appeared at: New blog articles in Microsoft Tech Community.

Transparent Data Encryption (TDE) is a mechanism in SQL Server to secure the data and log files at rest by encrypting them using a key/certificate. Many customers use this feature to protect their data against security threats. Other than knowing that this feature encrypts the data, there are other aspects that would really be useful for you to aware of. Especially about how it may impact your OLTP workload, disk, memory and other things. So, let’s dig straight in.  

 

As a rule of thumb, keep in mind that 

  • All the data in-memory (i.e., Buffer Pool) is always decrypted 
  • All the data on disk is encrypted 

 

When users turn ON encryption on their database using the below command 

 

ALTER DATABASE [testdb] SET ENCRYPTION ON 

 

Two things should happen 

  1. All the existing user data on the files should be encrypted 
  2. Any new writes to the disk (say a new page allocated, file growth etc) should also be encrypted. 

To achieve (1), we start something called a TDE Encryption Scan. More on this shortly. 

To achieve (2), any write to disk during a flush (For example, any form of checkpoint, page evicted from Buffer Pool etc) will be encrypted before the page is written to disk. Before any flush, we see if TDE Encryption is enabled on the database, and if so, we encrypt before we write. As simple as that. 

 

TDE Encryption Scan 

Now, once encryption is turned ON for a database, all the existing user data on the data files should be encrypted. To do this, SQL Server starts something called a TDE Encryption Scan. It is basically a scanner, which goes through each page of each data file to ensure its encrypted. When the scanner completes its scan across all the files, that’s when we say that the database is ‘encrypted’. How the TDE Encryption scan works is crucial because of the effects it can have on the user workload. Let me explain. 

 

Consider a database contains 10 data files, each 100 GB. That’s ~1TB of data. When user turns ON encryption, the goal is to encrypt all this 1TB of data on disk. Now, as soon as encryption is turned ON, we mentioned that any write or flush that happens to disk will be encrypted before the flush. SQL Server utilizes this and the TDE Scanner is built around this concept. Below chart explains how a TDE Encryption scan works.

 

clipboard_image_0.png

 

 

When user turns ON encryption, a SQL Server worker thread (denoted in Blue) will perform an initial validation, start an Asynchronous Encryption Worker thread (denoted in Green) and simply exit. The Asynchronous Encryption worker will then spawn multiple Asynchronous Disk Worker threads (denoted in Orange) and wait for them to finish. Once the Asynchronous Disk Workers finish their task, the Asynchronous Encryption worker will perform a checkpoint, set the final state of database as Encrypted and exit.  

 

How many Asynchronous Disk Worker threads are spawned1 per volume. i.e., If there are 10 data files on 10 different volumes, then 10 workers, and if 10 data files on same volume, then just 1 worker. This is to minimize the impact of the TDE Encryption scan on the disk. For simplicity, lets consider there is 1 thread per data file. What does the Asynchronous Disk Worker do? It just loops through each page of the file (Each page is 8KB), reads it into the memory (In batches of 4 Extents or 32 Pages) and marks them dirty. It also creates a log record (to be replayed in secondary). Simple right? So, where does the encryption occur? Remember that we encrypt pages only during flush? Once all the Asynchronous Disk Workers dirty all the pages of all data files, the Asynchronous Encryption worker who is waiting for these workers to finish, performs a checkpoint which will encrypt all the dirty pages and flush them to disk. This is when all the pages of the database are fully encrypted. 

 

Why is this important? 

We have multiple workers dirtying each page while checkpointing performs the encryption. For this to happen,  

  • All the pages need to be read into the memory (Requires memory and disk resource) 
  • Every page needs to be encrypted (Requires CPU resource) 
  • All the pages need to be flushed to disk (Requires disk resource) 

Performing a TDE Encryption scan is a resource intensive operation as it uses valuable resources on CPU, memory and IO which could otherwise be used for user workload. Naturally, there is an impact to the user workload if a TDE Encryption scan is running.  

The exact same operations happen when an encrypted database is decrypted, or an encryption key is rotated as well. Except during decryption, the pages are not encrypted on flush. The scanner performs the exact same job of dirtying each page of the database. 

 

How do we manage the resources? 

Having the data encrypted as soon as possible is vital in terms of security. But if it has impact on user workloads, users can then schedule the TDE Encryption scan accordingly. Knowing what resources are used by the TDE Encryption scan can help users allocate sufficient resources to their user workloads. Users can also suspend the TDE Encryption scan using the below query 

 

ALTER DATABASE [testdb] SET ENCRYPTION SUSPEND 

 

And resume using 

 

ALTER DATABASE [testdb] SET ENCRYPTION RESUME 

 

The DMV sys.dm_database_encryption_keys surfaces information like the encryption_stateencryption_scan_state etc which can also be very useful to know what the scanner is doing at any point in time. 

Leave a Reply

Your email address will not be published. Required fields are marked *

*

This site uses Akismet to reduce spam. Learn how your comment data is processed.