Deep Dive into Change Data Capture (CDC) in Azure SQL Databases

Posted by

This post has been republished via RSS; it originally appeared at: New blog articles in Microsoft Tech Community.

This blog is part of the Change Data Capture in Azure SQL Databases Blog Series, which started with the announcement on releasing CDC in Azure SQL Databases in early June 2021. You can view the release announcement here: https://aka.ms/CDCAzureSQLDB 

 

Key Change Data Capture Concepts  

Below is a list of the key CDC components that are worth understanding before enabling CDC at the database and table levels:  

  • Capture process (sp_cdc_scan) – Scans the transaction log for new change data, created when first source table is enabled for CDC.  
  • Cleanup process (sp_cdc_cleanup_change_tables) - Cleanup for all DB change tables based on a time retention value, created when first source table is enabled for CDC.  
  • Change tables - Source tables have associated change tables which have records of change data. 
  • Table-valued functions (fn_cdc_get_all_changes, fn_cdc_get_net_changes) - Enumerate the changes that appear in the change tables over a specified range, returning the information in the form of a filtered result set.  
  • Log sequence number - Identifies changes that were committed within the same transaction and orders those transactions. 
  • Monitoring CDC – Use DMVs such as sys.dm_cdc_log_scan_sessions and sys.dm_cdc_errors. 

Now in public preview, CDC in Azure SQL Databases offers a similar functionality to SQL Server and Azure SQL Managed Instance CDC. However, on Azure SQL Databases, CDC provides a scheduler which automatically runs the capture and cleanup processes, which are run as SQL Server Agent jobs on SQL Server and on Azure SQL Managed Instance.  

 

Enabling Change Data Capture on Azure SQL Databases 

1. Enable CDC at the database level: 

EXEC sys.sp_cdc_enable_db 

System tables (e.g. cdc.captured_columns, cdc.change_tables) and the cdc schema will be created, stored in the same database.  

 

2. Enable CDC at the table level: 

EXEC sys.sp_cdc_enable_table   

@source_schema = N'dbo',   

@source_name   = N'MyTable',   

@role_name     = N'MyRole',   

@filegroup_name = N'MyDB_CT',   

@supports_net_changes = 1   

 

The associated change table (cdc.dbo_MyTable_CT) will be created, along with the capture and cleanup jobs (cdc.cdc_jobs). Be aware that in Azure SQL Databases, capture and cleanup are run by the scheduler, while in SQL Server and Azure SQL Managed Instance they are run by the SQL Server Agent. 

 

3. Run DML changes on source tables and observe changes being recorded in the associated CT tables.  

 

4. Table-valued functions can be used to collect changes from the CT tables.  

 

5. Disable CDC at the table level: 

EXEC sys.sp_cdc_disable_table   

@source_schema = N'dbo',   

@source_name   = N'MyTable',   

@capture_instance = N'dbo_MyTable'   

 

6. Disable CDC at the database level: 

EXEC sys.sp_cdc_disable_db 

 

Limitations for CDC in Azure SQL Databases  

  • In Azure SQL Databases, the following tiers within the DTU model are not supported for Change Data Capture: Basic, Standard (S0, S1, S2). If you want to downgrade a Change Data Capture-enabled database to an unsupported tier, you must first disable Change Data Capture on the database and then downgrade. 
  • Running point-in-time-restore (PITR) on an Azure SQL Database that has Change Data Capture enabled will not preserve the Change Data Capture artifacts (e.g. system tables). After PITR, those artifacts will not be available. 
  • If you create an Azure SQL Database as an AAD user and enable Change Data Capture on it, a SQL user (e.g. even sys admin role) will not be able to disable/make changes to Change Data Capture artifacts. However, another AAD user will be able to enable/disable Change Data Capture on the same database. 

 

Performance implications for CDC in Azure SQL Databases  

The performance impact from enabling change data capture on Azure SQL Database is similar to the performance impact of enabling CDC for SQL Server or Azure SQL Managed Instance. Factors that may impact performance: 

  • The number of tracked CDC-enabled tables. 
  • Frequency of changes in the tracked tables. 
  • Space available in the source database, since CDC artifacts (e.g. CT tables, cdc_jobs etc.) are stored in the same database. 
  • Whether the database is single or pooled. For databases in elastic pools, in addition to considering the number of tables that have CDC enabled, pay attention to the number of databases those tables belong to. Databases in a pool share resources among them (such as disk space), so enabling CDC on multiple databases runs the risk of reaching the max size of the elastic pool disk size. Monitor resources such as CPU, memory and log throughput. 

Consider increasing the number of vCores or upgrade to a higher database tier to ensure the same performance level as before CDC was enabled on your Azure SQL Database. Monitor space utilization closely and test your workload thoroughly before enabling CDC on databases in production. 

 

Blog Series for Change Data Capture in Azure SQL Databases  

We are happy to continue the bi-weekly blog series for customers who’d like to learn more about enabling CDC in their Azure SQL Databases! This series explores different features/services that can be integrated with CDC to enhance change data functionality.  

The next blog will focus on using Azure Data Factory to send Change Data Capture Data to external destinations. 

This articles are republished, there may be more discussion at the original link. But if you found this helpful, you're more than welcome to let us know!

This site uses Akismet to reduce spam. Learn how your comment data is processed.