Monitoring CDC using Extended Events in SQL Server 2022

This post has been republished via RSS; it originally appeared at: New blog articles in Microsoft Community Hub.

Introduction

This Blog details how to comprehensively monitor and effectively troubleshoot Change Data Capture (CDC) using extended events. CDC Performance can be monitored using the following two CDC extended events that were introduced in SQL Server 2022.: 

  • repl_logscan_session 
  • cdc_cleanup_job_status 

These extended events were added to aid in troubleshooting and enable alerting while CDC jobs are running. These debug events provide the customer with a way to prove CDC configuration throughput and expose any errors encountered. 

 

What is Change Data Capture?

Change Data Capture (CDC) is currently available in SQL Server (all supported versions), Azure SQL Managed Instance, and Azure SQL Database. CDC records insert, update, and delete activity that applies to a table. This makes the details of the changes available in an easily consumed relational format. Column information and the metadata that is required to apply the changes to a target environment is captured for the modified rows and stored in change tables that mirror the column structure of the tracked source tables. Table-valued functions are provided to allow systematic access to the change data by consumers. Learn more here.   

 

LaReyes314_3-1674687387562.png

 

What do the new SQL 2022 CDC extended events track?

CDC jobs include:

Capture Job - Captures changes into capture table.

Clean-up Job - Removes older data from capture table.

 

LaReyes314_2-1674686655851.png

Each event will cover the different CDC job activities.

  • repl_logscan_session reports on capture job progress.
  • cdc_cleanup_job_status reports on Clean-up job progress 

repl_logscan_session extended event description.

This event reports change transactions as they flow into the change table. The following columns are returned detailing key metrics as data is captured into the change table:

  • The Change Data Capture (CDC) Session ID 
  • The Current phase (1-Read Config., 2-Build Hash Table, 3-Partial Rollback, 4-EOR, 5-Partial Rollback and EOR, 6-Schema Version and 7-Last Scan) of the session***  
  • Phase state enumerates begin phase = 0 end phase=1 
  • Maximum number of transactions to process in each scan cycle max_trans is int with a default of 500. If specified, the value must be a positive integer. 
  • Count of commands processed in phase 7 (last scan per session) 
  • Count of (replicated or tracked) transactions processed, set in phase 2, after which it remains unchanged. 
  • Count of log records scanned through, available in phase 2, after which it remains unchanged. 
  • Count of DDL operations detected, for changes to RE schema, SE schema and article schema. 
  • Log-scan start time. 
  • Starting LSN for the current session, available after the beginning of phase 2 
  • Ending LSN for the session, available after the end of phase 2 
  • The LSN processed, available after the end of phase 2, then kept unchanged. 
  • Earliest CDC LSN in this session, available in phase 7 otherwise 00000000:00000000:0000 
  • Last CDC LSN commit processed, available in phase 7 otherwise 00000000:00000000:0000. 

Replication and CDC Phases***

The CDC process scans the transaction log in 7 distinct phases to completion as follows:

1: Reading configuration
2: First scan, building hash table
3: Second scan (Partial Rollback)
4: Second scan (Error)
5: Second scan (Partial Rollback Error)
6: Schema versioning
7: Last scan
8: Done

When session_id = 0, this value is always "Aggregate".

 

LaReyes314_4-1674687873058.png

 

CDC_cleanup_job_status extended event description.

This event reports change transactions as they are purged to the retention point set for the change table. For long running CDC Cleanup jobs this event will report as each change table is cleared every 5 minutes. The following columns are returned detailing key metrics as data is removed from the change table:

  • The number of rows deleted from the change table. 
  • The object ID of the change table 
  • The Error message number encountered. 
  • The low_water_mark LSN (which identifies rows to be purged) 
  • The number of rows deleted from the LSN_Time_Mapping table. 
  • The Cleanup Job Retention value (time in minutes to keep in the CT table) 
  • The Cleanup Job Threshold value (the delete batch-size) 

The columns listed above along with a reporting timestamp, will be updated every 5 minutes (so long as a process takes 5 minutes).  That is, if a change-table only takes 5 minutes to clear, telemetry will be returned once that table has been cleared. On the other hand, if a table were to take 6 minutes to clear, there would be a row returned after 5 minutes and an additional row with the final number of rows cleared at the end of the cleanup process for the table.  

 

Once the last CDC change table has been cleared, the cleanup process will move on to clear records from the  LSN_Time_Mapping table. The same 5-minute reporting process outlined above until the table is cleared. This design was intended to provide a still alive signal, as well as report on the progress for long running cleanup jobs. 

 

LaReyes314_0-1674689589580.png

 

Monitoring CDC with events Guide

This section will provide step by step instructions and scripts to enable CDC and monitor:

  • Configure CDC environment.
  • CDC with simple inserts
  • CDC cleanup process
  • CDC during a partial rollback
  • CDC during a long running clean-up job.

Configure CDC environment.

Ensure SQL Server Agent is configured.

CLR is required for CDC. Execute the following script to enable CLR:

 

 

USE master; GO --Enable CLR EXEC sp_configure 'clr enabled' , '1'; RECONFIGURE; --Ensure CLR was enabled: EXEC sp_configure 'clr enabled';

 

 

Generate a database by running the following script:

 

 

use [master] GO Create database MyDB; GO --Size the database for testing USE [master] GO ALTER DATABASE [MyDB] MODIFY FILE ( NAME = N'MyDB', SIZE = 102400KB , FILEGROWTH = 256000KB ) GO ALTER DATABASE [MyDB] MODIFY FILE ( NAME = N'MyDB_log', SIZE = 307200KB , FILEGROWTH = 256000KB ) GO

 

 

CDC with simple inserts

Generate a simple generic table by running the following script:

 

 

USE MyDB; GO CREATE TABLE T1( C1 BIGINT PRIMARY KEY, C2 NVARCHAR(22)); --Ensure tables were generated SELECT * from sysobjects where type = 'U';

 

 

Configure the table generate for CDC:

The schema owner of the database needs to be able to see all the CDC components.  An easy workaround is to set the owner at least temporarily to ‘SA’ (it can be switched back to the actual schema owner once all the CDC tables are built.)

The following script shows how to enable CDC on a database:

 

 

-- Enable Database for CDC template exec sp_changedbowner [sa] -- ==== USE MyDB GO EXEC sys.sp_cdc_enable_db ; GO -- ========= -- Enable sample Tables for CDC -- ========= EXEC sys.sp_cdc_enable_table @source_schema = N'dbo', @source_name = N'T1', @role_name = NULL, @supports_net_changes = 1 GO --verify cdc tables have been generated. use myDB; select * from cdc.dbo_T1_CT; select * from sysobjects where type = 'U'; --check target table select * from T1;

 

 

The new empty CDC table will display no rows as pictured below.

LaReyes314_2-1674692259976.png

Generate a repl_logscan_session event by running the following script:

 

 

USE master; GO --Update filename and path as appropriate CREATE EVENT SESSION [cdc_test] ON SERVER ADD EVENT sqlserver.repl_logscan_session ADD TARGET package0.event_file(SET filename=N'C:\output\xevents\cdc_test_event.xel') WITH (STARTUP_STATE=OFF) GO

 

 

At this juncture, we are ready to commit some transactions to our table and can watch as CDC runs through each phase of copying changes.

Verify that the event session was generated.

Open SSMS object explorer, the database instance, management, extended events, sessions:

 

LaReyes314_3-1674692496709.png

Right click on the session generated earlier and select start session:

 

LaReyes314_4-1674692530365.png

Then select Watch Live Data. 

 

Note: We have not kicked off the cleanup job, nor have we made any changes yet, so we will just see the capture event scanning.  

Now, right click the columns to show in table. E.g., right click on articles completed and select show in table for desired columns.

Continue adding columns to the table:

 

LaReyes314_5-1674692567352.png

Notice the event displays the log scan session is running checking if there have been any changes to the table without any further configuration!

 

With no inserts to the change table, the session just sows log scans waiting for an update.

LaReyes314_6-1674692623189.png

Generate CDC transactions (open a new SSMS query window) and execute the following script: 

 

Note: Change the cnt1 limit to the desired number of rows (currently set to 255.)

 

 

--Start with the next row count and load table --with ascii character rows --Replace 255 with desired number of rows. --Note: arithmetic overflow at 1000000000 for ---int conversion so limit max rows. -- use myDB; DECLARE @cnt INT, @cnt1 INT select @cnt= (select count(*) from T1)+1; select @cnt1= (select count(*) from T1)+255; WHILE @cnt < @cnt1 BEGIN INSERT INTO T1 VALUES (@cnt, char(@cnt)); SET @cnt = @cnt + 1;

 

 

Switch back to the Live Data tab in SSMS to view telemetry from repl_logscan_session.

LaReyes314_0-1674693040078.png

Observation:

repl_logscan_session event shows: 

  • 0000004A:00008CB8:0001 LSN at the start of the transaction (first_begin_cdc_lsn) 
  • 254 commands (DML Changes) completed (Tran completed). 
  • 0000004A:000094A0:0003 LSN after the transaction was completed (last_commit_lsn). 
  • 760 log records were used from phase 2 (Hash tabled build) with no rollbacks, errors or partial commits and no schema version changes. 

Great! Everything is going well!  

 

CDC cleanup process

After performing steps through the previous section, verify all changes made to our source table “T1” were captured to our change table “cdc.dbo_T1_CT” by executing the following script:

 

 

select * from cdc.dbo_T1_CT;

 

 

LaReyes314_1-1675126344798.png

Check the CDC jobs settings for capture and cleanup by running this procedure:

 

 

EXEC sys.sp_cdc_help_jobs

 

 

The capture job is set to run continuously with a polling interval of 5 seconds.

The clean-up job Retention for clean-up is set too long for our testing (4320 minutes).

 

LaReyes314_2-1674693378753.png

To test the purge process for 3-minute retention reconfigure the cleanup job using the following script:

 

 

EXECUTE sys.sp_cdc_change_job @job_type = N'cleanup', @retention = 3 --three-minute retention GO

 

 

 

In most cases, retention would be longer e.g., the default 3 days (4320 minutes), or weekly, or monthly, or quarterly. 3 minutes to clear the change table makes for a convenient demonstration. Once the cleanup job has completed, we should have completely emptied the table. After these demos have completed, ensure to set the desired cleanup job retention setting as appropriate.

 

Wait at least 3 minutes.

Then run the clean-up job by right clicking on the job and select start job at step:

LaReyes314_3-1674693527396.png

Once the job has completed the status window should show “Success” as pictured below.

LaReyes314_4-1674693561794.png

Execute the script from the beginning of this section to verify the captured changes have been cleared from the change table:

 

LaReyes314_5-1674693639217.png

Obviously, 3-minute retention is not a real-world setting, but we are testing clearing the change tables so 3 minutes will keep our demo moving and provides a good visual.

 

CDC with partial rollback

Using this telemetry event, we can also monitor partial transactions:

SQL Server transaction commands (see more on transaction control language):

Commit

Rollback

Save

 

repl_logscan_session will report scan phase 3 for any partial rollbacks.

 

To generate a partial rollback, the following script will:

  • Start a transaction with begin tran.
  • Run a DML command to update our test table.
  • Generate a save point.
  • Run more DML commands.
  • Rollback any unsaved commands
  • Start another transaction point with another begin tran.
  • Run another command.
  • Run a commit.
  • Run another transaction and commit to clear the initial begin tran.

The script below will generate a transaction partially committed using a save point. The save point ensures that rollbacks will not affect transactions we intend to commit later in the process. That is, the save point maintains our contract with the ACID properties.

(Note: Adjust insert values as needed, the T1 table during this exercise had 819 rows.)

 

 

--Check last row and beyond select *from T1 where C1 > 819; select * from cdc.dbo_T1_CT; ------- BEGIN TRAN; INSERT INTO T1 VALUES (820, 'A'); PRINT 'A'; SAVE TRANSACTION FirstInsert select @@TRANCOUNT INSERT INTO T1 VALUES (821, 'B'); PRINT 'B'; INSERT INTO T1 VALUES (822, 'C'); PRINT 'C'; GO --Roll them back ROLLBACK TRANSACTION FirstInsert select @@TRANCOUNT BEGIN TRAN; --2 INSERT INTO T1 VALUES (823, 'D'); PRINT 'D'; COMMIT --2 select @@TRANCOUNT PRINT 'Commit 2'; INSERT INTO T1 VALUES (824, 'E'); PRINT 'E'; PRINT 'Commit E'; --Note this would also work if we just executed another --commit, but the demo just looks better ending with --a command. select @@TRANCOUNT select *from T1 where C1 > 819; select * from cdc.dbo_T1_CT;

 

 

In SQL Server Management Studio (SSMS) check the Event’s Live Data tab:

LaReyes314_0-1674755552774.png

Taking a closer look at the event columns.

CDC enters scan phase 3 - repl_logscan_session shows scan phase 3 indicating a Partial rollback. The save point issued after the rollback command indicates 1 log record but no commands executed.

 

LaReyes314_1-1674755822223.png

Validate that the save-point transaction was committed into the source table by running the following script.

 

 

select *from T1 where C1 > 819;

 

 

LaReyes314_2-1674755964848.png

Validate that the save-point transaction was committed into the change table by running the following script.

 

 

select * from cdc.dbo_T1_CT;

 

 

LaReyes314_3-1674756286837.png

 

Observations

 

In this example, one change capture article contains 2 updates one with 2 commands and one with just 1 command. That is, A was saved as a partial update for the next commit, while B and C were rolled back. Finally, A and D were committed and finally E was committed closing out the open transaction at the beginning of the script.

 

LaReyes314_5-1674756470934.png

Our partially committed A update made its way through the CDC process into the copy table.

 

CDC during a long running cleanup job.

Execute the following script to generate an event session for cdc_clean_up_job:

Update filename and path as needed.

 

 

CREATE EVENT SESSION [cdc_clean__up_job_test] ON SERVER ADD EVENT sqlserver.cdc_cleanup_job_status ADD TARGET package0.event_file(SET filename=N'C:\output\xevents\cdc_clean__up_job_test_event.xel') WITH (STARTUP_STATE=OFF) GO

 

 

Using SSMS, expand object explorer as described earlier, right-click and select start session:

LaReyes314_6-1674756680231.png

Note: This event will start showing data if and only if the change data capture cleanup job is running.

If a cleanup job runs for more than 5 minutes this event provides a way to validate that the process is not encountering any hindrances and the error column will display any errors encountered.

 

Execute the following script a new table to perform a cleanup job test:

 

 

CREATE TABLE T3( C1 BIGINT PRIMARY KEY, C2 NVARCHAR(100) NULL);

 

 

Execute the following script to enable CDC on the new table T3:

 

 

EXEC sys.sp_cdc_enable_table @source_schema = N'dbo', @source_name = N'T3', @role_name = NULL, @supports_net_changes = 1 GO

 

 

Generate enough transactions to require the cleanup job to run for at least 10 minutes by running the following script:

 

 

USE myDB; --Start with the next row count and load table with ascii character rows --Note: arithmetic overflow at 1000000000 for int conversion --so don't go that far. DECLARE @cnt INT, @cnt1 INT select @cnt= (select count(*) from T3)+1; select @cnt1= (select count(*) from T3)+5500000; WHILE @cnt < @cnt1 BEGIN INSERT INTO T3 VALUES (@cnt, '500000'); SET @cnt = @cnt + 1; END;

 

 

Execute the following script to get a starting point before running the cleanup job:

 

 

select * from T3; select * from cdc.dbo_T3_CT;

 

 

LaReyes314_7-1674757161936.png

Using SSMS, Initiate the cleanup job by selecting “Start Job at Step”.

LaReyes314_8-1674757188026.png

Return to the Watch live data tab in SSMS and in 5 minutes the cdc_cleanup_job_status job will add a row to the viewer. Right click on the details fields to show in the table view.

LaReyes314_9-1674757216546.png

After 5 minutes we get a row for this job run with nothing in the cleanup errors column.

 

LaReyes314_10-1674757237791.png

The job ran for just over 20 minutes on the test system.

(Keep in mind, the retention level set was 3 minutes and we waited 3 minutes before running cleanup, therefore the capture table should be empty.)

Verify the cleanup job by running the script:

 

 

select count(*) from T3; select count(*) from cdc.dbo_T3_CT;

 

 

The cleanup job successfully cleared all rows from the capture table entered more than 3 minutes ago.

LaReyes314_11-1674757421165.png

 

Summary

This document reviewed how to monitor the CDC processes using new extended events.

 

Capture Job - Captures changes into capture table.

Clean-up Job - Removes older data from capture table.

LaReyes314_12-1674757526920.png

We demonstrated that repl_logscan_session details capture job progress:

  • 0000004A:00008CB8:0001 LSN at the start of the transaction (first_begin_cdc_lsn) 
  • 254 commands (DML Changes) completed (Tran completed). 
  • 0000004A:000094A0:0003 LSN after the transaction was completed (last_commit_lsn). 
  • 760 log records were used from phase 2 (Hash tabled build) with no rollbacks, errors or partial commits and no schema version changes. 
  • CDC enters scan phase 3 - repl_logscan_session shows scan phase 3 indicating a Partial rollback. The save point issued after the rollback command indicates 1 log record but no commands executed.

    LaReyes314_13-1674757560950.png

We watched 5-minute updates using the cdc_cleanup_job_status to check progress as the cleanup job ran over a 20-minute period.

LaReyes314_14-1674757663348.png

Recommended content.

What is change data capture(CDC)?

Enable and Disable change data capture – SQL Server

Change Data Capture Tables (Transact-SQL) – SQL Server

Sys.sp_cdc_enable_table (Transact-SQL)

Work with Change Data – SQL Server

 

A more real-world example of CDC enabling change data to be consumed easily and systematically is pictured below and is covered here.

LaReyes314_15-1674757783214.png

Feedback and suggestions

If you have feedback or suggestions for improving this CDC asset, please contact the Azure Databases SQL CSE Team (datasqlninja@microsoft.com).

Thanks for your support!

Leave a Reply

Your email address will not be published. Required fields are marked *

*

This site uses Akismet to reduce spam. Learn how your comment data is processed.