Using Qlik Data Integration to Stream Data Changes from a CDC-Enabled Azure SQL Database

This post has been republished via RSS; it originally appeared at: New blog articles in Microsoft Tech Community.

This blog is part of the Change Data Capture in Azure SQL Databases Blog Series, which started with the announcement on releasing CDC in Azure SQL Databases in early June 2021. You can view the release announcement here: https://aka.ms/CDCAzureSQLDB

 

Introducing Change Data Capture in Azure SQL Databases

Change data capture (CDC) provides historical change information for a user table by capturing both the fact that Data Manipulation Language (DML) changes (insert / update / delete) were made and the changed data. Changes are captured by using a capture process that reads changes from the transaction log and places them in corresponding change tables. These change tables provide a historical view of the changes made over time to source tables. CDC functions enable the change data to be consumed easily and systematically.

 

CDC is now available in public preview in Azure SQL, enabling customers to track data changes on their Azure SQL Database tables in near real-time. Now in public preview, CDC in PaaS offers a similar functionality to SQL Server and Azure SQL Managed Instance CDC, providing a scheduler which automatically runs change capture and cleanup processes on the change tables.

 

Using Qlik Data Integration for Streaming CDC Change Data to External Targets

Platforms such as Qlik Data Integration (formerly Attunity) can integrate with your CDC-enabled Azure SQL Database to stream data changes to diverse targets.

 

“Joint customers are excited about the potential of leveraging Qlik Data Integration alongside CDC in Azure SQL DB and CDC for SQL MI to securely access more of their valuable data for analytics in the cloud,” said Kathy Hickey, Vice President, Product Management at Qlik Data Integration. “We are happy to announce that in addition to support for Azure SQL MI as a source, the newly available MS-CDC capabilities will also allow us to support Azure SQL DB sources via our Early Access Program. We look forward to partnering with Microsoft on helping customers leverage these capabilities to confidently create new insights from their Azure managed data sources.” 

 

Current Use Case

For this tutorial, we will use Qlik Replicate (formerly Attunity Replicate) to send CDC change data from an Azure SQL Database to a PostgreSQL target. The database in Azure runs a traditional TPCC workload and is enabled for CDC. Apart from that, each table that is tracked for data changes is enabled for CDC. To learn more about enabling and disabling CDC on databases and tables, go here.

Qlik will connect to the source database and will push CDC changes from the change tables to a downstream target, which in our case is a PostgreSQL database.

 

Steps for Sending CDC Data Changes from an Azure SQL Database with Qlik

1. In the Qlik Replicate Web Console, start by creating a task, which involves defining a source endpoint and a target endpoint for your data replication.

2. Target configuration: In the Manage Endpoint Connections section, configure the target database details from New Endpoint Connection. From the General tab, you can edit target database info such as task name, description, role (target / source), type, host, port, username, password, database name. Within the same section, you should configure your Security preferences as well. Test your connection to target before continuing to the next step.

3. Source configuration: In the Manage Endpoint Connections section, configure the source database details from New Endpoint Connection. From the General tab you can edit source database info such as task name, description, role (target / source), type, server, username, password, database name. Test your connection to source before continuing to the next step.

In the Advanced tab, you can enable CDC on all captured tables (e.g. if tables are not already enabled; however, it’s required that the database is enabled for CDC) and to capture DDL events.

4. Once the source and target endpoints have been configured and enabled, create a task, which is a combination of source-target. In the New Task section, add the name and description of a new task. You can select which type of replication you prefer – unidirectional (source -> target), bidirectional (source -> target -> source), log stream, or Qlik cloud landing. Among the task options available, Full Load (i.e. copy all data that currently exists in the selected tables), Apply Changes (i.e. CDC data being captured from change tables), Store Changes (i.e. historical view of changes).

5. Once the task has been completed, select the tables from which you’d like to replicate data changes. As seen below, you can filter the selected tables. You can also create transformations on the data (e.g. concatenate columns) or filter out columns that you’d like to replicate data changes from.

6. To run the replication task, press the Run button.

7. You can view the replication progress live (by default 5 loads in parallel, as one is finished a new one is included).

8. As the replication is occurring, you can check your target database, which will update itself with the data changes. Additionally, in the Qlik web GUI you will see the replication progress.

 

Blog Series for Change Data Capture in Azure SQL Databases

We are happy to continue the bi-weekly blog series for customers who’d like to learn more about enabling CDC in their Azure SQL Databases! This series explores different features/services that can be integrated with CDC to enhance change data functionality.

 

Leave a Reply

Your email address will not be published. Required fields are marked *

*

This site uses Akismet to reduce spam. Learn how your comment data is processed.