Oracle to Azure PostgreSQL – Flexible Server Incremental Load Using ADF Data Flows – Part 2


Introduction

This blog post provides a step-by-step guide to incremental data load from Oracle to Azure Database for PostgreSQL – Flexible Server using Azure Data Factory (ADF) data flows. With this solution, no additional setup is required in the source Oracle database, as is typically needed with LogMiner or a traditional Change Data Capture (CDC) tool.

The solution has two parts. Part 1 covered the earlier setup steps; this post, Part 2, covers creating the Azure Data Factory data flows, pipelines, and triggers.

 

Create data flows in Azure Data Factory

VenkatMR_0-1683756312987.png

 

Carry out the following steps to create and run the complete incremental load.

 

Source table data from Oracle

In this step, use the datasets created in Azure Blob Storage from the Oracle source tables, as shown below. A data preview is available at each step by enabling the debug option.

 

Create a data flow and add the source table dataset as its source, as shown below.

 

VenkatMR_1-1683756372425.png

 

Watermark table source data

Repeat the step above for the watermark table, adding it as a source as well.

VenkatMR_2-1683756406148.png

 

Create a derived column from the source table data

To create a derived column in the source dataset, use the + sign next to the source transformation and select the Derived Column option.

VenkatMR_3-1683756434455.png

 

Select LAST_MODIFIED for the derived column. If ADF identifies the column as a String data type, it can easily be converted to a timestamp by using the expression builder.

VenkatMR_4-1683756475979.png

Expression builder

VenkatMR_5-1683756503475.png
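For reference, a derived-column expression along the following lines handles the conversion. This is only a sketch; the format string is an assumption and should match how LAST_MODIFIED is actually stored in the source files:

```
/* Convert the string LAST_MODIFIED into a timestamp; adjust the format pattern to match the source data */
toTimestamp(LAST_MODIFIED, 'yyyy-MM-dd HH:mm:ss')
```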

 

Join the watermark table with the derived column from the source table

As shown below, select the derived column output (the source table side) as the left stream and join it to the watermark table, whose column holds the LAST_MODIFIED time recorded during the last run.

In the example below, a custom cross join condition is used to isolate the incremental (delta) data produced after the last job run; depending on your conditions, a left outer join or inner join can be used instead.

VenkatMR_6-1683756522610.png
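Because the watermark source typically returns a single row per table, a minimal custom cross join condition can simply pair every source row with that watermark row and leave the comparison to the filter step. This is only a sketch; the exact condition used in the screenshots is not reproduced here:

```
/* Custom cross join condition: pair each source row with the watermark row; the delta comparison is done in the next (filter) step */
true()
```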

Filter only the last modified (incremental) records

In this step, the objective is to filter the table rows whose LAST_MODIFIED value is greater than the last run value stored in the watermark table.

VenkatMR_7-1683756537067.png
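The filter condition compares the derived timestamp with the stored watermark. LAST_MODIFIED_TS and WATERMARK_VALUE are assumed column names; replace them with the names used in your data flow:

```
/* Keep only the rows modified after the value captured by the previous run */
LAST_MODIFIED_TS > WATERMARK_VALUE
```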

 

Select the columns required for the incremental load

In this step, the objective is to select the columns required for the incremental load. It is also possible to rename columns or apply rule-based mappings here.

VenkatMR_8-1683756561664.png
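As an illustration of a rule-based mapping in the Select transformation, the pair of expressions below passes every matched column through under its original name; this is a generic sketch, not the exact mapping from the screenshots:

```
/* Matching condition: match every incoming column */
true()
/* "Name as" pattern: keep the matched column's original name */
$$
```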

 

Add Alter Row for the UPSERT in the target

In this step, the objective is to configure an Alter Row transformation so that the changed rows are marked for UPSERT before they are applied to the target Azure PostgreSQL database:

VenkatMR_9-1683756597279.png

VenkatMR_10-1683756605729.png
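Since every row reaching the Alter Row transformation is already part of the delta, a minimal "Upsert if" policy can mark all of them; a more selective expression can be used if only certain rows should be upserted:

```
/* Alter Row policy "Upsert if": mark every incoming row for upsert in the target */
true()
```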

 

Move the data to the target database

In this step, select the sink to UPSERT the data into the target database, i.e., Azure Database for PostgreSQL – Flexible Server.

VenkatMR_11-1683756635469.png

VenkatMR_12-1683756642401.png

 

Navigate to the Settings tab and make sure that Allow upsert is enabled and that the selected key column is the primary key column. The objective is to update or insert the data based on the primary key values.

VenkatMR_13-1683756660390.png

 

The data can be inspected and previewed before loading it to the target database by enabling the debug option.

VenkatMR_14-1683756674220.png

 

Data Flow for the Delete Records

Repeat the above steps for the deleted records and ensure that the sink settings have Allow delete enabled and the key column set to the primary key column.

VenkatMR_15-1683756695779.png

VenkatMR_16-1683756702474.png
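In the delete data flow the Alter Row policy becomes "Delete if". As with the upsert path, a minimal sketch marks every row in this branch for deletion, since the upstream steps have already narrowed the stream down to deleted records:

```
/* Alter Row policy "Delete if": mark every incoming row for deletion in the target */
true()
```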

 

Update the watermark table in the source Oracle database

Once the data flow configuration is complete, the last step is to update the watermark table in the Oracle database so that the next run picks up only the new incremental data.

 

VenkatMR_18-1683756763996.png
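The new watermark value is simply the latest LAST_MODIFIED timestamp processed in this run. If it is computed inside a data flow, an Aggregate transformation with an expression such as the one below can produce it (LAST_MODIFIED_TS is the assumed name of the derived timestamp column; how the value is written back to Oracle depends on your setup):

```
/* Aggregate expression: the latest timestamp handled in this run becomes the next watermark */
max(LAST_MODIFIED_TS)
```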

 

Create pipelines and triggers

When the data flows are created and configured correctly, the next step is to create pipelines and schedule triggers to run the jobs at a given interval. Although ADF supports intervals as short as 1 minute, considering the data volume and the time needed to complete the whole workflow, our recommendation is to schedule it every 30 minutes.

VenkatMR_19-1683756795602.png

VenkatMR_20-1683756807354.png

 

Final Thoughts

We hope that this post has helped you to configure Incremental Load using ADF and described scenarios and options for using this solution to migrate your databases successfully. If you have feedback or suggestions for improving this data migration asset, please contact the Azure Databases SQL Customer Success Engineering Team. Thanks for your support!

 
