Azure Cosmos DB change feed is now supported in ADF


You can now read data from the Azure Cosmos DB change feed in Azure Data Factory simply by enabling it in the mapping data flow source transformation.

 

[Screenshot: the Change feed and Start from beginning options in the Cosmos DB source transformation of a mapping data flow]

 

With these two properties checked, you can read the changes and apply transformations before loading the transformed data into destination datasets of your choice. You no longer have to write an Azure Function to read the change feed and implement custom transformations in code. You can use this option to move data from one container to another, prepare change-feed-driven materialized views for a specific purpose, automate container backup or recovery based on the change feed, and enable many more such use cases with the visual drag-and-drop capability of Azure Data Factory.

  • Change feed (Preview): If checked, you will get data from the Azure Cosmos DB change feed, which is a persistent record of changes to a container in the order they occur, and changes are picked up automatically from the last run. 
  • Start from beginning (Preview): If checked, the first run loads a full snapshot of the existing data, and subsequent runs capture only the changed data. If not checked, the initial load is skipped and subsequent runs capture only the changed data. This setting matches the setting of the same name in the Cosmos DB documentation. The data flow script sketch after this list shows how both options appear in the source definition.
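
For reference, the two checkboxes surface as properties in the underlying data flow script. The snippet below is a minimal sketch of a Cosmos DB source definition with both options enabled; the output stream name CosmosDbSource is a placeholder, and the changeFeed and changeFeedStartFromTheBeginning properties follow the source script example in the Microsoft Docs page linked at the end of this post.

    source(
        allowSchemaDrift: true,
        validateSchema: false,
        format: 'document',
        changeFeed: true,
        changeFeedStartFromTheBeginning: true) ~> CosmosDbSource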

Make sure you keep the pipeline name and activity name unchanged, so that ADF can record the checkpoint and pick up the changed data from the last run automatically. If you change the pipeline name or activity name, the checkpoint is reset, and the next run starts from the beginning or captures changes only from the current point in time.
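
To illustrate, the checkpoint is tied to names like the ones in the trimmed pipeline JSON sketch below; the pipeline, activity, and data flow names here are hypothetical placeholders, not part of the feature. Renaming either CosmosChangeFeedPipeline or GetCosmosChanges after a run resets the checkpoint for that pipeline.

    {
        "name": "CosmosChangeFeedPipeline",
        "properties": {
            "activities": [
                {
                    "name": "GetCosmosChanges",
                    "type": "ExecuteDataFlow",
                    "typeProperties": {
                        "dataFlow": {
                            "referenceName": "CosmosChangeFeedDataFlow",
                            "type": "DataFlowReference"
                        }
                    }
                }
            ]
        }
    }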

 

This feature works the same way when you debug the pipeline. Be aware that the checkpoint is reset when you refresh your browser during a debug run. After you are satisfied with the result of the debug run, you can publish and trigger the pipeline. The first time you trigger the published pipeline, it automatically restarts from the beginning or captures changes from the current point in time.

 

In the monitoring section, you can always rerun a pipeline. When you do, the changed data is captured from the previous checkpoint of the selected pipeline run.

 

For more details, see Copy and transform data in Azure Cosmos DB (SQL API) - Azure Data Factory & Azure Synapse | Microsoft Docs
