Transform complex JSON structures from CosmosDB to SQL DB with Azure Data Factory

This post has been republished via RSS; it originally appeared at: New blog articles in Microsoft Tech Community.

 

  1. Start with this JSON collection in ADF based on this Orders dataset. This data has orders from Northwind with order header and order details embedded in a single document per order.
  2. You can load it into CosmosDB as the video above explains, or start with a JSON file in Blob or ADLS as your source in ADF data flow.
  3. In ADF Data Flow, add a Flatten transformation following your source and choose "Details" as the array to unroll. Leave unroll root as empty/default.
  4. This will produce a new structure called "Details". Use a Derived Column next to add a new property to that struct called "totalAmount".
  5. In the Derived Column, choose the root level "Details" field. Click on Expression Builder.
  6. In the middle Functions pane, select the icon next to "Details" to add the expression structure automatically.
  7. in the left-side "Input Schema" pane, click the plus sign on the bottom of the structure and add a new column. Call it "totalAmount"
    h2h.png
  8. The formula for totalAmount is toString(round(toInteger(details.quantity)*toInteger(details.unitPrice),2),'###.##')​
  9. Now you're ready to sink your data to an Azure SQL DB table. In the Sink, set the dataset to Azure SQL DB.
  10.  Set the mapping to look like this:
     h2h2.png
  11. You can leave all of the root-level k/v fields set as they are by default. Choose the individual properties from each structure that you wish to map to a database table column.

The full online documentation for this ADF transformation can be found here.

 

You can download this sample pipeline demo using the ADF Pipeline Template here. Install it in your factory by going to New > Pipeline from template > import template.

 

 

Leave a Reply

Your email address will not be published. Required fields are marked *

*

This site uses Akismet to reduce spam. Learn how your comment data is processed.