How To Convert CSV File Into Array Of JSONs In ADF

This post has been republished via RSS; it originally appeared at: New blog articles in Microsoft Tech Community.

Introduction
Azure Data Factory (ADF) is well suited to data transformation. In this blog, we will discuss how to convert a CSV file into an array of JSON objects and explain how the Aggregate transformation helps us do it.

Main Idea
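In ADF data flows, a JSON object is represented as a complex data type (a structure), and we want to build an array that consists of such JSON objects.
The idea is to create a data flow, add a "children" column that holds each row as a JSON object, and then aggregate these objects into an array of JSONs using the Aggregate transformation.
We will use a dummy column with a constant value of 1 and group by it. Since every row has the same dummy value, all rows fall into a single group, and the aggregation returns one array containing every row.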
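To see why grouping by a constant dummy value yields exactly one array, here is a minimal Python sketch of the group-and-collect semantics outside ADF. It is only an illustration under assumptions: the rows are plain dictionaries, and `group_and_collect` is a hypothetical helper, not an ADF API.

```python
from collections import defaultdict


def group_and_collect(rows, key, value):
    """Group rows by row[key] and collect row[value] into a list per group,
    roughly like an aggregate using collect() in a data flow (hypothetical helper)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[value])
    # One output row per distinct group key.
    return [{key: k, value: v} for k, v in groups.items()]


rows = [
    {"dummy": 1, "key1": "a1", "children": {"key1": "a1", "key2": "b1"}},
    {"dummy": 1, "key1": "a2", "children": {"key1": "a2", "key2": "b2"}},
]

# Constant key: every row lands in the same group, so we get a single array.
by_dummy = group_and_collect(rows, "dummy", "children")
print(len(by_dummy))  # 1

# Distinct key: one group (and one array) per value of key1.
by_key1 = group_and_collect(rows, "key1", "children")
print(len(by_key1))  # 2
```

Grouping by a real column such as key1 would split the data into several arrays, which is why the constant column is used instead.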

Pre-requisites

We will require:

  • Basic knowledge of ADF, including how to create a new pipeline and add activities and data flows to it
  • Knowledge of how to save data to Blob Storage

 

Prepare your data:
Input CSV file:

[Screenshot: the input CSV file]
Expected Output:

{"children": [
  {"key1":"a1", "key2":"b1", "key3":"c1", "key4":"d1"},
  {"key1":"a2", "key2":"b2", "key3":"c2", "key4":"d2"},
  ...
]}

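Services we will need:

  • Azure Blob Storage, to hold the input CSV file and the output JSON
  • Azure Data Factory

ADF DataFlow:

[Screenshot: the data flow, Source -> Map Drifted Columns -> Derived Columns -> Aggregate By Dummy -> Drop Dummy Column -> Sink]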

 

The settings for each transformation in the data flow:

  • Source:
    Point the source at the CSV file in the Blob Storage account and enable "First row as header".
  • Map Drifted Columns:
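    This maps drifted columns (columns detected at run time that are not part of the source's defined schema) to explicit columns, so that later transformations can reference them by name.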
  • Derived Columns:
    Here we add two columns: dummy, with the constant value 1, and children, a structure that holds each row as a JSON object and that the Aggregate step will later collect into the array.
    To build the children column, go to Expressions -> Expression Builder, click children, and add four subcolumns named key1, key2, key3 and key4.
    [Screenshot: the children column with its subcolumns key1 to key4 in the Expression Builder]

    Click each key and set its expression to the corresponding input column (see the screenshot below).

    [Screenshot: each key mapped to its input column]

    Click Save and finish.



  • Aggregate By Dummy:
    In this transformation, we group the data by the dummy column and collect all the children values into a list. This is what gives us one array of JSONs instead of a separate JSON object per row.
    Click the transformation, set Group by to dummy, then under Aggregates add the column children with the expression collect(children).
    [Screenshot: Aggregate settings, grouped by dummy with children = collect(children)]
  • Drop Dummy Column:
    Keep only the children array; the dummy column is no longer needed.
  • Sink:
    Write the result to the Blob Storage account.

    Output:

    [Screenshot: the resulting JSON output]
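For readers who want to check the logic outside ADF, the following Python sketch mirrors each transformation in the data flow as a plain function. It is not ADF code and runs entirely in memory: the input column names (col1 to col4), the `KEY_MAP` dictionary, and the string standing in for the Blob Storage file are all hypothetical.

```python
import csv
import io
import json
from collections import defaultdict

# Hypothetical input; the real column names come from your CSV header.
CSV_TEXT = """col1,col2,col3,col4
a1,b1,c1,d1
a2,b2,c2,d2
"""

# Hypothetical mapping from children keys to input columns (the Derived Columns step).
KEY_MAP = {"key1": "col1", "key2": "col2", "key3": "col3", "key4": "col4"}


def source(text):
    # Source: read the CSV, treating the first row as the header.
    return list(csv.DictReader(io.StringIO(text)))


def derive(rows):
    # Derived Columns: add dummy = 1 and a children struct built from the input columns.
    return [
        {**row, "dummy": 1,
         "children": {key: row[col] for key, col in KEY_MAP.items()}}
        for row in rows
    ]


def aggregate_by_dummy(rows):
    # Aggregate: group by dummy and collect the children values into one list per group.
    groups = defaultdict(list)
    for row in rows:
        groups[row["dummy"]].append(row["children"])
    return [{"dummy": d, "children": c} for d, c in groups.items()]


def drop_dummy(rows):
    # Drop Dummy Column: keep only the children array.
    return [{"children": row["children"]} for row in rows]


def sink(rows):
    # Sink: serialize the single aggregated row (assumes at least one data row).
    return json.dumps(rows[0])


result = sink(drop_dummy(aggregate_by_dummy(derive(source(CSV_TEXT)))))
print(result)
```

Because every row shares dummy = 1, `aggregate_by_dummy` returns exactly one row, so the sink writes a single document whose children array holds every CSV row.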

     
