Orchestrate and operationalize Synapse Notebooks and Spark Job Definitions from Azure Data Factory

This post has been republished via RSS; it originally appeared at: New blog articles in Microsoft Community Hub.

Today, we are introducing support for orchestrating Synapse notebooks and Synapse Spark job definitions (SJD) natively from Azure Data Factory pipelines. This helps customers who have invested in both ADF and Synapse Spark, because they no longer need to switch to Synapse Pipelines to orchestrate Synapse notebooks and SJDs.

 

NOTE: Synapse notebook and SJD activities were previously available only in Synapse Pipelines.

 

One of the critical benefits of Synapse notebooks is the ability to use Spark SQL and PySpark to perform data transformations. It allows you to use the best tool for the job, whether it be SQL for simple data cleaning tasks or PySpark for more complex data processing tasks.

 

How to get started with Synapse Notebooks in ADF?

 

1. Add a Synapse Notebook activity to a Data Factory pipeline

 

AbhishekNarain_1-1674638533411.png

 

2. Create a connection to Synapse workspace through a new compute Linked Service (Azure Synapse Analytics Artifact)

 

AbhishekNarain_5-1674638816112.png
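For readers who prefer to see the linked service as JSON rather than in the UI, here is a minimal Python sketch that builds such a definition. The property names (`AzureSynapseArtifacts`, `endpoint`, `authentication`) reflect our understanding of the linked service JSON and should be checked against the code view in ADF Studio. The linked service name and workspace name are hypothetical placeholders.

```python
import json


def synapse_artifacts_linked_service(name: str, workspace_name: str) -> dict:
    """Build a linked service definition that points at a Synapse workspace.

    Assumption: the workspace is reached through its development endpoint
    and ADF authenticates with its managed identity ("MSI").
    """
    return {
        "name": name,
        "properties": {
            "type": "AzureSynapseArtifacts",
            "typeProperties": {
                "endpoint": f"https://{workspace_name}.dev.azuresynapse.net",
                "authentication": "MSI",
            },
        },
    }


# Hypothetical names, for illustration only.
linked_service = synapse_artifacts_linked_service(
    "SynapseWorkspaceLS", "my-synapse-workspace"
)
print(json.dumps(linked_service, indent=2))
```

The printed JSON can be compared with what ADF Studio generates when the linked service is created through the UI.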

 

3. Choose an existing notebook to operationalize


AbhishekNarain_6-1674638924720.png

 

Note: If you do not specify 'Spark pool', 'Executor size', etc., the activity uses the values defined in the notebook. These properties are optional; they let you override the notebook's Spark configuration for the operational run.
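The optional overrides can be illustrated with a short Python sketch that assembles the activity JSON and only emits `sparkPool` and `executorSize` when they are supplied. The field names follow our reading of the pipeline JSON for the Synapse Notebook activity and are an assumption to verify in the pipeline's code view. All activity, notebook, pool, and linked service names are hypothetical.

```python
import json
from typing import Optional


def synapse_notebook_activity(
    activity_name: str,
    notebook_name: str,
    linked_service_name: str,
    spark_pool: Optional[str] = None,
    executor_size: Optional[str] = None,
) -> dict:
    """Build a Synapse Notebook activity definition for an ADF pipeline."""
    type_properties = {
        "notebook": {"referenceName": notebook_name, "type": "NotebookReference"},
    }
    # Overrides are added only when given; otherwise the run falls back
    # to the settings stored with the notebook.
    if spark_pool is not None:
        type_properties["sparkPool"] = {
            "referenceName": spark_pool,
            "type": "BigDataPoolReference",
        }
    if executor_size is not None:
        type_properties["executorSize"] = executor_size

    return {
        "name": activity_name,
        "type": "SynapseNotebook",
        "linkedServiceName": {
            "referenceName": linked_service_name,
            "type": "LinkedServiceReference",
        },
        "typeProperties": type_properties,
    }


# No overrides: the notebook's own Spark settings apply.
default_run = synapse_notebook_activity(
    "RunNotebook", "MyNotebook", "SynapseWorkspaceLS"
)

# With overrides for the operational run.
override_run = synapse_notebook_activity(
    "RunNotebook",
    "MyNotebook",
    "SynapseWorkspaceLS",
    spark_pool="mysparkpool",
    executor_size="Medium",
)
print(json.dumps(override_run, indent=2))
```

Keeping the overrides out of the JSON when they are not needed means a change to the notebook's pool or executor size carries over to the pipeline run without editing the pipeline.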


4. Grant the ADF Managed Identity the "Synapse Compute Operator" permissions to execute a Notebook / SJD in the Synapse Workspace 
     AbhishekNarain_8-1674639505172.png
 
Step 2 (Creation of Azure Synapse Analytics artifact linked service) highlights the Managed Identity Name of the Data Factory that needs to be granted permission to run a notebook / SJD. 
 

5. Monitor the notebook run details by opening the activity output, which contains a "sparkApplicationStudioUrl" that takes you to the Synapse workspace for detailed run monitoring. The notebook "exitValue" is also available in the output and can be referenced in downstream activities.

 

AbhishekNarain_9-1674639603373.png
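If you export the activity output as JSON (for example, copied from the monitoring view), a small Python helper can pull out the two fields mentioned in step 5. The sketch below searches the output recursively instead of hard-coding a path, because the exact nesting is not shown in this post. The sample payload is a made-up shape for illustration, not real output.

```python
from typing import Any


def find_key(node: Any, key: str) -> Any:
    """Return the first value stored under `key` anywhere in nested JSON data."""
    if isinstance(node, dict):
        if key in node:
            return node[key]
        for value in node.values():
            found = find_key(value, key)
            if found is not None:
                return found
    elif isinstance(node, list):
        for item in node:
            found = find_key(item, key)
            if found is not None:
                return found
    return None


# Hypothetical activity output; inspect a real run to see the actual shape.
sample_output = {
    "status": {"Output": {"result": {"exitValue": "42"}}},
    "sparkApplicationStudioUrl": "https://example.invalid/monitoring/spark-application",
}

exit_value = find_key(sample_output, "exitValue")
studio_url = find_key(sample_output, "sparkApplicationStudioUrl")
print(exit_value)
print(studio_url)
```

Inside a pipeline, the Synapse pipeline documentation gives the expression `@activity('Notebook1').output.status.Output.result.exitValue` for reading the exit value in a downstream activity, where `Notebook1` is the activity name. We assume the same path applies to the ADF activity; confirm it against the output of an actual run.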

 


We are always open for feedback so please let us know your thoughts in the comments below or add to our Ideas forum.

 
