ADF adds TTL to Azure IR to reduce Data Flow activity times

This post has been republished via RSS; it originally appeared at: New blog articles in Microsoft Tech Community.

ADF has added a TTL (time-to-live) option to the Data Flow properties of the Azure Integration Runtime (Azure IR) to reduce data flow activity times.
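
If you manage the Azure IR as code rather than through the UI, the TTL sits in the runtime's Data Flow compute properties. The Python sketch below builds such a definition as a plain dictionary and prints it as JSON. It assumes the ADF REST/ARM schema for managed integration runtimes, where `dataFlowProperties.timeToLive` is expressed in minutes. The helper name `azure_ir_definition`, the IR name `DataFlowIR`, and the compute type and core count values are illustrative placeholders, so check the field names against the current ADF reference before deploying anything.

```python
import json


def azure_ir_definition(name: str, core_count: int = 8, ttl_minutes: int = 10) -> dict:
    """Hypothetical helper: build an Azure IR definition with a Data Flow TTL.

    Assumption: field names follow the ADF REST/ARM schema for managed
    integration runtimes; timeToLive is in minutes.
    """
    if ttl_minutes < 0:
        raise ValueError("TTL must be 0 or a positive number of minutes")
    return {
        "name": name,
        "properties": {
            "type": "Managed",
            "typeProperties": {
                "computeProperties": {
                    "location": "AutoResolve",
                    "dataFlowProperties": {
                        "computeType": "General",
                        "coreCount": core_count,
                        # 0 means a new Spark cluster for every Data Flow activity
                        "timeToLive": ttl_minutes,
                    },
                }
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(azure_ir_definition("DataFlowIR", ttl_minutes=15), indent=2))
```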


This setting is used only when ADF pipelines execute Data Flow activities. Debug executions from pipelines and data preview debugging continue to use the debug settings, which have a preset TTL of 60 minutes.

If you leave the TTL at 0, ADF always spawns a new Spark cluster environment for every Data Flow activity that executes. This means an Azure Databricks cluster is provisioned each time, and it takes about 5 to 7 minutes to become available and run your job.
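
To put the cold-start cost in numbers: when activities run one after another on an Azure IR with a TTL of 0, each one pays the full provisioning delay. This rough Python calculation uses the approximate 5 to 7 minute range quoted for cluster startup. `cold_start_overhead` is a hypothetical helper, and the four-activity pipeline is a made-up example.

```python
# Approximate cluster startup time per cold start, in minutes.
STARTUP_MIN, STARTUP_MAX = 5, 7


def cold_start_overhead(num_activities: int) -> tuple[int, int]:
    """Return (low, high) total startup minutes when TTL is 0.

    Assumes the activities run sequentially, so every startup delay
    adds directly to the pipeline's wall-clock time.
    """
    return num_activities * STARTUP_MIN, num_activities * STARTUP_MAX


low, high = cold_start_overhead(4)
print(f"4 sequential activities with TTL=0: about {low}-{high} minutes of startup")
```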

However, if you set a TTL, ADF maintains a pool of VMs that can be used to spin up each subsequent data flow activity against the same Azure IR. This reduces the time needed to start up the environment before your job executes.
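
The reuse behavior can be approximated with a small simulation that labels each activity as a cold start (new cluster) or a warm start (VMs taken from the pool). This is a simplified, hypothetical model, not ADF's actual scheduler: it assumes non-overlapping activities on a single Azure IR, enough pool capacity, and a pool that expires `ttl` minutes after the most recent activity finishes. The `Activity` class, the `classify_starts` function, and the timings are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Activity:
    start: float     # minutes after pipeline start when the activity is submitted
    duration: float  # minutes from submission until it finishes (including any startup)


def classify_starts(activities: list[Activity], ttl: float) -> list[str]:
    """Label each activity 'cold' (new cluster) or 'warm' (reused pool).

    Hypothetical model: with ttl == 0 every start is cold; otherwise an
    activity is warm if it starts before the previous pool's TTL runs out.
    """
    labels = []
    pool_expires_at = None  # no pool exists before the first activity runs
    for act in sorted(activities, key=lambda a: a.start):
        warm = ttl > 0 and pool_expires_at is not None and act.start <= pool_expires_at
        labels.append("warm" if warm else "cold")
        # The idle timer restarts when this activity finishes.
        pool_expires_at = act.start + act.duration + ttl
    return labels


runs = [Activity(0, 10), Activity(12, 10), Activity(60, 10)]
for ttl in (0, 15, 60):
    print(ttl, classify_starts(runs, ttl))
# 0 ['cold', 'cold', 'cold']
# 15 ['cold', 'warm', 'cold']
# 60 ['cold', 'warm', 'warm']
```

Raising the TTL from 15 to 60 minutes turns the third activity into a warm start, because the 38-minute gap after the second activity now falls inside the TTL window.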

ADF maintains that pool for the TTL period after the last data flow pipeline activity executes. Note that this extends the billing period for a data flow by the length of your TTL. However, your data flow job execution time decreases because VMs from the compute pool are reused. The compute resources are not provisioned until the first data flow activity executes using that Azure IR.
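
The billing side of the tradeoff can also be approximated. Under the simplified assumption that a pool is billed from the moment it is provisioned until `ttl` minutes after its last activity finishes, both idle gaps inside the TTL window and the trailing TTL count as billable time. `billed_pool_minutes` and the sample intervals are hypothetical. The sketch ignores startup time, per-core pricing, and concurrency limits, and it treats overlapping activities with a nonzero TTL as sharing one pool.

```python
def billed_pool_minutes(intervals: list[tuple[float, float]], ttl: float) -> float:
    """Estimate billable pool minutes for (start, end) activity intervals on one Azure IR.

    Hypothetical model: with ttl == 0 each activity gets its own cluster;
    otherwise an activity starting before the current pool expires reuses it.
    """
    total = 0.0
    pool_start = pool_end = None  # nothing is provisioned until the first activity
    for start, end in sorted(intervals):
        if ttl > 0 and pool_end is not None and start <= pool_end:
            # Activity lands inside the TTL window: reuse the pool and push out its expiry.
            pool_end = max(pool_end, end + ttl)
        else:
            # Previous pool (if any) has expired: bill it and provision a new one.
            if pool_end is not None:
                total += pool_end - pool_start
            pool_start, pool_end = start, end + ttl
    if pool_end is not None:
        total += pool_end - pool_start
    return total


runs = [(0, 10), (12, 22), (60, 70)]
print(billed_pool_minutes(runs, ttl=0))   # 30.0
print(billed_pool_minutes(runs, ttl=15))  # 62.0
```

With these three runs, a 15-minute TTL roughly doubles the billable pool time (62 versus 30 minutes) because the second run reuses the first run's pool and each pool is kept alive for the TTL after its last activity. Whether that is worth it depends on how closely your data flow activities are scheduled.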

Read more about the Azure Integration Runtime in the ADF documentation, and see the ADF Data Flow performance guide for help optimizing your environment.
