ETL in the Cloud Made Easy with Azure Data Factory and Azure Databricks

This post has been republished via RSS; it originally appeared at: New blog articles in Microsoft Tech Community.

Data engineering in the cloud has become a crucial part of every successful data modernization project in recent years. Without accurate and timely data, business decisions based on analytical reports and models can lead to poor outcomes. The life of a data engineer is not always glamorous, and you don’t always receive the credit you deserve, but the importance of the role is undeniable. The partnership between Microsoft Azure Data Factory (ADF) and Azure Databricks provides a cloud data engineering toolkit that can make your life easier and more productive.


The combination of these cloud data services gives you the power to design rich ETL workflows. ADF has built-in facilities for workflow control, data transformation, pipeline scheduling, data integration, and many more capabilities to produce quality data at cloud scale and cloud velocity, all from a single pane of glass.


If you are a data developer who writes and debugs Spark code in Azure Databricks Notebooks (Scala, Python, SparkSQL) or packages it as JARs, you can point to those data routines directly from an ADF pipeline using a Databricks activity. You can then combine that logic with any of the other activities available in ADF, including looping, stored procedures, Azure Functions, REST API calls, and many other activities that let you leverage other Azure services.
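Under the covers, a Databricks Notebook activity is just a JSON fragment in the ADF pipeline definition. The sketch below shows the general shape; the activity name, notebook path, linked service name, and parameter names are placeholders, not values from any real workspace:

```json
{
  "name": "TransformSalesData",
  "type": "DatabricksNotebook",
  "linkedServiceName": {
    "referenceName": "MyAzureDatabricksLinkedService",
    "type": "LinkedServiceReference"
  },
  "typeProperties": {
    "notebookPath": "/Shared/etl/transform_sales",
    "baseParameters": {
      "input_path": "@pipeline().parameters.inputPath",
      "run_date": "@pipeline().parameters.runDate"
    }
  }
}
```

The `baseParameters` values are passed to the notebook at run time, where they can be read with Databricks widgets, so the same notebook logic can be reused across scheduled, triggered, and debug runs.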


ADF provides hooks into your Azure Databricks workspaces to orchestrate your transformation code. So, as you build up an extensive library of data transformation routines, either as code in Databricks Notebooks or as visual transformations in ADF Data Flows, you can combine them into scheduled ETL pipelines.


If you prefer a more visually oriented approach to data transformation, ADF has built-in data flow capabilities that provide a low-code UI for constructing complex ETL processes, such as a generic pattern for handling a slowly changing dimension.
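ADF Data Flows implement this pattern visually, but the underlying Type 2 slowly changing dimension logic is easy to state in code: when a tracked attribute changes, expire the current row and append a new versioned row. Here is a plain-Python sketch of that logic; the function name `apply_scd2` and the `start_date`/`end_date`/`is_current` column names are illustrative choices, not part of ADF itself:

```python
from datetime import date

def apply_scd2(dimension, incoming, key, tracked, today=None):
    """Type 2 SCD: expire changed current rows and append new versions.

    dimension -- list of dict rows with start_date/end_date/is_current columns
    incoming  -- list of dict rows from the source system
    key       -- name of the business-key column
    tracked   -- columns whose changes trigger a new row version
    """
    today = today or date.today().isoformat()
    # Index the currently active row for each business key.
    current = {row[key]: row for row in dimension if row["is_current"]}
    result = list(dimension)
    for rec in incoming:
        existing = current.get(rec[key])
        if existing and any(existing[c] != rec[c] for c in tracked):
            # Attribute changed: close out the old version.
            existing["is_current"] = False
            existing["end_date"] = today
        if existing is None or not existing["is_current"]:
            # Brand-new key, or a key whose old version was just expired.
            result.append({**rec, "start_date": today,
                           "end_date": None, "is_current": True})
    return result
```

In a Databricks notebook you would typically express the same expire-and-append steps as a Delta Lake merge rather than in plain Python, but the row-versioning logic is the same.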


Use the ADF visual design canvas to construct ETL pipelines in minutes with live interactive debugging, source control, CI/CD, and monitoring.


Whichever paradigm you prefer, Azure Data Factory provides best-in-class tooling for data engineers who are tasked with solving complex data problems at scale using Azure Databricks for data processing.


