This post has been republished via RSS; it originally appeared at: Azure Data Explorer Blog articles.
Azure Data Explorer (ADX) is a fully managed, high-performance, big data analytics platform that makes it easy to analyze high volumes of data in near real-time and is an obvious choice for many industries to extract key insights, spot patterns and trends, and create forecasting models.
This blog post focuses on how Contoso supermarket integrated their various data sources into ADX using a data pipeline to analyze data in near real-time.
Data Integration Architecture
Contoso supermarket uses Cosmos DB to collect and store grocery orders received from various stores located across the country and from online orders. Contoso supermarket also uses an IoT hub to gather sensor telemetry data from the freezer monitor and store it in ADX. As data collected from various sources is vital to generate meaningful insights for business decisions, Contoso supermarket decided to use ADX to generate insights and create dashboard reports, which can be leveraged by store managers and the leadership team for better decision-making. To bring data into the central ADX database, Contoso created a data pipeline by integrating Cosmos DB and IoT Hub with ADX. The data received in real-time is used to generate insights and reports as needed.
Below is the data pipeline and integration architecture of Contoso supermarket orders.
This blog post focuses on ADX integration with Cosmos DB to demonstrate data flow from Cosmos DB to the ADX database. Contoso supermarket orders are used as an example to demonstrate data flow into ADX. Let's understand order schema, data mapping, data connection, the flow of orders into ADX, and querying data to generate dashboard reports.
Below is a sample order schema for Contoso supermarket orders from different store locations, in a simplified form, to highlight the data integration with the ADX cluster.
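The original schema illustration is not reproduced here; a hypothetical simplified order document (field names and types are illustrative, not Contoso's actual schema) might look like this:

```json
{
    "id": "f2b1c6a0-0001",
    "OrderId": "1000",
    "StoreId": "S1",
    "StoreLocation": "Seattle",
    "ProductId": "P25",
    "ProductName": "Milk",
    "Quantity": 2,
    "Price": 3.49,
    "OrderDate": "2023-06-01T10:15:00Z"
}
```

The `id` property is required by Cosmos DB for every document; the remaining fields are the order attributes that will be mapped into the ADX Orders table later in this post.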
To demonstrate the data integration pipeline, you need the following prerequisites set up before creating Azure resources.
Azure Subscription: You need an Azure subscription to create Azure resources by using the instructions in this blog post. If you do not have an existing subscription, you can sign up for a free trial at Create Your Azure Free Account Today | Microsoft Azure. You need to have subscription Owner or Contributor and User Access Administrator permission to complete all the steps in this blog post.
PowerShell 7.3 or above: Make sure you have PowerShell 7.3 or later installed, or use Azure Cloud Shell from the Azure portal.
Instructions in this blog post use Azure Cloud Shell to run all required commands to create Azure services, grant access permissions, and create Cosmos DB data connection for data flow.
Log in to the Azure portal:
Log in to the Azure portal using an account with Owner or Contributor permissions on the subscription. Once you are logged in, open Azure Cloud Shell to run the PowerShell commands that complete the steps in this blog post.
Switch to PowerShell and make sure you have the right PowerShell version.
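The version check command is not shown in the extracted text; in a PowerShell session it is:

```powershell
# Report the running PowerShell version; it should show 7.3 or later
$PSVersionTable.PSVersion
```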
Install the PowerShell modules required to use the Azure Cosmos DB and ADX cmdlets. These modules are readily available in Cloud Shell. If you are not using Cloud Shell, open PowerShell with administrator privileges and install them.
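A sketch of the module installation (module names are the standard Az modules for Cosmos DB, ADX, and role assignments; skip this in Cloud Shell, where they are preinstalled):

```powershell
# Install the Az modules used in this walkthrough
Install-Module -Name Az.CosmosDB -Scope CurrentUser -Force
Install-Module -Name Az.Kusto -Scope CurrentUser -Force
Install-Module -Name Az.Resources -Scope CurrentUser -Force
```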
Log in to Azure Resource Manager: If you are using Azure Cloud Shell, this step is optional because you are already logged in to the Azure portal.
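If you are running PowerShell locally, sign in with:

```powershell
# Sign in to Azure Resource Manager (opens a browser or device-code prompt)
Connect-AzAccount
```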
Verify the Azure subscription: If you have multiple Azure subscriptions, make sure you set the right subscription as the current context using the commands below.
Set the current subscription using the command below.
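The commands themselves are missing from the extracted text; a sketch using the standard Az.Accounts cmdlets (replace the placeholder with your own subscription ID):

```powershell
# List the subscriptions available to the signed-in account
Get-AzSubscription

# Set the subscription to use for all subsequent commands
Set-AzContext -SubscriptionId "<your-subscription-id>"
```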
Create a resource group using the name and Azure region of your choice using the commands below.
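A sketch of the resource group creation; the group name and region below are example values used throughout this post, not the originals:

```powershell
# Example name and region; adjust to your preference
$resourceGroup = "contoso-orders-rg"
$location = "eastus"
New-AzResourceGroup -Name $resourceGroup -Location $location
```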
Create Cosmos DB Account, Database, and Container:
The steps below create an Azure Cosmos DB account with a serverless pricing tier, Orders database, and Orders container. Please refer to Create Azure Cosmos DB resources from the Azure portal | Microsoft Learn for additional details on creating a Cosmos DB account.
Create a Cosmos DB account by providing a Cosmos DB account name of your choice and executing the commands below. Cosmos DB account names are globally unique, so make sure to use a name that does not already exist.
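A sketch of the account creation with the serverless capability enabled (the account name is an example and must be replaced with a globally unique, lowercase name of your own):

```powershell
$resourceGroup = "contoso-orders-rg"          # example names used in this post
$location = "eastus"
$cosmosAccountName = "contosoorderscosmos"    # must be globally unique

# Create a serverless Cosmos DB account with the NoSQL (SQL) API
New-AzCosmosDBAccount -ResourceGroupName $resourceGroup `
    -Name $cosmosAccountName `
    -Location $location `
    -ApiKind Sql `
    -Capabilities @("EnableServerless")
```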
Create a Cosmos DB database by executing the commands below.
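A sketch of the database creation (example names as above):

```powershell
$resourceGroup = "contoso-orders-rg"
$cosmosAccountName = "contosoorderscosmos"

# Create the Orders database in the Cosmos DB account
New-AzCosmosDBSqlDatabase -ResourceGroupName $resourceGroup `
    -AccountName $cosmosAccountName `
    -Name "Orders"
```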
Create a Cosmos DB container by executing the commands below.
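A sketch of the container creation; the partition key path `/OrderId` is an assumption for this example, chosen to match the sample order document:

```powershell
$resourceGroup = "contoso-orders-rg"
$cosmosAccountName = "contosoorderscosmos"

# Create the Orders container, partitioned on OrderId (example choice)
New-AzCosmosDBSqlContainer -ResourceGroupName $resourceGroup `
    -AccountName $cosmosAccountName `
    -DatabaseName "Orders" `
    -Name "Orders" `
    -PartitionKeyKind Hash `
    -PartitionKeyPath "/OrderId"
```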
Create ADX Cluster, Orders database, and Orders table.
The steps below create a basic ADX cluster, Orders database, and Orders table. Please refer to Create an Azure Data Explorer cluster and database - Azure Data Explorer | Microsoft Learn for additional details.
Create an ADX Cluster by using the name of your choice and executing the commands below. This creates a basic ADX cluster and assigns System Assigned Identity to the cluster to grant access permissions to the Cosmos DB account and database created earlier.
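A sketch of the cluster creation; the cluster name is an example, and the dev/test SKU keeps costs low for a walkthrough. The `-IdentityType` parameter enables the system-assigned identity used for the Cosmos DB permissions in the next step:

```powershell
$resourceGroup = "contoso-orders-rg"
$location = "eastus"
$adxClusterName = "contosoordersadx"    # example name; choose your own

# Create a basic dev/test ADX cluster with a system-assigned identity
New-AzKustoCluster -ResourceGroupName $resourceGroup `
    -Name $adxClusterName `
    -Location $location `
    -SkuName "Dev(No SLA)_Standard_E2a_v4" `
    -SkuTier Basic `
    -IdentityType "SystemAssigned"
```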
Assign Role Permissions to the Cosmos DB account by executing the commands below.
The command below assigns the Cosmos DB Built-in Data Reader role (00000000-0000-0000-0000-000000000001) at the database level. Please refer to Configure role-based access control with Azure AD - Azure Cosmos DB | Microsoft Learn for more details on Cosmos DB database permissions.
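A sketch of the data-plane role assignment, granting the cluster's system-assigned identity read access to the Orders database (the scope path format follows the Cosmos DB data-plane RBAC convention):

```powershell
$resourceGroup = "contoso-orders-rg"
$cosmosAccountName = "contosoorderscosmos"
$adxClusterName = "contosoordersadx"

# Principal ID of the ADX cluster's system-assigned identity
$principalId = (Get-AzKustoCluster -ResourceGroupName $resourceGroup `
    -Name $adxClusterName).IdentityPrincipalId

# Cosmos DB Built-in Data Reader role, scoped to the Orders database
New-AzCosmosDBSqlRoleAssignment -ResourceGroupName $resourceGroup `
    -AccountName $cosmosAccountName `
    -RoleDefinitionId "00000000-0000-0000-0000-000000000001" `
    -PrincipalId $principalId `
    -Scope "/dbs/Orders"
```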
The command below assigns the Azure RBAC built-in role Cosmos DB Account Reader Role (fbdf93bf-df7d-467e-a4d2-9458aa1360c8) at the account level. Please refer to Azure role-based access control in Azure Cosmos DB | Microsoft Learn for the Azure Cosmos DB built-in RBAC roles. The command below requires subscription Owner or User Access Administrator permissions.
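A sketch of the control-plane role assignment for the cluster's identity on the Cosmos DB account:

```powershell
$resourceGroup = "contoso-orders-rg"
$cosmosAccountName = "contosoorderscosmos"
$adxClusterName = "contosoordersadx"

$principalId = (Get-AzKustoCluster -ResourceGroupName $resourceGroup `
    -Name $adxClusterName).IdentityPrincipalId
$cosmosAccountId = (Get-AzCosmosDBAccount -ResourceGroupName $resourceGroup `
    -Name $cosmosAccountName).Id

# Grant the Cosmos DB Account Reader Role at the account scope
New-AzRoleAssignment -ObjectId $principalId `
    -RoleDefinitionName "Cosmos DB Account Reader Role" `
    -Scope $cosmosAccountId
```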
Create an Orders database in ADX by executing the commands below.
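A sketch of the ADX database creation (example names as above):

```powershell
$resourceGroup = "contoso-orders-rg"
$location = "eastus"
$adxClusterName = "contosoordersadx"

# Create a read-write Orders database in the ADX cluster
New-AzKustoDatabase -ResourceGroupName $resourceGroup `
    -ClusterName $adxClusterName `
    -Name "Orders" `
    -Kind ReadWrite `
    -Location $location
```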
Create the Orders table in ADX: go to the Azure portal, open the ADX cluster created above, open a Kusto query window, and execute the Kusto query below against the Orders database.
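The table-creation query is missing from the extracted text; a sketch whose columns match the hypothetical order schema used in this post:

```kusto
// Create the Orders table; column names/types mirror the sample order document
.create table Orders (
    OrderId: string,
    StoreId: string,
    StoreLocation: string,
    ProductId: string,
    ProductName: string,
    Quantity: int,
    Price: real,
    OrderDate: datetime
)
```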
Create OrdersMapping by executing the Kusto query below in the Kusto query window in the Azure portal. This creates a field mapping that transforms the Order document from Cosmos DB into rows in the ADX Orders table during ingestion.
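A sketch of the JSON ingestion mapping, with JSON paths matching the hypothetical order document fields:

```kusto
// Map each JSON property in the Cosmos DB document to an Orders table column
.create table Orders ingestion json mapping "OrdersMapping" '[{"column":"OrderId","path":"$.OrderId"},{"column":"StoreId","path":"$.StoreId"},{"column":"StoreLocation","path":"$.StoreLocation"},{"column":"ProductId","path":"$.ProductId"},{"column":"ProductName","path":"$.ProductName"},{"column":"Quantity","path":"$.Quantity"},{"column":"Price","path":"$.Price"},{"column":"OrderDate","path":"$.OrderDate"}]'
```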
Create an ingestion batching policy by executing the Kusto query below in the Kusto query window in the Azure portal. This creates an ingestion policy that polls Cosmos DB at a one-minute interval and ingests data into the ADX Orders table. It also specifies the maximum number of orders ADX pulls from Cosmos DB in a single batch.
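A sketch of the batching policy; the one-minute interval comes from the text, while the item-count and size limits are example values:

```kusto
// Seal a batch after 1 minute, 500 items, or 1024 MB, whichever comes first
.alter table Orders policy ingestionbatching
'{"MaximumBatchingTimeSpan":"00:01:00","MaximumNumberOfItems":500,"MaximumRawDataSizeMB":1024}'
```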
Once all the required tables, mapping, and policy are created in the ADX cluster, the next step is to create a Cosmos DB data connection in the ADX Orders database.
Create a Cosmos DB connection in ADX by executing the below command in the Azure Cloud PowerShell window.
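The original command is not shown in the extracted text; one way to create the connection is through the ARM REST API via `Invoke-AzRestMethod`. The API version, connection name, and property shape below are assumptions based on the Microsoft.Kusto `dataConnections` resource of kind `CosmosDb`:

```powershell
$resourceGroup = "contoso-orders-rg"
$location = "eastus"
$cosmosAccountName = "contosoorderscosmos"
$adxClusterName = "contosoordersadx"

$subId = (Get-AzContext).Subscription.Id
$cosmosAccountId = (Get-AzCosmosDBAccount -ResourceGroupName $resourceGroup `
    -Name $cosmosAccountName).Id
$clusterId = (Get-AzKustoCluster -ResourceGroupName $resourceGroup `
    -Name $adxClusterName).Id

# Data connection payload: container to poll, target table, and mapping to apply
$body = @{
    location   = $location
    kind       = "CosmosDb"
    properties = @{
        cosmosDbAccountResourceId = $cosmosAccountId
        cosmosDbDatabase          = "Orders"
        cosmosDbContainer         = "Orders"
        tableName                 = "Orders"
        mappingRuleName           = "OrdersMapping"
        managedIdentityResourceId = $clusterId
    }
} | ConvertTo-Json -Depth 5

Invoke-AzRestMethod -Method PUT `
    -Path "/subscriptions/$subId/resourceGroups/$resourceGroup/providers/Microsoft.Kusto/clusters/$adxClusterName/databases/Orders/dataConnections/OrdersConnection?api-version=2023-05-02" `
    -Payload $body
```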
Check the connection status.
Once the Cosmos DB connection is created, go to the ADX Orders database in the Azure portal and check the connection status to make sure it was created successfully and is healthy.
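You can also check the connection from PowerShell; a sketch using the Az.Kusto module:

```powershell
$resourceGroup = "contoso-orders-rg"
$adxClusterName = "contosoordersadx"

# List data connections on the Orders database and inspect their state
Get-AzKustoDataConnection -ResourceGroupName $resourceGroup `
    -ClusterName $adxClusterName `
    -DatabaseName "Orders"
```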
At this point, the data integration pipeline setup is complete, and ADX is ready to pull new orders from the Cosmos DB database.
Upload Orders into the Cosmos DB account
Copy the Order data JSON below, paste it into a new Notepad window, and save it as a .json file.
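The original JSON is not reproduced here; a hypothetical order document consistent with the table schema and mapping used in this post (the `id` field is required by Cosmos DB):

```json
{
    "id": "f2b1c6a0-0001",
    "OrderId": "1000",
    "StoreId": "S1",
    "StoreLocation": "Seattle",
    "ProductId": "P25",
    "ProductName": "Milk",
    "Quantity": 2,
    "Price": 3.49,
    "OrderDate": "2023-06-01T10:15:00Z"
}
```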
In the Azure portal, open the Cosmos DB account created earlier, navigate to the Orders database and Orders container, and use Data Explorer to upload the Orders JSON file you created.
Refresh the Orders container and confirm the Orders entry was created. Wait about a minute for ADX to pull the new Order data from Cosmos DB and ingest it into the ADX database.
Query data in ADX
In the Azure portal, open the ADX cluster created earlier, open a Kusto query window, and execute the Kusto query below to see the Orders data ingested from the Cosmos DB account.
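The original query is not shown; a sketch that verifies ingestion and illustrates the kind of aggregation a dashboard report might use (column names match the hypothetical schema above):

```kusto
// Verify the uploaded order arrived in ADX
Orders
| take 100

// Example report query: total quantity sold per store location
Orders
| summarize TotalQuantity = sum(Quantity) by StoreLocation
```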
You can integrate other data sources such as IoT Hub, Event Hubs, and Event Grid to ingest data into ADX and create dashboard reports using Azure Data Explorer Dashboards. Jumpstart Agora Contoso Supermarket has a full implementation of data integration with IoT Hub and Cosmos DB. Deploy Jumpstart Agora using the instructions provided in the documentation to experience data flow from different data sources.