This post has been republished via RSS; it originally appeared at: Azure Data Explorer Blog articles.
Microsoft Defender ATP advanced hunting is a query-based threat-hunting tool that lets you explore up to 30 days of raw data. You can proactively inspect events in your network to locate interesting indicators and entities. The flexible access to data facilitates unconstrained hunting for both known and potential threats. Advanced hunting is based on the Kusto query language. You can use Kusto syntax and operators to construct queries that locate information in the schema specifically structured for advanced hunting.
In some scenarios, customers want to centralize their Microsoft Defender ATP logs with their other logs in Azure Data Explorer, keep the logs accessible for a longer period, or build custom solutions and visualizations around this data. In this article, I am going to provide step-by-step instructions on how to stream Microsoft Defender ATP advanced hunting events to Azure Data Explorer using Event Hub.
Before I begin, a few words about the platform. Azure Data Explorer (ADX) is a lightning fast service optimized for data exploration. It supplies users with instant visibility into very large raw datasets in near real-time to analyze performance, identify trends and anomalies, and diagnose problems. In addition to these amazing capabilities, customers can choose their own data retention period.
Let’s get started with this integration.
Stream Advanced hunting events in Microsoft Defender ATP
First, you are going to set up the streaming of Microsoft Defender ATP advanced hunting events to either a Storage account (Blob) or an Event Hub.
- Stream advanced hunting events to your Azure storage account
- Stream advanced hunting events to Azure Event Hubs
For this article, I am going to demonstrate how to integrate with Event Hub. Integration with a Storage account is very similar and uses Event Grid.
Let’s focus on the Event Hub message schema to understand the format in which you are going to receive the data and how to use it to design the table schema in the next step. The events in Event Hub are structured roughly as follows:
- Each message in Azure Event Hubs contains a list of records that may belong to different Microsoft Defender ATP tables.
- Each record contains the event name (as category), the time Microsoft Defender ATP received the event, the tenant it belongs to (you will only get events from your tenant), and the event in JSON format in a property called "properties".
- The “properties” schema can be different for each record.
- See the advanced hunting schema reference in the Microsoft Defender ATP documentation to learn more about the schema for each table.
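To make the points above concrete, here is a minimal KQL sketch of what one Event Hub message might look like and how its records separate into rows. The field names `time`, `tenantId`, `category`, and `properties` follow the description above. Every value, including the tenant placeholder and the two event bodies, is invented for illustration.

```kusto
// Hypothetical sample of a single Event Hub message; all values are made up.
let SampleMessage = dynamic({
    "records": [
        {
            "time": "2020-05-01T10:00:00Z",
            "tenantId": "<your-tenant-id>",
            "category": "AdvancedHunting-DeviceAlertEvents",
            "properties": {"AlertId": "example-alert-1", "Severity": "High"}
        },
        {
            "time": "2020-05-01T10:00:05Z",
            "tenantId": "<your-tenant-id>",
            "category": "AdvancedHunting-DeviceInfo",
            "properties": {"DeviceName": "example-device", "OSPlatform": "Windows10"}
        }
    ]
});
// Expand the array so that each record becomes its own row.
print records = SampleMessage.records
| mv-expand records
| project
    Category = tostring(records.category),
    ReceivedTime = todatetime(records["time"]),
    Properties = records.properties
```

Note that the two rows carry different `Properties` shapes, which is why the data cannot be ingested directly into one strongly typed table.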
Set up Ingestion with Azure Data Explorer
Now that hunting events are being streamed to Event Hub, you are going to set up an Azure Data Explorer data connection to build the pipeline that ingests messages into a table.
Before you create the data connection, let’s review the schema that you are going to create for the ingestion. Since you are going to receive a JSON array, and each array can contain data for different events, you first need to land the data in a Staging table (you can give this table a different name) and then fork the data into its individual tables during ingestion by using an update policy. I suggest creating the Staging table together with a JSON ingestion mapping named StagingMapping.
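A staging table and mapping along these lines could be created with the two commands below. This is a sketch under assumptions: the single dynamic column named `records` is my choice for illustration, while the names `Staging` and `StagingMapping` match the ones used in the rest of this article. Run each command separately.

```kusto
// Staging table with one dynamic column that holds the whole "records" array.
// The column name "records" is an assumption; rename it if you prefer.
.create table Staging (records: dynamic)

// JSON ingestion mapping that copies the top-level "records" property
// of each Event Hub message into that column.
.create table Staging ingestion json mapping 'StagingMapping' '[{"column":"records","Properties":{"Path":"$.records"}}]'
```

Keeping the payload in a dynamic column defers all typing to the per-event functions, so a new event type does not require any change to the staging table.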
Once the table is created, you can follow the Azure Data Explorer documentation on ingesting data from Event Hub to set up the data connection in your cluster. A few notes:
- You can leave “My data includes routing info” unselected
- Table name is Staging (unless you choose a different name for the table)
- Data format is MULTILINE JSON
- Column Mapping is StagingMapping (This is the name of the mapping you created earlier)
- You may not need to follow the entire documentation, which includes generating sample data; you already have data published to Event Hub from the first step
Once the data connection is created, you should start receiving data in the Staging table. You can then query the Staging table to review the different events you are receiving data for. Ideally, this list contains all the tables that you selected while setting up the streaming configuration from Microsoft Defender ATP to Event Hub.
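A query along the following lines could list the event categories that have arrived so far. It assumes the Staging table has a dynamic column named `records` that holds the array from each message; adjust the column name if yours differs.

```kusto
// Count the received records per event category (for example
// "AdvancedHunting-DeviceAlertEvents"). Assumes a dynamic column named "records".
Staging
| mv-expand records
| summarize EventCount = count() by Category = tostring(records.category)
| order by EventCount desc
```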
Route hunting events to individual tables
Now that you have data available in the Staging table, you need to fork this data into its own individual tables. This exercise is divided into three parts:
- Create a function that filters the data in the Staging table by hunting event
- Create a table for each hunting event and populate it with data from the Staging table
- Set up the update policy to populate the new table during ingestion from Event Hub
I am going to provide instructions for DeviceAlertEvents. You can follow a similar approach for the other events.
Create a function for Device Alert Event
For this, I am going to take one of the tables in Microsoft Defender ATP as an example: “AdvancedHunting-DeviceAlertEvents”.
First, let’s filter the records in the Staging table that belong to this specific event and map each property to a typed column.
To find the correct mapping for each column, I used the advanced hunting schema reference for DeviceAlertEvents. You will find the same kind of schema reference for the other tables as well.
You can also learn more about the mv-expand operator in the KQL documentation.
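As a sketch of that filtering step, the query below expands each message, keeps only the DeviceAlertEvents records, and previews a few typed columns. It assumes the Staging table has a dynamic column named `records`; the three projected columns are just a sample from the DeviceAlertEvents schema.

```kusto
// Preview a few DeviceAlertEvents records from the staging data.
Staging
| mv-expand records
| where tostring(records.category) == "AdvancedHunting-DeviceAlertEvents"
| extend Props = records.properties
| project
    Timestamp = todatetime(Props.Timestamp),
    AlertId = tostring(Props.AlertId),
    Title = tostring(Props.Title)
| take 10
```

Casting each property explicitly (`todatetime`, `tostring`) matters here, because values read from a dynamic column are otherwise left as dynamic.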
Now that you have your query ready to filter records for DeviceAlertEvents, you are going to create a function in your database.
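One way such a function might look is shown below. The function name `FilterDeviceAlertEvents` is the one used later in this article. The column list is a subset I picked for illustration and the `records` column name is an assumption, so extend the projection with the remaining columns from the schema reference as needed.

```kusto
// Returns the DeviceAlertEvents records from Staging as typed columns.
// The projected columns are an illustrative subset of the table schema.
.create-or-alter function FilterDeviceAlertEvents() {
    Staging
    | mv-expand records
    | where tostring(records.category) == "AdvancedHunting-DeviceAlertEvents"
    | project
        Timestamp = todatetime(records.properties.Timestamp),
        AlertId = tostring(records.properties.AlertId),
        DeviceId = tostring(records.properties.DeviceId),
        DeviceName = tostring(records.properties.DeviceName),
        Severity = tostring(records.properties.Severity),
        Category = tostring(records.properties.Category),
        Title = tostring(records.properties.Title),
        ReportId = tolong(records.properties.ReportId)
}
```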
Create a table for DeviceAlertEvents
Now that the function is ready and returns the required columns, you can use it with the .set-or-append command to create the DeviceAlertEvents table and populate it from the Staging table.
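A minimal form of that command, assuming the function is named `FilterDeviceAlertEvents` as in the rest of this article, might be:

```kusto
// Creates DeviceAlertEvents with the schema returned by the function
// (if the table does not exist yet) and appends the rows currently in Staging.
.set-or-append DeviceAlertEvents <|
    FilterDeviceAlertEvents()
```

If you only want to create the table schema without copying existing rows, appending `| take 0` to the query is a common approach.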
Learn more about .set-or-append in the Azure Data Explorer documentation.
Set up the update policy
This is the final step to fork the data into the DeviceAlertEvents table during ingestion. With the update policy in place, every time new data is ingested into the Staging table, the function FilterDeviceAlertEvents is executed, and if it returns any results, that result set is ingested into the DeviceAlertEvents table.
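The update policy could be defined roughly as follows. The table and function names come from this article; the `IsTransactional` and `PropagateIngestionProperties` settings are my assumptions about reasonable values, not settings confirmed by the original post.

```kusto
// Run FilterDeviceAlertEvents() on every ingestion into Staging and
// write the result into DeviceAlertEvents.
// IsTransactional = true makes the source ingestion fail if the policy fails,
// which avoids silently losing events (an assumed choice).
.alter table DeviceAlertEvents policy update
@'[{"IsEnabled": true, "Source": "Staging", "Query": "FilterDeviceAlertEvents()", "IsTransactional": true, "PropagateIngestionProperties": true}]'
```

During an update policy run, the reference to `Staging` inside the function is scoped to the newly ingested records only, so each event is forwarded once.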
Data Retention & Batching on Staging table
For the most part, you are done with streaming Microsoft Defender ATP hunting events to Azure Data Explorer. Before I finish this article, I want to provide some more information on how to manage data retention and batching on the Staging table.
Once you have the tables and functions created for all hunting events, you don’t need to retain any data in the Staging table outside the ingestion cycle. Not storing data in the Staging table also saves cost. To do this, set the retention policy on the Staging table to zero.
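A zero retention period on the staging table might be set like this, assuming the table is named `Staging`:

```kusto
// Keep no data in Staging after ingestion; update policies on the
// target tables still run against each newly ingested batch.
.alter-merge table Staging policy retention softdelete = 0s
```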
Ingestion latency (batching) is the time it takes for ingested data to become available for you to query. By default, if no policy is defined, Azure Data Explorer batches data until one of three limits is reached: a maximum delay of 5 minutes, 1000 items, or a total size of 1 GB.
You can set the batching policy at the database level or set a different policy for each table based on your business scenario. If the policy is not set for a certain entity, Azure Data Explorer looks for a policy at a higher level in the hierarchy. If all levels are set to null, the default values are used.
To change the ingestion latency, you can update the ingestion batching policy on the table or the database.
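As an example of such a command, the sketch below lowers the maximum batching delay on the staging table. The three values are illustrative assumptions, not recommendations; pick them based on your latency needs and data volume.

```kusto
// Seal a batch after 30 seconds, 500 items, or 1024 MB of raw data,
// whichever comes first (illustrative values).
.alter table Staging policy ingestionbatching
@'{"MaximumBatchingTimeSpan": "00:00:30", "MaximumNumberOfItems": 500, "MaximumRawDataSizeMB": 1024}'
```

Shorter batching windows reduce latency but produce more, smaller batches, so it is worth weighing them against throughput.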
Please follow the ingestion best practices to optimize your throughput.
With this, you should have Microsoft Defender ATP hunting events streaming into Azure Data Explorer, and you can now specify a different retention period for each table to meet your business requirements.