What’s new in Azure Data Explorer at //Build 2019

This post has been republished via RSS; it originally appeared at: Azure Data Explorer articles.


“Azure Data Explorer is built by developers for developers,” observed one of our users! Azure Data Explorer is one of the newest Azure advanced analytics services, yet also one of the most battle-tested internally, and it powers many of your favorite Azure services. It is part of Build 2019. We strive to deliver a service that is fast, powerful, and easy to use, and what better opportunity than //Build to see whether we have kept that promise!

Peer showcase: Our best customers and partners!

We are super excited to host some of our best customers and partners! Eric Fleischman, VP Engineering at DocuSign, explains how Azure Data Explorer transformed the way DocuSign engineers use and leverage data. Robert Pack, project lead for digitization at BASF, showcases how BASF uses advanced analytics and machine learning powered by Azure Data Explorer to realize the “Verbund” concept: the integrated, harmonized operation of hundreds of chemical processes with millions of sensors, to increase efficiency and minimize waste. Subhra Bose, CEO of Financial Fabric, describes how financial institutions can get real-time visibility into their activities, trades, and risks using Financial Fabric’s multi-tenant analytical solution as a service.

Make the most of your Azure Data Lake investment

Azure Data Lake Storage is the heart of Azure Data Analytics. It is where the data lives, and from where it is leveraged via Azure SQL DW, Azure Databricks, and Azure Data Explorer to generate reports, models, and insights. Azure Data Lake Storage Gen2 is built on Azure Blob Storage and enables performant and secure ways to access the data.

Azure Data Explorer is already integrated tightly with Azure Data Lake Storage Gen2, providing fast, cached, and indexed access to data in the lake. Not only can you do a one-time data ingestion from the lake into an Azure Data Explorer table, but you can also set Azure Data Explorer to watch specific areas in the lake and automatically ingest incoming data, making it immediately available for analytics.

In many cases, the data is streamed directly into Azure Data Explorer via Event Hubs, Kafka pipes, IoT Hubs, and more. There is immense value in being able to analyze fresh data: you can detect quickly, respond quickly, and ensure that you are delivering the best value for your customers. In Azure Data Explorer, you can also leverage update policies to enrich, filter, and beautify the data as it streams in. Some of this data also needs to be persisted in the Data Lake for other analytics use cases. Starting today, you can use continuous export to constantly package the incoming processed data streams into CSV or Parquet files in the lake, archive them, or make them available to other engines and technologies in a cost-effective way.
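To make the enrich-then-export flow a bit more concrete, here is a minimal local sketch in plain Python. It is not Azure Data Explorer code: the function names, the column names (`device`, `temp_c`, `temp_f`), and the filter rule are all made up for illustration. It only mimics the idea of an update policy transforming each incoming batch, followed by an export step that packages the processed rows as CSV.

```python
import csv
import io


def update_policy(batch):
    """Hypothetical stand-in for an update policy: filter and enrich one batch."""
    out = []
    for row in batch:
        if row["temp_c"] is None:
            # Filter: drop incomplete readings.
            continue
        enriched = dict(row)
        # Enrich: add a derived column.
        enriched["temp_f"] = row["temp_c"] * 9 / 5 + 32
        out.append(enriched)
    return out


def export_csv(rows):
    """Package processed rows as CSV text, roughly what one exported file might hold."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=["device", "temp_c", "temp_f"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Made-up sample batch; in the service this would arrive from a stream.
raw_batch = [
    {"device": "a1", "temp_c": 20.0},
    {"device": "a2", "temp_c": None},
    {"device": "a3", "temp_c": 100.0},
]

processed = update_policy(raw_batch)
csv_text = export_csv(processed)
print(csv_text)
```

In the real service both steps are configured once and then run continuously as data arrives; the sketch runs them a single time on a tiny in-memory batch.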

Azure Data Explorer offers incredible performance by caching data in a highly optimized format on fast storage. In some cases, people may want to look at archived data or data in the lake that was not ingested into Azure Data Explorer. Today at Build, we are announcing the ability to query data in the lake in its natural format using Azure Data Explorer. Simply define this data as an external table in Azure Data Explorer and query it. You can then join/union it with more data from Azure Data Explorer, SQL servers, and more.

Data Science and Machine Learning

Once you start generating value from data, you get an appetite for more. Data Science and Machine Learning are unlocking new value, but the data science function is expensive, so productivity is essential.

Azure Data Explorer already provides out-of-the-box advanced analytics tool sets to enable fast and productive analytics: automatic clustering, regression, anomaly detection, forecasting, and pattern detection are all available. These building blocks accelerate the work of the data scientist, but they are not always enough. Today, we are launching in preview the ability to run custom Python code embedded in the query, for ad hoc queries as well as user-defined stored queries (see this simple query generating a sine chart using Python).
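As a rough idea of what such an embedded script computes, the sketch below reproduces a sine series locally in plain Python. It is an emulation, not the plugin itself: the names `df`, `kargs`, and `result` follow our understanding of the plugin's convention (input table, parameter dictionary, output table) and should be treated as assumptions, and a list of dictionaries stands in for the table so the sketch runs without any extra packages. The `gain` and `cycles` parameters are illustrative.

```python
import math


def sandbox_script(df, kargs):
    """Stand-in for an embedded script: append an fx column holding a sine wave."""
    n = len(df)
    gain = kargs["gain"]
    cycles = kargs["cycles"]
    result = []
    for row in df:
        # Spread `cycles` full periods across the n input rows.
        fx = gain * math.sin(row["x"] / n * 2 * math.pi * cycles)
        result.append({**row, "fx": fx})
    return result


# Input "table": one row per value of x, from 1 to 360.
df = [{"x": x} for x in range(1, 361)]
result = sandbox_script(df, {"gain": 100, "cycles": 4})

# Show the first few rows; in the service you would render a line chart instead.
for row in result[:3]:
    print(row)
```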

The Python code is executed in a distributed sandbox close to the data, enabling a high level of parallelism.

Now you can execute in a single interactive query all of the following phases: filtering and cleaning logic, featurization via calculated columns, running your Python algorithm, as well as analyzing and viewing the results.
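The four phases can be sketched locally as follows. This is a plain Python illustration rather than an actual query: the sample readings are invented, and a simple z-score threshold stands in for "your Python algorithm", which in practice could be anything you like.

```python
import statistics

# Made-up sensor readings for illustration only.
readings = [
    {"sensor": "s1", "value": 10.0},
    {"sensor": "s1", "value": 11.0},
    {"sensor": "s1", "value": None},   # incomplete row, removed during cleaning
    {"sensor": "s1", "value": 9.0},
    {"sensor": "s1", "value": 10.0},
    {"sensor": "s1", "value": 50.0},   # the outlier we hope to flag
    {"sensor": "s1", "value": 10.0},
]

# Phase 1: filtering and cleaning.
clean = [r for r in readings if r["value"] is not None]

# Phase 2: featurization via a calculated column (z-score of each value).
values = [r["value"] for r in clean]
mean = statistics.mean(values)
stdev = statistics.pstdev(values)
featurized = [{**r, "zscore": (r["value"] - mean) / stdev} for r in clean]

# Phase 3: the "algorithm" (here an assumed, deliberately simple threshold rule).
THRESHOLD = 2.0
scored = [{**r, "is_anomaly": abs(r["zscore"]) > THRESHOLD} for r in featurized]

# Phase 4: analyzing and viewing the results.
anomalies = [r["value"] for r in scored if r["is_anomaly"]]
print("flagged values:", anomalies)
```

In Azure Data Explorer, phases 1, 2, and 4 would typically be ordinary query operators, with only phase 3 running as embedded Python.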

However, massive model training on the full data may require a lot of compute and run for a significant amount of time. Azure Databricks is typically the best tool for this job. Therefore, we introduced the Azure Data Explorer Spark Connector, which makes data stored in Azure Data Explorer available for consumption in a Spark job. You can now push predicates down, and even specify an Azure Data Explorer view to use as an input for the Spark job. When the job is done and the model is trained, you can easily upload it back into Azure Data Explorer and use it with the Python plugin to score new data, either ad hoc or as it streams in.
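The train-elsewhere, score-here round trip might look roughly like this. The sketch is plain Python with no Spark or Azure Data Explorer calls: a tiny least-squares fit stands in for the Spark training job, and serializing the model with `pickle` into a hex string is just one assumed way to store a model as text so it can be loaded again at scoring time. All function names are hypothetical.

```python
import pickle


def train_linear_model(xs, ys):
    """Least-squares fit of y = slope * x + intercept (stands in for the Spark job)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return {"slope": slope, "intercept": mean_y - slope * mean_x}


def score(rows, serialized_model):
    """Load the stored model and append a prediction column to each new row."""
    model = pickle.loads(bytes.fromhex(serialized_model))
    return [
        {**r, "prediction": model["slope"] * r["x"] + model["intercept"]}
        for r in rows
    ]


# "Training" on made-up data, then serializing the model to text for upload.
model = train_linear_model([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])
serialized = pickle.dumps(model).hex()

# Scoring new rows with the stored model, ad hoc or per incoming batch.
scored = score([{"x": 5.0}, {"x": 10.0}], serialized)
print(scored)
```

Only unpickle models you produced yourself; `pickle` should never be used on untrusted input.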

Get Started with Azure Data Explorer

We created a new starter dev/test configuration that runs for less than $0.30 per hour and can be suspended whenever you are not actively using it. Get one going!

Come to our session or watch it online: Learn to shorten the digital feedback loop with Azure Data Explorer (BRK3099)

The entire Azure Data Explorer team looks forward to meeting you and helping you meet your unique needs. Come by our booth! Just look for the Azure Data Explorer booth in the “Data & Cloud Analytics” area. You can also always reach us on Stack Overflow, Twitter, and the Tech Community.

Happy //build-ing!
