Just Enough Azure Data Explorer for Architects

This post has been republished via RSS; it originally appeared at: Azure Data Explorer articles.

JUST ENOUGH AZURE DATA EXPLORER FOR CLOUD ARCHITECTS

This article provides a high-level overview of all the key capabilities in Azure Data Explorer, positioning, and resources, and is intended to demystify Azure Data Explorer for cloud architects.

WHAT IS AN ANALYTICAL DATABASE?

Analytical databases are purpose-built and optimized to query and run advanced analytics on large volumes of data with extremely low response times. Modern analytical databases are generally distributed, scalable, fault-tolerant, and feature columnar, compressed formats, and intelligently use in-memory and disk caching technologies. In contrast, transactional databases are optimized for single digit millisecond point reads by key or intra-partition range query, support random CRUD and ACID semantics, and more.

WHAT IS AZURE DATA EXPLORER?

Azure Data Explorer, code name, Kusto, is a low latency, big data, Microsoft proprietary, append-only analytical database. It is feature and functionality rich as a platform – with durable storage, query engine, search engine, proprietary Kusto Query Language, support for ingestion of data in disparate formats from disparate sources, in batch and streaming, with advanced analytics/data science and geospatial analytical capabilities out of the box, server side programmability and support for visualization, scheduling, orchestration and automation.

Azure Data Explorer is mature and heavily used internally at Microsoft and powers many core Azure offerings such as Azure Monitor, Azure Sentinel, Azure Time Series Insights and more. It is fully managed, scalable, secure, robust, and enterprise ready and is a popular platform for log analytics, time series analytics, IoT, and general-purpose exploratory analytics. In a subsequent section, we cover, what Azure Data Explorer is not.

USE CASES FOR AZURE DATA EXPLORER

Azure Data Explorer is a big data solution, ideal for-
Analytics on IoT telemetry
Analytics on all manners of time series data
Analytics on all manners of logs
Analytics on clickstream and any other form of live sources
Text search
Geo-spatial analytics
Advanced analytics
Exploratory environment for data scientists and business/data analysts

Reference architecture is covered further in this article, after the value proposition.

VALUE PROPOSITION AND SALIENT FEATURES

1. LOW LATENCY QUERY ENGINE

Azure Data Explorer leverages innovative, and contemporary compressed columnar and row stores along with a hierarchical cache paradigm, featuring configurable hot and cold caches backed by memory and local disk, with data persistency on Azure Storage. With this architecture, terabytes and petabytes of data can be queried, and results returned within milliseconds to seconds. Refer the white paper for more implementation details.

For applications where performance of analytical queries is critical, Azure Data Explorer is a great fit as a backend analytical database.

2. DISTRIBUTED BY DESIGN

Azure Data Explorer is a big data, cluster computing system comprised of engine nodes that serve out queries and data management service nodes that perform/orchestrate a variety of data related activities including ingestion and shard management. A few key capabilities of distributed systems are partitioning, consistency, replication, scalability and fault tolerance.

Data persisted in Azure Data Explorer is durably backed by Azure Storage that offers replication out of the box, locally within an Azure Data Center, zonally within an Azure Region.
An Azure Data Explorer cluster can be scaled up vertically or down as workload resource requirements change. Azure Data Explorer clusters can be scaled out horizontally and scaled in, manually, or with auto-scale.
From a partitioning perspective, Azure Data Explorer natively partitions by ingestion time with a proprietary formula to prevent hot spotting, but also offers a user configurable partitioning based on data attributes/columns. Out of the box, the service offers options of strong and weak consistency, with strong consistency as the default. All data ingested into Azure Data Explorer is automatically indexed, allowing fast lookup queries.

For analytical applications where scale and reliability are a must, Azure Data Explorer is a great fit.

3. RICH INTEGRATION AND CONNECTOR ECOSYSTEM

Azure Data Explorer has a rich connector eco-system for batch and near real-time ingestion, with Microsoft first party services as well as open source distributed systems.

It supports open source integration systems and aggregator, connector frameworks, like Apache Kafka’s Kafka Connect and ELK stack’s Logstash. It supports Apache Spark for both read and persist, opening up numerous integration possibilities with open source big data systems, whether transactional or analytical.

From a streaming perspective, Azure Data Explorer supports Azure IoT, Azure Event Hub and Apache Kafka. For bulk ingestion or event-driven ingestion, Azure Data Explorer integrates with Azure Storage, with a queued ingestion functionality, and with Azure Event Grid for configurable event driven ingestion.

From an ingestion format perspective, Azure Data Explorer supports a number of formats from CSV and JSON to binary formats like Avro, Parquet and ORC, with support for multiple compression codecs.

It is a common pattern today to curate all enterprise information assets in a data lake. Azure Data Explorer supports the external table paradigm with an integration with Azure Data Lake Store. It also features continuous ingest and continuous export capabilities to Azure Data Lake Store.

Collectively, with the connector eco-system, support for disparate ingestion formats and compression, whether in batch or streaming modes, with support for business workflows, scheduling, and orchestration, Azure Data Explorer is enterprise-ready from an integration perspective.

4. QUERY LANGUAGE AND SERVER-SIDE PROGRAMMABILITY

Azure Data Explorer features a proprietary Kusto Query Language (KQL), that is expressive, and intuitive. KQL supports querying structured, semi-structured, and unstructured(text search) data, all the typical operators of a database query language, typical aggregation and sorting capability, relational query grammar with joins, and hints and more, cross-cluster and cross database queries, and is feature rich from a parsing (json, XML etc) perspective. It supports geospatial analytics and advanced analytics as well.

Azure Data Explorer supports server-side stored functions, continuous ingest and continuous export to Azure Data Lake store. It also supports ingestion time transformations on the server side with update policies, and precomputed scheduled aggregates with materialized views (preview).

Azure Data Explorer with Kusto Query Language capabilities, and its support for ingesting disparate formats, in batch or near real time, allows enterprises to gain insights instantly from data, in its raw form, in previously unthought of ways and proactively, reactively respond. With the server-side capabilities detailed above, Azure Data Explorer supports building common automated analytical application features.

5. ADVANCED ANALYTICS

Azure Data Explorer offers time series capabilities, including a large set of functions from basic element-wise operation (adding, subtracting time series) via filtering, regression, seasonality detection up to anomaly detection and forecasting. Time series functions are optimized for processing thousands of time series in seconds.  Azure Data Explorer also offers clustering plugins for pattern detection that are very powerful for diagnosis of anomalies, and root cause analysis.  You can extend Azure Data Explorer capabilities by embedding python code in KQL query, leveraging Python open source eco-system for ML, statistics, signal processing and a lot more.  Using inline Python you can do in-place machine learning model training, leveraging Azure Data Explorer compute, against data stored within ADX, or train your model anywhere, export it to ADX and use ADX solely for scoring. You can also query ADX from Jupyter/Azure notebooks by using KqlMagic extension from within the notebook. To note, ADX supports ONNX models.

With its support for data science languages and libraries, training and scoring, both in batch and near real time, Azure Data Explorer makes a compelling modern advanced analytics platform.

6. VISUALIZATION

Azure Data Explorer offers visualization out of the box with its Web/Desktop based integrated development environment. It also offers native dashboarding out of the box, with support for a variety of charts, and direct query support. Further, it has native integration with Power BI with support for predicate and projection pushdown and native connectors for Grafana, Kibana and Redash (now Databricks), and ODBC support for Tableau, Sisense, Qlik and more. With KQL used in Jupyter notebooks along with Python, Python visualization libraries can be leveraged as well.

An analytical database platform is incomplete without a complementary and strong visualization story. With its own native dashboarding, charting, and support for popular BI/dashboarding ISV solutions, Azure Data Explorer makes it possible to build visualization rich analytical solutions.

7. AUTOMATED INFORMATION LIFECYCLE MANAGEMENT

Azure Data Explorer offers configurable time duration based hot cache policy with automated eviction upon expiration. It offers configurable time to live/retention policy with automated removal of data upon expiration.

Historical data can be persisted in Azure Data Lake Store, with the “External Table” feature to query via KQL. Azure Data Lake Store offers its own information lifecycle management with tiers, automated retention and archival.

8. SECURITY

Azure Data Explorer offers enterprise grade security and has adoption in security critical industry domains such as healthcare and finance, and governments.

PERIMETER PROTECTION

Azure Data Explorer is a Vnet injectable service and offers a perimeter protection solution with network isolation, with network security group rules and firewall for inbound and outbound access control respectively. It also supports service endpoint to storage to bypass the public internet and optimally route over the Azure backbone.

AUTHENTICATION

Azure Data Explorer supports Azure Active Directory (AAD) authentication out of the box.

AUTHORIZATION

Azure Data Explorer supports access control based on AAD security groups, with Role Based Access Control at a cluster, database, and table row level granularity.

DATA MASKING

Azure Data Explorer supports configurable data masking.

ENCRYPTION

Azure Data Explorer supports it supports encryption over the wire with TLS 1.2 by default, and AES256 bit “at rest” encryption with Microsoft Managed keys (default) or “Bring Your Own Keys” at its durable storage layer (cold cache), with user configurable “at rest” encryption for its hot cache (disks). Intra-cluster shuffle is encrypted with IPSec.

AUDITING

All operations are audited and persisted.

COMPLIANCE CERTIFICATIONS & GDPR

It is rich from a compliance certification perspective and has delete support for GDPR. Information in Azure Data Explorer can be catalogued in Azure Data Catalog to fulfill governance requirements. Azure Data Explorer audit/monitoring data can be ingested into Azure Sentinel or any other SIEM of choice.

9. WORKLOAD ISOLATION AND COST CENTER CHARGEABILITY

Azure Data Explorer supports centralizing ingestion and data engineering in a (leader) cluster and in-place querying federated to follower clusters made possible by Azure Data Share. This simplifies cost center chargeability, and sizing query clusters, preventing resource contention between data engineering and querying workloads with the separate cluster per workload model.

10. DEVELOPER AND OPERATIONS FRIENDLY

Azure Data Explorer has SDKs in C#, Java, Python, Node, R and Go, and a REST interface for management and data operations. From an IDE perspective it offers a Web user interface and a Desktop IDE with built-in powerful intellisense, visualization and query-sharing/collaboration.

Azure Data Explorer offers logging, monitoring, diagnostics and alerting with its Azure Monitor integration. Patching of infrastructure, health monitoring etc are fully managed by the service. Provisioning and deployment automation is simplified through support for Azure DevOps. Due to its managed nature, it does not require a database administrator.

The capabilities detailed above make the service both developer and operations friendly.

11. COST OPTIMIZATION

Azure Data Explorer offers reserved instance pricing, which is cost optimization with a term commitment model. It offers information lifecycle management by automatically evicting expired data from hot cache, and deleting expired data in cold cache as a cost optimization. It offers configurable auto-scale out and scale-in capability to right-size a cluster based on workload in execution. Azure Advisor recommendations are available for optimization of the cluster, including rightsizing. Azure Data Explorer offers a developer SKU and the ability to pause clusters when not in use as further cost optimizations. Review our

12. AVAILABILITY AND DISASTER RECOVERY

Azure Data Explorer is built natively highly available from the ground up, leverages the natively resilient Azure storage for data durability, supports zonal redundancy, and offers an SLA of 99.9% with configurable options for disaster recovery. Azure Data Explorer’s underlying services are provisioned highly available across compute nodes, load balanced as required and are self-healing.

Azure Data Explorer is available in all Azure regions and is one of the foundational services deployed after infrastructure and identity in a region.

For analytical applications that need to be resilient to all manners of failures, Azure Data Explorer is a great fit. Learn more about high availability and disaster recovery in Azure Data Explorer here.

13. AUTOMATION, SCHEDULING & ORCHESTRATION

Azure Data Explorer supports Azure Data Factory, Azure Logic Apps, and Power Automate (previously Microsoft Flow) for data movement/composing business workflows, for scheduling and orchestration.

REFERENCE ARCHITECTURE

RESOURCES

Product Page:  http://aka.ms/AzureDataExplorer
Architecture: https://azure.microsoft.com/en-us/blog/azure-data-explorer-technology-101/
Detailed White-paper: https://azure.microsoft.com/en-us/resources/azure-data-explorer/
Documentation:  https://aka.ms/adx.docs
Blogsite: https://aka.ms/adx.blog
Pricing Page:  https://azure.microsoft.com/en-us/pricing/details/data-explorer/
Cost Estimator:  http://aka.ms/adx.cost
Web Explorer: https://dataexplorer.azure.com/
Tech Community:  https://techcommunity.microsoft.com/t5/Azure-Data-Explorer/bd-p/Kusto
User Voice:  http://aka.ms/adx.uservoice
Training: https://azure.microsoft.com/en-us/updates/learn-everything-you-need-to-know-about-azure-data-explorer/