Azure Data Explorer Newsletter March 2020


Azure Data Explorer (ADX) recently celebrated its one-year GA anniversary on February 9th, which makes this a good time to start a monthly newsletter highlighting the progress of the service. Since this is the first edition, it will cover the many new features that have been added over the last 12 months.

 

Over the last year we focused on: (1) addressing enterprise compliance needs, (2) cost reduction and performance, (3) expanding the ingestion interfaces, (4) creating deep Azure Data Lake integration, (5) new language capabilities, (6) enhancing the data science capabilities, and (7) new tooling capabilities and visualization options.

 

Here are more details about the features already in production. There are many more enhancements in progress, so stay tuned for the next editions :-)

 

Enterprise and Security

Over the last year we worked with many companies to certify Azure Data Explorer for use with their top secret data, which required adding the following features:

Cost reduction and performance

  • New SKUs were added to provide the optimal VM type for any scenario, starting with a Dev/Test cluster at a cost of ~$200 per month.
  • ADX Reserved Instances (RI) can now be purchased for one- or three-year commitments with up to a 30% discount. Reserved instances can also be bought for the compute and storage used by the cluster, achieving an overall discount of ~40% (depending on the cluster setup).
  • Follower database provides a way to run different workloads on the same data using different clusters. See more in the Azure Data Lake integration section.

Data ingestion

  • Data pipelines: new connectors offer easy integration of sources like the Azure IoT Hub or open source technologies such as Kafka and Logstash.
  • Azure Data Factory provides multiple ADX connectors that make it easy to ingest data from, and export data to, many data sources.
  • Streaming ingestion offers better efficiency in the “trickling data” scenario, where many tables are each ingested with relatively small amounts of data. In addition, streaming ingestion can reduce the ingestion latency to less than a couple of seconds; a sketch of enabling it per table follows this list.
  • New data formats were added, such as Avro, ORC, and Parquet, either uncompressed or with ZIP or GZIP compression. The ADX community also added an Apache Samza Avro export to Azure Blob Storage, which makes it easy to complete the pipeline by setting up an Event Grid data connection to ADX.
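
For example, here is a minimal sketch of turning on streaming ingestion for a single table (the table name and schema below are placeholders, and streaming ingestion must also be enabled on the cluster itself):

    // Placeholder table used for this example.
    .create table MyLogs (Timestamp: datetime, Level: string, Message: string)

    // Enable the streaming ingestion policy on the table so that small batches
    // of records can be ingested with low latency.
    .alter table MyLogs policy streamingingestion enable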

Azure Data Lake integration

  • External tables make it easy to query and export data from/to the Data Lake. These are special Kusto tables whose data is stored in the Data Lake or in SQL databases rather than in the cluster itself (see the sketch after this list).
  • Continuous data export allows exporting the cluster’s data to the Data Lake at regular intervals, in formats such as Parquet, JSON, and CSV, while ensuring that data is not duplicated.
  • The Kusto Spark connector was added to support modern data science and engineering workflows. This connector simplifies integration with Spark products such as Azure Databricks, HDInsight, and other Spark distributions.
  • Follower databases and Azure Data Share integration allow using new or existing ADX clusters to query data that already resides in a different (leader) ADX cluster. This enables running different workloads on their own dedicated hardware in order to tune performance, create security isolation, and easily assign cost to the department that uses the data.
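
As a rough sketch of how external tables and continuous export fit together (the storage URL, account key, table names, and schema below are all placeholders):

    // External table whose data lives in Azure Blob Storage rather than in the cluster.
    .create external table ExternalLogs (Timestamp: datetime, Level: string, Message: string)
    kind=blob
    dataformat=parquet
    (
        h@'https://<storageaccount>.blob.core.windows.net/logs;<storage-account-key>'
    )

    // Continuously export new records from the (placeholder) MyLogs table
    // into the external table, roughly once an hour.
    .create-or-alter continuous-export ExportLogs
    over (MyLogs)
    to table ExternalLogs
    with (intervalBetweenRuns=1h)
    <| MyLogs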

New Kusto language capabilities

Usage of the Kusto query language is expanding rapidly, and it is now exposed in different products and services including Azure Log Analytics, Application Insights, Windows Advanced Threat Protection, Episerver, SquaredUp, and Azure PlayFab. Over the last year we added many new functions and operators and made the language open source in order to accelerate its adoption. Here is the detailed list:

 

Data science

  • Python and R plugins offer the ability to run Python and R code within the query execution. This inline execution of code in the context of the query allows for highly customized queries using ML models and the most popular R and Python packages directly on tables, offering highly efficient data processing. Check out this GitHub repo for examples.
  • Python debugging in VS Code provides an integrated, end-to-end development and debugging experience of Python and R code within Visual Studio Code.
  • New time-series functions were added, such as series_pearson_correlation(), which calculates the Pearson correlation coefficient of two numeric series inputs; a quick sketch of the plugins and this function follows the list.
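
Here is a minimal sketch of both capabilities (the generated data and the inline script are illustrative only, and the python plugin has to be enabled on the cluster first):

    // Inline Python: the script receives the input table as the 'df' dataframe
    // and returns the 'result' dataframe matching the declared output schema.
    range x from 1 to 10 step 1
    | evaluate python(typeof(*, fx: double),
        'result = df\nresult["fx"] = df["x"] * 2.0\n')

    // Pearson correlation coefficient of two numeric series.
    print correlation = series_pearson_correlation(dynamic([1, 2, 3, 4, 5]), dynamic([2, 4, 6, 8, 11]))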

Tools and Visualization

 

Dashboards and external tools   

  • Many dashboards now have native connectors to ADX, including Power BI, Grafana, Tableau, Sisense, and Redash.
  • The K2Bridge project adds the ability to use Elasticsearch’s Kibana with an ADX backend, which is ideal for transitioning existing ELK deployments to ADX. In addition, the Logstash Kusto plugin makes it easy to route the data collected by Logstash to ADX.

Kusto Web Explorer

Kusto Web Explorer is the web tool for the ADX interactive query experience. It shares many capabilities with Kusto Explorer, including IntelliSense, colorization, sharing of queries and results, and dark and light themes. Here are a few selected improvements:

 

  • An inline JSON viewer makes it easy to view and navigate JSON and long text data.


  • One-Click Ingestion is an experience unique to Kusto Web Explorer that provides a curated way to ingest data from files, including automatic creation of tables and ingestion mappings; a sketch of the kind of commands it generates follows.

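For reference, the table and ingestion mapping it creates look roughly like the following (the table name, mapping name, and JSON paths are placeholders):

    // A table plus a JSON ingestion mapping of the kind One-Click Ingestion generates automatically.
    .create table SampleEvents (Timestamp: datetime, Name: string, Value: real)

    .create table SampleEvents ingestion json mapping 'SampleEvents_mapping'
        '[{"column":"Timestamp","Properties":{"Path":"$.timestamp"}},{"column":"Name","Properties":{"Path":"$.name"}},{"column":"Value","Properties":{"Path":"$.value"}}]'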

  • The Format numbers option allows turning off number formatting, which is useful when a number represents an identifier. This and other useful options can be found in the Settings -> Appearance section.

Kusto Explorer (Windows)

  • Geospatial visualizations can be used with some of the chart types by adding “kind=map” to the chart properties; here is an example:

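The query below is only a sketch against the public StormEvents sample data; the first two projected columns are interpreted as longitude and latitude:

    // Plot event locations on a map (Kusto Explorer): the first column is treated
    // as longitude and the second as latitude.
    StormEvents
    | take 100
    | project BeginLon, BeginLat, EventType
    | render scatterchart with (kind=map)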

  • IntelliSense has been greatly improved. It is much more powerful and has better documentation. Also, pressing F1 in Kusto Explorer on a specific part of your query opens the applicable operator/function documentation.
  • Exporting query results directly to local disk is now available using the “Run Query into CSV” button.


  • Code refactoring and code navigation capabilities have been added, including ‘rename’, ‘extract let statements’, ‘go to definition’, and ‘find all references’.
  • Code Analyzer evaluates the current query and outputs a set of applicable improvement recommendations.
  • Analyze cluster health pre-defined queries are available via right-click on the cluster name in the connection pane or by pressing Ctrl+Shift+F1.


  • A new card view has been added to better visualize individual rows of a result set. Right-click a row in the result set and select “Show record details”, or press Ctrl+F10.

 
