Announcing Preview of Approximate Percentile Functions for Azure SQL DB and Azure SQL MI

This post has been republished via RSS; it originally appeared at: New blog articles in Microsoft Community Hub.

Approximate query processing was introduced to enable operations across large data sets where responsiveness is more critical than absolute precision. Approximate operations can be used effectively for scenarios such as KPI and telemetry dashboards, data science exploration, anomaly detection, and big data analysis and visualization. Approximate query processing family has enabled a new market of big data HTAP customer scenarios, including fast-performing dashboard and data science exploration requirements.  

 

Today, we are announcing preview of native implementation of APPROX_PERCENTILE in Azure SQL Database and Azure SQL Managed Instance. This function will calculate the approximated value at a provided percentile from a distribution of numeric values.

 

Imagine a table of temperature sensor data which has millions of rows, and you want to find 99th percentile quickly with acceptable rank-based error bound to take quick decisions. These functions can give output in a single pass with less CPU and memory usage. Here is the quick demonstration to show that (we have used order_line table from TPCC database).

 

Exact Percentile calculation for 50th percentile ol_amount for warehouses (ID between 1 to 500)

 

 

 

 

SELECT DISTINCT ol_w_id ,PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY ol_amount) OVER (PARTITION BY ol_w_id) AS MedianCont ,PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY ol_amount) OVER (PARTITION BY ol_w_id) AS MedianDisc FROM order_line WHERE ol_w_id BETWEEN 1 AND 500 ORDER BY ol_w_id

 

 

 

 

Memory grant information from XML plan.

<QueryPlan DegreeOfParallelism="8" MemoryGrant="33230744" CachedPlanSize="120" CompileTime="8" CompileCPU="8" CompileMemory="1008">

Same calculation using new approximate percentile functions.

 

 

 

 

SELECT ol_w_id, APPROX_PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY ol_amount) AS MedianCont, APPROX_PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY ol_amount) AS MedianDisc FROM order_line WHERE ol_w_id BETWEEN 1 AND 500 GROUP BY ol_w_id ORDER BY ol_w_id

 

 

 

 

Memory grant information from XML plan.

<QueryPlan DegreeOfParallelism="8" MemoryGrant="50688" CachedPlanSize="56" CompileTime="4" CompileCPU="4" CompileMemory="464">

As we can see the memory grant for approximate version is significantly less as compared to exact version. 

 

Further references:

Link to documentation for approximate percentile functions

 

Leave a Reply

Your email address will not be published. Required fields are marked *

*

This site uses Akismet to reduce spam. Learn how your comment data is processed.