Announcing General Availability of Approximate Percentile Functions for Azure SQL DB and MI

This post has been republished via RSS; it originally appeared at: New blog articles in Microsoft Community Hub.

Today, we are announcing General Availability (GA) of native implementation of APPROX_PERCENTILE in Azure SQL Database and Azure SQL Managed Instance. We announced preview of these functions in October 2022. Since then, many customers have adopted these for the applications where response time of percentile calculation was more important than the accuracy of the result.

 

These aggregate functions (APPROX_PERCENTILE_CONT & APPROX_PERCENTILE_DISC) need lesser memory and lesser CPU as compared to their precise analytic counterparts (PERCENTILE_CONT & PERCENTILE_DISC). Here is a quick example to demonstrate this.

 

To follow along, you would need to use WideWorldImportersDW sample database and increase the number of rows using below script.

https://github.com/microsoft/bobsql/blob/master/sql2019book/ch2_intelligent_performance/iqp/extendwwidw.sql

 

For this demo, we have imported WideWorldImportersDW-Full.bacpac to a Azure SQL Database – Business Critical tier and made it bigger by using script mentioned above. Let's start with looking at the 95th percentile for order total (including tax) by employee using PERCENTILE_CONT

 

 

DBCC DROPCLEANBUFFERS GO SET STATISTICS TIME,IO ON GO SELECT DISTINCT de.Employee , PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY [Total Including Tax]) OVER (PARTITION BY de.Employee) AS [95 percentile CONT] FROM Fact.OrderHistory AS foh INNER JOIN Dimension.Employee AS de ON foh.[Salesperson Key] = de.[Employee Key] ORDER BY de.Employee; GO SET STATISTICS TIME,IO OFF

 

 

Here is the output (only partial shown to save some electrons).

Statistics IO

 

 

Table 'Employee'. Scan count 1, logical reads 5, physical reads 1, page server reads 0, read-ahead reads 10, page server Table 'OrderHistory'. Scan count 9, logical reads 101000, physical reads 0, page server reads 0, read-ahead reads 100937 Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, page server reads 0, read-ahead reads 0, page server Table 'Worktable'. Scan count 44, logical reads 21189783, physical reads 0, page server reads 0, read-ahead reads 19097, Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, page server reads 0, read-ahead reads 0, page server

 

 

Statistics time

 

 

SQL Server Execution Times: CPU time = 50469 ms, elapsed time = 12969 ms.

 

 

Now let's look at how to achieve the same thing with APPROX_PERCENTILE

 

 

DBCC DROPCLEANBUFFERS GO SET STATISTICS TIME,IO ON GO SELECT de.Employee , APPROX_PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY [Total Including Tax] ) AS [95 percentile APPROX_CONT] FROM Fact.OrderHistory AS foh INNER JOIN Dimension.Employee AS de ON foh.[Salesperson Key] = de.[Employee Key] GROUP BY de.Employee ORDER BY de.Employee; GO SET STATISTICS TIME,IO OFF

 

 

Statistics IO

 

 

Table 'OrderHistory'. Scan count 9, logical reads 101000, physical reads 0, page server reads 0, read-ahead reads 100937, page serve Table 'Employee'. Scan count 0, logical reads 202, physical reads 3, page server reads 0, read-ahead reads 0, page server read-ahead Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, page server reads 0, read-ahead reads 101, page server read-ahea Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead r Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead

 

 

Statistics time

 

 

SQL Server Execution Times: CPU time = 11096 ms, elapsed time = 1583 ms.

 

 

If we compare the elapsed time (in Statistics time output), it is ~1.5 seconds vs. ~13 seconds. IO also has significantly reduced using approximate function (in Statistics IO output). Since each query plan has parallelism, we are seeing more CPU time than elapsed time.

 

Here is the memory used by percentile (taken from XML plan)

 

 

<MemoryGrantInfo SerialRequiredMemory="3072" SerialDesiredMemory="383784" RequiredMemory="27080" DesiredMemory="407816" RequestedMemory="407816" GrantWaitTime="0" GrantedMemory="407816" MaxUsedMemory="204872" MaxQueryMemory="6471920" />

 

 

and from approx_percentile query plan

 

 

<MemoryGrantInfo SerialRequiredMemory="1536" SerialDesiredMemory="2944" RequiredMemory="13512" DesiredMemory="14920" RequestedMemory="14920" GrantWaitTime="0" GrantedMemory="14920" MaxUsedMemory="3640" MaxQueryMemory="6471920" />

 

 

Above snippets show low memory requirement of approx_percentile family of function.

 

Here are the links to documentation for approximate percentile functions to learn more.

Leave a Reply

Your email address will not be published. Required fields are marked *

*

This site uses Akismet to reduce spam. Learn how your comment data is processed.