Autoscaling Azure SQL Hyperscale

Azure SQL Hyperscale is the latest architectural evolution of Azure SQL, that has been natively designed to take advantage of the cloud. One of the main key features of this new architecture is the complete separation of Compute Nodes and Storage Nodes. This allow for independent scale of each service, making Hyperscale more flexible and elastic.

In this article I will describe how it is possible to implement a solution to automatically scale your Azure SQL Hyperscale database up or down, to dynamically and automatically adapt to different workload levels without the requiring manual .

Now, first things first. I won’t go into details here, as they are well described in the official doc, but it is important to understand how the SQL Server engine has been refactored into Azure SQL Hyperscale, as it is the foundation for a very fast scale up and down of the selected service tier. Here’s a picture that explains it very well:

One of the main benefit of this architecture is that scaling up or down the Service Level Objective, which means cores and ram, is something completely unrelated to the size of the database. Let’s say for examples that you want to move from a HS_Gen5_2 to HS_Gen5_4. This is on overview of what happens:

A new Hyperscale compute node, with the new requested Service Level Objective, is created

New compute finishes initialization process and gets in ready state

Existing connections in the active compute node are terminated and running transactions are rolled back

New connections are routed to the new node

The old Hyperscale compute node is no longer used for this database

As there is no data movement involved, this workflow happens pretty quickly. In several tests I ran lately I had a pretty constant value of maximum 2 minutes to spin up a new compute node of the Service Level Objective you are asking for, and something around 12 seconds to actually switch the control to the newly created node.

For all the time required to spin up the new compute node, all existing connections remain open and transactions against the database are processed as usual. It’s only during the 12 seconds of switch that, at some point during that time range, connections will be moved to the new compute node. If you have a long-running transaction being executed at the time of cut-off, it will be rolled back: thanks to the new Accelerated Database Recovery, this operation will be almost immediate and it won’t prevent the scaling action to complete. Of course is always a good to avoid to re-run transactions, as they may be resource-intensive, so follow the best practice to keep the connection open only for the strictly needed duration of time. If you are used to work with HTTP and REST, you are already doing this: just keep doing the same with Azure SQL too. If you are using some tool like Dapper, you don’t even have to bother about managing connections, unless you have some specific requirements. It will automatically open the connection just before you run a command, and automatically close it once the command is done. This is all you need to write:

Monitoring Resource Usage

Azure SQL database provides a useful DMV, sys.dm_db_resource_stats, to monitor resource usage:

It retains data for about 1 hour, aggregated by 15-seconds buckets. With such data it would be possible to do some really fancy stuff using machine learning and some cool forecasting algorithms, but for the sake of simplicity we will use a very simple moving average to smooth out peaks and figure out if we need to scale up our Azure SQL Hyperscale database. If you are interested in better algorithms to determine if you need to scale up or down, you will find some links and resources at the end of this post.

For now, we can use a window function to calculate the moving average over the last minute

Aside from the moving average, the query will also return the number of data points used to calculate the average. This is useful when determine if scale up or down action should really happen. If fact, just after a change of Service Level, say from 2vCores to 4vCores, the used DMV will be completely empty. So for one minute you will have an average calculated on way less data that what you want…and this could lead you to make wrong assumption about the real resource usage. Due to application reconnecting to the database, the first seconds could be even without load…and you should not take any action (for example scaling down) based on that incomplete set of data. For this reason, the sample solution will ignore all actions suggested by a dataset that doesn’t have a least a minute of data.

If you want to calculate the moving average over a longer time duration, to smooth out very short spikes of workload that could trigger unwanted scaling actions, you can just increase the number of data points, keeping in mind that each data point represent 15 seconds of workload. This is the code you want to look at:

As you can see, it now takes the current data point plus the 4 earlier data points (as data is ordered in descending order by time).

In case you need more than one hour, you can use the sys.resource_stats view, where data is stored, with a 5-minute granularity, for 14 days.

Take a look at OVER Clause documentation to understand what other options you have and how it works.

Autoscaling up or down

Now that we have a simple algorithm that will tell us when scale up is needed or when scale down is recommended, we just have to implement a monitoring job that will get such data and kick off the scaling operation.

This can be done using Azure Functions, as they support Timer Triggers, that will allow us to exactly do what we want.

Every 15 seconds the Azure Function will wake up, and will get the current performance data from an Azure SQL Hyperscale database, along with the 1-minute moving average. If the moving average is within some defined boundaries it will go back to sleep. Otherwise a scaling action will be initiated. Here’s an example of such Azure Function in action, logging into dbo.Autoscaler table so that it is possible to check what happened and why:

As visible in the picture above, in red box (box “1”), around 22:12 GMT a workload spike began, bringing Azure SQL Hyperscale CPU usage close to 80%. After a minute of this sustained workload, the moving average went over the 70% boundary and thus the autoscaler started to request a bump in the Service Objective, asking for a scale up to 4 vcores.

The scale up happened within 30 seconds and then autoscaler waited for the warmup of the new Service Objective to finish (the NULL values in the RequestedSLO column), as during warmup too few data points were available to make any decision. Once the number of data points was 5 again, it started to evaluate the workload and found that 4 cores were enough to handle the existing workload. In fact, in the yellow box (box “2”), you can see that CPU usage is half of the value existed before, when the workload spike started.

One thing that is important to keep in mind, is that every time that scale up or down happens, the local RPBEX cache will start fresh. If very critical workload this can have an impact on initial performances, as data needs to be pulled from page servers into the local compute node cache. To help the cache to be warm as fast as possible you may want execute some specific query as part of the autoscaler solution, just to make sure the most-used data is in the local cache. This is not a requirement and really depends on the workload you have, but is some cases can really help a lot to boost the performances after the new Service Level Objective is ready to be used, so keep this option in mind if you see slower than expected performances just after scale up or down happens.

At a later point, around 22:20 GMT the workload spike ends and in a minute the autoscaler detects that 4 vCores are now even too much for the current workload and initiates a scale-down to 2 cores (green box, number “3”).

Simulating workload

To simulate a simple workload you can use powerful load testing tools like Locust, K6 or Apache JMeter where you can simulate even the most complex and realistic workload. To create a simple workload, a nice tool originally written by Adam Machanic is also available: SQL Query Stress.

Albeit quite old now, it allows you to execute a query using the desired amount of parallel threads, so that you can create some workload in the database. It is very easy to use, configurable enough and is great to quickly test the autoscaling sample. In fact, a .json configuration file is also included in autoscaler’s GitHub repo, so you can get started right away.

Source Code

Sample source code is available here. If you want to contribute, maybe adding some of the improvements described below, please feel free to do so. I’ll be more than happy to accept contributions to this repo.

Improvements

Using the moving average to figure out if Azure SQL Hyperscale tier should be scaled up or down can be a little bit too simplistic if you have a scenario where workload can vary with very complex patterns. It works well when you have a situation where the workload spike endures for a consistent amount of time, and if your workload fits that use case, great! Just keep in mind that in the sample we’re only monitoring CPU usage, but there are also data and log IO usage that could be monitored. As you can guess, what you want to monitor, and how you want to use that data to activate the scale up or down, is really up to you and very specific to your workload scenario.

As mentioned already, this is just a sample, and it should be used a starting point to create something there fits your own use case. But that’s the only improvement you can think of.

In fact, the current sample just reacts to an increased or decreased workload…which means that it will starts to take an action when the workload has already started to change. If the workload spike is huge, the allocated resources may not be able to satisfy it within the desired performance SLA you must guarantee, as scaling will happen as a consequence of that workload change, and sometimes even causing a larger impact due to the necessary failover during the spike.

Another approach would be to try to be proactive and predict the workload so that Azure SQL Hyperscale tier could be changed before the spike happens, and thus guaranteeing always the best performance possible.

Machine learning is an option in this case, of course, as well other existing prediction algorithms. A nice blog post on how Netflix approaches to this problem is available on their blog: Predictive Auto Scaling Engine.

More in general the problems fall under the umbrella of “Demand Forecasting” and there are already some cool algorithms in Azure Machine Learning services:

Accurate Forecast with Machine Learning

Demand Forecasting

If you want even more options this blog post is really amazing: Time Series in Python, Exponential Smoothing and ARIMA.

Azure Machine learning can be easily integrated with Azure Functions, and you can really create spectacular solution using Azure SQL Hyperscale, Azure Machine Learning and Azure Functions.

Alternatives

There are two alternatives to the proposed sample. The first is to use Azure Alerts, where you can do pretty much the same as described here, but without writing code. Just be aware that the telemetry log available to Azure Alerts have some latency – usually between 2 and 5 minutes – which means that response to workload spike will be not as fast as polling the database DMV as the sample does. If you don’t need to respond to workload spikes in less than 5 minutes, Azure Alerts can be a great, simpler, alternative.

Another “sort of” alternative is to use Azure SQL Serverless database, which incorporates scale up and down natively.

I said “sort of” because the Serverless option is not available to Azure SQL Hyperscale at the moment so it is not really an alternative, but if you are fine using the General Purpose service tier, you’ll get autoscaling right away, just by configuring it.

Conclusion

Autoscaling a service is a challenging job, especially if workload is not easy predictable. But thanks to Azure SQL Hyperscale elasticity and quick scalability this can be done using an Azure Function, with minimal service interruption, especially if following the best practices and implementing correct connection retry logic. Remember also to use the latest .NET driver as they already implement some reconnection logic that in several common cases will make the reconnection completely transparent.