This post has been republished via RSS; it originally appeared at: Azure Compute Blog articles.
Microsoft Azure Compute is pleased to announce the introduction of new throttling policies for compute API requests. These throttling policies are designed to reduce the throttling experienced by customers on Azure Virtual Machine and Virtual Machine Scale Set API requests. The new throttling policies offer several benefits:
- They replace the existing multiple throttling time windows with a single 1-minute window. This ensures faster retries, shorter lockout periods, and a uniform 1-minute measure for all throttling limits.
- No single resource can use up all the limits under a subscription, because the new policies define limits at the resource level.
- The new policies introduce a token bucket algorithm, which provides an additional buffer for customers making a high number of API requests.
No changes are required on the customer's end to use the new throttling policies. To learn more about Azure compute throttling and how it works for the Compute Resource Provider, please see Request limits and throttling - Azure Resource Manager | Microsoft Learn.
How will the new throttling policies work?
The new throttling policies implement a limit on the number of API requests that can be made per resource, as well as a maximum throttling limit for a subscription per minute per region. If the number of API requests exceeds the per-resource limit or the subscription limit per region, the requests are throttled. A token bucket algorithm is used to enforce these limits: the number of API requests that can be made in a given minute is determined by the number of tokens in the bucket at that time. The bucket is replenished every minute with a fixed number of tokens, called the Bucket Refill Rate, for each resource and for each subscription per region.
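The mechanics described above can be sketched as a small class. This is a minimal illustration of the general token bucket technique, not Azure's implementation; the field names mirror the terms used in this post.

```python
from dataclasses import dataclass

@dataclass
class TokenBucket:
    """Minimal token bucket sketch: refilled by a fixed amount each
    minute and capped at a maximum capacity. Illustrative only."""
    capacity: int      # Maximum Bucket Capacity
    refill_rate: int   # Bucket Refill Rate (tokens added per minute)
    tokens: int = 0

    def __post_init__(self):
        # The bucket starts full at the beginning of the window.
        self.tokens = self.capacity

    def refill(self):
        # Called once per minute: add tokens, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + self.refill_rate)

    def try_request(self) -> bool:
        # Each API request consumes one token; an empty bucket means
        # the request is throttled (HTTP 429).
        if self.tokens > 0:
            self.tokens -= 1
            return True
        return False
```

Because the bucket can hold more tokens than one minute's refill, a client that has been idle accumulates headroom for a short burst of requests, which is the "additional buffer" the new policies provide.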
Let us assume that the throttling policy for the UpdateVM API defines a Bucket Refill Rate of 4 tokens per minute and a Maximum Bucket Capacity of 12 tokens, and that a user makes a series of UpdateVM API requests against a Virtual Machine. In this example, the bucket contains 12 tokens at the start of the throttling window. At the 4th minute, the customer uses all 12 tokens, leaving the bucket empty. At the 5th minute, the bucket is refilled with 4 tokens per the Bucket Refill Rate, so 4 API requests can be made at the 5th minute while 1 API request ends up getting throttled.
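The worked example can be reproduced with a short simulation. The per-minute request counts below are an assumed pattern consistent with the example's narrative (no calls in minutes 1-3, 12 calls at minute 4, 5 calls at minute 5); the original post's table is not reproduced here.

```python
def simulate(pattern, capacity=12, refill_rate=4):
    """Simulate the UpdateVM example: Bucket Refill Rate = 4 tokens/min,
    Maximum Bucket Capacity = 12 tokens. Returns a per-minute log of
    (minute, served, throttled, tokens_left). Illustrative sketch only."""
    tokens = capacity  # bucket starts full
    log = []
    for minute, requests in enumerate(pattern, start=1):
        if minute > 1:
            # Replenish once per minute, never exceeding capacity.
            tokens = min(capacity, tokens + refill_rate)
        served = min(requests, tokens)
        tokens -= served
        log.append((minute, served, requests - served, tokens))
    return log

# Assumed request pattern: idle for 3 minutes, then 12 calls, then 5.
for minute, served, throttled, left in simulate([0, 0, 0, 12, 5]):
    print(f"minute {minute}: served={served}, throttled={throttled}, tokens left={left}")
```

Running this shows the bucket emptying at minute 4 and, at minute 5, only the 4 freshly refilled tokens being available, so the 5th request in that minute is throttled.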
Call rate informational response headers
The call rate informational headers provide information about the throttling policies that apply to an API request, including their limits. Multiple throttling policies can be applied to a single API request, so the headers return a combined list of all applicable throttling policies. The introduction of the new throttling policies does not affect the way call rate informational headers function, but the response will now contain updated throttling policy names. It is advised not to take a dependency on throttling policy names, as they are subject to change.
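For illustration, the remaining call counts surface in the `x-ms-ratelimit-remaining-resource` response header as a comma-separated list of `policy;count` pairs. A minimal sketch of reading that header follows; the policy names in the sample are illustrative only and, as noted above, should not be depended on.

```python
def parse_ratelimit_header(value: str) -> dict[str, int]:
    """Parse an x-ms-ratelimit-remaining-resource header value, which
    lists throttling policies and remaining calls as "policy;count"
    pairs. Sketch only; assumes well-formed input."""
    limits = {}
    for entry in value.split(","):
        # Split on the last ';' so policy names may themselves contain ';'.
        policy, _, remaining = entry.strip().rpartition(";")
        limits[policy] = int(remaining)
    return limits

# Sample header value with illustrative policy names.
sample = "Microsoft.Compute/HighCostGet3Min;159,Microsoft.Compute/HighCostGet30Min;1259"
print(parse_ratelimit_header(sample))
```

A client can log these remaining counts, or slow down proactively as they approach zero, without hard-coding any particular policy name.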
Rollout of new compute throttling policies
These changes will take effect over the next few months, starting August 2023, and will be rolled out region by region. Once the rollout is complete in a region, the throttling policies for Virtual Machines and Virtual Machine Scale Sets will automatically change to the new policies.
The new compute throttling policies will provide a better customer experience on Azure Virtual Machine and Virtual Machine Scale Set resources. If you have any questions or concerns about this change, please reach out to our support team for assistance. You may refer to Azure Support Request to raise an Azure support request.
Thank you for choosing Microsoft Azure!