[EventHub] Types of Throttling Errors and How to Mitigate Them


Are you getting a 50002 or 50008 error return code from your EventHub? If that's the case, you are in the right place.

In this post, we will go through what these two return codes mean and how to prevent your EventHub from being throttled.

 

 

[50002 Error - ServerBusyException]

Pre-requisite:

Are you getting a 50002 error from your EventHub? Have you checked whether your throughput is appropriately configured and your load is evenly distributed across all partitions? If not, keep scrolling, as you may find the answer to your problem.

 

I. What is a Throughput Unit?

The throughput capacity of Event Hubs is controlled by throughput units (TUs). If your traffic goes beyond the purchased TU limit, the EventHub is throttled and a ServerBusyException is returned. For more detailed information, please visit aka.ms/event-hubs-scalability.

 

II. What is 50002 Error and throttling request?

A 50002 error occurs when EventHub indicates that the server is overloaded and returns a ServerBusyException.

There are two common reasons for getting a 50002 error. One is that your traffic has surpassed your TU capacity; in that case, you may need to increase the TUs accordingly.

Another reason could be that your load is not being distributed evenly across all partitions, causing an overload on one or more partitions.
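The real remedies are covered in the next section (raising TUs or rebalancing the load), but on the client side it is also common to retry throttled sends with a backoff while those fixes are being applied. Below is a minimal, illustrative sketch in Scala against the legacy com.microsoft.azure.eventhubs Java SDK; it assumes you already have a connected EventHubClient and an EventData payload, and the attempt count and sleep times are arbitrary placeholders rather than the SDK's built-in retry policy.

```scala
import com.microsoft.azure.eventhubs.{EventData, EventHubClient, ServerBusyException}

// Minimal retry-with-backoff sketch (not the SDK's official retry policy).
// `client` is an already-connected EventHubClient; `event` is the payload to send.
def sendWithBackoff(client: EventHubClient, event: EventData, maxAttempts: Int = 5): Unit = {
  var attempt = 0
  var sent = false
  while (!sent && attempt < maxAttempts) {
    try {
      client.sendSync(event) // throws ServerBusyException when the namespace throttles (50002)
      sent = true
    } catch {
      case _: ServerBusyException =>
        attempt += 1
        // Back off before retrying so the namespace has time to recover.
        Thread.sleep(1000L * attempt)
    }
  }
  if (!sent)
    throw new IllegalStateException(s"Send still throttled after $maxAttempts attempts")
}
```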

 

III. What are the resolutions and how do you apply them?

 

How can you increase your TU?

To increase the throughput units (TUs) on your Azure Event Hubs namespace, you can either configure them on the Scale or Overview page of the namespace in the Azure portal, or use the Auto-inflate feature.

 

Auto-inflate automatically scales your namespace up by increasing the number of TUs to meet usage needs.

Note that auto-inflate can only increase the count up to 20 TUs. To raise it to exactly 40 TUs, you need to submit a support request to us.

For more information visit aka.ms/auto-inflate.

 

How to distribute even load to all of your partitions?

You can revise your partition distribution strategy, or try sending events without specifying a partition key via the EventHubClient.Send operation so that the service distributes them across partitions for you.
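For example, with the legacy com.microsoft.azure.eventhubs Java SDK, sending events without a partition key (and without a PartitionSender) lets the service spread them across partitions. The sketch below is illustrative only; the namespace, hub name, and key are placeholders, and the exact factory methods may differ slightly between SDK versions.

```scala
import java.nio.charset.StandardCharsets
import java.util.concurrent.Executors
import com.microsoft.azure.eventhubs.{ConnectionStringBuilder, EventData, EventHubClient}

val executor = Executors.newScheduledThreadPool(2)

// Placeholder connection details - replace with your own namespace, hub, and key.
val connStr = new ConnectionStringBuilder()
  .setNamespaceName("my-namespace")
  .setEventHubName("my-eventhub")
  .setSasKeyName("RootManageSharedAccessKey")
  .setSasKey("<sas-key>")
  .toString

val client = EventHubClient.createFromConnectionStringSync(connStr, executor)

// No partition key is set on the event, so the service distributes events
// across partitions instead of piling them onto a single partition.
val event = EventData.create("hello event hubs".getBytes(StandardCharsets.UTF_8))
client.sendSync(event)

client.closeSync()
executor.shutdown()
```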

 

IV. In conclusion

You may need to increase your TUs or check whether your load is being distributed evenly. By doing so, you will be able to mitigate the 50002 throttling errors occurring on your EventHub. However, if you need any additional support, do not hesitate to contact us.

 

_____________________________________________________________________________________________________________________________

 

[50008 Error - Too many GetEntityRuntimeInfo]

Pre-requisite:

Are you getting a 50008 error from EventHub while using Databricks? Are you using the EventHub Spark SDK? If so, this document might help you resolve the issue.

 

I. What is GetEntityRuntimeInfo?

GetEntityRuntimeInfo is an operation used to retrieve information about the entity in order to read or send messages on the Databricks side via the Spark SDK. This operation is hard-coded in the Azure EventHub Spark SDK. [Line 109]

 

Unfortunately, this information cannot be cached, because the Spark driver/executors can change over time and all of the information is RDD (Resilient Distributed Dataset) based, i.e., held in memory.

 

Because the information is not cached, the driver calls the runtime-info operation on every batching interval; the default interval is 500 ms. As a result, the operation has to be called each time events are fetched on the Databricks side.

 

II. What is 50008 error and throttling request?

A 50008 error occurs when GetEntityRuntimeInfo is called more than 50 times per second, which results in the requests being throttled. Therefore, limiting this operation is crucial if you are getting such error messages.
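To see how quickly that limit can be reached, consider a rough illustration (the exact call pattern depends on the connector version): if each micro-batch issues one runtime-information call per partition, a 32-partition event hub polled at the default 500 ms interval would generate roughly 32 × 2 = 64 calls per second, already above the 50-calls-per-second limit.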

Below is an example of the error message you may see when this issue occurs.

[Screenshot: example 50008 throttling error message]

 

III. What is the resolution and how do you limit the requests?

Make sure the trigger option is enabled on the Spark client side in order to increase the interval between GetEntityRuntimeInfo calls. As mentioned earlier, the default is 500 ms, so calling this operation too often may put unnecessary pressure on your system. Also make sure that the container running the code for your namespace is performing well, since this issue can also happen due to low CPU, as the documentation states.
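As an illustration, here is a minimal Structured Streaming sketch in Scala using the azure-eventhubs-spark connector, with a processing-time trigger that lengthens the interval between micro-batches and therefore between the runtime-information calls. The connection string, paths, sink format, and 60-second interval are placeholders; pick a trigger interval that matches your latency requirements.

```scala
import org.apache.spark.eventhubs.{ConnectionStringBuilder, EventHubsConf}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.Trigger

val spark = SparkSession.builder.getOrCreate()

// Placeholder connection details - replace with your own.
val connectionString = ConnectionStringBuilder("<event-hubs-connection-string>")
  .setEventHubName("my-eventhub")
  .build

// Read from Event Hubs as a streaming source.
val df = spark.readStream
  .format("eventhubs")
  .options(EventHubsConf(connectionString).toMap)
  .load()

// A 60-second processing-time trigger starts a micro-batch once per minute
// instead of the 500 ms default mentioned above, which reduces how often the
// driver calls the runtime-information operation against the namespace.
val query = df.writeStream
  .format("parquet")
  .option("checkpointLocation", "/tmp/checkpoints/eventhubs-demo")
  .option("path", "/tmp/output/eventhubs-demo")
  .trigger(Trigger.ProcessingTime("60 seconds"))
  .start()
```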

 

IV. In conclusion

Adding the trigger option can mitigate the situation, especially if you are using numerous consumers and partitions in your EventHub. However, if the throttling still occurs, please contact us to resolve your issue.
