How Azure Front Door cache can help protect against DDoS attacks

This post has been republished via RSS; it originally appeared at: New blog articles in Microsoft Community Hub.

Recently at work I have been helping customers protect websites that have been impacted by DDoS attacks, specifically layer 7 application attacks, which take a website offline by overwhelming it with HTTP requests. These types of attacks are relatively easy for attackers to automate and execute via bot networks, and are particularly effective against web services that use older web frameworks and content management systems. A DDoS attack can completely disable a website that is not adequately prepared.

The good news is that cloud computing platforms like Microsoft Azure offer global services such as Azure Front Door that help protect against DDoS attacks by providing several layers of defense that reduce the impact of an attack and deter attackers.

Layer 7 application attacks

The three main types of DDoS attack are volumetric, protocol, and application (or resource) attacks. In this post I explain how caching can improve an application’s resilience to layer 7 application attacks.

A resource attack is effected by sending as many (legitimate) requests to a service as possible in an attempt to overwhelm it - a so-called “denial of service” (DoS) attack. When an attack is distributed using a sophisticated bot-network, the attack is known as a “distributed denial of service attack” (DDoS). The objective of a DDoS attack is often to take a valuable web asset offline (or threaten to) and then extort a ransom¹.

Unlike traditional layer 3/4 (volumetric and protocol) attacks, resource attacks contain well-formed requests that conform to protocols and look legitimate. This is so that attack traffic can circumvent protocol blocking and web application firewall controls. Website and application owners can make their applications more resilient to attack by combining several layers of defense, including the DDoS protection features provided by Azure Front Door: capacity absorption, protocol blocking, caching, and a web application firewall (WAF) with bot protection.

Caching

Caching is one of the simplest ways to reduce attack traffic as well as improve performance and resilience, and reduce costs. Azure Front Door Cache enables an application to scale to handle more load and absorb attack traffic if required. So how does it work?

Caching static assets

Caching static web assets (images, scripts, stylesheets, videos) in a content delivery network (CDN) is a good example of a caching strategy to improve performance and scalability. Static assets are often cached for days. These assets can be immutable, or can be purged at release time.

Caching dynamic responses

It is also good practice to cache resources that are dynamic but "slow moving". For example, the homepage of a website may be dynamically generated by a content management system, but the content returned to a user does not need to change often, and certainly not on every request. In this case, responses to requests for homepage resources can be cached for a short time, usually seconds or minutes.

Let's imagine an attacker is sending 30 million requests per minute to the home page of your website via a bot network. With no mitigations in place the origin server will need to be able to scale to serve five hundred thousand requests per second (500K RPS), which is a lot. However, if the homepage response is cached for 1 minute on Front Door, only a few hundred of those 30 million requests will make it to the origin server². For scenarios where an expiry time of one minute is too long, it’s worth considering a TTL of even 1-2 seconds as this can still dramatically reduce traffic.
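The arithmetic behind that scenario can be sketched in a few lines of Python. The number of edge caches reached by the attack (200 here) is a hypothetical figure chosen only for illustration, and the model assumes each edge cache fetches from the origin at most once per TTL, which is a simplification of how Front Door actually behaves.

```python
def requests_per_second(requests_per_minute: int) -> float:
    """Convert a per-minute request rate into requests per second."""
    return requests_per_minute / 60


def max_origin_requests(window_seconds: int, ttl_seconds: int, edge_caches: int) -> int:
    """Rough upper bound on origin fetches during the window, assuming
    each edge cache refreshes the response at most once per TTL."""
    refreshes_per_cache = -(-window_seconds // ttl_seconds)  # ceiling division
    return refreshes_per_cache * edge_caches


ATTACK_REQUESTS_PER_MINUTE = 30_000_000
EDGE_CACHES = 200  # hypothetical value, not a Front Door figure

print(requests_per_second(ATTACK_REQUESTS_PER_MINUTE))  # 500000.0
print(max_origin_requests(60, 60, EDGE_CACHES))         # 200
print(max_origin_requests(60, 2, EDGE_CACHES))          # 6000
```

Even with a 2 second TTL, this rough model lets through about 6,000 of the 30 million requests in a minute, which is why very short expiry times are still worth having.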

Time to live

How long a resource should be cached for is known as TTL (time to live) or expiry time. Expiry times should be determined by the business; some teams will define the expiry of resources as a non-functional requirement during sprint planning, or as a definition of done. A good question to ask when determining the expiry time of a resource is "how soon would a user expect this content to be updated?"

Most anonymous pages (those that don't require the user to log in) are good candidates for caching with a short TTL. For example:

Home page
About us
Contact us
Product catalogues
Slow moving reference data (country codes for example)

Not all pages, APIs and resources are suitable for caching. Examples include requests that use the POST, PATCH, PUT and DELETE verbs, and any data where real-time consistency is required. To take full advantage of caching, developers should follow good web design practices and design sites to be cacheable.
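One way to make these decisions explicit is a small policy function at the origin that picks a cache-control value per request. The sketch below is only an illustration: the paths, the TTL values and the function name are hypothetical, and defaulting unlisted paths to no-store is a design choice, not something Front Door requires.

```python
CACHEABLE_METHODS = {"GET", "HEAD"}

# Hypothetical paths and expiry times; real values should come from the business.
TTL_SECONDS_BY_PATH = {
    "/": 60,
    "/about": 300,
    "/contact": 300,
    "/products": 120,
    "/api/countries": 3600,
}


def cache_control_for(method: str, path: str, authenticated: bool) -> str:
    """Return a cache-control header value for a request."""
    # Writes and personalised responses should never be stored in a shared cache.
    if method.upper() not in CACHEABLE_METHODS or authenticated:
        return "no-store"
    ttl = TTL_SECONDS_BY_PATH.get(path)
    if ttl is None:
        # Unlisted paths default to "do not cache" in this sketch.
        return "no-store"
    return f"public, max-age={ttl}"


print(cache_control_for("GET", "/", authenticated=False))       # public, max-age=60
print(cache_control_for("POST", "/contact", authenticated=False))  # no-store
```

Keeping the policy in one place makes it easier to review expiry times with the business and to spot pages that have no directive at all.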

Cache-control

The beauty of Front Door caching is that it does not require any configuration apart from turning it on. Once enabled, Front Door Cache will simply honor any cache-control header that is returned by the origin service. If there is no header, Front Door will cache the response for between one and three days. However, if the cache-control header contains no-cache, private, or no-store, then Front Door will never cache the response, even if caching is enabled on the route. Either of these scenarios (no directive or a directive not to cache) can have unintended consequences, so it is important that origins send the correct directives. For more information on caching behaviors, see Caching with Azure Front Door.
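The three outcomes described above can be summarised as a small decision function. This is a simplified model of the behavior as described in this post, written to make the rules easy to reason about; it is not Front Door's implementation, and the function name is made up for the example.

```python
from typing import Optional

# Directives that, per the description above, stop the response being cached.
NEVER_CACHE = {"no-cache", "private", "no-store"}


def describe_edge_caching(cache_control: Optional[str]) -> str:
    """Describe what the edge cache is expected to do with a response,
    given the cache-control header value returned by the origin."""
    if cache_control is None or not cache_control.strip():
        return "cached for a default period (one to three days)"
    # Keep only the directive names, e.g. "max-age=60" becomes "max-age".
    names = {
        part.split("=", 1)[0].strip().lower()
        for part in cache_control.split(",")
    }
    if names & NEVER_CACHE:
        return "not cached"
    return "cached according to the origin's header"


print(describe_edge_caching(None))                  # cached for a default period (one to three days)
print(describe_edge_caching("private, max-age=60")) # not cached
print(describe_edge_caching("public, max-age=60"))  # cached according to the origin's header
```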

Checking cache headers

Watch a video of how to check cache headers on YouTube.

You can check to see if cache directive headers are present using curl. In my example I have an origin that is behind an Azure Front Door.

Requests to the Front Door endpoint are routed to an origin group by a routing rule, with caching enabled. The origin group is an Azure App Service with access restrictions in place to only accept traffic from Azure Front Door, or my developer workstation.

First, let's use curl to view the response headers from the origin App Service:

Notice there is no cache-control header in this response. Now let's try the Front Door URL, twice:

The first response has x-cache: TCP_MISS
The second response has x-cache: TCP_HIT
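If you would rather script this check than run curl by hand, a rough Python equivalent using only the standard library might look like the following. The endpoint in the usage comment is a placeholder rather than the one used in this post, and the function names are made up for the example.

```python
import urllib.request
from typing import Mapping, Optional


def cache_summary(headers: Mapping[str, str]) -> dict:
    """Pick out the two headers of interest, ignoring header-name case."""
    lowered = {name.lower(): value for name, value in headers.items()}
    return {
        "cache-control": lowered.get("cache-control"),
        "x-cache": lowered.get("x-cache"),
    }


def fetch_cache_summary(url: str) -> dict:
    """Send a GET request and summarise the caching headers in the response."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return cache_summary(dict(response.headers.items()))


# Usage (placeholder URL, replace with your own Front Door endpoint):
#   print(fetch_cache_summary("https://example.azurefd.net/"))
#   print(fetch_cache_summary("https://example.azurefd.net/"))
```

Calling it twice in a row against a cached route should show the x-cache value change in the same way as the two curl requests.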

Because the origin service did not return a cache directive, Front Door will cache this response for up to 3 days! What would be better is for the origin to determine how long the response should be cached for. Let's make a quick change to the code:

The new line 10 adds a hard-coded cache-control header value of 60 seconds. Let's publish the webapp, purge the Front Door cache, and try again:

The first request misses the cache.
The second request, 11 seconds later, is served from Front Door cache.
The third request, 72 seconds after the first, misses the cache again because the cache item has expired.
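The miss, hit, miss pattern follows directly from the 60 second expiry. Here is a minimal simulation of a single cache entry, reusing the x-cache values shown earlier; it deliberately ignores the fact that Front Door has many PoPs and several cache layers, and the class name is made up for the example.

```python
class TtlCacheEntry:
    """A single cached response that expires ttl_seconds after it is stored."""

    def __init__(self, ttl_seconds: int) -> None:
        self.ttl_seconds = ttl_seconds
        self.expires_at = None  # nothing cached yet

    def request(self, now: int) -> str:
        """Return the x-cache style result for a request made at time `now`."""
        if self.expires_at is not None and now < self.expires_at:
            return "TCP_HIT"
        # Entry is missing or expired: fetch from origin and store it again.
        self.expires_at = now + self.ttl_seconds
        return "TCP_MISS"


entry = TtlCacheEntry(ttl_seconds=60)
for seconds_after_first in (0, 11, 72):
    print(seconds_after_first, entry.request(seconds_after_first))
# 0 TCP_MISS
# 11 TCP_HIT
# 72 TCP_MISS
```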

The origin App Service is now directing the Front Door Cache to cache responses to homepage requests (only) for 60 seconds, improving performance, and reducing the ability of an attacker to overwhelm the service.
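The web app code used in this post is not reproduced here, so as a stand-in, here is a minimal sketch of the same idea as a Python WSGI application. The header value "public, max-age=60" is an assumption about what a 60 second directive could look like, and the page bodies are placeholders.

```python
def app(environ, start_response):
    """Minimal WSGI app that marks only the homepage as cacheable for 60 seconds."""
    path = environ.get("PATH_INFO", "/")
    method = environ.get("REQUEST_METHOD", "GET")
    headers = [("Content-Type", "text/html; charset=utf-8")]

    if path == "/" and method == "GET":
        # Assumed header value: lets shared caches keep the homepage for 60 seconds.
        headers.append(("Cache-Control", "public, max-age=60"))
        body = b"<h1>Home</h1>"
    else:
        # Other pages send no directive in this sketch; a real app should set one.
        body = b"<h1>Other page</h1>"

    start_response("200 OK", headers)
    return [body]
```

To try it locally, the app can be served with `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` and inspected with curl in the same way as above.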

Monitoring the cache

Finally, the performance of the cache can be monitored in the Azure Portal using Azure Front Door reports, or by writing your own Kusto queries in Front Door Logs (Azure Monitor).
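For a quick sanity check outside the portal, you can also compute a hit ratio yourself from x-cache values collected by your own test requests. The sketch below is plain arithmetic over values like the ones shown earlier; it does not query Front Door logs, and it treats any value other than TCP_HIT as not a hit.

```python
from collections import Counter
from typing import Iterable


def cache_hit_ratio(x_cache_values: Iterable[str]) -> float:
    """Fraction of responses whose x-cache value was TCP_HIT."""
    counts = Counter(value.strip().upper() for value in x_cache_values)
    total = sum(counts.values())
    if total == 0:
        return 0.0  # no observations yet
    return counts["TCP_HIT"] / total


observed = ["TCP_MISS", "TCP_HIT", "TCP_HIT", "TCP_MISS"]
print(cache_hit_ratio(observed))  # 0.5
```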

Conclusion

In this article I have covered caching, just one of the DDoS mitigations available using Azure Front Door and other Azure services. In future articles I will also cover:

Enabling Azure Front Door Premium WAF Bot protection.
Rate limiting (per IP) using Azure Front Door WAF custom rules.
Redirecting all authentication requests to Azure AD B2C.
Scaling App Services to meet attack load using auto-scale rules.

Comments and questions are welcomed in the comments section below. You are also welcome to join an Ask FTA session if you would like to discuss specific questions on Azure Front Door caching with FastTrack for Azure (FTA) customer engineers. See FastTrack for Azure Live to register.

¹ Resource/application attacks may be combined with volumetric and protocol attacks. DDoS attacks may also be used as distractions from other attacks.

² PoP (point of presence) edge caches are eventually consistent. Each PoP may make a request to origin. If an attack were distributed widely enough to hit every one of the Azure Front Door PoP nodes (which is highly unlikely), then each of those nodes may make a request to origin. However, this is not always the case as Front Door has multiple layers of caches.