Announcement – Target-Based Scaling Support in Azure Logic Apps Standard

This post has been republished via RSS; it originally appeared at: New blog articles in Microsoft Community Hub.

In collaboration with Rohitha Hewawasam, Nikhil Rao Sira and Wagner Silveira

With Azure Logic Apps Standard, customers can seamlessly build and execute their workflows in the cloud, enjoying the flexibility to select their preferred compute resources and configure their applications to dynamically scale in response to varying workload demands.

Beginning in January 2024, we will be rolling out an update to refine the underlying dynamic scaling mechanism, resulting in faster scale-out and scale-in times. This improvement empowers customers to achieve heightened throughput and reduced latency for their fluctuating workloads running on Logic Apps Standard.

In this post, we take a closer look at the new scaling mechanism, target-based scaling, and explain how it helps manage application performance more efficiently under asynchronous burst loads.

What is target-based scaling?

Target-based scaling is a new feature in Azure Logic Apps Standard that gives customers a fast and intuitive scaling model. It replaces the current incremental scaling model as the default for applications.

To understand the benefits of target-based scaling, let’s look at how Azure Logic Apps Standard scales at runtime:

The block diagram above shows the components involved in the scaling process for Azure Logic Apps. With incremental scaling, the Scale Monitor in the Logic App Host votes to scale out, scale in, or keep the current number of instances for the app, based on the workflow job queue execution delays.[1] With target-based scaling, the Scale Monitor instead calculates the number of instances needed to process the jobs across the job queue(s) and returns this number to the controllers, which make the scale decision. The sequence diagram below shows how the components in the block diagram interact to perform target-based scaling:

Azure Functions Host Controller obtains the desired number of instances from the Logic App Scale Monitor, with which the compute demand is determined. This information is then passed to the Scale Controller, which makes the final decision on whether to scale out or scale in, and how many instances should be added or removed. Instance Allocator allocates/deallocates the required number of instances for the Logic App.

With the current incremental scaling method, the Instance Allocator adds or removes at most one worker at each new instance rate. The method also relies on complex decision-making to determine when to scale based on job queue latency, which is not an ideal signal for scale-in and may leave apps in an over-scaled state. In contrast, target-based scaling allows faster scaling, up to four instances at a time, and the scaling decision is based on a simple equation built from the three values described below:

Job queue length is calculated by the Azure Logic App runtime extension. When multiple storage accounts are configured, a sum across the job queues is taken.
Target scaling factor is a numerical value between 0.05 and 1.0 that determines how aggressively the app scales. A higher target scaling factor results in more aggressive scaling, while a lower target scaling factor results in more conservative scaling. Customers can fine-tune the target scaling factor for each application by setting a Runtime.TargetScaler.TargetScalingFactor value in host.json, as in the following example:

{
  "version": "2.0",
  "extensionBundle": {
    "id": "Microsoft.Azure.Functions.ExtensionBundle.Workflows",
    "version": "[1.*, 2.0.0)"
  },
  "extensions": {
    "workflow": {
      "Settings": {
        "Runtime.TargetScaler.TargetScalingFactor": "0.62"
      }
    }
  }
}

When not set, the default value for TargetScalingFactor is 0.3.

Target executions per instance value is the maximum number of jobs a compute instance is expected to process at any given time. This value is calculated differently depending on whether the Azure Logic Apps Standard application uses the dynamic or the static concurrency execution mode. For static concurrency, the value is a fixed number that is set in host.json. For dynamic concurrency, the value is determined by the Logic App engine at runtime, which adjusts the number of dispatcher worker instances based on the nature of the workflow and its current job processing status.
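
Taken together, the three values above suggest a calculation of the following shape. This is a sketch reconstructed from the term descriptions, not a formula quoted from the runtime: in particular, rounding up to a whole instance is an assumption, and the numbers in the worked examples (a queue of 1,000 jobs and 50 target executions per instance) are purely illustrative.

```latex
% Assumed form of the scaling equation (ceiling is an assumption)
\[
\text{desired instances} =
  \left\lceil
    \text{target scaling factor} \times
    \frac{\text{job queue length}}{\text{target executions per instance}}
  \right\rceil
\]

% Illustrative example with the default target scaling factor of 0.3
\[
\left\lceil 0.3 \times \frac{1000}{50} \right\rceil = \lceil 6 \rceil = 6
\]

% Same illustrative load with a target scaling factor of 0.62
\[
\left\lceil 0.62 \times \frac{1000}{50} \right\rceil = \lceil 12.4 \rceil = 13
\]
```

Under these assumptions, raising the target scaling factor from 0.3 to 0.62 roughly doubles the number of instances requested for the same queue length, which matches the description of a higher factor as more aggressive scaling.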

Dynamic Concurrency

In this configuration, the target executions per instance value is determined automatically by the Logic Apps runtime extension, using an equation based on the following values:

Job concurrency is the number of jobs being processed by a single worker at the time of sampling.
Actual CPU utilization is the processor utilization percentage of the worker instance at the time of sampling.
Target CPU utilization is the maximum processor utilization percentage that’s expected at target concurrency. Users can modify the target CPU utilization percentage with the Runtime.TargetScaler.TargetScalingCPU setting in host.json.

The default value for TargetScalingCPU is 70.
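
A host.json file that overrides the target CPU utilization might look like the sketch below. It follows the same structure as the TargetScalingFactor example earlier in this post. The value 76 is an illustrative assumption, not a recommendation, and writing it as a string mirrors the earlier example rather than a documented requirement.

```json
{
  "version": "2.0",
  "extensionBundle": {
    "id": "Microsoft.Azure.Functions.ExtensionBundle.Workflows",
    "version": "[1.*, 2.0.0)"
  },
  "extensions": {
    "workflow": {
      "Settings": {
        "Runtime.TargetScaler.TargetScalingCPU": "76"
      }
    }
  }
}
```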

Azure Logic Apps Standard’s dynamic scaling feature adapts to the nature of the tasks at hand. For example, compute-intensive workloads may limit the number of concurrent jobs per instance, while less compute-intensive tasks allow a higher number of concurrent jobs. When a mix of both types of tasks is being processed, the dynamic scaling feature automatically adjusts the level of concurrency based on the types of jobs currently being processed.

Azure Logic Apps Standard also supports a host-level static configuration model. For scenarios where the dynamic concurrency feature is not suitable for specific workload needs, static concurrency can be configured to override dynamic scaling.
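
The definitions of job concurrency, actual CPU utilization, and target CPU utilization suggest a calculation of the following shape. This is a reconstruction from those descriptions, offered as an assumption rather than the runtime's exact formula, and the sample numbers are illustrative only.

```latex
% Assumed form of the dynamic concurrency calculation
\[
\text{target executions per instance} =
  \text{job concurrency} \times
  \frac{\text{target CPU utilization}}{\text{actual CPU utilization}}
\]

% Light workload: 20 concurrent jobs at 35% CPU, default target of 70%
\[
20 \times \frac{70}{35} = 40
\]

% Compute-intensive workload: 20 concurrent jobs at 90% CPU, same target
\[
20 \times \frac{70}{90} \approx 15.6
\]
```

Under this assumed form, a worker running below the target CPU utilization is expected to take on more jobs, while a worker running above it is expected to take on fewer, which is consistent with the behavior described for compute-intensive and lighter workloads.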

Static Concurrency

In this configuration, the TargetConcurrency setting governs the target executions per instance value. Users can set a value for the targeted maximum concurrent job polling with the Runtime.TargetScaler.TargetConcurrency setting in host.json.

The default value for TargetConcurrency is null, meaning that the app will default to using dynamic concurrency if the setting is not configured.
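
A host.json file that opts in to static concurrency might look like the sketch below, again following the structure of the TargetScalingFactor example earlier in this post. The value 280 is an illustrative assumption; a suitable value depends on the workload and would need to be found through load testing.

```json
{
  "version": "2.0",
  "extensionBundle": {
    "id": "Microsoft.Azure.Functions.ExtensionBundle.Workflows",
    "version": "[1.*, 2.0.0)"
  },
  "extensions": {
    "workflow": {
      "Settings": {
        "Runtime.TargetScaler.TargetConcurrency": "280"
      }
    }
  }
}
```

Removing the setting returns the app to dynamic concurrency, since the default value is null.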

While configuring static concurrency gives users control over the scaling behavior of their apps, it can be difficult to determine the optimal value for the TargetConcurrency setting. Generally, the user must arrive at an acceptable value through trial and error, by load testing their Logic App workflows. Even if a value that works for a particular load profile is found, the number of incoming trigger requests may change from day to day. This variability may cause the Logic App to run with a suboptimal scaling configuration.

Ideally, the system should allow instances to process as much work as they can while keeping each instance healthy and latencies low, which is what dynamic concurrency is designed to achieve.

Opting Out

Target-based scaling is enabled by default for Logic Apps hosted on a Standard plan. To disable target-based scaling and fall back to incremental scaling, add the following app setting to your logic app:

App Setting	Value
TARGET_BASED_SCALING_ENABLED	0

Considerations

The following considerations apply when using target-based scaling:

Target-based scaling isn't supported for logic apps running on an App Service Environment (ASE) or on the Consumption plan.
Your logic app's Functions runtime version must be 4.3.0 or later.
Your logic app's workflow runtime version must be 1.55.1 or later.
When there are scale-in requests without any scale-out requests, the maximum scale-in value is used. Note that target-based scaling can bring down unused instances faster, resulting in more efficient resource usage.

To learn about the performance benchmark results of target-based scaling, compared to incremental scaling, refer to the following article: Logic Apps Standard Target-Based Scaling Performance Benchmark — Burst Workloads.

[1] Workflow Job Execution Delay - at runtime, workflow actions are divided into individual jobs and placed in a queue for execution. Dispatchers regularly poll the job queue to retrieve and execute these jobs (for more details, refer to this blog post on how the runtime schedules and runs jobs). However, if there is insufficient compute capacity available to pick up these jobs, they remain in the queue for longer, resulting in increased execution delays. The scaler monitors these delays and makes scaling decisions to keep them under control.
