Service Fabric Cluster Balancing

This post has been republished via RSS; it originally appeared at: New blog articles in Microsoft Tech Community.

Have you ever noticed that resource usage (CPU, memory, and so on) in your Azure Service Fabric cluster is uneven across nodes? For example, a few nodes show noticeably higher CPU or memory utilization while the remaining nodes are barely used.

Or have you wanted the Service Fabric cluster to rebalance your services so that resource usage is more even across all nodes? If so, the following details may give you some useful insights.

In this blog, I list a few of the Cluster Resource Manager and resource governance settings that you can configure to help the cluster balance better.

Warnings:

The following details help you set a few configurations to values that balance the cluster more evenly. They are meant for scenarios where the difference in resource usage between nodes is very high and is impacting the cluster. Try all of these configurations on a test cluster first and monitor the cluster. Based on the results, you can then make similar changes in higher environments.

Don't apply these changes if CPU and memory utilization don't differ significantly across nodes and the difference isn't impacting the cluster.

By default, the Cluster Resource Manager makes balancing decisions based on the following default metrics:

  • PrimaryCount - count of Primary replicas on the node
  • ReplicaCount - count of total stateful replicas on the node
  • Count - count of all service objects (stateless and stateful) on the node

Ref: https://docs.microsoft.com/en-us/azure/service-fabric/service-fabric-cluster-resource-manager-metrics#default-metrics
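To make the default metrics concrete, here is a minimal Python sketch that counts them for the service objects on a single node. The Replica class and the default_metrics function are hypothetical names for illustration only. They are not part of any Service Fabric SDK. Because all three default metrics are counts, two nodes can look equally balanced to the Cluster Resource Manager while their actual CPU and memory usage differs widely.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Replica:
    """Hypothetical view of one service object placed on a node."""
    service: str
    stateful: bool
    primary: bool = False  # only meaningful for stateful replicas


def default_metrics(objects: List[Replica]) -> Dict[str, int]:
    """Compute the three default metrics for the objects on a single node."""
    return {
        # Primary replicas of stateful services
        "PrimaryCount": sum(1 for r in objects if r.stateful and r.primary),
        # All stateful replicas (primaries and secondaries)
        "ReplicaCount": sum(1 for r in objects if r.stateful),
        # Every service object, stateless instances included
        "Count": len(objects),
    }


node_a = [
    Replica("orders", stateful=True, primary=True),
    Replica("orders", stateful=True),
    Replica("web", stateful=False),
]
print(default_metrics(node_a))  # {'PrimaryCount': 1, 'ReplicaCount': 2, 'Count': 3}
```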

These default metrics do the job as expected in most Service Fabric clusters and scenarios. However, in some scenarios or use cases, your cluster may not balance as expected with the default metrics alone.

In such cases, you can do the following to balance the cluster better:

  1. Configure custom metrics that represent your load more accurately, so that the Cluster Resource Manager includes them in its balancing decisions.

    For details on configuring custom metrics, see: https://docs.microsoft.com/en-us/azure/service-fabric/service-fabric-cluster-resource-manager-metrics#custom-metrics

    However, this requires you to configure your services with load values for these custom metrics: default loads, plus dynamically reported load if the load changes at runtime.

  2. You can also set the following configurations to appropriate values to tell the cluster when to trigger rebalancing and how to move services from one node to another when it rebalances:

I. Use resource governance to specify requests and limits for a service's CPU and memory usage on a node, using the RequestsOnly, LimitsOnly, or RequestsAndLimits specification. You specify these values in the application manifest (ApplicationManifest.xml), in the ServiceManifestImport section for the service package.

Request values represent the resource consumption that the Cluster Resource Manager considers when making placement decisions. Limit values represent the actual resource limits applied when a process or a container is activated on a node.

For scenarios and example values that illustrate each specification, see: https://docs.microsoft.com/en-us/azure/service-fabric/service-fabric-resource-governance
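The split between requests and limits can be sketched as a simplified Python model. GovernedService, can_place and over_limit are hypothetical names. The model assumes only what the paragraph above states: placement sums request values against node capacity, and limit values cap what a running process may use. It does not reproduce how Service Fabric actually enforces limits on processes or containers.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GovernedService:
    """Hypothetical service with governed CPU (cores) and memory (MB) values."""
    name: str
    cpu_request: float
    mem_request_mb: int
    cpu_limit: float
    mem_limit_mb: int


def can_place(node_cpu: float, node_mem_mb: int,
              placed: List[GovernedService], candidate: GovernedService) -> bool:
    """Placement view: only request values are summed against node capacity."""
    cpu = sum(s.cpu_request for s in placed) + candidate.cpu_request
    mem = sum(s.mem_request_mb for s in placed) + candidate.mem_request_mb
    return cpu <= node_cpu and mem <= node_mem_mb


def over_limit(service: GovernedService, cpu_used: float, mem_used_mb: int) -> bool:
    """Enforcement view: limit values cap what the running process may use."""
    return cpu_used > service.cpu_limit or mem_used_mb > service.mem_limit_mb


a = GovernedService("a", cpu_request=1.0, mem_request_mb=2048, cpu_limit=2.0, mem_limit_mb=4096)
b = GovernedService("b", cpu_request=2.0, mem_request_mb=4096, cpu_limit=3.0, mem_limit_mb=6144)

# Requests total 3 cores / 6144 MB on a 4-core, 8192 MB node, so b fits,
# even though the CPU limits of a and b add up to 5 cores.
print(can_place(4.0, 8192, [a], b))                   # True
print(over_limit(a, cpu_used=2.5, mem_used_mb=1024))  # True: above a's 2-core limit
```

In this model, setting requests close to what a service typically uses gives the Cluster Resource Manager a realistic picture for placement. Setting limits higher lets a service burst when spare capacity exists.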

II. If the default thresholds don't help, configure balancing thresholds at the cluster level to more appropriate values. These thresholds tell the cluster when it is imbalanced and needs to rebalance services.

A Balancing Threshold is the main control for triggering rebalancing. The Balancing Threshold for a metric is a ratio. If the load for a metric on the most loaded node divided by the amount of load on the least loaded node exceeds that metric's BalancingThreshold, then the cluster is imbalanced. As a result, balancing is triggered the next time the Cluster Resource Manager checks.

For more details, see: https://docs.microsoft.com/en-us/azure/service-fabric/service-fabric-cluster-resource-manager-balancing#balancing-thresholds
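The balancing threshold rule described above can be sketched in a few lines of Python. The is_imbalanced function and the node loads are hypothetical. The thresholds themselves are configured per metric in the cluster settings, as the linked article describes. This sketch only evaluates the ratio test and does not model when the Cluster Resource Manager runs its checks.

```python
from typing import Dict


def is_imbalanced(node_loads: Dict[str, float], balancing_threshold: float) -> bool:
    """True when (load on most loaded node) / (load on least loaded node)
    exceeds the metric's balancing threshold."""
    loads = node_loads.values()
    most, least = max(loads), min(loads)
    # Written as a multiplication rather than most / least so that a node
    # with zero load does not raise ZeroDivisionError. How Service Fabric
    # itself treats zero-load nodes is not modeled here.
    return most > balancing_threshold * least


cpu_load = {"Node0": 80.0, "Node1": 30.0, "Node2": 45.0}  # hypothetical loads
print(is_imbalanced(cpu_load, balancing_threshold=2.0))  # True: 80 / 30 is about 2.67
print(is_imbalanced(cpu_load, balancing_threshold=3.0))  # False
```

Lowering the threshold makes rebalancing trigger on smaller differences between nodes. Raising it tolerates more skew before services are moved.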

III. For most customers and scenarios, automatic detection of node capacities for CPU and memory is the recommended configuration, and automatic detection is turned on by default. However, if you want to customize node capacities, you can specify them in the NodeType section of the cluster manifest.

For more details, see: https://docs.microsoft.com/en-us/azure/service-fabric/service-fabric-resource-governance#cluster-setup-for-enabling-resource-governance

For more flexibility in specifying node capacities, you can specify either a node buffer or overbooking capacity. When a node buffer or overbooking capacity is specified for a metric, the Cluster Resource Manager tries to place or move replicas so that the buffer or overbooking capacity remains unused. It still allows that capacity to be used when necessary for actions that increase service availability, such as:

a. New replica placement or replacing failed replicas
b. Placement during upgrades
c. Fixing of soft and hard constraint violations
d. Defragmentation

For more details on node buffer and overbooking capacity, see: https://docs.microsoft.com/en-us/azure/service-fabric/service-fabric-cluster-resource-manager-cluster-description#node-buffer-and-overbooking-capacity
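As a rough illustration of the behavior described above, here is a simplified Python model. The usable_capacity and fits functions are hypothetical, and the buffer and overbooking values are expressed as fractions of node capacity. Ordinary placement and balancing stay below capacity minus the buffer. Availability-increasing actions (items a to d above) may use the full capacity plus any overbooking. The real Cluster Resource Manager applies this per metric and weighs it against other constraints, which this sketch ignores.

```python
def usable_capacity(capacity: float, buffer_pct: float = 0.0,
                    overbooking_pct: float = 0.0,
                    availability_action: bool = False) -> float:
    """Simplified model of how much of a node's capacity a placement may use.

    buffer_pct and overbooking_pct are fractions of capacity (0.1 means 10%).
    """
    if availability_action:
        # Availability-increasing actions (new or replacement replicas,
        # upgrades, constraint fixes, defragmentation) may use the buffer
        # and any overbooking capacity.
        return capacity * (1 + overbooking_pct)
    # Ordinary placement and balancing leave the buffer free and never overbook.
    return capacity * (1 - buffer_pct)


def fits(node_load: float, replica_load: float, capacity: float, **settings) -> bool:
    """Would adding replica_load keep the node within its usable capacity?"""
    return node_load + replica_load <= usable_capacity(capacity, **settings)


# Hypothetical node: capacity 100 units with a 10% buffer, already at 85.
print(fits(85, 10, 100, buffer_pct=0.1))                            # False: 95 > 90
print(fits(85, 10, 100, buffer_pct=0.1, availability_action=True))  # True: 95 <= 100
```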

IV. Set the DefaultMoveCost parameter for your services to appropriate values, for example in the service definition in the application manifest. This lets the Cluster Resource Manager decide which services to move to other nodes during rebalancing.

Move cost values help the Cluster Resource Manager find the solutions that cause the least disruption overall and are easiest to achieve while still arriving at equivalent cluster balance.

For details on how to specify move cost for your services, see: https://docs.microsoft.com/en-us/azure/service-fabric/service-fabric-cluster-resource-manager-movement-cost
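The idea in the move cost paragraph above can be shown with a deliberately small example. This is a hypothetical, greedy two-node illustration and not the Cluster Resource Manager's algorithm, which searches over many possible moves across the whole cluster. The numeric weights in MOVE_COST_WEIGHT are invented for the sketch. The linked article documents the actual move cost levels and their semantics.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical numeric weights for move cost levels, for illustration only.
MOVE_COST_WEIGHT = {"Zero": 0, "Low": 1, "Medium": 10, "High": 100}


@dataclass
class Candidate:
    replica: str
    move_cost: str   # one of the keys of MOVE_COST_WEIGHT
    load: float


def pick_replica_to_move(candidates: List[Candidate], hot_load: float,
                         cold_load: float) -> Candidate:
    """Pick which replica to move from the hot node to the cold node.

    Among the moves that give the same (best) resulting balance, prefer the
    one with the lowest move cost.
    """
    def gap_after(c: Candidate) -> float:
        # Load difference between the two nodes after moving c.
        return abs((hot_load - c.load) - (cold_load + c.load))

    best_gap = min(gap_after(c) for c in candidates)
    equivalent = [c for c in candidates if abs(gap_after(c) - best_gap) < 1e-9]
    return min(equivalent, key=lambda c: MOVE_COST_WEIGHT[c.move_cost])


candidates = [
    Candidate("stateful-A", "High", 20.0),
    Candidate("stateless-B", "Low", 20.0),
    Candidate("stateless-C", "Zero", 10.0),
]
print(pick_replica_to_move(candidates, hot_load=70.0, cold_load=30.0).replica)  # stateless-B
```

In this sketch, stateless-C has the lowest move cost but is not chosen, because moving it leaves the two nodes further from balance. Move cost only decides between moves that reach equivalent balance, which is why stateless-B wins over stateful-A.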

Summary
Each of these configurations and settings helps with certain aspects of cluster balancing. I suggest testing combinations of them, with appropriate values, on your test cluster first. Monitor how they work and, based on the results, make similar configurations in your higher environments.
