Azure Monitor Alerts: Log Search Alerts with Dynamic Thresholds

This post has been republished via RSS; it originally appeared at: Microsoft Tech Community - Latest Blogs - .

Azure Monitor now supports Dynamic Thresholds for Log Search Rules as well, changing how you set up and monitor log search alerts. Say goodbye to manual threshold tuning and hello to intelligent, adaptable monitoring.


Here’s why dynamic thresholds are a game-changer:

  1. Automatic Calibration: Dynamic thresholds calculate the right alert levels for you. They adjust as your system evolves, giving you timely alerts with fewer false positives.
  2. Smart Learning: Dynamic thresholds analyze historical data and learn its patterns and trends. They adapt to your application’s unique behavior, whether it’s daily spikes or weekly lulls.
  3. Alerting At Scale: Create a single rule for a multi-dimensional alert. Dynamic thresholds define a separate alert threshold band for every dimension combination.
  4. Effortless Setup: Just enable dynamic thresholds. You don’t need specific knowledge of the data to set up alert thresholds.

Dynamic thresholds empower you to stay proactive, minimize downtime, and keep your systems running smoothly.
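
To make the idea of a separate threshold band per dimension combination more concrete, here is a minimal Python sketch. It is only an illustration: the mean plus or minus k standard deviations rule, the function names and the sample values are all assumptions made for this example, and they are not the model Azure Monitor actually uses.

```python
from collections import defaultdict
from statistics import mean, pstdev


def threshold_bands(samples, sensitivity_factor=2.0):
    """Compute a (lower, upper) band for every dimension combination.

    samples is an iterable of (dimension_key, value) pairs, where
    dimension_key is a tuple such as (computer,).
    The mean +/- k * stddev rule is an illustrative stand-in only;
    it is NOT Azure Monitor's real algorithm.
    """
    history = defaultdict(list)
    for key, value in samples:
        history[key].append(value)

    bands = {}
    for key, values in history.items():
        center = mean(values)
        spread = pstdev(values)
        bands[key] = (center - sensitivity_factor * spread,
                      center + sensitivity_factor * spread)
    return bands


def breaches_upper(bands, key, value):
    """Return True when value is above the upper boundary learned for key."""
    _, upper = bands[key]
    return value > upper


# Hypothetical history: vm-a normally idles, vm-b normally runs hot.
history = [
    (("vm-a",), 10.0), (("vm-a",), 12.0), (("vm-a",), 11.0), (("vm-a",), 11.0),
    (("vm-b",), 70.0), (("vm-b",), 74.0), (("vm-b",), 72.0), (("vm-b",), 72.0),
]
bands = threshold_bands(history)

# The same reading of 40 is unusual for vm-a but not for vm-b.
print(breaches_upper(bands, ("vm-a",), 40.0))
print(breaches_upper(bands, ("vm-b",), 40.0))
```

The point of the sketch is that one rule can judge the same value differently for each series, which a single static threshold cannot do.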

 

Use Cases

The following use cases illustrate dynamic thresholds:

 

Use Case: Monitoring CPU Behavior in Virtual Machines

Background: You can now calculate guest VM metrics from the Perf table in Log Analytics, which makes it possible to create a single alert rule for all your VMs across different regions by using dimensions. Previously, dynamic threshold metric alerts could only be set up for host CPU usage.

Goal Statement: The primary goal of this use case is to monitor the CPU behavior within virtual machines (VMs) and detect irregular patterns that may indicate performance issues.

Scenario definitions:

  1. Problem Identification:
    • The team wants to ensure optimal performance and identify any CPU-related issues promptly.
  2. Use Case Description:
    • The CPU utilization data is being collected from each VM.
    • Using the dynamic thresholds model, the system analyzes CPU behavior over time, looking for deviations from the expected pattern.
    • Deviations may include sudden spikes, prolonged high usage, or unexpected drops in CPU utilization.
  3. Trigger:
    • Azure Monitor triggers a log search alert once CPU utilization rises above its regular pattern, that is, when the measured value exceeds the upper boundary of the threshold band.
  4. Benefits:
    • Early detection of CPU-related problems helps prevent performance degradation.
    • Proactive monitoring ensures efficient resource utilization.
    • Improved system stability and responsiveness.
    • The Perf table also lets you monitor counters other than CPU by filtering on a different counter name. Examples can be found here.
  5. ARM template example:
    {
        "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
        "contentVersion": "1.0.0.0",
        "parameters": {
            "scheduledqueryrules_PerfDemoRule_name": {
                "defaultValue": "PerfDemoRule",
                "type": "String"
            },
            "workspaces_PerfDemoWorkspace_externalid": {
                "defaultValue": "/subscriptions/XXXX-XXXX-XXXX-XXXX/resourceGroups/XXXX/providers/Microsoft.OperationalInsights/workspaces/PerfDemoWorkspace",
                "type": "String"
            }
        },
        "variables": {},
        "resources": [
            {
                "type": "microsoft.insights/scheduledqueryrules",
                "apiVersion": "2024-01-01-preview",
                "name": "[parameters('scheduledqueryrules_PerfDemoRule_name')]",
                "location": "eastus2",
                "properties": {
                    "displayName": "[parameters('scheduledqueryrules_PerfDemoRule_name')]",
                    "severity": 3,
                    "enabled": true,
                    "evaluationFrequency": "PT5M",
                    "scopes": [
                        "[parameters('workspaces_PerfDemoWorkspace_externalid')]"
                    ],
                    "targetResourceTypes": [
                        "Microsoft.Compute/virtualMachines"
                    ],
                    "windowSize": "PT5M",
                    "criteria": {
                        "allOf": [
                            {
                                "query": "Perf | where ObjectName == \"Processor\" and CounterName == \"% Processor Time\" and InstanceName == \"_Total\" | project TimeGenerated, CounterValue, Computer, _ResourceId\n",
                                "timeAggregation": "Average",
                                "metricMeasureColumn": "CounterValue",
                                "dimensions": [],
                                "resourceIdColumn": "_ResourceId",
                                "operator": "GreaterThan",
                                "alertSensitivity": "High",
                                "criterionType": "DynamicThresholdCriterion",
                                "failingPeriods": {
                                    "numberOfEvaluationPeriods": 1,
                                    "minFailingPeriodsToAlert": 1
                                }
                            }
                        ]
                    },
                    "autoMitigate": false
                }
            }
        ]
    }
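
The "failingPeriods" block in the template controls how many violating evaluation periods are needed before the alert fires. As a rough Python sketch of that "at least M of the last N periods" idea (the function name and the sample inputs are hypothetical, and the service's real evaluation may differ in details):

```python
def should_fire(violations, number_of_evaluation_periods, min_failing_periods_to_alert):
    """Decide whether an alert fires under an M-of-N failing-periods rule.

    violations is a list of booleans, oldest first, one per evaluation
    period; True means the value was outside the threshold band.
    This is a simplified illustration, not Azure Monitor's implementation.
    """
    if number_of_evaluation_periods < 1:
        raise ValueError("number_of_evaluation_periods must be at least 1")
    if min_failing_periods_to_alert > number_of_evaluation_periods:
        raise ValueError("min_failing_periods_to_alert cannot exceed "
                         "number_of_evaluation_periods")

    # Only the most recent N periods are considered.
    recent = violations[-number_of_evaluation_periods:]
    return sum(recent) >= min_failing_periods_to_alert


# With 1 of 1, as in the template, a single violating period fires the alert.
print(should_fire([False, True], 1, 1))

# With 3 of 4, two violations in the last four periods are not enough.
print(should_fire([True, False, True, False], 4, 3))
```

Raising both values makes the rule less sensitive to short-lived spikes, at the cost of a slower reaction.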

 

Use Case: Monitoring Network Behavior in Application Insights Virtual Machines

Goal Statement: The primary goal of this use case is to monitor the network write behavior within virtual machines (VMs) and detect irregular patterns that may indicate performance issues or anomalies.

Scenario Definitions:

  1. Problem Identification:
    • The team aims to ensure optimal performance and promptly identify any network write-related issues within their VMs.
  2. Use Case Description:
    • The system periodically collects network write data from each VM and feeds it to the dynamic thresholds models.
    • The models analyze the network write behavior over time, specifically looking for deviations from the expected pattern.
    • Deviations may include sudden spikes, prolonged high usage, or unexpected drops in network write activity.
  3. Trigger:
    • Azure Monitor triggers a log search alert when network write activity rises above its regular pattern, that is, when the measured value exceeds the upper boundary of the threshold band.
  4. Benefits:
    • Early detection of network write-related problems helps prevent performance degradation.
    • Proactive monitoring ensures efficient resource utilization.
    • Improved system stability and responsiveness.
  5. ARM template example:  

 

{
	"$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
	"contentVersion": "1.0.0.0",
	"parameters": {
		"scheduledqueryrules_LogSearch1ActionGroup_name": {
			"defaultValue": "LogSearch1ActionGroup",
			"type": "String"
		},
		"components_ACME_Portal_externalid": {
			"defaultValue": "/subscriptions/XXXX-XXXX-XXXX-XXXX/resourceGroups/XXXX-XXXX/providers/microsoft.insights/components/ACME-Portal",
			"type": "String"
		}
	},
	"variables": {},
	"resources": [
		{
			"type": "microsoft.insights/scheduledqueryrules",
			"apiVersion": "2024-01-01-preview",
			"name": "[parameters('scheduledqueryrules_LogSearch1ActionGroup_name')]",
			"location": "eastus",
			"properties": {
				"displayName": "[parameters('scheduledqueryrules_LogSearch1ActionGroup_name')]",
				"severity": 3,
				"enabled": true,
				"evaluationFrequency": "PT5M",
				"scopes": [
					"[parameters('components_ACME_Portal_externalid')]"
				],
				"targetResourceTypes": [
					"microsoft.insights/components"
				],
				"windowSize": "PT30M",
				"criteria": {
					"allOf": [
						{
							"query": "InsightsMetrics | where Origin == \"vm.azm.ms\" | where Namespace == \"Network\" and Name == \"WriteBytesPerSecond\" | extend NetworkInterface = tostring(todynamic(Tags)[\"vm.azm.ms/networkDeviceId\"]) | summarize AggregatedValue = avg(Val) by bin(TimeGenerated, 15m), Computer, _ResourceId, NetworkInterface",
							"timeAggregation": "Average",
							"metricMeasureColumn": "AggregatedValue",
							"dimensions": [
								{
									"name": "Computer",
									"operator": "Include",
									"values": ["*"]
								},
								{
									"name": "NetworkInterface",
									"operator": "Include",
									"values": ["*"]
								}
							],
							"operator": "GreaterThan",
							"alertSensitivity": "High",
							"criterionType": "DynamicThresholdCriterion",
							"resourceIdColumn": "_ResourceId",
							"failingPeriods": {
								"numberOfEvaluationPeriods": 1,
								"minFailingPeriodsToAlert": 1
							}
						}
					]
				},
				"autoMitigate": false
			}
		}
	]
}

 

 

How to Create a Dynamic Threshold ARM Template

You can convert a log search rule template that uses a static threshold into a dynamic one by making the following changes:

  • In the “allOf” condition:
    1. Add "criterionType": "DynamicThresholdCriterion".
    2. Add “alertSensitivity”.
    3. Remove the “threshold” parameter.
  • Update the apiVersion in the template to “2024-01-01-preview”.
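
The changes listed above can also be scripted. The following Python sketch applies them to a template that has been loaded into a dictionary. The function name, the sample template, its "2021-08-01" API version and its threshold of 80 are assumptions made for illustration; the default sensitivity of "High" simply mirrors the examples in this post.

```python
import copy

DYNAMIC_API_VERSION = "2024-01-01-preview"
RULE_TYPE = "microsoft.insights/scheduledqueryrules"


def to_dynamic_threshold(template, sensitivity="High"):
    """Return a copy of an ARM template dict in which every log search
    rule uses a dynamic threshold instead of a static one.

    Hypothetical helper for illustration; it only applies the steps
    described in this post and does not validate the result.
    """
    converted = copy.deepcopy(template)
    for resource in converted.get("resources", []):
        if resource.get("type", "").lower() != RULE_TYPE:
            continue
        # Update the API version to the one that supports dynamic thresholds.
        resource["apiVersion"] = DYNAMIC_API_VERSION
        criteria = resource.get("properties", {}).get("criteria", {})
        for condition in criteria.get("allOf", []):
            # Add the criterion type and sensitivity, drop the static threshold.
            condition["criterionType"] = "DynamicThresholdCriterion"
            condition["alertSensitivity"] = sensitivity
            condition.pop("threshold", None)
    return converted


# Minimal, made-up static-threshold template used only to exercise the helper.
static_template = {
    "resources": [
        {
            "type": "microsoft.insights/scheduledqueryrules",
            "apiVersion": "2021-08-01",
            "properties": {
                "criteria": {
                    "allOf": [
                        {
                            "query": "Perf | project TimeGenerated, CounterValue",
                            "operator": "GreaterThan",
                            "threshold": 80,
                        }
                    ]
                }
            },
        }
    ]
}

dynamic_template = to_dynamic_threshold(static_template)
print(dynamic_template["resources"][0]["properties"]["criteria"]["allOf"][0])
```

Because the helper works on a deep copy, the original template stays untouched and the two versions can be compared before deployment.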

     

Summary

In monitoring and alerting, precision matters, and Dynamic Thresholds bring that precision to Log Search Rules. Here’s why they’re essential:

  1. Anomaly Detection:
    • Dynamic thresholds rely on advanced algorithms to calculate expected performance ranges based on historical data.
    • They identify anomalies (sudden spikes, drops, or irregular patterns) that warrant attention.
  2. Efficiency Boost:
    • No more manual threshold tuning. Dynamic thresholds adapt automatically.
    • Scale alerts across hundreds of dimension combinations with a single rule.
  3. Stay Ahead:
    • Early detection prevents performance degradation.
    • Proactively manage resource utilization for improved stability and responsiveness.

Dynamic thresholds empower you to be proactive, responsive, and precise.
