Experiencing Alerting failure for Metric Alerts – 05/31 – Resolved

This post has been republished via RSS; it originally appeared at: Microsoft Tech Community - Latest Blogs.

Final Update: Wednesday, 01 June 2022 05:45 UTC

We've confirmed that all systems are back to normal with no customer impact as of 06/01, 05:30 UTC. Our logs show the incident started on 05/19, 23:00 UTC, and that during the 12 days, 6 hours, and 30 minutes it took to resolve the issue, some customers might have seen duplicate alert notifications. When a resource became unhealthy, two notifications were raised, and likewise two were sent when the alert resolved.
  • Root Cause: The failure was due to one of the backend dependencies.
  • Incident Timeline: 12 days, 6 hours, and 30 minutes - 05/19, 23:00 UTC through 06/01, 05:30 UTC.
We understand that customers rely on Metric Alerts as a critical service and apologize for any impact this incident caused.

-Srikanth

Update: Tuesday, 31 May 2022 23:28 UTC

Root cause has been isolated to a bad deployment. To address this issue, we are in the process of rolling back. As a customer of Metric Alerts, you could be seeing two alerts for unhealthy resources. When a resource is unhealthy, two notifications are raised, and similarly, two notifications are sent when the alert resolves.
  • Workaround:
  • Start time: 5/19/2022
  • Next Update: Before 06/01 05:30 UTC
-Eric Singleton
