Latency and Data Loss issue in App Insights ingestion (many regions) – 02/11 – Resolved

This post has been republished via RSS; it originally appeared at: New blog articles in Microsoft Tech Community.

Final Update: Wednesday, 12 February 2020 00:11 UTC

We've confirmed that all systems are back to normal with no customer impact as of 02/11, 23:00 UTC. Our logs show the incident started on 02/11, 20:49 UTC. During the 2 hours and 11 minutes it took to resolve the issue, some customers in the West US 2, West US, Southcentral US, Canada Central and East US regions may have experienced data loss, data latency and misfired alerts.
  • Root Cause: The failure was due to an issue in one of our dependency services.
  • Incident Timeline: 2 Hours & 11 minutes - 02/11, 20:49 UTC through 02/11, 23:00 UTC
We understand that customers rely on Application Insights as a critical service and apologize for any impact this incident caused.

-Leela

Update: Tuesday, 11 February 2020 22:33 UTC

The root cause has been isolated to issues with Azure Storage, which are impacting data ingestion. The Storage team has begun mitigation efforts and we will update this blog again when the issue has been fully mitigated. This issue affects the following regions: West US 2, West US, Southcentral US, Canada Central and East US. Some customers may continue to experience data loss, data latency and misfired alerts.
  • Next Update: Before 02/12 01:00 UTC
-Jack Cantwell

Update: Tuesday, 11 February 2020 22:32 UTC

We continue to investigate issues within Application Insights. As noted below, the cause is a problem with the underlying Azure Storage our service uses. The Storage team has identified the problem and has begun mitigation. Some customers continue to experience data latency, data loss or misfired alerts. We are working to establish the start time for the issue; initial findings indicate that the problem began at 02/11 20:49 UTC. We currently have no estimate for resolution.
  • Next Update: Before 02/12 01:00 UTC
-Jack Cantwell

Update: Tuesday, 11 February 2020 21:50 UTC

We continue to investigate issues within Application Insights. The issue results from a problem with the underlying Azure Storage, but the nature of that problem is not yet clear and the Storage team is still investigating. Some customers continue to experience data latency, data loss or misfired alerts. We are working to establish the start time for the issue; initial findings indicate that the problem began at 02/11 20:49 UTC. We currently have no estimate for resolution.
  • Next Update: Before 02/12 00:00 UTC
-Jack Cantwell

Initial Update: Tuesday, 11 February 2020 21:07 UTC

We are aware of issues within Application Insights and are actively investigating. Starting at 02/11 20:49 UTC, some customers may experience latency and data loss.

  • Next Update: Before 02/11 22:30 UTC
We are working hard to resolve this issue and apologize for any inconvenience.

-Jack Cantwell
