Making Azure the Best Place to Observe Your Apps with OpenTelemetry

This post has been republished via RSS; it originally appeared at: New blog articles in Microsoft Community Hub.

Thank you to about two dozen colleagues who offered feedback.

 

Our goal is to make Azure the most observable cloud. To that end, we are refactoring Azure’s native observability platform to be based on OpenTelemetry, an industry standard for instrumenting applications and transmitting telemetry.

 

Microsoft decided to take a bet on OpenTelemetry in 2019, and we are now among the top contributors to the project. Initially, our motivation was to converge fragmented telemetry instrumentation across Microsoft, standardize on OpenTelemetry APIs, and expand our data collection offerings to new programming languages and frameworks.

 

As OpenTelemetry gained momentum and became more mature, customers asked for OpenTelemetry compatibility in downstream agents and service endpoints. As a result, we have significantly increased OpenTelemetry-related investments across the Azure Monitor platform to expand our support for OpenTelemetry on Azure.

 

Our investments in OpenTelemetry complement other cloud native investments such as Managed Service for Prometheus and Azure Managed Grafana. Together with OpenTelemetry, we are making Azure the best place to observe cloud native workloads.

 

This blog describes how OpenTelemetry investments are reshaping Azure Monitor in 2024, across four areas:

  1. Instrumentation
  2. Collection
  3. Ingress
  4. Experience

[Figure: OpenTelemetry on Azure component diagram]

1.  Instrumentation

 

Azure Monitor OpenTelemetry Distro

We recently announced General Availability (GA) of the “Azure Monitor OpenTelemetry Distro”. The Distro is your one-stop shop to power Azure Monitor. For those who think of distro in a Linux context, OpenTelemetry has adopted the word “distro” to refer to OpenTelemetry-based offerings published by a specific vendor.

 

In the case of Azure Monitor, the Azure Monitor Distro includes the basic capabilities you would get from the open source OpenTelemetry package: logs, metrics, and distributed traces. In addition, the Distro is designed and tested to ensure enterprise-readiness and a first-class experience on Azure.

 

It includes Azure-specific capabilities not available in OpenTelemetry, such as:

  • Enablement with a single line of code
  • Microsoft Entra ID authentication (formerly AAD authentication)
  • Application Insights Standard Metrics
  • Offline storage for more reliable data transport
  • Interoperability with Classic Application Insights SDKs
  • Live Metrics (in progress)

 

The Distro is fully supported by Microsoft and will eventually replace the Classic Application Insights SDK. It’s fully extensible so you can, for example, plug in OpenTelemetry community instrumentation libraries. The Distro provides a platform where Azure can innovate and offer unique value, while in the long run we contribute back to OpenTelemetry and stay tightly coupled with open source.

 

Azure Monitor offers OpenTelemetry Distros in four supported languages: .NET, Java, JavaScript (Node.js), and Python.

 

Instrumentation for other languages such as Go, Rust, Ruby, PHP, C++, Swift, and Erlang is available in the OpenTelemetry community, and Azure Monitor’s plans to collect these signals are described in Sections 2 and 3, below.

 

AKS Auto-Instrumentation

In Azure Kubernetes Service, we’re releasing a one-click integration that automatically injects the Azure Monitor OpenTelemetry Distro. This makes it easy to enable observability at scale: you can deploy via templates or scripting and apply policy. A private preview of AKS integration for .NET, Java, and JavaScript (Node.js) is underway now. Scroll down to “Next Steps” for an opportunity to participate.

 

Azure SDKs

Looking across Azure, we want each Azure service and SDK to be observable to any system that collects OpenTelemetry signals. As a first step, every Azure SDK is instrumented with the OpenTelemetry API. Starting with telemetry from HTTP calls, the “Azure Core Tracing” package was first announced in 2021 Q4, and we plan to release 1.0 of these packages after updating them to the latest OpenTelemetry semantics. To accelerate these efforts, Microsoft led the OpenTelemetry workstream to stabilize OpenTelemetry HTTP semantic conventions. At the same time, we plan to stabilize signal collection for database requests from Azure Cosmos DB SDKs and begin adding OpenTelemetry metrics to other Azure SDKs, starting with the .NET and Java Azure SDKs. As a result, customers will benefit from more robust tracing and richer metric signals across Azure’s Services.
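To make "updating to the latest OpenTelemetry semantics" concrete, here is a minimal sketch of what such a migration looks like for HTTP client span attributes. The mapping is a partial selection of renames from the stabilized HTTP semantic conventions. The helper `migrate_attributes`, the sample span, and the `example.custom_attribute` key are hypothetical and are not taken from any Azure SDK.

```python
# Old (pre-stable) HTTP attribute names mapped to their stable replacements.
# This is a partial list for illustration, not a complete migration table.
OLD_TO_STABLE = {
    "http.method": "http.request.method",
    "http.status_code": "http.response.status_code",
    "http.url": "url.full",
    "net.peer.name": "server.address",
    "net.peer.port": "server.port",
}


def migrate_attributes(attributes: dict) -> dict:
    """Return a copy of span attributes with old HTTP names renamed."""
    return {OLD_TO_STABLE.get(key, key): value for key, value in attributes.items()}


# Hypothetical attributes recorded on an HTTP client span.
legacy_span_attributes = {
    "http.method": "GET",
    "http.status_code": 200,
    "http.url": "https://example.com/items",
    "example.custom_attribute": "kept",
}

stable = migrate_attributes(legacy_span_attributes)
print(stable)
```

Attributes that are not part of the mapping pass through unchanged, which is why a backend that understands the stable names can still read custom attributes after the rename.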

 

.NET

Microsoft’s commitment to OpenTelemetry extends to .NET, where we made the telemetry APIs part of the .NET runtime. Since the release of .NET 5, in November 2020, we’ve added new OpenTelemetry collection capabilities with each subsequent release. As part of the .NET 8 release in November 2023, we introduced .NET Aspire, an opinionated cloud-native stack that includes observability by default with OpenTelemetry. We introduced a new “Developer Dashboard” to observe OpenTelemetry signals in real-time during debugging. The .NET Team even used it to help improve the quality of .NET 8!

 

2.  Collection

Azure has long relied on managed agents to collect telemetry in a reliable and secure manner and with centralized management. Azure Monitor Agent (AMA) remains Azure Monitor’s single agent and still aims to replace all of Microsoft’s legacy monitoring agents. It is an enterprise grade, fully managed offering, optimized for high efficiency with a minimal footprint.

 

We have begun work to enable OpenTelemetry Protocol (OTLP) collection in Azure Monitor Agent (AMA). This will enable customers who run workloads on Azure Virtual Machines and hybrid virtual machines to collect logs, metrics, and distributed traces in OpenTelemetry format and send them to Azure Monitor. Similarly, customers who use Microsoft Defender for Cloud or Azure Sentinel already run AMA and will get the OpenTelemetry update automatically.

 

Retrofitting AMA to accept OTLP will provide an option to route Azure Monitor Application Insights telemetry via an agent. This will provide several benefits:

  • Improve the performance of high-load applications by offloading data quickly.
  • Enable observability for code written in languages where a Distro is unavailable, such as Go, Rust, Ruby, PHP, C++, Swift, and Erlang.
  • Enable authentication for Azure Monitor for OpenTelemetry Community SDKs.
  • Provide managed updates that lessen ongoing maintenance and reduce toil.
  • Shift retries, batching, and authentication to outside the application process.
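The last benefit is easier to picture with a small sketch of what "batching and retries outside the application process" means. The `BatchingForwarder` class below is hypothetical and is not AMA's implementation. It assumes a `send` callable that returns `True` on success, and it runs in-process here only so that the example stays self-contained.

```python
import time
from typing import Callable, List


class BatchingForwarder:
    """Hypothetical agent-side buffer that batches records and retries sends."""

    def __init__(self, send: Callable[[List[dict]], bool], batch_size: int = 3,
                 max_attempts: int = 4, base_delay: float = 0.5,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self._send = send
        self._batch_size = batch_size
        self._max_attempts = max_attempts
        self._base_delay = base_delay
        self._sleep = sleep
        self._buffer: List[dict] = []

    def enqueue(self, record: dict) -> None:
        """Buffer one record and flush once a full batch has accumulated."""
        self._buffer.append(record)
        if len(self._buffer) >= self._batch_size:
            self.flush()

    def flush(self) -> bool:
        """Send the buffered batch, backing off exponentially between attempts."""
        if not self._buffer:
            return True
        batch = list(self._buffer)
        for attempt in range(self._max_attempts):
            if self._send(batch):
                self._buffer.clear()
                return True
            if attempt < self._max_attempts - 1:
                self._sleep(self._base_delay * (2 ** attempt))
        return False  # records stay buffered for a later flush


attempts = []
delays = []


def flaky_send(batch: List[dict]) -> bool:
    """Stand-in for a network call: fails twice, then succeeds."""
    attempts.append(len(batch))
    return len(attempts) >= 3


# Record the requested delays instead of actually sleeping.
forwarder = BatchingForwarder(flaky_send, sleep=delays.append)
for i in range(3):
    forwarder.enqueue({"body": f"log {i}"})
print(attempts, delays)
```

When this logic lives in an agent, the application only pays the cost of handing off each record. The waiting between retries happens elsewhere.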

 

Additionally, data processing is an important part of data collection, and AMA will include basic row-based filtering and aggregation. Advanced data processing scenarios such as removing sensitive data, calculated columns, data enrichment, removing noisy events, and cost reduction can be performed by the Azure Monitor Service with Ingestion Time Transformations. Other data processing needs can be met with the OpenTelemetry Collector, and we plan to make it easy for customers to run the Collector as a local pipeline on Azure.
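As a rough illustration of "row-based filtering and aggregation", the sketch below drops rows under a severity threshold and then counts the remaining rows per service. The row fields, the severity ranking, and both helper functions are assumptions made for this example. They do not reflect an AMA schema or configuration syntax.

```python
from collections import Counter

# Hypothetical log rows; the field names are illustrative only.
rows = [
    {"severity": "DEBUG", "service": "cart", "body": "cache miss"},
    {"severity": "INFO", "service": "cart", "body": "item added"},
    {"severity": "ERROR", "service": "checkout", "body": "payment declined"},
    {"severity": "DEBUG", "service": "checkout", "body": "retrying call"},
    {"severity": "ERROR", "service": "checkout", "body": "payment timeout"},
]

SEVERITY_RANK = {"DEBUG": 0, "INFO": 1, "WARN": 2, "ERROR": 3}


def filter_rows(rows: list, min_severity: str) -> list:
    """Row-based filter: keep rows at or above the given severity."""
    threshold = SEVERITY_RANK[min_severity]
    return [row for row in rows if SEVERITY_RANK[row["severity"]] >= threshold]


def count_by(rows: list, key: str) -> dict:
    """Aggregation: count rows per distinct value of one column."""
    return dict(Counter(row[key] for row in rows))


kept = filter_rows(rows, "INFO")
summary = count_by(kept, "service")
print(len(kept), summary)
```

Filtering before aggregation means noisy rows never leave the machine, which is where the cost savings come from.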

 

3.  Ingress

What if you want to use pure open-source technologies? This is where OpenTelemetry Protocol (OTLP) becomes important. OpenTelemetry specifies more than technical components. It specifies a protocol: how the data is sent. It even goes a step further and specifies the semantics: what data to collect and how to structure it. The protocol and semantics become the common interface that ties the OpenTelemetry API and Azure Monitor together in an OpenTelemetry-based world.
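The split between protocol and semantics can be seen in a single span. The sketch below builds a trace payload by hand, using the field names of OTLP's JSON encoding as we understand them (`resourceSpans`, `scopeSpans`, hex-encoded IDs, typed attribute values). The service name, scope name, and URL are hypothetical, and a real application would let an OpenTelemetry SDK produce this payload rather than constructing it manually.

```python
import json
import secrets
import time


def to_otlp_attributes(attributes: dict) -> list:
    """Convert a flat dict into OTLP's list of typed key/value pairs."""
    converted = []
    for key, value in attributes.items():
        if isinstance(value, bool):
            typed = {"boolValue": value}
        elif isinstance(value, int):
            typed = {"intValue": str(value)}  # 64-bit integers travel as strings
        elif isinstance(value, float):
            typed = {"doubleValue": value}
        else:
            typed = {"stringValue": str(value)}
        converted.append({"key": key, "value": typed})
    return converted


def build_trace_payload(service_name: str, span_name: str,
                        attributes: dict, duration_ms: int) -> dict:
    """Wrap one client span in the resource/scope envelope OTLP expects."""
    end_ns = time.time_ns()
    start_ns = end_ns - duration_ms * 1_000_000
    span = {
        "traceId": secrets.token_hex(16),  # 16 random bytes, hex-encoded
        "spanId": secrets.token_hex(8),    # 8 random bytes, hex-encoded
        "name": span_name,
        "kind": 3,  # SPAN_KIND_CLIENT
        "startTimeUnixNano": str(start_ns),
        "endTimeUnixNano": str(end_ns),
        "attributes": to_otlp_attributes(attributes),
    }
    resource = {"attributes": to_otlp_attributes({"service.name": service_name})}
    scope_spans = [{"scope": {"name": "example.manual"}, "spans": [span]}]
    return {"resourceSpans": [{"resource": resource, "scopeSpans": scope_spans}]}


# The envelope is the protocol; the attribute names are the semantics.
payload = build_trace_payload(
    service_name="checkout",  # hypothetical service
    span_name="GET",
    attributes={
        "http.request.method": "GET",
        "http.response.status_code": 200,
        "url.full": "https://example.com/items",
    },
    duration_ms=25,
)
print(json.dumps(payload, indent=2))
```

Any backend that accepts OTLP can parse the envelope, and any backend that follows the semantic conventions knows that `http.response.status_code` carries the HTTP status without per-vendor mapping.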

 

Whether you are using the Azure Monitor OpenTelemetry Distro, a pure open-source OpenTelemetry SDK on an Azure PaaS environment, or running the community OpenTelemetry Collector, we want to make it easy to point your telemetry to Azure Monitor. Beyond OpenTelemetry support, this is a significant investment to streamline how data enters Azure Monitor and ensure a stable and scalable service for the next decade.

 

Customers are beginning to recognize OTLP as a standard not only for app telemetry but for all telemetry, including infrastructure and devices. As such, Azure will support OTLP ingestion at select ingress points. Regardless of where the signals are generated, so long as they are in OTLP, you can leverage Azure Monitor. OTLP ingestion for Azure Monitor, including logs, metrics, and distributed traces, will initially be available via Azure Monitor Agent (AMA).

 

AMA will provide an OTLP ingestion path on Azure and hybrid virtual machines. Similarly, Azure Kubernetes Service (AKS) will offer OTLP ingestion via a containerized AMA that is automatically deployed with Azure Monitor container insights. Whether using OpenTelemetry via AMA or AKS, a common control plane called Data Collection Rules (DCRs) will power centralized policy for enablement and management.

 

What about other Azure PaaS environments, such as Azure Functions, Azure App Service, and Azure Container Apps? What about edge workloads such as those running on browsers or devices? For these scenarios, we plan to treat OTLP as a first-class protocol at the Azure Monitor service ingestion endpoint to enable customers to send OTLP signals from anywhere. We plan to start with native OTLP ingestion for metrics, followed by native OTLP ingestion for logs and traces.

 

In short, Azure plans to offer first-class integration with pure open-source software from OpenTelemetry. Azure’s OpenTelemetry-based components, including the Azure Monitor Distro and Agent, provide additional value on top of OpenTelemetry, including ease of enablement, Azure-specific capabilities, stability guarantees, and formal Azure Support.

 

4.  Experience

What value do Azure’s customers realize from a decision to converge and standardize on OpenTelemetry? One such example is new metrics capabilities. Metrics are foundational to monitoring Azure’s service health, and as a result, Microsoft has made significant investments over the last decade to ensure its time series metrics database runs reliably at cloud scale.

 

Converging on OpenTelemetry enables Microsoft’s customers to monitor their services with the same platform we use to monitor Azure. To realize this value, we’ve introduced Azure Monitor Workspace (AMW). Similar to how “Log Analytics” powers log-based experiences in Azure Monitor, AMW will eventually house your metrics across Azure and power metrics-based experiences in Azure Monitor including Application Insights.

 

To illustrate the scale now available to Microsoft’s customers, a single AMW ingests up to one million active time series by default and is configurable up to 50 million. As a basis of comparison, Application Insights customers today are subject to a cap of 50,000 active time series.
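To see why these caps matter, recall that an active time series is one distinct combination of metric name and attribute values, so the count grows multiplicatively with each dimension. The sketch below works through that arithmetic. The dimension names and their cardinalities are hypothetical, and the three limits are the figures quoted above.

```python
from math import prod

# Limits quoted in the text above.
APP_INSIGHTS_CAP = 50_000
AMW_DEFAULT_CAP = 1_000_000
AMW_MAX_CAP = 50_000_000


def worst_case_series(dimension_cardinalities: dict) -> int:
    """Upper bound: every combination of dimension values is observed."""
    return prod(dimension_cardinalities.values())


def count_active_series(points: list) -> int:
    """Count distinct (metric name, attribute set) pairs among data points."""
    return len({(name, tuple(sorted(attrs.items()))) for name, attrs in points})


# Hypothetical dimensions for a single request-duration metric.
dimensions = {
    "http.route": 40,
    "http.response.status_code": 12,
    "region": 10,
    "instance": 25,
}
worst = worst_case_series(dimensions)
print(worst, worst > APP_INSIGHTS_CAP, worst <= AMW_DEFAULT_CAP)

points = [
    ("requests", {"http.route": "/cart", "region": "westus"}),
    ("requests", {"region": "westus", "http.route": "/cart"}),  # same series
    ("requests", {"http.route": "/pay", "region": "westus"}),
]
print(count_active_series(points))
```

With these assumed cardinalities, one metric can reach 40 × 12 × 10 × 25 = 120,000 series. That exceeds the 50,000 cap on its own but fits comfortably within a default AMW.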

 

For context, OpenTelemetry expands on the success of Prometheus and is a superset of Prometheus. While Prometheus is just for metrics, OpenTelemetry collects additional signal types such as logs and traces. OpenTelemetry’s client libraries are available in more programming languages than Prometheus client libraries. Whether you use Prometheus, OpenTelemetry, or both, Azure Monitor provides common metrics experiences and query capabilities built on AMW. This ensures interoperability and makes the upgrade path to OpenTelemetry smooth.

 

AMW maximizes the value from OpenTelemetry metrics with support for OpenTelemetry Histograms and Exemplars. This will enable customers to adopt better Service Level Indicators (SLIs) and more efficient diagnostic experiences. We plan to expose these new platform capabilities in Azure Metrics Explorer and Azure Monitor Application Insights.
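A short sketch shows why histograms and exemplars help with SLIs and diagnostics. It estimates a percentile from an explicit-bucket histogram by linear interpolation within the matching bucket, then uses exemplars to find traces slower than that estimate. This is one common estimation approach, not necessarily the one Azure Monitor uses. The bucket bounds, counts, and trace IDs are made-up sample data.

```python
def estimate_percentile(bounds: list, counts: list, percentile: float) -> float:
    """Estimate a percentile (0-100) from an explicit-bucket histogram.

    bounds holds the upper bounds of the finite buckets; counts has one extra
    entry for the overflow bucket that collects values above bounds[-1].
    """
    if len(counts) != len(bounds) + 1:
        raise ValueError("counts needs exactly one more entry than bounds")
    if not 0 <= percentile <= 100:
        raise ValueError("percentile must be between 0 and 100")
    total = sum(counts)
    if total == 0:
        raise ValueError("histogram is empty")
    rank = percentile * total / 100
    cumulative = 0
    for index, count in enumerate(counts):
        previous = cumulative
        cumulative += count
        if count and cumulative >= rank:
            if index == len(bounds):
                return float(bounds[-1])  # overflow bucket has no upper bound
            lower = bounds[index - 1] if index > 0 else 0.0
            upper = bounds[index]
            return lower + (upper - lower) * (rank - previous) / count
    return float(bounds[-1])


# Hypothetical request latencies in milliseconds.
bounds = [50, 100, 250, 500]
counts = [40, 30, 20, 8, 2]  # the final count is the overflow bucket

p95 = estimate_percentile(bounds, counts, 95)

# Exemplars attach a sample measurement to the trace that produced it.
exemplars = [
    {"value": 42.0, "trace_id": "trace-a"},
    {"value": 480.0, "trace_id": "trace-b"},
    {"value": 730.0, "trace_id": "trace-c"},
]
slow_traces = [e["trace_id"] for e in exemplars if e["value"] > p95]
print(p95, slow_traces)
```

The percentile gives the SLI, and the exemplars turn a bad number into specific traces to open, which is the diagnostic shortcut the text describes.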

 

Azure Monitor Application Insights customers who upgrade to OpenTelemetry will be gradually phased into AMW and receive the benefits it offers, including higher time series caps, percentiles, and exemplars.

 

Thank you for being with us on the journey to embrace OpenTelemetry and make Azure the most observable cloud. We look forward to several exciting announcements in 2024, as OpenTelemetry-related investments reshape Azure Monitor’s instrumentation, collection, ingress, and experience.

 

Next Steps
