This post has been republished via RSS; it originally appeared at: New blog articles in Microsoft Community Hub.

Every app will be reinvented with generative AI, and new apps will be built that weren’t possible before. Generative AI makes it possible to build intelligent apps on top of large language model (LLM) capabilities. As the number of intelligent applications grows alongside the adoption of various LLMs, enterprises face significant challenges in efficiently federating and managing generative AI resources. Key issues include ensuring resiliency and high availability of models, tracking model usage and implementing chargeback mechanisms for users, managing increased latency, and addressing data sovereignty concerns. This calls for a centralized solution, a “Gen AI Gateway”, that seamlessly integrates, optimizes, and distributes workloads across a federated network of GenAI resources.

This blog post provides an overview of how Azure API Management (APIM) can be used as a GenAI Gateway by leveraging the new “GenAI Gateway Accelerator” scenario published in the APIM Landing Zone Accelerator.

GenAI Gateway

LLMs are accessible via their REST endpoints. Large enterprises typically hide these endpoints behind a secure gateway that provides centralized access to the resources.

Azure API Management is a globally available and proven API management solution that allows organizations to abstract, secure, observe, and publish APIs. APIM’s gateway component is used to create the GenAI Gateway, which serves as an intelligent interface or middleware layer that dynamically balances incoming traffic across backend resources to optimize resource utilization. In addition to load balancing, the GenAI Gateway can be equipped with extra capabilities to address challenges around billing, monitoring, and so on.
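To make the idea of balancing traffic across backend resources concrete, here is a minimal Python sketch of weighted round-robin selection. It is an illustration of the general technique only, not the accelerator's actual policy implementation; the `Backend` class and the backend names `aoai-east` and `aoai-west` are hypothetical.

```python
from dataclasses import dataclass
from itertools import cycle


@dataclass(frozen=True)
class Backend:
    name: str    # hypothetical label for a model endpoint
    weight: int  # relative share of traffic this backend should receive


def weighted_round_robin(backends):
    """Return an endless iterator of backend names, proportional to weight."""
    # Expand each backend into `weight` slots, then cycle over the slots.
    slots = [b.name for b in backends for _ in range(b.weight)]
    if not slots:
        raise ValueError("at least one backend with a positive weight is required")
    return cycle(slots)


# Assumed example: the first backend should take twice the traffic of the second.
backends = [Backend("aoai-east", 2), Backend("aoai-west", 1)]
picker = weighted_round_robin(backends)
first_six = [next(picker) for _ in range(6)]
print(first_six)
```

With equal weights this reduces to simple round-robin. A real gateway would also need to skip unhealthy or throttled backends, which this sketch does not attempt.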

Key benefits of the GenAI Gateway

APIM Landing Zone Accelerator

The Azure API Management (APIM) Landing Zone Accelerator (LZA) offers comprehensive guidance, including a reference architecture and implementation strategies. It also provides design guidance, recommendations, and considerations on key areas critical to provisioning APIM with a secure baseline. The guidance and reference implementation are aligned with industry-proven practices, such as those presented in the Azure Landing Zones guidance in the Cloud Adoption Framework.

The APIM LZA follows a layered approach: APIM is provisioned in a secure baseline as the base layer, on top of which backends such as Azure OpenAI, App Service, and Azure Container Apps can be deployed.

Azure API Management - GenAI Backend

The new GenAI scenario demonstrates how to provision and interact with Generative AI resources through API Management. The capabilities below are handled by the accelerator:

Capability	Description
Load balancing (round-robin)	Load balance traffic across pay-as-you-go (PAYG) endpoints using simple and weighted round-robin algorithms.
Managing spikes with PAYG	Manage spikes in traffic by routing requests to PAYG endpoints when a provisioned throughput unit (PTU) endpoint is out of capacity.
Adaptive rate limiting	Dynamically adjust the rate limits applied to different workloads.
Tracking token usage	Record token consumption for usage tracking and attribution.
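Two of the capabilities above, spillover from PTU to PAYG and token usage tracking, can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the accelerator's policy code: `CapacityExceeded` stands in for a throttling response from the backend, the backends are plain functions, and the response shape (`served_by`, `total_tokens`) and the consumer name `team-a` are hypothetical.

```python
from collections import defaultdict


class CapacityExceeded(Exception):
    """Stand-in for a throttling response from a backend (assumption)."""


def call_with_spillover(request, ptu_backend, payg_backend):
    """Try the PTU backend first; fall back to PAYG when it is out of capacity."""
    try:
        return ptu_backend(request)
    except CapacityExceeded:
        return payg_backend(request)


class TokenLedger:
    """Accumulate token consumption per consumer for usage attribution."""

    def __init__(self):
        self._totals = defaultdict(int)

    def record(self, consumer, total_tokens):
        self._totals[consumer] += total_tokens

    def usage(self, consumer):
        # .get avoids creating an entry for consumers that never called.
        return self._totals.get(consumer, 0)


# Hypothetical backends: the PTU endpoint is saturated, PAYG answers.
def busy_ptu(request):
    raise CapacityExceeded("PTU endpoint is out of capacity")


def payg(request):
    return {"served_by": "payg", "total_tokens": 42}


ledger = TokenLedger()
response = call_with_spillover({"prompt": "hello"}, busy_ptu, payg)
ledger.record("team-a", response["total_tokens"])
print(response["served_by"], ledger.usage("team-a"))
```

Keeping the ledger keyed by consumer is what makes chargeback possible later: the same totals can be exported to whatever monitoring or billing store is in use.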

Reference Architecture

The reference architecture illustrates APIM provisioned in a secure baseline, fronted by an Application Gateway. It includes private deployments of Azure OpenAI endpoints and policies specifically tailored for GenAI use cases. All components are secured with Network Security Groups, and supporting services such as Event Hub and Key Vault are accessed through private endpoints, ensuring a robust and secure infrastructure.

Deployment

To deploy the Bicep-based implementation:

Deploy the Azure API Management - Secure Baseline scenario.
Then run the following command to deploy the GenAI scenario:

./scripts/deploy-workload-genai.sh

Supported Regions

Some of the new Azure OpenAI policies are not yet available in all regions. If you see deployment failures, try choosing a different region. The following regions are more likely to work:

australiacentral, australiaeast, australiasoutheast, brazilsouth, eastasia, francecentral, germanywestcentral, koreacentral, northeurope, southeastasia, southcentralus, uksouth, ukwest, westeurope, westus2, westus3
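If you script your deployments, a quick pre-flight check against the list above can save a failed run. The following Python sketch is only a convenience under the assumption that the list above is current; the function name `is_likely_supported` is hypothetical, and the check is no substitute for the actual deployment result.

```python
# Regions copied from the list above; availability may change over time.
SUPPORTED_REGIONS = frozenset({
    "australiacentral", "australiaeast", "australiasoutheast", "brazilsouth",
    "eastasia", "francecentral", "germanywestcentral", "koreacentral",
    "northeurope", "southeastasia", "southcentralus", "uksouth", "ukwest",
    "westeurope", "westus2", "westus3",
})


def is_likely_supported(region: str) -> bool:
    """Return True if the region appears in the list of likely-working regions."""
    # Normalize display names such as "West US 2" to the short form "westus2".
    normalized = region.strip().lower().replace(" ", "")
    return normalized in SUPPORTED_REGIONS


print(is_likely_supported("West US 2"))
```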

Test/Demo setup

If you are looking for a quick way to test or demo these capabilities with a minimal, non-production APIM setup against an Azure OpenAI simulator, check out this repository.

GenAI Gateway Test Toolkit

AI Hub Gateway capabilities

Looking for a comprehensive reference implementation to provision your AI Hub Gateway? Check out the AI Hub Gateway scenario.

AI Hub Gateway

Resources

Special thanks to the APIM Landing Zone Accelerator team (Andrei Kamenev, Ben Briggs, Srini Padala, Prasanna Nagarajan, Vivek Soni, Lucas Huet, Stuart Leeks, and Mohammed Saif) for their contributions to launching the new GenAI Gateway Accelerator scenario.