Intelligent app on Azure Container Apps Landing Zone Accelerator

This post has been republished via RSS; it originally appeared at: New blog articles in Microsoft Community Hub.

AI apps are on the rise: Azure OpenAI makes LLM capabilities easier to integrate into applications, and Azure Container Apps helps developers focus on building AI apps faster in a serverless container environment without worrying about container orchestration, server configuration, or deployment details.

To fast-track your journey to production with AI applications, it's crucial to build your solution following best practices for security, monitoring, networking, and operational excellence.

This blog post shows you how to leverage the Azure Container Apps (ACA) Landing Zone Accelerator (LZA) to deploy AI apps on a production-grade secure baseline.

App Overview

To demonstrate the deployment, we use the Java Azure AI reference template, a complete end-to-end solution demonstrating the Retrieval-Augmented Generation (RAG) pattern running on Azure, using Azure AI Search for retrieval and Azure OpenAI large language models to power ChatGPT-style and Q&A experiences.

The business scenario showcased in the sample is a B2E intelligent chat app that helps employees answer questions about the company benefits plan, internal policies, and job descriptions and roles. The repo includes sample PDF documents in the data folder, so it's ready to try end to end. Furthermore, it provides:

  • Chat and Q&A interfaces
  • Various options to help users evaluate the trustworthiness of responses with citations, tracking of source content, etc.
  • Possible approaches for data preparation, prompt construction, and orchestration of interaction between model (ChatGPT) and retriever (Azure AI Search)
  • Possible AI orchestration implementations using the plain Java OpenAI SDK or the Java Semantic Kernel SDK
  • Settings directly in the UX to tweak the behavior and experiment with options
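The RAG orchestration the template demonstrates can be sketched in plain Java. This is a minimal illustration, not the template's actual code: `Retriever` and `ChatModel` are hypothetical stand-ins for the Azure AI Search and Azure OpenAI clients, and the prompt format is an assumption.

```java
import java.util.List;
import java.util.stream.Collectors;

// Minimal RAG orchestration sketch: retrieve relevant chunks, fold them
// into a grounded prompt, and send that prompt to the model.
public class RagSketch {
    interface Retriever { List<String> search(String query, int topK); }
    interface ChatModel { String complete(String prompt); }

    private final Retriever retriever;
    private final ChatModel model;

    public RagSketch(Retriever retriever, ChatModel model) {
        this.retriever = retriever;
        this.model = model;
    }

    // Build a grounded prompt from the retrieved sources and ask the model.
    public String ask(String question) {
        List<String> sources = retriever.search(question, 3);
        String context = sources.stream()
                .map(s -> "- " + s)
                .collect(Collectors.joining("\n"));
        String prompt = "Answer using ONLY the sources below. Cite them.\n"
                + "Sources:\n" + context + "\n\nQuestion: " + question;
        return model.complete(prompt);
    }
}
```

In the real template this orchestration is implemented either with the plain Java OpenAI SDK or with Semantic Kernel, as listed above.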

App Architecture



  • The API app is implemented as a Spring Boot 2.7.x app using the Microsoft JDK. It provides the ask and chat APIs used by the chat web app. It's responsible for implementing the RAG pattern, orchestrating the interaction between the LLM (Azure OpenAI ChatGPT) and the retriever (Azure AI Search).
  • The chat web app is built in React and deployed as a static web app served by NGINX. NGINX also acts as a reverse proxy for API calls to the API app, which avoids CORS issues.
  • The indexer app is implemented as a Spring Boot 2.7.x app using the Microsoft JDK. It is responsible for indexing data into Azure AI Search and is triggered by BlobUploaded messages from Azure Service Bus. The indexer also chunks the documents into smaller pieces, embeds them, and stores them in the index. Azure Document Intelligence is used to extract text from PDF documents (including tables and images).
  • Azure AI Search is used as the RAG retrieval system. Different search options are available: traditional full-text search (with semantic ranking), vector-based search, and hybrid search, which combines the strengths of the other two.
  • An Event Grid system topic implements a near-real-time mechanism to trigger the indexer app when a new document is uploaded to blob storage. It reads BlobUploaded notifications from the Azure Storage container and pushes a message containing the blob URL to the Service Bus queue.
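The indexer's chunking step described above can be sketched as fixed-size windows with overlap, so text cut at a chunk boundary still appears whole in at least one chunk. This is an illustrative sketch under assumed sizes and names, not the template's actual implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative fixed-size chunker with overlap: consecutive chunks share
// `overlap` characters so boundary content is never lost to the embedder.
public class Chunker {
    public static List<String> chunk(String text, int chunkSize, int overlap) {
        if (chunkSize <= overlap) {
            throw new IllegalArgumentException("chunkSize must exceed overlap");
        }
        List<String> chunks = new ArrayList<>();
        int step = chunkSize - overlap;
        for (int start = 0; start < text.length(); start += step) {
            int end = Math.min(start + chunkSize, text.length());
            chunks.add(text.substring(start, end));
            if (end == text.length()) break;
        }
        return chunks;
    }
}
```

Each resulting chunk would then be embedded and stored as a document in the Azure AI Search index.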

Deployment Architecture




The Java Azure AI reference template is deployed on top of the Azure Container Apps Landing Zone Accelerator "internal scenario" infrastructure. Furthermore, the Azure services required to implement the end-to-end "chat with your data" solution are deployed following the Landing Zone Accelerator (LZA) security, monitoring, networking, and operational best practices.

  • From a networking standpoint, the landing zone uses a hub-and-spoke model, with the container apps connected securely to supporting services via private endpoints.
  • All traffic is secured within the LZA hub-and-spoke networks, and public access is disabled for all Azure services involved in the solution.
  • All resources have diagnostic monitoring configured to send logs and metrics to the Log Analytics workspace deployed in the spoke virtual network.
  • The solution is designed for regional high availability by enabling zone redundancy for the Azure services that support it. Azure OpenAI doesn't provide a built-in mechanism for zone redundancy; you need to deploy multiple Azure OpenAI instances in the same or different regions and use a load balancer to distribute the traffic.
  • You can implement the load-balancing logic in the client app, in a dedicated container running in ACA, or with an Azure service such as API Management, which also supports advanced Azure OpenAI scenarios like cost charge-back, rate limiting, and retry policies. In this sample the resiliency logic is implemented in the client app: it uses the default OpenAI Java SDK retry capabilities to overcome transient failures on the Azure OpenAI chat endpoint, and it retries with exponential backoff to handle throttling errors raised by the embeddings endpoint during the document ingestion process.
  • For more detailed guidance on Azure OpenAI resiliency and performance best practices, see the Azure Well-Architected Framework.
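The retry-with-exponential-backoff behavior described above can be sketched as a small plain-Java helper. This is an illustration of the technique, not the OpenAI Java SDK's built-in retry policy; the method names and parameters are hypothetical.

```java
import java.util.concurrent.Callable;

// Illustrative retry helper with exponential backoff, e.g. for throttling
// (HTTP 429) errors from the embeddings endpoint during ingestion.
public class Backoff {
    public static <T> T withRetries(Callable<T> call, int maxAttempts,
                                    long baseDelayMillis) throws Exception {
        Exception last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return call.call();
            } catch (Exception e) {
                last = e;
                // Delay doubles on each attempt: base, 2*base, 4*base, ...
                long delay = baseDelayMillis * (1L << attempt);
                Thread.sleep(delay);
            }
        }
        throw last; // all attempts exhausted
    }
}
```

A production version would typically retry only on retryable errors (throttling, transient network failures) and honor the service's Retry-After header when present.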


The deployment is done in two parts: (1) deploy the infrastructure and (2) deploy the application.

  • You can provision the infrastructure using “azd provision”, which automatically provisions the container apps on a secure baseline, along with the supporting services the app requires (Azure AI Search, Azure Document Intelligence, Azure Storage, Azure Event Grid, Azure Service Bus), following the best practices provided by the ACA LZA infrastructure.
  • To deploy the app, connect to the jumpbox using Bastion and complete the prerequisites before using “azd deploy” to build and deploy the app.
  • Run ./scripts/ to ingest the predefined documents in the data folder. Allow a few minutes for the documents to be ingested into the Azure AI Search index. You can check the ingestion status in the indexer app's log stream in the Azure portal.
  • From your local browser, connect to the public Azure Application Gateway over HTTPS. To retrieve the Application Gateway's public IP address, go to the Azure portal and find the application gateway resource in the spoke resource group. On the overview page, copy the "Frontend public IP address" and paste it into your browser.



Special thanks to Davide Antelmo for authoring the detailed guidance to deploy the AI app to Azure Container Apps in a landing zone environment.


