Journey Series for Generative AI Application Architecture – Foundation

This post has been republished via RSS; it originally appeared at: New blog articles in Microsoft Community Hub.

At Build last year, Microsoft CTO Kevin Scott proposed the Copilot Stack as a way to structure Generative AI applications. Building on the Copilot Stack, the community has developed many frameworks over the past year, such as Semantic Kernel, AutoGen, and LangChain. These frameworks lean toward front-end applications, while enterprises need a more complete engineering solution. This series aims to offer some ideas based on Microsoft Cloud and related frameworks and tools.

Generative AI has brought great changes to different industries and application scenarios, and 2024 will be the year when Generative AI applications explode. Enterprises are investing not only at the application level but also in infrastructure changes. The following four aspects are key for enterprises transforming to AI.

In addition to using Azure OpenAI Service for Generative AI models, more companies hope to combine open source SLMs with their own data to build their own models.
Both fine-tuning and inference with open source SLMs require computing power, so combining hybrid computing power is very important. It runs through every stage of Generative AI, from development and testing to production.
Integrating better business prompts into enterprise business logic, and evaluating how effective those prompts are for the business, is also critical.
Rapidly deploying applications to respond to business changes.

Everything is model-centric

Azure OpenAI Service offers the most powerful Generative AI models, making it the first choice for many enterprises. Open source SLMs such as Llama, Mistral, Nemotron, and Phi-2 are also popular, since enterprises prefer to build their own business models on open source models with their own data. No single model can serve as the solution for an entire enterprise; real application scenarios use multiple models together. Azure AI Studio provides a wider choice of models on Azure, and you can select the model you need through it.

With Azure AI Studio, you can compare capabilities between models to make the best choice for your business.

From the perspective of usage scenarios of open source models, we can divide them into fine-tuning and inference.

Fine-tuning

Generally, enterprises bring in some of their own data and business processes to fine-tune open source SLMs (setting aside how effective the fine-tuning turns out to be). Preparing enterprise data is a very important step. You can draw on data in Microsoft Fabric or Azure Data, and then fine-tune the open source SLMs with LoRA or QLoRA. Fine-tuning takes a long time, and you can run it on cloud or local computing instances.
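To give a sense of why LoRA keeps fine-tuning affordable, here is a minimal sketch that compares the number of trainable parameters in a full update of one weight matrix with a rank-r LoRA update of the same matrix. LoRA freezes the original weights W (d_out x d_in) and trains two small matrices, B (d_out x r) and A (r x d_in); QLoRA applies the same idea on top of a quantized frozen base model. The layer size (4096) and rank (8) are illustrative assumptions, not values taken from any particular model, and the function names are hypothetical.

```python
def full_update_params(d_out: int, d_in: int) -> int:
    """Trainable parameters when the whole weight matrix W (d_out x d_in) is updated."""
    return d_out * d_in


def lora_update_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters for a LoRA update W + B @ A,
    where B is (d_out x rank) and A is (rank x d_in)."""
    if rank <= 0:
        raise ValueError("rank must be positive")
    return rank * (d_out + d_in)


def trainable_fraction(d_out: int, d_in: int, rank: int) -> float:
    """Share of the full matrix's parameters that LoRA actually trains."""
    return lora_update_params(d_out, d_in, rank) / full_update_params(d_out, d_in)


if __name__ == "__main__":
    d_out, d_in, rank = 4096, 4096, 8  # illustrative sizes (assumption)
    print(full_update_params(d_out, d_in))                 # 16777216
    print(lora_update_params(d_out, d_in, rank))           # 65536
    print(f"{trainable_fraction(d_out, d_in, rank):.4%}")  # 0.3906%
```

Under these assumed sizes, LoRA trains 1/256 of the parameters of a full update for that layer, which is what makes running it on local computing instances plausible.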

Inference

After fine-tuning, the open source model can be deployed on Azure AI Studio or a local computing instance so that the fine-tuned model can support business work. Because enterprises need to evaluate the performance of fine-tuned models, especially latency, concurrency, and business accuracy, frameworks such as Prompt flow are critical for model inference.
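The three measurements mentioned above (latency, concurrency, and business accuracy) can be sketched with the Python standard library alone. This is only an illustration of the idea, not how Prompt flow or Azure AI Studio implement evaluation: `fake_model` is a hypothetical stand-in for a deployed fine-tuned model, and the test cases are made up.

```python
import math
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple


def fake_model(prompt: str) -> str:
    """Hypothetical stand-in for a deployed fine-tuned model endpoint."""
    time.sleep(0.01)  # simulate network and inference time
    return prompt.upper()


def timed_call(model: Callable[[str], str], prompt: str) -> Tuple[str, float]:
    """Call the model once and return (answer, latency in seconds)."""
    start = time.perf_counter()
    answer = model(prompt)
    return answer, time.perf_counter() - start


def percentile(values: List[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty list, with pct in (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def evaluate(model: Callable[[str], str],
             cases: List[Tuple[str, str]],
             concurrency: int) -> Dict[str, float]:
    """Send every (prompt, expected) case using `concurrency` parallel workers
    and report accuracy plus p50 and p95 latency."""
    prompts = [prompt for prompt, _ in cases]
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        # pool.map keeps results in the same order as the prompts
        results = list(pool.map(lambda p: timed_call(model, p), prompts))
    correct = sum(1 for (answer, _), (_, expected) in zip(results, cases)
                  if answer == expected)
    latencies = [latency for _, latency in results]
    return {
        "accuracy": correct / len(cases),
        "p50_s": percentile(latencies, 50),
        "p95_s": percentile(latencies, 95),
    }


if __name__ == "__main__":
    cases = [("a", "A"), ("b", "B"), ("c", "x"), ("d", "D")]  # made-up cases
    print(evaluate(fake_model, cases, concurrency=2))
```

In practice `fake_model` would be replaced by a call to the real endpoint, and the cases by prompts and expected answers drawn from the business scenario.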

Tools and Frameworks

Fine-tuning and inference are both critical steps. For these two scenarios, we will introduce some tools and frameworks that combine cloud and local environments.

Azure AI Studio

Azure AI Studio is a cloud platform for enterprises to build Generative AI, providing a closed loop across the entire LLMOps cycle of enterprise team development. Through the Azure AI SDK and Azure AI CLI, it supports Generative AI application development from data, model selection, and model fine-tuning through testing, application deployment, and content safety.

Windows AI Studio

Windows AI Studio lets developers, especially individual developers, fine-tune, run inference on, and deploy open source SLMs (such as Phi-2, Llama 2, and Mistral) in a local environment (Windows + WSL), so that a device can run a small language model with its own computing power. It can also be used in conjunction with Azure AI Studio. Windows AI Studio is available as a Visual Studio Code extension, through which developers or development teams can manage the entire Generative AI application development process.

Microsoft Olive

Microsoft Olive is an easy-to-use open source model optimization tool that covers both fine-tuning and inference in the field of Generative AI. With simple configuration, it combines open source SLMs with a target runtime environment (AzureML, or local GPU, CPU, or DirectML) and completes fine-tuning or inference optimization automatically, helping you find the best model to deploy to the cloud or edge devices. Microsoft Olive is integrated into Windows AI Studio.

Prompt Flow

In Generative AI applications, we use prompts to interact with the model. The goal of Prompt flow is to simplify the end-to-end development cycle of Generative AI applications, from ideation, prototyping, testing, and evaluation to production deployment and monitoring. It makes it easier to integrate prompt engineering with the business and to evaluate prompts against business scenarios.
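The idea of evaluating prompts against business scenarios can be illustrated with a small sketch that scores two prompt variants on the same labeled cases and picks the better one. It deliberately does not use the Prompt flow SDK: `stub_model`, the two templates, and the test cases are all hypothetical, and a real setup would call an actual model and use Prompt flow's own tooling.

```python
from typing import Callable, Dict, List, Tuple


def stub_model(prompt: str) -> str:
    """Hypothetical stand-in for a model call. It only produces a label when
    the prompt explicitly asks it to classify the request."""
    text = prompt.lower()
    if "classify" not in text:
        return "unknown"
    return "refund" if "money back" in text else "other"


# Two hypothetical prompt variants for the same business task.
VARIANTS: Dict[str, str] = {
    "variant_a": "Reply to: {text}",
    "variant_b": "Classify the category of this customer request: {text}",
}

# Made-up business test cases: (customer text, expected label).
CASES: List[Tuple[str, str]] = [
    ("I want my money back", "refund"),
    ("Where is my parcel?", "other"),
]


def score_variants(model: Callable[[str], str],
                   variants: Dict[str, str],
                   cases: List[Tuple[str, str]]) -> Dict[str, float]:
    """Return the accuracy of each prompt variant over the same test cases."""
    scores: Dict[str, float] = {}
    for name, template in variants.items():
        hits = sum(1 for text, expected in cases
                   if model(template.format(text=text)) == expected)
        scores[name] = hits / len(cases)
    return scores


if __name__ == "__main__":
    scores = score_variants(stub_model, VARIANTS, CASES)
    best = max(scores, key=scores.get)
    print(scores)  # {'variant_a': 0.0, 'variant_b': 1.0}
    print(best)    # variant_b
```

Keeping the test cases fixed while only the prompt changes is the design choice that matters here: it lets differences in score be attributed to the prompt rather than to the data.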

Architecture

As mentioned at the beginning, engineering with Generative AI requires us to start thinking from a lower level. Combining the tools above with the Copilot Stack, I rethought the architecture of Generative AI applications.

The bottom layer is the model. We have more possibilities in model selection, including not only Azure OpenAI Service but also open source small models provided through Azure AI Studio or Hugging Face.
We can use Microsoft Olive together with Windows AI Studio to fine-tune open source small language models locally. When the parameters become more complex, we can also move fine-tuning to Azure AI Studio.
We can use ONNX Runtime and Microsoft Olive to run the model at the inference layer, or run inference and deploy the model directly through Windows AI Studio and Azure AI Studio.
Use Prompt flow to evaluate the effectiveness of prompt engineering and models in enterprise application scenarios, and so improve the quality of the Generative AI application.
Combine different needs and frameworks to quickly build applications and deploy them to different devices.

The following is an architecture diagram based on the above five points.

Summary

Generative AI is changing every day, and everyone is deeply involved in this era. For companies that need to transform quickly to Generative AI, a good architecture yields twice the result with half the effort. I hope this series gives you some ideas, and I also hope to hear more from you. The following is the plan for the series; I hope you will follow along over the next few weeks.

Journey Series for Generative AI Application Architecture - Basics
Journey Series for Generative AI Application Architecture - Microsoft Olive takes a step towards fine-tuning open source SLMs
Journey Series for Generative AI Application Architecture - Build Generative AI applications using Azure AI Studio
Journey Series for Generative AI Application Architecture - Tips for optimizing prompts in Prompt flow

Resources

Learn about Microsoft Olive https://microsoft.github.io/Olive/
Learn about Azure AI Studio https://learn.microsoft.com/azure/ai-studio/what-is-ai-studio
Learn about Windows AI Studio https://learn.microsoft.com/windows/ai/studio/
Learn about Prompt flow https://learn.microsoft.com/azure/ai-studio/how-to/prompt-flow
Read "Hugging Face Collaborates with Microsoft to launch Hugging Face Model Catalog on Azure" https://huggingface.co/blog/hugging-face-endpoints-on-azure