Introducing the Azure AI Model Inference API

This post has been republished via RSS; it originally appeared at: Microsoft Tech Community - Latest Blogs.

We launched the model catalog in early 2023, featuring a curated selection of open-source models that customers can trust and consume in their organizations. The Azure AI model catalog now offers around 1,700 models, including the latest open-source innovations like Llama 3 from Meta, as well as models from partnerships with OpenAI, Mistral, and Cohere. Each of these models has unique capabilities that we think will inspire developers to build the next generation of copilots.



Figure: A screenshot of the Azure AI model catalog displaying the large diversity of models it offers to customers.


To give developers consistent access to these capabilities, we are launching the Azure AI model inference API, which lets customers consume these models using the same syntax and the same language. The API introduces a single layer of abstraction, yet still allows each model to expose the unique features and capabilities that differentiate it.
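To make this concrete, here is a minimal sketch of what a call through a common chat-completions layer can look like. The endpoint URL, key, and `api-version` value are placeholders for your own deployment, and the exact route and header names are assumptions based on the familiar chat-completions shape; check the API reference for your deployment before relying on them.

```python
# Sketch of a provider-agnostic chat completion call.
# Endpoint, key, and api-version below are hypothetical placeholders.
import json
import urllib.request


def build_chat_request(user_message: str, temperature: float = 0.7) -> dict:
    """Build the common chat-completions payload used for every model."""
    return {
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": user_message},
        ],
        "temperature": temperature,
        "max_tokens": 256,
    }


def chat_complete(endpoint: str, api_key: str, payload: dict) -> dict:
    """POST the shared payload to a deployment's /chat/completions route."""
    req = urllib.request.Request(
        f"{endpoint}/chat/completions?api-version=2024-05-01-preview",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth header
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())


if __name__ == "__main__":
    payload = build_chat_request("Summarize the Azure AI model catalog in one sentence.")
    # result = chat_complete("https://<your-deployment>.models.ai.azure.com",
    #                        "<your-key>", payload)
    # print(result["choices"][0]["message"]["content"])
```

Because the payload is model-agnostic, only the endpoint and key change when you target a different model.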


Starting today, all language models deployed as serverless APIs support this common API. This means you can interact with GPT-4 from Azure OpenAI Service, Cohere Command R+, or Mistral-Large in the same way, without the need for translations. Soon, these capabilities will also be available for models deployed to our self-hosted managed endpoints, unifying the consumption experience across all our inferencing solutions.
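Since every serverless deployment speaks the same API, switching models amounts to pointing one shared request at a different endpoint. The sketch below assumes one hypothetical deployment URL and key per model (the environment-variable names are illustrative, not part of any product).

```python
# One payload, many models: the request shape never changes across deployments.
import json
import os
import urllib.request

PAYLOAD = {
    "messages": [{"role": "user", "content": "Explain RAG in two sentences."}],
    "max_tokens": 128,
}

# Hypothetical serverless deployments -- one endpoint per model.
DEPLOYMENTS = {
    "gpt-4": os.environ.get("GPT4_ENDPOINT", ""),
    "cohere-command-r-plus": os.environ.get("COHERE_ENDPOINT", ""),
    "mistral-large": os.environ.get("MISTRAL_ENDPOINT", ""),
}


def ask(endpoint: str, key: str) -> str:
    """Send the shared payload to one deployment and return the answer text."""
    req = urllib.request.Request(
        f"{endpoint}/chat/completions",
        data=json.dumps(PAYLOAD).encode("utf-8"),
        headers={"Authorization": f"Bearer {key}", "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]


if __name__ == "__main__":
    for name, endpoint in DEPLOYMENTS.items():
        if endpoint:  # only call deployments you have actually configured
            print(name, "->", ask(endpoint, os.environ.get(f"{name}_KEY", "")))
```

The point of the sketch is the loop body: no per-provider request translation is needed.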



Figure: A graphic depicting that the Azure AI model inference API can be used to consume models from Cohere, Mistral, Meta Llama, Microsoft (including Phi-3), and Core42 JAIS, and that it is also compatible with Azure OpenAI Service model deployments.


This is the same API utilized within Azure AI Studio and Azure Machine Learning. You can use prompt flow to build intelligent experiences that can now leverage various models. Since all the models speak the same language, you can run evaluations to compare them across different tasks, determine which one to use for each use case, exploit their strengths, and build experiences that delight your customers.
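Because every model accepts the same request shape, a side-by-side comparison reduces to a loop. The toy harness below illustrates the idea; `score` is a stand-in for a real evaluation metric (such as the evaluators available in Azure AI Studio), and the model callables are whatever wrappers you build around your deployments.

```python
# Toy evaluation harness: run one prompt through several models and rank them.
import time


def score(answer: str, expected_keywords: list[str]) -> float:
    """Toy metric: fraction of expected keywords present in the answer."""
    hits = sum(1 for kw in expected_keywords if kw.lower() in answer.lower())
    return hits / len(expected_keywords) if expected_keywords else 0.0


def evaluate(models: dict, prompt: str, expected_keywords: list[str]) -> list[tuple]:
    """Rank model callables by score on a single prompt.

    `models` maps a model name to a callable that takes a prompt string and
    returns the model's answer text (e.g. a closure around a common
    chat-completions call). Returns (name, score, latency) tuples, best first.
    """
    results = []
    for name, call in models.items():
        start = time.perf_counter()
        answer = call(prompt)
        latency = time.perf_counter() - start
        results.append((name, score(answer, expected_keywords), latency))
    return sorted(results, key=lambda r: r[1], reverse=True)
```

A real evaluation would use a dataset of prompts and a task-appropriate metric, but the structure, one loop over interchangeable models, stays the same.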



Figure: A screenshot showing the comparison of three evaluations of a prompt flow chat application that implements the RAG pattern. The evaluation was run on three variations of the same prompt flow, each running GPT-3.5 Turbo, Mistral-Large, or Llama2-70B-chat, using the same prompt message for the generation step.

We see more customers eager to combine innovation from across the industry and redefine what's possible, either integrating foundation models as building blocks for their applications or fine-tuning them to achieve niche capabilities for specific use cases. We hope this new set of capabilities unlocks the experimentation and evaluation required to move across models, picking the right one for the right job.


We want to help customers fulfill that mission, empowering every single AI developer to achieve more with Azure AI.



