Affordable Innovation: Unveiling the Pricing of Phi-3 SLMs on Models as a Service

This post has been republished via RSS; it originally appeared at: Microsoft Tech Community - Latest Blogs.

At this year's Microsoft Build, we introduced the Phi-3 series of small language models (SLMs), a groundbreaking addition to our Azure AI model catalog. The Phi-3 models, which include Phi-3-mini and Phi-3-medium, represent a significant advancement in generative AI, designed to deliver large-model performance in a compact, efficient package.


The Power of Phi-3 Models


The Phi-3 series stands out by offering the capabilities of significantly larger models while requiring far less computational power. This makes Phi-3 models ideal for a wide range of applications, from enhancing mobile apps to powering devices with stringent energy requirements. These models support extensive context lengths—up to 128K tokens—pushing the boundaries of what small models can achieve.


Features and Benefits


  1. Versatility and Scalability: Phi-3 models are versatile across various NLP tasks, including text generation, summarization, and more complex language understanding tasks, making them adaptable to both commercial and academic uses.
  2. Optimized Performance: Designed for efficiency, these models excel in environments where quick response times are crucial without sacrificing the quality of outcomes.
  3. Cost-Effectiveness: By optimizing the quality-cost curve, Phi-3 models ensure that users can deploy cutting-edge AI without the high resource costs typically associated with large models.
  4. Ease of Integration: Available on Azure AI Studio, Hugging Face, and Ollama, these models can be seamlessly integrated into existing systems, allowing developers to leverage their capabilities with minimal setup.


Pricing and Availability


Experience the efficiency and agility of Phi-3 small language models in the Azure AI model catalog through the Pay-As-You-Go (PAYGO) offering via serverless APIs. PAYGO lets you pay only for what you use, which is ideal for managing costs without compromising performance. For consistent throughput and minimal latency, Phi-3 models offer competitive per-unit pricing with a clear, predictable cost structure. Pricing takes effect on June 1, 2024 at 00:00 UTC (5:00 PM PDT on May 31, 2024).
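A serverless API deployment is called over plain HTTPS. The sketch below builds such a request in Python; the endpoint URL and key are placeholders you would copy from your own Azure AI Studio deployment, and the OpenAI-style `/chat/completions` route with bearer-token authentication is an assumption for illustration, not a confirmed detail of any specific deployment.

```python
import json

# Placeholder endpoint and key -- replace both with the values shown for
# your serverless deployment in Azure AI Studio (hypothetical here).
ENDPOINT = "https://my-phi3-deployment.eastus2.models.ai.azure.com"
API_KEY = "<your-api-key>"


def build_chat_request(prompt: str, max_tokens: int = 256):
    """Build the URL, headers, and JSON body for a chat-completions call.

    Assumes an OpenAI-style /chat/completions route and bearer-token
    authentication; some deployments may use an 'api-key' header instead.
    """
    url = f"{ENDPOINT}/chat/completions"
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {API_KEY}",
    }
    body = json.dumps({
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }).encode("utf-8")
    return url, headers, body
```

The returned pieces can then be sent with any HTTP client (`urllib.request`, `requests`, or one of the Azure AI SDKs), keeping your application code free of vendor-specific dependencies.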


These models are available in the East US 2 and Sweden Central regions.




[Pricing table: per-model rates for Input (per 1,000 tokens) and Output (per 1,000 tokens); see the Azure AI pricing page for the current rates.]
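Because serverless billing is metered per 1,000 input and output tokens, estimating the cost of a request is a simple multiplication. A minimal sketch, using placeholder rates rather than the actual Phi-3 prices:

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_price_per_1k: float,
                  output_price_per_1k: float) -> float:
    """Estimate PAYGO cost: input and output are each billed per 1,000 tokens."""
    return ((input_tokens / 1000) * input_price_per_1k
            + (output_tokens / 1000) * output_price_per_1k)


# Illustrative placeholder rates only -- consult the Azure pricing page
# for the real Phi-3 per-1,000-token prices.
cost = estimate_cost(input_tokens=3000, output_tokens=1000,
                     input_price_per_1k=0.0003, output_price_per_1k=0.0009)
```

Swapping in the published rates for each Phi-3 variant makes it straightforward to compare deployment options before committing to one.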
Stay tuned for more updates on Phi-3, and prepare to transform your applications with the efficiency, versatility, and power of Phi-3 small language models. For more information, visit our product page or contact our sales team to see how Phi-3 can fit into your technology stack.
