Path to Production Azure OpenAI Instances – PowerSchool PowerBuddy

This post has been republished via RSS; it originally appeared at: New blog articles in Microsoft Community Hub.

Introduction:

 

In the rapidly evolving world of education technology, EdTech customers are increasingly turning to generative AI platforms to change how they approach course content. Built on Azure OpenAI models, these platforms are transforming personalized content generation, course design and development, and even the grading of assessments. PowerSchool is a leading education technology provider of award-winning, cloud-based K-12 software.

 

Recognizing the transformative potential of Generative AI in education, PowerSchool's New Solutions division has been exploring ways to harness Large Language Models (LLMs) to drive positive change. The Azure OpenAI Service, with its advanced AI models such as GPT-4 Turbo, GPT-4, and Vision, has become a cornerstone for these EdTech initiatives. PowerSchool has been rigorously testing these models in various use cases and has found them highly valuable. The benefits recognized so far include the performance of the platform, the ability to scale API requests across different regions, the security of the end-to-end platform, and compliance with enterprise-wide organizational standards.

 

Transitioning from proof of concept and pilot phases to full-scale production, PowerSchool now faces the task of monitoring, scaling, and optimizing its AI applications and their cost.

 

PowerBuddy Overview:

 

PowerBuddy is PowerSchool's AI assistant, designed to meet the diverse needs of students, families, teachers, and administrators. Its goal is to provide personalized insights, foster engagement, and create a supportive environment throughout the educational journey. In the realm of assessment, Performance Matters PowerBuddy excels by helping educators create tailored assessments aligned with grade levels, standards, and topics. Integrated seamlessly into the Performance Matters platform, it generates questions and passages efficiently. Additionally, PowerBuddy's conversational chatbot feature assists students with learning assignments. With its adaptability and support for all stakeholders, PowerBuddy enriches the educational experience for everyone involved.

 

Path to Production

 

As PowerSchool’s AI applications gain traction across school districts in the US, observability has emerged as a crucial component built into the solution rather than a mere monitoring add-on. During our initial pilot phases, we identified the key observability properties needed to understand the internal state of our AI systems, including the following metrics.

 

  • Latency
  • Successful calls / error rate
  • Processed inference tokens
  • Completion tokens
  • Total API requests
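
To make the metrics listed above concrete, here is a minimal Python sketch that rolls a batch of per-call records up into those five numbers. The `CallRecord` fields and the `summarize` helper are hypothetical names introduced for illustration, not part of any Azure SDK. The sketch also assumes that processed inference tokens are the sum of prompt and completion tokens, and it picks the 95th percentile as the latency figure, which is only one possible choice.

```python
from dataclasses import dataclass
from math import ceil


@dataclass
class CallRecord:
    """One API call. All field names here are hypothetical."""
    latency_ms: float
    succeeded: bool
    prompt_tokens: int
    completion_tokens: int


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list (pct between 1 and 100)."""
    ordered = sorted(values)
    rank = ceil(pct / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


def summarize(records):
    """Roll a batch of call records up into the five metrics listed above."""
    total = len(records)
    failures = sum(1 for r in records if not r.succeeded)
    prompt = sum(r.prompt_tokens for r in records)
    completion = sum(r.completion_tokens for r in records)
    return {
        "total_requests": total,
        "error_rate": failures / total if total else 0.0,
        "p95_latency_ms": (
            percentile([r.latency_ms for r in records], 95) if total else None
        ),
        "completion_tokens": completion,
        # Assumption: processed inference tokens = prompt + completion tokens.
        "processed_inference_tokens": prompt + completion,
    }
```

In a real deployment these values would come from the platform's own metrics rather than from hand-built records; the sketch only shows how the quantities relate to one another.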

We need comprehensive monitoring solutions to proactively monitor, optimize, and scale resources before moving to production.

 

To monitor and optimize Azure OpenAI instances, PowerSchool uses Azure Monitor, a comprehensive solution for collecting and analyzing telemetry data. Azure Workbooks, built on Azure Monitor, provide custom dashboards and reports, while Azure Managed Grafana enhances visualization with dynamic dashboards. Azure Log Analytics Workspace centralizes log data collection and analysis, providing deeper insights to refine metrics and improve monitoring. The following diagram shows which monitoring tool fits which use case.

 

Gana_Chandrasekaran_0-1713929307637.png

 

The first approach uses Azure Monitor Metrics and Alerts, which offers a straightforward way to monitor and optimize Azure OpenAI instances. It fits single-instance deployments best, where viewing basic metrics and setting up simple alerts are the main needs. The accompanying diagram illustrates the process of monitoring Azure OpenAI instances using Azure Monitor.

 

Gana_Chandrasekaran_1-1713929342420.png

 

 

In scenarios involving multiple Azure OpenAI instances spread across diverse regions and models (as with PowerBuddy), a more sophisticated monitoring capability proved necessary. It has to visualize data from various Azure OpenAI instances and formats in one place, which leads to the second approach: Azure Workbooks. These workbooks provide dashboards showing total OpenAI requests within specific regions, latency across multiple instances, total tokens across resources, and so on, configurable to custom schedules.

 

Azure Workbooks provide a scalable platform to monitor and optimize Azure OpenAI instances for production, which is especially beneficial for large-scale deployments that need to combine and visualize data from multiple sources and formats. The following diagram shows the total processed inference tokens across two different regions.

 

Gana_Chandrasekaran_2-1713929381826.png
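
The cross-region roll-up that such a workbook displays can be sketched in a few lines of Python. This is only an illustration of the aggregation, not how Azure Workbooks is implemented; the `tokens_by_region` helper, the tuple shape, and the region and instance names are all hypothetical.

```python
from collections import defaultdict


def tokens_by_region(samples):
    """Sum token counts per region across any number of instances.

    Each sample is a (region, instance_name, tokens) tuple.
    This shape is hypothetical and chosen only for the example.
    """
    totals = defaultdict(int)
    for region, _instance, tokens in samples:
        totals[region] += tokens
    return dict(totals)


# Hypothetical sample data: two instances in one region, one in another.
samples = [
    ("eastus", "instance-a", 1200),
    ("eastus", "instance-b", 800),
    ("westeurope", "instance-c", 500),
]
print(tokens_by_region(samples))
```

The same grouping idea extends to latency or request counts by swapping the summed value.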

 

 

When Azure OpenAI instances are deployed in a multi-cloud environment, where resources must be monitored across different cloud vendors, a more comprehensive monitoring capability is essential.

 

The third approach is to use Azure Managed Grafana for monitoring and reporting. Grafana supports data sources from other cloud providers, enabling the monitoring and optimization of Azure OpenAI instances in a multi-cloud environment. This provides a comprehensive view of resources across cloud environments, as needed by a company like PowerSchool that runs a multi-cloud infrastructure. Additionally, dashboards can be exported and shared for collaborative monitoring and optimization efforts. The following dashboards show the total alerts, warnings, severity, and conditions.

 

Gana_Chandrasekaran_3-1713929419399.png

 

 

The fourth approach, Azure Log Analytics Workspace, offers a centralized and scalable way to monitor and improve Azure OpenAI instances. The workspace collects and analyzes log data from Azure OpenAI instances and other associated resources. Using the Kusto Query Language (KQL), users can efficiently query and manipulate log data to build custom insights and solutions. Dashboards within the workspace display vital metrics, such as total processed inference tokens over the past 120 days, using dynamic KQL queries for near-real-time analysis.

 

Gana_Chandrasekaran_4-1713929460416.png
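
As a rough illustration of the 120-day token view described above, the following Python sketch performs the kind of daily bucketing that a KQL `summarize` over `bin(TimeGenerated, 1d)` would do inside the workspace. It runs on in-memory data rather than against Log Analytics, and the `daily_token_totals` helper and the `(timestamp, tokens)` row shape are assumptions made for the example.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def daily_token_totals(rows, now, days=120):
    """Sum tokens per calendar day for rows within the last `days` days.

    `rows` is an iterable of (timestamp, tokens) pairs with timezone-aware
    UTC timestamps. This shape is hypothetical.
    """
    cutoff = now - timedelta(days=days)
    totals = Counter()
    for timestamp, tokens in rows:
        # Keep only rows inside the lookback window.
        if cutoff <= timestamp <= now:
            totals[timestamp.date()] += tokens
    return dict(sorted(totals.items()))
```

Rows older than the window are dropped before grouping, which mirrors filtering on the time column before summarizing.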

 

 

To address the challenge of monitoring multiple OpenAI instances, we collaborated with the Microsoft Global Black Belt team to develop comprehensive monitoring solutions, such as Azure Workbooks with dashboards and insights for each metric outlined in our observability manifesto. Additionally, we fine-tuned our alerting thresholds, switching between dynamic and static thresholds based on traffic patterns, alert frequency, and specific use cases. Here is the sample dashboard that we developed to measure the performance of the AI apps distributed across different instances.

 

Gana_Chandrasekaran_5-1713929535088.png
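
The difference between static and dynamic thresholds mentioned above can be shown with a small Python sketch. The dynamic rule here (mean plus a multiple of the standard deviation of recent history) is a deliberately simplified stand-in chosen for illustration; it is not the model Azure Monitor uses for its dynamic thresholds. The function names and the `sensitivity` parameter are hypothetical.

```python
from statistics import mean, stdev


def static_alert(value, threshold):
    """Fire when the value exceeds a fixed limit."""
    return value > threshold


def dynamic_alert(value, history, sensitivity=3.0):
    """Fire when the value exceeds mean + sensitivity * stdev of recent history.

    Simplified stand-in, not Azure Monitor's actual dynamic-threshold model.
    """
    if len(history) < 2:
        return False  # not enough data to estimate a baseline
    upper = mean(history) + sensitivity * stdev(history)
    return value > upper


# Hypothetical recent latency readings in milliseconds.
history = [100, 110, 90, 105, 95]
print(static_alert(130, threshold=500), dynamic_alert(130, history))
```

With these sample values, a reading of 130 ms stays under a fixed limit of 500 ms but stands out against the recent baseline, which is the kind of case where a dynamic threshold catches drift that a static one misses. Conversely, a static limit is easier to reason about when traffic is steady and the acceptable ceiling is known.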

 

 

As our understanding of usage patterns across products evolves, we are committed to continuously enhancing our observability and monitoring processes for Azure OpenAI. One key recommendation is to iteratively monitor traffic, usage, and patterns across products, updating dashboards and alerts as those patterns change. Our experience at PowerSchool underscores the importance of continuous improvement to meet evolving needs in education technology.

 

Conclusion:

 

PowerSchool is leveraging Azure OpenAI models to transform education through generative AI, enhancing course content, design, and assessments. To monitor and optimize Azure OpenAI instances, PowerSchool uses Azure Monitor, Workbooks, Managed Grafana, and Log Analytics Workspace for comprehensive telemetry data analysis and visualization. Collaborating with Microsoft's Global Black Belt team, PowerSchool develops monitoring solutions for scaling AI applications in production, emphasizing continuous improvement and iterative monitoring for enhanced observability in their AI applications.

 

 
