Azure OpenAI Service Multitenant Load Balancing and Token Per Minute Tracking via Prometheus Metrics
Azure OpenAI Service provides various isolation and tenancy models for different scenarios. Some models use a dedicated Azure OpenAI Service resource per tenant, while others rely on a multitenant application sharing one or more Azure Op…
