RUBICON: Evaluating conversations between humans and AI systems
RUBICON evaluates conversations between humans and AI assistants and helps improve their quality by learning detailed, domain-specific rubrics from minimal data. It gathers insights on AI assistant performance while maintaining user privacy and data security.
The post RUBICON: Evaluating conversations between humans and AI systems appeared first on Microsoft Research.