Using Cohere Binary Embeddings in Azure AI Search and Command R/R+ Model via Azure AI Studio

This post has been republished via RSS; it originally appeared at: New blog articles in Microsoft Community Hub.

In April 2024, we proudly announced our partnership with Cohere, allowing customers to seamlessly leverage Cohere models via the Azure AI Studio Model Catalog, as part of the Models as a Service (MaaS) offering. At Build 2024, Azure AI Search launched support for Binary Vectors. In this blog, we are excited to continue from our previous discussion on int8 embeddings and highlight two powerful new capabilities: utilizing Cohere Binary Embeddings in Azure AI Search for optimized storage and search, and employing the Cohere Command R+ model as a Large Language Model (LLM) for Retrieval-Augmented Generation (RAG).

Cohere Binary Embeddings via Azure AI Studio

Binary vector embeddings use a single bit per dimension, making them much more compact than vectors using floats or int8, while still yielding surprisingly good quality given the size reduction. Cohere's binary embeddings offer substantial efficiency, enabling you to store and search vast datasets more cost-effectively. This capability can achieve significant memory reduction, allowing more vectors to fit within Azure AI Search units or enabling the use of lower SKUs, thus improving cost efficiency and supporting larger indexes.
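To make "a single bit per dimension" concrete, here is a minimal sketch of one common way to binarize a float vector: threshold each value at zero, then pack eight bits into one unsigned byte. The function name and the zero threshold are assumptions for illustration; this is not necessarily how Cohere's models produce their binary embeddings.

```python
def binarize_and_pack(vector):
    """Threshold each float at zero and pack 8 bits per unsigned byte (most significant bit first)."""
    if len(vector) % 8 != 0:
        raise ValueError("vector length must be a multiple of 8")
    packed = []
    for start in range(0, len(vector), 8):
        byte = 0
        for value in vector[start:start + 8]:
            # Shift the bits collected so far and append 1 for positive values, 0 otherwise
            byte = (byte << 1) | (1 if value > 0 else 0)
        packed.append(byte)
    return packed


# Eight float dimensions collapse into a single byte
example = [0.3, -1.2, 0.0, 2.5, -0.1, 0.7, 0.9, -3.0]
packed_example = binarize_and_pack(example)
print(packed_example)
```

Under this scheme a 1024-dimension vector becomes 128 byte values, which is the packed form stored in a binary vector field.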

"Cohere's binary embeddings available in Azure AI Search provide a powerful combination of memory efficiency and search quality, ideal for advanced AI applications." - Nils Reimers, Cohere's Director of Machine Learning.

Relative to float32 vectors, customers can achieve up to a 4x reduction in vector size with int8 embeddings and up to a 32x reduction with binary embeddings under optimal conditions, translating to improved cost efficiency and the ability to handle larger datasets. Read the full announcement from Cohere here: Cohere int8 & binary Embeddings - Scale Your Vector Database to Large Datasets
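The 32x figure follows directly from bit widths. The sketch below works through the arithmetic for a 1024-dimension vector, the dimension count configured for the index later in this post. It counts raw vector bytes only; index structures and other fields add overhead that is not modeled here, and the helper name is hypothetical.

```python
def bytes_per_vector(dimensions, bits_per_dimension):
    """Raw storage for one vector, ignoring any index overhead."""
    return dimensions * bits_per_dimension // 8


DIMENSIONS = 1024

sizes = {
    "float32": bytes_per_vector(DIMENSIONS, 32),
    "int8": bytes_per_vector(DIMENSIONS, 8),
    "binary": bytes_per_vector(DIMENSIONS, 1),
}

# Reduction factor of each format relative to float32
reduction = {name: sizes["float32"] // size for name, size in sizes.items()}

for name, size in sizes.items():
    print(f"{name}: {size} bytes per vector, {reduction[name]}x smaller than float32")
```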

Cohere Command R+ Model for RAG

The Cohere Command R+ model is a state-of-the-art language model that can be used for Retrieval-Augmented Generation (RAG). This approach combines retrieval of relevant documents with the generation capabilities of the model, resulting in more accurate and contextually relevant responses.

Step-by-Step Guide

Here's how you can use Cohere Binary Embeddings and the Command R+ model via Azure AI Studio:

Install Required Libraries

First, install the necessary libraries, including the Azure Search Python SDK and Cohere Python SDK.

pip install --pre azure-search-documents
pip install azure-identity cohere python-dotenv

Set Up Cohere and Azure AI Search Credentials

Set up your credentials for both Cohere and Azure AI Search. For this walkthrough, we'll use Cohere Deployed Models in Azure AI Studio. However, you can also use the Cohere API directly.

import os

import cohere
from azure.core.credentials import AzureKeyCredential
from azure.identity import DefaultAzureCredential
from azure.search.documents import SearchClient
from azure.search.documents.indexes import SearchIndexClient
from azure.search.documents.indexes.models import (
    SearchIndex,
    SearchField,
    SimpleField,
    VectorSearch,
    VectorSearchProfile,
    HnswAlgorithmConfiguration,
    HnswParameters,
    VectorEncodingFormat,
    VectorSearchAlgorithmKind,
    VectorSearchAlgorithmMetric,
    AzureMachineLearningVectorizer,
    AzureMachineLearningParameters,
)
from dotenv import load_dotenv

load_dotenv()

# Azure AI Studio Cohere Configuration
AZURE_AI_STUDIO_COHERE_EMBED_KEY = os.getenv("AZURE_AI_STUDIO_COHERE_EMBED_KEY")
AZURE_AI_STUDIO_COHERE_EMBED_ENDPOINT = os.getenv("AZURE_AI_STUDIO_COHERE_EMBED_ENDPOINT")
AZURE_AI_STUDIO_COHERE_COMMAND_KEY = os.getenv("AZURE_AI_STUDIO_COHERE_COMMAND_KEY")
AZURE_AI_STUDIO_COHERE_COMMAND_ENDPOINT = os.getenv("AZURE_AI_STUDIO_COHERE_COMMAND_ENDPOINT")

# Index Names
INT8_INDEX_NAME = "cohere-embed-v3-int8"
BINARY_INDEX_NAME = "cohere-embed-v3-binary"

# Azure Search Service Configuration
SEARCH_SERVICE_API_KEY = os.getenv("AZURE_SEARCH_ADMIN_KEY")
SEARCH_SERVICE_ENDPOINT = os.getenv("AZURE_SEARCH_SERVICE_ENDPOINT")

# Create a Cohere client using the embed key and embed endpoint from Azure AI Studio
cohere_azure_client = cohere.Client(
    base_url=f"{AZURE_AI_STUDIO_COHERE_EMBED_ENDPOINT}/v1",
    api_key=AZURE_AI_STUDIO_COHERE_EMBED_KEY,
)
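Because every setting above is read with os.getenv, a missing variable silently becomes None and only surfaces later as a confusing client error. A small up-front check can make that failure explicit. This is a sketch; the helper names are hypothetical, and the variable names are the ones used in this walkthrough.

```python
import os

REQUIRED_SETTINGS = [
    "AZURE_AI_STUDIO_COHERE_EMBED_KEY",
    "AZURE_AI_STUDIO_COHERE_EMBED_ENDPOINT",
    "AZURE_AI_STUDIO_COHERE_COMMAND_KEY",
    "AZURE_AI_STUDIO_COHERE_COMMAND_ENDPOINT",
    "AZURE_SEARCH_ADMIN_KEY",
    "AZURE_SEARCH_SERVICE_ENDPOINT",
]


def missing_settings(names, environ=None):
    """Return the names that are unset or empty in the given environment mapping."""
    environ = os.environ if environ is None else environ
    return [name for name in names if not environ.get(name)]


def ensure_settings(names, environ=None):
    """Raise a single, readable error listing every missing setting."""
    missing = missing_settings(names, environ)
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
```

Calling ensure_settings(REQUIRED_SETTINGS) right after load_dotenv() would stop the script before any client is created with an empty key or endpoint.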

Generate Embeddings using Azure AI Studio

Use the Cohere Embed API via Azure AI Studio to generate binary and int8 embeddings for your documents.

def generate_embeddings(texts, input_type="search_document", embedding_type="ubinary"):
    model = "embed-english-v3.0"
    texts = [texts] if isinstance(texts, str) else texts
    response = cohere_azure_client.embed(
        texts=texts,
        model=model,
        input_type=input_type,
        embedding_types=[embedding_type],
    )
    return [embedding for embedding in getattr(response.embeddings, embedding_type)]

# Example usage
documents = [
    "Alan Turing was a pioneering computer scientist.",
    "Marie Curie was a groundbreaking physicist and chemist.",
]
binary_embeddings = generate_embeddings(documents, embedding_type="ubinary")
int8_embeddings = generate_embeddings(documents, embedding_type="int8")
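Before indexing, it can be worth confirming that each embedding has the shape its field type expects. The sketch below assumes a 1024-dimension model, so a "ubinary" embedding should hold 128 values in the range 0 to 255 (eight dimensions packed per byte) and an "int8" embedding should hold 1024 values in the range -128 to 127. The helper name is hypothetical.

```python
def check_embedding(embedding, embedding_type, dimensions=1024):
    """Validate length and value range for a ubinary or int8 embedding."""
    if embedding_type == "ubinary":
        expected_length, low, high = dimensions // 8, 0, 255
    elif embedding_type == "int8":
        expected_length, low, high = dimensions, -128, 127
    else:
        raise ValueError(f"unsupported embedding type: {embedding_type}")

    if len(embedding) != expected_length:
        raise ValueError(
            f"{embedding_type} embedding has {len(embedding)} values, expected {expected_length}"
        )
    if any(not (low <= value <= high) for value in embedding):
        raise ValueError(f"{embedding_type} embedding has values outside [{low}, {high}]")
    return True
```

For example, check_embedding(binary_embeddings[0], "ubinary") would raise if a signed "binary" embedding had been requested by mistake, since those values can be negative.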

Create an Azure AI Search Index

Create an Azure AI Search index to store the embeddings. Note that Azure AI Search only supports unsigned binary at this time, which is why the embeddings above are requested with the "ubinary" type.

# SearchFieldDataType is needed for the field definitions below
from azure.search.documents.indexes.models import SearchFieldDataType

def create_or_update_index(client, index_name, vector_field_type, scoring_uri, authentication_key, model_name):
    # scoring_uri, authentication_key, and model_name are accepted but not used in this excerpt
    fields = [
        SimpleField(name="id", type=SearchFieldDataType.String, key=True),
        SearchField(name="text", type=SearchFieldDataType.String, searchable=True),
        SearchField(
            name="embedding",
            type=vector_field_type,
            vector_search_dimensions=1024,
            vector_search_profile_name="my-vector-config",
            hidden=False,
            stored=True,
            # Binary vectors are stored as packed bits in a Collection(Edm.Byte) field
            vector_encoding_format=(
                VectorEncodingFormat.PACKED_BIT if vector_field_type == "Collection(Edm.Byte)" else None
            ),
        ),
    ]
    # int8 fields use cosine similarity; binary fields use Hamming distance
    metric = (
        VectorSearchAlgorithmMetric.COSINE
        if vector_field_type == "Collection(Edm.SByte)"
        else VectorSearchAlgorithmMetric.HAMMING
    )
    vector_search = VectorSearch(
        profiles=[VectorSearchProfile(name="my-vector-config", algorithm_configuration_name="my-hnsw")],
        algorithms=[
            HnswAlgorithmConfiguration(
                name="my-hnsw",
                kind=VectorSearchAlgorithmKind.HNSW,
                parameters=HnswParameters(metric=metric),
            )
        ],
    )
    index = SearchIndex(name=index_name, fields=fields, vector_search=vector_search)
    client.create_or_update_index(index=index)

# Example usage
search_index_client = SearchIndexClient(
    endpoint=SEARCH_SERVICE_ENDPOINT,
    credential=AzureKeyCredential(SEARCH_SERVICE_API_KEY),
)
create_or_update_index(search_index_client, "binary-embedding-index", "Collection(Edm.Byte)", AZURE_AI_STUDIO_COHERE_EMBED_ENDPOINT, AZURE_AI_STUDIO_COHERE_EMBED_KEY, "embed-english-v3.0")
create_or_update_index(search_index_client, "int8-embedding-index", "Collection(Edm.SByte)", AZURE_AI_STUDIO_COHERE_EMBED_ENDPOINT, AZURE_AI_STUDIO_COHERE_EMBED_KEY, "embed-english-v3.0")
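The binary index above is configured with the Hamming metric. As a rough illustration of what that metric measures on packed vectors (not the service's actual implementation), Hamming distance is the number of bit positions in which two vectors differ, which can be computed by XOR-ing corresponding bytes and counting the set bits. The function name is hypothetical.

```python
def hamming_distance(packed_a, packed_b):
    """Count differing bits between two equal-length sequences of byte values (0-255)."""
    if len(packed_a) != len(packed_b):
        raise ValueError("vectors must have the same length")
    # XOR leaves a 1 wherever the two bytes disagree; count those 1 bits
    return sum(bin(a ^ b).count("1") for a, b in zip(packed_a, packed_b))


# Two one-byte vectors that differ only in the last bit
print(hamming_distance([0b10010110], [0b10010111]))
```

A smaller distance means the two embeddings agree on more dimensions, so closer documents have fewer differing bits.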

Index Documents and Embeddings

Index the documents along with their embeddings into Azure AI Search.

def index_documents(search_client, documents, embeddings):
    # Pair each document with its embedding and use the position as the document key
    documents_to_index = [
        {"id": str(idx), "text": doc, "embedding": emb}
        for idx, (doc, emb) in enumerate(zip(documents, embeddings))
    ]
    search_client.upload_documents(documents=documents_to_index)

# Example usage
search_client_binary = SearchClient(
    endpoint=SEARCH_SERVICE_ENDPOINT,
    index_name="binary-embedding-index",
    credential=AzureKeyCredential(SEARCH_SERVICE_API_KEY),
)
search_client_int8 = SearchClient(
    endpoint=SEARCH_SERVICE_ENDPOINT,
    index_name="int8-embedding-index",
    credential=AzureKeyCredential(SEARCH_SERVICE_API_KEY),
)
index_documents(search_client_binary, documents, binary_embeddings)
index_documents(search_client_int8, documents, int8_embeddings)
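The two-document example fits in a single upload call, but larger corpora generally need to be sent in batches because a single indexing request is limited in size. The sketch below splits the prepared documents into chunks before uploading; the batch size of 1000 is an assumption to verify against the current service limits, and the helper names are hypothetical.

```python
def chunked(items, batch_size=1000):
    """Yield consecutive slices of items with at most batch_size elements each."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def index_in_batches(search_client, documents_to_index, batch_size=1000):
    """Upload documents batch by batch and return how many were sent."""
    uploaded = 0
    for batch in chunked(documents_to_index, batch_size):
        search_client.upload_documents(documents=batch)
        uploaded += len(batch)
    return uploaded
```

Here search_client is a SearchClient like the ones created above, and documents_to_index is the same list of id, text, and embedding dictionaries that index_documents builds.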

Perform a Vector Search

Use the Azure AI Search client to perform a vector search using the generated embeddings.

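from azure.search.documents.models import VectorizedQuery

def perform_vector_search(search_client, query, embedding_type="ubinary"):
    # Embed the query with the same model and embedding type used for the index
    query_embeddings = generate_embeddings(query, input_type="search_query", embedding_type=embedding_type)
    # VectorizedQuery sends the precomputed query embedding; VectorizableTextQuery would
    # require a vectorizer on the index, which this excerpt does not configure
    vector_query = VectorizedQuery(vector=query_embeddings[0], k_nearest_neighbors=3, fields="embedding")
    results = list(search_client.search(search_text=None, vector_queries=[vector_query]))
    for result in results:
        print(f"Text: {result['text']}")
        print(f"Score: {result['@search.score']}\n")
    # Return the hits so they can be reused for grounding later
    return results

# Example usage
results_binary = perform_vector_search(search_client_binary, "pioneers in computer science", embedding_type="ubinary")
results_int8 = perform_vector_search(search_client_int8, "pioneers in computer science", embedding_type="int8")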

Int8 Results:
Title: Alan Turing was an English mathematician, computer scientist, logician, cryptanalyst, philosopher, and theoretical biologist.
Score: 0.6225287
Title: Albert Einstein was a German-born theoretical physicist who is widely held to be one of the greatest and most influential scientists of all time.
Score: 0.5917698
Title: Isaac Newton was an English polymath active as a mathematician, physicist, astronomer, alchemist, theologian, and author who was described in his time as a natural philosopher.
Score: 0.5746157

Binary Results:
Title: Alan Turing was an English mathematician, computer scientist, logician, cryptanalyst, philosopher, and theoretical biologist.
Score: 0.002610966
Title: Albert Einstein was a German-born theoretical physicist who is widely held to be one of the greatest and most influential scientists of all time.
Score: 0.0024509805
Title: Isaac Newton was an English polymath active as a mathematician, physicist, astronomer, alchemist, theologian, and author who was described in his time as a natural philosopher.
Score: 0.0023980816
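The binary scores are far smaller than the int8 scores because the two indexes use different metrics (Hamming versus cosine), so the values are not comparable across indexes; only the ranking within each index matters. As an observation drawn purely from the numbers printed above, and not from documented behavior confirmed in this post, each binary score is very close to 1 / (1 + d) for a whole number d, which would be consistent with d being a Hamming distance in bits. The sketch below checks that arithmetic.

```python
# Binary scores copied from the results above
binary_scores = [0.002610966, 0.0024509805, 0.0023980816]


def implied_distance(score):
    """Distance d such that 1 / (1 + d) is closest to the score (an inferred relationship)."""
    return round(1 / score) - 1


implied = [implied_distance(score) for score in binary_scores]

for score, distance in zip(binary_scores, implied):
    reconstructed = 1 / (1 + distance)
    print(f"score={score} implied distance={distance} reconstructed={reconstructed:.10f}")
```

If the inference holds, the top hit would differ from the query in 382 of 1024 bits, and lower-ranked hits in progressively more.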

Ground the Results to Cohere Command R+ for RAG

Use the Cohere Command R+ model to generate a response based on the retrieved documents.

# Create a Cohere client for Command R+
co_chat = cohere.Client(
    base_url=f"{AZURE_AI_STUDIO_COHERE_COMMAND_ENDPOINT}/v1",
    api_key=AZURE_AI_STUDIO_COHERE_COMMAND_KEY,
)

# Reuse the query from the vector search example
query = "pioneers in computer science"

# Extract the documents from the search results
# (results_binary holds the hits returned by the search against the binary index)
documents_binary = [{"text": result["text"]} for result in results_binary]

# Ground the documents from the "binary" index
chat_response_binary = co_chat.chat(
    message=query,
    documents=documents_binary,
    max_tokens=100,
)
print(chat_response_binary.text)

Binary Results:
There are many foundational figures who have made significant contributions to the field of computer science. Here are some of the most notable individuals:
1. Alan Turing: Often considered the "father of computer science," Alan Turing was a British mathematician and computer scientist who made groundbreaking contributions to computing, cryptography, and artificial intelligence. He is widely known for his work on the Turing machine, a theoretical device that served as a model for modern computers, and for his crucial role in breaking German Enigma codes during World War II.
2. Albert Einstein: Known for his theory of relativity and contributions to quantum mechanics, Albert Einstein was a German-born physicist whose work had a profound impact on the development of modern physics. His famous equation, E=mc^2, has become one of the most well-known scientific formulas in history.
3. Isaac Newton: An English mathematician, physicist, and astronomer, Isaac Newton is widely recognized for his laws of motion and universal gravitation. His work laid the foundation for classical mechanics and significantly advanced the study of optics and calculus.
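When grounding on search hits, it can help to drop empty or duplicate passages and cap how many are passed to the chat call, since every grounding document adds to the prompt. The helper below is a sketch of that step; its name and defaults are hypothetical, and it only assumes that each hit is a dictionary with a "text" field, as in the index defined in this walkthrough.

```python
def to_chat_documents(search_results, text_field="text", max_documents=3):
    """Convert search hits into a list of {"text": ...} dicts, skipping empty and duplicate passages."""
    documents = []
    seen = set()
    for result in search_results:
        text = result.get(text_field)
        if not text or text in seen:
            continue
        seen.add(text)
        documents.append({"text": text})
        if len(documents) == max_documents:
            break
    return documents
```

The returned list has the same shape as documents_binary above, so it could be passed as the documents argument of the chat call in its place.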

Full Notebook

Find the full notebook with all the code and examples here.

Getting Started

Azure AI Search Documentation:
- Learn more about setting up and using Azure AI Search.
- Dive into the specifics of Binary Vectors in Azure AI Search.
Cohere Documentation:
- Explore how to integrate Cohere models via Cohere’s API.
- Learn how to install and use the Cohere Python SDK and how to deploy the Cohere Embed Model-As-A-Service with Azure AI Studio.
Additional Resources:
- Learn more about indexing binary vector types.
- Explore the latest features of Azure AI Search.
- Start creating a search service in the Azure Portal, Azure CLI, the Management REST API, ARM template, or a Bicep file.

By integrating Cohere Binary Embeddings and the Command R/R+ model into your Azure AI workflow, you can significantly enhance the performance and scalability of your AI applications, providing faster, more efficient, and contextually relevant results.