Optimizing Retrieval for RAG Apps: Vector Search and Hybrid Techniques

This post has been republished via RSS; it originally appeared at: New blog articles in Microsoft Community Hub.

In our previous blog, we talked about LLMs and techniques for incorporating domain knowledge, such as Retrieval-Augmented Generation (RAG), to solve the issue of outdated knowledge.





In this blog, we dive into optimizing our search strategy with hybrid search techniques. Common practices for implementing the retrieval step in retrieval-augmented generation (RAG) applications are:

  • Keyword search 
  • Vector search
  • Hybrid search (Keyword + Vector) 
  • Hybrid + Semantic ranker 

Optimal search strategy 






  • Keyword search - uses traditional full-text search methods: content is broken into terms through language-specific text analysis, inverted indexes are created for fast retrieval, and the BM25 probabilistic model is used for scoring.
  • Vector search - best for finding semantically related matches, and a fully supported pattern in Azure AI Search. Documents are converted from text to vector representations using an embedding model. Retrieval is performed by generating a query embedding and finding the documents whose vectors are closest to the query's. We used Azure OpenAI text-embedding-ada-002 (Ada-002) embeddings and cosine similarity.
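
To make the keyword search step more concrete, here is a minimal sketch of an inverted index with BM25 scoring in plain Python. It is an illustration only, not how Azure AI Search is implemented: the `tokenize`, `build_index`, and `bm25_search` helpers, the sample documents, and the `k1` and `b` values (commonly used defaults) are all assumptions made for this example.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Very rough text analysis: lowercase and split on non-alphanumerics."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Build an inverted index (term -> {doc_id: term frequency}) and doc lengths."""
    index = defaultdict(dict)
    lengths = {}
    for doc_id, text in docs.items():
        terms = tokenize(text)
        lengths[doc_id] = len(terms)
        for term, tf in Counter(terms).items():
            index[term][doc_id] = tf
    return index, lengths


def bm25_search(query, index, lengths, k1=1.2, b=0.75):
    """Score documents against the query with BM25; k1 and b are assumed defaults."""
    n_docs = len(lengths)
    avg_len = sum(lengths.values()) / n_docs
    scores = defaultdict(float)
    for term in tokenize(query):
        postings = index.get(term, {})
        if not postings:
            continue  # term does not occur in any document
        idf = math.log(1 + (n_docs - len(postings) + 0.5) / (len(postings) + 0.5))
        for doc_id, tf in postings.items():
            # Longer-than-average documents are penalized through b.
            norm = tf + k1 * (1 - b + b * lengths[doc_id] / avg_len)
            scores[doc_id] += idf * tf * (k1 + 1) / norm
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical sample documents, for illustration only.
docs = {
    "hr-1": "Employees receive health insurance and dental coverage.",
    "hr-2": "The office is closed on public holidays.",
    "hr-3": "Health plans cover annual eye exams.",
}
index, lengths = build_index(docs)
results = bm25_search("health insurance", index, lengths)
for doc_id, score in results:
    print(doc_id, round(score, 3))
```

Only documents that share at least one term with the query receive a score, which is why keyword search alone can miss semantically related content that uses different wording.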

Vector embeddings - An embedding encodes an input as a list of floating-point numbers. Different models output different embeddings, with varying lengths. 

“dog” → [0.017198, -0.007493, -0.057982, 0.054051, -0.028336, 0.019245,…]





Vector similarity - We compute embeddings so that we can calculate similarity between inputs. The most common distance measurement is cosine similarity. 




As shown in the image above, a common way to measure vector similarity is to calculate the cosine similarity between two vectors. In natural language processing (NLP) and machine learning, a cosine similarity close to 1 indicates high similarity, while a value close to 0 indicates dissimilarity.

  • Let’s say we have word embeddings for “dog,” “woof,” and “cat.” 
  • If we calculate the cosine similarity between the vectors for “dog” and “woof,” we might get a high value (close to 1) because they are related. 
  • However, the cosine similarity between “dog” and “cat” would likely be lower (closer to 0) because they represent different animals with distinct characteristics.

Remember that this concept extends beyond words; it applies to any vectors in an embedding space. Whether the inputs are words, images, or other data points, measuring similarity helps us understand relationships and make informed decisions in various applications.
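
The comparison above can be sketched in a few lines of Python. The three-dimensional vectors below are made up for readability and are not real model output (Ada-002 embeddings have far more dimensions); `cosine_similarity` and `rank_by_similarity` are hypothetical helper names.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, embeddings):
    """Return the keys of `embeddings`, most similar to `query_vec` first."""
    return sorted(
        embeddings,
        key=lambda name: cosine_similarity(query_vec, embeddings[name]),
        reverse=True,
    )


# Made-up toy embeddings, chosen so that "dog" and "woof" point the same way.
embeddings = {
    "dog": [0.9, 0.8, 0.1],
    "woof": [0.8, 0.9, 0.2],
    "cat": [0.1, 0.6, 0.9],
}

dog_woof = cosine_similarity(embeddings["dog"], embeddings["woof"])
dog_cat = cosine_similarity(embeddings["dog"], embeddings["cat"])
print(f"dog vs woof: {dog_woof:.2f}")
print(f"dog vs cat:  {dog_cat:.2f}")
print(rank_by_similarity(embeddings["dog"], embeddings))
```

Vector retrieval follows the same pattern: embed the query, compute its similarity to every stored vector (or use an approximate index at scale), and return the closest documents.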




  • Hybrid search (Keyword + Vector) - combines vector search and keyword search, ideally using Reciprocal Rank Fusion (RRF) to merge the results and a machine learning model to re-rank them afterwards.
  • Hybrid + Semantic ranker - generative AI scenarios typically use the top 3 to 5 results as their grounding context, so the most important results need to be prioritized. AI Search applications work best with a calibrated relevance score that can be used to filter out low-quality results. The semantic ranker runs the query and document text simultaneously through transformer models that use the cross-attention mechanism to produce a ranker score.

A score of 0 represents a very irrelevant chunk, and a score of 4 represents an excellent one. In the chart below, Hybrid + Semantic ranking finds the best content for the LLM at each result set size. See the code example in this repo by Pamela Fox.
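
As a rough illustration of how Reciprocal Rank Fusion merges a keyword ranking and a vector ranking, here is a minimal Python sketch. Each document receives 1 / (k + rank) from every list it appears in, and the sums decide the fused order. The function name, the document IDs, and the constant `k = 60` (a commonly used value) are assumptions for this example, not the exact implementation in Azure AI Search.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one list.

    Each document scores 1 / (k + rank) per list it appears in, with rank
    starting at 1; documents are returned by descending total score.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical result lists from the two retrievers, best match first.
keyword_hits = ["doc-a", "doc-b", "doc-c"]
vector_hits = ["doc-b", "doc-d", "doc-a"]

fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused)
```

Because RRF works on ranks rather than raw scores, it does not need BM25 scores and cosine similarities to be on the same scale. Documents that both retrievers rank highly, such as `doc-b` here, rise to the top.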





RAG with hybrid search 






Using the above image, when a user sends the prompt “do my company...”, the embedding model creates a vector representation of that text. These embeddings capture semantic meaning, allowing similarity comparisons between different pieces of text. Examples of embedding models include word2vec, BERT, and GPT (Generative Pre-trained Transformer). With hybrid search, we combine keyword + vector search with semantic ranking to retrieve more accurate results from our source, e.g. a PDF.
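
The last step before calling the LLM, choosing the grounding context, could look like the sketch below. It assumes the search results already carry a semantic ranker score on the 0 to 4 scale described earlier. The `reranker_score` field name, the threshold of 2.0, the sample chunks, and the question are all hypothetical choices for this example.

```python
def select_grounding_chunks(results, min_score=2.0, top_n=3):
    """Drop low-scoring chunks, then keep the best `top_n` for the prompt."""
    kept = [r for r in results if r["reranker_score"] >= min_score]
    kept.sort(key=lambda r: r["reranker_score"], reverse=True)
    return kept[:top_n]


def build_prompt(question, chunks):
    """Assemble a grounded prompt that lists each source chunk with its ID."""
    sources = "\n".join(f"[{c['id']}] {c['text']}" for c in chunks)
    return (
        "Answer the question using only the sources below.\n\n"
        f"Sources:\n{sources}\n\n"
        f"Question: {question}"
    )


# Hypothetical hybrid search results with semantic ranker scores (0 to 4).
results = [
    {"id": "benefits.pdf#p2", "text": "Eye exams are covered once a year.", "reranker_score": 3.4},
    {"id": "benefits.pdf#p7", "text": "Dental cleanings are covered twice a year.", "reranker_score": 2.1},
    {"id": "handbook.pdf#p1", "text": "The office opens at 9 am.", "reranker_score": 0.4},
]

chunks = select_grounding_chunks(results)
prompt = build_prompt("Does the plan cover eye exams?", chunks)
print(prompt)
```

Filtering on a calibrated score keeps irrelevant chunks out of the prompt even when fewer than `top_n` good results exist, so the LLM is not padded with noise.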


RAG With Vector Databases 







We can extend this robust retrieval for RAG with a vector database. A vector database is specifically designed for efficient storage and retrieval of high-dimensional vectors. There are two types of vector databases:

  • Integrated vector database 
  • Pure vector database

A pure vector database is designed to efficiently store and manage vector embeddings, along with a small amount of metadata; it is separate from the data source from which the embeddings are derived.

A vector database that is integrated in a highly performant NoSQL or relational database provides additional capabilities. The integrated vector database in a NoSQL or relational database can store, index, and query embeddings alongside the corresponding original data. This approach eliminates the extra cost of replicating data in a separate pure vector database. Moreover, keeping the vector embeddings and original data together better facilitates multi-modal data operations, and enables greater data consistency, scale, and performance. 
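
To illustrate the integrated approach, the sketch below keeps each document's original text and its embedding in the same row of a relational table, using Python's built-in `sqlite3` module. This is a toy under stated assumptions: SQLite has no native vector index, so the search is a brute-force scan in Python, and the table layout, the three-dimensional embeddings, and the `search` helper are made up for this example. A real integrated vector database would index and query the vectors inside the engine.

```python
import json
import math
import sqlite3


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


# Original content and embedding live side by side in one table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id TEXT PRIMARY KEY, content TEXT, embedding TEXT)")

# Hypothetical rows; the embeddings are made-up toy vectors stored as JSON.
rows = [
    ("policy-1", "Health insurance covers dental care.", [0.9, 0.1, 0.2]),
    ("policy-2", "Parking permits renew every January.", [0.1, 0.9, 0.3]),
]
conn.executemany(
    "INSERT INTO docs VALUES (?, ?, ?)",
    [(doc_id, content, json.dumps(vec)) for doc_id, content, vec in rows],
)


def search(conn, query_vec, top=1):
    """Brute-force nearest-neighbor scan; returns (score, id, content) tuples."""
    scored = []
    for doc_id, content, emb_json in conn.execute("SELECT id, content, embedding FROM docs"):
        score = cosine_similarity(query_vec, json.loads(emb_json))
        scored.append((score, doc_id, content))
    scored.sort(reverse=True)
    return scored[:top]


best = search(conn, [0.8, 0.2, 0.1])
print(best)
```

Because the text and the vector share a row, a single query returns the matching content directly, with no second lookup against a separate store and no copy of the data to keep in sync.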


Integrated vector database on Azure 



Read more 



Code samples 
