This post has been republished via RSS; it originally appeared at: New blog articles in Microsoft Community Hub.
Introduction
In this article, I will show you how to build a Generative AI application in Microsoft Fabric.
Specifically, this guide walks you through implementing a RAG (Retrieval Augmented Generation) system in Microsoft Fabric, using Azure OpenAI and Microsoft Fabric Eventhouse as your vector store.
Why MS Fabric Eventhouse?
Fabric Eventhouse is built on the Kusto engine, which delivers excellent performance for similarity search at high scale.
If you are looking to build a RAG application with a large number of embedding vectors, look no further: Microsoft Fabric gives you both the processing power to build the vector database and the high-performance engine behind Fabric Eventhouse to query it.
If you want to know more about using Fabric Eventhouse as a vector store, here are some useful links:
Azure Data Explorer for Vector Similarity Search
Optimizing Vector Similarity Search on Azure Data Explorer – Performance Update
Optimizing Vector Similarity Searches at Scale
What is RAG - Retrieval Augmented Generation?
Large Language Models (LLMs) excel in creating text that resembles human writing.
LLMs are equipped with a broad spectrum of knowledge from the extensive datasets used for their training. This grants them flexibility, but not necessarily the specialized or up-to-date knowledge required for certain topics.
Retrieval Augmented Generation (RAG) is a technique that improves the pertinence and precision of LLMs by incorporating real-time, relevant information into their responses. With RAG, an LLM is boosted by a search system that sifts through unstructured text to find information, which then refines the LLM's replies.
What is a Vector Database?
The Vector Database is a vital component in the retrieval process in RAG, facilitating the quick and effective identification of relevant text sections in response to a query, based on how closely they match the search terms.
Vector DBs are data stores optimized for storing and processing vector data. Vector data can refer to data types such as geometric shapes, spatial data, or more abstract high-dimensional data used in machine learning applications, such as embeddings.
These databases are designed to efficiently handle operations such as similarity search, nearest neighbour search, and other operations that are common when dealing with high-dimensional vector spaces.
For example, in machine learning, it's common to convert text, images, or other complex data into high-dimensional vectors using models like word embeddings, image embeddings, etc. To efficiently search and compare these vectors, a vector database or vector store with specialized indexing and search algorithms would be used.
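For intuition, the "how closely they match" comparison between embedding vectors is typically done with cosine similarity. Here is a minimal, self-contained sketch in Python (the three-dimensional vectors are toy examples; real Ada embeddings have 1536 dimensions):

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity: dot(a, b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings": identical direction scores 1.0, unrelated directions score lower.
query = [0.1, 0.3, 0.5]
doc_a = [0.1, 0.3, 0.5]
doc_b = [0.5, 0.1, -0.2]

print(round(cosine_similarity(query, doc_a), 3))  # 1.0
```

A vector database applies exactly this kind of comparison, but with indexing strategies that avoid scanning every stored vector.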
In our case, we will use the Azure OpenAI Ada embeddings model to create embeddings, vector representations of the text, which we index and store in a Microsoft Fabric Eventhouse database.
The code
The code can be found here.
We will use the Moby Dick book from the Gutenberg project in PDF format as our knowledge base.
We will read the PDF file, split the text into chunks of 1,000 characters, and calculate embeddings for each chunk. We will then store both the text and the embeddings in our vector database (Fabric Eventhouse).
To answer questions, we will retrieve the relevant chunks from the vector database and send the question together with the retrieved text to Azure OpenAI GPT-4 to get a response in natural language.
Processing the files and indexing the embeddings
We do this once, only to create the embeddings and save them into our vector database, Fabric Eventhouse:
- Read files from Fabric Lakehouse
- Create embeddings from the text using the Azure OpenAI Ada embeddings model
- Save the text and embeddings in our Fabric Eventhouse DB
RAG - Getting answers
Every time we want to search for answers from our knowledge base, we will:
- Create the embedding for the question and run a similarity search against Fabric Eventhouse to retrieve the most relevant text chunks
- Combine the question with the retrieved text and call the Azure OpenAI GPT-4 model to get a natural-language answer
Prerequisites
To follow this guide, you will need to ensure that you have access to the following services and have the necessary credentials and keys set up.
- Microsoft Fabric.
- Azure OpenAI Studio to manage and deploy OpenAI models.
Setup
Create a Fabric Workspace
Create a Lakehouse
Upload the Moby Dick PDF file
Create an Eventhouse DB called “GenAI_eventhouse”
Click on the DB name and then “Explore your data” on the top-right side
Create the “bookEmbeddings” table
Paste the following command and run it:
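The exact command isn't reproduced in this version of the post; a plausible KQL table definition (the column names here are assumptions, so adjust them to match what the notebook writes) would look like:

```kql
.create table bookEmbeddings (document_name: string, content: string, embedding: dynamic)
```

The `dynamic` column holds the embedding vector as a JSON array of floats.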
Import our notebook
Grab your Azure OpenAI endpoint and secret key and paste them into the notebook; replace your model deployment names if needed.
Get the Eventhouse URI and paste it as “KUSTO_URI” in the notebook
Connect the notebook to the Lakehouse
Let’s run our notebook
Run cell 1, which installs all the Python libraries we need
Run cell 2 after configuring the environment variables for your Azure OpenAI endpoint, API key, deployment names, and the Eventhouse KUSTO_URI
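As a sketch, the configuration might look like the following (the variable names are assumptions; use whatever names cell 2 of the notebook actually reads, and never commit real keys):

```python
import os

# Placeholder values -- replace with your own endpoint, key, deployments, and URI.
os.environ["AZURE_OPENAI_ENDPOINT"] = "https://<your-resource>.openai.azure.com/"
os.environ["AZURE_OPENAI_API_KEY"] = "<your-key>"
os.environ["EMBEDDING_DEPLOYMENT"] = "text-embedding-ada-002"
os.environ["CHAT_DEPLOYMENT"] = "gpt-4"
os.environ["KUSTO_URI"] = "https://<your-eventhouse>.kusto.fabric.microsoft.com"
```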
Run cell 3
Here we create an Azure OpenAI client and define a function to calculate embeddings
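The notebook uses the Azure OpenAI client library for this. Purely as an illustration of what that call does, here is a standard-library-only sketch of the same embeddings request; the endpoint path and API version follow the Azure OpenAI REST API, but treat the details as assumptions and prefer the SDK in practice:

```python
import json
import urllib.request

def get_embedding(text: str, endpoint: str, api_key: str,
                  deployment: str = "text-embedding-ada-002",
                  api_version: str = "2023-05-15") -> list:
    """Call the Azure OpenAI embeddings REST endpoint for one text chunk."""
    url = (f"{endpoint.rstrip('/')}/openai/deployments/{deployment}"
           f"/embeddings?api-version={api_version}")
    body = json.dumps({"input": text}).encode("utf-8")
    req = urllib.request.Request(
        url, data=body,
        headers={"Content-Type": "application/json", "api-key": api_key},
    )
    with urllib.request.urlopen(req) as resp:
        payload = json.load(resp)
    # The response carries one embedding vector per input.
    return payload["data"][0]["embedding"]
```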
Run cell 4
Read the file and split it into 1,000-character chunks
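The chunking step itself is straightforward once the PDF text has been extracted (the notebook uses a PDF library for that part). A minimal sketch of fixed-size 1,000-character chunks with no overlap, as described above:

```python
def chunk_text(text: str, chunk_size: int = 1000) -> list:
    # Split the full document into consecutive fixed-size character chunks;
    # the final chunk may be shorter than chunk_size.
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

sample = "x" * 2500
chunks = chunk_text(sample)
print(len(chunks), len(chunks[0]), len(chunks[-1]))  # 3 1000 500
```

Production pipelines often add overlap between chunks or split on sentence boundaries so that context isn't cut mid-thought; fixed-size chunks keep this walkthrough simple.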
Run cell 5
Save the text chunks to a pandas dataframe
Run cell 6
Calculate embeddings
Run cell 7
Write the data to MS Fabric Eventhouse
Let's check that the data was saved to our vector database
Go to the Eventhouse and run this query
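The query itself isn't shown in this version of the post; any quick sanity check against the table works, for example:

```kql
bookEmbeddings
| take 3
```

You should see your text chunks alongside their embedding vectors.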
Go back to the notebook and run the rest of the cells
- Creates a function to call GPT-4 for a natural-language answer
- Creates a function to retrieve answers from Eventhouse using embedding similarity search
- Retrieves 2 answers from Eventhouse
- Concatenates the answers
- Creates a prompt for GPT-4 with the question and the 2 answers
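On the Eventhouse side, the similarity search typically relies on Kusto's built-in `series_cosine_similarity` function; on the Python side, the final prompt simply stitches the retrieved chunks onto the question. A sketch of that prompt-building step (the template wording is an assumption, not the notebook's exact prompt):

```python
def build_prompt(question: str, retrieved_chunks: list) -> str:
    # Concatenate the retrieved passages so GPT-4 answers from this context only.
    context = "\n\n".join(retrieved_chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

prompt = build_prompt(
    "Who is Captain Ahab?",
    ["Ahab is the monomaniacal captain of the Pequod.",
     "He seeks revenge on the white whale, Moby Dick."],
)
```

The returned string is what gets sent to the GPT-4 chat deployment as the user message.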
That's it! You have built your very first RAG app using Microsoft Fabric.
All the code can be found here.
Thanks
Denise