This post has been republished via RSS; it originally appeared at: New blog articles in Microsoft Community Hub.
Teach ChatGPT to Answer Questions Based on PDF content: Using Azure Cognitive Search and Azure OpenAI (Semantic Kernel.ver)
Semantic Kernel vs. LangChain
This tutorial is available in two versions, one built with Semantic Kernel and one built with LangChain.
For those interested, here's the link to the LangChain version of this tutorial: Teach ChatGPT to Answer Questions: Using Azure Cognitive Search & Azure OpenAI (LangChain.ver)
Can't I just copy and paste text from a PDF file to teach ChatGPT?
This tutorial is related to the following topics
- Semantic Kernel
Learning objectives
Prerequisites
Microsoft Cloud Technologies used in this Tutorial
Table of Contents
Series 1: Extract Key Phrases for Search Queries Using Azure Cognitive Search
Series 2: Implement a ChatGPT Service with Azure OpenAI
1. Create a Blob Container
2. Store PDF Documents in Azure Blob Storage
3. Create a Cognitive Search Service
4. Connect to Data from Azure Blob Storage
5. Add Cognitive Skills
How do you keep sensitive data private?
To ensure the privacy of sensitive data, Azure Cognitive Search provides a Personally Identifiable Information (PII) detection skill. This cognitive skill is specifically designed to identify and protect PII in your data. To learn more about Azure Cognitive Search's PII detection skill, read the following article.
Personally Identifiable Information (PII) Detection cognitive skill
- To enable this feature, select Extract Personally identifiable information.
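Behind the portal checkbox, the PII detection skill is an entry in the indexer's skillset definition. A sketch of what that entry can look like (field values such as `minimumPrecision` and the output `targetName` are illustrative choices, not values this tutorial requires):

```json
{
  "@odata.type": "#Microsoft.Skills.Text.PIIDetectionSkill",
  "context": "/document",
  "defaultLanguageCode": "en",
  "minimumPrecision": 0.5,
  "maskingMode": "replace",
  "inputs": [
    { "name": "text", "source": "/document/content" }
  ],
  "outputs": [
    { "name": "maskedText", "targetName": "maskedContent" }
  ]
}
```

With `maskingMode` set to `replace`, detected PII in the output text is masked, so downstream fields can index the masked copy instead of the raw content.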
6. Customize Target Index and Create an Indexer
You can change the fields to suit your data. I have attached a document describing each field in the index. (Depending on how you configure the index fields, the code you implement may differ from this tutorial.)
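For reference, the two fields this tutorial relies on later (`pages` for the 5,000-character chunks and `keyphrases` for the extracted key phrases) can be declared in the index roughly like this; the exact attribute set is an illustrative sketch, adjust it to your own index definition:

```json
{
  "name": "pages",
  "type": "Collection(Edm.String)",
  "searchable": true,
  "retrievable": true
},
{
  "name": "keyphrases",
  "type": "Collection(Edm.String)",
  "searchable": true,
  "filterable": true,
  "retrievable": true
}
```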
7. Extract Key Phrases for Search Queries Using Azure Cognitive Search
Series 2: Implement a ChatGPT Service with Azure OpenAI
In Series 2, we will implement, in code, the feature that answers questions based on PDFs using Azure Cognitive Search and Azure OpenAI.
Intent of the Code Design
The primary goal of the code design in this tutorial is to construct the code in a way that is easy to understand, especially for first-time viewers. Each segment of the code is encapsulated as a separate function. This modularization ensures that the main function acts as a clear, coherent guide through the logic of the system.
Ex. Part of the main function.
async def main():
    ...
    kernel = await create_kernel(sk)
    await create_embeddings(kernel)
    await create_vector_store(kernel)
    await store_documents(kernel, file_content)
    ...
Overview of the code
Part 1: Retrieving and Scoring Documents
We'll use Azure Cognitive Search to retrieve documents related to our question, score them for semantic relevance, and keep only the documents above a certain score.
Part 2: Document Embedding and Vector Database Storage
We'll embed the documents we extracted in part 1. We will then store these embedded documents in a vector database, organizing them into pages (chunks of 5,000 characters each).
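Conceptually, the "pages" stored in the vector database are just fixed-size slices of the document text. A minimal sketch of 5,000-character chunking (the helper name is ours, not part of any Azure SDK):

```python
def split_into_pages(text, page_size=5000):
    """Split text into fixed-size chunks ('pages') of at most page_size characters."""
    return [text[i:i + page_size] for i in range(0, len(text), page_size)]

# A 12,000-character document yields three pages: 5,000 + 5,000 + 2,000 characters.
pages = split_into_pages('x' * 12000)
```

Fixed-size chunking keeps each page comfortably inside the embedding model's input limit while preserving the order of the text.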
Part 3: Extracting Relevant Content and Implementing a Function to Answer Questions
We will extract the most relevant page from the vector database based on the question.
Then we will implement a function to generate answers from the extracted content.
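The vector-store lookup in Part 3 works by comparing the embedding of the question against the embedding of each stored page and returning the closest match. A toy illustration of that ranking, using made-up two-dimensional vectors (real embeddings from text-embedding-ada-002 have 1,536 dimensions):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Hypothetical page embeddings and question embedding
pages = {'doc.pdf_1': [0.1, 0.9], 'doc.pdf_2': [0.8, 0.2]}
question_vec = [0.7, 0.3]

# The page whose embedding points in nearly the same direction as the question wins
best_page = max(pages, key=lambda page_id: cosine_similarity(question_vec, pages[page_id]))
# best_page → 'doc.pdf_2'
```

Semantic Kernel's memory store performs an equivalent similarity search for us when we call its search method later in this tutorial.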
1. Change your indexer settings to use Azure OpenAI
"outputFieldMappings": [
{
"sourceFieldName": "/document/content/pages/*/keyphrases/*",
"targetFieldName": "keyphrases"
},
{
"sourceFieldName": "/document/content/pages/*",
"targetFieldName": "pages"
}
]
2. Create an Azure OpenAI resource
3. Set up the project and install the libraries
mkdir azure-proj
cd azure-proj
mkdir gpt-proj1
cd gpt-proj1
python -m venv .venv
.venv\Scripts\activate.bat
(.venv) C:\Users\sms79\azure-proj\gpt-proj1>pip install openai
(.venv) C:\Users\sms79\azure-proj\gpt-proj1>pip install semantic-kernel
4. Set up the project in VS Code
# Library imports
from collections import OrderedDict
import requests
# Semantic Kernel library imports
import semantic_kernel as sk
from semantic_kernel.connectors.ai.open_ai import AzureChatCompletion, AzureTextEmbedding
# Azure Search Service settings
SEARCH_SERVICE_NAME = 'your-search-service-name' # 'test-search-service1'
SEARCH_SERVICE_ENDPOINT = f'https://{SEARCH_SERVICE_NAME.lower()}.search.windows.net/'
SEARCH_SERVICE_KEY = 'your-search-service-key'
SEARCH_SERVICE_API_VERSION = 'your-API-version' # '2023-07-01-preview'
# Azure Search Service Index settings
SEARCH_SERVICE_INDEX_NAME1 = 'your-search-service-index-name' # 'azureblob-index1'
# Azure Cognitive Search Service Semantic configuration settings
SEARCH_SERVICE_SEMANTIC_CONFIG_NAME = 'your-semantic-configuration-name' # 'test-configuration'
# Azure OpenAI settings
AZURE_OPENAI_NAME = 'your-openai-name' # 'testopenai1004'
AZURE_OPENAI_ENDPOINT = f'https://{AZURE_OPENAI_NAME.lower()}.openai.azure.com/'
AZURE_OPENAI_KEY = 'your-openai-key'
AZURE_OPENAI_API_VERSION = 'your-API-version' # '2023-08-01-preview'
5. Search with Azure Cognitive Search
# Configuration imports
from config import (
    SEARCH_SERVICE_ENDPOINT,
    SEARCH_SERVICE_KEY,
    SEARCH_SERVICE_API_VERSION,
    SEARCH_SERVICE_INDEX_NAME1,
    SEARCH_SERVICE_SEMANTIC_CONFIG_NAME,
    AZURE_OPENAI_ENDPOINT,
    AZURE_OPENAI_KEY,
    AZURE_OPENAI_API_VERSION,
)
# Cognitive Search Service header settings
HEADERS = {
    'Content-Type': 'application/json',
    'api-key': SEARCH_SERVICE_KEY
}

async def search_documents(question):
    """Search documents using Azure Cognitive Search"""
    # Construct the Azure Cognitive Search service access URL
    url = (SEARCH_SERVICE_ENDPOINT + 'indexes/' +
           SEARCH_SERVICE_INDEX_NAME1 + '/docs')
    # Create a parameter dictionary
    params = {
        'api-version': SEARCH_SERVICE_API_VERSION,
        'search': question,
        'select': '*',
        '$top': 3,
        'queryLanguage': 'en-us',
        'queryType': 'semantic',
        'semanticConfiguration': SEARCH_SERVICE_SEMANTIC_CONFIG_NAME,
        '$count': 'true',
        'speller': 'lexicon',
        'answers': 'extractive|count-3',
        'captions': 'extractive|highlight-false'
    }
    # Make a GET request to the Azure Cognitive Search service and store the response in a variable
    resp = requests.get(url, headers=HEADERS, params=params)
    # Return the JSON response containing the search results
    return resp.json()
async def filter_documents(search_results):
    """Filter documents that score above a certain threshold in semantic search"""
    file_content = OrderedDict()
    for result in search_results['value']:
        # The '@search.rerankerScore' range is 0 to 4.00, where a higher score indicates a stronger semantic match.
        if result['@search.rerankerScore'] > 1.5:
            file_content[result['metadata_storage_path']] = {
                'chunks': result['pages'][:10],
                'captions': result['@search.captions'][:10],
                'score': result['@search.rerankerScore'],
                'file_name': result['metadata_storage_name']
            }
    return file_content
async def main():
    QUESTION = 'Tell me about effective prompting strategies'

    # Search for documents with Azure Cognitive Search
    search_results = await search_documents(QUESTION)
    file_content = await filter_documents(search_results)
    print('Total Documents Found: {}, Top Documents: {}'.format(
        search_results['@odata.count'], len(search_results['value'])))

    # 'chunks' is the value that corresponds to the Pages field that you set up in the Cognitive Search service.
    # Find the number of chunks
    total_chunks = sum(len(value['chunks']) for value in file_content.values())
    print('Number of chunks: ', total_chunks)

# execute the main function
if __name__ == "__main__":
    import asyncio
    asyncio.run(main())
6. Get answers from PDF content using Azure OpenAI and Cognitive Search
Now that Azure Cognitive Search is working well in VS Code, it's time to start using Azure OpenAI. In this chapter, we'll create functions related to Azure OpenAI and ultimately create and run a program in `example.py` that answers a question with Azure OpenAI based on the search information from Azure Cognitive Search.
1. We will create functions related to Azure OpenAI and Semantic Kernel and run them from the main function.
- Add the following functions above the main function.
async def create_kernel(sk):
    """Create a semantic kernel"""
    return sk.Kernel()

async def create_embeddings(kernel):
    """Create an embedding model"""
    return kernel.add_text_embedding_generation_service(
        "text-embedding-ada-002",  # A label for this service; it is used by prompt templates (not covered in this tutorial), so you can name it anything.
        AzureTextEmbedding(
            "text-embedding-ada-002",  # Azure OpenAI deployment name
            AZURE_OPENAI_ENDPOINT,
            AZURE_OPENAI_KEY
        ))

async def create_vector_store(kernel):
    """Create a vector store"""
    kernel.register_memory_store(memory_store=sk.memory.VolatileMemoryStore())
    kernel.import_skill(sk.core_skills.TextMemorySkill())

async def store_documents(kernel, file_content):
    """Store documents in the vector store"""
    for key, value in file_content.items():
        page_number = 1
        for page in value['chunks']:
            page_id = f"{value['file_name']}_{page_number}"
            await kernel.memory.save_information_async(
                collection='TeachGPTtoPDF',
                id=page_id,
                text=page
            )
            page_number += 1

async def search_with_vector_store(kernel, question):
    """Search for documents related to your question from the vector store"""
    related_page = await kernel.memory.search_async('TeachGPTtoPDF', question)
    return related_page

async def add_chat_service(kernel):
    """Add a chat service"""
    return kernel.add_chat_service(
        'gpt-35-turbo',  # A label for this service; it is used by prompt templates (not covered in this tutorial), so you can name it anything.
        AzureChatCompletion(
            'gpt-35-turbo',  # Azure OpenAI deployment name
            AZURE_OPENAI_ENDPOINT,
            AZURE_OPENAI_KEY
        )
    )

async def answer_with_sk(kernel, question, related_page):
    """Answer question with related_page using the semantic kernel"""
    prompt = """
    Provide a detailed answer to the <question> using the information from the <related_page>.
    <question>
    {{$question}}
    </question>
    <related_page>
    {{$related_page}}
    </related_page>
    Answer:
    """
    chat_function = kernel.create_semantic_function(prompt, max_tokens=500, temperature=0.0, top_p=0.5)
    context = kernel.create_new_context()
    context['question'] = question
    context['related_page'] = related_page[0].text  # must match the {{$related_page}} variable in the prompt
    return await chat_function.invoke_async(context=context)
2. Add the code below to your main function.
async def main():
    QUESTION = 'Tell me about effective prompting strategies'

    # Search for documents with Azure Cognitive Search
    ...

    # Answer your question using the semantic kernel
    kernel = await create_kernel(sk)
    await create_embeddings(kernel)
    await create_vector_store(kernel)
    await store_documents(kernel, file_content)
    related_page = await search_with_vector_store(kernel, QUESTION)
    await add_chat_service(kernel)
    answer = await answer_with_sk(kernel, QUESTION, related_page)
    print('Question: ', QUESTION)
    print('Answer: ', answer)
    print('Reference: ', related_page[0].id)

# execute the main function
if __name__ == "__main__":
    import asyncio
    asyncio.run(main())
3. Now let's run it and see if it answers your question.
- The result of executing the code.
```
Total Documents Found: 5, Top Documents: 3
Question: Tell me about effective prompting strategies
Answer: Effective prompting strategies are techniques used to encourage individuals to engage in desired behaviors or complete tasks. These strategies can be particularly useful for individuals with disabilities or those who struggle with executive functioning skills. Some effective prompting strategies include:
- Visual prompts: These can include pictures, diagrams, or written instructions that provide a visual cue for the individual to follow.
- Verbal prompts: These can include verbal reminders or instructions given by a caregiver or teacher.
- Gestural prompts: These can include physical cues, such as pointing or gesturing, to guide the individual towards the desired behavior or task.
- Modeling: This involves demonstrating the desired behavior or task for the individual to imitate.
- Graduated guidance: This involves providing physical assistance to the individual as they complete the task, gradually reducing the amount of assistance as they become more independent.
- Time-based prompts: These can include setting a timer or providing a schedule to help the individual stay on task and complete the task within a designated time frame.
Overall, effective prompting strategies should be tailored to the individual's needs and abilities, and should be used consistently to help them develop independence and achieve success.
Reference: Prompting GPT-3 To Be Reliable.pdf_1
```
NOTE: Full code for example.py and config.py
# Azure Search Service settings
SEARCH_SERVICE_NAME = 'your-search-service-name' # 'test-search-service1'
SEARCH_SERVICE_ENDPOINT = f'https://{SEARCH_SERVICE_NAME.lower()}.search.windows.net/'
SEARCH_SERVICE_KEY = 'your-search-service-key'
SEARCH_SERVICE_API_VERSION = 'your-API-version' # '2023-07-01-preview'
# Azure Search Service Index settings
SEARCH_SERVICE_INDEX_NAME1 = 'your-search-service-index-name' # 'azureblob-index1'
# Azure Cognitive Search Service Semantic configuration settings
SEARCH_SERVICE_SEMANTIC_CONFIG_NAME = 'your-semantic-configuration-name' # 'test-configuration'
# Azure OpenAI settings
AZURE_OPENAI_NAME = 'your-openai-name' # 'testopenai1004'
AZURE_OPENAI_ENDPOINT = f'https://{AZURE_OPENAI_NAME.lower()}.openai.azure.com/'
AZURE_OPENAI_KEY = 'your-openai-key'
AZURE_OPENAI_API_VERSION = 'your-API-version' # '2023-08-01-preview'
# Library imports
from collections import OrderedDict
import requests
# Semantic Kernel library imports
import semantic_kernel as sk
from semantic_kernel.connectors.ai.open_ai import AzureChatCompletion, AzureTextEmbedding
# Configuration imports
from config import (
    SEARCH_SERVICE_ENDPOINT,
    SEARCH_SERVICE_KEY,
    SEARCH_SERVICE_API_VERSION,
    SEARCH_SERVICE_INDEX_NAME1,
    SEARCH_SERVICE_SEMANTIC_CONFIG_NAME,
    AZURE_OPENAI_ENDPOINT,
    AZURE_OPENAI_KEY,
    AZURE_OPENAI_API_VERSION,
)

# Cognitive Search Service header settings
HEADERS = {
    'Content-Type': 'application/json',
    'api-key': SEARCH_SERVICE_KEY
}
async def search_documents(question):
    """Search documents using Azure Cognitive Search"""
    # Construct the Azure Cognitive Search service access URL
    url = (SEARCH_SERVICE_ENDPOINT + 'indexes/' +
           SEARCH_SERVICE_INDEX_NAME1 + '/docs')
    # Create a parameter dictionary
    params = {
        'api-version': SEARCH_SERVICE_API_VERSION,
        'search': question,
        'select': '*',
        '$top': 3,
        'queryLanguage': 'en-us',
        'queryType': 'semantic',
        'semanticConfiguration': SEARCH_SERVICE_SEMANTIC_CONFIG_NAME,
        '$count': 'true',
        'speller': 'lexicon',
        'answers': 'extractive|count-3',
        'captions': 'extractive|highlight-false'
    }
    # Make a GET request to the Azure Cognitive Search service and store the response in a variable
    resp = requests.get(url, headers=HEADERS, params=params)
    # Return the JSON response containing the search results
    return resp.json()

async def filter_documents(search_results):
    """Filter documents that score above a certain threshold in semantic search"""
    file_content = OrderedDict()
    for result in search_results['value']:
        # The '@search.rerankerScore' range is 0 to 4.00, where a higher score indicates a stronger semantic match.
        if result['@search.rerankerScore'] > 1.5:
            file_content[result['metadata_storage_path']] = {
                'chunks': result['pages'][:10],
                'captions': result['@search.captions'][:10],
                'score': result['@search.rerankerScore'],
                'file_name': result['metadata_storage_name']
            }
    return file_content
async def create_kernel(sk):
    """Create a semantic kernel"""
    return sk.Kernel()

async def create_embeddings(kernel):
    """Create an embedding model"""
    return kernel.add_text_embedding_generation_service(
        "text-embedding-ada-002",  # A label for this service; it is used by prompt templates (not covered in this tutorial), so you can name it anything.
        AzureTextEmbedding(
            "text-embedding-ada-002",  # Azure OpenAI deployment name
            AZURE_OPENAI_ENDPOINT,
            AZURE_OPENAI_KEY
        ))

async def create_vector_store(kernel):
    """Create a vector store"""
    kernel.register_memory_store(memory_store=sk.memory.VolatileMemoryStore())
    kernel.import_skill(sk.core_skills.TextMemorySkill())

async def store_documents(kernel, file_content):
    """Store documents in the vector store"""
    for key, value in file_content.items():
        page_number = 1
        for page in value['chunks']:
            page_id = f"{value['file_name']}_{page_number}"
            await kernel.memory.save_information_async(
                collection='TeachGPTtoPDF',
                id=page_id,
                text=page
            )
            page_number += 1

async def search_with_vector_store(kernel, question):
    """Search for documents related to your question from the vector store"""
    related_page = await kernel.memory.search_async('TeachGPTtoPDF', question)
    return related_page

async def add_chat_service(kernel):
    """Add a chat service"""
    return kernel.add_chat_service(
        'gpt-35-turbo',  # A label for this service; it is used by prompt templates (not covered in this tutorial), so you can name it anything.
        AzureChatCompletion(
            'gpt-35-turbo',  # Azure OpenAI deployment name
            AZURE_OPENAI_ENDPOINT,
            AZURE_OPENAI_KEY
        )
    )

async def answer_with_sk(kernel, question, related_page):
    """Answer question with related_page using the semantic kernel"""
    prompt = """
    Provide a detailed answer to the <question> using the information from the <related_page>.
    <question>
    {{$question}}
    </question>
    <related_page>
    {{$related_page}}
    </related_page>
    Answer:
    """
    chat_function = kernel.create_semantic_function(prompt, max_tokens=500, temperature=0.0, top_p=0.5)
    context = kernel.create_new_context()
    context['question'] = question
    context['related_page'] = related_page[0].text  # must match the {{$related_page}} variable in the prompt
    return await chat_function.invoke_async(context=context)
async def main():
    QUESTION = 'Tell me about effective prompting strategies'

    # Search for documents with Azure Cognitive Search
    search_results = await search_documents(QUESTION)
    file_content = await filter_documents(search_results)
    print('Total Documents Found: {}, Top Documents: {}'.format(
        search_results['@odata.count'], len(search_results['value'])))

    # Answer your question using the semantic kernel
    kernel = await create_kernel(sk)
    await create_embeddings(kernel)
    await create_vector_store(kernel)
    await store_documents(kernel, file_content)
    related_page = await search_with_vector_store(kernel, QUESTION)
    await add_chat_service(kernel)
    answer = await answer_with_sk(kernel, QUESTION, related_page)
    print('Question: ', QUESTION)
    print('Answer: ', answer)
    print('Reference: ', related_page[0].id)

# execute the main function
if __name__ == "__main__":
    import asyncio
    asyncio.run(main())
Congratulations!
In this tutorial, we have navigated through a practical journey of integrating Azure Blob Storage, Azure Cognitive Search, and Azure OpenAI to create a powerful search and response mechanism.