This post has been republished via RSS; it originally appeared at: Microsoft Tech Community - Latest Blogs - .

We’ve seen huge interest from organizations that want to use Azure OpenAI service to access Large Language Models (LLMs) in combination with their own data. Allowing these applications to access your organization’s knowledge base allows inclusion of data relevant to the conversation, creating a richer and more useful experience. However, this introduces new problems if the Generative AI application isn't aware of any access control requirements. We’ve recently updated the Cognitive Search OpenAI Demo to allow user login and access control, which enables the Generative AI application to tailor responses on a per-user basis.

How do you combine your own data with Generative AI?

Normally, you can just ask ChatGPT general questions and get good responses. However, this approach breaks down when you ask ChatGPT specific questions about your organization. When I ask, “What is included in my Northwind Health Plus plan that is not in standard”, I get a much less useful “I’m sorry, but I don’t have access to your health plans” response. If only there was some way to combine data from your organization with ChatGPT! Fortunately, this is possible using a Retrieval Augmented Generation (RAG) approach.

Using Cognitive Search

Cognitive Search is an AI-powered information retrieval platform that allows you to combine LLMs with your organization’s data using the RAG approach. Documents from your organization’s knowledge base are chunked and embedded using an Azure Open AI embedding model. The embeddings are then indexed in Cognitive Search alongside the document text.

Cognitive Search combines multiple search methods to improve your results. Keyword search over the document text allows matching specific terms in your documents. Vector search goes a step further and finds the sections of documents that are semantically similar to your search query. The search results from both steps are combined using a hybrid approach called Reciprocal Rank Fusion (RRF). Finally, semantic ranking leverages the power of machine learning models from Microsoft Bing to further improve these hybrid search results.

The need for security

Not all documents in your knowledge base are meant for public consumption outside of your organization. Sales reports, competitive research, or other sensitive documentation might not even be visible to all members of your organization. If you create a simple application that allows you to chat with all the data in your knowledge base, you might inadvertently be exposing your documents to the wrong audience.

In general, any access control solution requires two components:

A system to manage user accounts and their permissions.
Document storage enabling the association of permissions with files or folders.

Understanding Identity and Access Management

Before we dive into the specifics of our solution, it’s important to have a general understanding of how identity and access management works in Azure. One of the main concerns when deploying a cloud-based application is ensuring your users can securely access it. Microsoft Entra ID (formerly Azure Active Directory) is Azure’s identity and access management solution, facilitating secure access to external and internal resources for your organization. Here’s a brief overview of Microsoft Entra ID terminology:

Tenant – Represents your entire organization. All Azure subscriptions have a trust relationship with a tenant. The tenant performs identity and access management related operations for the subscription. Subscriptions can only trust a single tenant, but a single tenant may be trusted by many subscriptions.

User – Represents an individual who has a relationship with your organization. This individual might be an employee, or a guest from a different organization that’s working with you. The account information for this user might be stored in your tenant, or might come from an external social identity provider such as Facebook. Tenants can have many users.
Authentication – Confirming users are who they say they are. Your organization may have different policy requirements for authentication, such as requiring multi-factor authentication or only allowing authentication from an organization-managed device.
Authorization – Once a user has been authenticated, granting them permission to do something. For example, admin users might have the ability to add new users to the tenant, but normal users do not have this permission. The terms authentication and authorization are often used interchangeably, but they are actually distinct concepts.

Authentication and authorization in Microsoft Entra ID are implemented using the OAuth 2.0 and Open ID Connect protocols.

Application – Represents an application used by your organization. Registering your application with your tenant allows you to delegate authentication and authorization to Microsoft Entra ID. Tenants can have many applications, but an application can only be registered in a single tenant.
Permissions and Consent – Applications need access to resources. For example, a calendar application needs access to a user’s calendar to schedule a meeting. Resources like a user’s calendar are protected by Microsoft Entra ID through a permission-based system. Applications access these protected resources using two general models:
1. Delegated Access – A user signs into the application and provides consent to the application so it can access protected resources on their behalf. If the user does not provide consent, the application is not allowed to access the protected resources.
2. App-Only Access – The application accesses a protected resource without any signed-in user. This model is typically used for background services that don’t require user interaction. A tenant administrator must provide consent for these applications to access protected resources.

Security Token – When a user authenticates and signs into an application, they are given a security token. This security token represents a user’s identity and their consent to give the application access to protected resources. The token contains information about the user in the form of key-value pairs called claims. Tokens are cryptographically secure and are validated by Microsoft Entra ID when performing authentication and authorization related tasks. Microsoft Entra ID supports a variety of authentication flows for user sign-in.

Now that we understand the basics of identity and access management, let’s see how we can enhance the security of our Generative AI application.

Integrating Cognitive Search with Microsoft Entra ID

To explain how integration works, we'll be using this demo repository, which anyone can deploy as long as they have the follow prerequisites:

An Entra ID Tenant
An Azure account with any of the following roles allowing application management:
1. Application administrator
2. Application developer
3. Cloud application administrator

Original App Architecture

Without any identity or access management, the demo uses the following architecture:

When the demo application is deployed, sample data is indexed into Cognitive Search.
The user interacts with a single-page application. They can either chat with the sample data or ask a question about it.
The single-page application makes API calls to a backend API server. The API server implements the RAG pattern by querying Cognitive Search to find relevant sample data and combining the resulting context and query using Azure OpenAI.

Adding Access Control

When we add identity and access management the demo architecture changes:

The sample data is augmented with sample groups for your tenant.
1. Sample scripts are provided to create these groups and upload the sample data with access control to a Data Lake Storage Gen2 account. Data Lake Storage Gen2 supports integration with Microsoft Entra ID for access control on individual files and folders.
Sample scripts are provided to update the index structure in Cognitive Search to support identity and access management. Additional string collection fields are added to store user and group identifiers alongside document content.
1. If these access control fields in Cognitive Search are populated correctly, any data source that can store Microsoft Entra ID user or group identifiers can be used with the demo application.
The single-page application is registered and integrated with Microsoft Entra ID to support user authentication. A login and logout button are added to the application interface. Additional options are added to the developer options pane:

“Use oid security filter”. This option toggles whether the user’s identifier in tenant is used for security filtering in Cognitive Search.
“Use groups security filter”. This option toggles whether the groups the user is a member of in the tenant are used for security filtering in Cognitive Search.
“ID Token Claims”. This table shows all claims in the ID token received after login to the single-page application.

The app server is registered and integrated with Microsoft Entra ID to support user authorization.
The RAG pattern is integrated with Microsoft Entra ID in the API server. Documents retrieved from Cognitive Search are filtered using the logged-in user’s identity or group memberships. When security filtering is enabled, users can only chat with data they have access to in the sample knowledge base.

Let’s walk through exactly how the demo integrates with Microsoft Entra ID.

Understanding the integration with Microsoft Entra ID

The steps to set up the demo are documented in the repository. Here’s how the setup steps are used to integrate with Microsoft Entra ID at a high level:

A new Entra app registration is created in your tenant, representing the backend API server.
1. The API server requires secret credentials to authenticate itself to the tenant.
2. The API server uses the on-behalf-of flow to get a token from the tenant, but it still needs permission to do this. The API server application exposes a new permission called “access_as_user” so the single-page application can integrate with it.
3. The logged-in user’s groups are added to the token using an optional group claim. However, the claim is limited to 200 groups to prevent the token from becoming too large. To work around this limit, the API server requires the delegated Microsoft.Read permission to read the user’s group membership from Microsoft Graph.
Another Entra app registration is created in your tenant, representing the single-page frontend application.
1. The single-page application does not need any credentials to authenticate itself to the tenant.
2. The single-page application uses the authentication code flow to get a token. This requires registering a redirect URI so the tenant can communicate the token back to the single-page application after a successful login.
3. The single-page application must request the “access_as_user” permission from the API server application for the on-behalf-of flow to work correctly.
4. Finally, the single-page application must be registered as a “known client application” of the API server application. This ties consent to the single-page application to the API server application, avoiding the need for a second consent dialog.

The following diagram illustrates how the single-page application interacts with the API server and integrates with Microsoft Entra ID:

Use authorization information to perform filtering in Cognitive Search.
1. The user’s identifier is extracted from the token claims and combined with the user’s group memberships.
2. A filter containing the authorization information is added to the query sent by the API server to Cognitive Search.
  1. This filter uses a specific search.in syntax that can support hundreds or even thousands of group or user identifiers in a single query.
  2. For example, to filter over the groups field using a single group identifier, the filter would be groups/any(g: search.in(g, 'x'))
  3. Filters can be combined using either an and or an or operator. For example, to match documents where either the user id or the group id is present in the access control fields, the filter would be groups/any(g: search.in(g, 'x')) or users/any(g: search.in(g, 'y'))

The following diagram demonstrates how the API server uses filters to retrieve documents from Cognitive Search that match the permissions of the logged-in user:

Next Steps

Combining Generative AI and access control can unlock a myriad of new use cases that enhance security, compliance, and productivity. We invite you to explore this cutting-edge technology by deploying our sample application.

Access Control in Generative AI applications with Azure Cognitive Search