Armchair Architects: Large Language Models (LLMs) & Vector Databases

This post has been republished via RSS; it originally appeared at: New blog articles in Microsoft Community Hub.

In this episode, David Blank-Edelman and our armchair architects, Uli Homann and Eric Charran, focus on large language models (LLMs) and vector databases and their role in fueling AI and ML.

What are vector databases?

Eric defines vector databases as a way to store meaningful information about the multi-dimensional aspects of data as vectors, which are arrays of numbers (in practice usually floating-point values rather than integers). For storing and querying, they work very much like a traditional relational database system.

What's interesting about vector databases is that they help us answer different types of queries. One type is the “nearest neighbor” query. For example, if Spotify knows that Eric Charran loves Def Leppard and loves one song in particular, which other songs are most similar across the dimensions Spotify tracks, so that it can recommend them? It works by using the numerical distance between the vectors to answer that question.
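To make the distance idea concrete, here is a minimal sketch of a nearest neighbor lookup. The song labels, the three dimensions, and every value are invented for illustration, and the brute-force scan stands in for the specialized indexes a real vector database would use.

```python
import math

# Hypothetical song vectors. The three dimensions (say tempo, distortion,
# vocal intensity) and all values are made up for this example.
songs = {
    "favorite_song": [0.80, 0.90, 0.70],
    "similar_rock_song": [0.78, 0.88, 0.72],
    "another_rock_song": [0.75, 0.85, 0.80],
    "quiet_piano_piece": [0.10, 0.00, 0.00],
}

def euclidean_distance(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_neighbors(query_title, catalog, k=2):
    """Return the k titles whose vectors are closest to the query song."""
    query = catalog[query_title]
    others = [
        (title, euclidean_distance(query, vector))
        for title, vector in catalog.items()
        if title != query_title
    ]
    others.sort(key=lambda pair: pair[1])  # smallest distance first
    return [title for title, _ in others[:k]]

print(nearest_neighbors("favorite_song", songs))
```

The two rock songs come back first because their vectors sit closest to the favorite song, while the piano piece is far away on every dimension.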

Uli added that vector databases in the context of AI effectively take text and convert it into a numerical representation. In the PostgreSQL community, for example, the PostgreSQL teams have already added a plug-in that lets you take any text field, turn it into a vector, and then take that vector and embed it into an LLM.

Vectors have been around for a very long time, since they are part of the neural network model; at the end of the day, those models are vectors as well. What is new is that vectors are now data specific. It's also not just databases: while databases will be prevalent, you will see search systems expose their search indexes as vectors too.

Azure Cognitive Search for example does that and Uli theorizes that other search systems do that as well. You can take that index and make it part, for example, of an OpenAI system or Bard or whatever AI system you like.

Vector databases are one way of implementing an AI system; the other method is embedding.

Vector Databases and Natural Language Processing (NLP)

Let’s look at how vector databases are used in the real world and in NLP, where embedding is used. One use case is taking word embeddings and sentence embeddings and representing them numerically so that LLMs can include them in the corpus of information used for training. Another use case is the “nearest neighbor” example from earlier.

If you recall, the nearest neighbor use case asks: given this particular item, object, or input, which things are closest to it or farthest away from it? This can include image and video retrieval, where you take unstructured data and vectorize it so that you can find it, surface it, and run the comparisons that matter. It can also include anomaly detection, geospatial data, and machine learning.

What does embedding mean in the context of LLMs?

LLMs are trained primarily on the Internet. Whether you're looking at Bard or OpenAI, the model gets a copy of the Internet as its corpus of knowledge; conceptually, that corpus is vectorized and put into the LLM.

That's a great set of knowledge, and if you use ChatGPT, Bing Chat, or something similar, you effectively access that Internet. But most of these LLMs are static. For example, the OpenAI models were compiled sometime in 2021. If you ask the model, without any helper, about an event in 2022, it won't know, because it was compiled, conceptually speaking, with knowledge that didn’t include 2022 events.

So now what happens is you bring in models from the Internet, for example, that effectively allow these LLMs to understand “oh, there is something beyond what I already know” and bring it in. This scenario would apply for example in an internet search.

If you're an enterprise, you care about the global knowledge, but you also want your enterprise’s specific knowledge to be part of this search, so that if somebody like Eric is looking for specific things in his new company, the company's knowledge is available to him as well. That's what's called data grounding: you ground the model with the data that your enterprise has and expand its knowledge. Embedding is one technique for doing that.

Embedding simply says: take this vector of knowledge and fold it into your larger model so that every time you run a query, this embedding is part of what the system evaluates before it responds to you. The way Eric thinks about it, the vector database stores the numerical representation of a concept found within a corpus of information, such as a web page on the Internet, and that lets you link nearby concepts together.

That's how an LLM, if it's trained on these vectors and embeddings, comes to understand concepts. Semantic concepts are vectorized, and the distance between those vectors allows the model to stitch related concepts together and respond accordingly.
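One common way to ground a model is to retrieve the stored text whose vectors are closest to the question and place that text in the prompt. The sketch below assumes that approach; the documents, their pre-computed vectors, and the `build_grounded_prompt` helper are all hypothetical, and in a real system an embedding model would produce the vectors.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical enterprise documents with made-up, pre-computed vectors.
documents = [
    ("Expense reports are due on the 5th of each month.", [0.9, 0.1, 0.0]),
    ("The cafeteria is open from 8am to 3pm.", [0.0, 0.2, 0.9]),
    ("Travel expenses require manager approval.", [0.8, 0.3, 0.1]),
]

def build_grounded_prompt(question, question_vector, docs, k=2):
    """Put the k most similar documents into the prompt as context."""
    ranked = sorted(
        docs,
        key=lambda doc: cosine_similarity(question_vector, doc[1]),
        reverse=True,
    )
    context = "\n".join(f"- {text}" for text, _ in ranked[:k])
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}"
    )

# The question vector is also invented; it points toward the expense topics.
prompt = build_grounded_prompt(
    "When are expense reports due?", [1.0, 0.2, 0.0], documents
)
print(prompt)
```

Only the two expense-related documents end up in the prompt, which is the point of grounding: the model sees the enterprise knowledge that is nearest to the question.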

Using One or Multi-shot training of an LLM

LLM teams now have the technology and tools to make it easy to bring vector stores and vector databases into your models through embeddings. One tip is to make sure you use the right tooling. Before you do that, however, you can use prompt engineering to feed example data of what you're looking for into the model. Part of prompt engineering is what's called one-shot or multi-shot training (often called one-shot or few-shot prompting, since the model's weights are not changed). As part of the prompt you're saying, “I'm expecting this kind of output.”

The system takes that into account, says “ah, that's what you're looking for,” and responds in kind. You can feed it quite a lot of sample data to show what you want it to look at. Because the examples are part of the prompt, this is far cheaper than doing embeddings and other things, and it should therefore always be considered first.

Corporations will end up using embeddings, but you should start with one-shot and multi-shot training, because there is a lot of knowledge in those models that can be coaxed out if you give them the right prompt.
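As a rough illustration of putting examples into the prompt, the sketch below assembles a multi-shot prompt as plain text. The `build_few_shot_prompt` helper, the "Input:"/"Output:" layout, and the sample messages are assumptions made for this example, not a format that any particular model requires.

```python
def build_few_shot_prompt(instruction, examples, new_input):
    """Assemble an instruction, worked examples, and the new input."""
    lines = [instruction, ""]
    for example_input, example_output in examples:
        lines.append(f"Input: {example_input}")
        lines.append(f"Output: {example_output}")
        lines.append("")
    # Leave the final output empty so the model completes it.
    lines.append(f"Input: {new_input}")
    lines.append("Output:")
    return "\n".join(lines)

# Two hypothetical examples showing the kind of output we expect.
examples = [
    ("The claim was paid within two days.", "positive"),
    ("Nobody answered my emails for a month.", "negative"),
]

prompt = build_few_shot_prompt(
    "Classify the sentiment of each customer message.",
    examples,
    "The agent resolved my issue quickly.",
)
print(prompt)
```

With one example in the list this is one-shot; with several it is multi-shot. Either way, no model is retrained, which is why this is the cheaper option to try first.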

LLM Fine Tuning
Fine tuning is how you adapt a model using a fit-for-purpose data set. Whether it's a pre-compiled model that you've downloaded or one you have already trained, you run additional training loops on that data set so that you can tune the model to respond in the ways you intend.

Fine tuning adjusts the model's parameters through iterative training loops over purpose-driven data sets. The key part is that you bring specific data sets, take a layer of the model, and train it. You are adding more constraints to the general-purpose model, and it uses your data to shape the training for the specific domain you are in, for example healthcare or industrial automation. Fine tuning also helps considerably with hallucinations, because you tell the system what it needs to pay attention to, and it adapts and becomes more precise.

Limitations of LLMs and One shot or Multi-shot Training

There are two things that large language models are really bad at. One is math, so don't ask an LLM to do calculus for you; that's not going to work. The second, as of today, is that you cannot point an LLM at a structured database and have the system automatically understand the schema and the data and produce good responses. There is still much more work involved.

The state of the art right now is that you effectively build shims or workarounds: you write code that you can then integrate into the prompt. For example, OpenAI has a way for a function that you define to be called as part of handling the prompt, and that function can, for example, bring back relational or structured data.

As with one-shot or multi-shot training, you can take the result set, which cannot be too large, and feed it into the prompt. There is also pipeline-based programming, which is explained in the following scenario.

You are an insurance company. David is a customer of the insurance company and would like to understand his claim status.
- David goes to your website and types into the chat. The first thing you need to know is who David is. Your CRM system knows who David is, what insurance policies he has, what claims are open, and so on. That is all structured information from the CRM system and perhaps a claims management system.
- The first phase is to parse the text that David entered, using GPT for example, pull out the relevant information, and feed it through structured API calls.
- You get the result sets back.
- You then create the prompt that you really want to use for the response, using prompt engineering with one-shot and multi-shot training.
- The system then generates the response you're looking for, along the lines of “Hi, David, great to see you again. Here's the status of the claim.”
- In this case you have used the OpenAI model twice, not just once.

In summary, you use it first to understand the language and extract what you need to call the structured APIs and then you feed the response from the structured system back into the prompt so that it generates the response you're looking for.
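The two-pass pipeline can be sketched as follows. Everything here is hypothetical: the `CLAIMS` dictionary stands in for the CRM and claims systems, `lookup_claims` stands in for a structured API call, and `fake_llm` returns canned text so the example runs without a real model. In practice you would pass in a function that calls your actual LLM.

```python
import json

# Hypothetical stand-in for the CRM and claims management systems.
CLAIMS = {"David": [{"claim_id": "C-100", "status": "under review"}]}

def lookup_claims(customer_name):
    """Structured API call: return the open claims for a customer."""
    return CLAIMS.get(customer_name, [])

def handle_chat(message, llm):
    # Pass 1: use the model to pull structured fields out of free text.
    extraction = llm(
        "Extract the customer name and intent from the message as JSON "
        'with keys "customer" and "intent".\nMessage: ' + message
    )
    fields = json.loads(extraction)

    # Call the structured system with the extracted fields.
    claims = lookup_claims(fields["customer"])

    # Pass 2: feed the result set, plus one example reply, into a new prompt.
    response_prompt = (
        "Write a short, friendly reply to the customer.\n"
        "Example reply: Hi Ana, your claim C-001 is approved.\n"
        f"Customer: {fields['customer']}\n"
        f"Claims: {json.dumps(claims)}"
    )
    return llm(response_prompt)

def fake_llm(prompt):
    """Canned responses so the sketch runs without a real model."""
    if prompt.startswith("Extract"):
        return json.dumps({"customer": "David", "intent": "claim_status"})
    return "Hi David, your claim C-100 is under review."

reply = handle_chat("Hi, this is David. What is the status of my claim?", fake_llm)
print(reply)
```

Passing the model in as a function keeps the pipeline testable and makes the two model calls easy to see: one before the structured lookup and one after it.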

Increased Adoption of Vector Indexes

Eric raised a question that architects face: how do I figure out whether I need a dedicated vector database, or whether I can use vector search indexes instead? Another question may be: how do I build a system that helps LLM developers be better or more efficient at their job?

Eric thinks we're reaching a transition point. Until now, vector databases have been a kind of Relational Database Management System (RDBMS) for vectors, answering queries associated with vectors.

He is seeing a lot of the lake house platforms and traditional database management systems adopt vector indexes so that developers don’t have to pick the data up and move it to another specific place, vectorize it and store it. Now there are these components of relational database management systems or lake houses that create vectors on top of where the data lives, so on top of delta tables for example. That's one architectural consideration and it should make architects happy because the heaviest thing in the world to move is data and architects hate doing it.

Architectural Considerations around Vector Databases
The other architectural consideration is how you actually arrive at the vectorization. Is it an ETL scheme, or does it happen on each write operation? Is there logic associated with the vectorization itself? All of that is important to consider when you're trying to create a platform for your organization.

If you're creating your own foundational models, vectorization, and the process by which data becomes vectorized or embedded, becomes very important.

You also have to consider whether you're allowed to store specific information at all, depending on your industry, such as financial services, life sciences, or health. You may need to scan and tokenize your data before it goes into the vectorization process.

Another consideration for architects is that, although vectorization is a key technique, real data, in this case from Microsoft, shows that vectorization alone is not necessarily the answer. Microsoft has seen that a search index plus vectorization is actually faster and more reliable from a response perspective for an OpenAI system than just the vector or just the query of the index.
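As a small illustration of scanning data before it is vectorized, the sketch below masks two kinds of sensitive values with placeholder tokens. The patterns and placeholder names are assumptions chosen for this example; a production system would rely on a dedicated data-classification or tokenization service and on the rules that apply to its industry.

```python
import re

# Simplified, illustrative patterns. Real detection needs to be far broader.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
ID_LIKE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def redact(text):
    """Replace sensitive-looking values before the text is embedded."""
    text = EMAIL.sub("[EMAIL]", text)
    return ID_LIKE.sub("[ID]", text)

print(redact("Contact jane@example.com, ID 123-45-6789."))
```

Running the scan before vectorization means the sensitive values never reach the vector store, rather than having to be removed from it later.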

When developing a solution, you should be much more flexible in this case where you say, “how am I going to go and get this data?” Sometimes it's a combination of techniques, not just one technique that will work or be most efficient.
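One well-known way to combine a keyword search ranking with a vector search ranking is reciprocal rank fusion, where each document scores 1 / (k + rank) in every list it appears in. The sketch below assumes that technique purely as an example of combining approaches; the episode does not say which method Microsoft's data was based on, and the document IDs and rankings are invented.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document IDs into one ranking."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Documents near the top of any list earn a larger share.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)

# Hypothetical results for the same query from two retrieval methods.
keyword_ranking = ["doc_a", "doc_b", "doc_c"]
vector_ranking = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([keyword_ranking, vector_ranking])
print(fused)
```

Documents that both methods rank highly rise to the top, while a document found by only one method still appears, just lower down.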

For Uli, architecture means understanding the tools you have and choosing among them deliberately. Ideally you are not looking for black-or-white answers, because the world is mostly gray; the right answer usually comes from the right combination of tools rather than from a single tool or technique.

Recommended Next Steps

If you’d like to learn more about the general principles prescribed by Microsoft, we recommend Microsoft Cloud Adoption Framework for platform and environment-level guidance and Azure Well-Architected Framework. You can also register for an upcoming workshop led by Azure partners on cloud migration and adoption topics and incorporate click-through labs to ensure effective, pragmatic training.

You can view the whole video below and check out more videos from the Azure Enablement Show.
