Large Language Models (LLMs) continue to amaze us with their capabilities. However, using LLMs in production AI applications requires integrating private data. Join us for a captivating conversation with Jerry Liu from LlamaIndex, where he shares valuable insights into data ingestion, indexing, and querying tailored specifically for LLM applications. Along the way, we uncover different query patterns and venture beyond the realm of vector databases.
Jerry Liu: Yeah, that’s a good question. And maybe just to kind of frame this with a bit of context - I think it’s useful to think about certain use cases for each index. So the thing about a vector index, or being able to use a vector store, is that it’s typically well-suited for applications where you want to ask fact-based questions. And so if you want to ask a question about specific facts in your knowledge corpus, using a vector store tends to be pretty effective.
[26:13] For instance, let’s say your knowledge corpus is about American history, or something, and your question is, “Hey, what happened in the year 1780?” That type of question tends to lend itself well to using a vector store, because the way the overall system works is you would take this query, generate an embedding for it, first do retrieval from the vector store to fetch back the chunks most relevant to the query, and then put those into the input prompt of the language model.
So the set of retrieved items you get back would be those that are most semantically similar to your query by embedding distance. So again, going back to embeddings - the closer your query embedding and a context embedding are, the more relevant that context is, and the farther apart they are, the less relevant. So you get back the most relevant context for your query, feed it to a language model, and get back an answer.
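To make that flow concrete, here is a minimal sketch of top-k embedding retrieval in plain Python. This is not LlamaIndex's actual internals; `embed_fn` stands in for whatever embedding model you use, and the document is assumed to already be split into chunks.

```python
import numpy as np

def top_k_retrieve(query, chunks, embed_fn, k=3):
    """Return the k chunks most semantically similar to the query.

    `embed_fn` is a placeholder for any embedding model that maps a
    string to a fixed-length vector.
    """
    query_vec = np.array(embed_fn(query))
    chunk_vecs = np.array([embed_fn(c) for c in chunks])

    # Cosine similarity between the query and every chunk embedding.
    sims = chunk_vecs @ query_vec / (
        np.linalg.norm(chunk_vecs, axis=1) * np.linalg.norm(query_vec)
    )
    top_ids = np.argsort(sims)[::-1][:k]
    return [chunks[i] for i in top_ids]

def build_prompt(query, retrieved_chunks):
    # Stuff the retrieved context into the language model's input prompt.
    context = "\n\n".join(retrieved_chunks)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
```

In practice a vector database does the nearest-neighbor search for you, but the shape of the computation is the same: embed, retrieve top-k, stuff the prompt.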
There are other settings where standard Top-K embedding-based lookup - and I can dive into this in as much technical depth as you guys would want - doesn’t work well. And one example where it doesn’t typically work well - and this is a very basic example - is if you just want to get a summary of an entire document or an entire set of documents. Let’s say instead of asking a question about a specific fact, like “What happened in 1776?”, maybe you just want to ask the language model “Can you just give me an entire summary of American history in the 1800s?” That type of question tends not to lend itself well to embedding-based lookup, because you typically fix a Top-K value when you do embedding-based lookup, and you would get back very specific context. But sometimes you really want the language model to go through all the different contexts within your data.
So with a vector index, storing things with embeddings creates a query interface where you can only fetch the k most relevant nodes. If you store it instead with, for instance, a list index, you could store the items as just a flat list. So when you query this list index, you actually get back all the relevant items within the list, and then you’d feed them to our synthesis module to synthesize the final answer. So the way you do retrieval over different indices actually depends on the nature of those indices.
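As a rough illustration, here is how those two query patterns might look with LlamaIndex. The class names and import paths are from a recent version of the library and may differ in older releases (the list index was later renamed `SummaryIndex`); the `./american_history` folder is just a placeholder corpus.

```python
# Assumes a recent `llama-index` install and a configured LLM/embedding
# model (OpenAI by default); exact import paths vary across versions.
from llama_index.core import (
    SimpleDirectoryReader,
    SummaryIndex,        # formerly ListIndex: nodes stored as a flat list
    VectorStoreIndex,
)

documents = SimpleDirectoryReader("./american_history").load_data()

# Vector index: Top-K embedding retrieval, good for specific fact lookups.
vector_index = VectorStoreIndex.from_documents(documents)
fact_engine = vector_index.as_query_engine(similarity_top_k=2)
print(fact_engine.query("What happened in 1780?"))

# List/summary index: the query touches every node in the list, then a
# synthesis step combines them into one answer.
summary_index = SummaryIndex.from_documents(documents)
summary_engine = summary_index.as_query_engine(response_mode="tree_summarize")
print(summary_engine.query("Give me a summary of American history in the 1800s."))
```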
Another very basic example is that we also have a keyword table index, where you can look up specific items by keywords instead of through embedding-based search. Keywords, for instance, are typically good for stuff that requires high precision and a little bit lower recall. So you really want to fetch specific items that match the keywords exactly. This has the advantage of letting you retrieve more precise context than vector-based embedding lookup would.
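A hedged sketch of that keyword-table pattern, again assuming a recent `llama-index` version and the same placeholder document folder:

```python
from llama_index.core import SimpleDirectoryReader, SimpleKeywordTableIndex

documents = SimpleDirectoryReader("./american_history").load_data()

# Keyword table index: nodes are retrieved by keyword match rather than
# embedding distance -- higher precision, somewhat lower recall.
keyword_index = SimpleKeywordTableIndex.from_documents(documents)
keyword_engine = keyword_index.as_query_engine()
print(keyword_engine.query("Treaty of Paris 1783"))
```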
The way I think about this is that a lot of what LlamaIndex wants to provide is this overall query interface over your data. Given any class of queries you might want to ask - whether it’s a fact-based question, a summary question, or some more interesting question - we want to provide the toolset so that you can answer those questions. And indices - defining the right structure over your data - are just one step of this overall process, helping us achieve this vision of a very generalizable query interface over your data.
Some examples of different types of queries that we support - there’s fact-based question lookup, which is semantic search using vector embeddings, and you can ask summarization questions using our list index. You could actually run a structured query, so if you have a SQL database, you could run structured analytics over your database and do text-to-SQL (sketched below). You can do compare-and-contrast type queries, where you can actually look at different documents within your collection, and then look at the differences between them. You could even look at temporal queries, where you can reason about time, go forwards and backwards, and basically say, “Hey, this event actually happened after this event. Here’s the right answer to this question that you’re asking.”
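For the structured-query case, the text-to-SQL path might look roughly like this. The database file, table name, and exact import paths here are illustrative assumptions rather than anything from the conversation, and the API surface differs somewhat between llama-index versions.

```python
# Structured queries: text-to-SQL over a SQL database.
from sqlalchemy import create_engine
from llama_index.core import SQLDatabase
from llama_index.core.query_engine import NLSQLTableQueryEngine

# Hypothetical SQLite database with an "events" table.
engine = create_engine("sqlite:///history.db")
sql_database = SQLDatabase(engine, include_tables=["events"])

# The query engine translates the natural-language question into SQL,
# runs it, and synthesizes an answer from the result rows.
query_engine = NLSQLTableQueryEngine(
    sql_database=sql_database,
    tables=["events"],
)
print(query_engine.query("How many events happened between 1775 and 1783?"))
```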
And so a lot of what LlamaIndex does provide is a set of tools - the indices, the data ingesters, the query interface - to answer any of these queries that you might want to ask.