Daniel & Chris explore the advantages of vector databases with Roie Schwaber-Cohen of Pinecone. Roie starts with a very lucid explanation of why you need a vector database in your machine learning pipeline, and then goes on to discuss Pinecone’s vector database, designed to facilitate efficient storage, retrieval, and management of vector data.
Sponsors
Plumb – Low-code AI pipeline builder that helps you build complex AI pipelines fast. Easily create AI pipelines using their node-based editor. Iterate and deploy faster and more reliably than coding by hand, without sacrificing control.
Chapters
| Chapter Number | Chapter Start Time | Chapter Title | Chapter Duration |
|---|---|---|---|
| 1 | 00:00 | Welcome to Practical AI | 00:43 |
| 2 | 00:43 | AI Engineer World's Fair | 01:35 |
| 3 | 02:18 | Origins of Pinecone | 02:34 |
| 4 | 04:53 | Vector storing methods | 02:56 |
| 5 | 07:49 | Advantages of vector search | 03:37 |
| 6 | 11:25 | Compressing representation | 02:06 |
| 7 | 13:39 | Sponsor: Plumb | 01:25 |
| 8 | 15:16 | What is a vector db? | 02:44 |
| 9 | 18:00 | Key functionality | 04:31 |
| 10 | 22:32 | Onboarding enterprises | 04:40 |
| 11 | 27:11 | Internal vs external RAG | 01:11 |
| 12 | 28:22 | Serverless experience | 03:53 |
| 13 | 32:15 | User experience | 03:30 |
| 14 | 35:44 | What is Pinecone enabling? | 02:26 |
| 15 | 38:11 | It shouldn't be complicated | 01:52 |
| 16 | 40:03 | A look into the future | 02:49 |
| 17 | 42:52 | Thanks for joining us! | 00:32 |
| 18 | 43:24 | Outro | 00:46 |
Transcript
Welcome to another episode of Practical AI. This is Daniel Whitenack. I am the CEO and founder at Prediction Guard, where we’re enabling AI accuracy at scale, and I’m joined as always by my co-host, Chris Benson, who is a principal AI research engineer at Lockheed Martin. How are you doing, Chris?
Doing great today, Daniel. How’s it going? I know we’re recording leading into a holiday weekend here.
We are. And so many exciting things… Last week I got the chance to briefly attend the AI Engineer World’s Fair, sort of prompted in certain ways by our friends over at the Latent Space podcast… And that was awesome to see. And of course, a big topic there was all things having to do with vector databases, RAG, all sorts of – you know, retrieval search sorts of topics… And to dig into a little bit of that with us today we have Roie Schwaber-Cohen, who is a developer advocate at Pinecone. Welcome.
Hi, guys. Thanks for having me today. Really excited to be on the show.
Yeah, well, I mean, we were talking a little bit before the show… Pinecone is, from my perspective, one of the OGs out there in terms of coming to the vector search, semantic search embeddings type of stuff… Not that that concept wasn’t there before Pinecone, but certainly, when I started hearing about vector search and retrieval and these sorts of things, Pinecone was already a name that people were saying… So could you give us a little bit of background on Pinecone, and kind of how it came about and what it is position-wise in terms of the AI stack?
So Pinecone was started about four years ago, give or take, and our founder, Edo Liberty, was one of the people who were instrumental in founding SageMaker over at Amazon, and had a lot of experience in his work at Yahoo. And I think that one of the fundamental kind of insights that he had was that the future of pulling insights out of data was going to be found not exclusively, but predominantly in our capability to construct vectors out of that data. And that representation that was produced by neural networks was very, very useful, and was going to be useful moving forward.
I think he had that insight way before tools like ChatGPT became popular, and so that really gave Pinecone a great edge at being kind of the first mover in this space. And we’ve seen the repercussions of that ever since. With the rise of LLMs, I think people very quickly came to recognize the limitations that LLMs may have, and it was clear that there needed to be a layer that sort of bridged the gap between the semantic world and the structured world in a way that would allow LLMs to rely on structured data, but also leverage their capabilities as they are. And that is one of the places where vector databases play a very strong role.
Vector databases are distinct from vector indices, in the sense that they are databases, and not indices. So an index basically is limited by the memory capacity that the machine that it’s running on allows it to have… Whereas vector databases behave in the way that traditional databases behave, and in the way that they scale. Of course, there’s a completely different set of challenges, algorithmic challenges that come with the territory of dealing with vectors and high-dimensional vectors, that don’t exist in the world of just simple textual indexing and columnar data… And that’s where the secret sauce of Pinecone lives: its ability to handle vector data at scale, while maintaining the speed, maintainability and resiliency of a database.
As you’re kind of comparing vector databases to indices, and kind of building on that comparison… One of the things that I run across a lot are people – you know, vector databases are really incredibly helpful now, but there’s still a lot of people out there who don’t really understand how they fit in. They don’t really get it, versus NoSQL, versus relational databases…
Or fine-tuning…
Yeah. And they hear you say it does vectors, and stuff like that… Could you take a moment – since we have you as an expert in this thing – and kind of lay out the groundwork a little bit before we dive deeper into the conversation about what’s different about a vector database that is storing vectors, versus storing the same vectors in something else? Why go that way for somebody who hasn’t quite really ramped up on that yet?
So the basic premise is you want to use the right tool for the job. And the basic difference between a relational database, a graph database and a vector database - or a document database, for that matter - is the type of content that they are optimized to index… Meaning a relational database is meant to index a specific column and create an index that would be easily traversable. And in scale, it would be able to traverse that across different machines, and do it effectively.
[00:06:16.29] A graph database does the same thing, only its world is nodes and edges. And it’s supposed to be able to build an optimized representation of the graph, such that it could do traversals on that graph efficiently.
In vector databases - vector databases are meant to deal with vectors, which are essentially a long, high-dimensional set of numbers, meaning… You can think of an array with a lot of real numbers inside of that array. And you can think of this collection of vectors as being points in a high-dimensional space, and the vector database is building effective representations to find similarities, or geometric similarities between those vectors in high-dimensional space. And that means that basically it would be very effective at, given a vector, finding a vector that is very “close” to that vector in a very large space.
So to do that, you need to use a very specific set of algorithms that index the data in the first place, and then query that data to retrieve the set of vectors similar to the query vector in a small amount of time, and also be able to update or make modifications to that high-dimensional vector space in a way that is not cost-prohibitive, or time-prohibitive. And that’s the crux of the difference between a vector database and other types of databases.
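To make "find the closest vector" concrete, here is a minimal brute-force sketch in Python. It only illustrates the underlying operation; a real vector database replaces the full scan below with approximate nearest-neighbor indexes so it stays fast and updatable at billions of vectors.

```python
import numpy as np

# Toy "index": each row is one stored vector, e.g. the embedding of a document chunk.
stored = np.random.rand(10_000, 768).astype(np.float32)

def cosine_top_k(query: np.ndarray, vectors: np.ndarray, k: int = 5):
    """Brute-force nearest-neighbor search by cosine similarity."""
    q = query / np.linalg.norm(query)
    v = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    scores = v @ q                    # similarity of the query to every stored vector
    top = np.argsort(-scores)[:k]     # indices of the k most similar vectors
    return top, scores[top]

query = np.random.rand(768).astype(np.float32)
ids, scores = cosine_top_k(query, stored)
print(ids, scores)
```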
Just to draw that out a little bit more… So from your perspective, what would be – if you were to kind of explain to someone “Hey, here I’ve got one piece of text, and I’m wanting to match to some close piece of text in this vector space”, what might be advantageous about using this vector-based search approach and these embeddings, in terms of what they mean, and what they represent, versus doing like a… You know, TF-IDF has been around for a long time; I can search based on keywords, I can do a full-text search… There’s lots of ways to search text. That concept isn’t new. But this vector search seems to be powerful in a certain way. From your perspective, how would you describe that?
Yeah, I think that the linchpin here is the word embedding. The vector search capability itself is a pretty straightforward mathematical operation, that in and of itself doesn’t necessarily have value. It basically – it’s like other mathematical operations. It’s a tool. The question is “Where does the value come from?” And I would argue that the value comes from the embeddings. And we’ll talk about what exactly they are; we’ll just plant a flag and say “Embeddings are represented as vectors”, which is why the vector database is so critical in this scenario.
But why are embeddings helpful in the first place? So embeddings come from a very wide set of neural networks that have been trained on textual data. And they create within them representations of different terms, different surface forms, sentences, paragraphs etc. that map onto a certain location in vector space. The cool thing about embeddings is that it just so happens - and we can talk about why - that terms that have semantic similarity have a closeness in vector space. And that means that if I search for the word “queen”, and I have the word “king” embedded as well in my vector database, and I also have the word “dog”, because the word “king” is more semantically similar to the word “queen”, I will get that as a result, and not the word “dog”.
[00:10:07.05] And that allows me to basically leverage the “understanding” of the world that machine learning models - and specifically neural networks and large language models - have, in a way that I can’t quite leverage from other modalities like TF-IDF, BM25 etc. that look at a more lexical kind of perspective on the world.
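As a small illustration of that closeness-in-vector-space idea, here is a sketch using the sentence-transformers library; the library choice and the model name are illustrative assumptions, and any embedding model would show the same effect.

```python
from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("all-MiniLM-L6-v2")   # illustrative model choice
queen, king, dog = model.encode(["queen", "king", "dog"])  # one embedding per term

def cos(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print("queen vs king:", cos(queen, king))   # typically the higher similarity
print("queen vs dog: ", cos(queen, dog))    # typically the lower similarity
```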
So when we talk about practical use cases, RAG comes up very, very frequently, and the reason for that is because a user interacts with the system in semantic space. That means that they ask the system a question in natural language; we can take that natural language and basically, again, “understand” the user’s intent, and map it, again, into our high-dimensional vector space, and find content that we’ve embedded that has some similarity to that intent. So we’re not looking for an exact lexical match, but we’re actually able to take a step back, and look at the more ambiguous intention and meaning of the query itself, and match to it things that are semantically similar, if that makes sense.
Would it be fair to say that you’re essentially – because the output of those structures being embeddings, and those are vectors, and therefore you’re essentially storing it and operating on it in a closer representation to how they naturally would be, and so you’re not doing a bunch of translation just to fit it into a storage medium and to operate on it, therefore it’s gonna be quite a bit faster… Is that fair? Is that a fair way of thinking about it?
Perhaps. In a way we’re compressing the representation into something very small, in a sense. So you can think of an image, for example. An image that could be like a megabyte big. We can get a representation – in terms of its actual size, in terms of the vector – that’s orders of magnitude smaller. And we can use that representation, instead of using the entire image, to do our search.
Now, it just so happens that, again, when we’re doing embeddings for images, we get that same quality. We’re not looking at an exact match, or like pixel-matching, pixel to pixel, with the images that we have. We can actually look at the semantic layer, meaning what is actually in that picture. So if it’s a picture of a cat, we would get, as a result, other pictures of cats we’ve embedded and saved in the database. And that will come out of the representation itself - the embeddings that were a result of, say, a CLIP model that we used to embed our image.
So I don’t know if it necessarily means that it simplifies things. In a lot of ways, it actually adds a lot more oomph to the representation. So you can actually match on things that you wouldn’t necessarily expect. And that’s kind of like the beauty of semantic search in that sense, is that users can write something and then get back results that don’t even contain anything remotely similar in terms of the surface form to their query, but semantically it would be relevant.
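Here is a small sketch of what embedding an image with a CLIP-style model looks like, assuming Hugging Face's transformers library; the checkpoint name and the "cat.jpg" path are illustrative.

```python
from transformers import CLIPModel, CLIPProcessor
from PIL import Image

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("cat.jpg")                       # a roughly megabyte-sized image on disk
inputs = processor(images=image, return_tensors="pt")
image_vec = model.get_image_features(**inputs)      # one ~512-dim vector
print(image_vec.shape)                              # far smaller than the raw pixels
```

The stored vector captures what is in the picture, which is why a query for "cat" can surface it even though no pixels are ever compared.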
Break: [00:13:33.09]
Well, Roie, I really appreciate also the statement about adding oomph to your representations. I think that would be some – there’s some type of good T-shirt that could be derived out of that. For listeners who are just listening on audio, Roie’s wearing a shirt that says “Love thy nearest neighbor”, which is definitely applicable to today’s conversation.
Well, this is great… So we’ve kind of got a baseline, in a sense, from your perspective, what a vector database is, why it’s useful in terms of what it represents in these embeddings and allows you to search through.
You mentioned RAG, we’ve talked a lot about RAG on the show over time… But maybe for listeners that this is the first episode that they’ve listened to, what would be the kind of 30 seconds, or some type of quick sort of “Remember RAG is x” from Roie?
Right. So I love quoting Andrej Karpathy with his observation on LLMs and hallucinations. Usually, when people talk about LLMs, they say “Oh, LLMs sometimes hallucinate, and that’s really bad.” And Andrej Karpathy says “Actually, no. They always hallucinate. They do nothing but hallucinate.” And that’s really true, because LLMs don’t have any kind of tethering to real knowledge in a way that we can trust. We don’t have a way to say “Hey, I can prove to you that what the LLM said is correct or incorrect based on the LLM itself.” We need to go out and look and search.
And RAG to me is that opportunity where we can take the user’s intent, we can tie it using a, for example, semantic similarity search to structured data that we can point to and say “This is the data that is actually trusted”, and then feed that back to the LLM to produce a more reliable and truthful answer.
Now, that’s not to say that RAG is going to solve all of your problems, but it’s definitely going to give you at least a handle on what’s real, and what’s not, what’s trusted and what’s not, and where the data is coming from, where those responses are coming from… And it shifts the role of the LLM from being your source of truth to basically being a thin natural language wrapper that takes the response and makes it palatable and easy to consume to a human being.
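Put together, a minimal RAG loop looks something like the sketch below. The embed, index, and llm_complete callables are placeholders for whatever embedding model, vector database client, and LLM API you actually use; the prompt wording is just one reasonable choice.

```python
def answer(question: str, index, embed, llm_complete, top_k: int = 3) -> str:
    # 1. Map the user's question into the same vector space as the stored documents.
    query_vec = embed(question)

    # 2. Retrieve the most semantically similar chunks of trusted data.
    chunks = index.query(query_vec, top_k=top_k)

    # 3. The LLM phrases the answer; the retrieved chunks supply the facts.
    context = "\n\n".join(chunk.text for chunk in chunks)
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
    return llm_complete(prompt)
```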
Yeah. I think a lot of people have done a sort of – maybe they’ve even done their own demo with sort of a naive RAG, maybe pulling in a chunk from a document that they’ve loaded into some vector database, they inject it into a prompt, and they get some useful output… One of the things that I think we haven’t really talked about a lot on this show – we’ve talked about advanced RAG methods to one degree or another, but I know Pinecone, along with other vector database providers, offers more than just a simple search as the only function you can do. There’s a lot more to it that can make things useful – in particular, you mentioned Pinecone has kind of namespaces that can be used, metadata filters, sort of hybridized ways of doing these searches… Could you kind of help our listeners understand a little bit – so they may understand “Here’s my user statement. I can search that against the database and get maybe a matched document.” But for an actual application, an application in my company that I’m building on top of this, what are some of these other key pieces of functionality that may be needed for an enterprise application or for a production application, that go beyond just the sort of naive search functionality in a vector database?
Yeah, for sure. So we can take this one by one. So metadata is definitely one of those capabilities that vector databases have that are above and beyond what a vector index would provide to you. And basically what they are is, again, the ability to perform a filtering operation after your vector search is completed, and so you can basically limit the results set to things that are applicable in the application context.
So you can imagine different controls and selection boxes etc. that come from the application, that are more set in stone, so to speak. They’re not just like natural language. They’re categorical data, for example. And you can use those to limit the result set, so that you hit only what you want. That is something that is very common to see in a lot of different production scenarios.
And could you give maybe an example of that, like in a particular use case that kind of you’ve run across? What might be those categories? Just to give people something concrete in their mind.
[00:20:11.08] Yeah, for example you can imagine a case where – I’m not going to name the customer, but you can imagine the case where you want to perform a RAG operation, but you want to do it on a corpus of documents, but not on the entire corpus, but rather on a particular project within that corpus. So imagine that you have multiple projects that your product is handling, like finance, and HR, and whatever. Engineering. And you want to perform that search, and then limit it only to a particular project. And in that case, you would use the categorical data that is associated with the vectors that you’ve embedded and saved in Pinecone to only get the data for that particular project. That is like a kind of super-simple example.
But it can go beyond that, and move into the logic of your application. So you can imagine a case where you’re looking at a movie dataset, and you want to search through different plot lines of movies, but you want to limit the results only to a particular genre. That’s another case where we can just leverage metadata. You can think of wanting to limit the results to a timespan, a start and end date. Things of that sort, that kind of have to do more with when and how and what category the vector belongs to, and not specifically the contents of the vector. So that’s one thing.
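As a concrete sketch of that kind of filtered query, here is roughly what it looks like with Pinecone's Python client; the index name, metadata fields, and filter values are illustrative, and the query embedding is a stand-in for one produced by a real embedding model.

```python
from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_API_KEY")
index = pc.Index("movies")                 # illustrative index name

plot_embedding = [0.1] * 1536              # stand-in for a real query embedding

results = index.query(
    vector=plot_embedding,
    top_k=5,
    filter={                               # categorical / range constraints on metadata
        "genre": {"$eq": "comedy"},
        "year": {"$gte": 2010, "$lte": 2020},
    },
    include_metadata=True,
)
```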
Namespaces are another feature that we’ve seen as being incredibly important for a multi-tenant kind of situation. And multi-tenant RAG has become kind of like a very strong use case for us. That’s where you see a customer, and that customer has customers of their own, and not one or two, but many, many, many. And in that case, you definitely don’t want to have all of the documents that all of the sub-customers have to be collocated in one index… And in that case, you basically break them apart. So they’re still on one index, so management of the index overall is maintained under one roof, but the actual content and the vectors themselves are separated out physically from one another in namespaces. They’re sort of sub-indexes to that super-index. And that’s another feature that we’ve seen as being super-important to our enterprise customers.
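A minimal sketch of that tenant isolation, again assuming Pinecone's Python client; the namespace name, IDs, and stand-in vectors are illustrative.

```python
from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_API_KEY")
index = pc.Index("docs")                   # one index shared by all tenants

doc_embedding = [0.2] * 1536               # stand-in for a real document embedding

# Each tenant's vectors live in their own namespace within the same index.
index.upsert(
    vectors=[{"id": "doc-1", "values": doc_embedding, "metadata": {"title": "Q3 report"}}],
    namespace="customer-123",
)

# Queries are scoped to a single tenant, so results never cross namespaces.
index.query(vector=doc_embedding, top_k=5, namespace="customer-123")
```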
As you’re looking at these enterprise customers, and with maybe most enterprises getting into RAG at this point at some level, and trying to find use cases for their business to do that - I know my company and lots of other companies are doing this - what are some of the ways that they should be thinking about these different use cases when we’re talking about RAG, and semantic search, and multimodal, things that Pinecone does? What are good entry pathways for them to be thinking about how to do this? Because they may have come up with kind of their own internal platform, it might have some open source, it might have some products already in play… But maybe they don’t have a vector database in play yet. And so how do they think about where they’re at when you guys are talking to them and you’re saying “Let me –” Because we’ve been talking in the show so far about kind of the value of the vector database, and these use cases, but not necessarily kind of an easy pathway. So how do you onboard enterprise people to take advantage of the goodness of this?
Yeah, that’s an excellent question. And in fact, it’s quite a bit of a challenge, because it ends up being a straightforward pipelining challenge that has existed from the beginning of the big data era: how do I leverage all the insight that is locked in my data in a beneficial way? And the sad part about this story is that it always depends on the specific use case, and it’s hard to give a silver bullet.
[00:24:03.01] A sort of light at the end of the tunnel is that we’ve recently published a tool called the RAG Planner, and its purpose is to basically help you figure out what do you need to do to get from where you are to an actual RAG application, and follow through all of the different steps that are required in between… From an understanding of where your data is stored, how frequently it updates, what the scale of your data is etc. to the point where it could give you some recommendation as to what are like the steps that you have to do, in terms of “Do you build a batch pipeline? Do you build a streaming pipeline? What tools should you be using to do those things? What kind of data cleaning are you going to need to do? What embedding models are you going to want to use to do this? How are you going to evaluate the results of your RAG pipeline?” So all of these questions are pretty complex.
So what I would say is a general rule of thumb, first of all you have to evaluate whether or not RAG is for you. For example, there are a lot of situations where RAG may be the wrong choice… Because the data that you have, and the actual capability of answering end users’ questions based on that data does not match up. And that’s how you get to see cases where chatbots sort of spit out results that may seem ridiculous, but nobody catches it… And companies get into a lot of hot water because of it.
There are a lot of scenarios where it’s much easier to start that journey and to sort of develop the muscle memory that’s required in order to set these things up. In a lot of these use cases you see a lot more internal processes, definitely in bigger companies, where there’s a very big team that just needs access to its internal knowledge base in an efficient way… But it’s not a system that is going to be mission-critical, in any way. So if a person gets a wrong answer, it’s not going to be the end of the world, nobody’s gonna get sued.
So what I would say is there’s definitely a learning curve here, for big organizations, for sure. It’s usually recommended to develop, again, that internal knowledge of what the expectation versus the realities on the ground is going to be, to have like a really good idea of how you assess risk in those situations, and most importantly, how to evaluate the results that are produced by those systems. Because a lot of people are like “Okay, we built the RAG system. Great. It now produces answers. I’m done. Everybody’s happy.” That’s the farthest from the truth that you could possibly be. These systems need to be continuously monitored, and feedback needs to be continuously collected, to the point where you can understand how changes in your data, in the way that you’re interacting with it, and in the large language models that you’re employing are actually affecting what the end results are going to be, and how your users are actually interacting with the system overall. How all of these things kind of coexist and happen together, and whether they are working in the way that you want them to. And of course, you want to do that in a quantitative and not qualitative way. So there’s a lot of instrumentation that has to go into it.
I’m curious, as a little follow-up to that - and obviously, leaving specific customers out of it - are you tending to see more internal use cases of RAG deployment to internal groups of employees, and stuff, maybe from a risk reduction standpoint? Or are you seeing more of an external “I’m gonna get this right out to my customers, and try to beat my competition to it”? Where do you think the balance is, as of today?
I think that there’s a wide spread, and I think that it’s a journey. I think that the more tech-native companies that we see, that are more, I would say forward-looking or technologically adept to kind of do these things quickly, are more ready to not only take risks, but take educated risks in this space, with the evaluation that comes with it. So these are not just like “Let’s set and forget”, but they actually know what they’re doing. In those cases, you see them going out to production with very big deployments. That is our bread and butter, I would say, at the moment. With companies that are more traditional, that are not necessarily tech-native, you see a more cautious sort of progression, which is only to be expected. I think that’s kind of like natural to see.
[00:28:21.29] Well, Roie, I have something that I saw on your website, which was new to my knowledge, which I think is also really interesting… One of the things that I’ve really liked in experimenting with vector database RAG type of systems as an AI developer is having the ability to run something without a lot of compute infrastructure; maybe in an embedded way, or an [unintelligible 00:28:46.18] index, something that I can spin up quickly, something for which I don’t have to deploy a Kubernetes cluster, or set up a bunch of kind of client-server architecture, to test out maybe a prototype that I’m doing… And I see Pinecone is talking about Pinecone serverless now, which is really intriguing to me just based on my experience in working with people… These sort of serverless implementations of vector search I think can be really powerful. So could you tell us a little bit about that and how that kind of evolved, and what it is, what’s the current state, and how Pinecone thinks about the serverless side of this?
So serverless came about after we’ve realized that tying compute and storage together is going to limit the growth factor that our bigger customers are expecting to see. And it basically makes growth kind of prohibitive in the space. And so we had to find a way to break apart these two considerations, while maintaining the performance characteristics that our customers are expecting and are used to having from our previous architecture.
So essentially, serverless has been a pretty big undertaking on our side to ensure that the quality of the database is maintained, but at the same time, we can reduce cost dramatically for customers. To just give you an idea, for the same cost of storing about, I don’t know, around 500,000 vectors before, you can now store 10 million. And that’s a humongous difference. It’s an order of magnitude difference. I think that to accomplish that there was like a lot of very clever engineering that had to happen… Because again, now having compute and storage separated apart means that storage can become very cheap, but on the other hand it requires you to handle the storage strategy and retrieval in a lot cleverer way.
We have a lot of content on the website that kind of delves deeper into how exactly, technically, that was achieved, and we won’t be able to cover that, given the time that we have… But the basic premise is that you can now grow your vector index to theoretically infinity, but practically to tens of billions and hundreds of billions of vectors, without the expense becoming prohibitive… Which is the main driver for us with our bigger customers, and also with smaller customers. You can start experimenting - we have an incredibly generous free tier that allows you to start. Like you said, if I’m just a developer, on my own, testing things and trying to understand how a vector database works in my world, it’s very unlikely that I’ll be able to tap out the entire free tier plan, even several months in, with many, many vectors stored. And it will work the same way that our pro serverless tiers work in terms of performance. So it’s not like a reduced capacity or performance in any way. So you get to feel exactly what it would feel like, and the effort that’s required to stand it up is minimal to negligible. You just set up an account, and the SDK is super, super-easy to use.
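For a sense of how little is involved in standing that up, here is a rough sketch with the current Pinecone Python client; the index name, dimension, cloud, and region are illustrative choices, and the toy vector stands in for a real embedding.

```python
from pinecone import Pinecone, ServerlessSpec

pc = Pinecone(api_key="YOUR_API_KEY")

# Create a serverless index; dimension must match your embedding model's output size.
pc.create_index(
    name="my-first-index",
    dimension=768,
    metric="cosine",
    spec=ServerlessSpec(cloud="aws", region="us-east-1"),
)

index = pc.Index("my-first-index")
index.upsert(vectors=[{"id": "hello", "values": [0.1] * 768}])
print(index.query(vector=[0.1] * 768, top_k=1))
```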
[00:32:12.11] Yeah, and in my understanding - or just to make sure I’m representing things right - there’s a massive engineering effort, I’m sure, as you mentioned, to achieve this, because it’s not a trivial thing… But in terms of the user perspective, if people used Pinecone before, and they’re using Pinecone now, you already mentioned the performance… Is the interaction similar? Is it just, from the user perspective, the scaling and the pricing? And maybe also you could touch on… So Pinecone is – people might be searching for different options out there, and some of them would require you to have your own infrastructure, and some of them are hosted solutions… Pinecone, at least in its kind of most typical form, would be hosted by you. Yeah, could you just talk a little bit about the user experience pre/post serverless? And then also, on the infrastructure side, what do people need to know, and what are the options around that?
In terms of what happened pre and post… So before serverless, there was a lot of possible configuration choice that you could do. There was in fact a lot of confusion with our users, like “What exactly is the best configuration for me? Should I use this performance kind of configuration? Should I use the throughput optimized configuration? What exactly am I supposed to use?” And the pricing mechanism was a little bit convoluted… And I think that serverless, the attempt there was to simplify as much as possible, and to make it really, really dead simple for people to start and use, but also grow with us.
So again, like I said, the bottom line is the external view into what Pinecone offers may have looked pretty similar. So if you’re just a user, you may say “Hey, I got a cheaper Pinecone bill this month, and I can store a lot more –”
Always a good thing.
Right. Always a good thing, but not super-amazing. But the end result is – the question is what happens when you can actually store a lot more vectors? What does that unlock for you? And I think at the end of the day, the way that we see Pinecone - and this may help us kind of talk about what’s next for Pinecone - is a place where your knowledge lives, and it allows you to build knowledgeable AI applications. And having more knowledge is always net positive in that context.
So the assumption is that as AI applications grow, they accumulate more and more knowledge, and they become that more powerful with any additional knowledge that you can stuff into them. And so there’s actual value – beyond the fact that you can store more, and it’s cool, your application actually becomes more powerful because it can handle more types of use cases, it has a better ability to be more accurate and respond truthfully to a user when they are interacting with it.
So I think that in general there’s this latent kind of value that is only going to be apparent once people really experience what it means to have a million documents that are stored in Pinecone, versus 10 million documents that are stored in Pinecone. And that effect is going to be very powerful. I think that’s the majority of the benefit that I see.
Maybe that gets to the next thing which I was going to ask about, which - I also see the announcement around Pinecone Assistant… And I’d love to hear more about that. Of course, sometimes maybe that can be loaded language also for people in the AI space, but in terms of this assistance functionality for Pinecone, what are you trying to enable, and where do you see it headed?
[00:36:09.00] So that has to do with the question that Chris had before, which is like “What is the journey for customers?” And I think that the general purpose that we had around Assistant was to reduce the friction between me having a bunch of documents that I want to interact with through an LLM, or an AI in some form and capacity, and the point where that actually works. There are a bunch of ways of going about it. I think Pinecone wants to bring, on top of our very robust vector database, a very smooth experience that lets users really do very little, and get all the value out of Pinecone without having to think too much about it. So for that purpose, we don’t only have the ability to take your documents and then [unintelligible 00:36:55.14] them and do the end-to-end process of creating that completion endpoint for you… We’re also the ones providing the actual inference layers as well.
Again, if you’d asked this question of “How do you build a RAG pipeline?” a year ago even, I’d have to tell you “Hey, you have to go to some embedding provider, you have to find someone who would do your PDF extraction”, or take the data and chunk it, and do all this stuff… No more. The reality here is you can take a set of documents, throw them at this knowledge assistant, and the rest is kind of “magic”, really. It just happens for you behind the scenes, while maintaining the quality that you want to get, and at the scale that Pinecone can deliver, which is again, another differentiator. So like I said before, Pinecone is built to withstand hundreds of billions of documents of vectors that you would store with us, and still be able to produce responses in a reasonable amount of time… And that’s true for the knowledge assistant, because the assistant sits on top of the vector database.
So it sounds like that may be a really good way, especially for smaller organizations… We talked about enterprise, and they have a certain infrastructure and teams to deal with that… But there are so many more small organizations out there that have very little in terms of trained people to do that, and they don’t have all the infrastructure in place, and with assistants and serverless they’re looking for simple ways to onboard and get utility out of it. Would you say that the combination of serverless and assistants, and then maybe whatever they might have in AWS or whatever platform they’re using, is kind of just made to jell easily for them, so they can get to something working pretty quick?
Yeah. I mean, at the end of the day, if we think about it, the process shouldn’t be as complicated as it is. It’s just that there are many parts to it, and nobody picked up the gauntlet of saying “Hey, we’ll just do it all.” Because all of it is quite complicated to do right.
So yeah, I think that initially, we’ll see smaller organizations kind of picking that up because they don’t have the resources, but as time moves along, you’re gonna have to ask yourself, even as a bigger organization, “Do I want to own this pipeline? Is it something that I need to own? What value am I getting from actually owning all this?” So yeah, it would be interesting to see – so this is a very, very new product, still in public beta, and it will be interesting to see how the market kind of reacts to it, and sort of experiments with it… But my bet is that, as time progresses and knowledge assistants themselves become more capable, doing things maybe beyond RAG, or beyond “simple RAG”, that more and more sophisticated organizations might want to actually give it a try.
[00:40:02.25] And that really brings us maybe to a good way that we like to end episodes, which is asking our guests to sort of look into the future a little bit and - not necessarily predict it, because that’s always hard, but to look into the future, and kind of… What are you excited about? It could be related to vector databases specifically, or Pinecone specifically, but maybe it’s more generally in terms of how the AI industry is developing, the sorts of things that you’re seeing customers do that are encouraging… Whatever that is, what sort of keeps you excited about where things are headed going into the rest of this year?
[00:40:02.25] I’m excited about the fact that we’re seeing sort of like a resurgence of what you would call traditional AI kind of come back into the fold, in the form of, for example, Graph RAG. I think the notion here is that, for the longest time - and I think it’s been since like GPT-3.5 - you basically saw this over-indexing on LLMs. And for good reasons. They’re super-exciting, they’re very powerful, and they can do really, really cool things. But with that said, it’s as if every other technology that has ever existed before just like dropped off the face of the earth, and nobody has ever talked about “Okay, wait, so what can we do with those things AND LLMs?” Like, where do LLMs fit in the bigger picture?
I think that vector databases kind of like put LLMs in their place a little bit, in the sense that – you know what I mean? You’re not thinking of the LLM as being the end-all-be-all, like “This is the only tool that we need.” I’m very excited to think of LLMs as these operators or agents that can tap into the capabilities that exist in other systems, and I think that what we’re going to see more and more is that people are going to figure out what subset of the ecosystem each tool belongs to - what set of problems each tool solves. For example, a vector database solves the problem of bridging the gap between the semantic world and the structured world, a graph database can solve problems like formal reasoning over well-structured data, relational databases can solve a whole set of different problems that they used to be solving, like aggregation etc. And then you can imagine that LLMs and agents can sit as sort of like an orchestrating mechanism, and a natural language interface mechanism on top of all those things together.
And that’s what I’m excited to see… It’s kind of when the community as a whole is going to like wake up from its LLM fever dream and sort of realize that there are other things out there, and realize that it has so many more powers that it could wield to make really exciting applications.
That’s awesome. Well, thanks for painting that picture for us, Roie, and for taking time to dig into so many amazing insights about vector databases, and embeddings, and knowledge management in general… So yeah, I appreciate what you all are doing at Pinecone, and I hope to have you on the show again to update us on all those things.
Thank you so much. Thanks for having me.