Practical AI – Episode #275

Apple Intelligence & Advanced RAG

get Fully-Connected with Chris & Daniel


Daniel & Chris engage in an impromptu discussion of the state of AI in the enterprise. Then they dive into the recent Apple Intelligence announcement to explore its implications. Finally, Daniel leads a deep dive into a new topic - Advanced RAG - covering everything you need to know to be practical & productive.


Sponsors

Neo4j – Is your code getting dragged down by JOINs and long query times? The problem might be your database…Try simplifying the complex with graphs. Stop asking relational databases to do more than they were made for. Graphs work well for use cases with lots of data connections like supply chain, fraud detection, real-time analytics, and genAI. With Neo4j, you can code in your favorite programming language and against any driver. Plus, it’s easy to integrate into your tech stack.

Plumb – Low-code AI pipeline builder that helps you build complex AI pipelines fast. Easily create AI pipelines using their node-based editor. Iterate and deploy faster and more reliably than coding by hand, without sacrificing control.

Backblaze – Unlimited cloud backup for Macs, PCs, and businesses for just $99/year. Easily protect business data through a centrally managed admin. Protect all the data on your machines automatically. Easy to deploy across multiple workstations with various deployment options.


Chapters

1. 00:00 Welcome to Practical AI (00:44)
2. 00:44 Sponsor: Neo4j (01:18)
3. 02:06 Staying fully connected (00:48)
4. 02:54 Our past predictions (01:04)
5. 03:59 Reality of adopting AI (02:15)
6. 06:14 Shifting roles (02:58)
7. 09:12 Realizing usability (02:28)
8. 11:39 Full stack AI development (01:36)
9. 13:27 Sponsor: Plumb (01:25)
10. 15:06 Apple Intelligence (04:55)
11. 20:01 Closed model concerns (04:08)
12. 24:09 Balancing privacy & performance (04:46)
13. 29:09 Sponsor: Backblaze (01:50)
14. 31:14 RAG pipelines (05:00)
15. 36:14 How to improve your RAG (03:45)
16. 40:00 Hybrid search (01:44)
17. 41:44 Re-ranking (02:14)
18. 43:57 It's been fun! (00:22)
19. 44:19 Outro (00:46)

Transcript



Play the audio to listen along while you enjoy the transcript. 🎧

Welcome to another Fully Connected episode of the Practical AI podcast. In these Fully Connected episodes Chris and I keep you connected to a bunch of different things that are happening in the AI community, and try to plug you in with some learning resources to help you level up your machine learning game. I’m Daniel Whitenack, I’m founder and CEO at Prediction Guard, where we’re safeguarding private AI models. I’m joined as always by my co-host, Chris Benson, who is a principal AI research engineer at Lockheed Martin. How are you doing, Chris?

Doing great, Daniel. So many things happening in the news, and I just was looking forward to a chance for us to finally – we’ve hit specific topics a lot lately, and I’m hoping we have a chance to jump in and just talk about all the stuff.

Where’s your mind at these days in relation to AI? We haven’t done a sort of general check-in - both of us know each other well enough that we’re probably both fairly hopeful about things looking forward, and seeing many good things… But yeah, generally, how has this year looked for you compared to how you thought it might look, in your own work with this technology? Has your view of how the technology is shaping up changed, or stayed the same?

In the large, I think some of the things that we have kind of predicted are happening. A lot of the developments are somewhat predictable. If you do something with imagery, you’ll probably get there with video, and things like that. And we made some predictions last year about that… And I think those types of things are playing out more or less, in a broad scale, kind of how we would have expected. Would you agree with that, in general?

Yeah, I would say definitely multimodality-wise…

Yes, we talked about that a lot.

What about, I guess – both of us either are at or have interacted with friends of ours or colleagues at a variety of enterprise organizations… What do you think is the reality on the ground in terms of adoption of AI, versus what’s all in the news, and the hype, and that sort of thing? …in terms of the practicalities and actual adoption rate of generative AI versus kind of the things that we’ve always had with us, the machine learning, the data science types of projects.

Yeah, I think it’s interesting… There’s a lot of reality checking happening this year, especially in the last few months. Everyone’s been hit with so much Gen AI marketing and just all the hype and everything with it… But we’re also expected to get things done at work. And so people are trying to finally get past the hype and get stuff done - which requires a lot of hard decisions. So companies are going “Well, I’m looking at the cost of one of the big providers”, with OpenAI being kind of the leader there, and “Do I want to pay for that, for everything? And thus, do I also want to send my data out? How can we do things with smaller models, or other large language models that are open source?” And across multiple companies, as I’m watching people make these choices, and we’re having conversations about it offline and stuff, there’s an agony associated with trying to navigate correctly, and not end up in a bad position for your organization, and stuff like that…

And so I’m seeing a lot of “Well, we’re going to use some open source for this, and we’re going to use some API calls to commercial stuff for that, and we’re going to use some smaller models over here… And how are we going to put them all together?” And we’ve talked about these issues across a bunch of different conversations. Even with the last guest we had, we had this conversation… But I think people are really challenged with making it all work. As I talk to people at different companies, I don’t necessarily see everyone doing it the same. There’s enough variability to where we haven’t arrived at the world of best practices yet, in my view.

[00:06:12.22] Yeah. Well, one of the things that you highlighted is the sort of multi-model future. People kind of spreading out their workloads across multiple model providers and open models. I think that’s something that seems to be only increasing and will continue… You know, with all of this emphasis on large language models, generative AI, have you got a sense from data scientists that you’re interacting with, or others, that the actual day to day of data science teams is shifting, or they’re still just training their support vector machines, and whatever time series forecasting, and whatever those things might be?

I think the thing, to that point - they are shifting, but also, the field has exploded out in the number of positions to support… And way back when I was young, which was back when the dinosaurs were roaming the Earth, you had software developers and they kind of had to do everything, which was very reminiscent of how AI has been in previous years… And in the last couple of years we’ve seen an explosion of roles - we first saw machine learning engineers, beyond data science, and then you kept adding each new title and position, and now there are UX people working on AI concerns… And it feels very much like how software exploded from the 1980s, when I was a kid, into young adulthood for me in the ‘90s and 2000s, and now we’re seeing that same thing - it’s very similar to me. I look at it and it’s déjà vu for me.

So it’s a maturing of the industry, and people are starting to figure it out… I think there’s a recognition, finally, outside of the marketing and hype machine, which goes hard and constant always, I think for the worker bees like me there’s a realization that it’s part of software. And we’ve talked about this for a long time, that that needed to happen… And so a little bit less hype, and more about “What can the models do? How do I combine them? And what sizes?” and it’s putting the jigsaw puzzle together of what makes value for a particular organization. And that’s been interesting for me to be part of that in my own organization, and help us navigate through the morass, and every other organization I’m talking to is doing the same. So yeah…

Yeah, I was just sort of curious about boots on the ground. What’s changing day to day for data scientists? It seems like one of the things that you’re indicating is it’s more that roles and teams are expanding.

They are.

Yeah. Versus the data science teams that have existed ceasing to exist in their current form - creating scikit-learn models - and moving over to Gen AI… Which is probably not the case. I was just looking at Google trends of terms, which is always a fun thing to look at… And I was looking at Gen AI versus scikit-learn, and scikit-learn still has quite impressive search volume, but you can see this surge of interest in the data science hype period, at least as far as I can tell… But then there’s also been a surge since kind of 2022 and on, and that’s gone down a little bit… So I don’t know how much you can draw from that, but the data science team still lives as far as I can tell.

[00:09:42.09] I don’t think it’s going to die, because people are also - to your point, they’re also realizing the limitations and constraints of Gen AI, and what types of things it doesn’t do well. And so people are, I think, a bit smarter about it in 2024 versus 2023, and definitely 2022. It seems like the wisdom is finally kind of spreading out, and instead of just saying “Gen AI is going to solve everything”, people are recognizing it for what it is, and the capabilities it has, and they’re starting to say “This is a good use case for it, but we need to pair it maybe with a reinforcement learning model.” And they’re starting to remember “Oh yeah, there’s all these other capabilities which we were quite enamored with until Gen AI came along, and they’re still really good technologies to use.” And so starting to recognize what should and shouldn’t be AI at all, and combining those together into unique value propositions for their organizations, is the thing I’m seeing.

One other point that I’ve noticed also for the first time this year is that companies, like the software side and the AI side are finally really coming together operationally, instead of being very stuck apart… Which is one of the problems we’ve talked about on the show many times. And I’m seeing agile methodologies play out that had been on the software side for years in these organizations, and they’re now including the AI and data science teams in how they’re – I’m just making it up; like, if they’re using SAFe, or Scrum, or whatever they’re using, they’re starting to account for that, and it feels more real life to me. It feels like “Ah, we’re finally getting to a point of maturity, and recognizing that all the pieces need to come into play, and we need to be efficient in how we do that.” So that’s been my kind of enterprise observation of the last few months.

Yeah. And I don’t know - again, Google search trends only gives you so much, but it seems like the main trend with data science as a function, at least according to searching, just sort of keeps going up pretty steadily… Even though there’s a switching of technologies. It would be nice though - I know that we talked quite a while back in a number of episodes about being a full-stack data scientist, and I know recently we had some of that discussion around full-stack AI agent development… But that sort of idea that there would be more integration of that software side into data science teams and vice versa is something that - maybe this is a push that’s kind of materializing some of that.

You know, it’s interesting… The term “full stack” is so loaded, in terms of how people perceive it… And it’s a bigger thing if you’re in a very small organization, and what it means there is “Thank goodness I’ve got somebody who can handle all these things that we have gaps on, because we don’t have enough resource to go buy somebody in all these different areas.” And so it’s more meaningful in a smaller or midsized organization based on the nature of the organization. You’re going to see it a lot more job-specific and role-specific in the enterprise, which in my view is a good thing… Because you don’t want to just put full stack this, full stack into everyone, because in a large enough organization, that doesn’t help with your efficiencies.

But team-wise, there could be more integration.

There could be. And I think integration is really important… And so I think this is the first year where I have a little sense of actually seeing that out in the workplace.

Break: [00:13:16.20]

Well, Chris, we’ve got to get to Apple Intelligence from this last cycle that we went through… Everybody’s got their AI play now, and of course, Apple had been in AI in one way or another, so it’s not like they were totally absent… But we got the announcement about Apple Intelligence. So what do you think? First impressions? Excited? Confused? A mix? What’s your impression?

I’m always skeptical about everything when it comes out, because of the hype machine, as you know… But as an Apple user, I’m looking forward to it. I want to see what they do. I buy a certain degree into the Apple ecosystem, but I also am not 100% invested in every way, the way some folks are. I use Google as well, in various things… So they were very slow. Apple’s received a lot of criticism in the last couple of years, because once upon a time having been perceived as the Steve Jobs-esque leader of “We’re the ones that bring you completely new ideas, that are going to change your world”, like the iPhone when it was released originally, they have definitely not been fulfilling that role. They’ve been slow.

Having said that, having thrown the criticism out first, they have certainly – I would say the announcement seems differentiated, in that Apple is kind of putting it out that they’re a product-focused company, and they’ve made these AI announcements that clearly position AI as a feature, and not the product itself - whereas for a lot of the other big companies, it’s almost as if AI is the product they’re trying to sell.

So with the announcements at WWDC 2024, which is their developer conference every year, often called DubDub by insiders, they are talking about AI in the context of the devices and of the tasks that their users are doing. And so I actually like that. As you know, you and I both get absolutely inundated with outreach from startups and companies always promoting and hyping their AI product… And at least to see Apple talking about it as feature enhancement rather than the thing itself is good. It’s a little bit of fresh air on that one.

Yeah. I think that when you have a little button - Summarize, or Rewrite, or whatever that button is - that’s very much like the sort of first wave of pre-ChatGPT AI features that came out, where you just see suggested text, or you see something that makes sense… And UX-wise, that’s probably, like you say, very fitting with Apple and their approach to things. I know that there was definitely some shade thrown by some, including Elon Musk…

Indeed.

…about the reliance on OpenAI in Apple Intelligence. Did you see any of that?

That’s Elon Musk being Elon Musk. I think part of it is just a gambit for the spotlight at any moment, inserting himself into any spotlight… I won’t go into [unintelligible 00:18:12.15]

If you’re listening, Elon, you can come on our show and steal the spotlight. You’re welcome. Although it might be an interesting –

Interesting conversation there.

…interesting experience.

Yeah. He’s the same age I am, more or less, and so… Yeah, I always kind of go – I’m just trying to imagine when he does some of the things. But anyway, back to the Apple bit, without derailing on that one… Elon came out with the specific criticism of “Okay, you’re gonna send everything off to the GPT API, and that’s a huge privacy breach”, and stuff like that. But Apple had already clearly in the announcement said “Every user on a per-use basis will be given the option of –” Siri will say “Do you want to send this off to GPT for an answer?” And the user, on a per-use basis, can say “Yes, I want to do that”, or “No, I don’t.” And they were very explicit about that upfront. So I’m like, “Elon, if you’re gonna use an iPhone or an iPad, just say no. Just say no, and stop.” Because that way, you still have control. And that seems like a reasonable thing.

I use the OpenAI app all the time on my iPhone, and it’s one of those things that’s open all the time. And that is good enough for many cases. But there are times when I would certainly like to integrate that capability into my other activities on the iPhone, in a more integrated way… And this gives me that opportunity. So Elon was saying “Only have the OpenAI app”, and as a user myself, I say “No. Give me the option.” Sometimes I’m just going to have the OpenAI app there, but other times let me integrate it with my other activities. Apple’s gonna give me the choice on whether I want to do it. I’m happy with that. All good. So he’s not speaking for me in that capacity.

[00:20:00.22] Yeah. It might be interesting to talk just for a second about - whether it’s Elon, or… You know, there’s certainly, in a less public or meming way, a lot of people out there that do have concerns about these sort of closed model providers. And some people are still blocked from using these. I’m wondering, from your perspective – I have a few of my own thoughts, but from a practitioner’s perspective, just to kind of make it practical, as we are on Practical AI… What are those trade-offs, as you see them now, with closed model providers versus using open models, or some version of hosted open models, in kind of the enterprise or development scenario, versus your personal device? Certainly, there’s a direct-to-consumer type of angle to what we’ve discussed so far, but in terms of the practitioner themselves… We’ve touched on this occasionally, but I think it’s probably good to continue touching on it occasionally, because things are changing over time, and changing very rapidly… So yeah, from your perspective, do you have any thoughts on that?

Sure. I think that is an issue that every large organization is navigating, because you have a certain amount of funding to support your operations, and almost everybody has some tie into one or more of the large commercial APIs. And it’s a different context from a personal user. Like I talked about, I’m paying my OpenAI monthly fee, and I use that all the time for a variety of different tasks. But in the enterprise, it’s a bit different. There may be that capability, but I’m also seeing enterprises that are really concerned about their data going out, and about their information going out. If it’s not their own, it’s their customers’. By using a public API that you’re paying for, where that data goes outside of your control, there is a huge concern and risk, not only about the immediate privacy concerns, but also about the liability and the legal concerns around that. Because most organizations have a mixture of their own data and other organizations’ data - a whole bunch of partnership agreements in large companies… Maybe you’re okay with your data going out, but of the 50 partners that you have, it might be that 36 of them aren’t too keen about their data - data they have an agreement with you to hold - going out to a third party as part of your data. So that makes it pretty challenging to use third-party APIs in a manner that everyone is comfortable with.

So I’m really seeing a lot of open source models being internally hosted… There is still a lag, because Google, OpenAI, Anthropic - they keep pushing the boundaries on what they’re offering, and the open source community is not typically all the way there - it’s not just the model, but the services built around the model that make it easy to use… So you have to kind of recreate that, or use existing open source capabilities that are out there. And that requires effort. And there is funding and a good bit of money spent on that. But I would say, among the circle of people that I hang out with across multiple companies, I’m seeing more internal hosting of models, with the effort of trying to stay on top of current releases and monitor that, as the more widespread way. And there’s a recognition that those models may not give you quite as good of an answer on a very expansive prompt as a GPT model would… But that’s okay, in a lot of cases; it can get you through. And if you have multiple models to choose from, and combine, then you can usually be very productive without violating all those concerns that I enumerated. So it’s both, but for me, I’m seeing more people turn inward. Now, I run in a national security world, and so we might be a little bit more conservative about that across the various defense companies, and stuff like that, so I’m acknowledging that there’s an array of possibilities there.

[00:24:07.24] Yeah. One of the things that has maybe shifted a little bit in my mind since the last time I considered this question and we talked - I would say there’s all of the sort of privacy concerns, data misuse, data leaving your network; all of that, I think, is a big piece of it. But there’s been a developing mindset that I’ve picked up on, which is slightly different… And I started to pick this up from - I’ve mentioned this a couple of times, because it’s been helpful for me - a16z’s recent surveys and reports… The fact that yes, there’s a privacy element, but a lot of times organizations are using open models because of the control element… And it took me a while, I think, to fully parse through the implications of that. And I think some of what it gets down to is when you’re connecting to one of these closed systems - and there are hosted open models you could get in a variety of places. So when I say hosted closed system, I mean you literally don’t know what’s happening behind that API, or how the model is called, and that sort of thing. Those are productized AI systems, which means that they’re making opinionated choices about how to improve the performance of that product surrounding the model, right?

And that actually can be an amazing thing. OpenAI’s functionality is spectacular, without doubt. These other systems - Anthropic, others - really spectacular functionality. But there’s this element of it where they’ve made some opinionated choices for you about how to process the data that you’re putting into that system before and after it hits the model. And so there’s a lot more going on. And I think you see this come out very much in, for example, the stuff that happened with Gemini, where you put in your prompt to generate an image of American founding fathers, and there’s clearly - however that works - a modification of your prompt or extra instructions to bias that output to look a certain way.

If you’re interested, go look it up. There’s lots of interesting pictures. And to be fair, they’ve rectified that situation as far as I know. But when you have that sort of decision made, you don’t have full control; like, it’s not just your prompt going into the model, and you kind of choosing how to govern or bias that, or process user inputs, or do your prompt templating… And so it can be really good, but it can be sort of frustrating at that level, where you get like 80% to 90% of the way towards what you want, and then for some reason you just can’t figure out why you can’t get that last bit, or you can’t figure out why this error is happening, or there’s latency types of fluctuations, or… Whatever those things might be; it could be bias in the output.

So I think that that opinionated productized thing - it’s both a good and a bad. And depending on your scenario, that may actually be what you need. I’m not gonna worry about these things. I trust the way that sort of these things are being handled internally in a system like this… And I’m guessing that will be fine for many people. But then there’s people that want to build kind of these competitive AI features into what they’re creating as a company… And they want full control to figure out – you know, to build those in exactly the way they want, to make sure that they can test those in exactly the way they want, and to have that control element. I think that’s way more crystallized in my mind than it was previously.

[00:27:58.08] I think that’s a fantastic insight right there. And I think most people miss that, because with the hype machine going we have a habit of talking about the models themselves all the time, kind of as product, and therefore there’s so much that these companies that are putting out these as a service are doing; there’s so many humans involved that you never see. And yes, that can really make it much better in some ways, because they’re kind of shortcutting what the model may not be able to do on its own. They are shortcutting and greasing the skids to make you get what you want, but at the same time, anytime there’s a human involved, you’re going to have the bias as well. And they’re trying to make it safe, and controlled, and not have some sort of thing that ends up in the news in a very negative way… And that puts constraints around it [unintelligible 00:28:43.18] Yeah, I think it’s really key that we look at it as not only a model, but model plus the services around it, whether you’re building them or whether someone’s building them for you.

Break: [00:28:56.13]

Alright, Chris, well, a lot of times in these Fully Connected episodes we do try to kind of bring some learning element to the forefront as people are exploring these topics… One of the ones that has been coming up a lot for me, which we’ve talked a lot about on the show, is RAG or retrieval augmented generation. But we’ve only sort of talked about it at a surface level and at this sort of naive RAG level, which might misrepresent sort of some of what people are doing with this approach under the hood. And this would generally kind of be framed – I think if you search “advanced rag”, you’ll find a whole bunch of articles, and really what’s happening is there’s a naive approach to this RAG type of workflow… Which can get you to some really amazing results really quickly, but then when you have to kind of fine-tune, improve that system, load in more data, use documents that are closely related one to another, various types of documents… There’s a lot to dig into in terms of fine-tuning that system and fine-tuning both how the retrieval and the generation works. And there’s a whole variety of sort of workflows that have been developed by the community that can help you improve your RAG setup. So that’s kind of one of the things that I wanted to bring up here and maybe talk through a couple of those. I know our friend Dmitrios - we talked to him and he’s got some opinions about RAG versus fine-tuning. That’s one thing you’ll hear. But yeah, I would love to dig into that if it sounds interesting to you.

Absolutely. And I think you’re the right person to do this, given that you’re diving into this stuff all the time.

Yeah, well, certainly RAG pipelines are probably the first thing that people are building with generative AI… And the idea is not too complicated. The idea is: let’s say I have a bunch of documents that contain information relevant to questions or queries that I might have. Instead of just asking an LLM to give me an answer, or to do something - which would rely on the probabilities of that model in generating its own text, and the data it was trained on; you could get all sorts of answers out of that LLM - rather than just relying on that, I’m going to inject on the fly some of that external data that I have into the prompt, to help answer the question or the query, something like that.

[00:33:55.21] So we do this all the time with ChatGPT and other systems… When I say “Summarize this email for me”, and then I paste in an email. That’s how I’m injecting data into the prompt, into the model, right at the time that I need it to run. So there’s no fine-tuning of the model here, it’s just a strategic insertion of data when I’m prompting the model. And often this happens in like “Oh, I have a bunch of developer documentation, or onboarding materials for my company”, or a wiki, or a bunch of webinars, or a bunch of podcasts or whatever, and I want to answer questions out of that material. Then these can be loaded into a vector database, which allows you to do the retrieval part, so to find the relevant chunk of information that’s required to answer the question, and then you take that relevant chunk, insert it into a prompt, and then respond, or let the LLM generate based on that given context.

So that’s sort of the naive RAG approach, where you sort of have a user query, you find a single chunk of information in some repository of information, insert it into the prompt as context, and hopefully get an answer… Which - it is naive, but surprisingly, it works amazingly well in many cases.
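To make that concrete, here is a minimal sketch of the naive pipeline just described - assuming the sentence-transformers and openai Python packages, an OPENAI_API_KEY in the environment, and illustrative model names and toy chunks (none of these specifics come from the episode):

```python
# Naive RAG: embed chunks, find the single most similar one, inject it
# into the prompt, and let the LLM generate from that context.
import numpy as np
from openai import OpenAI
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")
client = OpenAI()  # expects OPENAI_API_KEY in the environment

# stand-ins for your wiki pages, onboarding docs, webinar transcripts, etc.
chunks = [
    "PTO requests go through the HR portal at least two weeks in advance.",
    "Production deploys happen every Tuesday after the 10am standup.",
    "New laptops are imaged by IT; expect a three-day turnaround.",
]
# in a real system this index would live in a vector database
chunk_vecs = embedder.encode(chunks, normalize_embeddings=True)

def generate(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative; any chat model works here
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

def retrieve(query: str) -> str:
    """The 'naive' retrieval step: return the single best chunk."""
    q = embedder.encode([query], normalize_embeddings=True)[0]
    return chunks[int(np.argmax(chunk_vecs @ q))]  # cosine sim on unit vectors

def answer(query: str) -> str:
    context = retrieve(query)  # strategic insertion of data into the prompt
    return generate(
        f"Answer using only this context.\n\nContext: {context}\n\n"
        f"Question: {query}\nAnswer:"
    )
```

The sketches that follow reuse embedder, chunks, chunk_vecs, generate(), and retrieve() from this one.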

It does. It’s interesting, as we get in from kind of the naive to advanced ideas in it, and you also just mentioned for a second fine-tuning along the way… It is definitely the first step. I think it’s easy to implement in general for people, which is why it’s the first step… But I also think, before you go on, I think a lot of organizations are getting stuck on naive RAG, and just kind of stopping there. And I’ve noticed that. So keep going.

Yeah, yeah. I think you’re totally right, and I think – this is why I wanted to bring up this topic, because some will hit… And they’ll get sort of okay performance out of their RAG system, but then they don’t realize that there’s more options to improve that system.

Yeah, I’ve seen a lot of people thinking it solves it.

Like, “All we need is an LLM, and then we’re just gonna give it the data for RAG, we’re going to inject into it, and we’re done.” And I’m hoping we can break some of that perspective over the next few minutes.

Yeah. And the question would be “Well, okay, if you’re getting some good answers, and some not good answers, what do you do to improve your RAG system?” And that’s where there’s a whole variety of things to explore. And like I say, if you’re really interested in this, I’d recommend searching for “advanced RAG”. I’d recommend looking at the LlamaIndex blog, the LanceDB blog… There’s a lot of really good content out there to help you parse through this. But let me kind of inspire with a few snippets of things that you could keep in mind.

So the first, I think, is around context enrichment. This is a very simple thing that you can do, where let’s say you have 100 documents, and you split them up into little chunks, which you embed in the vector database, and you search against those little chunks to find the relevant thing that might help you answer the question. Well, depending on how you chunked up that information, it might not give you all the context you need to answer. It might be in like the previous chunk, it might be in the next one, it might be in the one that you’ve found, or it might be in a combination. And so this sort of idea of context enrichment might be that you just find that chunk that’s relevant, and then instead of inserting just that chunk, insert that chunk plus the one before it and the one after it, for example. Just expand it a little bit, enrich it a little bit.

Another sort of common thing is maybe you want to pull the three most relevant chunks, rather than the one most relevant, and add more context there. So there’s more that you can add in, more than just a single chunk.
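Here is a sketch of both of those enrichment ideas together - take the top k hits instead of one, and widen each hit to its neighboring chunks - reusing the setup from the naive sketch above (k and window are just illustrative defaults):

```python
import numpy as np

def retrieve_enriched(query: str, k: int = 3, window: int = 1) -> str:
    """Top-k retrieval, with each hit expanded to its neighbors."""
    q = embedder.encode([query], normalize_embeddings=True)[0]
    top = np.argsort(chunk_vecs @ q)[::-1][:k]  # k best chunks, not just one
    keep = set()
    for i in top:  # enrich: include the chunk before and after each hit
        keep.update(range(max(0, i - window), min(len(chunks), i + window + 1)))
    # keep original document order so the combined context reads naturally
    return "\n".join(chunks[i] for i in sorted(keep))
```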

[00:38:00.14] The other sort of related methodology here, before we get into maybe the more fancy stuff, is actually doing a two-level search over your data. So if you think about it, let’s say that I have, again, 100 documents. And there might be similar content across those documents, they might overlap in certain cases, but they’re different documents. Well, if you take and summarize with an LLM each of those documents or pages of those documents, and then you also chunk it up into smaller chunks that you eventually want to use for your RAG, you could first search on the summary, which would kind of point you to the right document that’s going to answer your question, and then do a second phase of retrieval within that document itself to pull out the relevant section. This helps you kind of hone in on the right document that you’re using.
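A sketch of that two-level, summary-then-chunk retrieval, again reusing embedder and generate() from the first sketch - the two toy documents are purely illustrative:

```python
import numpy as np

docs = {  # toy corpus: two documents, each already split into chunks
    "handbook": [
        "PTO requests go through the HR portal two weeks in advance.",
        "Benefits enrollment opens every November.",
    ],
    "runbook": [
        "Production deploys happen every Tuesday after standup.",
        "Rollbacks need sign-off from the on-call engineer.",
    ],
}
doc_ids = list(docs)

# level 1: an LLM-written summary per document, embedded for search
summaries = [generate(f"Summarize in one sentence: {' '.join(docs[d])}")
             for d in doc_ids]
summary_vecs = embedder.encode(summaries, normalize_embeddings=True)

# level 2: each document's chunks, embedded separately
chunk_vecs_by_doc = {d: embedder.encode(docs[d], normalize_embeddings=True)
                     for d in doc_ids}

def retrieve_hierarchical(query: str) -> str:
    q = embedder.encode([query], normalize_embeddings=True)[0]
    best = doc_ids[int(np.argmax(summary_vecs @ q))]  # hone in on a document
    scores = chunk_vecs_by_doc[best] @ q              # then on a chunk in it
    return docs[best][int(np.argmax(scores))]
```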

So those are two fairly easy to implement in terms of how you set up your vector database, and how you do your querying… But they can provide a boost. Now, there’s more complicated things, and we can get to those in a second. But do those make sense?

It does make sense. Yes. I’m just curious, a two-second question… When people are going in your first case there, where they’re just going for the answer, and not adding the chunks around it, do you think that’s kind of biased from traditional database operations, where you find the answer and that’s it? I was just wondering, as you were saying that, why people might be limiting themselves in that way.

Yeah, I think it’s a perception problem. There’s so many examples out there of getting started with RAG… And that’s all you kind of see, unless you kind of really dig in. So maybe it’s just a perception issue. It is also maybe that sort of holdover from how you would retrieve things in a traditional database sense.

Gotcha.

Those might both factor in. There’s another kind of hybrid or two-level type of search that happens, and this is implemented in several different vector databases, even natively now, because it can be quite useful… It’s actually doing two levels of searching, the first of which is a traditional full-text search or keyword search, followed by a vector comparison, rather than just relying on the vector comparison. So you kind of hone in on the full-text keywords first, and then do a vector comparison.

[00:40:38.00] And you could even ensemble these in various ways, and use one for re-ranking or ordering versus the other one… There’s a variety of ways to implement this. But this would be kind of generally categorized as hybrid ways of searching, I think is most frequently the term. So there’s the context enrichment, there’s the hierarchical search or index retrieval - that’s the kind of summary, then chunk - and then there’s the hybrid search, which would be actually using two different search methodologies.
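One way that hybrid two-stage search might look, assuming the rank_bm25 package for the keyword stage (several vector databases ship an equivalent natively, as mentioned) and the chunks/embedder setup from the earlier sketches:

```python
import numpy as np
from rank_bm25 import BM25Okapi

# keyword index over the same chunks (naive whitespace tokenization)
bm25 = BM25Okapi([c.lower().split() for c in chunks])

def retrieve_hybrid(query: str, keyword_k: int = 10) -> str:
    """Stage 1: BM25 keyword search narrows the field; stage 2: vectors."""
    kw_scores = bm25.get_scores(query.lower().split())
    candidates = np.argsort(kw_scores)[::-1][:keyword_k]
    q = embedder.encode([query], normalize_embeddings=True)[0]
    vec_scores = chunk_vecs[candidates] @ q  # vector comparison on the subset
    return chunks[int(candidates[np.argmax(vec_scores)])]
```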

And notice that most of what we’re talking about here has to do with the retrieval part, not the LLM side… Although you could use an LLM to generate the summaries for the hierarchical approach. So it’s interesting that those TF-IDF, keyword searching, full-text search sort of things are coming up again… So back to the way we started this episode - the data science pieces still survive, in many ways.

Yeah, it’s still relevant there. And I don’t think that’s changing.

Yeah. The last two that I’ll highlight - one that comes up a lot that people will use is a method called re-ranking. So there’s actually models out there known as cross-encoders. And what happens is you might do a first-level vector search to get a smaller number of candidate documents, and then use maybe a more expansive model-based approach to actually rescore the candidates that you pulled, and reorder them - hence the name re-ranking - reorder them or filter them to the most relevant documents. So that’s kind of the re-ranking approach.
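A sketch of that re-ranking flow, using one of the publicly available cross-encoder models from the sentence-transformers ecosystem (the model name is a common example, not one the episode specifies):

```python
import numpy as np
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def retrieve_reranked(query: str, candidate_k: int = 20, final_k: int = 3):
    # first level: cheap vector search to get candidate chunks
    q = embedder.encode([query], normalize_embeddings=True)[0]
    candidates = np.argsort(chunk_vecs @ q)[::-1][:candidate_k]
    # second level: the cross-encoder scores each (query, chunk) pair
    # jointly - slower, but more accurate than comparing embeddings
    scores = reranker.predict([(query, chunks[i]) for i in candidates])
    order = np.argsort(scores)[::-1][:final_k]  # re-rank and filter
    return [chunks[candidates[i]] for i in order]
```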

There’s a couple of really interesting ones where you use an LLM in the loop. One of those is called HyDE. LanceDB has a good blog post about this. It uses LLMs to generate sort of hypothetical documents that should answer this question… And then you kind of use those hypothetical documents in the retrieval.
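The gist of HyDE, sketched with the generate() stand-in and the index from the first example - the prompt wording is just an illustration:

```python
import numpy as np

def retrieve_hyde(query: str) -> str:
    # ask the LLM for a hypothetical passage that would answer the question
    hypothetical = generate(
        f"Write a short passage that plausibly answers: {query}"
    )
    # embed the hypothetical answer, not the raw query, and retrieve with it
    q = embedder.encode([hypothetical], normalize_embeddings=True)[0]
    return chunks[int(np.argmax(chunk_vecs @ q))]
```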

There’s people that do also query transformation. So they actually take the query - and this kind of fits our previous discussion about modifying a prompt, except now maybe you’re in control of it… Where you take that prompt in and you actually regenerate the query, such that it’s more favorable to the retrieval task.
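And a sketch of query transformation, where - unlike with a closed, productized API - the rewrite is under your control (again, the prompt wording is illustrative):

```python
def retrieve_transformed(query: str) -> str:
    # rewrite the user's query so it is more favorable to the retrieval task
    rewritten = generate(
        "Rewrite this question as a standalone search query, expanding "
        f"abbreviations and adding likely keywords: {query}"
    )
    return retrieve(rewritten)  # then do the usual retrieval
```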

That was a lot, and I know it was quick, but I think it might be good for people to hear that and just kind of see that there’s a much wider picture of these advanced RAG techniques. And I didn’t even get a chance to sort of get through all of them. People are exploring a lot of things. But that I think paints a much more rich picture of what can happen in these RAG pipelines, versus just that naive approach.

Thank you very much for kind of bringing this to attention. I think it would be well-advised for people to recognize they’re kind of getting to first base with the typical RAG approach, and that’s working for them in some cases quite well… But these tools are out there now, where it’s not so hard to then go on and move past that. But I’m seeing a lot of people get stuck there, so thank you for kind of covering that territory and giving people an opportunity if they’re not familiar with it to maybe dive into this.

Yeah, definitely. Well, it’s been a fun one, Chris, to bring back some data science discussions into our podcast… And yeah, excited to see what’s coming over the next couple of weeks that we can catch up on soon.

Absolutely. Talk to you later, Daniel.


Our transcripts are open source on GitHub. Improvements are welcome. 💚
