Big data is dead, analytics is alive
We are on the other side of “big data” hype, but what is the future of analytics and how does AI fit in? Till and Adithya from MotherDuck join us to discuss why DuckDB is taking the analytics and AI world by storm. We dive into what makes DuckDB, a free, in-process SQL OLAP database management system, unique, including its ability to execute lightning-fast analytics queries against a variety of data sources, even on your laptop! Along the way we dig into the intersections with AI, such as text-to-SQL, vector search, and AI-driven SQL query correction.
Matched from the episode's transcript 👇
Daniel Whitenack: [00:23:54.21] Some of our listeners might be curious why a person like me, sort of living day-to-day in the AI world, is super-excited to talk about DuckDB. I mean, certainly I have a past in data science more broadly, and this is pain I’ve felt over time… But also, there’s a very relevant piece of this that intersects with the needs of the AI community more broadly, and the workflows that they’re executing. And one of those - where I kind of started getting into this - is these sort of dashboard-killing AI apps that people are trying to build, in the sense that “Hey–” Another pain of mine as a data scientist in my life is building dashboards. Because you always build them, and they never answer the questions that people actually have… And so there’s this real desire to have like a natural language question input, and you can then compute very quickly the answer to that natural language question by using the LLM to generate a SQL query to a number of data sources.
But then when you start thinking about “Oh, well, now I have these CSV files that people have uploaded into a chat interface, or I have these types of databases that I need to connect to, or I have this data in S3 buckets”, and my answer could come from these different places, all of a sudden this kind of rich SQL dialect that you talked about, that’s very quick, and can run with a standardized API across those sources becomes incredibly intriguing for me. Transparently, that’s how I sort of like got into this, is I’m thinking of all of these sources of data that I could answer questions out of using an LLM… But how do I standardize a fast interface to all of these diverse sets of data, and also do it in a way that is easy to use from a developer’s perspective? But I also know that you all see much more than I do, and maybe that is an entry point that you’re seeing. I’m wondering if one of you could talk a little bit more broadly of how the problems that DuckDB is solving, and the problems that your customers are looking at are intersecting with this rapidly developing world of AI workflows.