Daniel Whitenack

280 episodes

Practical AI #296

scikit-learn & data science you own

2024-11-19T21:00:00Z #ai +1 🎧 15,164

We are at GenAI saturation, so let’s talk about scikit-learn, a longtime favorite for data scientists building classifiers, time series analyzers, dimensionality reducers, and more! Scikit-learn is deployed across industry and drives a significant portion of the “AI” that is actually in production. :probabl is a new kind of company that is stewarding this project along with a variety of other open source projects. Yann Lechelle and Guillaume Lemaitre share some of the vision behind the company and talk about the future of scikit-learn!

Practical AI #295

Creating tested, reliable AI applications

2024-11-13T19:30:00Z #ai 🎧 20,656

It can be frustrating to get an AI application working amazingly well 80% of the time and failing miserably the other 20%. How can you close the gap and create something that you can rely on? Chris and Daniel talk through this process, behavior testing, and the flow from prototype to production in this episode. They also talk a bit about the apparent slowdown in the release of frontier models.

Practical AI #292

Big data is dead, analytics is alive

2024-10-24T15:30:00Z #ai +1 🎧 24,512

We are on the other side of “big data” hype, but what is the future of analytics and how does AI fit in? Till and Adithya from MotherDuck join us to discuss why DuckDB is taking the analytics and AI world by storm. We dive into what makes DuckDB, a free, in-process SQL OLAP database management system, unique, including its ability to execute lightning-fast analytics queries against a variety of data sources, even on your laptop! Along the way we dig into the intersections with AI, such as text-to-SQL, vector search, and AI-driven SQL query correction.

Practical AI #291

Practical workflow orchestration

2024-10-15T20:00:00Z #ai +1 🎧 27,328

Workflow orchestration has always been a pain for data scientists, but this is exacerbated in these AI hype days by agentic systems executing arbitrary (not pre-defined) workflows with a variety of failure modes. Adam from Prefect joins us to talk through their open source Python library for orchestration and visibility into Python-based pipelines. Along the way, he introduces us to things like Marvin, their AI engineering framework, and ControlFlow, their agent workflow system.

Practical AI #290

Towards high-quality (maybe synthetic) datasets

2024-10-09T13:30:00Z #ai +1 🎧 26,503

As Argilla puts it: “Data quality is what makes or breaks AI.” However, what exactly does this mean and how can AI teams properly collaborate with domain experts towards improved data quality? David Berenstein & Ben Burtenshaw, who are building Argilla & Distilabel at Hugging Face, join us to dig into these topics along with synthetic data generation & AI-generated labeling / feedback.

Practical AI #289

Understanding what's possible, doable & scalable

2024-10-03T15:45:00Z #ai +1 🎧 27,580

We are constantly hearing about disillusionment as it relates to AI. Some of that is probably valid, but Mike Lewis, an AI architect from Cincinnati, has proven that he can consistently get LLM and GenAI apps to the point of real enterprise value (even with the Big Cos of the world). In this episode, Mike joins us to share some stories from the AI trenches & highlight what it takes (practically) to show what is possible, doable & scalable with AI.

Practical AI #288

GraphRAG (beyond the hype)

2024-09-25T18:30:00Z #ai +1 🎧 29,828

It seems like we are hearing a lot about GraphRAG these days, but there are lots of questions: what is it, is it hype, what is practical? One of our all-time favorite podcast friends, Prashanth Rao, joins us to dig into this topic beyond the hype. Prashanth gives us a bit of background and practical use cases for GraphRAG and graph data.

Practical AI #287

Pausing to think about scikit-learn & OpenAI o1

2024-09-17T19:00:00Z #ai +1 🎧 28,006

Recently the company stewarding the open source library scikit-learn announced their seed funding. Also, OpenAI released “o1” with new behavior in which it pauses to “think” about complex tasks. Chris and Daniel take some time to do their own thinking about o1 and the contrast to the scikit-learn ecosystem, which has the goal to promote “data science that you own.”

Practical AI #285

AI is more than GenAI

2024-09-05T14:00:00Z #ai +2 🎧 31,975

GenAI is often what people think of when someone mentions AI. However, AI is much more. In this episode, Daniel breaks down a history of developments in data science, machine learning, AI, and GenAI to give listeners a better mental model. Don’t miss this one if you want to understand the AI ecosystem holistically and how models, embeddings, data, prompts, etc. all fit together.

Practical AI #284

Metrics Driven Development

2024-08-29T20:45:00Z #ai +1 🎧 30,007

How do you systematically measure, optimize, and improve the performance of LLM applications (like those powered by RAG or tool use)? Ragas is an open source effort that has been trying to answer this question comprehensively, and they are promoting a “Metrics Driven Development” approach. Shahul from Ragas joins us to discuss Ragas in this episode, and we dig into specific metrics, the difference between benchmarking models and evaluating LLM apps, generating synthetic test data and more.

Practical AI #283

Threat modeling LLM apps

2024-08-22T13:30:00Z #ai +2 🎧 28,713

If you have questions at the intersection of Cybersecurity and AI, you need to know Donato at WithSecure! Donato has been threat modeling AI applications and seriously applying those models in his day-to-day work. He joins us in this episode to discuss his LLM application security canvas, prompt injections, alignment, and more.

Practical AI #282

Only as good as the data

2024-08-14T21:15:00Z #ai +1 🎧 30,367

You might have heard that “AI is only as good as the data.” What does that mean and what data are we talking about? Chris and Daniel dig into that topic in this episode, exploring the categories of data that you might encounter working in AI (for training, testing, fine-tuning, benchmarks, etc.). They also discuss the latest developments in AI regulation with the EU’s AI Act coming into force.

Practical AI #281

Gaudi processors & Intel's AI portfolio

2024-08-07T13:45:00Z #ai 🎧 29,403

There is an increasing desire for and effort towards GPU alternatives for AI workloads and an ability to run GenAI models on CPUs. Ben and Greg from Intel join us in this episode to help us understand Intel’s strategy as it relates to AI along with related projects, hardware, and developer communities. We dig into Intel’s Gaudi processors, open source collaborations with Hugging Face, and AI on CPU/Xeon processors.

Practical AI #280

Broccoli AI at its best 🥦

2024-07-31T21:40:00Z #ai +1 🎧 31,179

We discussed “🥦 Broccoli AI” a couple of weeks ago, which is the kind of AI that is actually good/healthy for a real-world business. Bengsoon Chuah, a data scientist working in the energy sector, joins us to discuss developing and deploying NLP pipelines in that environment. We talk about good/healthy ways of introducing AI in a company that uses on-prem infrastructure, has few data science professionals, and operates in high-risk environments.

Practical AI #278

The first real-time voice assistant

2024-07-18T12:45:00Z #ai +1 🎧 26,544

In the midst of the demos & discussion about OpenAI’s GPT-4o voice assistant, Kyutai swooped in to release the first real-time AI voice assistant model and a pretty slick demo (Moshi). Chris & Daniel discuss what this more open approach to a voice assistant might catalyze. They also discuss recent changes to Gartner’s ranking of GenAI on their hype cycle.

Practical AI #277

Vectoring in on Pinecone

2024-07-10T17:30:00Z #ai +2 🎧 25,835

Daniel & Chris explore the advantages of vector databases with Roie Schwaber-Cohen of Pinecone. Roie starts with a very lucid explanation of why you need a vector database in your machine learning pipeline, and then goes on to discuss Pinecone’s vector database, designed to facilitate efficient storage, retrieval, and management of vector data.

Practical AI #276

Stanford's AI Index Report 2024

2024-07-02T19:45:00Z #ai 🎧 27,784

We’ve had representatives from Stanford’s Institute for Human-Centered Artificial Intelligence (HAI) on the show in the past, but we were super excited to talk through their 2024 AI Index Report after such a crazy year in AI! Nestor from HAI joins us in this episode to talk about some of the main takeaways, including how AI makes workers more productive, how the US is sharply increasing regulations, and how industry continues to dominate frontier AI research.
