
Data Science

111 episodes

Practical AI #296

scikit-learn & data science you own

2024-11-19T21:00:00Z #ai +1 🎧 13,117

We are at GenAI saturation, so let’s talk about scikit-learn, a longtime favorite for data scientists building classifiers, time series analyzers, dimensionality reducers, and more! Scikit-learn is deployed across industry and drives a significant portion of the “AI” that is actually in production. :probabl is a new kind of company that is stewarding this project along with a variety of other open source projects. Yann Lechelle and Guillaume Lemaitre share some of the vision behind the company and talk about the future of scikit-learn!

Practical AI #291

Practical workflow orchestration

2024-10-15T20:00:00Z #ai +1 🎧 27,260

Workflow orchestration has always been a pain for data scientists, but this is exacerbated in these AI hype days by agentic workflows executing arbitrary (not pre-defined) workflows with a variety of failure modes. Adam from Prefect joins us to talk through their open source Python library for orchestration and visibility into Python-based pipelines. Along the way, he introduces us to things like Marvin, their AI engineering framework, and ControlFlow, their agent workflow system.

Practical AI #290

Towards high-quality (maybe synthetic) datasets

2024-10-09T13:30:00Z #ai +1 🎧 26,453

As Argilla puts it: “Data quality is what makes or breaks AI.” However, what exactly does this mean, and how can AI teams properly collaborate with domain experts towards improved data quality? David Berenstein & Ben Burtenshaw, who are building Argilla & Distilabel at Hugging Face, join us to dig into these topics along with synthetic data generation & AI-generated labeling / feedback.

JS Party #329

A standard library for JavaScript

2024-07-04T14:00:00Z #javascript +1 🎧 11,639

Philipp Burckhardt, Athan Reines & the team behind stdlib.io believe in a future in which the web is a preferred environment for numerical computation. They’ve been working toward building that future for over a decade. Thanks to listener Brian Zelip, Jerod sits down with Philipp to learn all about this excellent effort: where it’s been & where it’s headed.

Changelog Interviews #538

Livebook's big launch week

2023-05-03T19:00:00Z #elixir +2 🎧 28,195

José Valim joins Jerod to talk all about what’s new in Livebook – the Elixir-based interactive code notebook he’s been working on the last few years.

José made a big bet when he decided to bring machine learning to Elixir. That bet is now paying off with amazing new capabilities such as building and deploying a Whisper-based chat app to Hugging Face in just 15 minutes.

José demoed that and much more during Livebook’s first-ever launch week. Let’s get into it.

Practical AI #217

Accelerated data science with a Kaggle grandmaster

2023-04-04T20:00:00Z #ai +3 🎧 27,108

Daniel and Chris explore the intersection of Kaggle and real-world data science in this illuminating conversation with Christof Henkel, Senior Deep Learning Data Scientist at NVIDIA and Kaggle Grandmaster. Christof offers a very lucid explanation of how participation in Kaggle can positively impact a data scientist’s skills and career aspirations. He also shares some of his insights and his approach to maximizing AI productivity using GPU-accelerated tools like RAPIDS and DALI.

Practical AI #203

AI competitions & cloud resources

2022-12-07T21:00:00Z #ai +2 🎧 20,754

In this special episode, we interview some of the sponsors and teams from a recent case competition organized by Purdue University, Microsoft, INFORMS, and SIL International. 170+ teams from across the US and Canada participated in the competition, which challenged students to create AI-driven systems to caption images in three languages (Thai, Kyrgyz, and Hausa).

Practical AI #201

Protecting us with the Database of Evil

2022-11-16T17:20:00Z #ai +3 🎧 20,783

Online platforms and their users are susceptible to a barrage of threats, from disinformation to extremism to terror. Daniel and Chris chat with Matar Haller, VP of Data at ActiveFence. The company, a leader in identifying online harm, is using a combination of AI technology and leading subject matter experts to provide Trust & Safety teams with precise, real-time data, in-depth intelligence, and automated tools to protect users and ensure safe online experiences.

Practical AI #197

Data for All

2022-10-18T14:05:00Z #datascience +1 🎧 20,816

People are starting to wake up to the fact that they have control and ownership over their data, and governments are moving quickly to legislate these rights. John K. Thompson has written a new book on the topic that is a must read! We talk about the new book in this episode along with how practitioners should be thinking about data exchanges, privacy, trust, and synthetic data.

Practical AI #196

What's up, DocQuery?

2022-10-12T15:00:00Z #ai +3 🎧 19,664

Chris sits down with Ankur Goyal to talk about DocQuery, Impira’s new open source ML model. DocQuery lets you ask questions about semi-structured data (like invoices) and unstructured documents (like contracts) using Large Language Models (LLMs). Ankur illustrates many of the ways DocQuery can help people tame documents, and references Chris’s real life tasks as a non-profit director to demonstrate that DocQuery is indeed practical AI.

Practical AI #195

Production data labeling workflows

2022-09-27T19:40:00Z #ai +2 🎧 21,275

It’s one thing to gather some labels for your data. It’s another thing to integrate data labeling into your workflows and infrastructure in a scalable, secure, and useful way. Mark from Xelex joins us to talk through some of what he has learned after helping companies scale their data annotation efforts. We get into workflow management, labeling instructions, team dynamics, and quality assessment. This is a super practical episode!

Practical AI #191

Privacy in the age of AI

2022-08-30T19:20:00Z #privacy +4 🎧 19,095

In this Fully-Connected episode, Daniel and Chris discuss concerns of privacy in the face of ever-improving AI / ML technologies. Evaluating AI’s impact on privacy from various angles, they note that ethical AI practitioners and data scientists have an enormous burden, given that much of the general population may not understand the implications of the data privacy decisions of everyday life.

This intentionally thought-provoking conversation advocates consideration and action from each listener when it comes to evaluating how their own activities either protect or violate the privacy of those whom they impact.

Practical AI #187

AI IRL & Mozilla's Internet Health Report

2022-08-02T20:30:00Z #ai +3 🎧 18,380

Every year Mozilla releases an Internet Health Report that combines research and stories exploring what it means for the internet to be healthy. This year’s report is focused on AI. In this episode, Solana and Bridget from Mozilla join us to discuss the power dynamics of AI and the current state of AI worldwide. They highlight concerning trends in the application of this transformational technology along with positive signs of change.

Practical AI #183

AI's role in reprogramming immunity

2022-06-28T19:00:00Z #ai +2 🎧 19,067

Drausin Wulsin, Director of ML at Immunai, joins Daniel & Chris to talk about the role of AI in immunotherapy, and why it is proving to be the foremost approach in fighting cancer, autoimmune disease, and infectious diseases.

The large amount of high dimensional biological data that is available today, combined with advanced machine learning techniques, creates unique opportunities to push the boundaries of what is possible in biology.

To that end, Immunai has built the largest immune database, called AMICA, which contains tens of millions of cells. The company uses cutting-edge transfer learning techniques to transfer knowledge across different cell types, studies, and even species.

Practical AI #171

Clothing AI in a data fabric

2022-03-16T13:40:00Z #ai +3 🎧 21,982

What happens when your data operations grow to Internet-scale? How do thousands or millions of data producers and consumers efficiently, effectively, and productively interact with each other? How are varying formats, protocols, security levels, performance criteria, and use-case specific characteristics meshed into one unified data fabric? Chris and Daniel explore these questions in this illuminating and Fully-Connected discussion that brings this new data technology into the light.

Practical AI #166

Exploring deep reinforcement learning

2022-02-01T20:00:00Z #ai +3 🎧 24,362

In addition to being a Developer Advocate at Hugging Face, Thomas Simonini is building next-gen AI in games that can talk and have smart interactions with the player using Deep Reinforcement Learning (DRL) and Natural Language Processing (NLP). He also created a Deep Reinforcement Learning course that takes a DRL beginner from zero to hero. Natalie and Chris explore what’s involved, and what the implications are, with a focus on the development path of the new AI data scientist.

Practical AI #164

Democratizing ML for speech

2022-01-19T15:30:00Z #ai +2 🎧 22,164

You might know about MLPerf, a benchmark from MLCommons that measures how fast systems can train models to a target quality metric. However, MLCommons is working on so much more! David Kanter joins us in this episode to discuss two new speech datasets that are democratizing machine learning for speech via data scale and language/speaker diversity.
