As you start developing an AI/ML based solution, you quickly figure out that you need to run workflows. Not only that, you might need to run those workflows across various kinds of infrastructure (including GPUs) at scale. Ville Tuulos developed MetaFlow while working at Netflix to help data scientists scale their work. In this episode, Ville tells us a bit more about MetaFlow, his new book on data science infrastructure, and his approach to helping scale ML/AI work.
Any AI play that lacks an underlying data strategy is doomed to fail, and a big part of any data strategy is labeling. Michael, from Label Studio, joins us in this episode to discuss how the industry’s perception of data labeling is shifting. We cover open source tooling, validating labels, and integrating ML/AI models in the labeling loop.
Yonatan Geifman of Deci makes Daniel and Chris buckle up, and takes them on a tour of the ideas behind his amazing new inference platform. It enables AI developers to build, optimize, and deploy blazing-fast deep learning models on any hardware. Don’t blink or you’ll miss it!
In this episode, Peter Wang from Anaconda joins us again to go over their latest “State of Data Science” survey. The updated results include some insights related to data science work during COVID along with other topics including AutoML and model bias. Peter also tells us a bit about the exciting new partnership between Anaconda and Pyston (a fork of the standard CPython interpreter which has been extensively enhanced to improve the execution performance of most Python programs).
We’re back with another Fully Connected episode – Daniel and Chris dive into a series of articles called ‘A New AI Lexicon’ that collectively explore alternate narratives, positionalities, and understandings to the better known and widely circulated ways of talking about AI. The fun begins early as they discuss and debate ‘An Electric Brain’ with strong opinions, and consider viewpoints that aren’t always popular.
In Kenya, 33% of maternal deaths are caused by delays in seeking care, and 55% of maternal deaths are caused by delays in action or inadequate care by providers. Jacaranda Health is employing NLP and dialogue system techniques to help mothers experience childbirth safely and with respect and to help newborns get a safe start in life. Jay and Sathy from Jacaranda join us in this episode to discuss how they are using AI to prioritize incoming SMS messages from mothers and help them get the care they need.
SLICED is like the TV Show Chopped but for data science. Competitors get a never-before-seen dataset and two-hours to code a solution to a prediction challenge. Meg and Nick, the SLICED show hosts, join us in this episode to discuss how the show is creating much needed data science community. They give us a behind the scenes look at all the datasets, memes, contestants, scores, and chat of SLICED.
AI is being used to transform the most personal instrument we have, our voice, into something that can be “played.” This is fascinating in and of itself, but Yotam Mann from Never Before Heard Sounds is doing so much more! In this episode, he describes how he is using neural nets to process audio in real time for musicians and how AI is poised to change the music industry forever.
Inspired by a recent article from Erik Bernhardsson titled “Building a data team at a mid-stage startup: a short story”, Chris and Daniel discuss all things AI/data team building. They share some stories from their experiences kick starting AI efforts at various organizations and weight the pro and cons of things like centralized data management, prototype development, and a focus on engineering skills.
9 out of 10 AI projects don’t end up creating value in production. Why? At least partly because these projects utilize unstable models and drifting data. In this episode, Roey from BeyondMinds gives us some insights on how to filter garbage input, detect risky output, and generally develop more robust AI systems.
How did we get from symbolic AI to deep learning models that help you write code (i.e., GitHub and OpenAI’s new Copilot)? That’s what Chris and Daniel discuss in this episode about the history and future of deep learning (with some help from an article recently published in ACM and written by the luminaries of deep learning).
Pinecone is the first vector database for machine learning. Edo Liberty explains to Chris how vector similarity search works, and its advantages over traditional database approaches for machine learning. It enables one to search through billions of vector embeddings for similar matches, in milliseconds, and Pinecone is a managed service that puts this capability at the fingertips of machine learning practitioners.
William Falcon wants AI practitioners to spend more time on model development, and less time on engineering. PyTorch Lightning is a lightweight PyTorch wrapper for high-performance AI research that lets you train on multiple-GPUs, TPUs, CPUs and even in 16-bit precision without changing your code! In this episode, we dig deep into Lightning, how it works, and what it is enabling. William also discusses the Grid AI platform (built on top of PyTorch Lightning). This platform lets you seamlessly train 100s of Machine Learning models on the cloud from your laptop.
Chris and Daniel sit down to chat about some exciting new AI developments including wav2vec-u (an unsupervised speech recognition model) and meta-learning (a new book about “How To Learn Deep Learning And Thrive In The Digital World”). Along the way they discuss engineering skills for AI developers and strategies for launching AI initiatives in established companies.
Tuhin Srivastava tells Daniel and Chris why BaseTen is the application development toolkit for data scientists. BaseTen’s goal is to make it simple to serve machine learning models, write custom business logic around them, and expose those through API endpoints without configuring any infrastructure.
Today we’re sharing a special crossover episode from The Changelog podcast here on Practical AI. Recently, Daniel Whitenack joined Jerod Santo to talk with José Valim, Elixir creator, about Numerical Elixir. This is José’s newest project that’s bringing Elixir into the world of machine learning. They discuss why José chose this as his next direction, the team’s layered approach, influences and collaborators on this effort, and their awesome collaborative notebook that’s built on Phoenix LiveView.
90% of AI / ML applications never make it to market, because fine tuning models for maximum performance across disparate ML software solutions and hardware backends requires a ton of manual labor and is cost-prohibitive. Luis Ceze and his team created Apache TVM at the University of Washington, then left founded OctoML to bring the project to market.
To say that Jeff Adams is a trailblazer when it comes to speech technology is an understatement. Along with many other notable accomplishments, his team at Amazon developed the Echo, Dash, and Fire TV changing our perception of how we could interact with devices in our home. Jeff now leads Cobalt Speech and Language, and he was kind enough to join us for a discussion about human computer interaction, multimodal AI tasks, the history of language modeling, and AI for social good.
Smart home data is complicated. There are all kinds of devices, and they are in many different combinations, geographies, configurations, etc. This complicated data situation is further exacerbated during a pandemic when time series data seems to be filled with anomalies. Evan Welbourne joins us to discuss how Amazon is synthesizing this disparate data into functionality for the next generation of smart homes. He discusses the challenges of working with smart home technology, and he describes how they developed their latest feature called “hunches.”
Ro Gupta from CARMERA teaches Daniel and Chris all about road intelligence. CARMERA maintains the maps that move the world, from HD maps for automated driving to consumer maps for human navigation.
Nhung Ho joins Daniel and Chris to discuss how data science creates insights into financial operations and economic conditions. They delve into topics ranging from predictive forecasting to aid small businesses, to learning about the economic fallout from the COVID-19 Pandemic.
Dave Lacey takes Daniel and Chris on a journey that connects the user interfaces that we already know - TensorFlow and PyTorch - with the layers that connect to the underlying hardware. Along the way, we learn about Poplar Graph Framework Software. If you are the type of practitioner who values ‘under the hood’ knowledge, then this is the episode for you.
Nikola Mrkšić, CEO & Co-Founder of PolyAI, takes Daniel and Chris on a deep dive into conversational AI, describing the underlying technologies, and teaching them about the next generation of voice assistants that will be capable of handling true human-level conversations. It’s an episode you’ll be talking about for a long time!
Chris has the privilege of talking with Stanford Professor Margot Gerritsen, who co-leads the Women in Data Science (WiDS) Worldwide Initiative. This is a conversation that everyone should listen to. Professor Gerritsen’s profound insights into how we can all help the women in our lives succeed - in data science and in life - is a ‘must listen’ episode for everyone, regardless of gender.
David Sweet, author of “Tuning Up: From A/B testing to Bayesian optimization”, introduces Dan and Chris to system tuning, and takes them from A/B testing to response surface methodology, contextual bandit, and finally bayesian optimization. Along the way, we get fascinating insights into recommender systems and high-frequency trading!
Our Slack community wanted to hear about AI-driven drug discovery, and we listened. Abraham Heifets from Atomwise joins us for a fascinating deep dive into the intersection of deep learning models and molecule binding. He describes how these methods work and how they are beginning to help create drugs for “undruggable” diseases!
Empirical analysis from Roy Schwartz (Hebrew University of Jerusalem) and Jesse Dodge (AI2) suggests the AI research community has paid relatively little attention to computational efficiency. A focus on accuracy rather than efficiency increases the carbon footprint of AI research and increases research inequality. In this episode, Jesse and Roy advocate for increased research activity in Green AI (AI research that is more environmentally friendly and inclusive). They highlight success stories and help us understand the practicalities of making our workflows more efficient.
In this Fully-Connected episode, Chris and Daniel discuss low code / no code development, GPU jargon, plus more data leakage issues. They also share some really cool new learning opportunities for leveling up your AI/ML game!
Elad Walach of Aidoc joins Chris to talk about the use of AI for medical imaging interpretation. Starting with the world’s largest annotated training data set of medical images, Aidoc is the radiologist’s best friend, helping the doctor to interpret imagery faster, more accurately, and improving the imaging workflow along the way. Elad’s vision for the transformative future of AI in medicine clearly soothes Chris’s concern about managing his aging body in the years to come. ;-)
John Myers of Gretel puts on his apron and rolls up his sleeves to show Dan and Chris how to cook up some synthetic data for automated data labeling, differential privacy, and other purposes. His military and intelligence community background give him an interesting perspective that piqued the interest of our intrepid hosts.