You might know about MLPerf, a benchmark from MLCommons that measures how fast systems can train models to a target quality metric. However, MLCommons is working on so much more! David Kanter joins us in this episode to discuss two new speech datasets that are democratizing machine learning for speech via data scale and language/speaker diversity.
We have all seen how AI models fail, sometimes in spectacular ways. Yaron Singer joins us in this episode to discuss model vulnerabilities and automatic prevention of bad outcomes. By separating concerns and creating a “firewall” around your AI models, it’s possible to secure your AI workflows and prevent model failure.
In the second of the “AI in Africa” spotlight episodes, we welcome guests from Radiant Earth to talk about machine learning for earth observation. They give us a glimpse into their amazing data and tooling for working with satellite imagery, and they talk about use cases including crop identification and tropical storm wind speed estimation.
The time has come! OpenAI’s API is now available with no waitlist. Chris and Daniel dig into the API and playground during this episode, and they also discuss some of the latest tool from Hugging Face (including new reinforcement learning environments). Finally, Daniel gives an update on how he is building out infrastructure for a new AI team.
This episode is a follow up to our recent Fully Connected show discussing federated learning. In that previous discussion, we mentioned Flower (a “friendly” federated learning framework). Well, one of the creators of Flower, Daniel Beutel, agreed to join us on the show to discuss the project (and federated learning more broadly)! The result is a really interesting and motivating discussion of ML, privacy, distributed training, and open source AI.
Recently, GitHub released Copilot, which is an amazing AI pair programmer powered by OpenAI’s Codex model. In this episode, Natalie Pistunovich tells us all about Codex and helps us understand where it fits in our development workflow. We also discuss MLOps and how AI is influencing software engineering more generally.
In this Fully-Connected episode, Daniel and Chris ponder whether in-person AI conferences are on the verge of making a post-pandemic comeback. Then on to BigScience from Hugging Face, a year-long research workshop on large multilingual models and datasets. Specifically they dive into the T0, a series of natural language processing (NLP) AI models specifically trained for researching zero-shot multitask learning. Daniel provides a brief tour of the possible with the T0 family. They finish up with a couple of new learning resources.
Each year we discuss the latest insights from the Stanford Institute for Human-Centered Artificial Intelligence (HAI), and this year is no different. Daniel and Chris delve into key findings and discuss in this Fully-Connected episode. They also check out a study called ‘Delphi: Towards Machine Ethics and Norms’, about how to integrate ethics and morals into AI models.
There are a lot of people trying to innovate in the area of specialized AI hardware, but most of them are doing it with traditional transistors. Lightmatter is doing something totally different. They’re building photonic computers that are more power efficient and faster for AI inference. Nick Harris joins us in this episode to bring us up to speed on all the details.
When is the last time you had a eureka moment? Chris had a chat with Nicholas Mohnacky, CEO and Cofounder of bundleIQ, where they use natural language processing algorithms like GPT-3 to connect your Google GSuite with other personal data sources to find deeper connections, go beyond the obvious, and create eureka moments.
This is the first episode in a special series we are calling the “Spotlight on AI in Africa”. To kick things off, Joyce and Mutembesa from Makerere University’s AI Lab join us to talk about their amazing work in computer vision, natural language processing, and data collection. Their lab seeks out problems that matter in African communities, pairs those problems with appropriate data/tools, and works with the end users to ensure that solutions create real value.
Federated learning is increasingly practical for machine learning developers because of the challenges we face with model and data privacy. In this fully connected episode, Chris and Daniel dive into the topic and dissect the ideas behind federated learning, practicalities of implementing decentralized training, and current uses of the technique.
Tivadar Danka is an educator and content creator in the machine learning space, and he is writing a book to help practitioners go from high school mathematics to mathematics of neural networks. His explanations are lucid and easy to understand. You have never had such a fun and interesting conversation about calculus, linear algebra, and probability theory before!
Polarity Mapping is a framework to “help problems be solved in a realistic and multidimensional manner” (see here for more info). In this week’s fully connected episode, Chris and Daniel use this framework to help them discuss how an organization can strike a good balance between human intelligence and AI. AI can’t solve everything and humans need to be in-the-loop with many AI solutions.
As you start developing an AI/ML based solution, you quickly figure out that you need to run workflows. Not only that, you might need to run those workflows across various kinds of infrastructure (including GPUs) at scale. Ville Tuulos developed Metaflow while working at Netflix to help data scientists scale their work. In this episode, Ville tells us a bit more about Metaflow, his new book on data science infrastructure, and his approach to helping scale ML/AI work.
Any AI play that lacks an underlying data strategy is doomed to fail, and a big part of any data strategy is labeling. Michael, from Label Studio, joins us in this episode to discuss how the industry’s perception of data labeling is shifting. We cover open source tooling, validating labels, and integrating ML/AI models in the labeling loop.
Yonatan Geifman of Deci makes Daniel and Chris buckle up, and takes them on a tour of the ideas behind his amazing new inference platform. It enables AI developers to build, optimize, and deploy blazing-fast deep learning models on any hardware. Don’t blink or you’ll miss it!
In this episode, Peter Wang from Anaconda joins us again to go over their latest “State of Data Science” survey. The updated results include some insights related to data science work during COVID along with other topics including AutoML and model bias. Peter also tells us a bit about the exciting new partnership between Anaconda and Pyston (a fork of the standard CPython interpreter which has been extensively enhanced to improve the execution performance of most Python programs).
We’re back with another Fully Connected episode – Daniel and Chris dive into a series of articles called ‘A New AI Lexicon’ that collectively explore alternate narratives, positionalities, and understandings to the better known and widely circulated ways of talking about AI. The fun begins early as they discuss and debate ‘An Electric Brain’ with strong opinions, and consider viewpoints that aren’t always popular.
In Kenya, 33% of maternal deaths are caused by delays in seeking care, and 55% of maternal deaths are caused by delays in action or inadequate care by providers. Jacaranda Health is employing NLP and dialogue system techniques to help mothers experience childbirth safely and with respect and to help newborns get a safe start in life. Jay and Sathy from Jacaranda join us in this episode to discuss how they are using AI to prioritize incoming SMS messages from mothers and help them get the care they need.
SLICED is like the TV Show Chopped but for data science. Competitors get a never-before-seen dataset and two-hours to code a solution to a prediction challenge. Meg and Nick, the SLICED show hosts, join us in this episode to discuss how the show is creating much needed data science community. They give us a behind the scenes look at all the datasets, memes, contestants, scores, and chat of SLICED.
AI is being used to transform the most personal instrument we have, our voice, into something that can be “played.” This is fascinating in and of itself, but Yotam Mann from Never Before Heard Sounds is doing so much more! In this episode, he describes how he is using neural nets to process audio in real time for musicians and how AI is poised to change the music industry forever.
Inspired by a recent article from Erik Bernhardsson titled “Building a data team at a mid-stage startup: a short story”, Chris and Daniel discuss all things AI/data team building. They share some stories from their experiences kick starting AI efforts at various organizations and weight the pro and cons of things like centralized data management, prototype development, and a focus on engineering skills.
9 out of 10 AI projects don’t end up creating value in production. Why? At least partly because these projects utilize unstable models and drifting data. In this episode, Roey from BeyondMinds gives us some insights on how to filter garbage input, detect risky output, and generally develop more robust AI systems.
How did we get from symbolic AI to deep learning models that help you write code (i.e., GitHub and OpenAI’s new Copilot)? That’s what Chris and Daniel discuss in this episode about the history and future of deep learning (with some help from an article recently published in ACM and written by the luminaries of deep learning).
Pinecone is the first vector database for machine learning. Edo Liberty explains to Chris how vector similarity search works, and its advantages over traditional database approaches for machine learning. It enables one to search through billions of vector embeddings for similar matches, in milliseconds, and Pinecone is a managed service that puts this capability at the fingertips of machine learning practitioners.
William Falcon wants AI practitioners to spend more time on model development, and less time on engineering. PyTorch Lightning is a lightweight PyTorch wrapper for high-performance AI research that lets you train on multiple-GPUs, TPUs, CPUs and even in 16-bit precision without changing your code! In this episode, we dig deep into Lightning, how it works, and what it is enabling. William also discusses the Grid AI platform (built on top of PyTorch Lightning). This platform lets you seamlessly train 100s of Machine Learning models on the cloud from your laptop.
Chris and Daniel sit down to chat about some exciting new AI developments including wav2vec-u (an unsupervised speech recognition model) and meta-learning (a new book about “How To Learn Deep Learning And Thrive In The Digital World”). Along the way they discuss engineering skills for AI developers and strategies for launching AI initiatives in established companies.
Tuhin Srivastava tells Daniel and Chris why BaseTen is the application development toolkit for data scientists. BaseTen’s goal is to make it simple to serve machine learning models, write custom business logic around them, and expose those through API endpoints without configuring any infrastructure.
Today we’re sharing a special crossover episode from The Changelog podcast here on Practical AI. Recently, Daniel Whitenack joined Jerod Santo to talk with José Valim, Elixir creator, about Numerical Elixir. This is José’s newest project that’s bringing Elixir into the world of machine learning. They discuss why José chose this as his next direction, the team’s layered approach, influences and collaborators on this effort, and their awesome collaborative notebook that’s built on Phoenix LiveView.