Any AI play that lacks an underlying data strategy is doomed to fail, and a big part of any data strategy is labeling. Michael, from Label Studio, joins us in this episode to discuss how the industry’s perception of data labeling is shifting. We cover open source tooling, validating labels, and integrating ML/AI models in the labeling loop.
Yonatan Geifman of Deci makes Daniel and Chris buckle up and takes them on a tour of the ideas behind his amazing new inference platform. It enables AI developers to build, optimize, and deploy blazing-fast deep learning models on any hardware. Don’t blink or you’ll miss it!
In this episode, Peter Wang from Anaconda joins us again to go over their latest “State of Data Science” survey. The updated results include insights into data science work during COVID, along with other topics such as AutoML and model bias. Peter also tells us a bit about the exciting new partnership between Anaconda and Pyston (a fork of the standard CPython interpreter that has been extensively enhanced to improve the execution performance of most Python programs).
We’re back with another Fully Connected episode: Daniel and Chris dive into a series of articles called ‘A New AI Lexicon’ that collectively explore narratives, positionalities, and understandings that offer alternatives to the better-known and widely circulated ways of talking about AI. The fun begins early as they debate ‘An Electric Brain’ with strong opinions and consider viewpoints that aren’t always popular.
SLICED is like the TV show Chopped, but for data science. Competitors get a never-before-seen dataset and two hours to code a solution to a prediction challenge. Meg and Nick, the SLICED hosts, join us in this episode to discuss how the show is creating a much-needed data science community. They give us a behind-the-scenes look at the datasets, memes, contestants, scores, and chat of SLICED.
AI is being used to transform the most personal instrument we have, our voice, into something that can be “played.” This is fascinating in and of itself, but Yotam Mann from Never Before Heard Sounds is doing so much more! In this episode, he describes how he is using neural nets to process audio in real time for musicians and how AI is poised to change the music industry forever.
Inspired by a recent article from Erik Bernhardsson titled “Building a data team at a mid-stage startup: a short story”, Chris and Daniel discuss all things AI/data team building. They share stories from their experiences kick-starting AI efforts at various organizations and weigh the pros and cons of things like centralized data management, prototype development, and a focus on engineering skills.
Nine out of ten AI projects don’t end up creating value in production. Why? At least partly because these projects rely on unstable models and drifting data. In this episode, Roey from BeyondMinds gives us some insights on how to filter garbage input, detect risky output, and generally develop more robust AI systems.
How did we get from symbolic AI to deep learning models that help you write code (e.g., Copilot, the new tool from GitHub and OpenAI)? That’s what Chris and Daniel discuss in this episode about the history and future of deep learning, with some help from an article recently published by the ACM and written by the luminaries of deep learning.
Pinecone is the first vector database for machine learning. Edo Liberty explains to Chris how vector similarity search works and what advantages it has over traditional database approaches for machine learning. Vector similarity search makes it possible to search billions of vector embeddings for similar matches in milliseconds, and Pinecone is a managed service that puts this capability at the fingertips of machine learning practitioners.
William Falcon wants AI practitioners to spend more time on model development and less time on engineering. PyTorch Lightning is a lightweight PyTorch wrapper for high-performance AI research that lets you train on multiple GPUs, TPUs, or CPUs, and even in 16-bit precision, without changing your code! In this episode, we dig deep into Lightning, how it works, and what it is enabling. William also discusses the Grid AI platform (built on top of PyTorch Lightning), which lets you seamlessly train hundreds of machine learning models in the cloud from your laptop.
Chris and Daniel sit down to chat about some exciting new AI developments, including wav2vec-U (an unsupervised speech recognition model) and Meta Learning (a new book about “How To Learn Deep Learning And Thrive In The Digital World”). Along the way, they discuss engineering skills for AI developers and strategies for launching AI initiatives in established companies.
Tuhin Srivastava tells Daniel and Chris why BaseTen is the application development toolkit for data scientists. BaseTen’s goal is to make it simple to serve machine learning models, write custom business logic around them, and expose those through API endpoints without configuring any infrastructure.
90% of AI/ML applications never make it to market because fine-tuning models for maximum performance across disparate ML software solutions and hardware backends requires a ton of manual labor and is cost-prohibitive. Luis Ceze and his team created Apache TVM at the University of Washington, then left to found OctoML and bring the project to market.
To say that Jeff Adams is a trailblazer when it comes to speech technology is an understatement. Among many other notable accomplishments, his team at Amazon developed the Echo, Dash, and Fire TV, changing our perception of how we could interact with devices in our homes. Jeff now leads Cobalt Speech and Language, and he was kind enough to join us for a discussion about human-computer interaction, multimodal AI tasks, the history of language modeling, and AI for social good.
Smart home data is complicated. There are all kinds of devices, deployed in many different combinations, geographies, and configurations. This complicated data situation is further exacerbated during a pandemic, when time series data seems to be full of anomalies. Evan Welbourne joins us to discuss how Amazon is synthesizing this disparate data into functionality for the next generation of smart homes. He discusses the challenges of working with smart home technology and describes how his team developed their latest feature, called “hunches.”
Ro Gupta from CARMERA teaches Daniel and Chris all about road intelligence. CARMERA maintains the maps that move the world, from HD maps for automated driving to consumer maps for human navigation.
Nhung Ho joins Daniel and Chris to discuss how data science creates insights into financial operations and economic conditions. They delve into topics ranging from predictive forecasting that aids small businesses to the economic fallout from the COVID-19 pandemic.
Dave Lacey takes Daniel and Chris on a journey that connects the user interfaces we already know, TensorFlow and PyTorch, with the layers that connect to the underlying hardware. Along the way, we learn about the Poplar Graph Framework Software. If you are the type of practitioner who values ‘under the hood’ knowledge, then this is the episode for you.
Nikola Mrkšić, CEO & Co-Founder of PolyAI, takes Daniel and Chris on a deep dive into conversational AI, describing the underlying technologies, and teaching them about the next generation of voice assistants that will be capable of handling true human-level conversations. It’s an episode you’ll be talking about for a long time!