Practical AI #150

From notebooks to Netflix scale with Metaflow

As you start developing an AI/ML-based solution, you quickly figure out that you need to run workflows. Not only that, you might need to run those workflows across various kinds of infrastructure (including GPUs) at scale. Ville Tuulos developed Metaflow while working at Netflix to help data scientists scale their work. In this episode, Ville tells us a bit more about Metaflow, his new book on data science infrastructure, and his approach to helping scale ML/AI work.

Practical AI #147

Anaconda + Pyston and more

In this episode, Peter Wang from Anaconda joins us again to go over their latest “State of Data Science” survey. The updated results include some insights related to data science work during COVID along with other topics including AutoML and model bias. Peter also tells us a bit about the exciting new partnership between Anaconda and Pyston (a fork of the standard CPython interpreter which has been extensively enhanced to improve the execution performance of most Python programs).

AI (Artificial Intelligence)

Jina – build search-as-a-service powered by deep learning in just minutes

Jina calls itself a “cloud-native neural search framework”. What is neural search, exactly?

The core idea of neural search is to leverage state-of-the-art deep neural networks to build every component of a search system. In short, neural search is deep neural network-powered information retrieval. In academia, it’s often called neural IR.

And what can it do for you?

Thanks to recent advances in deep neural networks, a neural search system can go way beyond simple text search. It enables advanced intelligence on all kinds of unstructured data, such as images, audio, video, PDF, 3D mesh, you name it.

For example, retrieving animation according to some beats; finding the best-fit memes according to some jokes; scanning a table with your iPhone’s LiDAR camera and finding similar furniture at IKEA. Neural search systems enable what traditional search can’t: multi/cross-modal data retrieval.

This project looks quite established and collaborative. 172 contributors and counting…

The Verge

OpenAI Codex translates English into code

Codex is a descendant of GPT-3 – its training data contains both natural language and billions of lines of source code from publicly available sources, including code in public GitHub repositories.

“We see this as a tool to multiply programmers,” OpenAI’s CTO and co-founder Greg Brockman told The Verge. “Programming has two parts to it: you have ‘think hard about a problem and try to understand it,’ and ‘map those small pieces to existing code, whether it’s a library, a function, or an API.’” The second part is tedious, he says, but it’s what Codex is best at. “It takes people who are already programmers and removes the drudge work.”

Practical AI #146

Exploring a new AI lexicon

We’re back with another Fully Connected episode – Daniel and Chris dive into a series of articles called ‘A New AI Lexicon’ that collectively explore narratives, positionalities, and understandings that offer alternatives to the better-known and widely circulated ways of talking about AI. The fun begins early as they discuss and debate ‘An Electric Brain’ with strong opinions, and consider viewpoints that aren’t always popular.

Mozilla

Mozilla Common Voice adds 16 new languages and 4,600 new hours of speech

That’s a big addition. Here’s what Hillary Juma (Common Voice’s community manager) had to say about it:

Internet access is increasingly mediated through speech: Voice assistants and smart speakers give us directions, search for information, connect us to friends, used in assistive technology and much more. Yet this technology doesn’t work for millions of people. For example, neither Amazon’s Alexa, Apple’s Siri, nor Google Home support a single native African language.

By giving individuals the ability to share their speech, we can help ensure all communities have access to voice technology and the opportunity it unlocks.

What a great initiative! (I first heard about Common Voice on Practical AI.)

Practical AI #142

Building a data team

Inspired by a recent article from Erik Bernhardsson titled “Building a data team at a mid-stage startup: a short story”, Chris and Daniel discuss all things AI/data team building. They share some stories from their experiences kick-starting AI efforts at various organizations and weigh the pros and cons of things like centralized data management, prototype development, and a focus on engineering skills.

Chip Huyen

A free book on how to survive the machine learning interview process

Chip Huyen has been on both sides of ML-related interviews and has a lot of expertise in the process:

If you’ve picked up this book because you’re interested in working with one of the key emerging technologies of the 2020s but not sure where to start, you’re in the right place. Whether you want to become an ML engineer, a platform engineer, a research scientist, or you want to do ML but don’t yet know the differences among those titles, I hope that this book will give you some useful pointers.

Practical AI #139

Vector databases for machine learning

Pinecone is the first vector database for machine learning. Edo Liberty explains to Chris how vector similarity search works, and its advantages over traditional database approaches for machine learning. It enables one to search through billions of vector embeddings for similar matches, in milliseconds, and Pinecone is a managed service that puts this capability at the fingertips of machine learning practitioners.
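Pinecone’s internals aren’t described here, but the core operation behind vector similarity search can be illustrated with exact cosine similarity over stored embeddings. The following is a minimal, brute-force sketch using only the Python standard library; the class name `InMemoryVectorIndex`, its `upsert`/`query` methods, and the toy 3-dimensional “embeddings” are hypothetical and are not Pinecone’s API.

```python
import heapq
import math


class InMemoryVectorIndex:
    """Hypothetical toy index: exact (brute-force) cosine-similarity search."""

    def __init__(self, dim):
        self.dim = dim
        self._vectors = {}  # item id -> unit-length vector

    def _normalize(self, vec):
        if len(vec) != self.dim:
            raise ValueError(f"expected {self.dim} dimensions, got {len(vec)}")
        norm = math.sqrt(sum(x * x for x in vec))
        if norm == 0:
            raise ValueError("zero vectors have no direction to compare")
        return [x / norm for x in vec]

    def upsert(self, item_id, vec):
        # Insert or overwrite. Storing unit vectors turns cosine similarity
        # into a plain dot product at query time.
        self._vectors[item_id] = self._normalize(vec)

    def query(self, vec, top_k=3):
        q = self._normalize(vec)
        scored = (
            (sum(a * b for a, b in zip(q, v)), item_id)
            for item_id, v in self._vectors.items()
        )
        # nlargest keeps a heap of size top_k: O(n log k) instead of a full sort.
        best = heapq.nlargest(top_k, scored)
        return [(item_id, score) for score, item_id in best]


if __name__ == "__main__":
    # Made-up 3-d vectors standing in for real model embeddings.
    index = InMemoryVectorIndex(dim=3)
    index.upsert("cat", [0.9, 0.1, 0.0])
    index.upsert("dog", [0.8, 0.3, 0.0])
    index.upsert("car", [0.0, 0.2, 0.9])
    for item_id, score in index.query([1.0, 0.2, 0.0], top_k=2):
        print(item_id, round(score, 3))
```

Here the query vector lands closest to “cat”, then “dog”, while “car” points in a different direction and drops out of the top two. A brute-force scan costs O(n·d) per query, which is why systems searching billions of embeddings in milliseconds typically rely on approximate nearest-neighbor indexes instead; the scoring idea stays the same.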

Facebook Engineering

A data augmentations library for audio, image, text, and video

AugLy is a great library to utilize for augmenting your data in model training, or to evaluate the robustness gaps of your model! We designed AugLy to include many specific data augmentations that users perform in real life on internet platforms like Facebook’s – for example making an image into a meme, overlaying text/emojis on images/videos, reposting a screenshot from social media. While AugLy contains more generic data augmentations as well, it will be particularly useful to you if you’re working on a problem like copy detection, hate speech detection, or copyright infringement where these “internet user” types of data augmentations are prevalent.

Practical AI #138

Multi-GPU training is hard (without PyTorch Lightning)

William Falcon wants AI practitioners to spend more time on model development, and less time on engineering. PyTorch Lightning is a lightweight PyTorch wrapper for high-performance AI research that lets you train on multiple GPUs, TPUs, CPUs, and even in 16-bit precision without changing your code! In this episode, we dig deep into Lightning, how it works, and what it is enabling. William also discusses the Grid AI platform (built on top of PyTorch Lightning). This platform lets you seamlessly train hundreds of machine learning models in the cloud from your laptop.

Practical AI #137

Learning to learn deep learning 📖

Chris and Daniel sit down to chat about some exciting new AI developments including wav2vec-u (an unsupervised speech recognition model) and meta-learning (a new book about “How To Learn Deep Learning And Thrive In The Digital World”). Along the way they discuss engineering skills for AI developers and strategies for launching AI initiatives in established companies.

Command line interface

Command-line tools for speech and intent recognition on Linux

This isn’t merely a speech-to-text thing. It also provides intent recognition, which makes it great for doing voice commands. For example, when trained with this template, the following command:

$ voice2json transcribe-wav \
      < turn-on-the-light.wav | \
      voice2json recognize-intent | \
      jq .

Produces this JSON event:

    {
        "text": "turn on the light",
        "intent": {
            "name": "LightState"
        },
        "slots": {
            "state": "on"
        }
    }

And it can be retrained quickly enough to do it at runtime. Cool stuff!
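To act on these events from a script, something has to read the JSON and route it by intent name. Below is one minimal way to do that in Python, assuming `voice2json recognize-intent` emits one JSON object per line before `jq .` pretty-prints it. The `set_light` handler, the `HANDLERS` table, and the script name `handle_intent.py` are hypothetical.

```python
import json
import sys


def set_light(state):
    # Hypothetical handler: replace with a call to your home-automation API.
    return f"light -> {state}"


# Maps intent names (as defined in your voice2json template) to handlers
# that receive the event's "slots" dictionary.
HANDLERS = {
    "LightState": lambda slots: set_light(slots["state"]),
}


def dispatch(event):
    """Route one recognize-intent JSON event to its handler, if any."""
    name = event.get("intent", {}).get("name")
    handler = HANDLERS.get(name)
    if handler is None:
        return None  # unrecognized or empty intent
    return handler(event.get("slots", {}))


if __name__ == "__main__":
    # Assumption: one JSON event per line on stdin (JSON Lines).
    for line in sys.stdin:
        line = line.strip()
        if line:
            print(dispatch(json.loads(line)))
```

You would then replace `jq .` with the script: `voice2json transcribe-wav < turn-on-the-light.wav | voice2json recognize-intent | python3 handle_intent.py`. Keeping the routing in a dictionary means each new voice command is one more entry keyed by its intent name.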

Practical AI #135

Elixir meets machine learning

Today we’re sharing a special crossover episode from The Changelog podcast here on Practical AI. Recently, Daniel Whitenack joined Jerod Santo to talk with José Valim, Elixir creator, about Numerical Elixir. This is José’s newest project that’s bringing Elixir into the world of machine learning. They discuss why José chose this as his next direction, the team’s layered approach, influences and collaborators on this effort, and their awesome collaborative notebook that’s built on Phoenix LiveView.

