Data Science

Practical AI #150

From notebooks to Netflix scale with MetaFlow

As you start developing an AI/ML-based solution, you quickly figure out that you need to run workflows. Not only that, you might need to run those workflows across various kinds of infrastructure (including GPUs) at scale. Ville Tuulos developed MetaFlow while working at Netflix to help data scientists scale their work. In this episode, Ville tells us a bit more about MetaFlow, his new book on data science infrastructure, and his approach to helping scale ML/AI work.

Practical AI #147

Anaconda + Pyston and more

In this episode, Peter Wang from Anaconda joins us again to go over their latest “State of Data Science” survey. The updated results include some insights related to data science work during COVID along with other topics including AutoML and model bias. Peter also tells us a bit about the exciting new partnership between Anaconda and Pyston (a fork of the standard CPython interpreter which has been extensively enhanced to improve the execution performance of most Python programs).

Practical AI #146

Exploring a new AI lexicon

We’re back with another Fully Connected episode – Daniel and Chris dive into a series of articles called ‘A New AI Lexicon’ that collectively explore alternative narratives, positionalities, and understandings to the better-known and widely circulated ways of talking about AI. The fun begins early as they discuss and debate ‘An Electric Brain’ with strong opinions, and consider viewpoints that aren’t always popular.

Practical AI #144

SLICED - will you make the (data science) cut?

SLICED is like the TV show Chopped, but for data science. Competitors get a never-before-seen dataset and two hours to code a solution to a prediction challenge. Meg and Nick, the SLICED show hosts, join us in this episode to discuss how the show is creating a much-needed data science community. They give us a behind-the-scenes look at all the datasets, memes, contestants, scores, and chat of SLICED.

Practical AI #142

Building a data team

Inspired by a recent article from Erik Bernhardsson titled “Building a data team at a mid-stage startup: a short story”, Chris and Daniel discuss all things AI/data team building. They share some stories from their experiences kick-starting AI efforts at various organizations and weigh the pros and cons of things like centralized data management, prototype development, and a focus on engineering skills.

Practical AI #139

Vector databases for machine learning

Pinecone is the first vector database for machine learning. Edo Liberty explains to Chris how vector similarity search works and what advantages it has over traditional database approaches for machine learning. It lets you search through billions of vector embeddings for similar matches in milliseconds. Pinecone is a managed service that puts this capability at the fingertips of machine learning practitioners.
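For a rough feel of what vector similarity search means, here is a minimal brute-force sketch in plain Python. It is not how Pinecone works internally (that is not described here); the toy three-dimensional embeddings, the `index` dictionary, and the `top_k` helper are all made up for illustration. Searching billions of vectors in milliseconds generally calls for approximate nearest-neighbor indexes rather than a full scan like this one.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def top_k(query, index, k=2):
    """Score every stored vector against the query and return the k best ids."""
    scored = [(cosine_similarity(query, vec), item_id) for item_id, vec in index.items()]
    scored.sort(reverse=True)
    return [(item_id, score) for score, item_id in scored[:k]]


# Hypothetical toy embeddings; real ones usually have hundreds of dimensions.
index = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.8, 0.2, 0.1],
    "car": [0.0, 0.1, 0.9],
}

results = top_k([1.0, 0.0, 0.0], index, k=2)
print(results)
```

The query vector points in roughly the same direction as "cat" and "kitten", so those two rank above "car".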

Practical AI #138

Multi-GPU training is hard (without PyTorch Lightning)

William Falcon wants AI practitioners to spend more time on model development and less time on engineering. PyTorch Lightning is a lightweight PyTorch wrapper for high-performance AI research that lets you train on multiple GPUs, TPUs, CPUs, and even in 16-bit precision without changing your code! In this episode, we dig deep into Lightning, how it works, and what it is enabling. William also discusses the Grid AI platform (built on top of PyTorch Lightning). This platform lets you seamlessly train hundreds of machine learning models in the cloud from your laptop.

Practical AI #137

Learning to learn deep learning 📖

Chris and Daniel sit down to chat about some exciting new AI developments including wav2vec-u (an unsupervised speech recognition model) and meta-learning (a new book about “How To Learn Deep Learning And Thrive In The Digital World”). Along the way they discuss engineering skills for AI developers and strategies for launching AI initiatives in established companies.

Lj Miranda ljvmiranda921.github.io

How to improve software engineering skills as a researcher

In which Lj Miranda proposes an exercise that data scientists can do to learn relevant software skills (with a tangible output in the end).

Create a machine learning application that receives HTTP requests, then deploy it as a containerized app.

I’m willing to wager that this is a worthy goal even if you’re coming from the software engineering side of the spectrum. Don’t worry, he’ll walk you through the steps.
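To give a sense of the scale of that exercise, here is a minimal sketch using only Python's standard library. It is not taken from Lj's post: the "model" is a hypothetical hard-coded linear scorer, and the names `predict`, `PredictHandler`, and `serve` are placeholders. A real version would load a trained model and would likely use a web framework.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical stand-in for a trained model: a fixed linear scorer.
WEIGHTS = [0.5, -0.25]
BIAS = 0.1


def predict(features):
    """Return BIAS plus the weighted sum of the input features."""
    return BIAS + sum(w * x for w, x in zip(WEIGHTS, features))


class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read and validate a JSON body shaped like {"features": [1.0, 2.0]}.
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length))
            features = [float(x) for x in payload["features"]]
            if len(features) != len(WEIGHTS):
                raise ValueError("wrong number of features")
        except (ValueError, KeyError, TypeError):
            self.send_error(400, "expected a JSON body with a 'features' list")
            return
        body = json.dumps({"prediction": predict(features)}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Start the server; bind to 0.0.0.0 so it is reachable inside a container."""
    HTTPServer(("0.0.0.0", port), PredictHandler).serve_forever()
```

Calling `serve()` starts the app locally. The containerization half of the exercise (writing a Dockerfile that copies this file and runs it) is left out here.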

Practical AI #127

Women in Data Science (WiDS)

Chris has the privilege of talking with Stanford Professor Margot Gerritsen, who co-leads the Women in Data Science (WiDS) Worldwide Initiative. This is a conversation that everyone should listen to. Professor Gerritsen’s profound insights into how we can all help the women in our lives succeed, in data science and in life, make this a ‘must listen’ episode for everyone, regardless of gender.

Practical AI #122

The AI doc will see you now

Elad Walach of Aidoc joins Chris to talk about the use of AI for medical imaging interpretation. Starting with the world’s largest annotated training data set of medical images, Aidoc is the radiologist’s best friend, helping the doctor interpret imagery faster and more accurately while improving the imaging workflow along the way. Elad’s vision for the transformative future of AI in medicine clearly soothes Chris’s concern about managing his aging body in the years to come. ;-)

Career mihaileric.com

We don't need data scientists, we need data engineers

TLDR:

There are 70% more open roles at companies in data engineering as compared to data science. As we train the next generation of data and machine learning practitioners, let’s place more emphasis on engineering skills.

This vibes with what I’ve been hearing on Practical AI lately. Organizations are facing big challenges when it comes to deploying, maintaining, and improving data processing tools and platforms in production settings. Big challenges produce big opportunities. And what does a data engineer do? According to this article:

Develops a robust and scalable set of data processing tools/platforms. Must be comfortable with SQL/NoSQL database wrangling and building/maintaining ETL pipelines.

If you have that skillset, you are in high demand today. And if you can adapt that skillset and be considered an ML engineer, you will be in high demand for a long, long time.
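For readers who have not built one, an ETL (extract, transform, load) pipeline can be sketched in a few lines. This is a toy illustration, not something from the article: the inline CSV, the `payments` table, and the `extract`/`transform`/`load` function names are all invented, and an in-memory SQLite database stands in for a real warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical raw input; a real pipeline would read from files, APIs, or queues.
RAW_CSV = """user_id,amount
1,10.50
2,not_a_number
1,4.50
"""


def extract(text):
    """Parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Cast fields to proper types and drop rows that fail to parse."""
    clean = []
    for row in rows:
        try:
            clean.append((int(row["user_id"]), float(row["amount"])))
        except ValueError:
            continue  # skip malformed records instead of failing the whole run
    return clean


def load(records, conn):
    """Write cleaned records into a SQL table."""
    conn.execute("CREATE TABLE IF NOT EXISTS payments (user_id INTEGER, amount REAL)")
    conn.executemany("INSERT INTO payments VALUES (?, ?)", records)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)

totals = conn.execute(
    "SELECT user_id, SUM(amount) FROM payments GROUP BY user_id ORDER BY user_id"
).fetchall()
print(totals)
```

The malformed row for user 2 is dropped during the transform step, so only user 1 appears in the totals.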
