It’s one thing to gather some labels for your data. It’s another thing to integrate data labeling into your workflows and infrastructure in a scalable, secure, and useful way. Mark from Xelex joins us to talk through some of what he has learned after helping companies scale their data annotation efforts. We get into workflow management, labeling instructions, team dynamics, and quality assessment. This is a super practical episode!
In this Fully-Connected episode, Daniel and Chris discuss concerns of privacy in the face of ever-improving AI / ML technologies. Evaluating AI’s impact on privacy from various angles, they note that ethical AI practitioners and data scientists have an enormous burden, given that much of the general population may not understand the implications of the data privacy decisions of everyday life.
This intentionally thought-provoking conversation advocates consideration and action from each listener when it comes to evaluating how their own activities either protect or violate the privacy of those whom they impact.
As machine learning moves towards real-time, streaming technology is becoming increasingly important for data scientists. Like many people coming from a machine learning background, I used to dread streaming. In our recent survey, almost half of the data scientists we asked said they would like to move from batch prediction to online prediction but can’t because streaming is hard, both technically and operationally…
Over the last year, working with a co-founder who’s super deep into streaming, I’ve learned that streaming can be quite intuitive. This post is an attempt to rephrase what I’ve learned.
Every year Mozilla releases an Internet Health Report that combines research and stories exploring what it means for the internet to be healthy. This year’s report is focused on AI. In this episode, Solana and Bridget from Mozilla join us to discuss the power dynamics of AI and the current state of AI worldwide. They highlight concerning trends in the application of this transformational technology along with positive signs of change.
A common argument against using Nx for a new machine learning project is its perceived lack of a library/support for some common task that is available in Python. In this post, I’ll do my best to highlight areas where this is not the case, and compare and contrast Elixir projects with their Python equivalents. Additionally, I’ll discuss areas where the Elixir ecosystem still comes up short, and using Nx for a new project might not be the best idea.
Sean is a prominent member of the Elixir community, so that’s the perspective on display here, but it’s a thorough and well-reasoned comparison. He concludes:
While there are still many gaps in the Elixir ecosystem, the progress over the last year has been rapid. Almost every library I’ve mentioned in this post is less than two years old, and I suspect there will be many more projects to fill some of the gaps I’ve mentioned in the coming months.
Drausin Wulsin, Director of ML at Immunai, joins Daniel & Chris to talk about the role of AI in immunotherapy, and why it is proving to be the foremost approach in fighting cancer, autoimmune disease, and infectious diseases.
The large amount of high dimensional biological data that is available today, combined with advanced machine learning techniques, creates unique opportunities to push the boundaries of what is possible in biology.
To that end, Immunai has built the largest immune database called AMICA that contains tens of millions of cells. The company uses cutting-edge transfer learning techniques to transfer knowledge across different cell types, studies, and even species.
Hugging Face is increasingly becoming the “hub” of AI innovation. In this episode, Merve Noyan joins us to dive into this hub in more detail. We discuss automation around model cards, reproducibility, and the new community features. If you want to engage with the wider AI community, this is the show for you!
AI is discovering new drugs. Sound like science fiction? Not at Absci! Sean and Joshua join us to discuss their AI-driven pipeline for drug discovery. We discuss the tech along with how it might change how we think about healthcare at the most fundamental level.
In the fourth “AI in Africa” spotlight episode, we welcome Leonida Mutuku and Godliver Owomugisha, two experts in applying advanced technology in agriculture. We had a great discussion about ending poverty, hunger, and inequality in Africa via AI innovation. The discussion touches on open data, relevant models, ethics, and more.
Abubakar Abid joins Daniel and Chris for a tour of Gradio and tells them about the project joining Hugging Face. What’s Gradio? The fastest way to demo your machine learning model with a friendly web interface, allowing non-technical users to access, use, and give feedback on models.
This last week has been a big week for AI news. BigScience is training a huge language model (while the world watches), and NVIDIA announced their latest “Hopper” GPUs. Chris and Daniel discuss these and other topics on this fully connected episode!
What happens when your data operations grow to Internet-scale? How do thousands or millions of data producers and consumers efficiently, effectively, and productively interact with each other? How are varying formats, protocols, security levels, performance criteria, and use-case specific characteristics meshed into one unified data fabric? Chris and Daniel explore these questions in this illuminating and Fully-Connected discussion that brings this new data technology into the light.
From MIT researchers who have an AI system that rapidly predicts how two proteins will attach, to Facebook’s first high-performance self-supervised algorithm that works for speech, vision, and text, Daniel and Chris survey the AI landscape for notable milestones in the application of AI in industry and research.
It’s still in closed beta, but this looks like a really cool environment for data scientists and other folks who code to accomplish other goals vs code as craft. One cool thing you can do is take your Jupyter notebooks and convert them to PyFlow graphs (and vice versa).
In addition to being a Developer Advocate at Hugging Face, Thomas Simonini is building next-gen AI in games that can talk and have smart interactions with the player using Deep Reinforcement Learning (DRL) and Natural Language Processing (NLP). He also created a Deep Reinforcement Learning course that takes a DRL beginner from zero to hero. Natalie and Chris explore what’s involved, and what the implications are, with a focus on the development path of the new AI data scientist.
I love how much hacking has been inspired by Wordle.
The Wordle source code contains 2,315 days of answers (all common 5-letter English words) and 10,657 other valid, less-common 5-letter English words.
We combine these to form a set of 12,972 possible words/answers.
We then simulate playing 1,000 Wordle games for each of these possible words, guessing based on the frequency of the word in the English language and the feedback received.
Then we take three measures to evaluate the observed distribution of ⬛🟨🟩 squares on Twitter according to our valid words.
The resulting code is included in the article.
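The simulation step described above can be sketched roughly as follows. This is not the article's code: it is a minimal illustration that assumes a tiny made-up word list with hypothetical frequency values, and it writes feedback as the letters `G`, `Y`, and `B` in place of the 🟩🟨⬛ squares. The helper names `score` and `play` are invented for this example.

```python
from collections import Counter


def score(guess: str, answer: str) -> str:
    """Return Wordle-style feedback: G = green, Y = yellow, B = black."""
    result = ["B"] * len(guess)
    # Answer letters that were not matched exactly are the only ones
    # that can still earn a yellow square.
    remaining = Counter(a for g, a in zip(guess, answer) if g != a)
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            result[i] = "G"
    for i, g in enumerate(guess):
        if result[i] == "B" and remaining[g] > 0:
            result[i] = "Y"
            remaining[g] -= 1
    return "".join(result)


def play(answer: str, word_freq: dict, max_turns: int = 6) -> list:
    """Play one game, always guessing the most frequent remaining word."""
    candidates = dict(word_freq)
    history = []
    for _ in range(max_turns):
        guess = max(candidates, key=candidates.get)
        feedback = score(guess, answer)
        history.append((guess, feedback))
        if guess == answer:
            break
        # Keep only words that would have produced the same feedback.
        candidates = {
            w: f for w, f in candidates.items() if score(guess, w) == feedback
        }
    return history


# Hypothetical frequencies, for illustration only.
TOY_FREQ = {"about": 0.9, "other": 0.8, "crane": 0.3, "crate": 0.2, "trace": 0.1}

if __name__ == "__main__":
    for guess, feedback in play("crate", TOY_FREQ):
        print(guess, feedback)
```

Running many such games per candidate answer and tallying the feedback rows is one way to build the kind of distribution the article compares against the squares shared on Twitter.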
From drug discovery at the Quebec AI Institute to improving capabilities with low-resourced languages at the Masakhane Research Foundation and Google AI, Bonaventure Dossou looks for opportunities to use his expertise in natural language processing to improve the world - and especially to help his homeland in the Benin Republic in Africa.
You might know about MLPerf, a benchmark from MLCommons that measures how fast systems can train models to a target quality metric. However, MLCommons is working on so much more! David Kanter joins us in this episode to discuss two new speech datasets that are democratizing machine learning for speech via data scale and language/speaker diversity.
We have all seen how AI models fail, sometimes in spectacular ways. Yaron Singer joins us in this episode to discuss model vulnerabilities and automatic prevention of bad outcomes. By separating concerns and creating a “firewall” around your AI models, it’s possible to secure your AI workflows and prevent model failure.
ZenML is an extensible MLOps framework to create production-ready machine learning pipelines. Built for data scientists, it has a simple, flexible syntax, is cloud and tool agnostic, and has interfaces/abstractions that are catered towards ML workflows.
The code base was recently completely rewritten with better abstractions and to set us up for our ongoing growth and inclusion of more integrations with tools that data scientists love to use.
In the second of the “AI in Africa” spotlight episodes, we welcome guests from Radiant Earth to talk about machine learning for earth observation. They give us a glimpse into their amazing data and tooling for working with satellite imagery, and they talk about use cases including crop identification and tropical storm wind speed estimation.
The time has come! OpenAI’s API is now available with no waitlist. Chris and Daniel dig into the API and playground during this episode, and they also discuss some of the latest tools from Hugging Face (including new reinforcement learning environments). Finally, Daniel gives an update on how he is building out infrastructure for a new AI team.
This episode is a follow up to our recent Fully Connected show discussing federated learning. In that previous discussion, we mentioned Flower (a “friendly” federated learning framework). Well, one of the creators of Flower, Daniel Beutel, agreed to join us on the show to discuss the project (and federated learning more broadly)! The result is a really interesting and motivating discussion of ML, privacy, distributed training, and open source AI.
Recently, GitHub released Copilot, which is an amazing AI pair programmer powered by OpenAI’s Codex model. In this episode, Natalie Pistunovich tells us all about Codex and helps us understand where it fits in our development workflow. We also discuss MLOps and how AI is influencing software engineering more generally.
In this Fully-Connected episode, Daniel and Chris ponder whether in-person AI conferences are on the verge of making a post-pandemic comeback. Then on to BigScience from Hugging Face, a year-long research workshop on large multilingual models and datasets. In particular, they dive into T0, a series of natural language processing (NLP) AI models trained specifically for researching zero-shot multitask learning. Daniel provides a brief tour of the possible with the T0 family. They finish up with a couple of new learning resources.