
AI (Artificial Intelligence)

Machines simulating human characteristics and intelligence.

OpenAI

OpenAI introduces Whisper (open source speech recognition)

They’re really putting the Open in OpenAI with this one…

Whisper is an automatic speech recognition (ASR) system trained on 680,000 hours of multilingual and multitask supervised data collected from the web. We show that the use of such a large and diverse dataset leads to improved robustness to accents, background noise and technical language. Moreover, it enables transcription in multiple languages, as well as translation from those languages into English. We are open-sourcing models and inference code to serve as a foundation for building useful applications and for further research on robust speech processing.
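If you want to give it a spin yourself, the open-sourced package exposes a small Python API. A minimal sketch (assumes `pip install openai-whisper` and ffmpeg on the PATH; the model name and audio path are placeholders):

```python
def whisper_task(translate: bool) -> str:
    # Whisper supports two tasks: same-language transcription,
    # or translation of the speech into English.
    return "translate" if translate else "transcribe"

def transcribe(audio_path: str, translate: bool = False) -> str:
    # Deferred import: requires `pip install openai-whisper` plus ffmpeg.
    import whisper
    model = whisper.load_model("base")  # "tiny" through "large" trade speed for accuracy
    result = model.transcribe(audio_path, task=whisper_task(translate))
    return result["text"]
```

So a German Changelog episode would be `transcribe("episode.mp3")`, and the English version of it `transcribe("episode.mp3", translate=True)`.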

We might need to give this a spin on our transcripts. Who knows, maybe our next big innovation could be The Changelog in German, French, Spanish, etc!

Matt Bilyeu

Responding to recruiter emails with GPT-3

Like many software engineers, Matt Bilyeu receives multiple emails from recruiters weekly. And, because he’s polite (and for other reasons), he tries to respond (politely) to all of them. But…

It would be ideal if I could automate sending these responses. Assuming I get four such emails per week and that it takes two minutes to read and respond to each one, automating this would save me about seven hours of administrative work per year.

Enter the GPT-3 API and some code destined for a cron job (now that he’s tested it on a handful of emails). Matt auto-responds to all the emails, stays polite, and saves (his) time. It’s AI Matt responding the way real Matt would.
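The shape of the idea fits in a few lines. The prompt wording and parameters below are my own guesses, not Matt’s actual code; it assumes the 2022-era `openai` Python client and an `OPENAI_API_KEY` in the environment:

```python
def build_prompt(recruiter_email: str) -> str:
    # Hypothetical prompt template -- Matt's real prompt carries his own context and tone.
    return (
        "Write a brief, polite reply to the recruiter email below, "
        "thanking them for reaching out and declining the opportunity.\n\n"
        f"Email:\n{recruiter_email}\n\nReply:"
    )

def draft_reply(recruiter_email: str) -> str:
    # Deferred import: requires `pip install openai` and an API key.
    import openai
    response = openai.Completion.create(
        model="text-davinci-002",
        prompt=build_prompt(recruiter_email),
        max_tokens=200,
        temperature=0.7,
    )
    return response["choices"][0]["text"].strip()
```

Wire `draft_reply` up to an inbox filter and a cron schedule and you have the seven hours a year back.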

Simon Willison

Stable Diffusion is a really big deal

Simon Willison explains what it is:

Stable Diffusion is a new “text-to-image diffusion model” that was released to the public by Stability AI six days ago, on August 22nd.

It’s similar to models like OpenAI’s DALL-E, but with one crucial difference: they released the whole thing.

And why it’s a really big deal:

In just a few days, there has been an explosion of innovation around it. The things people are building are absolutely astonishing.

He then details some of the innovation and it is staggering, to say the least. Open FTW!

AI (Artificial Intelligence)

The AI art apocalypse

Alexander Wales:

This image was created by an AI, MidJourney. All I had to do was type in a prompt (“wildfire”) and aspect ratio. This AI is pretty good, but nowhere near the state of the art, and AI like it are, over the next few years, going to make art like this available within seconds at a cost of pennies. This applies not just to “art” like the above, which is going to accompany my prose and worldbuilding projects, but to almost every area of life where you see pictures of any kind. I think it’s hard to understate how big of a deal this will end up being, and this blog post is largely my attempt to collate a lot of the arguments under one roof, in part because some of the arguments aren’t actually arguments at all.

Microsoft News

Microsoft's new AI for Beginners course

A 12-week, 24-course curriculum covering:

  • Different approaches to Artificial Intelligence, including the “good old” symbolic approach with Knowledge Representation and reasoning (GOFAI).
  • Neural Networks and Deep Learning, which are at the core of modern AI. We will illustrate the concepts behind these important topics using code in two of the most popular frameworks - TensorFlow and PyTorch.
  • Neural Architectures for working with images and text. We will cover recent models but may lack a little bit on the state-of-the-art.
  • Less popular AI approaches, such as Genetic Algorithms and Multi-Agent Systems.

AI (Artificial Intelligence)

A human-in-the-loop workflow for creating HD images from text

DALL-E can generate some amazing results, but we’re still in a phase of AI’s progress where having humans involved in the process is just better. Here’s how the authors of this workflow explain it:

Generative art is a creative process. While recent advances of DALL·E unleash people’s creativity, having a single-prompt-single-output UX/UI locks the imagination to a single possibility, which is bad no matter how fine this single result is. DALL·E Flow is an alternative to the one-liner, by formalizing the generative art as an iterative procedure.



The Deepfake Offensive Toolkit

dot (aka Deepfake Offensive Toolkit) makes real-time, controllable deepfakes ready for virtual camera injection. dot is created for performing penetration testing against e.g. identity verification and video conferencing systems, for use by security analysts, Red Team members, and biometrics researchers.

What’s crazy is dot deepfakes don’t require any additional training. 🤯



Imagen (Google's text-to-image neural net) implemented in Pytorch

Last week I logged the very impressive Imagen project, which smarter people than me have said is the SOTA for text-to-image synthesis. Now a WIP implementation is just a pip install imagen-pytorch away.

Architecturally, it is actually much simpler than DALL-E2. It consists of a cascading DDPM conditioned on text embeddings from a large pretrained T5 model (attention network). It also contains dynamic clipping for improved classifier free guidance, noise level conditioning, and a memory efficient unet design.
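The “classifier free guidance” mentioned there combines a conditional and an unconditional noise prediction; a toy numeric sketch of that idea and of Imagen-style dynamic clipping (illustrative only, not imagen-pytorch’s API):

```python
def classifier_free_guidance(eps_uncond, eps_cond, w):
    # eps_hat = eps_uncond + w * (eps_cond - eps_uncond)
    # w = 1 recovers the conditional prediction; w > 1 pushes harder toward the prompt.
    return [u + w * (c - u) for u, c in zip(eps_uncond, eps_cond)]

def dynamic_threshold(x0, percentile=0.995):
    # "Dynamic clipping": clamp predicted pixel values to a high percentile of
    # their absolute values, then rescale, to avoid saturation at large w.
    mags = sorted(abs(v) for v in x0)
    s = max(1.0, mags[int(percentile * (len(mags) - 1))])
    return [max(-s, min(s, v)) / s for v in x0]
```

In the real model these operate on image-shaped tensors at every denoising step; the lists here just keep the arithmetic visible.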

Google

A text-to-image diffusion model with an unprecedented degree of photorealism

Google researchers are giving DALL-E a run for its money:

Our key discovery is that generic large language models (e.g. T5), pretrained on text-only corpora, are surprisingly effective at encoding text for image synthesis: increasing the size of the language model in Imagen boosts both sample fidelity and image-text alignment much more than increasing the size of the image diffusion model.


AI (Artificial Intelligence)

What would it take for artificial intelligence to make real progress?

Gary Marcus makes the case that deep learning has hit a wall:

“Let me start by saying a few things that seem obvious,” Geoffrey Hinton, “Godfather” of deep learning, and one of the most celebrated scientists of our time, told a leading AI conference in Toronto in 2016. “If you work as a radiologist you’re like the coyote that’s already over the edge of the cliff but hasn’t looked down.” Deep learning is so well-suited to reading images from MRIs and CT scans, he reasoned, that people should “stop training radiologists now” and that it’s “just completely obvious within five years deep learning is going to do better.”

Fast forward to 2022, and not a single radiologist has been replaced.

But he doesn’t stop there. After laying out multiple examples of deep learning failures, he changes tone:

For the first time in 40 years, I finally feel some optimism about AI.

Read the article to find out why that is.

Chip Huyen

Real-time machine learning: challenges and solutions

Chip Huyen:

In the last year, I’ve talked to ~30 companies in different industries about their challenges with real-time machine learning. I’ve also worked with quite a few to find the solutions. This post outlines the solutions for (1) online prediction and (2) continual learning, with step-by-step use cases, considerations, and technologies required for each level.

AI (Artificial Intelligence)

A 280 billion parameter language model named Gopher

In the quest to explore language models and develop new ones, we trained a series of transformer language models of different sizes, ranging from 44 million parameters to 280 billion parameters.

Our research investigated the strengths and weaknesses of those different-sized models, highlighting areas where increasing the scale of a model continues to boost performance – for example, in areas like reading comprehension, fact-checking, and the identification of toxic language. We also surface results where model scale does not significantly improve results — for instance, in logical reasoning and common-sense tasks.

Sometimes size matters, sometimes it doesn’t as much. Fascinating analysis.


Machine Learning

Boring machine learning is where it's at

It surprises me that when people think of “software that brings about the singularity” they think of text models, or of RL agents. But they sneer at decision tree boosting and the like as boring algorithms for boring problems.

To me, this seems counter-intuitive, and the fact that most people researching ML are interested in subjects like vision and language is flabbergasting. For one, because getting anywhere productive in these fields is really hard, for another, because their usefulness seems relatively minimal.

AI (Artificial Intelligence)

Jina – build search-as-a-service powered by deep learning in just minutes

Jina calls itself a “cloud-native neural search framework”. What is neural search, exactly?

The core idea of neural search is to leverage state-of-the-art deep neural networks to build every component of a search system. In short, neural search is deep neural network-powered information retrieval. In academia, it’s often called neural IR.

And what can it do for you?

Thanks to recent advances in deep neural networks, a neural search system can go way beyond simple text search. It enables advanced intelligence on all kinds of unstructured data, such as images, audio, video, PDF, 3D mesh, you name it.

For example, retrieving animation according to some beats; finding the best-fit memes according to some jokes; scanning a table with your iPhone’s LiDAR camera and finding similar furniture at IKEA. Neural search systems enable what traditional search can’t: multi/cross-modal data retrieval.
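The embed-and-rank core of any neural search system fits in a few lines. Here a toy bag-of-words counter stands in for the deep neural encoder a framework like Jina would actually use (which is what lets it index images, audio, and meshes, not just text):

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy "embedding": word counts. A real neural search system swaps in a deep
    # encoder so that all modalities land in the same vector space.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def search(query: str, docs: list, k: int = 2) -> list:
    # Rank every document by similarity to the query embedding, return the top k.
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]
```

Swap `embed` for a neural model and add an index so you aren’t scoring every document per query, and you have the skeleton of neural IR.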

This project looks quite established and collaborative. 172 contributors and counting…

The Verge

OpenAI Codex translates English into code

Codex is a descendant of GPT-3 – its training data contains both natural language and billions of lines of source code from publicly available sources, including code in public GitHub repositories.

“We see this as a tool to multiply programmers,” OpenAI’s CTO and co-founder Greg Brockman told The Verge. “Programming has two parts to it: you have ‘think hard about a problem and try to understand it,’ and ‘map those small pieces to existing code, whether it’s a library, a function, or an API.’” The second part is tedious, he says, but it’s what Codex is best at. “It takes people who are already programmers and removes the drudge work.”

Mozilla

Mozilla Common Voice adds 16 new languages and 4,600 new hours of speech

That’s a big addition. Here’s what Hillary Juma (Common Voice’s community manager) had to say about it:

Internet access is increasingly mediated through speech: Voice assistants and smart speakers give us directions, search for information, connect us to friends, are used in assistive technology, and much more. Yet this technology doesn’t work for millions of people. For example, neither Amazon’s Alexa, Apple’s Siri, nor Google Home support a single native African language.

By giving individuals the ability to share their speech, we can help ensure all communities have access to voice technology and the opportunity it unlocks.

What a great initiative! (I first heard about Common Voice on Practical AI.)


Free Software Foundations declares GitHub Copilot "unacceptable and unjust"

The FSF is funding white papers on “philosophical and legal questions around Copilot”. In their post announcing the fund, Donald Robertson states:

The Free Software Foundation has received numerous inquiries about our position on these questions. We can see that Copilot’s use of freely licensed software has many implications for an incredibly large portion of the free software community. Developers want to know whether training a neural network on their software can really be considered fair use. Others who may be interested in using Copilot wonder if the code snippets and other elements copied from GitHub-hosted repositories could result in copyright infringement. And even if everything might be legally copacetic, activists wonder if there isn’t something fundamentally unfair about a proprietary software company building a service off their work.

One thing is for sure: there are many open questions that need answering. How we (as a community / industry) go about answering them is much less clear. But it’ll probably take place on blogs, forums, GitHub Issues, and even in courtrooms over the next decade.
