
Natural Language Processing

Natural language processing (NLP) is the study of how computers process, understand, and generate human language.
46 Stories

AI (Artificial Intelligence) github.com

Kern AI's refinery is a data-centric IDE for NLP

It’s like the data-centric sibling of your favorite programming environment. It provides an easy-to-use interface for weak supervision as well as extensive data management, neural search, and monitoring to ensure that the quality of your training data is as good as possible.

This won’t rid you of the need to manually label, but it’ll save you time in the process!
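Weak supervision is where most of that time saving comes from. The general idea: write cheap heuristic "labeling functions" that each vote on an unlabeled example (or abstain), then combine the votes. The minimal sketch below assumes a hypothetical complaint-vs-praise task, hypothetical labeling functions, and a plain majority vote. It is not refinery's API; more sophisticated tools often weight each labeling function by its estimated accuracy instead of counting votes equally.

```python
from collections import Counter
from typing import Callable, List, Optional

# A labeling function returns a class label, or None to abstain.
LabelingFunction = Callable[[str], Optional[str]]


# Hypothetical labeling functions for a hypothetical "complaint vs. praise" task.
def lf_mentions_refund(text: str) -> Optional[str]:
    return "complaint" if "refund" in text.lower() else None


def lf_mentions_broken(text: str) -> Optional[str]:
    return "complaint" if "broken" in text.lower() else None


def lf_says_thanks(text: str) -> Optional[str]:
    return "praise" if "thank" in text.lower() else None


LFS: List[LabelingFunction] = [lf_mentions_refund, lf_mentions_broken, lf_says_thanks]


def weak_label(text: str, lfs: List[LabelingFunction]) -> Optional[str]:
    """Majority vote over the labeling functions that did not abstain.

    Returns None when every function abstains or the top labels tie,
    leaving those examples for manual labeling.
    """
    votes = [label for label in (lf(text) for lf in lfs) if label is not None]
    if not votes:
        return None
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # tie: let a human decide
    return ranked[0][0]


if __name__ == "__main__":
    for sample in [
        "I want a refund, the screen arrived broken",
        "Thanks, works great",
        "Shipping was fast",
    ]:
        print(repr(weak_label(sample, LFS)), "<-", sample)
```

Returning None on ties and abstentions reflects the point above: weak supervision narrows down what still needs a human rather than eliminating manual labeling.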


Python github.com

An open source, online reverse dictionary

This is the first time I’ve heard of a reverse dictionary, but now that I have… so cool!

Opposite to a regular (forward) dictionary that provides definitions for query words, a reverse dictionary returns words semantically matching the query descriptions.

Ever had a word on the tip of your tongue and you Just. Can’t. Think of it?! Reverse dictionary!
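At its core, a reverse dictionary ranks candidate words by how well their definitions match a free-text description. The sketch below shows that ranking step under simplifying assumptions: a tiny hypothetical dictionary and plain bag-of-words cosine similarity. That is a deliberate stand-in, not the linked project's method. Because it only rewards exact shared words ("sleep" will not match "asleep"), practical systems rely on learned semantic representations instead.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical toy dictionary; a real reverse dictionary covers a full lexicon.
DICTIONARY: Dict[str, str] = {
    "insomnia": "inability to fall asleep or stay asleep at night",
    "nostalgia": "a sentimental longing for the past",
    "procrastinate": "to delay or postpone doing something you should do",
    "umbrella": "a folding canopy that protects you from rain",
}

# Small illustrative stopword list so function words do not dominate the score.
STOPWORDS = {"a", "an", "the", "to", "or", "of", "for", "at", "you", "from", "that"}


def tokenize(text: str) -> List[str]:
    """Lowercase, keep alphabetic runs, drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def reverse_lookup(description: str, top_k: int = 3) -> List[Tuple[str, float]]:
    """Rank dictionary words by how closely their definitions match the description."""
    query = Counter(tokenize(description))
    scored = [
        (word, cosine(query, Counter(tokenize(definition))))
        for word, definition in DICTIONARY.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    for word, score in reverse_lookup("when you cannot stay asleep at night"):
        print(f"{word:15} {score:.3f}")
```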


Mozilla

Mozilla Common Voice adds 16 new languages and 4,600 new hours of speech

That’s a big addition. Here’s what Hillary Juma (Common Voice’s community manager) had to say about it:

Internet access is increasingly mediated through speech: Voice assistants and smart speakers give us directions, search for information, connect us to friends, used in assistive technology and much more. Yet this technology doesn’t work for millions of people. For example, neither Amazon’s Alexa, Apple’s Siri, nor Google Home support a single native African language.

By giving individuals the ability to share their speech, we can help ensure all communities have access to voice technology and the opportunity it unlocks.

What a great initiative! (I first heard about Common Voice on Practical AI.)

Tooling github.com

Search inside YouTube videos using natural language

Use OpenAI’s CLIP neural network to search inside YouTube videos. You can try it by running the notebook on Google Colab.

The README has a bunch of examples of things you might search for and the results you’d get back. (“The Transamerica Pyramid”, anyone?)

The author also has another related project where you can search Unsplash in like manner.
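Both projects rest on the same idea: CLIP maps images and text into a shared embedding space, so a search can embed the query text and rank pre-embedded images (here, sampled video frames) by cosine similarity. The ranking step can be sketched as below. The embeddings are hypothetical 3-dimensional placeholders (real CLIP vectors come from the model and have hundreds of dimensions), and `search_frames` is an illustrative name, not a function from the linked notebook. Frame sampling and the CLIP model calls are assumed to have already happened.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def normalize(v: Vector) -> List[float]:
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else [0.0 for _ in v]


def search_frames(
    query_embedding: Vector,
    frame_embeddings: List[Tuple[float, Vector]],
    top_k: int = 3,
) -> List[Tuple[float, float]]:
    """Return (timestamp_seconds, similarity) for the frames closest to the query."""
    q = normalize(query_embedding)
    scored = []
    for timestamp, embedding in frame_embeddings:
        f = normalize(embedding)
        scored.append((timestamp, sum(a * b for a, b in zip(q, f))))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    # Hypothetical (timestamp, embedding) pairs standing in for CLIP image embeddings.
    frames = [(0.0, [1.0, 0.0, 0.0]), (5.0, [0.0, 1.0, 0.0]), (10.0, [0.7, 0.7, 0.0])]
    # Hypothetical text embedding standing in for CLIP's encoding of the query.
    query = [0.0, 1.0, 0.1]
    for timestamp, score in search_frames(query, frames):
        print(f"t={timestamp:5.1f}s  similarity={score:.3f}")
```

Embedding every frame once up front means each new query costs only one text encoding plus a similarity scan, which is what makes interactive searching of a whole video practical.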

AI (Artificial Intelligence) github.com

Introducing spaCy 3.0

You may recall spaCy from this episode of Practical AI with its creators. If not, now’s a great time to introduce yourself to the project. 3.0 looks like a fantastic new release of the wildly popular NLP library. The list of new and improved things is too long for me to reproduce here, so go check it out for yourself.

There are also three YouTube videos accompanying the release. That’s evidence of just how much effort and polish went into this.

TechCrunch

Hugging Face raises $15 million to build their open source NLP library 🤗

Congrats to Clément and the Hugging Face team on this milestone!

The company first built a mobile app that let you chat with an artificial BFF, a sort of chatbot for bored teenagers. More recently, the startup released an open-source library for natural language processing applications. And that library has been massively successful.

The library mentioned is called Transformers, which is billed as ‘state-of-the-art Natural Language Processing for TensorFlow 2.0 and PyTorch.’

If any of these things ring a bell, it may be because Practical AI co-host Daniel Whitenack has been a huge supporter of Hugging Face for a long time and mentions them often on the show. We even had Clément on the show back in March of this year.

Practices github.com

Natural Language Processing best practices & examples

The goal of this repository is to build a comprehensive set of tools and examples that leverage recent advances in NLP algorithms, neural architectures, and distributed machine learning systems. The content is based on our past and potential future engagements with customers as well as collaboration with partners, researchers, and the open source community.

Google github.com

Using Google's speech recognition to beat Google's ReCaptcha

A little ingenuity paired with changes to ReCaptcha’s audio challenge allowed this hacker to create a Python ‘robot’ that defeats the ‘not a robot’ test with 90% accuracy. The approach is brilliant:

  1. Navigate to Google’s ReCaptcha Demo site
  2. Navigate to the ReCaptcha audio challenge
  3. Download the audio challenge
  4. Submit the audio challenge to Speech To Text
  5. Parse the response and type the answer
  6. Press submit and check whether it succeeded

The code is small enough to grok in 5-10 minutes. Love it!


TensorFlow cvcompiler.com

An NLP tool for improving dev resumes

CV Compiler is an online resume analysis tool designed exclusively for software engineers.

The review technology scans for keywords from the world of programming and how they are used in the resume, relative to the best practices in the industry.

CV Compiler was built in Python, using the NLTK and spaCy libraries for tokenization, lemmatization, and part-of-speech (POS) tagging.

The internal analysis engine for large datasets (resumes, job descriptions) was built upon a Seq2Seq model in TensorFlow.
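The keyword-scanning idea can be illustrated with a small stdlib-only sketch. It rests on loud assumptions: the keyword vocabulary is hypothetical, a regular expression stands in for the NLTK/spaCy tokenization CV Compiler actually uses, and it only counts keywords and diffs a resume against a job description. It does not judge how keywords are used, and it models nothing like the Seq2Seq engine.

```python
import re
from collections import Counter
from typing import Dict, Set

# Hypothetical keyword vocabulary; a real tool would use a large, curated list.
TECH_KEYWORDS: Set[str] = {
    "python", "tensorflow", "spacy", "nltk", "docker", "kubernetes", "c++", "node.js",
}

# Tokens may contain letters, digits, and + # . so that "c++" and "node.js" survive.
TOKEN_RE = re.compile(r"[a-z0-9+#.]+")


def scan_keywords(text: str) -> Dict[str, int]:
    """Count occurrences of known tech keywords in the text."""
    tokens = TOKEN_RE.findall(text.lower())
    # Drop trailing sentence periods, e.g. "docker." -> "docker".
    tokens = [t.rstrip(".") for t in tokens]
    return dict(Counter(t for t in tokens if t in TECH_KEYWORDS))


def missing_keywords(resume_text: str, job_text: str) -> Set[str]:
    """Keywords the job description mentions that the resume never does."""
    return set(scan_keywords(job_text)) - set(scan_keywords(resume_text))


if __name__ == "__main__":
    resume = "Built NLP pipelines in Python with spaCy and NLTK. Deployed services with Docker."
    job = "Looking for Python and Kubernetes experience; C++ is a plus."
    print(scan_keywords(resume))
    print(missing_keywords(resume, job))
```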
