Come hang with the bad boys of natural language processing (NLP)! Jack Morris joins Daniel and Chris to talk about TextAttack, a Python framework for adversarial attacks, data augmentation, and model training in NLP. TextAttack will improve your understanding of your NLP models, so come prepared to rumble with your own adversarial attacks!
Sash Rush, of Cornell Tech and Hugging Face, catches us up on all the things happening with Hugging Face and transformers. Last time we had Clem from Hugging Face on the show (episode 35), their transformers library wasn’t even a thing yet. Oh how things have changed! This time Sasha tells us all about Hugging Face’s open source NLP work, gives us an intro to the key components of transformers, and shares his perspective on the future of AI research conferences.
So many AI developers are coming up with creative, useful COVID-19 applications during this time of crisis. Among those are Timo from Deepset-AI and Tony from Intel. They are working on a question answering system for pandemic-related questions called COVID-QA. In this episode, they describe the system, related annotation of the CORD-19 data set, and ways that you can contribute!
Catherine Breslin of Cobalt joins Daniel and Chris to do a deep dive on speech recognition. She also discusses how the technology is integrated into virtual assistants (like Alexa) and is used in other non-assistant contexts (like transcription and captioning). Along the way, she teaches us how to assemble a lexicon, acoustic model, and language model to bring speech recognition to life.
Expanding AI technology to the local languages of emerging markets presents huge challenges. Good data is scarce or non-existent. Users often have bandwidth or connectivity issues. Existing platforms target only a small number of high-resource languages.
Our own Daniel Whitenack (data scientist at SIL International) and Dan Jeffries (from Pachyderm) discuss how these and related problems will only be solved when AI technology and resources from industry are combined with linguistic expertise from those on the ground working with local language communities. They have illustrated this approach as they work on pushing voice technology into emerging markets.
Daniel and Chris explore Semantic Scholar with Doug Raymond of the Allen Institute for Artificial Intelligence. Semantic Scholar is an AI-backed search engine that uses machine learning, natural language processing, and machine vision to surface relevant information from scientific papers.
SpaCy is awesome for NLP! It’s easy to use, has widespread adoption, is open source, and integrates the latest language models. Ines Montani and Matthew Honnibal (core developers of spaCy and co-founders of Explosion) join us to discuss the history of the project, its capabilities, and the latest trends in NLP. We also dig into the practicalities of taking NLP workflows to production. You don’t want to miss this episode!