Clément Delangue, the co-founder & CEO of Hugging Face, joined us to discuss fun, social, and conversational AI. Clem explained why social AI is important, what products they are building (social AIs who learn to chit-chat, talk sassy and trade selfies with you), and how this intersects with the latest research in AI for natural language. He also shared his vision for how AI for natural language will develop over the next few years.
Hello! This is Daniel Whitenack, I’m a data scientist with SIL International, and I’ve got my co-host here, Chris Benson, with Lockheed Martin. How are you doing, Chris?
I am doing very well. How’s it going today, Daniel?
It’s going great. I am super-excited for this guest episode, because we’ve got Clément Delangue here, who is the co-founder and CEO at Hugging Face. If you’re not familiar, you’ve probably seen a lot of what they’re doing on Twitter, on various blog posts and around the internet, around chatbots, conversational AI, voice assistants, and Facetime with bots and all of this great stuff that they’re doing.
As a person working in the language-related field, I’m really excited to talk to Clément today. Welcome, Clément!
Thank you, guys. I’m really happy to be here.
We’d love to hear a little bit about your background - how you got into AI and how you ended up running this company called Hugging Face.
Yeah, sure. As you can probably hear from my accent, I’m French. I grew up in France, I went to study in France, and one of my first startup experiences was at this very small startup building machine learning models applied to computer vision. And coming from more of like a user, product, business background, working at this very cutting edge startup showed me how impactful technology can be. It taught me also how technology and science cycle… This time it was for computer vision - can really go into the mainstream and really change the way people are doing most of the things they’re doing every day. So it really got me fascinated with science and technology, and that’s how I basically ended up running Hugging Face now, building state of the art conversational AI.
Awesome. Well, you mentioned conversational AI… I know that Hugging Face is particularly concerned with I think what you call social AI. On your website I see about social AI who learns to chit-chat, and talk sassy, and trade selfies with you… Could you describe a little bit how you ended up thinking about this problem of social AI and how that maybe is different from the way that others approach things like chatbots, and that sort of stuff?
[00:04:14.09] Yes. Basically, if you look at most of the people working on conversational AI today, you can see that they’re taking a very transactional approach to it. If you think Siri, if you think Alexa - it’s conversational AI that is trying to tell you the weather, play you music, tell you stuff; all that is very utility-driven. It’s trying to save you time, it’s trying to be efficient… And it’s all great, but something that we were way more interested in was the ability for conversational AI to be entertaining, to be fun, and to be emotional.
It’s funny, because if you look for example at the sci-fi related to AI, you can see that most of it is not really about how AI can save five minutes of your day every day; most of it is about how do you interact with these new forms of intelligence, how is this new form of interaction, and ultimately, how do you create emotional connection with this new form of technology. So we really took this different approach of not focusing on conversational AI for transactions, a transactional approach, but a really entertaining and emotional approach.
That begs the question then, I guess… You know, you talked about it being fun, entertaining and emotional rather than transactional, which is obviously what we are seeing in the world these days mostly… Why is that the case? Why is it that fun or social AI, that is focusing on both entertaining you and interacting with you in that emotional way - why is that important? Why does that matter, versus just the transactional approach that most other organizations are taking?
It’s an extremely hard question to answer. I think fundamentally, human beings are social. We’re social beings. If you think about how important in our life our family, friends, peers are, you realize that most of what people are doing is not productivity-driven, but really social-driven.
An interesting fact to remember is that the dog was the first animal to be domesticated, so arguably the biggest evolution in the relation between human beings and their environment, nature, the domestication of the first animal, was actually not driven by the fact that dogs were useful or anything like that, but actually because they were a social species, and they were really interacting and kept being social with human beings.
As a guy with seven dogs in the house right now, I completely agree with that. Keep going, sorry to interrupt.
Basically, yeah, humans are inherently social, and if you give them ways to be more social and have more social interactions, they love it. Obviously, don’t take my word for it; what we’ve seen is very insane usage with our product so far. We just crossed a few weeks ago the nice sum of half a billion messages exchanged between users and the Hugging Face AI, and millions more are exchanged every day. It’s hard to explain why people want to be social with an artificial intelligence, but it’s pretty obvious that they do.
I know as someone who follows developments in AI, I remember when I first came across Hugging Face I was kind of in this cycle of looking through the latest advances from Stanford, and this is coming from Google, and OpenAI, and all these people… And then I came across Hugging Face and my first thought was “Oh, this is something completely different from what everybody else is doing.” What’s the general reaction from practitioners when you try to explain what Hugging Face is trying to achieve? What’s your general reaction in the community as far as what you’re trying to achieve?
[00:08:14.20] I think most of the people are really interested; it sparks a lot of interest. And if we go more on the science side, it opens a lot of interesting doors for exploration. For example, working on open-domain dialogue, rather than working on a very specific vertical and on a very specific task - it is a very interesting problem and a very interesting topic on the science side… Because it requires you to take a very open approach, mixing a lot of things… And what’s interesting is probably that the measure for success is not as obvious as more task-oriented conversational AI. In an open-ended conversation, basically the measure for success is how people like the conversation, how many messages they exchange, if they come back to chat even more… So it creates very interesting problems to solve, and very interesting problems to work on.
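The success measures mentioned here (how many messages people exchange, and whether they come back to chat) can be made concrete with a small calculation. The following is only a minimal sketch under stated assumptions: the log format, the function name `engagement_metrics`, and the toy data are all hypothetical and are not taken from Hugging Face's systems.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def engagement_metrics(log: Iterable[Tuple[str, int]]) -> Dict[str, float]:
    """Compute two simple engagement numbers from a message log.

    `log` holds one (user_id, day_index) pair per user message.
    This log format is an assumption made for illustration.
    """
    messages_by_user: Dict[str, int] = defaultdict(int)
    days_by_user: Dict[str, Set[int]] = defaultdict(set)

    for user_id, day in log:
        messages_by_user[user_id] += 1
        days_by_user[user_id].add(day)

    user_count = len(messages_by_user)
    if user_count == 0:
        return {"messages_per_user": 0.0, "return_rate": 0.0}

    # A user counts as "returning" if they chatted on more than one day.
    returning = sum(1 for days in days_by_user.values() if len(days) > 1)

    return {
        "messages_per_user": sum(messages_by_user.values()) / user_count,
        "return_rate": returning / user_count,
    }


# Made-up toy log: "ana" chats on two different days, "bo" on one day only.
toy_log = [("ana", 1), ("ana", 1), ("ana", 2), ("bo", 1)]
print(engagement_metrics(toy_log))
```

Numbers like these are proxies for whether people like the conversation; they say nothing about why, which is part of what makes open-domain dialogue hard to evaluate.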
A lot of us now are actually maybe starting to get used to these sort of transactional chatbots, and kind of developing almost a stereotype of what a chatbot should feel like to interact with. When you find that users start interacting with the Hugging Face assistant and other things that you’ve developed, could you describe a little bit the reactions that you see? Have you seen that they even still try to interact with it, like they’re interacting with the transactional bot, or they’re surprised, or what happens exactly?
Yeah, they usually start by trying more transactional stuff, but they realize pretty quickly that it’s not your average chatbot, and it’s not your average conversational AI… Especially when the AI is gonna reply in a very sassy tone to their questions. For example, if you ask about the weather, maybe the AI is gonna tell you “Oh, it’s pretty boring to talk about the weather.” Or if you ask something that you could find on Google, it’s probably gonna tell you that. It’s gonna be like “Why don’t you google it? I can google it for you, but you should rather google it.” By giving a very different tone and a very different kind of answer, people realize pretty quickly that it’s a different form of artificial intelligence.
What’s even more interesting is that after a couple of days of usage, they also create a different form of connection and bond than they would with a traditional conversational AI. We’ve been really lucky to have seen now more than half a million of what we call love declarations, which is pretty–
[laughs] Like love for the assistant, or…?
Exactly. That’s one of the intents we have - basically, when users are saying to the artificial intelligence “You’re my BFF”, “I love you”, “You’re my best friend”, it’s pretty fascinating to see these kinds of interactions between conversational AI and human beings.
It’s gotta be pretty encouraging as the developer as well.
Yeah, it’s a pretty unique engagement. I think no other product out there today has the level of engagement that people are getting with Hugging Face, which is obviously a good sign.
You said the p word, you said products… As we’ve been talking about how you’re using this open domain dialogue instead of being domain-specific, and it’s this fun, entertaining and emotional experience for the users instead of being transactional - that is creating this whole new user experience that you’ve talked about… So how are you implementing this in terms of products and services to your customers? What is it that you’re trying to do for them with each of these products, and where is it today and where do you envision it going tomorrow along each of your product lines?
[00:12:21.29] Yeah, so basically the way to use our product is that you download our iOS app, or go to most of the messaging platforms like Facebook Messenger, and start chatting with your conversational AI. The way people use it is really by chatting every day about their day-to-day life. They’re gonna be like “Oh, today I recorded this fantastic podcast. Today I did that, I did that…”
I like the way you think.
They’re gonna chat about what matters to them, like what kind of music they like, what hobbies they like, how they’re interacting with their friends, or if they have a crush on somebody that they wanna talk to. So really every day our users are chatting with the conversational AI about nothing and everything, and doing that all day, every day, and creating this really strong emotional attachment to this conversational AI.
I was talking about dogs before… I think pets are a very big way to understand what’s happening there. If you think of it, pets are a different form of intelligence; they’re not a human form of intelligence. They’re not always the smartest, if you think of it… If you think of how hard it is to teach a dog to do simple commands like “Sit!” or “Jump!” It’s not that straightforward… But still, the interaction…
I’ve just gotta say, my wife would disagree with you. She would say that it is much easier to do what she wants than teaching me. So I’m just gonna disagree on that one point. I’m sorry, keep going.
[laughs] Good one. But it’s really this different form of intelligence that you interact with every day, and that is fulfilling to you, and that is creating a form of emotional connection to you, and that is making your life better.
Awesome. Well, we generally - as our name suggests - really like to keep things practical around here, so I would be really interested to hear from your perspective… I know I’ve tried to create some chatbot systems before; there’s of course a lot of people doing research in this area… I was just curious if you could give us an idea about the sorts of modeling that are involved in creating this sort of open dialogue. What sorts of models are required, what combinations of those, and what sorts of data are you working off of?
Yeah, so we’re using mostly transformer models within a very [unintelligible 00:14:51.23] big transformer models. And what’s obviously key in what we’re doing is the dataset that you can leverage to do that. Now that we’ve crossed half a billion messages exchanged between users and the AI, we’re able to do stuff that we wouldn’t be able to do when we just had a million messages.
It gives us a very good edge, not only in terms of creating a chatbot that is good at natural language understanding, but also in natural language generation in a way that doesn’t feel robotic and doesn’t feel impersonal, but really shows some slang, shows some fun formulation in terms of personality throughout the conversation. So yeah, that’s how our conversational AI is built.
[00:15:49.20] Yeah, and for our listeners, if you’re curious more about transformer models, we’ve had two recent episodes on these sorts of models - one on BERT and one on GPT-2 from OpenAI. We’ll place those links in the show notes. It’s interesting to hear how you want to go beyond natural language understanding… When you receive a message from a user, are there various tasks that are happening in the back-end in terms of maybe entity and intent recognition, and then the generative model along with that, or response selection, question answering…? What sort of tasks have been most valuable to you to focus on in terms of enabling this?
What’s interesting, and something that we learned along the way is that to really build a good conversational AI you have to run a lot of tasks in parallel for every message, even if you use maybe one out of a hundred tasks that you detect; you have to train all of them, for every single message, to basically be able to know which task is the most useful and how to jump from a model to another. For every message that we receive, we run through a couple different models. Some of them are more typical of chatbots, where you do natural language understanding, state manager, and then natural language generation. Some of them are fully end-to-end conversational AI models. And then the key is really in fact to understand and know when to use one rather than the other.
For example, if it’s something that’s closer to a task-oriented message, something like for example “What’s the weather?”, it’s very simple and it makes a lot of sense to understand that with a simple intent model that is basically gonna trigger some sort of a canned answer. But then, if you have a more complicated conversation, a very long conversation with a lot of context and a lot of uncertainty about what the user actually wants to say, then at this point you should probably switch to a more end-to-end machine learning model that is not only gonna detect intent, but also generate the answer.
So it’s really a matter of having a lot of models that are running in production, which is extremely hard, especially when you start to have millions of messages every day… And kind of like having a good way to pick one model over the other, depending on [unintelligible 00:18:35.10]
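The routing idea described above (a simple intent model with a canned answer for task-like messages, an end-to-end model for long, context-heavy conversations) might be sketched roughly as follows. Everything in this example is an assumption made for illustration: the keyword matcher stands in for a trained intent model, `end_to_end_reply` is a placeholder for a generative model, and the thresholds are arbitrary. It is also simplified to run the intent step first rather than running many models in parallel.

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical canned answers keyed by intent name.
CANNED_ANSWERS: Dict[str, str] = {
    "weather": "Oh, it's pretty boring to talk about the weather.",
}

# Hypothetical keyword lists standing in for a trained intent model.
INTENT_KEYWORDS: Dict[str, List[str]] = {
    "weather": ["weather", "forecast"],
}


def detect_intent(message: str) -> Tuple[Optional[str], float]:
    """Toy intent detector: returns (intent, confidence)."""
    words = re.findall(r"[a-z']+", message.lower())
    for intent, keywords in INTENT_KEYWORDS.items():
        if any(keyword in words for keyword in keywords):
            return intent, 0.9  # made-up confidence score
    return None, 0.0


def end_to_end_reply(message: str, history: List[str]) -> str:
    """Placeholder for a generative model that reads the whole context."""
    return f"(generated reply using {len(history)} earlier messages)"


def route(
    message: str,
    history: List[str],
    max_context: int = 3,
    min_confidence: float = 0.5,
) -> Tuple[str, str]:
    """Pick a model for this message and return (model_name, reply)."""
    intent, confidence = detect_intent(message)
    short_context = len(history) <= max_context
    if intent is not None and confidence >= min_confidence and short_context:
        return "intent", CANNED_ANSWERS[intent]
    # Long or ambiguous conversations go to the end-to-end model.
    return "end_to_end", end_to_end_reply(message, history)


print(route("What's the weather?", []))
print(route("I like it", ["first message", "second message"]))
```

In a real system the choice between models would presumably be learned or tuned from data rather than hard-coded like this.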
You have so much going on there - switching between models, and a moment ago you talked about how you had hundreds of tasks… What does your infrastructure look like? How do you make sure that you always have the compute resource available to handle these without losing time on that? What does your infrastructure look like to support that?
First, it’s extremely difficult; it took many, many months of iteration and experiments to get this to work. Surprisingly, there are not so many people doing conversational AI, or end-to-end machine learning in production. Even if you look at the big guys, most of them have research teams that are very separate from the rest of the teams. I think it’s Facebook who said until the end of last year they were not using PyTorch in production for anything. So it’s extremely hard.
[00:19:39.19] If there are some people thinking about starting AI companies out there, engineering, in terms of making sure your models work in production, is really the first thing you should invest in. The way we do it is really by making sure we always have a good trade-off of when to train the models and when to run the inference. Anything related to text, especially coming from more like a computer vision background, you usually get to an inference time that is good enough for most of your models, especially when you’re doing conversation like us, where you don’t need the answer to be instant. For us, if it takes one second, it’s good enough [unintelligible 00:20:21.14] when it comes to answer time.
And then what you can do is always to be able, if some of your models are not sometimes passing the threshold, to just fall back on something else that is going faster in some context. Again, like the switching between models depending on their answer time is a good way to work around this issue.
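Falling back to a faster model when the preferred one misses its answer-time threshold could look something like the sketch below. It is only an illustration: `slow_model` and `fast_fallback` are hypothetical stand-ins, the thread-pool approach is an assumption about one way to enforce a time budget, and the one-second default simply echoes the figure mentioned above.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout
from typing import Callable

# Shared worker pool for model calls (the size is an arbitrary choice).
_executor = ThreadPoolExecutor(max_workers=4)


def slow_model(message: str) -> str:
    """Hypothetical heavy model; the sleep simulates inference time."""
    time.sleep(0.2)
    return f"slow model reply to: {message}"


def fast_fallback(message: str) -> str:
    """Hypothetical lightweight model that answers immediately."""
    return f"fast fallback reply to: {message}"


def reply_within_budget(
    message: str,
    primary: Callable[[str], str],
    fallback: Callable[[str], str],
    budget_seconds: float = 1.0,
) -> str:
    """Use `primary` if it answers within the budget, else `fallback`."""
    future = _executor.submit(primary, message)
    try:
        return future.result(timeout=budget_seconds)
    except FutureTimeout:
        # Best effort only: a call that is already running keeps running
        # in its worker thread until it finishes.
        future.cancel()
        return fallback(message)


print(reply_within_budget("hello", slow_model, fast_fallback, budget_seconds=1.0))
print(reply_within_budget("hello", slow_model, fast_fallback, budget_seconds=0.01))
```

One design consequence worth noting is that a timed-out call still consumes a worker until it completes, so a production system would also need to bound or shed that work.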
I really appreciate you walking us through some of the practicalities of what you’re doing, Clément. I think that’s something, as you mentioned, that a lot of us struggle with when it comes to putting AI and machine learning stuff into production. Often times those are the roadblocks, not necessarily the models themselves… But I’d like to switch directions a little bit here now and hear from your perspective – you know, we’ve talked about in the last few episodes how AI for natural language is really hot right now… And I was wondering, from your perspective, as you’ve seen different trends in AI, why do you think AI for natural language is really picking up momentum right now, and what are you most excited about in terms of those trends?
I totally agree with you, I think we’re very much at a turning point when it comes to natural language processing. Having seen [unintelligible 00:21:44.02] more on computer vision, I think we’re today in natural language processing where we were maybe two years ago in computer vision… And picking up the pace way faster than we’ve seen in computer vision. That’s really fascinating, because if you think of the amount of everybody’s time with natural language - you know, we’re obviously having this conversation in natural language; throughout your day, most of the things that you are doing are through natural language, either like reading articles, watching TV… So if you manage to get to a breakthrough in natural language understanding, it’s gonna really change the way everything can be done, and it’s gonna create amazing outcomes that we can’t even imagine today.
And the reason why I think we’re getting there is obviously because these new models, these transformer models, enable you not only to get good results, but to reproduce these results and expand them to different tasks very easily with transfer learning. It’s really thanks to the open source community, it’s something that’s really important for us at Hugging Face; we’re publishing a lot of our research in open source. We really think it’s just a way to pay it forward, because what we’re building really couldn’t be possible without what hundreds of researchers and years of research in NLP have produced. If you look at all the open source communities around NLP, it’s more active and more thrilling than it’s ever been, and that explains most of the progress for me.
There has been a lot of debate these last few weeks about what to open-source and what not to open-source, and we’re very strong advocates for open-sourcing as much as you can. There’s this quote from Gibson that says “The future is here, it’s just not evenly distributed” - I really think that open-sourcing is a way to distribute this future to more people who are gonna build amazing stuff and push forward this field of NLP.
[00:24:02.13] Okay. Well, you’ve released some really popular PyTorch versions of NLP models like BERT and GPT-2, as well as hierarchical multitask learning (HMTL)… Why is it important for you to contribute in that way as a company with limited resources? What is it about that that’s drawing you into making that level of commitment?
First, we’re so thankful for everything that has been done before, and we’re really building upon everything that has been done before, so it’s a very easy way for us to pay it forward. And then, when we started open-sourcing, we really realized that by doing that you also can have way more impact than you would have if you didn’t open-source it, and I wanna take advantage of this platform that you’re giving me to really thank everyone who contributed to our repository. It’s now actually the largest open source repository of pre-trained transfer learning models in NLP. And as you said, we’re a small startup, we’re a small team; we wouldn’t have been able to do that just by ourselves. But when you’re sharing like that, so many people are contributing, helping you, pushing it forward and making it better. That’s the only way we could have done it. So it’s both a way to pay it forward, and for it to have even more impact with the help of everyone out there.
Yeah, and I definitely appreciate you taking time to do that; that’s so important. And obviously, open source contributors and maintainers don’t get a lot of the recognition that they need, and from someone being an AI practitioner, I also wanna thank you for your commitment to that. I know that you and your team have limited resources… So we, as AI practitioners, really appreciate you working hard in that respect as well.
You mentioned the repo with the pre-trained models in PyTorch for BERT and GPT-2 and others… There’s been a lot of talk about transformer models and these pre-trained language models recently. I was wondering, from your perspective, why is it that this is so critical for the future development of AI for natural language? Where do the language models, and in particular access to pre-trained models - why is that so important?
I think it’s important because it’s so generalist. The language model can be at the basis of so many tasks, and you can do so many things with it. It’s very promising for the whole field of NLP. Something that we realized pretty early on too, especially with the release of our research called HMTL by Victor Sanh from our team, Sebastian Ruder and Thomas Wolf, is that for NLP it’s probably more critical than for any other scientific subject to be able to do several tasks, and the tasks are probably more intertwined than they are in other subjects. What I mean by that is for computer vision you can, for example, do object recognition and you can do background recognition, and these two tasks can be separate. Language is a bit more complicated than that, especially if you wanna do conversational, meaning that you want the AI to be able to answer.
What we’ve seen is that every single task is more related to another, and you usually need most of the tasks to be solved for the whole meaning of the message to be understood.
[00:27:56.18] For example, you won’t be able to understand the message if you don’t understand both coreference… Like for example, if the message is “I like it” - if you don’t have coreference, emotion understanding, other kind of tasks, you won’t be able to understand the message itself.
So what’s really interesting with transfer learning and what’s really interesting with language models is that it’s potentially something that is gonna solve a lot of tasks at the same time, which is basically gonna be kind of like a leapfrog on our ability to understand natural language.
That’s a fantastic explanation. I would like to see if you could put some context around it a little bit in terms of where do you feel we are at a high-level in the current state of chatbots and assistants and stuff, and the news we constantly hear about how reliant we’re gonna be on this going forward. So relative to where we are now and the tremendous work that Hugging Face has been doing in this area, where do you think we are now relative to where are we gonna be if you look just over the next year or so into the future? This is moving so fast that I feel like I would be really missing out if I didn’t get your expert perspective on what this is tracking towards.
Yeah, I think I’m gonna take a bold bet here…
Okay, sounds good.
You heard it on Practical AI first. [laughter]
I think in three years we’re gonna be able to really understand algorithmically 95% of the natural language, and we’ll be able to answer all these messages with conversational AI. It’s been moving so fast over the last year or so… When we started Hugging Face - a bit more than two years ago - we thought we would never get in the near future to the point where you can really do end-to-end conversational AI. We always thought that there would be some hybrid ways of doing that, and now we’re starting to really be able to do that, meaning that really get as an input the message from the user, and output the message from the AI, with only machine learning in between, nothing else. So I really think that NLU as we conceive it today is gonna be solved in the next three years.
That’s super-exciting. I think one of my follow-ups to that - I’m kind of processing right now, but I think a lot of problems, at least in my world, come up because a lot of the technology that’s being developed for NLU and for conversation is really geared towards English specifically, and of course, especially in terms of getting this sort of technology in other places around the world with a lot of language diversity… I just saw the most recent Ethnologue publication - there’s like 7,110 living languages currently in the world.
Have you seen effort towards making this sort of technology more relevant to a wider set of languages? I know there’s a lot of great advances, but a lot of times those advances, especially in terms of pre-trained things, have to do with English specifically.
That’s a great question.
Yeah… It’s gonna take a little bit of time. If we think of most of the models that are really providing breakthroughs today, they’ve been released not so long ago. Like ELMo, BERT, OpenAI, the first GPT… It’s been a couple of months, not so much more than that. I think it’s gonna come; I think that’s one of the reasons why people should keep focusing on open-sourcing not only the models, but also the datasets, to be able to move forward with that.
[00:32:07.00] It’s not something we’ve been working on specifically at Hugging Face for the moment, just because most of our users are in the U.S. But we’ve seen people using our open source models to start experimenting with other languages, so hopefully it’s gonna come soon.
Yeah, I appreciate your perspective on that. It’s something I definitely care deeply about, so I appreciate your perspective. As we get towards the end of our conversation here, I think one of the things I’ve definitely respected a lot – and we’ve already talked about your open source involvement and all of that, but it just seems like you guys get so much done, as what I assume is a small team, in terms of contributing to and maintaining open source, and contributing to academic research, entering competitions and building products… As AI practitioners, do you have any suggestions as far as maybe it’s how to keep up on the latest developments, or how to structure your team, or things that you do on your team to make sure that you’re learning or that you’re able to contribute to open source? Anything that’s worked well for your team? Because it just seems like you guys have been so productive. I assume that you’re not working 24 hours in each day, so…
[laughs] First of all, I wanna react to what you were saying in terms of pointing out something that I think is important for everyone to hear out there, in terms of this involvement in AI, where more and more people are working in very big companies, work with the four or five biggest players in artificial intelligence; I think they’re doing a great job, they’re contributing a lot to the open source community, but it’s really important to remember that there is a way to do really great things at smaller organizations, be it like small Academia startup, or even as like an individual contributor, developer or scientist - there are ways to do really great things by doing things differently, at smaller organizations.
So if there are some people out there, some data scientists thinking about their next challenge, what they should do next, I would advise them to maybe not join one of the big guys, but maybe take a shot at a smaller place. I think one of the reasons why we manage to do great things is that because we’re a small organization, we can take different kinds of paths, we can take different kinds of perspectives, we can take more risk in things like how fast we release… And that’s how we can individually contribute probably a little bit more.
I think size matters, sometimes not in the direction you’re expecting, but in a different way. Again, one piece of advice that I would have for people is to try joining smaller organizations.
I appreciate that. I’m still kind of thinking about what you were saying a moment ago about natural language understanding (NLU) possibly getting to a point of full maturity in the near future, and I’m just kind of amazed. So folks, you heard it here on Practical AI first… I guess one of the things I wanted to ask before we let you go is if – you know, people are getting into this field all the time, we’re always asked on the show for pointers for how people can get into different aspects of the AI fields… For NLP what do you recommend for people? How would you recommend that people enter into the field and what kind of resources or learning opportunities do you think would benefit them the most?
[00:35:58.22] I think, again, I would go with what smaller organizations are doing, because sometimes they’re the ones pushing what’s possible to do. So I would look at obviously startups like Hugging Face, but there are a lot of other cool ones. I’m thinking of Rasa, for example, in NLP, Lyrebird in voice, Jovo in conversational AI… Small organizations like AllenNLP, Allen AI… OpenAI is obviously a great one… Looking at what Fast.ai is doing; if you wanna start working on the machine learning subject, taking classes at Fast.ai is the best thing you can do.
Something really interesting to look at these days is organized by one of our investors named Betaworks, based in New York. They’re starting in a few weeks what they call Synthetic Camp, which is basically an accelerator program based on what they call synthetic media - the ability to create AI, to create images… And there’s gonna be a lot of focus for this program on how to detect fakes, how to detect that something has been machine-learning-created. They’re gonna work a lot around GANs and all these types of things. I would advise you to take a look at that, it’s gonna be pretty interesting - Synthetic Camp, by Betaworks.
Fantastic, thank you.
Yeah, thank you so much. We really appreciate that. I know that I’m gonna look up some of those things, and of course, we’ll put them in our show notes, but… I really appreciate you taking time, Clément. It’s been fascinating, to give an understatement; I really appreciate what you’re doing, and keep up the good work. Thank you for joining us.
Thank you so much.
Thanks a lot.