In addition to being a Developer Advocate at Hugging Face, Thomas Simonini is building next-gen AI in games that can talk and have smart interactions with the player using Deep Reinforcement Learning (DRL) and Natural Language Processing (NLP). He also created a Deep Reinforcement Learning course that takes a DRL beginner from zero to hero. Natalie and Chris explore what’s involved, and what the implications are, with a focus on the development path of the new AI data scientist.
Welcome to another episode of the Practical AI podcast. We are the podcast that tries to make artificial intelligence practical, productive and accessible to everyone. My name is Chris Benson, I am one of your co-hosts, and Daniel is actually out this week. He has told me I’m allowed to tell you he’s down with Covid, so we are hoping Daniel feels better, much, and very fast. But for those of you who joined us last week, I have a treat for you. I have an excellent co-host, a proven co-host; you already know her name, it’s Natalie Pistunovich. Thank you for agreeing to come back in while Daniel is struggling through Covid, and co-hosting with me. Welcome back.
Thanks, Chris. It’s always fun to be here.
Fantastic. And to dive right into the show here, we have a guest who is from one of the leading organizations in the AI world; it’s one that we always love to talk about, one of our favorite organizations, and this particular guest has been doing some pretty cool stuff. I know from the pre-show conversation… So I would like to introduce Thomas Simonini. Have I got the last name correct, Thomas?
Welcome. You are a developer advocate at Hugging Face, which is the organization I was talking about, working on deep reinforcement learning, and I know you’re also the founder of a deep reinforcement learning course. But I guess if you, other than that little tiny tidbit that I’ve offered the audience already, if you would tell us a little bit about your background, how you arrived at where you’re at, doing deep reinforcement learning, and how you got there. A lot of people I know ask us out of the episodes, kind of “How did you get there?” because they wanna learn from you… So if you can kind of share with people how you got to where you’re at.
[04:07] Sure. First things first, thanks for inviting me. It’s a vast question. Initially, I have a bachelor degree of law and political sciences, and what’s happened is that I was working in a startup incubator and I was helping people who want to make a startup, to choose the technology behind their website, because I was a full-stack web developer in addition to my studies… And what happened is that I met a lot of people who were working on machine learning, and I didn’t know about that, and I discovered this domain.
After my bachelor degree I quit my studies and I started to self-study deep learning initially. In 2018 I discovered deep reinforcement learning and I fell in love with this domain. At the same time I started to write articles about deep reinforcement learning, and it became a course. It was not supposed to; it was just one article, but it worked really well… And it became a free and open source course.
After that, in 2019 I started to work at Dataiku for 1.5 years, where I worked on implementing deep reinforcement learning agents for real-world applications, dynamic pricing, for instance. Dynamic pricing is when you need to find a price – for instance, if you have an airline ticket and the goal is that you find the best price, to make the most profit… And after that, I quit to work in the video game industry, and finally, I’m working at Hugging Face, to work on implementing deep reinforcement learning in the ecosystem of Hugging Face.
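To make the dynamic pricing idea concrete, here is a minimal sketch that is not from the episode. It uses a simple epsilon-greedy bandit as a stand-in, not the deep reinforcement learning agents Thomas describes building. The function name `learn_price`, the candidate prices, and the purchase probabilities are all hypothetical; a real system would observe actual sales instead of simulating them.

```python
import random


def learn_price(prices, buy_probability, rounds=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy search for the price with the highest average revenue.

    `buy_probability` maps each price to a made-up chance that a customer
    buys at that price; it only exists to simulate feedback here.
    """
    rng = random.Random(seed)
    pulls = {p: 0 for p in prices}
    avg_revenue = {p: 0.0 for p in prices}
    for _ in range(rounds):
        if rng.random() < epsilon:
            # Explore: try a random price.
            price = rng.choice(prices)
        else:
            # Exploit: use the price that looks best so far.
            price = max(prices, key=lambda p: avg_revenue[p])
        # Simulated customer: pays the price or walks away.
        revenue = price if rng.random() < buy_probability[price] else 0.0
        pulls[price] += 1
        # Incremental mean of the revenue observed at this price.
        avg_revenue[price] += (revenue - avg_revenue[price]) / pulls[price]
    return max(prices, key=lambda p: avg_revenue[p])
```

With made-up inputs such as `learn_price([50, 100, 150], {50: 0.9, 100: 0.6, 150: 0.2})`, the expected revenues are 45, 60 and 30, so the search should settle on the middle price rather than the cheapest or the most expensive one.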
So I’m wondering, just because you mentioned it several times at the front, and I know I mentioned it at the beginning as well - there are a few people out there that may have heard of deep reinforcement learning, but they don’t have the background, they haven’t had a chance to dive into it themselves… Could you give us a quick kind of – you know, tell us a little bit about what it is and what’s involved in it, especially to differentiate it from some of the other forms of deep learning?
So to say it simply, deep reinforcement learning is to learn from action. So the idea is that you have an agent that will learn by interacting with its environment. If we take a simple example, you can make an agent that will learn to play Super Mario Bros, so to play video games. And how it does that, it just tries some actions in the environment, so going left or right, and based on the rewards it gets - so for instance if it dies, it’s negative - it will improve its behavior to win the game.
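The loop described above (try an action, receive a reward, improve the behavior) can be sketched in a few lines. This is an illustrative toy, not code from the episode or the course: the five-cell corridor, its rewards, and every name below are assumptions, and it uses a plain lookup table (tabular Q-learning) where deep reinforcement learning would use a neural network.

```python
import random

# Hypothetical toy environment: a corridor of 5 cells. The agent starts in the
# middle; the leftmost cell is a pit (reward -1) and the rightmost cell is the
# goal (reward +1). Reaching either one ends the episode.
N_CELLS = 5
START = 2
ACTIONS = (-1, +1)  # move left, move right


def step(state, action):
    """Apply an action and return (next_state, reward, done)."""
    next_state = state + action
    if next_state == 0:
        return next_state, -1.0, True
    if next_state == N_CELLS - 1:
        return next_state, 1.0, True
    return next_state, 0.0, False


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning: learn action values purely from interaction."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_CELLS) for a in ACTIONS}
    for _ in range(episodes):
        state, done = START, False
        while not done:
            # Explore occasionally, otherwise act greedily on current estimates.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            next_state, reward, done = step(state, action)
            # Move the estimate toward reward + discounted best future value.
            best_next = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
    return q


def greedy_policy(q):
    """Read the learned behavior off the table: best action per non-terminal cell."""
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(1, N_CELLS - 1)}
```

Calling `greedy_policy(train())` should give a policy that moves right in every cell: the negative reward teaches the agent to avoid the pit, and the positive reward pulls it toward the goal.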
And Hugging Face, where you’re a developer advocate now - so this is an AI community, right? And it’s all open source… So what is your mission? How do you do things? Why?
So at Hugging Face what we want to do is to democratize AI, and what we call “good AI”, through, as you mentioned, open source models and open source libraries.
The idea - if you want, it’s the GitHub for AI models. It means it’s a place where you can access a lot of trained models, very powerful, and you can use them for your own problems. For instance, you can use a model to translate from English to French etc. And our goal is really to democratize access to AI by providing a place where you can share or upload models… But it’s more than that; there is the Transformers library that allows you to create transformers, which are natural language processing models; you have the Datasets library to rapidly create a dataset or to share a dataset, and so much more.
So I have a follow-up question to that, and that is - in my dayjob, because believe it or not, running the podcast isn’t the primary thing that I do… This is fun, this is passion, but in my dayjob I was talking to my boss about an hour ago, and I told him I was interviewing you, and that you were with Hugging Face as a developer advocate, and he - not for the first time either - said “I just love Hugging Face. That’s the library I wanna use every time I’m doing NLP”, and he went on for several minutes about the company, and stuff.
[08:18] But it begs the question… For those who aren’t intimately familiar and haven’t used it, what is it, in your opinion as a developer advocate, about Hugging Face that you’re doing right? So many people really love using the libraries out of Hugging Face, and it’s kind of a darling in the industry. What is it that Hugging Face is doing right that maybe others - I’m asking you to give away the secret sauce, I guess - would do well to emulate?
Well, I think I will tell you my experience, because I discovered Hugging Face - I think it was eight months ago. I think it’s really the fact that you can rapidly, even though you’re not a specialist in NLP, you can rapidly grab a model and use it for your own needs. In two lines of code we have something called a pipeline; it’s the first thing I discovered. In two lines you can for instance translate, you can - having a model - generate conversation etc.
For me, the secret sauce is really that it’s a place where you have so many trained models and so many powerful models that you can use, in two lines of code. And I would also add that it’s also community, because we are a very, very strong community. It means that when you have questions, people are really willing to reply; even people outside Hugging Face, on the forums, or on the Discord… And it really helped me when I started to train my first model, to have people who are very involved, and that do not work at Hugging Face, to help us.
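As a rough illustration of the "two lines of code" mentioned above, here is a small sketch using the `pipeline` helper from the Transformers library. It assumes `transformers` and a backend such as PyTorch are installed, and that a default English-to-French model can be downloaded on first use; the wrapper name `translate_en_to_fr` is hypothetical and only there to keep the example self-contained.

```python
def translate_en_to_fr(text):
    """Translate English text to French with a pretrained model.

    Assumes the `transformers` package (plus a backend such as PyTorch) is
    installed; the first call downloads a default model from the Hub.
    """
    # Imported lazily so that defining this helper does not require the library.
    from transformers import pipeline

    translator = pipeline("translation_en_to_fr")   # line one: build the pipeline
    return translator(text)[0]["translation_text"]  # line two: call it on the text
```

A call such as `translate_en_to_fr("Deep reinforcement learning is fun.")` would return the French translation as a string; the exact wording depends on which default model the installed library version selects.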
And would you say that the community of users of Hugging Face - is it more developers? Is it more people who are just general enthusiasts about AI, or more data scientists? Who do you see active there?
I think it’s both of them, because we have a lot of researchers, but we have also a lot of enthusiasts. For instance, there is a lot of people who worked on computer vision models, on natural language processing models, and there is a lot of people who just started in deep learning, or in deep reinforcement learning, who directly try models.
So it’s very diverse… But I think the majority of people who use Hugging Face are researchers and data scientists, but there is also a growing part of really beginners, who just want to learn and to try some stuff.
So one of the things that – and I know that Hugging Face covers a lot of different kind of approaches in the libraries… We’re hearing, over the last year or two especially, a lot about the rise of deep reinforcement learning. It’s been around for a while, obviously, but it’s really come into its own. We’re seeing it in more and more places. What is it about that particular approach, compared to some of the other approaches that it has replaced over time, that is helping that become such a – kind of one of the top-tier first things you wanna go to approaches these days? What’s caused the rise of deep reinforcement learning?
Well, I think it’s because deep reinforcement learning is able to learn – we call that a policy, so behavior; it means choosing an action given a state… It’s able to learn hidden behavior that you couldn’t train with classical models. So I think that’s the first thing.
The second thing is also because deep reinforcement learning is becoming more and more efficient. It was a big problem for a long time in deep reinforcement learning - it takes a lot and a lot of time to train, compared to other models… And we are seeing that nowadays we have new models that are more and more powerful, and require less and less data to train.
[11:59] So one of the things I’m also wondering about is are there limitations to deep reinforcement learning that you’re seeing? What are the limits? We’ve seen some pretty amazing capabilities from all of these different deep learning capabilities over the last few years, these different algorithms… But what are the limitations?
I know that I’ve heard people make some kind of crazy suggestions in various areas, and you’re kind of like “We’re not there yet”, or something… What are some of the limitations that you might caution people and say “Maybe not”, or “Maybe you’re not looking at the right solution”, or “Maybe we’re just not there yet”? Any thoughts around that?
Yeah… For people who are doing deep reinforcement learning, they all know this article by Alex Irpan called “Deep Reinforcement Learning Doesn’t Work Yet.” I think it was published in 2018. Nobody wants to talk about this article, but it really presented every limitation of deep reinforcement learning.
The limitations are, as I mentioned, sample inefficiency, but it’s really changing. Sample inefficiency means that you need a lot and a lot and a lot of experience from your agents before they become good. The second one, which is the most important, is the question of transfer learning. What this simply means is that we are not able, for now, when you train an agent, to transfer what it learned to another agent. And the problem with that is also the problem of generalization.
For instance, last week I tried to train an agent to play Super Mario Bros - so it’s a famous game - and when I tried to change the background color of the game, the agent became very bad. So the big problem is really transfer learning and generalization errors. What this simply means is that for now we can consider that a lot of deep reinforcement learning agents just overfit their environments, so they can’t generalize… Which is why I think it’s a big problem in the deep reinforcement learning community.
So the deep reinforcement learning course that you’ve been developing - how long have you been working on that?
It started in 2019, and the last update was during the first lockdown, so it was in 2020. Initially, it was just supposed to be two articles, and it became ten articles and eight videos. I stopped updating it because – I just updated the first articles, because they were the oldest, but now there is so much more in our educational resources that the theory is still good, but the implementations are not relevant. You can still use it, there are still people who use it, but I really advise people to look at the theory and use other implementations; there is a very good implementation with what we call the CleanRL library that you can use if you want to learn to implement from scratch… Which is always a good idea, to learn to implement from scratch. Even though there are very good libraries, the best way to understand each architecture in deep reinforcement learning is really to implement it by yourself.
Gotcha. So would the target audience be someone – it sounds like someone who probably already is comfortable writing some code, and probably has a little bit of familiarity with some of the other architectures out there, or do you think it’s a good one for someone who’s just getting into it? How would you position somebody that you’re recommending for your course?
The course is really from beginner to expert. You just need to know how to program in Python. But even if you don’t know about PyTorch or TensorFlow, you can learn with the course. I think that’s why the course works great, is that you don’t need to be a mathematician to follow this course, because in each chapter what we do is that we really explain each part of the formulas… Because I think a big problem in deep reinforcement learning and in AI in general is that most of the people who start, if they don’t have a big, strong mathematical background, they can be very scared when they read the paper and there are big formulas. So the idea with this course is really to be very technical, but to go step by step. Not just to start with mathematics first.
Do you think that deep reinforcement learning is a good first step into the deep learning world? It came along after Natalie and I had our first exposures, and stuff, to the larger framework of deep learning… But if somebody was just coming in today - maybe they’re in university, and they’re studying… Is this a good place to start, or would you say “No, go back to convolutional neural networks” or go back to other NLP things historically; start with transformers? How would you position deep reinforcement learning compared to that array of possible starting points?
Well, I think it’s better first to start with deep learning, because in deep reinforcement learning we use neural networks. And also, to work a little bit on convolutional networks before starting deep reinforcement learning. Because it’s skills that you need to have before being able to start to learn deep reinforcement learning.
And you mention in your – about you, that you’re building the next generation AI in games. So tell us a little bit about that. That sounds really interesting.
Yeah. So before joining Hugging Face, I really wanted to work in the video game industry. And what I started to do was initially to train a reinforcement learning model for small games, casual games. But what I discovered is that in this area game studios use what we call behavior trees, which are much easier. It’s AI made by hand, where the developers define each action to take for an agent. And then after that, I started to see some demos about people who have conversations with AI, especially using OpenAI GPT-3. And I was fascinated with this part, because I was like, you know, there is Anna Kipnis, who works a lot in the video game industry, who explains that in video games you had a lot of improvement in terms of graphics, but in terms of interaction, we’re still at the dawn of age.
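For readers who have not seen one, a behavior tree of the kind mentioned above can be sketched as follows. This is a generic, simplified illustration, not the tooling of any particular studio or engine; the guard character, its thresholds, and all class and key names are hypothetical. The point is that every decision is written by hand, and nothing is learned.

```python
# Minimal hand-authored behavior tree. All node and key names are illustrative.
SUCCESS, FAILURE = "success", "failure"


class Condition:
    """Leaf that succeeds when a predicate on the game state holds."""

    def __init__(self, predicate):
        self.predicate = predicate

    def tick(self, state):
        return SUCCESS if self.predicate(state) else FAILURE


class Action:
    """Leaf that records the chosen action name and always succeeds."""

    def __init__(self, name):
        self.name = name

    def tick(self, state):
        state["action"] = self.name
        return SUCCESS


class Sequence:
    """Runs children in order; fails as soon as one child fails."""

    def __init__(self, *children):
        self.children = children

    def tick(self, state):
        for child in self.children:
            if child.tick(state) == FAILURE:
                return FAILURE
        return SUCCESS


class Selector:
    """Tries children in order; succeeds as soon as one child succeeds."""

    def __init__(self, *children):
        self.children = children

    def tick(self, state):
        for child in self.children:
            if child.tick(state) == SUCCESS:
                return SUCCESS
        return FAILURE


# A designer writes the priorities by hand: flee when hurt, attack when the
# player is visible, otherwise patrol.
guard_brain = Selector(
    Sequence(Condition(lambda s: s["health"] < 30), Action("flee")),
    Sequence(Condition(lambda s: s["player_visible"]), Action("attack")),
    Action("patrol"),
)


def decide(health, player_visible):
    """Tick the tree once and return the action it selected."""
    state = {"health": health, "player_visible": player_visible}
    guard_brain.tick(state)
    return state["action"]
```

Here `decide(20, True)` returns `"flee"`, `decide(80, True)` returns `"attack"`, and `decide(80, False)` returns `"patrol"`. The contrast with reinforcement learning is that these rules never change unless a developer edits the tree.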
[20:16] So the idea was how we can use natural language processing models to create better interactions with what we call non-playable characters. So non-playable characters are in games the AI you have conversations with. So I started to make some demos on that. We made a demo where you directly speak with NPCs and they reply using OpenAI GPT-3 and an AI called Replica, which is an AI that generates voices. And at Hugging Face we are currently working on a project – we don’t have the title yet; it’s called Murder at the Lighthouse, where there is a murder, and you have three characters, and you need to investigate who is the killer by speaking with them, and finding contradictions in their testimony.
That sounds good. So now that we’re into gaming, I’ve gotta ask - as you talk about it, I want you to expand a little bit on it. I’m taking a slight detour here… But because I can. This is one of those moments, I’ll say, as a parent, when I have a nine-year-old daughter interested in this, and she’s gonna wanna hear this part. So I’m kind of taking a detour to ask on behalf of my daughter… As you’re using deep reinforcement learning in gaming, how do you think about doing that? So there are a ton of kids - I know because I’m listening to their calls… You know, we’re in the pandemic, they’re all doing online calls, so I will hear a whole bunch of young ladies that are deeply interested in gaming, talking about it… How can they think about what they need to be doing going forward to get into things like deep reinforcement learning? They wanna do it in the context of gaming. Do you have some advice, since I’ve hijacked the conversation, so that those young ladies can start thinking about how they do this thing, if that’s what they really wanna do? You’re doing the thing that they’re aspiring to do.
Well, if I want to use deep reinforcement learning in games, I think the first advice is to start to learn deep reinforcement learning… But there is also a lot of libraries they can use, and I think the best is to use Unity – we have something called ML-Agents. That allows you to create your game and then to train your agent on the game you created.
For now, in the video games industry, what can be done with deep reinforcement learning is mostly testing the game. It means having an agent that will try every part of the environment. But I’m pretty sure that in the future we’ll have AI in games that will be made with deep reinforcement learning. For now, it’s not the case. We don’t have published games that use it, but I really think that in the next year it will be the case… But there are still some drawbacks to find solutions for.
As a bit of a tease for those who are curious to try more of the course, for example, I saw that on the website there’s of course Doom; run Doom on everything. So what are you doing in that course, for example, with Doom, and with the games that we know?
So during the course we use different environments; we use Sonic, we use Mario, we use Doom… And in Doom, what you do - there are two levels. One is the simple level where you are in a poisoned place, so you lose health every time you walk… And so you need to grab the health packs to survive. The health packs spawn randomly. And the second one is you are in a corridor, and there are some monsters, and you need to shoot them before going to the end of the corridor; otherwise they will kill you.
And after that, you have tons of other environments that you can use, because always, what we do in the chapter is that we always start with a simple environment, and then we show you how you can iterate on more complex ones. In this Doom there is a deathmatch environment that you can use, and you can train it, but it takes much more time than in the smaller environments.
[24:13] So I guess at this point, with a little bit of the background on how to get into the gaming, and how to do that, where – if someone goes through and they’ve done your course and they’ve started and they’re learning their way into it, it’s the first steps, when they get to the end of the course, what kinds of things do you expect that they’ll be able to do and be comfortable with? And then also, in addition to that, where would you think they might wanna go next?
So they’ve kind of come through your course, it’s the first one, they’ve now not only dipped their toes into this field, but have gotten some good hands-on experience. What comes next? What can they do and where can they go from there?
So after the course they normally have the skills up to the state of the art; so we go up to Proximal Policy Optimization, which is one of the state-of-the-art architectures in deep reinforcement learning. So they have those skills in terms of theory and in terms of practice.
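For context on Proximal Policy Optimization, the core of the method is its clipped surrogate objective, which can be sketched in plain Python. This is a simplified illustration and not the course's implementation: it omits the value loss, the entropy bonus, and the neural network, and the function name and the default `clip_eps=0.2` are choices made for this example.

```python
import math


def ppo_clip_objective(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective over a batch of sampled actions.

    ratio = pi_new(a|s) / pi_old(a|s). The objective takes the smaller of the
    unclipped and clipped terms, so large policy changes earn no extra credit.
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        # Probability ratio between the updated policy and the one that
        # collected the data, computed from log-probabilities.
        ratio = math.exp(new_lp - old_lp)
        # Keep the ratio inside [1 - clip_eps, 1 + clip_eps].
        clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        total += min(ratio * adv, clipped_ratio * adv)
    return total / len(advantages)
```

When the new and old policies agree, the ratio is 1 and the objective is just the mean advantage. When the new policy makes a good action twice as likely, the credit is capped at 1.2 times the advantage with the default clip value, which is what keeps each update close to the previous policy.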
The next step will depend on what they want to do. If they want to work in video games, I will advise to learn to use Unity ML-Agents, and in Unity ML-Agents we have three courses on how to use this library. On the other hand, if they want to work in research, I will advise to use Spinning Up in Deep RL. It’s a course and a library made by OpenAI. Or to use CleanRL, the one I talked about, to dive deeper on more complex implementations.
So what are some surprising or unexpected uses that you see deep reinforcement learning being used for in the community?
I think there are multiple. The first one was DeepMind using reinforcement learning for [unintelligible] system, in the servers. The other thing is what we worked on, what we call dynamic pricing. And for video games - yeah, in video games there are a lot of funny things. For instance, you can train your agent on what we call [unintelligible 00:26:16.13] and you can train two feet, but you train to learn to walk with only two feet; you train to make what we call the [unintelligible 00:26:27.09] to run etc. So there are a lot of funny things around it.
So when you say that, the thing that was in my head as you’re describing that was it seems like you really have to break things down, to address things in very precise ways. So as we are moving forward, and this is – the deep reinforcement learning toolkit, for a lack of a better word, is now part of creating games and all sorts of other things as well, obviously… Is that really what you kind of have to do, is break down all of those things and think about “How do I train just the feet, just the walking?” Might there be a time that you foresee where you can combine those and it’s able to do very complex things? Or are we still a long way from that point? Are you still gonna have to do everything broken down in its individual constituents for a while?
No, there are for instance environments in ML-Agents where you train a complete humanoid to walk, etc. There is a technique that we call curriculum learning. The idea is that you start to train your agents on very simple environments, and then you increase the complexity. For instance, if you [unintelligible 00:27:37.04] you start first to train your agent to learn to shoot a snowball, and then you add the enemy. This way, the agent will learn step by step, so it helps the training to have this kind of technique.
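The curriculum learning idea described above (start simple, then increase the complexity) can be sketched as a small scheduler. This is a generic illustration and not the ML-Agents curriculum feature: the class name, the window and threshold values, and the level names (loosely based on the snowball example) are all assumptions.

```python
class Curriculum:
    """Raise the difficulty level once the recent success rate is high enough."""

    def __init__(self, levels, window=20, threshold=0.8):
        self.levels = list(levels)   # ordered from easiest to hardest
        self.window = window         # number of recent episodes to average over
        self.threshold = threshold   # success rate needed to advance
        self.index = 0
        self.recent = []

    @property
    def level(self):
        """The level the agent should currently train on."""
        return self.levels[self.index]

    def record(self, success):
        """Log one episode outcome and advance if the level looks mastered."""
        self.recent.append(bool(success))
        self.recent = self.recent[-self.window:]
        mastered = (
            len(self.recent) == self.window
            and sum(self.recent) / self.window >= self.threshold
        )
        if mastered and self.index < len(self.levels) - 1:
            self.index += 1
            self.recent = []  # start measuring afresh on the harder level
        return self.level
```

A training loop would call `record(...)` after each episode and build the next environment from the returned level, for example moving from a hypothetical `"shoot_snowball"` stage to `"shoot_snowball_with_enemy"` only after the agent succeeds in most recent episodes.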
You bring some interesting perspective into those different things, or different usages for deep reinforcement learning in your community and in general… Do you see that your original degree, which was not technical, and was a bit of a different background - did it contribute to something, did it help you on the way? Can you motivate people who are positioned in a similar starting point as you were when you started? How can this be a benefit there?
[28:21] Yeah, I think it’s very nice, because when I started I thought it was like a negative thing to not have a technical background and to come from a humanities background. But I think it’s really important to consider that it’s something positive when you start, because you have another way of thinking, another way of working than people who are more technical. So it’s something to keep in mind. And I think that it’s something that is visible, because nowadays you have access to so many free educational resources online. For instance, if you want to learn mathematics, you have MIT OpenCourseWare. If you want to learn deep learning, you have Udacity and you have a lot of MOOCs online…
So my advice for people who come from humanities or less technical things is just try, start, and you’ll see that it’s really fun to study. And the second thing is really don’t feel like you don’t belong here because you’re not technical. You need to continue to work, and just don’t think about the fact that you don’t belong here, because you belong here. If you do stuff, you belong here whatever your background.
I’m so glad Natalie asked you that, because your answer really got me thinking about something here, and that is - I’d like to expand that a little bit, and extend the line of thought that you’re on… We’ve been historically kind of in a place where there were quite a lot of barriers to entry for people moving into deep learning. The university systems around the world are changing now to accommodate this. Most of us in the early days were coming from someplace else. It was hard, for some people more so than others. I was just a software developer and I didn’t have a lot of the mathematics on that, and so I had a steep learning curve myself.
But we’ve seen this trend over the years of more tooling; Hugging Face has been a huge part of that, of the tooling becoming better and better… And it’s kind of democratized the entire space, and it’s allowed a lot more people to come into it without having to jump through quite as difficult a set of hurdles as maybe the beginning days when there wasn’t so much in the way of tooling and learning resources.
So do you envision a day where you can be productive along these things, where doing deep reinforcement learning for gaming and stuff becomes a low enough bar to where somebody with some level of aptitude, some interest in that can kind of dive into it even if it’s not their primary thing? Do you envision that it becomes that level of democratized, or do you think there’ll always be a certain level that you’ve gotta really get to? And when I say that, meaning a bit of a challenge to get to as an entry point into it. How do you see that changing over time?
Well, I think it’s clearly possible, because at Hugging Face we already have something called AutoNLP, that allows you to train and create models without coding. And I really believe that the next step in deep reinforcement learning is really that type of thing. Being able to just directly train the model, without having to know everything about the model, or how it works etc. But I think yeah - there is still work before that.
But in the future, it’s clearly possible, because AI will become something that we’ll use as we use software. So we will not need to know how it works, we will not need to know what’s under the hood. We’ll just need to start training and having some [unintelligible 00:32:06.01] results.
[32:08] And in addition to the human part of this, what are the fields that you see in the future that will be revolutionized by deep learning?
I think everything, especially the industry, with robotics; we’ll see more and more robotics. And I think on the one hand it’s a good thing, because it will remove a lot of work that is very hard, and that is really underpaid. So it’s a good thing. But also, I think in health - we already see applications in health, with cancer detection, we see also with DeepMind, who works on the question of protein detection.
So for me, AI will change everything, but it’s really important to consider that it will not solve everything. For instance, we all talk about the question that AI will help a lot in healthcare, as I mentioned, but the problem is that if people can’t afford to access these tools, the problem is not solved. So I think what’s important to consider is that we need to improve AI, but AI must be really accessible to everyone, to be a solution to human problems.
That’s a great point you make there… And we find ourselves in different aspects talking about pieces of that all the time, in terms of who has access to the technology. And some of us who are privileged to be in this field - maybe we get stuck in our bubble at times… You know, robots are starting to come around here, and it’s right around the corner, and we’re thinking about this… But there’s kind of a huge, huge segment of the world out there who hasn’t really had access to this yet. So given that, even with the best work from people like you and Hugging Face and others out there, we definitely have challenges of equity in terms of people having access to good solutions for their life. Do you have any thoughts on things that we can do? Because I think it needs to be in the forefront of our mind, things that we can do that are in this field to try to steer these solutions toward those places and people that might need them worse than we do in a lot of ways, that don’t have all the benefits that some of us are enjoying right now. Any thoughts on how we find that path? It seems really hard to me.
Well, I think there are a lot of positive things happening, especially because of hope and education. The majority of people who look at MOOCs are people from developing countries. So the new generation, especially in developing countries, is more and more educated. So I think if we want AI to be accessible for everyone, we need to make sure that all models are open source, that everyone can participate in the documentation, participate in the project.
And the second thing, I think, is more a political thing… Because we will need laws to force companies to make their AI accessible, to make their AI – that we can analyze what’s inside the AI, to avoid bias and all those problems. So I think it’s two-part. For us, so for people who work in AI, we really need to work on the question of making educational content free, or at least very cheap, and working on open source projects and trying to share as much as possible.
On the other side, we need to push our politicians to work on legislation to make sure that AI will be here for good, and not for negative stuff.
Yeah, that makes a lot of sense. I absolutely agree. Hacktoberfest is also a time of the year where everybody who wants to start with open source - I always like recommending contributing to the docs. Like you say, this is a wonderful way to participate, and it has less of a threatening entry threshold. So I could absolutely not agree more.
So the future scientist who is listening to this episode today - what is the first step that they should take in this field?
[36:17] I think the first step will be to just start with an introduction to deep learning. And then to try different projects, just to see what you prefer. If you want to dive more into natural language processing, deep reinforcement learning, or computer vision, audio etc.
I think the idea is not really to overthink this part, because most people that I met who want to start always get stuck because there is a wide range of possibilities. And the only thing – just type on Google what you want to start with, and start to learn deep learning. Take the first link, see how it goes, and then iterate from that. But don’t overthink… Because most people overthink and just say “Oh, this is maybe not the best resources, this is maybe not what I want to do” etc. Just try and you’ll see.
So that future scientist has taken that first step; they’re diving into this, and doing exactly what you’ve just said. Along the way they’re taking your course and they are learning these skills, and they are in a world that is increasingly having this just amazing gamified experience that is now becoming available widely, and not just in the context of video games, but across all of life. And simultaneously, deep reinforcement learning is driving robotics, which are becoming ever more common as these new scientists are growing up. I’m thinking obviously of my daughter in this context, as I’m sure you’ve figured out. So what – as they grow up and they’re adults now, and they’ve come through this process, what do you think the world looks like, what do you think maybe some opportunities or challenges that they might face? And along the way, I would love your opinion about what you’d like to see, as opposed to what you think will be there. If you have any preferences about what you’re aspiring for right now, as a data scientist doing this kind of work.
Well, I think in the future clearly we will have many more robots than nowadays that will do a lot of tasks, especially social tasks, in terms of helping older people. With the question of autonomy - we will have autonomous cars, all of that.
What I really don’t want to see especially is the question of using these technologies for military things… Because we see more and more – you know, the question of… Especially deep reinforcement learning, we see more and more people talking about using deep reinforcement learning for robots during battles, and that kind of stuff… And I think this is the worst thing to do. I think it’s something that’s unfortunately going to happen, because if someone does it, all the other countries want to do it to protect themselves… But it’s really something bad, because it’s really something we don’t want to do, and we shouldn’t do… But unfortunately, I think we don’t have a lot of power on that.
The thing is - I want to be positive - when you work in this industry, we always need to think about the consequences of what we do. Sometimes it’s very small consequences, but when we work on some models, we can induce bias that can be negative for a group of people… And it can be much, much worse than that if we speak about robots in the army… But it’s always important when you work on something to think about the consequences. Sometimes we’re too focused on the project and we don’t think about the outcomes of what we’re doing.
Well, Thomas, thank you very much for coming onto the show today. Natalie, thank you very much for co-hosting with me. This has been a lot of fun, it’s been a great conversation, and I really appreciate you. I hope you will come back sometime and tell us what you’ve done next in deep reinforcement learning and other things.