Practical AI – Episode #32

OpenAI's new "dangerous" GPT-2 language model

get Fully-Connected with Chris and Daniel


This week we discuss GPT-2, a new transformer-based language model from OpenAI that has everyone talking. It’s capable of generating incredibly realistic text, and the AI community has lots of concerns about potential malicious applications. We help you understand GPT-2 and we discuss ethical concerns, responsible release of AI research, and resources that we have found useful in learning about language models.





Welcome to another Fully Connected episode, where Daniel and I keep you connected with everything that’s going on in the AI community. We’ll take some time to discuss the latest AI news, and we’ll dig into some learning resources to help you level up on your machine learning game.

I am your co-host, Chris Benson. I’m Chief AI Strategist at Lockheed Martin, RMS APA Innovations. With me is my co-host, Daniel Whitenack, who is a data scientist with SIL International. How’s it going, Daniel?

It’s going great. Welcome back from Switzerland. As our listeners will know, Chris has been recording Practical AI episodes on the road, at Applied Machine Learning days; I’ve really been enjoying those, but I’m glad to have you back from your trip.

It’s good to be back! It was a great trip. I met a lot of really interesting people and obviously recorded some good episodes… I did my talk - just for the listeners, Daniel ran the AI for Good track remotely from America, and it actually went off without a hitch, everybody felt. I was talking to the other speakers and there was no problem, so thank you very much, Daniel, for managing that from thousands of miles away.

Yeah, that was kind of an interesting experience. I had planned to be there, but… I was hoping that all AI people would just kind of converge there as expected, and it sounds like that’s what happened. If you don’t know about Applied Machine Learning Days, definitely check that out; it’s a great conference. We’ve had recently some guests on talking about AI for good and other things, and that’s been really awesome.

Now that we’re here together again, we have a Fully Connected episode. I’m really excited today – of course, if you haven’t been hiding under a rock and you follow AI stuff, then pretty much all you’ve been hearing about for a couple weeks or however long it’s been is OpenAI’s recent language model that they’ve released called GPT-2. We’re gonna talk through some of that stuff today, because it’s pretty interesting. Have you been seeing that online, Chris?

Yeah, it’s hard to miss. I think the very first thing I saw was Elon Musk’s tweet about (I’m not quoting, but something like) “We have a model that’s so amazingly good that it’s dangerous, and thus we have to not release the whole thing.” And obviously, like everybody else on the planet, that piqued my interest and I started diving into it. Technically, it’s fascinating what they’ve done, and then there’s some pretty interesting ways that they’ve chosen to not only approach the model, but approach the release of it. A little bit of drama around it here.

[04:13] Yeah, definitely. Of course people have been captivated, because one of the things that they’re doing with this model is text generation, which we’ll talk through in a second… But the quality of it is just astounding, really, and people have been posting different things; they’ve generated reviews for their book, or various stories and other things, and they’re kind of entertaining, but all of them are pretty astounding in the quality of the text generation… Which also, of course, leads a lot of people to be concerned, because “How do we know if this text has been generated by an AI or not?” and what are the implications of that.

Wired had this article about the “AI that was too dangerous to release”, based on, like you were saying, some of what Musk and OpenAI have talked about. It’s really been an interesting discussion. I don’t know, I’ve seen some people get frustrated with all of this talk about the danger of AI, which we can get into a little bit later, but… What’s your general feeling about this discussion, Chris? Is it positive, negative?

A bit of both. I think it is the reality that we are moving into, either way. Regardless of how you spin it or how you perceive it, we are in a moment here where we’re seeing this GPT-2 model that is able to make people believe that the text that’s generated is indistinguishable from humans. They put the text in front of a number of people… And on top of that, just as a side thing not to get into right now, there’s been all the facial stuff that I was also seeing in the news over the last couple of weeks, where there’s the website where you can just hit Refresh over and over again and a new person that does not exist in real life is generated by a GAN…

Oh yeah, I’ve seen that too.

And the reason I’ve mentioned that is we’re just moving into a moment where it is now entirely practical for these AI models to be able to generate things that are indistinguishable from the reality that we are otherwise in.

To kick things off, do you wanna maybe even back up just a little bit before we dive in and kind of talk about what a language model is?

Yeah, sure. This GPT-2 model, which is what they’re calling it - it’s building on a previous model, which you might have guessed was called GPT… But this model along with a variety of other models that have been released recently - those being like BERT or ELMo… So we had another episode (episode #22) where we kind of dove into a particular implementation of BERT. If you’re wanting to know in a little bit more detail what a language model is and how to utilize it, you might listen to that episode about BERT. But any of these models, including GPT-2, when they say it’s a language model, this is really like a pre-trained encoder. What that means is you put words in, and then out the other end comes these word embeddings, or these various representations of the words, that are based on contextual relationships between all the words in your corpus.

So these embeddings come out, and then you can utilize those generated embeddings for various tasks, like sentiment analysis, or named entity recognition, question answering, text generation, machine translation… So the language model, a part of these is that encoding bit.

[07:58] Yeah, and this is a particularly big one. They described GPT-2 as a large transformer-based language model with 1.5 billion parameters, and trained on a dataset of eight million web pages. Its objective is simply to predict the next word. That’s a huge scale though.

As you’re talking about that, I wonder how did they parse and format these web pages. As we’ll talk about later, they didn’t release the full dataset that they used for this…

So we’ll talk about that later, but I don’t know, thinking about how this would operationally work in my mind, parsing these web pages is a little bit complicated in and of itself. It seems complicated, and I guess 1.5 billion parameters is no small potatoes.

No, I think it’s pretty huge. And there’s certainly the drama associated with it. They note on their blog post “Due to our concerns about malicious applications of the technology, we are not releasing the trained model.” Then they go on to say that they’ll release a much smaller model for researchers to experiment with, as well as the technical paper.

Yeah… Cue the ominous music.

I know, and my first impression when I read that was – on the assumption that this model is as great as it looks like it may be here, isn’t that sort of… You know, you have a dam that’s about to burst, where it’s like, suddenly we have this new capability… Isn’t that like sticking your fingers in little holes in the dam to try to keep the whole thing from coming? Because if it is what they think it is and they’re releasing this, it won’t be long before it’s pretty much everywhere. Because now that everyone knows you can do it, it’ll be recreated elsewhere.

Yeah, and I think it should be noted that algorithmically, there is not really a major advance in the architecture or algorithm that is the focus of this model; it’s really the scaling up of it.

As you mentioned, Chris, this is a transformer-based model. The other transformer-based models recently have been, as we’ve mentioned, BERT and ELMo and these things… The transformer architecture has been around for a bit; that’s like this mechanism that learns the contextual relationships between words or sub-words in a text. That’s been around, so that’s not new to this GPT-2 model. That’s not the new thing. The new thing isn’t really how they train it, because they’re really just using this simple framework of training.

When you’re training these language models, you need to have some sort of task that you’re trying to do, even though the goal is to get the embedding layer, it’s not to do classification or translation, or something; you need some simple task to train the embedding on… And they’re just using a simple task, so it’s just (like you said) predicting the next word in text from this internet text.
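[Editor's sketch] To make the "predict the next word" task concrete, here is a minimal toy in Python. It is not GPT-2 and not a transformer: it only counts which word most often follows each word in a tiny made-up corpus. The function names and the corpus are hypothetical and chosen for illustration; the objective has the same shape as the one described above, at a vastly smaller scale.

```python
from collections import Counter, defaultdict


def train_bigram_model(text):
    """Count, for every word, which words follow it in the training text."""
    words = text.lower().split()
    following = defaultdict(Counter)
    for current_word, next_word in zip(words, words[1:]):
        following[current_word][next_word] += 1
    return following


def predict_next_word(model, word):
    """Return the most frequent follower of `word`, or None if it was never followed."""
    followers = model.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Hypothetical toy corpus; GPT-2 was trained on text from millions of web pages.
corpus = "the unicorns spoke perfect english and the unicorns lived in the valley"
model = train_bigram_model(corpus)
print(predict_next_word(model, "the"))
```

A model like GPT-2 replaces the counting table with a large neural network that conditions on the whole preceding context rather than on a single word, but the training signal is still "guess the word that comes next".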

So the task isn’t really new, the transformer idea isn’t really new, it’s really the scale of what they’re doing. They trained it on this hugely diverse internet dataset, or dataset of web pages… And because of the diversity of that data, there’s really some kind of significant capabilities that come out of it. Have you seen this broad set of capabilities that they’re proposing?

I have, and as I’ve read through the various articles on it, it looks like (going back to what you were saying) the key differential in this is just scale. They put a lot more hyper-parameters into it, they had a much larger dataset, but they explicitly said they weren’t really covering any new ground algorithmically, so… As we’re all starting to scale up over time, it really makes me wonder, as fast as this is moving right now, if we’re not gonna be charging forward even farther. This was essentially the race track flag that went around, and it’s “Go for it!” I think this is gonna be so common within a few months out there that you’ll see it in production pretty quick, regardless of the fact that they held back the larger model in this particular case.

[12:08] Yeah, so maybe one thing we wanna pause and define – you’ll see as you read through some of these blog posts and everything, they talk about zero-shot something, and multitask, or various tasks associated with the model… Have you encountered this idea of zero-shot before, Chris?

Nope, this was a new one to me. You wanna jump in and explain?

Yeah, so the general idea is that zero-shot basically means that the model is not trained on data that’s specific to a task, but you evaluate that model on the particular task. Let’s say – where I’ve seen this in the past is in translation, if you have a model that translates English to French and then English to Spanish, you could train that model and then you could try a sort of zero-shot thing where you translate it not from English to anything, but you could translate maybe from French to Spanish. So the model wasn’t trained on that data, but you could try it out to see how well it worked to do that task.

This is the idea of zero-shot, and what’s really interesting with this model, and I think what people are getting really excited about, is that they trained this model on this large set of data with a simple task, but it’s showing really great results… I mean, not state-of-the-art, but good results for things that it wasn’t trained to do. For example, text summarization, translation, question answering - these sorts of tasks; they’re showing these zero-shot results for things that the model wasn’t trained to do, which is kind of a crazy idea when you think about it.
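[Editor's sketch] One way to picture zero-shot use of a next-word model is that the task is expressed purely through the prompt, with no task-specific training. The Python sketch below only builds such prompts; it does not call any model. The "TL;DR:" cue for summarization is the one described in OpenAI's GPT-2 paper, while the translation template, the dictionary name, and the function name are hypothetical illustrations.

```python
# Prompt templates that turn a task into plain text continuation.
# "summarize" follows the TL;DR cue from the GPT-2 paper;
# "translate_to_french" is a hypothetical template for illustration only.
PROMPT_TEMPLATES = {
    "summarize": "{text}\nTL;DR:",
    "translate_to_french": "{text}\nIn French:",
}


def build_zero_shot_prompt(task, text):
    """Wrap `text` in a cue so a next-word model is nudged toward the task."""
    if task not in PROMPT_TEMPLATES:
        raise ValueError(f"unknown task: {task}")
    return PROMPT_TEMPLATES[task].format(text=text)


prompt = build_zero_shot_prompt("summarize", "A long article about unicorns.")
print(prompt)
```

The model would then simply continue the prompt, and whatever it writes after the cue is treated as the summary or translation. Nothing about the model's weights changes between tasks.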

So what do you think the implications are for zero-shot on training, for the industry at large? Now that this announcement came out and people are diving in, and you’re gonna see more and more in the weeks ahead - is zero-shot and this unsupervised approach, do you think that’s gonna be the standard way that people tackle this going forward, given the result that we have initially here?

I think there’s two elements to this, which are sufficient data size and diversity, and compute. So I think what they’ve shown is not that these unsupervised techniques and generalization of a model to all of these tasks is something that always can be done, but specifically they’ve shown that because their dataset exhibits all of these very diverse qualities… So there’s data about different languages, and there’s data maybe from question and answer or forum websites, or something - because there’s this diverse set of data, it naturally encodes what you need for various tasks, like question answering, translation and things.

So given that sufficient amount and diversity of data, and the actual compute that you would need to train 1.5 billion parameters, then yeah, sure, this might be a really great starting point for a whole variety of tasks. I think the main issue here is not everybody has that diverse data and not everybody has that compute. I’ve never trained a model with 1.5 billion parameters; I don’t know about you.
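[Editor's sketch] For a rough sense of what 1.5 billion parameters means in practice, the arithmetic below estimates the memory needed just to hold the weights. It assumes 4 bytes per parameter (32-bit floats), which is an assumption on our part and not something stated in the episode; the function name is hypothetical.

```python
def parameter_memory_gib(num_parameters, bytes_per_parameter=4):
    """Memory needed to store the weights alone, in GiB (1 GiB = 1024**3 bytes)."""
    return num_parameters * bytes_per_parameter / 1024**3


gpt2_parameters = 1_500_000_000

# Weights only, assuming 32-bit floats: roughly 5.6 GiB.
print(round(parameter_memory_gib(gpt2_parameters), 2))

# Same weights in 16-bit floats: roughly 2.8 GiB.
print(round(parameter_memory_gib(gpt2_parameters, bytes_per_parameter=2), 2))
```

Training typically needs several times this amount, since gradients and optimizer state are stored alongside the weights, which is part of why the compute requirement is out of reach for many teams.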

No, that’s a little bit bigger than I’ve dealt with, for sure. But over time, as we’ve been on this exponential curve with compute increasing - you pointed out early in this episode that we didn’t know how they were parsing the web pages… They clearly took a dataset that is publicly available to everybody, so we do have access to that if we’re willing to put the infrastructure behind the collection and the parsing… And the compute is becoming more and more available. It’s really fascinating to me to start thinking about what the implications on all of our lives are gonna be. It’s really a science-fictiony kind of idea that’s upon us, in very short order here.

[16:11] You and I are always talking about how people are concerned about the potential dangers of AI, and whether they go – in my current job with Lockheed Martin, it’s actually become part of my job to be thinking about those types of things, in the frame of conflict, obviously.

One of the things as I was reading this that I was thinking about is if you go back and look at what GANs are able to do now and you combine it with this, and you think about all the – we’ve been talking about political misinformation over the last few years, with various elections and stuff, I just wonder… You know, that’s the downside to it, but there’s also some pretty amazing upsides in terms of being able to create user experiences around these new technologies, that can do some pretty wondrous things if you combine… In the medical industry if you wanna have beyond just a chatbot, but essentially a virtual doctor who looks and talks very much like a real person, you’d never know the difference, and you’re in a remote part of Africa… We’ve talked about being in places where you don’t have ubiquitous internet everywhere… I think this is a real game-changing technology that in tandem with these other game-changers, is really accelerating what we’re gonna experience over the next few years. I think the idea of the distant future is really upon us, whether it be good or bad. Any thoughts on that?

Yeah, and I think maybe one thing that we can share, just to emphasize these sorts of implications - and really, we can talk next about the dangerous implications of this, which really have to do with what they’re saying around fake news generation, and that sort of thing… So one of the things that I think we need to do to drive that home is just read a little excerpt of some of this generated text, which is really just astounding.

This is kind of a silly subject, which maybe people don’t find interesting, or wouldn’t think is real, but imagine that this was a real news story… In one of their examples that they post online, in one of their samples from OpenAI, they have a system prompt - so this is a text that was generated by a human, and then they followed that on with a model-completed or a model-generated text that actually just generates the rest of the story.

This first bit I’m gonna read is the human-written part. They say:

“In a shocking finding, scientists discovered a herd of unicorns living in a remote, previously unexplored valley in the Andes mountains. Even more surprising to the researchers was the fact that the unicorns spoke perfect English.”

Okay, so that was the human-written portion, and that’s all that they gave to the model. Then the model generated the following completion. So this is all model-generated, not human-generated. The model came back with:

“The scientists named the population after their distinctive horn, Ovid’s Unicorn. These four-horned, silver-white unicorns were previously unknown to science. Now, after almost two centuries, the mystery of what sparked this odd phenomenon is finally solved. Dr. Jorge Perez, an evolutionary biologist from the University of La Paz and several companions were exploring the Andes mountains when they found a small valley with no other animals or humans. Perez noticed” blah-blah-blah and it keeps going on.

You can already get a sense that, like, if I was to read – and it kind of drifts in and out as the story goes along, but just reading that initial bit, I would have absolutely never expected a computer to be able to generate something that coherent, especially when it’s been trained on only a very simple task. I don’t know… What are your thoughts on unicorns, Chris?

[19:51] [laughs] Well, I like unicorns, just to go on the record. I’m looking at the rest of the text that you were starting to read through, and the thing that jumps out is that it is so sophisticated in the way it’s using language. It has the sophistication of a well-educated person, as they might speak in a story-telling mode… And that’s very different from much of the computer-generated text we’ve seen over the years prior to this… It’s that sense of sophistication that jumps off the page. It’s pretty astounding. If someone wants to get on their blog at OpenAI and read through the rest of the text - I mean, easily you could believe that all of this was written by a person, and you might even challenge that it was computer-generated.

It’s probably gonna really change gaming going forward. Dungeons and Dragons will never be the same again, the way this is… But I just can’t stop thinking about all the uses for this, that we can apply in the industry.

Yeah… So we’ve got to the point where we can see – generally, what this GPT-2 model is (they should make an easier pronounceable name), and the quality of the text generation that it can produce… So we’ve seen this very coherent, sophisticated text that’s generated by this model - it’s just astounding. So naturally, as you were saying, Chris, there’s a ton of great applications to this, and maybe fun applications - like you were saying, in gaming. Maybe really good applications in text summarization, or question answering, and that sort of thing… But it naturally brings us to the point of talking about “Hey, there’s some really malicious applications of this as well”, especially if we talk about fake news generation… So if you’re able to generate basically endless news stories that are coherent along a particular viewpoint, or promoting a particular viewpoint or idea or story that’s fake - obviously, that flood of really coherent, fake news is definitely of concern.

You were talking about in terms of security, and all that; you always are thinking along these lines these days, Chris… What’s your thought on that sort of line of application?

When I was in Switzerland for the Applied Machine Learning Days conference, I also had a conversation with an expert on AI safety. He was working on models that addressed some of the very things that you were just talking about… And I guess seeing this come out - it’s something that we’ve been discussing, but it made me realize how critically important the field of AI safety is going to be.

I think just like we’ve been talking about ethics over the past year, it’s crucial to this. I think different forms of AI safety, in terms of being able to differentiate between what is fake and what is not fake is gonna be so crucial for not just the technology, but for society going forward. I think we’re gonna spend the next year talking a lot about AI safety, because the genie is out of the bottle on this, and whether we’re worrying about the good or the bad, it’s an amazing new technology… But we now have to be able to start being able to distinguish what that is. If you’re talking about fake news and the ability to scale up on that, on the downside you could just be awash in fake news, and suddenly AI safety is all about where is the real news in that.

If you’re talking about a situation where you’re in a conflict between two nations, or something, it becomes a weapon of war. You have to start having tools to distinguish between them. These are both dark things, but there are people in the world that will certainly try to use it for malicious purposes, as pointed out in the blog.

Yeah, the whole idea that the dangerous bit of AI, that AI is gaining consciousness and taking over the world - I think we can just put that aside for a long time.

It’s kind of irrelevant.

[23:56] Yeah. The danger that you can see with this application of text generation - humans can do a lot of things, and text generation is one of those. And even if we just see this model is capable of the quality of this text generation and nothing else, that in and of itself has huge security concerns.

I can read the quote from the OpenAI release blog post that you referenced before. They say due to our concerns about malicious applications of this technology, they’re only releasing a much smaller model. They don’t release the data, they have a technical paper… They reference certain particular malicious uses of it - they list off generating misleading news articles (that’s what we’ve talked about), they mention impersonating others online, automating the production of abusive or faked content to post on social media, and automating the production of spam or phishing content. I don’t know if it’s good or bad that they listed out the – I guess people would have figured out those anyway.

There’s a point that it makes, and it goes to what you were just saying a moment ago… For many years, the concern about AI going amok has been in – you know, what happens if AI becomes conscious or self-aware, and able to take actions that we were not anticipating… The reality is I’m unaware of anyone making any substantial progress down that road. But the thing that has mattered a great deal is that you have these tools, these AI tools - like we were talking in this case, this GPT-2 model - that has nothing to do with self-awareness. It’s not self-aware, it’s not conscious, but it is so very good at one specific type of task, that it is able to match or exceed human capability in its narrow thing. We’ve seen that outside the NLP space as well, like the GANs we were talking about a little while ago.

So the concern, the danger, the ethical issues, the safety issues that we may be considering going forward - I don’t think it’s about consciousness, I don’t think it’s about the Terminator robot that’s loose upon the world, and going around killing everybody. I think it’s about the way humans are applying these very specific tools that are just marvelous at what they do, and they can be used for great good or terrible evil. I think that’s where the real conversation is going forward, it’s how do we want to do that.

I think probably - I’m guessing you are, too - I’m kind of tired of people talking about Terminator robots coming to kill us, because I just haven’t seen that in reality. But this is a big concern here, about how do we move forward into a future where these tools become commonplace.

Yeah, and I do appreciate OpenAI and Google and others - particularly OpenAI in this case - being transparent about their concerns with this. I’ve heard certain people say “Oh, well they’re just saying these things about danger because they want more publicity”, which who knows what their full motivations are. Based on my perception, I don’t think that that’s totally it. Maybe there’s a part of it that’s that way… But I do appreciate their transparency around the fact that they’ve thought through this. They’ve decided to deal with this issue by still releasing the research, so publishing the paper, still releasing the code, but only releasing a smaller pre-trained model, so not the full model, and then also not releasing the full dataset that they’ve parsed associated with this.

So I guess their thought process is “Well, people don’t have the compute that we have. It would take them an enormous amount of time to recreate this dataset, and train the larger model on this dataset.” So they’re thinking “Well, this at least buys us time.”

That’s exactly right.

[27:57] Do you track with that train of thought, or does that seem not sufficient to you, or just not relevant? What is your thought as far as how they’ve dealt with the issue, from their perspective?

Yeah, there isn’t really a guidebook on how responsible disclosure is to be done in this; different organizations have different approaches. I give them the benefit of the doubt, that they’re trying to be responsible. It certainly doesn’t stop the potential for malicious actors to take advantage of this, but what it does is it slows it down over time for the exact reasons that you’ve just said, and it gives us time to think our way through it a little bit… Which I think is good, because it’s still out there, it’s still coming, we now know it’s possible, and that means that everybody will be very much focused on it. It’s already proven to work, and therefore there’ll be money behind it and there’ll be interest behind it.

I think it really comes down to the fact that as we go forward, just as we have been talking ethics and as we are now talking AI safety, we need to build some frameworks around what it means to discover these tools, produce these tools and release them into the public. I think they’re coming, I don’t think they’re likely to stop at any point, but I like the fact that they’re thinking “Let’s put the brakes on just a little bit, to have time to react a little bit better than we can at the moment.”

I agree with you pretty much in everything you said. There is one aspect of this that I don’t know that I’ve fully formed an opinion on, in the sense that OpenAI is essentially saying that they’ve judged this to have negative consequences in however they’re quantifying that, so they have deemed that it matters that they don’t release things… Rather than them releasing things and then having the community be able to test it, be able to actually use it to come up with methods that would fight against the negative consequences that it might produce. They’re pretty much restricting it to themselves, and in that sense, other people can’t really fully parse the consequences, because they don’t have access to the full thing.

I’ve seen this argument out there, essentially the OpenAI - they’re making this decision about it, and people said there’s no excuse for waiting to release it, and that sort of thing… Which I kind of get their train of thought, I don’t know that I fully agree with it.

I didn’t mean to interrupt, I was just gonna say - I think that if you’ll think back to recent history, where we spent so much of the past year talking about the ethics around AI, and we’ve had experts like Susan Etlinger on the show to discuss… In that time period, as that conversation was being had within the community, you had a lot of the big players such as Google and Microsoft and others helped by releasing – they thought their way through their own ethical framework and they released those guidelines that they were using internally, and those of us who have been the beneficiaries of that have been able to form what we think around several different frameworks, and combine and make something that we hopefully think works for ourselves.

Maybe that’s something we can do here, from a safety standpoint - with this kind of release, and presumably others to come as well, it gives us a chance as a community to react a little bit about how we want to frame this from a safety standpoint in terms of release, think our way through it a little bit and then do it. So maybe a year from now, two years from now there’s more of a standard way of doing it, instead of feeling your way on your own… So I don’t really hold the carefulness of what they’re doing against them.

[31:42] Yeah, that’s a good point. I think we’ll reference that link to Susan’s show, and others, in the show notes. This is an active topic of discussion within the community, and we would love to really hear from all of our listeners what your thoughts are on this pretty controversial subject. We have a Slack channel, a Practical AI Slack channel. If you go to you can join our Slack channel. We also have a LinkedIn group where you can make comments, so join one of those communities and let us know what you’re thinking. If you have references to other good articles, or other guests that you think might shed some light on this, we’d love to have them on the show and we’d love to share those links via the news feed at So definitely get involved with that.

And the show notes, by the way, while you’re mentioning community - the show notes are now starting to include a link to the Changelog News for Practical AI. So if you do go to the show notes, you’ll see that there’s a link right there where you can get into the conversation very directly, as well. I just wanted to mention that other newer approach that we’re starting to roll out.

Yeah, thanks. Well, as we kind of wrap the discussion of GPT-2 up, and before we share some learning resources, maybe it’d be good to summarize some takeaways from what OpenAI has done and from how the community has responded. I think one big takeaway that I have seen is that we can pretty much expect, as you’ve already alluded to, that OpenAI, Google, Microsoft and these other big players are no longer thinking that it’s appropriate to innocently publish all of their new AI research findings, and the code associated with them. So to some degree I think we can expect that the days of just everything going on GitHub all of a sudden, and download all the pre-trained models is over, to some degree… Which is sad in certain respects, and maybe appropriate as well.

Yeah, I think to tag onto that - the age of any significant release, automatically considering the issues around AI safety along with ethics, is part of the release at this point. And coming from more of a software development background, that’s been rare cases, very specific that you’d have to think that way… Because most software isn’t inherently so powerful that it could be used for good or ill in many use cases, the way some of these technologies are. So I think it’s a maturing process that we’re having here, and I’m glad to see that OpenAI is leading the way, as they do, and thinking about how to release responsibly.

I still think the code is gonna be out there, and I think not only them, but I think with this we’ll see a lot of other organizations researching this area, since there have already been proven results. So I think it’s upon us, and we need to roll into it cautiously.

Yeah. Along with that, I think businesses are taking this seriously because it can affect their bottom line if there are ethical concerns that can actually harm their business, based on the AI software that they’re using internally. They are, to some degree, looking at this from a business perspective, and seeing that these ethical concerns are connected both to how they are perceived and to their bottom line.

Along with that, of course, a lot of people I know have already mentioned that there is a huge need - like, yesterday - to be researching methods to detect AI-generated text. And I know there are certain efforts out there… I also realized – I forget who I was talking to, it was at a conference – that it’s a really hard problem. Generating the text is a lot easier than detecting if it was AI-generated or not.

You are so right about that. That AI safety conversation, which may very well be an upcoming episode (hint-hint), talks about that. It’s much harder to differentiate the real from the unreal than it is to simply create the unreal. It’s an order of magnitude harder, so that’s one reason why a cautious release may be a good, mature way of doing it.

[36:02] And lastly, if you haven’t noticed, AI for natural language is on fire, everywhere.

It’s like, everybody is doing AI plus Natural Language, with tons of great results… I think one thing that you can look for as this year goes on is some pretty crazy stuff probably to come out of conferences like ACL, and EMNLP, and NeurIPS, around natural language, and this sort of thing, and along the lines of the unsupervised or semi-supervised sorts of methods, so… Definitely something to keep an eye on.

Yeah, I agree with you. I’m really excited about seeing use cases for technology like what GPT-2 is making available gradually here, combined with what GANs can do. I think that’s pretty fascinating. You talked about how businesses will be impacted, but I think that there will be a wave of new types of businesses being created with these new technologies as well, and I’m very eager to see what kinds of thoughtful things entrepreneurs come up with.

Yeah. Speaking of that, in a couple of weeks here we’re going to be interviewing the CEO of Hugging Face. If you’re following natural language and AI at all on Twitter and elsewhere, they are all over the place, creating amazing things related to conversational AI, so I’m really excited about that interview. Stay tuned for that one.

To close this out here, we always like to share some learning resources… If this conversation has sparked your interest in these topics and you wanna dive in a little bit more, learn some of the details, maybe even try some of the methods - of course, we’ll link to the code and the repos and everything in the show notes, but we did wanna point you to a couple sets of blog articles that I think can really help you get started.

The first of those are on There’s one called “An in-depth tutorial to AllenNLP”. AllenNLP is a toolkit based around PyTorch, and they have implemented things like ELMo and BERT in the toolkit. That blog post would be a really good, hands-on start. There’s also a kind of paper-dissected article about BERT on the blog.

Then there’s this other blog which I kind of came across recently and I wasn’t aware of, from Jay Alammar. He has a series of blog posts called “The illustrated…” something. He has “The illustrated transformer”, which is talking about this transformer sort of model that all of these releases are based around. Then there’s an “Illustrated BERT, ELMo and company”, which talks about these encoders.

I know I pointed you to these illustrated ones a little bit earlier… Did you get a chance to look at those, Chris?

I did. They’re really good, thank you very much for pointing those out. I recommend them to listeners that wanna dive in; these can be fairly complicated topics to ramp up on, and the illustrated pages are really good for doing it. It may not be all you need, you may combine that with other resources, but it’s another good one that you found there.

Awesome. Well, this has been a great discussion, Chris. Thanks for all your insights, and looking forward to talking to you again soon!

Sounds good. As you said, we’ve got more interviews coming up… So have a very good week, and we’ll talk to you next week!


Our transcripts are open source on GitHub. Improvements are welcome. 💚
