Daniel and Chris do a deep dive into OpenAI’s ChatGPT, which is the first LLM to enjoy direct mass adoption by folks outside the AI world. They discuss how it works, its effect on the world, ramifications of its adoption, and what we may expect in the future as these types of models continue to evolve.
Play the audio to listen along while you enjoy the transcript. 🎧
Welcome to another Fully Connected episode of the Practical AI podcast. These episodes are where Chris and I keep you fully connected with everything that’s happening in the AI community. We’ll take some time to discuss the latest AI news and then dig into some learning resources to help you level up your machine learning game. I’m Daniel Whitenack, I’m a data scientist with SIL International, and I’m joined as always by my co-host, Chris Benson, who is a tech strategist at Lockheed Martin. How are you doing, Chris?
Doing very well. Happy new year! 2023, this is our first conversation.
Yeah, happy new year. This is the first one we’re recording in the year 2023. Looking already to be an exciting year for AI things. I hope you got a bit of a refreshing break over winter, because there’s a lot of – I’m guessing it’s gonna be a whirlwind of AI stuff this year.
I think it is going to be a whirlwind. I didn’t get a rest over the break, because having nothing to do with AI, our animal nonprofit, we had all the winter weather that most people in the US were aware of, and we were doing animal emergencies. We saved a whole bunch of lives, which made the lack of rest worthwhile.
But there was a lack of rest.
There was a lack of rest, but we did a lot of good… But interestingly, the conversation we’re gonna have today will play into that very non-AI side of my life, because we’re starting to see some crossovers, we’ll see in a few minutes here.
Yeah, it’s interesting. So today - spoiler alert - we’re gonna be talking about ChatGPT; you’ve probably been expecting us to talk about ChatGPT for some time. One of the things we wanted to do is really dig into the internals of ChatGPT, how it works and its implications, and so we wanted to do it justice, which is partially why we wanted to take some time and prep for that… But it is interesting also to get a little bit of perspective, now that ChatGPT has been out for - not that long, but a little while.
Over Christmas - I was at Christmas with my family, and even at our family Christmas dinner, my dad was asking me about ChatGPT. And at my church, I’ve had people come up to me and ask about ChatGPT, who don’t work in tech or anything like that. And my barber… Whoever is in my life, it seems like they’re at least aware of ChatGPT. They might not know exactly what it is, but they know that it’s a big deal. Are you having a similar experience?
Very, very similar to that. And for folks - Daniel and I haven’t talked through the holidays, so this is the first time I’m hearing it, just as you are… And I’m having the same experience, and it’s been really notable. Each new large language model comes out, and the various GPT series, and we talk about it - this is the one that’s crossed over into mainstream awareness and broad use. And I mentioned, as we were getting into the conversation, that it’s now crossing over from the technical and AI side of my life into the non-technical and animal side, as we do things like narratives, both written and video and educational material. This is an amazing tool that completely non-AI-focused people can use productively, to really do good in the world and get things done that they want. So it’s been really interesting to see how this one has been different from the GPTs before.
Yeah. So in your case, it’s something that, as you’re creating content, you see it as potentially playing a role in whatever scripts, or articles, or whatever that might be. Is that right?
Absolutely. It’s been quite humbling in that way, experimenting with what was possible… Because the quality of the outputs is typically much better than I can do by myself. I’ve done that both in terms of - I’m writing a children’s story to teach children about animals, and I’ve been experimenting with it, and every time I write something and then I seed it into ChatGPT, it does a better job than me. So it’s been very humbling in that way. I think of myself as a decent writer as well.
And then the quality of video output has been quite good, and there’s a little workflow… But it means that we can do more good in the world, faster. It accelerates the ability to put out great content. And so I think that this is one of those inflection points that we’ve seen, not just on a technical merit, but in the world at large.
[06:10] Well, your usage seems to be much more useful and valuable than my usage, which has mostly been things like writing – I remember I had ChatGPT write a new Christmas Carol for me about the Three Wise Men in the style of a rap song by Eminem… I have to say, it was a great rap song. I didn’t record it, because I’m not Eminem… But I sent it to his people, and we’re having discussions.
Okay. I can’t believe you’re not sharing that with us.
Well, maybe before we jump in, I think some of what we wanted to do today was just describe a bit of what ChatGPT is, what the interface looks like, what you can do… But then really do a deep-dive on what are the guts of the system. Why is it different than what’s come before? In what ways is it similar to things that have come before? Both of those things are true. And so we want to do a deep-dive and then think about some of the implications. So buckle up. Hopefully, this will be fun.
First off, it is called ChatGPT, which is interesting… So the interface that they’ve chosen for this, and the sort of design of the system is a chat interface… So if you go to chat.openai.com, you need to create an account, and we can talk about some of the implications around that in a second… When you log in, it gives you some examples of what you can do, some example capabilities, and some limitations. I’ve found this interesting, and we can talk about it later, some of how they describe the limitations, and how they released the model… But the basic idea is there’s a chat interface, you can type a prompt, and it will respond. And then you can actually continue to have a dialogue with the system. So you can say, “Tell me more about that”, or “I don’t understand this part. Explain that bit more.”
So yeah, that’s the basic input/output. How did you find this sort of interface, Chris, in terms of your own usage, as related to like building scripts and other things?
It’s been interesting in that it will take it in a direction… Like, as I’ve been trying it out, the children’s story thing is something I’ve been playing with, and seeing where ChatGPT chooses to take the beginning of/the seed of a narrative… Like, I would start off with “Once upon a time, there was a precocious raccoon named Pandora”, because that’s the hero in the story. And it’s been interesting to see how it’s taken it. But it’s also – it will go off in directions I don’t want, so then I’ll ask questions to kind of steer it a little bit, and it will come back.
So it’s not final output, but it’s producing a body of narrative that’s better than I could have done, by far. And so I find myself, instead of being the creator of the story, I’m kind of editing it to make it work… But it’s a collaboration, in a sense, between – this is one of those first points where… We’ve talked in an inspiring way about collaboration with AI for a long time, but I now am doing that, and steering it in different ways with entering in the chat, and seeing where it went, and asking specific questions about the story… It’s been quite remarkable. For the first time, it’s like having a partner in the process. It just happens to be that the partner is not human, in this case.
[09:50] A friend of mine, Brent Seagal, has been also playing with it a lot, doing some stuff, and that’s how he described it as well. He was looking at some different topics, and he said “It’s like having a dozen world-class scientists” for the things that he was focusing on, right there, available to you, and they’re not wrong. They never get it wrong. He had some pretty cool stuff that he was working on in a very different thing… But it’s that sense of collaboration with the technology in a real life sense that’s really different now from the way it was before this.
Well, as you were chatting - which, that was some great context - I asked ChatGPT the following: “What state of the art AI topics does the Practical AI podcast need to cover in 2023? Who should they reach out to for interviews?” And the response - are you ready for what we’ll cover in 2023?
Let’s hear it.
ChatGPT said “It’s difficult to predict exactly what the state of the art in AI will be in 2023.” Definitely an interesting start…
“…as the field is evolving rapidly. However, some topics that might be worth covering include machine learning interpretability and explainability”, and then it actually gives a blurb about what that is, “AI safety and ethics”, so that’s right in your wheelhouse… And it gives an explanation of that. “Natural language processing” - so we can continue to talk about my favorite topic of NLP in 2023… And computer vision was the other one. And it said, “To find guests to interview on these topics, you might consider reaching out to researchers and practitioners working in these areas.”
Some suggestions include Rachel Thomas, co-founder of Fast.ai, Timnit Gebru, co-lead of the Ethical Artificial Intelligence Team at Google, which is interesting that it gave that response, because that is not factually correct anymore, as she is not with Google… And actually, that was in the news quite a bit…
It was. That was a significant story in the AI world a few months ago.
And then it gives a few others, including Yann LeCun, who - of course we would love to have him on the show. And we’d love to have Rachel and Timnit as well on the show.
But yeah, interesting. So a few things, I guess, that strike me as an example with these certain cases - the output is definitely natural and coherent. So that is thing one. That’s striking. Thing two for me is there’s actually a good bit of like structuring that goes on here… So they actually give 1, 2, 3, 4, the topics that we need to cover, and then a bulleted list of the people that we need to have on the show.
Thing three is, despite it being coherent and natural, it is not fully correct factually. So that’s maybe another element of this.
You know, it’s funny, because we’ve seen a fair amount of criticism about ChatGPT getting things wrong, and stuff… I find it curious that, as we talk to humans about human things, we get things wrong constantly, and fact-checking, and was that misinformation, or was it just unintentional? And yet we hold these technologies to such a perfect standard that we are ourselves completely unable to hold up. I wouldn’t want to ask one question and assume that it was 100% right, but it makes it a little bit more interesting to me… That collaboration, I daresay takes on a human element by having error in it.
Yeah. And we’ll talk a little bit later about the interaction between this and humans, and where the burden lies… I do think that the interface that they’ve provided, and being explicit about limitations - that’s a good thing. Now, certain people might kind of go back and forth on – this model is not open access, right? Like, you can sign up and create an account, and a lot of people have done that, and you can interact with it, but the model weights themselves are not released publicly in that sense, even if a lot of people can use it for free at the moment.
[14:12] So there’s pros and cons there, but I think it’s interesting that this model, as opposed to GPT-3 earlier - it was, I think, easier for the general population to interact with this model right away, in comparison with GPT-3, which had a very prolonged waitlist, and timing, and all of that, and lots of explanations… So it seems that they’ve kind of shifted the scales a little bit in terms of making access to run the model more open, while still maintaining it as a closed model and providing limitations.
So it’s interesting to see also that kind of shift in dynamics, which I think probably was influenced by the fact that actual open access models like Stable Diffusion and others have taken off so widely so quickly because they are more open access-wise. And so I felt like we saw OpenAI shift a little bit in how they released this, while still kind of maintaining some of the elements of how they released GPT-3 and others.
I agree with that. Yeah, I mean, we’ve seen that kind of evolution as they’ve explored release approaches over time, and iterations, and such. I think one of the things that we’ve seen across this is the fact that every time a breakthrough comes on, we’re starting to have fairly quick follow-up. Once people know that something is possible, they manage to kind of reverse-engineer it. So I suspect that aside from strictly ChatGPT, that we will see some fast followers pretty soon.
Alright, Chris, let’s get into the technical details of this, which I know I’m excited to chat through… I guess pun-intended in that case…
There’s kind of two elements of this that I think are important to talk about before we talk about what actually was done with ChatGPT specifically… And these two things are more general than ChatGPT. One is sort of the GPT family of language models, and those types of language models, and then also a technology or approach called reinforcement learning from human feedback. Those two things kind of combined here to create the ChatGPT system. And these two types of models and approach have been applied more widely in other cases and by other people, but here they are applied by OpenAI.
So starting to talk about this family of GPT language models - we had GPT, and GPT-2, and GPT-3, and GPT-3.5, and I don’t know what – to be honest, I don’t know what number we’re on now. But these GPT language models are just that - they’re a language model, and they’re a specific type of language model called a causal language model. People might be familiar, or at least have heard the words “causal language model”, CLM, or “masked language model”, MLM… So a masked language model takes a sentence, and what it’s trained to do is, for one word that’s masked in the sentence, or taken out, or given a special token, it’s trained to predict that word based on everything else in the sentence. So it sort of looks both ways in the sentence and tries to predict the mask.
[17:53] GPT is not a masked language model, it’s a causal language model, which means that it’s trained to predict the next word in a sequence of words, or in a sequence of tokens, whatever those tokens might be. It does that, and it predicts the next word in a sequence, but it does it based on all of the previous words. And it does that sequentially. So as you go through the sentence, the training methodology is what they call auto-regressive; it means that it predicts the next thing from all of the previous things, and then once it predicts that next thing, then it predicts the next-next thing based on all the previous things, and then the next-next-next thing etc. That’s the auto-regressive part of it.
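The autoregressive loop Daniel describes can be sketched in a few lines of Python. The lookup table here is a toy stand-in for a learned next-token distribution; all names and tokens are illustrative, not from any real library.

```python
# Toy sketch of autoregressive (causal) generation: each next token is
# predicted from ALL previous tokens, then appended and fed back in.
# The lookup table is a hypothetical stand-in for a learned model.
TABLE = {
    ("the",): "cat",
    ("the", "cat"): "sat",
    ("the", "cat", "sat"): "down",
}

def next_token(context):
    # A real causal LM computes a probability distribution over the whole
    # vocabulary from the context and samples or takes the argmax;
    # this toy version just does a lookup.
    return TABLE.get(tuple(context), "<eos>")

def generate(prompt, max_tokens=10):
    tokens = list(prompt)
    for _ in range(max_tokens):
        tok = next_token(tokens)   # condition on everything generated so far
        if tok == "<eos>":
            break
        tokens.append(tok)         # the auto-regressive step
    return tokens

print(generate(["the"]))  # -> ['the', 'cat', 'sat', 'down']
```

The key point is that `next_token` always sees the full prefix, which is also why the interface can stream text token by token, as Chris notes next.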
I suppose we’re kind of seeing that in action, because when you’re using the interface, it doesn’t just give you the entire output all at one time; it comes back with text. You see the text developing, much as if you were typing it on the screen yourself. So I guess you’re gradually seeing each of those iterations coming back.
Yeah. And I think in the original GPT-3 interface, or the playground that we both played with, you kind of see this as well. You kind of give a prompt, and then it generates this text out… And that allows it also to be very flexible, and produce these structures, and it also allows it to be flexible between different tasks. Like, if you start prompting it with question-answers, it sort of learns that pattern in a sort of few-shot way, and then starts predicting the next question and answers, or something like that. Or if you want a script, or if you want a narrative, or if you want something else, it kind of adapts in that few-shot learning sort of way, which is a key element of this GPT or causal language model structure. And GPT is not the only one. There’s other ones, but this is the family which GPT sits in.
And you mentioned - just as a two-second sideline - few-shot. Do you want to – real quick, just for those who may not be familiar?
Yeah, some jargon… Few-shot, zero-shot is thrown around… A zero-shot prediction or usage of a model means that maybe you’re using a model on inputs, or a type of input, that it’s never seen before, even though it’s seen maybe similar things. This happens with machine translation models that are multilingual, maybe, because you might have in your training data English to French, and Arabic to Spanish, but you don’t have examples of English to Spanish; but you have English and Spanish data in the dataset, and so you could still ask that model to try to output an English-to-Spanish translation, and actually, that can kind of work in certain scenarios.
Few-shot means that you’re not quite doing it that way, but you’re providing a small number of prompts that kind of guide the language model into the type of thing that you’re wanting to do. So in the GPT-3 interface or playground, if you remember, you can kind of start with a question-answer template and provide some examples. And then you can provide the next one, and it’ll answer it for you. And so you provide that set of templates or prompts. And this kind of gets into this idea of prompt engineering, and that sort of thing, because these models are so flexible. So that was the original GPT-3 paper, titled “Language Models are Few-Shot Learners”. That was one of the big ideas there.
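The question-answer template pattern Daniel describes can be illustrated with a small sketch. All the strings and the helper name here are made up for illustration; the model infers the task format from the examples without any weight updates.

```python
# Hedged sketch of few-shot prompting: prepend worked examples so the
# model picks up the task format, then let it complete the final "A:".
def build_few_shot_prompt(examples, query):
    lines = []
    for question, answer in examples:
        lines.append(f"Q: {question}")
        lines.append(f"A: {answer}")
    lines.append(f"Q: {query}")
    lines.append("A:")  # the language model completes from here
    return "\n".join(lines)

prompt = build_few_shot_prompt(
    [("What is the capital of France?", "Paris"),
     ("What is the capital of Japan?", "Tokyo")],
    "What is the capital of Spain?",
)
print(prompt)
```

Because a causal LM just predicts what comes next, it continues the established Q/A pattern, which is the whole trick of few-shot learning.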
That kind of gets us to GPT and language models… But ChatGPT – well, I guess it is a model like that. So it is a GPT-based model. But the reason why the system is so powerful is because it’s a language model that has been trained in a very unique way, that has proved to be actually quite valuable. And that’s that it is a GPT-based model that was trained using reinforcement learning from human feedback, or RLHF; reinforcement learning from human feedback.
[22:06] We’ll link to this in the show notes, but there is a really great article on the Hugging Face blog from Nathan Lambert, Louis Castricato, Leandro von Werra and Alex Havrilla called “Illustrating Reinforcement Learning from Human Feedback (RLHF)”, and they talk about ChatGPT and other like models. So we’re gonna pull a lot of our insights from this article. Thank you to all of you for writing this article, because it was really helpful; much more helpful than maybe the OpenAI blog by itself.
The major idea here with reinforcement learning from human feedback is trying to answer the question “Can we use human feedback on generated text as a measure of performance that goes beyond sort of just like automated measures of performance?” So how do we integrate human feedback into the loop of training a model as a performance metric? And in that way, we’re sort of training a language model, but we’re also training it in ways that match human preference for answers. So human preference is a key piece of this, and I think that’s why people like ChatGPT, is we prefer the things that it outputs. I don’t know if that was the case for you. With just a raw language model like GPT-3 you can get some cool stuff output, but it might not fit your preferences of like how a human would actually respond to something.
Going back to the example I mentioned at the beginning, that was the trick for me… Using the children’s story as an example, I had a specific rough narrative in mind, because I’m trying to teach, and there are certain points that I’m trying to illustrate… And obviously, it doesn’t know that, the model; but if you work with the model, being able to kind of continue to point it the right way - that was very interesting.
I’m curious, going back to what you were talking about a moment ago, with reinforcement learning from human feedback - how does that scale? Because if we were to compare this for a moment… I know this is very much kind of a newbie question. But for those of us who are not deeply into language models, when we were looking at other types of models a few years ago - 2, 3, 4, 5 years ago - there was always a challenge about getting human feedback to scale with the amount of training data. How is that tackled in this approach, so that you can do reinforcement learning that way, but it scales to what we’re doing with GPT?
There’s actually like a whole loop of models involved here, and different training sets that are of different scales, and different models that are of different scales. So let me talk through a little bit of that, and hopefully that will become more clear… Because yeah, obviously, human feedback is expensive in terms of gathering it; so how much of it do you need? The process with which ChatGPT was trained - and other models using this reinforcement learning from human feedback approach - is a three-step process. First, you pre-train a language model, which is not new; we’ve been doing that for quite some time, right? You pre-train a language model, then you gather this sort of human preference data and train a reward model. That second model, the reward model, is trained to take in a prompt and a response and score it, like a human would score it, according to preference. It’s actually trained on the human preference data, and it outputs a prediction of what a human’s preference might be for this output. And then, the third and final step is that you fine-tune a copy of your original language model using this trained reward model and a reinforcement learning loop.
So is it kind of the discriminator? You’re using the reward model as the discriminator in that?
[26:08] In a reinforcement learning loop you would have a kind of policy, which outputs like a “what you should do next” sort of thing. And then you have some type of reward system that rewards the agent for acting according to the policy or not. So in this case, the reward model is outputting that reward or that preference, and the language model is actually acting as the policy here.
So you have an original language model that is kind of your original policy, and isn’t fine-tuned yet, according to human feedback. Then you gather some human feedback, like actual human feedback, train a reward model to simulate that human feedback, and then you fine-tune a copy of your original language model or a copy of your policy with this reward model.
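The three steps described above can be sketched as a pipeline. Every function here is a placeholder standing in for a full training procedure, not a real API; the sketch only shows how the pieces feed into one another.

```python
# High-level sketch of the three RLHF steps. Each callable is a
# hypothetical stand-in for an entire training stage.
def rlhf_pipeline(pretrain, gather_human_preferences, train_reward_model, ppo_finetune):
    # Step 1: pre-train (or reuse) a language model -- the initial policy.
    base_lm = pretrain()
    # Step 2: collect human preference ratings on the base model's outputs,
    # then train a reward model to predict those preferences.
    preferences = gather_human_preferences(base_lm)
    reward_model = train_reward_model(preferences)
    # Step 3: fine-tune a COPY of the base model against the reward model
    # in a reinforcement learning loop (PPO, in ChatGPT's case).
    tuned_lm = ppo_finetune(base_lm, reward_model)
    return tuned_lm

# Wiring it together with dummy stand-ins, just to show the data flow:
steps = rlhf_pipeline(
    pretrain=lambda: "base LM",
    gather_human_preferences=lambda lm: ["rankings"],
    train_reward_model=lambda prefs: "reward model",
    ppo_finetune=lambda lm, rm: "fine-tuned LM",
)
print(steps)  # -> fine-tuned LM
```

The important structural point is that only step two needs humans; steps one and three are automated, which is what lets the approach scale.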
And so the pre-trained language model - it could be any language model, it doesn’t have to be GPT-3, but in the case of OpenAI, it was GPT-3. But you have an original language model, and that language model could just be a general pre-trained language model, or you could additionally fine-tune that model, maybe for a domain or a specific type of output you want. So that’s your pre-trained language model.
And then step two, to get the reward model, what you do is you start outputting data from your original policy, from your original language model, and you have humans rate it. Maybe you combine that with certain human output, or certain other outputs, and you have human rating. So that way you’re creating a training set for your reward model, which includes human labels of their preference. Also in this step two you then train a reward model using that data that you’ve gathered from humans to output the preference.
Now, to your point of “How does this scale?” Well, the fine-tuning of the policy is done kind of with this automated reinforcement learning loop… But you do need humans to generate enough data to train your reward model that’s used in that loop. And what’s interesting is – and the Hugging Face blog makes this point - that different people or different groups that have applied this reinforcement learning from human feedback have used different-sized reward models. And obviously, as the size of your reward model increases, you need more data to train it. That would be a general rule. In the case of OpenAI, their main language model was like 175 billion parameters, and the reward model was much, much smaller - 6 billion parameters. In other cases, people have used similarly-sized models. And so I think that is an open question - how should these models size-wise be related to one another? What types of models should you use for your reward model, and how much human feedback do you need? To be honest, I think those are open research questions.
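The reward model in step two is commonly trained on pairwise rankings: humans pick which of two responses to a prompt they prefer, and the model is trained to score the chosen one higher. A minimal sketch of that pairwise (Bradley-Terry style) loss, with all numbers purely illustrative:

```python
import math

# Sketch of the pairwise preference loss for reward-model training:
# -log(sigmoid(r_chosen - r_rejected)). The loss is small when the model
# already scores the human-preferred response above the rejected one.
def preference_loss(reward_chosen, reward_rejected):
    margin = reward_chosen - reward_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Correct ranking (chosen scored higher) -> low loss;
# inverted ranking -> high loss, pushing the weights to fix it.
low = preference_loss(2.0, -1.0)
high = preference_loss(-1.0, 2.0)
print(low < high)  # -> True
```

Training on many such comparisons is what turns sparse, expensive human judgments into a scorer that can be queried cheaply inside the automated loop.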
Let me ask you another question on that. With us getting high-quality output that is comparable to human output, very closely, to where if you were to get that output, you would find a very difficult time knowing whether it was the model or a human that did that - does that potentially go back in to train further reward models, where you’re using essentially synthetic data as the output of a previously-trained, and so you can build on it? Essentially, there’s a point where you have enough data, where you’re largely able to take humans back out – recognizing it’s the tool of the day, but in the future you can take humans back out of that loop of providing the reward model to do that… Do you anticipate that that would be a reasonable expectation?
[30:00] I think in this methodology, the reinforcement learning from human feedback, one of the goals in that middle step is to get enough human feedback that you reduce the harm and improve the helpfulness of the output model. So this is really addressing, I think, some of those kind of problems with large language models - hallucination, and harmful effects in the general output. And what I think is the finding here is you can address those with humans in the loop, rather than humans totally out of the loop.
Now, here in the next step that we’ll describe in the process, humans are taken back out of the loop to fine-tune the model, but they’re that central piece, so this three-step process of starting with a language model on one end, ending with a reinforcement learning trained model on the other end, has this middle step that I think is a really key piece of it, that actually helps the utility of the output and potentially reducing harm of the output, which is that human feedback piece.
Alright, Chris, we’re about to the end of this reinforcement learning from human feedback loop. Just in summary, the loop is we have a pre-trained language model, then we gather this human feedback or rating of the output to train a reward model… Now we’re actually going to use that reward model. So in the final step of the process, we make a copy of the original language model, or the policy… So you have an original policy, and you have a copy of the policy; or an original language model and a copy of the language model. You put in a prompt to each of those models, and then you get an output from each of those models. Then you use a sort of constrained reward function where you actually penalize if the updated model is straying too far away from the original model – because I think what they have found is if you allow it to sort of just take any direction in the output you want, it can run into some computational optimization problems. So you kind of gradually change this language model from the original, and you have a penalty for how far that output strays from the original output… And then you score that output with this reward model that you’ve created. And the way that they’re doing the updates for ChatGPT and some of these others is with a reinforcement learning algorithm called Proximal Policy Optimization, which gives you sort of two levels of what in physics I would think of as adiabatic change, meaning things don’t change too quickly.
One is you don’t stray from the original policy output too much or you’re penalized from that, and secondly, this reinforcement learning algorithm called PPO prevents you from making too big of updates to your model weights in each step. That way you don’t have, again, this kind of hard optimization to do.
But in summary, you kind of have these two models - the original one and the updated one - you feed a prompt into both of them, the outputs go into your reward function, which includes a penalty element for straying too far from the original output; it also includes the actual estimated reward, or estimated preference, from your reward model… And then that reward is used to update the weights of your copy model, or your new policy, using this PPO reinforcement learning algorithm.
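That penalized reward is often written as the reward model's score minus a KL term measuring how far the updated policy has drifted from the frozen original. A sketch, with `beta` and all log-probabilities purely illustrative:

```python
# Sketch of the per-step RLHF reward: the reward model's score minus a
# penalty for straying from the original (frozen) language model.
# beta and the log-probability values below are illustrative numbers.
def rlhf_reward(reward_model_score, logprob_new, logprob_original, beta=0.1):
    kl_penalty = logprob_new - logprob_original  # simple per-token KL estimate
    return reward_model_score - beta * kl_penalty

close = rlhf_reward(1.0, -2.0, -2.1)    # new policy stays near the original
drifted = rlhf_reward(1.0, -0.5, -2.1)  # new policy has strayed further
print(close > drifted)  # -> True: drifting reduces the effective reward
```

This is the first of the two "adiabatic" constraints; the second, inside PPO itself, separately clips how large each weight update can be.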
[33:59] There’s some diagrams in the post that I think are quite helpful. It’s a bit hard on a podcast, but hopefully, that loop makes some sense in terms of how you’re updating this. And this updated policy, or this updated language model is the model that is used. So this is like the ChatGPT model that comes out of the end.
I think given the limitations of our medium here, I think that was a very lucid explanation of translating it, so I appreciate that. I’ve definitely learned some.
Good. And no formulas… You can ask ChatGPT to output all of the right formulas, and I’m sure it would do a fine job.
Where do you think we’re going from here? As you have looked at this progression of these models that we’ve covered on the show over time – with ChatGPT in particular, I’ve been kind of amazed at what it could do, and using it, but I’m really, really curious about where this is going… And I think it’s capturing a lot of people’s imagination in that way, that are outside the field… Like, what’s next?
I think there’s still open research questions here that are worth exploring, and then there’s workflow and practical implications, I think. On the first side, as was mentioned already, and we were discussing this reward model, as far as I can tell, it’s not totally determined what the architecture of this reward model should look like, how big it should be in relation to the model that you’re fine-tuning, how much human feedback should you use, how does the amount of human feedback that you get influence the harmfulness or the utility of the output, and that sort of thing… So I think there’s a lot to explore around that dynamic between the reward model and the language model.
In addition – I mean, language models are still being developed, right? So ChatGPT used the GPT-3.5 language model as this original policy, right? And they actually used a fine-tuned version of that, using supervised methods and human chat conversations. So they started with a fine-tuned version of GPT-3.5. So obviously, we’re going to have a GPT-4, GPT-5, we’re going to have other language models from other providers, right? From other research groups - BigScience, or Google, or Microsoft, or whoever’s developing these other language models - we’re going to have updated versions of those.
So I think we can see a research direction with this where people are trying different pre-trained models as their original policy, where people are trying different reward models, or they’re mixing them up in interesting ways, where they’re maybe using slightly modified versions of the PPO algorithm, or other reinforcement learning algorithms to do the updates… So there’s a research direction where I think we’ll just see a lot of exploration with this kind of template as the structure that they’re exploring.
The second piece, which is maybe more interesting to some of our audience, is what are the implications of this in terms of people’s workflow…
I was about to ask you that if you hadn’t gone there…
Yeah, I don’t know, what are your initial thoughts there, Chris?
It’s less about the technical aspects of the model and more about going back to the user interface considerations that we talked about earlier in the conversation. I would be amazed if the community at large, not just OpenAI, hasn’t understood the impact of making choices like that - choices that may not be specific to the model development, but to how you’re putting it out there… And they’re seeing widespread adoption.
When you go into their interface, you get a warning right off the bat, “We’re experiencing exceptionally high demand. Please hang in tight as we scale our systems.” And I think that’s indicative of the fact that people who are not normally listeners of this podcast are starting to find a lot of utility for the first time ever.
[37:53] It’ll be interesting… You know, we keep talking about exponential growth in this field, and these amazing kind of mini-revolutions along the way… But this is that first point where it’s probably going orders of magnitude broader in terms of applicability to different workflows and audiences. And just for a moment, going back to that combination - natural language, with the large language models, with generative capabilities, with reinforcement learning… We saw slices of each of these fields developing over the last few years, and we’ve been talking about this fusion of the fields… So how soon before we start seeing entertainment that is heavily, heavily based on these technologies? I’m seeing it in my little tiny nonprofit, because we can suddenly leverage this to put out content to help folks in a charitable fashion, and we can do at least 10 times as much as we would have been able to before, by taking advantage of these.
And so I think we’re at that inflection point now where this will be the first, and as we have continuing episodes through the course of this year, and some new things come out, whether it’s from OpenAI, or similar things from other organizations, I think we’re getting to that point where it’s really hitting broadly in real life. So I’m really fascinated, I would love to hear from our listeners on ways that they’re using this technology, what they think might come next, and how they are envisioning using it within their own organizational missions to accomplish what they want. It’s a fascinating moment in the history of AI that we’re in this second.
And one thing which I can’t claim as my own insight, that I stole from Twitter, but I think has really shifted my thinking a little bit on this subject is – so this is actually a tweet from Chris Albon, who is the director of machine learning at Wikimedia… And the statement he made, which I think was really insightful - and maybe other people are having similar observations… But he said “Sci-fi got it wrong. We assumed AI would be super-logical, and humans would provide creativity. But in reality, it’s the opposite. Generative AI is good at getting an approximately correct output, but if you need precision and accuracy, you need a human.”
So I think the observation here is – and we’ve talked about this on the show with language models also… Language models are really good actually at naturalness, creativity, apparent coherence, right? Like, that actually is what they’re good at. But they get the facts and the precision and the accuracy wrong many times, right? So whereas I think in the past people have thought the unique thing about what humans can provide in an AI-driven system is creativity, not logic, and that sort of thing - actually, the opposite is really the case. The AI bits are really driving the creativity, and the humans are enforcing the logic, the facts, the accuracy, and the precision. That has really shifted – I think I’ve been realizing that over time, but that statement really put some words to what I was thinking…
[41:23] It’s comforting, in a way… And the reason I say that is we talked in times past about creativity coming from the humans rather than the machines, and yet the evidence that we’ve been looking at over these last couple of years has been not that. And so I have actually been wondering what role there is for the humans in that equation… So the fact that it’s flip-flopped, that it’s the inverse of what our expectation was - it still means there’s room for a human in the picture, and that’s a little bit of a comforting moment. It may not be what we thought it would be, but there’s still a place, and I think that’s probably a good high note to leave people with.
On the note of things being useful to humans and humans getting involved, we did want to leave you with a few learning resources to explore things related to ChatGPT… Of course, play around with ChatGPT; you can go on the website and interact with it, we’ll provide the link. But also, I would really highly recommend that you look at this Hugging Face blog about reinforcement learning from human feedback. There’s actually a bunch of links in there as well to other things that you can kind of spin off and look at, like the PPO algorithm and other things in there.
Also, there’s a good reference - I always love looking back at Jay Alammar’s descriptions of how certain language models work. He has one on GPT-3 – actually, a number on GPT from different perspectives…
And then there’s an interesting article on “GPT-3 architecture on a napkin” from the blog Dugas.ch. I’ve found it quite interesting how they describe some of the things there.
I like that one as well.
Yeah, yeah. So go ahead and check those out. Those are great learning resources. They’re all free, you can take a look at them and learn in more detail some of the things that we only had 45 minutes to talk about here on the podcast.
And on our social media channels… I’m encouraging our listeners to share with us some of the ways they’re using the technology. I’m really waiting to hear that. The more unique, the better.
Yeah, sounds great. Let us know what you’re creating with ChatGPT. It’s been a fun one, Chris. Good to chat with you.
Absolutely. Thank you very much for the incredibly lucid explanation. It certainly helped me understand, and I appreciate it, as always. Daniel, talk to you next week.
Our transcripts are open source on GitHub. Improvements are welcome. 💚