JS Party – Episode #262

Generative AI for devs

with KBall, Amelia, Amal & Nick

The panel dives into the current hot topic that is Generative AI. They start by defining it (a surprisingly difficult topic), then go into experiences they’ve had, how to get started working with it as a developer, and where they think it will and will not be useful in the near future.

Sponsors

Lolo Code – If you’re familiar with building serverless apps, think of Lolo Code as your backend with a visual editor that lets you think and build at the same time. All this without having to provision or manage servers. Use the visual editor to build your app, connect nodes, and add any npm libraries you need. You can even write your own integrations. This makes Lolo Code very Zapier-ish, but for devs. Try it free today with no credit card required at lolo.co/jsparty

Changelog++ – You love our content and you want to take it to the next level by showing your support. We’ll take you closer to the metal with extended episodes, make the ads disappear, and increment your audio quality with higher bitrate mp3s. Let’s do this!

Fastly – Our bandwidth partner. Fastly powers fast, secure, and scalable digital experiences. Move beyond your content delivery network to their powerful edge cloud platform. Learn more at fastly.com

Fly.io – The home of Changelog.com — Deploy your apps and databases close to your users. In minutes you can run your Ruby, Go, Node, Deno, Python, or Elixir app (and databases!) all over the world. No ops required. Learn more at fly.io/changelog and check out the speedrun in their docs.

Notes & Links

Chapters

1 00:00 Opener
2 00:36 Sponsor: Lolo Code
3 02:39 It's party time, y'all
4 03:29 Welcome, friends!
5 05:12 What even is generative AI?
6 10:24 Asking the AI to define itself
7 11:50 It boggles the mind
8 17:21 Sponsor: Changelog++
9 18:16 Today's landscape of AI tools
10 22:02 How devs can start playing with AI tools
11 25:48 Poking at the boundaries
12 29:58 Training on Nick's face
13 35:47 Challenges & gotchas
14 38:14 Where we are & where we're headed
15 44:39 Thinking fast and slow
16 51:41 Bias concerns
17 53:32 What we're excited to see
18 56:48 Wrapping up
19 58:35 Outro

Transcript

Play the audio to listen along while you enjoy the transcript. 🎧

Hello, JS Party people. Welcome to today’s episode on a fun, very relevant online topic of the day. We’re gonna be talking about generative AI. Say that three times fast, generative AI… I’m Kball, I’m your host today, and I am joined by three of my favorite panelists. First off, Amelia Wattenberger. Amelia, hello. Welcome.

Hello, thank you. So excited to talk about generative AI.

We are excited to have you here. I think you are probably the foremost expert here, which means you’ll talk the least, but the things you say will be super-valuable.

Oh, no… [laughs]

Next up, we have Amal Hussein. Amal!

Hey, happy new year, everyone. Happy to be here.

Happy new year, happy February. We’re already in February.

I know, but it’s like my first recorded podcast of 2023, so I have to say happy new year, Kball.

No, no, it’s good. It’s just blowing my mind that the year is already a twelfth of the way done, and it feels like we’ve just celebrated. And then the one, the only, Nick Nisi.

Ahoy-hoy. I know it’s February, but I get these regular phishing attempt emails at work, that are like generated by some company that we pay to try and trick me… And they just announced that it is time to sign up for Secret Santa. And that was the phishing attack. So it’s February, it’s time for Secret Santa! I don’t know… [laughter] Maybe they should use a generative AI to figure out when to send proper things that might actually trick me.

Maybe they already do, and that’s why they came up with Secret Santa. [laughter] Well, that feeds us right into our topic. Thank you, Nick. And I think we should just start, for those who maybe are a little bit less online and not as plugged into all of the craze of the moment, what even is generative AI? I don’t know, Amelia, is that something you could take a crack at defining?

I’ve learned that the more you say here, the stupider you sound, at least for me… [laughs] I actually don’t use the term generative AI a lot. So as far as I understand it, it’s any machine learning model where you give it some input, and it gives you different output, but that could be totally wrong.

[05:52] This is good that we’re clarifying this, because the title of the document is generative AI, and I’m like “What does that mean?” Is that the AI that I’ve been using every day, and that we hear about in the news every day? Is that like GPT? Is it generative adversarial network, GAN? That’s another term… I have exceeded my knowledge on all of this at this point. But it is very confusing as to what this specifically means.

I do have to say that these are like the worst possible names to expose to the lay public… Like, talk about leaky abstractions… Like, ChatGPT, and Stable – all these things that really just, given how this is kind of totally… Like, it’s not even like tech bubble anymore, right? This has just like expanded into like my grandma, and my aunts, and like cousins, and my cashier at the supermarket is talking about this… And so it’s nice to maybe have some better names for our not-in-tech brothers and sisters. But…

I will say, I think it’s better having the cashier ask you about generative AI than my cousin calling me asking about NFTs…

You know what, that’s a fair point. Fair point, Kball.

Also we don’t really have a definition of generative AI… And there may not be a great definition. I think the –

I can try, but I don’t know – I mean, I don’t consider myself an expert at all, to be very clear… But I think my interpretation of this would be that it’s AI that can really – it breaks away from just traditional kind of pattern-matching and neural net, kind of “This is a cat, this is a dog.” It can take a series of inputs, which would be prompts, and leverage its extensive training with pattern-matching of things to kind of more creatively return outputs that aren’t so – it’s not so black and white anymore; like, the outputs are now – there’s an element of sophistication and creativity, and like an elevation in the pattern-matching that’s really like, it gives you a thing. A thing that you can use, and a thing that’s more – I think more useful to humans, really. That’s like my lay, like very simple definition, but…

Yeah, I think that gets pretty close to how I had been thinking about it, or how I sort of heard it, which is like a lot of the sort of last wave of AI was essentially classification, or what you called pattern-matching, right? I look at a thing - maybe it’s an image, maybe it’s a pattern in data, maybe it’s something else, and I make a judgment. This is an A, or this is a B; this is a click, or this is a robot…

This is a streetlight…

This is a streetlight, this is – what are some of the other things that the stupid Captchas show you instead of street lights? Like, whatever it is. That sort of like pattern-match classification is the last wave, in some ways, of AI. And what makes this new generative AI different is instead of just a classification, it is creating something new; it is generating something that is more than an A/B, Yes/No, label this, label that – the output is more than just a label.

Another way I’ve heard of it described is all of this current wave of generative AI could in some ways be thought of as translation. I’m translating from a language to a different language. I’m translating from a question to an answer. I’m translating from one image to a different related image, something like that… Whereas the previous ones were pattern-matching, these ones are translating.

Yeah, I like that translating analogy, but I feel like how do we explain the translating analogy when it’s like, “Okay, read this to me like a reporter”? Like, it just feels so much more than just translation, because it’s able to kind of take all of these different inputs - style of speech, content, actual themes… All these things - and really put all of that together to spit out something that actually makes sense, and is actually useful. So in that sense it’s – I don’t even know if there are words, right Kball? Maybe it’s a futile task to even try to really define it, because it’s so new. We need like a new word, sort of like jiggy…

[10:22] Well, it seems like we’re in a qualitatively different place today than we were even a year ago. There’s been like some big breakthroughs. So what have been the things, the changes or the breakthroughs that have been driving this qualitative shift?

Just real quick, before anyone answers that, I appealed to authority, as we do on this podcast, and I just asked the AI how it would describe itself… And it says that generative AI refers to a subfield of artificial intelligence focused on creating new and original content, such as images, music, text, and more, based on a set of rules or a model learned from data.

So maybe to tie those two together, I feel like some of the big breakthroughs were – like, we’ve shoved enough data into these models at this point, that it has this kind of new, more logical, logic-based possibilities. It’s not just translating English to French; it has a better understanding of what the concepts are within each of those statements, so it can do things like summarize a lot of texts, or turn it into something completely new. So a lot of it has to do with - you’ll see words like transformer models, and LLMs, so large language models… And I think those underlie just how much information we’ve shoved into these models that we can now interact with.

Yeah, I think that’s the big thing, though… It takes a lot of data to do what it’s doing. And that has been probably the prohibitive aspect of it in the past, and it’s still – only a few companies can actually do that kind of data processing. I couldn’t do it just on my little computer here. But now it’s doing it in such a generic way, if that’s the right term… Like, it’s so vast in the amount of data that it has, that it can pretend to know about anything, because it’s tying all of these nodes together from the 375 billion parameters, or whatever it has.

Can we break down what that means? I see these parameter numbers being thrown out… What does that mean?

I think we should just take this whole podcast and just like ask questions to ChatGPT and just see what it spits out.

I’m literally doing that… [laughs]

Yeah. See, Nick’s on it.

I have like a very high-level understanding of what I think it is… So if we don’t have a better answer, I can go with that. My sense is what these things – like, one way we could model what these things are doing is you have this massive linear algebra matrix, essentially, that is mapping from “Okay, here’s a set of words, tokens or whatever. Here’s what most likely should come next after that.” And when you say there’s like – I don’t know what it is for ChatGPT; like 175 billion parameters - that means our matrix is 175 billion by 175 billion, like this matrix, and you throw stuff in, and that’s how many different ways it can sort of potentially think about matching those. I don’t know if that’s exactly right, but that’s like the mental model I have when we talk about these parameters. Basically, we’re doing linear algebra, except at a scale that is just absolutely freaking bonkers.

[13:50] Yeah, I am not going to accurately depict how this is related, so I’m not gonna say anything, but there’s this concept of the latent space, which is like where do all these concepts live, from the way the model thinks about these things? And it’s massively multi-dimensional, right? If you think of a scatterplot, that’s two dimensions. If you think of a 3D scatterplot, that’s three dimensions. This is like – I don’t know the order, but like millions of dimensions, right? So it boggles the mind, things stop making any sense, and… I forget where I was going, but it was related to what you were talking about, Kball.
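A toy sketch of that mental model, for the curious - everything here (the sizes, the single weight matrix) is made up for illustration; real models stack many transformer layers with billions of learned weights, but the core operation really is linear algebra over huge matrices:

```ts
// Toy model of next-token prediction as matrix math. The dimensions
// and single-matrix setup are illustrative only; real LLMs stack
// many transformer layers.
type Vector = number[];

function softmax(logits: Vector): Vector {
  const max = Math.max(...logits);
  const exps = logits.map((l) => Math.exp(l - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

// Score every vocabulary token against the current context vector
// (one dot product per row), then normalize into probabilities.
function nextTokenDistribution(context: Vector, weights: Vector[]): Vector {
  const logits = weights.map((row) =>
    row.reduce((acc, w, i) => acc + w * context[i], 0)
  );
  return softmax(logits); // one probability per possible next token
}
```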

Nick, you look like you’re about to share the ChatGPT summary of what it is…

“A parameter is a value or a set of values that defines the characteristics of a system, algorithm or model”, if that helps…

Well, so that strikes me like a lot of the AI-generated text, which is it sounds official and says nothing.

Yes. That is one of the drawbacks.

Yeah. So I kind of want to take a stab at like tying together a few things that I’m hearing from Amelia and Kball, like the linealge analogy, and then this whole –

Linealge… I love it.

Yeah, yeah, Engineering School… Linealge analogy, and then what Amelia was talking about regarding the dimensions and whatnot. So I think if we kind of take a few steps back and just think of like binary as a concept… So everything can be represented in binary, and binary, inherently, the color yellow on a screen, or a picture of a dog, a certain type of breed - they all have their own binary patterns, obviously margins of error, but where you can match and create a pattern. And so if machines can understand ones and zeros, and they have these giant vectors that say, “Okay, this sequence of ones and zeros represents a cat; this sequence of ones and zeros represents the color yellow”, or whatever, really, we take all of those vectors and then scale them up infinitely to kind of say not only does it understand all these different types of the categorization or the classification that Kball was talking about earlier, but now it’s able to also just understand how they relate and how they connect. And for me, all of these things are just like reflective of a human brain, right? It’s just how the mind works.

And really, this is kind of a culmination of what we’ve been really slowly working towards, is to have really sophisticated machines… And all the content on the internet, everything that we’re publishing is to train them and feed the machines, and help us with that creation of that more sophisticated tool. And so it’s incredibly exciting, and for me, this is like a pivotal – not just for me, but for everyone, right? It’s a pivotal moment for us. I’ve been describing it to my family like this is like when the iPhone came out; some people were really, really excited about it, and it started to immediately change people’s lives… But then look at 10 years, 15 years later, how much has your life changed? Almost 20 years now, right? So just – I think we’re at that kind of a milestone, and I think we’re not even… Like, the effects haven’t hit us yet, but they will, drastically, over the next couple of decades.

Amal, you were saying, as we closed out the last session, that this feels like a new beginning. There’s a lot of stuff we’re gonna see… We’re only barely able to see what’s coming. So let’s actually talk a little bit about what is there today that listeners as developers could get their hands on and go. So first off, actually, maybe – we mentioned there’s only a few players that are able to train these massive models. So who are the players? What are the models that are out there for us to play with?

Well, the earliest one, and probably most developer-oriented one is obviously GitHub Copilot. Before that, maybe – I don’t know timelines, but there’s also Tabnine, which is kind of in a similar vein, to where it’s trained on code to give you answers about code in a way that GitHub is as well, or GitHub Copilot…

So I think the even bigger whale behind this whale is OpenAI. GitHub Copilot is using the Codex model from OpenAI, and OpenAI has all these different types of models that focus on generating code… GPT-3 is more focused on generating text. They have I think another one focused on images, but I can’t remember; so that’s definitely in my head the player that looms largest in this space. Also ChatGPT.

Is it DALL-E 2 for images?

Yeah, they do have DALL-E. Yeah, yeah.

[19:48] Yeah. That’s what ChatGPT was telling me. Real-time feedback here… [laughter] But you have to always question, because that’s the thing - like, it is confidently incorrect a lot of the times. You have to question if it’s actually giving you accurate information.

And then there’s also like Hugging Face is doing a whole bunch of stuff…

[20:13] Yeah, so Hugging Face is like a little bit like the GitHub for these models, of people hosting their own models and sharing them, and they have an API to run things on those models as well.

Yeah, and we also can’t forget GPTZero, which is this product that came out to kind of help basically an AI recognize when content has been generated or created by another AI. It’s helping professors and anyone really who is looking to validate that the content that they’ve received was indeed not created by a machine. That’s really funny. It’s like the battle of the AIs has already begun.

Yeah, I mean, it’s really cool. I think you were kind of spot on, Nick, earlier, when you said – I think either Nick or Kball said not every company will have access to this… And I think the economic ripple that that kind of creates is going to be very interesting… Because it’s sort of like the resource war, it’s like “Well, is your house by the water? Great. If not…”

So it’ll be interesting how that kind of shapes our economic landscape… But I think what’s tremendously more exciting for me is that, similar to pivotal technologies like the iPhone, I think there’ll be a whole new class of companies, and tools, and for every job that gets eliminated, there’ll be N number more that get created. I think it’s just how do we kind of shift people to kind of change their habits and change their mindsets to understand that this can be an assistive tool for existing jobs, it can be assistive, and then for jobs that may be affected by this, how do we retrain and reskill?

So on that, what are things that people can do now to start playing with it? For example, I wanted to play with Stable Diffusion, and so I went to Hugging Face, and I looked around, and clicked through a few “Yes, I agree that I won’t do this for whatever purpose”, or I don’t even remember what it was I agreed to… Agreeing to your license, and I didn’t care that much, because I just wanted to play with it, and I wasn’t going to do anything with it anyway. I downloaded stuff and started running things with Python. And that was sort of my road into this. What other kind of developer tools, libraries, APIs etc. have you all used or seen that people could start just playing around with?

I think most of the ecosystem lives in Python. So there’s an easy route of – they have like Google Colab notebooks, where you get some amount of compute for free, so you can go and like train your own, or fine-tune your own model. So you could say “Here’s some pictures of my face. Make more pictures of my face, but like I’m a robot”, that kind of thing.

For me, a web developer - if I’m feeling lazy, the best thing is to just hit one of these APIs. OpenAI has like a really accessible API, that you can just hit as an endpoint… And every day, it feels like there’s more and more of these APIs that you can hit for this… Which I appreciate, because it’s really easy for me to write like a JavaScript fetch call, right? From a web app.
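For reference, a minimal sketch of what that fetch call can look like against OpenAI’s completions endpoint - the model name and parameters are just example choices, and complete() is a hypothetical helper name:

```ts
// Minimal sketch: text completion over OpenAI's HTTP API with fetch.
// Assumes Node 18+ (global fetch) and OPENAI_API_KEY in the environment.
async function complete(prompt: string): Promise<string> {
  const res = await fetch("https://api.openai.com/v1/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
    },
    body: JSON.stringify({
      model: "text-davinci-003", // an OpenAI GPT-3 text model
      prompt,
      max_tokens: 256,  // cap on generated length
      temperature: 0.7, // higher = more varied output
    }),
  });
  if (!res.ok) throw new Error(`OpenAI API error: ${res.status}`);
  const data = await res.json();
  return data.choices[0].text.trim();
}
```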

I’m curious what kind of stuff you do with that, like through the API.

So going back to our definition of these generative AIs, if you think about it, there’s – one of the most common use cases is giving it text, and getting text back. You can also give it text and get an image back. And now you can give it text and you can get music back. There’s kind of this translation layer now, of you can give it an image and then get text, and then get music… So there are these fun building blocks that you can build on, and it’s getting more and more bonkers by the day.

[24:06] What I’m typically doing is – one thing I’m playing with is a writing tool that summarizes paragraphs, so it’s easier to read the flow. So you could send like “Here’s a paragraph of text”, and then you can end it with a little prompt of “Can you summarize the above paragraph in one sentence?” type thing. And there’s this whole field now of prompt crafting, of “What are the magic words you need to say to get what you want from these models?”
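That kind of prompt crafting can be as simple as a template function wrapped around the text. A small sketch, reusing the hypothetical complete() helper from the earlier snippet:

```ts
// Wrap the user's content in a template that steers the model
// toward the output shape you want - here, a one-sentence summary.
function summarizePrompt(paragraph: string): string {
  return `${paragraph.trim()}\n\nSummarize the above paragraph in one sentence:`;
}

// Usage: const oneLiner = await complete(summarizePrompt(longParagraph));
```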

Yeah… Yeah, that’s pretty funny; you saw people like sharing prompts back and forth, all these like cryptic strings of instructions for image generation, and beyond also, with ChatGPT… Yeah, it is interesting. I think that’s also just an interesting thing, because it also exposes people to like inputs as a concept, and thinking about things as like “This is a box, that has like these holes in it, and you have to put certain things in the holes, and the box will spit things out on the other side.” I think that’s so cool, that everyone is being exposed to params as a concept. But yeah, it’s pretty cool.

I think for me – yeah, the only APIs that I’ve played around with are the OpenAI APIs that Amelia was talking about, and previous to this, TensorFlow. TensorFlow JS and a bunch of really cool things that are made available through Google Cloud, and what now feels like very rudimentary kind of workflows, compared to some of the newer abstractions that have come out recently.

So I guess an open question that I have as a consumer of this, or a potential consumer, is really around how do you build a company around something like this, knowing that you can get rate-limited? I think understanding where the boundaries are would be helpful for me as a consumer. I think it’s just such early days, but I think people are so excited… So I think those are some open questions that I have.

Well, and honestly, that’s one of the reasons why the Microsoft-OpenAI partnership makes so much sense… Because when you’re working with – people have a lot of doubts about building a company on a random startup’s API. Nobody thinks twice about building a company using Microsoft, or Amazon, or Google Cloud. And each of those offers all sorts of packaged services beyond just bare metal or virtual servers, or things like that. So someone who might look at OpenAI and say, “I don’t know if I want to build a company on this third-party thing. We’ll look at the wrapped API within Microsoft Cloud, or Azure”, whatever, and be like “Oh, yeah, it’s just another Microsoft service. No problem. I can build something on top of that.”

Yeah. That definitely gives it the feeling of it’s going to stick around for a while, or it’s going to be somewhat well supported, and not constantly down, or anything, so you can build around it. Still, when I use it as a consumer though, some products that I use already have pieces built in… Like what you were saying, Amelia, with generation – I use a read-it-later service that has this built-in AI thing now, where I can read the document, and then I can tell it to ask me questions about the document to see if I retained anything on it. I can also ask it to summarize it, or summarize it as a haiku, or do all these different things… And that’s really, really cool stuff, but at the same time, I’m like “This is cool. Can I rely on this being here in a year?” So I’m still in that, like, “It’s so new. I don’t know if it’s going to stick around.”

For example, ChatGPT just announced their pro service, so maybe it ties into that… Anything that’s free, that I’m not paying for, I’m always skeptical of, like “I’m obviously the product in this transaction… But how does it sustain itself?”

[28:07] Yeah, I’ve tried to use ChatGPT free about five times. It worked the first time, and every other time it said, “Oh, sorry, we have too much load. I cannot respond to you.” Like, okay… Sure. I understand, you’re early days, you’re growing at ridiculous paces, but I’m not going to build something on top of something that reliable, or unreliable.

I hit those same issues when I was playing around with the API as well, for what it’s worth… It’s just, you’re getting that message in a command-line interface, versus like, you know… [laughs] Yeah.

I think this is one of the nice things about – Stable Diffusion is open source, essentially. So you could ostensibly spin up your own server locally, or on AWS, or Azure, or something like that, and you don’t have to worry about other people taking your data, or it going down at any point. And you get to play with the model internals that way. But it’s not easy at this point.

Well, for that one specifically, it kind of is… And I think it’s because – I think I’ve heard that Apple is like embracing it in a lot of ways for running on their various processors and systems. But there’s like a GUI that you can download called DiffusionBee, which is a full GUI, you don’t have to write any code, or anything; you install it as a regular Mac app, and it’s doing all of the processing locally on your machine. So I actually have that, and a ckpt file, I think - checkpoint file; I don’t know - that’s trained on my face, that I can add into that… And I’m comfortable doing that knowing that it’s not actually leaving my machine. I’m not training some model on my face outside of that. But it’s really cool that it gives you that privacy, while giving you pretty good results without having to share that with the rest of the world, or be constantly connected.

Something you said there got me to an interesting thing, and something that was – I think it took me a while to kind of wrap my head around with this, which is… You mentioned training it on your face. And that’s one of the things that I think is most interesting about some of these models, is that you can – like, they’ve been trained on huge, huge, huge amounts of data. But you can further train them on your own data, and get much more specific types of results. Like, they have sort of learned the right sets of parameters, and now you can add incremental data, with labels that let you access it in useful ways.

I think, Nick, you may have played with that more than I have… Do you want to – how did you go about that, and what did you get out of it, with training it on your face, as you said?

I got hours and hours of enjoyment from it, which is the main thing. The images aren’t super – I mean, some of them are really, really good. Some of them in the moment looked really good, and I looked back later, I’m like “Does it really look like me?” But it was just endless fun.

A friend and I did it together; we actually did it the, I guess, cheating way, where I just used some service that created that checkpoint file for me, instead of me setting up a whole Python environment and doing all of that… And then got the file back and started using it locally. But yeah, doing it together with a friend, and we were just kind of like messing around and sending – I probably sent like 500 images, and he did the same, back and forth. It was just so, so fun. And I can see utility in it going forward, but I don’t know… Beyond a Mastodon avatar, I don’t know what I’m going to use it for.

[31:42] So just to break down a little bit what it is - you basically feed a bunch of pictures of yourself into the system, you label them as “This is Nick Nisi. This is Nick Nisi. This is Nick Nisi.” And then, now that it has learned, “Okay, Nick Nisi”, and it has a bunch of different versions, you can ask it, you can say, “Hey, Nick Nisi as –” what was the one you were sharing? Sexy lumberjack. Nick Nisi as whatever. And it will generate an image that is you interpreted in that light. Is that fair?

Yeah, absolutely. And with the images that you upload, I gave it 30 images of myself, and you would think, “Oh, you need to have perfect images of your face in different directions”, and things like that. It actually works better if you have super-varying backgrounds, not just like a green screen behind you, or something… But something where there’s a lot of variation in it, except for your face; it will learn your face much better, and then use that going forward. And it puts my face in poses that were not in the image set that I gave it. So it can kind of figure that out; it figures out how to put things together based on other inputs. And a lot of them, I was trying to be Wolverine, or trying to be other things, and it would kind of–

It’s a picture into the inner id of Nick Nisi right there.

Oh, yeah… [laughter] I did this hundreds and hundreds –

Nick wants to be a Wolverine, sexy lumberjack. [laughs]

I just did like all of these comic book characters, but that one stood out in my mind particularly, because that one – some of them look kind of like me, a lot of them look like me plus Hugh Jackman kind of melded together… So you can definitely start seeing the cracks of how it’s like figuring out and trying to mesh these two faces together to make one, and sometimes it gets it better than other times… But it’s a lot of fun.

I think it’s super-cool that you did that locally, Nick, because in all my past experience, everything has to kind of go through a – there’s a big roundtrip to the cloud when doing anything with ML… Including just like working with a Google Assistant, if you’re setting up a Google Assistant, or Alexa app. When you’re developing locally, there’s like roundtrips to the cloud to go and train your data.

I was at Google I/O one year, and I got to see these big GPUs, like the big units that they have in their data centers, that actually are used for compute… And these things are massive, right? I can understand why that lives in a data center, not in your house… But that’s very cool to hear, that okay, things are getting sophisticated enough in their compute, such that they can be leveraged on a “standard developer” machine. That’s really neat.

Yeah. And the images that I was generating are – they look amazing to me; they were a lot of fun. They’re not like – sometimes I have like six fingers, or I have just a weird third arm somewhere. It messes up quite a lot. And there are tools, you can like re-put it in there and tell it to [unintelligible 00:34:41.02] and like draw over that stuff, and do all of that… But it is limited to the model that I have locally, and the processing power that I have as well.

Whereas like, you see other things like Midjourney. I think that’s one that you interface completely through Discord on, which is really cool… And the images that come out of that are astoundingly better, in my opinion. But it’s also going out to some supercomputer somewhere, and not just my little Mac studio here.

When I saw – so we recently talked to Fred K. Schott about Astro, and they did a similar thing where they trained a language model built on top of GPT-3 on their docs for Astro. And now they have this little AI bot that can tell you all about Astro. And this leads me to start wondering - and maybe we should come back to this in the next segment, but what are the opportunities here today, and where do we see them coming in the next couple of years?
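The usual pattern behind a docs bot like that is retrieval: embed the docs ahead of time, find the chunks closest to the question, and stuff them into the prompt. A hedged sketch of that general technique (not necessarily how the Astro team built theirs; askDocs is a made-up name, and complete() is the hypothetical helper sketched earlier):

```ts
// Get an embedding vector for a piece of text from OpenAI's API.
async function embed(text: string): Promise<number[]> {
  const res = await fetch("https://api.openai.com/v1/embeddings", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
    },
    body: JSON.stringify({ model: "text-embedding-ada-002", input: text }),
  });
  const data = await res.json();
  return data.data[0].embedding;
}

// Similarity between two embedding vectors (1 = same direction).
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank pre-embedded doc chunks against the question, keep the top
// few, and prepend them so the model answers from the docs.
async function askDocs(
  question: string,
  chunks: { text: string; embedding: number[] }[]
): Promise<string> {
  const q = await embed(question);
  const context = chunks
    .map((c) => ({ text: c.text, score: cosineSimilarity(q, c.embedding) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, 3)
    .map((c) => c.text)
    .join("\n---\n");
  return complete(
    `Answer the question using only this documentation:\n${context}\n\nQuestion: ${question}\nAnswer:`
  );
}
```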

So I guess before we get there, gotchas on using the technology as it is today. Amal, you mentioned one - for a lot of times you’re going back and forth from the cloud, especially if you’re using these APIs, and we talked about how flaky the APIs can be. Any other gotchas or challenges y’all have run into trying to develop with these first-generation generative AI tools?

[36:07] I’ll say that I have not really played with the APIs that much, but using ChatGPT as a more intelligent rubber duck for me to kind of bounce ideas off of - you have to be very, very skeptical of anything that it gives you, because it will confidently tell you to do something this way, and then… Like, I just got into like a pattern with it where it would tell me to do something, it would like actually spit out code, I’d use the code, and I’d report back the error. I would just like copy and paste the error that I would get, back into ChatGPT, and it’d go, “Oh, you’re correct. I was wrong.” And then it’ll change it. And it got to the point where it was like “Oh, that’s incorrect. Let me show you this.” And it would give me the exact same code again. It just kind of got stuck in a loop. So that’s where you start seeing the cracks, of it’s not actually that intelligent, but it’s better than a blank slate, in a lot of ways.

It turns out the Turing test is the wrong test, right? Like, the Turing test is “Can this fool a human?” Well, it has learned to confidently generate bull***t. Any other challenges or gotchas y’all want to share?

I think just the pipelines and tooling, and just the whole ecosystem around end-to-end kind of local development, and setup, and deployment… I think that’s still being fleshed out. I know there’s a ton of startups that have come up in this area recently, that are kind of focused on pipelines, but I think it still feels very – it still feels a little janky and distorted, and it’s a very much choose your own adventure based on your stack, skills, ability, what services you’re using, and what tools you prefer, and what cloud provider you want to work with, you know… So I think that’s something that’s still a little – it’s bleeding edge, and that edge is still a little bit rough, you know?

Alright, well let’s hop back in and talk about what types of problems we think generative AI is good for today, is not good for today, and where we see it going in just the next even six months to a year, because this field is moving freakin fast. So shall we start with Amelia?

So I’ve been following this space for a while, and it’s been really interesting to see it develop. First you have this breakthrough in technology, and it’s like any sufficiently advanced technology is indistinguishable from magic… Like, we don’t understand what its limitations are, so we think it can do anything. So then people are making these ridiculously overreaching products on top of it, that everyone gets all excited about, but they’re like flashy demos, right? So I feel I’m watching the ecosystem slowly start understanding what are the benefits, and what are the drawbacks.

One of the interesting things is this very important thread around security, and ethical use of these things… Like, what makes sense for us to build for humanity, and there’s this interesting distinction between like machines and tools, that I always come back to and think about, of like “We don’t want to replace humans”, but these new models are amazing at doing drudge work; like, stuff I do not want to do, like writing tests. I don’t want to write tests. Can the AI help me with that kind of thing? Or how can I use it as a tool, so I’m like kind of working at a control board with like sliders and dials, and not just doing a slot machine thing with “Do this, and then do that, and like do all my work for me.” So I don’t know, I think about that a lot.

[40:04] I like that. It is kind of like a slot machine right now.

Well, and that ties into something that I’ve thought about a lot of [unintelligible 00:40:09.24] I think it’s still at a place where most of the key value props are still going to be human in the loop, right? It’s great for writing some tests, or some boilerplate code, or for summarizing a paragraph, or generating a first draft of something. But it is also so confident in its own baloney that you need a person in there checking it, validating it, improving it most likely… So I think, as a tool, as a part of a creative process, to get past the blank page problem, or to scaffold, or boilerplate, or something like that, it’s phenomenal. To create a polished or complete project - I’d be pretty hesitant.

Yeah, that’s a really good point. If we had to list out what are these AIs good at, and what are humans better at, it’s like, they’re very quick. Right? If I asked a human a question, they’re not going to answer as quickly as ChatGPT can answer. And they’re very confident, but potentially wrong, right? So as a brainstorming assistant, they’re amazing. But as – I don’t know, with anything where the accuracy matters… Nick, you were talking about it generating code… There needs to be a way to quickly evaluate that; you just can’t trust it at this point, right?

Yeah, I have to say, I’m always fascinated at how we’re so – our immediate response is always like “Oh my God, we’re going to be replaced.” I really do think we need to lean into this fear a little bit more, and just say “Where’s this coming from?” Because the human brain - there’s nothing that is ever going to replace this thing, especially when we’re able to leverage these tools to solve higher-level problems, and kind of get away from the rudimentary, menial stuff that you all were just referring to, like boilerplate code, and just simple stuff. And so that’s something I think we need to kind of socialize a little bit more… But I bet it’s a completely understandable and natural fear, you know…

Kind of related, it reminded me of this Twitter thread I saw, of pictures of orchestra players striking against new technology that could play sound along with the silent movies. And I had no idea that that was like a controversial topic when it came out. But if you think about it, they had these live orchestras to play at the same time as a movie, and that job is just completely replaced, for better or worse. So there’s a lot of interesting parallels; jobs will change, but in what way?

Yeah. We still have horses.

We still have horses. [laughs]

This is all fascinating, and it is a really good – it’s what we want a personal assistant to be, in terms of bouncing ideas off of, summarizing things for us… If I was working on a logo for my company - and I’m no designer, but I have an idea… Or I don’t even have an idea. Maybe I can try and describe what I want, and get it to spit out something super-rough that I would take to an actual designer to kind of polish up and make into something real - those are all tools that you can use, and that are becoming more broadly available to everyone. And that is the fascinating part of it.

Kball, you mentioned in the last segment about Astro’s docs. If you think about Astro, Astro is a difficult thing to potentially search for, and so having a language model that’s trained specifically on how to use Astro the web framework, and it’s not going to tell you about like the Houston Astros, or astrology, or something else… Like, it’s going to tell you exactly what you want based on the framework which you’re trying to get questions on. That’s a fascinating use case. And the fact that they could just spin that up… I don’t know how long it took them, but I assume it was relatively quick… And what other things could you build up? Like, could I make a language model and train it on our company’s Confluence, and then just be able to ask it questions, instead of having to go search through that minefield of junk? Like, that is a fascinating prospect, and something that you could potentially do today. It’s just really cool.

[44:39] The one question I have though, as we start to adopt this stuff, is – you know, we mentioned that it has so many billions of parameters; hundreds of billions of parameters right now. Where does it go from here? What is ChatGPT 4, 5, 6? Going from 200 billion parameters to 400 billion - does that give me double the intelligence, or double the however you want to quantify it? Or are we going to reach some kind of plateau where it’s just not going to get any smarter with current technology, or because of some ridiculous compute limits that only they will hit?

I mean, one thing I’d love to see is thinking about the different types of aspects that go into intelligence. Like, if you think about human intelligence, there’s this great book that talks about system one and system two. Thinking, Fast and Slow is the book, and it talks about how our brains have two different systems. One is a very fast, pattern-matching type of thinking, and one is a slower, logical, careful form of thinking. And that slower, logical, careful form is much more effortful, but it’s what we use when we do mathematics, or we do logical reasoning, or we do other things like that. When we look at these generative AIs, neural nets, all of the sort of recent wave of artificial intelligence, they all map pretty closely to that system one, like fast processing, pattern recognition, mapping patterns. But what they don’t have is what humans can do, which is we’ll do that pattern recognition, and we’ll do a slower validation. “Does this actually match…?” I mean, sometimes we do; a lot of times we don’t, and we just have that first gut, and then we have our biases, and all these other things that play in. But we have the capacity to assess that first gut response and say, “Is that right? What do I know about this situation that would let me validate that this thing is even plausible, or that my gut reaction is plausible?” I don’t see anything like that in the AI space right now.

What I would love to see is exploration on “How do we take these generative outputs, which may improve as we add parameters or whatever, but that like system one, gut response, fast processing, pattern recognition approach, and apply some way to validate it?”

That’s a great question. Also, Nick, you asked another great question, and so now I have like these two great questions that are stacked in my brain… So just to kind of start with you, Kball - not that I’m even going to attempt to answer Nick’s question, but to start with you… I’m just curious - there’s a few things that are kind of abstracted away for us when we’re looking at these models, and I think one important thing is the confidence metric.

So if you’re working with these tools as a developer, you can set your thresholds for what your confidence bar should be before you return data, right? Or I think it would be great to expose those metrics to users, to end users potentially, so that they can discern what the confidence score, and what thresholds were used when creating this. And I think for people who are curious to kind of use things that are generated by AI for important work, I think that’s kind of like a make or break metric. A confidence score, or just understanding what the thresholds and inputs are… Just kind of like peeling the layer back a little bit from just the answer, but what are the parameters, and bounds around the answer. I think something like that could be helpful. That’s where your analytical, reasonable human brain could come in and say, “Okay, I’m going to now, based on a wider picture, make a decision on how I want to use this output.”
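One concrete handle on that today: OpenAI’s completions endpoint can return per-token log-probabilities, which you can average into a rough gate. A sketch - the function name and threshold are made up, and this is a crude heuristic, not a calibrated confidence score:

```ts
// Ask for token log-probabilities alongside the completion, and let
// the caller gate on the average. Crude heuristic, not calibration.
async function completeWithConfidence(prompt: string, minAvgLogProb = -1.5) {
  const res = await fetch("https://api.openai.com/v1/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
    },
    body: JSON.stringify({
      model: "text-davinci-003",
      prompt,
      max_tokens: 128,
      logprobs: 1, // include log-probabilities for the sampled tokens
    }),
  });
  const data = await res.json();
  const choice = data.choices[0];
  const tokenLogProbs: number[] = choice.logprobs.token_logprobs;
  const avg = tokenLogProbs.reduce((a, b) => a + b, 0) / tokenLogProbs.length;
  return {
    text: choice.text.trim(),
    avgLogProb: avg,
    confident: avg >= minAvgLogProb, // threshold is an arbitrary example
  };
}
```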

[48:24] Do we have that level of threshold confidence or discernment in a large language model? Because I know that was one of the challenges with neural nets. A neural net - you have no idea how it got to your answer. It’s not like – another machine learning technique I love is random forests, right? And there, it’ll actually give you, “Here’s how we got here. Here’s the decision points.” You can introspect the model in a way that at least most neural nets did not expose. And I don’t know, do large language models have a way to introspect them?

There’s a lot of active research on increasing observability into these LLMs. As far as I understand, we’re just picking away the layers at the surface. And then the other kind of funny thing is, if you do the naive thing and just ask it “How confident are you about this answer?”, it is almost always very confident. Like, “Yes, I’m certain this is correct.” That part doesn’t work, unfortunately.

And that feels like something that is gonna have to improve, or it’s gonna have to know what its confidence level is on things… And it’s not something that we’re as – you know, me on the outside of OpenAI, not internal… ChatGPT is trained up through 2021. What happens when it starts getting trained on data that it generated? Is that feedback loop going to perpetuate some major confidence about something completely incorrect, or is it gonna be able to work against that? I’m sure they’ve got to figure it out. I have no idea what I’m talking about here, but…

I mean, even training on non-OpenAI stuff, right? If they’re training on internet content… Like there’s a reason there’s a famous comic like “Oh, somebody’s wrong on the internet!” Like, there’s a lot of wrong content out there. [laughter]

Yeah, there is. But I think this kind of comes back to like a wider point… I’ve always thought for a long time that hey – this explosion of ChatGPT and generative AI is fairly new, but we’ve been using machine learning in products and technologies for a little while now, in lots of different ways, whether it’s a recommendation engine, or like a natural language processing, or a chatbot, or whatever it is… There’s all kinds of machine learning in products that you use every day, and I’ve always thought that it would be really great to educate users around how their models are trained. And that goes back to Amelia’s observability… Knowing what type of data was used to train it, and giving people facts on how this information was curated? Is there some way that we could kind of – similar to like a Chrome Web Vitals, kind of these three simple metrics… Is there a simple way of communicating to average people, “Hey, by the way, this data came from a machine learning model. These are the metrics, and this is how well the model scored on diverse content.” Was this human assisted or not? There’s ways that I feel it’d be nice to expose that. Does the average person need to know? Probably not. But should they want to know, should they have access to that answer? I think yes.

Yeah, that brings up a really good point, as this starts getting proliferated into other products, and everything. So far, seemingly ChatGPT – I’m talking about GPT specifically, I guess… But it seems like it’s trying really hard not to introduce bias, or it tries to not step into topics that it thinks are problematic, for lack of a better word… But it’s something that we have to constantly be vigilant on, as OpenAI does, and so do we as a public who are starting to trust and adopt these technologies. Because the old ML models that are out there and being used today are ruining people’s lives constantly, in determining credit scores, in determining housing… All of that stuff.

[52:27] There’s a documentary on Netflix, I think it’s called Coded Bias, that talks all about that… And that could have its own level of bias in it as well. But there are – there’s bias in the systems, and it’s trained on the biases of us as humans, because we are biased on a lot of things… And we need to be vigilant to make sure that as these get better and better, they don’t get better and better at ruining people’s lives, and in fact, go the opposite way.

Yeah. Coded Bias is great, everyone should watch it. The person behind that is someone at MIT that I’ve had a chance to meet a few times. She’s this brilliant researcher, her name is Joy… She founded the Algorithmic Justice League, which is based out of the MIT Media Lab, and it talks a lot about how algorithms can exponentiate systemic injustice. It’s an algorithm, so it’s gonna exponentiate, right? If it’s biased, then it’s going to be really, really biased in terms of its impact. Thank you so much for bringing up that really good point, Nick.

Well, we’ve covered a lot of ground, and I’d love to close with just one thing from each of the panelists about something you’re really excited about seeing come out of AI broadly, or generative AI specifically… Something you’re looking forward to, whether it’s a short-time horizon, medium time horizon… Let’s not go too far out there, because then it gets really speculative, but something that you think is likely to happen, and that you’re excited to see.

As someone who thinks about interfaces a lot, I have a very biased response of I’m very excited for us to have a more nuanced understanding of what these are good at, what they’re bad at, and how we can interact with them, and get over the “Raw text is the right way to interact with these models.” I don’t think prompt crafters are the magicians of the future. I think we’re gonna build these really interesting interfaces, that have clever ways of - I can draw not with a brush, but with a brush that makes this paint a tree here, or make this more X, or change this in this way, and will give artists and developers ways to kind of like sculpt their crafts like clay, as opposed to “I’m changing the text characters on my screen.” So that is what I’m most excited about.

I’m kind of glad you brought that up, because that’s kind of exactly what I was thinking, in terms of like - you know, the technology is coming; it’s been announced, and there’s like a waitlist for it that I’m definitely on… But I think back to the 2008 movie Iron Man, where Tony Stark’s in his basement and he’s designing the Iron Man suit with Jarvis, and he’s spinning things around… He’s not coding; you don’t ever see him at a keyboard, coding things. He’s playing with these models, but he’s mostly talking to Jarvis and telling him “Oh, let’s decrease this by that.” And “Oh, slap a little Cadillac red paint in there.” And Jarvis is immediately thinking, “Cadillac red. That’s paint. Okay, he wants it painted this way”, and figures it out and kind of helps him.

[55:36] So it’s this assistive tool where he’s designing it, but he’s not coding anything, he’s not like engineering anything. That’s all happening by his helpers. The machines, the dumb, dumb robot, all of that. And it feels like we’re on the precipice of potentially getting a very early model of that relatively soon, which is pretty cool. Like, just being able to have a conversation, and then get an output that’s based on that. That is amazing, and somewhere I hope that we get to soon.

Yeah, plus a million on that. My answer is gonna be very similar, but with a web slant. I really think we have the opportunity to tremendously scale up our ability to create better user experiences in the products that we build on the web, whether it’s accessibility, design, content management and creation, you name it. The whole gamut. I think every company just got a whole staff, new staff members that can do XYZ, and so I hope we can kind of start leveraging this technology to do more good. That’s my hope.

Yeah. I love that. I’m hopeful that – I think it’s similar to what Amelia was saying, that we will figure out soon what this is good for and what it isn’t, because I think a lot of people are trying to do stuff with it that it is not at all suited for. People are promising the world, there’s hyperbole, there’s all of this mess, and it’s going to cause a lot of harm. And this is a tremendously powerful technology. It has the ability to dramatically improve our productivity, and likely a number of other things… But we need to figure out what it is and what it isn’t.

Sounds like an NFT. [laughter]

You know…

It was only mentioned twice in this entire podcast. I mean, honestly, it’s impressive.

I saw somebody trying to spin that, like “Oh, generative AI and the crypto wave are made for each other, because of this, that and the other.” No. No, they’re not. No. This is a useful technology that we’re figuring out the use for. That is a technology that people have been trying to figure out a use for for the last 15 years, and consistently failed.

I was wondering, was that pitch immediately followed by “And so, invest in this coin”? [laughs]

Almost certainly. “Here’s how you get it.” Yeah. I think that we have – if we’ve circled back to NFTs and crypto, we have run this topic into the ground. Thank you all for joining me today. This has been fun. And we’ll catch everyone next week.

Our transcripts are open source on GitHub. Improvements are welcome. 💚
