Practical AI – Episode #230

Cambrian explosion of generative models

get Fully-Connected with Chris & Daniel


In this Fully Connected episode, Daniel and Chris explore recent highlights from the current model proliferation wave sweeping the world - including Stable Diffusion XL, OpenChat, Zeroscope XL, and Salesforce XGen. They note the rapid rise of open models, and speculate that just as in open source software, open models will dominate the future. Such rapid advancement creates its own problems though, so they finish by itemizing concerns such as cybersecurity, workflow productivity, and impact on human culture.

Featuring

Sponsors

Fastly – Our bandwidth partner. Fastly powers fast, secure, and scalable digital experiences. Move beyond your content delivery network to their powerful edge cloud platform. Learn more at fastly.com

Fly.io – The home of Changelog.com — Deploy your apps and databases close to your users. In minutes you can run your Ruby, Go, Node, Deno, Python, or Elixir app (and databases!) all over the world. No ops required. Learn more at fly.io/changelog and check out the speedrun in their docs.

Typesense – Lightning fast, globally distributed Search-as-a-Service that runs in memory. You literally can’t get any faster!

Changelog News – A podcast+newsletter combo that’s brief, entertaining & always on-point. Subscribe today.

Notes & Links


Chapters

1 00:00 Welcome to Practical AI
2 00:43 Introductions
3 01:16 Chris switching jobs
4 01:36 Animal advocacy and AI
5 02:24 The proliferation of models
6 03:47 SD-XL 0.9
7 06:55 The multi-model approach
8 08:51 Vertical limits
9 13:18 History repeats itself
10 13:55 Sponsor: Changelog News
11 15:28 Open models catching up
12 17:45 Zero Scope XL
13 19:17 How to evaluate
14 24:48 Quantizing your model
15 26:28 7b parameter zone
16 28:13 Mosaic's acquisition
17 32:20 AI in business decisions
18 33:50 We're in the wild west
19 37:32 AI culture
20 38:33 Models are tools
21 41:26 Outro

Transcript




Welcome to another Fully Connected episode of Practical AI. In these episodes, Chris and I keep you fully connected with everything that’s happening in the AI community. We’ll take some time to discuss the latest news, and also dig into some learning resources to help you level up your machine learning game. I’m Daniel Whitenack. I’m a founder and data scientist at Prediction Guard, and I’m joined as always by Chris Benson, who is a tech strategist at Lockheed Martin. How are you doing, Chris?

I’m doing very well, Daniel. It’s more interesting times ahead of us… You know, I’m thinking about changing jobs. I’m thinking about like a job title called something - I don’t know, Generative Juggler. What do you think?

Yeah… Because it sounds fun. I mean, I can totally see–

LLaMA Wrangler?

Oh, I love that! That’s perfect for me, too. I’m all over that. Okay.

Of course, our listeners know that you’re a big animal advocate… What is an animal advocate’s perspective on the use of all this LLaMA, CaML, all this sort of different usage of animals? Do you find it fun and interesting?

Of course. We should all have animals on the mind all the time. I mean, it makes us better people.

Yes. Yeah, I’m traveling, and my wife just sent me a picture of our dog laying on the floor in a funny position, looking out of the corner of his eye, so it made me happy going into this recording, so that’s always good.

That sounds good. You know, pet pictures are really important when you’re traveling. My wife does that with me. She’ll send a good moment… In the face of all this technology change constantly coming at us, it keeps our humanity intact.

Yeah. And it is a crazy time in the AI community, with – so we use these Fully Connected episodes to update people on different news, and that sort of thing… And one of the things I was realizing this week as we were prepping for this episode is I’ve even seen there are people - and I think there’s a website - talking about the Cambrian explosion of models, or the proliferation of models… So just in the past couple of weeks there are so many different ones that have come out. It is really a proliferation… So I thought it’d be good to highlight a few of those. We can’t get to all of them, because there are just so many… But one thing as a tip to people - sometimes how I look at this is I’ll go to Hugging Face and just go to the models tab, and if you make sure that it’s sorted by Trending, that’s kind of a cool way to see what’s at the top. And you can filter by different types of models, but I’ve found it kind of interesting to just look at what’s trending overall… Because, as of now, on the Hugging Face Hub it’s a mix between kind of video generation, image generation, language generation models… And over time, you can see kind of which of those categories is trending up or down. I don’t know, there’s probably an app that needs to be made to track that sort of thing, but I’ll let someone else do that.
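(If you’d rather script that check than click around the site, here is a minimal sketch using the huggingface_hub client library - not something walked through on the show. It sorts by download count as a rough proxy, since the “Trending” sort itself is a feature of the website UI.)

    from huggingface_hub import list_models

    # List the most-downloaded models on the Hub as a rough proxy for what's trending.
    # pipeline_tag shows the category (text-generation, text-to-image, text-to-video, ...).
    for model in list_models(sort="downloads", direction=-1, limit=10):
        print(model.modelId, model.pipeline_tag)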

One of the ones that I wanted to highlight was the new Stable Diffusion XL 0.9. Also, these model names are getting a little bit more complicated over time, I’ve found… But Stable Diffusion XL 0.9, or SD XL - of course, people probably remember Stable Diffusion. This is an image generation model; so you put in a text prompt, and then out comes an image… So something like “astronaut riding horse on the moon, photorealistic”, or something like that, and you get an image out. This one is kind of interesting… I think it was back in April they announced some kind of private access to this, or beta access… Now the model is up on Hugging Face, it is available, but only under a research-only license. But the images - I don’t know if you’ve seen some of these, Chris…

I’m looking at them now while we’re talking.

Yeah, you played with Stable Diffusion back with the previous kind of iteration… What is your thought in terms of the progression of this?

Oh, I mean, I remember when we were playing, we were actually doing it on one of our episodes… And we were coming up with raccoons all over the place, I remember, at the time.

There were raccoons everywhere… Not just us. There seem to be lots of raccoons coming out of Stable Diffusion, regardless. I was rather wondering about that… But no, I’m looking through some of the things, and just like the imagery has come so far, and the capability and what you can do… And that’s just a few months since we were doing that, so I’m in awe right now, as I look at these shots, as we’re talking…

Yeah. At least the last time I checked - and this might be different now. If you’re listening to this episode, it might be different. But at the time, today, there’s a blog post about the release, from Stability, and they mentioned that there’s going to be a follow-up more technical deep-dive. I don’t know if it’s a full paper, or just a deep-dive post… But there are some general descriptions of how this is working, and you can dig into it a little bit.

[05:53] So instead of there being sort of one step, or a one model kind of situation in this image generation, apparently this model consists of a two-step pipeline. It’s still diffusion-based, but there’s one model that generates, they say, “latents of the desired output size”, and the second step is specialized to generate this sort of high-resolution image. So it’s like an image-to-image model. They combine these, and the second stage of the model then kind of adds finer details to the generated output.
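(As a rough illustration of that two-stage pipeline - not code from the show - here is a minimal sketch using the diffusers library, assuming you’ve been granted access to the research-only SD-XL 0.9 weights and have a GPU with enough memory; the base model emits latents and the refiner turns them into the final image.)

    import torch
    from diffusers import DiffusionPipeline

    # Stage 1: the base model generates latents at the desired output size.
    base = DiffusionPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-0.9",
        torch_dtype=torch.float16, use_safetensors=True,
    ).to("cuda")

    # Stage 2: the refiner is an image-to-image model that adds finer details.
    refiner = DiffusionPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-refiner-0.9",
        torch_dtype=torch.float16, use_safetensors=True,
    ).to("cuda")

    prompt = "astronaut riding a horse on the moon, photorealistic"
    latents = base(prompt=prompt, output_type="latent").images
    image = refiner(prompt=prompt, image=latents).images[0]
    image.save("astronaut.png")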

So that’s one interesting thing, which also is kind of interesting - I don’t know if you’ve been following everyone talking about what’s going on, quote-unquote, in GPT-4… But I think there’s a lot of speculation and evidence that it also is a sort of mixture of experts, multiple models together, not just a single model call. So I find this trend kind of interesting.

Do you have any thoughts around what is the virtue of having the kind of multi-step, multi-model approach? Do you think that that’s likely to be kind of a general architecture that we see continually, instead of just having the one model? I mean, even going back to the Stable Diffusion, I noticed the two models you mentioned, and interestingly, the second model is basically twice the size of the first one in terms of parameters. Any thoughts around the science or math around that, or why you would take that approach?

Yeah. As you scale up your dataset and you scale up your compute, for a given model size you’re going to get diminishing returns on the performance of that model. So in some ways, given a certain amount of data and a model architecture, how are you going to improve further? You could train for longer, you could train on more data, but at the levels that some of these models are at now - I’m thinking particularly about OpenAI - what more can they do right now with respect to training longer, with the same model architecture, more data? So what’s a natural way to improve output? Combining multiple models in a pipeline together. Now, I think you’ll probably see advances in architectures, so different model architectures will continue to come out and maybe break some of that trend.

Another way that you see this kind of multiple models being applied is in things like the RLHF process, which we talked about on the show, the reinforcement learning from human feedback… Which - things like this have been around for quite some time. So GANs, for example, include two different models - a generator and a discriminator. These sorts of multi-model workflows that produce an instruction-tuned, or tuned, model out the other end - I think we’ll continue to see a lot of that as well, even if the model that’s produced and used for inference at the end is a single model.

I got one other question before we dive into the rest of the model releases. You know, one of the things that was notable was OpenAI kind of commented, after GPT-4, that there was only so much vertical growth you could have there, given the dataset - basically, the whole internet - in the model. So you can’t just keep growing them like that. Here we find ourselves in what we’ve kind of described as the proliferation kind of episode, talking about all these models coming out… Do you think part of what we’re looking at today is generated by the fact that when you lose the potential for further vertical growth, because you basically used all the data that’s out there - does that give all of these other model creators a chance to catch up, to some degree? So you kind of had the surging of the leader, but once they hit kind of a barrier there, now you’re seeing many, many others catching up and comparing themselves to that. Is that a fair assessment in terms of kind of what we’re looking at now?

[09:48] Yeah… People probably have seen this post that went sort of viral, which is supposedly a leaked document from Google saying “We have no moat, and neither does OpenAI”, and they talk about how basically - I think the phrase they use is “Open source is eating our lunch. We’re not positioned as major players to compete necessarily.” So I think that that’s where that sentiment is probably coming from, wherever that document originated… That would be the sentiment that’s being expressed there.

So the ability to have a foundation model is no longer the sort of moat that separates you… Because now there are open source models - really good open source models - where maybe the base model doesn’t perform as well in a general-purpose way as GPT-4, or something like that. Well, the reality is that in your business environment you don’t need a general-purpose model. That’s usually not what you need. What you need is a model that performs really well for your task. And so in that sense, having a really good open-access model, whether it’s a language model or an image generation model, and then having the ability - which we have now - to adapt or fine-tune that model with your own private data, is actually kind of part of what we’re seeing with this proliferation, I would say.

An example of this is the next model I was going to highlight, which I think is a really good example of this. So I saw this in a tweet – I don’t know the actual day it was released, but the OpenChat model. So if you just go to huggingface.co/openchat… There was a model that kind of outpaced ChatGPT on some benchmarks - there’s a Vicuna benchmark… That model wasn’t as open, but these OpenChat models are the first open models to outpace ChatGPT (with GPT-3.5) on this benchmark. And what’s interesting - and this is very much another trend that we’re seeing more and more of - is actually using the closed, proprietary, but really impressively performing models like GPT-4 to create data for you to fine-tune an open model, which then performs as well as, or maybe better than, the closed models, at least in certain scenarios.

So that’s what they did - they used 6,000 conversations generated out of GPT-4 to fine-tune this model, which actually outperforms it and is available publicly. And this is what we’re seeing over and over. So there are other models… Like, people are generating this data for less than $1,000. They’re using the OpenAI API, spending less than $1,000, to create these models that are really impressive in how they perform.
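(In spirit, that data-generation step can be as small as the sketch below - the seed prompts are hypothetical, and it assumes the pre-1.0 openai Python client that was current at the time; you collect model-written responses into a JSONL file and then fine-tune an open model on it.)

    import json
    import openai  # assumes the pre-1.0 openai client interface

    seed_prompts = [  # hypothetical seed instructions
        "Explain vector databases to a backend engineer.",
        "Draft a polite follow-up email to a customer.",
    ]

    with open("finetune_data.jsonl", "w") as f:
        for prompt in seed_prompts:
            resp = openai.ChatCompletion.create(
                model="gpt-4",
                messages=[{"role": "user", "content": prompt}],
            )
            answer = resp["choices"][0]["message"]["content"]
            # One instruction/response pair per line, ready to fine-tune an open model on.
            f.write(json.dumps({"prompt": prompt, "response": answer}) + "\n")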

Now, I think there are all sorts of interesting implications of that, and part of me wonders, “Well, how is OpenAI - or other providers of foundation models - going to shift its business model to make that sort of thing less–” One result of this might be that we see providers like OpenAI try to prevent usage like this, where you’re just using their API to generate data to create a model that works better for you than using their API… I don’t know, we’ll see.

If you kind of back away for a second and look at the history of this, it’s starting to look a lot like the way software development went open source. If you look back around 2000, or even down into the ‘90s before that, you saw all of these proprietary programming languages - you’d pay for them, you had to pay for environments and stuff like that - and gradually open source overtook it. And from my perspective, it’s feeling a lot the same right now, as we’re making a shift. I will leave it by saying - to that point about your unknown-source document earlier - I’m wondering whether or not that’s kind of an inevitable destination we’re headed toward.

Break: [13:56]

We’ve been talking about open source models… Some things that we talked about even like two months ago on this show, like “Someday these things will happen…” I remember us talking about the graph that I think Clem from Hugging Face posted on Twitter, where you’ve kind of got this linear progression of these closed source models, and then eventually, there’s this kind of exponential increase of open models that surpasses the performance of the closed models… And I don’t know if we’re totally in that place yet, but it kind of seems like it’s happening, to some degree. And maybe not in certain ways… So I think still for general-purpose, like “This model can do whatever you ask it to do” - those sorts of use cases, still the closed models are winning, I think. But like I said, how many business use cases do you need a model that does that sort of thing? The majority, you don’t need that. So maybe for the actual proliferation of these models in business use cases, all that really matters is that you can have open models that perform really well for all those business use cases. And that brings up, of course, a lot of other concerns and practical implications…

So open models are great; if they perform better, that’s great. But there is a lot of – I mean, with OpenAI or Cohere or Anthropic or whoever is running these models, it’s not only that the model is good. They also have a really nice, easy-to-use API that generally is up - although I think ChatGPT was down the other night. But yeah, generally it’s good, and well-maintained, and all that. You don’t get that with these open models. You have to figure that bit out yourself, which has other sorts of engineering implications and infrastructure implications.

And obviously, going back to someone I know here, there are business opportunities available to help people onboard onto those things. There’s so much happening right now… As you said, we talked about this two months ago, and now it’s already changing. And I think people need to get used to the new speed of how fast this is happening at this point. Later on I’ll come back to that, but…

One other one that is trending, at least this week, on Hugging Face is Zeroscope XL. Version two is the one that I’m looking at, but if you just search for Zeroscope, you’ll find it. This is a video generation model, which is pretty cool… So it’s video generation, and it produces watermark-free videos… And one of the things I find interesting about this model, the Zeroscope model, and also the Stable Diffusion model that we mentioned a second ago, is that you can run these on some sort of commodity hardware. Maybe not the cheapest of commodity hardware, but this model supposedly uses a little over 15 gigabytes of GPU memory, rendering 30 frames at 1024 by 576. So that sort of hardware is definitely within reach for a lot of people, even on platforms where you can access some of that for free, for some time. So yeah, that’s one of the things that I find interesting about some of these models as well.
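(For a rough idea of what running a Zeroscope-style model looks like - again, not code from the show - here is a minimal sketch with the diffusers library, using the smaller 576w checkpoint; the XL checkpoint is typically used as a second upscaling pass, and exact API details and memory numbers vary by diffusers version.)

    import torch
    from diffusers import DiffusionPipeline
    from diffusers.utils import export_to_video

    # Text-to-video in fp16 on a single GPU.
    pipe = DiffusionPipeline.from_pretrained(
        "cerspense/zeroscope_v2_576w", torch_dtype=torch.float16
    ).to("cuda")
    pipe.enable_vae_slicing()  # trims a bit of memory off the VAE decode

    frames = pipe(
        "a raccoon riding a skateboard through a city at dusk",
        num_frames=24, height=320, width=576,
    ).frames
    export_to_video(frames, "raccoon.mp4")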

That’s cool. Yeah, we’re seeing more and more video generation recently. It wasn’t long ago, it was earlier this year that we were talking about kind of moving there, as we were coming into 2023, and the fact that we were expecting it, but it hadn’t really arrived yet… And now it’s already - to your Cambrian point, it has blown up, and we’re seeing multiple opportunities in terms of these models, already in open source versions as well. So how do you – I’m kind of curious, Daniel, how do you as a practitioner, as you’re looking at this explosion of these different options coming at you, how do you make an evaluation? I’ve had people ask me that recently - like, “So much is happening now. I don’t even know how to evaluate one option versus another.” Do you have any thoughts on framing that?

[19:35] I think there are a bunch of different axes that you could kind of narrow down your choices along. So let’s say that you have a commercial use case. That alone is a filter by which you can knock out a huge number of models… Because just looking at the ones we’ve listed so far: Zeroscope - released under a Creative Commons non-commercial license; can’t use it. OpenChat - released under the LLaMA license; can’t use it commercially. Stable Diffusion XL 0.9 - available only for research; can’t use it. Not that you couldn’t prototype with these, or that versions of them won’t eventually be released, or that you couldn’t access them in other commercial products… But that kind of does narrow down your choices quite a bit… Whereas you look at certain models like the MPT family from Mosaic - released under licenses that allow you to use them for commercial purposes, etc.

So that’s an easy one - what is your use case? Are you commercial? Well, that knocks out a whole bunch. Then you have a smaller set, and then I think you need to do a second layer of filtering, which is think about your practical use of this model. So for example, let’s say that I want to use an LLM to extract a bunch of information from a huge number of unstructured documents. I’ve got maybe millions of documents, and I want to extract information from them. Okay, well, if each inference is going to take 20 or 30 seconds for me, and I need to extract a bunch of information, then that’s gonna become a major problem. So then, I need to think about “How am I going to use this, and what are the constraints around the inference speed and the interaction with the model, or the context length that I’m putting in, in the case of large language models? Do I need to put in a bunch of information or a small amount?” And that narrows down to models that are maybe smaller, that can be run faster for inference, or models that support larger inputs.

So there’s those concerns… And then finally, once you get down to that - let’s say you’ve found one that fits your use case and the constraints that you’re working under. Then I think it gets down to the sort of – I guess we could call it old-fashioned, although it’s not that old-fashioned… Create yourself a test set. That’s still the best way to do this, right? If you have even 100-200 examples that you’ve manually labeled as “This is what I would like to go in, and this is what I would like to come out”, then you should just check the output and see what is the accuracy, or how does the output compare? How would I rate these? As failure, or what? That’s still the way to do it.
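(To make that concrete, here is a minimal sketch of that kind of hand-rolled evaluation; model_answer and the scoring rule are placeholders to swap for your own model call and your own rubric.)

    import json

    def model_answer(prompt: str) -> str:
        # Placeholder: call whichever open or hosted model you're evaluating here.
        raise NotImplementedError

    # test_set.jsonl: one hand-labeled {"input": ..., "expected": ...} pair per line -
    # what you would like to go in, and what you would like to come out.
    with open("test_set.jsonl") as f:
        examples = [json.loads(line) for line in f]

    passed = 0
    for ex in examples:
        output = model_answer(ex["input"])
        # Simplest possible rubric: substring match. Replace with whatever rating scheme you need.
        if ex["expected"].strip().lower() in output.strip().lower():
            passed += 1

    print(f"{passed}/{len(examples)} passed ({passed / len(examples):.0%})")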

So the last two minutes is my favorite part of this episode so far; you’ve just put the practical in Practical AI in terms of how to go about actually doing this stuff in real life. So… Much appreciated on that.

Yeah, of course. Yeah, well, one of those things that was mentioned – well, two things; the licensing and the context length that we just talked about… So for those that aren’t aware, most of these generative models accept a prompt, which is some amount of text that is kind of autocompleted; the result is an autocompletion. Most of the large language models that we’re dealing with are autocompletion models, so they predict next words. With the image generation or the video generation models, you can kind of think of the image or the video as the completion of a prompt as well, because you’re putting in text… But these models generally have a constraint around the amount of text that you can put in as your prompt. Many of the open models are kind of around 2000-ish tokens of input. So for example, you couldn’t put in maybe a whole chapter of a book, or something. That’s not what you could put in there. There are some trickeries that have been introduced that take a model that was trained on a smaller context length and kind of extend the context length… But something we’ve seen in the past couple of weeks is some really seemingly very powerful models that are open, and are available for commercial usage under their licensing, that support a longer context length, one of these being the Salesforce XGen model. So if you go on Hugging Face, just search for XGen - it’s a 7 billion parameter model with an 8,000-token input sequence length, which is obviously quite a bit more than that 2,000.
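(A quick way to check whether a prompt even fits is to count tokens with the model’s own tokenizer; here is a minimal sketch with the transformers library, using the XGen base model as the example.)

    from transformers import AutoTokenizer

    # XGen ships a custom tokenizer, so trust_remote_code is required to load it.
    tokenizer = AutoTokenizer.from_pretrained(
        "Salesforce/xgen-7b-8k-base", trust_remote_code=True
    )

    prompt = open("chapter.txt").read()  # e.g. a long document you'd like to stuff into the prompt
    n_tokens = len(tokenizer.encode(prompt))

    context_window = 8192  # roughly 8k for XGen; many open models are closer to 2k
    print(f"{n_tokens} tokens; fits: {n_tokens < context_window}")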

[24:09] One of the things I find interesting about this model as well, kind of fitting with similar trends that we saw with the other models - this 7 billion parameter size is kind of an important piece of it, because once you go beyond 7 billion parameters, you lose some of your ability to deploy models on more commodity hardware. And so that 7 billion is a very strategic number, and the reason you see a lot of 7 billion or 6.9 billion parameter models is that it allows you to run these models on more reasonable hardware - single GPU cards, that sort of thing.

What is the technical distinction there when you exceed the 7 billion parameters? Is it something as simple as kind of like the bus width of data bits going in, or…?

It’s really about the model fitting into the GPU memory, and not exceeding it. So unless you want to quantize your model - and we had a whole episode about that with Neural Magic, so I’d recommend people listen to that… That was really cool.

It was.

But unless you’re very careful… So quantization - each of these 7 billion parameters of the model is some sort of floating point number. And most of them, if you load them in, are not used that much, or you don’t need full float32 precision to get good output. So one thing people do is quantize those down to float16, or even eight-bit or four-bit, or whatever.

If you’re not really careful about how you do that, or if you don’t kind of retrain with that precision, oftentimes you lose a lot of performance. So the thing here is, a 7 billion parameter model with these larger single cards that you can get now - even if it’s an A100 or something like that, which is fairly expensive, it’s a single card, and it will fit and run one of these models fine. But if you go to like 40 billion parameters, 60 billion parameters, these larger models, now you’re getting into multi-GPU territory, which makes things much more difficult. So there is a balance here; you can quantize or optimize the larger models and run them on commodity hardware, but it’s not always straightforward how to do that.
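(The back-of-the-envelope math here is just parameters times bytes per parameter; this rough sketch ignores activations, the KV cache, and framework overhead, so real requirements run a bit higher.)

    def weight_memory_gb(n_params: float, bits: int) -> float:
        # Weight-only footprint: parameters * bits per parameter, converted to gigabytes.
        return n_params * bits / 8 / 1e9

    for n_params, name in [(7e9, "7B"), (30e9, "30B"), (40e9, "40B")]:
        for bits in (32, 16, 8, 4):
            print(f"{name} @ {bits}-bit: ~{weight_memory_gb(n_params, bits):.0f} GB")

    # A 7B model is ~28 GB of weights in float32 but ~14 GB in float16 and ~7 GB at 8-bit,
    # which is why it fits on a single large GPU; a 40B model in float16 is ~80 GB,
    # which pushes you into multi-GPU territory.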

Gotcha. So in general, you want to get – if you’re just a practitioner out there, and you’re in a small or medium-sized business, you’re kind of doing it on your own, or with your company’s stuff, kind of focusing in that 5, 6, 7 billion parameters, so that you can be productive, and not escalate costs out of your control. Is that a fair way of looking at it?

Yeah. I would say basically, if you try to work in that 7 billion parameters or fewer zone, your life is much easier infrastructure-wise. And that will probably also change over time, but I think it’s the reality now. And one interesting thing about the Salesforce release - I love it when people post this - is that they posted that the training cost was around $150,000 USD on Google Cloud using TPUs. And this model is released under Apache 2.0, which is cool for me.

The other one that I mentioned with the 8k context length was the MPT 30 billion model, which was released recently… But also note the difference in parameter size there. The XGen model from Salesforce supports that context length at 7 billion parameters, and for MPT you kind of have to go up to that 30 billion… The MPT models are really great - I love them, I’ve been using them. But that’s just the differentiation. You could see why maybe Salesforce XGen is trending, because of their focus on this sort of thing.

It’s more accessible.

Yeah.

Break: [27:56]

Well, Chris, I think that some of what we’ve talked about here with the open models is quite interesting, because as we already mentioned, we were talking about this a couple months ago, and thinking “Oh, at some point these open models are going to proliferate and kind of take market share, whatever you want to say, from the closed, proprietary models.” And I think we are seeing this trend. One piece of evidence that I saw in the news - yeah, I forget if it was this week, as we’re recording this… But it was the acquisition of MosaicML. So Mosaic is the one that created the MPT family of models, which again, I’ve already said, are really great choices if you’re looking for some LLMs to play with… But Mosaic was acquired by Databricks, or “agreed to join”, which I don’t know…

The prices on these things are just astronomical, in terms of –

It’s crazy. Yeah, so… I mean, it’s public information; at least this one is public information. So a total of 1.3 billion for MosaicML, which has 62 employees. So that’s $21 million per employee.

That’s a valuable employee right there.

And I was talking to someone about this, and I wasn’t in the strategy meetings with Databricks when they were talking about “Why are we doing this? And how does this position us?” But think about – I remember Databricks and Spark and Hadoop back in the sort of big data days leading into data science days… And really focusing on this Spark sort of thing… And think about that use case I gave earlier, of the data extraction, right? How are people going to do large-scale data processing in the future, or large-scale analytics in the future? Well, there will likely always be data warehouses, and SQL queries, and analytics systems… But there’s going to be a large portion of what people are doing analytics-wise, or kind of big data analysis, quote-unquote-wise, by extracting information or doing reasoning with LLMs. The problem with that is for an enterprise you can’t do that with a proprietary, closed API, because you can’t leak your private data to that API. And it’s not cost-effective to do it anyway, because those charge per token. So how are you going to do that? You’re going to proliferate open models that are trained on your own private data, and make that easier and easier. And that’s what Mosaic’s doing. So I think once you kind of think about that positioning – I don’t want to comment on business strategy necessarily, but that’s how I’ve kind of thought about this, is yeah, that’s the valuable trajectory of where we’re headed.

I think it’s inevitable, because you run into – I know in business I have seen many, many cases where these closed models, and the licenses surrounding them, and the concern about proprietary data are a big challenge to navigate for people that are trying to get into them as quickly as possible. It throws a whole bunch of legal concern around it, and then you need guardrails, which slows it down… So it makes perfect sense to go and consume and participate in the open community. And I think, just like software, it’s inevitable; business will force us in that direction. So it’s not people doing it out of the goodness of their hearts, it’s people doing it for the betterment of their businesses, because it’s the only sustainable, viable option they have right now. The rug can get yanked out from under you very quickly.

[31:53] So yeah, I think we had no idea a couple of months ago that we were going to have this conversation now, though. I think you and I probably expect things to happen faster than most people out there, because we’re neck-deep in this stuff all the time… But I don’t even think we realized just how fast that would happen. I’m trying to adjust my own brain for the fact that it will keep happening, and will probably accelerate… So we’re gonna have a lot to talk about in the days ahead.

Yeah. The trend that we’re seeing with these open models is happening very quickly - much quicker than I thought it would… So if you look back at these decisions that have been made around funding new GPU clusters for different startups trying to produce new foundation models… Think about it - we saw $1.3 billion in funding for Inflection, and the Latent Space guys in their posts on AI engineers have highlighted some of these for Mistral and other ones - there’s this sort of hoarding of GPUs which is taking place. But those strategy decisions were made - how long ago? Like two months. And now - is having that sort of compute infrastructure the differentiator that is going to make the difference in the business world? I’m not saying it’s not going to be good for those businesses; I think they’ll probably do great things, and be wonderful… But you don’t have to have this sort of large GPU cluster with thousands of GPUs now to be a player and create value in the marketplace.

So yeah, it’s interesting to also see how the dynamics of funding, and business strategy are kind of getting intermixed with this rapid proliferation, and kind of individual developers, small groups of developers creating these models like OpenChat and other ones.

We’re also seeing – going back to our conversation just a moment ago about 7 billion parameters being kind of an over/under decision point, because it changes how you’re going to implement - with that under-7-billion range, you’re going to have whole industries focusing on things like that, because they may be working out on the edge. Where in the not-so-distant past we might have said “Eventually, it will blah, blah, blah”, now it’s “Let’s go do LLMs on the edge. Let’s go–” I can have a GPU that’s a single board in whatever edge device we’re talking about. And you’re gonna see whole industries pop up around the ability to do that, because you’re within, as you pointed out, the RAM available on the GPU. So that’s going to create a whole bunch of new business cases.

One of the things in my mind right now is it’s such an explosive kind of Wild West moment for us here. As is always the case, all of the concerns that touch onto these issues such as cybersecurity, such as how it affects your workforce, your productivity, how do you integrate the tooling in, what does it mean for changing business strategy and opportunities - these are all trailing distantly behind, even things like AI ethics, which we’ve covered quite a bit… Legal frameworks in different countries, and different municipalities, and such… How do you catch up all of those things with the fact that we’re having this amazing Cambrian explosion in terms of model availability, accessibility and fragmentation into many different use cases that were not thought of two months ago?

[35:27] Yeah. You had highlighted the security side of this… I think it’s a really good note, because one thing I’ve seen as you go to, for example, the Hugging Face LLM leaderboard… Let’s say I’m a person, I want to use the greatest open LLM that I can find… Let’s say that, for one – maybe a lot of the licensing causes problems for me, but let’s say all the licensing problems are equal… Then I go to the leaderboard, I click on some of those that are high up on the leaderboard, and the lack of information around the data processing, the training set, the fine-tuning set, the testing and security vulnerabilities, potentially like prompt injection vulnerabilities - all of these things, similar to like, you go to GitHub, it’s the same with open source code, right? You can search for some tool, and it might have a little bit of information in the readme, and you might say, “Okay, great. Import. Solves my problem”, and move on. But that’s a recipe for introducing vulnerabilities into your code.

It’s why products like Snyk - which I think is a cool way I’ve found for developers to deal with that sort of issue on the code side - analyze your dependencies to look for known vulnerabilities in open source projects… But there’s nothing like that for LLMs. Which of these LLMs has more hallucinations than another one? Which of them has more toxicity than the others? Which of them are more prone to prompt injection types of things than the others? None of that’s on the leaderboard, right?

And one of the things to also note here is we’re kind of addressing all of the kind of the technical – and I don’t necessarily mean like code technical, but things like the legalities, and documentation, and how do you put in compliance around it, all these things… But I’ve also noticed - and I’m just kind of mentioning it in passing right now, because we can’t delve into it… There’s a huge cultural thing that we’re also trying to digest right now. We’ve talked this year about how 2023 is really the year that it’s been huge in the public’s consciousness. People are using the stuff, and they’re aware they’re using it in many parts of their lives. Things like the ChatGPT app. Everyone’s using it on their phones, and such, these days… I think I’ve had more conversations in the last three months around people trying to figure out not just like the business aspect of “How do I adopt?” but also a lot of fear, and a lot of concern about that. And so I think that is becoming part of what we need to be able to think about from a business strategy standpoint. It isn’t just the cybersecurity, and the compliance, and all these issues, but also “How do you bring the humans along for the ride and get them integrated in as we’re making these massive leaps forward?” So don’t forget your humans in the equation as you try to take advantage of all this amazing LLM goodness.

Really good point… And I think some of the writing that I wanted to share as our learning resources at the end of this highlights some aspects of those points that you’ve just mentioned… And I’ve been trying to tell people this recently, that the LLM or the generative model, the image generation model - in some ways people are thinking about those things like applications… But really, they’re tools that are embedded in applications. So you’re building an application for real people, users, that might make use of a tool like an LLM, or an image generation model… But application development is still part of it, and coding and engineering is part of it, and security is part of it, and your UI/UX around how you interact with your customers is part of it… So that sort of thinking about these things as embedded tools within an application I think is important. It’s one thing that Jay Alammar, who was a previous guest on our show - he has a really great article, which I would recommend as a learning resource, if you’re thinking about how to create competitive advantage or moats with your AI applications… He has an article called “AI is eating the world”, and he gives some really good analysis of thinking about “Okay, where are there competitive advantages, and where aren’t there?” And he has this really nice diagram of “Models are down here, your application is here.” That’s where you live. The application level is above that, maybe with a custom model or fine-tuning level in between, and then above that there are all of these things that are unrelated to the model. Or not unrelated, but are more so business concerns, right? How is it distributed? What sort of proprietary or sensitive data are you dealing with? What sort of domain expertise do you have that can be infused in your application? etc, etc. Those are the sorts of things that can differentiate you. I’ve found his writing on this very helpful in framing my mind, so I would recommend people look at that.

I like it, in addition, because it reminds us to stay grounded and be practical. And while the world is changing out from under us in so many ways, kind of the workflow of how you think about applications and getting productivity out to people is still largely the same. New tools, and stuff like that, but the same concerns exist. So sometimes maybe you take a deep breath and you go, “I know how to do this.” We’ve been doing this even before this moment.

Good point. I think that’s a good statement to end with… So thanks for journeying through the Cambrian explosion or proliferation with me, Chris. This has been fun.

That’s right. It’s a space warp here of models flying by us… Good times, Daniel. Thanks a lot.

Alright. Talk to you soon.


Our transcripts are open source on GitHub. Improvements are welcome. 💚
