Practical AI – Episode #276

Stanford's AI Index Report 2024

with Nestor Maslej, research manager at Stanford's HAI

We’ve had representatives from Stanford’s Institute for Human-Centered Artificial Intelligence (HAI) on the show in the past, but we were super excited to talk through their 2024 AI Index Report after such a crazy year in AI! Nestor from HAI joins us in this episode to talk about some of the main takeaways, including how AI makes workers more productive, how the US is sharply increasing AI regulation, and how industry continues to dominate frontier AI research.

Featuring

Nestor Maslej – research manager at the Stanford Institute for Human-Centered AI (HAI)

Sponsors

Plumb – Low-code AI pipeline builder that helps you build complex AI pipelines fast. Easily create AI pipelines using their node-based editor. Iterate and deploy faster and more reliably than coding by hand, without sacrificing control.

Notes & Links

Chapters

1 00:00 Welcome to Practical AI 00:43
2 00:43 What is the AI Index? 01:57
3 02:40 The Institute for Human Centered AI 01:33
4 04:13 Recent indexes 02:15
5 06:28 Main research 03:00
6 09:28 Cost of frontier models 02:11
7 11:39 Growing model scale 02:34
8 14:25 Sponsor: Plumb 01:24
9 16:05 Increasing regulation 04:46
10 20:51 Impact of regulations 02:30
11 23:21 Will models run out of data? 06:38
12 30:00 Finding ROI 04:52
13 34:52 Industry AI risks 03:51
14 38:43 Current perceptions of AI 03:26
15 42:09 Exciting developments 02:59
16 45:08 Check out the AI index 01:08
17 46:17 Outro 00:46

Transcript

Play the audio to listen along while you enjoy the transcript. 🎧

Welcome to another episode of Practical AI. This is Daniel Whitenack. I am the founder and CEO at Prediction Guard, and I am joined today to talk about a very interesting report that we’ve talked about on the show before, the AI Index report from 2024, from the Stanford Institute for Human-Centered AI. I’m joined by Nestor Maslej, who is the research manager at the Stanford Institute for Human Centered AI. Welcome, Nestor.

I’m super-excited to be here.

It’s great to have you back. As I mentioned, we’ve talked on the show before about the AI Index report in previous years… But for those that haven’t had that background, or listened to those episodes, could you just give a little bit of a soundbite about what the AI Index report is?

Sure. So the AI index is an annual report that currently is in its seventh edition, that aims to tell the story of really what’s going on in AI from a diversity of perspectives. We look at trends in technical performance, so what can the technology do now, that it wasn’t necessarily able to do five years ago? We look at trends in the economy, how are businesses integrating this tool, how much are investors investing in this tool. We look at trends in policymaking; how are policymakers responding to what’s going on in the space? And we don’t just look at those things, we study research and development, ethics, public opinion, diversity… And I think really we aim to be kind of a one-stop shop, kind of an encyclopedia of what’s happened with AI in the last year that policymakers, business leaders, or really anybody that needs to know and understand what’s going on in the space can turn to when they have questions about artificial intelligence.

Interesting. Yeah. And could you tell me a little bit about the Institute for Human Centered AI, kind of why the Institute is undertaking this, and how it kind of fits into maybe the wider set of things that the Institute does?

Yeah, of course. I mean, funnily enough, very recently we celebrated the fifth anniversary of HAI.

Congratulations.

So it was founded in 2019, and we’re kind of five years into this wonderful journey. And I think the Institute really exists to try to advance AI research, education and policy in a way that will fundamentally improve the human condition. The creators of the institute I think came together really because they felt that AI could be an incredibly groundbreaking technology, it could be something that could really elevate the potential of humans, but in order to do that, we have to think very carefully about how we actually want to develop some of these tools, and that’s what we spend our time thinking about at the institute.

Realistically, when it comes to AI, the ways in which this tool is going to develop is not only going to depend on computer scientists and how hard they’re working, but it’s going to depend as well on policymakers, on business leaders and on the regular public. So those individuals as well need to be given a tool that allows them to identify and understand how the space is evolving and developing, and that’s what we aim to do at the index. The AI index is the report that we feel can give these individuals the capacity to make the decisions that they need to be making about this technology.

Yeah. And the index has been published for, like you say, a number of years… And of course, I’m imagining that this last year, or maybe the last couple of years have been kind of interesting on this show, in our conversations, in conversations across the industry… Generative AI has dominated a lot of those conversations, but it doesn’t necessarily mean that that is kind of the quote-unquote AI that is impacting humans in their everyday life.

Yeah. There’s more to life than generative AI. There’s more AIs.

Yeah, exactly. So how did you come at the report this time around, with the acknowledgement that of course this is transformative, what we’re seeing, in many ways, but also it’s not the only – you know, when you talk about a report about AI, it’s not necessarily just a report about large language models, I assume.

Yeah, I think that was important for us. I mean, we certainly added some new data points this year on generative AI, because we felt it had kind of come to the surface, and it was something that we needed to chat about… But in a lot of the chapters, we – I think for us it’s incredibly important to, exactly as you said, draw that distinction between foundation models, generative AI, and non-generative AI systems. And for example in the research and development section we have information not only on the number of foundation models that different countries are producing, but also how many notable machine learning models these countries produce… The kind of idea being that it is possible for a machine learning model to be notable, and not necessarily be of the generative type.

[00:05:56.26] Similarly, when it comes to the economy section, we track total AI investments, not just generative AI investments, and we even included a completely new chapter this year on some of the ways in which AI is interfacing and engaging with science. And that touches on a lot of these kinds of developments that relate to some of the ways in which AI is used in non-generative ways, but is still really moving us forward, and leading to a lot of really exciting advancements.

Yeah, that’s interesting. And to give a sense of how the index is put together, I’m sure people have seen different surveys that are out there, different takes on AI and where it’s at… In terms of the approach of the institute, what is the kind of main research mechanism that goes into the report, and how is it developed, what’s involved, who’s involved… That sort of thing.

Yeah, great question. It’s kind of a two-pronged effort in that we collect data ourselves for particular questions that we feel there isn’t already good data for. So I think we try to be strategic, and we find that if there’s someone else in the research community that is already collecting data that we find to be relevant and interesting, we don’t try to necessarily duplicate their efforts. And as such, we partner with a lot of data vendors like Accenture, GitHub, McKinsey, LinkedIn, StudyPortals… They all collect data that we find to be interesting, and then we work to then include it into the AI Index report.

But again, for certain topics where we feel that there isn’t enough data… For example, we felt that there wasn’t a lot of good data on the number of AI policies and legislations that were being released on national levels; we endeavor to collect some of that data ourselves.

And in terms of the research agenda, it’s set by the AI index steering committee. So we’re very privileged to be advised by a diverse committee of AI thought leaders, people that really are very influential in commenting on what kinds of things are going on in the AI space. People like Jack Clark, who’s one of the cofounders of Anthropic, people like Erik Brynjolfsson, who is arguably one of the world’s leading economists on AI, and people like James Manyika, who leads AI research efforts at Google. We work with these individuals to discuss and identify what kind of topics we want to track, and we figure out as part of the process where each report should be going, and what kinds of things we need to be chatting about.

Awesome. Yeah, so it’s not just AI researchers – or polling people – that are pouring into this. It’s actual AI people from industry, academics, but also people spread across other areas, like economics.

Yeah, definitely a lot of diverse perspectives. And I think also perspectives from a lot of people that have been in AI for a long time, because we see this with a lot of technologies where AI announced itself in 2022, ChatGPT came out, and then all of a sudden, everybody and their mother became an expert on AI. I think some of the people that we really work with have been in this space for a very long time, and have kind of seen it ebb and flow, so can especially offer a lot of very valuable perspective on where we are in this moment, and kind of contextualize and situate that in a very nice way.

Yeah. Well, I do want to get into some of the specific points raised in the report, but before I do that, I’d love to ask, especially with these sorts of things, because you’ve been thinking about this deeply, and looking at all the data, and all that… Was there any one particular thing that stood out as kind of surprising or counterintuitive, or just stood out generally in the kind of year over year progress of the report? Looking at this year’s, anything that stood out as surprising to you in particular?

[00:10:02.11] I think thinking about the cost of some of these frontier models. One of the things we did in this year’s report is we partnered with Epoch AI, which is a great AI research institute, to do some estimates on how much it costs to train some of these frontier AI systems. And I think we kind of all knew in the back of our minds that these systems were going to be very expensive… And we eventually crunched the numbers, and it’s one thing to kind of anticipate that, and then you actually see the numbers - that GPT-4 is costing close to $80 million, Gemini is costing close to $190 million… And you kind of see this trend line which is just kind of going up, almost exponentially. So it really kind of puts into perspective how far we’ve come, and I think it also poses a lot of interesting questions about the future; how much further can we go?

I think a lot of these companies are betting very heavily that the scaling laws are gonna hold, that they can continue pumping more and more data into these systems, and that these systems are going to respond with improved capabilities, and even new capabilities. And I think they’re all making interesting bets, and it’s going to be very interesting to see if that’s going to hold… And what the future may look like is something that I’m going to really be watching with a lot of open eyes and an interested perspective.
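
For listeners who want the reference point: the “scaling laws” being discussed are empirical fits of model loss to parameter count and training data. A commonly cited form - the Chinchilla-style fit from Hoffmann et al. (2022), shown here purely as background rather than anything computed in the AI Index itself - looks like this:

```latex
% Chinchilla-style scaling law (Hoffmann et al., 2022).
% L = expected loss, N = number of parameters, D = number of training tokens.
% E, A, B, \alpha, \beta are constants fitted from training runs; the bet
% described above is that growing N and D keeps pushing L down.
L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}
```

The open question raised in the conversation is whether that curve keeps paying off once the next order of magnitude of training compute costs hundreds of millions of dollars per run.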

Yeah, it seems like it’s getting to the point where - and this was something that was also highlighted, that I saw in the report, about industry dominating frontier AI research. And I guess that’s connected in some ways to what you were just talking about, because… I’m thinking back to my own PhD, and the five of us in our grad student office… It’s no group like that that’s going to go about the training of one of these new models at this scale. Certainly, there have been efforts, I guess, from projects like BLOOM, and others that have brought together researchers from around the world to work on models in almost like a collaborative way, like a CERN-type effort, or something like that, which requires a huge budget… But at least as far as I can tell from the data and what you’re talking about, those budgets are increasing, these models are expensive, both to create and to run at scale. And so I assume that that is connected then to that observation of industry kind of leading the front on the research side. Is that fair?

Yeah, I think pretty directly connected. I think a lot of these companies know that feeding more data into this system leads to better performance, or at least it has so far, and I think these companies are betting that that trend is going to continue, so they’re pouring money in and kind of hoping that we’re going to see improved capabilities. And I think part of the reason these kinds of estimates were so eye-popping to me is because even coming into the report I was kind of aware that, as you said, no grad student could build an LLM, train it on their laptop in a way that would kind of compete with some of the language models that these big industrial players release… But we’re kind of getting into territory soon where you’d even have to kind of wonder, of the kind of industrial giants, who can really afford to be building these things. If we’re gonna get into territories where they’re costing a billion if not more… I mean, 100 million is a lot of money, but it’s a lot less money than a billion dollars. And the scale of who can kind of get involved at that level is a lot different and varies fairly substantially, and that has a lot of important implications for how AI research is being done, and what kind of topics people focus on.

I mean, I think industry actors do a lot of really valuable research, and they do contribute a lot of interesting insights, but at some point you need to pay off the investors, and you need to make a product that is commercializable. And that could kind of shift incentives in ways that, again, coming back to this mission element that we talked about with HAI, might not necessarily align with building AI systems that do the most to really further humanity.

Break: [00:14:15.11]

Well, Nestor, some of the things that you mentioned in the report are related to generative AI, some of them are not… One of the things that I think is interesting about this particular index is talking about some of the things beyond the technology that might impact not only the direction of AI development, but actually does potentially impact practitioners, like some of those listening to this podcast. And one of those things being the kind of sharp increase in regulation, in particular in the United States, as related to AI. I’m wondering, from your perspective, especially as this index has looked at things over a number of years, kind of every year we’re talking on this show about when and how are the regulations coming down. What from your perspective can we discern from this kind of sharp increase in regulation in the US? And maybe the practicality of how that will actually influence developers.

Yeah. So I would say a couple of things. I think the first big takeaway for me is that we’re probably going to see a lot more action on the state level - and we already are - than on the federal level. I mean, if you look at the total number of proposed AI-related bills in the United States, on the federal level and on the state level there has been a pretty massive increase in the last seven, eight years, on both counts. But there are substantially more state-level bills that have been put into law… I think, looking at my notes here, close to 40 on the state level in 2023, compared to just one on the federal level in 2023. And I think this, again, is not surprising. I think it usually takes a little bit longer for there to be consensus on the federal level… But it might mean that we can usefully look to states as kind of being one of the first barometers of what’s going on with regulation, and what we might see down the pipe at the federal level.

I think a second point is that when you actually look at the regulatory agencies, they’re all passing more AI-related legislation and AI-related regulation, but the regulation itself is coming from a diversity of bodies. So it’s not just like the Copyright Office is kind of hogging all the attention and passing 30 AI-related regulations. The Executive Office of the President is getting in on the fun, the Department of Commerce, the Department of Health and Human Services… There are so many of these regulatory agencies that are thinking very deeply about this tool, and I think this reflects the fact that AI is general-purpose; it doesn’t surprise me. But I think it’s also a note that if you think to yourself that “Oh, I’m not a computer scientist, therefore I probably don’t need to worry about how this is going to affect my life”, I would kind of urge you to maybe walk back that assumption… Because regulators across different spaces are starting to become much more actively involved in passing AI-related regulation, and I think a lot of us are going to be affected much sooner than we would like to believe.

And coming back to the state-level point as well, I think you’re seeing now a lot of very kind of hot and contentious debates, especially in California, around SB1047. I think you’re gonna see that in similar states. I think four or five years ago the regulation that we did see was quite kind of expansive, and let’s say a lot lower stakes - it was very often what we would call more expansive regulation: let’s explore how we could use AI, or let’s kind of empower AI researchers… Whereas now the regulation is starting to become a lot more restrictive. It’s putting in rules about how these technologies can develop, and how people can use these technologies. And obviously, when you go with that approach, there tends to be, I think, much more debate about getting it right, and that leads to a lot of kind of fiery opinions on either side of any particular regulation.

I guess there was a sharp increase in that this past year that you saw… There’s probably some people out there, as you mentioned - there’s a variety of mixed opinions about this - wondering whether that actually kind of drives more market share for the closed providers, who have already built up some of that market and maybe have some of that influence, versus the open access or open source community, which maybe doesn’t have the ability or the willingness to put a lot of effort into the compliance side.

Was there any thought amongst the group around how not only this would impact – so regulations would come down, and they would be maybe enforced, potentially, in impactful ways, but how that would kind of shape the development of this technology moving forward?

Yeah, I mean, I think that’s an open question. We don’t do a lot of, I would say, predictive work in the index. We try to mostly look at what we know to be true so far. But certainly, I think at least if you look in California with the debate going on with the bill that I had previously mentioned, SB1047, I think yeah, it’s this kind of question of open vs. closed source. I think that particular bill wants to put certain requirements on models that are above a certain compute threshold, and some of these requirements are so stringent that conceivably a lot of open source developers wouldn’t necessarily be able to kind of meet all of them.

And whether or not these regulations are backed for commercial reasons, as you kind of mentioned - some of these industrial players could conceivably have commercial incentives to support regulation that maybe compromises the open source players… But I do think as well there’s a lot of people in the kind of AI safety community that very ideologically believe that AI poses a serious existential risk, and that we should really be hyper-cautious about how we scale. And I don’t think it’s necessarily the position of the index to kind of say we support one position versus another, but I do think what I would say is that we’re now getting into a moment where, again, policymakers are starting to think about this, business leaders are starting to think about this, and we really need to think carefully about what we do and what we put into law. Because once it’s law, it’s going to shape incentives, and it’s going to shape how the community develops. And I think there are compelling arguments made by people on both sides. But it’s difficult to know what the future holds, and I think it’s hard to know whether this is the right move or that is the right move, but I think we always have to try to hold ourselves to some kind of standard of intellectual accountability and kind of acknowledge that we can’t always get it right, but we need to try to have as thoughtful and as well-balanced of a perspective as possible.

Yeah, I think that’s a really good perspective. You mentioned when you were answering that about kind of some of these regulations being tied to kind of how AI will develop moving forward… And in particular as tied to the size and scale of models being developed. I was really interested that, you know, there is a portion of the report that is just titled “Will models run out of data?”, which I think is really interesting. We talked a little bit about these models becoming more and more expensive… But of course, there’s an element of this which also is the scaling of the data required to train, particularly these generative models, but other models as well; maybe computer vision models or other models. And that is something that I definitely get the question every once in a while from people that have heard something like “Oh, these models have already sucked in all of the internet of data. What’s left to train on?”

[00:24:20.16] And then you’ve got another perspective that I hear sometimes, which is “Well, now we’re just going to fill up the internet with generated data”, which cycles back into training datasets. So yeah, I’d love for you to bring any perspectives that you have on this question of “Will models run out of data?” I think it’s a question that’s coming up frequently for people.

Yeah, it’s hard to know. I mean, that’s the problem. You want to kind of go on these podcasts and give kind of clear, concise answers, but in a lot of cases it’s quite nuanced. I think there’s reasons to be optimistic, there’s reasons to be pessimistic. I think when you kind of look at optimistic reasons for why we might not run out of data, or why data might not be the bottleneck that we anticipate it to be, I think there are some papers that seem to suggest that synthetic data could meaningfully aid in training AI systems.

I think as well, if you think about it, these language models are substantially less efficient than the human brain. They see millions of times more text than any human would in their entire lives, and in some cases can perform better than humans at certain tasks, but it’s clear that there is kind of an architecture of the mind - that being our own architecture - that gets the job done with a lot less energy. And plausibly, there is going to be more research done on algorithmic efficiency that can maybe make it easier for models to perform at higher levels with less data.

And I think as well, we’re always creating more data. We’re kind of in an era in which more and more data is being kind of manifested and put out into the world. Some of that, of course, is going to be generative and created by AI systems, but there are also new models like Meta’s Segment Anything, which makes it a lot easier to get segmentation masks. So AI can be used to also take more data from the world in a way that can help these systems.

On the other hand, it does seem that if you kind of just look at the amount of text stock that we have now, it seems like we might potentially run out of that relatively soon. So I think we cited estimates from Epoch - they’ve since updated those - and they predict that in kind of four years we might potentially run out, if we’re going to continue scaling models in the way that we’re scaling them now. There are also other papers that are not necessarily as bullish on the potential of synthetic data kind of plugging this hole and filling in the gap.

I guess, for me, the kind of really interesting thing is I’m really going to be curious to see what it’s going to be like when GPT-5 comes out, because… I mean, all these models we have now are good, but they’re kind of incrementally improving their capabilities at various tasks. But it still seems to me like they struggle on some tasks like planning, they struggle on some tasks like reasoning; they’re still somewhat prone to hallucination. And there are still kind of limits to what they can do. And I guess I would wonder, is scaling a transformer going to be sufficient to resolve some of those problems? Or do we potentially need a new architecture, a new way of building AI systems that could resolve some of those difficulties and resolve some of those challenges? And it’s hard to know, but there’s a variety of perspectives here. I’d be curious to get your perspective on it as well, as someone that kind of thinks about this a lot, too.

[00:27:44.00] Yeah, yeah, I appreciate that. From my perspective - similar to yours - I think you can look at it like a mixed bag. I think the thing that I would maybe draw on here is that, at least right now, it seems like a promising route forward is not necessarily relying on these foundation models to have all the inbuilt knowledge that is needed to, for example, answer every question, or have every fact, or be able to process any type of input necessarily. I think though certain things – like, I’ve seen this in the progression of function calling, for example… There was sort of a generation of these models where people figured out that “Oh, we can use these to generate calls to APIs, or to functions.” But none of those prompts were in their fine-tuning datasets. And so they kind of worked for that, but kind of not… And the ones that are coming out now, that have that pre-built into the fine-tuning datasets, they do that much more easily, and can extend to a whole variety of function calling.

So I think that these sorts of - not like specific facts or specific knowledge about particular APIs, or these sorts of things, but the ability to figure out what those more general kind of building blocks of robust AI systems are, and building the prompt datasets around those… My view is that there’s going to be a lot more curation of that type of thing, versus just hoping that increasing dataset size will solve a lot of those problems.
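
To make the function-calling pattern Daniel is describing concrete, here is a minimal, hypothetical sketch - the tool schema, the model output string, and the names are illustrative assumptions, not any particular vendor’s API. The idea is that the model is shown a function signature and is fine-tuned to emit a structured call rather than prose:

```python
import json

# Hypothetical tool definition handed to a function-calling model. Newer models
# are fine-tuned on many prompt/response pairs shaped like this, which is why
# they handle unseen APIs more reliably than earlier generations did.
weather_tool = {
    "name": "get_weather",
    "description": "Look up the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {
            "city": {"type": "string"},
            "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
        },
        "required": ["city"],
    },
}

# What a well-behaved model is expected to emit for "What's the weather in Oslo?":
# a structured call the application can validate and execute, instead of free text.
model_output = '{"name": "get_weather", "arguments": {"city": "Oslo", "unit": "celsius"}}'

call = json.loads(model_output)
assert call["name"] == weather_tool["name"], "model called an unknown tool"
print(f"Dispatching {call['name']} with {call['arguments']}")
```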

Yeah, it’s a question of efficiency as well, because for a lot of businesses it’s also not super-efficient to be kind of running these kind of very large, very computationally-taxing and expensive models. So it’s not only a question of maximizing performance, but what’s that kind of sweet spot where you get good enough performance and the efficiency is kind of where it needs to be.

Well, speaking of performance and maybe utility of these models, we were talking a little bit about that, but maybe more generally… One of the things that’s highlighted in the report is standardized evaluations for large language models, or maybe generative models… We also talked a little bit about this in relation to – there was an MLOps community survey about AI quality and evaluation, and I think the results of that showed people still are having some issues figuring out the right evaluation, standardized evaluations, figuring out ROI. Yeah, anything to highlight from your perspective in terms of this evaluation front, and where the state of that is now, or how that’s changed?

Yeah, I mean, I think there’s two things that I would say. So the first is kind of when it comes more to general capabilities, I think one of the things that I’ve seen at the index is I don’t necessarily know if the benchmarks that we have now, which I think are mostly academic, are sufficient for dealing with the realities of AI that we now face, which are industrial. And what I mean by this is that a lot of these benchmarks - and for the listeners that might be unfamiliar, the way I conceptualize it is when a model developer launches a new model (Anthropic recently released Claude 3.5) they’ll test it on a variety of benchmarks. These are tests of what AI systems can do, like a test of grade 8 math problems, and they’ll say “Our model gets a 96% on this benchmark, better than Gemini, or these other models, therefore we have the smartest and most capable model.”

Now, I think the reason the community does this is because 10 years ago AI was strictly an academic problem. It was something that university researchers were thinking about. And I think they wanted to know, on an intellectual level, how could AI think. And these kinds of benchmarks were useful. That’s how you got things like ImageNet. And it was useful then to kind of see how much better we could actually get at these systems.

[00:32:07.10] But businesses - they’re not solving grade 8 math problems, or doing competition-level math, as is tested on some of these benchmarks; they’re using these AIs for wildly different purposes. And they behave wildly differently depending on the context. I saw this firsthand… I was editing a report using different AI tools. I would use GPT-4, and I would use Claude. And for whatever reason, I really preferred Claude over GPT-4. I thought that it was a much better copy editor… GPT-4 was sometimes suggesting to me words that I thought were very kind of gauche… And I would tell it “Don’t use this word”, and then it would do it again two prompts later. So I just kind of got a bit frustrated.

But the point I’m making more broadly here is that we have evaluations for these models that test them on things that businesses aren’t really doing. And I think there’s an opportunity there for someone to kind of really identify how can we use these models from a productivity level, and which ones are perhaps the best ones on that front.

The second point that I would make about benchmarks and standardization - we talked about this in the Responsible AI section, which was co-written by [unintelligible 00:33:12.23], who’s one of the PhD collaborators at Stanford with us - was kind of looking at how these foundation model developers benchmark their models, juxtaposing general capabilities benchmarks with responsible AI benchmarks. And what you see when it comes to general capabilities is that a lot of these developers are all benchmarking on MMLU, which is a benchmark of general language understanding. They’re all benchmarking on Codex, which is a benchmark of coding capabilities. They’re all benchmarking on GSM8K, the benchmark of math… But when it comes to responsible AI benchmarks, there really is no consensus. Some of them are testing on TruthfulQA, others are testing on RealToxicityPrompts, others still are testing on [unintelligible 00:33:58.25], but really it’s all across the map; there is no consistency. And it’s not clear to us if this lack of consistency reflects either a genuine belief that these developers have - that, okay, certain responsible AI benchmarks are better than others - or if these developers are merely doing this as a means of kind of juicing their model performance, and they just choose the particular benchmarks that best suit them. But I mean, it is consequential, because we’re kind of now in an era in which AI is being widely used. And when something is being widely deployed, you need some kind of standardized evaluations or standardized comparisons of how these different things function. And it seems, at least when it comes to the responsible AI world, we’re not really kind of at a stage where that’s occurring.
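
As a concrete picture of the benchmark mechanics Nestor describes above (a model gets scored on a fixed set of problems, and the accuracy becomes the headline number), here is a minimal sketch - the tiny two-question dataset and the canned ask_model stand-in are hypothetical, and a real benchmark like GSM8K has thousands of problems and more careful answer parsing:

```python
# Minimal sketch of a benchmark evaluation loop (illustrative only).

def ask_model(question: str) -> str:
    # Stand-in for a real model API call; a canned answer keeps the sketch runnable.
    return "14"

benchmark = [
    {"question": "A pencil costs $2 and a pen costs $3. What do 4 pencils and 2 pens cost?",
     "answer": "14"},
    {"question": "A train travels 60 miles per hour for 2.5 hours. How far does it go?",
     "answer": "150"},
]

def evaluate(dataset) -> float:
    # Exact-match accuracy: the kind of score a model card reports, e.g. "96% on GSM8K".
    correct = sum(ask_model(item["question"]).strip() == item["answer"] for item in dataset)
    return correct / len(dataset)

print(f"accuracy: {evaluate(benchmark):.0%}")  # 50% with the canned answer above
```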

Yeah. And one of the things I see highlighted under that section is also that extreme AI risks are difficult to analyze. Could you dig into that a little bit? Because there’s one element of this which is practically, in the industry setting, as you were just mentioning… Evaluating performance is maybe complicated, and not standardized, and relies on these benchmarks, which might not be applicable in all cases… But then on the other side, there’s these risks, liabilities, harms that are promoted across sort of academic and industry settings, which inform kind of the general discussion about AI and safety, but also for a practitioner, maybe they’re thinking “What do I focus on here? Is this risk that these people are talking about a reality that I need to be considering, or is that just something that should inform long-term development, or something I should consider today?” Yeah, talk a little bit about that, and what you were seeing around the maybe safety and risk side of responsible AI.

[00:36:00.10] Yeah, I think what we kind of meant here is that at least when you kind of talk about these AI risks, there seem to be kind of two categories. The more, let’s say, short-term risks, which are bad things that AI can do now, that we should be paying attention to, like its potential to be biased, its potential to be unfair, or to violate privacy… And the more kind of long-term existential risks, which I guess - pardon my French - refer to the possibility of it killing us all at some point.

I think the kind of challenge here is that with the short-term risks we already see this manifesting in the present. And with the long-term risks - I mean, some of the arguments these people make are theoretically plausible, and you could imagine this happening… But it’s hard to know how plausible it is. There are some people that feel very confident that AI at some point is going to become smarter than us, self-improve and want to take control… There are others that are not as convinced, and think that’s just kind of lunacy. But it’s kind of challenging with these long-term risks, because they’re so theoretical and so far in the future… And I mean, even if you could show that these models are sometimes deceptive, that they have the potential of being Machiavellian, it’s still hard to kind of get to these longer-term arguments, because they depend a lot on these kinds of theoretical, argumentative claims that are difficult to actually ground in reality. And I think you see this kind of manifesting very tangibly with this bill, SB1047, where it seems to me that a lot of the people that are in favor of this bill - which again, would impose some fairly stringent safety requirements on models above a certain compute threshold - seem to be of the belief that AI could pose really serious safety risks. And there’s a desire that “Oh, if we scale up these models even further, we want to be able to kind of shut them down and ensure that they kind of don’t pose a threat.”

Now, if in fact it is the case that these models do have these safety risks, then that seems like a plausible argument. But if they don’t, then you might in the process pass a law that could really cripple the ability of open source developers to create models, and the open source development community is very important for the startup ecosystem and the innovation ecosystem.

So that’s kind of what we mean when we talk about these risks being difficult to analyze. There’s a theoretical argument, which is somewhat plausible, but how likely is that theoretical argument? That’s tough to know.

Yeah. Along with the sort of risk discussion and evaluation discussion, I guess, some of what’s focused on in the report is also the perception of AI within maybe various demographics, but across various segments… And if I understand right, there’s a sort of general pessimism about AI in terms of like its impact on people’s jobs, but if I also understand it right, AI does seem to be making a positive impact in terms of people’s quality of work, and efficiency of work. Could you help us parse through some of those things a little bit in terms of maybe the more kind of general public impact and perception of this technology?

Yeah, I think there’s two things that you’re speaking to here. I think first is the fact that if you look at public opinion data, at least in a lot of countries like the United States, Canada, France, people are very bearish about AI. When kind of asked “Do you feel that products and services using AI have more benefits than drawbacks?”, in those three countries and also in a lot of other Western countries the overwhelming majority of respondents seem to disagree. They don’t think that AI is more beneficial than disadvantageous.

[00:39:57.29] Yet in a lot of other let’s say developing countries like Indonesia, Thailand and Mexico there is much more bullishness. People are very excited and very hopeful. And we don’t necessarily know why that’s the case. That’s obviously something that’s very important to kind of unpack and continue thinking about as this technology develops, and as it rolls out even further.

Now, you mentioned this kind of point about economic advantage. I think what we’re referencing here is that there is now a lot of new studies coming out in really different industrial sectors, whether it’s kind of looking at call centers, whether it’s looking at legal work, whether it is looking at computer science work that shows that workers that use AI tend to be more productive than those that don’t. At the minimum they’re able to do tasks faster, and at the maximum they’re not only able to do tasks faster, but they’re able to submit work of higher quality.

So what kind of explains the disconnect? Well, I mean, I still think we’re kind of in the early days of AI integration; these kinds of studies that have looked at AI’s positive impact were quite micro-scale. I don’t necessarily think businesses are kind of using this technology en masse yet. And I think second is the fact that it’s hard to necessarily know where we might go, even if AI is productive… Because if you’re working 40 hours a week, and all of a sudden you only need to be working 20 hours a week because of AI, you could either use that 20 hours to do perhaps different projects that will lead to more money and more value for your employer, or maybe your employer decides “Hey, we just don’t need you for those 20 hours”, and kind of scales back what they ask of you.

So I think the kind of jury is still out on whether the integration of AI technologies is going to lead to widespread automation or augmentation, and I think when you kind of look at narratives of fear, or if you at least try to understand why different people in different countries are as frightened by this technology as they are, I think a lot of it comes down to this element of just kind of uncertainty about what it’s going to do to their jobs and their livelihoods.

Yeah, thanks for that perspective. As we kind of near the end of our conversation here, I’m wondering, after seeing the results of the index in kind of this round, and also working on it for some time, what as you look forward kind of to this next season of AI development and adoption and integration, as you were just talking about, what do you see as exciting or positive in terms of how things are moving forward, and what’s on your mind kind of looking towards the next – you mentioned not necessarily doing projections, which is fair, but what are you curious about kind of in terms of how things will develop going into this next season?

[00:42:50.12] Yeah, I’d probably say three big things. I think the first one is scaling [unintelligible 00:42:51.23] So we talked about this already, but I think a lot of these companies are making these bets that if they feed more data into the systems, they’re gonna get a lot better… And I don’t dispute that they’re going to improve; I think that the improvements are going to continue. I do wonder by how much, and if in fact the improvements aren’t as great as we anticipate them to be, what might that mean for the economics of AI?

I think number two is looking at how businesses are actually integrating this technology. If you look at the literature on a lot of other technological transformations, a lot of economists argue that it typically takes decades from the launch of a technology to the point at which it actually registers positive productivity impacts, largely because very often you don’t have the infrastructure that’s necessary to leverage a technology when it kind of appears out of the box. And I could imagine a similar thing with AI. Now, I don’t think it’s going to take decades for AI’s productivity impact to be widely felt, but I will be curious to see if businesses start thinking a little bit more critically about how they want to use this tool and how they want to integrate this tool.

And I think third is kind of keeping an eye on what happens in the domain of policy - you talked about what is something that I’m kind of encouraged by or kind of watching out for… I think it’s - yeah, what is the policymaking response going to be? And I’m encouraged because, looking back at the example of social media, I think it took us close to a decade from when these tools were launched to when we started really kind of thinking about them on a political level, in terms of kind of regulating them, and ensuring that we built them with the right kind of incentives and in the right kinds of ways. And if you kind of point to 2022 as this moment where AI was kind of officially launched - I mean, we had pretty landmark legislation in 2023 with the EU AI Act, as well as Biden’s executive order… So it’s encouraging to me that policymakers are thinking about these things. I guess an open question for me is going to be “Where does the tone of policymaking go, and what comparative priorities do policymakers in different parts of the world have when it comes to launching these AI tools and models?”

Awesome. Well, I look forward to hearing maybe some of those, at least some data that’s indicative of some of those trends in next year’s index.

Yeah, you’ll have to play some clips of some things that I said about the future here next year, and we can kind of revisit how accurate or inaccurate it may have been.

How well did we do? Well, regardless, I definitely recommend people to – of course, we’ll link the index report in the show notes, so I encourage people to take a look at that. They’ve made it really easy to navigate; you can go by chapter and dig into particular sections, and look at the key takeaways… So yeah, definitely check it out - the amazing work that’s been put together. So thank you, Nestor, for putting in your work to that, and for all the Institute is doing to help keep us informed. I appreciate it very much.

No, thank you guys so much for having me. It was a great conversation, and I’m hoping we can do this again next year. Take care.

Sounds great. Yeah. Bye.
