Practical AI – Episode #282

Only as good as the data

get Fully-Connected with Daniel & Chris


You might have heard that “AI is only as good as the data.” What does that mean, and what data are we talking about? Chris and Daniel dig into that topic in this episode, exploring the categories of data that you might encounter working in AI (for training, testing, fine-tuning, benchmarks, etc.). They also discuss the latest developments in AI regulation with the EU’s AI Act coming into force.

Featuring

Daniel Whitenack – Founder & CEO of Prediction Guard
Chris Benson – Principal AI Research Engineer at Lockheed Martin

Sponsors

Assembly AI – Turn voice data into summaries with AssemblyAI’s leading Speech AI models. Built by AI experts, their Speech AI models include accurate speech-to-text for voice data (such as calls, virtual meetings, and podcasts), speaker detection, sentiment analysis, chapter detection, PII redaction, and more.

Changelog News – A podcast+newsletter combo that’s brief, entertaining & always on-point. Subscribe today.

Chapters

1 00:00 Welcome to Practical AI 00:45
2 00:45 Sponsor: Assembly AI 03:23
3 04:08 Daniel @ GopherCon UK 01:19
4 05:27 Categorizing training data 03:07
5 08:34 Model complexity & data 02:36
6 11:10 Framing data 03:29
7 14:39 When to start from scratch 02:44
8 17:23 Sponsor: Changelog News 01:39
9 19:02 Hold out data 03:03
10 22:05 Public benchmarks 03:49
11 25:54 Categorization overlaps 01:20
12 27:14 The best of both worlds 04:12
13 31:27 Comparing to RAG 02:43
14 34:09 EU AI Act 04:16
15 38:25 Unacceptable risks 01:54
16 40:19 How much does it cover? 02:17
17 42:37 When does enforcement start? 01:16
18 43:53 Wrapping up 01:00
19 44:53 Outro 00:47

Transcript




Welcome to another Fully Connected episode of the Practical AI Podcast. This is Daniel Whitenack. I am the founder and CEO of Prediction Guard, and I’m joined as always by my co-host, Chris Benson, who is a principal AI research engineer at Lockheed Martin.

In these Fully Connected episodes Chris and I will keep you fully connected with some of the things happening in the AI news and trends, and we’ll also discuss a few topics that will help you level up your machine learning game. How are you doing, Chris?

Doing good. How are you today, Daniel? I think you’re traveling, aren’t you?

I am in transit, so hopefully the hotel Wi-Fi/hotspot holds out and we can keep it going. But yeah, going over to GopherCon UK, which should be fun to talk to a few Go programmers about AI, and integrating it into Go applications. So that should be fun.

Fantastic. Sounds good. Incidentally, for people who may have just joined the show, that’s actually how Daniel and I originally met way back, was through the Go programming community. So this is a throwback there.

Yeah, yeah, it should be fun. This is actually my first GopherCon UK, so that’ll be good. Well, Chris, one of the things that had occurred to me, maybe - I don’t know, last week, or this week, or sometime, was seeing a lot of people kind of on general AI threads or on social media talking about how AI is only as good as the data that’s fed into it, or doing AI in the enterprise or in a real world environment is all about data, it’s not about the models, or… Some type of comments like that on social media. Have you seen these?

I have, I have. And I’ve actually been glad to see that, versus all the hype of some of the other topics that we’ve been dealing with over recent times. So let’s get into some data conversation.

Yeah, yeah. Basically, my thought was, one, what do people even mean when they say something like that? And then second, I think from a practical kind of boots on the ground standpoint, if you’re doing data science, AI machine learning stuff, there’s probably a huge number of types and kind of categories of data that you might run across or have a chance to be exposed to. And so I thought it may also be good to kind of break down and categorize some of those things to give people a little bit of a landscape of types of data or things that they might run across in the AI space, or things that they might even have to curate in their own company.

So yeah, that’s kind of what I was thinking… I guess, on that first point, what do you think people mean when they refer to this “AI is only as good as the data you bring to it”, or it’s all about data? What are people trying to get out there, do you think?

Well, I think it is the constraint around, and the limitation to, the models that you’re trying to build. So when we build AI models, they are self-training; we’re not explicitly teaching them what to do. You’re presenting the data that you want to build the model out of, and the model is only as good as the data it’s going to be able to train on. And so the quality of the data and the robustness of the data are absolutely crucial.

It’s funny, over the last few years there’s been so much hype; we’ve talked about generative AI and stuff… Folks tend to get caught up in the hype, and they tend to think of the AI as kind of being on its own… And I think that today’s topic - that’s one of the things we’ve been wanting to bring people around to - is that there’s been a certain amount of disappointment and misunderstanding… At the end of the day, your model is only as good as the data that you’re bringing for it to train on. And so it’s a moment to get back to basics, maybe leave some hype behind, and recognize that if you don’t get this part of it right, you’re not going to have a very good outcome.

[00:08:26.20] Yeah. So you mentioned a few things there I’d love to pick apart, which is this idea that there’s some kind of provenance to a model that has to do with the data. So it may be good to remind people that a model, when we’re talking about an AI model, is really composed of two things. It’s composed of code that executes functions, adds things together, and essentially does a data transformation. Maybe it’s an image in and a label out - whether there’s a cat in the image or not - or maybe it’s text in and a generated next token out. These are data transformations, and the code that executes them is written just like normal code, but it includes a bunch of parameters that need to be set. And by a bunch - people are maybe familiar, from seeing models now, that that might be 7 billion parameters, 70 billion parameters, 400 billion parameters. So in order to set those parameters to do that data transformation, there needs to be data that is used to fit those parameters, and that’s often called the training process.
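(A rough sketch of that “code plus parameters” framing, in Python with PyTorch; the layer sizes and the cat/no-cat labels here are made up purely for illustration, not something from the episode.)

```python
# A model is just a parameterized data transformation: code plus parameters.
# Everything here (layer sizes, the cat/no-cat labels) is made up to illustrate.
import torch
import torch.nn as nn

class TinyClassifier(nn.Module):
    def __init__(self, num_pixels: int = 64 * 64 * 3, num_labels: int = 2):
        super().__init__()
        # These weights are the "parameters" that training has to fit.
        self.layers = nn.Sequential(
            nn.Flatten(),
            nn.Linear(num_pixels, 128),
            nn.ReLU(),
            nn.Linear(128, num_labels),  # e.g. "cat" vs. "no cat"
        )

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        # Image in, label scores out: a data transformation.
        return self.layers(image)

model = TinyClassifier()
print(sum(p.numel() for p in model.parameters()), "parameters to fit with data")
```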

Now, one element of this, Chris, is if you imagine like LLaMA 3.1 - which is a recent addition to our world - has whatever 400 billion parameters. You could imagine that maybe you’re not going to fit that many parameters with a small amount of data. And so there’s some relation between the complexity of the model and how much data is needed to fit it. And that may in itself be something that people aren’t quite grasping often, is that the bigger the model you want to use, the more data you need to have to train it, which is why these datasets have got larger and larger.

And I think that’s important to call that out. As people are getting into the idea of training their models, there’s a certain amount of understanding what’s realistic for you and your capabilities and your organization’s capabilities to do up front. And I think that’s why there’s a set of concerns about how you’re going to enter into the process to begin with, which I think you’re covering here. But I don’t think that’s very clear for a lot of people. I think when they use foundation models, when they’re going to go create their own and stuff like that… And I think the data you have available, and the quality of the data, and the amount of data, to your point, about complex models is really crucial to consider up front. If someone is interested in taking their organization forward, how do you start thinking about this, Daniel? How do you frame the whole issue of what data you have, and what you can do with that data?

I think this is something we’ve highlighted on the show before, but people sort of have this perception that “Oh, we’ve got a bunch of documents in a file store. We’ve got a big database. We should be able to do AI or do machine learning with that data.” And the situation is definitely more complex than that. So I would say that there’s really two things maybe that people need to have in their mind. One is the type of task that you’re wanting to do, which maybe is also related to the type of model that you’ll use. And also, what is the state of the structure of the data that you have?

[00:12:05.15] Let’s give an example. So let’s say that you want to do object detection, which is the task of taking in an image and detecting what objects are in that image. So that’s a computer vision task. You usually require some type of convolutional neural network… And some of this you could search through and find the type of task that you are trying to do, and then the typical model that is used to do that. And you might find - oh, these typical models that are used for object detection usually need thousands and maybe millions of images to train on. So that may trigger in your mind, “Well, first of all, do I have enough imagery to train that model?” If you do have enough data to train the type of model that you’re interested in, then maybe you do that. But oftentimes, what people need to do is fine-tune a model, not train a model from scratch.

So that would be taking a model that already exists, maybe it’s posted on Hugging Face in a repo that is already trained for some type of task related to what you’re doing, so maybe a similar object detection task. And then you continue the training on from that point with your small amount of data.
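(For concreteness, here is a minimal sketch of that “continue training from a pre-trained checkpoint” pattern using the Hugging Face Trainer. It swaps in a text classification task because the API is compact; the model and dataset names are only examples, not recommendations from the episode.)

```python
# Sketch: fine-tune a pre-trained Hugging Face model instead of training from
# scratch. Model and dataset names are examples only, not recommendations.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

dataset = load_dataset("imdb")  # a small labeled dataset, for illustration
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2)  # pre-trained weights, new head

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length")

tokenized = dataset.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=1),
    train_dataset=tokenized["train"].shuffle(seed=42).select(range(2000)),
    eval_dataset=tokenized["test"].select(range(500)),
)
trainer.train()  # continues training from the pre-trained checkpoint
```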

Now, the second piece to that is, like I mentioned, how structured or unstructured your data is. So if I just have a bunch of images in a file store, that really doesn’t do me any good for that object detection task, because they’re not pre-labeled with labels that I would be able to use to further train one of those models. So another relevant thing here is, is your data unstructured or unlabeled, or structured or labeled? And in the case of training a supervised type model, which is a model that requires some labels to be trained, like an object detection model, or sentiment analysis, or machine translation maybe, then you need those labels in order to further train your model.
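(To make the labeled-versus-unlabeled distinction concrete, here is a hypothetical example of the difference for object detection; the field names and labels below are illustrative, loosely COCO-style, not a required format.)

```python
# Unlabeled data: just a pile of files. Labeled data: each image is paired with
# annotations a supervised model can learn from. Field names are illustrative.
unlabeled_example = "images/line_camera_0001.jpg"  # a path, and nothing else

labeled_example = {
    "image": "images/line_camera_0001.jpg",
    "annotations": [
        # bounding boxes as [x, y, width, height] plus a class label
        {"bbox": [120, 44, 60, 32], "label": "scratch_defect"},
        {"bbox": [300, 210, 25, 25], "label": "solder_bridge"},
    ],
}
```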

So just to kind of recap what I just said there, there was the element of determining what task and what kind of model is needed, and how that maps to your data, and then whether you have enough data or not enough data for pre-training or just fine-tuning. And then finally, if you have labeled or unlabeled data, or structured or unstructured data.

Let me ask you a question here. To your point about computer vision - in my own experience across a couple of different companies, in computer vision projects I’ve used YOLO, which is one of the very common convolutional models out there. And doing that, we’ve had to go through a labeling process, but you’re using YOLO as a foundation model that you’re building upon. In your thinking about that, if you have a set of maybe a few thousand images that you’re using with YOLO, is there ever a good reason to say “Well, maybe I don’t want to use a foundation model, even though it would require more data to train”? Would I ever want to go create a new computer vision model to work with? How do you think about that? Because that’s come up a number of times as projects got started. How do you think about when to go use somebody else’s foundation model versus maybe trying to do something like that on your own?

Yeah, I think that it kind of comes into this element of how big of a model is needed and how complicated of a problem you’re trying to solve. Certainly for something like object detection, especially if you have a bunch of labels that you’re trying to detect in your imagery - and generally, that task is a fairly complicated task, I guess, in terms of even how we would think about doing that data transformation - then it likely needs a more complicated model, which means more data to train that model.

[00:16:13.27] And in most of those situations, whether you’re talking about object detection, or machine translation, or speech synthesis, or speech transcription, these sorts of tasks, most of the time companies would be much better off doing a fine-tuning and not a training from scratch, either because they don’t have enough data internally to train a model from scratch, or they don’t have it labeled appropriately, or maybe they just don’t have a big enough compute cluster to do that training… So they could benefit then from a pre-trained model that would do fine-tuning on that.

But maybe other tasks, like a sentiment analysis or a forecasting problem, where you’re forecasting a time series, it’s not that you couldn’t do that in the fine-tuning approach, but that it may only take five seconds or five minutes to train that sort of model, and on a small amount of data, like thousands of samples, not millions of samples. And that could achieve your goal very well, and be a small model that you could run performantly. So in those cases, of course, you would be training a model from scratch.
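(As a sketch of that “train from scratch in seconds” case, here is a tiny autoregressive forecaster fit with plain NumPy least squares; the synthetic time series and the lag choice are made up for illustration.)

```python
# A tiny forecasting model trained from scratch in well under a second,
# on a synthetic time series (a few thousand points) made up for illustration.
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(2000)
series = 10 + 0.01 * t + np.sin(t / 20) + rng.normal(scale=0.2, size=t.size)

# Fit a simple autoregressive model: predict y[t] from the previous 5 points.
lags = 5
X = np.column_stack([series[i:-(lags - i)] for i in range(lags)])
y = series[lags:]
coef, *_ = np.linalg.lstsq(X, y, rcond=None)  # ordinary least squares

pred = X @ coef
print("mean absolute error:", np.mean(np.abs(pred - y)))
```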

Break: [00:17:24.26]

Well, Chris, we’ve talked a little bit about training, pre-training… So there’s this first category of data, which is training data, or pre-training data, it sometimes might be called… And this would be the data that you’re taking a model where the parameters have not been fit, an untrained model, and you are doing the first fitting or training of those parameters with this training or pre-training data.

Now, along with that training or pre-training data, of course, you may have test, or holdout, or evaluation data - this is a second category of data that you might have - which may just be a holdout from that training set. It might be a public benchmark type of test set for a particular task, like machine translation or something like that. Or maybe it’s data that you’re going to use, and have humans review, or something like that… But anyway, it’s a test set, it’s an evaluation set, it’s held out from that training or pre-training.

So you have a volume of data and you’re maybe taking like arbitrarily 20% of that data and setting it aside? Is that what we’re getting at?

Yeah, yeah. And of course, if you look up like how much you should hold out, or how you should construct test sets or evaluation sets, that’s a very complicated rabbit hole that you could go down. But I would say at the most simple level - yes, you can take whatever your training set is and hold some out, and oftentimes you would do that randomly, so that there’s no kind of… If your data is stratified, meaning it has some structure to it in terms of what comes first and last, then you could randomize that and get a little bit better sample. And that’ll then allow you to train your model or fit your model, and then make predictions on that test or evaluation set, calculate a metric. Maybe that could be accuracy, or F1 score, or in the case of machine translation BLEU or COMET, or in the case of time series forecasting, some mean squared error, mean absolute error type of thing… And that then allows you to gauge “Well, am I doing better than random? Do I have any predictive power to my model?”, I guess.
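(A minimal sketch of that hold-out-and-score flow with scikit-learn; the data is synthetic, and the 20% split, model, and metrics are just common choices, not prescriptions from the episode.)

```python
# Randomly hold out a test set, fit on the rest, then score on the hold-out.
# The data is synthetic; the 20% split, model, and metrics are common choices.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=True, random_state=0)  # random hold-out

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)
preds = model.predict(X_test)

print("accuracy:", accuracy_score(y_test, preds))
print("F1 score:", f1_score(y_test, preds))
# Sanity check: is this meaningfully better than guessing at random?
```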

And it’s relative to that training dataset, specifically. So it’s assuming that the model is accurate against the training set that you had there, or other data that you may introduce later that is very consistent with what you would see in the training set. Is that accurate?

Yeah, exactly. You want to hold out enough that you have confidence that when your model sees new samples - the kind you would likely see in a production scenario - you’re able to make predictions on those new samples and ideally get some type of predictive power, a result that is useful for the task that you’ve trained on. And there’s public benchmark data for a lot of different tasks as well, if people are looking for that. People might be familiar - if you just search for open LLM benchmarks or leaderboards, there’s a bunch of leaderboards for LLMs - but there’s also public benchmark data beyond that. You may want to search for shared task data… Often this benchmark or evaluation data comes out of peer-reviewed workshop type of scenarios.

[00:22:33.01] So if people aren’t aware, there’s these research conferences in the AI world, and research conferences are the primary way that people publish academic AI research. And at these academic AI research conferences, there’s sometimes things called workshops… And this isn’t – I mean, there’s learning that goes on, but it’s not like a learning workshop like you go to at an industry conference. It’s a workshop to work on specific research problems related to a topic, and then share results together around usually a common shared task. So there might be a workshop for computer vision related tasks, or machine translation related tasks… There’s one called WMT, which always has a shared task around machine translation… And there’s many other types of shared tasks that publish peer reviewed benchmark evaluation data.
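(If you want to poke at shared task or benchmark data yourself, a lot of it is mirrored on the Hugging Face Hub; a quick sketch below, where the WMT14 dataset ID is just one example - check current naming, licensing, and the task’s rules on test-set usage before relying on it.)

```python
# Many shared-task and benchmark datasets (like WMT machine translation data)
# are mirrored on the Hugging Face Hub. The dataset ID below is one example;
# check current naming, licensing, and test-set usage rules before relying on it.
from datasets import load_dataset

wmt = load_dataset("wmt14", "de-en", split="validation")
print(wmt[0]["translation"])  # {"de": "...", "en": "..."} sentence pair
```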

And how would you – just going back for a moment to the previous thought process around training data and test data, how do benchmark datasets fit into your notion of training and test? Are you using them afterwards? How does that fit into your process?

Yeah, so if you are doing a task that is the same or very similar to an existing benchmark that’s out there, some or all of the data related to that benchmark may form either test or training data for you when you’re doing your fine-tuning or evaluation. So let’s say, for example, you’re training a machine translation model from English to Arabic. You could go and look at a bunch of different benchmarks related to machine translation, and they’ll have many, many different language pairs, including English to Arabic potentially… And so you may use a portion of that data for evaluating your model, or for adding to your training dataset. However, if you’re doing machine translation, maybe to a new language pair that isn’t represented in any of those academic benchmarks, this would be the scenario where your company’s maybe trying to do something that hasn’t been represented in the academic research world yet, like a manufacturing company that’s wanting to detect defects in certain types of products using a computer vision model. There’s not a shared task for that, other than the fact that there are many computer vision shared tasks.

So a way to think about it in that scenario is if my company is trying to design this new model to detect defects in chips or in other types of products on a manufacturing line, I could go to a shared task and look at “Well, what are the best models that people are using these days for the task of – the relevant computer vision tasks like object detection or something like that.” And so in that case, the benchmark or the shared task data would represent more of a gauge for you, or like a starting point to determine maybe what types of models you want to be considering.

If I’m doing that, is that fine-tuning the model? Is that actually – like, at this point, are you talking about creating a new model, rather than using a foundation model? Or am I misunderstanding that?

[00:26:05.18] Yeah, good question. So really, this is maybe part of the categorization where there’s overlap, and maybe people tend to get confused… So there’s kind of one set of categories which is related to the type of data and the way that you’re using that data.

So you’re going to be using some data for training your model or pre-training a model, some data for fine-tuning a model, or adapting a foundation model, and some data for evaluating, or testing, or benchmarking a model. So those are the categories of data that you would use in this process. Now, where you source that data could be from public benchmark data, it could be from data that is internal to your company, that likely isn’t public, or it could be a combination of the two. So you want to be thinking both about how you’re using the data and for what purpose, and also being creative with where you might get it from.

Gotcha. As you’re mixing data from your own organization with some of this benchmark data, and trying to align them so that you get the benefit of fine-tuning a model with benchmark data driving that, while also trying to introduce your own new capabilities based on data that your company has… How do you get those two sets of data to end up being a high-quality dataset without a lot of differences between them? You’re trying to kind of get the best of both worlds - taking advantage of what is already there, but bringing out some new capabilities that your company or your organization can leverage. How do you think about merging those two disparate sets of data so that you end up with a good training set to do some fine-tuning with?

Yeah. So I think this really depends on how close the task that you’re really trying to accomplish is – how close that task is to the public data that’s out there. And in certain cases, it may be very close. Like I mentioned, the Arabic translation example - there’s bitext or parallel text data from English to Arabic. There’s a lot of that data out there.

And if that’s a specific task that you’re doing, maybe you are using that in your initial training, and then fine-tuning with your domain-specific data on top of that later on. Whereas other cases, you may just treat that public data as a good starting point, or you might even just look at what models are trained on that public benchmark data in order to understand which foundation model you’re going to use and fine-tune, or which model you could pull off of Hugging Face to then fine-tune, because it was high ranking on this benchmark, which was close to the task that you’re doing. And then you can fine-tune on top of that.

So yeah, remember you’re not always, I guess, doing the pre-training step of this. You may only be doing the fine-tuning step. And also, I would encourage people to think about, I guess, this mix of data within your organization and data outside. So there’s a lot of data on repositories like Hugging Face, that might be useful to your company if adapted in a very specific way.

[00:29:46.04] Let me give a simple example. So there’s a benchmark out there called SQuAD, which is a question answering, extractive question answering type of benchmark. And this was used to train models very specifically for question answering; so not the kind of general-purpose large language models that are out there. But you could take the SQuAD dataset, which has essentially - in the input it has paragraphs of text and some question that’s asked and answered in that paragraph of text, and then paired with the appropriate answer that’s extracted out of that. And so it’s very possible if you’re doing a question answering sort of task, you could test whatever model you’re using on that SQuAD output. Even if you’re using an LLM, you could structure that data in a way that you could test the LLM’s performance on that benchmark. Or you could take that data and structure it into prompts for fine-tuning an LLM. Even though this dataset was made prior to this latest wave of LLMs, it’s still relevant and can be used for various purposes related to even these Gen AI models.
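(A sketch of that “repurpose an older benchmark for an LLM” idea: flattening SQuAD’s context/question/answer triples into prompt and reference-answer pairs. The prompt template here is arbitrary, just one possible choice.)

```python
# Flatten SQuAD's (context, question, answer) triples into prompt/answer pairs
# that could be used to evaluate or fine-tune an LLM. The template is arbitrary.
from datasets import load_dataset

squad = load_dataset("squad", split="validation")

def to_prompt(example):
    prompt = (
        "Answer the question using only the context.\n\n"
        f"Context: {example['context']}\n\n"
        f"Question: {example['question']}\n\nAnswer:"
    )
    # SQuAD stores one or more reference answers; take the first one.
    return {"prompt": prompt, "answer": example["answers"]["text"][0]}

pairs = squad.map(to_prompt)
print(pairs[0]["prompt"][:200], "...")
print("reference answer:", pairs[0]["answer"])
```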

So there’s a lot of stuff out there, and I think it would benefit companies to kind of explore the datasets available in the space that they’re thinking about, and not just write them off if they’re not labeled exactly like they want them labeled, because they still may be useful with some strategic post-processing.

Just to tie this in, since you brought up Gen AI along the way, and obviously that’s been on people’s minds a lot over the last couple of years… We’ve talked on a number of episodes about how popular RAG is - Retrieval-Augmented Generation - which lets people take their own data that they have available and use it with a generative model; so they may have an interface to company data, and stuff. Does any of what we’re talking about here apply to that? In terms of - as people are looking at their data, and what they’re using, and data quality, and they’re starting to think about it, because maybe their manager has come to them and said “Hey, we’d like to use – I’ve heard about RAG. I want to use this.” Is there any overlap in this process, or are they totally separate?

I think we mentioned this on a few previous episodes, where a lot of times what people consider adding their data to these Gen AI models is not actually even changing the model at all. So it’s not even fine-tuning the model, it’s certainly not pre-training the model… What it’s doing is augmenting the prompts. So using a retrieval mechanism, pulling something out of their data, and injecting it into the prompts of these models. So that’s data that’s being used to augment these models.
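(A stripped-down sketch of that retrieve-and-inject flow: TF-IDF retrieval stands in here for whatever embedding store you would actually use, and the documents and prompt template are made up. Nothing about the model’s parameters changes; only the prompt does.)

```python
# Minimal retrieval-augmented prompt: retrieve relevant text, inject it into
# the prompt. No model parameters change. TF-IDF stands in for a real
# embedding store; the documents and template are made up for illustration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

documents = [
    "Our refund policy allows returns within 30 days of purchase.",
    "Support hours are 9am to 5pm Eastern, Monday through Friday.",
    "Enterprise plans include single sign-on and audit logging.",
]

vectorizer = TfidfVectorizer().fit(documents)
doc_vectors = vectorizer.transform(documents)

def build_prompt(question: str, top_k: int = 1) -> str:
    q_vector = vectorizer.transform([question])
    scores = cosine_similarity(q_vector, doc_vectors)[0]
    retrieved = [documents[i] for i in scores.argsort()[::-1][:top_k]]
    context = "\n".join(retrieved)
    return (f"Use the context to answer.\n\nContext:\n{context}\n\n"
            f"Question: {question}\nAnswer:")

prompt = build_prompt("How long do customers have to return an item?")
print(prompt)  # this augmented prompt is what actually gets sent to the LLM
```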

So if you think about how this fits in - just trying to pull it all together here - there’s the data that Meta used to pre-train LLaMA 3.1. We don’t have access to that full dataset, but they used some very large dataset of text to pre-train LLaMA 3.1. They then used a curated set of prompts to fine-tune LLaMA 3.1 for instruction following, which is the Instruct version of LLaMA 3.1. Then you, Chris, could download LLaMA 3.1 and “add in your own data” using this RAG-based approach. But you’re not actually changing any of the parameters of the LLaMA 3.1 model, you’re not updating it… It’s just, you’re running it and injecting your data then, as a knowledge base or via augmentation.

So in that whole chain of events, you had pre-training data, you had test and evaluation and benchmark data, which was used to benchmark LLaMA 3.1, you had fine-tuning data, which was used to fine-tune the Instruct version… And then you have the knowledge base or augmenting data, which is then injected at runtime to improve the performance of the model. And that kind of covers, I guess, the full chain there.

Gotcha. That was a great explanation for the differences between them.

[00:34:10.16] Well, Chris, we’ve talked a lot about data… Hopefully, some of that discussion is useful for people in terms of keeping the categories of data in their mind. But there are also some interesting things – of course, with all of this data being used, there’s the chance of misuse of data, which has always been a popular topic, especially in Europe, around things like GDPR, but most recently with this EU AI Act. And I know one of the things that you texted me earlier was the fact that the EU has this AI Act, and it’s enforceable now, or it went into force recently… Is that right?

It did. So it’s been a couple of years in development. It was originally proposed by the European Commission back on April 21st of 2021. Then the European Parliament passed it on the 13th of March of this year. And then it was unanimously approved by the EU Council on May the 21st… And it came into effect on August 1st, which, as we record this, was less than two weeks ago.

It has a number of provisions, and they are coming into being on a variable timeline. Some are coming into being very quickly, within the first few months; some of them are going to take as much as three years to come into effect. But it is, in its own right, really the most comprehensive legal treatment of artificial intelligence in the world so far. Obviously, there have been some others – in the United States, the White House had issued some stuff… But we have not had an AI legal framework get passed through the United States Congress. And so the EU has done that. They’ve done the first very large one, and it’s gotten its fair share of criticism, but it’s done a pretty good job, I think - I say that as a non-legal mind - at trying to address some of the concerns that have been enunciated over the last few years about AI capabilities, particularly in terms of risk, with some risk categories there. Have you had a chance to take a look at some of those?

Yeah, it’s interesting… This gets to the things that I love thinking about on this podcast, which are the practical sides of this… So for those of you out there in Europe, and likely, as we’ve seen on this show in the past, regulations originating in Europe tend to make their way over to the US, or even kind of broadly to what people tend to do globally… And so me as a practitioner - am I going to be regulated by these risk categories of AI? And so it’s useful, I think, to know them and kind of understand how your systems fit in, and where you’re likely to see some regulatory burden. And yeah, there’s kind of low, moderate, high and banned risk. There’s some type of scale like that.

On the low end, you’ve got systems like spam filters, or video games, that don’t really have mandatory regulations, and you could kind of decide if you follow guidelines for that… And it goes all the way up to banned systems, which are things that are unacceptable risks. So maybe using AI systems to provide people with a social score, that would impact their government services, or actually malicious use of AI to influence the behavior of children, or something like that.

Yup. The actual categories, just to call them out, are unacceptable risk, high risk, general-purpose AI, limited risk, and minimal risk. Sorry, I didn’t mean to cut you off. Keep going there.

[00:38:19.22] No, you’re good. Yeah. So what have you seen in terms of things that – did anything surprise you in terms of things that might have been judged risky, that maybe a good number of people are even exploring right now, that they could face regulation?

They’re really looking at what AI capabilities are trying to – you know, what outcomes are they trying to achieve. And by way of example, we talked about unacceptable risk - it’s that highest category, which are those banned things. And that really comes down to AI capabilities that are seeking to manipulate human behavior explicitly. And those that might use a real-time remote biometric identification. There are certain use cases with things like facial recognition where it could fall into the unacceptable risk, depending on what you’re trying to do with it… And then things – we’ve talked about things like social scoring in China; those types of things where you’re essentially applying a scoring to human behavior, to try to influence how people are behaving. Those types of applications where the AI is making changes to how humans are operating, to their behaviors, are very typical of those that would be found in the banned versions of AI, that would be considered unacceptable risks. And so that’s a good example.

It is important to note as part of that that across the board military and national security applications of AI are exempt from the scoring under this law. Just to note that up front. So obviously, there can be things that happen in a national security or defense context that might be considered a very high risk thing, but because of the nature of what you’re trying to do, it would be allowed. So I wanted to note that early on in the process here.

How much, Chris, do you think these risk categories - certainly, there’s specific things that the act has in mind. We can actually see examples of things like the social scoring piece in China, and that sort of thing… But other things, like there’s going to be new types of things that are done with AI that maybe weren’t anticipated by the act… So how much do you think these categories will be able to actually capture some of that net new functionality that maybe was not anticipated by the regulators?

[00:40:57.13] I think it does a reasonable job of trying to capture that, because rather than going after specific applications, they hit categories. As an example, the high-risk category, which means that it is something that is allowable (it’s not banned), but it would be highly regulated, and it would typically apply to things, for instance, in healthcare, safety, fundamental rights of people, recruitment, critical infrastructure management, law enforcement, justice… All of those are areas where the law explicitly says “This is an application in which you’re regulated in, because the potential for bad outcomes is legitimately there… Although it is allowed because there’s potential for very good outcomes as well.”

The higher the risk, the more regulation they are explicitly looking to apply. But conversely, for very low-risk applications they go down, at the minimal-risk end, to zero regulation whatsoever, if you classify into that. So that might be a video game or a spam filter, where the video game is not trying to adjust behavior or something, but is just pure entertainment. They say “This doesn’t need it.”

So they give these broad categories of areas that it might be applied to. I suspect there’s some sort of remediation process where you could say “I might be in healthcare, but this is not a high-risk thing.” So I think some of that’s going to have to play out in the enforcement over the years ahead.

Yeah, definitely. So when will we start seeing the first news of them cracking the whip and bringing people into line with this stuff?

Well, I think enforcement’s supposed to start fairly early on, and certainly on the risks. And I don’t know if there’s a specific date, or if it started right off the bat on August 1st, but I know they were talking about things rolling out on a 6 to 36-month timeline. So certainly over the next six months, by the end of that, we should start hearing about enforcement from the EU - and I suspect there will be news stories that we’re following, on how different applications that once upon a time might’ve just been done now face a legal or regulatory battle to get in.

And that’s now part of the AI landscape, especially if you are either global, or operating primarily in Europe, then this is the new reality. And I suspect, to your point earlier, that some form of this will start taking hold in the US, throughout Asia… All across the world, this will gradually take hold over the next few years as other laws pass, which will probably be somewhat similar.

Yeah. Well, it is super-interesting to see these things develop, and I’m sure that we will see more and more develop around this in the coming days… And this is, of course, mostly focused around risk, but there’s other types of regulation that we’ve talked about on the show before, even in the US, related to the executive action around AI and other things. So yeah, it’ll be interesting to see how this plays out, Chris… And I look forward to chatting on the show about it here with you.

Oh, I’m sure we’re going to have an episode or two come up where an interesting story comes out on how regulation is applying. So yeah, we’ll probably have some interesting things to talk about in the months ahead.

Alright, Chris. Well, I hope you have a good rest of your day. Thanks for talking through all the data and AI Act stuff with me.

Thanks a lot, man. It was a great explanation. Talk to you next time.


