Practical AI – Episode #197
Data for All
with John K. Thompson, author of "Data for All"
People are starting to wake up to the fact that they have control and ownership over their data, and governments are moving quickly to legislate these rights. John K. Thompson has written a new book on the topic that is a must read! We talk about the new book in this episode along with how practitioners should be thinking about data exchanges, privacy, trust, and synthetic data.
Notes & Links
Use the code podpracticalAI19 for 40% off of Data for All, along with all Manning products in all formats!!!
Welcome to another episode of Practical AI. This is Daniel Whitenack. I’m a data scientist with SIL International, and I’m joined as always by my co-host, Chris Benson, who is a tech strategist at Lockheed Martin. How’re you doing, Chris?
I’m doing very well, Daniel. I’m just so happy to actually be online. As I was struggling to actually show up today here, so…
Internet issues… they still happen.
Oh, my gosh… Yeah. When in doubt, reboot, right? So here we are.
You know, data transfer - that’s often an issue, and actually fitting for today’s conversation, because today is all about data, Chris. We’re privileged to be joined by John K. Thompson, who is the author of a new book called “Data for All.” And he’s also written a number of other books, “Analytics teams: Harnessing analytics and artificial intelligence for business improvement”, and “Analytics: How to win with intelligence.” So, John, it’s great to have you with us. We can’t wait to learn all about the data.
So glad to be here, Daniel, with you and Chris. Looking forward to the conversation. Thanks for inviting me.
Yeah, it was super-interesting as I was reading about the motivations for the book and what you’re covering in the book. You talk about how the book provides this vision of how new laws, regulations, services around data work in the kind of time that we live in, but also how we can benefit from data in new and lucrative ways, which sounds great. I’m all about benefiting from data in new and lucrative ways. Could you talk a little bit about why – kind of the motivations and why you thought this was kind of the time to bring in some of these discussions around types of data, how it’s stored, who controls it, what the regulations are etc.?
Yeah, and thanks for the opportunity. As you said, this is my third book. I’ve written mostly about analytics up to this point - how to build a team, how to invest in a team, who to hire, who not to hire, how to structure it, and all that kind of stuff… But I started my career 37 years ago, and I was a programmer, and an analyst, and everything I did just seemed to revolve around data. It was just all data, data, data, all the time. So it just struck me that data was the thing. And I switched my career to be part of the business intelligence, and data warehousing fields… And I did that for decades, and I’ve been thinking about it for a long time.
[04:18] When we were raising our two kids, that are 25 and 23 now, we were always talking to them about “Hey, how’s that game going? What are you doing?” They’re “Oh, it’s free. We love it.” And it’s “No, it’s not free. You’re giving them your information about who you are, and your age, and your behavior, and what your elasticity is, and what your tolerance is for trading this and trading that, and what the price is…” So we’ve always had this conversation over our dinner table about “There’s no free thing. If you think it’s free, then you are the product… Your behavior. You are what they’re selling.”
So I’ve been thinking about it for a long time, and I’ve been part of the data industry for almost four decades, as I said, and a lot of it - Daniel, I know you’re here in the Midwest, I’m in Chicago, you’re in Indianapolis… Chris, I think you’re somewhere in the United States…
I’m down in Atlanta, that’s right.
Okay, you’re down in Atlanta… Well, the whole Midwest is where the whole data world started. So Arthur C. Nielsen, two miles up the road, is the guy that created this entire ecosystem that we live in - the legal, the norms, the way people think about data. And I thought “Nobody really knows this, nobody really understands it, except for maybe a handful of people.” So I wrote the book, so people would be able to understand over the last 100 years why data is thought of as it is, and why it’s regulated as it is, and why we have this really misguided idea that our data is not our own. That these companies that manage it, and move it around, and resell it, and use it, own it. But they don’t. We own it. But now we’re starting to get a legal framework, it’s led by the EU, to where we can actually own our data, we can manage it, we can delete it, we can do things with it.
So the book was – it was just decades and decades of me thinking, “Gosh, this whole thing, this whole area is just opaque and confusing, and people don’t understand it, and there’s got to be some book out there that says, ‘This is really the way it should be. And this is why it has been like this.’” That’s the first part of the book. The second part of the book is what’s happening today, and what does happen with your data, because a lot of people don’t understand what happens with their data when they’re on Facebook, or LinkedIn, or Google, or wherever it happens to be… And then the third part of the book is all the laws and the frameworks and everything that’s coming out of the EU, that’s now spilling over into the United States and the rest of the world… So you can look at it and say, “Okay, I really do want to manage my data. I do want to monetize my data.”
There’s an example in the book where I talk about that if you’re an average user, and you’re on three platforms, and you had the chance to monetize your data, it’s probably two grand to you every year, for doing nothing more than what you do today. And I talked to experts, and they’re all like “Ah, two grand… Who cares? No one wants any money. They just want to have free email and continue on the way they are”, and I’m “Hey, I would like to have two grand a year for doing whatever I do. I’d be happy to get a check for two grand.” Every time I talk to someone, they’re “I would love to have $10.” I don’t understand why the experts are “Oh, nothing should ever change. People don’t care.” But people do care.
[07:49] Yeah. So you do talk about some of the history around this topic in the book… What do you think are some of the main points to stress about that history to help people understand why we got to this point where – yeah, there’s a lot of experts saying “People don’t care about their data”, but there’s also people waking up to the fact that their data is being abused… There’s also this general sense – I get very frequently from my non-technical friends, the thing that comes up in conversation is like “Well, I’m sure Google or whoever’s listening to me, right? Because I said this, and then later on, I see this ad, or whatever.” But there’s a mystery around what is actually collected? Is that actually true? Is it not true? So what are the things kind of in the history of how this has evolved, that you think are important to stress, to give context, I guess?
Sure, absolutely. And I just had that conversation two days ago with my sister. She was like “Well, I was talking to your niece (her daughter) about XYZ, and then all of a sudden, I start seeing it in my Facebook feed, and my Google feed.” And I started asking her, I said, “Well, did you search on anything? Did you type anything into Facebook or Google?” And she goes, “No, I just had the conversation with her on the phone, so I know they’re listening to my phone.” And I’m like, “They’re not listening to your phone. This is not the NSA. This is not the DNI.” We had more conversations and she was like “Well, I did go search for this, and I did go search for that”, and I’m like “Well, there you go. You actually put it into the engine, and your history got modified by the algorithm” or whatever, whatever they’re using there. But anyway, I digress.
So everybody’s talking, a lot of people are talking about this. And the thing that I think is very important for people to realize - and Arthur Nielsen, great guy, created Nielsen, really smart fellow… But precedent in the United States legal system is a huge deal. And when Arthur struck the deal with these grocery stores, that they would basically transfer all their usage data to him for free, that set a precedent. And it went on and on and on for 100 years, and no one really thought about it, and they kept accreting more and more data; media data, and sales data, and radio data, television data… And it went on and on and on. And now some people say, “Well, Nielsen does pay for the raw material.” Yes, they do. I absolutely understand that. I used to work at Nielsen, I know what they do. So yes, they do pay people for the data, but it’s a pittance compared to what they get paid for the data.
So all that’s to say that this precedent that was set 100 years ago still continues today. So people are saying, “Well, my data really isn’t worth anything…” but the world has changed. We have ubiquitous internet, we have broadband, we are always on, we have mobile phones… We’re always contributing… Some people call it digital exhaust, which I don’t really like that term, but we are always contributing our usage data. Think of – do either of you have electric cars?
I do not, no.
Not yet, but my brother-in-law does, yeah.
I have a Mustang Mach-E. It’s not a car, it’s a rolling computer. And it’s generating data 24 hours a day, even if I’m not in it. So we have to realize that we are generating the data. We own the data. This idea, this precedent of giving it away for free must change. And that’s one of the things in the book that I talk about a lot, is that we have a colored or a skewed view of data ownership, that we give away the province of our data to all these companies, and they use it for free.
In the book I talk about, you know, Facebook doesn’t pay for the raw materials that it uses to run its business. And it makes no sense. I mean, Daniel, and Chris, if you went to a builder and said, “Hey, I’d like you to build me a house.” And the builder came back and said, “Well, we’re going to get the lumber for free.” No. Nobody gets a major raw material for free. And my point is that, number one, we have to understand that we own the data, and number two, they should pay for it.
[12:12] So let me ask you a question… You’ve already kind of created the context around it, I think, over the last couple of minutes… But something you said a couple of times earlier, you talked about the EU leading the way. And certainly, there is a certain well-known EU law that I suspect we’re talking about there… But aside from the law itself, I’m curious, why is the EU leading the way, in your view? What is it about the EU that has created that law and has done this, whereas we have struggled to do that in the United States and elsewhere in the world, and where we have done something that has been in smaller geographic areas, like specific states?
Yes, that’s right. You’re referring to GDPR.
That was put into law six years ago. And GDPR has been a huge success. It has really been a great movement for the people of Europe. And we all know, Britain is no longer in Europe, they’re on their own, they’re outside the EU at this point… So GDPR has been a boon for the citizens of Europe. They can go in, they can access their data, they can delete their data, they can take it off platforms… They can do all sorts of things with it. And based on the success of GDPR, the EU has now passed the Data Act, the Data Governance Act, and the Digital Markets Act. And all of those acts have been passed, and they are now going into effect. And those laws now put together data pools, data unions, data exchanges… All the structures that I talk about in the book, that if you and I, or any of us want to go to Google, Facebook, Amazon, United Airlines, American Airlines and say, “I want all my data”, they have to give it to you. That’s number one. But number two is it goes on - these data exchanges and data pools are going to be the intermediaries that we work with, that we go in and say – we can withdraw your data. Let’s say that you’re really worried about climate change. Any company that you feel contributes to climate change in a negative way, you can say, “You can’t have my data at all.” You can just say “United Airlines, or Exxon, or Mobil, or Rosneft”, or whoever you want to block, you can. But my point is, why block them? My point is, if you’re going to say – the music royalty system is the system that makes the most sense to me when you’re thinking about data monetization… You know, “You may take all my browsing data, and I’ll let you use it. Every time you touch it, you’ve got to pay me a penny, or a half a penny, or a tenth of a penny, or whatever it is.” For these companies, you say “Every time you touch my data, you have to pay me a million dollars.” That sends a pretty strong signal that you really don’t like what they do.
And if they pick you up on it and say they want to use your data, either intentionally or by mistake, and they use it four times, they’ve got to pay you $4 million. So you know, stay in the game.
Well, John, I’m really fascinated by this sort of topic and area, talking about like data exchanges, and I guess the infrastructure or the mechanisms by which some of these newer ways of dealing with your data could come about. It actually reminded me… So my brother-in-law works for a company that is sort of an intermediary between farmers and grocery stores… So there’s the raw material, there’s the vegetable, carrots or whatever, and he mediates this exchange between like the actual farmers and grocery stores. I’m wondering, in the data world, let’s say there’s Google, there’s Facebook, there’s whoever wants to use my data, and there’s me who owns the data… At least that’s sort of the shifting mindset that we want to think about. From your mind, how might this sort of data exchange or the other mechanisms that you talked about - where do those sit? Who sort of regulates those, or how might those come about? Is there a current example that you could give or maybe a way forward that you think is probable?
They do exist. They exist predominantly in the UK and in the EU. There’s one that’s very prominent called PoolData.io, and they’re working really hard to have their data exchange be out there. And there’s all sorts of other data exchanges going on right now. Across the United States, we usually see these kinds of structures, and they do exist and have existed for many years, in the area of health, and they’re usually related to cancer or heart disease, but they’re more prominent in the area of rare diseases. You know, people that have got hereditary angioedema, or primary immunodeficiency disease, or hemophilia, or something like that. And these exchanges really allow these people to contribute all their diagnostic data, their clinical data, and maybe even their genetic data. So they do exist, and they do operate, they’re in the United States, they’re around the world. Commercially, they’re mostly in the UK, in the EU right now.
And physically, the way it’s going to work is that when these laws come out - and California and five other states have these laws on the books right now… So you can go in and say, “You have to give me all my data, and you have to delete it.” If you live in Britain, or Denmark, or somewhere in Europe, you can do that. What’s going to happen in the future is these data exchanges will sit in the middle. So Amazon and all the other companies are not going to contribute their data to some monolithic central storage unit; that’s not going to happen. Colossus, or Megalith - that won’t be the case.
[18:12] What’s going to happen is they will still own their data, they will still have their data; we will own our data. And through the exchanges, you will go in and say, “For my browsing data, for my shopping data, for my health data”, whatever you have in there, your airline travel data, you will put a monetization amount on it, and you will say that these companies can or cannot use it. So when those companies go to use the data, they will have to pass through the exchange, they will have to check the yes or no, the opt in/opt out, they will have to understand the monetary value associated with it, and when they go back and use it, they will have to have an accounting system where they rack up the amount of money that they owe you, me and everyone for using that data.
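As an illustration only, the flow described above (a per-company opt in/opt out, a price per use, and a running tally of what each company owes) could be sketched roughly as follows. Every name here (`UsagePolicy`, `DataExchange`, `request_use`, the company names and the prices) is hypothetical and not taken from the book or from any real exchange; the sketch also assumes that a missing policy means opt-out.

```python
from dataclasses import dataclass, field
from decimal import Decimal


@dataclass
class UsagePolicy:
    """One owner's terms for one data category and one company (hypothetical)."""
    allowed: bool            # opt in / opt out
    price_per_use: Decimal   # amount owed each time the data is touched


@dataclass
class DataExchange:
    """Toy intermediary: it holds policies and a ledger, never the data itself."""
    policies: dict = field(default_factory=dict)  # (owner, category, company) -> UsagePolicy
    ledger: dict = field(default_factory=dict)    # (company, owner) -> Decimal owed

    def set_policy(self, owner, category, company, allowed, price_per_use):
        self.policies[(owner, category, company)] = UsagePolicy(
            allowed, Decimal(price_per_use)
        )

    def request_use(self, company, owner, category):
        """Return True and record the charge if the owner's policy permits the use."""
        policy = self.policies.get((owner, category, company))
        if policy is None or not policy.allowed:
            return False  # assumption: no policy on file is treated as opt-out
        key = (company, owner)
        self.ledger[key] = self.ledger.get(key, Decimal("0")) + policy.price_per_use
        return True

    def amount_owed(self, company, owner):
        return self.ledger.get((company, owner), Decimal("0"))


exchange = DataExchange()
# A penny per use for one company, a deliberately prohibitive price for another.
exchange.set_policy("alice", "browsing", "RetailCo", allowed=True, price_per_use="0.01")
exchange.set_policy("alice", "browsing", "OilCo", allowed=True, price_per_use="1000000")
exchange.set_policy("alice", "health", "RetailCo", allowed=False, price_per_use="0")

for _ in range(3):
    exchange.request_use("RetailCo", "alice", "browsing")
for _ in range(4):
    exchange.request_use("OilCo", "alice", "browsing")

health_granted = exchange.request_use("RetailCo", "alice", "health")
```

In this toy run RetailCo owes 0.03 for three uses, OilCo owes 4,000,000 for four uses at a million each, and the health-data request is refused. Keeping only policies and a ledger in the exchange mirrors the point that companies keep their own data and nothing is pooled centrally.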
So I have kind of a dumb question I want to ask…
No dumb questions.
I knew you were gonna say that… We’ve leapt forward a little bit, but what exactly constitutes a data exchange? As we’re using the term around – is it always a third-party? Could a social media giant like Facebook or Google or whoever, could they have their own exchange? What’s the difference in those? What does it mean to have a data exchange?
A data exchange is a legal entity created by EU law at this point. And it will happen, it will be created in the United States as well. And a data exchange is a third-party that does just what we talked about - they allow you to come in through an interface, they allow you to set prices, they allow you to set usage policies, and those kinds of things. They cannot monetize data. They cannot accrue, store and sell data. They’re an exchange where they allow you to set your policies, set your prices, stop people from using your data… What they can do is they can reach into systems, they can analyze usage patterns, and they can suggest to you how to best monetize your data, or how best to achieve your objectives. Maybe your objectives are to give all the money that you get from your data monetization usage efforts to a charity, that comes along and says, “Okay, every time I get $100 in my data usage account, or my data monetization account, I want to donate it to the American Cancer Society, or I want to donate it to Ukrainian relief. Or I want it spent over all these areas.” Or you can actually say, “When these charitable organizations use my data, I want to pay them.”
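The "every time I get $100, donate it" objective could be expressed as a simple rule like the sketch below. This is only one possible reading of the idea: the function name `settle_donations`, the 50/50 split, and the choice to release only full $100 blocks are assumptions made for illustration.

```python
from decimal import Decimal


def settle_donations(balance, threshold, split):
    """Release every full `threshold` block of `balance` to charities.

    `split` maps a charity name to its share; the shares must sum to 1.
    Returns (remaining_balance, donations_by_charity).
    """
    if sum(split.values()) != Decimal("1"):
        raise ValueError("charity shares must sum to 1")
    full_blocks = balance // threshold      # number of complete $100 blocks
    released = full_blocks * threshold      # money leaving the account
    donations = {name: released * share for name, share in split.items()}
    return balance - released, donations


remaining, donations = settle_donations(
    balance=Decimal("250"),
    threshold=Decimal("100"),
    split={
        "American Cancer Society": Decimal("0.5"),
        "Ukrainian relief": Decimal("0.5"),
    },
)
```

With a balance of 250 and a threshold of 100, two full blocks are released, each charity receives 100, and 50 stays in the account until the next block fills up.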
So there’s a little bit of a marketplace that it establishes, and maybe not in a precise across the board – maybe as a very rough analogy, sort of like a stock exchange, where you don’t necessarily know how to price what you’re looking at, but the market that exists in that exchange prices it for you… But in this case, it’s data directly.
Exactly. And you could set your own objectives. So you want to say, “I want to maximize the amount of money that I accrue, because I’m going to take that money myself and spend it.” And it is money. It’s not credits, it’s not units, it’s money. It’s dollars, it’s euros, it’s drachma, yen, whatever it is. So you’re actually piling up money in your account that you can spend.
Now, your other objectives may be “I want to reduce the usage of my data by people who are climate offenders. Or maybe I want to help these charitable organizations understand my activity better.” Or maybe you find a group of people that are like-minded or have the same affinities as you do, and you group up together, and all your data can only be used in aggregate as a pool. There’s a million different ways you can take this.
[21:56] One of the other things I love about the topics that you cover in your book is actually digging into how data works today, and what that actually looks like. So we’re talking about this sort of monetization or exchange a little bit, but if we shift and think about – like, from your perspective, whether it’s daily interactions with people in your own social circles, or it’s your actual business colleagues who are working on data problems specifically, what do you think are some of the main types of data that people aren’t considering, or the main characteristics of that data maybe they aren’t considering? I know you talk a little bit about fresh or stale, or repetitive, infrequent, episodic, these sorts of things. So from your perspective, what are some of those types of data or characteristics that maybe people aren’t thinking about as much as they should?
One of the things that people do not think about is you’re carrying around your mobile device all the time. And 90% of us, or maybe 80% - I’m making these numbers up - are walking around with location services on. And then we have all these crazy conversations that we’re having in our political sphere right now, about what the government’s going to do, what they’re not going to do, or who’s doing this, and I’m like, “You’re allowing them to track you every moment of the day.” And some people actually sleep with their phone on their nightstand, while it’s on… I’m like, “This is insane. Your actions are so incongruent.” And I take people through – in the beginning of the book, I take them through a very light scenario of what happens with just location services. And that data is hugely valuable; you can do a great deal with it, and we do a lot with it in my day job, and my consulting work, and all sorts of things. And then at the end of the book, I take them through what two years from now will look like with just location services as the foundation. So all these people saying they’re upset about this, or they’re upset about that, I’m like, “Well, just turn your phone off, and you’ll be a lot better off there.”
And then the other thing that we talk about a lot in the book, and I’ve talked about in my other books, and I am a big proponent of, is if you’re an analytical professional, this whole idea of just stacking up one source of data… You know, in neural networks they always show trying to discern between Chihuahuas and muffins. Okay, fine, I don’t know what real application is going to be helpful in understanding the difference between the two pictures, but I get it. So you take a billion images of Chihuahuas and a billion images of muffins and you analyze them. But really, what happens, what we’re trying to get to, and what we are getting to in analytics, is we’re trying to get models to reason as realistically as we possibly can. I try to stay away from the whole AGI concept of Artificial General Intelligence… But we are trying to use many, many, many sources of data and integrate them together. And that’s one thing that people don’t really understand, is that we as analytics professionals are starting to take 3, 4, 5, 6, 7, 8, 9, 10, 12 sources of data, and bring them together and generate features that realistically show us what people are going to do. And we can do a really good job of predicting what most people will do with 6, 7, 8 different sources of data. And that is something that is really going to come into the fore over the next three, four, five years.
So the concept of data - you know, location data, voice data, browsing data, commerce data, driving data, all of that is the true picture, is a real picture of who you are and what you do. And we know that when people describe who they are, they always describe that they eat 25% less calories than they do. They always say that they sleep less than they do. They always say they talk less than they do. Well, we can see what they actually do, and we know how people act.
I was just going to ask you, you have my full attention, because you completely freaked me out a minute ago. So I’m hijacking a short segment of the show here to go back and ask you a question, because I am guilty. You mentioned some people even sleep with their cell phone on the nightstand…
[26:28] No, Chris… No…!
I do. I’m confessing to the audience that I have actually done that, not once, not twice, but pretty much every night. So doing that, in my mind I’m thinking, “I’ve got an elderly mother, I only have a cell phone. I don’t have anything but that. I need to be available, and stuff.” But as you talk about that, that’s a real-life scenario, from my standpoint, and you hit it with a hammer just now. Like, if I’m going to be available overnight, in case my mom has an emergency or something, what is it – like, can you talk a little bit about that? Because that’s incredibly tangible. Can you talk a little bit - what have I just sacrificed in terms of my privacy or the data I’m giving up to do that? Because I’m truly like weighing this at this point; my mom’s gonna be horrified to hear that I’m weighing whether her safety is worth it… But please, just for a moment, dive back into that.
Yeah, I mean, we all have these; we’re all talking about that. And I turn my location services off. My net position, my default position is location services off. And at night, I turn my phone off. And I can do that when I’m at home, because I have a landline.
You’ve got the old fashioned one right there beside it. The other one. Okay.
So my family knows if they need to call me, call the home line. I’ll pick it up. Don’t call my mobile phone, because after six o’clock it’s off.
Yeah, I think maybe it speaks to the issue at hand, that one of us on this discussion that’s been an analytics professional for their entire career takes that position, and maybe we’re on a little bit different side; that’s probably worth noting…
I’m just saying, guests don’t freak me out completely most of the time, but I’m kind of freaking right here, okay? I’m thinking, “What have I done?”
I’ll tell you, what I used to – you know, pre-Covid we’d go to cocktail parties and people would ask me what I would do, and I would give them kind of the same description that we’ve been talking about… And they would get freaked out and not talk to me anymore. So when people ask me now, I just say –
We have a show to complete though, you know? You have no choice. I have no choice. We’re gonna do this.
Now I say I take data and turn it into money. That’s what I do.
Yeah, I guess that’s a really interesting point, because you could see Chris’s phone on his nightstand as a moneymaker, I guess, based on our previous discussion. But that’s only possible if he had the opportunity to monetize that data. So I think in terms – I know you talk about different jurisdictions in the book, and such… Maybe for those - you’ve talked a little bit about Europe; what does the landscape look like around the rest of the world in terms of how quickly we’re moving towards this position where we’re able to kind of in a more lucrative way manage our data?
Yeah, the EU will be there within 18 months. Australia will probably be there in about the same timeframe, maybe 24 months; it’s spotty across the United States. California has already got their privacy law, and they are actually following very closely the three laws that I just talked about in the EU. Then we’ve got five other US states that have those laws. And beyond that, you can take a look at where the liberal Western democracies are, and most of those will come up in the next three to five years. You can look at the other countries, the autocracies, and the autocrats and dictators and things like that, and that will probably be never, if they continue with that standard of government, because they just don’t like the transparency and the – well, they do like it, if they control all the data; they like it that way. But as far as their citizens being able to monetize their data, that’s not going to happen anytime soon.
John, a couple of the sections of the book that you dive into are trust and privacy… These are two terms that are – I don’t know, Chris; I don’t know what percentage of the conversations we have on this podcast someone uses one of those two terms, but I would say it’s very much terms that come up very often. I’m wondering, John, as you’ve really dug into the state of how data flows these days, how the regulations are changing around data, maybe as like analytics professionals or as AI developers, or as AI researchers, or for professionals in the field like ourselves, what do you think are the kind of practical considerations that we should be thinking about in terms of trust and privacy, as we’re building out – like, “I’m gonna make this AI-enabled app to do X.” What should be those things on my mind related to trust and privacy, from your perspective?
It’s a great question, and I’ve been in this field long enough to know that – you know, when we started out, those many decades ago, we just always did it because we were just trying to sell more bars of soap, or cans of soup, or pizzas, or whatever it was. It wasn’t anything too nefarious, and we did have people ask us to do things that crossed the line, that broke ethics, and we just wouldn’t do it. So it was a pretty small community, and we just did what was ethical, and what was the right thing to do. Now we’ve gone to where data and analytics are – the horse is out of the barn. We actually need – and I’ve never been a proponent of this until the last couple years; we need government to step in. We have organizations like Facebook, and people like Mark Zuckerberg, that have no rules, that have no red lines. They just go all over the place.
Mark Zuckerberg’s answer to any problem with Facebook is more Facebook.
I’m actually stealing that from Kai Ryssdal just so that you know.
I’ve heard it. I’ve seen it. I know what he’s saying. Yeah, absolutely. So the reason I dove so deeply and dedicated an entire chapter to trust and an entire chapter to privacy is they are concepts that we talk about a lot, but we generally are not taught what they really mean. I think we understand what the words – you know, the connotative meaning, the denotative meaning of trust and privacy, but when you start to really delve into those concepts and how they relate to human behavior, we could all use a little bit more education than we’re getting. And that’s why I spent so much time in the book on those.
So we as analytics professionals have to be ready, and should welcome government regulation in these areas. It’s required, it’s needed. We’re getting to a point where the folks in data analytics - or some of the folks in data analytics - are really getting into trouble, and causing trouble for us as a society. And we can’t stand that. That cannot happen.
[34:08] In privacy I talk a lot about the need for privacy and secrecy, which is really an interesting concept, and we could spend hours talking about it… But if nothing else, that might be a reason to read the book: to understand the difference between the need for privacy and the need for secrecy.
It’s interesting when we talk about government, because you have the left and the right, and the conversation kind of goes back and forth, depending on circumstances… But I think maybe people can arrive at “Yes, we need government” regardless of which side you’re coming from, because they’ve been so slow to come at all. And I think one of the challenges that we’ve all observed there is, you know, every time we see one of these figures in technology, such as Zuckerberg, or any of the big companies that we’re always talking about, and they testify before Congress or something like that, you see how far behind government officials, various congressmen, senators and stuff are at that point. That’s the big news thing, is one of these figures testifies, and everyone’s like “Oh my God, did you hear the questions that were being asked?” Is that part of the problem, potentially, that there’s such a knowledge difference in this topic that maybe, in some cases, government doesn’t really know what to do? Regardless of which side of the aisle they’re on. Could that be part of the struggle, or would you identify it somewhere else?
No, I think you put your finger on a very salient problem. We’ve got a bunch of octogenarians running the government right now, and most of them don’t even understand how to use a computer. So that is a real problem. But there are people out there, like me and others, who are experts in this field, who would love to serve on a blue ribbon panel to formulate the laws and the rules and the regulations that we need. I’m sure there’s lots of Americans that would love to help.
And then the EU has done a lot of the hard work. I know as Americans we’re loath to think that anything outside of the United States is better than anything we would ever do… But the fact of the matter is they’ve done a good job over the last eight years in formulating GDPR, they’ve implemented it, it has worked, it has changed the way that we look at data, the way that we do analytics, the way that people can access their data… The three other acts, the Data Act, the Data Governance Act and the Digital Markets Act - those are very nice pieces of legislation, and I don’t think I’ve ever had those words come out of my mouth before. You know, I’ve sat down, I’ve read them, they’re easy to read, they’re clear, they’re concise… Anybody with a high school education can understand them. It’s the way that it needs to go.
I’m wondering – part of me is thinking about this conversation as someone who is producing data, but then another part of me is thinking about this conversation like someone in a business or organization that is using data, right? So there’s one side of it that “I own my data, I would love to benefit from that and maybe make money on that.” I certainly see that. And then I’m thinking, “Oh, well, if I’m thinking that, and I’m a person in a company that wants to actually build a model or an analytics system or something using that data”, that changes how that business entity then thinks about its strategy of building that product.
So from your perspective, maybe shifting to that other perspective… So if I’m sitting in the company, and I see, “Okay, well, these things are changing; people are going to be able to exchange their data for money.” There’s going to be this exchange. How, from your perspective – should we start shifting our thinking as analytics professionals or AI professionals to how we would approach maybe architecting our systems, or how we would approach starting out a project, and how we’re thinking about data on that project, that sort of thing?
[38:14] Yeah, that’s a great question, Daniel. If you are doing analytics the way that I’ve been doing it for decades now, you don’t have to change anything. I’ve been part of consulting firms and software firms and services firms, and now I’m part of a biopharmaceutical firm… There’s lots of data inside those companies that you don’t have to pay for; you’re part of the company, you get that for free. But the other data that you are going to use, and that you use today, and that we use today, that you’re going to have to augment and want to augment to get to that 10, 12, 13 sources of data I was talking about earlier - you’re gonna pay for all that data anyway. So you’re going to pay somebody for that value-added data. And in the future, you’re gonna pay somebody, it’s just gonna be a different somebody, that’s all. So no, we really don’t have to think about it in any different way. You may have to budget a little bit more money for it, but it doesn’t dramatically change the way you do things.
I have a follow-up to that real quick, if you don’t mind. Would it be right to think – we think of stores of value in terms of money, and we’ve been talking about money in recent years, we’ve looked at cryptocurrencies and we’re starting to think of those as stores of value and forms of currency themselves… Should we be thinking of data in a direct way? Because we’ve kind of talked like one step removed so far, but is data money in the way that we should be thinking going forward?
It is. Data is money, there’s no doubt about it. Data is cash. You’re either going to pay for using it, or you’re going to use it to generate value on the backend. It is that way. Daniel touched on it lightly earlier in the conversation - most people think of Google as a search engine. And they are, there’s no doubt about it. It’s the most popular search engine by far in the world. But they’re a huge data shop. They’re a huge advertising organization. And in my day job, we buy data from Google all the time. We go through the B2B interface of Google and we buy their geolocation data, we buy travel data, we buy advertising… We buy all sorts of things from Google. It’s just the way it is, data is money.
It’s triggering so many things in my mind. The sort of market around data - it seems like it could get very, very complicated and sort of multi-tiered, in the sense that there’s people generating data, but there’s people that could buy data, right? And if data is money, and that money escalates in value, all of a sudden you’ve got a sort of market for this thing that increases in value over time, and there’s like an investing element to it as well, which is quite interesting.
One other feature of this that I see you touch on in the book is like derived or synthetic data, which I think is quite interesting, because - Chris and I have talked about this a number of times on the podcast in relation to privacy and the fact that if you are able to augment your datasets, especially as a professional, with derived or synthetic data, you can actually do things maybe beyond what you would be able to do with the amount of data that you have, that’s maybe cleaned, and detoxed, and has no privacy issues. So I don’t know, could you touch on that a little bit, and maybe how you see the methods and usage of generated data and synthetic data kind of progressing as we move forward?
Yeah, absolutely. And it’s a great topic to talk about, and I love to get into it with analytics professionals all the time. We’ve gone past the era of aggregations and averages, and integrating data. We still integrate data, of course; it’s a powerful tool for us. But if you really want to get somewhere today and have competitive advantage, you are probably going to have to derive data from multiple datasets to come up with indicators and functions and things that don’t exist other places.
[42:20] You will have to create something that is proprietary and unique to the way that you see the world and you are approaching the world. That’s derived data. You take travel data, location data, and you bring it together and you have a whole new set of data there.
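[Editor's note: a minimal sketch of the derived-data idea described above, assuming two hypothetical sources (travel and location records) that share a `region` field. The field names, the sample values, and the `trips_per_visit` indicator are invented for illustration and are not from the book or from any real dataset.]

```python
from collections import defaultdict

# Hypothetical sample records; names and numbers are illustrative only.
travel = [
    {"region": "north", "trips": 120},
    {"region": "south", "trips": 80},
    {"region": "north", "trips": 30},
]
location = [
    {"region": "north", "store_visits": 300},
    {"region": "south", "store_visits": 200},
]


def derive_trips_per_visit(travel_rows, location_rows):
    """Join two sources on region and compute an indicator found in neither."""
    # Total the trips per region from the travel source.
    trips = defaultdict(int)
    for row in travel_rows:
        trips[row["region"]] += row["trips"]

    # Index the location source by region.
    visits = {row["region"]: row["store_visits"] for row in location_rows}

    # Keep only regions present in both sources with a non-zero visit count.
    return {
        region: trips[region] / visits[region]
        for region in trips
        if visits.get(region)
    }


derived = derive_trips_per_visit(travel, location)
print(derived)
```

[The resulting ratio exists in neither input on its own, which is the sense of "derived" used here; a real project would choose its indicators based on the business question at hand.]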
Synthetic data usually comes up, at least now, and today, it comes up where you have industries where people are really not watching them very closely, and you don’t have access to proprietary data, because the small number of people in those industries won’t give it to you. They’re smart enough to hold on to it for themselves. So then you have to synthesize and create the data to measure that industry from the outside. And you can do it. We’re doing it today. We just did a project where we did that, and it’s worked out very, very well for us. So you can derive data from existing sources, bringing them together, and coming up with a whole new dataset, or you can actually synthesize the data and create it from different indirect measures that you can see from the outside.
I have one small follow-up to that, which is intriguing me a little bit. To start with, you’ve definitely changed the way I’m thinking about it in terms of the monetization of data. We have these exchanges, which are giving us the ability to place some market value on it, and so I’m definitely moving into that mindset… And so if I look at the analogy for a moment - back to cryptocurrencies, when we talked about synthetic, there is a mathematical limitation in terms of the compute required to generate new value there. If you’re going to look at synthetic data and place value on it in a monetary sense, in an exchange, how do we regulate that? It seems like there could potentially be the ability that if you’re really going into a new business - maybe this is several years in the future, exchanges are widespread, and we’re seeing an industry built around the monetization of data specifically at that point, you know, here in the US, and people are synthesizing data to do that. How is that not printing money potentially? Or is that just one of those gotchas we’ve got to figure out going forward?
We’re going to have to figure that out as we go forward. That’s something that we’ll see, and there’ll be all sorts of people stretching and pushing the boundaries, and we’ll have to look at those edge cases as they come to be. One thing that I’ll throw on the table that might be interesting for you and your listeners is this: what industry in the United States has generated the most millionaires over the last decade?
Over the last decade? I don’t know, social media? I don’t know.
I would guess something along those lines, but I don’t know either.
Market research. There’s more market research organizations in the United States that are run by entrepreneurs that have become millionaires than any other business.
And it’s all data. There’s nothing to those businesses other than data.
[45:01] And that sort of brings me to a last question, John… We’ve talked a lot about different elements of this, and certain ones that are maybe - like Chris was saying, he was disturbed by certain things, and other things that are maybe cool, because I’m going to be making an extra two grand each year… So you know, that’s positive… As you look at where things are headed, what in a sort of positive way excites you about kind of the future of maybe the professions associated with data, whether that be analytics or AI? Or how those professions are shifting under this changing climate. What kind of excites you about that, and what are you looking forward to?
Yeah, some people look at the book and they come away from it and go, “Oh my gosh, this is terrible. It’s all been a sham. I don’t understand… The overlords have been manipulating me”, and all this kind of stuff. And it’s like, no, that’s not the takeaway from the book. The takeaway is that we’re all waking up, we’re all in a new era. We need to throw off the regulations and the structures that we were using from 100 years ago, and look at where we are today. And the EU is putting in the structures and the frameworks that we need to leverage, and we all just need to look at how we want to monetize our data, and how we can have that be part of our life that is beneficial and positive for each of us as individuals.
Now, as far as the data and analytics profession goes, I’m bullish. If we took every high school student and college student and graduate student in America and turned them into a data scientist, we might have a tenth of what we need. So there’s lots and lots and lots of jobs. All these people that are wringing their hands and saying, “Oh, the future is nigh, and our children won’t have the same level of lifestyle we had”, that’s bunk. There’s lots of opportunity out there around the data and analytics fields, and that alone would employ everybody. Not everybody’s gonna want to do that; we need people to make chairs, and dig ditches, and run factories, and those kinds of things, too. But data and analytics is a very, very bright spot for all of us. Both of my kids went through Big Ten schools, Michigan and Illinois, and they’re both engineers. And they both work with data every day. So I’m living my own truth right there.
And it’s way better than digging ditches, I’ve gotta say.
I dug ditches. I dug graves when I was a kid, and it’s no fun being a grave digger. I can attest to that.
Yeah, yeah. Or painting fences. That was my first one. John, it’s been a real pleasure. Your book is available now on early access on Manning. We do have a permanent discount code with Manning, 40%. That’s pretty amazing. 40%. So listeners, the code is “podpracticalAI19”, and we’ll put that in our show notes as well. So please, take a look at that. We’ll put the link to the book in there, along with John’s other books.
It’s been a real pleasure, John. We’re excited to see the book take off, and also, whatever you write next. Excited to have you back on the show.
I’d love to. I enjoyed the conversation. I’m sorry to freak you out, Chris…
[laughs] I’ll get over it.
But yeah, when the new book comes out, we’ll do it again.