Practical AI – Episode #220
Causal inference
with Paul Hünermund, assistant professor at Copenhagen Business School
With all the LLM hype, it’s worth remembering that enterprise stakeholders want answers to “why” questions. Enter causal inference. Paul Hünermund has been doing research and writing on this topic for some time and joins us to introduce the topic. He also shares some relevant trends and some tips for getting started with methods including double machine learning, experimentation, difference-in-difference, and more.
Featuring
Sponsors
Fastly – Our bandwidth partner. Fastly powers fast, secure, and scalable digital experiences. Move beyond your content delivery network to their powerful edge cloud platform. Learn more at fastly.com
Fly.io – The home of Changelog.com — Deploy your apps and databases close to your users. In minutes you can run your Ruby, Go, Node, Deno, Python, or Elixir app (and databases!) all over the world. No ops required. Learn more at fly.io/changelog and check out the speedrun in their docs.
Changelog News – A podcast+newsletter combo that’s brief, entertaining & always on-point. Subscribe today.
Notes & Links
- How Can Causal Machine Learning Improve Business Decisions?
- Causal Inference is More than Fitting the Data Well
- Causal Data Science in Practice
- Causal Discovery
- DoWhy Github
- The Book of Why
- Causal Data Science Meeting
- Paul’s study on causal ML adoption in industry (incl. an overview of useful software packages in Table 3)
- Causal Data Science MOOC on Udemy
Chapters
Chapter Number | Chapter Start Time | Chapter Title | Chapter Duration
1 | 00:00 | Welcome to Practical AI | 00:43 |
2 | 00:43 | Intro to causality & Paul Hünermund | 04:52 |
3 | 05:35 | Why causality? | 02:36 |
4 | 08:11 | Determinism vs non-determinism | 02:50 |
5 | 11:01 | Gaining confidence | 03:05 |
6 | 14:06 | Sponsor: Changelog News | 01:37 |
7 | 15:53 | Main ways to use causal inference | 04:16 |
8 | 20:09 | Making it practical | 02:40 |
9 | 22:50 | First steps to take | 02:20 |
10 | 25:10 | Some helpful resources | 02:08 |
11 | 27:35 | Daniel's practical example | 05:26 |
12 | 33:01 | The effects of causal learning | 04:09 |
13 | 37:11 | Closing thoughts | 04:14 |
14 | 41:33 | Outro | 00:45 |
Transcript
Play the audio to listen along while you enjoy the transcript. 🎧
Welcome to another episode of Practical AI. This is Daniel Whitenack. I’m a data scientist building a tool called Prediction Guard, and I’m joined as always by my co-host, Chris Benson, who is a tech strategist at Lockheed Martin. How’re you doing, Chris?
I am doing very well. I’ve been watching you building Prediction Guard from afar, and looking forward to hearing more about it in the days ahead.
It’s a fun one. We’ll talk about it in more detail soon. And the causal reasons why I ended up doing those things that I’m doing.
Could that be a transition?
Yes. Speaking of cause and effect, and causal things, we’re really privileged today to have with us Paul Hünermund, who’s an assistant professor at Copenhagen Business School. Welcome, Paul.
Hi, Daniel. Hi, Chris. Thanks for having me.
Yeah, it’s great to have you here. And I think also this is so cool, because I think the topic that we’re going to talk about is so very practical, and important, because I’ve tried to do - as many of our listeners know, my wife owns a business, and I’ve tried to do a bunch of analytics or predictive things for her over the years, just out of either need or fun. And often, the question is “Why is the prediction that?” Or for a business person, they’re wanting to know what is the attribution? What is the behavior behind this thing that I’m seeing? So you’re an expert in causal AI, causal machine learning, and have been doing research in this area, and are very well-versed in it… And I’m wondering if you can, as you start out here, just like give a brief understanding to everyone about what do you mean when you say causal AI, or causal machine learning, and how maybe is that differentiated from what people might commonly think of when they think of AI or machine learning?
So there are many names - causal AI, causal machine learning… Causal inference, I think, is the more traditional term. But I think the basic idea is pretty intuitive for everyone who works with data: if we look at correlations and patterns in the data, sometimes they can produce quite surprising, and probably nonsense results. I mean, we all know the story about ice cream sales and shark attacks that are highly correlated over the course of the year. Chocolate consumption and Nobel Prize winners in a country, probably driven predominantly by Switzerland… Storks and babies - the stork population and fertility rates are correlated. We usually use these examples in the classroom as sort of a caveat, right? “Wait a minute, correlation is not causation.” People have heard this phrase. But then causal inference and causal machine learning is really this idea of taking causality seriously, and trying to build tools, algorithms that allow you to draw causal inferences from data, to distinguish cause and effect, and weed out these kinds of nonsense correlations. And yeah, that comes with a different tool set. You can approach this from a purely algorithmic point of view, and you would probably apply different tools than standard machine learning… It goes even deeper. There’s a whole epistemological point about it. If you want to do causal inference, you cannot do this in a purely model-free way; you actually need background knowledge, expert domain knowledge, in order to, for example, distinguish between possible alternative explanations… And that is a whole paradigm shift in terms of how we approach data, and how we approach machine learning.
Well, and then my last word on this, on what the difference to standard machine learning is - well, standard machine learning, the bulk of it, is really correlation-based. I mean, all the tools that we have - deep learning, support vector machines, and so on - these are predictive tools; prediction means correlation, finding and detecting patterns in data. And so they’re not suitable for this. I mean, there’s a branch of AI, reinforcement learning - maybe we could talk about this later - that goes more in the direction of actually intervening yourself; the learner itself intervenes in the environment. So that goes in the right direction, but it doesn’t get there the full way. This is the main difference to standard machine learning.
That kind of gets to the what, I guess, of what is causal AI, causal machine learning, causal inference. I guess the next question that’s probably a good foundational one is the why. And maybe this is even exacerbated in recent times. I don’t know if you’ve seen this, with all of this sort of hype around large language models that are incredibly non-interpretable, or produce things that are very factually incorrect, or unexplainable… But how should a data scientist working in an enterprise, let’s say - why should they care about causal inference, rather than just making good predictions, let’s say?
[06:09] Yeah, so maybe it helps if I first define what exactly I mean by causal inference, because there is actually a neat definition by James Woodward, who is a philosopher of science. He said that causal inference, or causal machine learning, is a special kind of prediction problem - so in a sense, it is a prediction problem, but here we are predicting the likely impact of an action, intervention or manipulation. Really this idea of: I do something, like I increase the chocolate consumption in a country; will that produce more Nobel Prize winners? …in this context.
I think if we approach from that perspective, you immediately see the value for business. Because in business, we are always asking these kinds of questions. What if we do X? What if we implement this new HR policy? What if we enter a new market? Should we invest in this product, or another product?
So business always involves actions, interventions, and we want to forecast, predict the likely outcomes of this. That would be what causal inference people call the interventional level; then one level above that is the counterfactual level. Counterfactual meaning we’re reasoning about two states of the world: had I not taken the aspirin this morning, would my headache still be worse today? These kinds of questions. And they are also very relevant in hindsight, in retrospect, like “Was it the HR policy that we implemented that improved employee satisfaction?” and so forth. So immediately relevant in all sorts of domains in the business world.
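A minimal sketch of this interventional level, with made-up numbers (not from the episode): in simulated data where a hidden common cause drives both the action and the outcome, the observational contrast P(outcome | action) differs from the interventional contrast P(outcome | do(action)).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Hypothetical setup: overall company health drives both whether an HR policy
# is adopted and how satisfied employees are afterwards.
health = rng.normal(size=n)
policy = (health + rng.normal(size=n) > 0).astype(int)   # observed, not randomized
noise = rng.normal(size=n)
satisfied = (0.5 * health + 0.2 * policy + noise > 0).astype(int)

# Observational contrast: P(satisfied | policy) - P(satisfied | no policy)
obs_gap = satisfied[policy == 1].mean() - satisfied[policy == 0].mean()

# Interventional contrast: force the policy on/off for everyone (the do-operator)
do_on = (0.5 * health + 0.2 * 1 + noise > 0).mean()
do_off = (0.5 * health + 0.2 * 0 + noise > 0).mean()

print(f"observational gap:  {obs_gap:.3f}")         # inflated by the common cause
print(f"interventional gap: {do_on - do_off:.3f}")  # effect of the action itself
```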
Specifically in AI, we’re talking about fundamental problems - fairness, robustness, explainability… And I believe that causal AI has something to say in all of these domains. So there is also an immediate practical value in this.
Let me ask you a question on – I want to throw in another term that we haven’t used yet, that sometimes gets thrown into casual causal conversation (say that 10 times fast): determinism versus non-determinism. Because we have a habit of applying that at a high level to AI models and saying “Ah, they’re non-deterministic”, and there’s been a certain expectation over the years, in training AI models that are non-deterministic, that you do have a disconnect in terms of understanding that causality from beginning to end. Can you distinguish a little bit between the two terms, in the sense that if someone is just getting into this, they’re early in data science, and they’re thinking “Wait a minute, I thought AI was non-deterministic, and yet causality is explainable” - how do those fit together? How do those slices of perspective on an AI model work together, where you have determinism, non-determinism, and causality or not? And what are the implications of those?
When I put out the definition, I used the term “likely impact”, and that already hints at the fact that in causal inference, too, the approaches that we have are probabilistic frameworks. So there is no determinism in this. What we’re interested in is still a probability, or a contrast of two probabilities. So I have the probability of a certain outcome that I care about, if I had taken this specific action, or if I hadn’t. These are counterfactual questions, but still, there is no determinism, in the sense that if I implement something, it will always work, or I will always have success with this product, and so forth.
There’s an interesting, I think, sort of intellectual history here, because the frameworks that we have in causal inference - directed acyclic graphs, developed by Judea Pearl and these kinds of people - they actually build on earlier work in AI, like Bayesian nets, for example. At that point still a purely predictive tool, [unintelligible 00:10:05.04] our tool to deal with complexity in terms of probabilities, because expert systems were too rigid. We figured that out in the ’70s and ‘80s. And building on that, once you reason probabilistically, people immediately made the mental shortcut of reasoning in terms of cause and effect, probably because it’s so intuitive to us. But the tools were actually not ready for that yet.
[10:32] So that was the intellectual history, how we moved from probabilistic AI frameworks to causal inference. And again, I think people immediately started to think that way because causality is such a fundamental concept for human thinking. We learn it very early in our development. Babies can think causally; there’s some psychology work that we pick that up at the age of two, or so. Pets sometimes can think causally, probably. So it’s a very fundamental concept.
Have you found in practically interacting – because I know you’re involved in the data science community as well, and have helped run events, and other things related to this topic… Have you found data scientists are sort of – because I could see some data scientists were so in the mindset of “We’re making a prediction. We probably understand that we’re thinking about correlations in many cases”, I think. But then it’s sort of scary for us to think about like “Well, I don’t know if I want to put out a – if I tell my executive “This is the reason that something happened”, I can see the value of it, but how confident can I be in that?” And that also gets to maybe people’s – I think people during COVID, and at other times, realized maybe how rusty they were on basic statistical and probabilistic types of concepts, where everyone was all of a sudden thinking about medical trials, and such. Have you found this sort of hesitation amongst data scientists as you’ve interacted with them? And what maybe are some steps that data scientists can take to gain confidence in initial thinking and education around this topic?
Yeah, so we talked with a lot of data scientists, industry practitioners, and I don’t think there’s hesitation; it’s actually the opposite, there’s lots of interest. Of course, this is sort of a new topic. You need to tool up in a different area. So that’s a step that you need to take. But many people are very curious… And we simply wanted to understand where we are right now, and we had a hunch about this: here is the toolbox that we know - predictive analytics, correlational AI… And then, based on that, what kind of questions do you practically address? Also maybe in the interplay with the broader organization, right? What is it that the executives want to know, what do they approach you with? And is there sort of a mismatch between the methods that you’re working with and the questions that are asked? And in our interviews - and we did some quantitative analysis on this too - we could clearly see this kind of mismatch: many questions that are asked do actually have this causal component to them, because forecasting actions and interventions is so ubiquitous. The standard tools are not up to the task for this, and that actually creates this interest in approaching causal inference, and looking beyond what we currently do.
So one interview stuck in my mind… It was an IT consultant who was working a lot in the data science field, and he said “Yeah, most of the questions that our clients ask are causal questions in the end. But what we do with them in the end is always some form of predictive analytics, deep learning, and so forth.” That always created this kind of tension in the projects that he was working on. So that was very eye-opening for us.
Well, Paul, you described really well how to think about this sort of causal inference, causal AI, causal machine learning generally, the importance of it… And you mentioned that in doing causal inference, you have sort of a different tool set, or maybe different algorithms that are applied. I know that one thing that of course I’ve done before and know about from various data science positions is experimentation, or hypothesis testing; like A/B testing. I know that only scratches the surface - you were talking about directed acyclic graphs, and other things… So could you give us a broad sketch of what the main categories of approaches within causal inference currently are, and how we can think about those?
Like from a really broad categorization, traditionally, people divided the field into experimental and observational methods. And experimental would be the A/B testing that you’re talking about. One of our interviewees even called it the big hammer that tech companies swing around, A/B testing. And it’s applied a lot, sometimes together with some form of multi-armed bandit reinforcement learning type of approaches, but often just this plain vanilla way… And that’s great, because - well, experiments are easy in many domains to set up, easy to understand, and you don’t need a lot of background knowledge. You simply try out different things; shades of a button on a website, classic example.
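For the plain-vanilla case, here's a minimal sketch of how such an A/B test is typically read out - the conversion counts are made up for illustration, and the two-proportion z-test is just one common choice:

```python
# Hypothetical A/B test of two button shades; conversion counts are invented
# purely for illustration.
from statsmodels.stats.proportion import proportions_ztest

conversions = [312, 268]   # variant A, variant B
visitors = [5000, 5000]

z_stat, p_value = proportions_ztest(count=conversions, nobs=visitors)
lift = conversions[0] / visitors[0] - conversions[1] / visitors[1]
print(f"absolute lift: {lift:.4f}, z = {z_stat:.2f}, p = {p_value:.3f}")
```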
But in other domains, it’s really not that simple, because - well, experiments can be very costly, and they can be unethical for many questions… I think you mentioned the COVID pandemic earlier - that was an interesting example to observe, because when we tested the vaccines, of course we did the standard clinical trials, which is an experimental method, an A/B test, if you want… That costs a lot of money, but we have these procedures for it, and we need to approve drugs in that way. But then, after we rolled out the vaccines, immediately there were follow-up questions, like for example, “Where is the vaccine more effective? Is it for the older population or the younger population?” Or “In which way do we need to roll out scarce vaccines?”, and so forth.
[18:22] These kinds of questions were not included in the controlled trial, so we didn’t have experimental evidence for them - we needed to answer this based on ex-post data. So people picking up vaccines, and then seeing where they’re most effective. That was interesting to see, because many of the questions that we ask in practice do involve this observational causal inference - and by observational causal inference I mean we don’t actively intervene ourselves, but we passively observe the data and still want to get cause and effect out of it, although we haven’t designed the experiment ourselves. So in a sense, we’re then trying to mimic a thought experiment, if you want, with observational data. And that creates all sorts of problems, because - well, those people who picked up the vaccine earlier are probably those who thought they had the most to gain from it, for example, so there’s this sort of self-selection bias, or confounding bias, in this, and we need to address all of these things.
These are the two main categories. And then within those categories, we have all sorts of different techniques, algorithms. Experimental design is an entire course catalog at our university. In the observational field - so, for example, I originally come from an econometrics background, and in econometrics, or in economics, we ask a lot of causal questions, and then we have tools like regression discontinuity design, difference-in-differences, or nearest neighbor matching, and so forth.
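Since difference-in-differences is name-checked here, a minimal sketch of the canonical two-period version on simulated data (the column names and the true effect of 2.0 are assumptions for illustration) - the estimate is just the coefficient on the interaction term:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 4000

# Simulated two-period data; column names and the true effect of 2.0 are
# assumptions for illustration.
df = pd.DataFrame({
    "treated": rng.integers(0, 2, n),   # belongs to the treated group
    "post":    rng.integers(0, 2, n),   # observed after the policy change
})
df["y"] = (1.0 * df["treated"]                   # fixed group difference
           + 0.5 * df["post"]                    # common time trend
           + 2.0 * df["treated"] * df["post"]    # true treatment effect
           + rng.normal(size=n))

# The coefficient on treated:post is the difference-in-differences estimate.
model = smf.ols("y ~ treated * post", data=df).fit()
print(model.params["treated:post"])
```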
The new kids on the block are the computer scientists, and they are catching up fast in causal inference; they’ve developed techniques like directed acyclic graphs, causal reinforcement learning… So there are all sorts of exciting streams of literature coming up these days.
I’m really trying to absorb what you’re saying, and it’s very interesting… I’m kind of wondering - say I have a problem today, like before our conversation, and I want to go through the typical data prep, and model training, and model testing to deployment… But now I’ve listened to you and I want to start implementing causal approaches in my workflow. How does my workflow change? What does it look like with typical tools now, and where might gaps be in the typical toolchain that we currently have? How do we make it practical and go do it after the show?
It starts from the epistemological challenge that we cannot do causal inference in a purely data-driven way. We cannot just optimize a target function, or look at our confusion matrix or loss function in that sense; we need to complement this with background knowledge - well, in the simple examples, it’s not just ice cream sales and shark attacks; there’s a third variable lurking which we need to consider, which is probably weather, or sunshine. So this is in the simple case, but now imagine a problem that you approach for the first time - you do exploratory research, so you don’t have this good theory, so we need to do something about this.
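To make that lurking third variable concrete, a small simulation with made-up numbers: temperature drives both ice cream sales and shark attacks, the raw correlation between the two is strong, and a regression that adjusts for temperature attributes essentially nothing to ice cream.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
n = 365

temperature = rng.normal(20, 8, n)                        # the lurking common cause
ice_cream = 50 + 3 * temperature + rng.normal(0, 10, n)   # sales driven by heat
sharks = 2 + 0.3 * temperature + rng.normal(0, 2, n)      # attacks driven by heat, not ice cream

print("raw correlation:", np.corrcoef(ice_cream, sharks)[0, 1])

# Adjusting for the confounder: ice cream's coefficient collapses toward zero.
X = sm.add_constant(np.column_stack([ice_cream, temperature]))
fit = sm.OLS(sharks, X).fit()
print("ice cream coefficient, controlling for temperature:", fit.params[1])
```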
Then a lot of the standard challenges that we have - collecting good data, maybe designing an A/B test - are the same, but there’s this additional step of bringing in background knowledge… And there it depends, I guess, on how the data science team is structured in an organization. Do we need to bring in outside stakeholders? Do we maybe need to talk with the marketing people, or the logistics people, depending on the project? Often, at the moment, data science teams are almost this kind of in-house consulting type, and there are, for example, not that many mixed teams that could bring in this background expert domain knowledge.
[22:03] Practically speaking, there are all sorts of tools out there in the standard software languages. It’s probably a little bit of a scattered landscape, so you really need to know what kind of libraries are out there… For example, in Python the DoWhy package by Microsoft really became a standard industry-wide, because they also have this kind of causal inference pipeline implemented in the package, which starts from modeling a specific domain or phenomenon, to applying the causal inference algorithms, getting causes and effects out, and then also refuting the model, or challenging the model. So you have this kind of step-by-step procedure that can really help you in getting started and getting results quickly.
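That four-step pipeline (model, identify, estimate, refute) maps onto the DoWhy package linked in the show notes. Here's a minimal sketch on simulated data - the variable names, the simulated effect of 0.5, and the particular estimator and refuter are assumptions for illustration, not anything from the episode:

```python
import numpy as np
import pandas as pd
from dowhy import CausalModel

rng = np.random.default_rng(0)
n = 5000

# Simulated data with one common cause w; the true effect of t on y is 0.5.
w = rng.normal(size=n)
t = (w + rng.normal(size=n) > 0).astype(int)
y = 0.5 * t + w + rng.normal(size=n)
df = pd.DataFrame({"w": w, "t": t, "y": y})

# 1) Model: encode the background knowledge (w is a common cause of t and y).
model = CausalModel(data=df, treatment="t", outcome="y", common_causes=["w"])

# 2) Identify: derive an estimand (here, backdoor adjustment for w).
estimand = model.identify_effect()

# 3) Estimate: apply a statistical method to the identified estimand.
estimate = model.estimate_effect(estimand,
                                 method_name="backdoor.linear_regression")
print(estimate.value)   # should land close to 0.5

# 4) Refute: challenge the result, e.g. by adding a random common cause.
refutation = model.refute_estimate(estimand, estimate,
                                   method_name="random_common_cause")
print(refutation)
```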
One of the things that you said that I wanted to ask kind of a clarifying question about - you talked about going to that kind of external source, the extra authority, if you will. A lot of practitioners these days are kind of starting on their own, if they’re not on a big data science team… I work in a big company, and we have tons of data scientists, so this probably doesn’t apply to me in that capacity… But a lot of people in startups are out there trying to delve into new businesses, and stuff, and they may not have access to that kind of outside-the-data expertise to apply. Do you have any tips or guidance on - if you’re that practitioner, and you’re trying to solve a problem for which you don’t have that external expertise, how would you go about tackling that? How would you go about saying, “It’s me, myself, and I joking around, and this is a way I can apply causal approaches when I don’t have a lot of resources available to me”?
First of all, I would say you’re never really alone…
True.
So think outside of the box a little bit. I mean, often it doesn’t take that much. You can just approach people and maybe talk with them for an hour and get the insights out that you need. Consulting the scientific literature on a certain topic can probably help too, in sort of figuring out alternative explanations that you can then bring to the data and test against the data.
We’re also not completely helpless, in the sense that everything has to come from theory. There are data-driven approaches - that would be the area of causal discovery - that we can apply to get closer to a causal model based on the relationships that we find in the data. We know that never gets us 100% of the way, so we will always need to complement it with some form of background knowledge… But it can already help.
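As a taste of what causal discovery automates, here's a hedged sketch (a simulated chain X → Z → Y, not from the episode) of the conditional-independence reasoning constraint-based algorithms such as PC rely on: X and Y are correlated, but become roughly independent once we condition on Z, which rules some candidate graphs out.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 50_000

# True structure (assumed for this sketch): x -> z -> y, a simple chain.
x = rng.normal(size=n)
z = x + rng.normal(size=n)
y = z + rng.normal(size=n)

def partial_corr(a, b, given):
    """Correlation of a and b after regressing the conditioning variable out of both."""
    residual = lambda v: v - np.polyval(np.polyfit(given, v, 1), given)
    return np.corrcoef(residual(a), residual(b))[0, 1]

print("corr(x, y):    ", np.corrcoef(x, y)[0, 1])   # clearly non-zero
print("corr(x, y | z):", partial_corr(x, y, z))     # roughly zero
# Constraint-based discovery algorithms (e.g. the PC algorithm) run many such
# independence tests to narrow down which causal graphs fit the data.
```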
And then I would say – I mean, talking also to practitioners, I think the 80/20 rule, as it’s called, applies here. I mean, already getting closer to something causal is often good enough, and we should get away from this idea that it’s 0 or 1, right? That either it’s causal, or it’s not. Often we get closer to the truth, and if not - well, there’s a whole set of tools for sensitivity analysis with which we can challenge our assumptions and see how robust they are. And I think in practice, this already helps tremendously.
You mentioned reaching out to practitioners in the community around this. Could you describe a little bit – I know, like I mentioned earlier, there are some resources that you’ve kind of helped co-found and run over time related to this. Could you mention those, so that people could find those as they’re looking into the topic?
Based on what we identified and where the field is, we actually saw the need for more exchange between different academic fields, because causal inference is almost such a general-purpose technology; it’s applied in various different fields. I’ve mentioned economics, computer science, epidemiology, health sciences… But then also practitioners. So it’s really a mixed group. So we set up the annual Causal Data Science Meeting. We started in 2020, so we had to do it online because of the COVID pandemic, and then realized that it’s really an easy way to get people into one virtual room, in this case, and there was lots of interest from practitioners… And we’re gonna have the third iteration of this this year, in November; so there’s still some time. But hopefully, listeners will make a mental note.
[26:18] Well, there are also good teaching tutorials out there, many blog posts, online courses that you can sign up to. Books, like “The Book of Why” by Judea Pearl; maybe not really a textbook, but really drives home the idea of why causal inference is so important. It has really nice historical anecdotes, because Judea is really a giant in this field.
“Causal Inference: The Mixtape” by Scott Cunningham, if you have more of an econ background. Or perhaps “The Effect” by Nick Huntington-Klein. So these are all beginner-friendly textbooks that you can pick up.
And then trying out the different packages, like [unintelligible 00:26:54.06] in Python, for example… There’s a startup called Geminos that is developing causal inference software, and they have free trial versions where you can start out drawing your directed acyclic graphs and see how answers change if you change assumptions, for example. So I think that is usually the best way to learn and pick this up.
Well, Paul, I’m selfishly going to present you maybe with a scenario and do some sort of on-the-fly problem solving. I figure you’re probably good at that, being a professor and always solving problems with students, and others, and colleagues. So I mentioned my wife runs a business; it’s a candle manufacturing business, and there’s actually this sort of like Why question that we’ve been talking about a little bit… So last year, to give context – I just logged into Shopify. So last year, they had 87,837 orders. Each of those orders, or at least most of them, when they shipped them, included a free sample two-ounce candle. It’s like a freebie add-on. And over time, the assumption has always been “Oh, people really like that”, and it’s sort of like part of the package that they get; it increases reorder value, right? Like, they see the package and they’re “Oh, cool, I got like a free candle. I love these people forever, and I’m going to reorder”, right?
Well, the question has come up - obviously, at this scale of orders, that’s a lot of free two-ounce candles. And even just the savings from those would be huge. So how might you as a practitioner, or someone thinking about this problem, in terms of either the experimental or the observational approaches - what might be some ways to dig into this? Obviously, it’s very expensive to get different packaging and do an experiment at that scale… So it’d be nice to know without doing a large-scale shift in packaging, and that sort of thing… Any tips for me?
In this case, the big advantage is that you, or your wife - you’re actually controlling this process. So you decide in which packages to put the free add-on, and in that sense, you immediately understand the selection process, or the treatment assignment, as we would call it here. And in that situation, I think an experimental approach would be the way to go. Then it becomes more of a statistical question, like how large your sample needs to be in order to draw robust conclusions. And if it’s just a yes/no question about a free sample or not, the experiment can probably be quite small. That relates a little bit to the COVID example that I discussed earlier.
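On "how large your sample needs to be", a quick power-calculation sketch with statsmodels - the baseline reorder rate of 20% and the hoped-for lift to 22% are invented numbers, purely to show the mechanics:

```python
# Hypothetical numbers: baseline reorder rate 20%, hoped-for lift to 22%,
# two-sided test at 5% significance with 80% power.
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

effect = proportion_effectsize(0.22, 0.20)
n_per_arm = NormalIndPower().solve_power(effect_size=effect, alpha=0.05,
                                         power=0.8, alternative="two-sided")
print(f"orders needed per arm: {n_per_arm:.0f}")
```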
[30:09] Probably you want to broaden that up and think about, for example, heterogeneous treatment effects, in the sense of “Well, is it high-volume customers that like this free add-on the most? Or is it more the casual shoppers?” And suddenly you have four groups that you’re catering to, because - well, high-volume, low-volume, and treatment and control. These are problems that always come up in causal inference, so tools like causal random forests were developed exactly for that problem - how do you efficiently partition the population in order to reduce the costs associated with an experiment? You want to be as cost-efficient as possible, but still get robust conclusions out.
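A hand-rolled way to peek at heterogeneous treatment effects once an experiment has run is to compare the treatment-control gap within each customer segment; causal forests do a data-driven version of this partitioning. A sketch with hypothetical column names and an assumed ground truth where only casual shoppers respond to the freebie:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
n = 10_000

# Hypothetical experiment: the freebie is randomized, customers fall into
# two segments. Column names are invented for illustration.
df = pd.DataFrame({
    "segment": rng.choice(["high_volume", "casual"], size=n),
    "freebie": rng.integers(0, 2, size=n),
})
# Assumed ground truth: the freebie only moves casual shoppers (+10 points).
base = np.where(df["segment"] == "high_volume", 0.60, 0.20)
lift = np.where(df["segment"] == "casual", 0.10, 0.0) * df["freebie"]
df["reordered"] = (rng.random(n) < base + lift).astype(int)

# Treatment-control gap within each segment (a hand-rolled CATE estimate).
rates = df.groupby(["segment", "freebie"])["reordered"].mean().unstack("freebie")
rates["effect"] = rates[1] - rates[0]
print(rates)
```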
A similar problem then arises with - I mentioned earlier robustness of findings, right? So transfer learning is a big topic in AI. So let’s assume you’ve done this experiment at this point in time, and you’ve found a robust treatment effect - people react to it, it increases reorder value… The question then is, “Six months from now, will the world have changed, or will these results still be valid?” And maybe you’re thinking that in six months not so much will have changed in the business, but for platforms like booking.com, hotel bookings - for them it’s very relevant, because you have people that book hotels, and some are leisure travelers, and they are very different from business travelers, for example. So you have this kind of problem: can you transfer causal knowledge that you have obtained to a different domain? Because that would save you on experimentation costs, for example.
So yeah, all sorts of interesting questions in that domain. But the big advantage is you can actually run experiments here. In other domains we are relying on data where people self-select into something - like, for example, the standard question in economics about the returns to a college education. There we could not randomly assign people to colleges, or decide whether they can go to college or not. There we have to rely on them self-selecting into sort of treatment and control groups. And then there’s always the question of whether we really have an apples-to-apples comparison, or is it perhaps apples to oranges?
I just want to say, I think you’re lucky that you got Daniel’s example question, because coming from my industry, I would have had to ask, like, I don’t know, hypersonic missile design, or something, and I don’t think we wanna go there.
This is a great thing about the podcast, right? We get to have like the expert on, and I get to selfishly ask the question that helps me in my day to day, so…
Excellent way of getting some free consulting in there.
Yeah.
So I wanted to actually take you back to something that you mentioned a little while ago… We were kind of talking about the benefits of causal inference, and you brought up reinforcement learning, but we were generally talking about kind of fairness, bias, robustness, the impact of causal on those… Could you kind of go back to that point and kind of talk a little bit about what that means? These are huge topics that are in all of the different branches of AI right now, and it’s on everyone’s mind, especially with all the advances this year. How does causal affect that worldview of doing these amazing things, in these different branches of AI, but doing it without bias, doing it fairly, such as that?
I’ll start with fairness, because that’s actually the very first example that I use in my own causality/causal inference course here at Copenhagen Business School. It’s a case taken from Google, actually, from a while ago, I think in 2019. Well, already earlier - the story goes back longer - but they had been accused of underpaying women in their organization. So there we have a classic example of a protected attribute, like gender, race, and so forth, and we want to prevent bias in some form of automated or semi-automated decision-making, right? And that comes up all the time. I mean, in loan acceptance models, for example, we want to remove bias, and so forth.
[34:23] So, to make the story quick - they had been accused of underpaying women in their organization, and then they did a fairly sophisticated analysis, published a whitepaper, and the result of that analysis was that they found they were actually underpaying men; at least they thought so. And not only men, but actually high-level software engineers, so high-seniority software engineers at Google. And then, because they’re committed to fairness in their organization, they actually raised salary levels for these high-level software engineers based on the analysis. So it also had a practical component to it, or a policy implication.
We cannot analyze this case here in detail, but if you do that analysis, it’s very likely that they actually made some fairly common causal inference mistakes - they conditioned on some variables that are downstream, that are affected by gender, like occupation, for example… And if you have discrimination already at that stage - that, for example, women don’t have it so easy to get into high-level positions, for various reasons that we know of - then that would be a classic mistake, and you can produce these kinds of, again, nonsensical correlations in the end, like the sharks and the ice cream.
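To see the shape of the mistake being described, a toy simulation with made-up effect sizes (not Google's data): gender affects seniority, seniority affects pay, and there is also a direct pay penalty; controlling for the downstream seniority variable hides the part of the gap that works through blocked promotions.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(11)
n = 20_000

female = rng.integers(0, 2, n)
# Assumed world: women are less likely to reach senior roles (upstream
# discrimination), and there is also a direct pay penalty of -2 within a level.
senior = (rng.random(n) < np.where(female == 1, 0.2, 0.4)).astype(int)
pay = 50 + 20 * senior - 2 * female + rng.normal(0, 5, n)

def gap(controls):
    """Coefficient on `female` in a regression of pay on female + controls."""
    X = sm.add_constant(np.column_stack([female] + controls))
    return sm.OLS(pay, X).fit().params[1]

print("total gap, no controls:       ", gap([]))        # roughly -6
print("gap controlling for seniority:", gap([senior]))  # roughly -2
# Conditioning on the downstream variable hides the part of the gap that
# operates through blocked promotions.
```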
That’s one example that you can actually easily transport to other kinds of questions - like I mentioned, algorithmic bias. And that’s a causal question, because if you don’t understand how variables in your model causally interact and relate to each other, you cannot answer this question, you cannot decide how to correctly analyze the data.
Robustness, I mentioned – so the transportability, transfer learning kind of aspect of experimental knowledge, and there causal inference techniques have been developed… Also dealing with selection bias in data - so a dataset that might not be a representative sample of the population that you care about, but is measured with some form of selection bias, because only happy customers answer your consumer survey, or only unhappy customers, but no one in between answers these questions…
And then lastly, explainability - I think explainability almost comes for free with causal inference. I mean, don’t get me wrong, causal inference is a hard task, but once you solve it, explainability almost comes for free, because - well, I mentioned “The Book of Why”, right? Causal questions are always related to why questions, and counterfactuals as well… Like, “Why did my headache go away? Was it because I took the aspirin this morning?” I mentioned this example. This is the way we reason, this is the way we explain things to other humans, for example, and so there’s an immediate connection to explainability.
That’s a really great way to think about this. And it gets me thinking like what will be the impacts of these two fields as they interact more over the coming years… And I’m wondering, from your perspective, because you’re so plugged into the research that’s going on in this area, but also the practical side of this, and how data scientists are beginning to use these techniques… What, as you look forward to the next, let’s say year, or whatever time period you want to have there - like, what gets you excited, or what trends would you like to highlight, that maybe people should be thinking about in this field? Or maybe it’s just things that you’re excited about in terms of new opportunities, or new methods, or whatever it might be?
[37:51] Yeah, I just attended a conference last week in Tübingen, Germany, the Causal Learning and Reasoning conference, and it was just exciting to see how many young minds were attending. It was a very young audience - a lot of grad students, in computer science specifically, and a bunch of senior people as well… But that really showed me that this seems to be the next big thing in AI, and people have confirmed that to us: there’s more and more interest on the academic side, but we also see it in practice, in industry.
Yeah, so I’m excited about - well, experimental design, what I mentioned earlier, heterogeneous treatment effects… For example, not only being satisfied with having one average treatment effect, or average causal effect - one number, and this is what we’re expecting - but actually making this more fine-grained, opening it up, and answering questions like “Is it old people or young people that benefit more from vaccines? In which way do we need to roll that out in the most effective way?”
Then on the observational side, I think causal discovery is really promising. This is really the idea of how far can we go with simply trying to get causality out of observational data. We will never get 100%, I mentioned that, but how far can we go? One big challenge in that area is, for example, to have good benchmarking datasets. In machine learning, that’s usually easy - you divide a sample up into a training and a benchmarking dataset. With causality, that’s not so easy. You often need an experimental benchmark, for example. A lot of work has been done in genomic research, where you can knock out genes, for example, in an experimental way, so that is really exciting.
There’s new work on causal root cause analysis by Amazon, for example… So figuring out what actually causes outliers, even in an engineering system. So you mentioned, Chris, that you’re working in that area; I’ve seen, for example, companies in the defense industry thinking about this problem of root cause analysis.
Lastly, perhaps - because originally, before I came to causal inference, I was actually trained in economics, like I mentioned, specifically in innovation economics - so this idea of how we produce knowledge as a society, how knowledge spreads across society… And in causal inference there are new lines of work thinking about interactions between treatments. So not just the idea that I take a pill and I get an outcome from that, but “You take a pill, and that reduces the viral load in our community, and that’s why I actually have a lower likelihood of getting sick”, for example. So these kinds of interactions between people are really important, I think, in many domains, and specifically also in the way knowledge spreads across networks. So that is something I’m really excited about.
Awesome. Well, I am really, really happy that we got to have this conversation on the podcast, because I think it highlights something that’s a real complement to many of the things people are exploring around deep learning, and large language models, and other things. This is a really important piece of the practical side of what data scientists are doing in the enterprise. So yeah, thank you so much, and thank you for your research on the topic, and also for engaging the community around this. It’s really great, and I’m really happy to have had you on the podcast.
Thank you. I really enjoyed the conversation.
Our transcripts are open source on GitHub. Improvements are welcome. 💚