Causal inference
With all the LLM hype, it’s worth remembering that enterprise stakeholders want answers to “why” questions. Enter causal inference. Paul Hünermund has been researching and writing on this topic for some time, and he joins us to introduce it. He also shares some relevant trends and tips for getting started with methods including double machine learning, experimentation, difference-in-differences, and more.
Matched from the episode's transcript 👇
Paul Hünermund: I’ll start with fairness, because that’s actually the very first example that I use in my own Causal Inference course here at Copenhagen Business School. It’s a case taken from Google, actually, from a while ago - I think in 2019. Well, already earlier - the story goes back longer, but they had been accused of underpaying women in their organization. So there we have a classic example of a protected attribute, like gender, race, and so forth, and we want to prevent bias in some form of automated or semi-automated decision-making, right? And that comes up all the time. I mean, in loan acceptance models, for example, we want to remove bias, and so forth.
[34:23] So to make the story quick: they had been accused of underpaying women in their organization, and then they did a fairly sophisticated analysis, published a whitepaper, and the result of that analysis was that they found they were actually underpaying men; at least they thought so. And not only men, but specifically high-level software engineers - so high-seniority software engineers at Google. And then, because they’re committed to fairness in their organization, they actually raised salary levels for these high-level software engineers based on the analysis. So it also had a practical component to it, or a policy implication.
We cannot analyze this case here in detail, but if you do that analysis, it’s very likely that they actually made some fairly common causal inference mistakes - they conditioned on variables that are downstream, that are affected by gender, like occupation, for example… And if you have discrimination already at that stage - that, for example, women don’t have it so easy to get into high-level positions, for various reasons that we know of - then that would be a classic mistake, and you can produce these kinds of, again, nonsensical correlations in the end, like the sharks and the ice cream.
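The mistake described here - conditioning on a variable downstream of the protected attribute - can be illustrated with a quick simulation. All numbers below are hypothetical, chosen only to demonstrate the mechanism, and have nothing to do with Google’s actual data: gender affects promotion into senior roles (the upstream discrimination), senior roles pay more, and comparing salaries *within* job level then makes the gap disappear.

```python
import numpy as np

# Hypothetical data-generating process (illustrative numbers only):
# gender -> seniority -> salary. There is no direct gender effect on
# salary; the entire gap runs through unequal promotion rates.
rng = np.random.default_rng(0)
n = 100_000
female = rng.random(n) < 0.5

# Upstream discrimination: women reach senior roles less often.
senior = rng.random(n) < np.where(female, 0.2, 0.4)

# Salary depends on seniority plus noise.
salary = 60_000 + 40_000 * senior + rng.normal(0, 5_000, n)

# Unconditional comparison: women earn less on average.
gap_total = salary[~female].mean() - salary[female].mean()

# Conditioning on the downstream variable "senior" hides the gap:
# within each job level, men and women earn about the same.
gap_within_senior = salary[~female & senior].mean() - salary[female & senior].mean()
gap_within_junior = salary[~female & ~senior].mean() - salary[female & ~senior].mean()

print(f"raw gender gap:      {gap_total:8.0f}")
print(f"gap within seniors:  {gap_within_senior:8.0f}")
print(f"gap within juniors:  {gap_within_junior:8.0f}")
```

The raw gap is large, while the within-level gaps are near zero - so an analysis that controls for occupation level would conclude there is no pay discrimination, even though discrimination drives the whole picture. Which analysis is right depends on the causal structure, not on the data alone.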
That’s one example that you can actually easily transport to other kinds of questions - like I mentioned, algorithmic bias. And that’s a causal question, because if you don’t understand how variables in your model causally interact and relate to each other, you cannot answer this question, you cannot decide how to correctly analyze the data.
Robustness, I mentioned - so the transportability, transfer learning kind of aspect of experimental knowledge, for which causal inference techniques have been developed… Also dealing with selection bias in data - so a dataset that might not be a representative sample of the population that you care about, but is measured with some form of selection bias, because only happy customers answer your consumer survey, or unhappy customers, but no one in between answers these questions…
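The survey example can be sketched in a few lines. This is a hypothetical simulation (made-up scale and response model, not from any real survey): if the probability of answering rises with satisfaction, the survey mean overstates the population mean.

```python
import numpy as np

# Hypothetical sketch of survey selection bias: satisfaction is on a
# roughly 0-10 scale, but we only observe scores from customers who
# choose to respond.
rng = np.random.default_rng(1)
satisfaction = rng.normal(5.0, 2.0, 200_000)  # true population scores

# Assumed response model: happier customers are more likely to answer,
# so the observed sample is selected on the outcome itself.
respond = rng.random(satisfaction.size) < satisfaction.clip(0, 10) / 10
survey = satisfaction[respond]

print(f"population mean: {satisfaction.mean():.2f}")
print(f"survey mean:     {survey.mean():.2f}")
print(f"response rate:   {respond.mean():.1%}")
```

The survey mean comes out noticeably above the population mean, even though every individual answer is recorded accurately - the bias is entirely in who ends up in the sample.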
And then lastly, explainability - I think explainability almost comes for free with causal inference. I mean, don’t get me wrong, causal inference is a hard task, but once you solve it, explainability almost comes for free, because - well, I mentioned “The Book of Why”, right? So causal questions are always related to why questions, counterfactuals as well… Like, “Why did my headache go away? Was it because I took the aspirin this morning?” I mentioned this example. This is the way we reason, this is the way we explain things to other humans, for example, and so there’s an immediate connection to explainability.