Fighting bias in AI (and in hiring) with Lindsey Zuloaga (Practical AI #17)


Lindsey Zuloaga joins us to discuss bias in hiring, bias in AI, and how we can fight bias in hiring with AI. Lindsey tells us about her experiences fighting bias at HireVue, where she is director of data science, and she gives some practical advice to AI practitioners about fairness in models and data.


41 minutes
Recorded Oct 16, 2018
Published Oct 22, 2018

AI (Artificial Intelligence)

Featuring

Lindsey Zuloaga
Chris Benson
Daniel Whitenack

Sponsors

DigitalOcean – DigitalOcean is simplicity at scale. Whether your business is running one virtual machine or ten thousand, DigitalOcean gets out of your way so your team can build, deploy, and scale faster and more efficiently. New accounts get $100 in credit to use in your first 60 days.

Fastly – Our bandwidth partner. Fastly powers fast, secure, and scalable digital experiences. Move beyond your content delivery network to their powerful edge cloud platform. Learn more at fastly.com.

Rollbar – We catch our errors before our users do because of Rollbar. Resolve errors in minutes, and deploy your code with confidence. Learn more at rollbar.com/changelog.

Linode – Our cloud server of choice. Deploy a fast, efficient, native SSD cloud server for only $5/month. Get 4 months free using the code changelog2018. Start your server - head to linode.com/changelog


Transcript


Daniel Whitenack

Welcome to another Practical AI. Chris, I know that you’ve had a number of jobs throughout your career… Was the hiring process always super-smooth for you?

Chris Benson

[laughs] Anything but. I’ve been hired more than a few times and I’ve hired lots of people over the years, and no, for me at least it’s been way, way more art than science. I’m looking forward to maybe learning something here.

Daniel Whitenack

We’ve got Lindsey Zuloaga with us. Welcome, Lindsey.

Lindsey Zuloaga

Hi, nice to be here.

Daniel Whitenack

Did I get the name right?

Lindsey Zuloaga

Yeah, you did good.

Daniel Whitenack

Okay, perfect. Well, I’m excited to have you on the show today. Like Chris, I’ve had some awkward experiences in the hiring process. I’ve done well at interviewing, I’ve crashed in the interviewing process, I’ve done well and badly at assessments and coding things, and we’re just excited to have you because we’re gonna be talking today about your work with AI and hiring, and also bias in AI… So it’s super-great to have you here.

It’d be great if we could just hear a little bit about your background. I know you started out in academia, and then eventually moved into industry. Give us a little bit of your story.

Lindsey Zuloaga

Sure. I studied physics, so I did my undergrad here in Utah, at the University of Utah, and then I did a master’s and Ph.D. at Rice University in Houston, Texas, and a post-doc in Germany, as well. During that time, I was in the field of nanophotonics, so I was doing experiments on how nanoparticles interact with light, so building laser setups… A pretty different world than what I’m in now. When I went into graduate school, I really wanted to work with my hands. I thought I didn’t want to sit at a computer all day, but to my surprise, what I actually enjoyed the most about my work was writing code to analyze data. So when I did transition into industry, data science ended up being a really good fit. It relies on a lot of the same problem-solving skills that I had learned; obviously, having a strong math background was useful, as was writing code to analyze data… So it was a good fit, the right place at the right time.

My transition - and you guys talked about job interviewing - I’ve written a blog post about this, but my transition from Academia to industry was a lot more difficult than I expected. I was doing well in Academia, so I thought it’d be easy for me to transition into industry, and I really was naive about the importance of connections; I had a CV - not a resume, but a CV - with publications on it, and things that people in the industry don’t really care about.

[04:10] So I came into the whole industry job world a little naive, and ended up applying for a lot of jobs online, and going through this process many people have probably been through, where you apply for a job through what’s called an “applicant tracking system” (ATS). You enter all your information in, you upload your resume, and then you have to re-enter all your information in… And then all your information kind of gets parsed into plain text, and you finally submit, and you’ve spent all this time trying to personalize your cover letter, and you submit and you just never hear anything again. It’s kind of this black hole…

Chris Benson

I hate those systems, whether I’m an applicant or a hiring manager; either way, they’re terrible.

Daniel Whitenack

Yeah, I hate how you format your resume perfectly and you get it all flashy-looking, and then you go through the system and then you realize that you just have to put it in as plain text, or something, and all of that work is for naught.

Lindsey Zuloaga

Exactly. And there’s a lot of gaming the system, which I didn’t know, and maybe I would have benefitted from knowing this at the time… There’s actually a website - I can’t remember what it’s called - that you can go to… You put in a job posting and then you put in your cover letter and your resume, and it’ll tell you the likelihood of getting past these ATS filters… And a lot of times it’s really just keyword matching. I always felt like that was a little weird, to just put the exact keywords that are in the job posting in my application, but it turns out that does help you get past these filters. A lot of times these filters are pretty simple; they’re looking for certain keywords, or they’re looking for a certain school you went to, or GPA, which is silly, because we’ve found that a lot of times that doesn’t really tie to job performance very strongly at all… But the bottom line is companies just have so many applicants, and they need some way of filtering through people.

I went away from the experience feeling like something’s wrong with this system. Me and so many people that I know who had Ph.D.s did really well once we got a job, and yet we were just passed up by so many companies. The companies are losing out, as well. That story is kind of my motivation for why I care about hiring and fixing this broken system.
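To make the keyword-matching idea concrete, here is a minimal sketch of the kind of simple filter described above. It is an illustration only: the function name, the sample keywords, and the scoring rule (fraction of single-word keywords found in the text) are assumptions, not a description of any real applicant tracking system.

```python
import re

def keyword_match_score(posting_keywords, application_text):
    """Return the fraction of single-word keywords that appear in the text.

    Hypothetical scoring rule for illustration; real ATS filters are not
    described in the episode beyond "keyword match".
    """
    # Lowercase the text and split it into word-like tokens.
    words = set(re.findall(r"[a-z0-9+#]+", application_text.lower()))
    keywords = [k.lower() for k in posting_keywords]
    if not keywords:
        return 0.0
    hits = sum(1 for k in keywords if k in words)
    return hits / len(keywords)

# Made-up posting keywords and resume text.
keywords = ["python", "statistics", "sql"]
resume = "Physicist with ten years of Python and statistics experience."
score = keyword_match_score(keywords, resume)
print(f"keyword match: {score:.2f}")
```

A filter like this rewards echoing the posting's exact wording, which is why a strong candidate who describes the same skills in different words can be screened out.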

Daniel Whitenack

Now you’re a director of data science at HireVue. Is that correct?

Lindsey Zuloaga

Yes.

Daniel Whitenack

And what does HireVue do? Is it one of these systems, or what do you work on and what does the company do?

Lindsey Zuloaga

We’re not an applicant tracking system, although we do a lot of times have to work with them, integrate into them… We are a video interviewing company. Our philosophy is that a resume and a cover letter are not a very good representation of a person. We started years ago with a video interviewing platform, and our most popular product is called an On-demand Interview, which is asynchronous. Our customers are companies that create interviews; they can have any different types of questions, they can have a professional football player ask the question, they can do all sorts of interesting things, but they can send the same interview out to many different people, and the people record themselves answering questions on their own time. Then the companies can review those interviews on their own time.

That’s a very popular product, and that was our main product for a long time, because it kind of replaces this resume/phone screening initial phase of the funnel… Because we’ve all experienced looking at resumes and they all look the same, and it’s really hard to differentiate people. But once you hear them talk about what they’re interested in and how they communicate, you can get a better feel for who they are. So we had a lot of success with that product, but we still had this issue of volume, like I said before. Companies are just getting so many applicants that it’s impossible for them to actually look at all of them, and the way it ends up going today is that a lot of people are just randomly ignored.

[08:10] So we started building our AI product a few years back, where we said we have all this rich data from job interviews, and if our customers can tell us who ended up being good at a job and who is bad at a job - and what that is depends totally on the job, so we have some performance metrics around “This person was a really good salesperson. They sold a lot. This person wasn’t”, can we train algorithms to notice patterns between people who are top performers in a job and others? That is the assessments product that I work on.

Chris Benson

So would it be fair to say that you’re focusing on using machine learning to take the bias out of the process of hiring, and if so, how does that work? How does that manifest itself? How do you train to get rid of that?

Lindsey Zuloaga

Yeah, it is a common question that we get pretty immediately when people hear about what we do - a little bit of like “Oh, this is creepy. How do you know what the algorithm is doing? How do you know it’s not biased?” Algorithms are really good at stereotyping, and that can be an issue anywhere where AI is used. If there’s any bias in the training data, or just even under-representation in the training data of certain groups, the algorithm could mirror that bias.

Daniel Whitenack

Do you mean if there’s only a representation of a certain type of candidates, let’s say, then your algorithm might behave differently when it’s trained on that data, according to when it sees those candidates, versus candidates that weren’t in the training pool? Is that a fair statement?

Lindsey Zuloaga

Sure. I think an even bigger issue is if there’s a small number of – say there’s only one female software engineer, and she wasn’t very good; then the algorithm takes that and says “Oh, every time I’ve seen someone act like this or talk like this, they were bad.” So if there’s no one, the algorithm doesn’t learn as strong of patterns, although it could, and that’s something you wanna look out for… But a lot of times under-representation or just explicit bias in the data, which we do sometimes see, and depending on how subjective that performance metric is, that can be strong… And depending on the country as well, we’ve seen it vary; manager ratings, and things that are subjective like that… So we definitely prefer objective metrics, like sales numbers, call handle time, productivity measures, things like that.

Daniel Whitenack

I’m curious, have you had more of a challenge on this front in certain industries? I’m not sure which industries HireVue is working with; you’ve mentioned sales a little bit, maybe software engineering… Do you have to approach this as far as your models go differently, in different industries, or is this something that’s kind of a problem across the board?

Lindsey Zuloaga

I would say it’s probably more on a company level or a cultural level that we notice differences. A lot of what is important in trying to level the playing field is these interviews ask people very consistent questions, and that’s something that’s been done in hiring over the past several decades, because hiring is very much about gut feelings, so we’ve improved it by trying to treat all candidates in a consistent way, but it’s pretty much impossible for humans to actually do that. Humans have this implicit bias that we don’t even know we have… So there’s also a big culture recently of this concept of cultural fit, which is very popular, and companies say they wanna hire someone who they like, and that can communicate well with them and work well with their teams, but this often results in a similarity bias, where “I don’t know why, I just like that person.” Well, you like them because they’re a lot like you, or they’re a lot like your team already, so you get this homogeneity in your team.

Chris Benson

[12:11] So to some degree would it be fair to say that when a company is looking for a cultural fit, are they almost acknowledging their bias and saying “We’re going to accept that as part of the process”, or am I misreading that?

Lindsey Zuloaga

I think some people have made that argument. There’s articles written about the issues with cultural fit, which are – you’re just opening the door for bias. I wouldn’t go that far to necessarily say that that’s exactly what’s going on. I mean, I do understand the concept, but it is very tricky, and humans are probably gonna be a part of the hiring process for a long time, so it’s something that we need to try to deal with.

Daniel Whitenack

I’m thinking in my mind right now in terms of like, okay, we know that humans are biased in terms of these ways that we have mentioned; we know that we can subtly introduce bias into our machine learning and AI models via a representation in the dataset, and in other ways… I’m just wondering, as kind of human AI developers, what chance do we have of fighting this bias, and how can we have hope to actually do something better?

Lindsey Zuloaga

I think a big part of it is just becoming aware. As data scientists, I think we’ve spent a lot of time just trying to optimize the accuracy of our algorithms, and kind of not thinking about bias or fairness at all. As I’ve studied algorithmic fairness more and more, I’ve found that it’s a more nuanced, tricky topic than you might assume.

If you look it up, there’s a recidivism model called COMPAS that started this whole conversation. It was a recidivism model in Florida, where they tried to predict the chances that someone would re-offend after they were released from prison. When you looked at the data, blacks actually had a higher false positive rate than whites, so they were more often marked as being at risk when they actually didn’t re-offend. That algorithm was trained to optimize accuracy, but because of different base rates in the data, this was a side effect.

This whole thing spurred a really interesting conversation around fairness and how to define it, and the upshot is that basically there’s many different notions of what makes an algorithm fair, and with most real-world problems it’s impossible to satisfy all of them. So it makes things tricky for data scientists, and we actually need to consider what notions of fairness matter the most for our particular problem.
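The false positive rate comparison mentioned above can be sketched in a few lines. This is a toy illustration under stated assumptions: the records are invented, the group labels "A" and "B" are placeholders, and nothing here reproduces the actual COMPAS data or model.

```python
from collections import defaultdict

def false_positive_rate_by_group(records):
    """Compute the false positive rate for each group.

    records: iterable of (group, predicted_high_risk, reoffended) tuples.
    FPR = flagged-but-did-not-re-offend / all who did not re-offend.
    """
    false_positives = defaultdict(int)
    negatives = defaultdict(int)
    for group, predicted_high_risk, reoffended in records:
        if not reoffended:
            negatives[group] += 1
            if predicted_high_risk:
                false_positives[group] += 1
    return {g: false_positives[g] / negatives[g] for g in negatives}

# Invented toy records: (group, predicted_high_risk, reoffended).
records = (
    [("A", True, False)] * 4 + [("A", False, False)] * 6
    + [("A", True, True)] * 8 + [("A", False, True)] * 2
    + [("B", True, False)] * 2 + [("B", False, False)] * 8
    + [("B", True, True)] * 5 + [("B", False, True)] * 5
)
fpr = false_positive_rate_by_group(records)
for group, value in sorted(fpr.items()):
    print(f"group {group}: false positive rate = {value:.2f}")
```

In this made-up data, people in group A who did not re-offend are flagged twice as often as those in group B, which is the kind of gap an accuracy-only objective will not surface by itself.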

Another example - and I think marketing is a really interesting space, because it relies a lot on demographics… An example of a situation to think about is if you’re trying to predict who would click on a data science job posting, an ad for a data science job, the algorithm could look at a bunch of browser data and say “Users who look at female type things online are less likely to click on that ad”, and end up making an algorithm that doesn’t show it to any females.

It’s a really strict notion of fairness to say “We need this to be shown to the same percentage of men and women.” That’s obviously pretty strict, because there are more men that are interested in the ad and would click on the ad, so the marketing company would lose money, but it’s maybe realistic to aim for something else, like “We just want the same true positive rate, so out of the people that are interested, the same percentage of men and percentage of women saw the ad”, for example.

Those are the kinds of things, and there’s a lot more detail beneath that, but those are the kinds of different notions of fairness that I think you need to take into consideration when you’re building an algorithm.
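The two notions of fairness in the ad example can be compared side by side. The sketch below assumes invented counts and hypothetical helper names; "shown rate" is the strict same-percentage notion, and "true positive rate" is the share of interested users in each group who actually saw the ad.

```python
def make_users(group, interested_shown, interested_hidden, other_shown, other_hidden):
    """Build toy user records for one group (all counts are made up)."""
    spec = [
        (interested_shown, True, True),
        (interested_hidden, True, False),
        (other_shown, False, True),
        (other_hidden, False, False),
    ]
    return [
        {"group": group, "interested": interested, "shown": shown}
        for count, interested, shown in spec
        for _ in range(count)
    ]

def ad_fairness_metrics(users):
    """Per group: share of all users shown the ad, and share of interested users shown it.

    Assumes every group has at least one interested user.
    """
    metrics = {}
    for group in sorted({u["group"] for u in users}):
        members = [u for u in users if u["group"] == group]
        interested = [u for u in members if u["interested"]]
        metrics[group] = {
            "shown_rate": sum(u["shown"] for u in members) / len(members),
            "true_positive_rate": sum(u["shown"] for u in interested) / len(interested),
        }
    return metrics

users = make_users("men", 32, 8, 8, 52) + make_users("women", 16, 4, 4, 76)
metrics = ad_fairness_metrics(users)
for group, values in metrics.items():
    print(group, values)
```

With these numbers the ad reaches 40% of men and 20% of women, so the strict notion fails, yet 80% of the interested users in each group see it, so the true positive rates match.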

Chris Benson

We’ve kind of dived right into doing it from the algorithm, and I guess I’d like to see if we can differentiate a little bit between what a traditional job assessment process looks like and how HireVue is approaching it algorithmically at this point, and what are the things that might be the same for companies going from one to the other, and what are some of the things that might change for them, and how do they prepare for that.

Lindsey Zuloaga

[16:14] Sure. Yeah, a lot of people are familiar with this traditional job assessment, which is often multiple-choice tests, and they’ve been around for a long time. They are the result of trying to make the process more consistent. Some of the drawbacks are that they are close-ended… So you have multiple choice, but none of those choices describe you.

They also can be kind of a bad candidate experience. Companies care a lot about that; they want people to come in and have a good experience… Even if they didn’t get the job, they don’t wanna damage their brand by having this awful experience, so those assessments can be long, and make that experience negative, and they also give results like personality traits, and the connection between personality traits and actual job performance is loose, or it’s maybe kind of made up by a person… So assuming we want a salesperson to have these exact personality traits is sometimes not validated.

In our process, like I said, we train straight to performance. Like I mentioned before, we try to get objective performance metrics, and that could depend on the job what exactly that means.

Chris Benson

An example of the salesman that you talked about - there is a stereotype that people have about what is a salesman, what’s that natural-born salesperson look like, personality-wise, and that usually has a picture that is the stereotype in our head… Are you essentially trying to take those stereotypes out of the process by validating which of the metrics are applicable for that job, versus what we can see from the data is not?

Lindsey Zuloaga

Yeah, sure, and I think sometimes that does happen, that humans have an assumption about what is gonna make the perfect person for this job, versus what is actually in the data… I think a lot of times those notions are overturned by looking at actual performance data.

Daniel Whitenack

One thing that I’m thinking about here is it might be – you already mentioned the example where you only have the one example of a female software engineer who went through and maybe performed one way or another… Is it hard for you, as you’re thinking about being objective in these ways – I imagine in some cases it might be hard for you to actually get the data that you need to be objective. Maybe when you’re first working with a company you don’t know the performance information of how the people that they’ve hired in the past have performed in a subjective way; how do you go about establishing that data that you need as the foundation?

Lindsey Zuloaga

A lot of times that’s a process. A lot of companies don’t have really strong performance metrics. We have a team of IO psychologists (industrial-organizational psychologists) who go in from the very beginning and help our customers kind of get set up. If they’re existing customers, they might already have their own interview and their own questions, but ideally, we kind of start with them from the beginning - what is important to this job? We do a whole job analysis. What do you want to measure? What are you looking at? And our IO psychologists have a lot of experience with knowing which questions to ask to actually tease out that information.

It’s interesting, there’s questions like “Tell me about yourself”, which are good warm-up questions, that don’t actually differentiate people very well at all, whereas questions that are about a situation, like “What would you do if this happened? You have this difficult customer, and some detailed scenario - how would you act in that situation?” Those questions tend to be better at differentiating top and bottom performers.

[19:59] The hope is we go in from the beginning and design the interview, we design the process of how we’re going to collect performance data… As you guys know, machine learning algorithms do rely on our training data being kind of representative of who’s coming in the funnel, so we wanna see a distribution of people… Sometimes gathering enough data is a challenge though, so we have continuous monitoring of our algorithms. I can just say a little more about that… after we release an algorithm, we’re always watching for how it scores different groups of people, and making sure that it’s not treating different groups of people in a statistically significantly different way, if that makes sense.
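One common way to check whether two groups are treated in a statistically significantly different way is a two-proportion z-test on pass rates. The sketch below is an assumption about how such a check could look; the episode does not say which test HireVue uses, and the counts are invented.

```python
import math

def two_proportion_z_test(passed_a, total_a, passed_b, total_b):
    """Two-sided z-test for a difference between two pass rates.

    Returns (z, p_value). Uses the pooled-proportion standard error.
    """
    rate_a = passed_a / total_a
    rate_b = passed_b / total_b
    pooled = (passed_a + passed_b) / (total_a + total_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    if std_err == 0:
        # Everyone passed or everyone failed: no detectable difference.
        return 0.0, 1.0
    z = (rate_a - rate_b) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Invented monitoring snapshot: candidates scoring above the cut-off.
z, p_value = two_proportion_z_test(passed_a=120, total_a=200, passed_b=90, total_b=200)
print(f"z = {z:.2f}, p = {p_value:.4f}")
if p_value < 0.05:
    print("Pass rates differ significantly; review the model.")
```

With small groups the test has little power, which matches the point above that gathering enough data is a challenge and that monitoring has to continue after release.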

Daniel Whitenack

Chris, that was something I know we talked about in our last news updates thing, Google recommending through their AI – I forget what they called it (AI guidelines), to always be continuously monitoring for those biases, and everything.

Chris Benson

Yeah.

Lindsey Zuloaga

Yeah. I mentioned before when I’ve done research on fairness in AI and bias in AI, there’s a lot of problems that are really difficult to solve, because the features that you’re looking at, the inputs to your model, that actually do matter for the thing you’re trying to predict, have different base rates in the data. An example would be like if you wanna predict who should be given a loan or not, you’d need to look at credit score and income, but credit score and income have different distributions among different age, race, gender groups, so it’s really hard to get away from that coming into your model.

In a way, we’re really lucky because we are only looking at this job interview. We don’t do any kind of facial recognition, we don’t find out who this person is and try to scrape the internet for more information about them… We’re not throwing in a bunch of data that we don’t understand, we know exactly what we’re dealing with… And the way we take our video interview data and structure it is intentionally made to obscure some of the things that we don’t wanna know… Like, we don’t wanna know your age, race, gender, attractiveness; we wanna know the content of what you said, how you said it - tone of voice, pauses, things like that - and your facial expressions. Those are kind of the three types of features we pull out and structure. So we’re already kind of blinding the algorithm to demographic traits, but one thing to be aware of is that, you know, if there’s bias in the training data, sometimes those traits can leak through, somehow… For example, maybe you have an algorithm that was trained to be sexist, and it will notice some little difference in how men and women speak in the dataset. If that’s the case, this continuous monitoring is really important, to see how the algorithm is behaving in the wild, and if it does have any issues, like it’s scoring men and women differently, we can go back and say “What are the features that are even telling the algorithm who’s a man and who’s a woman?” and then remove some of those features. So we’ll do a mitigation process. We are in the situation where we have a lot of features, so we can afford to throw some out; if they’re contributing to bias, we simply remove them. In doing that, we might lose a little bit of predictive power, but we mitigate that adverse impact.
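A very simple way to look for features that reveal group membership is to measure how strongly each feature correlates with a group indicator and flag the ones above a threshold. This is a sketch of that heuristic only: the feature names, the data, and the 0.5 threshold are all hypothetical, and the episode does not describe the actual mitigation procedure in this detail.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)

def features_to_drop(feature_columns, group_indicator, threshold=0.5):
    """Names of features whose absolute correlation with the group exceeds the threshold."""
    return sorted(
        name
        for name, values in feature_columns.items()
        if abs(pearson(values, group_indicator)) > threshold
    )

# Hypothetical data: 0/1 group indicator used only for auditing, never as a model input.
group = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "voice_pitch_proxy": [1.0, 1.2, 0.9, 1.1, 2.0, 2.1, 1.9, 2.2],  # tracks the group
    "answer_length": [5, 7, 6, 8, 5, 7, 6, 8],                      # unrelated to the group
}
flagged = features_to_drop(features, group)
print("features to remove:", flagged)
```

Single-feature correlation misses combinations of features that jointly reveal a group, so a check like this is a starting point for mitigation rather than a guarantee.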

We’re also lucky in the sense that our rules are very well defined by the EEOC (Equal Employment Opportunity Commission). So there are federal laws about how assessments need to behave, and we follow those very closely. Basically, the rules say that if you have some kind of a cut-off, like “people who score above this score continue on to the next steps, and if you score below, you’re out of the running” - at that cut-off, no group can be scoring less than 80% or four-fifths of the top scoring group. We have to follow those rules - that’s U.S. law - and make sure that our algorithms are not treating people differently. If we ever see anything, we can go through this mitigation process.
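The four-fifths check described above can be written as a short calculation: compute each group's pass rate at the cut-off, divide by the highest pass rate, and flag any group whose ratio falls below 0.8. The group names and counts below are invented, and the function names are hypothetical.

```python
def adverse_impact_ratios(passed_by_group, total_by_group):
    """Ratio of each group's pass rate to the highest group's pass rate."""
    rates = {g: passed_by_group[g] / total_by_group[g] for g in total_by_group}
    top_rate = max(rates.values())
    if top_rate == 0:
        # Nobody passed the cut-off, so there is no disparity to measure.
        return {g: 1.0 for g in rates}
    return {g: rate / top_rate for g, rate in rates.items()}

def groups_below_four_fifths(passed_by_group, total_by_group, threshold=0.8):
    """Names of groups whose pass-rate ratio is under the four-fifths threshold."""
    ratios = adverse_impact_ratios(passed_by_group, total_by_group)
    return sorted(g for g, ratio in ratios.items() if ratio < threshold)

# Invented counts of candidates scoring above a customer's cut-off.
passed = {"group_a": 60, "group_b": 42, "group_c": 50}
total = {"group_a": 100, "group_b": 100, "group_c": 100}
ratios = adverse_impact_ratios(passed, total)
flagged = groups_below_four_fifths(passed, total)
for group in sorted(ratios):
    print(f"{group}: ratio = {ratios[group]:.2f}")
print("below four-fifths:", flagged)
```

Here group_b passes at 42% against a top rate of 60%, a ratio of 0.70, so it would trigger the mitigation process; group_c at 0.83 would not.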

Chris Benson

[24:07] Okay. Coming back out of break, I have a question for you, Lindsey - what types of things cannot be covered well algorithmically? Starting with that, and then where do humans fit into the equation? You noted at the beginning that you thought humans would be in the equation for a long time, potentially, and I’d like to understand where they fit in and how the human and the algorithm work together.

Lindsey Zuloaga

Yeah, definitely we’re not taking humans out of the loop any time soon. I always laugh when I try to talk to Siri and she does a terrible job understanding what I’m saying…

Chris Benson

No kidding…

Lindsey Zuloaga

And I think “Oh my god, and we’re worried about these robots taking over?” [laughs] There’s still so many things that humans are a lot better at. I think the important thing is that AI will be taking over the mundane, boring things that AI can do well, while humans still need to be a part of making personal connections, making final decisions and taking in other information that might not be available to the AI. For hiring, that does, on the other side of the coin, mean that bias will still be a part of hiring… But we’ve found that even removing bias from a chunk of that hiring funnel can help people get through to later stages that they might not have reached originally. We’ve had customers say they’ve increased their diversity by 16%, or give us these great metrics… If this initial stage of the funnel is open to more people, they tend to get further along in the funnel. So definitely for the slice that AI is taking over, we hope to remove that bias.

Daniel Whitenack

One of the things that you mentioned is monitoring around fairness, and I was wondering - it seems like you have to develop a certain culture as a data scientist or as a data science or AI team to really make that a core part of a goal on each one of your products to monitor for fairness, and all of that… I was wondering if you could briefly talk about how you went about developing that culture on your team, and maybe make some recommendations for those out there that are kind of thinking about “Oh, well this is something I’d really like to do on our team, but I maybe don’t know where to get started, or how to develop that culture.”

Lindsey Zuloaga

Yeah, definitely. I think for us a lot of it came from our IO psychology team, and being in the assessment space. Starting from there – like I said, we have laws around how our assessment scores people. Our particular assessment happens to include AI. We’re coming into this space - the job assessment space - that had been around for decades, so we got a lot of those ideas started there, and then it’s kind of blossomed more and more as we’ve studied… There’s a lot of academic study going on around this, and we collaborate pretty closely with some researchers here at the University of Utah, who study algorithmic fairness.

Like I said, what constitutes fair is not well-defined, so it’s usually something that needs to be discussed and refined for every individual problem. I would suggest a great place to start - IBM just released something called “AI Fairness 360.” You can go on their website and just play – I played with it a little bit with just some Kaggle data, and they show a lot of these metrics that I talked about, these definitions of fairness, and you can see how those things are related to each other and you can possibly mitigate bias.

Another recommendation I have just to illustrate the concept is – Google did some research; I think if you look for “attacking discrimination with smarter machine learning”, there’s an article with an interactive portion, where you can play with this fake data, a credit score, and you’re trying to predict who would repay a loan. This is something I mentioned earlier. It’s a great thing to play with and see how there’s trade-offs.

[28:10] In real-world situations there’s really not one way to do things where you could satisfy all notions of fairness, so you’re always dealing with these trade-offs, and I think that’s something that’s good to look at. Again, this really varies from problem to problem, depending on your inputs and how different your base rates are and how much you rely on inputs with different base rates to predict your outcome.

Daniel Whitenack

Keeping things practical, because this is Practical AI, I’m finding all of this really fascinating, and I was wondering if you could just walk through – do you establish, maybe based on looking at some of this Google work, or IBM work, kind of figure out some metrics that at least make sense to track first? And then how are you tracking them? So you’re making predictions with your model, and then are you running those metrics on the predictions? Are you running them on the training data that you’re feeding in? What exactly are you monitoring and what’s the process? You put the metrics in place and then you kind of send notifications to people to review them, and who reviews them…? I’m interested in those sorts of details.

Lindsey Zuloaga

Yeah, like I said, the notions of fairness that we look at are tightly tied to employment law, but we also look at other things as well, and we’re always interested in being ahead of it. I think it’s kind of common that people assume data scientists don’t care about this… We’re really giving it a lot of thought, and we’re always looking for different ways of looking at it, and seeing how we can improve certain notions… But again, we always come back to the regulations in the employment space as being our most important base to cover.

I mentioned the four-fifths (or the 80%) rule for us, which is something we closely monitor… And you did ask before about training data, versus kind of how the algorithm is behaving in the wild - we’re always watching that… Like, “Here’s the customer’s cut-off score. They are watching job interviews for everyone who scored above this, and maybe first, or maybe they’re not watching the lower scores at all… So what are those ratios of that cut-off? How are men scoring compared to women? How are the different races scoring?” If we ever have an issue there, continuous monitoring is really important, because we start off with a training set of maybe hundreds and hundreds of interviews, and there wasn’t a lot of diversity, possibly there was a group that were small and it was hard to see with all the noise how the algorithm is treating those groups.

So watching how the algorithm actually behaves in the wild is very important, as well. We’re always watching those numbers, and being proactive about coming to our customers and saying “Hey, we need to mitigate your algorithm.” Obviously, we also mitigate at the beginning, but if we ever see that we need to mitigate after the algorithm has been out in the wild for a while, we will do that.

Daniel Whitenack

Have certain things surprised you as you’ve done this monitoring? …like biases or things pop up where you thought you did a really good job preparing the algorithm, but it turns out you didn’t, in some way or another?

Lindsey Zuloaga

Yeah, most of that probably – if there’s any bias that comes in later on, a lot of that is because your training group just wasn’t very diverse. That is something that we see when maybe there are very few people of color in this dataset, or maybe there are very few women… Like I said, it was really hard to tell with just the training data that there was some feature that was allowing the algorithm to mimic bias in the data, but it becomes apparent later on, and we have seen that… Usually, not too badly; usually, we’re pretty on top of our monitoring and we don’t see anything too drastically different than we expected.

Daniel Whitenack

[32:11] Cool. I’m thinking maybe we can transition a little bit here to the machine learning and AI community in general, and maybe outside of hiring… Are there trends in the community around how we’re developing AI that concern you around the topic of fairness? And then are there maybe other things that are encouraging? Maybe these projects from IBM and Google, for example.

Lindsey Zuloaga

Yeah, I think the conversation – IBM’s 360 toolkit is an awesome example of how this is coming into the conversation and people are talking about it. For the last few years I’ve sometimes been frustrated by the alarmism that goes on in the media, kind of calling out situations where data scientists did behave really irresponsibly, or just absolutely didn’t think about repercussions… And it’s hard as a data scientist who does care about this and works on it a lot to not get a little defensive when you’re stereotyped… But I think there are some legitimate concerns, and there are a lot of books and articles about algorithms gone wrong, and kind of showcasing these kinds of examples. I think it’s good that that conversation is out there; in some ways it scares people, and they kind of make assumptions that all algorithms are bad, which can be frustrating.

From the hiring point of view, I talked about how broken hiring is, and I really feel like we’ve made huge improvements, where with an algorithm we can actually look inside the algorithm and say “Okay, what features are causing this bias” - you really quantitatively see how the algorithm is treating different people, where it’s a lot harder to do that with human beings. Human beings don’t even know why they made the decisions they made; you can’t open up their brain and figure out “Oh yeah, you’re a little racist, and that’s why you’re doing that. Let’s just tweak your brain and account for that.”

So we have these tools that are amazing, but like any powerful tool, they could be good or bad. I think we’re reaching a point where people are having these really important conversations about using them responsibly.

Chris Benson

Talking about bias in these ways, we’ve had various conversations across different episodes with people doing all sorts of different types of work, and it really seems that you have a great process now on how you’re approaching it with the monitoring, and with the feature selection, and trying to make sure your data fairly represents where you wanna go.

In a broader sense, beyond just the topic of hiring, we have so many people that listen that are faced with similar challenges. Do you have any more generalized recommendations that you would make to a data science team that is trying to get the bias out of their own circumstances, or rules of thumb that are kind of broad-based and simple for them to follow?

Daniel Whitenack

I know I’ve seen, for example, a checklist come out. I don’t know if those are useful around your data and your process, and all of that.

Lindsey Zuloaga

Yeah, like I said, it’s hard to define what fair is, and I think you have to sit down and have a conversation with a lot of input about what you care about in this problem, and being transparent about it. If you’re not just trying to get a higher prediction accuracy, be clear that we care about these notions of fairness, and this is what we’re doing, this is what we’re measuring, and this is what we’re doing to mitigate… That’s something that’s just been really useful for us, because we were doing this for a long time and not really talking that much about it. We were getting criticized when people assumed that we were being careless.

[35:59] I think now that this conversation has started and people are being really transparent, the thing to do is be really open about it and say “Hey, what we’re trying to do is difficult. These are the notions of fairness that we care about and that we’re trying to optimize, and we’re open to have conversations about that, and we’re open to changing that.”

I think everybody understands that machine learning can be very powerful, and if there aren’t clear answers, we wanna have a conversation about what we’re trying to do with it.

Chris Benson

One of the things that we’ve noted before is we’re still in the very early days in data science, especially if you compare it to software engineering, which has been maturing for decades now - and I’m kind of talking about the AI space specifically… But do you think that this period right now where we’re all grappling with bias is a kind of growing pains that we’re going through, or do you think this is going to be inherent from now on? Is it always something that we’re gonna contend with, or do you think we’ll have better tools going forward to tackle it?

Lindsey Zuloaga

I think kind of both… I mean, I do think it’s a growing pain; I think in 5-10 years way more data scientists will be well-versed in fairness, and understand that it’s a part of their job and it’s something they need to think about, but at the end of the day it’s like any complex topic. There’s always gonna be different opinions. So because there’s not one clear answer, I think there will always be debate about what an algorithm should be doing, and this is a great example with the COMPAS model, the recidivism model that I mentioned.

At the end of the day, there’s no agreed upon way it should behave, because with different notions of fairness, to satisfy one you sacrifice another, and there will always be people that have their opinions about what the most important notions are. So I think it will be something that’s controversial going forward.
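
[Editor's sketch] The trade-off between fairness notions can be made concrete with a standard confusion-matrix identity. This is an illustration under stated assumptions, not an analysis of the actual recidivism model: the base rates and error rates below are invented, and `implied_fpr` is a hypothetical helper name. For a binary classifier, the false positive rate is fixed once the base rate (prevalence), the positive predictive value (PPV), and the false negative rate (FNR) are fixed: FPR = p / (1 - p) × (1 - PPV) / PPV × (1 - FNR). So if two groups have different base rates, equal PPV and equal FNR force unequal FPR.

```python
def implied_fpr(prevalence, ppv, fnr):
    """False positive rate forced by a given prevalence, PPV and FNR.

    Follows from PPV = TP / (TP + FP), with TP = p * (1 - FNR)
    and FP = (1 - p) * FPR, solved for FPR.
    """
    return prevalence / (1 - prevalence) * (1 - ppv) / ppv * (1 - fnr)


# Invented numbers: same PPV and FNR for both groups, different base rates.
fpr_low = implied_fpr(prevalence=0.3, ppv=0.7, fnr=0.4)
fpr_high = implied_fpr(prevalence=0.5, ppv=0.7, fnr=0.4)

# The group with the higher base rate ends up with the higher false
# positive rate, even though the scores are equally "accurate" by PPV.
print(round(fpr_low, 3), round(fpr_high, 3))
```

Which of these quantities to equalize is the value judgment Lindsey is describing; the arithmetic only shows that they cannot all be equalized at once when base rates differ.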

Daniel Whitenack

I know that I have definitely appreciated your perspective on this, Lindsey. It’s been super-enlightening to me, so thank you so much for being on the show. Are there any places you’d like to point people to to find you online, or certain resources or blog posts that you’d like to highlight?

Lindsey Zuloaga

Sure. I’m mostly just on LinkedIn, Lindsey Zuloaga. That’s where I’m probably the most active.

Daniel Whitenack

Awesome. Well, thank you so much for being on the show; I know I’m really looking forward to seeing more of the great content that you put out and the great work that you and your team are doing, so thank you so much.

Lindsey Zuloaga

Thanks for having me.

Chris Benson

Thanks a lot!
