AI for Good: clean water access in Africa with Chandler McCann from DataRobot (Practical AI #89)

Chandler McCann tells Daniel and Chris about how DataRobot engaged in a project to develop sustainable water solutions with the Global Water Challenge (GWC). They analyzed over 500,000 data points to predict future water point breaks. This enabled African governments to make data-driven decisions related to budgeting, preventative maintenance, and policy in order to promote and protect people’s access to safe water for drinking and washing. From this effort sprang DataRobot’s larger AI for Good initiative.

43 minutes
Recorded May 4, 2020
Published May 11, 2020

Featuring

Chandler McCann – LinkedIn
Chris Benson – Website, GitHub, LinkedIn, X
Daniel Whitenack – Website, GitHub, X

Transcript

Welcome to another episode of the Practical AI podcast. This is Chris Benson speaking, I’m a principal AI strategist at Lockheed Martin, and with me as always is my co-host, Daniel Whitenack, who’s a data scientist with SIL International. How’s it going today, Daniel?

Daniel Whitenack

It’s going great. It’s already been a busy Monday, with a lot of prep for training stuff that I’m doing, and also training people and training models; that’s been my day so far. So that’s a pretty good day, I guess.

I’m not surprised. You are a ferociously busy person, as I have known you… Between your classes, and your day job, and the podcast… And I know your wife has a business which you help out in…

Daniel Whitenack

Yeah, actually over the weekend we were rearranging stuff in her factory there, to make sure when some essential people come back, that they’re six feet away, and all that good stuff… So yeah, it’s been a range of things over these weeks, which makes things interesting, that’s for sure.

Gotcha. Well, I’m here in Atlanta, Georgia, and we have officially opened up from sheltering in place, but I’m more cautious in that, and I expect we’re gonna keep doing it for quite some time… But I’m looking with envy at neighbors who are having parties at this point, so… I’m afraid to go over, but we’ll see. Hopefully everyone stays well.

I wanted to talk a little bit today about AI For Good topics, and –

Daniel Whitenack

Yeah. Good timing, this is great stuff.

I know, I know, and we have a pretty awesome guest for that. With us today is Chandler McCann who is the general manager of DataRobot for federal. Chandler, welcome to the show.

Chandler McCann

Thanks, Chris and Daniel. Thanks for having me, great to be here.

I was wondering if you could give us a little bit of background about yourself, and how you came to be general manager at DataRobot, before we dive into the topic today.

Chandler McCann

Sure, I would love to. About myself - like a lot of data scientists, I’m a recovering engineer. My undergrad was in Material Science Engineering, and I spent a few years in the Flash and DRAM manufacturing space as an engineer. From there it kind of flowed into statistical consulting, largely focused on the Department of Defense, and that’s where I really fell in love with data science and got exposed to more modern machine learning techniques. After that, I pursued my master’s at Berkeley, and during that time I became an employee of, at that point, a fairly young DataRobot.

I began at DataRobot as a customer-facing data scientist, and over the years I have come to lead various teams, including the AI for Good program, which I still oversee, and now our federal practice.

Daniel Whitenack

[04:14] So just in terms of DataRobot and the world you live in - I know when I was starting out in data science it was definitely very much the Wild West. There weren’t a lot of platforms or systems to manage your data science work and other things… And I remember distinctly going to conferences frequently, but then there was a certain year where I remember there were all these companies sort of popping up in this space; DataRobot was one of those early ones I remember popping up and just kind of being consistently present in the data science world. Maybe just mention the premise behind how DataRobot got started and what you do.

Chandler McCann

Sure. DataRobot – and I actually found DataRobot through one of those conferences, at Strata in 2016. That’s a funny story… But DataRobot was born from our co-founders, Jeremy Achin and Tom de Godoy, who were early Kagglers, in addition to working in the insurance space… But during their Kaggle days, they realized that they could benefit from automation. They could automate a lot of the tasks that data scientists were doing; not to replace them, but just streamline the workflow.

So following some funding, DataRobot and automated machine learning were born. That was back in 2012. Today, DataRobot is a full, end-to-end enterprise AI platform, and that means really going from the whole value chain of data, from “I have an idea, and a raw, unstructured dataset”, all the way through model building and monitoring a deployment, and even consumption, into automated applications. DataRobot provides a very high level of automation, all the way across the spectrum. So it’s been a lot of fun to be a part of that ride.

Daniel Whitenack

And what do you see – I’m kind of curious as to what you see about the receptivity of data scientists and AI people towards automation maybe now, as opposed to when you joined in 2016. It seems like – you know, there has been a shift that I observed, but I was wondering, what are your conversations about automation, how do they typically go down when you’re talking to a data science team, or to even a software engineering and data science team combined? What’s the feeling about automation these days, and what should be automated, and maybe what shouldn’t be automated?

Chandler McCann

Yeah, that’s a great question. I think, to your point, it’s definitely shifted. I think there was more resistance earlier, the 2016, 2017 phase, where there weren’t that many full-on enterprise AutoML platforms… But I don’t think the world’s ever really looked back from automation. When you get anything from iPods, to digital cameras… It’s hard to reverse that, and I think that data scientists are coming to grips with “This is not something that’s replacing me, but it’s something that’s augmenting my workflow.”

So while DataRobot certainly has a complete GUI-based platform for our users to work within, for our advanced data scientists - they mainly interact with our API in Python or R. And that just scales a data scientist in a way that’s not feasibly tractable otherwise. So a data scientist can quite comfortably build and manage and deploy potentially thousands of models by themselves in a way that manages risk and is still interpretable. I think that’s appealing to most data scientists [unintelligible 00:07:39.24]

I had noticed that prior to you moving into the federal practice, that you had been a practitioner - and probably still are a practitioner - as a data scientist, and you kind of built up your career that way. I’m curious, as you get into federal, what have you discovered about that? Obviously, working for Lockheed Martin, I have an interest in federal, so I was just curious about your perspective as you’ve moved into that role.

Chandler McCann

[08:06] Yeah, I think within the federal space there’s a lot of opportunity, and I’m intrigued by helping the federal government, and particularly the Department of Defense, leverage automation in a way that unlocks the potential of AI in an organization. I think a bottleneck has been the ability to get human talent. Acquiring data science talent has been historically really competitive, and that can be a challenge for the government… So tools like DataRobot really help up-level an organization. That is one way I see us helping a lot. And two, just being able to solve problems that were typically very hard and intractable before.

DataRobot, in addition to just time series, classification, and regression also handles visual AI problems now, and computer vision issues. So being able to bring that to [unintelligible 00:08:55.18] in the broader marketplace, in the federal government - it’s been fun to watch.

Daniel Whitenack

And how does someone working on these sort of federal problems at DataRobot - how do you get routed to this sort of AI for Good effort? Is that something that was a specific passion for you, or it came up unexpectedly, or how did that happen and what’s the story behind that initiative?

Chandler McCann

That’s a great question. When I joined DataRobot, I was working largely on the commercial team at that time… And I had had a relationship with the Global Water Challenge, which is a non-profit based here in DC. I had met them while I was at Berkeley. Their mission is to help invest and manage investments in large-scale water projects across the developing world… So at DataRobot I brought them on and talked to my CEO Jeremy about bringing them on as a pro bono customer, and he was very supportive of that.

Following the work we had done together, the AI for Good program was really born. I remember vividly being on a plane, coming home after a trip to Sierra Leone with Brian Banks, our customer at the Global Water Challenge, and receiving an email from my CEO, saying “What if instead of doing this project by yourself you had a team to help you do that? What would that look like?” From there, I was able to take all the lessons learned, gathered from my discussions with Brian about the challenges at non-profits and NGO space, and build a program around that to address those.

And for those who aren’t necessarily familiar with the Global Water Challenge, could you talk a little bit about what that is in general, before we fully dive into how you were interacting with that? Just so that people who haven’t heard of it before have a reference point.

Chandler McCann

Sure, yeah. So the Global Water Challenge is a non-profit based here in Washington DC, with the mission of helping bring water to communities in developing nations. So roughly one in four people around the world are dependent on non-traditional water sources like hand pumps or wells. The Global Water Challenge sets out to help direct investment to countries in need.

Daniel Whitenack

So are those investments/projects seeking to upgrade those water systems, or…? When you say it’s not available, I guess – what I’m trying to work through is it sounds like part of it is maybe clean water and good sources, and part of it is just access at all… What sort of projects do they work on?

Chandler McCann

They have a wide portfolio, but the ones that we’ve been focusing on with DataRobot have been around typically new construction and the rehabilitation of water plants. For example, you have a community that may not have an infrastructure for running water, so they may look to direct investment to drill new wells, or repair new pumps for people in these communities.

You’ll have villages that don’t have access to water, and they will help direct investment with other NGOs or large corporations, to either drill or build new water points.

Daniel Whitenack

[12:00] And I guess this problem has probably become increasingly evident, even more so over these recent times… Because of course, disease spreads in various ways, but if people aren’t able to access water for cleaning, and all of those things, then I’m sure it further exacerbates many things related to disease spread, and health, and a lot of different things.

Chandler McCann

Yeah, absolutely. Particularly now, with the importance of handwashing, having access to clean water has never been more important. So yeah, the downstream effects of not having clean water in a community are huge when it even comes to impacts in equality, in education, as well as just overall disease. Being able to move the needle on that has huge impacts for an economy… And I think one thing that was really exciting about the Global Water Challenge and what drew us to work with them was their focus on data.

Around the world there are hundreds of thousands of these water points, but there was no centralized repository… And Brian from the Global Water Challenge set out to build the first standardized and normalized database for water points around the world.

Daniel Whitenack

Chandler, you just started talking about the focus of data that Global Water Challenge has… I assume that that focus on data and the data that they’ve gathered in this repository is central to the solution that you’ve built for them… But I was wondering if you could maybe describe that data a little bit more. What does it represent? What is the scale of it, and what sort of information is included in that data?

Chandler McCann

Absolutely. So the Global Water Challenge and Brian Banks, the person driving this project - they set out to build a standardized database of water points around the world… And the reason they did this was because the water points kept breaking. Around the world, after a few years of being installed and having such a positive impact on the community, these water points would break… And they had no idea what was going on and why.

Daniel Whitenack

When you say water point, you’re meaning like a tap, or water main, or well, or what’s included in these?

Chandler McCann

Typically – so when I say water point, these can refer to a few things, but they generally fall in the categories of a well, or a tap, or a rain harvesting system where they can either get groundwater or purified rainwater.

Daniel Whitenack

Gotcha. So this repository includes information about where those are at, or what sort of information is included about those water points?

Chandler McCann

Exactly, yes. As I was saying before, the challenge we were trying to solve was why do these water points - either wells, or taps, or rain harvesting systems - break. So the dataset that he set to normalize includes things around the location of the water point… So there’s cell phone applications, they can take a picture, capture a lat/long and geolocation; so there’s also image data, to some degree.

Then there’s information on the source of the water, as well as the technology… So whether it’s coming from a river, or groundwater, or if it’s a tap stand itself, or a pump. Then it’s got information about the country and region it’s in, as well as the installation year, who installed it… And then some interesting factors, such as “Are the communities paying for it? Is there a management structure in place in the community to maintain the well or water point itself?”
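To make the fields Chandler lists a little more concrete, here is a minimal Python sketch of what one record in such a standardized table might look like. The class name, field names, and the example values are all assumptions made for illustration; the transcript does not describe the actual schema of the Global Water Challenge database.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WaterPoint:
    """One record in a hypothetical standardized water point table."""
    latitude: float
    longitude: float
    country: str
    region: str
    water_source: str                      # e.g. river or groundwater
    technology: str                        # e.g. tap stand or pump
    install_year: Optional[int] = None
    installer: Optional[str] = None
    community_pays: Optional[bool] = None  # are the communities paying for it?
    has_management: Optional[bool] = None  # is a management structure in place?
    photo_ref: Optional[str] = None        # pointer to an uploaded image, if any
    is_functional: Optional[bool] = None

    def age_in_years(self, as_of_year: int) -> Optional[int]:
        """Age of the water point, or None when the install year is unknown."""
        if self.install_year is None:
            return None
        return max(0, as_of_year - self.install_year)


# Made-up record for illustration only; not taken from the real database.
example = WaterPoint(
    latitude=8.5,
    longitude=-13.2,
    country="Sierra Leone",
    region="Example Region",
    water_source="groundwater",
    technology="hand pump",
    install_year=2012,
    installer="Example NGO",
    community_pays=True,
    has_management=True,
    is_functional=True,
)
print(example.age_in_years(2020))  # prints 8
```

Most fields are optional in this sketch because, as comes up later in the conversation, human-collected records often arrive with gaps.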

[16:22] I’m curious what kind of solution – as you got involved in this, what kind of solutions you had envisioned that might be able to help them, what was the motivation for you to get involved and for them to work with you (I think you said it was Brian), so how did that get going, and what was the vision that ended up being implemented? Where did that come from, how did it start, what was the collaboration that got all that going?

Chandler McCann

Yeah, so the vision when we started working with Brian was really his. He had had a vision for this data from day one, and this is a person who built the database, he knew they could do something with the data, but he wasn’t sure just exactly what that looked like. Aspirationally, they wanted to be able to predict which water points were going to break in the future, or at least understand which ones were going to break. In parallel, they also wanted to understand if they could identify a priority to these water points; which communities are not being served, where would it make sense to build or construct new water points? Because it’s non-trivial to set up construction of these things in a developing nation.

For me, the appeal was “Wow. There’s this relatively clean dataset on a really interesting problem that’s out there, with the potential for a huge impact.” That’s what drew me to it. Our main focus when we set out was “Given data on which water points have broken in the past, can we predict which water points are going to break at some point in time in the future?”

Daniel Whitenack

Gotcha. It seems like there’s really a lot to tackle there. Non-profits (I’m guessing) typically have resource constraints, so being able to understand where they should put their investment is definitely important. But for this particular first project, in terms of predicting where a water point is going to break, what is the sort of – out of all water points in the database, how many are breaking on any given point? What’s the distribution like here? Is this something that happens fairly rarely, or is it something that happens all the time? Is it more compared to something like fraud detection, where you’re trying to detect something that happens rarely, or what’s the situation like on that front?

Chandler McCann

Sure. It’s more frequent than fraud, unfortunately. The distribution of things that are broken is around 25% on average, compared to 75% functioning.
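A quick worked example of what that split implies, assuming a round 1,000 water points (a hypothetical count chosen only to keep the arithmetic simple): a "model" that always answers "functioning" would look 75% accurate while finding none of the broken points, which is why accuracy alone would be a poor yardstick here. The function name below is made up for this sketch.

```python
def always_functioning_baseline(n_points: int, broken_rate: float) -> dict:
    """Metrics for a trivial model that predicts 'functioning' for every point."""
    n_broken = round(n_points * broken_rate)
    n_functioning = n_points - n_broken
    # Every functioning point is labeled correctly, every broken point is missed.
    accuracy = n_functioning / n_points
    broken_found = 0
    recall_broken = broken_found / n_broken if n_broken else 0.0
    return {
        "accuracy": accuracy,
        "recall_broken": recall_broken,
        "missed_broken": n_broken,
    }


baseline = always_functioning_baseline(n_points=1000, broken_rate=0.25)
print(baseline)
```

Any real model for this problem would need to beat that baseline on the broken class, not just on overall accuracy.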

Daniel Whitenack

Gotcha. And that’s due to obviously what Global Water Challenge is trying to address, just the old systems, and systems that aren’t being maintained, and that sort of thing?

Chandler McCann

Yeah, there are a slew of potential reasons, some of which could be maintenance, some of which could be environmental. Perhaps a water point wasn’t dug deep enough, so you have a well that becomes dried out six months out of the year, during the dry season… So that can have an impact.

There are geographical inputs and community-based inputs, as well as maintenance-based failure modes that are out there.

Daniel Whitenack

Gotcha. And so when you first saw this data and what was included in it, where did your mind go in terms of an approach that you could take to solving this?

Chandler McCann

When I first saw the dataset, I was at one time impressed by how standardized it was, but at the same time, digging into it, I realized there were a lot of nuances and challenges, and I’m sure, Daniel, working with non-profit data yourself, you have been exposed to this… Whenever you’re dealing with human data collection, there’s always some challenges that are out there. On our side, a big one was the ability to enter free-form text for the same thing.

[20:03] So there’s obviously a few key pieces of data to solve this problem, particularly what type of technology is it. So is it a pump? What brand of pump is it, for example? Those things all matter. When we started looking at the dataset, there were roughly 1,600 unique values for the type of technology, when we knew that it really boiled down to about 12 or 14. So one of the first problems that we tackled with that was just some basic natural language processing to try to match categories together. And that was something that we had done by hand originally, and today we’re actually automating that process through the use of Paxata, which is our automated data prep tool. That’s been a big step forward, as we moved from version 1.0 of the solution to version 2.0 in the future.

So back to your original question - my first thought was we need to organize and clean the data, and the second one is “How do we frame this to make it a useful problem down the road?” We had to identify what variables would be really important to this, and a couple things jump out that are available to us, namely location, the age of the water point, as well as the technology and the source, and the community interaction with it. From there, we built our first predictive model.
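The label-matching step mentioned above (collapsing roughly 1,600 free-form technology values into a dozen or so categories) can be illustrated with a small Python sketch. This is not the team's actual process, which was done by hand and later with Paxata; it swaps in simple fuzzy string matching from the standard library. The canonical category list, the function names, and the 0.8 cutoff are all assumptions chosen for the example.

```python
import difflib
import re
from typing import Optional

# Hypothetical canonical categories; the real list of 12 to 14 is not given.
CANONICAL = [
    "hand pump",
    "tap stand",
    "protected well",
    "borehole",
    "rainwater harvesting",
]


def clean(raw: str) -> str:
    """Lowercase, turn hyphens/underscores into spaces, collapse whitespace."""
    text = raw.lower().replace("-", " ").replace("_", " ")
    return re.sub(r"\s+", " ", text).strip()


def match_technology(raw: str, cutoff: float = 0.8) -> Optional[str]:
    """Return the closest canonical category, or None if nothing is close enough."""
    matches = difflib.get_close_matches(clean(raw), CANONICAL, n=1, cutoff=cutoff)
    return matches[0] if matches else None


# Made-up free-form entries of the kind a field worker might type.
for entry in ["Hand-Pump ", "handpump", "TAPSTAND", "rain water harvesting", "kiosk"]:
    print(repr(entry), "->", match_technology(entry))
```

Entries that return None would still need a human to look at them, which fits the description of this being partly manual work in version 1.0.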

Daniel Whitenack

Gotcha. And you did mention the problems with data in the non-profit world, and humans gathering this data, which I guess isn’t also specific to the non-profit world… But I know for us a lot of times it’s hard – especially in developing countries, you wonder about like “If I want this data”, but the only access I have to that type of data was data that was gathered four years ago, or something… You wonder about “What’s updated since then?” In this case, how is this data being generated? Is it people just going out into the field and marking down where the water points are, and that sort of thing? Is there actual instrumentation on some of this stuff?

Chandler McCann

Yeah, that’s a fantastic and very important question, and it was subtle in the data… So that was one of the first things I asked Brian, “How does this data come about?” And there’s a couple different ways. One way is from large national efforts, national assessments. This is something that we came across in countries like Swaziland and Sierra Leone. But also, you have manual input by smaller groups, like local NGOs or local non-profits. They’re uploading it.

What was the magic behind Brian’s idea was he was going to build a standardized way of capturing this information. So no matter how they were keeping their own records, they had a common format to upload it in, to maintain these key fields.

I’m curious, some of that data coming in – we talked a bit here so far about the textual data… I remember earlier in the conversation you mentioned something about images as well. Did you have a mixture of different types of data? Was imagery used as input, or was it mostly text-oriented?

Chandler McCann

The image data has always been there. There’s S3 Buckets or Dropbox files that are storing this image data… But we really haven’t leveraged that much until I guess within the last two months or so at DataRobot. At first, our original models contained both text, numeric and categorical data, and as we’ve expanded, we began to integrate image data. Like I said, we’ve released our visual AI platform at DataRobot which allows us to incorporate images into the modeling, which is something we’re currently exploring and is really interesting.

In fact, an interesting use case from that is if I have an image that’s uploaded, but perhaps someone forgot to fill out the field of the technology, can we train a classifier to say that this image is actually a hand pump, or it’s a rainstand? Can we use image analysis as an intermediate step into data augmentation and cleaning?

Daniel Whitenack

[24:08] On that front, let’s say at least in version one of what you did in predicting these water point failures, after you had done some of this NLP and you started getting into thinking about how to predict these failures, what ended up being a good way to do that, or a way that you found out how to do that, and what portions of that data ended up being good predictors of that behavior?

Chandler McCann

Looking at the problem, we realized that there was some country to country variation… But some common things popped out. The data that turned out to be the most predictive across the board was the age of the water point; its function over time had a strong relationship to how old it was. Who installed it, whether it was a private, government, or sometimes a non-profit installer, was also predictive in certain areas… As well as also strong local effects. We saw things like the region of the water point having a relationship to its life. Places that were far away from the large city in that country may have low access to parts, and in some places they would tend to have a shorter life, all else being equal. Those were some of the things that jumped out at us.
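One simple way to eyeball a relationship like the one between age and failure, before any modeling, is to compare the share of broken points across age buckets. The Python sketch below shows only the mechanics. The rows are invented, the bucket boundaries are arbitrary, and the resulting rates are not the project's findings.

```python
from collections import defaultdict


def age_bucket(age_years: int) -> str:
    """Assign an age in years to a coarse bucket (boundaries are arbitrary)."""
    if age_years < 5:
        return "0-4"
    if age_years < 10:
        return "5-9"
    return "10+"


def broken_rate_by_bucket(rows):
    """rows: iterable of (age_years, is_broken). Returns share broken per bucket."""
    totals = defaultdict(int)
    broken = defaultdict(int)
    for age_years, is_broken in rows:
        bucket = age_bucket(age_years)
        totals[bucket] += 1
        broken[bucket] += int(is_broken)
    return {bucket: broken[bucket] / totals[bucket] for bucket in totals}


# Synthetic (age_years, is_broken) pairs, made up purely to exercise the code.
rows = [
    (1, False), (2, False), (3, False), (4, True),
    (6, False), (7, True), (8, False), (9, True),
    (12, True), (14, True), (15, False), (18, True),
]
rates = broken_rate_by_bucket(rows)
print(rates)
```

The same grouping could be repeated by installer type or region to look for the local effects described above.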

As you were engaging in this process and recognizing some of these constraints that you’ve talked about, was this particular engagement, in this AI for Good charitable approach - was it more or less the same as other data science projects, in terms of you’re still getting data, you’re prepping the data, and running it through your model… Or was there anything in your mind that distinguished it as something onto itself, something a bit different from your typical business scenario that you might otherwise be engaged in? I was curious if they were essentially all the same, or if there was something that made that stick out, from a process standpoint?

Chandler McCann

When we began engaging with the Global Water Challenge, we were just treating them like a regular customer. There was no AI for Good program formulated at that time, they were just being treated like a regular customer, and our job at DataRobot, at the time as customer-facing data scientists, was to enable them to own their own solutions. That involved teaching them how to fish. Working with Brian to help frame his problem better, understand the data with him, and then talking through all the blind spots and gaps in the modeling process that would come up along the way, along with helping him interpret his model.

So in that sense, it wasn’t unique from a process perspective, but it was unique in the level of access to the data that I could get with clients. With Brian’s data, it was a side-by-side partnership. Everything was available to me, and obviously there can be restrictions when you’re dealing with certain private companies, when it comes to the level of access, the data you can get. So that was nuanced.

But I think the takeaway for me and what really helped us when we built the AI for Good program was that if we treat these non-profits just using the same process we do with our customers, they can own and even build these solutions over time themselves, and that was something that was really inspiring to me.

Daniel Whitenack

As we get into how this inspired more AI for Good efforts at DataRobot, and also your learning from how to work with a non-profit, and that sort of thing, I would love to hear about where this project ended up in terms of positive or negative results, and then how that inspired further – it sounds like things have expanded past that, so I’d love to hear about that story.

Chandler McCann

Where it ended up with the Global Water Challenge was really interesting, and to be fair, it’s still an ongoing story. During 2019 we were able to go to both Sierra Leone and Liberia. First just with DataRobot to Sierra Leone, and the second time to Liberia on behalf of the State Department, where I was asked to be a part of the Water Expert program. I went with GWC to participate in a water data workshop in Liberia.

Daniel Whitenack

That’s great.

Chandler McCann

It was very cool, and it was just an awesome experience from a data science perspective. Going from where I’m working over here, pulling data out of a table, to actually going on the ground and meeting with the people who are collecting it, and having conversations with them and trying to communicate the power of machine learning and the importance of the data that they were collecting, and how it could be used… It was just a humbling and awesome experience, all wrapped up into one.

So following these two trips, we had very positive relationships with the government of Sierra Leone. In 2019 the ministry of water kind of reaffirmed their commitment to evidence-based decision-making, and actually passed a national policy requiring the use of data in decisions about water services… Which is pretty cool, and also, again, humbling. If you think about it, I think the story of this project is the story of the power of data, and what it can do… And if you think about it, in 2018, the use of data at the national level in Sierra Leone to inform decision-making by policymakers, again, with very constrained budgets, was very low. And then in 2019, we do know that working with the ministry of water, they were able to use some of the insights from our tool to inform decision-making and budgeting that year. So that was definitely exciting and a near-term win for us on the project, and that helped shift things out of this R&D phase to where Brian and the Global Water Challenge are in the process of pursuing more funding. If you wanna contribute to them, feel free.

We’re looking to build a much more sustainable tool, that can be deployed to many countries around the world. So it kind of was a launching point for the project.

[32:13] Yeah, that’s a super-cool story. I just wanted to note to listeners that DataRobot on its AI for Good page has a video of you and Brian doing this work together. I wanted to call that out, and we’ll include it in the show notes. It’s just a few minutes long, and I would urge anyone to take a look at it just to kind of get the imagery of what you went through.

I think you surprised me a second ago when you were talking about being able to shift national policy. That intrigues me, because we have the privilege of talking to people that engage in AI for good on a fairly regular basis, and hear about cool projects, but most of them don’t change a country’s policy towards evidence-based action from the work they’ve done… And I kind of wanted to get a sense of how did that feel, when you realized that not only had you had the specific impact that you had by providing the service that you walked into the engagement with, but when you realized as well that you were actually changing the way a country was thinking about using data to effect good for its population, what was that like? I’ll let you answer that, and then I have another follow-up.

Chandler McCann

Yeah, I remember getting the email saying that they were leveraging the insights from our models, and even the simple data visualizations were a huge leap ahead for them. I remember hearing that and being told that it was being used to inform the budgeting process for the following year, and I was just floored. That made me a little – I don’t know if “scared” is the right word, but I was just like “Wow, I did not realize this was gonna be happening so quickly…” But at the same time, it was just another proof point that people were starving for information. You have people that are trying to make decisions that impact a lot of people in their citizen base, and the ability to just synthesize a little bit of information can go such a long way. So I think we’re just on the frontend of the wave when it comes to the way to leverage data across these developing nations for water policy.

Daniel Whitenack

Yeah. Just to interject there, I totally echo what you’re saying, and I know some of the proof of concepts that we’ve done around dialogue systems and chat interfaces in emerging markets – to be honest, the interface has not been that great. It probably wouldn’t fly in the U.S. But people are so hungry for information, accurate, good information in certain contexts, that to some degree that doesn’t matter. Of course, we strive for good interfaces, but I totally resonate with what you’re saying, that it just can be so powerful to be in these situations where you’re creating something that allows a new view onto information that people are so hungry for… So yeah, it’s really cool. Chris, did you have a follow-up?

Yeah, I wanted to reference another thing that you had done… I know you have a blog post called “Why most tech for good campaigns fail, and how we can fix them.” I was wondering if you could address those steps that you take us through and why that works, but also if you have any thoughts about how people or organizations like yours that are trying to do these AI for Good projects might be able to influence or impact policy going forward.

I know it kind of surprised you in this case, but if you have any thoughts toward even how to extend this so that you might get systemic change for the long-term in how a government thinks about this - I’d love to know that, what your advice is in general.

Chandler McCann

[35:52] Sure, yeah. So two parts - how do we look at delivering this in the most effective way for non-profits, and then for companies interested in this space, how do they think about potentially impacting policy. I’ll tackle the first one… So when it comes to why we were observing a lot of these tech for good initiatives - and I hate to say “failing”, but just not delivering the results they intended, a lot of that was just based on experience from Brian and what we heard, and his life at the Global Water Challenge, and his exposure to people in this sector.

The big idea is that you need to partner with these non-profits and NGOs and help them build solutions that they can maintain… And hackathons are all well-intentioned, but it’s probably not realistic to expect a small non-profit without a big data science team to maintain the codebase over time.

Even if they build an app, models get stale; models need to be refreshed. So our idea was – and we happen to have the benefit of a very powerful enterprise AI platform behind us, but just help them understand how to think through their problems… So how do we appropriately identify use cases that matter to them, how do we frame them, how do we think about the ethical considerations for what we’re doing, how do we think about acquiring data for it appropriately, and then how do we go through the iterative model-building process, and then how do we deploy things in a way that is useful to people?

The truth with data is that – and I’m of the opinion that value really isn’t realized until it’s consumed by somebody. So we can build models till we’re blue in the face, but until someone’s doing something with it, it may not be super-useful, outside of perhaps insights, which I would still argue is still consumption.

So our process within the AI for Good program today is structured to help non-profits go from “This is my big vision” to “How do we deconstruct this to a machine learning problem?” and then how do we go from “This is my idea” to “This is my deployed model” in a structured way, and then teach them how to learn the process through each of those toll gates.

Daniel Whitenack

Yeah, it sounds like good advice for really any data science program, in my opinion. A lot of those things ring true.

Chandler McCann

Yeah. On the second part, on the policy side, I hope that more companies continue to provide resources to non-profits and NGOs. I think what’s true in this space is that they may be a little bit behind private companies when it comes to collecting data and storing it and organizing it… But they’re coming along, and times are changing quickly. So the situation where you have a lot of non-profits with potentially a lot of very rich, interesting data that aren’t sure what to do with it - that grows a lot.

So I would love for other companies to continue to get involved and offer their services, and I would just say that, again, we just have to account for the fact that machine learning models are living things that go stale over time, and we need to help our end users build solutions that account for that.

Daniel Whitenack

I’m curious as we get to a point of closing things out - I know you mentioned earlier getting that email asking about “What if you had more people to work with on this initiative?”, where are things right now with the AI for Good initiative at DataRobot and what are things you’re looking at in the future?

Chandler McCann

Yeah, so that was very exciting, and I can’t thank Jeremy and the team enough for supporting us to build out this vision… But June of last year we sort of launched our program and opened up our application process, and we actually got applications from ten countries on five continents, which was pretty exciting for our first year of the program and its inception.

[39:52] Since then we’ve been working with six non-profits. Kiva, which is obviously a large lending platform in the non-profit space right now, so helping them predict the likelihood of which loans would go unfunded or not. DonorsChoose, helping provide supplies to teachers and classrooms, Anacostia Riverkeeper here in Washington DC, helping them forecast E. coli levels in the river, given some sensor data that’s being read, to some very interesting healthcare use cases with the University of California San Francisco, the Zuckerberg Spinal Cord Institute, where we’re looking at ways to use OR data to help improve outcomes for spinal cord surgery… And then finally, working with the University Hospital of Mannheim over in Germany, where we’ve been forecasting top World Health Organization causes of death, as well as predicting patient mortality given people that come into the hospital.

So those are some of the use cases we’re working on now, and we’re excited to just continue to broaden the impact to people we’re working with, and keep the program growing.

Fantastic. Chandler, thank you so much for coming on the show and telling us about DataRobot and the things you’re doing or have done with the Global Water Challenge… It was really fascinating and inspiring. Thank you for doing that work, I really appreciate it.

Chandler McCann

Absolutely. Thanks for having me on the show. It’s really been fun talking to you, and let me know if you guys ever wanna catch up again.

Will do. Thank you.
