In this special episode, we interview some of the sponsors and teams from a recent case competition organized by Purdue University, Microsoft, INFORMS, and SIL International. 170+ teams from across the US and Canada participated in the competition, which challenged students to create AI-driven systems to caption images in three languages (Thai, Kyrgyz, and Hausa).
Hi everyone, this is Daniel, coming to you with a slightly different episode of Practical AI this week. Recently, Purdue, Microsoft, INFORMS, and a few others put on a case competition, which included student teams from across the nation, around 170-something teams, all working on a shared task related to image captioning. This is a task where an image is input to a model, and then the job of the model is to output a text caption corresponding to that image. And I had the privilege of getting to be one of the judges for this competition, and I took the opportunity to interview some of the sponsors and the participants in the challenge.
Also, it was really fun, because the competition used some of our data, the data from SIL International and the Bloom captioning dataset, which includes image captioning data for a lot of languages, but specifically this competition focused on image captioning in Thai, Hausa and Kyrgyz. So I hope you enjoy the discussion of this challenge, and here we go.
Welcome to a very special episode of Practical AI. This is Daniel Whitenack, I’m a data scientist with SIL International, and this is a very special episode, because I’m here at Purdue University, judging a really interesting case competition, Data Analytics for Good, that’s sponsored by Purdue University, Microsoft, SIL International, my organization, and INFORMS, and I’m here with Matthew Lanham, who is the Academic Director of the MS Business Analytics and Information Management Program (BAIM), which we’ve had the privilege of getting to know each other over the past few years, and it’s really cool to collaborate and judge this competition. Matthew, could you tell us a little bit about it and how it came about?
Sure, yeah. So as the Academic Director for the Business Analytics program, my job is basically trying to make sure our students are involved in analytics and data science competitions. And over the last seven years, since we’ve had this program, our students have won or placed in many of these national competitions. So we’ve got a really well-established brand and name out there, and we thought, “Hey, why don’t we create our own national data analytics competition? And let’s do something that’s for good, not necessarily just focus on trying to make money.”
Yeah, yeah. So tell us a little bit about the actual problem that the students are working on, and maybe a little bit about the mix of who was involved in the competition from across the nation.
Sure. So the actual problem is sponsored by your company, SIL International, and basically, what they’re trying to do is use natural language processing to do image captioning, which is not a trivial task, by any means. So when we put this problem out there to the students, they’re like, “Oh my gosh, what is this?” And the great thing is it’s not something that you would see in a traditional NLP course, this kind of problem, and there’s been a lot of great learning involved.
Overall, we had 172 teams across the nation apply and register for the competition, and there were 36 universities that were represented. Two of those were outside the United States.
Wow. That’s really great. Yeah, the competition, so this image captioning - it’s been cool to see, because recently, SIL put out this dataset around image captioning… And it was convenient timing, because about a week or so later you reached out and said, “Hey, we’re running this cool case competition. Do you have any cool datasets to work on?” And that worked out really well. I think I’ve heard students all learning a lot about natural language processing, but also the world’s languages. So when you try to do image captioning in Thai, for example, there are no spaces between words, and you can’t tokenize words just by splitting on spaces… So just even realizing things like that has been quite interesting to see for students. And I know I’ve worked sort of halfway through the day of judging at this point that we’re recording, and I’ve already been surprised and encouraged by a lot of the solutions.
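To make the Thai tokenization point concrete, here is a minimal sketch in Python. It is not what any team in the competition used; the three-word `TOY_VOCAB` lexicon and the `longest_match_segment` helper are hypothetical, and real Thai segmentation relies on much larger dictionaries or trained models. It only shows why splitting on whitespace is not enough.

```python
# Toy illustration: whitespace splitting vs. dictionary-based segmentation.
# TOY_VOCAB is a hypothetical three-word lexicon, not a real Thai dictionary.
TOY_VOCAB = {"แมว", "กิน", "ปลา"}  # cat, eat, fish


def longest_match_segment(text: str, vocab: set[str]) -> list[str]:
    """Greedy left-to-right segmentation: take the longest vocab entry at each position."""
    max_len = max(len(word) for word in vocab)
    tokens = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if candidate in vocab:
                tokens.append(candidate)
                i += size
                break
        else:
            # Unknown character: emit it alone so the loop always advances.
            tokens.append(text[i])
            i += 1
    return tokens


english = "the cat eats fish"
thai = "แมวกินปลา"  # "the cat eats fish", written without spaces

whitespace_english = english.split()  # four separate words
whitespace_thai = thai.split()        # one unsplit chunk
segmented_thai = longest_match_segment(thai, TOY_VOCAB)
```

Splitting the English sentence on whitespace yields four words, while the Thai sentence comes back as a single chunk until a lexicon (or a learned segmenter) is brought in.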
One of the other sponsors of the competition is INFORMS, which I’d love people to know a little bit more about what INFORMS is generally, because it is a vibrant and large community. Can you tell us a little bit about what that is?
Absolutely. So INFORMS stands for the Institute for Operations Research and Management Science. It is my favorite professional organization. So it’s been around for many years, and we used to call it Operations Research and Management Science, but now we refer to a lot of the stuff that we’ve done for years as analytics and data science. Within INFORMS, there’s also a certification program called the INFORMS Certified Analytics Professional, or CAP. So I’m a CAP, I was one of the first CAPs many years ago… And basically, the whole idea with the CAP is you don’t have to be a technical person to get the CAP, or you don’t have to be just the business person. It’s really for everybody, and the whole idea, how it came about was the people at INFORMS worked with business professionals from all different areas of analytics, data science, and operations research, to identify what are the key kinds of tasks that you would do as a professional. And what they did is they came up with basically seven domains; business problem framing is the first one, then analytical problem framing, knowing your data, methodology or approach selection, model building, deployment, and lifecycle management.
[06:16] And if the audience hears those things, they’re probably thinking, “Hm… That sounds a lot like CRISP-DM, that we all heard about in school, or at some point following a process.” And that was the thing, is a lot of times when we’re working on these problems, you’ve got to follow a process. And they kind of extended that CRISP-DM framework, and there’s just a lot of tasks within there that we hope people are aware of, and think about when they try to develop solutions in practice.
Yeah, and we’ve been utilizing that framework in the judging here at the competition, which has been really, really useful, I would say, to consider these different elements of the process. How would you say, after working with teams in this process for years now actually, like thinking about these different elements - how do you think that kind of rounds out someone’s view of solving an actual business problem with AI, or analytics, or data science in ways that maybe is sometimes neglected in a lot of just sort of process, when you step into a problem…? What are the main areas that you think kind of stretch students, but then as they go into the professional workspace, how do you think that sets them up for solving real-world problems?
Great question. And this is why I just love INFORMS CAP, and why we make our students follow the INFORMS CAP when they do projects with companies, is because you’ll see a team that maybe they’re the data science team, the real technical team, and they love to get into the nitty-gritty details, and there’s absolutely nothing wrong with that. But at the same time, you need to know your audience, and I think just following those seven domains for INFORMS CAP is important, because before you even get into the nitty-gritty details of your problem, you need to be able to say, “Well, what is the business problem here? And then how do we frame it? What are the possibilities of framing it into an analytics problem?”
So that’s the front end of the thing, right? And then you’ll get in the middle part, which a lot of the stuff that we would talk about as data scientists is the data, the methodology, the model building, all the stuff that we really like to do… But then the last part of it, the deployment and lifecycle management - that’s so key. Right? So that’s when you get into architecting, and developing the pipelines, all the stuff that I know you’re an expert in… It’s so key.
So basically, that’s what the INFORMS CAP is doing, is to say “Hey, let’s lay all this out to make sure that when we architect a solution, we design a solution, we try to create a solution to our problem, that we’ve thought about all these things, and we haven’t missed anything along the way.”
Yeah. Well, thank you so much, again, for helping organize this, Matthew. It’s been a pleasure, and I really just appreciate your work on this, and also your work with INFORMS.
Thank you, Dan.
I’m not sure exactly how the funding happened. It was just about Azure, and the [unintelligible 00:08:52.04] So that’s actually a really important point. So one of the interesting things about this competition that’s different than all other competitions that my students have participated in, is we designed it where there’s three phases. And phase two is where they actually work on the problem provided by Dan’s company, but the phase one is we wanted to actually provide some training on cloud services, which a lot of people in industry, they know - you’ve got to be familiar with the services if you’re going to architect a solution and put it into practice.
So Microsoft offers free training on their Azure AI… Basically, everything’s free if you’re a student, and they also offer students free practice exams and certification vouchers, which is amazing. So we told them, “We could get a whole bunch of students to participate in your training events if we can kind of piggyback off of you”, and they said, “Absolutely. We love this.” So that’s what the students did in phase one - they had some training from Microsoft professionals, some of them even sat for certifications… And then in phase two, the goal was to try to apply some of those web services for this particular problem.
[10:02] The last phase, phase three, is when the top teams that perform the best in the Kaggle competition would come on campus, present their solution, and then show how they could follow the INFORMS, the seven INFORMS CAP JTAs to architect their solution. That’s how it all came about.
Awesome. Yeah. And that definitely brings us right into the Microsoft involvement with this competition, which I’m also pleased to have with us Mark Tabladillo, who is a cloud architect with Microsoft. Yeah, thanks for being here, and being part of the competition, and also Microsoft’s involvement in this.
It’s been interesting, as we’ve seen some presentations already, even to hear how students are sort of making that realization about “Hey, I’ve been working on my laptop, solving maybe like data science toy problems in my courses, or something like that… But I got to this problem.” And even one of the student groups said, “Hey, I bought more RAM for my laptop to try to solve the problem”, but then they were like, “That’s not doing it”, so then they started thinking about cloud services.
So how do you think, Mark, as – I guess my question is, as students and maybe people getting into the field are kind of making this realization around the resources required to solve actual business problems, what are those ways in which they can kind of start dipping their toes into cloud, and experiment with things to kind of expand their horizons in terms of what’s possible without knowing about – maybe people don’t know about Docker and Kubernetes, and all that stuff yet, but they want to start dipping into more resources… What’s a good way for people to kind of get into that, as you’ve seen these students kind of do that?
Sure. And I think there are more resources available than ever to do self-learning… And it was maybe 10 to 15 years ago where it was very common to go to a bookstore and find these big, thick books on Microsoft –
With animals on the front, and such?
Yeah, maybe… You know, they would be training books for technologies. And of course, the publishers are still out there. O’Reilly is still out there, producing books, and I have a friend, by the way, who’s coming out this month with his book on practical machine learning and AI, Jeff [unintelligible 00:12:17.05] and I’m so proud of him to write a new book. But the point is that so many things are online. And in the Microsoft ecosystem, there was a time when you had to pay to even get the proceedings from a conference. Like you didn’t even go, and you couldn’t even get the recordings. Well now, Microsoft’s making a lot of that available for free, and in a way that people can find it. And so I think for the audience of this podcast, I’d love to have them look on YouTube, what’s available there; Microsoft’s got a few channels of content on there… And that’s a good way to get started. Sometimes they’re short sessions, between 5 and 10 minutes, sometimes they go to an hour. But that’s definitely one way to get started.
Yeah. And could you help us – I think it’s good for people to kind of organize certain categories in their mind. I’ve heard students talk about like the Microsoft Azure kind of studio environment, and then there’s other things, like these cognitive services and managed AI services… So how do these things differ, and how might they be used, or how have you seen them being used, either in this competition or other places?
Okay. So the unifying thing is either - and you can pronounce it either Azure, or Azure; both correct, both correct…
Good, good. It’s good to have the definitive answer on that.
This is a definitive answer for all time. I tend to use Azure, but just out of habit. The unifying factor is Azure Active Directory. So that’s the main authentication path going into an Azure subscription. And the subscription itself is - think about it like a credit card. If you sign up for a subscription, you would have to put your credit card on there to pay the bills.
[14:04] Again, another free path - Microsoft offers a lot of things… We have free subscriptions, and we even send our customers to go get them. And I even tell my customers – so you know, it does run out. So I say “Well, go make a new email at outlook.com and then just make a new one, alright?” We want you to get hands-on, because there’s no substitute for experience. And that’s even kind of the point of this competition. You know, you can study it in the book, you can do class exercises… But two things are true; first of all, putting into practice, and working as a team. And that’s what we’re doing in this competition.
So back to your earlier question… Now that a team may all join the same subscription, the Azure Machine Learning Studio is our flagship technology for machine learning. And it can run on regular CPU or GPU instances. We have regions available around the world. I happen to work in the federal space, so we also have specific clouds just for that use… And there’s different sovereign clouds now. And Microsoft’s now beginning to build out specialty clouds on top of our regular clouds.
And also a call-out too for students - we have a lot of promotionals for students, where they get things at discount rates… And also nonprofits; nonprofits will get special treatment inside the Azure ecosystem. But let me go back to the original question. So there are two focuses that people will have; one is machine learning, and it tends to be thought of now as building a model. That is the way to think about the products. AI is the marketing term for all our technologies now. So if you go to the main website, everything is AI. However, inside the Microsoft technology, there are cognitive services. And those are considered mostly APIs. They’re REST APIs, and they’re already pre-trained models, and they do certain things. And we’ve seen some of the teams here at the contest, they’ve been using maybe computer vision, or text-to-image, or image-to-text… You know, those types of technologies are already out there.
Now, Microsoft’s not unique. Other vendors have these. And Microsoft is now behind the scenes supporting all open source technologies. Some come pre-built inside Machine Learning Studio. Like, we have a certain version of a Python kernel that will run inside there…
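The idea of a pre-trained model sitting behind a REST API can be sketched with nothing but the Python standard library. This is an illustrative sketch, not an official sample: the resource name in `ENDPOINT`, the key, and the image URL are placeholders, and the `v3.2/describe` route is an assumption about which Computer Vision API version a given subscription exposes. The request is built but deliberately not sent.

```python
import json
import urllib.request

# Placeholder values: substitute your own resource endpoint and key.
ENDPOINT = "https://example-resource.cognitiveservices.azure.com"  # hypothetical resource name
API_KEY = "YOUR-KEY-HERE"  # placeholder, not a real key


def build_describe_request(image_url: str) -> urllib.request.Request:
    """Build (but do not send) a POST request asking the service to describe an image."""
    body = json.dumps({"url": image_url}).encode("utf-8")
    return urllib.request.Request(
        url=f"{ENDPOINT}/vision/v3.2/describe?maxCandidates=1",
        data=body,
        headers={
            "Ocp-Apim-Subscription-Key": API_KEY,
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_describe_request("https://example.com/storybook-page.jpg")
# Sending it would be: urllib.request.urlopen(request), then parsing the JSON response.
```

No model is trained or hosted by the caller here; the whole interaction is an authenticated HTTP call. A caption returned this way would still need translating into the target language, which is a separate service call.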
And another technology - okay, so going back to the Azure Machine Learning Studio, we have doubled down on MLflow as the way we are organizing our workspace. So on the new version of the API, that is the path forward, and that is open source.
And could you just describe a little bit what that is, MLflow?
MLflow. So it is a way to organize experiments, and training, and models, and model deployment… And it has its own syntax in terms of vocabulary and API, the way to work. And Microsoft’s not alone in using MLflow. There’s other vendors that use MLflow in their technology… But it is a way to organize kind of the technology and the assets. And Microsoft decided to make that native to our own API, and how that works.
Great. Yeah. And I think one of the things – like, as we’ve been in the presentations here today, I think when I was in grad school, I programmed, but it was like, I’d use MATLAB, or Python, or whatever, and I did some things… I had no concept of how infrastructure worked in industry, or the whole thing about doing programming in Academia - it’s just not always a parallel to industry. So coming from the Microsoft perspective, I’ve found it really encouraging to see students saying things like object store, or model registry, or like these things, and thinking through the architecture… I know that’s one of the things in the INFORMS, they emphasize that sort of model deployment and model lifecycle management.
[18:07] So yeah, do you have any words of encouragement for maybe those listeners who are, again, getting into the space, or maybe they’re students, in terms of getting hands-on with actual infrastructure that people use in industry, and how that benefits kind of your understanding of how to create value with what you’re producing, rather than just creating a cool model?
Right. So let me call out a few things… Now, some people aren’t so lucky to be even admitted to Purdue. And if they were, I would certainly want to come to a program such as here, and you can participate in all these cool events. But short of having that either undergraduate or graduate experience, Microsoft has – and this is what I believe is the front door; we have something called AI business school. It is a series of courses that show how to tie in the value of AI in a business context. And a lot of the videos were done by our own leadership, and they’ve shown how we’ve used AI inside the Microsoft business. Now, it’s not intended to be a catalogue of all possible ideas, but it does kind of cover the landscape of – you know, along the lines of the INFORMS domains, it covers the landscape of “We have a challenge. Here’s how we’re going to use data modeling, and then put it in production, and here’s how we’re evaluating use.” And it just gets people started. So it’s something I do recommend to our customers, because people do have different roles… And even one we’re working on internally, we’re now rethinking through who are the personas of people who touch data projects. We talked about the domains, but we’re going now to thinking about “Alright, so who’s that person? What does that person do? Are all modelers the same?” We don’t think so. So we’re now beginning to think that because we’re now serving a large internal community inside Microsoft in terms of our programming.
So that’s the first thing I think about, is the AI Business School. And then also in terms of getting started, we have inside all our technologies, we have tutorials and samples to get started; those sample datasets, notebooks that run quickly… They don’t take hours, but they show you a variety of things and they’re guided toward specific outcomes. They begin to get better. I mean, I’ve seen Microsoft examples and how they have grown in the last 15 years, and they’re just getting better and better, because they’re getting better minds thinking about it… But I’ll also call out – you know, Microsoft also is always looking for partners that want to share their stories. And we have a lot of case studies of companies doing things in different industries, whether it’s for profit or nonprofit… I’ll call out one example - we work with the Metropolitan Museum of Art to digitize their entire holdings. Now, I don’t know if they did 100%, but they had the same challenge as a lot of art owners, and that is not all their collection is on display. And some researchers want to have access to those products. So that’s an example of - you know, anytime we do work with major organizations, we put those ideas out there.
But more practically - and these may even be helpful for students or users - we have what’s called the Azure Architecture Center. And inside, we see many architectures very similar – by the way, we’re only looking at the top presentations, but we are seeing architectures presented, and it’s the type of thing that I do in my own work. The architectures will be in there, the diagrams, and also the case study of what it did and kind of what the use case is. So it gives people, again, a catalogue of different ideas of how do you use the different resources that are available. So between all that, it’s a lot.
Yeah, thank you so much, Mark. We’ll definitely link both to INFORMS, what they’re doing, to Purdue and the BAIM program, and to these resources from Microsoft in our show notes for the podcast… So make sure and check all those things out. Thank you again, Mark and Matthew, for what you’re doing on this, and looking forward to hearing the rest of the presentations.
Thank you, Dan.
Alright, well, I’m here with the winning undergrad team from the competition, the image captioning competition, which is from Butler University. I’ve got Chris Stein, Andrea Marquis and Aaron Pinner with us… So congratulations on winning the undergraduate portion of the competition.
Yeah, thank you very much.
Yeah. So your solution was really interesting. Actually, all the undergraduate presentations, I was surprised, they seem like graduate student work to me… But tell us a little bit - so the task, again, was image captioning. So just tell us one highlight about maybe one of the challenges that you faced in the competition.
So I think that one of the big challenges in this case was about the dataset, because for certain languages, like Hausa, that we had to work on, the dataset was kind of small… And also, there was not much variety in the pictures… So that was the biggest challenge. So we kind of overcame that challenge by either artificially augmenting the dataset, or adding new pictures to the dataset.
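Artificial augmentation of a small image-caption dataset can be sketched as follows. The team did not say which transformations they applied, so the horizontal flip and brightness shift here are assumptions, and the tiny grid of grayscale values stands in for a real image.

```python
# Minimal augmentation sketch on a toy grayscale "image" (rows of 0-255 values).
# The specific transforms are illustrative assumptions, not the team's method.

def flip_horizontal(image: list[list[int]]) -> list[list[int]]:
    """Mirror each row left to right."""
    return [list(reversed(row)) for row in image]


def adjust_brightness(image: list[list[int]], delta: int) -> list[list[int]]:
    """Shift every pixel by delta, clamped to the valid 0-255 range."""
    return [[max(0, min(255, pixel + delta)) for pixel in row] for row in image]


def augment(image: list[list[int]], caption: str) -> list[tuple[list[list[int]], str]]:
    """Return the original pair plus two altered copies that reuse the same caption."""
    return [
        (image, caption),
        (flip_horizontal(image), caption),
        (adjust_brightness(image, 10), caption),
    ]


toy_image = [[0, 50, 100], [150, 200, 250]]
augmented_pairs = augment(toy_image, "placeholder caption")
```

Each original pair becomes three training pairs. One caveat: a flip can invalidate captions that mention left or right, so transforms need checking against the caption text.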
Great. Yeah. And specifically, part of the competition was thinking about different languages where maybe image captioning isn’t supported, and I think one of the things I appreciated about your all’s presentation as well was thinking through the business implications of something like this technology of image captioning, that could enable new or expanded possibilities for local language communities that don’t have this technology… Could one of you comment on maybe what you envision in terms of the impact something like this could make in terms of image captioning for a language where it’s not supported yet?
Yeah. So our idea was almost to create a web app, or a mobile app, that small businesses using like Kyrgyz, or Thai, these small languages could go on this app and submit their photos there. So everyone’s got a cell phone, in all communities nowadays, and if they can utilize that cell phone to almost leverage it, and upload those pictures right then and there and get a caption - that is handy for the small business, and SIL and their mission.
A small business would want to use this because a lot of people are drawn to websites because of images. They click on images on Bing, and Google… So if we can help small businesses, especially if they have a user base with a minority language, that helps both SIL and the company. So really, there’s a monetary win, but really, we’re helping the world in a way, so it’s really neat.
Yeah. And also – sorry, if I can add something… Last night we had a brief talk about the languages around the world… And so also there is not only a business implication to this challenge, but we know that also we are losing a part of our heritage. You mentioned last night - every two weeks, one language is lost forever. So there is no way that we can keep those languages alive. So if this can also help make the world more open towards those small communities, this is also a good thing for the world, because it makes the world a more interesting place.
Yeah. Awesome. And maybe a comment from Aaron… As far as this competition, maybe what’s one of the highlights of something that you learned throughout the competition, that you view differently now, either in terms of the technical challenges and that side of things, or the business problem, or something that you’ll carry with you throughout the rest of your work?
Andrea Marquis: Yeah, I think one thing I learned was just – I think the challenge really opened my eyes to this problem that existed. I would have never thought about using AI or machine learning in a way that directly impacts languages. So I think that was definitely something I learned and it was really interesting.
Great. Well, congratulations again. I hope your travels back home are safe, and… Yeah, congratulations. I hope to stay in contact.
Yeah, thank you very much.
Okay, well, I’m with now the winning graduate team from the Purdue Using Analytics and Data Science for Good competition. This team is from Georgia Tech; here I have with me Harsha, Varun, Ravi, and there was another team member, [unintelligible 00:26:58.00] who couldn’t make it to the competition here in-person, but I want to acknowledge her and her contribution… So congratulations, first of all. You all are the first out of - I think it was 170-something teams in this competition to come up with an image captioning model that performs well on three sort of diverse languages from around the world: Thai, Kyrgyz, and Hausa. So first off, congratulations, and I think one of the things that was really interesting to me about your all’s solution is one, kind of looking to state-of-the-art models like CLIP, which was something that featured in your solution… But then also using sort of a multi-stage approach where you actually determined if a caption existed already, that you had in a database of captions; and then if it didn’t exist, generating a caption. So could one of you describe a little bit about how you kind of eventually got to that solution, how you considered using CLIP and got to thinking about that direction?
So I think the problem itself was quite challenging when you look into the dataset and actually see what the data is… You can see poems, you can see philosophical statements, moral statements, parts of stories. And if you want to predict these kinds of statements, you need information about the previous part of the story, or the further part of the story, in order to even build a model.
A zero-shot captioning model – it’s very difficult to achieve a good zero-shot captioning model for such kind of prediction tasks. So the next step that we thought was maybe we could do some sort of classification model. That was the original thought process, that we could select – from a corpus of sentences, can we select a sentence that best matches this? And from there, we started researching, basically, and when we went through Hugging Face models, we found the CLIP model, and then we researched further, we found a multilingual CLIP model that could handle different languages… And it sort of went through that process. And when we actually used it, it was decent. I wouldn’t say it was perfect, but it certainly improved our overall solution quite a bit.
Yeah. So when you were thinking about this idea of looking to existing captions, and using those when you could, how often in the dataset that you were looking at, which is this Bloom dataset, how often did you have to generate image captions, versus maybe looking to a list of captions and using one that pre-existed?
So even in the training data set where we had the images and all the captions that we needed, when we actually used the multilingual CLIP model, it was more about like 20%, 30% that were matching, and we had a very low threshold by that itself… And we didn’t want to lower the threshold, because we didn’t want to get more false positives, in a way. And basically, we just decided on that threshold, we didn’t do any optimization on it particularly.
From there, when we actually used the model on the test set, we suddenly got a huge jump in the score. That was basically it. So we were covering about 20% to 30% of the images. Even when you had all the captions for the images, only 20% to 30% were actually matched with the multilingual CLIP model. All the other images went to the generator model.
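The two-stage flow described here, matching against existing captions first and generating only when nothing matches well enough, can be sketched in plain Python. This is not the team's code: the three-dimensional vectors are made-up stand-ins for multilingual CLIP embeddings, the English strings stand in for Thai, Kyrgyz, or Hausa captions, and `THRESHOLD` and `fallback_generator` are hypothetical.

```python
import math
from typing import Callable


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def caption_image(
    image_vec: list[float],
    caption_bank: dict[str, list[float]],
    threshold: float,
    generate: Callable[[list[float]], str],
) -> tuple[str, str]:
    """Reuse the best-matching known caption if it clears the threshold, else generate one."""
    best_caption, best_score = None, -1.0
    for caption, caption_vec in caption_bank.items():
        score = cosine_similarity(image_vec, caption_vec)
        if score > best_score:
            best_caption, best_score = caption, score
    if best_caption is not None and best_score >= threshold:
        return best_caption, "retrieved"
    return generate(image_vec), "generated"


# Toy embeddings (assumed values, not real CLIP outputs).
CAPTION_BANK = {"a dog runs": [0.9, 0.1, 0.0], "a red boat": [0.0, 0.2, 0.9]}
THRESHOLD = 0.9  # hypothetical cut-off; the team chose theirs without tuning


def fallback_generator(image_vec: list[float]) -> str:
    """Stand-in for the separate caption generator model."""
    return "<generated caption>"


close_match = caption_image([0.8, 0.2, 0.1], CAPTION_BANK, THRESHOLD, fallback_generator)
no_match = caption_image([0.5, 0.5, 0.5], CAPTION_BANK, THRESHOLD, fallback_generator)
```

Raising the threshold sends more images to the generator and lowers the risk of attaching a wrong existing caption, which is the false-positive trade-off mentioned above.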
Yeah. And you’ve mentioned sort of CLIP, Hugging Face… These are all kind of the industry-standard state-of-the-art sort of things… As a team getting into this problem, what were the kind of challenges that you faced in terms of maybe even finding where to start, or maybe it’s computational challenges, or other issues…?
So when we initially started with this dataset, we were stumped, honestly. We hadn’t even heard of a model that could generate contextual information with as much depth as was required by the solution here. So as I said, we did our initial EDA with Microsoft Azure, using their Computer Vision API and translator model. When we actually used that, we thought “Okay, these are reasonable guesses.” Like, “Okay, a human would make these guesses.” But we had to go deeper. So we had to match – we had other ideas as well. We thought of clustering common images; maybe they belonged to the same story, or they’re part of a single book, or something like that.
So that’s how we got started off… The EDA that we did helped a lot; understanding that they were poems and stuff mixed into the data helped us look for more deeper models that could generate context.
Great. Yeah, yeah. Thank you for that info. So I think as you look forward to – I mean, I know all of you will be going very far with the innovations that you’ve demonstrated here… I hope that maybe when you own billion-dollar startups, that you’ll hire me to sweep the floors in your startup, or something… But how do you think kind of working on a solution like this from start to finish has influenced how you’ll think about maybe AI or data science problems in the future? Any input?
So the amount of good the AI can do [unintelligible 00:32:12.26] SIL does - it’s actually improving a lot of language proficiency among the students, and also increasing the educational rate among the people who are not studying so much. So the amount of good that data or AI can do will definitely influence our thoughts in the future as well… So the kinds of use cases all these things can have on the real lives of the people are definitely going to [unintelligible 00:32:42.20] and we’ll definitely try to contribute wherever we can by keeping this in mind. So this is going to stay with us forever.
Great. Great. Well, thank you for your participation, and congratulations, again. I hope your travels are safe back home.