While at EMNLP 2022, Daniel got a chance to sit down with an amazing group of researchers creating NLP technology that actually works for their local language communities. Just Zwennicker (Universiteit van Amsterdam) discusses his work on a machine translation system for Sranan Tongo, a creole language that is spoken in Suriname. Andiswa Bukula (SADiLaR), Rooweither Mabuya (SADiLaR), and Bonaventure Dossou (Lanfrica, Mila) discuss their work with Masakhane to strengthen and spur NLP research in African languages, for Africans, by Africans.
The group emphasized the need for more linguistically diverse NLP systems that work in scenarios of data scarcity, non-Latin scripts, rich morphology, etc. You don’t want to miss this one!
Matched from the episode's transcript 👇
Bonaventure Dossou: Before leaving the floor to the ladies, I would like to just second what Just said. I also had the same struggle with Fon, because I started and nobody was working on Fon, nobody knew about the language… And that was also something interesting and exciting, going into a direction where nobody is looking at, and inventing it.
Not to show off, but a lot of people nowadays just quote me as the Fon guy. When someone is talking about Fon and there’s doubt, they just tag me on a tweet, or whatever. And yeah, I envision Just to be the same for Sranan Tongo.
The moral of the story is that you need to get started, because there’s always going to be a point where there’s no data, and someone has to do some little effort. For instance, we have JW300, but what if those people didn’t do anything? We would not even have a starting point.
So I started with JW300, and then I tried to manually scrape with my friends, through Google Forms, and created something like 25,000 sentences, and then out of that, then I’ve been able to bring some proof of concept… And it grew up, and people are now more knowing about the language. It still is not – I mean, I’ve built FFRTranslate with Chris, and people are using it, people aren’t saying anything bad, they are happy. It helps them. Artists, and other people… There’s more awareness, people willing to be more contributing, creating more content. It’s not yet on something like a Google Translate or a centralized translation for those African low-resource languages, or low-resource languages in general, but I hope that’s something that’s going to be coming. So I would just say just start.
Honestly, like my name says, I like adventures. And I like good adventures. So I just like to go where nobody is focusing, and [unintelligible 00:25:24.23] is exciting… Bring something that people haven’t been focusing on to light. I don’t think I would have had the same maybe impact if I for instance started with [unintelligible 00:25:36.12] because that project, the first FFR project that then went on [unintelligible 00:25:40.19] this type of thing, we were doubting whether we should use for [unintelligible 00:25:43.22] So finally, Chris and I, we decided to go for Fon, because [unintelligible 00:25:47.22] has at least some effort done already, but nobody heard about Fon. Nothing was on Fon.
[25:55] Today, there are a lot of papers, people citing the work… It’s been cited in the paper that led to the extension of Google Translate to 20 more African languages, or 24; it’s been cited, and No Language Left Behind of [unintelligible 00:26:08.15]
Also, being part of Masakhane, you collaborate with people like Sebastian, with Julia, with Angela Fan, who work on NLP… So just get started. People will know about it, and then it will just keep supporting. If you don’t have support, just be yourself a supporter, and at some point – you know, when people are seeing the effort, they will definitely then join, and pick it up from there.