Recently Chris and Daniel briefly discussed the Open RAIL-M licensing and model releases on Hugging Face. In this episode, Daniel follows up on this topic based on some recent practical experience. Also included is a discussion about graph neural networks, message passing, and tweaking synthesized voices!
Welcome to another Fully Connected episode of Practical AI. This is where Chris and I keep you fully connected with everything that’s happening in the AI community. We’ll take some time to discuss the latest AI news and dig into some learning resources to help you level up your machine learning game. I’m Daniel Whitenack, I’m a data scientist with SIL International, and I’m joined as always by my co-host, Chris Benson, who’s a tech strategist at Lockheed Martin. How are you doing, Chris?
Doing great today, Daniel. How are you?
Doing well. It’s been a pretty fun week. Last night I got to speak (virtually anyway) at the Utah MLOps Meetup… So that was pretty fun. I had a few connections there, and a lot of good questions and such came out of that. It’s interesting to see meetups happening again, but also still embracing this, like, bringing in virtual speakers thing… So it’s still hybrid in that sense, because if you’re a local meetup, you’ve got to bring in speakers, and it’s a lot easier to bring in virtual speakers… And I think it worked out pretty good the way that they had it set up.
That sounds interesting. Since you mentioned it, I’m speaking at a virtual conference next week. Through COVID, I had taken more or less a long break after doing way too much conference talking in the years leading up to it. It’s a national security conference…
…and I’m going in to talk about AI data and software in the context of national security, intelligence and defense.
Yeah, I’m looking forward to it. It’ll be the day after this episode is released. I will put a link in the show notes in case anybody wants to hop in; I believe it’s free to attend, so if anybody wants to do that… And we’ll see what happens there.
Yeah. We’re also gearing up quite a bit, because in December is EMNLP, which is the sort of biggest natural language processing research conference; one of the main ones, but I think considered the main one. And that’s in December, in Abu Dhabi, and I’m going to travel there with a couple of colleagues… So that’ll be one of the bigger excursions I’ve taken after COVID time; excited to be there at EMNLP. We had a paper accepted, so I’m excited to present that there, and also hear from the rest of the community.
[00:04:10.29] Actually, it’s interesting, because some of what we’ve released with – what we did release, and what we are releasing with this paper at EMNLP, we had to think a little bit about like licensing around those things… And I don’t know if you remember - I forget which episode it was recently, we were talking about these OpenRail licenses for the models.
So the idea that, okay, for data you have maybe like Creative Commons, or licenses like that… For software, you might have Apache, or MIT, or GPL, or whatever it is… But models fit into this weird space where they aren’t quite either of those things, so how do you license them…? And people at various groups have tried various things, but there’s an effort called RAIL, Responsible AI Licenses. I think if you just go to licenses.ai, you can learn more about what they’re doing. But we made an attempt at – some of our benchmark models that go along with our paper, we made an attempt at creating one of these RAIL licenses for the release of those, and that was quite an interesting experience. I don’t know if you remember the tenets of what goes into a RAIL license at all. Do you remember that?
I remember – I say this halfway tongue-in-cheek… I remember thinking, “I’m in the wrong industry for this, because I may need to blow things up with inference.” If I recall, they don’t want me to blow things up. So that stuck in my head.
Yeah, yeah. So there’s really this – I think the biggest thing that people can keep in their mind with these responsible AI licenses is that there’s usage restrictions. So Chris, just because there’s a RAIL license doesn’t mean that you couldn’t blow things up, or run a drone or something with your model.
But it would mean that if they put that in the restricted use clauses of the license, right? So how this works is there’s a main bit of the license, and it says various things about copyright, and patent, and warranty, and all these things that you might see in a normal license. But all of those things are subject to the terms of the license, and the terms of the license are subject to this clause at the end of restricted usage. So for example, the recent release of Stable Diffusion used an OpenRail license, and they put in restrictions around like “Don’t use this for misinformation, or to harm others” or various things like that. So they recognize there’s dangers with these models, and they want to put some barriers around that.
So that’s the idea with these RAIL licenses, is that it’s a way to distribute and make your model available with certain guardrails around usage. I don’t know, what are your general thoughts on that?
I think we need that, and that’s part of the maturing of this industry. We had that for a long time with those other types of intellectual property, in terms of things like software and such… I remember, as we did that episode not too long ago, it felt like a good time for that to happen at this point, because it was one of those questions people are having. And by the way, I should just, for the sake of being responsible, let people know that I’m not blowing things up, nor is my employer asking me to blow things up. They don’t do that, they don’t ask me to blow things up.
Sometimes we use sarcasm on this show.
We use sarcasm on the show from time to time, but I thought maybe I should specify that I’m not blowing anything up at all. So there we go.
[00:07:59.17] Yeah. I think in our license, we balanced a couple of things around the restricted use that probably would restrict your usage of our models in certain cases, Chris, not because of like military usage necessarily, but commercial usage. So one of the things that we wanted to do internally because of how we have sourced our data, and the fact that this data actually came from local language communities - so actually, we trained our models on books that were written by local language community members, and they released those books under certain rights, some of those being non-commercial licenses. And so we wanted to make sure that we both honored that, but we also released the models… Because what can happen sometimes with language data is like big companies could use language data from language communities to make money without any real benefit going back to those language communities. So that was partly also in our mind with this license, and so we put in our restricted use two things - one, a restriction around commercial usage… This would be like in our thinking around what benefit goes back to the language community, but then secondly, putting in there a restricted use around uses that are particularly discriminatory against indigenous peoples. So you could use indigenous language data to discriminate against indigenous people, right?
That’s a good point, yeah.
And there’s actually a nice clause in the UN statement on indigenous people, where they talk about discrimination against indigenous people. And so we kind of pulled a reference to that into our restricted use. So I don’t know, it’s our try at this. It was an interesting exercise to actually try to put this into practice and figure out like, “Okay, we talked about this on the podcast, but can I actually create one of these licenses for my own models?” It was an interesting exercise.
So I really liked the fact that you were thoughtful enough, for instance, on the question of discriminatory practices against indigenous folks, to think about that and make sure that was in. But another thing that I was wondering as you were discussing that was - going back to the start of this particular topic a moment or two ago on the show, there’s software there, there’s data, there’s the model, there’s all of these intellectual property overlaps between different types of IP, and they’re relatively independent. Did you have any question in your mind as you went through the process about whether a license from one type of thing such as software could clash with a model license? And did you have to think about that and resolve that a little bit?
Yes. So there was this idea that the data that we were using - in this case it was from the Bloom Library, which is a product from SIL where people can create their own books online. So each book, the author releases that under a certain creative – well, not all are Creative Commons, but the majority are Creative Commons licenses.
[00:11:18.18] And so we had to look into whether the models that we were creating off of Creative Commons data would be subject to the same sort of restrictions as the Creative Commons data that we were training it on. And so there’s various writings within – you can actually look up, Creative Commons has some commentary on this, of when certain things are derivative works or adaptations, and that sort of thing. In our case, the models that we trained off of this data, whether it’s surprising or not, according to how we read those, were not derivative works, and so wouldn’t be restricted to the same sort of license. However, what we tried to do was we tried to match the restrictions of the original data just in good faith to how people might have expected that data to be used… But I think technically, we had more latitude there.
No, that sounds good. And I’m not surprised that you and your folks were doing that. I would hope that everybody out there in the larger community would be thoughtful in that way about it. It’s interesting, we’ve talked so many times about having these different constructs blending; having software blending with the data, blending with the models now, and getting them out, but we just haven’t spent a lot of time talking about the legalities of how to do that and how to honor those across the format. So… Good to hear.
So Chris, I don’t know if you remember - not that long ago, in one of our recent episodes, which we can link in the show notes, we had Josh from Coqui on.
We even made some clones of our voices, and that sort of thing… That was a lot of fun. Coqui is doing amazing things in sort of open source speech technology, and really enabling a lot – actually, we’re using a lot of their libraries in our own work. But they had an announcement that I thought I’d share in terms of the news side of things, which is you can now join their waitlist and get access to what they’re calling their Voice Studio Audio Manager advanced editor features within their system… Sorry if I’m not getting the names right. But it’s pretty cool. I don’t know if you remember when he was on the episode, but he talked a little bit about how they were thinking about managing the sort of tone, emotions, expressions of synthesized voices more flexibly… So you don’t just get sort of one synthesized voice, and it’s either monotone, or having the same expression throughout; you can actually match different portions of your content with different kinds of expressive qualities…
I remember him talking about it, yeah.
And that’s what this voice studio does. And there’s some pretty cool things where you can actually look at different words, and the different phonemes in those words, and adjust some of these expressive features - emotion, and pitch, and mixing different voices together as well, like to create a mix of synthesized voices. You can do all this within this advanced editor, which seems really powerful.
I’m really looking forward to using it, but I’m a little dismayed that I’m currently number 6,466 in line to receive it. So it may be a little while before I receive the joys.
Well, shout-out to Josh; if you’re out there listening, you can bump Chris up the waitlist… [laughs] But yeah, I think it’s really interesting, where it’s one thing to produce a synthesized voice, it’s another thing to have multiple voices, maybe in a video that you’re mixing down, and mix voices together, change the expressive qualities… Almost like working with synthesized voices like people do with computer production of music, right? …where you can change things, and mix things together, and all of that very fluidly.
I want to ask you a question that’s very specific to the work that you’re doing on a day to day basis - for working with indigenous populations and their languages and stuff, what are some of the ways that you think this will change that going forward, or add to it, that you guys have been talking about? What’s the future look like for someone in your line of work on that? I’m just curious about that real-world aspect.
[00:16:11.24] Well, yeah, there’s definitely the side of this which is probably the more commercial side of it, which is media production, and that sort of thing… Let’s say that you produce a video in one language, and you’re wanting to do the dubbing across languages, or something like that… Or maybe even you’re using like an avatar and using synthesized voices in your video, and the whole thing is synthesized. I mean, that’s happening quite a bit right now as well. And so the ability to bring in multiple voices, and do all that without going into a recording studio - of course, that has huge applications for like advertising, marketing, media, production, entertainment, all of those different areas, which is where I would guess, and I can’t speak to Coqui’s business model, but I would guess that their tooling is quite applicable across those areas.
For local language communities - of course, they’re also involved oftentimes in the production of media or content for their communities, so that’s also relevant there… But I think there’s also unique things that are relevant in those scenarios. So imagine that you’re part of an indigenous community, a local language community, and you’re kind of marginalized by the national government, or discriminated against in one way or another… It might be a big ask for your community to say, “Hey, could we put up 100 hours of content with your voice, and maybe your likeness?” That’s potentially painting a target on yourself, right? When you’re associating yourself, and you’re the face of that community.
So I think it’s really interesting that there’s tools like this, where you could create high-quality voice and expressive voice that’s maybe synthesized, and not someone’s – maybe even style-transferred, where it’s not someone’s voice that can be tracked to a certain person.
But then if you think about then combining that with the video elements… So think about having a video that maybe is recorded with someone talking - you can use things like Stable Diffusion and other things now to actually shift that video and obfuscate the identity of the person in that. Now, the more nefarious use of that, of course, would be misinformation, and deepfakes, and that sort of thing. But there is a very positive use of this for these sorts of communities where it is important that they want to produce media content for their community, but if you’re marginalized or discriminated against… It’s interesting now that there’s these tools that are accessible, and have really nice user interfaces, and accessible to community members where they could actually produce some of that content themselves. So it’s an interesting dynamic.
To your point, I’m just thinking about in my world a little bit, and thinking about the fact that – you know, as we’re recording this, in the current day and the current months, the war in Ukraine, in which the Russian invasion of Ukraine has been going on, and Belarus is also part of that Russian effort… And I was reading this morning an article about some dissidents that are trying to a) survive that situation, and b) escape, and try to help, and do the things that their conscience is dictating. And going back to some of the ideas that you just enumerated about marginalized populations, indigenous populations, and being able to kind of find some protection while generating content - I could imagine that in this as well.
[00:20:01.04] I mean, you saw it sort of way in the past with online hackers, and when they would like hack SeaWorld or whatever, they would release a video, “We are Anonymous”, or whatever, and it would all be synthesized voices, right? Because you don’t want someone else to put your voice on that.
That’s right. And don’t get me started on animal protection, because… [laughter] It will stop being an AI podcast, and we will just go off into a totally different realm.
So be good to your animals, folks. That’s our little sideline right here.
Yeah. One question that was actually brought up to me last week, which I think is kind of an interesting question for people like us that do produce content… I mean, this is just our voices; we’re recording our voices on this podcast. But let’s just imagine that Coqui, or their voice studio - it’s great enough that we can just… I mean, we already have a lot of samples of our voices, right? If we can create really nice voices… You and I could just type out a script back and forth, and when we’re traveling, we could “record” a podcast and just mix our voices together with content and release it. How does that sort of thing strike you? Because I got into this conversation with someone last week, and there was some sort of mixed feelings about what you lose when you do that, or what you gain when you do that…
So that’s a great point that you make there… And who knows, maybe we both have tough schedules… And for listeners who don’t know, for Daniel and myself, this is a passion project. So who knows, maybe there is a moment of tight schedule for us where we do exactly that.
I think the thing we would lose is that there is the element of the unexpected in our conversations often, and a lot of banter back and forth that’s not scripted, completely unplanned… I know people think that we plan every word out, but we don’t. And so maybe that would be lost. So you might get the information you wanted to share out, but you may lose a little bit of the human element behind it.
Yeah. The context for the conversation I had last week was there was someone who does produce video content specifically… And so their face is sort of part of their brand, right?
And so the question was, “Well, if we dubbed your video into another language, it would make sense –” You know how everyone hates the thing about dubbed video where the lips don’t match the voice, right? And you can sync it up pretty good in a lot of cases, but ultimately, what you could do is just modify the person’s face and lips to match the dubbed…
They recorded the video in English, but now we’re dubbing it to Chinese, and we match up their lips using some sort of video manipulation. They reacted very negatively to that, because they’re like, “My face is like my brand”, right? Like, “I don’t want anyone messing–” They actually said they prefer the dubbed content, because their original expression of how they express themselves in the video was what was important to them. So yeah, it was interesting…
I’m going to challenge that. I’m going to suggest to you that in the not so distant future not only will that exist, but when you’re getting on – you know, we’re all through COVID, and certainly continuing post-COVID, we’re all on video calls all the time. And so I’m going to suggest that that’s going to be one of those killer features that one of the video call providers is going to do, and that is not only doing translation in real time, which I think is entirely possible in the not so distant future, but using some of these technologies we’ve been talking about in recent episodes to do exactly that. Because especially if you’re using their service quite a lot, which some of us are, then they also have a thorough dataset to train on. And I think that with the video and everything, I just think that’s going to be very doable not too far down the road, and I think it’ll also be able to be done live. So I think it will be part of what we do. I think you and I will find ourselves doing that before long, is what I’m suggesting.
So Chris, one of the things that we had just started talking about before we started recording was you were asking a few questions about graph neural networks, which I’ve thought have been interesting for quite some time… And I think you ran across it in some NVIDIA post, or something like that, right?
So NVIDIA has a blog that’s widely read. They blogged - as we record this, it was actually just two days ago - on the 24th of October 2022. But they had a blog about what are graph neural networks. And it occurred to me - I was glancing through it, and there’s a fair amount of stuff that I’m familiar with in it, and there were a few items there that I hadn’t thought about. But it occurred to me that we have kind of touched on graph neural networks quite a number of times on this show, without ever really diving into it. So maybe there is a Fully Connected episode where we do a full show to dive into the detail… But it made me start wondering a little bit about how many of our listeners out there are using graph neural networks, and some of the use cases… I saw something the other day about starting - and this is very typical for some of our conversations, is where you’re putting together multiple kinds of deep learning approaches to try to get something new. And we’ve had a lot of shows talking about that. But before I go on, have you had any opportunities to use graph neural networks yourself?
We recently did train a graph neural network for a question answering task… So one of the areas where people have applied this is to this task of automated question answering, where there’s a text prompt, and you’re looking for the answer within some set of documents, or something like that. So I did dive a little bit into that. And to be honest, I’d love to learn more. That was definitely an interesting experiment. As I was diving into that and learning what does it mean to have a graph neural network - well, there’s certain approaches, and for maybe graph neural network people out there that are experts, maybe I’m simplifying this too much… But there seems to be a cluster of techniques that are focused on representing graph-structured data in a sort of flat form, or a matrix or tensor form… And so there’s ways to embed a graph, or learn an embedding for a graph in a sort of flat form. There’s also methods that exploit the structure of the graph neural network, or the graph itself, which I think is what I think of when I think of graph neural networks.
[00:27:40.07] So one way to, I guess, think about it is, if you think about a convolutional layer, you’re running some kernel or filter over your image, or your set of inputs, but what you’re doing is you’re always considering one data point in the context of a fixed number of other data points. Even if you’re running your filter over in some various ways. It’s sort of one data point in relation to a number of other fixed data points. And actually, transformers, or recurrent neural networks, and this sort of thing also behave similarly, right? You’re comparing one data point in reference to maybe a sequence of other things, but which have a fixed sort of structure.
And what’s interesting about graph neural networks is that – the graph neural networks that I’m thinking of, which are built around these concepts of message passing, consider one data point in reference to sort of an arbitrary structure of other data points. And what happens in these graph neural networks is that you have a stage where you take an embedding for one node in your graph, and you look at all the neighboring nodes, or maybe a certain number of neighboring nodes, but all the neighboring nodes that fit within a certain structure, and you combine or concatenate or perform a function over the combination of the embeddings for those nodes and the embeddings for the node that you’re considering. And so what happens as you apply this across all of the nodes of your network, you actually pass a lot of information between all of the nodes of your network. And if you iterate that, then the idea is like all of this messaging and information is transferred from all of this different complicated graph structure to the data point under consideration. And so oftentimes, this involves this sort of message-passing and iterative approach, which is quite interesting, and has been applied in a variety of ways. One of the ways, of course, that may be very well known is AlphaFold, which is one of the protein folding approaches.
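[Editor's illustration] The message-passing idea described here can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions, not the method used in any system mentioned in the episode: the tiny path graph, the one-dimensional embeddings, and the fixed 50/50 mix (standing in for a learned update function) are all made up for illustration.

```python
from typing import Dict, List

Graph = Dict[str, List[str]]          # node -> list of neighboring nodes
Embeddings = Dict[str, List[float]]   # node -> embedding vector


def message_passing_step(graph: Graph, h: Embeddings) -> Embeddings:
    """One round: every node aggregates its neighbors, then updates itself."""
    updated: Embeddings = {}
    for node, neighbors in graph.items():
        dim = len(h[node])
        if neighbors:
            # Aggregate: element-wise mean over however many neighbors exist.
            agg = [sum(h[n][i] for n in neighbors) / len(neighbors)
                   for i in range(dim)]
        else:
            agg = [0.0] * dim
        # Update: a fixed 50/50 mix is an assumption standing in for a
        # learned function of (own embedding, aggregated neighbor message).
        updated[node] = [0.5 * h[node][i] + 0.5 * agg[i] for i in range(dim)]
    return updated


def run_message_passing(graph: Graph, h: Embeddings, rounds: int) -> Embeddings:
    """Iterate the step so information travels one hop further per round."""
    for _ in range(rounds):
        h = message_passing_step(graph, h)
    return h


# Hypothetical path graph: a - b - c - d. Only node "a" starts with signal.
graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
h0 = {"a": [1.0], "b": [0.0], "c": [0.0], "d": [0.0]}

h1 = run_message_passing(graph, h0, rounds=1)
h3 = run_message_passing(graph, h0, rounds=3)
print(h1["d"], h3["d"])  # "d" is three hops from "a"
```

After one round, only the direct neighbor of "a" has picked up any of its signal; it takes three rounds before node "d" does. That is the sense in which iterating lets information from the wider graph structure reach the node under consideration.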
We had a show about that not too long ago. So one of the things - as you’re looking, in your line of work, at large language models, and we’re looking at graph neural networks and how they can merge, and you’re talking about the flat structure, and I know NVIDIA talks about the unstructured nature of the message passing within the graph itself compared to other neural networks that are a lot more structured… Can you clarify for a moment, what does it change in terms of how you’re approaching large language models when you have this flat, unstructured node approach where you’re doing the message passing and you have an arbitrary number of them that are there? Does it dramatically change the workflow for when you’re working on those large language models, or is it similar?
Yes, so a lot of times, the large language models take a very naive approach to how text is structured. So most language models now work around something called subwords, which means I have a piece of text, I’m going to split that up into sub components, but not necessarily words; you could tokenize something into words, but the problem with that is how do you know how to tokenize what is a word and what isn’t a word? You’ve got all these sort of weird structures in language. The other thing is, if you tokenize into words, what happens when you see an unknown word in your input? And so what often happens is you figure out what are the most frequently occurring subwords across my corpus of known language? Maybe like your name, Chris, there’s a subword, chri, and then you tack on an s subword, and that forms your name, right? So if you figure out what are those frequently occurring subwords, that’s how you split things apart, and that’s how you sort of look at the attention across these different subwords in a large language model.
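[Editor's illustration] To make the subword idea concrete, here is a minimal Python sketch of splitting a word into known subwords by greedy longest match. The vocabulary below is hand-written and hypothetical; real tokenizers learn their subword vocabulary from frequency statistics over a corpus, and their matching rules differ in detail, so treat this as an illustration of the idea rather than any particular tokenizer's algorithm.

```python
from typing import List, Set


def segment(word: str, vocab: Set[str]) -> List[str]:
    """Split a word into subwords, always taking the longest known piece.

    If no known subword matches at a position, fall back to a single
    character so that unknown words can still be represented.
    """
    pieces: List[str] = []
    start = 0
    while start < len(word):
        end = len(word)
        # Shrink the candidate until it is in the vocabulary
        # or only one character is left.
        while end > start + 1 and word[start:end] not in vocab:
            end -= 1
        pieces.append(word[start:end])
        start = end
    return pieces


# Hypothetical vocabulary of "frequently occurring" subwords.
vocab = {"chri", "s", "dan", "iel"}

print(segment("chris", vocab))    # ['chri', 's']
print(segment("daniels", vocab))  # ['dan', 'iel', 's']
print(segment("xyz", vocab))      # ['x', 'y', 'z']
```

The last call shows what happens with a word the vocabulary has never seen: it still gets split into pieces instead of becoming a single unknown token.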
[00:31:57.17] But that’s somewhat – I mean, it’s statistical in terms of how you get those subwords, and it’s useful, but with language in general, language is much more structured, in many cases, like a graph. And so to do this sort of large language model approach works well and it’s scalable, because you don’t have to know as much about the structure of your input… But if you do know more about the structure of your input, you can do maybe really powerful things with less data. So for example, if you know all of the parts of speech of your language, like “Here’s my noun subject, and it’s connected with a verb via a node in the graph, in this way”, and you draw out all the tree structures in the syntax of your language, there’s tons of information encoded into that structure, that you lose when you just treat words like a sequence of subwords, right? And I think there’s arguments that these large language models do learn some of that structure. There’s some work out of Stanford on that. But the way that language is structured, I think the hope would be that if you’re creative about encoding this linguistic information into your model, and then maybe using something creative, like a graph neural network, maybe you can do more things with less data, or you can do really powerful things, or you can be more robust to changes, and that sort of thing.
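[Editor's illustration] The contrast between treating a sentence as a sequence and treating it as a graph can be sketched as follows. The example sentence and its dependency edges are hand-annotated for illustration (the relation labels follow common dependency-grammar naming); nothing here comes from a parser or from the models discussed in the episode.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hand-annotated edges for the sentence "Chris hosts the podcast".
# Each tuple is (head word, relation label, dependent word).
edges: List[Tuple[str, str, str]] = [
    ("hosts", "nsubj", "Chris"),    # noun subject attached to the verb
    ("hosts", "obj", "podcast"),    # object attached to the verb
    ("podcast", "det", "the"),      # determiner attached to the noun
]


def build_graph(edge_list: List[Tuple[str, str, str]]) -> Dict[str, List[Tuple[str, str]]]:
    """Turn (head, relation, dependent) triples into an adjacency mapping."""
    graph: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for head, relation, dependent in edge_list:
        graph[head].append((relation, dependent))
    return dict(graph)


syntax_graph = build_graph(edges)

# The same sentence viewed as a flat sequence: neighbors are just
# whatever tokens happen to sit to the left and right.
tokens = ["Chris", "hosts", "the", "podcast"]
sequence_neighbors = {
    tok: [tokens[j] for j in (i - 1, i + 1) if 0 <= j < len(tokens)]
    for i, tok in enumerate(tokens)
}

print(syntax_graph["hosts"])        # [('nsubj', 'Chris'), ('obj', 'podcast')]
print(sequence_neighbors["hosts"])  # ['Chris', 'the']
```

In the sequence view the verb sits next to "the"; in the graph view it is linked directly to its subject and object, which is the kind of structural information that a flat sequence of subwords does not state explicitly.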
I see. Good explanation.
And yeah, if other people have input on graph neural networks, or have used them in certain ways, definitely let us know. One of the other interesting ones that I know just doing searching when I was looking into graph neural networks is from Pinterest. I guess the system that does recommendation of items for users within Pinterest is – their real-time system for recommendation is built on some sort of graph neural network called Pixie, which… Yeah, if any of you are on that team out there, and you want to come on the show and talk to us about it, we’d love to hear more. But you can look up, they do have a paper about it, and all of that.
Absolutely. I know that in the NVIDIA article that we were talking about, they mentioned LinkedIn does the same. I would imagine that other social networks do as well, quite honestly. That seems like a very logical fit in terms of trying to get that functionality.
Yeah. Well, as we get near the end here, Chris… I mean, we always try to share some sort of learning thing, and as I was learning about graph neural networks, I’ve found several interesting resources, but one which - if you’re into sort of paid courses, this one seems to be quite full of great information about graph neural networks. So this is from Zak Jost. He actually has a YouTube channel called Welcome AI Overlords. But he’s really into graph neural networks. I think he has worked in a variety of big tech places, and he has a full course of introduction to graph neural networks. If you just go to graphneuralnets.com, pretty simple link, then you can see he has introduction to graph neural networks, foundational theory of graph neural networks, basics of graph neural networks, and the basics of graph neural networks is free. So at least you could get like the sense of the graph neural networks from the free version.
Well, I may very well be a student on that course, and maybe some of the other stuff out there as well. I have some very specific graph database work that I’m working on in my day job, without going into specifics, and I can totally see how one of the associated problems with that could be solved by graph neural networks. And so I think it’s time for me to get a level up, and I encourage our listeners that are maybe not already in that to go consider it as well.
Sounds great, Chris. Well, it’s been fun as always. Good to connect our two nodes via the edge of Practical AI, as always, so…
Oh, boy. Oh, boy. That was like the AI version of a dad joke right there.
Alright. Well, on that note, Chris, we’ll see you, before I make another joke…
Okay, no worries. Have a good one, Daniel.