Practical AI – Episode #201
Protecting us with the Database of Evil
with Matar Haller of ActiveFence
Online platforms and their users are susceptible to a barrage of threats – from disinformation to extremism to terror. Daniel and Chris chat with Matar Haller, VP of Data at ActiveFence, a leader in identifying online harm that uses a combination of AI technology and leading subject matter experts to provide Trust & Safety teams with precise, real-time data, in-depth intelligence, and automated tools to protect users and ensure safe online experiences.
Transcript
Play the audio to listen along while you enjoy the transcript. 🎧
Welcome to another episode of Practical AI. This is Daniel Whitenack. I’m a data scientist with SIL International, and I’m joined as always by my co-host, Chris Benson, who’s a tech strategist with Lockheed Martin. How are you doing, Chris?
Doing very well today, Daniel. How’s it going?
It’s going great. So yesterday was voting day here in the U.S., and I did go to the voting place… And it was interesting, because in line I could hear people talking about cyber threats to the voting machines, and other things like that… So my mind was already thinking about these things, because we have a really interesting topic to talk about today that’s in that same vein. We’re privileged today to have with us Matar Haller, who is VP of Data at ActiveFence. Welcome, Matar.
Hi, thanks for having me.
Yeah. And ActiveFence - I’ve read a bit about it, and the website talks about this barrage of threats that online platforms are susceptible to now, which ActiveFence is addressing in various interesting ways, which we’ll get into… But I’m wondering if you could give us a picture… If I’m going to run an online platform of some type, maybe it’s not even – I’m likely not going to start and run the next Facebook, but I might very well start and run some type of software company that provides an online platform to do something… What should be on my mind, and what’s the reality of online threats that I might need to be aware of if I’m getting into that space?
Yeah, so first of all, I think there’s one thing to think about, which is that anytime you have a platform that has any type of user-generated content – whether users are uploading photos, or they’re chatting, or they have comments, or anything like that – you’re gonna have tons of data very, very fast, and it’s just primed for people to post wonderful things, but also some really, really dark things, which we’ve all sort of seen and been exposed to.
So one thing to keep in mind is that trust and safety – just basically safety online – is not really a nice-to-have anymore, or even just a competitive advantage. At this point it’s kind of a basic expectation, right? So users are expecting it, advertisers are expecting it, parents are expecting it, the public expects it… So if you’re going to spin up a platform, first of all, best of luck, and second of all, you need to keep this in mind from the get-go, before you sort of find yourself down this rabbit hole.
One thing that I think is really important to keep in mind is that although trust and safety isn’t really a new industry, it’s only now finally becoming something that people are aware of. Like I said, it’s this basic expectation. Now it’s not only users, but also regulators and legislators. There’s new legislation coming in that’s bringing it even more to the forefront… And the basic sort of content moderation that’s out there today doesn’t really make the cut.
To follow up on the second part of your question about what kind of harms are out there – online harm is really multi-dimensional. We can see it in different media types; we’ve seen it in games, and merchandise sites, chats, texts, video, audio, things like that, across many, many different languages… And also different types of violations. So you have white supremacists, and terrorists, and human trafficking, and these really painful sorts of things. It also goes into misinformation, disinformation, fraud, spam, cyber-bullying, and so forth. So it’s this really, really complex space that you need to have a deep understanding of in order to know how to address it.
And up until this point - you’re talking about content moderation and how it has evolved over time, but it’s still kind of lacking in the traditional sense… What does it look like – I mean, content moderation, people might have in their mind, “Oh, I have a blog, and I’m going to choose whether I allow people to post a comment, or I have to approve that comment before it’s posted, or something like that.” So this sort of content moderation - in your opinion, where we sit today, is most content moderation sort of reactive at this point? Or how do you view how most people are approaching the problem right now, and why is that lacking?
[06:20] So there’s different levels, I would say, of content moderation. At this point, to just sit and moderate every single comment gets out of hand really, really fast. And so there is some level of automation that started being introduced. The first sort of basic level is, “Let’s go out and look for keywords.” I don’t want any slurs on my platform, I don’t want anyone calling anyone any of those words, you know… I don’t want that there, so I’m going to ban all those. And then people get a little trickier, and say, “Well, what about if we use an emoji?” And so there’s different kinds of emojis, or combinations of them, that also can be used in a hateful way. So you can say, “Okay, well, I’m gonna ban those. And I’m gonna ban these specific keywords, and I’m gonna ban these n-grams, because these phrases are bad, like “I hate Jews”, so I’ll ban that.” And that works okay. But very, very quickly you get to these cases where keywords are either insufficient… Like I said, with emojis, or with leet speak… For people that aren’t familiar, that’s basically taking a word and replacing letters with numbers. Adolf Hitler can be written pretty much all in numbers to evade detection – it’s sort of a need-to-know kind of thing. Also in the keyword space you have numbers. 1488 is a white supremacy number – 88 stands for Heil Hitler, and 14 refers to the number of words in a phrase that they use…
And so you can say, “Okay, well, I’ll add all these to my dictionary.” But then you get to the place where you say, “Well, what about someone that’s telling someone else, ‘Don’t call me some slur; don’t call me that’?” And so now they’re using the slur, right? Is that something that you want to necessarily ban? Maybe in some cases, but in other cases you’re going to need a deeper understanding of how language is being used before you just go out and outright ban it. And so you see this sort of evolution in terms of the approaches that platforms are taking to moderate this space. And looking at the context in which language is used is sort of the first step in that.
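For readers who want to see how quickly that first level breaks down, here is a minimal sketch of the kind of keyword-plus-leetspeak matching described above. The blocklist, the placeholder word, and the character map are purely illustrative assumptions, not anything ActiveFence actually uses: it catches simple evasions, but as Matar notes, it cannot tell an attack from someone pushing back against one.

```python
import re

# Illustrative only: a tiny stand-in blocklist and leetspeak map, not a real moderation lexicon.
BLOCKLIST = {"badword", "1488"}
LEET_MAP = str.maketrans({"0": "o", "1": "i", "3": "e", "4": "a", "5": "s", "7": "t", "@": "a", "$": "s"})

def keyword_flag(text: str) -> bool:
    """Naive moderation: flag if any raw or de-leeted token is on the blocklist."""
    raw_tokens = re.findall(r"[a-z0-9$@]+", text.lower())
    deleeted = [tok.translate(LEET_MAP) for tok in raw_tokens]
    return any(tok in BLOCKLIST for tok in raw_tokens + deleeted)

print(keyword_flag("you are a b4dw0rd"))               # True - leetspeak alone doesn't hide it
print(keyword_flag("their post said 1488"))            # True - known hate number
print(keyword_flag("please don't call me a badword"))  # True - but this is push-back, not an attack
```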
I think I just realized – that was a great explanation, and I just realized how sheltered I am, because several of the things you referred to I just didn’t know, at all… So I guess I’m very sheltered. But I’m curious, when we were first starting the conversation a few minutes ago, you also mentioned misinformation, and we were just diving into some of the specific use cases on hate speech… How does misinformation – because we’ve been dealing with hate speech for a long time now, but misinformation in the last few election cycles has really become a huge issue, and obviously in national security issues, and things like that, it’s big. How does that fit in?
I think in my mind I’ve thought about there’s hate speech, and there’s misinformation and all… Is there a connection between them all? Are they bound together in some way, the way you see it? Or are these distinct, separate kinds of things? How do you think about it? How do the folks at your company think about it?
[09:34] That’s a really interesting question – to take these two examples of two specific violations. I think violations are sort of on a spectrum, right? And so you can think about these more evasive violations - things that are more difficult to find, or require subject matter knowledge, like hate speech… You need to know these keywords, you need to know these things… And then you also have these more common violations, where some of them are like “Well, maybe it’s not even a violation”, like nudity, or profanity, or things like that, where it’s just more out there. And everything kind of lies on a spectrum, and you can go through spam, and fraud, and so forth, until you get to the really dark stuff, like child safety and abuse, and things like that.
And with misinformation it’s actually interesting, because it’s not really trying to evade; that’s the whole point. On the other hand, it’s really tricky to find and to understand, and there are lots of organizations that do really wonderful work of fact-checking, and keeping up on the trends, and really identifying misinformation… And there are also these techniques that we can use once we’ve identified a specific type of misinformation – then we can use them to find it, and to see that it’s going viral, and so forth.
It’s a real struggle to understand how to put that in context, because you could say things in a misinformation context where there’s no hate speech in it explicitly, there are no banned words, none of that is there. And yet, as we’ve seen in recent years, it can do great harm. So it seems like a very hard target to go after and be able to mitigate in a sane and reasonable way.
Absolutely. I think that’s one thing that’s really unique about ActiveFence and what we do - we basically combine this very, very deep subject matter expertise with our technology. So we’re a technology company, and we also have experts in the domain - experts in the field of researching human trafficking, and really understanding that space, or in misinformation, and different types of misinformation, and hate speech, and in terror… And so they speak the languages, they research the space, they understand it, they know the key players, they know the different organizations, the keywords… And this is an adversarial space, it’s constantly changing, and so they make sure that they stay up to date. And then what that means is that we, on the data side, can basically take their ideas, take their knowledge, and then engineer features out of those. So really translate the human knowledge into our models, so then we can go out and automate that and do it at scale.
The reason that it’s so interesting is because, as you all know, models drift, they decay, and so you can go out and you can retrain your model, and get new weights, and you’re great… Except if you’re in an adversarial space, then not only are you drifting, but your reality is just so not stationary; it’s changing from underneath you. So as it’s changing from underneath you, you need to hurry up and reengineer your features. And so we’re constantly engineering new features, retraining our models, and also thinking about just what else can we possibly extract from this data that’s coming in. We’re analyzing text, video, audio… Everything, basically. Anything that we can get our hands on, and just really milking whatever we can out of it.
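As a rough illustration of what “the reality changing underneath you” can look like operationally, here is a small, hypothetical drift check on a model’s risk-score distribution. The PSI metric, the thresholds, and the synthetic data are our own illustrative choices, not a description of ActiveFence’s pipeline.

```python
import numpy as np

def population_stability_index(reference: np.ndarray, recent: np.ndarray, bins: int = 10) -> float:
    """Rough drift measure: how much has the model's score distribution shifted?"""
    edges = np.histogram_bin_edges(reference, bins=bins)
    ref_frac = np.histogram(reference, bins=edges)[0] / len(reference)
    new_frac = np.histogram(recent, bins=edges)[0] / len(recent)
    ref_frac = np.clip(ref_frac, 1e-6, None)  # avoid divide-by-zero / log(0)
    new_frac = np.clip(new_frac, 1e-6, None)
    return float(np.sum((new_frac - ref_frac) * np.log(new_frac / ref_frac)))

rng = np.random.default_rng(0)
last_month = rng.beta(2, 8, size=5_000)   # risk scores the model produced on last month's traffic
this_week = rng.beta(2, 5, size=1_000)    # adversaries adapt and the score distribution shifts

psi = population_stability_index(last_month, this_week)
print(f"PSI = {psi:.3f}")
if psi > 0.2:  # 0.2 is a common rule-of-thumb alert level
    print("Drift detected - time to re-engineer features and retrain")
```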
That’s super-interesting. I have so many questions… It’s a really interesting technology, but also the infrastructure and management part of that is, I’m sure, a great challenge. But I’m glad you brought up the modality thing. I also saw on your website you talk about different languages as well, which - it seems like this is definitely relevant now, in terms of the way people communicate online… You mentioned emojis, and I was also thinking of GIFs - or GIFs, depending on who you are - or posting memes with text in the image… There’s also – of course, you’re talking about videos, and audio messages, all of that… I guess, as a more general question - is language, but sort of multimodal language, your primary area of research? Or are there other things outside of communication, in terms of the threats posed to online platforms, where someone’s not trying to communicate a certain message, but it’s still a threat to the platform in one way or another? I guess maybe spam would be an example of that, but I don’t know if you have other examples… Or is it really, from your view, that a lot of what you focus on is the communication and language piece?
[14:18] So the goal – it’s not necessarily language, and in a second I’ll talk a lot about contextual AI and what that means… But really, our goal is to enable users to be safe online, to have this safe experience, right? I’m a mom, I have three kids… I started working at ActiveFence and I said, “Oh gosh, my daughter is not getting a cell phone until she’s 35. Forget about it.” Because you’re suddenly exposed to all this, and then you say – but you know, that’s why what we do is so important, because that’s our whole… That’s our whole goal. It’s not only about language, and that’s just one form of communication. There’s lots of things out there. And we’re a bunch of concerned parents; let’s really make it as safe – the fact that Chris is still sort of in this sheltered bubble is amazing. I want everyone to be in this sheltered bubble, right? And so that’s kind of the idea.
I think I just need to correct it… It depends. There’s some bubbles I’m sheltered from, I think, and there’s some I probably am not.
Probably so, yeah.
Yeah. Like, when you were talking about the hate speech, the specific numbers that meant stuff, I was like “I didn’t know that.” So anyway, I didn’t mean to cut in, but…
Why would you? [laughs] Yeah. So language is only one part of it. And I think one thing that we really get into is context. Let me take you on a journey through context, and we’ll end up at memes, which to me is crazy.
It sounds great.
So we were talking about the language context, and how keywords and n-grams just don’t cut it. You need language models - so transformers and so forth - to really get an understanding of what is being said and the context in which it’s being said. And so those are the kinds of models that we end up training, and that we have data for. We have, like I said, our subject matter experts and policy experts that are able to ensure that we’re capturing things that are on the edge and the border, because that’s where things get interesting, and that’s how I’m able to get the difference between “I’m proud of being a whatever”, because I’m reclaiming that word, versus “You are a *bleep*, you’re not allowed here.” And even within the hate speech of language, there’s insulting hate speech and non-insulting hate speech… Like, “Hey, wasn’t that a great KKK rally yesterday? I really like your Proud Boys tattoo.” Right? It’s hard to catch those things.
That’s a surreal statement right there, that example. I just –
That’s never been said on your podcast, right?
No, it’s never been said… [laughs] As you said that, I was like “Wow, I’m not talking to the people that are saying things like that.” So anyway, I’m sorry, go ahead. It’s novel to me to hear some of this perspective.
But then we can sort of go to the next level - okay, so we’re now in the image space. So one thing that we’re able to do - and again, because we have this deep subject matter expertise - is we’re able to search for logos, right? So logos of rare, small terror groups that are hard to find… We know this space, and so we’re able to go out and do logo detection, find those particular logos, identify things. And then you say, “Okay, great, so here’s the video. Found the ISIS logo. Great. Check, terror.” And then you say, “Well, wait a minute, but there’s also the CNN logo here.” So suddenly, even though it’s a snippet from ISIS, the context in which it’s used doesn’t make it violative. Suddenly it’s interesting, it’s important, it’s historical, it’s whatever. And you can see the same things with videos of Nazis marching, right? Sometimes that’s glorified, and sometimes it’s just historical - it is what it is. So that’s another level of context, where one signal out of the image or the video isn’t enough, right?
Another thing that we look at, that is important, is the context in which the image is being used, right? What is the title? What is the description? What are the comments? We have an example that I like using where you see sort of non-violative text: “I love him.” You’re like, “That’s fine, who cares?” And then when you zoom out, you see it’s “I love him” with a picture of Osama bin Laden. And that’s suddenly more interesting, right? Suddenly, it becomes violative. So you can’t just take any one piece in isolation.
[18:20] Or there’s an example of some chef that’s – it’s hard to do this on a podcast, but he’s showing knives… And he’s demonstrating these knives, and showing these knives, and his hands are all cut up, because he uses knives… And if you do just object detection, it screams at you, “Weapon! Weapon! Weapon! Oh, gosh, this is a terrible video.” And then you analyze the title and the description and the comments and the channel and everything along with it, and you’re like, “No, he’s teaching about knives. It’s a chef video. Really not interesting.”
So looking at things just as keywords in a sentence isn’t enough. Also, just looking at an image by itself isn’t going to tell you whether or not something is problematic. And so it’s this idea of contextual AI that we think about a lot - “What is the context in which something is used?” And context can mean lots of things; it can also mean the policy, right? So different platforms have different policies. Some platforms will say “baby’s first bath” is child nudity - you cannot have it. And others will say it’s not a big deal. So that’s another level of context that our models need to deal with.
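To make the “contextual AI” idea a bit more concrete, here is a toy sketch of combining an image signal with text context and a per-platform policy, in the spirit of the ISIS-logo-next-to-a-CNN-logo and baby’s-first-bath examples. Every field name, weight, and threshold here is a hypothetical illustration, not ActiveFence’s model.

```python
from dataclasses import dataclass, field

@dataclass
class Signals:
    """Signals extracted from one piece of content (all values here are hypothetical)."""
    detected_logos: set = field(default_factory=set)  # e.g. {"isis", "cnn"}
    text_risk: float = 0.0                             # risk score from title/description/comments
    contains_child_nudity: bool = False                # e.g. baby's first bath

@dataclass
class Policy:
    """Per-platform policy: the same content can be fine on one platform and violative on another."""
    news_context_allowed: bool = True
    child_nudity_allowed: bool = False

def contextual_risk(sig: Signals, policy: Policy) -> float:
    risk = sig.text_risk
    if "isis" in sig.detected_logos:
        # A news logo alongside a terror logo suggests reporting, not glorification.
        newsy = sig.detected_logos & {"cnn", "bbc", "reuters"}
        risk = max(risk, 0.3 if (newsy and policy.news_context_allowed) else 0.9)
    if sig.contains_child_nudity and not policy.child_nudity_allowed:
        risk = max(risk, 0.8)
    return risk

clip = Signals(detected_logos={"isis", "cnn"}, text_risk=0.1)
print(contextual_risk(clip, Policy()))                            # ~0.3: news/historical framing
print(contextual_risk(clip, Policy(news_context_allowed=False)))  # 0.9: stricter platform policy
```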
I mean, just knowing about where NLP models or other models fail… Like, this area of sarcasm and humor is so difficult, and there’s this further distinction that you’re drawing out, which is - well, there are some memes that are jokes and sarcasm, and there are some memes that are jokes and sarcasm to the point of being very, very harmful. And also, that’s tied into the context of where they’re put, or the timing of when they’re put somewhere, or something like that… So I’m wondering if you could break down, as you’re stepping into addressing some of this - you already mentioned that you’re frequently adding new features, and there are new behaviors that you’re seeing that didn’t exist before… So let’s say that ActiveFence starts to understand that there’s some type of new behavior that’s harmful, or something like that. What is your process, and how do you think about going from knowing this is happening, to detecting that this is happening, in a repeatable sort of way?
So there’s a couple of different ways that we’re basically staying up to date. The first is really, really close contact with subject matter experts that are out there gathering information, intelligence, researching, collecting data, building keyword databases, looking for particular bad actors that frequently post things… They’re really there, and so we frequently talk to them to understand what it was that made something violative, or at a certain point they’ll be like, “Hey, this is a new hate group”, or “Hey, this is a new meme”, and so forth. That’s one thing. The other thing is that even within our models, we’re constantly getting feedback. We have something that we call the database of evil, which is –
[laughs] I mean, you might as well call it what it is…
Right?
That has to be the best name I’ve ever heard. It’s the database of evil…
And it’s true to its name.
I believe it.
[22:05] And so we keep that updated. So we have data that’s coming in, we score it, we give it a risk score, which is essentially the probability that it’s violative for some violation… And then we have trained analysts that review it, review the score, and can say, “Yes. No. No.” Anything that’s verified as being violative goes into the database of evil. And the database of evil is used for a few things, one of which is checking new content that comes in. We can say, “Well, have we seen this before? Do we know it?” Things that we’ve seen a lot, versus things that are brand new. And it’s also used because as we take this feedback, we’re constantly - like I said - retraining, learning. And those are the small adjustments that we can make to our models.
And so it’s this idea of constantly getting feedback, both just from researchers that go out and find things, from the data that’s coming in that’s being scored, and then we’re sort of retraining on top of that… And then of course, we have our database of evil.
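A very stripped-down sketch of that feedback loop might look like the following: an exact-match fingerprint lookup against verified content, plus an analyst verdict that adds new items. The hashing scheme and function names are illustrative assumptions; a real system would also use fuzzy/perceptual matching and much more.

```python
import hashlib

# Illustrative stand-in for the "database of evil": hashes of content analysts verified as violative.
database_of_evil: dict[str, str] = {}  # content fingerprint -> violation type

def fingerprint(content: bytes) -> str:
    """Exact-match fingerprint; real systems would also use perceptual/fuzzy hashes."""
    return hashlib.sha256(content).hexdigest()

def score_item(content: bytes, model_risk: float) -> tuple[float, str]:
    """Boost anything we've already verified; otherwise fall back to the model's risk score."""
    key = fingerprint(content)
    if key in database_of_evil:
        return 1.0, f"known violative ({database_of_evil[key]})"
    return model_risk, "model score only"

def analyst_verdict(content: bytes, violation: str | None) -> None:
    """Analyst review loop: verified violations are added so we recognize them next time."""
    if violation is not None:
        database_of_evil[fingerprint(content)] = violation

item = b"some re-uploaded propaganda clip"
print(score_item(item, model_risk=0.62))   # first time: model score only
analyst_verdict(item, violation="terror")  # analyst confirms it
print(score_item(item, model_risk=0.62))   # now an exact match in the database
```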
So let me ask a question… Obviously, as you pointed out, your database of evil has a lot of really explicitly evil stuff. But I’m also imagining that there are gray areas… You mentioned the baby’s first bath kind of thing… And that would depend on the audience - if a family member said, “We have a new baby in our family; my niece has a new baby”, and showed me a photo of the new baby having his first bath, that would not be offensive to me. But there are contexts where, posting it online, it could become offensive, and such. So with these types of gray areas, and the fact that you can have one set of content that has a bunch of different – I don’t know what you would call it… Acceptability rankings, if you will, depending on who’s viewing it, and what the context is, and all that… How do you approach making sense of all the gray area? When there is everything from perfectly fine to absolutely not fine, and it’s all valid for the same thing.
Right. That’s a really pertinent question, and it’s something that we’re dealing with a lot. Part of it is sort of – and I don’t think that we’ve completely cracked it, but one thing that we do do is that we can also have a database of evil where evil is relative to the client… Which also leads to this idea of customized models per client, based on the feedback that is coming from them. And we have this now - we have two clients where for one, baby’s first bath is violative, and for the other it isn’t, and so we are already juggling this with different levels of human intervention to get it to really perform, because that’s what’s training it to get to that point.
And there are even other examples beyond “baby’s first bath” and things like that - people are like, “Oh, but it’s so clear what child abuse and pedophilia are”, and clearly it’s not. Even things like asking someone “Are your parents home?” If it’s a conversation between two children in a chatroom, that’s totally fine. But it can take a much darker turn when you suddenly see that it’s a user that is also in adult chat rooms, or it’s being posted at around 8:30, 9, 10 PM, when kids are supposed to be starting to be in bed… I don’t know, my kids go to bed early.
So even then, you can say, “Well, I have language understanding, and this is a kids’ chat room”, but suddenly, there’s all these other levels that you need to take into account to understand if a phrase really is just nothing.
I am living what you’ve just described… I have growing kids, but I also have a daughter, who is just getting to the point where we’re letting her get online, and do some of this stuff, and some of it is in supposedly safe environments… But then, as the nosy dad, who’s just worrying about keeping his child safe, there are all sorts of gray areas, and stuff. And then there are also some moments where I’m having a lot of trouble telling whether it is a safe context or not. It’s not very clear. And so I can imagine that that is extremely challenging to solve as a technical problem, that can be recreated across a lot of different audiences.
[26:32] Totally. And I think that, at least when we’re training or whatever, there always has to be a human in the loop for these gray areas. So we do as much as we can with technology, and we bring it there, but even if as a parent you’re looking at it, you’re saying, “I don’t know…” And so sometimes we can leverage things that you don’t have access to. Like, we can look at the history of the user, or the other chat rooms, or other things that are going on in the space, or who has been in this chat room before… But sometimes it comes down to - you just don’t know.
So I have a lot of – Chris always knows I like to ask a lot of practical questions… But before I get to those, in terms of some of the things you’re doing and how you’re doing them, I’m wondering - for a company that’s using some of ActiveFence’s technology, what does that connection look like? One of the examples I’m thinking of is, I’m working on a website for some of our partners where people can contribute endless cards for tools that they’re working on, like software tools… And technically, they could submit anything in the description of that tool. Now, I think we have hopefully vetted people that will be submitting content, and actually not everyone has accounts, and so it’s fairly restricted… But yeah, I’m wondering, in that situation, or a much more scaled-up situation, is the right view that you have your software platform, and then you send off content to some API, and get a threat score or something, and then you figure out what to do with that threat score? How does this actually practically work out for a company? Because I imagine it’s complicated. Every company has their own platform, right? And also, the format of a Facebook message going to a webhook is going to be different than a blog post being posted to a content management platform… So in terms of data moving around, how does that work out practically?
Yeah, so I think there’s two parts to – maybe more parts, but two main parts to the question. The first is how you as a user would interact with us. And so we have a UI, a platform, where you can really see the content that’s coming in, and you can define codeless workflows where if something is above a certain risk score threshold, then it’s automatically filtered out; if it’s below a particular risk score threshold, then you don’t even look at it; and then what is your threshold for human moderation. That gets around this precision/recall conundrum, where you’re like, “Well, I set a threshold, and I always have to choose what I’m maximizing.” And you can say, “Well, let’s set one threshold where you’re maximizing your precision, and another one where you’re comfortable with your recall, and then you look at this sort of band in between.” And then you can use that to moderate.
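That two-threshold idea is simple enough to show in a few lines; the cutoff values below are arbitrary placeholders, not ActiveFence defaults.

```python
def route(risk_score: float, remove_above: float = 0.9, review_above: float = 0.4) -> str:
    """Two thresholds, human review for the band between them.

    - above `remove_above`: precision is high enough to filter automatically
    - below `review_above`: low enough risk that nobody needs to look at it
    - everything in between: queue for a human moderator
    """
    if risk_score >= remove_above:
        return "auto-remove"
    if risk_score >= review_above:
        return "human review"
    return "allow"

for score in (0.97, 0.55, 0.12):
    print(score, "->", route(score))
```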
We also have an API - you can do synchronous calls for text, near real-time, really, really fast… So for chat, if you want pre-publish moderation, and so forth… And we also have async for text, and for images, for video… We can send the full context. So you can send your content, and you have the body of the media, and the title, and the description, whatever you have. That’s for the first part of your question. And the second part is maybe a little bit more interesting, because everyone – you can build an API, or whatever… But what we’ve spent a lot of time on is both optimizing our API - making sure that it’s very robust, and responsive, and so forth - and then also modeling our data.
[30:22] So we have a very rich understanding of the world of platforms, of how we can model the world of online media, or online platforms, or user-generated content; pick your favorite term. We model it, we have a very robust and flexible schema, where we’re able to model a user, and how they’re related to the posts that they put up, and how many likes they have… And it’s not always relevant, and you don’t always need to use it all, but we have users, and we have content, and we have collections, and each of those is modeled a bit differently… And so once the data comes in and we ingest it, and it’s modeled like this, then we can go ahead and take it apart and score the different parts of it, and then through our API, which is able to handle really high throughput with a fast SLA, basically start giving you responses.
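As a loose illustration of that kind of modeling, a user/content/collection schema might be sketched like this; the field names are guesses for the sake of the example, not ActiveFence’s actual schema.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class User:
    user_id: str
    display_name: str = ""
    prior_violations: int = 0  # a user's history is itself a context signal

@dataclass
class ContentItem:
    content_id: str
    author: User
    media_type: str            # "text" | "image" | "video" | "audio"
    body: Optional[bytes] = None
    title: str = ""
    description: str = ""
    comments: list[str] = field(default_factory=list)
    likes: int = 0

@dataclass
class Collection:
    """A channel, chat room, or thread: the surrounding context for its items."""
    collection_id: str
    items: list[ContentItem] = field(default_factory=list)
```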
And we’ve done a lot of work on our backend optimizing - we’re batching models on GPUs, and doing all sorts of picking; we have all kinds of code that we’ve written that basically optimizes what machine type you wanna run on, to make sure that everything runs as smoothly and as robustly and as reliably as possible to get those responses out.
So Matar, one of my questions that’s just in the back of my mind is about the practicalities of running the type of platform that you’re building, and the service that you’re running… I could imagine, “Oh, I have this model that is able – model one is able to detect this harmful type of meme, and model two is able to detect this harmful type of video”, and then all of a sudden you’re proliferating hundreds and thousands of models for little pieces of what you’re trying to detect… And then another sort of scenario is “Oh, I’m gonna try to standardize everything into more generalized models that handle multiple modes of data, or try to synthesize things together…” How, just practically, as a development and research team have you started thinking about when something is maybe worth combining into a larger model that’s trying to address multiple tasks, or multiple types of data? And then the other side of that is maybe sometimes it is useful to just spin up hundreds of small models, and ensemble them together in some way. Any thoughts on that?
Yeah, so we actually do do that. Sometimes we have models that are just really lean, and we serve them as is - that’s, like I said, for near-real-time responses. And when we do contextual stuff - like I said, we really need to extract information in as many different ways as we can. So we’re looking for logos, and we’re listening to the audio, and looking for known phrases, and keywords, and language understanding, and what have you… And all these smaller models we then do combine into ensembles. We have a feature store that we can basically take from, combine, train the relevant models, and then productionize them. And then we add what we call indicators - essentially indicators from which we can get features, and then go to a model which is an ensemble of these.
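Here is a small, self-contained sketch of that pattern: several indicator scores become feature columns, and one ensemble model is trained on top of them. The particular indicators, the synthetic data, and the choice of gradient boosting are all illustrative assumptions, not ActiveFence’s setup.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(7)
n = 2_000

# Each column stands in for one small "indicator" model's output:
# [logo-detector score, audio-keyword score, language-model text risk, known-hash match]
X = np.column_stack([
    rng.random(n),           # logo detector
    rng.random(n),           # audio transcription keyword score
    rng.random(n),           # text risk from a language model
    rng.integers(0, 2, n),   # exact match against previously verified content
])

# Synthetic labels: violative when several indicators fire together (purely for the demo).
y = ((0.4 * X[:, 0] + 0.3 * X[:, 2] + 0.3 * X[:, 3] + 0.05 * rng.standard_normal(n)) > 0.5).astype(int)

ensemble = GradientBoostingClassifier().fit(X, y)
new_item = np.array([[0.8, 0.1, 0.7, 1.0]])
print("risk score:", ensemble.predict_proba(new_item)[0, 1])
```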
[34:22] So we use both approaches, based on the SLA requirements, and based also on the explainability that we need… We want to be able to explain why something was flagged - like, this particular logo was found… Because sometimes the moderator may not have the full knowledge that we have. And so a big thing that we deal with is “How can we take our intelligence and leverage it to the fullest extent?” So one way is really to put it in the models, and the other way is really to educate the moderators through explainability of the model, so they can really understand why - because sometimes things aren’t obvious.
Yeah. And I guess you started getting to my other question, which is how and when do you bring the subject matter experts into the loop? Because I imagine there are certain cases where it’s highly probable that this is some type of harmful situation, and maybe given a restricted set of subject matter experts in an area, maybe they’re restricted to only reviewing X amount of content per day, or something… Is that a situation that you run into, where you have to prioritize what you’re reviewing with subject matter experts based on some predictive measure that you have, and do that in some sort of ranked way? Or do you handle that in some other way?
Do you mean in terms of what the analysts are reviewing for us?
Yeah…
The labeling, or for reviewing the–
Right. So I’m assuming that there’s a limited number of those people; there’s not infinite of those people, so…
Right. There’s not an infinite number of those people, and also, we want to be very aware of their well-being. We care a lot about the well-being of the people that we work with. ActiveFence invests a lot in that. And so specifically for these analysts I want to make sure that I prioritize what it is that they need to review. I don’t always need – I don’t need them to review everything, right? Usually, what I would go for is the gray zone, right?
So we have implemented active learning, which is basically to prioritize what it is that we want to train on. And so that also prioritizes what it is that we want to review and to label… Because I’m always going for the gray zone, right? The things that we’re not quite sure of, that we don’t really know - that’s where it goes to the expert, right? It goes back to Chris’s question, “How do you know?” Sometimes you do know, but it’s tough. And those are the things that I want to label, because those things that are tough are what’s going to feed in and give my discriminator the maximum power that it needs.
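In its simplest form, that prioritization can be plain uncertainty sampling: rank unlabeled items by how close the model’s score is to 0.5 and send the top of the list to analysts first. The sketch below is a minimal illustration under that assumption, not ActiveFence’s actual active-learning setup.

```python
import numpy as np

def pick_for_review(risk_scores: np.ndarray, budget: int) -> np.ndarray:
    """Uncertainty sampling: prioritize items the model is least sure about (scores near 0.5)."""
    uncertainty = 1.0 - np.abs(risk_scores - 0.5) * 2  # 1.0 at score 0.5, 0.0 at 0 or 1
    return np.argsort(-uncertainty)[:budget]

scores = np.array([0.02, 0.48, 0.97, 0.55, 0.30, 0.91, 0.50])
queue = pick_for_review(scores, budget=3)
print("send to analysts:", queue, "scores:", scores[queue])  # the 0.50 / 0.48 / 0.55 items
```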
And how much – sorry to steal all the questions, Chris; I’m just so fascinated by all this…
No worries.
One thing that is always on my mind, and maybe I wrestle with sometimes, is how much do your data science people, or the people that are working sort of with the models directly interact with the subject matter experts, and share knowledge across that boundary? How do you balance that? Because that’s always something I think I struggle with in projects, is - ultimately, it would be great to bring the subject matter experts in all along the way, in every step of everything, because you learn so much… But the fact of the matter is you’ve got a limited number of those people, but also you have to ship things, right? So you can’t necessarily have the luxury of always having a discussion before you make a development decision. So how do you balance that, especially because this is such a complicated environment in terms of the subject matter? How have you found ways to balance that, and any thoughts or takeaways that you have from that experience?
[38:18] Yeah, so one thing that we did is we actually embedded subject matter experts, like researchers, into our dev team, to be part of the process. We also have our analysts that are labelers - they work really closely with us; they’re just part of the same group, and so they’re not out there somewhere. However, again, it’s a limited number of people, and it’s a limited number of violations… We’re constantly exposed to new stuff that we have to handle. And then it’s just a matter of relationship building, of regular check-ins and constant feedback… So when they’re kicking off a new project, we come, we learn - “What are you guys doing?” Because a lot of times they’re learning on the fly too, right? They have this new trendy thing that they’re learning about, and so as they’re learning, we’re trying to gather as much as we can from them, and then there’s just constant feedback. “How does this look? How does this look? Is this there, is this not?” But I think the key was embedding them with us.
We did have situations where we basically wanted to develop models for things that we didn’t want to expose our data scientists to, and that only a very, very small number of people in the company can be exposed to, because of the nature of the violation… And that was much trickier, because there was complete dependence on the data scientists, like, “How do you build and train a model without looking at the data?”
So I’m kind of curious, because – and it changed the question I was about to ask you just a little bit, with what you’ve just said… So I’m gonna combine two things… The sense that I got - because you keep talking about going to the gray area, and stuff - is that almost the core of your research effort is to replace the human intuition that’s necessary early on to identify the nuance that’s there, with more and better models as you’re moving forward. Almost a bell curve of difficulty, where that gray area is the hardest… But I am curious - I’m wondering if that’s the case, but I’m also curious, when you mentioned those things… It seems like, I’m guessing, that the things that you really don’t want to expose someone to - they’re not in the gray area; they’re way over on the deeply evil side; you’re never gonna forget having been exposed to that, if you are. How do those balance out? You have that gray area that you’re focusing on, that you’ve mentioned several times, and then you have those kinds of things… When you have something that’s so explicitly evil, and it will imprint on a human’s mind in a very negative way - are those different problems that you’re solving as a data scientist, whereas in the gray area there’s so much nuance…? Do you see what I’m getting at? How do you balance the approach to building models to handle one thing that’s really obviously bad, where you just don’t want to expose anyone to it, versus developing intuition, or an alternative to intuition, in the gray area?
Yeah, so I think - and let me know if this doesn’t quite answer your question… But a lot of times it can be the same model. So you have a model and it knows how to identify it – because it gets the distribution; the data is distributed in some way along the space, right? And so the things that are very obvious are going to be on one side of the decision boundary; like, really – pick your favorite violation, and I’ll give you examples of things that are very, very, very clearly violative, and they’re on that side. And then when you’re training your model, you don’t only want to give it the really horrible examples, and then things that are just puppies and snowflakes, that are very obviously not… Because those are so far apart that your decision boundary is just never going to converge; it can flip-flop back and forth, and you’ll never know.
[41:58] So as we’re doing it, we’re also trying to find things that are on the borderline, because that’s what’s going to help us really make sure that we’re able to find a good decision boundary. Because at the end of the day it’s really important – the basics are that we have to be able to catch the ISIS content, and we have to be able to catch beheadings, and all these terrible, terrible things, but we also want to be able to catch things that are less obvious, but still within that space.
So that’s where I’m talking about the gray area… For the grooming model - I don’t even want to say, but we could think of phrases that are very obviously grooming, right? Where you’re sexually harassing a minor, and it’s there in plain text, right? But that same model, if you want it to be any good - sure, it’s helpful that it can find the obvious stuff, but you also want to train it on things that are closer to the boundaries, because that’s what will help you in the long run.
I have a quick follow-up that came into my mind as you were talking; it’s a very human question, and for a moment I want to move you out of the data science bit a little bit, because your organization is in a little bit of a unique position on this. You mentioned that there are certain things you want to keep as many of the folks on your team as possible from being exposed to, but that does leave some people exposed to some pretty awful stuff. And that does impact people, we know. I’m guessing that you’re one of the people that has had to see some of those pretty tough things… How do you cope with that, and keep – you seem to be really super-grounded in that. But as part of the job, you’re going to have to cope with some really tough stuff. And I know there have been things that I have seen myself online that I wish I just had not seen. I remember early on in the Al Qaeda period some time back, I watched something that was in the news, that happened to be out there, and I was like, I wish I’d never seen that, and I will never forget it as long as I live. I’m just curious, in a human sense - how do you cope with terrible things, and keep it in a healthy place for yourself, if that makes sense?
Yeah, that does make sense. So I think personally I’m a very mission-driven person. It’s very clear to me why it is that we do what we do. And I really, really believe in it. Like I said, a bunch of us are parents, or have nieces and nephews, or just care about kids, or care about communities and environments… And I just very, very deeply believe in what we at ActiveFence do. And you can really feel that when you’re in the office and when you’re working with people - everyone is very mission-driven. That being said, we also do our absolute best to support and protect everyone that works with us. So whether it’s different wellness support programs… We have a psychologist on staff who specializes in resilience, and she’s available to everyone, and she does group and one-on-one sessions, and really helps people build this sort of resilience in the face of what it is that we do.
And for me, what personally works for me is just understanding why what we do is so important. And yeah, I’ve definitely seen things that will never leave me. Never. And I accept that, because I’m doing my part to make everything just –
You’re helping the world in that way. I get that.
Hopefully. I try.
I think even though we’ve talked about hard things in this episode, I’m super-encouraged and want to thank you and the team at ActiveFence for what you’re doing.
Me too.
Yeah, it’s something that’s desperately needed, and thank you so much for digging into these problems, and doing it with such technical excellence and deep insight as well. As we close out here and look to the future, what on the positive side excites you about where this technology is headed?
Yeah. So to me, first of all it’s super-exciting that this is – like I said when we started off, it’s no longer a nice-to-have. It’s a basic expectation. So I think, first of all, to me that’s really exciting, because people are not taking safety for granted. They understand how critical it is, and it’s coming from the users; it’s not just sort of like “Oh, well, it is what it is. This is the price I pay for being online.” No, that shouldn’t be the price that you pay. So to me, that’s exciting.
And on the tech side, I think what’s cool is that we’re seeing a lot of open sourcing of different models, whether it’s data generation, or audio transcription, or zero-shot models, or all these things that are just – it’s like a candy store, right? You can start thinking about all these technologies used for completely different things, and you can say, “Well, how can I take these ideas and use them to just extract more signal, and to look at these things from different angles?” And it’s an adversarial space, and so it keeps it interesting, at least…
Well, Matar, it’s very inspirational, the work you’re doing. I know that it’s tough work, but thank you very much, and to your teammates, for doing the work that you’re doing. It was great having you on the show. I’m looking forward, too, to having you back some time as you guys push forward and have some more stuff that you want to share with us… So thank you very much for your time today.
Thank you so much for having me. Thank you for caring, and for asking really, really interesting questions. I appreciate it.
Our transcripts are open source on GitHub. Improvements are welcome. 💚