Practical AI – Episode #161

OpenAI and Hugging Face tooling

get Fully-Connected with Chris and Daniel

The time has come! OpenAI’s API is now available with no waitlist. Chris and Daniel dig into the API and playground during this episode, and they also discuss some of the latest tools from Hugging Face (including new reinforcement learning environments). Finally, Daniel gives an update on how he is building out infrastructure for a new AI team.

Featuring

Sponsors

RudderStack – Smart customer data pipeline made for developers. RudderStack is the smart customer data pipeline. Connect your whole customer data stack. Warehouse-first, open source Segment alternative.

Me, Myself, and AI – A podcast on artificial intelligence and business produced by MIT Sloan Management Review and Boston Consulting Group. Each episode, Sam Ransbotham and Shervin Khodabandeh talk to AI leaders from organizations like Nasdaq, Spotify, Starbucks, and IKEA. Me, Myself, and AI is available wherever you get your podcasts. Just search Me, Myself, and AI.

Notes & Links

Transcript

Play the audio to listen along while you enjoy the transcript. 🎧

Welcome to another Fully Connected episode of Practical AI. This is where Chris and I keep you fully connected with everything that’s happening in the AI community. We’ll take some time to discuss the latest AI news, and we’ll dig into some learning resources to help you level up your machine learning game. I’m Daniel Whitenack, I’m a data scientist at SIL International. I’m joined, as always, by my co-host Chris Benson, who is a tech strategist at Lockheed Martin. How are you doing, Chris?

I am very well, Daniel. Looking forward to diving into some of today’s topics. And as you’ll explain to the audience, we get to put you on the hot seat just a little bit today.

A little bit, yeah, yeah, to talk about some of the things in my life… But yeah, it’s exciting. Lots of exciting things going on as we wrap up what are the last weeks of 2021. That seems like it went by fast. I don’t know if it did for you.

It has gone by incredibly fast. The last two years - just craziness. I was just like, “Where did they go?” But you know what? We’ve got some new things happening here, some new things to talk about that you can guide us through.

Yeah. So I think maybe one of the things that we can talk about, which we’ve talked about several times on the show, and has been a developing theme that we visit occasionally, is GPT-3 and the OpenAI APIs. I don’t know when you saw this, but I recently saw that OpenAI made their API available with no waitlist. So previously, you had to apply on a waitlist, they would approve you, and then you could use some of their models. And I think originally, even when we first talked about GPT-3, they released it and it was fairly closely guarded, I think mostly because of what they considered a safety process, and making sure that people didn’t misuse the model in certain ways. And I think, based on the blog post where they talk about opening up some of the availability, they emphasize a lot of these kinds of safety features that they have put in place.

And that might be actually – before we dive into it, that might be a good place to start, is why. For those who are not already very familiar with it, why would this model that they’ve released need all of this careful vetting and slow rollout and such? Because it’s been quite a while since they did it. So do you want to talk toward that a little bit?

Yeah. I think it’s probably– first of all, I’m not speaking for OpenAI, but I think in general, what people’s thought process is around these types of models is that GPT-3, just kind of stepping back, is a large-scale language model that enables a variety of natural language processing tasks to be performed. And for some of those, like natural language generation, the performance from GPT-3 is very impressive. So one thought is, well, GPT-3 could be used to do malicious things, like create a bunch of fake news type of stuff, misinformation, and distribute a lot of this kind of thing. As well, GPT-3 was trained on a huge amount of data that was kind of crawled from the internet and other places… And I think the behavior of the model - because of all the biases and other things that exist in that kind of large corpus - isn’t totally probed out. So another question is, how are biases showing up in this data? If we just say, “Hey, everyone, use this thing to generate text”, or integrate it in their applications, they might not take that caution in mind if they’re just applying it wholesale across the board.

Yeah. And just to name at least the nine categories without all the detail, they note that they prohibit users from knowingly generating, or allowing others to knowingly generate with their account, the following categories of content. There are nine: hate, harassment, violence, self-harm, adult, political, spam, deception, and malware. And apparently, they have spent a fair amount of effort putting these safeguards around this, so it’ll be interesting to see whether or not, going forward, they’re able to put safeguards around other models they release much faster, now that they’ve got the infrastructure in place. So it might speed up the ability to get on to new models… Because this has been– I mean, how long has it been now? Maybe a year and a half?

It’s been quite some time, yeah. So it’s interesting, they put a lot of thought process into this and they’re to be commended for leading a lot of the thought around these areas. I think there’s different views on how the community should go about addressing these things, and whether it should be via an access-controlled API or via open source code and models and data sets, or however it should be approached… But they have put a lot of thought into this. They talk about, in the blog post, how they are putting things in place to review applications, so applications of the model, applications of the AI before they go live, monitoring those applications for misuse, and better supporting applications as they scale, and understand the effects of the technology. So - very interesting developments here from OpenAI. Chris, have you logged in and tried anything with the OpenAI Playground, or anything like that?

I’ve just gotten an account, now that they’ve opened it up. I am ready… So I’m going to look for you to help me get into this. I have my brand new account, I have the site that they send you to open, where it says, “Welcome to OpenAI. Start with the basics.” I’m looking at their examples, I have OpenAI Codex open… And by the way, for our listeners out there, this might be a good moment if you don’t already have the account, pause the podcast for a second, go grab an account and follow along with us, if you’re not driving in the car or something. And I would love - I know, Daniel, you’ve been working with this for some time… If you could guide us through a little bit about what to see. I know you’re a practitioner, you’re not with OpenAI, but as someone who has used it, if you can maybe give us a little boost on things that you’ve already learned, that would be fantastic.

So we can maybe just look at some of the basic functionality that you can do here. You can do a lot of things, so this is by no means a complete introduction. But if you log into the API interface, one really nice thing is that they do give you big blocks with an introduction and examples, so you can scroll through those… But they have this cool thing called the Playground. So they have documentation examples, and then they have the Playground. Ultimately, you can use the API in a variety of ways, including a REST-type interface. But they built this Playground to help you try out things and see, “What are the types of things that I could do? What’s applicable to my specific use case?” So if you click on the Playground, it opens up, essentially, a text box. The most basic of things that you could do is just start typing. I could say, “Chris Benson is really cool–”

Oh, boy…

“…and knows much–”

I thought no deception, remember? No deception.

“…about AI.” Yeah. So, “Chris Benson is really cool and knows a bunch about AI”, and then I could click “generate”, and it’s going to start generating a bunch of text. So yeah, I don’t know if you want to hear what it generated as a result of that, Chris… So I put in “Chris Benson is really cool and knows a bunch of AI”, and then it started saying, it said, “I think he’s still a bit of a newbie, but he’s learning fast. This is his first post. Enjoy it.” And then it goes on, “I’ve been playing with this new stuff called machine learning lately, and I’ve found it rather fun. What I’m going to talk about today…” So you can keep clicking that button, and it’s going to generate more and more of Chris Benson’s new blog post about machine learning and how he’s finding it rather fun.

Oh, boy. I was about to put yours in. But since you put such an unrealistic thing about me, I just put in, “There was a guy named Chris who had nine dogs”, and I started with that. It took an interesting path. It says, “He wanted to go for a drive in his new car, a ’67 Ford Mustang. It was a nice car, and he was proud of it. He asked his neighbor if he could borrow his dog for a few minutes. The neighbor said, ‘Sure, take him for a few minutes.’ Chris went to the garage and–”

Borrow the dog… [laughs]

So that’s what I had. This is fun. I have a feeling this is going to turn into my favorite Saturday night with a drink or two - not too many, of course, because we’re responsible, and no driving… But yeah, this might be the new game-with-friends-on-a-Saturday-night thing to do after a beer, or a glass of wine.

Yeah. But I mean, you can see– and I think the interesting thing is that there’s a number of things that you can tweak here, right? Along the right-hand side of the Playground you can change the different engines that are available to do this, but then also response length… You can change various parameters, hyperparameters about what’s going on, and you can show probabilities or not… So there’s more here that you can get. And a cool thing similar to– if you were writing REST API calls or testing them in Postman, you can generate code to make that call. Here, in this Playground, it’s similar. You can actually generate code. So if you click on “view code”, you can then see, “Hey, here’s the Python code that will actually call the OpenAI API”, and get the similar response from the engine. So it’s telling you how to integrate this sort of text completion into an application, and you can go down and see how to do that with Python, or just calling a REST API, which is pretty cool.

Yeah. A REST API or cURL.

So they give the JSON and the cURL.
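
(For anyone following along at home: here’s a minimal sketch of the kind of Python that “view code” produces, assuming the openai package is installed and an API key is set in your environment. The prompt and parameter values are just placeholders, not anything specific from the episode.)

```python
import os

import openai  # pip install openai

# Assumes OPENAI_API_KEY is set in the environment.
openai.api_key = os.environ["OPENAI_API_KEY"]

# A plain text-completion call, in the spirit of the Playground example above.
response = openai.Completion.create(
    engine="davinci",   # one of the engines selectable in the Playground
    prompt="Chris Benson is really cool and knows a bunch about AI.",
    max_tokens=64,      # the "response length" knob from the right-hand panel
    temperature=0.7,    # sampling temperature; higher means more adventurous text
)

print(response.choices[0].text)
```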

Yeah. Now, the interesting– so this is cool. The text completion thing is cool, but I think the maybe more interesting thing, at least for our team and how we’ve looked at models like these, is that they can be quickly adapted to a very specific task that you are more interested in than maybe just general text completion. So if you’re in the Playground, there’s a little dropdown called “load a preset”. You could load, for example, a Q&A preset - that’s the first one that pops up for me - and you’ll see what happens is the Playground will prefill a little bit of example data for the model. So it’s giving us a sort of pattern. You can think about this like a very small amount of prompt data, or a very few shot example type of thing, where you are giving just a little bit of warm-up to the model to tell it, “Hey, this is the sort of thing that I want to generate”, right? And so when you pull up the Q&A thing, for those that are listening, there’s a Q, colon, and there’s a question, and then an A, colon, answer, right? Q, colon, question; A, colon, answer; Q, colon, question; A, colon, answer. And it provides a bunch of these answers. So the model, all of a sudden, is realizing, “Oh, it wants me to generate things that are like Q, colon, some question, and A, give me the answer.” And so now at the bottom, you can type in a new question. We just recorded an episode about federated learning, so I’m curious if it knows what that is. So I’m going to say, “What is federated learning?”, question mark, and I’ll generate. It says, “Federated learning is a machine learning technique that allows a single machine to learn from multiple sources of data”, which is, actually, quite relevant. How did it know how to do that? Well, it’s trained on a whole bunch of text data from the internet and the world, right? So at some point, it maybe knows something about that, or has been prompted.

I just want to share with you what I put in while you were doing that.

I put in, for question, “What is GPT-3?” And the answer was, “GPT-3 is a question answering system developed by IBM.”

Well, they should work on that.

They might–

Maybe we should switch over to the IBM Watson API. [laughter] This is a lot of fun. And actually, you could scroll through these. You’ll actually see there’s examples that they give for summarizing text, or text to a command, or parsing unstructured data, or classification. And so it basically gives you an example of, “Hey, this is the way that you can prompt GPT-3 in order to have it do a new task for you.” We’ve used this for some data augmentation-type things, where we’ve wanted to generate data in a certain way, for a purpose, and actually use GPT-3 to help us generate that data, and that’s been very helpful for us. So that’s maybe a general rundown of what is GPT-3 and why you might want to check it out.
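
(As a rough sketch of the Q-colon/A-colon prompting pattern Daniel describes, something like the following works - the warm-up questions, engine choice, and parameters are illustrative assumptions, not the exact Playground preset.)

```python
import os

import openai  # pip install openai

openai.api_key = os.environ["OPENAI_API_KEY"]

# A small few-shot "warm-up" showing the model the Q:/A: pattern we want it to continue.
prompt = (
    "Q: What is a neural network?\n"
    "A: A model built from layers of simple units that learns patterns from data.\n"
    "Q: What is transfer learning?\n"
    "A: Reusing a model trained on one task as the starting point for another task.\n"
    "Q: What is federated learning?\n"
    "A:"
)

response = openai.Completion.create(
    engine="davinci",
    prompt=prompt,
    max_tokens=64,
    temperature=0.0,     # keep the answer fairly deterministic
    stop=["\n", "Q:"],   # stop before the model starts inventing its own questions
)

print(response.choices[0].text.strip())
```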

Yeah. Just to call out some of the things - in that dropdown there’s chat, there’s Q&A, which we talked about, grammatical standard English, summarize for a second grader, text to command, English to French, parse unstructured data, and classification, and then it has a More Examples section. But yeah, it looks good. I’m looking forward to diving into this.

Well, Chris, Hugging Face continues to be the darling of the AI world. Does it not?

It does.

New things all the time, cool stuff… I’ve seen a couple of things come out from Hugging Face recently. So for those maybe listeners who are new to the AI community or aren’t familiar, Hugging Face is an AI company, and actually has a whole host of things that are quite relevant to AI development and research and application, one of those being a model hub where you can get models, actually, of all types now. It started with natural language processing models, but now it has vision models and speech models, it has data sets that you can pull, it has spaces where you can host machine learning applications, it has an accelerated inference API where you can serve inferences from your models… You can think about this almost like– you know, people post their code to GitHub, people post their software containers to Docker Hub. You can post models and data sets to Hugging Face’s hub, and those can be public or private, and so you can version your models and datasets there, and serve your models and datasets there for your company… So it’s becoming a one-stop shop for a whole bunch of really useful AI tooling. At least that’s how I’m starting to see it.
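
(To make the hub idea concrete, here’s a minimal sketch of pulling a public model and a public dataset with the transformers and datasets libraries. The specific model and dataset names are just common public examples, not ones mentioned in the episode.)

```python
from datasets import load_dataset   # pip install datasets
from transformers import pipeline   # pip install transformers

# Pull a public model from the model hub and run local inference.
classifier = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)
print(classifier("Hugging Face keeps shipping useful tooling."))

# Pull a public dataset from the datasets hub (just the first 100 test examples).
imdb = load_dataset("imdb", split="test[:100]")
print(imdb[0]["text"][:80])
```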

That’s totally right. Hugging Face is one of the names in the AI world that everybody respects, and everybody recognizes. They just keep doing innovative things that are cool and they’re super user-friendly, and so they’re one of the go-to’s in this space that you always are going to be using.

Yeah. And one of the things that came out recently, which is really exciting, is a first ML-Agents reinforcement learning environment on Hugging Face–

Called SnowballFight!

Ah, yeah, yeah. I mean, very relevant for the Christmas holiday season, right? Snowball fight. Maybe people are in someplace where you don’t have snow, and you can now have a snowball fight on Hugging Face. What was your first impression when you saw this, Chris?

I’m playing with it, I’m loading it up as we speak, and it is cool, good names, good graphics… I am trying to load up the SnowballFight demo here right now.

Yeah. I was playing it a bunch earlier and had some fun. If you load into it, you can actually play the game interactively. So just to give people a sense of what we’re looking at - you load into it, it loads this interface where a game comes up, and that’s driven by Unity. And you can move around this little guy in a snowball field, and throw snowballs at another avatar on the other side. It’s a lot of fun.

The music’s playing in my earphones, just so that you know. So you have a soundtrack going…

Ha-ha! Festive.

Yeah. Oh boy, this is fun. Okay. Since nobody can see what I’m doing, I’ll stop.

Now we know what Chris is going to do the rest of the day.

Oh, this is it. This has taken over. So don’t let my boss know that I’m onto this.

Well, one cool thing is I think we do have Thomas from Hugging Face, who created this, scheduled for an upcoming episode, so we’ll dive into it more then… But I wanted to mention it here, because I think it’s cool that Hugging Face made this transition; I mean, at first, it really centered around chat interfaces, and then more broadly open source NLP, and then into open source and general-purpose AI tooling and hosting services. And now we see this reinforcement learning piece coming in, where the goal is that within Hugging Face itself, you’ll be able to build and share reinforcement learning environments.

Now, Chris, I know that we’ve talked about it a little bit in the past. Reinforcement learning, in general, is not a model. It’s a framework in which you can train agents or models. And if you think of a self-driving car or something like that, if you say, “I’m going to go from point A to point B in a self-driving car”, well, there’s a whole bunch of routes you could take, right? And it’s not really that there’s one perfect solution to that problem; it’s more about the decisions you make along your route, based on what you’ve done so far and the feedback that you’re getting from your environment. And so in order to train an agent to execute decisions or take actions in that environment, you need to have a simulated environment that will allow an agent to navigate, get feedback and rewards, and then be trained accordingly to operate in that environment - in this case, to win a snowball fight. So although the game is interesting, that’s ultimately the most interesting thing about this: it provides a route towards training reinforcement learning agents in environments that are shared on Hugging Face.
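
(To make that agent/environment loop concrete, here’s a minimal sketch using the classic OpenAI Gym API, with a random policy standing in for a trained agent and CartPole standing in for a much richer environment like SnowballFight - this is not the ML-Agents integration itself.)

```python
import gym  # pip install gym

# The generic reinforcement learning loop: the agent acts, the environment
# responds with a new observation and a reward, and a learner would update
# its policy from that reward signal.
env = gym.make("CartPole-v1")

for episode in range(3):
    observation = env.reset()
    total_reward, done = 0.0, False
    while not done:
        action = env.action_space.sample()    # random policy; a trained agent decides here
        observation, reward, done, info = env.step(action)
        total_reward += reward                # the feedback the agent learns from
    print(f"episode {episode}: total reward {total_reward}")

env.close()
```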

It’s a powerful approach… And at a previous company, back when deep reinforcement learning was still fairly new, we were using it for training robots on our team at this other place, and it was really good there. It’s used in video games. It’s used in the industry I’m in now, in a nonspecific way. It’s used to move all sorts of things we call platforms, things that move around and do things… And that’s how we get autonomy to work these days. I mean, it is a fantastic tool.

There is one that I think is worth calling out, there was a DARPA – it’s a public DARPA thing. And DARPA - it’s an interesting place. It is the Defense Advanced Research Projects Agency, and they do all sorts of government-oriented and military-oriented experimentation, very cutting edge. And about a year ago, they did something called AlphaDogfight. And I know I’ve brought it up in the past, but in a simulated environment, they trained a model to be able to do dog fighting against other models, and ultimate–

Like in planes.

Like as in airplanes, right. I’m really glad you said that, just to be very, very clear. [laughter]

I didn’t want people to think about robot dogs fighting each other, or something.

Yeah. We’re talking about – think of the movie Top Gun, okay? That kind of dogfight, like pilot Tom Cruise, all that kind of stuff; not the same plane, though. And they trained them. And so at the end of it… This was not the Navy, which is what Top Gun is, but the Weapons School, which is the equivalent of Top Gun in the United States Air Force - they had a Weapons School instructor - this would be like the equivalent of the instructors at Top Gun - go up against the model, and the model demolished the instructor, over and over and over again. And so you’re talking about one of the best fighter pilots in the world, period, getting demolished by the simulated AI that was based on deep reinforcement learning, beating it in the simulator. They were using a real – I believe it was an F-16 simulator. I watched the whole thing live and it was just amazing. So this is some pretty cool technology here, and it’s evolving rapidly.

But I think one of the things that I’ve always wondered as related to reinforcement learning is you see the power of that, but then you see how hard it is to create these environments in which you need to train the reinforcement learning agent. So in order to train a reinforcement learning agent, you have two choices. You can either train the agent in the real world scenario… Like, if you’re training something that’s going to fly a plane, that’s not very practical, because you’re going to crash a bunch of planes and maybe kill some civilians.

Yeah. People don’t like that.

So you have to create this simulated environment. However, creating that is actually– I mean, it’s not my core competency in terms of 3D environments and Unity and games and simulation and all that stuff… So I think the idea that there could be a place in which these environments are created and shared more broadly on Hugging Face to enable people to share things, modify them, update them, train agents on them - I think it’s a really interesting concept, because personally, it would be hard for me to know where to start in creating this environment if I didn’t have a good way to jump off.

So to generalize on that a little bit, it’s a great way of accomplishing this in any kind of environment or industry where the cost of that training would be prohibitively expensive. You mentioned about where you don’t crash planes 1,000 times, and all the damage… Another area would be medicine. If you want to try new surgeries or ultra-surgeries and carry procedures forward, and you don’t want to kill patients in the process, this is an area where you can use deep reinforcement learning for that. There’s so many areas out there that are just very expensive to do that. And I don’t mean just financially expensive, but loss of life, and things like that. You’re seeing it all over the place. You’re seeing it more and more. And I think simulation is going to become ever more a part of businesses - or organizations in general - getting done the things that they’re trying to get done.

So one more thing that I wanted to mention from Hugging Face, just before we move on - they’ve had a couple of releases that are quite interesting. The other one I wanted to mention was what they’re calling a data measurements tool, which is an open source project. This was just released as well, and you can look through it. The data measurements tool, they say, is an interactive interface and open source library that lets dataset creators and users automatically calculate metrics that are meaningful and useful for responsible data development. This is new enough that I haven’t dug into it, but I think this does certain things, everything from basic things that you might expect from exploratory data analysis, like figuring out missing values and descriptive statistics, all the way to analyzing biases in datasets, maybe related to particular factors, like gender or other things. And so I think, ultimately, what they’re trying to do here is create a nice way for people that are maybe using data sets off of Hugging Face’s data sets hub, or creating new data sets, to really understand a little bit more about them and document them a little bit better, so that people aren’t just pulling whatever data set looks good, without understanding the implications of that. So it’s a pretty cool route that they’re going, I think.
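
(Since the tool itself is new, here’s just a hand-rolled sketch in pandas of the kinds of basic checks such a tool automates - missing values, descriptive statistics, label balance - on a small made-up dataset.)

```python
import pandas as pd

# A tiny, made-up labeled text dataset standing in for something pulled off a hub.
df = pd.DataFrame({
    "text": ["great product", "terrible service", None, "okay I guess"],
    "label": ["positive", "negative", "negative", "neutral"],
})

print(df.isna().sum())                            # missing values per column
print(df.describe(include="all"))                 # basic descriptive statistics
print(df["label"].value_counts(normalize=True))   # label balance / skew

# Text length distribution, a common first look at a text dataset.
print(df["text"].dropna().str.split().str.len().describe())
```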

Even as we’re talking, I’m looking through some of the graphs they have that do analysis on the data sets that they’re offering… I mean, the one that I’m in right now is “Hate speech 18”, and you can go explore, things like that. So it’s interesting in a bad way; things I definitely wouldn’t want my child to be– it kind of gives you all these different ways of analyzing and measuring. So very, very cool technology, a tool that is long overdue now, that is here. I mean, all kudos to them for doing it. Wish I’d had this for a while. So it’s funny, Daniel, as we look at some of these tools right now, it feels like this industry is maturing a little bit in terms of not just having–

Yeah. We have nice tools now, right?

Yeah. Not just having the models, but having some of the tooling we need around it to make it safe and get to what you need to get to for a good output, without some of the missteps.

So Chris, you had just started to talk about how we are getting to a point where there’s a good number of tools that fulfill a lot of the needs that an AI researcher or a data scientist– their needs are so diverse, everything from analyzing data sets, to serving models, to dealing with infrastructure and tracking things… There’s a lot of good tools out there now, and I know that our team - and maybe this is a follow up from a previous conversation that we had about building data teams. Our team at SIL, we’ve been in the growth phase. We’re building up a team that’s doing NLP research and development, and we’ve gone through a kind of process of figuring out what tools work well for us and how to plug all of these tools together. It’s kind of taken a year to work through a lot of those things, but I know there’s a lot of things that we talked about on this podcast, around MLOps and GPU servers and tracking models and experiments, and all of those things. And I thought it might be good to follow up on that conversation that we had before and talk a little bit about how some of these things might tie together in a real-world environment… Because we talked about a lot of them individually.

I really want to hear what you’re doing at SIL. And the reason I say that is - for those of us who have followed the show for a while, when we’ve had some of these other related conversations, I know we talked about some of the infrastructure that we had at Lockheed Martin, which was a very large organization, and kind of how we approach things… But that’s a different set of business drivers and a different set of constraints around how you go and evaluate such software. And so you’re coming from a different organization - different size, different constraints, different budgets, all that kind of thing. I would love not only for you to share what you have done and how you arrived at those decisions, but what some of those constraints were that you had to learn to live within and make it work.

Yeah, for sure. So to give a little bit of context, we’ve now got probably a group of - depending on how you count them - 15 or so people between people on my team, plus academic collaborators, plus other close collaborators that are working on a similar set of problems for NLP tasks. And how we thought about this was, “We need a way for this team to do a diverse set of experiments.” We’re working on all sorts of things, from machine translation, to spoken language identification and speech-related problems, to chat and dialogue… And so we need people to be able to use a whole variety of tooling, but also, we want to create some standardization and centralization around how we’re tracking experiments, how we’re running jobs, and how we’re sharing models with one another… And so we’re kind of settling down on some of that. Part of that was thinking about, okay, where are we going to run training? Where are we going to run inference in our context? And how and where are we going to store and track models, data sets and code?

So some of that’s a little bit easier than others. I mean, code we version in GitHub. That’s pretty standard for everyone. But also, we use Google Colab a lot, because our organization - we use G Suite for all of our docs and drive and all of that. So we use Google Colab a lot. So if we think, okay, some people are going to have code that lives in GitHub, some people are going to have code that’s living in notebooks and Google Colab… Colab, of course, has some GPU resources, but in order to train some of these larger NLP and speech models, we need other, more robust GPU resources. So we did end up getting an on-prem GPU server, which is sitting down in Dallas, but that brings up a new set of questions. So if we’ve got 10 to 15 people, and sometimes more, distributed all over the US, but also in Europe, in Asia - basically, all over the world - how do we get them all running things in a reasonable way on this on-prem server in Dallas? You might be saying HPC stuff. I know you worked in HPC stuff for some time…

I was going to say a scheduler is one of those things, in terms of getting the jobs lined up and stuff. How did you approach it? I’m curious what’s different from how we addressed it.

So I think one of the things that we wanted to make sure was that our people running and supporting the server on the ground weren’t really the ones that were going to– although they’re installing the server, they’re not really administrating the server. We don’t have a large DevOps and engineering team behind our NLP team supporting us, so we wanted a simple solution that we could work with on our distributed team. So we ended up using ClearML for this.

So ClearML - we actually had a conversation on the podcast about a similar tool, which is in the same vein as Weights & Biases; that’s another very popular one. But ClearML allows you to have a dashboard where you track all your experiments and your runs. So I could be running a Google Colab notebook, import the ClearML library, and register that experiment in the ClearML dashboard… But I could also, from the ClearML dashboard, enqueue a job on the GPU server, which will also be registered on the same dashboard and will be sent to the GPU server in Dallas, sort of like a scheduler. It’s not as full-featured as those really robust, big schedulers that are used on supercomputers, but it’s enough for our team, because we can say, “Well, we have this many queues on our GPUs”, and you put things in there, and they’ll run in that order. We don’t really need much more functionality than that. And all of that’s registered. And all of the input-output data is registered in a backing data store in S3.

For us, we’re running code locally on our laptops, code in Colab, and code on the on-prem server. All of that is importing ClearML, and all of those jobs are being registered in ClearML, and all of the input data and the output artifacts, like model files, are being stored in S3, in a versioned way. So we know what data was used to train this model, when it was created, we can look at the exact run, and all of that type of stuff.
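
(A minimal sketch of that pattern, with the project, queue, and bucket names as placeholders: the same script registers its run in the ClearML dashboard whether it executes on a laptop, in Colab, or via the remote queue served by the on-prem GPU box.)

```python
from clearml import Task  # pip install clearml

# Register this run in the ClearML dashboard, with artifacts backed by S3.
task = Task.init(
    project_name="nlp-experiments",            # placeholder project name
    task_name="mt-baseline",                   # placeholder experiment name
    output_uri="s3://example-bucket/clearml",  # placeholder artifact store
)

# Log hyperparameters so the run is reproducible from the dashboard.
params = {"learning_rate": 3e-4, "batch_size": 16, "epochs": 5}
task.connect(params)

# Optionally stop executing locally and enqueue the job for a remote worker,
# e.g. a ClearML agent running on the on-prem GPU server.
# task.execute_remotely(queue_name="gpu-queue")

# ... training code goes here; metrics, models and other artifacts get tracked ...
```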

Without diving down the alternatives that you didn’t choose, because I don’t want to do that to them - but abstracting that a little bit, what’s some of the reasoning that you chose ClearML, or conversely, that you didn’t choose a competitor, from a capability standpoint or from how it was satisfying the need? What were some of the things that made you arrive at this solution being the right one for your organization, that were not universal across all the solutions?

Yeah. I think it was a combination of 1) just the simplicity of administering the solution for people who aren’t systems administrators - although we got their help, right?

The majority of people running and operating this thing are data scientists and NLP people, so we needed something that we could support, and not something that we would have to know a lot about HPC systems to support. So that was one thing. And we needed a way to queue jobs. And the fact that we could also integrate that with our runs on Google Colab and all that was really nice.

Now, this is for our training jobs; we do also use other solutions for persistent data pipelines. We use Pachyderm for that, which actually allows you to create and subscribe to data sources and pump those through to update data sets in a versioned sort of way. So we use that for other purposes, but ClearML gave us that experimentation piece for NLP research. And what we’ve found is we can run our experiments there, but then we can also kind of– and this is where I wanted to get with connecting the pieces.

So we’re using notebooks, we’re using the GPU server, but then these models and data sets that we’re creating in our experiments - we can upload those now to Hugging Face: the data sets hub, the models hub. Those are versioned in that hub. And then for any inference we do, we could either use the Hugging Face inference API, or we could pull down the model using the Hugging Face libraries into Python code, where we could serve that model in a custom application of some type.

It then provides a really flexible route towards inference as well… Because if you can now store your models in a standardized way in a hub, and have a standardized way of serving them, it lets you have consistency and efficiency around how you do that bit as well.
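
(A rough sketch of that publish-then-serve flow with the transformers library; the local checkpoint path and repo id are placeholders, and pushing assumes you’re already logged in to the hub.)

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# After training, load the local checkpoint and publish it to a (public or private) hub repo.
model = AutoModelForSequenceClassification.from_pretrained("./finetuned-checkpoint")
tokenizer = AutoTokenizer.from_pretrained("./finetuned-checkpoint")

model.push_to_hub("my-org/my-finetuned-model")      # placeholder repo id
tokenizer.push_to_hub("my-org/my-finetuned-model")

# Later, an inference service (or the hosted inference API) can pull the
# same versioned model straight back down from the hub.
served = AutoModelForSequenceClassification.from_pretrained("my-org/my-finetuned-model")
```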

I’m curious about one of your constraints in general, aside from– because I understand the solution you’ve taken us through so far. Are you able to keep things cloud-based, or at least locally on that server and its immediate environment? Do you have any kind of edge considerations that you have to push to? Is that part of your requirement or not?

So it depends on the project. For some of our projects, the end target for deploying these models is some inference server in the cloud. So that’s part of what we deploy to. And that could be, like I say, either a custom inference server that we’ve built, or something that we’re deploying to a hosted inference service. But we also deploy to edge devices. In particular, some of our speech solutions are running on edge devices. That does have other concerns, like you’re talking about. But the nice thing is, if you’ve got your models – for the most part, on our edge devices, they are connected to the internet. So we can ship a Docker container to those edge devices, and if we’re, for example, downloading a model from the model hub or S3, it can directly download that version of the model from there at run time. So we have a little bit of flexibility there on the edge devices, because they are connected to the internet.
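
(A sketch of what pulling a pinned, versioned model at container start-up on a connected edge device might look like; the repo id, revision tag, and task here are assumptions for illustration.)

```python
from transformers import pipeline

# At container start-up on a connected edge device, pull a pinned model version
# from the hub (it could equally come from S3) rather than baking it into the image.
asr = pipeline(
    "automatic-speech-recognition",
    model="my-org/speech-model",  # placeholder repo id
    revision="v1.0",              # pin an exact, versioned snapshot of the model
)

def transcribe(path_to_audio: str) -> str:
    """Run inference locally on the device."""
    return asr(path_to_audio)["text"]
```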

Gotcha. I’m curious if– and I know a while back you took us through some specs in a previous episode for a GPU server that you had… Is this the same one that we’re talking about, or is this a different one?

That’s my one-off DIY build that you’re talking about, the previous one… Which was an interesting build, and I still use it, more for the one-off things that I’m doing. This server is a rack-mount unit with A100s in it, so A100 GPUs. Another thing that’s nice about that solution is the A100 GPUs have this MIG (multi-instance GPU) technology, which lets you split up the GPU into multiple virtual GPUs, which is really great, because we don’t all the time–

It’s nice.

Yeah. We don’t all the time need to run big jobs on whole combinations of our GPUs. We might need to run a whole bunch of training jobs, right?

Yeah. You get a lot more utility.

Yeah, exactly. We can split them up and slice and dice them the way that makes sense for the season of research that we’re in. I think that’s a really beneficial thing. So yeah, I would highly recommend people look into that technology if they’re able.

It is. That was my single favorite feature when the A100s came out, that ability to partition them instead of having to use the whole thing. Looking back, before the A100s it felt very inefficient in terms of how you were going about training.

I want to go back and hit a really basic question that people are facing a lot, and that is, how did you determine your crossover for the organization for when it needed to have an on-prem server, versus when it could use cloud resources, whether that be Google Colab or AWS, or any of the others that are out there? How did you make such a determination?

Yeah. I think it was basically when we looked at, for the year, an estimate of the scale of training that we would need to be doing for our models, and we realized that we would be making back our cost on the GPU server with the amount of training that we’re doing. I think it’s partially that. So there was a break-even point there. But then also, it was when we realized we could develop some operational efficiencies by having people centralize their jobs on this server, in these queues - by combining all of these people’s work, rather than this person over here spinning up a GPU server in the cloud, and this person over here spinning one up, and this person over here spinning one up… And they’re not utilizing all of those GPUs to their full extent, right? By utilizing more of a job queue approach, we could do that.

Now, you could also spin up a cloud server and implement similar queues and such. So there’s a variety of options. There’s also increasingly favorable options for running these things in the cloud… So it’s still not a story that’s finished, I don’t think.
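
(As a back-of-the-envelope version of that break-even reasoning - every number below is invented purely for illustration; plug in your own quotes.)

```python
# Hypothetical numbers, purely illustrative.
server_cost = 60_000            # up-front cost of an on-prem GPU server (USD)
cloud_rate_per_gpu_hour = 3.00  # comparable cloud GPU price (USD per GPU-hour)
gpu_hours_per_month = 4 * 600   # e.g. 4 GPUs kept ~600 hours a month busy via the job queue

monthly_cloud_equivalent = cloud_rate_per_gpu_hour * gpu_hours_per_month
breakeven_months = server_cost / monthly_cloud_equivalent
print(f"~{breakeven_months:.1f} months to break even at this utilization")
```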

No, it’s definitely evolving… But we are still at that point where there seems to be a crossover as you get more capable and sophisticated in running your models on a more consistent basis - not just doing it for a short while each day and having the hardware sit there idle during the off-hours. Once you get to that point, it definitely seems to pay, in the current economics, to go that way.

So as we turn to other things, do you have any learning resources worth sharing today?

Yeah. I mean, I don’t have that many, but I did want to point people to a quick thing that I saw, which I do think is really cool for people to explore, which is pandastutor.com. So pandastutor.com - if you go there, this is a way for you to visualize your Python Pandas data transformations, which is really cool. So for those that don’t know, Pandas is a way to construct data frames of tabular data in Python. It’s kind of ubiquitous in the data science/AI world. It’s very common to use. But it’s so powerful, and there are so many data transformations that you can do… Sometimes it’s hard to visualize and strategize about what your code’s actually doing, and some non-intuitive things can happen. From my perspective, a really cool thing that this is addressing is helping people gain that intuition about what certain transformations in Pandas code do to data, and it does that in a very visual way.
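
(For example, here’s the sort of small chained transformation - on made-up data - that’s much easier to reason about when each intermediate step is visualized in a tool like Pandas Tutor.)

```python
import pandas as pd

df = pd.DataFrame({
    "team": ["nlp", "nlp", "speech", "speech", "speech"],
    "experiment": ["a", "b", "c", "d", "e"],
    "score": [0.71, 0.75, 0.62, 0.68, 0.70],
})

# Top-scoring experiment per team: sort by score, then take the first row in each group.
best_per_team = (
    df.sort_values("score", ascending=False)
      .groupby("team", as_index=False)
      .first()
)
print(best_per_team)
```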

It does. It looks pretty cool. I’m looking through it now. Thank you for sharing this.

Yeah, for sure. There’s also a bunch of things going on in the month of December around the Advent of Code, and a days-of-JAX thing I’ve seen. So there are a whole bunch of every-day-of-December coding challenges out there. So if you’re into that, you could look up some of those things, as they’re always good learning experiences… But yeah, I wanted to share this Pandas thing that I ran across.

Well, thanks. You’ve really taken us on a bit of a tour today, between OpenAI and Hugging Face, and then how you guys put together your current approach to training… So thank you for sharing that.

Yeah, for sure. It was fun, Chris. And I hope you have a good rest of the week. I’m going to put on my heavy coat and go through the cold, back to my apartment, because now it’s winter… But yeah, appreciate the conversation. Looking forward to chatting next week.

Absolutely. Talk to you then.

Our transcripts are open source on GitHub. Improvements are welcome. 💚
