Practical AI – Episode #219

Capabilities of LLMs 🤯

with Rajiv Shah, ML engineer at Hugging Face 🤗

Large Language Model (LLM) capabilities have reached new heights and are nothing short of mind-blowing! However, with so many advancements happening at once, it can be overwhelming to keep up with all the latest developments. To help us navigate through this complex terrain, we’ve invited Raj - one of the most adept at explaining State-of-the-Art (SOTA) AI in practical terms - to join us on the podcast.

Raj discusses several intriguing topics such as in-context learning, reasoning, LLM options, and related tooling. But that’s not all! We also hear from Raj about the rapidly growing data science and AI community on TikTok.




1 00:00 Welcome to Practical AI
2 00:43 Rajiv Shah
3 01:55 AI on TikTok?
4 03:31 Community engagement on TikTok
5 04:49 Ever-growing mind-blowing moments
6 06:24 Reaching different audiences
7 07:57 What is in-context-learning?
8 10:52 Prompt engineering with better models
9 13:01 Growing productive users
10 14:52 The landscape of large language models
11 18:16 Sorting through this delightful mess
12 19:46 Hugging Face highlights
13 23:06 Practical fine-tuning
14 26:00 What are we talking about?
15 28:29 Where does AI fit into education?
16 30:20 A different kind of consumer
17 31:54 Talking to the average Joe about AI
18 34:02 What do you see through the looking glass?
19 36:06 Great Hugging Face resources
20 37:08 Outro


Welcome to another episode of Practical AI. This is Daniel Whitenack. I’m a data scientist with SIL International, and joined as always by my co-host, Chris Benson. How’s it going, Chris?

Going very well. Spring is in the air, we’re having a good time here. Lots of cool stuff in the AI world.

Lots of new life breathed into interesting AI systems over the past days… And sometimes I hear about this in cool videos, which are way cooler than any videos that I produce, from our friend Raj, who’s with us today, Rajiv Shah, who’s a machine learning engineer at Hugging Face. How’re you doing Raj?

I’m doing great. Thanks for having me on.

Yeah. So the last time you were on the show we talked about data leakage… Have you leaked any data since the prior episode?

I think any data scientist that’s out there has leaked data on a regular basis like that, right? [laughter] It’s a hazard of the job. And one of the things I like to do is continually remind the new folks that that’s likely to happen.

That they are in the process of, and should remember, yeah. [laughs] Yeah, I did mention I’ve seen a lot of cool videos from you recently, and we were chatting even a bit I think on LinkedIn about - there is a data science and AI community on TikTok, and other places… Tell us a little bit about that. It’s a fun topic; I’m curious what the AI scene is like on TikTok.

So let me start by like how I got into it. About a year ago, I was trying to get my son, who’s just starting college, to do a real practical project around AI. He’s taking computer science, but he doesn’t know what GitHub is, so I’m like “Can we build a discord bot, for example?” Something that appeals to him. And so I was like “Let’s give us 24 hours. We’ll do this over the weekend, we’ll both go our separate ways, then we’ll come back and kind of see what we’ve done and see if we could share what we’ve learned from each other.”

And so I go out, and I go get a blog tutorial and work my way through it, and kind of get something working… And then I go to him the next day and he’s like “Yeah, I kind of got stuck.” I was like “Well, let me see if I can help you through it. Show me the steps and what you’ve accomplished in doing it.” He pops open a YouTube video, and that’s what he used to follow it. And for somebody like me who self-taught their way into data science, that was largely focused on kind of reading and written material, it kind of really blew my mind that somebody would learn how to code through a video. But it really just opened my eyes to that, because already at that time, like with my daughter, I share videos on like food, and politics, and music… But it just really came to me how this is just becoming an emerging part of education and how people learn, kind of as we move on here.

Yeah. And have you seen engagement with your videos? I remember the one recently I saw was the – what is it, segment anything, or everything…? I forget which – anything and everything, whatever that one is. I saw your video on that one, which was cool, because it’s also very engaging. You’ve got like this skit element to it, but there’s real information content in there, in an engaging way. How do you see people respond to these?

[00:03:57.29] So I’ve had great feedback, and I try to keep mine very focused on data science. I try not to be too clickbaity, I try to be – if I was on a data science team, would I recommend somebody watch the video like that? But the video style also lets you do different things. So I think when I first started videos, I used to be a professor, I did like the traditional, “Let me just lecture you on this topic for 30 seconds.” But I think as you mentioned, over time, there’s more creative ways of doing it. And one of the things TikTok allows you to do is often tell it in a story or skit format, where you can have the voices of multiple people. If you’re sitting at home on your phone, that’s a much more interesting way to like get a nuanced conversation, rather than reading kind of some blog post that has “Here’s four different points on this.” So I think there’s a lot of potential for kind of teaching nuance with something like TikTok.

Yeah, that’s cool. And there’s no shortage of things right now to talk about. It’s like, you probably are more constrained on your ability to pump out these videos than like the AI things that are coming out. We’ve all had our minds blown recently, especially the capabilities of large language models, but there’s also, of course, other things, in computer vision, and other things.

For you what have been those mind-blowing moments, or what has been on your mind over the past – I can’t even say like the past year; like the past two weeks, I don’t know… [laughs]

So I think we just have to look back and reflect that we’re really in a great place of a huge amount of innovation in a short amount of time. This is one of those peak times in AI that - it won’t be like this a year from now, right? It wasn’t like this two or three years ago, where literally every week there’s new developments. It’s a fabulous time if you’re an AI junkie, and you like to kind of check out and see the newest tools, and see that incremental advance. There’s no better time; it’s not going to last for long, so kind of enjoy it.

I also kind of pushed back on this for lots of practicing data scientists that are very practical, that - you know, you don’t need to watch this stuff every day or every week like that. Many of these things are exciting, but if you’re day in and day out, you’re in enterprise data science, you’re inside doing churn analysis, or some marketing analysis, many of these developments are going to take a while before they filter to you; you’ll have plenty of time to get up to speed. They’re not going to change the face of every data scientist in the next two months like that.

So I’m curious - it’s a follow-up to both of these last few questions combined… You’re going into different mediums now for teaching, you’re hitting short video, longer video, different things… We have all of this happening so fast. How are you thinking about reaching different audiences in data science? It’s kind of funny, once upon a time it was just data science, but now we have different audiences, different age groups, different purposes… How are you making those different connections?

I was just talking to someone today [unintelligible 00:06:47.01] is like “Students coming into college now can’t type.” Typing isn’t a thing anymore, right? Because of the way they’ve grown up with devices. Like, they can poke, and touch screens. But that’s gotta influence – if we’re not adapting to that, then we’re not staying up, right, Chris?

You just made me feel really old…

And I think one thing that’s happened is - like, data science came out of statistics, and for a long time, the path to learn that was you went to college, you sat in a classroom, you had a statistics book to do that… But I think this is the transformative part about AI and data science, where now it’s touching so many people. And especially you see this with these large language models, where if you’re a teenager, you have a GPU, all of a sudden now you can kind of download and follow a script, and get something running on your local machine, where you can interact with this AI… Which a couple years ago would have been unheard of, for somebody to have such wide access to that.

So I think the hard part about communicating to so many audiences is also a great part that we have such a large community that’s engaged, and interested, and wants to use these tools.

[00:07:56.25] I’m going to bring a couple things here on the fly for you, Raj, because you are so good at explaining these things… So I’m at a conference right now, so I walked from a talk over back to here, and - yeah, one of the things that they were talking about was in-context learning with large language models. Could you kind of help us – so we’ve talked a lot on the show about prompting large language models, this sort of thing, but I don’t know that we’ve specifically kind of talked through this like in-context learning. What does that exactly mean, and what should people take away from it maybe?

So if we look at the development of these language models, a couple years ago, if you look at – there was blog posts by [unintelligible 00:08:36.10] kind of working with LSTMs, how we could get these models to generate text for us. And this is where we have kind of the statistical probabilities of being able to put together text, and it knows, like, “The cat ate the dog”, or that there’s some probabilities, and we could put a sentence together. A couple years ago, these were fantastic things, making like really weird stories. That’s all they were good for, when we look at like kind of the GPU tools like that.

Now, what’s happened is, as we’ve kind of worked with these large language models, and they’ve gotten bigger, where we’ve incorporated more data, we’ve trained them longer, machine learning engineers have noticed a new kind of what they call an emergent behavior that’s come about from these models, that isn’t there at the smaller size of the models. But when these models get really big, they allow this new capability of this in-context learning.

What in-context learning allows you to do is you can give the model a few examples of a type of question, and the model will then continue to answer in that same way. So an easy example of this is sentiment. Imagine you had movie reviews and you had to label the sentiment. In the old days, if you wanted to do this you would have to go out and label a bunch of movies. Let’s go get 100 or 1000 movies. We read the reviews, we label the sentiment - is this a good review? Is this a bad review? Then we train our model to do that. That’s the traditional data science approach. What we can do with these larger language models is say, “Hey, here’s three examples. Two of these are good movie reviews, one of these is a bad movie review. Now, I’m giving you a new movie review. Will you tell me what this movie review is?” And the model will come back to us with the answer.

The key here is we’re not changing the weights of the model, we’re not training the model in any way. Just by carefully asking it for some type of information, it knows and can kind of figure out “Oh, you like it like this? Well, I will give you back an answer in that same kind of format, style, same type of information.” And so this, for me, is just mind-blowing, and it also makes us rethink a lot of the tasks we do in NLP, and how many of these we’re gonna be able to use this paradigm to do it. That was a long answer, I’ll let you digest it.
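
To make the few-shot pattern described above concrete, here is a minimal sketch of how such a prompt might be assembled. The review texts, the labels, and the build_few_shot_prompt helper are all hypothetical and were not part of the episode; the resulting string would be sent to whichever large language model you use, with no training step and no change to the model's weights.

```python
# Illustrative sketch only: the reviews, labels, and build_few_shot_prompt helper
# are hypothetical and do not come from the episode.

def build_few_shot_prompt(examples, new_review):
    """Format labeled examples plus one unlabeled review as a single prompt."""
    lines = ["Classify the sentiment of each movie review as positive or negative.", ""]
    for review, label in examples:
        lines.append(f"Review: {review}")
        lines.append(f"Sentiment: {label}")
        lines.append("")
    # The final entry leaves the label blank so the model fills it in.
    lines.append(f"Review: {new_review}")
    lines.append("Sentiment:")
    return "\n".join(lines)

# Two good reviews and one bad review, as in the spoken example.
examples = [
    ("A moving story with wonderful acting.", "positive"),
    ("I laughed the whole way through.", "positive"),
    ("Two hours of my life I will never get back.", "negative"),
]

prompt = build_few_shot_prompt(examples, "The plot dragged and the ending made no sense.")
print(prompt)
```

The only thing that varies between tasks here is the text of the prompt, which is the point of the paradigm: the examples do the work that a labeled training set used to do.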

And that was a really good answer. So we all have this new skill that we’ve been developing around prompting; especially this past year, prompt engineering is now a thing, which it wasn’t very far back. It was – you’d go “What? What’s that?” So how does this all tie in? We have this new skill about prompting, and learning how to prompt effectively to get this information… You’re talking about this emergent quality of these large language models… How do those tie in? What does that imply for steps forward, and what should people be thinking about to make that productive for them in day to day use?

Let’s take this example to like something that you would do practically inside an enterprise, where somebody might give you some type of document, or chat transcript, which might be a little bit unstructured… And what you want to do is just categorize it. So now what we can do with using these prompting and these approaches is we can take that amount of information, I can ask the model, “Hey, will you structure, will you clean this? Will you take out the HTML format text?” It’ll do that. And then I can ask another prompt “Can you summarize this? Like, take this from a 100-line conversation just down to the essentials. 20 lines.” You can write a prompt for that. Then you can ask it “Hey, will you categorize this?” I need to see, “Should I send this to my claims department? Does it go to HR? Does it go to IT?” We can write a prompt for that.

[00:12:04.18] So now what you have developers using is tools like LangChain, where they can tie together several of these prompts and create workflows where, prior to this, we’d have had to use separate machine learning models to do each of those tasks. And I think this is really, for me, the mind-blowing part of it: how we can change machine learning and really do a lot of this democratization that we’ve talked about for a long time, but do it through a natural language interface, where somebody can just literally give it these tasks in a human language, and then have them accomplished.
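
The clean, summarize, categorize workflow described above can be sketched in a few lines. This is not LangChain's API; run_pipeline, the prompt wording, and the fake_llm stand-in are hypothetical, and in practice the llm argument would wrap a call to a real model.

```python
# Illustrative sketch only: run_pipeline, the prompt wording, and fake_llm are
# hypothetical. In practice `llm` would wrap a call to whatever model you use.

def run_pipeline(llm, raw_text):
    """Chain three prompts: clean, then summarize, then categorize."""
    cleaned = llm(
        f"Remove any HTML markup from this text and return only the plain text:\n{raw_text}"
    )
    summary = llm(f"Summarize this conversation in at most 20 lines:\n{cleaned}")
    category = llm(
        "Which department should handle this: claims, HR, or IT? "
        f"Answer with one word.\n{summary}"
    )
    return {"cleaned": cleaned, "summary": summary, "category": category.strip()}

def fake_llm(prompt):
    """Stand-in for a real model so the sketch runs offline."""
    if prompt.startswith("Remove"):
        return "Customer: my laptop will not boot. Agent: have you tried restarting?"
    if prompt.startswith("Summarize"):
        return "Customer reports a laptop that will not boot."
    return "IT"

result = run_pipeline(fake_llm, "<p>Customer: my laptop will not boot.</p>")
print(result["category"])
```

Each step feeds its output into the next prompt, so one general-purpose model takes the place of what used to be three separately trained models.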

For the data scientists out there it’s a little mind-blowing, because I’ve been in this place where we’ve tried to teach people citizen data science, and we have classes on how to properly partition data in holdouts, and loss metrics, and all of this… But this approach dramatically changes the number of tasks people can do without having to learn all those concepts.

That’s a great point. With the advent of ChatGPT, and some of the others that are out, Bard, and everything coming out, it has exploded the audience that can productively use this technology. Do you see any limitations in that going forward, or do you think it’s going to continue to grow?

To Daniel’s point earlier, this is the mind-blowing part; I give you the simple example. Now what you see people doing is taking this, but combining this with other APIs and other services. So in that case of the movie reviews, maybe I want to get the weather forecast, or I want to find out if the theater was open that day, something else - well, now I can use that same type of natural language interface and connect to other APIs, other services, other information. And so this is where we see some of the most powerful applications of this, with tools like HuggingGPT, which allow you to interconnect with lots of different Hugging Face models, where I can ask it a question and give it a picture, and the model will automatically go out, figure out the appropriate Hugging Face models to use, run them, figure out the answer and bring that back to me.

Or - and this repo has been going crazy - the Auto-GPT one, where we essentially take… It’s not just for models; we allow the large language model to do any task, where we can say, “Hey, start up a business and raise some money for me.” And then the model will go out, answer that, go see “Hey, is there some other databases? Is there some other APIs that I can use?” and will continue to iterate. It might cost a lot of the tokens for GPT 4, but it’ll continue to iterate and try and try and try and do it.
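
The idea of a model that keeps calling other services until it has an answer can be sketched as a small loop. This is not how HuggingGPT or Auto-GPT are actually implemented; the tool names, the "CALL tool: argument" and "FINAL: answer" conventions, and the scripted_llm stand-in are all hypothetical.

```python
# Illustrative sketch only: the tools, the "CALL ..." / "FINAL: ..." convention, and
# scripted_llm are hypothetical, not the HuggingGPT or Auto-GPT design.

def get_weather(city):
    return f"Sunny in {city}"  # canned value; a real tool would call an API

def theater_open(city):
    return f"The theater in {city} is open"  # canned value

TOOLS = {"get_weather": get_weather, "theater_open": theater_open}

def run_agent(llm, question, max_steps=5):
    """Let the model request tools until it emits a final answer or hits the step limit."""
    transcript = [f"Question: {question}"]
    for _step in range(max_steps):
        reply = llm("\n".join(transcript))
        if reply.startswith("FINAL:"):
            return reply[len("FINAL:"):].strip()
        if reply.startswith("CALL "):
            # Expected form: "CALL tool_name: argument"
            name, _sep, arg = reply[len("CALL "):].partition(":")
            tool = TOOLS.get(name.strip())
            observation = tool(arg.strip()) if tool else f"Unknown tool {name.strip()}"
        else:
            observation = "Could not parse reply"
        transcript.append(reply)
        transcript.append(f"Observation: {observation}")
    return "Stopped: step limit reached"

def scripted_llm(context):
    """Stand-in model that calls each tool once, then answers."""
    if "Sunny" not in context:
        return "CALL get_weather: Chicago"
    if "is open" not in context:
        return "CALL theater_open: Chicago"
    return "FINAL: It is sunny and the theater is open."

answer = run_agent(scripted_llm, "Should I go to the movies in Chicago today?")
print(answer)
```

The step limit matters in practice: every iteration is another model call, which is where the token cost mentioned above comes from.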

And I think, for me, if you asked me a year ago if this was possible, I would have said “No way. That’s three or four years out. I can kind of see how you’re doing it, but…” To me, this is why it’s such a special moment we’re living in, because I don’t think any of us could have predicted we’d be here a year ago.

Even in our conversations so far we’ve listed out like a bunch of models. So GPTs, and Auto-GPT, and HuggingGPT, and Bloom, Flan, Flamingo, 7 Billion, whatever…

In terms of large language models and what’s out there right now, one interesting thing is like open access or various patterns around that, and hosting… How do you think about the landscape of large language models? What does that look like right now? What are the major categories that we could kind of have in our mind as clusters of these things?

At this point, there’s tens of kind of large language models. And yeah, there’s a number of different ways we can kind of categorize our thinking about them. One of them is - kind of the simplest - which ones are proprietary, which ones are open source. There’s a spectrum when we talk about access to these. So there’s some, for example OpenAI, where you don’t have access to the model. You don’t know what data it was trained on, you don’t know the model architecture. You just send your data to them, they send back the predictions. And so I think that’s one model there.

[00:16:02.11] And then all the way at the other extreme - today, for example, Databricks released the latest version of its Dolly model, which was an open source model that was then instruction-tuned on a dataset that Databricks created themselves, that they’re making available kind of open source for commercial use itself there. So there’s the whole spectrum there. But there’s other spectrums here too, because the models, for example, vary in size, where you have for example something like Bloom, that was developed by Hugging Face, which is one of the largest open source models; it’s something like 170 billion parameters… To some of these much smaller models that are coming out, like the Llama models, and others that are maybe a billion parameters. And that size has implications in terms of how much reasoning ability, how much stuff is inside there.

But inference - is it something that your teenager is going to run on their own GPU, or is this something that’s going to take a multi-GPU cluster to be able to effectively use? There’s other dimensions, like what data the models were trained on. For example, with the open source models we know what data they were trained on. One piece of this, for example, that’s come up is knowing how much code a model was trained on… Because one of the things that’s often asked for is, “Hey, can we build a text-to-code type model, where I want to do some type of autocomplete, some type of code generation type project?” Well, if I start with a large language model that already understands code, it’s a lot easier to fine-tune it and make that capability. So understanding the underlying characteristics of that data.
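
A rough way to see why model size decides between a single consumer GPU and a multi-GPU cluster is to estimate the memory needed just to hold the weights. The sketch below uses the approximate parameter counts mentioned in the conversation; the weight_memory_gb helper is hypothetical, and the estimate ignores activations, caches, and framework overhead, so real requirements are higher.

```python
# Back-of-the-envelope sketch: counts weight storage only and ignores activations,
# caches, and framework overhead, so real requirements are higher.

BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}

def weight_memory_gb(num_params, dtype):
    """Approximate memory needed just to hold the weights, in gigabytes (1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9

# Sizes are the rough figures mentioned in the conversation, not exact model specs.
models = [
    ("~170B-parameter model", 170_000_000_000),
    ("7B-parameter model", 7_000_000_000),
    ("1B-parameter model", 1_000_000_000),
]

for name, params in models:
    for dtype in ("float16", "int8"):
        print(f"{name} in {dtype}: about {weight_memory_gb(params, dtype):.0f} GB")
```

Under these assumptions, the largest model needs hundreds of gigabytes for its weights alone, while the smallest fits in a few gigabytes.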

Daniel, it’s like an alphabet soup of different names, and literally every week they’re popping up, and there’s so many of these different characteristics… Because they also differ, for example, on the model itself, and what the licensing is, and the model weights, the dataset that it was trained, the training code that it was done… We see this with kind of how Meta released the Llama model, where they told everybody about it, but then they released the weights, but then they gated the weights, so only academic people were getting to them… But then the weights were essentially leaked, and now they’re all over the internet, so now everybody’s using them… So it becomes very confusing in this big, thick mix of how to sort this out.

So you’re an organization out in the world today, and you’re trying to make sense of all of this. And if you just look at your last answer alone, it’s just overwhelming for most organizations to look at – there’s all these different characteristics, there’s big models, small models, open source, closed, you name it; you can slice it so many different ways. How do you make sense of that? If you are – let’s say that you’re in management at an organization; not dissuade the data scientist who’s 25, and gets the data side, but you’re trying to figure out “How do I do this?” in the larger sense. How do you start making sense of that? How do you know if you need your own model that you’re going to create, if you’re gonna go use somebody else’s, big, small… What’s a good starting point for people to start sorting through the mess that we’re all delighting in today?

And it is a mess, and I get calls all the time from model governance folks that are trying to like “We need to set out a blueprint for our company, we need to think through this…” Because right now, the incredible change, the pace of change, and all of that - that’s the downside of that. Like, if you’re trying to understand what’s going on, it’s really hard to. And I think a lot of organizations at this point - there’s not a lot of easy cases for “Let’s implement this, because it’s going to 10x our revenue for this particular thing.” I think there is a lot of breathing room in terms of enterprises and being able to figure out what the best strategy is for the models, over the next year or so, like that.

[00:19:45.16] I personally really benefited from Hugging Face tooling around this. Some of the decisions that I’ve made in terms of my own integrations into the applications that I’m building are because I know there’s a community around some of these sets of tools. There’s sort of interoperability if I want to pull in like this model size, or that model size, or like whatever it is. And even these large models - like you mentioned, Bloom - there’s so much integrated tooling with… I remember a really awesome blog post about running Bloom in Colab using Accelerate, bitsandbytes, and these things for like quantization, and all this… And all of that set of tooling from this Hugging Face ecosystem I think is so powerful for people actually practically trying to do this. I’m wondering - there’s so many cool tools coming out as well in that ecosystem; you’re, of course, at the center of it, being part of that community and that company… Any highlights that you’d like to highlight? I highlighted the one which is really cool, and that I’m playing with, but what else should be on our radar?

That’s great. And I know both of you kind of enjoy the Hugging Face ecosystem, and have spoken highly of it before… The Hugging Face ecosystem is all about just helping to kind of create and democratize machine learning, build out the open source for it. To Chris’s earlier point, we have a place where everybody can go and check the models, and read what is the licensing for the model, what are the implications for that, and learn about that.

Now, when it comes to these large language models, we’ve been busy building out pieces on that. So if you think about kind of training these large language models, Nathan on our team has written some blog posts around using techniques like reinforcement learning with human feedback… That’s the latest cutting-edge approaches to figuring out like how to get these models to align exactly with what humans do. Because yes, we can feed a bunch of data into the models, but what comes out of them often isn’t what you and I would think is the best. And so using reinforcement learning with human feedback does that.

I think one of the things I’m excited about is the PEFT library that we have, which is parameter-efficient fine-tuning. And if you look at these models, they’re huge. They take a ton of resources to do. PEFT has a number of different approaches, and there’s “How can we fine-tune these models without having to load the entire model and modify every weight in this?” And there’s a number of different techniques. For example, just “Hey, can we take the entire model weights and find a smaller structure inside them?”, like a low-rank approximation; I can’t think of that name. Can we get then that little dense piece and just train that part, and add that onto it? And if we do that, that actually works as a fine-tuning technique without having to train the entire model. So I think this is where the Hugging Face team is busy building out a lot of infrastructure and tooling, so we can kind of all effectively use these large language models.
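
The low-rank idea described above (commonly known as LoRA, low-rank adaptation) can be pictured with plain Python lists. This is a sketch of the arithmetic only, not the PEFT library's API; the matrix shapes, the values, and the helper names are made up for illustration.

```python
# Illustrative sketch only: a plain-Python picture of a low-rank adapter, not the
# PEFT library's API. Shapes, values, and helper names are made up.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def add_matrices(a, b):
    """Add two matrices of the same shape element by element."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def lora_param_counts(d, k, r):
    """Return (full fine-tuning params, adapter params) for one d x k weight matrix."""
    return d * k, r * (d + k)

d, k, r = 4, 4, 1  # frozen weight is d x k, adapter rank is r

W = [[1.0 if i == j else 0.0 for j in range(k)] for i in range(d)]  # frozen, never updated
B = [[0.5] for _ in range(d)]   # trainable, d x r
A = [[0.1, 0.2, 0.3, 0.4]]      # trainable, r x k

# Effective weight = frozen weight + low-rank update B @ A.
W_eff = add_matrices(W, matmul(B, A))

full, adapter = lora_param_counts(4096, 4096, 8)
print(f"full fine-tuning: {full:,} weights; rank-8 adapter: {adapter:,} weights")
```

Only B and A are trained, so for a 4096 by 4096 layer with rank 8 the trainable weights in this sketch drop from roughly 16.8 million to about 65 thousand.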

It reminds me that tooling is tactical in terms of solving problems… And for Daniel and me, given the podcast name, tactical is practical.

[laughs] Wow, that was good. Maybe we should redo our tagline there, Chris. I’d have to run it through ChatGPT first, to make sure it was good…

Of course. [laughs]

We always talk, or we’ve talked many times on the podcast about how a lot of times the practical side of AI is on the inference side, not as much on the training side potentially, because like 99% of what you’re going to run your model in production is inference. I’m wondering, with these large language models I can see various scenarios happening, right? A lot of people are just putting that thin UI on top of OpenAI, and they’re never training anything, and they’re using that in-context learning… But now with the tooling that you just talked about, there’s sort of this ability to fine-tune these large models in a way that wouldn’t require you to have a bunch of racks of GPUs, right? But maybe you could even do it in some hosted system, like a Colab, or something like that, right? So how do you think that shifts people’s kind of approach to how they’re solving problems over the long run?

[00:24:07.03] Because it was sort of like for a while everybody’s training their scikit-learn model, right? And then it seemed like for a while, “Okay, now, I’m just gonna use APIs, because I can’t train these models”, and now we’re kind of coming back to this “Okay, well, what about parameter-efficient fine-tuning? …like, we’re not loading the whole model in.” How do you think that changes things moving forward?

As somebody who’s worked inside enterprises for a long time, I knew the infatuation with OpenAI’s APIs was only going to last so long… Because I’ve tried to sell data scientists a blackbox solution - you don’t get very far. If it’s inside your enterprise, and your reputation, your job is on the line to make sure that model works, you want full control over it. Not to mention enterprises want full control kind of over their data that’s going into the model and how it’s being used.

You’re gonna see - and this is where there’s been so much energy, is in this development of open source large language models. But what’s blown me away in the last few months is just how widespread this community is… Because I think some of the developments you’ve seen are around C++ interfaces for large language models, right? Things that no data scientist I know would be able to develop something like that. But because there’s so much excitement, we got other folks, typical software developers, engaged in building tools. And I think there’s a lot of focus right now on building these types of tools for this efficient type of use of large language models, because nobody wants to have a cluster of GPUs like that. Microsoft, in fact, just today released their DeepSpeed-Chat tooling to help people train models, using less infrastructure, being able to do it faster.

So I think there’s going to be tremendous development of tools, because at the end of the day, most people would like to have a model that they can fit inside their computer, or a couple of GPUs, something that doesn’t take a lot, that they can control, that they can tune… And so I think we’ll see a lot of development and progress in terms of open source pieces for that.

Well, Raj, I am curious to know how many of your conversations these days around AI models and large language models are about some of that tooling and practical stuff that we just talked about, and how many are around sort of like ethical concerns, or hallucinations, or environmental concerns… What does that look like in your life right now?

So that, of course, is a huge part… Because again, this is like the difference between traditional machine learning, where we often thought about bias in models, right? Like, is your model going to work for kind of a young generation, versus an older generation? But now, with large language models and the ability of generative models - they’re creating information; like, how accurate it is.

One of the common fallacies we see, and hopefully, most of the listeners here are quite aware of that - these models lie. They’re just going to create output, and the output doesn’t necessarily have a tie to reality with that. So this is one of the biggest education pieces that we have to do; because people see OpenAI, they see the other tools, and they’re used to just typing in a question and getting back an answer… But to really use this like in let’s say an enterprise setting - I always suggest to people to pair this with traditional information retrieval techniques. We already know good ways of having to search and pull information. Let’s use those ways that are factually-based, and then we can still layer on top a large language model to give you that nice, chatty type interface, right? The large language models are great at writing like that, and take advantage of both.
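
Pairing retrieval with a language model, as suggested above, might look like the following sketch: first pull the most relevant passage with a simple search, then ask the model to answer only from that passage. The documents, the word-overlap scoring, and the prompt wording are hypothetical; a real system would use a proper search index.

```python
# Illustrative sketch only: the documents, the word-overlap scoring, and the prompt
# wording are hypothetical. A real system would use a proper search index.

def tokenize(text):
    """Lowercase the text, drop basic punctuation, and return the set of words."""
    for mark in "?.,":
        text = text.replace(mark, " ")
    return set(text.lower().split())

def retrieve(query, documents, top_k=2):
    """Rank documents by how many query words they share, highest first."""
    query_words = tokenize(query)
    ranked = sorted(documents, key=lambda doc: len(query_words & tokenize(doc)), reverse=True)
    return ranked[:top_k]

def build_grounded_prompt(query, passages):
    """Ask the model to answer from the retrieved passages only."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using only the passages below. "
        "If they do not contain the answer, say you do not know.\n"
        f"Passages:\n{context}\n"
        f"Question: {query}\nAnswer:"
    )

documents = [
    "Claims must be filed within 30 days of the incident.",
    "The IT help desk is open from 8am to 6pm on weekdays.",
    "HR handles questions about vacation and benefits.",
]

query = "When is the IT help desk open?"
passages = retrieve(query, documents, top_k=1)
prompt = build_grounded_prompt(query, passages)
print(prompt)
```

The retrieval step supplies the facts and the model supplies the conversational wording, which is the division of labor described above.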

But yeah, there’s a tremendous amount of like education that has to be done around, for example, hallucinations. That’s just the tip of it. There’s also “What’s the training data that was used for these models? Where did that come from?” And then once you’ve used these models, and you get output from these models - and this is where customers, especially for some of the code generation ones, and image generations are worried about, is they’re worried about their own legal consequences of using these models, that that might have some type of leakage from the training data, and copyrighted material that could be in the outputs. It’s a lot of different issues going on…

[00:28:06.24] I’ve had conversations with people in various companies over things like the OpenAI licensing model, since they’re using ChatGPT, and it’s really made people aware that you can be giving over IP for using. And that’s just one of many possible concerns. I wanted to throw something – and I know you’ve been asked this a whole bunch of times, because it’s a really big topic, and I’d love to hear your take on it… Given where we’re at right now with large language models and some of the variants that we’ve talked about here, where does this sit in the concept of education? You have the gamut being run from “You’re not allowed to use any of these models for your coursework”, and then on the other side - and I think I may have mentioned this to Daniel a few weeks ago… I have a 10-year-old daughter, actually 11-year-old, in the fifth grade, and she had an assignment and I actually started us off going and doing some stuff in ChatGPT. Ended up having her do her own work, but I actually incorporated it in. But I’ve also talked to people who are deathly afraid of it skewing Academia, and how you’re measuring students’ progress… What might be a reasonable path forward in terms of trying to integrate this new technology into schooling?

I’m very pragmatic and know that we just have to kind of accept it and adopt it.

Me too.

Now, I agree that there are going to be short-term issues to figure out around who has access to the technology… Because this is an easy way for people who have access to those resources, versus those who don’t, to further differentiate themselves, and increase the differences between groups even more.

But I’m very pragmatic about this. I think it’s a very helpful tool, it’s very useful, and it’s going to be a part of how we work. It’s not only about the education of students, in terms of young people. I also think we need to get our co-workers on board too, because I think a lot of us, probably listeners, are early adopters that like playing with this… But I spent time teaching my sales team how to use the tools. Claude is built into Slack. I’m like “Hey, look at what you can do with this”, because I think it’s one of those things that can enable a lot of people. But it takes a little bit of education, a little bit of pushing to get people who aren’t used to these tools to adopt them, and especially - not only with the good that can come, but also, like we talked about earlier, the hallucinations, so that they properly use these tools as well.

Also, I think it’s the web developer, other developer community that’s starting to enter this space, like you were talking about with the C++ stuff, or other things. There’s other people contributing, which I think is great. You have a wider set of views being brought to the table around how these things should behave, how we should use them… There’s a lot more people at the table. And I know one of the things - I’ve seen, of course, a ton of that startup energy, too; like, people building things on top of this… Some very quickly; like I say, there’s just a thin landing page on top of OpenAI, but others that are like really fascinating and interesting use cases for this technology.

Of course, a lot of that community as well overlaps with the community using Hugging Face tooling, and those that Hugging Face is interacting with… What is it like for you to see that energy around startups? There are so many things coming out… I know startups already have a low-percentage chance of success… But a lot of these things are really amazing, and I think could reshape how we work, how we learn, like you’re talking about, Chris, and other things… So what are you thinking on that front, also having a sort of front row seat to see a lot of these things being released?

It’s an amazing time like that, and I love seeing the startups, because people are experimenting, trying new ideas, trying new things. Most of them will undoubtedly fail, but I think in the meantime we’re gonna get a lot of good ideas for different ways and approaches that we can use these tools. And that right there, I’ve been very excited about.

[00:31:53.29] One of the things that I’ve been having some really interesting conversations about is people who are not us, not in our audience; people who are in the larger world, and really may have loosely followed what’s happening in the AI space, and in the mainstream media, but they’re struggling to really understand what’s happening right now. And we kind of started the show off on that whole premise, that there’s so much happening right now, to the point of - I was at a dinner function recently… It was just a couple of weeks ago, and I met this really cool dude who was in his mid-80s, but really sharp, followed technology, and we started talking about AI… And he’s just like “I’m trying to track it and understand.” And at one point I turned to him, with the idea that we’re having these large language models that are now penetrating into everyone’s consciousness - I said, “This is that moment where you’re gonna look back and realize this was where you were conscious of AI being a part of your life. And it will take off from this point forward.” When you’re talking to people about these issues, how do you adjust for people who are not used to this stuff the way we are? How do you get them into the right way of thinking about it in a productive way, and kind of onboard them? Because it’s not the same conversation today as it was a year ago, as you said. It’s changed. So how do you approach tackling that?

I think one of the easier ways is if I can get them to use the technology. If they can use an image generation tool where they can type in something and see the differences in results they might get; or use ChatGPT, where I can kind of coach them and do that. Because you’re right, trying to explain exactly what this does, without the context of actually using it - it’s like telling somebody about something in the future; it’s hard to contextualize and understand what’s going on. The easiest answer for me is just getting them using it a little bit, and then that helps with showing them what the boundaries are, what the limitations are, what we can do, what the possibilities are, once they have a grounding in that.

As kind of a follow-on to that, we’ve kind of acknowledged we’re in this sort of historical moment. And in years past on this show, and last time you came on, and stuff, we might have talked about kind of historical moments in the context of AI. But I think we’re all agreeing that it’s becoming an historical moment for the whole world, whether you’re in AI or outside of AI, because it’s impacting everybody. You also acknowledged along the way that there are kind of ebbs and flows that we have. We’re certainly at one of those moments of just intense new stuff coming out. What do you see in the future, both short-term and long-term? Where do you think we’re going from here? Because it feels like we’re in an Alice Through the Looking Glass kind of moment. So what might the future look like, and what are you guys anticipating at Hugging Face?

So I agree, this is just an amazing moment… And I think it’s more so for the people that are in it, that understand AI, and what’s going on, and what steps we’ve made over the last year, and where we can go going forward. I think we still have to figure out, when we’re talking about larger humanity, and the larger group of people, exactly what the impact is, and how we’re going to use this. Because yes, we have chatbots, but most of us didn’t spend a lot of our life before using chatbots – like, I don’t know how much of our lives going forward we’ll have to do that. So we’ll have to see how that’s integrated. But I think all of this just shows us that the idea of AI, the idea of using machines to help us make better decisions is something that is becoming much more widespread. We’re really kind of on a path with Hugging Face to help democratize that, bring that barrier down, allow more people - so not just the people who have been trained for four years in statistics and went and got a PhD, but somebody that can think through a problem a little bit, go interface back and forth with a computer, can all of a sudden build code, or solve a problem by tying some prompts together, and really allowing lots more people to harness the collective AI, the collective information that we have, and allow for more productive uses like that.

Gotcha. Good answer. As you know, Daniel and I have been longtime fanboys of Hugging Face. We think it’s a fantastic community, amazing tooling… So as we close out, do you want to point some folks to maybe a few things that Hugging Face has to offer that might be good ways of ramping up in different areas? Could you just kind of call them out?

Absolutely. So the Hugging Face website is a great place to start. There’s a free online course that you can start with using transformers there. There’s forums, there’s Discord, there’s a community there… Feel free to kind of jump in and kind of get engaged there. And then we’re building out lots of pieces. You’ll see more models coming up over the next few months that we’re going to be releasing, more tooling for working with this… So yeah, there’s a lot going on.

Fantastic. Well, thank you very much for coming back on the show. You’re always exciting, you’re fantastic at representing Hugging Face and sharing your perspective with everyone, and we’ll have to do it again sometime soon. Thanks a lot, man.

Absolutely. Thank you for having me, I enjoyed this.


Our transcripts are open source on GitHub. Improvements are welcome. 💚
