Practical AI – Episode #266

Mamba & Jamba

with Yoav Shoham, Co-Founder at AI21 Labs


First there was Mamba… now there is Jamba from AI21. This is a model that combines the best non-transformer goodness of Mamba with good ol’ attention layers. The result is a highly performant and efficient model that AI21 has open sourced! We hear all about it (along with a variety of other LLM things) from AI21’s co-founder Yoav.


Sponsors

Fly.io – The home of Changelog.com. Deploy your apps and databases close to your users. In minutes you can run your Ruby, Go, Node, Deno, Python, or Elixir app (and databases!) all over the world. No ops required. Learn more at fly.io/changelog and check out the speedrun in their docs.

Changelog News – A podcast+newsletter combo that’s brief, entertaining & always on-point. Subscribe today.

Notes & Links


Chapters

1 00:00 Welcome to Practical AI
2 00:43 Introducing Yoav!
3 01:41 AI21's background
4 05:47 Enterprise use-cases
5 08:43 Deciding value
6 14:26 Sponsor: Changelog News
7 16:25 Architecting AI systems
8 19:39 Interacting with AIOS
9 22:09 Jamba heritage
10 24:47 Why switch architecture?
11 28:17 What it took to get here
12 32:40 Why go open source?
13 34:45 Jamba family innovation
14 37:04 Where is Jamba going?
15 39:37 Thanks AI21!
16 40:26 Outro

Transcript



Play the audio to listen along while you enjoy the transcript. 🎧

Welcome to another episode of the Practical AI podcast. My name is Daniel Whitenack. I am CEO and founder at Prediction Guard, and I’m joined as always by my co-host, Chris Benson, who is a principal AI research engineer at Lockheed Martin. How are you doing, Chris?

Doing great today, Daniel. How’s it going?

It’s going great. The sun is out, and summer is upon us, along with lots of new AI models and excitement going on in the space… And on that note, specifically as related to large language models, we’re really excited to have with us today Yoav, who is the co-founder and co-CEO of AI21 and Professor Emeritus at Stanford. Welcome, Yoav. How are you doing?

I’m doing good. It’s a real pleasure to be with you guys.

Yeah, we’re so excited to have you on. It’s a show we’ve been wanting to have for some time now. I’m wondering if you could kind of give us a little bit of the background of AI21, and specifically maybe how you view AI21 as fitting into this wider landscape of LLM companies and technology.

So maybe a good starting point would be to say why we started the company in the first place, a little over six years ago. We started the company because of what we believed about deep learning - remember, at the time LLMs were not a thing, but deep learning was, mostly applied to vision. We believed that modern-day AI requires deep learning; it’s a necessary component, but not sufficient. We believed that certain aspects of intelligence, this thing we often call reasoning, will not emerge purely from the statistics - it’s the sort of thing AI did back in the ‘80s - and we believed that we’d left money on the table, and it was time to bring the two together. That’s why we started the company.

Now, fast-forward to today - what does the landscape look like, and where do we fit in? So although I said that large language models weren’t a thing then, very quickly we fell into LLMs. We were among the heaviest users of GPT-3 when it came out, and then we decided to roll our own… And really, language is where the action is, because we often say that machine vision is a lens into the human eye, but language is a lens into the human mind. There’s no thought, as intricate and nuanced as you want, that can’t in some way be expressed in language. Vision is an “easy” problem - of course it’s not easy, but to understand that this is a phone, I don’t really care what the pixels way off on the side are. That’s not always exactly true, but it’s primarily true. And that’s not true with language. In language, connections matter terribly. You change a word here, and the whole meaning of the sentence changes, in general. You can’t escape semantics when you deal with language. And so it’s harder, but if you crack it, that’s gold.

If you look at the enterprise - and from the beginning we were focused on the enterprise - 80% of the data in the enterprise is text. Most of it is not used, or is way underused. So there’s a really good opportunity there, and that’s kind of been our focus.

So of course, we’re not the only people with large language models. We are one of the handful of companies that do really large, very capable language models. Our first model was called Jurassic-1; that’s going back a few years. It wasn’t the most innovative model, but it was a good workhorse. It was a GPT-like autoregressive left-to-right model, and at the time slightly bigger and slightly better than GPT-3. Of course, both those models are by now eclipsed.

We very recently released our most recent model, called Jamba, which is very interesting in a number of ways… And we can dig deeper, but maybe at 30,000 feet: architecturally it’s different. It’s not a pure transformer model; it really is mostly based on a structured state space model, SSM as they’re called… And we can speak about the advantages and disadvantages of those, but basically, we took that architecture and added elements of transformers, the attention layers, to get the best of both worlds. You get performance that is as good as any model of its size, better than most in its size group… And it’s extremely efficient. We have a context length that’s larger than any other model of this size. The version we released has a 256k context window, although we trained it up to a million, and yet it all fits onto a single 80-gigabyte GPU. And so - your show is titled Practical AI; this starts to make it practical.

That’s great. And speaking of practicalities, you mentioned the focus on enterprise from the beginning. You also mentioned that a lot of data in the enterprise is kind of locked up in this unstructured text. I remember when I first got into data science, the focus was “Oh, we’re going to do big data, and all of this cool analytics stuff with data warehouses”, and I think that’s sort of waned a little bit.

[00:06:12.12] I’m wondering if you could talk to that point… What types of value can enterprises get out of this sort of text that’s sitting around? Because I think maybe a lot of listeners, maybe they’ve tried these chat interfaces, whether it be ChatGPT, or Gemini, or whatever, but maybe they’re less exposed to the workloads that enterprises are doing with LLMs. So could you give us a picture of how enterprises are unlocking value with that kind of 80% of text data? …maybe just by way of example, or at a high level.

Sure. And really, the use cases are quite broad. The industries are very broad, whether it’s finance, or healthcare, education, or you name it. And the use cases are varied. But to pick some concrete ones, let’s say you have manuals. There are companies with thousands of manuals. And whether it’s the end user wanting to – I recently had a new sort of oven-microwave combination, and for the life of me, I couldn’t find the relevant information in the manual. So I searched online… So it’d be really convenient to go and ask a question and get just the right answer. But even if it’s not the end user, it could be the tech support person, who themselves want to get quick answers. So that’s an example. We call this contextual answers.

Another would be summarization. Rather than responding to a specific query, you have this 10-K report that came out, and you want a pithy summarization of it. Maybe a summarization geared towards certain aspects you care about. So that’d be another use case. These are both ways of consuming data. And then there’s, of course – gen AI is a terrible name, but we won’t fight that battle.

You’re stuck with it.

Well, if you get me started, I’ll start complaining about gen AI, about AGI, and so on. [laughter] But certainly, some use cases call for producing information, not only consuming information. So for example, one of our use cases that’s very successful is product descriptions. You have retailers and e-commerce companies who have thousands of products coming online constantly. And writing a product description is labor-intensive, error-prone, expensive, time-consuming… And we’re able to compress all of that dramatically. So these are some use cases.

I’m kind of curious also, as you’re looking at these opportunities in the enterprise, and addressing these various use cases, as a company who is creating models and putting them out there for enterprises to use, for people not in the industry itself, how do you as a co-founder and CEO see your company as – how do you say “Let’s go do this”? We see the value in this compared to others that are making models. In other words, if you say “I’m going to make a model”, what is it about that motivation which makes you think you’ll make a difference in that enterprise market? And you’re kind of representing all companies that do so, just to shed some insight on how a founder thinks in this space?

I wouldn’t purport to represent the entire industry, so I’ll speak for ourselves…

Fair enough. I overshot on my asking. No worries.

But maybe some of it is common to others. So first of all, the baseline is a general-purpose, very capable model. There’s a need for that. Now, there are companies who provide services using other people’s models, and that’s totally legit. If you actually own the model, you can do things that you wouldn’t be able to otherwise. And our emphasis, in addition to the general capability of the model, is in order to make it practical, there are two things, especially in the enterprise.

[00:10:07.00] So if you’re using a chatbot to write a homework assignment, the stakes are low. A mistake doesn’t carry a big penalty, and probably nobody would read it anyway. But if you’re writing a memo to your boss, or to your prized client, and you’re brilliant 95% of the time and garbage 5% of the time, you’re dead in the water. And so reliability is key. As we know, large language models are these amazing, creative, knowledgeable assistants, but they’re probabilistic. And so you will get - here’s another term I don’t like… hallucination. You’ll get stuff that either isn’t grounded in fact, or doesn’t make logical sense, and so on. And you can’t have that. So you need to get high reliability. That’s number one. I’ll tell you in a moment how we do that.

But the other thing is, it needs to be efficient. If for every customer query you’re going to pay $10 to answer it, and it’ll take you 20 seconds to answer it - that’s no good either. So you need to address that also. So we have several things we’re doing in this regard. The first is what we call task-specific models. In addition to our general-purpose models, like Jamba, which just came out, we provide language models that are tailored to specific use cases. You can think about it as a matrix: you have industries, and you have use cases. And it turns out that while initially you might think “Oh, I’m going to do a healthcare LLM, or a finance LLM”, that’s a little bit boiling the ocean. You want to be more specific, and one way to be specific is to think about what you’re going to use it for; those are the columns.

So for example, take summarization. That’s a specific task, and you can optimize your system for it… And I am deliberately saying “system” and not “language model” - I’ll tell you in a moment why. But you can optimize it for that use case. So all companies now are experimenting with multiple solutions, as they should. And in this particular use case, a very large financial institution took several hundred of their financial documents and tested various solutions: our task-specific model for summarization, and some of the general-purpose models of other companies. And ours was just hands-down better in terms of the quality of the answers they got. There was no hallucination, if you’ll pardon the expression; it was very on point, very grounded, and so on, because it was optimized for the task. And by the way, the system is a fraction of the size of a general-purpose model, so you get the answers immediately, and the cost of serving is low. That latency and those economics enable use cases that would just be unrealistic otherwise. So our task-specific models are one approach. And maybe I won’t overload my answer by saying why it’s not only models - we’ll get to AI systems.

The other is - and it’s related - having models that are highly efficient. That goes to Jamba, as an example of a model that’s very capable, but not big. If I were to jump ahead and think about 2024 - what are we going to see in this space? Among other things, you will see a focus on total cost of ownership, on the reality of serving these models; you’re going to see a focus on reliability; and you’re also going to see a focus on - another term I hate - agents, AI systems that are more elaborate than these transactional interactions with [unintelligible 00:13:47.29] - tokens in, a few seconds, tokens back, thank you, on to the next one. More elaborate. So this is, I think, what’s going to happen technologically in the industry. You’re also going to see, correlated with that, the industry move from what today is mass experimentation to actual deployments. We’re seeing signs of it now, and I think in ‘24 you’ll see this sort of phase shift there also.

Break: [00:14:16.02]

Yoav, I love that you bring in this element of thinking about AI systems, not just large language models or the model. Maybe that ties a little bit into what you were just talking about, about more complicated workloads or automations that are likely coming as part of the solutions that people are building… But I’m wondering if you could comment on that. Where does systematic thinking and the thinking about architecting AI systems fit within what you’re seeing people do now, and what you think needs to happen for them to get value out of these models?

So the part of the answer that I’m comfortable speaking about has to do with what is out there already; for the other part I’ll speculate, maybe at a little higher level. So even if you look at task-specific models, they’re really not models. They’re little systems. When you, say, want to do summarization, and you say “I care about these elements”, a little data processing and reasoning goes on before you call the language model. So you feed it - you don’t just stick it into the context; you actually do some reasoning, so you can steer the model in the right direction. And then when you get something back, you don’t just spit it out. You don’t sort of sample [unintelligible 00:17:38.25] and give back the top one. You get answers, and you evaluate them with validators, and only when you’re confident that the answer is legit do you return it to the user. It may sound very expensive, but actually the operation of the LLM totally dominates these other elements in terms of compute resources and time. That’s an example of a system around the language model, but it’s a baby step. What you’re gonna see - and you’re already seeing it now, but right now it’s people touching parts of the elephant, and doing it in a very ad hoc way - is people stitching together multiple calls to a language model, because a task may require multiple things… And it’s not just chaining; it can be more complicated scripts that you’re running. But you can’t just do it. It’s not like [unintelligible 00:18:32.02] scripting language and running it, because the computing elements here are different. They’re expensive, and they’re error-prone, and if you, for example, just cascade calls to a language model - number one, it can be very expensive, and second, the errors compound, and at the end you get much more noise than signal. So you need to worry about that. You need to execute differently.
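To make that generate-then-validate loop concrete, here is a minimal sketch in Python of the pattern Yoav describes: steer the input, call a model, run validators over the answer, and only return answers that pass. It assumes a generic text-in/text-out `llm` callable; the function names and toy validators are illustrative, not AI21's actual API.

```python
# Hypothetical sketch of the "system around the model" pattern described above:
# steer the prompt, call the model, run validators, return only answers that pass.
# The callable names and retry policy are assumptions for illustration.

from typing import Callable, List, Optional

Validator = Callable[[str, str], bool]  # (source_text, answer) -> passes?

def grounded(source: str, answer: str) -> bool:
    """Toy groundedness check: every sentence must share words with the source."""
    source_words = set(source.lower().split())
    return all(
        source_words & set(sentence.lower().split())
        for sentence in answer.split(".") if sentence.strip()
    )

def not_too_long(source: str, answer: str, max_words: int = 120) -> bool:
    """Toy length check for a summary-style answer."""
    return len(answer.split()) <= max_words

def answer_with_validation(
    llm: Callable[[str], str],          # any text-in/text-out model call
    source: str,
    question: str,
    validators: List[Validator],
    max_attempts: int = 3,
) -> Optional[str]:
    """Call the model and keep only answers that pass every validator."""
    prompt = f"Answer using only this document:\n{source}\n\nQuestion: {question}"
    for _ in range(max_attempts):
        candidate = llm(prompt)
        if all(v(source, candidate) for v in validators):
            return candidate
    return None  # caller decides how to handle "no confident answer"
```

The cost point Yoav makes holds in this sketch too: the `llm` call dominates, while the validators are comparatively cheap checks wrapped around it.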

And so that’s an example of what you’ll see… And there are other aspects of these AI systems that you’ll see come into play. The term “orchestration” is often used here. It means different things to different people, but very much you have these elements that are running, either sequentially or in parallel, and somehow you need to orchestrate this execution - kind of like an operating system, but an operating system with AI elements. And so we and other people use the term AI OS - again, an overloaded term; it doesn’t mean anything precise. But that’s the spirit of things.

I kind of want to get maybe to the roles that are interacting with this AI OS, because I think one of the things people are struggling with is “How do I put the right talent in place to build these–” Because you’re talking about programmatic, operational, systematic thinking, which is kind of – like, there’s an element of engineering there. But it’s not people that are necessarily building their own models; they’re architecting these solutions and putting the right checks, the right validations in place, they’re creating more than chains, these workflows…

[00:20:15.03] And there’s some engineers coming to the table there, but there’s also domain experts who maybe are able to speak into some of how the models are prompted… So do you have any kind of observations from your experience with how people are putting together teams to architect these solutions and these systems, like you’ve just described? Is it from your perspective still going to be a heavy kind of engineering dominated type of process going forward? Or are you seeing a mix? What’s your observation there?

So my answer won’t be based on observation, because the systems don’t exist yet. There are baby solutions right now, but I don’t think they represent what we’ll see going forward. But in answer to your question, it very much will be a mix. There will be companies such as ours that will put in the foundational infrastructure to run these complicated flows. These will have to be extensible systems, and they’ll be extensible in a variety of ways. In some of them, absolutely, you’ll be able to have programmers write the actual code and insert it there… But there will absolutely be a role for low-code or even no-code specification of the flow you want on top of this framework. There will be data scientists who write validations of various kinds, and data pipelines, for sure…

And so I think everybody from the developer, to the data scientist, to the business user, who’s somewhat savvy, to the end user, who just wants a system that works, everybody will have a role in the interaction. And we haven’t mentioned DevOps yet. DevOps here is going to be very important also.

As we’ve kind of talked around the ecosystem a little bit, and about systems themselves, can we turn a little bit? As we’re leading into Jamba, I’d like to know a little bit about where the company has been, and some of the models that you have put out there leading into this one - kind of the heritage of how you’ve developed that. I would really be interested in how you’ve pursued that since you started the company.

I can divide it into three periods in our long history of six years.

That’s an eon in AI these days.

I had a different color hair when we started. As I said, we started by building Jurassic-1. We just felt like we absolutely had to build it, and we innovated there, but in a minor way. We had a vocabulary that was five times the size of what was common at the time - rather than 50,000 tokens, we had 250,000. It was slightly larger than GPT-3; not to make a point, it just worked out that way - 178 billion parameters, a dense model. And that served us well. But the next phase - so we did many things. We had our own application called [unintelligible 00:23:22.26] that’s done very well, a reading and writing assistant using our technology… But on the models themselves, the next thing we put out were task-specific models, which basically - it’s not really distillation, and it’s not just fine-tuning. Like I said, it’s putting a system around it, but at the end of the day you get something compact for certain use cases, and as I said, that’s growing. That was our second phase.

[00:23:50.26] And the third phase was really seeking a way to make these models fundamentally more scalable, more efficient to serve, especially in this era of RAG kinds of solutions. You have stuff that you want to bring in at inference time to influence the output of the system… And at some point, the system chokes. We had a context window of 4k, and then 8k, and then 16k… Now, although some bigger numbers are thrown around, most models choke at 32k, maybe 64k. That’s not enough if you want to put it all in - so we wanted something that… Now, if you were to run it on 64 H100s, you could do a lot of things. But that’s not realistic. So the question was how to get something that’s efficient, that can run effectively on a small footprint, and that’s how we got to Jamba.

With Jamba you mentioned taking some things from kind of the Mamba architecture, this sort of SSM, and adding in some transformer-based things. For those that aren’t familiar with the kind of background, with those types of models, maybe the kind of non-transformer models that people were exploring, could you give a little bit of context to that and why it was important for – I mean, you’ve already mentioned efficiency and other things, but why you felt it was kind of important in this generation of model to pull the trigger in a slightly different architectural direction?

Sure. And for this maybe we can double-click a little bit on how these systems are architected. So at some point, the dominant architectures were RNNs, and then LSTMs… As you go left to right, the system doesn’t remember the distant past; what it does is carry with it a state that somehow encapsulates everything that it’s seen so far. That’s quite powerful. But as this past gets long, it gets harder and harder to encode and access the information that has been encoded. And it worked fine for vision, because in vision, object recognition is something very local. It’s iconic - iconic in the sense that what you see is what you get. Like I said, this phone - this is a phone; I don’t care what’s over here. So I go along, I hit the phone… I don’t need to remember. But language - it’s different.

And in fact, if you looked at the benchmarks - by the way, another pet peeve of mine; benchmarks can be very misleading. But that aside, if you looked at the natural language benchmarks, they kind of puttered along with not much progress, until transformers came in. And transformers - again, coincidentally, what is it, about six years now? - changed the architecture. They had the attention mechanism that says “No - as I’m going along, I can relate disparate pieces of information.” And that allowed you to do things you couldn’t do otherwise. And that’s great.

So the [unintelligible 00:26:50.21] But you pay a price, because the complexity is now quadratic in the context length. And that kills you. That wasn’t the case with RNNs or LSTMs; there it’s linear. So the question is, how can you have your cake and eat it too? Enjoy the benefits of being [unintelligible 00:27:11.21] disparate pieces of information, and yet have something that, if it’s not linear, is close to linear.
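For a rough feel of the trade-off Yoav is describing, here is a back-of-the-envelope sketch comparing quadratic attention cost with a linear SSM-style scan. The cost functions are deliberately crude assumptions; only the growth rates matter.

```python
# Toy scaling comparison: self-attention grows quadratically with context length,
# while an SSM/RNN-style scan grows linearly. Units are arbitrary; constants omitted.

def attention_cost(n_tokens: int) -> int:
    return n_tokens ** 2      # every token attends to every other token

def ssm_cost(n_tokens: int) -> int:
    return n_tokens           # one fixed-size state update per token

for n in (4_000, 32_000, 256_000):
    ratio = attention_cost(n) / ssm_cost(n)
    print(f"{n:>8} tokens -> attention is ~{ratio:,.0f}x the linear cost")
```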

And so Mamba - first, I’ll say Mamba is a straight kind of left-to-right model, what’s called an SSM, a structured state space model. Innovation-wise, it was a version that allowed you to actually parallelize the training, and be much more efficient… But it still suffered from lower quality of answers. And so what our guys did was say “Okay, we’ll take this as a basic building block” - and Mamba is four months old now; it came out just recently.

[00:27:51.18] But I said, “That seemed like a really good idea… but let’s now take elements of the transformer architecture and put them in.” So every few layers - in our case it was every 8 or 16, depending on which version - you put in an attention mechanism. You take a little performance hit, but not nearly as much as if you had transformers all the way. So that’s kind of how it led to this particular architecture.
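Here is a minimal sketch of that interleaving idea, using stub layer types: stack mostly Mamba-style blocks and drop in an attention block every N layers. The real Jamba layout (including its mixture-of-experts layers) is spelled out in AI21's whitepaper; this only illustrates the ratio.

```python
# Minimal sketch of a Mamba/attention hybrid stack. Layer "types" are just
# strings here; the layer count and ratio are illustrative assumptions.

ATTENTION_EVERY = 8   # "every 8 or 16, depending on which version"
NUM_LAYERS = 32

def build_hybrid_stack(num_layers: int = NUM_LAYERS,
                       attention_every: int = ATTENTION_EVERY) -> list:
    """Return the layer types for a toy Mamba/attention hybrid."""
    stack = []
    for i in range(num_layers):
        if (i + 1) % attention_every == 0:
            stack.append("attention")   # quadratic, but only a few of these
        else:
            stack.append("mamba")       # linear-time state space block
    return stack

print(build_hybrid_stack()[:10])
# ['mamba', 'mamba', 'mamba', 'mamba', 'mamba', 'mamba', 'mamba', 'attention', 'mamba', 'mamba']
```

The design intuition is that a handful of attention layers restore the ability to relate distant tokens, while the mostly linear-time blocks keep memory and compute low enough for very long contexts.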

Well, Yoav, you did mention that Mamba is only a recently released architecture, and published architecture… But you’ve been able to move quite quickly, and I want to talk a little bit about Jamba and the release and all of that, but prior to that, it might be interesting for listeners - you know, most of our listeners aren’t sitting in a company that is trying to be a foundation model builder, building these kind of more general-purpose models… I’m wondering if you could give a picture a little bit behind the scenes, whatever you think would be interesting, on what does it actually take to go from “Hey, this idea we want to mix, kind of get the best of both worlds with Mamba and transformers”, all the way to “Hey, here’s our blog post. We’re releasing a model.” What were some of the challenges in that kind of middle zone, and what is that process like to determine, from dataset to exact architecture, and sort of final training runs?

So first, I’ll say that I don’t think that everybody needs to be building foundation models. But as I said to somebody, for organizations that are technical and want to remain relevant - even if they’re not building foundation models, they should understand how they’re built. And if they really put their mind and the resources to it, they could build one… Because it really gives you a visceral, deep sense of what’s going on.

Now, regarding Jamba, we actually try to be very transparent. So this is our first open source model, and the reason we did it was that it is very novel, and there’s lots more experimentation to be done here - optimization, serving… You know, training these models can’t be done on every type of infrastructure; serving them, similarly. And when you do serve them right now - we’ve had several years to optimize the serving of transformers. We want to enable the community to innovate here. And so we were quite explicit in our whitepaper, perhaps unusually so relative to the industry. So to the listeners who want to get into the nitty-gritty, I really encourage them to look at the technical whitepaper.

But I can tell you there’s been a ton of experimentation [unintelligible 00:30:41.15] that our guys did, trading off lots of - people use the term hyperparameters - a lot of things that are very different from one another. How many layers do you want? How many Mamba layers, how many attention layers, what batch sizes? …all kinds of stuff, and it’s sometimes hard to understand what actually makes the difference. And again, we tried to share that - for example, I said that Mamba’s performance doesn’t compete with the performance of comparably-sized transformer models. But when you look at the details, it’s actually quite competitive on many of the benchmarks. But then there are a few that it’s really bad at. And that gives you a clue as to why that’s the case. It can latch on to surface formulations and syntax that the transformers managed to just abstract away from. And so we describe how you make this observation, and how you correct for it. So there are a lot of details that go into making these decisions.

And then there are also pragmatic decisions. For example, we wanted a model that would fit on a single 80-gigabyte GPU. That was a design decision. And from that emanated a few things. We could have put out a bigger model - certain context windows will fit there, others won’t… Still, 256k is humongous compared to the alternatives… We can also do a million and larger, but not on a single GPU. And so those are some of the design decisions and the rationale. Honestly, it is a process - although condensed, a process that involved hundreds of decisions - that led to what we put out.

[00:32:34.09] That was a really great explanation. I appreciate that. As you were going through it, I was thinking about the applicability of Jamba in the enterprise, and kind of bringing the innovation… I’m curious why - I know you alluded early in the explanation to the fact that Jamba was kind of your first open source model… As you’re trying to enable enterprise innovation, what was the change in your thought process that made you decide to go open source with Jamba, versus the earlier models? What was the thinking around that? I was curious when you said it, and wanted to wait till we got to the end.

Yeah, it really was very simple. We felt like if we were the only ones augmenting and pushing out this model, it wouldn’t advance as fast as it could. And we saw that within days of putting it out there… I haven’t tracked it, but when I looked about a week ago, there were 30,000 downloads, and I forget how many forks, but a large number of forks.

Either way, it’s very important to say that what we put out is a base model, not a fine-tuned model. We were very clear about it, and we cautioned people against using it for production purposes, or for a user-facing application. And of course, we’ll be coming up with our own - in fact, we’ve announced that our aligned model is available for preview. But we felt like it was really important for the community to add value to this architecture, and that’s why we did it.

For those that are listening a little bit later on the podcast… So it looks like Jamba - at the time we’re recording this - was released, at least on Hugging Face… Well, it was updated 15 days ago, and I see the blog post at the end of March, I believe. But now on Hugging Face there are sort of 38 models I see with Jamba in the name; that’s not including those that maybe forked it and just created their own special name. So already you’re seeing this kind of explosion of a model family, I guess, which is quite interesting.

I’m wondering - over time, as a company, you mentioned not being the only ones working on the model family and wanting to see it grow… Is that observation based on what you’ve seen in other model families, whether it be Llama 2, or Mistral, and others? Because when I look at a model like that being released, I almost immediately - you mentioned DevOps; people have automated pipelines in place to create the quantized version of this, or fine-tune it for that on their dataset. We had a discussion about Nous Research and what they’re doing in some of these areas as well… So what is the sort of innovation that you’re hoping for with the Jamba model family? You’re releasing the base model, and there could be fine-tunes, but I think there could also be much more than that. So what are you hoping to see as people get hands-on with the model and try to explore various elements of how to use it?

[00:35:50.08] Yeah. Fine-tuning is happening, and will happen… Like I said, we have our own fine-tuned, aligned model… But that’s not the reason we put it out there. The reason we put it out there is so that people can contribute to the model itself, and others can benefit from it. And I think there are at least two areas where a lot of value can be brought. One is serving efficiency. For example, when you consume it on Hugging Face, it’s less efficient than when you consume it on our platform, because we have optimized the serving. And we’ll continue to optimize. But there are a lot of smart people out there, and we’d love for them to optimize it further, and everybody will benefit, including us. That’s one thing.

The other thing - we would really value it if this kind of model were able to be trained on multiple types of infrastructure, which currently isn’t the case. And so I think by putting it out there, people can now look at the whitepaper, they can look at the model, and they can enable the training of such models… which will benefit everybody, including us.

So as we start to wind up here… Fascinating discussion; thank you very much for taking us through all the insight. I’d like to wind up by asking where you think things are going, and if you could address it at two levels: both where your own organization expects to go - what kind of thinking you have, over whatever horizon is on your mind - but also give us insight into how you think the industry as a whole is progressing, and how you expect serving the enterprise need to evolve with the strategies that are out there. We’d love to understand how you’re seeing the world in that way.

I think the key notion is reliability. Trust and reliability. You need to have the same kind of trust in these systems - to be able to predict what they’ll do, to be able to understand what they did - as you do with other pieces of software. We always have errors; even the Pentium had a bug. But that’s the exception. Whereas currently, it’s the rule for language models. That can’t be the case in the enterprise. And everything I think about what’s going to happen in the enterprise orients around that. I think you’ll see special-purpose models like our task-specific models. I think you’ll see AI systems that are increasingly sophisticated and robust. Right now they’re not robust, they’re experimental, but we’ll see more and more AI systems… And I think - this may sound philosophical, so bear with me - there’s a question within the AI community: do these language models actually understand what they’re talking about? They spit out this incredibly convincing stuff, very smart, sometimes on point… And how can they not understand? And sometimes they’re totally stupid. We all have our favorite examples. And I think we need to get to the point where we believe that the systems actually understand what they’re talking about. And what understanding is - again, it sounds philosophical, and there’s a philosophical aspect to it, for sure… but it has very practical ramifications. So when I think about the future - the pragmatic things are task-specific models and AI systems… but in the background is this notion of understanding. These systems need to really understand. That’s what I’m looking at.

Yeah, that’s great. Well, I think, as a part of the development towards that, certainly open models and innovation around these model families, like we talked about, I hope is a key piece of that. And from a member of the community - I just want to express my thanks to AI21 for being a leader, both in terms of the thinking, and infrastructure, and innovation in this area, but also a leader in terms of putting things out there for the community to work on as a community. So thank you for what you’ve done with Jamba, and really excited to follow AI21 and where you’re headed next… So thank you so much for joining us, Yoav. It’s been a pleasure.

Thanks very much for having me.


Our transcripts are open source on GitHub. Improvements are welcome. 💚
