First there was Mamba… now there is Jamba from AI21. This is a model that combines the best non-transformer goodness of Mamba with good ol' attention layers. This results in a highly performant and efficient model that AI21 has open sourced! We hear all about it (along with a variety of other LLM things) from AI21's co-founder Yoav.
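To make the hybrid idea concrete, here is a minimal sketch of interleaving state-space (Mamba-style) layers with occasional attention layers in one decoder stack. Everything in it is an illustrative assumption, not AI21's actual configuration: the layer ratio, the toy SSM stand-in, and the absence of Jamba's mixture-of-experts layers. A real Mamba block would come from a library such as `mamba-ssm`.

```python
# Sketch of a hybrid SSM/attention decoder stack (NOT Jamba's real config).
import torch
import torch.nn as nn

class ToySSMBlock(nn.Module):
    """Stand-in for a Mamba block: a gated depthwise convolution as a
    cheap sequence mixer. Real Mamba uses a selective state-space scan."""
    def __init__(self, d_model: int):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.conv = nn.Conv1d(d_model, d_model, kernel_size=4,
                              padding=3, groups=d_model)
        self.gate = nn.Linear(d_model, d_model)
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x):  # x: (batch, seq, d_model)
        h = self.norm(x)
        # Trim the conv output to the input length to keep it causal.
        c = self.conv(h.transpose(1, 2))[..., : x.size(1)].transpose(1, 2)
        return x + self.out(c * torch.sigmoid(self.gate(h)))

class AttentionBlock(nn.Module):
    def __init__(self, d_model: int, n_heads: int = 8):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, x):
        h = self.norm(x)
        mask = torch.triu(torch.ones(x.size(1), x.size(1), dtype=torch.bool,
                                     device=x.device), diagonal=1)
        a, _ = self.attn(h, h, h, attn_mask=mask)  # causal self-attention
        return x + a

class HybridStack(nn.Module):
    """Mostly SSM layers, with an attention layer every `attn_every` layers."""
    def __init__(self, d_model: int = 256, n_layers: int = 8,
                 attn_every: int = 4):
        super().__init__()
        self.layers = nn.ModuleList(
            AttentionBlock(d_model) if (i + 1) % attn_every == 0
            else ToySSMBlock(d_model)
            for i in range(n_layers)
        )

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x

x = torch.randn(2, 16, 256)    # (batch, seq_len, d_model)
print(HybridStack()(x).shape)  # torch.Size([2, 16, 256])
```

The appeal of this shape is that the SSM layers mix the sequence in (near) linear time and constant state, while the sparse sprinkling of attention layers retains attention's precise token-to-token recall.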
Yoav Shoham: But maybe some of it is common to others. So first of all, the baseline is a general-purpose, very capable model. There's a need for that. Now, there are companies who provide services using other people's models, and that's totally legit. If you actually own the model, you can do things that you wouldn't be able to otherwise. And our emphasis, in addition to the general capability of the model, is on making it practical, and there are two things that matter there, especially in the enterprise.
[00:10:07.00] So if you're using a chatbot to write a homework assignment, the stakes are low. A mistake doesn't carry a big penalty, and probably nobody would read it anyway. But if you're writing a memo to your boss, or to your prized client, and you're brilliant 95% of the time but garbage 5% of the time, you're dead in the water. And so reliability is key. As we know, large language models are these amazing, creative, knowledgeable assistants, but probabilistic. And so you will get - here's another term I don't like… hallucination. You'll get stuff that isn't grounded in fact, or doesn't make logical sense, and so on. And so you can't do that. You need to get high reliability. That's number one. I'll tell you in a moment how we do that.
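Yoav doesn't detail AI21's reliability machinery here, but one widely used pattern for catching ungrounded output is to check each answer sentence against the source material it was supposed to be grounded in. Below is a deliberately crude sketch of that idea, using plain word overlap where a production system would use an entailment model; nothing here is AI21's method.

```python
# Illustrative groundedness check (NOT AI21's method): flag answer
# sentences whose content words barely overlap the source documents.
import re

def content_words(text: str) -> set[str]:
    stop = {"the", "a", "an", "of", "to", "in", "is", "are", "and", "that"}
    return {w for w in re.findall(r"[a-z0-9']+", text.lower()) if w not in stop}

def ungrounded_sentences(answer: str, source: str, threshold: float = 0.5):
    """Return answer sentences poorly supported by the source text."""
    src = content_words(source)
    flagged = []
    for sent in re.split(r"(?<=[.!?])\s+", answer.strip()):
        words = content_words(sent)
        support = len(words & src) / max(len(words), 1)
        if support < threshold:
            flagged.append((support, sent))
    return flagged

source = "Q3 revenue grew 12% year over year, driven by enterprise deals."
answer = ("Q3 revenue grew 12% year over year. "
          "The CEO also announced a merger with a competitor.")
for score, sent in ungrounded_sentences(answer, source):
    print(f"possible hallucination ({score:.2f}): {sent}")
```

On this toy input, the second sentence is flagged because nothing in the source supports it; that is the basic shape of a grounding gate, however much stronger the real classifier needs to be.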
But the other thing: it needs to be efficient. If for every customer query you're going to pay $10 and take 20 seconds to answer it, that's no good either. So you need to address that also. So we have several things we're doing in this regard. The first is what we call task-specific models. In addition to our general-purpose model, like Jamba, that came out, we provide language models that are tailored to specific use cases. You can think about it as a matrix: you have industries, and you have use cases. And it turns out that while initially you might think "Oh, I'm going to do a healthcare LLM, or finance", that's a little bit boiling the ocean. You want to be more specific, and one way to be specific is to think about what you're going to use it for; these are the columns.
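The $10-and-20-seconds figures are rhetorical, but the math behind the point is easy to make explicit. A back-of-the-envelope comparison, with every number assumed purely for illustration:

```python
# Back-of-the-envelope serving economics. All numbers are assumed for
# illustration, echoing the rhetorical "$10 and 20 seconds" point.
queries_per_day = 50_000

for label, cost_per_query, latency_s in [
    ("general-purpose model", 10.00, 20.0),
    ("task-specific model",    0.10,  1.0),  # "a fraction of the size"
]:
    monthly = cost_per_query * queries_per_day * 30
    print(f"{label}: ${monthly:,.0f}/month at {latency_s:.0f}s per answer")
```

At those assumed volumes the big model costs $15M a month; a model two orders of magnitude cheaper to serve turns the same workload into $150K, which is the difference between a demo and a deployable product.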
So for example, take summarization. That's a specific task, and I can optimize your system… And I am deliberately saying "system" and not "language model"; I'll tell you in a moment why. But you can optimize that for that use case. So all companies now are experimenting with multiple solutions, as they should. And in this particular use case, a very large financial institution took several hundred of their financial documents and tested various solutions: our task-specific summarization model, and some of the general-purpose models of other companies. And ours was just hands-down better in terms of the quality of the answers they got. There was no hallucination, if you pardon the expression; very on point, very grounded, and so on, because it was optimized for the task. But by the way, the system is a fraction of the size of a general-purpose model, so you get the answers immediately, and the cost of serving is low. That latency and those economics enable use cases that would just be unrealistic otherwise. So our task-specific models are one approach. And maybe I won't overload my answer with saying why it's not only models, but we'll get to AI systems.
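For flavor, here is roughly what consuming a task-specific summarization model looks like. The endpoint path and field names below follow AI21 Studio's documented Summarize API around the time of this episode, but treat them as assumptions and check the current docs before relying on them.

```python
# Sketch of calling a task-specific summarization endpoint. The URL and
# request fields are assumptions based on AI21 Studio's Summarize API;
# verify against the current documentation.
import os
import requests

def summarize(text: str) -> str:
    resp = requests.post(
        "https://api.ai21.com/studio/v1/summarize",
        headers={"Authorization": f"Bearer {os.environ['AI21_API_KEY']}"},
        json={"source": text, "sourceType": "TEXT"},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["summary"]

print(summarize("Long financial memo text goes here ..."))
```

Note the design point Yoav is making: the caller specifies a task, not a prompt, and the grounding and decoding choices are baked into the system behind the endpoint.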
The other is - and it's related - having models that are highly efficient. That goes to Jamba, as an example of a model that's very capable, but not big. If I were to jump ahead, let's think about 2024… What are we going to see in this space? Among other things, you will see a focus on total cost of ownership, the reality of serving these models; you're going to see a focus on reliability, and you're also going to see a focus on - another term I hate - agents: AI systems that are more elaborate than these transactional interactions with [unintelligible 00:13:47.29] about tokens in, a few seconds, tokens back, thank you, on to the next one. More elaborate. So this is, I think, what's going to happen technologically in the industry. Correlated with that, you're also going to see the industry move from what today is mass experimentation to actual deployments. We're seeing signs of it now, and I think in '24 you'll see this sort of phase shift there also.
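"Agents" in this sense just means a system that loops: the model can request tool calls and see the results before answering, rather than doing one transactional tokens-in/tokens-out exchange. A minimal sketch of that loop shape, where every function is a placeholder assumption standing in for a real LLM and real tools:

```python
# Minimal agent-style loop: the model may request tool calls until it
# is ready to answer. All names here are placeholder assumptions.
from dataclasses import dataclass

@dataclass
class Step:
    action: str   # "tool" or "answer"
    payload: str

def call_model(history: list[str]) -> Step:
    """Placeholder for a real LLM call that either requests a tool
    or returns a final answer based on the conversation so far."""
    if not any(h.startswith("tool:") for h in history):
        return Step("tool", "search: Q3 revenue figures")
    return Step("answer", "Q3 revenue grew 12% year over year.")

def run_tool(request: str) -> str:
    """Placeholder tool (e.g. retrieval); a real system might query a
    search index or database here."""
    return "tool: Q3 revenue grew 12% YoY per the earnings report."

def run_agent(task: str, max_steps: int = 5) -> str:
    history = [f"user: {task}"]
    for _ in range(max_steps):
        step = call_model(history)
        if step.action == "answer":
            return step.payload
        history.append(run_tool(step.payload))  # feed tool result back in
    return "gave up after max_steps"

print(run_agent("Summarize our Q3 performance."))
```

The loop is why efficiency and reliability compound: an agent makes many model calls per user task, so per-call latency, cost, and error rates all multiply.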
Break: [00:14:16.02]