Practical AI – Episode #254
Large Action Models (LAMs) & Rabbits 🐇
Get Fully-Connected with Chris & Daniel
The recent release of the rabbit r1 device generated huge interest in both the device and "Large Action Models" (or LAMs). What is a LAM? Is this something new? Did these models come out of nowhere, or are they related to other things we are already using? Chris and Daniel dig into LAMs in this episode and discuss neuro-symbolic AI, AI tool usage, multimodal models, and more.
Featuring
Sponsors
Read Write Own – Read, Write, Own: Building the Next Era of the Internet, a new book from entrepreneur and investor Chris Dixon, explores one possible solution to the internet's authenticity problem: Blockchains. From AI that tracks its source material to generative programs that compensate, rather than cannibalize, creators. It's a call to action for a more open, transparent, and democratic internet. One that opens the black box of AI, tracks the origins we see online, and much more. Order your copy of Read, Write, Own today at readwriteown.com
Shopify – Sign up for a $1/month trial period at shopify.com/practicalai
Fly.io – The home of Changelog.com – Deploy your apps and databases close to your users. In minutes you can run your Ruby, Go, Node, Deno, Python, or Elixir app (and databases!) all over the world. No ops required. Learn more at fly.io/changelog and check out the speedrun in their docs.
Notes & Links
Chapters
Chapter Number | Chapter Start Time | Chapter Title | Chapter Duration |
1 | 00:07 | Welcome to Practical AI | 00:36 |
2 | 00:43 | Daniel is in Atlanta? | 01:06 |
3 | 01:49 | Daniel's embarrassing moment | 00:59 |
4 | 02:48 | New AI devices | 00:41 |
5 | 03:28 | Where does privacy fit? | 01:19 |
6 | 04:47 | We've been giving data for years | 02:12 |
7 | 06:59 | Perception of humanity | 02:28 |
8 | 09:27 | A look at the rabbit r1 | 02:16 |
9 | 11:43 | Why hardware? | 01:49 |
10 | 13:32 | Moving past our phones | 01:15 |
11 | 14:46 | Sponsor: Read Write Own | 01:08 |
12 | 15:54 | Explaining the LAM | 04:34 |
13 | 20:28 | Different inputs for models | 02:42 |
14 | 23:10 | Integrating external systems | 02:56 |
15 | 26:06 | Structuring an unstructured world | 03:41 |
16 | 29:59 | Sponsor: Shopify | 02:18 |
17 | 32:35 | Origins of LAMs | 06:00 |
18 | 38:35 | How instruments could be used as inputs | 02:43 |
19 | 41:18 | If this approach sticks | 03:42 |
20 | 45:00 | Predictions on LAMs | 01:14 |
21 | 46:14 | This has been fun! | 01:05 |
22 | 47:27 | Outro | 00:45 |
Transcript
Play the audio to listen along while you enjoy the transcript. 🎧
Welcome to another Fully Connected episode of the Practical AI podcast. In these Fully Connected episodes we try to keep you up to date with everything that's happening in the AI and machine learning world, and try to give you a few learning resources to level up your AI game. This is Daniel Whitenack. I'm the founder and CEO of Prediction Guard, and I'm joined as always by my co-host, Chris Benson, who's a tech strategist at Lockheed Martin. How are you doing, Chris?
Doing very well, Daniel. Enjoying the day. And by the way, since you've traveled to the Atlanta area tonight, we haven't gotten together, but you're just a few minutes away, actually, so… Welcome to Atlanta!
Just got in. Yeah, we're within - not maybe a short drive, depending on your view of what a short drive is…
Anything under three hours is short in Atlanta, and I think you're like 45 minutes away from me right now.
Yeah, so hopefully we'll get a chance to catch up tomorrow, which will be awesome, because we rarely get to see each other in person. It's been an interesting couple of weeks for me. So for those that are listening from abroad maybe, we had some major ice and snow type storms recently, and my great and embarrassing moment was I was walking back from the office, in the freezing rain, and I slipped and fell, and my laptop bag, with laptop in it, broke my fall, which is maybe good… But that also broke the laptop. So - actually, the laptop works, it's just the screen doesn't work, so maybe I'll be able to resolve that…
It's like a mini portable server there, isn't it?
Yeah, exactly. You have enough monitors around, it's not that much of an issue… But yeah, I had to put Ubuntu on a burner laptop for the trip. So yeah… It's always a fun time. Speaking of personal devices, there's been a lot of interesting news and releases, not of - well, I guess of models, but also of interesting actual hardware devices related to AI recently. One of those is the Rabbit R1, which was announced and sort of launched to preorders with a lot of acclaim.
Another one that I saw was the AI PIN, which is like a little - I don't know, my grandma would call it a brooch, maybe. Like a large pin that you put on your jacket, or something like that… I am wondering, Chris, as you see these devices - and I want to dig a lot more into some of the interesting research and models and data behind some of these things like Rabbit… But just generally, what are your thoughts on this sort of trend of AI-driven personal devices to help you with all of your personal things, and plugged in to all of your personal data, and sort of AI attached to everything in your life?
Well, I think it's coming. Maybe it's here… But I know that I am definitely torn. I mean, I love the idea of all this help along the way. There's so many - I forget everything. I'm terrible. If I don't write something down and then follow up on the list, I am not a naturally organized person. My wife is, and my wife is always reminding me that I really struggle in this area. And usually she's not being very nice in the way that she does it. It's all love, I'm sure… But yeah, so part of me is like "Wow, this is the way I can actually be all there, get all the things done." But the idea of just giving up all my data, and just being - like so many others, that aspect is not appealing. So I guess I'm - I'm not leaping.
How much different do you think this sort of thing is than everything we already give over with our smartphones?
It's a good point you're making.
I mean, we've had computing devices with us in our pocket or on our person 24/7 for at least the past 10 years; at least for those that have adopted the iPhone or whatever, when it came out. But yeah, so in terms of location, certainly account access and certain automations, what do you think makes - because obviously, this is something on the mind of the makers of these devices, because I think both the AI PIN and the Rabbit make some sort of explicit statements in their launch, and on their website about "Privacy is really important to us. This is how we're doing things, because we really care about this." So obviously, they anticipated some kind of additional reaction. But we all already have smartphones. I think most of us, if we are willing to admit it, we know that we're being tracked everywhere, and all of our data goes everywhere… So I don't know, what is it about this AI element that you think either makes an actual difference in terms of the substance of what's happening with the data? Or is it just a perception thing?
[00:06:11.02] It's probably a perception thing with me. Because everything that you said, I agree with; you're dead on. And we've been giving this data for years, and we've gotten comfortable with it, and that's just something that we all kind of don't like about it, but we've been accepting it for years. And I guess it's the expectation with these AI assistants that we've been hearing about for so long coming, and we're starting to see things like the Rabbit come into market, and such, that there's probably a whole new level of kind of analysis of us, and all the things, and in a sense knowing you better than you do, that is uncomfortable and probably will not be as uncomfortable in the years to come, because we'll grow used to that as well. But I have to admit, right now it's an emotional reaction, and it makes me a little bit leery.
Yeah, maybe it's prior to these sorts of devices there was sort of the perception at least that "Yes, my data is going somewhere. Maybe there's a nefarious person behind this, but there's sort of a person behind this. Like, the data is going all to Facebook or Meta, and maybe they're even listening in on me, and putting ads for mattresses in my feed", or whatever the thing is… So that perception has been around for quite some time, regardless of whether Facebook is actually listening in or whatever. Or it's another party, like the NSA and the government's listening in… But I think all of those perceptions really relied on this idea that even if there's something bad happening, that I don't want happening with my data, there's sort of a group of people back there doing something with it. And now there's this sort of idea of this agentic entity behind the scenes that's doing something with my data, without human oversight. I think maybe that's - if there's anything sort of fundamentally different here, I think it's the level of automation and sort of agentic nature of this, which does provide some sort of difference. Although there's always like - you know, if you're processing voice or something, there's voice analytics, and you can put that to text, and… Then there are always NLP models in the background doing various things, or whatever. So there's some level of automation that's already been there, but…
I agree. You mentioned perception up front, and I think that makes a big difference… And you mentioned NSA. Intelligence agencies - I think we all just assume that they're all listening to all the things, all the time now, and that's one of those things that's completely beyond your control. And so there's almost no reason to worry about it, I suppose, unless you happen to be one of the people that an intelligence agency would care about, which I don't particularly think I am. So it just goes someplace and you just kind of shrug it off.
There's a certain amount of what we've done these years with mobile, where you're opting in. I think it's leveling up, as we're saying, with some of these AI agents coming out; we know how much data about ourselves is going to be there, and so it's just escalating the opt-in up to a whole new level. So hopefully we'll see what happens… I hope it works out well.
Yeah. We haven't really - for the listeners maybe that are just listening to this and haven't actually… Maybe you're in parallel doing the search and looking at these devices, but in case you're on your run, or in your car, we can describe a little bit… So I described the AI PIN thing a little bit… The Rabbit I thought was a really, really cool design. I don't know if there's any nerds out there that love the sort of synthesizer analog sequencer, Teenage Engineering stuff that's out there… But actually, Teenage Engineering was involved in the hardware design in some way.
[00:10:05.03] So it's like a little square thing, the Rabbit R1. It's got like one button you can push and speak a command; it's got a little actual hardware wheel that you can spin to scroll, and the screen is kind of just - they show it as black most of the time, but it pops up with the song you're playing on Spotify, or some of the things you would expect to be happening on a touchscreen, or that sort of thing… But the primary interface, in my understanding, is thought to be speech; not that you would be pulling up a keyboard on the thing and typing in a lot. That's kind of not the point. The point would be this sort of speech-driven conversational - and I'd even call it an operating system - conversational operating system to do certain actions or tasks, which we'll talk a lot more about the kind of research behind that… But that's kind of what the device is, and looks like.
It's interesting that, going with the device route, and the fact that they're selling the actual unit itself… And over the years we started on our computer, or we started on desktops, and then went to laptops, and then went to our phones… And the phones have evolved over time. And we've been talking about wearables and things like that over the years as they've evolved, but I think there's a little bit of a gamble in actually having it as a physical device, because that's something else that they're presuming you're gonna put at the center of your life. That versus being kind of the traditional phone app approach, where you're using the thing that your customer already has in their hands. What are your thoughts about the physicalness of this offering?
I think it's interesting… One of the points, if you watch the release or launch or promotion video for the Rabbit R1, he talks about sort of the app-driven nature of smartphones, and there's an app for everything… And there's so many apps now that navigating apps is kind of a task in and of itself. And that Silicon Valley meme, "No one ever deletes an app", right? So you just accumulate more and more apps, and they kind of build up on your phone, and now you have to organize them into little groupings, or whatever… So I think the point being that it's nice that there's an app for everything, but the navigation and orchestration of those various apps is sometimes not seamless, and burdensome. I'm even thinking about myself, and kind of checking over here - I got into Uber, oh, I forgot to switch over my payment on my Uber app, so now I've got to open my bank app, and then grab my virtual card number, and copy that over… But then I've got to go to my password management app to copy my password… There's all these sorts of interactions between various things that aren't as seamless as you might think they would be. But it's easy for me to say in words, conversationally, "Hey, I want to update the payment on my current Uber ride", or whatever. So the thought that that would be an easy thing to express conversationally is interesting; and then have that be accomplished in the background, if it actually works, is also quite interesting.
I agree with that. And I can't help but wonder, if you look back at the advent of the phone, and the smartphone, and the iPhone comes out, and it really isn't so much a phone anymore, but a little computer… And so the idea of the phone being the base device in your life has been something that has been with us now for over 15 years. And so one of the things I wonder is, could there be a trend where maybe the phone doesn't become - if you think about it, you're texting, but a lot of your texting isn't really texting, it's messaging in apps… Maybe the phone is no longer the central device in your life going forward, and maybe you're actually having your primary thing. That would obviously play into Rabbit's approach, where they're giving you another device, it packages everything together in that AI OS that they're talking about, where conversationally it runs your life, if you expose your life to it the way you are across many apps on the phone… But it's an opportunity potentially to take a left turn with the way we think about devices, and maybe in the not-so-distant future the phone is no longer the centerpiece.
Alright, Chris, well, there's a few things interacting in the background here in terms of the technology behind the Rabbit device, and I'm sure other similar types of devices that have come out. Actually, there's some of this sort of technology that we've talked a little bit about on the podcast before. I don't know if you remember we had the episode with AskUI, which - they had this sort of multi-modal model; I think a lot of their focus over time was on testing. A lot of people might test web applications or websites using something like Selenium, or something like that, that automates desktop activity or interactions with web applications… And actually automates that for testing purposes or other purposes. AskUI had some of this technology a while back to kind of perform certain actions using AI on a user interface without sort of hard coding; like, click on 100 pixels this way, and 20 pixels down this way. So that I think has been going on for some time.
This adds a sort of different element to it, in that there's the voice interaction… But then they're really emphasizing the flexibility of this, and the updating of it… So actually, they emphasize - I think some of the examples they gave is I have a certain configuration on my laptop or on my screen that I'm using with a browser, with certain plugins that make it look a certain way… And everything sort of looks different for everybody, and it's all configured in their own sort of way. Even app-wise, apps kind of are very personalized now, which makes it a challenge to say "Click on this button at this place." It might not be at the same place for everybody all the time. And of course, apps update, and that sort of thing.
So the solution that Rabbit has come out with to deal with this is what they're calling a large action model. And specifically, they're talking about this large action model being a neurosymbolic model. And I want to talk through a little bit of that. But before I do, I think we sort of have to back up and talk a little bit about AI models, large language models; ChatGPT has been interacting with external things for some time now, and I think there's confusion at least about how that happens, and what the model is doing… So it might be good just to kind of set the stage for this in terms of how these models are interacting with external things.
The way that this looks, at least in the Rabbit case, is you click the button and you say "Oh, I want to change the payment card on my Uber [unintelligible 00:19:07.21] and stuff happens in the background and somehow the large action model interacts with Uber, and maybe my bank app or whatever, and actually makes the update. So the question is how this happens. Have you used any of the plugins or anything in ChatGPT, or the kind of search the web type of plugin to a chat interface, or anything like that?
Absolutely. I mean, that's what makes the - I mean, I think people tend to focus on the model itself. That's where all the glory is, and people say "Ah, this model versus that." But so much of the power comes in the plugins themselves, or other ways in which they interact with the world. And so as we're trying to kind of pave our way into the future and figure out how we're going to use these, and how they're going to impact our lives, whether it be the Rabbit way, or whether you're talking ChatGPT with its plugins - that's the key. It's all those interactions, it's the touchpoints with the different things that you care about which makes it worthwhile. So yes, absolutely, and I'm looking forward to doing it [unintelligible 00:20:12.05]
[00:20:14.27] Yeah. So there's a couple of things maybe that we can talk about, and actually, some of them are even highlighted in recent things that happened, that we may want to highlight also. One of those is, if you think about a large language model like that used in ChatGPT, or NeuralChat, LLaMA 2, whatever it is… You put text in, and you get text out. We've talked about that a lot on the show. So you put your prompt in, and you get a completion, it's like fancy autocomplete, and you get this completion out. Not that interesting.
We've talked a little bit about RAG on the show, which means I am programming some logic around my prompt such that when I get my user input, I'm searching some of my own data or some external data that I've stored in a vector database, or in a set of embeddings, to retrieve text that's semantically similar to my query, and just pushing that into the prompt as a sort of grounding mechanism to sort of ground the answer in that external data. So you've got sort of basic autocomplete, you've got retrieval to insert external data via a vector database, you've got some multimodal input… And by multimodal models, I'm meaning things like LLaVA. And actually, this week there was a great - published on January 24th, I saw it in the daily papers on Hugging Face… "MM-LLMs: Recent advances in multimodal large language models." So if you're wanting to know sort of the state of the art and what's going on in multimodal large language models, I just mentioned - that's probably a much deeper dive that you can go into. So check out that, and we'll link in our show notes.
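For readers who want to see the retrieval step Daniel describes in code, here's a minimal sketch of RAG-style grounding. It assumes the sentence-transformers package, with a plain Python list standing in for the vector database; the embedding model name, documents, and prompt template are illustrative, not anything Prediction Guard or Rabbit ships.

```python
# Minimal RAG sketch: embed documents, find the chunk most similar to the
# user's question, and splice it into the prompt as grounding context.
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative model choice

documents = [
    "The rabbit r1 is a handheld AI device with a scroll wheel and a push-to-talk button.",
    "Large action models are meant to carry out tasks on app interfaces.",
]
doc_vectors = embedder.encode(documents)  # one embedding vector per chunk

def retrieve(question: str) -> str:
    """Return the stored chunk most semantically similar to the question."""
    q = embedder.encode([question])[0]
    scores = doc_vectors @ q / (
        np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(q)
    )
    return documents[int(np.argmax(scores))]

question = "What is a large action model?"
context = retrieve(question)
prompt = f"Answer using this context:\n{context}\n\nQuestion: {question}"
print(prompt)  # this grounded prompt is what would be sent to the LLM
```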
But these are models that would not only take a text prompt, but might take a text prompt paired with an image, right? So you could put an image in, and you say - also have a text prompt that says "Is there a raccoon in this image?" And hopefully the reasoning happens and it says yes or no if there's a -
Is there always a raccoon in the image?
There's always a raccoon everywhere… That's one element of this; that would be a specialized model that allows you to integrate multiple modes of data. And there's similar ones out there for audio, and text, and other things. So again, in summary, you've got text-to-text autocomplete, you've got this retrieval mechanism to pull in some external text data into your text prompt, you've got specialized models that allow you to bring in an image and text… All of that's super-interesting, and I think it's connected to what Rabbit is doing. But there's actually more to what's going on with, let's say when people perform actions on external systems, or integrate external systems with these sorts of AI models. And this is what in the sort of Langchain world, if you've interacted with Langchain at all, they would call this maybe tools. And you even saw things in the past like Toolformer and other models where the idea was "Well, okay, I have - maybe it's the Google Search API", or one of these search APIs, right? I know that I can take a JSON object, send it off to that API, and get a search result, right? Okay, so now if I want to call that search API with an AI model, what I need to do is get the AI model to generate the right JSON-structured output that I can then just programmatically - not with any sort of fancy AI logic, but programmatically - take that JSON object and send it off to the API, get the response, and either plug that in in a sort of retrieval way that we talked about before… And just give it back to the user as the response that they wanted, right?
[00:24:28.08] So this has been happening for quite a while. This is kind of - like, we saw one of these cool AI demos every week, where "Oh, the AI is integrated with Kayak now, to get me a rental car. And the AI is integrated with this external system." All really cool, but at the heart of that was the idea that I would generate structured output that I could use in a regular computer programming way to call an API, and then get a result back, which I would then use in my system. So that's kind of this tool idea, which is still not quite what Rabbit is doing, but I think that's something that people don't realize is happening behind the scenes in these tools.
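Here is a minimal sketch of that tool pattern: the model's only job is to emit structured JSON, and ordinary code makes the API call. Both helper functions below are hypothetical stand-ins, not a real LLM client or search SDK.

```python
# The "tools" pattern: the model emits a JSON object describing the call,
# and plain code (no AI involved) actually hits the external API.
import json

def llm(prompt: str) -> str:
    # Hypothetical stand-in for a chat-completion call; a real model would
    # generate this JSON string from the prompt.
    return '{"query": "rabbit r1 large action model", "num_results": 3}'

def search_api_call(args: dict) -> list:
    # Hypothetical stand-in for the HTTP request to a search API.
    return [{"title": f"Result for {args['query']}", "url": "https://example.com"}]

user_request = "Find recent articles about the rabbit r1"
tool_prompt = (
    "Respond ONLY with JSON of the form "
    '{"query": "<search terms>", "num_results": <int>} for this request:\n'
    + user_request
)

raw = llm(tool_prompt)           # model output: a JSON string, not prose
args = json.loads(raw)           # parsed programmatically, no fancy AI logic
results = search_api_call(args)  # ordinary API call using those arguments
print(results)                   # these results can then be fed back into a prompt
```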
I think that's really popular "in the enterprise", with air quotes there, because that approach is, in large organizations, they're going to the cloud providers, with their APIs - Microsoft has the relationship with OpenAI, and they're wrapping that, Google has their APIs - and they're using RAG in that same way, to try to integrate with systems, instead of actually creating the models on their own. I would say that's a very, very popular approach right now in the enterprise environments, that are still more software-driven, and still trying to figure out how to use APIs for AI models.
Yeah, and I can give you a concrete example of something we did with a customer at Prediction Guard, which is the Shopify API. So eCommerce customer - the Shopify API has this sort of Shopify - I think it's called ShopifyQL - query language. It's structured, and you can call the regular API via GraphQL. And so it's a very structured sort of way you can call this API to get sales information, or order information, or do certain tasks. And so you can create a natural language query and say "Okay, well, don't try to give me natural language out. Give me ShopifyQL, or give me something that I can plug into a GraphQL query, and then I'm going to go off and query the Shopify API, and either perform some interaction or get some data." So this is very popular. This is how you sort of get AI on top of tools.
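As a rough illustration of that pattern, here is a sketch of asking a model for a structured query and then running it yourself. The ShopifyQL string and the GraphQL field name are assumptions for illustration only; check Shopify's documentation for the exact syntax and endpoint.

```python
# Ask the model for a ShopifyQL query instead of prose, then wrap it in a
# GraphQL request body that ordinary code would send to the API.
import json

def llm(prompt: str) -> str:
    # Hypothetical stand-in for a chat-completion call; pretend the model
    # returned this ShopifyQL-style string (syntax is illustrative).
    return "FROM sales SHOW total_sales GROUP BY month SINCE -3m"

question = "How did sales trend over the last three months?"
shopifyql = llm(
    "Translate this question into a single ShopifyQL query, with no "
    "explanation or extra text:\n" + question
)

# Wrap the model's structured output in a GraphQL request body. The
# `shopifyqlQuery` field name here is an assumption for illustration.
graphql_body = json.dumps({
    "query": "query Run($q: String!) { shopifyqlQuery(query: $q) { __typename } }",
    "variables": {"q": shopifyql},
})

print(graphql_body)
# A real call would POST graphql_body to the shop's GraphQL admin endpoint
# with an access-token header - plain HTTP from here on, no AI logic involved.
```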
What's interesting, I think, that Rabbit observes in what they're saying, and others have observed as well… I think you take the case like AskUI, like we talked about before… And the observation is that not everything has this sort of nice structured way you can interact with it with an API.
So think about - pull out your phone; you've got all of these apps on your phone. Some of them will have a nice API that's well defined, some of them will have an API that me as a user, I know nothing about. There's maybe an API that exists there, but it's hard to use, or not that well documented, or maybe I don't have the right account to use it, or something… There's all of these interactions that I want to do on my accounts, with my web apps, with my apps, that have no defined structured API to execute all of those things.
[00:27:58.09] So then the question comes - and that's why I wanted to lead up to this, is because even if you can retrieve data to get grounded answers, even if you can integrate images, even if you can interact with APIs, all of that gets you pretty far, as we've seen, but ultimately, not everything is going to have a nice structured API, or it's not going to have an API that's updated, or has all the features that you want, or does all the things you want. So I think the fundamental question that the Rabbit research team is thinking about is "How do we then reformulate the problem in a flexible way, to allow a user to trigger an AI system to perform arbitrary actions across an arbitrary number of applications, or an application, without knowing beforehand the structure of that application or its API?" So I think that's the really interesting question.
I agree with you completely. And there's so much complexity… They refer to it as human intentions expressed through actions on a computer. And that sounds really, really simple when you say it like that, but that's quite a challenge to make that work in an unstructured world. So I'm really curious - they have the research page, but I don't think they've put out any papers that describe some of the research they've done yet, have they?
Just in general terms… And that's where we get to the exciting world of large action models.
Somehow that makes me think of like Arnold Schwarzenegger.
Large action heroes.
There you go. Exactly. Yeah.
Yeah, Chris, so coming from Arnold Schwarzenegger and large action heroes, to large action models… I was wondering if this was a term that Rabbit came up with. I think it has existed for some amount of time; I saw it at least as far back as June of last year, 2023, in Silvio Savarese's article on the Salesforce AI Research blog about "LAMs: from large language models to large action models." I think the focus of that article was very much on the sort of agentic stuff that we talked about before, in terms of interacting with different systems, but in a very automated way. The term large action model, as far as Rabbit refers to it, is this new architecture that they are saying they've come up with - and I'm sure they have, because it seems like the device works… We don't know, I think, all of the details about it; at least I haven't seen all of the details, or it's sort of not transparent in the way that maybe a model release would be on Hugging Face, with code associated with it, and a long research paper… Maybe I'm missing that somewhere, or listeners can tell me if they've found it. I couldn't find that.
They do have a research page though, which gives us a few clues as to what's going on, and some explanation in kind of general terms. And what they've described is that their goal is to observe human interactions with a UI, and there seems to be some sort of multimodal model that is detecting what things are where in the UI… And they're mapping that onto some kind of flexible, symbolic, synthesized representation of a program.
So the user is doing this thing - so I'm changing the payment on my Uber app, and that's represented or synthesized behind the scenes in some sort of structured way, and kind of updated over time as it sees demonstrations, human demonstrations of this going on. And so the words that they - I'll just kind of read this, so people, if they're not looking at the article… They say "We designed the technical stack from the ground up, from the data collection platform to the new network architecture", and here's the sort of very dense, loaded wording that probably has a lot packed into it… They say "that utilizes both transformer-style attention, and graph-based message passing, combined with program synthesizers, that are demonstration and example-guided." So that's a lot in that statement, and of course, they mentioned a few, in more description, in other places. But it seems like my sort of interpretation of this is that the requested action comes in to the system, to the network architecture, and there's a neural layer… So this is a neural symbolic model.
[00:36:01.18] So there's a neural layer that somehow interprets that user action into a set of symbols, or representations that it's learned about the UI; the Shopify UI, or the Uber UI, or whatever. And then they use some sort of symbolic logic processing of this sort of synthesized program to actually execute a series of actions within the app, and perform an action that it's learned through demonstration.
So this is sort of what they mean, I think, when they're talking about neurosymbolic. So there's a neural network portion of this, kind of like when you put something into ChatGPT, or a transformer-based large language model, and you get something out. In the case of - we were talking about getting JSON structured out when we're interacting with an external tool, but here it seems like you're getting some sort of thing out, whatever that is - a set of symbols, or some sort of structured thing - that's then passed through symbolic processing layers, that are essentially symbolic and rule-based ways to execute a learned program over this application. And by program here, I think they mean - they reference a couple of papers, and my best interpretation is that they mean not a computer program in the sense of Python code, but a logical program that represents an action, like "Here is the logical program to update the payment on the Uber app. You go here, and then you click this, and then you enter that, and then you blah, blah", you do those things. Except here, those programs - so the synthesized programs are learned by looking at human intentions, and what they do in an application. And that's how those programs are synthesized.
So that was a long - I don't know how well that held together, but that was my best, at this point, without seeing anything else, from a single sort of blog post…
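To make that interpretation a bit more concrete, here is a purely speculative sketch of how a neural intent parser might hand off to a symbolic, demonstration-learned "program" of UI steps. Nothing here reflects Rabbit's actual implementation; every name and step is hypothetical.

```python
# Speculative neuro-symbolic split: a neural model maps the spoken request to
# a symbolic action, and a separately stored, demonstration-learned "program"
# of UI steps is executed by ordinary rule-based code.
from dataclasses import dataclass

@dataclass
class UIStep:
    action: str       # "tap", "type", "select", ...
    target: str       # a learned, symbolic handle for a UI element
    value: str = ""   # text to enter, if any

# A "synthesized program" learned from human demonstrations of one task.
UPDATE_UBER_PAYMENT = [
    UIStep("tap", "uber.menu.wallet"),
    UIStep("tap", "uber.wallet.add_payment"),
    UIStep("type", "uber.payment.card_number", "<card number from the user>"),
    UIStep("tap", "uber.payment.save"),
]

PROGRAMS = {"update_uber_payment": UPDATE_UBER_PAYMENT}

def neural_intent_parser(utterance: str) -> str:
    # Stand-in for the neural half: in reality a trained model, here a lookup.
    return "update_uber_payment" if "payment" in utterance.lower() else "unknown"

def execute(program_name: str) -> None:
    # Stand-in for the symbolic half: walk the learned program step by step.
    for step in PROGRAMS[program_name]:
        print(f"{step.action:6s} -> {step.target} {step.value}")

execute(neural_intent_parser("Change the payment card on my Uber"))
```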
When you can keep me quiet for a couple of minutes there, it means you're doing a pretty good job. I have a question I want to throw out, and I don't know that you'll be able to answer it, obviously, but it's just to speculate… While we were talking about that, and thinking about multimodal, I'm wondering - the device itself comes with many of the same sensors that you're going to find in a cell phone these days… But I'm wondering if that feeds in more than just the speech. And it obviously has the camera on it, it comes with a [unintelligible 00:38:50.14] I can't say the word. GPS, accelerometer, and gyroscope. And obviously - so it's detecting motion, and location, all the things; it has the camera, it has the mic… How much of that do you think is relevant to the large action model in terms of inputs? Do you think that there is potentially relevance in the non-speech and non-camera concerns on it? Do you think the way people move could have some play in there? I know we're being purely speculative, but it just caught my imagination.
Yeah, I'm not sure. I mean, it could be that that's used in ways similar to how those sensors are used on smartphones these days. Like, if I'm asking Rabbit to book me an Uber, to here, or something like that, right? Now, it could infer the location maybe of where I am, based on where I'm wanting to go, or ask me where I am. But likely, the easiest thing would be to use a GPS sensor, to know my location and just put that as the pin in the Uber app, and now it knows.
So I think there's some level of interaction between these things. I'm not sure how much, but it seems like, at least in terms of location, I could definitely see that coming into play. I'm not sure on the other ones.
Well, physically, it looks a lot like a smartphone without the phone.
Yeah, a smartphone - a different sort of aspect ratio, but still kind of touchscreen. I think you can still pull up a keyboard, and that sort of thing. And you see things when you prompt it. So yeah, I imagine that that's maybe an evolution of this over time, as sensory input of various things. I could imagine that being very interesting in running, or fitness type of scenarios. If I've got my Rabbit with me, and I instruct Rabbit to post a celebratory social media post every time I keep my mileage, or my time per mile, at a certain level, or something, and it's using some sort of sensors on the device to do that. I think there's probably ways that will work out [unintelligible 00:41:15.12]
It'll be interesting if this approach sticks - and I might make an analogy to things like the Oura ring for health, wearing that, and then competitors started coming out, and then Amazon has their own version of a health ring that's coming out. Along those lines, you have all these incumbent players in the AI space that are, for the most part, very large, well-funded cloud companies, and in at least one case, a retail company blended in there… And so if this might be an alternative, in some ways, to the smartphone being the dominant device, and it has all the same capabilities, plus more, and they have the LAM behind it to drive that functionality, how long does it take for an Amazon or a Google or a Microsoft to come along after this and start producing their own variant? …because they already have the infrastructure that they need to produce the backend, and they're going to be able to produce - Google and Amazon certainly produce frontend stuff quite a lot as well. So it'll be interesting to see if this is the beginning of a new marketplace opening up in the AI space as an [unintelligible 00:42:29.29]
So there's already really great hardware out there for smartphones, and I wonder if something like this is kind of a shock to the market. But in some ways, just as phones with external key buttons sort of morphed into smartphones with touchscreens, I could see smartphones that are primarily app-driven in the way that we interact with them now being pushed in a certain direction because of these interfaces. So smartphones won't look the same in two years as they do now, and they won't follow that same sort of app-driven trajectory like they are now, probably because of things that are rethought… And it might not be that we all have Rabbits in our pocket, but maybe smartphones become more like Rabbits over time. I'm not sure. I think that's very likely a thing that will happen.
[00:43:37.20] It's also interesting to me - it's a little bit hard to parse out for me what's the workload like between what's happening on the device and what's happening in the cloud, and what sort of connectivity is actually needed for full functionality with the device. Maybe that's something, if you want to share your own findings on that, in our Slack community at Changelog.com/community, we'd love to hear about it.
My understanding is there is at least a good portion of the LAM and the LAM-powered routines that are operating in a centralized sort of platform and hardware. So there's not this kind of huge large model running on a very low-power device that might suck away all the energy… But I think that's also an interesting direction, is how far could we get, especially with local models getting so good recently, with fine-tuned, local, optimized, quantized models doing action-related things on edge devices in our pockets, that aren't relying on stable and high-speed internet connections… Which also, of course, helps with the privacy-related issues as well.
I agree. By the way, I'm going to make a prediction… I'm predicting that a large cloud computing service provider will purchase Rabbit.
Alright, you heard it here first. I don't know what sort of odds Chris is giving, or… I'm not gonna bet against him, that's for sure. But yeah, I think that's interesting. I think there will be a lot of action models of some type, whether those will be tool-using LLMs, or LAMs, or SLMs, or whatever; whatever we've got coming up.
And they could have named it a Lamb instead of a Rabbit, I just wanna point out. They're getting their animals mixed up.
Yeah, that's a really good point. I don't know if they came up with Rabbit before LAM, but maybe they just had the lack of the b there… But I think they probably could have figured out something.
Yeah. And the only thing that could have been in [unintelligible 00:45:59.01] is a raccoon, of course. But that's beside the point. I had to come around full circle there.
Of course, of course. We'll leave that device up to you as well. [laughs] Alright. Well, this has been fun, Chris. I do recommend, in terms of - if people want to learn more, there's a really good research page on Rabbit.tech, rabbit.tech/research, and down at the bottom of the page there's a list of references that they share throughout, that people might find interesting as they explore the technology. I would also recommend that people look at Langchain's documentation on tools… And also maybe just check out a couple of these tools. They're not that complicated. Like I say, they expect JSON input, and then they run a software function and do a thing. That's sort of what's happening there. So maybe check out some of those in the array of tools that people have built for Langchain, and try using them. So yeah, this has been fun, Chris.
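For anyone taking Daniel up on that, here's a minimal LangChain-style tool sketch: the tool itself is just a plain function, and the framework turns the model's JSON arguments into a call to it. Import paths and methods vary a bit across LangChain versions, so treat this as a sketch rather than a definitive recipe, and the fare estimator is obviously a toy.

```python
# A tiny LangChain tool: a decorated function with a docstring that the
# framework can expose to a model, which then supplies JSON arguments.
from langchain_core.tools import tool

@tool
def get_ride_price(city: str, miles: float) -> str:
    """Estimate a ride price for a given city and distance in miles."""
    return f"Estimated fare in {city}: ${2.50 + 1.80 * miles:.2f}"  # toy logic

# An agent would normally produce these arguments as JSON; calling the tool
# directly shows that, in the end, it's just a structured function call.
print(get_ride_price.invoke({"city": "Atlanta", "miles": 12}))
```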
It was great. Thanks for bringing the Rabbit to our attention.
Yeah. Hopefully see you in person soon.
That's right.
And yeah, we'll include some links in our show notes, so everyone, take a look at them. Talk to you soon, Chris.
Have a good one.
Our transcripts are open source on GitHub. Improvements are welcome. 💚