KBall and returning guest Tejas Kumar dive into the topic of building LLM agents using JavaScript. What they are, how they can be useful (including how Tejas used home-built agents to double his podcasting productivity) & how to get started building and running your own agents, even all on your own device with local models.
Tejas Kumar: Yeah, that’s a good question. I just follow a number of experts’ definitions of this thing. I tend not to try and coin terms myself, mainly because I’m just not very credentialed, if we’re being honest… So how do I see agents? I summarize it, I summate it - I’m trying to find the right word - I deduce it from definitions from industry experts who have done it before me. So people like Andrew Ng, the founder of Coursera, and now the founder of DeepLearning.AI - I think he’s got some great content about this, where he defines agentic workflows as workflows that have LLMs perform three tasks, either all three or a subset of them. And those are reflection, meaning generate some output and reflect on it, “Is it good, is it not?”, and then iteratively work on it until it cannot be improved further. So there’s reflection. There’s tool calling, as I mentioned, with RAG, where the large language model will, sort of like a human being, recognize – for example, if you ask me to do a complex calculation, like 324 divided by 9 times 7, I’ll just be like “It’s time to get a calculator.” I’ll recognize that this is the sort of boundary of my capabilities, and go use a tool. So number two is tool calling. And number three - I think it was agent collaboration, where you have – yeah, it’s LLM as judge. It’s this model where a capable model (pun intended) coordinates less capable models towards an outcome you want. So it’s like GPT-4o, being the most capable of OpenAI’s models, would orchestrate like three or four different GPT-3.5 Turbo models that are doing various tasks or generations. And so those three, either one of them or all of them, make up, according to Andrew Ng, an agentic workflow.
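The tool-calling pattern described here - the model recognizing the boundary of its capabilities and reaching for a calculator - can be sketched in JavaScript. This is a minimal, offline sketch: `decide()` is a hard-coded stub standing in for a real chat-completion call with tool definitions, and the `tools` registry and dispatch loop are illustrative names, not any particular library’s API.

```javascript
// Tools the agent can call. In a real agent these would be described to
// the model (name, parameters) so it can request them by name.
const tools = {
  // The "calculator" the model reaches for when arithmetic exceeds it.
  calculator: ({ expression }) => {
    // Evaluate a tiny left-to-right expression like "324 / 9 * 7".
    // Never eval() raw model output in real code.
    const parts = expression.split(" ");
    let result = Number(parts[0]);
    for (let i = 1; i < parts.length; i += 2) {
      const op = parts[i];
      const n = Number(parts[i + 1]);
      if (op === "/") result /= n;
      if (op === "*") result *= n;
    }
    return result;
  },
};

// Stub standing in for the LLM: it "recognizes" arithmetic in the prompt
// and asks for a tool instead of answering directly.
function decide(prompt) {
  if (/divided by|[*/]/.test(prompt)) {
    return { tool: "calculator", args: { expression: "324 / 9 * 7" } };
  }
  return { answer: prompt };
}

function runAgent(prompt) {
  const step = decide(prompt);
  if (step.tool) {
    // Dispatch to the named tool; a real loop would feed the result
    // back to the model for a final natural-language answer.
    return tools[step.tool](step.args);
  }
  return step.answer;
}

console.log(runAgent("What is 324 divided by 9 times 7?")); // 252
```

The key design point is that the model never computes anything itself: it only emits a structured request naming a tool, and ordinary code performs the call and returns the result.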
[05:47] According to David Khourshid, AI agents are an implementation of the actor model, which is just a programming model where you have an entity called an actor, that sort of acts in response to observing its environment. The classic example of AI - rule-based AI, where the rules are known ahead of time - is Pac-Man, where you have Pac-Man, the little yellow pizza thing, and it’s observing the environment: where are the ghosts, where are the cherries, where are the dots… And you as the player take on the role of the actor. But there’s also demo mode, where the actor model is in play. And according to David Khourshid, this implements agentic workflows. However, it’s rule-based, it’s not generative, but it’s still an agentic workflow, where Pac-Man is an agent.
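The observe-then-act loop described here can be sketched as a toy actor. This is a rule-based illustration of the general idea, not David Khourshid’s actual formulation or his XState library; the observation fields and rules are made up for the example.

```javascript
// A toy Pac-Man actor: it holds no global knowledge and simply maps each
// observation of its environment to an action, the way demo mode does.
function pacmanActor(observation) {
  const { ghostNear, dotDirection } = observation;
  // Rule 1: survival first. Flee a nearby ghost.
  if (ghostNear) return { move: "flee" };
  // Rule 2: otherwise head toward the nearest dot.
  if (dotDirection) return { move: dotDirection };
  // Rule 3: nothing observed, so wander.
  return { move: "wander" };
}

console.log(pacmanActor({ ghostNear: true }).move);      // flee
console.log(pacmanActor({ dotDirection: "left" }).move); // left
```

The point of the analogy is that nothing here is generative: the rules are fixed ahead of time, yet the observe/act structure is the same shape an LLM-driven agent fills in with model calls instead of `if` statements.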
So I just take a mishmash of those two - these are the preeminent leaders in the space in my mind - and marry them, and that’s the working definition that I have for an agent. So it’s not a sentence, it’s not a nutshell, but I’m trying to give you more of a broad framework of how I see agentic workflows. I have seen this term abused, where people will build – maybe abused is too strong… But people will build a custom GPT; this is a feature you can use from OpenAI’s GPT-4. They’ll just build a custom GPT, add a system prompt, add some knowledge that GPT-4 can do RAG on, and call this an agent. I disagree; I don’t think that’s an agent, that’s just a RAG application. It doesn’t really do any of the things we talked about: reflection, tool calling, collaboration, or observing an environment and responding accordingly. So I’d say those four tenets make an agent an agent.
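The reflection tenet from the list above - generate output, critique it, and iterate until it can’t be improved further - can also be sketched. Both `generate()` and `critique()` here are stubs standing in for LLM calls; the point is only the shape of the loop.

```javascript
// Stub "model": each pass appends a refinement to the draft.
function generate(draft) {
  return draft ? draft + " (revised)" : "first draft";
}

// Stub "critic": accepts once the text has been revised twice. A real
// critic would be a second LLM call returning structured feedback.
function critique(text) {
  const revisions = (text.match(/\(revised\)/g) || []).length;
  return { acceptable: revisions >= 2, feedback: "needs more revision" };
}

// Reflection loop: generate, critique, revise, with a step budget so a
// never-satisfied critic can't loop forever.
function reflectLoop(maxSteps = 5) {
  let output = generate("");
  for (let i = 0; i < maxSteps; i++) {
    const review = critique(output);
    if (review.acceptable) break; // critic says it can't be improved further
    output = generate(output);    // revise using the feedback
  }
  return output;
}

console.log(reflectLoop()); // first draft (revised) (revised)
```

Note how this differs from the custom-GPT case in the paragraph above: a system prompt plus RAG answers once and stops, whereas this loop feeds the model’s output back to itself until a stopping condition is met.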