There are a ton of problems around building LLM apps in production, especially in the last mile. Travis Fischer, builder of open source AI projects like @ChatGPTBot, joins us to talk through these problems (and how to overcome them). He helps us understand the hierarchy of complexity, from simple prompting to augmentation, agents, and fine-tuning. Along the way we discuss the frontend developer community that is rapidly adopting AI technology via TypeScript (not Python).
Fly.io – The home of Changelog.com — Deploy your apps and databases close to your users. In minutes you can run your Ruby, Go, Node, Deno, Python, or Elixir app (and databases!) all over the world. No ops required. Learn more at fly.io/changelog and check out the speedrun in their docs.
| Chapter Number | Chapter Start Time | Chapter Title |
|----------------|--------------------|---------------|
| 2 | 00:43 | Travis Fischer & Stealth.ai |
| 3 | 07:01 | Surprising uses for models |
| 4 | 10:10 | Things to keep in mind |
| 5 | 14:13 | How to use these tools |
| 6 | 19:45 | Tips for evaluation |
| 7 | 23:48 | ABCs to riding the wave |
| 8 | 26:45 | The 2 AI communities |
| 10 | 35:56 | Travis' controversial take |
Welcome to another episode of Practical AI. This is Daniel Whitenack. I’m a data scientist building a tool called Prediction Guard. I’m joined as always by my co-host, Chris Benson, who’s the tech strategist at Lockheed Martin. How are you doing, Chris?
Doing very well. We’re still in 2023, the most exciting year in AI history.
It is. It’s hard to keep up, but it’s also sometimes hard to understand what of these cool demos and models and integrations are actually production-ready, and how are people actually taking these things into production? And we’re really happy to have with us today Travis Fischer, who is the founder and CEO at Stealth.ai startup, and is focused 100% on that, delivering products with AI. So we’re happy to have you here. Welcome, Travis.
Thank you, guys. It’s a pleasure to be here. Looking forward to the conversation.
Yeah. Well, on Twitter you posted this diagram, which I think maybe you have pinned right now, which is “How to use large language models effectively”, and it’s sort of like a start simple to complex scale. I’ve found that really great, and I’ve actually shared that diagram with a number of people in various Slack channels…
This is how you should be thinking. How did you - maybe not specifically with that figure, which we can talk about, but like how did you get into thinking about how to use large language models effectively? …or actually, how to build products with these models.
Yeah, it’s a great question. So I’ll give you a quick rundown of what I’ve been up to in the last six months, which is going to answer some of this stuff. I’m a huge fan of open source. When ChatGPT launched on November 30th, 48 hours later I released the ChatGPT npm package, which was using an unofficial API, and it just allowed like thousands of developers to go and start building with this cool new thing. The GPT series and LLMs had been around for a while before that, but it was kind of this step function in terms of just their mainstream adoption, and it really just caught people’s attention. After that, I released the ChatGPT Twitter bot, which now has about 123,000 followers. I run a group called ChatGPTHackers.dev. That has about 9,500 AI developers. Just a whole spectrum of people. We have like researchers in there, and then we have like prompt engineering script kiddies.
And because – I mean, my background is computer science, and I do have some formal education in machine learning, but it’s not like I’m an AI expert, by any means. And what has really captured me really over the past year or so has just been the rate of progress, and trying to wrap my head around it and understand it. And because it’s been moving so quickly, I’ve been optimizing for my rate of learning, and I personally learned best by building out in public and building open source and just sharing what I learn as I go.
So I think there’s a lot of complexity in terms of AI, as you guys know, and to some degree, even just having a mental model of what the different approaches are that you could start with, or how to approach solving a problem, is already a difficult starting place. And I know for the “How to use large language models effectively” diagram, my inspiration there was Andrej Karpathy. He recently, maybe a month or two ago, tweeted something about - all these big companies are interested in using AI, they’re aware that they should be using it to some extent, and so they’re like “Well, we need to hire a team of ML engineers and get on this”, right? And the huge unlock now is with these foundational models; for most problems you don’t need to do that.

And yeah, there’s the production side, and the practical side, and I’m sure we’ll get into that, but in terms of where to start, and starting simple, and actually validating for your business use case that you could actually solve it with AI, that you understand the problem domain enough that you have, whether it’s training data, or you’re actually solving a real customer problem… Starting as simple as possible, with hosted foundational models a lot of the time, is a great way to get started, and just to validate quickly.

As you kind of inevitably find points where your workflow breaks down, or where you’re not getting quality [unintelligible 00:04:52.13] constraints, like security and privacy of your data, there’s kind of this ladder of complexity I like to look at. You start with just prompt engineering up at the top, and then it’s about “Well, how much can I reduce hallucinations, or add domain-specific context into my prompt by doing information retrieval?” And then at some point you’re like “Well, a single prompt isn’t doing it, so maybe I add in some iterative process, where I use another language model?” There are all these techniques for doing kind of multi-step prompting.
But you can do all of that with a hosted model, and you can get 95% of the way there for a lot of problems and domains these days, in a way that was previously locked behind proprietary data providers, and you had to have so many resources to be able to do that.
So it’s really this democratizing point in the industry at the applied AI level that we’re at right now, and I think, from the conversations I’ve been having with folks, who are a lot of them full-stack TypeScript devs, who are building applications, and they want to use AI, they know it’s cool, they don’t know how to get started, or they’re like “Oh, I need to learn Python. I need to train these custom models” and stuff… All of that is super-important, and it comes into play at a time, or for particular types of problems, but the majority of people, for getting started - start simple is the main takeaway from that.
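That ladder of complexity can be sketched in plain TypeScript. This is a minimal, hypothetical sketch: the `LLM` type stands in for any hosted model API (text in, text out), and `retrieve` stands in for whatever document store you might use; none of these names come from a real library.

```typescript
// Hypothetical stand-in for any hosted model API: text in, text out.
type LLM = (prompt: string) => Promise<string>;

// Rung 1: plain prompt engineering against a hosted model.
async function ask(llm: LLM, question: string): Promise<string> {
  return llm(`Answer concisely: ${question}`);
}

// Rung 2: information retrieval, injecting domain-specific context into
// the prompt to reduce hallucinations.
async function askWithContext(
  llm: LLM,
  question: string,
  retrieve: (q: string) => string[]
): Promise<string> {
  const context = retrieve(question).join("\n");
  return llm(
    `Using only this context:\n${context}\n\nAnswer concisely: ${question}`
  );
}

// Rung 3: multi-step prompting, where a second model call refines the first.
async function askIteratively(llm: LLM, question: string): Promise<string> {
  const draft = await ask(llm, question);
  return llm(`Improve this draft answer to "${question}":\n${draft}`);
}
```

Each rung wraps the one above it, which is the point: you only pay the extra engineering cost of retrieval or chaining once plain prompting stops being good enough.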
[06:13] I think that’s a great insight, and I think that’s one of the places where so many people go wrong - jumping into too much complexity, they don’t find [unintelligible 00:06:18.25] and potentially even don’t look for things that work just fine that are not AI. So I love the “Go simple and build from there” philosophy. I think that’s incredibly practical.
I get the sense that a lot of this sort of like chaining, and like bringing models together, doing the information retrieval - it’s sort of almost like a hacking culture around this language model prompting… Which is really cool, and that can go so, so far. Maybe, like you say, there’s privacy, or domain-specific concerns with like enterprise use cases… But in your – like, you mentioned the community that you’ve kind of built up and you’re part of on Discord… What are some of the things that have maybe surprised you, that you’ve seen, that “Hey, I didn’t even think that maybe this was possible with just this layer of using a hosted model, using pre-training, using retrieval, or whatever it is?” What are some of those things that you’ve seen that kind of surprise you, or maybe help develop your thinking around this topic?
I have a few examples and one story. Examples would be folks who are taking these models and applying them to their personal finances. There’s one guy in our Discord who’s like an ex-hedge fund guy, and he created a very basic agent that uses a large language model – I think he’s probably using GPT-4 – to take this unstructured information from his bank’s website about his expenses, and extract structured information about that, and then he can graph it, and whatever. So I think there’s a lot of hacking going on around this stuff. It is very, very early.
Another story of something that surprised me - and this is just a fun story, but when I released the kind of unofficial API wrapper for ChatGPT, we kind of had this cat and mouse game going back and forth with OpenAI for a while, because apparently, there was kind of a group within OpenAI that was like “Oh, this is amazing. Look what the open source community is doing; they’re building all this cool stuff.” And then there was another group that was like “Well, we’re gonna have the official API. Eventually, we want to control this stuff”, right? So there was kind of this back and forth. And at one point, our community found a public model, but it wasn’t like publicly-disclosed; it was security through obscurity… But it was a fine-tuned chat model that ChatGPT was actually using at one point… And all of the open source projects started to use this thing. And there were tens of thousands of actual real consumers at the end, who were building on top of this. And of course, OpenAI knew that we were doing this. I talked with one of their security engineers about this after the fact. And instead of what you would expect, like just shutting it off, they switched it out with what they call CatGPT. And just all of a sudden, one day in our Discord we started getting hundreds of messages from users saying “I think I got hacked. I’m seeing all these meows in response…”
So it goes to show, you know, the moral here - and I ended up hearing from the OpenAI engineer that they were watching our Discord, taking screenshots and laughing their asses off at this happening. But it goes to show - one, there’s no switching cost to these things, right? It’s like, text in, text out, fairly basic. And there are entirely new avenues of vulnerability, like swapping it out with a cat, or something. Like, what does security look like in this world? I just thought it was an interesting kind of anecdote.
[09:54] Probably that vulnerability of all of a sudden getting meows - that is a possibility. But I’m wondering, as you’ve spent a lot of time with these models, you’ve also – like, you’re building products on top of these models… From your perspective, taking an LLM integration to that sort of last mile of like integrate it into a product, supporting users… What are the things that should be on either developers, or data scientists’ minds as they think about taking the step from like demo, to like product integration? I guess that would be the question.
So I like to say that absolutely everything in engineering is about trade-offs, and it’s about really thoroughly understanding trade-offs, and then being able to effectively communicate those trade-offs and the pros and cons and everything. And it really boils down to those two things, over and over. So let’s talk about some of the trade-offs that are most important to using language models in practice. You have the most obvious one, which is quality. Like, can I use these language models to actually perform the tasks that I want? You have oftentimes secondary, but equally important trade-offs, like “How much does this cost to run in production? What is the latency for my use case for the end users? How consistent and reliable is it? Can I have actual – is my use case fault-tolerant?” …which is a great initial question, because we’re kind of moving from a world of a very deterministic human driving the program, to a world where the more control you give to language models and their reasoning abilities - this is getting more into the agentic side of things - the more that it becomes slightly non-deterministic, or very non-deterministic… And so the ability to have guardrails around these things, the ability to have consistency and predictability is extremely important. And one of the first questions that you should ask yourself when you’re thinking about integrating with LLMs is, for your particular use case, for your job to be done for your customers, to what extent do you need 100% reliability, versus like 99% reliability? And that may sound like a small difference, but for certain domains or problems it’s everything, right? And so that’s one of the fundamental questions.
There are techniques, and we’ve talked about them, I’m sure you guys are very aware as well, for going from that 99% and adding extra nines of reliability, and that’s also a very active area of research, where folks are actively figuring out ways to increase the reliability of these models. But the fundamental trade-offs are quality, cost, latency, and reliability. And using a hosted model is going to be great for quickly validating your use case with minimal resources; for a lot of those types of trade-offs, it may make more sense than using a local model. And there’s kind of been this Cambrian explosion of open source large language models and other specialized machine learning models, and we’re gonna continue to see that proliferate.
I like to think of the open source state of the art as 6 to 12 months behind the proprietary versions. We’ll see if that holds… But because there’s like zero switching cost with these models, because there’s just so much competition, the prices are gonna keep going down over time. We’re gonna see the open source side of these models continue to get more powerful. And so there are a lot of use cases where you’re dealing with “Well, maybe I need ultra-low latency on the device”, or maybe the cost is a factor, and I need to be running in my own data centers. Or maybe you need to - after a certain point, once you’ve validated your use case - fine-tune and distill the model down, and have a really locked-in, checkpointed, “This is completely unit-tested, this is an evaluated version of things.” And I think we’re at the stage right now where there’s so much hype and so many people building AI applications and demos, and that’s great. Just getting it out there, proliferating through open source, through Twitter, whatever it is - this is awesome. But for that last mile and the productionization concerns, you really need to dive deep on all of these fundamental trade-offs that I’m talking about. Hosted models versus local models, fine-tuning and distillation - they all become really important very quickly.
So before the break, as we were talking about these different characteristics that affect applied AI and affect deployment, I was really taken by the fact that so many of them are not really AI-specific. You could almost argue that applied AI in so many ways is about software, it’s about the systems, it’s about cloud, it’s about all these other things blended together to produce solutions that are productive in the world, and have value for people and organizations. We talked about unit testing, and stuff like that… What is your thinking around the integration of all those things? Because the model itself, to your point about hype, still kind of gets all the attention - and it is amazing what we’re doing - but to make this stuff work in real life, there are all these other concerns. There are so many cool things happening in 2023 on the models side that the other 99% needed to make it real kind of gets overlooked. When you’re working with people around understanding how all this fits together, how do you frame that so that their attention gets on the right things, and their budgets are properly allocated? I’ve seen organizations really struggle with that, because they go into it with hype, focusing on just the model, and building skill sets and budgets around the model, and then they try to figure out the whole thing with cloud and deployment afterwards, and they have a hard time. How do you navigate that, given the hype cycle that we’re operating in?
My first piece of advice would be that for your particular use case, if your job to be done – whatever business use case you’re solving, you have to keep in mind that AI, like all software, is a tool, and it may be a really shiny tool, it may be a tool that is evolving very quickly in front of us right now… It’s a very powerful tool, but it is a tool to solve a business use case and a problem for humans. So rooting the framing in that I think is very important.
The second thing I’ll say is a lot of AI right now, and especially the stuff that gets a light shined on it, and in the open, because the application layer is so new, and there’s so much low-hanging fruit - as you said, we need to have more emphasis on the engineering rigor under the hood… And so one practical piece of advice there is to really focus on an evaluation set for your particular use case. And you might have existing data, you might have existing kind of input/output pairs for your particular example, or you might not have that. But like starting from there, and working backwards, of like “This is what the end user is going to see”, and then working backwards from that, to think about, “Well, how can I use language models or other expert-focused machine learning models to solve that?” I think is very important, because that also gives you a grounded North Star, that like so much of the prompt engineering and tuning of these models is based around, “Well, I think this is going to work better”, or “I eyeballed it on this one example, and it seems to work for this”, right? But really applying some fundamental engineering rigor at that level, where you have an evaluation set that you can track, that you can improve over time, that you can – and not just tracking the quality of these models, but tracking other trade-offs, in terms of pricing, latency, recall… There’s a whole slew of trade-offs that can matter, depending on your particular use case.
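That evaluation-set advice can be made concrete with a small harness. This is a minimal sketch, not any real library's API: `runEval` and `EvalCase` are hypothetical names, and the `model` parameter stands in for whatever LLM call is under test. The point is simply to track objective metrics over a fixed set of input/output pairs instead of eyeballing single examples.

```typescript
// One labeled input/output pair in the evaluation set.
interface EvalCase {
  input: string;
  expected: string;
}

// Aggregate metrics across the whole set: quality plus a secondary
// trade-off (latency), per the discussion above.
interface EvalReport {
  accuracy: number; // fraction of exact matches
  meanLatencyMs: number;
}

async function runEval(
  model: (input: string) => Promise<string>,
  cases: EvalCase[]
): Promise<EvalReport> {
  let correct = 0;
  let totalMs = 0;
  for (const c of cases) {
    const start = Date.now();
    const output = await model(c.input);
    totalMs += Date.now() - start;
    // Exact-match scoring; real tasks often need fuzzy matching or a
    // judge model here instead.
    if (output.trim() === c.expected) correct++;
  }
  return {
    accuracy: correct / cases.length,
    meanLatencyMs: totalMs / cases.length,
  };
}
```

Tracked over time, a report like this becomes the "grounded North Star" for prompt and model changes: a tweak only ships if accuracy holds and latency or cost doesn't regress.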
And then the other piece of practical advice I would say is the diagram of this ladder of complexity that I was referring to before - like, every time you take a step down that ladder of complexity, from using a hosted model with just prompting, to some type of information retrieval embedded in the context, to having multiple chains of prompts, to fine-tuning a hosted model or fine-tuning a local model - at the very, very bottom is building your own model, right?
[18:07] Every time you take a step down that ladder of complexity, it adds engineering complexity, and it’s going to make your solution more complex to maintain. So really having a good handle on how you can start simple, and only move down when you need to, or when you hit a constraint, like “Okay, this is great. I have a working solution with a hosted API. But now I need to worry about the price, because I’m going to production”, and the unit economics. Maybe at that point, then you think about, “Well, now I have this great solution, and I can auto generate an eval set for myself, and have a bunch of inputs and outputs and fine-tune a model that is hyperdistilled and efficient and focused.” That’s great. Don’t start there. For most use cases. Right?
The one other thing I would say, at the practical level, is where language models tend to break down or lack reliability is oftentimes when you’re trying to give too much to the model to do at once. And so breaking the problem down into sub-problems that are a lot more focused is one of the most practical – like, I just find myself telling people over and over again, it’s like “Okay, that’s awesome. Break your problem up into subproblems.” And you know, how to do that is a whole other problem in itself. Maybe someday in the near future language models will do that for us. I don’t know. That’s getting into the more sensational side of things. But as a general principle, breaking your problem up into subproblems, thinking about how you can articulate your problem as succinctly as possible, in a way that is native to the language models, is a really key practice.
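The "break your problem into subproblems" principle looks like this in code. A hypothetical sketch: `LLM` stands in for a hosted model API, and the ticket-triage task is an invented example, not from the episode. Instead of one prompt that summarizes, classifies, and rates urgency all at once, each focused step gets its own call.

```typescript
// Hypothetical stand-in for any hosted model API.
type LLM = (prompt: string) => Promise<string>;

async function triageSupportTicket(llm: LLM, ticket: string) {
  // Each call does one focused thing, so each can be prompted, tested,
  // and tuned independently.
  const summary = await llm(
    `Summarize this ticket in one sentence:\n${ticket}`
  );
  const category = await llm(
    `Classify this summary as "bug", "billing", or "other":\n${summary}`
  );
  const urgency = await llm(
    `Rate the urgency of this summary as "low" or "high":\n${summary}`
  );
  return { summary, category, urgency };
}
```

Each sub-call now has a narrow, checkable output ("bug", "low"), which is exactly what makes the per-call evaluation and guardrails discussed elsewhere in the episode feasible.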
I love how you talked about like evaluation, forming your evaluation set, getting some ground truth, also breaking up your problem… Maybe having an evaluation set for each of those subproblems would be a good idea. I think there’s this general perception that large language models are this kind of unique thing, these chat interfaces are this kind of unique thing… Like, how do you like evaluate that in the way – I think what people have in their mind is “Oh, if I’m doing sentiment analysis, it’s either positive or negative or neutral, and I can calculate an accuracy, for example.” Whereas they might struggle to think about “Okay, well, there’s this output from this language model. It seems coherent and fluent. How do I evaluate this?” And so I think there’s maybe a bit of confusion around the evaluation side. Can you share any tips or thoughts in terms of what you’ve found to be useful in your own work in terms of evaluation sets, and like how you think about how good the output of a language model is?
My first thought would be the less that it’s about me thinking about how good it is, and the more that it can be objective, like using some constant way of evaluating it, the better. There’s one project that I really liked recently, by Lance Martin, it’s called Autoevaluator. I don’t know if you guys have seen it, but it’s specifically for the domain of QA. So question answering. And he recently partnered with Langchain to create a hosted version of it. But the way that I think about this is a little abstract, and it’s really like starting from your job to be done – oftentimes, sentiment analysis isn’t the job to be done; it’s like a piece of a job to be done, right? So again, it’s like breaking up the problem and understanding how to think about and structure those problems as whether it’s an expert model, that just does sentiment analysis, or it’s using a large language model that can do sentiment analysis, it’s really good at that, but it also can do a whole bunch of other things as well.
So the more focused your task is, the more clearly articulated your task is, and the more structured the output that you have at the individual LLM call level, the better and the easier it is to create reliability around these things, and to actually test them with more traditional software engineering practices, like writing unit tests, or integration tests.
One thing that I’m actively working on right now for the TypeScript community is a way to invoke large language models and have structured guards on them.
[22:02] I know Prediction Guard, GuardRails - there are a few projects that are doing this - but really having it actually be typed in TypeScript. So you can make an LLM call like it’s a function, but get back some JSON that has these fields, with these types. And there are techniques that you can do under the hood to self-heal if the JSON isn’t properly formed; or maybe you want to generate some TypeScript code and you want to validate that it parses to a correct AST, or something. There are techniques you can use to constrain the output of the language model for your particular task. But in my view, these techniques - the best practice and the state of the art there - are constantly shifting. And so I think libraries like Langchain and the open source framework that I’m currently working on will do a lot to help developers abstract out some of the complexity of just viewing this as a general-purpose tool. And again, it’s like, you start simple…
One of the great things about language models is they can do just about anything. That’s also one of the downsides, right? When it’s so unconstrained, how do you even approach the problem? So having best practices, having examples constraining the problem… And really, it’s the ability to have a unit test or an assertion in a traditional programming language at the large language model call level, where it’s like “I assert that the output should be valid JSON”, or “I assert the output should conform and be valid TypeScript syntax.” And if not, actually self-reflect on that and put it back into the large language model and regenerate it.
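The assert-and-self-heal idea can be sketched in a few lines. This is a hypothetical illustration, not the guest's actual framework: `LLM` stands in for a real model call, and the only "assertion" shown is JSON validity. When validation fails, the error is fed back into the model so it can self-reflect and regenerate, up to a retry limit.

```typescript
// Hypothetical stand-in for any hosted model API.
type LLM = (prompt: string) => Promise<string>;

async function callWithJsonGuard<T>(
  llm: LLM,
  prompt: string,
  maxRetries = 2
): Promise<T> {
  let lastError = "";
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    // On retries, include the previous failure so the model can
    // self-reflect and regenerate.
    const fullPrompt = lastError
      ? `${prompt}\n\nYour previous output was invalid: ${lastError}\nReturn only valid JSON.`
      : prompt;
    const raw = await llm(fullPrompt);
    try {
      // The assertion: the output must be valid JSON.
      return JSON.parse(raw) as T;
    } catch (err) {
      lastError = String(err);
    }
  }
  throw new Error(
    `Model failed to produce valid JSON after ${maxRetries + 1} attempts`
  );
}
```

A stricter version would validate the parsed object against a schema (e.g. with a library like Zod) rather than trusting the `as T` cast, but the shape of the loop - assert, reflect, regenerate - stays the same.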
All those things I think are foundational primitives at the large language model level that will allow developers who want to build real reliable applications to do so more reliably, because they can focus on their domain-specific business logic, or aspects that are away from a lot of this kind of implementation details, that are also constantly shifting under our feet, right?
Fantastic explanation. You keep talking over and over again about how things are shifting, and the evolution of the engineering around it… That puts a burden on these hackers and developers that are trying to go out and implement these things at this point, because this year has just been phenomenal progress… But that makes it really hard for mere humans here to kind of track that, and keep up with it. So you kind of talked about some of the concepts right there, but if you’re thinking about – like, you’re about to turn to somebody who’s a hacker, and they’re looking for that guidance, what are some really… Not necessarily comprehensive, but “Hey, go do 1, 2, 3, or A, B, C, and that will help you kind of keep leveling up?” What are some of the things that you’re telling people these days? If you want to keep up this year with all of this insane progress, and LLMs, and all the different model types that we’re seeing, what are you going to do on the practical side as a hacker to manage that? Do you have any tips that you can kind of take us through about –
Absolutely. One, it’s super-noisy. There’s so much happening. We’re in the middle of this exponential wave, and I think a lot of people are like “Oh, I have FOMO, I want to be on this wave. But where do I start?” And there’s just so much noise. Which is great on the one hand, but on the practical side, how do you give it life? Where do you start?
So there’s a couple levels to this. I’ve given talks to kind of a ChatGPT-for-beginners type crowd, and really, my main advice is one, just use it; just go in and try it. That’s simple. But two, more importantly, the next time that you have an actual problem that you think “Maybe I could use ChatGPT or a language model for it”, actually try using it to solve your own problem. Because what that does is it starts to build up this muscle in your brain around thinking about using these new types of tools to solve problems. And it’s really a different type of tool. It’s like exercising; you need to start exercising that muscle early and often.
And there’s a lot of noise, there’s a lot of different AI tools. The side of things which I’m confident will be just as relevant a year from now or a couple years from now is building up that muscle to think about how to actually use AI to solve your own particular problems. It’s one thing to talk about hypotheticals and general cases of problems where these tools excel, but it’s another thing entirely to start building your own personal muscle.
I totally agree with that.
Whether it’s a personal problem, and you want to just go talk to ChatGPT, or you have a problem at work, and you’re like “Well, I think I could use a language model, a hosted API to solve this”, or something like that - starting simple, starting from your own problems will start to build up that muscle, and you’ll naturally learn it and take it from there.
One of the projects I did a couple months ago was I ported scikit-learn to TypeScript. And it’s not like a full port; it auto-generates all of the TypeScript classes - like 260 classes - and then under the hood it creates some Python sub-process, and then marshals and does the inter-process communication between them. But it works extremely well, and you can call and do [unintelligible 00:29:24.06] Just all these fundamental things that the Python and machine learning world takes for granted - there are versions of them that exist in the npm ecosystem, it’s just that they’re all over the place in terms of quality. There are so many fundamental aspects of machine learning that the TypeScript world is missing out on. And that’s one of the primary drivers behind what I’m working on - and I’m happy to share: I’m building an open source TypeScript framework for building reliable agents.
[29:57] Thank you. I view agents as this new – it’s like, if large language models are CPUs in this new compute paradigm, they’re these reasoning engines. Yeah, they’re great at generating text, but the real emergent property, the real game-changing property of them is reasoning. So if they’re kind of the new reasoning engines - that’s your CPU layer - and then you have a storage layer that’s all these vector databases, which are kind of overhyped at the moment… On top of that you have “How do you actually run programs?” And that’s agents. And I view it as a spectrum: on one end there’s traditional programming that might happen to use a large language model, and on the other end you have full self-driving agents that are making decisions, and creating tasks, and are just fully autonomous. And I’m excited to focus on somewhere in the middle - on more reliable use cases that we can actually build reliably today.
So you have hit an area that I have so much passion for…
I’m sitting here, waiting to ask my next question here… And Daniel has heard me whine about this for years, what I’m about to say; I want to get your take on it. So there is more to the world than just Python, and I’m a multi-language person, and I don’t necessarily go all-in on any one language or the other. I’m a TypeScript user… In the last year, I’ve been doing Rust; I had been doing more Go before that, outside of the AI and Python stuff… But I had a use case where I was building something, and I had to eke every little bit of performance out of the available hardware to do what it was – it was going to be C++ or Rust, and it wasn’t going to be C++. So I went to learn Rust. And then I’m in Rust, and I’m doing that, and I’m looking at, as an analogy here for what we’re about to go at… I’m looking at WebAssembly, and the Rust community and other language communities are so into that fact of “Write in with the thing that you need to be in, and yet have access to that in terms of deployment, and still having great performance” and stuff. And every time I’m now messing with WebAssembly and Rust, I’m thinking, “When is the AI world going to catch up on having multifaceted (from a language standpoint) access to the models, instead of everything being Python-first?” And so asking the pardon of the Python lovers in the audience, when am I going to be in Rust, or Go - and you’re obviously doing it in TypeScript - but the language of my choice, and taking advantage of, as you called it, the new CPU of reasoning from that point, instead of having to do a context-switch? It is an ongoing year after year after year frustration that I have, as you can probably tell by now.
So I’m hoping that you’re about to give me the golden path out of here, because I need one.
Okay. Well, first of all, I love your framing. I love your passion for this. I also feel very similarly. I think the reason why I’m starting with TypeScript is because the developer experience at the application level is really important for the type of framework I’m looking to build. But I view WebAssembly - Wasm - as kind of the ultimate compiled language runtime that I want to target. Because you could imagine a world not too distant from right now where you have agents that are running in data centers, running on the edge… Whether it’s a Cloudflare Worker, or a Vercel edge function, or a service worker in your browser, the common thread there is Wasm. So to some extent it’s starting with developer experience at the TypeScript level, and then focusing on Wasm at the runtime level. And there’s still a clear path forward for a lot of folks for just using hosted APIs; that’s one area where you can have that multi-language support very easily - that’s a natural point.
[34:09] But you’ve got to be kind of in the cloud for most of that in a practical sense, which I’m not always.
Yeah, 100%. And then there’s the whole open source models side of things, the practical side, where you’re like “Well, I need to run on local hardware for the latency”, or something that’s on-device… And I am extremely bullish on WebAssembly – there’s a quote that I like, and I forget who it was from; someone from the Linux Foundation, or something… It was like “If WebAssembly had existed 10 years ago, then Docker would never have needed to exist.” And I think it will have that level of impact, eventually. I think, potentially –
I do, too.
Yeah. I think potentially the unlock here, the path that could bring it into more of the mainstream, could be AI. I don’t know – at the model level there’s just so much momentum behind Python, and all the core researchers and so on are Python-first… So when I did this scikit-learn kind of port to TypeScript, I came across a project called Pyodide… And it’s a Python runtime. You guys might know better than I do, but it’s targeting WebAssembly, and it allows you to run a subset of scikit-learn in WebAssembly-supported environments, including Node.js and the browser. And that’s super, super-fascinating to me.
Yeah, I think that there’s a couple related projects. I think PyScript from Anaconda is trying certain things like that… But I’m really interested in that space as well, because I’ve seen – it’s sort of like a different kind of diversity than we normally talk about, but the fact that more developers from more diverse backgrounds are at the table building AI things I think is an amazing thing, and I think a lot of good is going to come from that… So I’m really happy to see a lot of that happening.
Well, if we have time, just one more maybe controversial take on this…
Sure. We like controversial takes.
Awesome, awesome. You know, as we get closer to building reliable agents - and the way I was framing it before, it’s a fundamentally new compute paradigm, with large models as CPUs, and you’re building these agents on top of them… As they eventually get more and more reliable and more autonomous - right now, a lot of them are just toys, let’s be clear - but as that happens, I view it as a new, higher-level programming language. And we’re working with natural language… The AST of that language, in my view, is a directed graph, where the nodes are specific LLM calls, or a call to a tool, or a call to an API. And there are massive problems around how to add reliability at that level - structured output, guardrails; some of these things are clearer than others. And then the whole graph becomes a program, or an agent.
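The directed-graph framing above can be sketched in a few lines of TypeScript. This is a hypothetical illustration, not the guest's framework: the names (`AgentNode`, `runGraph`) and the stubbed `llm` function are invented here, with the model call replaced by a string-returning stub so the structure is visible without any API dependency.

```typescript
// Sketch: an "agent program" as a directed graph whose nodes are LLM calls
// or tool calls. All names here are illustrative, not a real framework.

type NodeKind = "llm" | "tool";

interface AgentNode {
  id: string;
  kind: NodeKind;
  // Ids of upstream nodes whose outputs feed this node.
  inputs: string[];
  run: (inputs: string[]) => string;
}

// Stub standing in for a real model call (e.g. a hosted API).
const llm = (prompt: string): string => `LLM(${prompt})`;

// Evaluate the graph lazily from the target node, memoizing each
// node's output so shared upstream nodes run only once.
function runGraph(nodes: AgentNode[], targetId: string): string {
  const byId = new Map(nodes.map((n) => [n.id, n]));
  const cache = new Map<string, string>();
  const evaluate = (id: string): string => {
    if (cache.has(id)) return cache.get(id)!;
    const node = byId.get(id);
    if (!node) throw new Error(`unknown node: ${id}`);
    const out = node.run(node.inputs.map(evaluate));
    cache.set(id, out);
    return out;
  };
  return evaluate(targetId);
}

// A tiny two-node "program": a tool call feeding an LLM call.
const graph: AgentNode[] = [
  { id: "fetch", kind: "tool", inputs: [], run: () => "42" },
  {
    id: "summarize",
    kind: "llm",
    inputs: ["fetch"],
    run: (inputs) => llm(`summarize: ${inputs.join(", ")}`),
  },
];

const result = runGraph(graph, "summarize");
console.log(result); // "LLM(summarize: 42)"
```

The reliability problems mentioned above - structured output, guardrails - would live inside each node's `run`, while the graph level is where the whole thing becomes a program.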
To some degree, we’re talking about all of these Python and Rust and implementation details, and that’s all very important, but I wonder to what extent 10 years from now we will even be talking about a lot of the current levels of programming abstractions that are hyper-relevant to us today as practitioners, or how quickly we’ll move towards this world of a higher-level abstraction for solving problems that are just significantly more efficient, more approachable, because it’s kind of based on natural language… Anyone in this field that talks to you about timelines is just throwing a dart with a blindfold on, but that’s one thread that I’m really excited about.
You kind of went already ahead to where I was hoping you would go, which is what’s keeping you up at night, what’s on your mind in terms of like looking forward, and all of that… And I agree, I think this is a really, really interesting direction, and I certainly hope that we see that timeline progress rapidly. I think we probably will.
So yeah, it’s been a pleasure to have you on the show, Travis. I’m really looking forward to keeping in contact, seeing all the amazing things you do, and trying out some things in TypeScript. It’s an exciting time to be part of this. Thanks for joining us.
Our transcripts are open source on GitHub. Improvements are welcome. 💚