End-to-end cloud compute for AI/ML with Erik Bernhardsson from Modal (Practical AI #214)

We’ve all experienced pain moving from local development, to testing, and then on to production. This cycle can be long and tedious, especially as AI models and datasets are integrated. Modal is trying to make this loop of development as seamless as possible for AI practitioners, and their platform is pretty incredible!

Erik from Modal joins us in this episode to help us understand how we can run or deploy machine learning models, massively parallel compute jobs, task queues, web apps, and much more, without our own infrastructure.

44 minutes
Recorded Mar 1, 2023
Published Mar 7, 2023

Featuring

Erik Bernhardsson
Chris Benson
Daniel Whitenack

Sponsors

Fastly – Our bandwidth partner. Fastly powers fast, secure, and scalable digital experiences. Move beyond your content delivery network to their powerful edge cloud platform. Learn more at fastly.com

Fly.io – The home of Changelog.com — Deploy your apps and databases close to your users. In minutes you can run your Ruby, Go, Node, Deno, Python, or Elixir app (and databases!) all over the world. No ops required. Learn more at fly.io/changelog and check out the speedrun in their docs.

Chapters

Chapter Number	Chapter Start Time	Chapter Title	Chapter Duration
1	00:00	Welcome to Practical AI	00:42
2	00:42	Erik Bernhardsson	01:47
3	02:29	What got Modal started	06:26
4	08:55	What makes Modal different?	03:18
5	12:13	Pros and cons of this workflow	03:23
6	15:36	What it's like in my experience	06:01
7	21:37	Most unexpected uses for Modal	02:20
8	23:57	The classic Modal workflow	07:06
9	31:04	Tips for migrating into Modal	03:31
10	34:35	How will Modal grow?	03:22
11	37:56	Will you take Modal to the edge?	02:07
12	40:03	Wrap up	03:21
13	43:24	Outro	00:54

Transcript

Daniel Whitenack

Welcome to another episode of Practical AI. This is Daniel Whitenack. I’m a data scientist with SIL International, and I’m joined as always by my co-host, Chris Benson, who is a tech strategist at Lockheed Martin. How’re you doing, Chris?

Chris Benson

I’m doing very well. How are you today, Daniel?

Daniel Whitenack

I am actually doing amazing. So I’m not in my normal location, I’m down in Orlando, Florida. So one thing is it’s sunny outside, and I can be outside without suffering. But also, while I’m in like in-person meetings here with some of our collaborators and partners, and they wanted me to do a demo today… So I got up early this morning at like 6 AM before hotel breakfast, and I threw together a quick demo, and I used Modal for that… And there’s literally someone that stood up out of their seat and clapped after the demo. So our guest today is Erik Bernhardsson, with Modal, and so basically Erik is making me look good in all respects, and I’m pretty excited to talk more about Modal and share it with everyone today. Welcome, Erik.

Erik Bernhardsson

Hi. Hi. Thanks for having me. I’m excited to talk about Modal or anything else.

Daniel Whitenack

Yeah, I think - Chris, do you remember, quite a while ago… I don’t remember when this was - maybe Erik, you remember - I think you wrote a blog post about building data teams, or something like that… I forget exactly what it was, but I remember Chris and I talking about it on the podcast. I’ll have to see if I can find it back in your blog, but…

Erik Bernhardsson

Yeah, that was in the summer of 2021.

Daniel Whitenack

Yeah, yeah. So we should have had you on the show then, but I’m glad that we get to have you on the show now. So you describe Modal as an end-to-end stack for cloud compute. So I guess one big question, maybe to start things out, is - cloud compute isn’t new, but it definitely can be complicated, depending on what you’re trying to do… What got you starting to think about the set of problems that you’re addressing with Modal? What got you going down this path?

Erik Bernhardsson

Yeah, it’s kind of a longer story maybe… But I’ve been working with data for 15 years, or maybe more, but most of my career. [unintelligible 00:03:05.02] I was at Spotify for seven years, I built the music recommendation system there, and I open sourced a vector database called [unintelligible 00:03:10.29] I did kind of everything, from deep learning, to like business intelligence, to large-scale, big data type, like Hadoop stuff… Then I was a CTO at a company called Better for six years, I managed data teams, but also managed other teams… And so as I was thinking about starting a company, I kept coming back to data, and my starting point was really just like, it’s hard to work with data, and I feel like data teams don’t have the tools they need. And initially, I was super-agnostic as to what to build. I kind of frankly wanted to rebuild everything, which is not particularly realistic…

Daniel Whitenack

Maybe in a lifetime.

Erik Bernhardsson

[03:52] …aspirational. Megalomaniac, maybe. But what I realized was that, at least if you want to rethink a lot of the data stack, a good place to start is at the bottom. So I almost like sometimes joke that I kind of like grudgingly had to start – you know, it’s like a spite startup. I’m doing this work now at like the lowest level, which is to solve the compute problem of like “I have code, and I want to deploy it in the cloud.” Why is that so hard? I don’t know. But it’s a big problem for many data teams… Just the problem of taking code and scaling it out, or scheduling it, or running it on GPUs, or setting up web endpoints, or whatever it is… And really focusing on that problem as building this core foundational layer - that’s very abstract, and very general-purpose, so that’s also why our website I think it’s really confusing the first time… But in particular, what I would say we’ve been focusing on the last six months is online inference. So a lot of machine learning AI models focusing on that use case as a sort of initial starting point. But Modal always, to me, had this promise of running almost anything. It’s almost like a Kubernetes in the cloud.

Daniel Whitenack

Yeah. And one of the interesting things to me, I think, were maybe it took me a second for this to sink in, but once it did, it was a really encouraging thing for me, was like - I have my code locally, and I know how to run it locally… But then you have this sort of concept of these decorators within Python code that kind of take your code and you run it like Python, you know, main.py, whatever… But actually, something – like, the moment I realized, “Okay, well, now this function is actually not running locally.” Like, I just did some sort of like batch inference or something with this script and didn’t – my fans aren’t going on my laptop, because this is actually running somewhere else… Could you describe – I mean, there’s a lot of ways that you could have gone about this sort of lower level of the problems that data teams face. There’s a really fundamental piece of this, which is like the local to cloud, or local to deployment cycle. And with Modal, that seems very, very quick. How did you zero in on that kind of workflow?

Erik Bernhardsson

We built something that architecturally looks something like AWS Lambda. It’s like a function as a service; we take code and execute it in a serverless way in the cloud. The starting point, the reason why I ended up going down this rabbit hole of doing that - the whole serverless runtime - is really kind of thinking about developer productivity and developer happiness. And my sort of philosophical observation as a CTO for many years is that developer productivity, I think, is very often well understood in terms of feedback loops. So as you write code, there’s almost like a nested set of for loops. It’s the innermost loop of like you write some code, then it’s a syntax error, then you fix it, and then you run it, maybe have some unit tests… But then there’s these outer loops that are often like “Okay, let’s deploy this to the cloud”, or “Let’s run this on a massive dataset”, and that’s when the iteration speed gets very, very slow.

So you look at data teams, they’re often particularly exposed to these feedback loops, because they have to run on large datasets, or they always have to run things in production; you can’t really run like things on like synthetic data as a data team. You have to kind of deploy it into production, or run it on a real data set. And so it really frustrated a lot of data teams, that sort of like very slow iteration speed. I write some code, now I have to create a container, push it to the cloud, then go and click on an interface, or merge some pull requests or whatever, then my container fails, now I have to go look at logs, or whatever…

[07:53] So I started thinking about, “What if we bring the infrastructure into that innermost loop?” The loop of “Okay, you just write code, and then you immediately run it”, but it actually runs in the cloud. And in order to do that, we realized we can’t do this with Kubernetes; we can’t do this using Lambda. We basically have to build our own infrastructure that takes code and can launch containers, maybe hundreds of containers in the cloud in a few seconds. So we went very deep down that rabbit hole and built basically our own container runtime, our own file system, our own container builder… Luckily, I’m not afraid to go deep and solve tricky container problems and dealing with Linux and file systems… But that’s a lot of what we had to build in the last two years, is that foundational level, that runtime. But the benefit is now we have this super-nice developer experience, and we can just take code locally, you can spawn 100 containers in the cloud in a few seconds, running the latest code in the latest container.

Chris Benson

It sounds fascinating, I’m really interested in it, but I want to ask you to step back for a second with a follow-up and bridge a gap of understanding for me… You were saying you can’t do it with Kubernetes, can’t do it with AWS Lambda, and I believe you, but I don’t know why. And I’m imagining that maybe a few of our listeners don’t know why either. Could you kind of tell us what it is – because a lot of them, their companies are in one of those big three providers, and to kind of show them… You kind of demonstrated with the user experience quite well a moment ago… But could you talk a little bit about what was falling down in those kind of more mainstream big three kind of approach, Google, AWS and Azure, so that we can understand that? Because you made a statement, I’m with you on that, but just bridging it.

Erik Bernhardsson

Yeah. First of all, I’m the world’s biggest AWS fan, right? We run everything on AWS. I love it for the capabilities it brings me as a developer to run things at scale. Developer experience in AWS has never been particularly good…

Chris Benson

Yes. True.

Erik Bernhardsson

I’ve been banging my head for years against AWS documentation. And in the end, I usually figure it out, but it was a pretty jarring experience. I think in particular, the problem with Kubernetes and AWS Lambda or EC2 etc. that we saw, either for users to use it directly or for us to build on top of that, is just the iteration speed. For instance, in Kubernetes - let’s say you want to run something in Kubernetes in production, going from code locally. Well, now you have to first build a container, then you have to do some sort of Docker push to registry, right? Then you have to kick off a Kubernetes job. Then you have to go and look at the logs of that Kubernetes job. And by the way, kicking off a Kubernetes job - that often entails the kubelet worker pulling down that Docker image.

And so we were looking under the hood and trying to understand how Docker works. And Docker - it’s an amazing piece of technology, for this sort of new way of thinking that it brings to the table around isolated containers, but it’s quite inefficient in starting containers. Most containers end up having lots of data that’s never actually read; there’s thousands of timezone files, locale information about timezones in Uzbekistan, or whatever; you’re never going to read those, right?

Chris Benson

Unless you’re in Uzbekistan. Sorry… Just getting that in there.

Erik Bernhardsson

Yeah, you know, whatever. Or on uninhabited islands. There’s timezone information about uninhabited islands in like the standard Linux distribution. Okay, great, but get them out of my Docker container. But the other thing is also Docker is quite inefficient, and it has this layer thing… But other than that, it doesn’t really deduplicate information. And so what we realized is that, what if we rethink how those containers get pushed and pulled, and we end up building our own file system? We need to deduplicate the content by computing a checksum of every file. It’s actually sort of similar to how Lambda works. But Lambda is also not fast enough, in the sense that if you publish a new Lambda, it still takes about a minute for you to be able to run it. Lambda also has other limitations; it doesn’t support GPU, it doesn’t support long-running jobs etc.

So those are all the reasons why we ended up deciding we can’t build this on top of Kubernetes, or Lambda, or any existing solution. Also not Docker. We ended up using lower-level primitives instead and building a lot of it ourselves.
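
The checksum-based deduplication mentioned above can be illustrated with a small sketch. This is not Modal's file system; the `ContentStore` class, the image names, and the file contents are all hypothetical, and the only idea taken from the conversation is that each file is keyed by a checksum so identical content is stored once.

```python
import hashlib


class ContentStore:
    """Toy content-addressed store: each unique blob is kept once, keyed by SHA-256."""

    def __init__(self):
        self.blobs = {}   # checksum -> file bytes
        self.images = {}  # image name -> {path: checksum}

    def add_image(self, name, files):
        """Register an image from a {path: bytes} mapping; return the bytes newly stored."""
        manifest = {}
        new_bytes = 0
        for path, data in files.items():
            digest = hashlib.sha256(data).hexdigest()
            if digest not in self.blobs:
                # Only content that has never been seen before costs storage or transfer.
                self.blobs[digest] = data
                new_bytes += len(data)
            manifest[path] = digest
        self.images[name] = manifest
        return new_bytes

    def read(self, image, path):
        """Look up a file through the image manifest."""
        return self.blobs[self.images[image][path]]


store = ContentStore()
# Hypothetical base layer shared by both images.
base = {"/usr/lib/libc.so": b"libc" * 1000, "/usr/share/zoneinfo/UTC": b"tz"}
first = store.add_image("app:v1", {**base, "/app/main.py": b"print('v1')"})
second = store.add_image("app:v2", {**base, "/app/main.py": b"print('v2')"})
print(first, second)
```

In this sketch the second image only adds the bytes of its changed `main.py`, because the shared files hash to checksums the store already holds.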

Daniel Whitenack

[12:13] And are there specific things about – and I sort of like in my own experience in using Modal have experienced this, but from your perspective, I would be interested to hear… You talked about kind of moving towards this use case… Like, the use cases around machine learning around AI as being kind of like very well suited to this workflow. Do those types of workflows have any sort of like added benefits and/or challenges compared to maybe running a web scraper, or something like that, like some other sort of use case which is related to data, but maybe not involving sort of like serialized model files, and inference, and GPUs… Like, what are those things about these machine learning or AI workflows where you think either there’s specific challenges that people have, that are solved by this kind of quick cycle workflow, versus just kind of like other data-related workflows?

Erik Bernhardsson

Yeah, I think we focused a lot on online inference, recently. So basically, let’s say you have a model - it could either be some off-the-shelf model from Hugging Face, or some fine-tuned model that you have yourself, and you want to deploy that. And particularly if that model uses a GPU, the set of vendors that support that is somewhat limited. And the other reason why is also cost. Traditionally, if you go through the Kubernetes and EC2 route, if you want to deploy a model inference endpoint, you have to spin up an instance that sits idle for most of the time. You can set up auto-scaling, but auto-scaling is pretty slow.

So moving to serverless makes a lot of sense from a cost perspective, and so I think that’s the other reason why we’ve seen a lot of – it’s not just us; I saw that Banana was in the previous episode, for instance. There’s a couple of other vendors that are also focused on this. I think cost is driving a lot of that demand for serverless vendors, for GPU compute specifically.

I also think it’s something that just came up in the last few months, where a lot of people ended up realizing we’re very good at training models, building custom stuff… We don’t want to deal with infrastructure, running it in production. And so there’s been a lot of demand for vendors like Modal, where they can just take a model and publish it to Modal and run it in production, and not have to think about auto-scaling policies, and not have to think about setting up web endpoints, and dealing with security groups, and all that stuff.

That being said, Modal, kind of going back to its roots, we did – it’s not just online inference. We started out focusing a lot on what I think of as like embarrassingly parallel problems; this idea that you have something you want to fan out, and do a lot of stuff in parallel… So besides online inference, Modal also does a fair amount of batch inference, or sort of parallelizable things. A lot of people actually use this for web scraping; other people also use this for things like computational biotech, large-scale transcoding… You can also use this for various types of simulations or back testing, that kind of stuff. So there’s a pretty wide range of things at Modal as well. But I think right now, the user experience of online inference is like 9 out of 10, I would say, at Modal. The user experience for batch inference and large scale, like parallelism, is like 8 out of 10. We’re working on a lot of the other stuff, like data pipelines, like building more complex support for scheduling, and that kind of stuff, where right now it’s good, but it’s not quite yet where we want, or where we think the long-term potential is.
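
The "fan out" pattern for embarrassingly parallel work can be sketched with Python's standard library. This is only a local illustration using threads, not Modal's API; `fan_out` and `word_count` are hypothetical names, and on a platform like the one described each call would presumably run in its own container rather than in a thread.

```python
from concurrent.futures import ThreadPoolExecutor


def word_count(document):
    """Stand-in for an independent unit of work (one page to scrape, one file to transcode)."""
    return len(document.split())


def fan_out(func, inputs, max_workers=8):
    """Apply func to every input in parallel and return results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(func, inputs))


documents = ["a b c", "hello world", ""]
counts = fan_out(word_count, documents)
print(counts)
```

The key property is that no call depends on any other, so the work scales by simply adding workers.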

Daniel Whitenack

So Erik, I mentioned - and full disclosure to everyone in the world, I’m a huge fan of Modal, and have been using it a lot, and building things in it, including that side project I’m working on, Prediction Guard… And I just counted - I’m in the interface now - I have 129 Modal apps deployed right now.

Erik Bernhardsson

Wow!

Daniel Whitenack

[16:00] So I want to try to describe from my end… It’s hard, because this is an audio podcast, and talking about how things work without showing something visual is a little bit tough… But I’m gonna do my best at trying to describe how I would describe it, and then I’d love you to fill in the gaps, or correct me if I’m wrong at any point. So if you think about running something in Modal, you can write a Python script, like let’s say app.py, or whatever; you can have functions in that script. And then actually, one of the things I love is like dependencies is a really annoying part of particularly AI and ML workflows… So you can decorate certain functions in your code with like stub.function, and then define a Modal stub in your code, which is essentially like referencing a container with certain dependencies in it. And then when you execute your code, you say, “python app.py”, and when it gets to executing that function in your code which is decorated with the stub, it actually doesn’t run it locally; it spins up a container in Modal, and runs that in the cloud. So you can do this either by just calling that function, or you can actually deploy then your app and have that function be accessible as like a serverless function, or a web endpoint for your other applications or your other APIs to access. So I don’t know if I did a great job at describing that, Erik. That was my initial attempt. Feel free to make that more coherent.

Erik Bernhardsson

No, I think that’s exactly right. I think you’ve touched on a couple of points of Modal where we maybe think different about infrastructure than other – in particular this guy Swyx wrote a great blog post about it, it’s called a self-provisioning runtime. To me, it’s been kind of putting words to an idea that I always had around – it’s sort of similar, if you ever used a service like Pulumi, for instance, or like Terraform, or something like that… This sort of idea of infrastructure as code. But Modal has always gone further than that. It’s like infrastructure and the app code, like put it together in the same code, and have like the app itself define the infrastructure it needs to run.

So with Modal, in code you define the containers you need, including Python dependencies, or any other binary dependencies you need. You can have different functions using different containers, calling each other, just like Python functions, right? And it just like provisions itself. You can say, “This function should run on a GPU, this function should have 16 CPUs available, this other function is 128 gigabytes of RAM”, like in code. It’s zero config Modal. There’s not a YAML file. There’s nothing you can configure in Modal. Everything is in code. And to me, it all goes back to this idea of like how do you make developers productive and having the fast feedback loop? I think, traditionally, we’ve had to give that up, and basically make engineers run things locally in order to get the fast feedback loops they need… But then the problem is later they still need to deploy it to the cloud. And then you have a whole set of issues that then break, because the cloud is running in a different environment.
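
The idea of an app declaring its own infrastructure in code can be sketched with a toy decorator. Everything here is hypothetical and is not Modal's actual API: the `function` decorator, the `Resources` fields, and the `LAUNCHES` log are invented for illustration, and the sketch runs the code locally where a real platform would start a matching container.

```python
import functools
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    """Hypothetical per-function requirements, declared next to the code itself."""
    packages: tuple = ()
    gpu: bool = False
    cpu: int = 1
    memory_gb: int = 1


REGISTRY = {}   # function name -> declared resources
LAUNCHES = []   # record of the "containers" this toy runtime would have started


def function(**spec):
    """Hypothetical decorator that attaches resource requirements to a function."""
    resources = Resources(**spec)

    def decorate(func):
        REGISTRY[func.__name__] = resources

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # A real platform would provision a container matching `resources` here.
            # This sketch only records the request and then runs the code locally.
            LAUNCHES.append((func.__name__, resources))
            return func(*args, **kwargs)

        return wrapper

    return decorate


@function(packages=("torch",), gpu=True)
def embed(text):
    # Placeholder computation standing in for a model call.
    return [float(len(word)) for word in text.split()]


@function(cpu=16, memory_gb=128)
def summarize(texts):
    # Functions with different requirements call each other like plain Python.
    return [sum(embed(text)) for text in texts]


totals = summarize(["ab cde", "f"])
print(totals, [name for name, _ in LAUNCHES])
```

There is no separate configuration file in this sketch: the requirements live in the decorator arguments, which is the "everything is in code" point made above.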

This goes back to what I said maybe 20 minutes ago - what if you can take the infrastructure and bring it to the innermost loop of how you iterate; then you solve this problem with having different environments, because it’s always running in the cloud. And it’s fast enough that like it feels – some people even say Modal is faster than running things locally, even though it’s running in the cloud. You never ever have to think about these environment conflicts, because it’s always running in the exact same container at any time, and it’s fast enough that it’s not this frustrating thing where you have to build containers and push them around. You sort of get the best of two worlds. You get the developer productivity of running things locally, but you have the full power of the cloud, and all the power of containers, and GPUs, and whatever. I don’t know if that makes sense.

Daniel Whitenack

[19:59] Yeah, so Chris and I have talked about this at certain points in the podcast… I have always really had this disdain for maintaining a whole bunch of local environments as well. I’m not a Conda user. I have very minimal setup locally on my machine, and one of the things I think I kind of grasp onto is “Oh, well, I can develop now locally with Modal, and just import OS and import JSON”, and kind of normal(ish) things, and import Modal… But when I need to access Transformers, or PyTorch, or some random other package that normalizes Indic scripts or something… I actually have zero concern about setting that up locally to test, because I can just add that as a dependency in the Modal function. And that runs in the cloud in its own container, so I actually never even have to install that locally. Now, I could do that maybe before using a local build of a Docker image or something, but again, like you’re talking about, Erik, that has another cycle associated with it, which is also annoying.

Erik Bernhardsson

Yeah. It’s kind of annoying.

Daniel Whitenack

So yeah, I love that this… I can just think through – like, my imports are minimal… I can even run like a PyTest, and it’s just testing as it’s going to run in production, right? Because it’s running in a container in the cloud already. It’s running that function. Yeah. So that’s kind of like a lot of my love, I feel like, that I’ve enjoyed about it. What are the surprising ways that you’ve seen people use Modal, that maybe have been unlocked for users that were really either difficult for them before, or like “Oh, I didn’t expect people to do this with Modal”? Have you encountered any of those things that stand out?

Erik Bernhardsson

I mean, model inference in itself is a little bit of a serendipitous thing for us; we didn’t expect that people would do that. In general, we thought of Modal primarily initially as more of like a batch workhorse, something that helps you scale out. But we’ve seen a lot of traction on online inference and model deployments, and so for that reason, we’re focusing a lot on improving startup performance right now… Because when you’re doing all that inference, you have to spin up containers very quickly. You also have to load models very quickly. And especially when you’re dealing with GPUs, there’s a lot of, you know, overhead of copying models to GPUs etc. So it’s getting that down. That’s been a big focus of ours for the last few months.

I guess another thing I’ve been sort of surprised by is we enabled the functionality to set up WebHooks pretty easily. So in Modal you can define “Oh, make this function exposed to the web, and give it its own URL”, and now you can call this URL and it triggers something in Modal, it triggers some Python code. People started leveraging that for building full-blown web apps on Modal, which I was kind of surprised by… Like graphical UIs, and all kinds of stuff, and like hosting whole UIs. Because I never anticipated that being like a use case. I always thought of “Well, people are going to use, whatever, Vercel, Heroku maybe, for something like that.” But that’s been sort of interesting to see that a lot of people are using that, so it’s pretty promising. Maybe there’s something more to be done there. Like I tend to think our bread and butter is machine learning and AI, and data pipelines… So I don’t want to go all-in on sort of building more like a web hosting platform, but I think there’s something interesting. It’s sort of similar along the same lines.
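
Exposing a plain Python function at a URL can be sketched with the standard library's WSGI support. This is an assumption-laden toy, not how Modal implements web endpoints: `web_endpoint`, `ROUTES`, and the `/hello` path are hypothetical, and the `call` helper invokes the app in-process instead of over the network.

```python
from wsgiref.util import setup_testing_defaults

ROUTES = {}  # URL path -> handler function


def web_endpoint(path):
    """Hypothetical decorator that registers a function under a URL path."""
    def decorate(func):
        ROUTES[path] = func
        return func
    return decorate


@web_endpoint("/hello")
def hello(query):
    # The handler receives the raw query string and returns the response text.
    return f"hello {query or 'world'}"


def app(environ, start_response):
    """Minimal WSGI app that dispatches each request to the registered function."""
    handler = ROUTES.get(environ.get("PATH_INFO", ""))
    if handler is None:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"not found"]
    body = handler(environ.get("QUERY_STRING", "")).encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [body]


def call(path, query=""):
    """Invoke the app in-process, the way a test client would."""
    environ = {}
    setup_testing_defaults(environ)
    environ["PATH_INFO"] = path
    environ["QUERY_STRING"] = query
    statuses = []
    body = b"".join(app(environ, lambda status, headers: statuses.append(status)))
    return statuses[0], body.decode("utf-8")


print(call("/hello", "modal"))
```

To serve the same `app` over HTTP locally, `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` would work.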

A lot of people have been using us more for sort of job queues type things, more like almost like a replacement to Celery. The idea that they create a Modal function, and then they can enqueue work for it, and they never have to think about scaling, or deployment, approximation of Celery job queues… And that was also something we didn’t really think about, but a bunch of people have been telling us they actually do, so that’s kind of cool.
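
The job-queue usage described here, where callers enqueue work for a function and something else handles the workers, can be sketched locally with threads. This is not Modal's or Celery's API; `JobQueue`, `enqueue`, `wait`, and `resize` are hypothetical names, and a hosted platform would scale workers for you rather than use a fixed thread count.

```python
import queue
import threading


class JobQueue:
    """Toy job queue: callers enqueue arguments, background workers run the function."""

    def __init__(self, func, workers=2):
        self.func = func
        self.jobs = queue.Queue()
        self.results = {}
        self._lock = threading.Lock()
        self._next_id = 0
        for _ in range(workers):
            threading.Thread(target=self._work, daemon=True).start()

    def enqueue(self, *args):
        """Queue one call and return its job id immediately, without waiting."""
        with self._lock:
            job_id = self._next_id
            self._next_id += 1
        self.jobs.put((job_id, args))
        return job_id

    def _work(self):
        while True:
            job_id, args = self.jobs.get()
            try:
                result = self.func(*args)
                with self._lock:
                    self.results[job_id] = result
            finally:
                # Mark the job finished so wait() can return.
                self.jobs.task_done()

    def wait(self):
        """Block until every queued job has been processed, then return the results."""
        self.jobs.join()
        with self._lock:
            return dict(self.results)


def resize(width, height):
    # Placeholder for real work such as image processing.
    return (width // 2, height // 2)


jobs = JobQueue(resize)
ids = [jobs.enqueue(640, 480), jobs.enqueue(100, 50)]
done = jobs.wait()
print(done)
```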

Chris Benson

[23:57] So I’ve got a follow up question here… And you’ve both sort of covered it to some degree already, but as the person who has not yet had the chance to use it, I’m really curious, and I’m imagining there are a few people listening as well, that are wondering… Could you take us, Erik, through kind of a classic workflow with Modal? We’ve done that with other technologies that you may have heard on other episodes, and stuff, but I’m trying to get in my mind… Daniel is doing it all the time, but I’ve been left behind a little bit on this. Kind of just take us through a typical AI/ML workflow on Modal, just verbally, like what the steps are, just to kind of show us that simplicity. People probably are going to be thinking about whatever they were on previously, if they’re on some other platform; just as a point of comparison about how you’re doing that, I’m just kind of curious if you can – any example is fine.

Erik Bernhardsson

Yeah, I mean, we’ve optimized a lot for making it possible to deploy things and run things in the cloud in a few minutes, so it’s actually pretty straightforward. In Modal, you basically take any Python function… So let’s say you have a Python function that maybe uses Hugging Face just as an example, and it uses some off-the-shelf model [unintelligible 00:25:02.27] Stable Diffusion. And so let’s say you have an existing Python function that uses Hugging Face, and it takes a prompt, and it returns an image. Now you can decorate that Python function in Modal with a special decorator, and then annotate it and say, “Use this image” and then define an image in code using the special Modal syntax. You can also give it a Dockerfile, but it’s actually – almost everyone just does it in Python internally.

So in code, you can say, basically, [unintelligible 00:25:30.22] and then install these Python packages, like Transformers, and Accelerate, and diffusers, and a few other things. And then annotate the function to say “Use that image”, and then that’s pretty much it. Now you can run on the command line, you can do modal deploy, or modal run, and then it just takes that code, builds the container if it doesn’t exist, and runs it in the cloud. And that can typically really take less than – if the image is already built, it typically takes about a second to take the code locally, spawn a container in the cloud running that code… It works for any Python function.

Chris Benson

I mean, that’s dead simple right there.

Erik Bernhardsson

Yeah, and it works for any Python function. You can run pretty much any code you want, because we support fat containers, meaning you can install Python packages, you can install FFmpeg if you want to transcode some video, you can install whatever thing you want. And we have a lot of functionality for manipulating images and building dependencies, and doing pretty advanced stuff as a part of that. Pre-baking models into images is something people want to do sometimes to optimize cold start performance.

But yeah, getting started with Modal, we really optimized for having that sort of like magic experience the first time you try Modal, like making it easy to install the Python package, set up a token, and run code immediately in the cloud. We want that first experience to be magic, and sort of set a tone for what Modal is. In fact, at Modal we think there’s a better way to work with infrastructure in the cloud.

Daniel Whitenack

So one of the things I was wondering about, which - I guess it was a surprise to me; I didn’t really think about it when I was first using it. Everybody has a different setup, but usually I’ve got my code editor over here, and I’ve got my terminal over here on maybe another monitor, or something. So I’ve got both up. And I was writing my – it was a WebHook in Modal, and I had a Python app whatever… And when it’s a WebHook, then the code runs, and then Modal gives you this link where you can ping like a development WebHook. And of course, I never get my code right the first time around, so I bring up Postman or something and I try to hit that link. And of course, I get whatever error… And kind of without realizing I just went over to my code, and I fixed it, and I just saved the file, and I saw over here in my terminal, like it just redeployed and gave me the link again. I think that was a really cool surprise for me, I guess. It’s like “Oh, I can just keep this up over here in the terminal.”

[28:09] How does that work exactly, and was that something that you stumbled upon? Because I found that a really satisfying way to develop, because it’s like “Oh, I just keep this up, I keep modifying the file and trying it until it works.” And then I can just like Ctrl+C and say “modal deploy”, and then I’m done.

Erik Bernhardsson

Yeah, for sure. I know I’m harping on it, but kind of thinking about like feedback loops, and the sort of iteration speed… As a CTO I managed a lot of different teams. I managed data teams, some frontend teams, some backend teams… And it’s sort of interesting how different disciplines of software engineering have figured out their own iteration cycles, like the ability to get feedback loops very quickly… Backend engineers tend to write a lot of unit tests; that’s like their way. They write some code, and then they run all the unit tests, or maybe they run a specific unit test they know is gonna break, and they have that sort of way to get a fast feedback loop. If you go to frontend engineers, they have kind of a setup like you just described - they have like one monitor with a website, and then one monitor where they write code. And when they save, it just hot-reloads the code. So I feel like sometimes data and backend people don’t give enough credit to frontend engineers. They have really figured out a lot of stuff around like software engineering for like fast feedback loops, and… Actually, if you look at like the modern toolchain for frontend engineering, I actually think in many ways it’s like more advanced than any other part of software engineering.

So that is the sort of feedback loop that I wanted to have with Modal, and what I think makes engineers happy is that super-snappy feedback. You just have to save code, and then it’s live in the cloud. Yeah, so we built that specifically for the web-serving part of Modal, because that’s something you kind of want to have, is that ability to – it’s maybe less like sort of visual feedback, but it’s like the ability to deploy something in the cloud and then you can hit it with Postman or Curl or whatever, immediately.

Yeah, I mean, under the hood, it’s not super-complex. Actually, we refactored it yesterday; it’s kind of funny… We just monitor the file system, and then when we see that any file was updated, we just reload the entire app in a sub-process and live patch the app running in the cloud. So it’s pretty straightforward. We had a lot of that already built, so…
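The watch-and-reload loop described here can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Modal’s implementation: it polls file modification times instead of using OS file events, the names `snapshot`, `changed_files`, `reload_app`, and `watch` are hypothetical, and the step that live-patches the app running in the cloud is only marked with a comment.

```python
import subprocess
import sys
import time
from pathlib import Path


def snapshot(root: str) -> dict[str, float]:
    """Map every .py file under root to its last-modified time."""
    return {str(p): p.stat().st_mtime for p in Path(root).rglob("*.py")}


def changed_files(before: dict[str, float], after: dict[str, float]) -> set[str]:
    """Return paths that were added, removed, or modified between two snapshots."""
    return {
        path
        for path in before.keys() | after.keys()
        if before.get(path) != after.get(path)
    }


def reload_app(entrypoint: str) -> int:
    """Re-run the whole app in a fresh subprocess and return its exit code."""
    # Assumption: a real system would send the reloaded app definition
    # to the cloud at this point. This sketch only re-executes the file.
    return subprocess.run([sys.executable, entrypoint]).returncode


def watch(root: str, entrypoint: str, interval: float = 0.5) -> None:
    """Poll the file tree and reload the app whenever any file changes."""
    seen = snapshot(root)
    while True:
        time.sleep(interval)
        current = snapshot(root)
        if changed_files(seen, current):
            reload_app(entrypoint)
            seen = current
```

Calling `watch(".", "app.py")` would block until interrupted with Ctrl+C, re-running `app.py` after each save. Reloading in a subprocess rather than in-process means every module is imported fresh, so stale state from the previous version cannot leak into the new one.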

Daniel Whitenack

I think the problems you’ve been solving for like the past two years are probably really complicated for you to loop that into the category of really simple problems; I think that would probably be quite complicated for many, many people.

Erik Bernhardsson

Yeah, for sure. I guess it’s simple in the sense that it’s sort of – you know, we already built so much of the underlying complexity to make that relatively easy to support the hot reloading… Like, the fact that we already built so much complexity around like take code and deploy to the cloud, and do that very quickly… That’s a very nice foundation to then leverage the –

Daniel Whitenack

Yeah, it’s your sort of bread and butter.

Erik Bernhardsson

Yeah, yeah, like building that fast container, fast filesystem stuff. It’s a lot of cool stuff that that unlocks.

Chris Benson

So this is a particularly interesting episode, I would argue, for me, and probably for quite a few of our listeners that listen regularly… Because we’re talking about something - and Erik, we have the privilege of you as the person who’s created this, but we also have Daniel, whom I’ve been working closely with, and our listeners have been listening to… And hearing Daniel’s passion and him building his own business on your platform… And we talked to lots of different companies… And so it definitely has intrigued me in a way that not every different company owner, if you would, has. I’m kind of curious - I’m thinking about it from a slightly different perspective from Daniel, but you’ve really got me wondering how to make this happen. I work at a big company, as you know, we have big investments in kind of the big cloud providers, as all large companies do… What are good strategies for companies to say, “Okay, we have so much in these other big names and stuff that are out there…” How do we start using Modal effectively? What are the kinds of things you’ve seen your larger customers do in terms of migration over, or things that you might recommend, that enable something of a migration to be more seamless, less painful…

[32:18] Because normally, when you think of large company migrations, they are almost always fraught with pain, and misery, and challenges for the IT crews. So how do people get to this thing that we’re hearing about today, and mitigate all of those problems?

Erik Bernhardsson

Yeah. I mean, first of all, admittedly, we’re fairly early, and so a lot of our customer base is early-stage companies, like starting from a clean slate, who have absolutely zero infrastructure. And that makes it a little bit easier…

Chris Benson

It does.

Erik Bernhardsson

…in part because there’s like nothing legacy that they have to port over, in part also because they’re just desperate for tools, and so the sales process is a little bit easier for us… I find that the conversation when we talk to bigger customers is obviously quite different. First of all, there’s often an existing data platform that’s already built in-house; there’s, of course, also a security compliance question, and that’s something we’re working on. I think long-term there’s a lot of really cool stuff you can do around VPC peering, and other things to enable big companies to have the security guarantees that they need. But I also think it’s a separate conversation, where – at a bigger company, there’s one person who’s a decision-maker who has the credit card, there’s another person who built the data platform, who now is saying, “Oh, actually, we shouldn’t use that. We should use Modal instead.” And so it’s tougher competition. And then there’s maybe a data scientist; the third person is the data scientist, and they really want to deploy models, they don’t really care about the infrastructure, but [unintelligible 00:33:40.08] about Modal… I tend to think in those conversations it’s about finding a niche use case that’s low risk, that doesn’t sit in some sort of critical path where the whole business relies on it… And so it could be some sort of greenfield, something new, deploying a model or a very simple pipeline, something that maybe doesn’t touch super-sensitive data, or have super-critical guarantees. Something like a research project… That’s typically where I tend to start. And often, trying to find people, data scientists and machine learning engineers who feel like the platform team doesn’t really have time for them. They want something that lets them iterate quickly, without having to bother the ops team. Those are probably the easiest conversations to have inside the bigger companies.

Chris Benson

That’s good guidance. I appreciate that.

Daniel Whitenack

I think it’s very fitting that the platform is, at least right now - and please correct me if I’m wrong - very Python-centric in terms of like the development workflow and what’s supported. Do you see this being sort of like – like you said, you can support so many different types of jobs and apps in Modal… So on one side, you could say, “Well, this could become very general-purpose”, in some ways. Or it could fill a really niche gap, that obviously it is starting to fill, and just do that really well, and like continue to kind of go deeper there. What do you see as kind of the path forward? Or maybe it’s a both/and, with some things coming sooner than later…

Erik Bernhardsson

I think of Modal as my 20-year project. I’m finally building a tool I always wanted to have, and I want to spend the rest of my career doing that, ideally. My end goal is to build a very general-purpose set of tools that help data teams be more productive. That being said, kind of like what I said at the start of this show, I realize that’s almost like a megalomaniac mission. I think in practice it all comes down to finding something that resonates with customers, and drives growth, and validates demand, and then sort of sequencing, kind of layering on sort of adjacent products over time.

[35:58] We tend to think right now we have one sort of use case, and one sort of target persona that works really well right now, which is deploying online machine learning inference. I think that is an area where we see enormous amounts of demand and traction. So kind of how that fits into sequencing - I think an obvious next step for us is to make fine-tuning and training easier to do in Modal, but also thinking about pre-processing, scheduling, retraining that sort of happens in a loop, on a regular basis. Maybe thinking about how do you move your datasets into Modal to some extent too, and hosting more like stateful applications…

I think there’s a long list of things to sort of layer on step by step, more and more advanced features, and gradually expand to take over – because I think the demand is there. No one wants to have these 35 different point solutions that they have to integrate themselves, right? And a lot of the data landscape today, I think, is very fragmented. And as a result, a lot of data teams have to integrate so many different vendors, and kind of duct tape them together… I think there’s a big case to be made for either some sort of consolidation, or some sort of defragmentation of the space, where fewer vendors do more. So long-term, that’s absolutely my vision. We’re starting with this [unintelligible 00:37:14.08] Similarly in terms of languages, right? Like you mentioned, Python versus other languages - we think Python right now is a great place to start, because 90%-plus of data teams use Python. But I definitely think long-term, a lot of the infrastructure that we built is low-level, and it’s written in Rust; it doesn’t really care about what stuff it’s running. We think it could be great to add support for TypeScript, or R, or Go, or Rust, or whatever. So there’s many different axes to this, like in terms of how we think about sequencing and expansion.

Chris Benson

I’m just saying, you saw me raise my hand. I love Rust; it’s my current favorite language.

Erik Bernhardsson

Nice!

Chris Benson

Go and Rust are on the backend. But let me ask you a question that came to mind as you were going through that. As you’re kind of exploring the world, and you have certain areas of focus, but you’re also kind of able to stretch out depending on different parts of the strategy you have, how do you see kind of - to use a very generic, open term - the edge out there? …things that are not in the cloud. Do you see you doing anything in the future that would be kind of edge-based, or do you see yourself more as the cloud partner for things that might be out on the edge, and you have APIs and such available to those? How do you conceive either working with, or including the edge in your overall strategy?

Erik Bernhardsson

I think edge is primarily useful for like very, very latency-sensitive applications, and that’s probably a segment of the market that we just feel like that’s not what Modal is going to be good at. Because if you do things in like WASM, or V8 isolates… In that case, you can make it like kind of fast enough. But the way we focus on serverless right now is sort of fat, traditional Linux distributions in containers or VMs. And that just has – it’s always gonna have some non-trivial overhead; maybe a second, maybe eventually we can get it to a few hundred milliseconds…

I think the edge workloads that people talk about - that’s when you really need one millisecond, right? You’re really trying to – either you’re doing some IoT type controlling devices for manufacturing, or you’re doing high-performance CDN, SEO type stuff, where you want your website to be absurdly fast… Those types of workloads I don’t really think Modal is suited super-well for, and I’m more than happy to let other vendors dominate that space. We tend to think on the timescale of a few hundred milliseconds and up; that’s where we focus right now.

Chris Benson

That’s a great answer. And definitely – I mean, trying to address every problem out there in the larger space isn’t a successful approach. So when I talk to people and I hear “No, we’re not gonna go there”, I usually take that as a very good thing in terms of focus and good strategy… So - good to hear that.

Erik Bernhardsson

Cool. Yeah.

Daniel Whitenack

[40:04] As we wrap up here, I’d be curious to hear – obviously, you’re very passionate about this project, you want to work on it for 20 years, this is your life’s work, it sounds like… What are the things that are on your mind right now in terms of the things that you’re excited about seeing happen in Modal? And like over the next year, what are you most excited about seeing come to pass as you continue working on the project?

Erik Bernhardsson

The thing that I personally spend the most time on is probably figuring out the ergonomics of the SDK itself; like, in code, how do you express programs that execute in a distributed way in the cloud, and still making it feel like intuitive and easy to the user, without having to think about the fact that this function runs in a different container than this function.

We made that work reasonably well for online inference, but I think when you go to training and start dealing with file systems, there’s certain things that are still a bit like gnarly, and I’m working a lot on that right now. So making that user experience good and sort of intuitive I think is really important.

On a similar note, Modal right now is somewhat janky when you run it inside notebooks, for some particular reasons; I’m not gonna get into it, but it’s something I definitely want to make – the user experience, if you’re running Modal inside a notebook, I think should obviously be… You know, we need to fix that, too. It’s fine. It’s not like terrible, but I definitely don’t think it’s quite yet where it is if you run Modal in a script.

There’s always backend stuff… We definitely need to scale this up 10x or 100x the scale we are. We see a lot of demand… Modal does not have a publicly available sign up right now. Like, you sign up and you go on a waitlist. And part of it is that we just want to have a little bit more control over the scale; there’s a lot of work we need to do on the backend to build the foundational architecture running all of this stuff. It’s a very hard problem; it’s building, essentially, our own lambdas, our own Kubernetes.

There’s a lot of work we need to do on GPU support, and in particular, cold start with GPU models, and fast-loading of GPU models. So those are some – there’s a lot of cool work, we’re spending a lot of time on there, especially when it comes to like containers, and a general like isolation of VMs. It turns out that supporting GPUs in a secure way, in a multi-tenant environment is quite hard, so we’re going very deep - I’m reading about Linux Device Drivers, and CUDA, and trying to understand all those things.

Yeah, I mean, those are all the things we’re working on. I think, in a year’s time, Modal will see a lot more traction for other things than just online inference. We’re gonna see a lot of people using Modal for training, we’re going to see a lot of people using Modal for parallelization… I think we’re going to have many more customers on the enterprise side; right now we’re focusing very much on the startups, but we’re laying a lot of the security and compliance groundwork to be able to go upmarket. Yeah, those are some of the things we’re pretty excited about.

Daniel Whitenack

Yeah, yeah. There’s a lot to be excited about. And yeah, please pass on my personal thanks again to the Modal team for making me look good today, and recently. I’m really excited about what you’re doing, and we appreciate you taking time to chat with us.

Erik Bernhardsson

Yeah, of course. I’m also very excited about this, so always happy to talk about it.
