Stephan Ewen, Founder and CEO of Restate.dev, joins the show to talk about the coming era of resilient apps, what idempotency means and what it takes to achieve it, this world of stateful durable execution functions, and when it makes sense to reach for this tech.
Stephan Ewen: Let’s say you’re recording the episodes and then every time an episode is done – no, let’s do an AI thing here, or so.
So you’re building your chat where you can chat with an episode, like “Okay, tell me where did they talk about this”, or “Tell me what episodes talked about these topics”, and so on. So what you’re doing whenever an episode is done - you’re feeding it first through a model that transcribes the audio, then you’re chunking it up, feeding it through an embeddings model, storing the results maybe in a vector database, and then you have kind of a RAG-style way of… You know, when a query comes, create the embedding, do the similarity search in your vector database, feed it to the model to get the answer… For something like this, if you – let’s say you started just building the flow in, let’s say, a Node.js application, like in a simpler way; you just said “Okay, here’s the episode.” I have something, it gets uploaded… Let’s say you’re uploading it to an S3 bucket and there’s an event whenever something gets uploaded to this bucket; you have an event that represents this, and then it starts a Node.js script, or something like this. And this script is of the type that if it fails, somebody would have to restart it.

Now let’s say you’re trying to implement that with Restate. I would say approach it the following way… The first thing is get a hold of Restate itself; there’s a cloud service that you can use on our site, which has a free tier. Either go there, or just use one of the ways to run it yourself on a single machine with an EBS volume.
[01:22:06.14] Then you have the server there. Then put your Node.js script maybe – you can actually put it on something like Lambda or ECS; just, like, use a serverless option to host this. And then use the Restate SDK to define the entrypoint and tell Restate “Okay, hey, here’s the service that you now should durably manage.” So Restate will then go there and discover this and understand “Okay, hey, there’s this (what do we call it?) video transcriber, or video embedder service.” And then Restate knows about this. And then you would go to your Amazon console and say “Okay, for this type of event I want to create a webhook to Restate”, so that it makes an invocation to Restate and says “Okay, this thing has been updated.” You know, the kind of event that would previously call your Node.js process or script directly, you actually make it an HTTP call to Restate, and Restate would then call your process.
You’ve already gained one thing right away. You now basically have a reliable queue in front of it, right? Like, just that, even if you don’t do anything special. So when the webhook comes, it’s going to be acknowledged back, and Restate has this [unintelligible 01:23:15.22] of your process if it crashes, it will retry this… It will actually give you nice observability; much more than you would get from your average message queue, about, like, individual retries, configuration about timeouts and back-offs and timelines and so on.
As a next step, you would actually then go into your script and say “Okay, let’s actually identify the steps where if something fails in this step, or after that step, I don’t want it to go back.” Like, let’s say, forking the process that does the transcribing, or like calling the model to create the embeddings. You then introduce the Restate context that you get by using the Restate SDK and just say, “Okay, let me wrap these API calls just with restate.run.” That will capture the results of this durably, and basically turn – you’ve now turned it basically into a workflow.
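To make the idea of wrapping calls in a durable step concrete, here is a minimal sketch of the mechanism in plain TypeScript. It is a toy model, not the Restate SDK: the journal is just an in-memory `Map`, and `ToyContext`, `processEpisode`, `invokeWithRetries`, and the fake `transcribe` and `embed` functions are all hypothetical names made up for illustration. It only shows why a completed step is not executed again when a later step fails and the handler is retried.

```typescript
// Toy model of durable steps. This is NOT the Restate SDK: the journal is an
// in-memory Map, and every name below is hypothetical.
type Journal = Map<string, unknown>;

class ToyContext {
  private journal: Journal;

  constructor(journal: Journal) {
    this.journal = journal;
  }

  // Execute `action` at most once per step name. On a retry, the recorded
  // result is returned and `action` is skipped.
  async run<T>(name: string, action: () => Promise<T>): Promise<T> {
    if (this.journal.has(name)) {
      return this.journal.get(name) as T;
    }
    const result = await action();
    this.journal.set(name, result); // a real system would persist this
    return result;
  }
}

// Counters so we can observe how often each side effect really ran.
const calls = { transcribe: 0, embed: 0 };
let embedShouldFail = true; // simulate one transient failure

async function transcribe(episode: string): Promise<string> {
  calls.transcribe++;
  return `transcript of ${episode}`;
}

async function embed(text: string): Promise<number[]> {
  calls.embed++;
  if (embedShouldFail) {
    embedShouldFail = false;
    throw new Error("embedding service unavailable");
  }
  return [text.length]; // stand-in for a real embedding vector
}

async function processEpisode(ctx: ToyContext, episode: string): Promise<number[]> {
  const transcript = await ctx.run("transcribe", () => transcribe(episode));
  return ctx.run("embed", () => embed(transcript));
}

// Retry the whole handler against the same journal, roughly as a durable
// runtime would after a failure.
async function invokeWithRetries(episode: string, maxAttempts: number): Promise<number[]> {
  const journal: Journal = new Map();
  let lastError: unknown = new Error("no attempts made");
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await processEpisode(new ToyContext(journal), episode);
    } catch (err) {
      lastError = err; // a real runtime would back off before retrying
    }
  }
  throw lastError;
}
```

Calling `invokeWithRetries("episode-1", 3)` in this sketch runs the transcription once and the embedding twice: the second attempt reads the transcript back from the journal instead of transcribing again.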
Let’s say you want to do something like parallelize the different steps. Maybe just piping this one by one through this embeddings model is a little tricky thing… You want to fan out. You could then go and say “Let me try and do the exact same thing I do in a regular Node process and just make a bunch of function calls, record, like remember the promises”, sort of await a Promise.all for those, in the end join the results, and put those in the database. You can do exactly that in your code. Just, again, anchor this in the Restate context, so you get like this durable parallelization, durable stuff like scatter-gather, and so on. And so you would then incrementally sort of rewrite your code to say “Okay, let’s actually make this step durable, let’s make that step durable, and that step durable.”
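The fan-out described here can be sketched the same way. Again, this is a toy model and not the Restate SDK: `ToyContext` with its in-memory journal, `embedChunk`, and `embedAll` are hypothetical names. The sketch uses plain `Promise.all`; a real durable runtime may ask you to use its own combinator so that the joining itself is recorded, so treat that part as an assumption.

```typescript
// Toy fan-out/fan-in. NOT the Restate SDK: ToyContext and its in-memory
// journal are hypothetical stand-ins for a durable execution context.
class ToyContext {
  private journal = new Map<string, unknown>();

  // Run a named step once; later calls with the same name reuse the result.
  async run<T>(name: string, action: () => Promise<T>): Promise<T> {
    if (this.journal.has(name)) {
      return this.journal.get(name) as T;
    }
    const result = await action();
    this.journal.set(name, result);
    return result;
  }
}

let embedCalls = 0; // counts how often the fake model is really called

async function embedChunk(chunk: string): Promise<number[]> {
  embedCalls++;
  return [chunk.length]; // stand-in for a real embedding vector
}

async function embedAll(ctx: ToyContext, chunks: string[]): Promise<number[][]> {
  // Scatter: start one step per chunk without awaiting each in turn.
  const pending = chunks.map((chunk, i) =>
    ctx.run(`embed-${i}`, () => embedChunk(chunk)),
  );
  // Gather: wait for all of them; results come back in chunk order.
  return Promise.all(pending);
}
```

Each chunk gets its own step name (`embed-0`, `embed-1`, and so on), so if the handler is run again against the same context, chunks that already finished are read from the journal and only the missing ones would be recomputed.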
Say as a next thing maybe one of your folks wants to approve it before it really goes out. So you then possibly – you say “Let’s do that in the simplest possible way”, which is we create an awakeable or a durable promise in Restate and say “Okay, somebody needs to complete this actively”, like send an event, make an HTTP call to complete this and say “Okay, this is approved, go through” or “No, this is not approved. Abort.” You can then use, for example – you could put the result, the transcription, just in Restate, save it, and somebody could look at it from the UI and then say “Okay, yeah, I’m making an API call here to approve this”, and it continues. And so you can then incrementally rebuild your process into durable steps.
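As a rough illustration of the approval step, the sketch below models a promise that only an outside caller can complete. It is not the Restate SDK and nothing here is persisted: `ApprovalRegistry`, `publishTranscript`, and `notifyReviewer` are hypothetical names, and the registry lives in memory, so it only shows the shape of the interaction (create a handle, hand its id to a reviewer, wait, then continue or abort).

```typescript
// Toy approval gate. NOT the Restate SDK: ApprovalRegistry is a hypothetical,
// in-memory stand-in for an awakeable / durable promise.
type Decision = "approved" | "rejected";

class ApprovalRegistry {
  private waiting = new Map<string, (decision: Decision) => void>();
  private nextId = 0;

  // Create a promise that only an external caller can complete.
  create(): { id: string; promise: Promise<Decision> } {
    const id = `approval-${this.nextId++}`;
    const promise = new Promise<Decision>((resolve) => {
      this.waiting.set(id, resolve);
    });
    return { id, promise };
  }

  // Called from outside (for example an HTTP handler) to complete the promise.
  // Returns false if the id is unknown or was already completed.
  complete(id: string, decision: Decision): boolean {
    const resolve = this.waiting.get(id);
    if (!resolve) {
      return false;
    }
    this.waiting.delete(id);
    resolve(decision);
    return true;
  }
}

async function publishTranscript(
  registry: ApprovalRegistry,
  transcript: string,
  notifyReviewer: (id: string, transcript: string) => void,
): Promise<string> {
  const { id, promise } = registry.create();
  notifyReviewer(id, transcript); // hand the id to whoever approves
  const decision = await promise; // the flow waits here for the reviewer
  if (decision === "rejected") {
    throw new Error("transcript was not approved");
  }
  return `published: ${transcript}`;
}
```

The reviewer side only needs the id: whatever UI or API call they use ends up calling something like `registry.complete(id, "approved")`, and the waiting flow picks up from there.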
As the next thing you could then, for example, take it and migrate it from a long-running process to a Lambda function. Because one of the nice things you have with durable execution is when it’s waiting for something else to happen, it can actually just make this thing go away. Because with durable execution, it knows how to recover back to the place where it was by replaying the history of durable steps.
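The "go away and come back" behavior can be illustrated with a small replay sketch. This is a toy model under stated assumptions, not how Restate is implemented: the journal is an in-memory `Map`, and `step`, `handler`, and the `Outcome` type are hypothetical. The handler returns early when the approval has not arrived; when it is invoked again later, finished steps are read from the journal rather than executed a second time.

```typescript
// Toy suspend-and-replay. NOT the Restate SDK: the journal, step helper, and
// handler are hypothetical and kept in memory for illustration.
type Journal = Map<string, unknown>;

type Outcome =
  | { status: "suspended"; waitingFor: string }
  | { status: "done"; value: string };

// Records which side effects actually executed, and in what order.
const sideEffects: string[] = [];

// Run a step once; on replay, return the journaled result instead.
function step<T>(journal: Journal, name: string, action: () => T): T {
  if (journal.has(name)) {
    return journal.get(name) as T;
  }
  const result = action();
  journal.set(name, result);
  return result;
}

function handler(journal: Journal): Outcome {
  const transcript = step(journal, "transcribe", () => {
    sideEffects.push("transcribe");
    return "transcript";
  });

  // If the approval has not arrived yet, stop here. Nothing keeps running;
  // the journal is all that is needed to get back to this point.
  const decision = journal.get("approval") as string | undefined;
  if (decision === undefined) {
    return { status: "suspended", waitingFor: "approval" };
  }

  const value = step(journal, "publish", () => {
    sideEffects.push("publish");
    return `${transcript}:${decision}`;
  });
  return { status: "done", value };
}
```

In this sketch, the first call to `handler` transcribes and then suspends. Once something writes the approval into the journal, a second call skips the transcription, sees the decision, and runs only the publish step.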
[01:26:14.28] So you could then say – you know, if you’re on vacation and you approve it a week later, you don’t have some process running and waiting for it. It’s just like, it’s going to go away, and when the approval finally comes, it’s going to come back, use the durable steps to replay back to the point, and then do the remaining steps. And so typically, folks would incrementally then rework their non-durable services; first connect them to Restate to basically get the equivalent of a durable queue, and then incrementally rework it and say “Okay, I want to add durable steps here, maybe parallelization, maybe a signal”, and…