Stephan Ewen, Founder and CEO of Restate.dev, joins the show to talk about the coming era of resilient apps, the meaning of idempotency and what it takes to achieve it, this world of stateful durable execution functions, and when it makes sense to reach for this tech.
Stephan Ewen: Let's say you're recording the episodes and then every time an episode is done - no, let's do an AI thing here, or so.
So you're building your chat where you can chat with an episode, like "Okay, tell me where did they talk about this", or "Tell me what episodes talked about these topics", and so on. So what you're doing whenever an episode is done - you're feeding it first through a model that transcribes the audio, then you're chunking it up, feeding it through an embeddings model, storing it maybe in a vector database, and then you have kind of a RAG-style way of… You know, when a query comes, create the embedding, look up the similarity search in your vector database, feed it to the model to get the answer…
For something like this, if you - let's say you started just building the flow in, let's say, a Node.js application, like in a simpler way; you just said "Okay, here's the episode." I have something, it gets uploaded… Let's say you're uploading it to an S3 bucket and there's an event whenever something gets uploaded to this bucket; you have an event that represents this, and then it starts a Node.js script, or something like this. And this script is of the type that if it fails, somebody would have to restart it.
Now let's say you're trying to implement that with Restate. I would say approach it the following way… The first thing is get a handle of Restate itself; there's a cloud service that you can use on our site, which has a free tier. Either go there, or just use one of these ways to run it yourself on a single machine with an EBS volume.
[01:22:06.14] Then you have the server there. Then put your Node.js script maybe - you can actually put it on something like Lambda or ECS; just use a serverless option to host this. And then use the Restate SDK to define the entrypoint and tell Restate "Okay, hey, here's the service that you now should durably manage." So Restate will then go there and discover this and understand "Okay, hey, there's this (what do we call it?) video transcriber, or video embedder service." And then Restate knows about this. And then you would go to your Amazon console and say "Okay, for this type of event I want to create a webhook to Restate", so that it makes an invocation to Restate and says "Okay, this thing has been updated." You know, the kind of event that would previously call your Node.js process or script directly, you actually make it an HTTP call to Restate, and Restate would then call your process.
You've already gained one thing right away. You now basically have a reliable queue in front of it, right? Like, just that, if you don't do anything special. So when the webhook comes, it's going to be acknowledged back, and Restate has this [unintelligible 01:23:15.22] of your process if it crashes, it will retry this… It will actually give you nice observability, much more than you would get from your average message queue, about things like individual retries, configuration of timeouts and back-offs and timelines and so on.
As a next step, you would actually then go into your script and say "Okay, let's actually identify the steps where if something fails in this step, or after that step, I don't want it to go back." Like, let's say, forking the process that does the transcribing, or like calling the LLM to create the embeddings. You introduce then the Restate context that you get by using the Restate SDK and just say, "Okay, let me wrap these API calls just with restate.run." That will capture the results of this durably, and basically turn - you've now turned it basically into a workflow.
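To make the idea of a durably captured step concrete, here is a minimal sketch in plain TypeScript. It does not use the Restate SDK; `runStep`, `processEpisode`, and `invokeWithRetries` are hypothetical names, and the in-memory `Map` stands in for a journal that a real system would persist. It only illustrates the behavior described above: once a step has completed, a retry does not go back and redo it.

```typescript
// Toy model of a durable step. NOT the Restate SDK, just the idea behind it.
// A journal records each completed step's result; on retry, recorded steps
// are read back from the journal instead of being executed again.
type Journal = Map<string, unknown>;

async function runStep<T>(journal: Journal, name: string, action: () => Promise<T>): Promise<T> {
  if (journal.has(name)) {
    return journal.get(name) as T; // step already completed in an earlier attempt
  }
  const result = await action();
  journal.set(name, result); // a real system would write this to durable storage
  return result;
}

// Counters so the effect of retries is observable.
let transcribeCalls = 0;
let embedAttempts = 0;

// Hypothetical pipeline: transcribe, then embed. Both bodies are stand-ins.
async function processEpisode(journal: Journal, episodeId: string): Promise<number[]> {
  const transcript = await runStep(journal, "transcribe", async () => {
    transcribeCalls++;
    return `transcript of ${episodeId}`;
  });
  return runStep(journal, "embed", async () => {
    embedAttempts++;
    if (embedAttempts === 1) throw new Error("embedding service unavailable");
    return [transcript.length];
  });
}

// Re-run the whole handler against the same journal until it succeeds.
async function invokeWithRetries(episodeId: string, maxAttempts = 5): Promise<number[]> {
  const journal: Journal = new Map();
  let lastError: unknown = new Error("no attempts made");
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await processEpisode(journal, episodeId);
    } catch (err) {
      lastError = err; // completed steps stay in the journal for the next attempt
    }
  }
  throw lastError;
}
```

In this sketch the first embedding attempt fails, the handler is retried from the top, and the transcription step is served from the journal rather than executed a second time.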
Let's say you want to do something like parallelize the different steps. Maybe just piping this one by one through this embeddings model is a little tricky thing… You want to fan out. You could then go and say "Let me try and do the exact same thing I do in a regular Node process and just make a bunch of function calls, remember the promises", sort of await a Promise.all for those, in the end join the results, and put those in the database. You can do exactly that in your code. Just, again, anchor this in the Restate context, so you get like this durable parallelization, durable stuff like scatter, gather, and so on. And so you would then incrementally sort of rewrite your code to say "Okay, let's actually make this step durable, let's make that step durable, and that step durable."
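The fan-out described here can be sketched the same way. This is again a toy in plain TypeScript, not the Restate SDK: `durableStep`, `embedChunk`, and `embedAll` are hypothetical, and the journal is an in-memory `Map`. Each chunk gets its own journaled step, the steps run concurrently, and `Promise.all` joins the results in order.

```typescript
// Toy scatter-gather. NOT the Restate SDK. Each chunk is embedded in its own
// journaled step; the steps run concurrently and Promise.all joins them.
const embedJournal = new Map<string, number[]>();
let embedCalls = 0;

// Hypothetical embedding call; a real one would call a model API.
async function embedChunk(chunk: string): Promise<number[]> {
  embedCalls++;
  return [chunk.length];
}

// Return the recorded result if the step already completed, else run and record it.
async function durableStep(name: string, action: () => Promise<number[]>): Promise<number[]> {
  const recorded = embedJournal.get(name);
  if (recorded !== undefined) return recorded;
  const result = await action();
  embedJournal.set(name, result);
  return result;
}

async function embedAll(chunks: string[]): Promise<number[][]> {
  // Scatter: start one step per chunk without awaiting in between.
  const pending = chunks.map((chunk, i) => durableStep(`embed-${i}`, () => embedChunk(chunk)));
  // Gather: results come back in the same order as the input chunks.
  return Promise.all(pending);
}
```

Calling `embedAll` a second time with the same chunks, as a retry would, reads every result from the journal and makes no further embedding calls.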
Say as a next thing maybe one of your folks wants to approve it before it really goes out. So you then possibly - you say "Let's do that in the simplest possible way", which is we create an awakeable or a durable promise in Restate and say "Okay, somebody needs to complete this actively", like send an event, make an HTTP call to complete this and say "Okay, this is approved, go through" or "No, this is not approved. Abort." You can then use, for example - you could put the result, the transcription, just in Restate, save it, and somebody could look at it from the UI and then say "Okay, yeah, I'm making an API call here to approve this", and it continues. And so you can then incrementally rebuild your process into durable steps.
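The approval step boils down to a promise that something outside the handler completes. The sketch below models that in plain TypeScript; it is not the Restate SDK, the promise here lives only in memory rather than durably, and `createAwakeable`, `completeApproval`, and `publishAfterApproval` are hypothetical names chosen for illustration.

```typescript
// Toy awakeable. NOT the Restate SDK. A promise is registered under an id so
// that something outside the handler can resolve or reject it later.
type PendingApproval = { resolve: (value: string) => void; reject: (error: Error) => void };
const pendingApprovals = new Map<string, PendingApproval>();
let nextApprovalId = 0;

function createAwakeable(): { id: string; promise: Promise<string> } {
  const id = `approval-${nextApprovalId++}`;
  const promise = new Promise<string>((resolve, reject) => {
    pendingApprovals.set(id, { resolve, reject });
  });
  return { id, promise };
}

// What an HTTP endpoint behind a review UI might call.
function completeApproval(id: string, approved: boolean): void {
  const entry = pendingApprovals.get(id);
  if (entry === undefined) throw new Error(`unknown approval id: ${id}`);
  pendingApprovals.delete(id);
  if (approved) entry.resolve("approved");
  else entry.reject(new Error("not approved"));
}

async function publishAfterApproval(
  transcript: string,
  onWaiting: (id: string) => void,
): Promise<string> {
  const { id, promise } = createAwakeable();
  onWaiting(id); // e.g. show the id and the transcript to a reviewer
  await promise; // throws if the reviewer rejects, which aborts publishing
  return `published: ${transcript}`;
}
```

The handler hands out the id and then waits; the reviewer's API call carries that id back, and the handler either continues or aborts.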
As the next thing you could then, for example, take it and migrate it from a long-running process to a Lambda function. Because one of the nice things you have with durable execution is when it's waiting for something else to happen, it can actually just make this thing go away. Because with durable execution, it knows how to recover back to the place where it was by replaying the history of durable steps.
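A small sketch of "go away and replay back", assuming a journal that survives between invocations. This is a toy in plain TypeScript, not the Restate SDK, and `step`, `waitForSignal`, `handler`, and `invoke` are hypothetical. The handler exits when it reaches a wait that has no answer yet; a later invocation reruns it from the top, and the journal turns the already-completed steps into lookups.

```typescript
// Toy suspend-and-replay. NOT the Restate SDK. The handler exits when it hits
// an unanswered wait; a later invocation replays the journal to catch up.
const SUSPEND = { reason: "suspended" };
const executed: string[] = []; // records which step bodies actually ran

function step(journal: Map<string, string>, name: string, action: () => string): string {
  const recorded = journal.get(name);
  if (recorded !== undefined) return recorded; // replayed, not re-executed
  const result = action();
  journal.set(name, result);
  return result;
}

function waitForSignal(journal: Map<string, string>, name: string): string {
  const value = journal.get(name);
  if (value === undefined) throw SUSPEND; // nothing to do yet: stop running
  return value;
}

function handler(journal: Map<string, string>): string {
  const transcript = step(journal, "transcribe", () => {
    executed.push("transcribe");
    return "text";
  });
  const decision = waitForSignal(journal, "approval");
  return step(journal, "publish", () => {
    executed.push("publish");
    return `${transcript}:${decision}`;
  });
}

// Returns null when the handler suspended instead of finishing.
function invoke(journal: Map<string, string>): string | null {
  try {
    return handler(journal);
  } catch (err) {
    if (err === SUSPEND) return null;
    throw err;
  }
}
```

With an empty journal, the first `invoke` runs the transcription and returns `null`. Once an `approval` entry is added to the journal, the next `invoke` skips the transcription body and runs only the publish step.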
[01:26:14.28] So you could then say - you know, if you're on vacation and you approve it a week later, you don't have some process running and waiting for it. It's just like, it's going to go away, and when the approval finally comes, it's going to come back, use the durable steps to replay back to the point, and then do the remaining steps. And so typically, folks would incrementally then rework their non-durable services; first connect them to Restate to basically get the equivalent of a durable queue, and then incrementally rework it and say "Okay, I want to add durable steps here, maybe parallelization, maybe a signal", and…