Workflow orchestration has always been a pain for data scientists, but it’s exacerbated in these AI hype days by agentic systems executing arbitrary (not pre-defined) workflows with a variety of failure modes. Adam from Prefect joins us to talk through their open source Python library for orchestration of, and visibility into, Python-based pipelines. Along the way, he introduces us to things like Marvin, their AI engineering framework, and ControlFlow, their agent workflow system.
Adam Azzam: If I go back to – I gave like five or six theses of things where I was encountering failure a lot… And I think that some of those are sources of failure that many folks are familiar with. I’m calling out to an external service, and the service is flaky, and it’s bad… That’s existed forever. As long as people have been building data pipelines, upstream sources have been flaky - that makes sense. Or hitting deterministic errors, like I’m ingesting data, but somebody added a new field, or they removed a field that I’m dependent on, and now my pipeline’s broken. Or I’m scraping something, and target.com, instead of labeling the name of their product with a div whose name or ID is this, they’ve changed it to that, and now all of my data is corrupted, because I couldn’t detect that in real time.
[00:16:12.02] And then the last piece is the loading part, where you’ve gotten and cleaned all your data, and now you want to put it in a persistent place where you can query it, or do analytics on it. So that’s like classical stuff: the extraction, calling out to an external service; the transformation that you’re doing deterministically; and the loading. The whole ETL business of it. That’s a persistent problem that existed long before Gen AI or ML workflows… And that’s sort of what’s been category-defining for workflow orchestration. ETL-type jobs are the single use case that people usually bring in an orchestrator for.
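To make that concrete, here’s a minimal sketch of the classical ETL pattern in Prefect, with retries on the flaky extraction step. The endpoint URL, field names, and load target are placeholders, not anything from the episode:

```python
import httpx
from prefect import flow, task


@task(retries=3, retry_delay_seconds=10)
def extract() -> list[dict]:
    # The upstream service may be flaky; Prefect re-runs this task on failure.
    resp = httpx.get("https://api.example.com/jobs")  # placeholder endpoint
    resp.raise_for_status()
    return resp.json()


@task
def transform(records: list[dict]) -> list[dict]:
    # Deterministic cleaning; a removed or renamed upstream field fails loudly here.
    return [{"title": r["title"], "location": r["location"]} for r in records]


@task
def load(rows: list[dict]) -> None:
    # Persist somewhere queryable; stubbed out for the sketch.
    print(f"loaded {len(rows)} rows")


@flow
def etl_flow():
    load(transform(extract()))


if __name__ == "__main__":
    etl_flow()
```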
I would say that what’s unique these days is that workflows in LLM land are now more dynamic, so you really can’t plot out every single thing that’s going to happen from the start. We’re dealing in English now, and you may not know the full space of responses at the beginning of a workflow. So I think that LLMs introduce a dynamism component that’s hard to reason about, and that has kind of escaped classical workflow orchestration.
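One way to picture that dynamism, as a hedged sketch: in Prefect you can fan tasks out over work that only materializes at runtime. `plan_with_llm` below is a hypothetical stand-in for an LLM call, not a real API:

```python
from prefect import flow, task


def plan_with_llm(prompt: str) -> list[str]:
    # Hypothetical stand-in for an LLM call; in practice you don't know
    # how many steps (or which ones) come back until runtime.
    return ["summarize", "classify", "extract entities"]


@task
def run_step(step: str) -> str:
    return f"done: {step}"


@flow
def dynamic_flow(prompt: str) -> list[str]:
    steps = plan_with_llm(prompt)   # decision space unknown up front
    futures = run_step.map(steps)   # fan out over whatever came back
    return [f.result() for f in futures]
```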
I think the second piece is that the nature of errors here just feels totally new. The fact that you can ask an LLM for a particular shape of a response, and then get parsing errors out the other end of it - that’s a new source of failure. It’s now largely buttoned up by some commercial LLM providers that give you structured, guaranteed output, so you don’t see as many parsing errors… But - I like to joke that you can lead an LLM to JSON, but you cannot make it think. You can say “Look, I’ve got this job description, and I’m going to give you a schema that says the job title, the location”, or whatever. And sometimes you’ll say the title is required, it has to be a string, and the response you get back is “I’m sorry, but I could not find a title.” And when you’re doing this across tens of thousands of jobs, now you have to reason about “Okay, I’ve gotten past the parsing error, but the error was pushed deeper down the stack.” Now there are data quality errors I have to reason about that I didn’t really have to account for in last generation’s ETL. Things were much more deterministic; we had stronger contracts about what you were going to get.
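That “lead an LLM to JSON” failure mode is easy to sketch with Pydantic: the JSON parses fine, but the content is an apology. Everything here is illustrative; `call_llm` is a hypothetical stub, and the refusal check is just one naive heuristic:

```python
from pydantic import BaseModel, ValidationError, field_validator


class JobPosting(BaseModel):
    title: str      # required, must be a string
    location: str

    @field_validator("title")
    @classmethod
    def reject_refusals(cls, v: str) -> str:
        # The JSON can parse while the content is still a refusal:
        # "I'm sorry, but I could not find a title" is a perfectly valid string.
        if v.lower().startswith(("i'm sorry", "i am sorry", "i could not")):
            raise ValueError("model returned an apology instead of a title")
        return v


def call_llm(prompt: str) -> str:
    # Hypothetical stand-in for a real LLM client call.
    return '{"title": "Data Engineer", "location": "Remote"}'


def parse_job(raw_json: str, max_attempts: int = 3) -> JobPosting:
    for _ in range(max_attempts):
        try:
            return JobPosting.model_validate_json(raw_json)
        except ValidationError:
            # Parsing or data-quality failure: ask again rather than crash.
            raw_json = call_llm("Re-extract this job posting as JSON")
    raise RuntimeError("no valid posting after retries")
```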
And then I would say the last piece is – what makes this harder is that so far… And I hate to keep throwing out random definitions. I hate being a merchant of complexity, and talking about why things are super-hard… But trying to at least motivate why this is a new source of difficulty, right? Like, we’ve got tools to handle this, but why do we have these tools in the first place? And the last piece is around like agentic workflows. Now, this is a buzz term… So what do I mean when I say “agentic workflows”? Everything that I’ve talked about so far, of like I get a document, and I want to extract stuff from it. Or maybe I want to classify it, or I want to summarize it. These are all sort of modern takes on classical ML problems, right? You don’t have to bring as much training data, you don’t have to train a model first… You’re basically throwing the weight of the compressed internet at every problem that you come across.
But with agentic workflows - what I mean by that is things that operate in a loop, are able to call out to external tools should they choose to, and can create, refine and reflect on their own plans. Which means when you’re orchestrating agentic workflows, you have to do this interplay over who’s doing the orchestration… There are times where I’m coming in with a plan, and I’m saying “First extract this topic, then classify it, then write an email and send it off.” But now I have to be able to add resiliency to a workflow that I’m unaware of at the beginning of it. So if the agent decides “Call out to this tool, call out to this API”, I need to be able to reason about resiliency for a workflow that I don’t have any visibility into at the beginning.
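To show where that orchestration hand-off happens, here’s a deliberately simple agent loop in plain Python - not ControlFlow’s actual API, just a sketch under the assumption that resiliency gets wrapped around the tool boundary, since you can’t know up front which tools the agent will pick:

```python
import time


def search_tool(query: str) -> str:
    # Hypothetical tool the agent may or may not choose to call.
    return f"results for {query!r}"


TOOLS = {"search": search_tool}


def call_tool(name: str, arg: str, retries: int = 3) -> str:
    # Retries live at the tool boundary, because the orchestrator can't
    # see the agent's plan at the beginning of the workflow.
    for attempt in range(retries):
        try:
            return TOOLS[name](arg)
        except Exception:
            time.sleep(2 ** attempt)  # exponential backoff
    raise RuntimeError(f"tool {name!r} failed after {retries} attempts")


def agent_loop(goal: str, max_steps: int = 5) -> list[str]:
    plan = [("search", goal)]  # seed step; a real agent would ask the LLM
    observations = []
    for _ in range(max_steps):
        if not plan:
            break
        name, arg = plan.pop(0)
        observations.append(call_tool(name, arg))
        # A real agent would reflect here and append new (name, arg) steps
        # to `plan` - the part of the workflow you can't see up front.
    return observations
```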
And so I would say that those last three pieces - around parsing, around not knowing your full decision space at the beginning, and around having to hand off some bits of orchestration to the LLM so it can create its own tasks that you then have to execute - I think that’s what makes the new generation of orchestration a much harder and a much more interesting problem.
Break: [00:20:44.08]