We all hear a lot about MLOps these days, but where does MLOps end and DevOps begin? Our friend Luis from OctoML joins us in this episode to discuss treating AI/ML models as regular software components (once they are trained and ready for deployment). We get into topics including optimization on various kinds of hardware and deployment of models at the edge.
Welcome to another episode of Practical AI. This is Daniel Whitenack. I am a data scientist with SIL International. I’m joined as always by my co-host, Chris Benson, who is a tech strategist at Lockheed Martin. How are you doing, Chris?
I’m doing very well, Daniel. It’s a beautiful spring day here in Atlanta, and we are going to have a good time for the next hour or so.
Yeah, definitely. It’s interesting, the topic of MLOps has increasingly come up on the show, and we’ve had different takes on the topic, and I’m really excited today to have a different perspective on that, and to welcome back to the show, Luis Ceze, who is CEO of OctoML. Welcome back, Luis.
Thank you, Daniel. Thank you, Chris. It’s great to be back here. I had a lot of fun almost a year ago now.
It was almost a year. It’s crazy, yeah.
A lot has happened since then, and right now it’s also a beautiful spring day in Seattle as well.
Yeah, it was about a year ago we talked through some things about Apache TVM and OctoML. Do you want to give just a quick update on maybe the Apache TVM world, and then maybe circle back over to OctoML and what’s been happening with OctoML in the meantime?
Yeah, absolutely. So on Apache TVM, lots of progress there. The community kept growing nicely and steadily with a lot of fantastic people doing work on machine learning systems, compilers, and so on. So TVM, there’s a lot of progress on automation and better performance – we call it performance automation to make it easier to get to high-performance machine learning code on different hardware. We held our TVM conference in December last year, the largest ever. We had just about 1,600 registrants and 700–
Oh, wow. That’s awesome.
…and 700 people actually attending live, and then more folks that consumed the content after that. And it was really nice to see contributors to TVM, but also folks from the general machine learning acceleration community, and a hundred vendors participate, and cloud providers participate, and so on.
[03:55] Also related to Apache TVM, we announced the TVM Unity effort, which is essentially an effort on bringing together all the key threads in performance automation, extensibility, and so on; and integration with the rest of the ecosystem front and center on TVM. So our view there is really what we call not too opinionated on how you actually get a model to run well on the hardware targets, but about how do you actually enable people to do what they want productively, including using other pieces of the ecosystem. So yeah, TVM is moving along really, really well, so it’s great to see.
And now on the OctoML side - so since May last year, we more than doubled. The team is about 130 people now. We made significant changes to our platform, to the SaaS platform that uses TVM as one of its key components to automate the process of deploying machine learning models. And we recently released also a private accelerated model hub, which is a set of models that are pre-accelerated to a bunch of different hardware, so folks can see the power of the platform in automating the process of getting models from the hands of data scientists into deployable artifacts. We also formed a lot of formal partnerships with key hardware vendors like AMD, ARM, Qualcomm, and cloud providers like Microsoft Azure.
Yeah, that’s awesome. You mentioned this idea of the hub… We’ve definitely seen model and data hubs just grow huge over the past year… Probably one of the things over the past year that we’ve really seen explode is Hugging Face with 30,000 plus models now, and other hub environments… So yeah, it’s interesting to hear how that kind of idea is impacting a lot of – whether it’s people trying to optimize models for certain hardware, or people just trying to try out things, and that sort of thing. Has that impacted the types of clients and customers that are coming into OctoML, because in general, they can access models much quicker, and then they realize, “Oh, these are slow”?
Right. Yeah, great point. It absolutely has affected – so not just customers that come our way, but also, I’ll say the entire ecosystem… Because the way I see the maturation of these model hubs is that it’s much easier for folks to find models to start from, or even find models that already do what they need to do and just get them to deployment. So it really makes it a lot easier for folks to get to a working model that does what they need to do… Which means that a lot of the action now shifts to how do you get these models to production, to add value as part of an application.
It’s great to see our friends at Hugging Face make incredible progress in democratizing creating new machine learning models and creating communities around it. And our point of view on the model hubs is to complement that. So we’re not talking about a place where people come and find new models like Hugging Face and refine those models to create other models; it’s more about, “Here are some popular models that people can come and see pre-accelerated to a bunch of different hardware targets, and see how they compare across edge devices and cloud instances and so on.”
Yeah. And maybe as a reference, I like to bring this up occasionally in the podcast, because a lot of people are focused on that training new models side, and maybe coming up with cool demos and such… But the bulk of what happens in industry in terms of how you run models is inference, right? So have you seen people come in, they’re really excited about the demo that their data scientists created, but then just really blocked on that sort of – what is the typical kind of process and viewpoint that you see come into OctoML, where they maybe… Do they already know what they want to do, and they’re just really blocked on scaling that up, or is it something else?
[08:00] Yeah, the part of the flow that we cover is you have a model that you want to deploy, and then you have to navigate all the paths from a model to what kind of hardware you’re going to deploy it on, and for that hardware, what kind of libraries and compilers and tools you should use actually to arrive at a deployable artifact. And even just extracting the model from what the data scientists produce, from model to a working piece of software, is something that takes manual work, and then we automate that, right? But this might be a good moment for us to step back and talk a little bit about MLOps, right?
So thinking about the entire flow from data to a deployed model that’s actually adding value to a business or adding value to a user, there’s several steps, right? So you curate the data to create training datasets, and then you think about model architectures, and you train models, or you do some architecture search to find what’s the right architecture for the model that you want. And then after that, once you arrive at a model that has the right statistical properties for what you want to do, you need to turn that into a working piece of software that you can deploy, right? And that step is very labor-intensive. You have to extract code from – it could be a jumbled mass of Python code to go and extract a model that you can put in a box with a clean interface. So you have to extract that.
That terrible button in your notebook that’s like, “Export this notebook to a Python script…”
Yeah. And then from there to a working piece of software that you can go and deploy is a lot of work. And then you have to go and optimize, make sure it has the right performance properties such that it has the right latency in case it’s interactive, or it has the right throughput in case it works in batch… All the way to, if you’re going to deploy it in the cloud, make sure that you find the right, cost-effective way of doing so with the right… And also have the right reliability and the expected behavior in deployment, right?
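To make the latency and throughput point concrete, here is a minimal sketch of how one might measure both for a model call. It is only an illustration: `fake_predict` is a made-up stand-in for a real model, and the run count and batch size are arbitrary assumptions.

```python
import statistics
import time


def fake_predict(batch):
    """Stand-in for a real model call (hypothetical); sums each input row."""
    return [sum(row) for row in batch]


def benchmark(predict, batch, runs=50):
    """Time repeated calls and report latency percentiles and throughput."""
    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        predict(batch)
        latencies.append(time.perf_counter() - start)
    latencies.sort()
    total = sum(latencies)
    return {
        # Latency matters most for interactive use.
        "p50_ms": statistics.median(latencies) * 1000,
        "p95_ms": latencies[int(0.95 * (runs - 1))] * 1000,
        # Throughput matters most for batch use.
        "items_per_s": (runs * len(batch)) / total if total > 0 else float("inf"),
    }


if __name__ == "__main__":
    batch = [[0.1, 0.2, 0.3]] * 32
    print(benchmark(fake_predict, batch))
```

The same function can be pointed at any callable, so the numbers for an interactive path (small batches, watch `p95_ms`) and a batch path (large batches, watch `items_per_s`) come from one harness.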
Our opinion, by the way, is that model creation and model training does have its special place in the flow. I can understand why in MLOps people think about those steps, but I think everything else that comes after that, like how do you process a model into – how to put a model in a container, and how do you monitor that model in deployment? How do you build CI/CD integrations? And so on. All of that should just be DevOps. People are building and calling that MLOps as well. I feel like it creates a lot of confusion, because the way I think about machine learning models today, if you really zoom out a million feet, is they are an integral component of any intelligent application today, and pretty much any application that we’re excited about today can be called an intelligent application, right? So it has a natural user interface, can recognize your voice, recognize gestures, it has rich media, and it has machine learning components as an integral part of it.
But machine learning models are not treated as any other piece of software. They’re treated as this special thing that’s just hard to deploy, hard to integrate, and so on, and we need to get past that, I think, to improve the cadence of innovation with intelligent applications, so people don’t have to treat machine learning in any special way. And this colors a lot of how I, and we at OctoML, see the value that we can add, which is really to enable folks to treat machine learning models as if they were any other piece of software.
You have no idea how relieved I am to hear you say that, because that’s like a huge hot button issue for me. I mean, that’s like – Daniel’s heard me rant about this repeatedly over time… The model is just part of the software, at the end of the day. A model that’s not, is not usable. So as you’re looking at deploying it out, as maybe MLOps kind of outgrows its diaper and gets into the big boy pants of DevOps, and actually becomes part of the real world around it, and usable. How is that changing? We talked to you a year ago and kind of had some of the same conversation, but I know this is a very fast-moving ecosystem, and the evolution at it. So can you talk a little bit about – I shudder to say it… As MLOps is kind of growing up, and as hopefully it gets further and further recognized and integrated into DevOps, how do you see that evolving over time into maturity? What does that look like to you?
[12:05] Great question. So I would say that one thing that’s changing fast, that I think people are starting to understand or have some common view of what MLOps is… Because if you ask ten people in this space what MLOps is, you’re probably going to get twelve answers… So people are going to have multiple answers to that too, right?
I think there’s clarity being brought there on what is it that should have a different name. And if you allow me to be cynical for just a second, sometimes people like giving names to things because it makes it easier for you to grab attention from the investment community, from investors… It’s easier to grab attention from folks that are more on the – they’re on the cool technology side, “Let’s give a name to something so it looks different.” But I feel like sometimes giving a new name to something that already exists, that you can just do better, can cause a lot of confusion, right? And I think that there’s some maturity that’s starting to happen on what people call MLOps. And I’m glad to see that most of it is going more and more towards how you deal with data, and how you create models, because everything else I think should just be called DevOps, Chris. Should it even be called MLOps?
We’re building solutions, aren’t we? We’re building big solutions to solve real-world problems, and these are all parts of that solution.
That’s right. Exactly. And then to answer your – I completely agree with that, 100%. And to answer your question more directly, what I see changing even in that space is a couple of things. One, I would say a year ago there was a lot more emphasis on end-to-end fully integrated platforms; like, you get SageMaker, or you get Azure ML, or some other big tools. But now I think there’s a lot more attention to best-in-class for each one of the steps in this flow, and having clean integration points, right? So you know, have tools to deal with data that have a clean integration point with how you move on to the training steps, how do you move on to, say, network architecture search, to how do you package the model and how do you actually monitor the model in deployment. And where we sit in that flow is, again – and we have an API that allows you to… You know, we take a model as input and we produce your deployable artifact.
There’s a lot of evolution. I can’t talk about all the details, because otherwise my marketing folks are going to be mad at me, but you’re going to hear more about what we’re up to soon. But basically, in a word, we’re all about automation. So automating the manual steps of getting a working software model from, say, your Jupyter Notebook, your Python scripts, and putting it into a container that you can deploy, right? So that’s highly specialized to the hardware target.
But once we do that, if you put this in the right format, you should be able to use your regular DevOps flows, right? So if you have the right API, you can use GitHub Actions, for example, to do CI/CD on your model. As you change your model, you run through this flow. And then if you put in the right container format, you can use existing microservices to serve your model, right?
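One way to picture running a model through a regular CI/CD flow is a small gate script that a CI runner (GitHub Actions, for example) could execute whenever the model changes. This is a hedged sketch, not anyone's actual pipeline: the metric names, the thresholds, and the idea that an earlier step has already produced the `metrics` dictionary are all assumptions.

```python
def check_release(metrics, limits):
    """Return a list of human-readable failures; an empty list means pass."""
    failures = []
    if metrics["accuracy"] < limits["min_accuracy"]:
        failures.append(
            f"accuracy {metrics['accuracy']:.3f} is below {limits['min_accuracy']:.3f}"
        )
    if metrics["p95_latency_ms"] > limits["max_p95_latency_ms"]:
        failures.append(
            f"p95 latency {metrics['p95_latency_ms']:.1f} ms exceeds "
            f"{limits['max_p95_latency_ms']:.1f} ms"
        )
    return failures


def main(metrics, limits):
    """Print failures and return a process exit code (0 = pass, 1 = fail)."""
    failures = check_release(metrics, limits)
    for line in failures:
        print("FAIL:", line)
    return 1 if failures else 0


if __name__ == "__main__":
    # Made-up numbers; in CI these would come from an earlier evaluation step.
    metrics = {"accuracy": 0.91, "p95_latency_ms": 42.0}
    limits = {"min_accuracy": 0.90, "max_p95_latency_ms": 50.0}
    # A CI step would pass this value to sys.exit() so a failure blocks the merge.
    print("exit code:", main(metrics, limits))
```

The design choice is that the model is gated like any other build artifact: the check is a plain function with a pass/fail exit code, so nothing in the CI system needs to know it is a machine learning model.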
And then if you also put the right hooks for monitoring, you can collect data from these deployed models and put it in, say, a Datadog, where you’re visualizing your data. You can put views on top of that to look at model behaviors. So that means that I think a lot of the work being done in model monitoring today is really, really important. But I think it’s less about the MLOps part of it, of like, how do you collect the data? But it’s more about how to abstract it away and find higher-level behaviors for models that you should go and debug, because those things are different than how you debug software today, right? Sorry, that was a very long answer to your question, Chris. I don’t even know if it was an answer, it was more like a quick tangent there.
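The monitoring hook idea can be sketched as a thin wrapper that emits one event per prediction, plus a higher-level view computed on top of those events. Everything here is illustrative: `MonitoredModel`, the `sink` callable, and the `label_shift` statistic are hypothetical names, and a real setup would send events to whatever metrics system the team already uses rather than to a list.

```python
import time
from collections import Counter


class MonitoredModel:
    """Wraps a predict function and emits one event per call to a sink."""

    def __init__(self, predict, sink):
        self._predict = predict
        self._sink = sink  # any callable taking a dict (a list's append here)

    def __call__(self, x):
        start = time.perf_counter()
        label = self._predict(x)
        self._sink({"label": label, "latency_s": time.perf_counter() - start})
        return label


def label_shift(events, baseline):
    """Largest absolute change in label share versus a baseline distribution."""
    if not events:
        return 0.0
    counts = Counter(event["label"] for event in events)
    total = sum(counts.values())
    labels = set(counts) | set(baseline)
    return max(abs(counts[name] / total - baseline.get(name, 0.0)) for name in labels)


if __name__ == "__main__":
    events = []
    # Hypothetical stand-in model: flags any input longer than three characters.
    model = MonitoredModel(lambda x: "flagged" if len(x) > 3 else "ok", events.append)
    for text in ["a", "bb", "ccccc", "d"]:
        model(text)
    print(label_shift(events, {"ok": 0.9, "flagged": 0.1}))
```

Collecting the events is ordinary DevOps plumbing; the part specific to models is the view on top, such as noticing that the share of one label has moved away from what was seen at validation time.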
So one thing I was thinking about, Luis, as you were talking about this transition from MLOps to DevOps is just, I guess, the team dynamics that are at play here, and the sort of human dynamics. And I’m just thinking in my own experience with the teams that I’ve been on, and working on AI models and different applications, there is this real sense that you can do a lot with, like you said, GitHub Actions. I love GitHub Actions. I do so much with that. But that sort of onboarding into that for someone coming from a Ph.D. scientist route, and they have no idea this thing exists, right? And as soon as they find out, they can – they’re smart; they can grab onto it and use it and figure out ways to use it, or things like Datadog, like you were talking about, other things out there. How much of this sort of confusion in the terminology and the workflows here is caused by this mismatch of what teams are aware of, and their spheres of knowledge, versus actual functionality differences?
Yeah, I know. This is a great question. So I would say that a key aspect in the human and team dynamics today is that you have folks that create models, typically data scientists, or some people call them ML engineers or data engineers. And then once they arrive at a model, they hand off to a team, if the company’s big enough, that turns that into the deployable thing, right? And that team is still special compared to the DevOps teams, okay? So these are folks that are more sophisticated engineers and understand machine learning, they understand the tooling involved, and they put in the work, and then put that in a format that you can actually go and deploy.
And you’re right that the tools for model deployment today are largely not super-accessible to data scientists per se. They’re more accessible to folks that are machine learning infrastructure – I would say machine learning engineers. And I think this can also change, by the way. I feel like with the right tools, you should be able to get a data scientist to export their model into a well-defined container that contains a model that then you can hand off to an existing DevOps team in IT infrastructure. They should not be specialized to machine learning, right?
So I think that step from a model to the thing that the existing DevOps teams in IT infrastructure could use - that can be automated. I think there’s still a lot of work to make that automated because it involves human engineering today, but fundamentally, it can be automated. And once you do that, I feel like you make both teams productive, right? Data scientists can focus on making the models, and then the DevOps teams can focus and continue making their applications run well, and integrate models as if they were any other piece of software, and bring best practices to machine learning deployment.
[20:08] It’s funny, when you were talking about that a little bit, I was thinking you’re solving two problems. You’re solving the problem that you’re describing, but you’re also solving the problem that these data and development capabilities and organizations - it sounds like high school. You have the jocks, and you have the nerds, and you have different groups socially that are doing stuff, and you’re going to automate the whole thing and it’s going to bring everybody together, which is a good thing. It’s a good thing in this world to do that.
So I’m curious, as you are bringing everyone together, how does that change the dynamics for organizations working for these different individuals that have different functions right now, and often they’re a little siloed and they’re trying to interact? As you get those automations, it sounds like it’ll make it more efficient. How does that look to you? What kind of workflow are you striving for as you achieve this?
Good question. And again, this is all happening and maturing fast, right? So our current view is that with the right automation, again, you could have the folks creating models do the best job they possibly can in creating the models that have the right properties, right? And then on the DevOps side, they can focus and continue deploying software the way they do. And I think that the dynamic is going to change in that organizations wouldn’t have to go and look for people that are actually specialists in both… Because that’s the reality today. So this team in between folks that create models and folks that run the regular DevOps - these are people that, again, as I said before, understand machine learning, they understand the tooling around all the guts of, say, TensorFlow and PyTorch, and what are the right libraries to use depending on the hardware target, what are the right compilers to use… For example, should you use TensorRT for NVIDIA, should you use TVM in case you’re going to have broader options of hardware? And then you have to do performance evaluation to make sure that you’re getting the right performance from your model running on the chosen hardware. Often they have to make decisions about the procurement… So if you’re going to deploy it in the cloud at reasonable scale, chances are that your model is going to have a line item on the budget because it’s going to cost a lot of money to run at scale. So you have to go and understand what are the cost implications.
These are the kind of things that none of these teams are used to. It’s not their strength. So DevOps won’t understand enough machine learning to do that, and then data scientists don’t understand enough of the systems aspects to do that themselves. So I think that the dynamic that’s going to change is if we automate this right, companies wouldn’t have to go and look for these kinds of people that understand this intersection, right? And I hope this is going to make it even more accessible for users to put models into production, because they don’t have to go look for people like that. They don’t have to have the systems expertise, coupled with machine learning, to make use of machine learning.
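The cost question raised above comes down to simple arithmetic once throughput has been measured: cost per inference is the hourly price divided by inferences per hour. The sketch below works that through; the instance labels, prices, and throughput figures are made-up assumptions, not real cloud pricing.

```python
def cost_per_million(hourly_price_usd, throughput_per_s):
    """USD to serve one million inferences on a fully utilized instance."""
    inferences_per_hour = throughput_per_s * 3600
    return hourly_price_usd / inferences_per_hour * 1_000_000


def cheapest(candidates):
    """Pick the (label, hourly price, throughput) tuple with the lowest unit cost."""
    return min(candidates, key=lambda c: cost_per_million(c[1], c[2]))


if __name__ == "__main__":
    # (label, hourly price in USD, measured inferences per second): illustrative only
    candidates = [
        ("cpu-instance", 0.40, 120.0),
        ("gpu-instance", 3.00, 1500.0),
    ]
    for label, price, throughput in candidates:
        print(label, round(cost_per_million(price, throughput), 4))
    print("cheapest:", cheapest(candidates)[0])
```

With these invented numbers the pricier instance is cheaper per inference because its throughput is higher, which is the kind of result that makes the hardware and toolchain choice a budget decision as well as an engineering one.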
Yeah. This point about automation is a really key one, and I’m trying to find other – you may have some examples outside of the machine learning world, where there’s reasonable parallels, but I was just thinking about something like cyber-security, or something like that. At a certain point, I think if you look at the history of how things developed, you really had to have very specialized people having very specialized knowledge to understand what sorts of vulnerabilities are in my software, cybersecurity-wise. And to some degree, those sorts of people are still very valuable, and they have their place. Then you come along and there’s now systems – I was just looking at one called Snyk, where it’s like, you can just run this automation suite on your software and figure out all the various vulnerabilities and like the open source packages and the dependencies that you’re importing, and where you’re doing something wrong in terms of exposing this or that… That allows any kind of person that can stick that in a kind of DevOps workflow to be able to enable better security for their application. Maybe not prevent everything, but you certainly can do much better. And I wonder - I don’t know if that’s a good parallel for what you’re talking about here, where some of these things are still at that stage where you really have to know about all of these granularities, but as automation kicks in, a lot of those things are going to be taken care of.
[24:13] Yeah, that’s a fantastic analogy. Really fantastic, Daniel. When you were describing it, I was thinking about Snyk exactly, because a lot of the tools like Snyk can plug into your DevOps workflows, and whenever there’s new code committed, they’re going to kick off your static analysis to find vulnerabilities. You’re going to make sure that all the open source code used there has actually been vetted and has checked all of the security properties, right? And that automation, I think, brought significant progress in producing secure code.
The parallel there is valid. The only thing that I think is so different is that you have very, very different kinds of people. People writing regular code today - the ones subject to the flow that something like Snyk would use to go and analyze code - are typical software engineers. And then you have some – you may not need a security engineer, but even if you did, a security engineer still kind of thinks like a software developer, except that they know what all the best practices for security are, right?
So the kind of automation that we are talking about in machine learning - I think it’s deeper and different because of the following. So first of all, the difference between somebody who can create models and somebody who can write systems software and can write software you go and deploy - it’s so much wider than what you typically have in between folks that worry and don’t worry about security today.
And then second, the kind of automation that’s needed - it’s still pretty deep, right? So as you’re saying, if you’re going to export your model from a Jupyter Notebook to turn that into a workable piece of software that you can go and deploy requires a lot of manual software engineering that hasn’t been automated yet. So there’s still a lot of work to be done, and I think that kind of automation is not quite as well defined, I would say, and as clean as the automation required to go and analyze code for security vulnerabilities.
As you’re talking about this, I’m trying to visualize the description… And I’d like to ask you, can you give me a concrete example of how this is evolving now that either it would’ve been much harder like a year ago when we spoke last, or maybe even not possible, in the sense of given the constraints that most organizations are dealing with? It could be anything, but what’s a typical use case that you feel that you’re enabling at this point going into it?
Good question. So let me give you one specific example. Suppose that you have a computer vision module in your application today, because you’re going to go and verify whether images don’t have anything inappropriate in them when you upload to a blog interface, let’s say, okay? So the way you do that today, you have to – the way it happened in the past, like let’s say a year ago, is you go and maybe you find a model, you’re going to probably put quite a bit of work on this model to make sure that it’s actually classifying appropriate content correctly. And so this is all done by the data scientists, data engineers, I would say, and machine learning creatives… Let’s use that label now. And then once you arrive at the model, you have two options. You just go and say, “You know what - let’s just use the regular, say, PyTorch or TensorFlow serving mode and just hope that that’s fast enough.” If it’s not fast enough, then you’re probably going to hire a consultant to go and help you. So you’re probably going to go hire folks to do it.
And now I think with tools like TVM and things that we’re building - and then some other folks that are in this space as well - I’d say it essentially enables one to take a model and go through not just the default path and package with existing libraries that are not optimized for that, but help you choose, “Alright, if you’re going to deploy on an Intel processor, what are the right libraries to use? If you’re going to deploy on NVIDIA, should you be using the TensorRT compiler to generate a more performant version?” and the wrappers around it to go and run that for you and more easily produce a higher-performing output.
[28:03] Even in the last year, that already is getting a lot easier. But even then after that, you still have to get that output and put an interface around it that has just the right API to integrate in your application. And we firmly believe that we can automate even that, to really just go from uploading your raw PyTorch or TensorFlow model to the service, to getting a package ready to be deployed, with the right interface that you define, right?
Yeah. This might be a sort of off-the-wall question, but I know also, Luis, that you do some teaching and lecturing, and other things… And I’m wondering - in a lot of the workshops, even the ones that I’ve taught at various places, a lot of the focus is on the model creation pathway. And I find increasingly that people that I interact with in industry, once they’re in a position, are really not even aware of some of those components that you just talked about, like model optimization, like the different ways of serializing models, the different ways of serving models, like batch inference, or other ways of applying models. And yeah, I’m just wondering if you have any thoughts on - is that true across the way that we’re bringing up the next generation of machine learning practitioners? And are there ways that we can maybe shift the balance a little bit?
That’s another great question. So you’re right that a lot of the way that folks are learning about machine learning today and getting started with it, they’re not thinking about model optimization and deployment, because luckily we actually create pretty good tools to arrive at models and test them and make sure that they have the right accuracy, that they have the right properties. But chances are that most of those models created wouldn’t actually see deployment. They wouldn’t make it to deployment, right? So it’s good for people to learn, but chances are they wouldn’t be deployable. And honestly, I want it to stay that way, even for the sophisticated users.
So let me talk about the other users. These are folks who have already done this for a while, and now they are improving models in a significant way, because they’re doing things that haven’t been done the way they wanted before, right? Today, people already start worrying about performance too early. They have to look, “Okay, if I make this change in my model, am I going to be able to deploy it? If I could deploy it, should I deploy it on CPUs or GPUs?”, and so on. People should not worry about this. I want human creativity/ingenuity to go into making a better model, and not have to worry about systems aspects too early, because that’s going to constrain the way they see the model. It’s as if you’re designing a new feature for a car, you’re all thinking about all the different ways it’s going to be used too early, and you’re not innovative enough because you get constrained by practicalities before you’re able to unlock your creativity.
If you don’t mind, I’ll just go on a quick side comment… I like where Hugging Face is going, where it makes it very easy for folks to start from an existing model, make modifications from a foundation model, and then specialize on specific use cases. And then you don’t have to worry about some of the systems details yet. When you actually put it into deployment, chances are you’re going to have to worry about it, because performance is going to come and bite you if you don’t, but at that conceptualization stage and model creation stage, I don’t think people should worry about details of how these models are going to be implemented, right? So I want it to stay that way. But I want it to stay that way and still allow these models to actually be deployed.
So I’d like to follow up… It’s actually kind of combining what you were just addressing with Daniel’s previous question a little bit, because the two of you together really got me thinking about a different way, in that sense… And that is, even now, with all of the model creation and all the detail on that, and yet that’s not an industry – we’re using foundation models, and there’s a lot of transfer learning, and typically, students aren’t doing this off the bat. They’re kind of going through it the hard way to teach the stuff, but then they get into industry and they’re struggling, because they only have a narrow focus of the picture. So as you’re doing this work, and as Daniel brought that point up, I’m wondering - does it make more sense then, by using these kind of tools, to get more of an end-to-end learning process going? …and by providing the right ecosystem and tool as you’re doing, you’re essentially saving them from wasting what creativity they have on the wrong thing, because you’re kind of helping them through that, but at the same time, you’re crucially helping them understand the full end-to-end workflow on how to get it out there. Is that the way of teaching AI going forward, in your view?
Yes. I’m so glad you brought this up, because when I said that they shouldn’t worry about all the practicalities of deployment, I meant that they shouldn’t be constrained, but they should still have an idea of what it would take to take their models into production. And the more automated we make that, the more in the loop of model creation you can put it; for example, if we succeed where we want to be on turning fresh models into deployable artifacts, evaluated, benchmarked, and so on - if you make that fast enough and productive enough, you can actually put this in the active loop, so that as people are developing models, they can just go and try them out, right? But not like the way it is today, where I have to think about, “Is it a CPU or GPU? What kind of GPU?”, and then go and benchmark. If it’s just completely automatic and seamless, then for all the versions of your model as it evolves, you can see how well it will do in production, in various scenarios, while just doing model creation.
Think of it as an outer loop of what they call neural architecture search. An even outer loop from that. It’s like, for all the candidate models, for what I’m thinking about doing, just give me an idea of how well this way of thinking about the original model would do in production.
If you don’t mind me adding one more thing - it’s not related to what you talked about, Chris, but you made me think of another thing that we’ve been observing that I think is interesting.
Even though we’re talking about a model, say model A or model B, and how to take that model to production, the reality is, as we were talking about, models are an integral part of applications, of bigger applications. And it’s not just a single model, by the way; it’s all an ensemble of models.
You have computer vision, you have language, and you have decision trees, and you combine all of these into an ensemble of models that might talk to each other directly, because there’s a data flow: the output of model A could go to the input of model B…
But also, even if they don’t, chances are they’re actually in a package, running on some machine in some container in the cloud, and they interact with each other because of performance. These are all system aspects that we have to worry about, because in the end, you’re going to have to package all of these into modules that you can actually deploy, right?
That’s a great point you’re making.
The ensemble of models is something that’s important, right? So yeah…
As you’re talking about these things, and going from this sort of ensemble and systematic thinking, and these things interacting with each other, I’m wondering about some other jargon that has come up on the show in the past, and is, I think, related to some of this automation and workflow sorts of things, and that’s this sort of low-code/no-code type of stuff. There’s increasingly this messaging around “We’re making AI and machine learning systems low-code/no-code” etc. And I have maybe my own sort of opinions on where we’re at on that spectrum and where things could or might go… I was wondering about your perspective on that, and like how far can we push the automation aspects and the things that are – where are the machine learning engineers and the DevOps engineers really going to sink their teeth into the lower-level code, and where are those opportunities really for the low-code and automation pieces that are maybe a little bit hyped at the moment?
[36:19] Yeah. Good points and juicy topics. So I would say that for model creation, I see a path to low-code/no-code, and it’s kind of going that way… At least low-code I can see a path, because there are ways of defining the dataset and helping partition it, there are tools now to better label the data and create that, and this all has very little to no code. And then when you think about classes of models, you can imagine that just being a high-level choice and not a programmatic thing that you write in a piece of code… I see a path there. But now from that to deployment, I have a harder time. I can see some code; I don’t call it low code, because you still have to figure out, “Okay, what is the API–”, again, going back to the thing that we keep repeating every five minutes, like models being part of applications, right? So they still have to define an API that is going to call that model, and then to be part of the rest of the application. So that part I still feel like – again, maybe for the rest of the code, we’ve succeeded and all of that is already low-code and no-code. I don’t know if that’s going to be the case. But you have to have a very well-defined API for that to work, and that involves some significant amount of code, fundamentally, in my view.
And then there are some other aspects that you have to think about when you’re actually taking a model to deployment, that I think are important. For example, is it latency-sensitive, or is it throughput-sensitive? Are you going to care about how long each prediction or each inference takes, or are you going to care that, overall, across this bucket of data, you’re going to have the right throughput, right? And these are things that – you know, you have a lot deeper systems thinking there, so even if it’s low-code, it still requires a deep understanding of what you’re doing. By that time, you’re going to require specialized people. It doesn’t really matter whether it’s low-code or no-code, right?
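To make the latency-versus-throughput distinction above concrete, here is a minimal Python sketch. It is not from the episode: the cost model (a fixed per-call overhead plus a per-input cost) and both constants are assumptions chosen only for illustration. Under that assumption, larger batches raise throughput but also raise the time each request waits for its answer.

```python
# Illustrative cost model, NOT a measurement: every call to the model pays a
# fixed overhead plus a per-input compute cost. Both numbers are assumptions.
FIXED_OVERHEAD_MS = 8.0
PER_ITEM_MS = 2.0


def batch_time_ms(batch_size: int) -> float:
    """Time to run one batch of `batch_size` inputs under the assumed model."""
    return FIXED_OVERHEAD_MS + PER_ITEM_MS * batch_size


def latency_ms(batch_size: int) -> float:
    """How long a single request waits when it is served as part of a batch."""
    return batch_time_ms(batch_size)


def throughput_per_s(batch_size: int) -> float:
    """How many inputs per second the model completes at this batch size."""
    return 1000.0 * batch_size / batch_time_ms(batch_size)


if __name__ == "__main__":
    for batch_size in (1, 8, 32):
        print(
            f"batch={batch_size:>2}  "
            f"latency={latency_ms(batch_size):6.1f} ms  "
            f"throughput={throughput_per_s(batch_size):7.1f} inputs/s"
        )
```

A latency-sensitive service would pick a small batch here, while an offline job over a large bucket of data would pick a large one; which side of that trade-off matters is the systems decision being described.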
I’ve got a follow-up, and that would be - you know, if we go back to addressing kind of the low-code/no-code approach to model creation, or more specifically optimization, there are a lot of kind of known things, and you could actually train a model to do the optimization along those lines… And we’re seeing that.
Great to see AutoNAS and AutoML work, it’s finally happening, right? Yeah.
Yeah. I’m wondering if – one of the challenges is we’re addressing deployment, and recognizing that people focused for a while on model creation way more than getting the model out there in a usable fashion, and that we’re getting mature about that now; your organization, and others out there too, are thinking very deeply about this. Do you think that there’s an opportunity for maybe low-code/no-code approaches once we arrive at more kind of standardization? Right now it’s still very early, it feels like, in terms of different approaches. So there’s a real custom feel to how you’re going to deploy, especially when you combine it with all sorts of edge targets, with edge being a catch-all phrase that could be almost anything. And so there’s a ton of specificity to your target at this point. Do you think over time, as those get categorized and kind of best practices emerge, that there might be more opportunities? Or do you think that’s still going to be a challenge, given some of the API concerns and such that you mentioned in your previous answer?
Yeah, I would say that once – again, once how the model is used in an application is well defined, and the API is settled, and then we’re talking about more the evolution of the model, I can see that being very low to no code, because by that time you’ve already seen a path to deployment, you’ve defined the box where it fits in, and then I can see the model evolving, having very little code for the model updates.
[39:50] And to your point specifically, if you’re going to deploy it in the edge and you have to see how it works on a wide variety of devices, I also see a path to automation, to select - for this model that you created, it’s going to work on 85% of the phones, for example.
I think going where automation is going, having the ability to benchmark it across all sorts of scenarios where the model is going to be deployed and validate that it’s going to work across the set of devices that you care about, I can imagine a feedback loop with the model creator that says, “Okay, for this decision that I’ve made, it’s going to work on these classes of devices.” It’s a great point. It made me visualize that in a clean way. I can see that happening, but again, only after you’ve defined where the model fits in the larger application, right?
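The feedback loop described above, where a model creator learns which classes of devices a model will work on, could look roughly like the following Python sketch. Everything in it is hypothetical: the device names, fleet shares, latencies, and the `coverage` helper are made up for illustration and are not results from OctoML or Apache TVM.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeviceResult:
    """One benchmark row for a device class (all values are hypothetical)."""
    device: str
    share_pct: int      # percentage of the target fleet using this device class
    latency_ms: float   # benchmarked inference latency on this device class
    fits_memory: bool   # whether the model fits in the device's memory


# Hypothetical benchmark output for one candidate model.
RESULTS = [
    DeviceResult("flagship_phone", 40, 20.0, True),
    DeviceResult("midrange_phone", 30, 35.0, True),
    DeviceResult("older_phone", 15, 48.0, True),
    DeviceResult("budget_phone", 15, 90.0, False),
]


def coverage(results, budget_ms):
    """Return (fleet percentage, device names) meeting the latency budget."""
    ok = [r for r in results if r.fits_memory and r.latency_ms <= budget_ms]
    return sum(r.share_pct for r in ok), [r.device for r in ok]


if __name__ == "__main__":
    pct, devices = coverage(RESULTS, budget_ms=50.0)
    print(f"Model works on {pct}% of the fleet: {', '.join(devices)}")
```

With these made-up numbers and a 50 ms budget, the report comes out to 85% of the fleet, which is the kind of summary a model creator could act on without reasoning about each device by hand.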
I’m relieved to hear that, because I work in an industry where absolutely everything that we target would be on the edge.
Oh, that’s great. Yeah, yeah.
And so if you hadn’t given me some hope right there, I was going to start crying on that. So thank you, I appreciate that.
I always like to step back and think about models being part of a bigger application. That big application was written by someone. They had to put a lot of – by teams, right? They had to think clearly about where the model is going to fit, right? So once that is defined, I can see a lot of the rest being automated and being very low to no code.
I think that one way of actually summarizing this… If we nailed how to turn trained models - and by that point, you can do this in low to no code - into these agile, performant, reliable pieces of software that you can integrate throughout the application, once we nail that automation, everything’s going to get easier, in my opinion; managing applications, and also creating better models, because then you have separation of concerns the way that I think needs to be done here… Given that machine learning creators are going to think very differently than the hardcore software engineers that are on the other side of the application building. We want to make sure it stays that way, right?
[laughs] Yeah. So maybe as you look at, hopefully, that future, what is your sense of like over the next year - when we have you back on the show, a year from now, what are those things that you would really hope are maybe enabled, that aren’t at the moment, but are sort of achievable within that kind of timeframe?
Yeah, so I would say, again, being able to get a model that was freshly created by a data scientist, a machine learning creative, without thinking about the systems aspect of deployment, having their models benchmarked, knowing how it runs and knowing how much it costs; they’re going to click deploy, it’s going to produce a box that the DevOps folks can go and deploy within the application… I think we have line of sight to that, and I hope next year on the show here, we’re going to be talking about all the ways that people are using that.
But now, if you ask me to think about what about five years out, I think what’s interesting to think about here is that - how do you even stop thinking about what’s edge, what’s in the edge, what’s in the cloud, what runs where, and just think about, “Hey, I want to solve this problem that involves machine learning”?
So users are on the edge, computers are in a building somewhere, spread all over the world, right? And I think an application creator - I don’t want to use the word developer; an application creator here should be able to specify, “Here’s what they want to do”, and then the system should automatically figure out, “Okay, so we need these kinds of models, and this one should run in the edge, this one should run in the cloud, this one might run in a cell phone base station”, and automatically split what should run where. That should be done automatically, because again, that’s an optimization problem: once you define the constraints, you should be able to place the right piece of your model in the right physical place automatically. And if we nailed the automation that we just talked about, from a model to the thing that runs on the hardware, really well, that part is already also done, right? So now we can go and work on a higher-level problem, like how do you break down your model into the pieces of “What’s in the edge, what’s in the infrastructure, what’s in the cloud?” just the right way.
I also think that that should be possible, because hey, we have machine learning designing chips today. That’s pretty hard, right? So you have machine learning designing much better ways of doing power management in large-scale data centers. These are all low-level things that we used to do by hand, or with heuristics, and we abstracted that away. As this goes up and up the stack, I’m very excited about the future of creating exciting applications without having to worry about all the design constraints that we have to worry about today.
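The automatic placement described above, deciding what runs in the edge, in a base station, or in the cloud once the constraints are defined, can be framed as a small constrained optimization. The Python sketch below is only an illustration under stated assumptions: the tiers, capacities, network delays, model sizes, and compute times are all invented, and exhaustive search stands in for whatever optimizer a real system would use.

```python
from itertools import product

# Hypothetical tiers: memory capacity and the network delay to reach the tier.
TIERS = {
    "edge": {"capacity_mb": 50, "network_ms": 0},
    "base_station": {"capacity_mb": 500, "network_ms": 10},
    "cloud": {"capacity_mb": 10_000, "network_ms": 60},
}

# Hypothetical models: size and assumed compute time on each tier.
MODELS = {
    "wake_word": {"size_mb": 5,
                  "compute_ms": {"edge": 4, "base_station": 2, "cloud": 1}},
    "vision": {"size_mb": 120,
               "compute_ms": {"edge": 80, "base_station": 15, "cloud": 5}},
    "language": {"size_mb": 4000,
                 "compute_ms": {"edge": 900, "base_station": 200, "cloud": 30}},
}


def total_latency_ms(placement):
    """Sum of compute time plus network delay for every placed model."""
    return sum(
        MODELS[m]["compute_ms"][t] + TIERS[t]["network_ms"]
        for m, t in placement.items()
    )


def feasible(placement):
    """A placement is feasible if no tier's memory capacity is exceeded."""
    used = {t: 0 for t in TIERS}
    for m, t in placement.items():
        used[t] += MODELS[m]["size_mb"]
    return all(used[t] <= TIERS[t]["capacity_mb"] for t in TIERS)


def best_placement():
    """Try every assignment of models to tiers; keep the fastest feasible one."""
    names = list(MODELS)
    candidates = (
        dict(zip(names, tiers)) for tiers in product(TIERS, repeat=len(names))
    )
    return min((p for p in candidates if feasible(p)), key=total_latency_ms)


if __name__ == "__main__":
    placement = best_placement()
    for model, tier in placement.items():
        print(f"{model:>10} -> {tier}")
    print(f"total latency: {total_latency_ms(placement)} ms")
```

Exhaustive search only works for a handful of models and tiers; the point of the sketch is that once the constraints are written down, the split between edge, infrastructure, and cloud is something a solver can choose rather than a person.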
Awesome, yeah. Well, I’m very excited about that future, and as always, it’s a pleasure to talk to you, Luis, and I’m really excited by the things that you and your team at OctoML are doing. Listeners, make sure and check out the show notes for some links to some really great stuff, both for TVM and for OctoML… And yeah, thank you again for taking time, Luis. It was a pleasure.
Thank you, Daniel. Thank you, Chris. Always a pleasure. And again, looking forward to coming back and talking about all the other new stuff next year. So thank you. You guys are fun.