Ship It! – Episode #126

Kubernetes is an anti-platform

with Adam Jacob

Adam Jacob remains optimistic about the future for infrastructure and is building new ideas to make it better.

Sponsors

Sentry – Code breaks, fix it faster. Don’t just observe. Take action. Sentry is the only app monitoring platform built for developers that gets to the root cause for every issue. 100,000+ growing teams use Sentry to find problems fast. Use the code CHANGELOG when you sign up to get $100 OFF the team plan.

Fly.io – The home of Changelog.com. Deploy your apps close to your users: global Anycast load-balancing, zero-configuration private networking, hardware isolation, and instant WireGuard VPN connections. Push-button deployments that scale to thousands of instances. Check out the speedrun to get started in minutes.

Retool – The low-code platform for developers to build internal tools. Some of the best teams out there trust Retool… Brex, Coinbase, Plaid, Doordash, LegalGenius, Amazon, Allbirds, Peloton, and so many more – the developers at these teams trust Retool as the platform to build their internal tools. Try it free at retool.com/changelog

Chapters

1 00:00 This is Ship It! 00:53
2 00:53 Sponsor: Sentry 01:46
3 02:42 Bring back old Twitter 07:16
4 09:58 Adam's hot take 07:02
5 16:59 Infrastructure is binary 05:07
6 22:16 Sponsor: Fly.io 03:31
7 26:01 You're just holding it wrong 03:51
8 29:51 You need good friends 04:02
9 33:54 Learning from System Initiative 04:12
10 38:06 Unexpected difficulties 01:45
11 39:51 Use it how you want to use it 02:26
12 42:17 The scope of System Initiative 06:07
13 48:24 Everyone can do infrastructure 05:41
14 54:16 Sponsor: Retool 04:34
15 58:53 System Initiative bridges the gap 04:41
16 1:03:34 Infrastructure history log? 02:46
17 1:06:20 The adoption cycle 01:46
18 1:08:06 Good for onboarding 02:21
19 1:10:27 Tech debt & migration 01:22
20 1:11:50 Current infrastructure-as-code tools 03:43
21 1:15:33 A different future 01:25
22 1:16:57 Dealing with APIs 05:15
23 1:22:12 The Boogeyman isn't coming 02:46
24 1:24:58 Realize the untapped potential 07:04
25 1:32:02 Go to a start up! 01:37
26 1:33:39 Thanks for joining us! 01:10
27 1:34:48 Outro 00:52

Transcript

Hello, and welcome to Ship It. I am your host, Justin Garrison, and with me as always is Autumn Nash. How’s it going, Autumn?

I am almost caffeinated enough to be fun…

Got to kick in at some point.

Getting there.

I’ve had four glasses of water, I’m probably going to have to pee during this episode. It’s cool. Because this episode is probably going to be long, which is awesome, because we have a veteran of Changelog… And I was really just jealous of all of the other shows, getting Adam to come on and talk to them about everything that he’s been doing, and all of his hot takes about stuff. We were even talking about hot takes already, in the pre-show. So Adam, welcome to Ship It. Finally, your first appearance on Ship It.

Woo-hoo! Thank you for having me.

Your seventh Changelog appearance.

When we almost had Adam on, and then he didn’t, I almost cried. I was so sad. I was like “What do you mean?!”

I don’t want to make you cry. That’s terrible.

I was like “We were going to have so much fun.”

I feel like I should get like a jacket. The Saturday Night Live, when you host five times, or whatever…

Oh, yeah. You’ve got to get all the shows. You’ve got to collect all the shows and you get a special shirt.

Yeah, yeah. Like a Pokemon.

You know when you get all the certifications for one cloud provider, and they give you the golden jacket? Adam needs the golden jacket, but with each like show…

Podcast.

Yeah. The like patches…

Did you know you can just buy a gold jacket on Amazon.com?

And I’m really tempted to go and just buy the jacket, and just go to re:Invent and just wear it. Just, I have no certifications, but I was like “I’ve got a gold jacket.”

Yeah, that’s true.

You weren’t even going to tell me that that was possible?

You don’t follow me on Bluesky, so… It’s fine.

I do follow you on Bluesky. I’m just never there, okay?

This is my problem as well, Autumn.

I want to love Bluesky, I love everybody that like does it. I’m just like…

I don’t know, it’s close, but not quite right.

Yeah…

You two are just too attracted to the bug zapper light, and you’re like “I just have to be at this blue orb of a thing, and I’m going to get shocked.”

I don’t even know if I love Twitter anymore. I don’t even know if that’s it. And LinkedIn’s not really that great either. It’s all so like decoupled and spread out that it makes it hard to be excited about any of them at this point.

That is exactly how I feel, Autumn. We had this incredible moment where all these people were having this [unintelligible 00:04:55.05] pile conversation, and you could really see the zeitgeist happen in all these communities you were a part of, peripherally. And now I can’t, because we’re all split up, and Elon Musk’s a dick, and all this crazy *bleep* blew it all apart. And I had this conversation with Bryan Cantrill when it was happening, and I was like “Dude, you’re going to ruin this for everybody. We’re all going to leave, and then it’s going to be a real bummer, and everybody’s gonna be like ‘Mastodon!’ and then they’re going to learn that Mastodon will never scale… And then we’ll all leave Mastodon, and then we’re going to go to something else…” And here we are.

Not only will it not scale, but it’s too compartmentalized, that you never end up being able to find – Twitter, you could just find something cool that you’ve never been into, and all of a sudden… You could have your whole ADHD hyperfixation go down a rabbit hole, get other people’s opinions on how cool it was… But now we’re like “Okay, cool, I know I like art, or tech, or whatever, but now I’m not exposed to new things.” I loved learning new things on Twitter, and meeting new, cool people.

Me too.

Algorithms are good for some things, and that is a problem where a lot of people shift – the pendulum swung all the way over, and like “No algorithms anymore. I want to control this.” And I think a lot of the culture moved into TikTok.

Totally.

All the conversation and attention and culture shifted over there, which is all just algorithm and finding that stuff automatically…

But I can’t watch TikTok about shady things while my kids are around… So it like really ruins it.

The barrier to entry and consumption is way too high for something like TikTok.

I’m just like “This takes so –” You know what I mean? Like text-based things… You could be reading the most fire Reddit thread with so much garbage in it and your kids sitting next to you.

And it’s fine.

Yes. Now I’ve got to go find AirPods, and I’m going to lose them… And there’s a bunch of 20-year-olds complaining about stuff I don’t care about…

And I’m not actually cool enough…

Yeah, I’m not. Also, it keeps making me buy things. I didn’t know they had a store, and then all of a sudden at 11 o’clock at night I’m buying something I don’t need…

It’s all very upsetting. I’m with you.

Yeah, it’s not my jam. I want old Twitter back. Can we just make that happen?

I was actually saying that – I’m saying this to every venture capitalist who listens. This is an obvious thing to do. And what we need to do is – you just need to raise money, and the business plan is really straightforward. Twitter.

They open-sourced it. Let’s just clone it real quick and then change five things.

But how’s it different? You’re like “It’s not. It’s Twitter.” And they’d be like “Is it decentralized?” You’d be like “No. Not at all.” It’s just the way it worked before.

We don’t want it.

And they’re like “Is there an algorithm?” You’re like “Yeah.” “And, uh, do you control it?” “Yeah.” “And are there threads?” “No. Just reply.” You just – you know, it gets weird. It’s a mess. It’s awful.

Raise money. Do it. I will come.

Oh man, the threads on Twitter ruined a lot of things.

I know we would. It’s just, restart Twitter, call it – Tweeter, or something like that.

I’ll be one of your founding engineers. The book on Twitter’s algorithm is one of my favorite data-intensive application books. I love it.

Well, I mean, I just saw that X lost 79% of its value. So maybe Elon will sell you the original Twitter domain –

[00:08:05.23] No part of me is the person who should restart Twitter, but I appreciate – but I want someone else to do it very much.

We have faith in you, Adam.

Thank you. Yeah.

Also, if it’s going to tell me I have to follow someone over and over again, at least you have good hot takes, instead of like his “Leave the cloud.”

That’s true.

So I made a startup Twitter for our new startup, and I keep getting these notifications and I’m like “Why am I getting all these weird right-wing Elon Musk notifications?” It’s from my business Twitter that follows nobody. And it’s like just dogfooding me Elon Musk over and over again, and the craziest right-wing stuff… And it’s never liked a tweet. It’s never followed anyone. And it’s just these crazy amounts of notification of the most unhinged stuff you’ve ever seen. And I’m like “Tell me you–”

I just created an Instagram account for the first time last week, to repost some of my videos, and it’s not much better. It’s not much better.

You have an Instagram?! No, no, you have to fix your algorithm.

That’s what my wife said. “You’ve got to fix the algorithm.” It’s like a time investment.

It’s a lot of work.

Oh, no. Instagram is the only thing that hasn’t been completely ruined. They ruined it a little bit, but it’s the only thing that’s like holding on for dear life… And it’s where old people watch TikToks when we know that they’re very [unintelligible 00:09:22.23]

I keep seeing all the TikToks I saw from like a month ago. I’m like “Oh, look [unintelligible 00:09:27.18]”

All the best ones.

Exactly, Adam. They’re tried and trued and tested TikToks, okay? They’re the ones that aren’t just made for the teeny boppers who like haven’t experienced life yet. They’re good.

That cat does know how to talk to humans.

Like, it’s got a fire soundtrack and not the same TikTok song over and over again.

It’s all real. Well, there you go. Show over.

At some point we were going to talk about infrastructure, and things…

Yeah. Let’s talk about infrastructure.

Actually, Adam, one of your other episodes – I don’t remember which; it was probably one of the interviews. You had a hot take that I wanted to ask you about, because I think I agree, but you always have a better formulated idea around it… They were talking about how configuration management was an absolutely great thing for infrastructure in general from forever ago. And you said that overall, Kubernetes was a negative.

Ooh…

And I want to learn more about that.

I mean, this is a particularly spicy hot take, so thanks for dragging me straight into the jowls of –

I feel like we started with we eased in a little bit [unintelligible 00:10:30.02] millennial talk…

Yeah, you eased me in with Twitter, and then you took me straight here. I think - okay… So let me acknowledge, before I get into my hot take, that lots of people have built lovely things that they are very proud of, and that have solved their problems with Kubernetes. And the community, lots of intelligent people who built that stuff, that I have a lot of respect for - good job. So let’s just get that out of the way. And I think a couple of interesting things happen. So one is we’ve had this move from a long time ago. So I would say for me it really started with like CFEngine in the nineties; and you could probably go back further. If Mark was on here, he’d probably tell you sort of what goes further than that. But that was the beginning to me of this idea of like those declarative semantics, like “We’re just going to declare what we want the infrastructure to be, and then the system is going to converge, and sort of over time, it’s going to figure out how to do it, and it’s going to figure out how to deal with transient errors”, and all this other stuff. It made a lot of sense in the data center, and it still makes sense in a lot of ways. And then as you scale, it gets a lot easier to reason about, because the system is taking care of more of the complexity of figuring out timing. So just this sort of repetitive loop, control loop that happens is part and parcel of that puzzle. And Kubernetes is in some ways a very clear descendant of that idea, but just sort of cranked up to 11, where it’s like “Okay, what if everything became these relatively high-level declarations, and then we were feeding those high-level declarations into this reactive system”, which is basically watches on a key-value store, “and then we wrapped those in controllers that understood how to do the reconciliation loop into the world?” I promise I’m getting to the hot take.
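The declare-and-converge idea described above can be sketched in a few lines. This is a toy illustration only, not how CFEngine or Kubernetes is actually implemented: the `Cluster` class, the `reconcile` and `converge` functions, and the "replica count per app" model are all hypothetical names invented for this example. The point is just the shape: a desired state is declared, and a loop keeps nudging the observed state toward it until nothing is left to change.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Toy stand-in for the real world: app name -> running replica count."""
    running: dict[str, int] = field(default_factory=dict)


def reconcile(desired: dict[str, int], cluster: Cluster) -> list[str]:
    """Run one pass of the control loop and return the actions it took."""
    actions = []
    for app, want in desired.items():
        have = cluster.running.get(app, 0)
        if have < want:
            # Move one step closer instead of jumping to the target.
            cluster.running[app] = have + 1
            actions.append(f"start {app}")
        elif have > want:
            cluster.running[app] = have - 1
            actions.append(f"stop {app}")
    # Anything running that is no longer declared gets removed.
    for app in list(cluster.running):
        if app not in desired:
            del cluster.running[app]
            actions.append(f"remove {app}")
    return actions


def converge(desired: dict[str, int], cluster: Cluster, max_passes: int = 100) -> int:
    """Repeat reconcile until a pass makes no changes; return the pass count."""
    for passes in range(1, max_passes + 1):
        if not reconcile(desired, cluster):
            return passes
    raise RuntimeError("did not converge")


if __name__ == "__main__":
    cluster = Cluster(running={"web": 1, "old": 2})
    converge({"web": 3}, cluster)
    print(cluster.running)
```

Note that the caller never says how to get from one state to the other; it only declares the end state. That is the property the rest of the conversation argues is both the appeal of the model and the reason it is hard to wrap in a different-feeling abstraction.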

[00:12:32.08] So it turns out that that interface is awkwardly difficult, because on the one hand, it’s like high-level enough that it feels like it’s just what you should interact with. Like there’s a deployment semantic, and it feels like I should just write it. And then it turns out that actually managing those things directly is not fun. And so then everybody changed their tactics, and they were like “Well, Kubernetes isn’t actually a thing you’re supposed to touch.” It’s supposed to be a thing you abstract. Except, when we build these declarative interfaces, what we’re doing is building an interface that’s very hard to abstract. When we abstract things, we’re often doing it because we want to change the behavior of the people who are going to then consume the abstraction, right? We want it to feel different. We’re using it as a primitive. And so when we talk about Kubernetes as a primitive, we’re like “This is a primitive thing.” People talk about it like it’s the kernel, you know? It’s not the kernel, because you can’t actually abstract Kubernetes into a system that feels like anything other than Kubernetes. Because that declarative semantic and all of that logic that lives in those CRDs - it binds so tightly that when you try to abstract it, it just leaks instantaneously. And you can tell. If you use any system that sits on top of Kubernetes, you’re like “Nah… It smells like Kubernetes to me.” And that’s because you can’t effectively build a new system on top of it. It’s just Kubernetes. Its Kubernetes-ness leaks into the universe.

And so when you think about its success and its huge adoption, and the incredible amount of engineering that’s gone into it, and its pedigree, I think it kind of set back our ability to imagine how the world could be by, like, a decade… Where a lot of people were just like “Oh, we solved this problem now. It’s just Kubernetes.” I’m like, we haven’t solved that problem; it won’t just be Kubernetes, because you can’t fundamentally build anything new on top of it. The best you can do is make it smell like Kubernetes, only a little easier, I guess, but not really…

I agree, because I keep seeing all these platform engineering teams, like “Go build on top of Kubernetes.” You’re not building on top of Kubernetes, you’re automating Kubernetes. You’re just building new things to automate the little pieces that are already abstracted.

Because there’s no possible way to do it. If you wanted to change the flow – so think about it this way… AWS, for as much as the API is a mess, it has a very imperative API. And if I wanted to change how it works, I can do that, because I can stitch together those API calls in a different order, I can change how they feel… There’s all kinds of tricks that I can pull.

Also, the building blocks are small enough that – you know what I mean?

It’s like Legos, and they give you the small enough Legos that you can make it what you need it to be. That’s the secret sauce. Azure has not figured that out yet, and neither has Google Cloud.

And Kubernetes is not a useful Lego. It can do what it does, and that’s great, and I can program it to do whatever I want… This is the argument I always get into with people who really love it. They’re like “Well, just write a CRD.” And I’m like “Dude.” If the thing I have to do is write my custom controller code in order to change the imperative behavior, so that I can then interact with a system at a distance in a control loop… Like, miss me. Why would anyone do that?

And so I think, as a direction of research for the industry, I think it’s held us back. And I think people are going to catch up to that truth, because we’re going to start building systems that don’t use it, and they’re going to behave better, and you won’t be able to match them. And that’s a huge opportunity in the startup ecosystem right now, that is, I think, really underserved… Because everybody looks at the mass of Kubernetes and goes “How could you possibly compete with Kubernetes?” And I’m like, “You’re not listening to the people who use Kubernetes.” If you listen to them, most people who use Kubernetes, they don’t love Kubernetes. They love what they’ve built.

[00:16:22.12] So if you’re like “Hey, how do you feel about Kubernetes?” They’re like “Well, I’ve built this incredible infrastructure, and it does this thing I need it to do.” And you’re like “Well, how do you feel about Kubernetes?” “Ugh.” And then they unveil their long list of stuff that bothers them. And that means there’s an opportunity to actually build a better thing, that solves the same problem, but actually allows you to solve it in a way that feels better, and that people love, in a way that’s different than it is now. It doesn’t mean Kubernetes is bad, it just means – this is what happens. There’s an evolution of technology, ideas come in, we roll around with them, and we build new ones… And so we’re ready for something new there. I just don’t know what it’ll be.

You know what’s funny? We were at Scale, and it was like, you guys have all these talks about Kubernetes, and none of them made me want to touch Kubernetes… But you know what’s funny? Engineers love solving problems and making things easier, and I think the reason why people love what they’ve built on top of Kubernetes is because it shows how much you can automate and make painless, but that’s because there was pain to begin with.

I have a similar conversation when I talk to people about the cloud. “Hey, do you love AWS?” “No, but look at the thing I’ve built. Look at the thing it enabled me to do.” And mostly because the thing they either were used to was an on-prem email system that was like “Oh, I need a VM. I need a disk. I need whatever. Wait six months.”

Yeah. Give me a ticket.

Yeah. Or it wasn’t possible for them before, because they had no access to any of that technology resources. Like, I need a message queue. That’s going to take me a year just to create one. So the cloud, in a lot of ways, I could build these things on it, but also, again, it’s still kind of a mess.

You know how we make this all full circle with our Twitter conversation? Okay, hold on… It’s going to be a ride.

Hit it.

You know how we always talk about how people make things – either developers want things to be simple and abstracted, so you can get stuff done, because sometimes you just want to do stuff, right? Or we get a little crazy and we’re like “We want to pick every single, just, detail…” But sometimes, when you’re either starting something new or you want to build fast, that takes forever, and it’s not really useful. So you always have the – I don’t know, what were we talking about last week? There’s always some sort of software, like – oh, the person that was making Linux, but he was making a version of Linux that was simpler, so you could use it, and adapt it, and it’s more abstracted. But then you have people who really like to make all those, Linux, or whatever. That’s Mastodon and Twitter.

But people either go too far with wanting to be so specific about every little thing, when you’re like “Dude, sometimes I just want to s**t-post, or just build something really quick…” But they go back and forth of like “How do we find that medium of like enough control, and enough ownership, but not making it painful?”

Yeah. I mean, I think I see a lot in that vein. We just GA-ed System Initiative.

I was so excited for you.

Thank you.

It’s such a rad idea.

I was very excited for me, too. And I’m excited for everybody else, because they get to use it. But in that moment, I knew what the feedback would be, which is there’s a segment of people who see it and they’re just instantaneously like “Yes.” They understand the primitive, they understand how it’s going to work when you think about programming in the large, they understand how it will probably evolve, and they’re in.

And then there’s another set of people who look at it and they’re just like “This is hot garbage. It’ll never work.” And then there’s some people who are like “Meh, maybe. I’ll check back in six months or so.”

That is so true.

And that happens in every technology… But I don’t think we encourage people enough to get a little weird. Because often, if you try to get weird – so if you said to someone “Hey, I’m going to go build a thing that’s going to compete with Kubernetes”, 99% of the world would be like “That’s dumb.” And they would have 100 reasons, that are all true, about why that’s dumb. They’re like “Oh, you’re gonna compete with Google, and Microsoft, and Amazon, and Samsung, and all these people all at once. Red Hat… You’re gonna compete with all of them?” And you’re like “Well, yeah, that’s dumb, when you put it that way.”

[00:20:16.22] But at some point, that’s how you got Amazon, Google and Microsoft…

But at some point, that’s how that happened… And you have to be willing to let people get a little weird. And I think as a community, especially as ops people, because what we tend to love are the – it works or it doesn’t; infrastructure is a little binary. And there are ways to build infrastructure that are objectively worse than other ways. And so we’re more prone to that, I think, than most, and I think - yeah, we need to encourage people to get a little weirder, and to build some new stuff.

And to what you said at the beginning of this conversation - the people that built Kubernetes and did all this fantastic work, don’t tie your identity to a technology. That’s not who you are.

Everybody does that, though. That is the engineering culture. For one, people will be like “I used this, and I worked so hard to use this, I’m never learning anything else.” Or they want a new flavor of something every week, and I’m like “I’m not doing that every week.” Sometimes using something old works, but you shouldn’t be married to it. It should never be “I did it this one way, and we’re never doing it differently.” So it’s like some weird in the middle of trying to find…

I think there’s a version of that where like – when I think about myself, there’s a huge number of technologies that I feel like are part of my identity, and some of them I no longer use, like Perl and Ruby. I’m forever indebted to those communities. They’re a part of me, absolutely, and who I am as a person. They influenced what I believe, how I think… All this. I’m deep. And I haven’t written a lot of Perl in a very long time. I haven’t written any Ruby in a long time. I think I wrote Perl actually more recently than Ruby, which is hilarious…

How did you make that happen?

I think I had a refactoring to do inside the codebase, and I used Perl to do it… Because I knew what to do, so I was just like “Perl’s good at that.”

Break: [00:22:10.23]

How much of the Kubernetes – I think that the simplicity of “write a CRD” kind of gives people the illusion of “Oh, this will be easy.”

Yeah, nothing is simple about writing a CRD.

I’ve said for a very long time that people use deployments because they’re good enough. It’s a thing that’s like “Oh, I could just use this abstraction, and I don’t need something else on top of it, because this is better than what I had before.” And once you have one of those components of Kubernetes that you’re using and you’re just shipping it, you’re not going to write another one. You’re just like “Well, I’m just going to automate all the other pieces that I don’t need in here.”

In a weird way, I think sometimes good enough makes sense. But then sometimes it’s like, you know…

I’ve been annoyed by the deployment thing forever. I had this conversation with Kelsey years ago, where I was like “This is not a good enough – it’s close to being a good abstraction for deployments, but not enough.” And he was like “Well, it should never get better, because we never should have written it in the first place, because it’s not primitive enough.” And I’m like “Maybe that’s true, man. And also, you did give it to me…” And then there’s a bunch of other stuff that I dislike. Think about like communities you’re a part of… I have a visceral distaste for communities whose starting position is “We’re the smartest people who’ve ever lived, and we’re bringing down to you the great wisdom that you have never seen before.” That just – I just dislike it instantaneously. Like, the part of me that loves going to like punk rock shows, and likes hardcore - it’s just like “No.” I’m upset about it right now. My shoulders get higher… It bothers me deeply.

The last interview I had at Google 8 or 10 years ago, I was like “I will never go back there”, because I felt that way as soon as – they’re like “You have been blessed with coming to our [unintelligible 00:27:47.14]”

Right? To the sanctum…

I think that’s a lot of big tech companies now.

And we’ve like brought down to you the good time – that really drives me up the wall. And one of the ways that that gets expressed is that when something’s not right, and you say it’s not right, the response tends to be “Well, you’re just not holding it correctly. If you understood the deep truths we understand, then you would understand that this thing you’re saying is wrong.” And I get this a lot when I talk about this abstraction thing in Kubernetes. I’m like “Have you ever tried to build on top of this thing? It’s a nightmare.” And then people are like “Well, you just don’t understand how it works.” And I’m like “No, I get it. I understand how it works. I’m telling you, it’s a nightmare.” And they’re like “But not for me.”

And I think it’s because they’ve figured out – I love the video game Dark Souls. I use this analogy a lot, and I’m sorry I’m repeating it again, kind of… But Dark Souls is a video game that’s very hard. And when I took time off between Chef and System Initiative, one of the things I wanted to do was learn to play Dark Souls. And I did. It took months. And when I finished Dark Souls, I felt as good about myself as I felt like when my daughter was born. [laughter] I was like “Yes…!” I had achieved mastery. And I love that video game. And I think people love their infrastructure, and love the technology like they love Dark Souls. They’re like “No, you don’t understand. This was hard. I figured out some junk.” And then they bring it back.

And I just feel like Kubernetes is a little infused with that ethos, with that idea that “No, we’ve already figured all this out. This is the way it’s supposed to work. If you don’t think it works this way, then you’re a dummy, and holding it wrong.” It took me months – I patched the Kubernetes documentation because secrets aren’t secret. It drives me insane. But they’re not. And all I wanted to do was put a paragraph in the documentation that was like “These are not secret.” And it was like a long GitHub conversation where people were like “Wow, do we really have to say that?” And I’m like “Yeah, you really do. You really do.”
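The conversation does not spell out what "secrets aren’t secret" means, so the following is one reading, stated as an assumption: values in the `data` field of a Kubernetes Secret are base64-encoded, and base64 is an encoding rather than encryption, so anyone who can read the object can recover the value. A minimal sketch using only Python’s standard library (the sample value `hunter2` and the `reveal` helper are made up for illustration):

```python
import base64

# Hypothetical value, encoded the way it would appear under `data:` in a
# Secret manifest. The plaintext "hunter2" is invented for this example.
encoded = base64.b64encode(b"hunter2").decode("ascii")


def reveal(value: str) -> str:
    """Decode a base64 string back to text. No key or password is involved."""
    return base64.b64decode(value).decode("utf-8")


if __name__ == "__main__":
    print(encoded)
    print(reveal(encoded))
```

Nothing here touches a cluster; it only shows that the encoding step is reversible by anyone, which is why a plain sentence in the documentation saying so is useful.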

But why would you argue about documentation making it easier? That’s the point where you’ve gotten so with your head up your own butt that you’re not learning anymore. You’re not making it better if you can’t take any feedback.

I think that’s real. And it’s a thing I worry about with myself all the time. We were joking before everybody started listening to us that I’m on podcasts all the time, and I have hot takes, and opinions… And I worry constantly about being a person who falls into that trap. I don’t want to be that person.

[00:30:21.12] You have to have good friends, that will shade you every now and then. My friends will like tell you about your whole life and you’re just like “Well, I love you… But dang, did you –”

Yeah, those are all of my best friends.

You have to. If you don’t have people who will be like “That looks horrible on you” and never say that in public again, they’re not real friends.

Right. Metallica broke my heart the last time I saw them, because all I could think of was that James Hetfield no longer has real friends… Because he –

I love you so much.

It was like two days back to back, right? And they were like “It’s gonna be great. No repeat weekend”, or whatever. So we’re in the [unintelligible 00:30:54.08] in LA, 100,000 people. Almost everyone who is there was there both days. This *bleep* comes out and says the exact same sentence, to greet you and tell you what’s happening… And then the exact – and then they just kick into another song. You know, like, his banter didn’t change between day one and day two. And this was not the first day of the tour. This was like day 20 or whatever, like night 20 of the run. So if anyone loved him, they’d have been like “Dude, you’ve gotta say something other than what you said yesterday. It’s not a big reach, just a little.” But the thing is, you can’t say that s**t to James Hetfield. He’s James Hetfield. And if you did, he’d be like “They seemed to like it”, because they did. You know, he was like [unintelligible 00:31:43.11]

It’s funny that I’ve also been saying that for like 30 years…

Yeah, and he’s probably been saying it for 30 years… So I think part of why James perhaps has no one who loves him is because James can win the argument just by being like “Well, but I’m James Hetfield, and you are an asshole.”

Oh, you come with sound effects, and everything…

And I worry about that all the time, about technology and culture… We do that all the time. And we put people up on pedestals, we listen to what they say, and it becomes easy to play yourself on TV. And I feel like especially in technology - I’m not sure we do anything that’s worthy of playing yourself on TV for. I get it when you’re Hetfield. But should Kelsey Hightower get to be James Hetfield? I mean, maybe. I love Kelsey.

See, Kelsey keeps it real, though.

He does.

Kelsey also would be like “Don’t use Kubernetes.”

But let’s be honest, so did Hetfield. For like a really long time. I’m not saying Kelsey isn’t keeping it real. So oh my God, how much am I not shading Kelsey Hightower right now…? I like Kelsey Hightower, we’re friends. I’m not shading Kelsey.

But think about Mark Zuckerberg. Zuck has had an interesting arc, where he’s gone from scrappy, and giving hot takes, to like robot Zuck, that nobody wanted to hear, to now he’s interesting Zuck, because he’s like fully embraced his Zuckness again, and he’s come all the way around. And I think that’s because Zuck has friends.

No, I really think that’s true, though. If you really want to hit like heights of success, especially when you start to get successful, you have to be able to be introspective and have friends that will be like “You’re kind of being a dick right now.” Look at Elon Musk. Nobody tells him no.

Right. How many true friends does that guy have left? And I feel like the answer is not very many. You know?

Dude. I don’t know, sometimes I wonder back and I’m almost like “Was he ever smart, or do we just think he was smart?”

I think he’s smart.

But I feel like he probably was, and then people just kept telling him yes, even when his ideas were absolute s**t.

I mean, I think he still is. I think he’s still smart. But it doesn’t mean he’s not awful.

That’s true.

[00:33:54.13] So shifting gears a little bit… We’re staying on Twitter this entire time. It’s great. System Initiative launched. You GA-ed the actual service… And if anyone that’s listening to this podcast has not heard of System Initiative already, I don’t know where you are. Because it’s been going around the news, not only when it was open-sourced and kind of out there, as like “Oh, this is the thing that’s different”, and people were already talking about it… Now it’s like “A service. Just go consume it if you want to. Go play around with it.” What did you learn in that – was it like six months or so, eight months, that it was like –

A year…

A year, from like “This is a thing that people can come and try, and now we have a service that you can consume and actually get your hands on and use.”

Yeah, that’s a good question. I mean, the big one is that I don’t want – our ambition was not to like make something that’s like a little bit better. I’ve had enough time in my career and enough success that I was given the leeway to truly try to build something that I thought would be incredible.

And so when you decide to do that, one hard part is that it’s hard to build something incredible without having just a huge pile of challenges in the way. The first version of System Initiative looked a lot like how people think AI agents and infrastructure will work. We built a system that was just basically “How could I build you a working infrastructure from the smallest declaration?” So you could say “I want to run a PHP app”, and it would just figure out how to do that for you, and then show you what it was, and then you could go do it. And that’s how this idea of high-fidelity models or whatever got into System Initiative.

And it turned out that interface was awful, because you had to figure out what the right incantation of constraints was that got you to the thing you really wanted… But the models stuck around, and so that became the foundation of some of those things.

A lot of what we learned was just how high the bar actually is. That in order to actually be useful, and in order to be powerful enough that you could really use it to solve real complex problems, that nobody had ever seen before, it just has to do a lot. The bar is high. So you hear a lot about wanting to launch products when they’re early, and messy, and not quite right… And that’s all true, and I believe that very much, that there’s people who will understand what it is that you are building, and they will love it, and if you find those people early, they’ll help you grow, and make the product better… That’s all really true. But the number one thing we learned in that year was just how high the bar actually was, to be like “Well, you’ve got to be able to customize everything. And you’ve got to be able to see those customizations happen sort of in real time. And then you got to be able to share that back with the community. And then we’ve got to be able to vet that. And then you’ve got to be able to update it inside your workspace. And then you’ve got to be able to create all the policy that you need, and build new stuff.” And all these functions run in these Firecracker containers, and this huge distributed system… And so how do you make that elegant enough that if you need to install a new tool or a new library, that’s easy enough to do?

And then once you know that that’s enough, then you have to like polish that experience, so that people can come and use it, and figure out how to onboard, figure out how to – you know, we need to write documentation, so that you can actually go read about how to work the system, and how it functions… And then you’ve got to watch all that information… So, I mean, we learned so much. But the big thing you learned in that year was just how much polish you really did need to put into the experience in order to earnestly say that yeah, you could use this today to build production infrastructure, and you should.

And you’ll find things that aren’t ready, or things that aren’t there, but that’s the game of building new technology together in public, in big open source communities… But you’ve got to be ready for that, too. So that’s a lot of what happened.

I think a year ago we knew what the shape of System Initiative was, and I knew we had done enough work and done enough user studies and enough of those sorts of things to know that the basics were there… But that refinement over the course of the last year of “Okay, but is it actually good enough? How much further does it need to go?” It was a lot.

What was the hardest part, that you weren’t maybe expecting to be the hardest part? …because you have had a really long career; you’ve done a lot of things. So what really jumped at you that you thought “I wasn’t expecting this part to be so hard”?

[00:38:19.29] I mean, I didn’t expect the whole thing. So the idea that what would happen - it was better to express the infrastructure through like a living architecture diagram… I was convinced that was a dumb idea. It turned out that it’s a great idea, and it works way better… But whoa. Because the history of that design is not good in our space. It just isn’t. And you have to admit it, but it turns out that’s because the primitive it was sitting on top of wasn’t right, and wasn’t integrated. It wasn’t because we shouldn’t have nice things, it was because they can’t be toys. But then once you realize that it can’t be a toy, the amount of technical complexity that cascades from that truth is like crazy. We had to build a custom database, that has custom, tiered storage. It’s all in memory, and then it flushes to disk, and then it flushes to the actual database on the other side, and it gossips the traffic around, so that when you make a change, that workspace snapshot actually gets like gossiped around to other active servers in the cluster, so that if you hit it, it’s already in memory before you make a request… Like, there’s all kinds of crazy stuff we had to do to actually just get to a spot where the system could be programmable and reactive in this way that it needed to be. And we didn’t see any of that coming. We were like “Wouldn’t it be cool if it worked like this?” And then you just keep going at it and being like “Yeah, well, that is cool, and it kind of works, but it’s not powerful enough yet.” And then you just keep digging, like some kind of weird badger.
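Editor's sketch: the tiering described above (in memory, then disk, then the database, with snapshots gossiped to peers) can be illustrated roughly as follows. This is a minimal illustration under stated assumptions, not System Initiative's actual design: the `TieredStore` class, its method names, and the use of plain dictionaries as stand-ins for disk and database are all hypothetical.

```python
from typing import Optional


class TieredStore:
    """Hypothetical three-tier store; dicts stand in for real storage layers."""

    def __init__(self) -> None:
        self.memory: dict[str, bytes] = {}    # hot tier, checked first
        self.disk: dict[str, bytes] = {}      # stand-in for a local disk cache
        self.database: dict[str, bytes] = {}  # stand-in for the durable database
        self.peers: list["TieredStore"] = []  # other active servers (assumed)

    def write(self, key: str, value: bytes) -> None:
        # Land the value in memory, then flush it down through the tiers.
        self.memory[key] = value
        self.disk[key] = value
        self.database[key] = value
        # "Gossip" the value so peers hold it in memory before any request.
        for peer in self.peers:
            peer.memory[key] = value

    def read(self, key: str) -> Optional[bytes]:
        # Try the fastest tier first; promote to memory on a slower-tier hit.
        for tier in (self.memory, self.disk, self.database):
            if key in tier:
                self.memory[key] = tier[key]
                return tier[key]
        return None


server_a, server_b = TieredStore(), TieredStore()
server_a.peers.append(server_b)
server_a.write("workspace-1", b"snapshot-bytes")
```

After the write, `server_b` already holds the snapshot in memory, which is the property described in the conversation: a request that lands on another server does not have to go to disk or the database first.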

Yeah. I ran System Initiative locally to play around with in the dev environment, like “Hey, this is how it works”, and I was like “This is a really complex system”, and it reminded me a lot of creating my first Kubernetes cluster, where I’m like “There’s a lot of components here that are moving, that need to be set up just right.” And at some point, that complexity led to “I should just pay a service to do this thing.”

Totally.

And it seems like it’s going down that route. How much of that is like do you expect people to operate and do System Initiative well, and how much of it is just like “I don’t want people to ever touch the side of it. They should just consume the frontend”?

I mean, I want people to use it however they want to use it. I believe fundamentally, that because I think it’s really a foundational technology, it’s a new primitive that we should use to build all kinds of new interesting stuff on, that no one’s ever had before, I just can’t limit the ways that people want to do that. So if you’re going to do it to compete with me, fine. If you want to run it on-prem, let’s figure that out. My job - I’m building a business that sells System Initiative for money, so I’m definitely going to have to run it on prem. I’m going to sell it to governments, I’m gonna sell it to huge businesses. They’re going to be like “This has to run inside my firewall. It has to run inside this air gap environment.” There’s all kinds of stuff that it’s going to have to be able to do. And we could do it today if a customer needed it badly enough, or wanted it badly enough, and we’re willing to pay for it.

I think over time, all of those different deployment scenarios will become pieces of the product that are easy enough to do. There are a lot of moving pieces, but that doesn’t mean – we kind of know how to build installers. We kind of know how to think about building those sorts of systems. But I think you need to see – you’ll see the community start to show up in some of those ways. Right now, those questions are all about like “Well, how’s my business going to adapt to should you use SaaS, or should you BYOC, or should you do this or should you do that?” Over time, as people use this and learn what it is and really see its power, they’re going to come and use it in ways that I don’t predict, and we’ll learn from each other how to do that. And I’m stoked to see it.

But yeah, I think it is a complex system, it is kind of great to run a SaaS to do it, because it’s a lot better to just sign up, fill in a profile, and then you’re in a workspace and you can start automating stuff… That’s better than having to check out the source code, compile it… Much less think about how you’re going to run it in production, because it is a complicated system.

[00:42:16.03] And what’s your ideal – I came to the livestream for how you’re running System Initiative on top of ECS… And one of the things that I was curious about and seeing that was like seeing the diagram of all the components. It’s complex. There’s a big diagram of a lot of stuff that’s all connected together in some way, shape or form. But there’s got to be a threshold there of like running a couple EC2 instances or an ASG with a load balancer. I probably don’t need System Initiative. Doing something that is fully System Initiative, everything, every component of it, to run it as a SaaS - it seems complex to do in one main diagram. And when I look back and think of the enterprise environments I’ve been in… I couldn’t imagine all of Disney Plus –

In a single diagram? No way.

It’d be too much.

It’d be bananas.

So where do you put the walls on [unintelligible 00:43:04.21]

Yeah, great question. I think one piece of it is – let’s take the part about “If it’s small, maybe you don’t need it.” If you look at what it’s like to use System Initiative, to interact with AWS, versus going to the AWS Console to do it… Like, come on. You should just come to System Initiative…

I don’t know if that’s a fair –

…you should plug it in, and you should just build whatever the little thing is you want. You’ll be way into the free tier. You can have 100 resources for free. And we don’t count resources like Terraform does. It’s not like a block. They’re actual, real things that you do in AWS. So you can run some real infrastructure here without paying me a dime. And you should, and it’s better, and that’s great.

I mean, people have decided that Kubernetes is a better interface than AWS, so we know it’s bad.

That’s what I’m talking about. Yeah, people are like “It’s gonna be great.” Anyway. So then the flip of that is – I think that question of like… You know, we call it the “staring at the sun” problem internally… Which is - it’s kind of amazing that you can see everything. It’s also incredibly overwhelming, and not necessarily useful. There’s a piece of my brain that loves it, because it’s like, it is so cool that you can see it, and I’ve never been able to see it before… And so that I can see it that way is kind of incredible.

What we’re building now is essentially a system of views. If you think of all of the infrastructure that’s in like a workspace, so all the stuff you’re managing, and then you can create a new view, and then decide which things are gonna be seen inside that view. So you might be like an application view, or you might build a database view, or you might build those kinds of things… And then you can link those views together. So each view that you create is itself an asset you can reference. So imagine pulling one of them into that diagram, and being like “Hey, from the application view, click this one and it’ll take you to the database layer that you’re attached to. And click this one and it’ll take you to this other perspective.”

And if you wanted to have that big, global view of everything all at once, you could. You would just put all the components in it, and arrange it correctly, and now you have a global view that is like the one that we showed you in that livestream. But I think that kind of composition - that’s coming soon. Because it’s an obvious thing to do, I think.
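Editor's sketch: the view idea described above (pick which components appear in a view, link views to each other, and build a global view by including everything) could be modeled along these lines. The `View` class, the `visible`, `follow`, and `global_view` helpers, and the component names are hypothetical illustrations, not System Initiative's API.

```python
from dataclasses import dataclass, field


@dataclass
class View:
    """Hypothetical view: a named subset of workspace components."""
    name: str
    components: set[str] = field(default_factory=set)
    # Maps a component in this view to the name of another view it links to.
    links: dict[str, str] = field(default_factory=dict)


def visible(workspace: set[str], view: View) -> set[str]:
    # Only components that exist in the workspace and were chosen for the view.
    return workspace & view.components


def follow(views: dict[str, View], view: View, component: str) -> View:
    # "Click this one and it takes you to the database layer."
    return views[view.links[component]]


def global_view(workspace: set[str]) -> View:
    # The "everything all at once" view is just a view holding every component.
    return View("global", set(workspace))


workspace = {"alb", "asg", "rds", "vpc"}
views = {
    "application": View("application", {"alb", "asg"}, {"asg": "database"}),
    "database": View("database", {"rds"}),
}
```

Here `follow(views, views["application"], "asg")` returns the database view. Modeling teams or org-chart sections is then a matter of defining more views rather than adding a dedicated "teams" feature, which matches the "build the primitive" point made later in the conversation.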

But what has not been obvious is how do you build it. So if you look over the course of the last couple of years, initially we started out with the idea that there was like an architecture layer, and then like a component layer, and you would click through to one or the other. So actually like two diagrams stacked on top of each other. And when we put that in front of users, it was a disaster show. No one understood what goes on in the architecture layer, what doesn’t… Because you imagine that it would be like, “Well, I should be able to dive down into something.” And you’re like “Well, but what’s inside of it?”

But if you did it on like resources and teams levels… You know how usually you have teams for data, or platform… You know what I mean? Because I think that’s actually really – like, yes, that’s overwhelming to see it all, but as someone who’s been a solutions architect and an engineer, that’s actually really interesting, that you could go big, because sometimes people have a really hard time explaining their problems and their architecture… And they don’t keep it updated… So that would actually be really helpful, trying to help people.

[00:46:20.04] And it’s kept up to date just by doing the work.

Exactly. That’s what I’m saying.

That’s the goal. And it’s how it works now. But that’s why we wound up with views, right? Because you couldn’t build a feature that was like “Well, there’s these five teams.” And so those are five different views of your data. If you do that, you’re gonna go to a giant bank and they’re gonna be like “We have 500 teams. How you like me now?” So instead, you have to build the primitive.

But yeah, you’re right. That’s exactly what it is. So that view primitive, you can use it to structure it that way, to slice it in the way that the team needs it sliced… And it’s that game of discovering primitives; that’s the product development game.

Maybe not teams, but sections. You know how they have it – I don’t know how to explain it, but different layers of the way that [unintelligible 00:47:04.24]

Yeah, that’s exactly it. That’s where it’s heading… And in a minute there’ll be a YouTube video that’s us reading out the opportunity canvas and the story map for how a feature gets built… In days. Early next week, probably.

That’s so cool.

Yeah. And I feel like a lot of these – if views are customizable and I can define them, at some point you’re just gonna be able to overlay the org chart right on top of it and it’s the same thing.

That’s what I’m saying.

These are mirrors.

Yeah. And if you wanna blow your mind, now think about you have all that data, and it’s all on this big, reactive, programmable graph… Now overlay security policy, overlay financial calculations, overlay compliance obligations… And then start building completely different interfaces for those people to interact with that information. So like this is the right interface for infrastructure engineering… What’s the right one for application deployment? What’s the right one for security? What’s the right one for finance? And you start building all these different, full interfaces over the top of the data. And at that point it really does look like Unity, where sometimes what you’re doing is 3D modeling, sometimes what you’re doing is core engine programming, sometimes what you’re doing is this. And different people are all collaborating on the same codebase, on the same live assets, all at the same time, in order to generate this big, complicated enterprise. And that’s the dream.

Now, I’m not bringing up AI too soon in this conversation, but I think that the Copilots and the things that are writing code have made it more simple to get into writing functional code; whether it’s good or bad, people will write functional code, and they’ll ship it, and like “This thing works.” And that’s lowering the value of a developer that used to do those sorts of things. I used to be the one that took these tickets, and I wrote that code, and I didn’t write any tests, and I shipped it into production. And in this case, I feel like with System Initiative you’re lowering the value or the perceived work that an infrastructure engineer used to have to figure out to do these complex things, of like “That load balancer connects to this thing, these ASGs scale based on these metrics”, all that stuff… And it makes everyone able to jump into the infrastructure game and say like “Oh, here’s my change set. I can now be the infrastructure engineer.”

I mean, kinda… Except that the truth is the domain remains impenetrable to outsiders. So can you teach someone the basics? Sure. But do you know why you would choose one load balancer algorithm over another? Like, I know, because I had to do that, and I was in the era where we invented a bunch of load balancing algorithms thinking they would be better than round robin, and then we just caused a bunch of interesting failure conditions, right…?

Yeah. And two tier round robin just wins…

[00:49:49.23] Yeah, over and over and over and over, right? Forever. And hundreds of millions of dollars invested in trying to invent a better algorithm, and we can’t. So the truth is that the domain is complex. And the other piece here is that what actually happens, at least in technology, but I think history bears this out… If the demand for something is effectively uncapped, if we haven’t found the ceiling of its demand, then creating more efficiency does not actually cause us to have fewer jobs, cause people to be valued less… The opposite happens. So when you think about the demand for infrastructure, compute infrastructure, it is effectively unbounded. We have no idea what the top end of the human appetite for infrastructure technology is.

So if we make it better to do, if we make it more efficient to work with it, if we can unlock more of that potential, it’s not like what happens is now people aren’t in the infrastructure game anymore. Instead, what happens is we get to build even more of it. We get to specialize even more. We get to see even more what we can build in new and interesting ways, because the demand side is still so high. This is why software developers - there’s a story around AI that’s like “Ah, software developers are going to be dead. We’ll just ask the agents to do it, and it’s all going to work out.” This is why that’s a silly point of view, because - call me when the demand for software development has peaked.

They keep telling everybody it has, and I’m like, “Okay, sure.”

No, just the pay.

Maybe the pay. But even the pay won’t, because what’ll happen is it’ll just shift even further into interesting new realms. That’s what happens, you know? And I think with AI - one of the things that’s interesting about System Initiative and AI is that, you know, AI works best when it’s augmenting the expertise that you already have. And maybe we’ll get to a spot where it can do that without our expertise, but today it doesn’t, and it’s a pretty far future bet to say that it will. We need new technology innovation to get beyond that, for now.

And I think when you think about System Initiative, one thing I like about it is that when you imagine how those agents write code, it’s less compelling than if an agent was participating like a player in the game. And so if you think about this multiplayer interface that is System Initiative, and these high-fidelity models, it makes perfect sense that you might ask an LLM what it thinks it should do, and then have that LLM put them out into the diagram for you, and have you interact with that thing like a player. And then, you know, vet its choices, see immediately whether or not its choices would work or not, because the simulator tells you so, and tweak the things that don’t make sense and move on. That loop becomes very compelling.

And when you think about infrastructure as code, it is sort of bounded by that idea, that like, well, what it’s going to do is write code for you. But I still have to put it in a pipeline, I still have to figure out if it’s right… And the more complexity that gets there, the harder it becomes. And so I’m a believer in the capability of AI to really transform those user experiences, but I think when we look at it through the lens of like “How’s it going to impact infrastructure?” and “The way it’s going to impact it is through code”, I’m like, “Hmm… Is it?” Because that domain is still brutal. It’s still really – the feedback loops are still real long, you know?

Yeah. Infrastructure as code especially, as much as it allowed developers to start managing code, it made it harder to get into infrastructure.

Infrastructure became – a single player, no one else could do this, now, unless you also write code.

Yeah. And if you’ve ever been the Terraform or the Pulumi person who then tried to get your other developers to write some Terraform and Pulumi with you, who were like application developers, this was not fun.

The same thing happened with CFEngine and Puppet.

Totally. We learned it with Chef too, right? Chef was great at this. Everybody did this. They were like, “Now all the developers - they’re all developers, they can write the Chef code.” And like, “No, they can’t.” They don’t want to. It’s not fun, it’s not better… And so it turned out the way Facebook did it was right, which is they handed people a small number of Facebook-specific abstractions in Chef, and people drove those, and they were thrilled about it, and it was fine. And then there was a team of 20 people that maintained those abstractions across all of Facebook, and it was cool, you know?

Break: [00:54:05.20]

You know how usually engineers are this - I mean, not smaller picture, but you’re like actually doing the thing, right? And then you have a bunch of stakeholders, and product, and SAs and other people that are doing the big picture. System Initiative would be a great way to bridge the gap to help everybody stay on the same page. I think that would be – it would help to lower so many misunderstandings and to show the value and help [unintelligible 00:59:18.04] total costs. But I also think – like, this is also like managed database, right? We were like “Oh, it’s going to kill DBAs, and it’s going to be horrible.” But it evolved DBAs into being like specialists, architects, or like consultants, or whatever. And you still need those people who know how to do it, because people just think they can throw stuff into a database and it never works that way, right? So I think this is going to evolve the way that engineers work, that DevOps engineers work, and it’s going to be different…

Plus, this is perfect, because if you just let AI have the keys to the candy store, you don’t know what they broke or how to fix it. But if you have all these different sections – you know when you’d play like Ninja Turtles back in the day, and you’re playing, and then the computer’s playing, and you’re like going together? You could have, like you said, AI in a small space, that now you have System Initiative to use it as like the checks and balances, and to see what it’s doing… And I think that way is the way that engineers and AI work together.

Yeah. I think so, too.

With extreme checks and balances, but you’re using it to be like effective, and to like work faster, but you’re also like… It helps you to still know where the AI is, what’s it doing… It’s like when you write smaller chunks of code and you have like tests to make sure “Okay, I broke it here” and when you put a bunch of print statements, you’re like “Okay, this is where the bug is.” You know what I mean?

I do. Think about that with the conversation we just had about views, and now imagine that view constrains the context that you’re telling the AI to work in, as an example.

Not just that, but say that you – okay, so say you’re using a managed database or product from a cloud company, just anything, and you go to your solution architect and you’re like – sometimes a whole architect and data team will be like “I can’t figure this out”, and then you’re coming to somebody. But we’re like “Well, we need more information to help you. If you can show us your whole –” You know what I mean? Like, you’re going to make it easier for people to work together.

And then say if you’re an engineer manager, in product, in SAs, you’re in a meeting and you’re trying to show something to your stakeholders - to be able to have that, to show them, and then do demos… This could really enable people to be able to communicate and to collaborate better.

Look, your words to God’s ears. Like, that’s the pitch. You’re ready to go. Let’s take you to the enterprise. I mean, that’s it, though. Like, that’s why we think it’s such a big deal… Because you don’t ever get to skip a step. System Initiative today - you can use it to replace production infrastructure, but it needs more assets. You’ve got to be willing to dig in a little… You know? You’ve still gotta have a little elbow grease. But the foundation it’s starting from is so good. We’re really confident that - yeah, people are going to show up, they’re going to put that elbow grease in there, we’re going to figure out what these things are, and it’s going to compound really quickly.

This also goes back to that thing Kelsey was saying, about how engineers are going to be forced to be more bigger picture, and be better communicators. And I’ve found – somebody retweeted it, and it was basically saying we’re going to replace engineers with AI, but we’ll only have engineers describing the problem and giving the computer instructions. And then somebody was like “Did you really say that? Because that’s what they do now.”

[01:02:30.12] That’s what engineers do now.

It was so funny. It was like trying to reinvent the wheel. Like, that’s literally what we do now. And then you have like the higher program languages that are – it’s like, we’ve been doing this. When you do Python instead of Bash, or like you’re not using C++ and you’re using like Java, because it’s a different abstraction… Like, it’s just funny. Like, you’re not getting rid of engineers. We’re just changing the evolution again. And this could like really –

Well, and this was always the game. DevOps happened because we learned that in order to build really great things on the internet, you had to be really closer together between engineering and operations. You couldn’t run it the way that we were running enterprises…

And SREs…

…in that time. And those things all evolved because of that recognition, that in order to have great things, we had to come closer together to get them. And then the tooling we built to try to help other people come closer together actually drove us further apart. And so if we really want to fix it, if we really want to make it better, we have to bring people back together again, in the workflow. They have to actually work together again. They have to actually come together to do that job.

That collaboration piece I think is huge, because the way that I’ve learned, throughout my career, is just like working next to someone. And back at the data center I learned so much about resiliency. It was like “Yeah, you connect that cable to the switch over there, because if this one fails, and that power supply goes away–” And like being able to “Oh, that makes sense”, because we can figure out why and how these things are going to fail.

And infrastructure as code shifted me to an individual player game, where I just did asynchronous reviews and no one actually reviewed or ran my infrastructure as code. They’re just like “Yeah, it looks good to me. You’re past the lint, so we’re good.” But the feedback loop for learning how to do it was so much harder, because I had to copy and paste that other Jenkinsfile, to then go do it over here, because I don’t know how to start, I don’t want to write Groovy. This is just a necessary evil.

One of the things that – the last time I was excited about an infrastructure diagramming tool was probably eight years ago. I don’t even remember what the product was called, but I remember reading their white paper… And they had this notion of being able to see the history of how the infrastructure has evolved, and being able to go backwards in time and say “What has changed here, and why did it change, or who made the change?” Do you see that as a view capability inside of System Initiative? Because you have these change sets, and you have this way to diff this stuff already… Could I zoom back to six months ago and see what it was like?

Yeah, it’s actually better than that, in some ways, technology-wise. So what’s happening here is we have these huge snapshots, and they’re immutable. So every change you make, you’re actually generating a new immutable image of the entire thing. So in your head, if you imagine that what we’re doing is taking a snapshot of the database every time, that’s basically what we’re doing.

So we can absolutely think about, “Well, what’s the retention? How many versions of that snapshot should we keep, and how long?” And then those snapshots - you can do deltas. You can be like “Tell me what changed between six months ago and today.” And all of that is very computable. So we have all the data to do that.
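Editor's sketch: the idea of immutable snapshots with computable deltas can be illustrated with a few lines of Python. This is a toy model under stated assumptions: `commit` and `delta` are hypothetical helper names, and a read-only mapping of component name to properties stands in for whatever the real snapshot format is.

```python
from types import MappingProxyType
from typing import Mapping, Optional


def commit(snapshot: Mapping[str, dict],
           changes: Mapping[str, Optional[dict]]) -> Mapping[str, dict]:
    """Return a NEW read-only snapshot; the old one is left untouched.

    A value of None in `changes` means "delete this component" (an assumption).
    """
    new = dict(snapshot)
    for name, props in changes.items():
        if props is None:
            new.pop(name, None)
        else:
            new[name] = props
    return MappingProxyType(new)


def delta(old: Mapping[str, dict], new: Mapping[str, dict]) -> dict:
    """Compute what was added, removed, or changed between two snapshots."""
    return {
        "added": {k: new[k] for k in new.keys() - old.keys()},
        "removed": {k: old[k] for k in old.keys() - new.keys()},
        "changed": {k: (old[k], new[k])
                    for k in old.keys() & new.keys() if old[k] != new[k]},
    }


six_months_ago = commit(MappingProxyType({}), {"alb": {"port": 80}})
today = commit(six_months_ago, {"alb": {"port": 443},
                                "rds": {"engine": "postgres"}})
report = delta(six_months_ago, today)
```

Because every change produces a new snapshot instead of mutating the old one, answering "what changed between six months ago and today" reduces to comparing two retained snapshots. How many snapshots to retain, and for how long, is the retention question raised in the conversation.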

Right now - you can’t do it today, but the data is there to do it. And it’s built into a design that you will be able to. I think now we’re a little more focused on like seeing the deltas between “Hey, I’ve made this change, then someone else came in and applied something to production, and now that impacts my change.” And you see that all happen in real time in System Initiative now; it’s like an auto rebasing branch, basically, which is cool… But we can make that even cooler by showing you sort of like what the actual impact is on your change.

[01:06:04.03] So we can be like “Hey, you changed this component, then somebody else changed it in production. Here’s how that changes what you changed.” And you can sort of see that. But yeah, over time, for sure, I think what you’re going to get is this ability to store those snapshots over time, and then go back and see the delta.

But what you’ve just described is like System Initiative isn’t owning the entirety of who can make changes. Like, this isn’t a locked down, no one ever goes to the AWS Console anymore, because System Initiative can react to what happened there. How much of that do you see – is this a companies should adopt… I mean, obviously you would want them to 100% adopt System Initiative, like “Everything’s here”, but that’s not practical for a lot of –

Somewhere you start. Yeah. I mean, we say it’s not practical, but that’s how every automation technology worked. Every Pulumi user, mostly, they were Terraform users, and then they converted. And lots of, lots of Puppet users converted to Chef, and there were Chef users who converted to Puppet. And there were CFEngine users who converted to both. And we destroyed BladeLogic. Nobody’s buying BladeLogic anymore, maybe. Somebody’s going to be like “I use BladeLogic!” But…

Yeah, someone’s gonna comment…

Someone’s using BladeLogic. But you can sort of force those – if the technology has enough compelling value, people will replace it over time. But yeah, our goal is that we definitely – I lived that experience with Chef, where… I didn’t love how long the adoption cycles were, and how all or nothing it felt. And so yeah, we’re hoping that part of that adoption cycle is actually just – it’s going to start with just importing the things you already have, that you want to deal with, and then over time you could track just the resource side of that and be like “Well, I can show it to you in a diagram, and we can see what its resource value is.” But then it sort of starts to make sense that like “Well, why wouldn’t I also manage it through here? Because it’s here, and it’s easy, and it’s straightforward.” I think that’s probably how that goes.

There’s all sorts of interesting possible futures, where you do like automatic discovery of things… But there’s a lot of interesting problems with how do you lay it out, you know? And – anyway, there’s a lot of interesting stuff hiding inside there, but like…

This would also be really good for distributed teams and remote work and onboarding new engineers, because now you’re teaching them the bigger story… Because coming into like a huge enterprise or a huge like codebase, and just infrastructure is really hard. Also, like with teams, a lot of enterprises doing open source, or having internal infrastructure and then external infrastructure, or like open source infrastructure, and like being able to manage two different kinds of infrastructure is really hard.

So being able to give a top view, bigger picture, and then to be able to learn and onboard and say “Hey, well, this piece fits here.” Half the time, it takes like so much research and discovery hours just trying to figure out what your whole infrastructure is…

Yeah. One of the earliest discovery calls we had for System Initiative was with a big global bank. And they were like “Well, if you could just tell me how many Kubernetes clusters I have…” They’re like “I’ll pay you for that.” And I’m like “Well, that’s an interesting business model…”

But if we could go to every enterprise and be like “How long does it take you just to make the diagrams to figure out what’s going on?”

For real. Yeah, if you think about what an enterprise architect is today, there’s a real change that can happen here, where enterprise architects become like concierges to that complexity… Which is what they already are. If you go to an enterprise architect inside of a huge enterprise, they’re essentially the people that you go to, and you’re like “How does the bank work?” And they can be like “Well, how detailed would you like me to be?” And they can move really elegantly between those levels of abstraction. And I think, ideally, you’d be able to express that through something like System Initiative, so that you could have that mechanism there. What exactly that looks like and how exactly it feels - we’ve got to go through that together as a community of people, because it’s too big to imagine you could just know the answer sort of from the jump…

[01:10:04.12] But I think we can discover from here what all the right primitives are to allow that to happen. Do you know what I mean? Like, we can roll around with it together enough to be like “Oh, yeah. It’s cool that I can have views, and it’s cool I can link between them… But what I really need is blah, blah, blah, blah.”

Seeing how people use your software is like a whole other learning opportunity.

So fun.

Yeah. This is also interesting on how it could affect tech debt and like migrations, and like when you are trying to update your infrastructure… Because having this much insight into it, you could then – like, how you were saying you first find all the different pieces and you put your stuff in there, but then you’re like “Okay, how can I make this more efficient? How can I save money? How can I make this better?” But when you have that type of data, you can really have the information you need to evolve your infrastructure.

Yeah. And doing that evolution, you could think of doing it as a programmable thing. Like, what are the things that I need as inputs to the transformation, that then allow the infrastructure to become something else? So in a not too long future, you could like write a component whose job is to take as an input all of the infrastructure you have, and then programmatically translate it into other infrastructure.
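
One way to picture "a component whose job is to take all of the infrastructure as input and translate it into other infrastructure" is as a plain function from a list of components to a new list. The sketch below assumes a made-up `Component` shape and a made-up `vmsToContainers` rule; neither comes from System Initiative.

```typescript
// Hypothetical component shape; not System Initiative's real schema.
interface Component {
  kind: string;
  name: string;
  properties: Record<string, string>;
}

// A transformation takes every existing component as input and returns the
// infrastructure it should become.
type Transform = (current: Component[]) => Component[];

// Example rule: turn every "vm" into a "container" and leave the rest alone.
const vmsToContainers: Transform = (current) =>
  current.map((c) =>
    c.kind === "vm"
      ? { kind: "container", name: c.name, properties: { ...c.properties, migratedFrom: "vm" } }
      : c
  );

const existing: Component[] = [
  { kind: "vm", name: "api", properties: { image: "ubuntu-22.04" } },
  { kind: "database", name: "orders", properties: { engine: "postgres" } },
];

const migrated = vmsToContainers(existing);
console.log(migrated.map((c) => `${c.kind}:${c.name}`).join(", "));
// container:api, database:orders
```

Because the transform returns new objects instead of mutating its input, the original description stays available for comparison.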

Also, observability is one of the ways that people fail so hard in engineering; it’s not even knowing how their infrastructure is working, or how their customers are like getting things… And this would give you more observability and better monitoring…

That’s the dream.

…but also it would help you to get help better. You know how a lot of people will like use consultants for like your database or whatever, or just to get something more efficient; you can now get better help, because you have more information to be able to give people on the bigger picture, to get better troubleshooting or whatever you need.

Where do you think the existing crop of infrastructure as code tools fit in any sort of future?

I mean, nothing goes away, like we were just joking about BladeLogic. You know what I mean?

Yeah. If someone uses BladeLogic and listens to this show, please email us. I would love to talk to you.

They for sure do.

Come on, I want to hear about it.

They definitely do. And they would be fascinating to talk to.

But I’ll use Chef as an example, because it was mine. That business is still growing. It’s not shrinking. It’s getting bigger. And a lot of people who are listening to this podcast are probably thinking in their heads “Oh, that’s like old, busted technology. It’s legacy already.”

No. It runs the world.

But as a business, it continues to kill it. Like, it’s still doing quite well. It’s bigger than it ever was, they’re growing… I don’t know exactly what percentage of it is in progress as revenue, but you can go read it, and it’s pretty good. So I don’t think that what happens is a technology like System Initiative comes along and then suddenly everybody just stops. That doesn’t really happen. What does happen though is that what’s possible changes. And so the infrastructure as code locks in a ceiling of what’s possible to build around it, because of just how the technology works, how the primitive functions, what are the other things we need to put in around it… It’s not just infrastructure as code. It’s GitOps, and CI, and CD pipelines… Like, why is a pipeline the right abstraction for infrastructure? It doesn’t actually make any sense. And if you think about it, it makes sense for an application in some ways, because we’re building an artifact, and then we’re pushing the artifact somewhere, and then we’re doing stuff… And so it makes some sense that it’s a pipeline. But infrastructure - that’s not how it works; it never really has been. It’s weird that that’s the abstraction we got to. But it makes perfect sense when you say “Well, but I define it as code. Code has pipelines, code goes through these things, it looks like that…” So all these decisions kind of stem from that one place.

And so I think what’s going to happen is as people kind of come to grips with what this technology is, and how it’s different, they’re going to start riffing on what’s possible, and that is going to snowball, and eventually it’ll become the way people think it should be, as opposed to the new-fangled way, where I have to convince you that it’s going to be better than infrastructure as code. And it won’t take too long before I don’t have to convince you.

[01:14:05.18] The question won’t be “Is it better?”, the question will be “How long before I can try it? How long before that wave reaches me?” But it’s not because the wave’s not coming. It happened, and it’s not up to me really to decide what that is, it’s up to people who use it. So other people have to also fall in love with that primitive the way that I have.

So I don’t think it’s going to disappear overnight, but I do think that you can’t – I would predict that the first reflexive thing people will do (and they’re already doing it) is sort of look at it superficially, and go “Well, what if we built a better UI on top of infrastructure as code? What if we did this? What if we did that?” And they’ll look at those things superficially, and what they’ll learn is what I learned, which is you can’t actually build the user experience you want on top of that foundation. But they’ll try.

I literally talked to someone yesterday who was trying to build that, and I was like “You’re doing this on top of a Terraform state file?”

That’s right.

“Hold on, wait a minute… Let’s just back up one step here.”

“Let’s talk about how that’s going to end.” And it’s going to end badly. And maybe it won’t. Like, I could just be wrong. But it’s not like I wanted to do it this way. I didn’t start out thinking “Ha-ha! I know where the root cause is.” It wasn’t like that at all. I was like, the root cause is that the whole system is messed up. Like, it’s clearly that the way we put this together is driving the outcomes, and that’s why it’s so resilient across technology over time. 15 years of DevOps, we’re still getting the same mediocre outcomes we got year one. That’s weird.

This reminds me of the cloud, and the on-prem and the cloud debate, too… Because first people did on-prem, because that’s what you had. Then everybody went to the cloud for everything. But the cloud’s really good for certain use cases, like when you want to experiment. If you have something that you know exactly what you need –

And it’s fixed in size and time.

Yeah. On-prem is better. But when you’re going to experiment or build a new project, or for certain use cases, the cloud is a really good tool. And I think we went through this whole “Oh my God, put everything in the cloud.” And then people were like “Oh wait, this is expensive.” You know what I mean? So I think this could also be a great way to like experiment and to learn new things about your infrastructure, too.

Yeah. And look, I’m obviously in love with the thing that we’ve built, so…

It’s your baby.

Yeah. So I understand my own bias. But I’m never going back. Like, I’m never going to go back to building systems the way that I built them before. There’s nothing you could do to make me go back to writing infrastructure as code, and thinking about that as the primitive for how I’m going to do this work, ever. I’m never going to do it. And that doesn’t mean that it was bad technology, it doesn’t mean I don’t understand it… It just means that – well, whatever; I’ve seen a different future, and that’s the one that I want, and I’m going to go get it. And then we’ll see how many people decide to come along on that trip with me, you know? But I’m going there, because I’ve seen it, and it’s better.

With the question Autumn just brought up, about like on prem and cloud, System Initiative assumes it can programmatically do things. Like, it assumes APIs for your infrastructure. What do you think is the barrier for the – if someone wants to do this on prem… Like, I’m dealing with a SOAP API, I’ve got an SSH in a box over here… I don’t even think it’s the functions in TypeScript as a barrier. It’s just like, I don’t have access to do something programmatically.

Yeah. I mean, look, if you don’t, then you don’t. What are you going to do? But it was the same thing as when people were like “Well, I can’t put an agent on anywhere, and I want configuration management. What can I do?” And I’m like “You can’t.” And they were like “Well, what if I used Ansible?” and I’m like “How many things do you have?” And they’re like “700,000.” And I’m like “You still can’t.” Welcome to agents, you know? But it doesn’t matter what you say to me; agents are your answer, and you’re just going to have to cope. And what’s cool about it is - think about that question you just asked. I have this SOAP API I need to use in order to automate my application. Or to deal with my infrastructure. Because it updates the CMDB that we built in 1998, that has all the data in it, and we use it for compliance, and it’s the only way anything could work. Imagine trying to wrap your Terraform or your Pulumi so that it writes to that SOAP API, and keeps it in – just no.

[01:18:19.21] I don’t have to imagine, I have tried it. Yes.

It’s crazy, right?

Why did Justin look like he still felt the pain? He had a brief flashback…

So in System Initiative though this could actually just literally be a component that takes in the infrastructure, writes to the API, and reacts to its changes. So you change a piece of infrastructure, the little component that talks to SOAP reacts every time, fires that function, hits the SOAP API and sends the data over. And it really is as straightforward as like writing a function and declaring the inputs that it reacts to. And it’s going to have a lot of inputs, so it’s going to react a lot… But fine, that’s what you want. Right? Every time somebody does something and applies it to – that’s literally the thing you desire.

And so when you think about expressing it this way as a primitive, it’s so much easier to do. Now, does it mean that it overcomes all complexity or all of the real gnarly stuff? No.

Nothing will. It’s still infrastructure.

Because I still have to figure out where I’m going to run that function. So do you have some bare metal nodes in a data center somewhere, that we can use a control plane to dispatch a function to, that makes the API call? Because you’re going to need that. And what exactly is the shape of that API? It’s still complicated. But it becomes possible in a way that it really hasn’t been possible before now.
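
The "write a function and declare the inputs it reacts to" pattern can be sketched as below. Everything here is an assumption made for illustration: the `ReactiveGraph` class, the `Change` shape, and the `UpdateRecord` SOAP operation are invented, and `sendToCmdb` only records envelopes instead of making the HTTP call a real CMDB integration would need.

```typescript
// Hypothetical shapes for illustration; not System Initiative's actual API.
interface Change {
  componentId: string;
  property: string;
  value: string;
}

type Reaction = (change: Change) => void;

class ReactiveGraph {
  private reactions = new Map<string, Reaction[]>();

  // Declare that `reaction` should fire whenever `property` changes.
  reactTo(property: string, reaction: Reaction): void {
    const existing = this.reactions.get(property) ?? [];
    this.reactions.set(property, [...existing, reaction]);
  }

  // Apply a change and fire every reaction that declared this input.
  apply(change: Change): void {
    for (const reaction of this.reactions.get(change.property) ?? []) {
      reaction(change);
    }
  }
}

// Escape the characters that are unsafe inside XML text.
function escapeXml(text: string): string {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Build a SOAP 1.1 envelope. The UpdateRecord operation and its fields are made up.
function buildEnvelope(change: Change): string {
  return (
    `<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">` +
    `<soap:Body><UpdateRecord>` +
    `<Id>${escapeXml(change.componentId)}</Id>` +
    `<Field>${escapeXml(change.property)}</Field>` +
    `<Value>${escapeXml(change.value)}</Value>` +
    `</UpdateRecord></soap:Body></soap:Envelope>`
  );
}

// Stand-in for the real HTTP POST to the CMDB; here it only records envelopes.
const sentEnvelopes: string[] = [];
const sendToCmdb = (envelope: string): void => {
  sentEnvelopes.push(envelope);
};

const graph = new ReactiveGraph();
for (const input of ["instanceType", "region"]) {
  graph.reactTo(input, (change) => sendToCmdb(buildEnvelope(change)));
}

graph.apply({ componentId: "web-1", property: "instanceType", value: "t3.large" });
graph.apply({ componentId: "web-1", property: "tags", value: "a&b" }); // no reaction declared
console.log(sentEnvelopes.length); // 1
```

The more inputs the function declares, the more often it fires, which matches the trade-off described above. Where the function actually runs, and what the real API looks like, remain open questions this sketch does not address.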

Now, the opposite end of that question is how many of your customers have had to just increase all of their API limits in their cloud provider because they’re firing all these functions for all of these changes to do all this stuff that immediately is like–

Yeah, nobody yet.

Yeah, but even if you had to though, it would be worth it. I’d be like YOLO.

I mean, yeah, nobody’s had to yet… But it’s an interesting thing I’m worried about. But it’s not because we’re firing the resource side, so when you’re like “Tell me what the state of this is.” Those limits are pretty high. And the create limits are kind of what they are. But a cool thing that happens - for example, in your account you have a limit on the number of Elastic IPs you can get in AWS, right? And you don’t know when you run out until you hit the API and it tells you that it broke. And if you have a lot of Terraform, it might take 20 minutes for the plan to do its thing, or for the apply to happen, only to learn that you’ve run out of Elastic IPs late in the game. In System Initiative, that Elastic IP resource turns red the moment you put it on the diagram, because it has a qualification that hits that API and goes “Do I have any left?” and just like tells you there’s no more.

And so I can at least stop you. I can tell you immediately, “Hey, you’ve hit the limit.” Now, there’s tons of limits like that in AWS that aren’t programmed yet in System Initiative. But as we find them, adding that thing is going into this interface, writing a single function, and then hitting a Contribute button, which sends it to Paul Stack, Paul Stack reads it, hits a button, and now everybody gets it.
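
A qualification like the Elastic IP check is, at heart, a single function that asks "do I have any left?" and returns pass or fail. The sketch below assumes hypothetical `QuotaUsage` and `QualificationResult` types, and the `lookup` callback stands in for whatever call reports the account's quota and current usage; it does not reproduce System Initiative's or AWS's real interfaces.

```typescript
// Hypothetical types; the real System Initiative qualification API may differ.
interface QuotaUsage {
  limit: number;
  inUse: number;
}

interface QualificationResult {
  result: "success" | "failure";
  message: string;
}

// `lookup` stands in for whatever call reports the account's quota and usage.
function qualifyElasticIp(
  lookup: () => QuotaUsage,
  requested: number = 1
): QualificationResult {
  const { limit, inUse } = lookup();
  const remaining = limit - inUse;
  if (requested > remaining) {
    return {
      result: "failure",
      message: `Requested ${requested} Elastic IP(s) but only ${remaining} of ${limit} remain.`,
    };
  }
  return {
    result: "success",
    message: `${remaining - requested} Elastic IP(s) left after this change.`,
  };
}

// The numbers are example values, not a statement about any real account.
const exhausted = qualifyElasticIp(() => ({ limit: 5, inUse: 5 }));
console.log(exhausted.result); // "failure"

const available = qualifyElasticIp(() => ({ limit: 5, inUse: 2 }));
console.log(available.result); // "success"
```

Running the check when the resource is placed on the diagram, instead of at apply time, is what moves the failure from late in the process to the very start.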

Good ol’ Paul.

It’s pretty cool. And that’s the way I think it evolves.

I think this is interesting, because I’m very – I want to use AI, and I want to have it make my life more efficient, but I’m very like not sure and cautious about it. But I think what your startup has that a lot of AI doesn’t is the fact that you have so much background in infrastructure, and that you have these engineers working on something that still gives you enough control, but makes your life better… And I wonder if that’s because of the different podcasts and infrastructure and different work that you’ve done, and people probably have more trust in you than something that has no face, if that will give them more of a confidence to try your product… You know what I mean?

[01:22:06.09] I do, I hope so. I mean, look, here I am, doing it. I hope that’s true.

Because sometimes AI just feels like the boogeyman that’s coming to get us all, and System Initiative doesn’t seem –

The boogeyman’s not coming to get you. Infrastructure is an incredibly complex domain, and people do not give it enough credit. And when people talk about what it is, they tend to think of it – I had this conversation recently with a couple of very luminous people, who made their careers in infrastructure, and their perspective is “I never want to think about infrastructure ever again.” And I had to have a conversation where I was like “Okay, I get it. How do you think all the things that you use, that make it so you don’t have to think about infrastructure again, - how do they work?” And they’re like “Well, um, infrastructure?” And I’m like “Yeah.” So it’s cool you don’t want to think about it anymore, but I do. I think about it all the time. I love this thing. And I get that you don’t want to think about it, but that doesn’t mean that we don’t need it. That doesn’t mean it doesn’t work. That doesn’t mean it’s not necessary. That doesn’t mean that it’s not even more important. Look at something like Oxide, and what Bryan Cantrill and that company has built, where like people have forgotten how computers work.

Oh my gosh, yes.

But Bryan didn’t forget. And so what Bryan and his team could do is reinvent what it means to have computers and data centers, because they never forgot how they worked in the first place. And they could overcome all these objections. And it wasn’t because they invented a ton – they did invent a lot of net new technology, but the bulk of it, they didn’t. But what they did remember was how it worked, and they were unafraid of going back to those foundations, and where it didn’t make sense ripping it apart in service of that user experience. They were like “No, you’ve got to be able to uncrate an Oxide rack, plug a power thing in, and it’s got to turn on and just work. And if it doesn’t work like that, it’s not good enough.” And so they’re like “We’ve got to get rid of the BIOS, we’ve got to get rid of –” There’s all this stuff that’s in the way of it just working, and so they just ripped it all apart. And the opportunity we have now in the era of the cloud and all these managed services is that the people who remember and/or learn how these things work in a deep way, and fall in love with them, will be able to then use those things to create new ways of working, and new ways of thinking, that will be magical to other people… Because they can’t even fathom that you could still work at that layer. And it’s a real superpower from a startup point of view. Like, if you’re listening to this podcast and thinking “Should I start a startup?”, if there’s weird stuff like that you love and you know that other people don’t know and love, you can crush it by using that specialist knowledge to do things that other people think is impossible… Because people just forget that we built all of this. Like, all of this was just normal people like us, who just did it over time. It’s just the accretion of those people’s choices over time. And you already did it. We could do it again. We can do whatever we want. And that’s so interesting and empowering.

Have you been to big enterprise conferences lately? Like, it’s that, but times it by a million. Everything is the same AI startup, which is why I think Wiz did so good to get that valuation, because going to Google Next, everything was the same, and Wiz stood out because not only did it allow you to be multi-cloud, but it gave you so much observability. And I think what you’re doing is like Wiz on steroids. It takes that what they have made noticeable, but it makes it usable, and it’s going to be even more successful.

Oh, thanks. I hope so. I mean, we’ll see. But I think it has every – I think it will. I mean, I think it has everything it needs to do that. But beyond us, it really is – and one of the reasons I love coming on the Changelog, and I’m now stoked to have been on Ship It, and hopefully I get to be on Ship It again someday, is that I love this thing. People who build infrastructure – I love infrastructure best. When people say to me “Oh, I’d rather –”

Dark Souls in infrastructure.

Yeah. They’re like “Oh, I never want to think about infrastructure again.” I’m like “Are you for real? This is the funnest –”

This is how you know Adam likes hard things.

[01:26:04.06] It’s the funnest part of the game, because it’s so complicated, and there’s so many details, and they all have to fit together… And I love the complexity of it. And I’m not alone. There’s a ton of people. That’s why this is the career they gravitate to. And I think there’s a lot of untapped potential in our people realizing that they can take that knowledge and that love of that game, and turn it into things that are bigger than they think. Like, there’s real – there’s a real game to play here right now. And the fact that people have decided that it doesn’t matter… Nothing’s better than having that point of view where fundamentally you’re a little bit of an underdog, and people don’t understand what the details are, and then you just crack that ball. And I don’t think I’m alone in being a person who can try to do that. I think across our industry, now’s the time; that confluence of like AI, the cloud, Kubernetes - all these things that have been like “Oh, infrastructure doesn’t matter. It’s going to all get abstracted away.” When the cloud happened, all of it was toil, and it was all mud, and we were going to get rid of the mud… And now we’ve built it all in the cloud and everybody’s like “How do I manage the cloud? I’m stuck in the mud of the cloud.” And it’s like “Yeah, of course you are, because it’s the domain.” It is inherent in the work.

And so the opportunity for us to continue to make those things better is wild, and there’s not enough of us doing it. And so I hope more people decide that what they’re going to do is go build some crazy stuff. And if you want to go build crazy stuff, come show it to me and I will talk about it on podcasts. I will do everything I can to amplify the crazy, because there should be more of it, because the opportunity is so big.

I completely agree.

I hope that’s what comes out of this depressing AI sunken place of tech right now.

It will be. It is.

I hope it’s – I’m so excited to see startups that are solving real problems that we actually asked for…

Have faith. It is what’s going to happen. Because in the end, what matters is what changes people’s actual lived day-to-day lives.

Making their lives better.

And if it doesn’t impact their day-to-day lives, it doesn’t matter how good an idea it is. And it doesn’t matter how cool the technology was. It has to actually impact people’s actual lived experience.

We’ve lost the way of caring.

AI is going to do that, but not in the way people think that it will. People are like “Oh, the way this is going to impact our lives is by decimating what we do.” No, no, no. It’s going to change. It’s going to augment the experience of what we’re already doing. But if it doesn’t, then we won’t do it. Because who’s going to use it to do it? Like, who exactly is the person who’s doing it?

But I feel like we’re in this weird place where they’re not listening to customers.

They’re not.

They’re not listening to what people want. They’re just like “I made this cool thing”, and I’m like, that is the breakdown of being good in technology, is when you no longer build things for a customer and you’re building it for your ego.

Here’s another hot take for someone that I love. Jason Warner from previously GitHub, now Poolside - they’ve just raised $500 million to go and build a better AI thing for programming… But look, even my expression of exactly what it is is hard. Because if you go to the website, it kind of tells you, but it sort of doesn’t. And then if you want to try it - I could email Jason and be like “Let me try it”, and he would let me, and that’d be cool… But I don’t really know what it is, I’m not quite sure. And the reason that he can have all that cash is because we’ve collectively decided as an industry that this is like a transformative moment that’s going to change the shape of everything. It will probably, but will there be winners and losers? Absolutely. All the venture capitalists who are making those bets, they’re just making them.

Which one wins? Well, it’s going to be the one that people actually use to change their day-to-day life, in the flow of whatever they’re going to go and do. And if you’re an infrastructure person, you should be the least worried about AI, because the things that AI needs in order to make those good decisions - that does not exist.

Documentation.

[01:30:05.28] It just doesn’t exist.

Which is funny, because we’re automating our way out of good documentation.

We totally are. But it doesn’t exist. And it’s really hard to train it. You can train it on the code, but the code doesn’t mean anything. And the code doesn’t tell you enough about whether it’s good or bad, or whether it works or it doesn’t…

Which is funny, because we’re going to make a whole generation of engineers who don’t know how to do a lot of the things that they’ve abstracted away, which are going to make it that much harder for them to figure out what went wrong, and it’s going to – I don’t even know.

Well, but it’s going to create incredible opportunity for people who do. And so even if that’s what happens – because it is what’s happening. Right now, a lot of cloud engineers - we replace source control. We had to rebuild source control on top of this graph system that we’ve built. That’s an insane thing to decide to do. And the only reason I didn’t think it was that insane was because I remember what it was like before we had source control. I was there. And so I’m like, well, I remember when we didn’t have CVS, and then I remember what happened when we did… And I remember when you could read the source code to CVS and be like “That’s how source control works.” But if you say it to someone who’s only ever used GitHub, they’re like “What do you mean you rebuilt source control? Source control is air. You can’t rebuild air.” And you’re like “That’s not air. It’s just stuff.” It’s like when you meet a mechanical engineer and they actually – like, they made a motorcycle or something out of spare parts, and I’m like “What kind of *bleep* wizardry is that?” Like, you just built a two-stroke engine because you felt like it? And they’re like “Yeah. It was no big deal. It was two hours of work.” And I’m like, I would die before I built a two-stroke engine, you know? And that’s what I mean.

You gave me hope in technology all over again.

The infrastructure people, our people, in this era - they should have the most hope, they should have the most opportunity. Because right now, they have the most arbitrage between what we know and what the rest of the world doesn’t.

Guys, Adam just gave us the pep talk that we needed in tech…

Get paid.

…so hard. I feel like I have a future in tech right now.

Because you do.

Also, I’ve never considered working for a startup so hard… Like, I might leave FAANG and go to a startup and like build something cool… Because I’m so tired of building –

Look, Autumn, you’re really good at the System Initiative pitch. So…

Just saying. Call me if you get any spots open. I just want to solve actual problems…

But it’s real. And whether you join System Initiative or you start something on your own - this is what I mean. The part of what I’ve gotten to do in my life is the business side of it. And the business side of this, the opportunity, not just for System Initiative, industry-wide - it’s never been better, never been bigger open field running for the infrastructure startups and for new ideas. You can clean the table. You just have to decide that you’re gonna. And yeah, I hope people do, because it’s so – it’s like right there. It’s annoying to me.

A week ago when you posted, I shared on LinkedIn and I was like “Finally, it’s something solving–” You know, something I’m excited to use for – do you miss that? Like, everything, you just was like “Oh, it’s another one.” Like, you know? Like, finally I’m like [unintelligible 01:33:25.08]

Yeah, that’s why I spent the last five years building this one. That’s exactly why. Because I was like “Oh, I really want that. And if I want it in my life, I’m going to have to build it myself. And I can, so…”

Adam, we absolutely have to have you back on the show in like a year to hear how this has progressed, and the things you’ve learned, and the crazy things people have done… Because that’s gonna be awesome.

Dude, I’m rooting for you. I’m so excited.

Thank you so much for coming on the show, and thank you for telling us everything from Twitter dumpster fires, all the way to hope in the tech industry. This has been awesome.

It’s the pep talk I needed, Adam. There’s hope.

I’m so glad, because I think it’s really true. There’s no bull****. That’s the reality.

Thank you everyone for listening, and if you haven’t tried it already, go look at System Initiative. See how it’s different than the drudgery you’re doing today, and maybe what it could be doing for you in the future. Really fantastic just to be able to discuss all that and see what you’ll do.

Thanks, friends. And yeah, if we can be helpful, you know how to find us. We’re easy to find. And yeah, if you have a startup you want to run, you should shout. Not that I write checks, but I’ll help you find money. Let’s go. Get crazy.

That’s awesome.

Alright, thank you so much.

Changelog

Our transcripts are open source on GitHub. Improvements are welcome. 💚
