Changelog Interviews – Episode #573

Amazon's silent sacking

with Justin Garrison

Justin Garrison joins us to talk about Amazon’s silent sacking, from his perspective. He should know. He works there. Well, as of yesterday he quit. We discuss how the cloud and Kubernetes have transformed the way software is developed and deployed, the impact silent layoffs have on employees and their careers, speaking out about workplace issues (the right way), and how changes in organizational structure can lead to gaps in expertise and responsibility, which in turn can lead to potential outages and slower response times.

By the way, we officially let the cat out of the bag in this episode. Justin has joined the ranks here at Changelog and is taking over as the host of Ship It! Expect new episodes soon.

Sponsors

Neon – The fully managed serverless Postgres with a generous free tier. We separate storage and compute to offer autoscaling, branching, and bottomless storage.

Socket – Secure your supply chain and ship with confidence. Install the GitHub app, book a demo or learn more

Sentry – Get $100 towards your error monitoring with Sentry! Use the code changelog.

Fly.io – The home of Changelog.com. Deploy your apps and databases close to your users. In minutes you can run your Ruby, Go, Node, Deno, Python, or Elixir app (and databases!) all over the world. No ops required. Learn more at fly.io/changelog and check out the speedrun in their docs.

Chapters

1 00:00 This week on The Changelog 01:21
2 01:21 Sponsor: Neon 05:21
3 06:44 Start the show! 02:02
4 08:46 We're bringing Ship It back! 04:33
5 13:20 Let's talk Kubernetes 02:57
6 16:17 Time at Amazon and silent sacking 06:30
7 22:47 Amazon's incentive structure 04:27
8 27:28 Sponsor: Socket 03:24
9 30:52 Whistle blower moment 02:29
10 33:22 What's been the response? 07:03
11 40:24 No more pizza teams 04:22
12 44:46 There will be a major outage 05:27
13 50:31 Sponsor: Sentry 01:23
14 51:54 Jerod's offensive linemen analogy 10:39
15 1:02:33 Similarities with Twitter/X predictions 08:12
16 1:10:46 Let's talk Ship It 06:06
17 1:16:51 Sponsor Ship It 02:22
18 1:19:13 Up next! 01:42

Transcript

Well, we are here with Justin Garrison, the new host of the Ship It podcast, maybe you’ve heard of it. Justin, welcome to the show… I guess welcome to the Changelog network.

Yeah, thanks so much.

Fun times ahead. It’s gonna be a good adventure. The shipping game is fun, and the Ship It show has been a lot of fun for us to produce, and a lot of rave around it… We certainly have Gerhard behind the scenes still doing infra here, but not involved in the show directly for the moment… But just, how you feel about this? How do you feel about this podcast, and some of the plans that you’ve started to put in place?

Yeah, I mean, I feel great. As a fan of the show, and what Gerhard was doing with the show - that’s the area of technology and of software that I have traditionally worked in, and the areas that excite me. It’s that moment when code becomes software. As soon as we take the bits we wrote on some storage medium and add electricity to them, and they run through CPU - that’s the software side of it, and that’s the part that excites me. Code is fun, it’s challenging, there’s a lot of things there, but I really enjoy what happens when we actually make this thing run. And I’m really excited about those things. So it’s everything after that moment of “You commit the code. Now what happens?”, that’s where I – I love talking about this stuff.

So this will be your first time with us, but not your first rodeo. You’ve been making opsy/cloudy – I’m not sure how you… Maybe you talk about the cloud and the way that you frame and talk about it - is it DevOps, is it infra, is it SRE? …whatever it is. You’ve been making content in this world for a while, and working in this world for a long while. You are currently working at AWS on the Kubernetes team… But we’ll talk about that here in a few minutes. That whole scenario is – it’s a debacle, I think, but it’s an interesting one. But talk a little bit about your history of content creation, the stuff that you’ve been doing, and it’ll help people see why we think this is a great opportunity for Ship It to come back working with you.

Yeah, I mean, the first podcast I ever did was in 2007, which is now ancient history for the podcast ecosystem. I ran a podcast called Linux Mint, or mintCast. It was for the Linux Mint community. I was a fan of Linux Mint. I was using it, I had discovered it while I was in college, and I was like “This is amazing. It’s free software.” As a broke college student I just – I wanted to give back. And the only way that I knew how to give back as a non-developer, as someone that – at the time was like, I was just too embarrassed to write any code whatsoever. And I was just like “I can tell people about it. I can just tell them the news, and I can tell them where to find things.” And I love that community building aspect of it. The mintCast Podcast is still going on, which is amazing to me. We handed it off a long time ago, but they’ve kept doing it, and it’s awesome to see that sort of flourish for new people every generation.

But ever since then, I first had a conference talk based on the mintCast Podcast that I did. I showed up at a local Los Angeles Linux community group, and they were like “Hey, we want someone to talk lightning talks.” I was like “I don’t know what that is.” “Well, just go and talk for five minutes.” I was like “Oh, cool. I run a podcast using all open source software, and we publish it, and it’s about open source.” And this was an open source conference, and they were like “Oh, great.” Someone in the audience – it was a room of 15 people. Someone in the audience came up to me after and was like “I listen to your show.” I was like “Are you kidding?” That’s when it became real to me. I’m like “Wait, the thing I did in my living room on the weekends, and editing this thing… That actually became someone in real life.” And then having that full cycle kind of happen was just amazing to me. And then ever since then, I’ve been blogging, I’ve been doing conference talks… I was doing all that stuff as an engineer. I worked at Disney for six years on feature animated films, on Disney Plus, I’ve been at Amazon now for three and a half years working on the EKS ecosystem and products… And so I’ve been around different environments - mostly on-prem - in my past, and then in the cloud beyond that… But I’ve been just involved in open source communities, trying to give back in those ways, and sometimes making content and training people [unintelligible 00:10:57.26] It could be a blog post, a video, a podcast… I wrote a book a few years ago with a great friend… And those are all content pieces that people can use, and they can make themselves better asynchronously. And I love that aspect, of I can spend some time now, that will have echoes for a long time.

Yeah. That’s still cool for us as well, when people say “Hey, I was just listening to that show you did about X”, and I’m like “That was four years ago.” And they’re like “Yeah, but it was very interesting for this reason.” And you’re like “Oh, wow, that was something that happened…”, like, it’s so asynchronous… We’re talking years of time, and it’s still out there for people to consume. Of course, books, blog posts - they all kind of can get stale in the same way, but they all have that exact same longtail of value that’s just so cool.

[00:11:46.18] And I love having those conversations with people, “I made something four years ago.” The book I wrote was 2017. Six years or seven years ago now. And people come up to me and ask me about it. “Hey, Cloud-native Infrastructure. I read the book. What do you think about this?” I was like “Hey, let me tell you about everything I’ve learned in the last seven years.” And that’s the great place, because we can start at the same moment of “Hey, that thing I did… Here’s everything I believed at the time. Here’s why I believed it.” You know the context. But now, what is it today? And we can start from the same point. And that’s a great way just to talk to anyone. A small chat in line, of “Hey, how’s the weather?” We all are experiencing that at the same time, and content in a lot of ways is similar, where you get the starting point to have a better, deeper conversation.

Yeah, podcasting has some legs, you know… It’s interesting, even 2007 – I began in 2005. It’s just crazy to think that we’ve been podcasting for so long, basically. That’s insane.

And everything old is new again, right? I mean, I see a lot of those patterns from what I was talking about way back then is coming back again, and it’s just “Hey, we have new branding around it”, or there’s some new tools, and maybe it’s better than it was 15 years ago… But the ideas - a lot of ideas are still the same, and a lot of the people that are trying to get into it and learn it are coming for the same motivations.

And so a lot of the people haven’t really changed as much. It’s the same sort of desires, and they want to learn, and they want to have better lives for themselves, and they want to be better people, in a lot of ways. “Hey, if I learn this thing, I’m reaching this goal.” And they push themselves, and they subscribe to podcasts and read books so that they can push themselves further.

So you’ve learned a lot in the last three and a half years working at Amazon, on the Kubernetes team. I want to talk about – I mean, we all want to talk about the silent sacking thing that’s going on, because - wow… But before we do that, let’s just talk Kubernetes for a few minutes. You’ve been deep into it; here you are, it’s 2024… What’s Kubernetes looking like? Is it still sucking all the air out of the room of the cloud and DevOps? Is it kind of where people are over it, but building on top of it? Are they going around it? Like, what’s your perspective on the cloud in light of Kubernetes right now?

I think cloud changed a lot of things for a lot of people, obviously. I was working on-prem in a data center, and we needed a new server, and I had to go send an email. And that was the first way – I was like “Hey, I need…” I basically needed a prompt. I have to wait three months, then go through procurement to get that prompt, so I can do some work. And that distance between “I need to do something” and actually starting to do something was so vast, that it was just – we couldn’t keep doing that, because I had to keep projects going every three months. And cloud obviously shrank that time dramatically, and just said “Hey, it’s just on-demand, and you only pay for it on-demand.” And when I was first learning Kubernetes, I would go on my lunch break and I would spin up a GKE cluster, because I could get it in five minutes. I would poke at the API, I would deploy some services, I would understand how some of it worked, and I would turn it all down. And over a month of doing that, my cloud bill for poking at it at lunchtime was under $1. It was just “Wait a minute… I easily can just go learn this stuff and it makes it so accessible to be able to figure out how stuff works and not pay basically anything.” It’s not even a Starbucks drink. That’s the threshold for “Is this too much money?” It’s like, no, this is just – I lose this money by dropping coins, or something. This was a great investment in my time.

And the cloud still does that. The cloud still pushes people forward, to give them access to those technologies, especially new technologies. It may not be just the VMs and networking anymore, but someone says “Hey, I want to go run a serverless website. Let me go figure out how Lambda works.” Actually, the free tier for that is amazing, and you can just run that for a while, and you can learn a lot of things through doing that.

Then still, just the cloud is expanding more and more, and making a lot more new technologies accessible to more people around the world. And that’s great. Kubernetes on top of that is one of those features. It’s just hey, there’s a standard API that you can kind of interface with.

My first involvement with Kubernetes was basically I deployed it on prem. We’re doing this from bare metal, we’re spinning it up… I was one of the SIG co-chairs for SIG on prem, which doesn’t exist anymore inside of the Kubernetes community… But it was just – we were just trying to do it on prem, and there was a handful of us that were like “This is really hard when I don’t have an API. And when not everything looks the same.” So what does that look like?

[00:16:01.00] In my time at Amazon I actually helped with the product for EKS Anywhere, which was helping other people do Kubernetes on prem. It was just this recurring thing for me, where it’s “Hey, this was hard when I was at Disney, it’s hard now when I’m at Amazon. Let’s go ahead and see if we can help people do this a little easier.”

Well, you’re talking about your time on Amazon in the present tense, you’re talking about it in the past tense… We know why. Let’s get everybody in on the story. So we’ve been talking with you over the last three or four months, so we’ve known about this situation with your work at Amazon… I think it was happening and you’ve been waiting, you’ve been waiting, waiting, waiting… Tell everybody the story with Amazon and these silent layoffs, and what’s been going on. You recently went public with a blog post about it, you also recently went public with your involvement in Ship It, so good timing all around… You know, January 2024… It’s time to start making some announcements… Tell that story in brief, and we’ll dive into the details after you’ve laid it out.

Yeah, this is my first time in a professional role doing Dev Rel, doing content creation and doing this style of work. I’ve always just been an engineer, and I love this job. I felt I was good at it. I enjoyed engaging with people and learning something deep technically, and then telling people how it works. And that cycle of back and forth, and then being part of a product that I know a lot of people used was also just wonderful. That was something that I joined Amazon for, and I really liked a lot of things at Amazon with how they were working on things, and that involvement to get some outside voices or non-engineer voices necessarily. I’m not writing the production code. I’m helping guide the product and then testing it and saying “Hey, this is how customers should or shouldn’t use this thing.”

And I’ve had experience across the board with – I helped launch App Runner, I did a lot of work with ECS, I had some talks I’ve done about Lambda… This is really broad, because no one uses any of these services in a vacuum. And I liked that breadth that I was able to just “Hey, I can pick out any of these things.”

This year - well, last year. It’s 2024. In 2023 Amazon really pushed for return to office… And I was hired as a remote employee before the pandemic. I started during the pandemic, but I had my contracts and negotiations and everything before, as a fully remote employee. I was remote at Disney before, so I was used to that. I needed a remote job. And in 2023 they were really saying “Hey, we’re gonna start returning to office.” I was like “I’m a Virtual Employee, I don’t have an office.” The closest office to me is maybe an hour away, maybe two hours away with LA traffic. I’ve gone there a few times to meet up with people and to have some meetings, but it’s not a regular occurrence. And I was always told that my role would stay remote, and I didn’t have to worry about it.

And then things started changing, more and more noise was happening during the summer; it was actually more people are gonna start coming back to an office part time. And I kept being told over and over again by management and my leaders “No, no, no, you’re fine. You’re a remote employee.” My entire team is not in a location. The DevRel team is all across the United States. And when I joined, we were all over the world. I was like “We don’t have a timezone, let alone a location.”

And so as things started progressing, it was becoming more and more clear that this was going to affect me at some point. And we filled out forms to get a remote exception, which was for a one-year thing; I had to renew that every year. And I got my approval for remote exception just days before I was told that our team was actually going to be disbanded, and our team would not exist anymore under the Kubernetes org, as part of the Kubernetes product team. And they wanted to get rid of this sort of DevRel product space under that product, under the service team. So that was like “Okay, what happens?” There was a handful of us on this team; what do we do? And they said “We want to give you time to find another job internally. And if you find a job, great. Go ahead and take it. You can shift internally.”

Amazon’s also been mostly on a hiring freeze for over a year… And so that was like “Well, there’s a lot that’s not going to be available”, and most of those other teams are also requiring me to go into an office. And so this wasn’t a matter of “Oh, let me just shift and start doing new work.”

[00:20:04.27] I had to find a team that was local in LA, because I couldn’t just go to any office. I had to go to my team office. “Return to team” is what they actually deemed it. And so if I wanted to go to the local office, I had to find out what teams worked from that office, and then see if they had openings, and then work with them. There were a lot of barriers, and I just kept getting more and more frustrated with some of that, “Hey, this isn’t actually just an easy switch.” So there’s not a lot of teams hiring. If they are, I’m probably going to be in a space I have to leave anyway.

And the more people I talked to, I’ve found that more and more people across different areas, different services and different divisions were hitting some of these same limitations and same frustrations, where they’re saying “Actually, I have to find something else or move.” And in many cases, it was just “Hey, well, how do you get out of this situation?” And the official email from Amazon was like “If you don’t return to an office, you will voluntarily resign.” I’m like, that’s not a thing. [laughs] That’s not how it usually works with this sort of employment contract. Because this is a contract. It’s like, “I give you some time and some value, and you give me money.” And that’s how any job works.

And when the rules changed, when the contract changed, it became more of a frustration. I’ve found that a lot of times people were just silent, and they just said “I don’t know what to do, because I can’t do anything.” So they would just sit around. I’m going to answer some questions as they come…

And I finished out my work for what I was already scheduled to do, and I asked for severance. And I said “Hey, this isn’t working. I’ve talked to a bunch of teams, I’ve looked around, but this situation is just not working.” And that was where that frustration was really coming from, where I was like “I can’t do anything, and I’m not going to find another job, and I’m not going to voluntarily resign, because you got rid of the job I loved.” It was like, in any other case, when someone gets rid of the position that you really enjoy doing, there’s some monetary or some situation where it’s “Hey, we’re gonna close out this contract.” I know, it’s an at-will sort of thing. You can be let go at any time. But also, I know there are some labor laws that protect some people in situations.

But I wanted to write that blog post mainly to give voice to all the people I talked to that were like “I can’t say anything, and I can’t do anything, and I have no network of external people.” And a lot of them are fresh into technology from the past two, three years. They’re junior engineers that are just being forced out without any sort of compensation, or connections, or anything else, and I was like, I really wanted to give them a voice in a lot of ways, just by sharing my own experience and saying “This is what I’ve gone through, this is what I’ve seen, and I don’t think that’s right.”

And in this post, you lay out Amazon’s – obviously, it’s hard to give personification to a blob, but kind of the Amazon incentive structure on why this might be the case; this, not doing layoffs, but just, I guess, removing positions, and not people. I’m not sure exactly how you say it, but just taking your role away… And I guess they would rather pay you to do nothing, than lay you off… Can you explain your thoughts on why that is?

Yeah, I mean, looking back at the last year of earnings calls, and stock prices and all that - actually, the times that there’s a problem is when you have to say “We have to lay off some percentage of our company.” And most companies – lots of companies did that. This wasn’t an Amazon-specific thing. Lots and lots of technology people were laid off, unfortunately. And many of those public companies, they take a dip for a little while, and then things start to settle out again, because the overhead of creating the products got less, because just people were gone. Whether they were doing more work or not, they were able to recoup some of that money that they were spending on the people that were hired. And at some point, people forget about it. They’re just like “Oh, I forgot you laid off 10% of your staff. And it doesn’t matter, because I as a customer, I’m still fine. I’m still getting value from the thing, and my value hasn’t changed whether you had those people there or not.”

[00:24:15.08] And I look at this a lot of ways like, when Netflix raises their prices, everyone feels it. Everyone’s like “Another $2 a month for that streaming service… Am I getting $2 more value out of this thing?” And they needed to make more money to projection-wise be able to sustain.

Right.

The other way you can do that is to cut things out, which I feel like is a lot of ways that people – like, if I look at my budget and I’m putting it out six months, and I say “You know what, I’m gonna run out of money in a year.” What am I going to do? Am I gonna go get a second job? Or am I gonna go stop having Starbucks every day? Those are the choices that people make a lot of times, and companies can make very similar decisions. Sometimes they will make both decisions; they’ll look for new revenue areas, but also they’re going to save money in various ways. And the people at a company are almost always the most expensive… But when people leaving affects the stock price, that’s way more expensive.

And that stock price value is way more valuable than hundreds or thousands of people. And so if people are quietly leaving, and no one really connects all of the dots, and says “Hey, actually, I saw 100 people on Twitter leave”, that doesn’t affect the stock. But if they say “We laid off these divisions”, that does. And so there’s different monetary – so I can let 100 people leave on their own over three or six months, and I will still lose less money than announcing they’re all gone, giving them three months’ severance, and then having the stock price kind of do a rollercoaster. And so I don’t know for sure, I’m not running Amazon, but these are trends that I started seeing; not just Amazon, but other companies and people I’ve talked to in the community. And that sort of like silently just coasting people out, and saying “Hey, you’re not going to have any career progression. Hey, I’m not going to give you any new interesting work. Hey, you can’t switch teams.” What are people supposed to do? These are their lives, and it’s like “Well, now my projections for six months are like I don’t know when my job is going to end” at that point. You’re just like, “I know that at some point I will not have a job. What do I do? And my contract says I can’t go get another job. So that limits what I’m gonna do. And I can’t just spend no money. So I need to start saving now, and then figure out what’s going to come next and start making those connections.”

And so we’re doing the same projections at an individual level, to say “How do I make sure I can provide for my family and live in six months?” Especially if you’re in the United States and you need healthcare, and you need these other things that are just – you have to pay for them. And the cost of living not only generally goes up, but as I get older, I’m like “Wow, my subscription to life costs more money.” Because I have more medication, I have more aches and pains, I have all these things that have to happen, that it’s just like my baseline when I was 20 is not near what it is now that I’m 40. And those sorts of things pay out, and I have to figure out “Okay, what does it cost to raise my family? And where’s the end of this job, or this paycheck, or these things that I’ve just become accustomed to?”

Do you feel like a whistleblower in a way? Because I feel like this is like a whistleblower moment in a way, because you’re revealing what no one else sees because you have a certain purview of the scenario. And you’re still employed there, right? And so you wrote - not a scathing thing, but a very factual, and I suppose lots of opinion in there of what you assume… Because like you said, you don’t run Amazon, and you’re not a financial advisor, so hey, this is not financial advice either at the very top of it… But it’s kind of whistleblowery - if that’s a word - because it feels like not everybody really realized that there’s this silent sacking as you’ve brought up… But you’re still there.

Right. And I do feel like I collected a lot of that. I took what I was seeing everywhere… Because when you’re on the outside, you don’t pay attention as much to a company, or you don’t have all the connections with one company.

You have the inside.

When you at least work there, I have all of the connections, the people I know that left. And all these people, I’m like, I loved working with these people, and they’re all gone. And now what do I do? And so I’m basically just collecting all of those LinkedIn posts and all those Twitter posts and all those things and just saying “Hey, why’d you leave? What was the reason that you’re not here anymore?” And everyone has their own reasons, everyone has their own situation, but a common thread was “No career progression”, or “My boss said I had to return to office, or move, or resign. And those were my options.” And so there is a little bit of a whistleblower, only from the, like, it’s going to hit customers at some point. You can’t keep doing everything you were doing before with that few of people. And at some point, the trust, the hard-earned trust that Amazon has built up over the decade of running a great cloud, having awesome operational excellence, and then getting rid of a lot of the people - you can’t do the same things. And as a customer - I was a customer at Disney, and I was a personal customer… I very much – like, I don’t know that this is going to be the same thing in 2024 or 2025. There’s going to be some consequences to losing people that run these services. Because any software that we’re running - the Ship It podcast is all about how do you run it, and how does it maintain itself? As much as we want to say things are self-healing, they only are to a degree. And at some point, you need someone to be able to troubleshoot and fix those things… But those people also need to take vacations, and go to sleep, and have rotations for on-call, that sort of stuff, too.

Right. So you published that post five days ago as we record this… What has been the response? Have you had any response from anybody at Amazon, or has it resonated with other folks? It seems like usually when you blow the whistle, people perk up and listen. Has it made a splash at all, or was it just kind of like “Hm…”?

I will say, more people saw it than I expected. I mean, I wrote it December 30th. This wasn’t a great news cycle time. I was just at the point where I was like “You know what, I don’t know when this is ending, and I’m okay with that… But I need to tell people about what I think is going on, and why I think it’s not a good situation for a lot of people.”

I’ve had so many DMs from people currently at Amazon, or previously at Amazon, that – I have yet to have anyone disagree with me. No one has come up and said “You’re wrong for this reason.” I would love to learn more. If someone has a story or a situation that I pointed out something wrong, please let me know. But just from my own experience - I’ve tried to just share my experience, because that’s what’s protected under labor law in California, is sharing my experience about working conditions.

Right.

So I was trying to stay very much in the lines of “I’m not sharing confidential internal information. I’m sharing my experience and what I’ve seen as a pattern.” And the amount of people that have reached out and made new connections to agree with me has been very surprising, because I only knew of a small sphere of people that I directly worked with, or I knew their names, or followed them online. And that was the sphere of pattern I was seeing. But seeing this play out in other countries, in completely different divisions - that has been really fascinating. It’s just like “Oh, this is actually a much bigger deal than even I knew from what I could see.”

What about internally at Amazon, your higher-ups? Has anybody said “Alright, Justin, you’re fired”, or “We’ll hook you up with some severance. Sorry about that. Please take the post down”? I don’t know, has anybody said anything?

So I know that the post was escalated and reviewed by HR and legal teams at Amazon… And in both situations they said I didn’t break any policies. There was nothing that I went outside of. I’m not breaking any rules for having a personal opinion on the internet.

Right.

And that was the only communication I was given back, was there would be no discipline, and I wasn’t breaking a policy.

And you’re still employed there, you just don’t have a role.

How long can this go on?

[00:35:46.23] I don’t know. And I asked my leadership team, and my VP, and my skip-level for severance in October, and I said – when this all happened, I said “Hey, what if we don’t find a job? What if we don’t find something else internally?” And they said “Well, you should leave.” I’m like “No, what are you gonna do? You just got rid of that job that I really, really liked. I really enjoyed this job, and I enjoyed working here. Is there any severance?” And they told me yes. And they said “Yeah, that’s a possibility. Once we go through all the other options.” I’m like “Cool. So I’m going to try the other options first.” And so in October I asked for it, and every week since then I would send a message to my leadership and say “Hey, by the way, I’m still here. [unintelligible 00:36:27.06] emailed you for severance to be able to let me go. I know I could leave at any moment. I know I could just say “I’m done, I’m out.” I feel like that’s not the right thing to do from a company perspective, when you’re forcing people out this way. So I can stay here. I’m not in a rush. Obviously, I’m getting paid. I’m learning a lot of great things. I started my own podcast last year, that I’m not going to continue with Ship It, but my job was to learn things, create content and help guide customers. I didn’t stop doing that. I frequently was always able to – I went to KubeCon, I had great conversations with people, I’m around, learning things, I’m doing YouTube streams, I’m doing podcasts, I’m doing blog posts… I’m doing all the things that I was doing before, I’m just not getting official work from the product team, which - I wasn’t always. This wasn’t like “Oh, I had to wait for them to tell me something.” No.

One of the great things about a DevRel position is I can take the initiative and say “Hey, you know what? I really want to learn about this new thing. Let me spend a week on it. Let me figure out how that works, and then go tell people about it.” And I’ve done a lot of videos over the last couple of years on TikTok, and YouTube shorts, and running my own YouTube channel… And those sorts of things are just – I’m still doing them. I didn’t stop. So it’s like, what I’m doing is still the same stuff. Maybe not every hour I’m like “I have to do this thing right now.” Because I don’t have assigned tasks ever since mid-October when I finished out what was assigned to me… So I’m here, I’m still doing stuff, I’m still learning stuff, I still love the Kubernetes community, and open source, and running infrastructure, and cloud… I’m still doing that stuff. I’m just not getting new work.

You mentioned the RTO thing, having to go back to your team’s office… Do you have to do that then? So is that a requirement for you to go into the office? And has that been a burden, if you have?

I’ve still had a year remote exception. So under my current role, I’m fine until August. I don’t have to go into an office.

Well, you know what Milton did on Office Space, right? Do you know what Milton did?

Yeah… [laughs]

[00:38:27.26]

“Mr. Lumbergh told me to talk to payroll, and then payroll told me to talk to Mr. Lumbergh. And I still haven’t received my paycheck; and he took my stapler, and he never brought it back. And then they moved my desk to storage room B, and there was garbage on it, and I really don’t appreciate –”

“Why don’t you go back down and sit at your desk? Mr. Lumbergh should be here any minute.”

“Mr. Lumbergh –”

“Just go sit at your desk, okay?”

“Okay…”

He just kept showing up, he just grabbed his red stapler, and he just went to work, every day… And he had his cake at the office parties… But suddenly, they fixed the bug and they quit paying him.

Well, and writing that blog post I felt very much like I’m in such a privileged position that who wouldn’t want –

You just keep taking the money.

…this couple months that I’ve had of “I get paid very well, I get to learn whatever I want, I have resources that I can run things in cloud environments, I can have all the access to learn and do the things that I know…” So many people would just absolutely love to have that. They would love to have a couple months to go learn new technologies. And at some point, I’m just like “I shouldn’t say anything. I should just stay quiet and just keep doing this.” But I knew everyone else that I talked to, that had no voice whatsoever, and they were being forced out, and they didn’t get this reprieve of a month or two to learn some things; they were the ones that I wrote the blog post – that wasn’t necessarily for me, it was to give them a voice, and to say “Hey, this is something that I hope Amazon and other big companies that are doing this stuff hesitate to do…” Whether they stop or not, at least they know “Hey, someone could talk about this, and this is not a good decision for our employees. This is not something that benefits our employees. This just benefits us. And so how do we make sure that we take care of our employees?”

[00:40:13.07] And I’m okay, because again, I don’t have a role or anything. The moment I’m let go is the – I was already planning that. This isn’t something that is new to me.

Right.

Let’s talk about the ramifications of this… You talked about the No More Pizza teams, and how teams were lean before, and now they’re being emancipated… Or emaciated, sorry.

Emaciated.

A little different, a little different.

My bad. Different word, same E-letter start… And then you predict outages out ahead. Help us understand where – since this is not you breaking the rules, and this has flown by Amazon’s legal and HR, and they’re like “Hey, you haven’t broken any rules. This is not inside information. This is an opinion”, what makes you think that, the outages ahead for this next year?

If we go back and look at DevOps, when DevOps started, DevOps was this thing that – originally, it was your team that ships a product is full stack, right? You don’t have any external dependencies. I worked at Disney for six years, and that was very much not that. Once you have a DevOps team, you’ve lost to DevOps, in many ways. That’s the centralization of what should have been split out around the company. And Disney, at least from my experience, was very centralized. And we had a database team, and we had a network admin team, and we had a compute team… And you centralize the expertise, and then every service picks and chooses from those things. Amazon was the opposite. It was almost exactly the opposite, where every service team [unintelligible 00:41:34.27] or they call them service teams now; [unintelligible 00:41:37.16] is the old word… But in general, you have every one you need to do the job. You don’t have an SRE team that checks things that are alive; the developers who wrote the code are the ones that are on-call. And you don’t have a DBA team, and you have services that you rely on. We use RDS, and all these other things internally, because we just create those services… But it’s a full stack team through and through; it’s what I considered a true DevOps team. And seeing how much duplication there was across the board… It was like, wow, actually, having a DBA on every team is really expensive. Like, that would be great – because they don’t need it all the time, but you have to be kind of a generalist at some point. You can’t go as deep in some of those areas. And if the shift is really “We’re going to lose a lot of people, and we want to maybe run things a little leaner, the leanest way to do that is to centralize the expertise.” And you centralize who knows how to do something deeply, and then everyone picks off of their queue. 
You add another ticket to their queue, wait for it to come back… But things slow down, because that queuing system just takes a little while. That also causes a lot of gaps. There’s a lot of areas that kind of fall out.
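The slowdown from a shared ticket queue can be sketched with a textbook single-queue model. This is only an illustration, assuming one central team working as an M/M/1 queue; the function name and all the rates are made up and nothing here describes how Amazon or Disney actually route tickets.

```python
def mean_time_in_system(arrival_rate: float, service_rate: float) -> float:
    """Mean time a ticket spends queued plus being worked on (M/M/1 model).

    arrival_rate: tickets filed per day by all service teams combined.
    service_rate: tickets the central team can close per day.
    """
    if arrival_rate >= service_rate:
        raise ValueError("unstable queue: tickets arrive faster than they are closed")
    return 1.0 / (service_rate - arrival_rate)


SERVICE_RATE = 10.0  # hypothetical capacity of the central team, tickets/day

# As more teams lean on the same experts, the turnaround grows faster than the load.
for arrival_rate in (2.0, 5.0, 8.0, 9.5):
    days = mean_time_in_system(arrival_rate, SERVICE_RATE)
    print(f"{arrival_rate:>4} tickets/day -> {days:.3f} days per ticket")
```

The point of the sketch is the shape of the curve: turnaround stays small while the central team has slack and climbs steeply as the queue approaches capacity.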

I didn’t know we were doing it before… As I saw a lot of teams in other places, at Disney and other companies, trying to move to a DevOps model, they didn’t realize the gaps. They didn’t realize “Oh, we actually need someone that’s an expert in Terraform, or in this other thing.” And so we had to keep relying on “Hey, can we borrow that person from that team for a little longer?” But then when it breaks, they’re not on-call for your service, and you have to find someone that’s an expert. And these services - there’s all these things… So these shifts in organizational structure cause gaps. And as those gaps show up, there’s things that slip through, and you don’t know who’s responsible for it, and you don’t know about it, or you assume someone else is going to do it. And those are the things that cause a lot of that risk, is once there’s a gap there of expertise or responsibility, you really have to figure out “Hey, when this isn’t working in the ideal way we think it’s working, or when that API gets changed and we have to upgrade something, or libraries roll out, whatever it is - who’s testing this? Who’s making sure that this is validated?” There’s automation you can do, but for the actual running of services, and making sure that APIs are running - that organizational structure really impacts how the services run.

[00:43:53.19] I see Amazon moving more towards a centralized expertise situation as service teams become smaller. And they don’t necessarily have all of the experts they need to run a service with as much breadth as it has in the past. Kubernetes is one of those services that touches a lot of AWS. There’s a lot of things involved behind the scenes on EKS. It’s not just a bunch of VMs, which is like a full-stack VM, and we just stamp them out. No. We use a lot of the internal services, there’s a lot of stuff that is reliant on – EKS relies on those other AWS services. And so you have to make sure that none of those gaps get missed. And you can be as careful as you want to, and you can checkbox everything, and make sure it’s carefully migrated, but at some point there’s a handoff of on-call, or responsibility, and those are the things that really cause problems at organizations, for everywhere, once you’re running software in these environments.

Well, specifically you said “I suspect there’ll be a major outage in 2024. No amount of multi-region redundancy will really protect you.” And then you said it’s because of an increase in large-scale events. These are things I’m not even aware of, these large-scale events. You’re already teaching me [unintelligible 00:45:00.09] so I love it. This LSE thing… These large scale events are not something they have to – they’re not incentivized, as you say, to announce these things. It’s things that hit the customers that they have to report on, and these are quickly swept into the all greens tab, essentially. But then you go on to say that Amazon is operationally strong, and you say they’re much stronger than any company. So it’s not like you’re sitting here pooing on them, you’re just predicting “Hey, the gap there of people is an issue, and the centralizing is an issue, and then at some point it’s going to bite us, potentially.” But then you say they’re pretty strong, but that strength requires people, and when you reduce your headcount and they’re eliminated, things are gonna suffer. Practice is gonna suffer; operational practice is gonna suffer.

Yeah, LSE is a term internally that we use, but it’s also been in plenty of books about AWS, and things… It’s a way that we measure things internally. Before a dashboard gets updated, we need to make sure internally “Is this actually down?” We’re not going to tell someone “Hey, something’s down” before we know for sure it is. And all the graphs and dashboards might say “Hey, yeah, this is down”, but someone’s gonna verify it. At some point we’re gonna say like “Hey, is this actually down?” And you can do that in a variety of ways, but a lot of times it’s just like “Hey, this dashboard says it’s red now internally. This person said it’s red. Let me verify where and how that’s down.” Because AWS is a giant. It’s however many regions across the globe. This might be a certain sliver of customer in a certain region. This might only be one AZ. And one AZ out of 100 - where are you going to update that? Oh, yeah, 1% is down. What does that actually mean?

And so those sorts of things, large scale events though usually affect multiple services or multiple AZs, so that things are happening at a larger scale than “Oh, this one customer might have a problem right now.” And those sorts of things just happen as things progress. We push out new code, we make changes… When you talk about a rollout – or I talk about if you want to ship code 100 times a day, I was like “Well, at Amazon that’s one commit change, because it has to go out 100 different ways.” This isn’t 100 different shipping, this is one thing got shipped 100 times. And every single one of those might have some variable that isn’t the same. There might be a service that’s different, or an API that’s different, or whatever it might be. You have to be aware of those things, that it’s not just “The git commit works on my computer, it works in pre-prod… Now we’re good for the rest of prod”, because prod is so big. And so you have to be aware of some of that stuff.

The operationally strong part of it - there’s a thing at Amazon that’s the weekly ops meeting, which is every Wednesday… And it’s one of my favorite things about Amazon. You can read about it, they have a service wheel that they spin to get an update from one or a couple of the 200 services… But they go over the wins for the week. They say “Hey, here’s what we did great this week.” And there’s an Ops Wins email list, which is wonderful to read… Because it’s like “Hey, we changed this flag on this load balancer, and we saved this percentage of latency, or money, or storage, or errors”, whatever it was.

[00:48:06.24] We celebrated those wins over and over again. I’ve never seen that anywhere else, where it’s like hey, actually, I’m writing up “This is one thing that one team did. Here’s everyone celebrating it.” And that’s fantastic, because that operational challenge of running software should be more visible, and the work you do that you think “Oh, all I did was I cut some logs. Who cares?” No, no, you’re gonna cut some logs, and then you’re going to actually project that out and say “How much did that save us over the year?” And then you’re gonna say “Oh, well, actually, that’s a big deal. That just saved all my pay, or something, for the year.” Whatever it is. There’s something in there that even at a semi-smaller scale, where you can say “Hey, this is great.”

And on those ops calls, distinguished engineers are on the call. They’re running the call. They’ve been at Amazon for 20 years, and they’re talking to these people, saying “Hey, we had an outage with –” They’ll go through wins, and they’ll go through COEs, or things that could be improved, and then they go update some service teams. And that cycle has been really great, just to see how ops can be done really well, and everyone can be on board for it. Because as they talk about the wins - they always do that first - they talk about where it can be improved, and you’ll get those distinguished engineers that will really talk about “Hey, you coupled identity to this load balancing. In some way, that may cause problems in the future.” And they can kind of predict and see those things, that I never thought people would be able to see. And it’s just like, “You’ve experienced this before. This has bit you in the past, and now you understand when things should be coupled, and when they shouldn’t be.” And that operational – that knowledge, of just experience of running things at certain scales is so valuable. And they try to spread that out through all of the service teams; everyone’s welcome to come to the ops call. It’s an internal stream, and you can just see how they’re helping everyone improve and predict what’s going to happen in the future.

Those sorts of things have been great to learn from. Actually, I wish more companies elevated their operations, elevated their “Hey, we are running this software. Development, and writing the code, and all this stuff we give for people to write code and solve the problems is fantastic, but we need it on the other side, too. We need it for running the software.”

Break: [00:50:14.17]

Brings me back to my offensive linemen metaphor for ops teams. I don’t know if you’re an American football guy, Justin, but you know, the O line; they just don’t get any respect outside of the team, because they’re supposed to do their job. When they do their job, you don’t notice them. You only notice them when they fail. And so when there’s an outage, or a sack, a not so silent sacking, you have their big face on the television, like “This guy missed his block, and therefore the quarterback got sacked.” But the 9 times out of 10 that he made his block, we’re not talking about him. And that can be very difficult on an offensive lineman, unless that lineman has the respect and praise of his quarterback and his peers on his team, and his wins are celebrated by them. And that’s how you get real camaraderie, and you have people who are willing to do the quiet things, to do the things that no one notices when it goes well. Otherwise, there’s no glory there. There’s no praise. And so I think that that is really cool… Because outside of AWS - I mean, we all assume everything’s just hunky-dory, because that’s the way things work. As customers, we only are mad when it’s not working, when we have that outage. Otherwise we’re not praising our ops people. So…

It comes down to that trust, right? If you trust your defensive end to always get the blitz… Like, I’m gonna roll to his side every time. It’s not even a question of like “Hey, this person is always going to make that block.” And Amazon is always going to keep that service up. That’s something that you build that trust over time. But if they get injured, or if they have an off week or something that - hey, guess what? My trust just dramatically goes down. This isn’t like a “Hey, I understood you were off for a little while. I don’t know how long until you get back to where you were.” That trust is really, really hard to regain once you lose it. Once you get blindsided from the back, that’s a problem. And those are the things that you really need to be careful of as someone who works in sort of these high-trust environments.

Same thing with security. Security is a super-high trust – if I have security scanning tool, and I get a breach at my company, I’m going to be looking somewhere else. This is just like “Hey, guess what - that trust is gone. I’m not even going to give you a second chance. I can’t have that news.” Those sorts of things.

There’s some things that – when you need uptime, you need uptime. And if you’re relying and you move everything into the cloud, and you say “Actually, I pick US East 1, and it’s not the best. Let me go over to US West 2. Maybe that one’s better. Maybe go somewhere else.” And at some point, you’re just going to keep moving things around to find where you have more trust.

Yeah, well said. That’s why I think some of these activities are so short-sighted. And maybe it’s the structure of having public quarterly financial reports that you must show up and to the right, unless your stock crash, or whatever… But eroding that trust for short-term gains is just like long-term not a smart move, right?

Well, and I don’t even pretend to know all of the long-term investments that a company of Amazon’s size has.

Look at the real estate that they’ve invested in, even just in downtown Seattle. So many humongous buildings and things that they’ve created there, that kind of required people to be around. There’s so many restaurants and other second-order effects of small companies that rely on them being there. And Seattle as a tax revenue. That affects – the traffic in Seattle, you can tell when the Amazon in-office days are in Seattle, because the traffic is bad. Because just Amazon’s coming back to work. And the days that they’re off, I’ve had so many friends that work at other companies and they’re like “Oh, no, I don’t go into office when it’s Amazon’s return-to-office days, because I’m gonna be stuck in traffic for double the time.”

And those sorts of second order effects - you’re affecting other companies’ ability to do work in an office, just because you are so large, and you have so many of these investments. I can’t project or even predict what all of the long-term incentives are for them to force people back into an office, and to silently get rid of people, and to do mass layoffs. I don’t know. I just know my own experience, and I know what it’s been like for people that I’ve been close with, and worked with, that have had their lives turned upside down and were like “Oh, I don’t have a job this week. I don’t know what to do, because the hiring market’s kind of down, and things are rough out there.”

[00:56:21.29] Yeah, that’s why the - I guess, is it a theory? …with regards to it being about layoffs causing the stock price to drop. It seems like – I don’t know, maybe not 100% sound, because I’ve seen layoffs where the stock price immediately popped, because it’s like “Hey, this company finally got a handle on their operational costs”, or something. And investors liked that; like, “They finally woke up.” I think the latest Spotify one, the stock went up following the layoff. And so that’s not always the case. And also, the market is fickle and short-sighted, and maybe your stock will drop for the next six weeks, and maybe it’s that quarterly financials that’s the problem. But over the next two years, it’s gonna be just fine. It’s gonna be a roller coaster, but as long as you’re actually providing value in the market and not getting bloated operationally, the stock’s gonna rebound. And so it seems like there has to be more to it than just that, but I’m not saying it can’t be that. It just seems like it’s a simplistic explanation for maybe a nuanced and weird phenomenon.

For sure. And I don’t want to assume that the stock price is the only reason for this stuff to happen.

Right, right, right.

[unintelligible 00:57:27.08] Someone had a report on like “If you make a thing and you sell it for $10, and it cost you $7 to make it, if you instead raise the price by $1, or you lower your operational overhead down to $6, you will gain more stock market value by lowering your overhead to $6, than you will by raising your price by $1. Even though it’s a $1 change either way.” But your overhead to create the thing is much more valuable if you can lower your overhead to actually create it. And that is just like a common thing that a lot of companies do for the stock market; they say, “Hey, we’re lowering our operational overhead for this thing.” And the other fascinating thing I read last year was the book Bull***t Jobs. If you read the book, it’s all about how a lot of jobs are meaningless. I don’t know if I’m allowed to say bull***t.
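Going back to the $10 example in this turn, the arithmetic can be sketched as follows. The report Justin refers to is not identified, so this only shows the per-unit numbers, not how markets value them; the function name is hypothetical and the prices are the ones from the example.

```python
def profit_and_margin(price: float, unit_cost: float) -> tuple[float, float]:
    """Return (profit per unit, profit as a fraction of the selling price)."""
    profit = price - unit_cost
    return profit, profit / price


baseline = profit_and_margin(price=10.0, unit_cost=7.0)     # sell for $10, costs $7
raise_price = profit_and_margin(price=11.0, unit_cost=7.0)  # raise the price by $1
cut_cost = profit_and_margin(price=10.0, unit_cost=6.0)     # lower overhead by $1

for label, (profit, margin) in [
    ("baseline", baseline),
    ("raise price", raise_price),
    ("cut cost", cut_cost),
]:
    print(f"{label:<12} profit ${profit:.2f}  margin {margin:.1%}")
```

Both changes add the same dollar of profit per unit, but cutting cost gives the higher margin (4 out of 10 rather than 4 out of 11), which is one plausible reading of why the overhead reduction looks better on paper.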

Haha. You said it twice.

Sorry, [unintelligible 00:58:17.01]

You can say it, go ahead. We’ll just bleep it.

But it was a fascinating read about how a lot of these jobs, especially at large companies, are not about actually doing the work, they’re about organizing work, and enabling other people to do the work. And the people that actually do the work are the ones creating the value, and everyone else is organizing, and pointing, and whatever; they’re having all those people that are organizing it in a way that they say “Hey, we can go in this direction now, because I have enough people that are doing that in this direction.” And in a lot of ways I feel like my role in DevRel does fall under that, where I’m not doing the work, I’m enabling someone else, a customer to kind of come through and then use the thing that someone else made. And I had to come to grips with that, where I’m like “Am I actually adding value?”

I need to read this book, because I’m not sure I agree with it. I think organizing folks, and DevRel, and that in particular - there’s a certain amount of mind margin that you have as a value worker, let’s just say, to use the language you’re using… And if I can put people in place to organize those people better, and drain them less, then I get more value from their actual work. I still feel like those are valuable. Those are not BS jobs… Personally.

And ideally, you could have 100 people that all just work in the same direction, and they wouldn’t need coordination. The reason we have managers is that we have to be able to coordinate information, and timing, and all these things. Ideally, in a wonderful world that doesn’t exist, people could just work in the same direction and do the same thing without needing the overhead. And so at some point, we have to add that overhead, to be able to align people in ways that make sense for a larger group to do more work. Because me doing work on my own isn’t as good as five people maybe doing work in the same general direction. It’s not five times better, but it’s more than two times better. And so at some point I can add a manager to help us guide that way.

Right.

[01:00:12.20] One of my favorite quotes from the book is – like, the one thing I took away from it was at some point we got so good at manufacturing things, that there’s not really a shortage of being able to create things, especially in technology. These bits are free, essentially. At some point, we have enough bandwidth, so we’re not manufacturing – it’s not a manufacturing limitation. What we have to manufacture is desire. People need to buy things, and we need to manufacture a way for them to actually buy stuff. And that’s marketing, in a sense. Marketing is just that: we’re gonna tell you you need something, and then maybe 1% of those people actually come back and buy it. And you’re manufacturing a need, rather than manufacturing a product. And that’s where the marketing space exists, and that’s where a lot of people that say “Hey, I need you to go do this” is “We need you to go do something else, even though we have the capabilities just to do it on our own. But I can’t have enough people to do it together to make a bigger impact.”

It’s difficult to find the sweet spot when it comes to support roles, when it comes to management and leadership roles… Because some of that is necessary, especially as an org grows, and useful, and valuable, and worth more than you’re paying them, of course. But there’s also an opportunity for you to have too much of that, and at that point you do have people whose roles aren’t bringing as much value as that person could bring in a different circumstance. I’m not saying it’s necessarily the person who’s not bringing value, but just - we have too many managers, for instance; we have too many DevRel. Right? And it makes sense, that’s a harder thing to measure. What’s the right number of DevRel for AWS, Justin? I mean, who knows what the answer to that is, right? We could go to get PhD’s on coming up with an equation for –

And it always changes. Whatever you decide on isn’t gonna be the same next year, because the market changes, and customers change, and needs change, and products change… And so as the world shifts, I don’t know how many managers we need. I don’t know how many people in DevRel, I don’t know how many engineers we need. We just know at some point we don’t have enough of something, because that area of the product or the service is suffering. So how do we get more of it? Well, we add more people, because we assume that The Mythical Man-Month is going to take some effect of adding more people to it. But if I add enough people, I am gonna get more out of it. And so what is that balance? You just keep having to adjust it over time.

Let me slightly change the subject, but keep it on point. I think this will be interesting, because I would love to have your take on this. Your prediction of a major AWS outage in 2024 reminds me a lot of the predictions that were made late 2022, when Elon Musk bought Twitter, and laid off something like 70% of the company. And the predictions, which I was pretty much like “Makes sense to me”, was like “There will be a major Twitter failure, operational, technical, operating at scale. Twitter will be down and gone soon. Sometime this year.” And - I mean, this hasn’t been hiccup-free, by any means. There have been outages, and stuff. But on the whole, it seems it’s going okay over there in terms of twitter.com and the APIs. Or x.com now. I don’t know. Mine’s still twitter.com, but it says X…

It redirects. Yeah.

They haven’t fully done all the redirects, but yes, the platform is called X. Curious… And then a lot of the people that made the predictions were like “Well, these things take time, and so eventually it will fail.” I don’t know, I have no idea. I have no insider knowledge. I’m curious, just from a guy who understands systems at scale better than I do, what’s your take on that? Were they just way over-bloated in terms of headcount/people? They didn’t need all those people, you could run it on a skeleton crew, and that’s what they’re doing. Have there been outages that I don’t know about, like major ones that would make these predictions true? Is it just it’s a matter of time? I don’t know, what are your thoughts?

[01:04:05.06] I have no insider knowledge, but I had friends that joined teams at Twitter, and were eventually let go. And I do know that they got rid of a lot of features and products as part of this. I ran a community on Twitter; when it was twitter.com they had communities, and I had one for the tech jobs. I was like “Hey, let’s just start a community on Twitter for people to find tech jobs.” I don’t run it anymore. It doesn’t really – I think it still might exist. I don’t know, they’re not in the menu anymore. But they got rid of that. Spaces is still around, but pretty much everything else that I know they were working on just went away, including all of their safety, and moderation teams.

Right.

And so by removing some of those things - that’s a lot of people to do new products, to build new things. I also know the operations side of things did take a hit. A lot of times things do run for a long, long time. I have servers that I used to maintain that were on for years. We never had to touch them. The website worked. Once you get it working, the port’s open, I’m okay. And I can’t change it. I can’t scale up, I can’t do new things, but it works, and it’s there. And I know that through a lot of the API changes of like you have to pay for access, a lot of the – public visibility for tweets were turned off for a while… There’s a lot of things that changed that reduced the amount of just API calls and tools that were using it and reliant on it.

Certainly public API would be a huge reduction in overhead.

Right. Just on New Year’s Eve, the emergency response Twitter account lost its API access. It was like “Oh, we’ve used all of our credits. We’re done.” You can reduce your load by a lot, and just like “Hey, we’re just gonna cut back on a lot of stuff.” And maybe that was just something that you could turn off some servers, maybe you could scale down, because if your volume of calls goes from 100 million to 100,000 - guess what, you can turn off a good portion of your servers, or just not really worry about it anymore. It’s just like “We’ll just scale it, and then we’re fine. We don’t need to make changes to scale things, and we’re not going to hit new scaling limits”, because Twitter already was hitting those limits over and over again. Because there are these stages as you run software, where it’s like “Hey, our Postgres database was fine with just a single replica. And now at some point we need three, because we hit the scaling limits.” And if you’ve made it to that scaling limit, like “Okay, we already have the three replicas. This is gonna last for a long time. We don’t need to worry about storage, because our volume isn’t increasing as much…” You just keep hitting those stages over and over again, the up side of things. When you’re running the services, you can project out maybe a year. Like “If the growth is stable here, we can go until here, and then at some point we need to switch to something else.” And you’re constantly in that replatforming, or just redoing your infrastructure to make sure you can hit the next level of scale, because you can’t predict what areas are going to necessarily be the next bottleneck. But if you remove a lot of those scaling requirements, things just run for a long time. NGINX is really powerful. Disney Plus, the frontend of Disney Plus, when I was there, was just like a few NGINX boxes. I was amazed at how much you could do with a really small amount of compute. It was like “Oh, actually, no, that just works. 
And that just scales.” And that’s kind of amazing, if you just set up with some of those small things as best practice. You don’t need everything for everyone. It’s just like, figure out what scale you’re at, and just run it.
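The "turn off a good portion of your servers" point can be sketched as simple capacity arithmetic. This is a rough illustration only: the per-server throughput and the redundancy floor are invented numbers, and the function name is hypothetical. Only the 100 million and 100,000 call volumes come from the conversation.

```python
import math


def servers_needed(calls_per_day: int, calls_per_server_per_day: int, minimum: int = 2) -> int:
    """Servers required for a given call volume, never dropping below a redundancy floor."""
    return max(minimum, math.ceil(calls_per_day / calls_per_server_per_day))


CALLS_PER_SERVER_PER_DAY = 1_000_000  # assumed throughput of one server

before = servers_needed(100_000_000, CALLS_PER_SERVER_PER_DAY)
after = servers_needed(100_000, CALLS_PER_SERVER_PER_DAY)

print(f"before the API changes: {before} servers")
print(f"after the API changes:  {after} servers")
```

Under these made-up numbers the fleet shrinks to the redundancy floor, and the next scaling limit moves far enough away that nobody has to replatform for a long time.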

So I think that Twitter will last for a very long time based on the scales they were already at and the reduction that they’ve had since then. And I don’t think that that’s going to necessarily change. I do think that again, there’s a lot of cracks that have shown up over the time, of people not knowing who was responsible for something. I uninstalled the app, but the Twitter mobile site wouldn’t load on my phone for months. I just couldn’t go to Twitter. It was like “Actually, no, this is kind of nice. I’m okay if I only do it from my computer.” But it wouldn’t load on my phone anymore, so I’m like “You know what? I’m okay. I won’t go.” And that’s okay.

[01:07:59.05] So there are those weird edge cases that just no one probably caught, because there was a gap there. But for sure, they were at such a large scale that notching down on the requirements allowed them to kind of figure out where they needed to fill up, or region back to where they were.

I think that’s a fair point. I think that’s on point. The only thing that I’ve noticed recurring, where I’m like “This is just a failure in having enough people to actually fix this”, is the redirect service; you know, t.co. All the links get redirected through t.co. Specifically inside of the phone app, it just doesn’t always work. And a lot of times it just fails to load the page, and you click back, and you click it again, and it works the second time.

Every single time.

Yeah. It’s not every single time for me, it’s probably like 80%.

But yeah, so there’s a situation where it’s like “This clearly could just get fixed by somebody”, because it worked previously. But they probably haven’t noticed, because they’re just on a skeleton crew.

And I do think that that’s one thing that as those small issues show up, they’re going to take longer to fix. When that was a problem before, you’re like “Oh, you know what? This will probably be in the next app release.” “Get the new app release.” Something was gonna change, someone was gonna scale something up. But with fewer people, you just – you can’t pay attention to everything at the same time. And so you really have to focus a lot more, which I think is exactly what Elon has been trying to do with X, is focus in almost a different direction. But you can’t focus on all of those little things that were just like “Oh, this is a bug that’s been bothering someone for so long.”

When I started at Amazon, one of my main goals was to – inside your AWS account, if you click down on your account, there’s an account number there. But you could never copy that easily. There was no copy button next to your account number, and I wanted to do that all the time. I managed dozens of accounts, and I always wanted to click that button to copy it. But if you copied it, you’d get dashes, or get the next line… One of the first things I did was I opened a ticket internally. I said “I need this to be a copy button, please. Just add copy buttons to my account number, my region, whatever it is.” And sure enough, that ticket got solved. I wanted that to exist as a customer. I am happy now, because if I’m using this again, I can drop that down, and I can click the Copy button on my account number. And it copies it without dashes, which is exactly what I wanted.

And in some of those little things, that fix happened pretty quick, because as a customer I asked my [unintelligible 01:10:08.27] for it, who asked someone else, but at some point they got lost in the “Where does he ask?” He didn’t know where to go. As an internal employee, I just spent an hour to find who’s responsible for this. I need this widget to have this thing, and find their queue of tickets. And I don’t care if it’s done now, but if it gets done, that would be awesome. And sure enough, I was able to find that ticket queue, put in that ticket… And it worked. And I was like, that’s amazing, to be able to be an internal employee and fix a bug that was affecting me as a customer, and then see that just roll out. And I’m like “Cool. We’re good.”

That’s awesome. Let’s talk Ship It. Let’s close it off with the quick Ship It conversation. Obviously, we’re back. “We are so back”, as the kids say… But we are going to bring Ship It back. We have some stuff going on, recordings happening… Maybe, Justin, your perspective on Ship It. Obviously, it’s going to be the old show, but it’s going to be a new show. It’s going to be your spin on what it is; of course, Adam and I still highly involved, and excited about the reboot, so to speak… But what can folks expect from Ship It, from you, from the people involved, and what’s going to happen over the next few months of this old new show?

Yeah, I mean, like I said, I loved what Gerhard was doing with the show. I loved the topics that he was already covering, and some of the guests he had on. And I want to continue that as well. I want to focus on that topic space. Everything after git push. What do we do? CI/CD pipeline, security scanning, system scaling, whatever it is, all the way from observability, SRE - all of that stuff’s involved in the not-writing-code side of things. Like, how do I debug a Linux server? Those are things that not a lot of places focus on, and I want to keep that focus of the topic of just shipping the code.

[01:11:55.20] Getting some great guests on, focus on areas that are running code; if you run production code in any sort of environment, I want to hear from people, because it’s not just a web service. And in my time at Disney and Disney Animation, we had almost no web services. Even the Disney Animation website wasn’t run by us, it was run by another – we just did rendering. We would render stuff, and we had some internal services. How does that look? Why is that different? People still wrote code, and we still did stuff. And I want to know what those different environments look like for people, because some running software is not the same thing. It’s not always just an NGINX with a backend app. A lot of places look really different, and there’s so many variables in that, that I want to talk to a lot more people, and give people more exposure to what it actually looks like. If you’re in a hospital, how is that different than a streaming service? Those things are very different environments, and have different concerns, and different needs for what they’re doing with their software and infrastructure.

So that’s the first thing - I want to keep some of those people coming in. I also wanted to have some things that – I love listening to podcasts in general, and the things that I want to hear… I don’t want just a news show, but I want a couple news topics. I want to know some things that are relative, or something that the hosts, whenever I’m listening to the show, I want to learn what they learned this week. Some of my favorite shows that I’ve listened to in the past always have something that is personal, that’s like “Hey, I did this thing, I solved this problem, and now I –”, whatever. It’s like a small thing. It’s not like “Oh, everything’s groundbreaking every week.” No. I learned how to make a dashboard on my Raspberry Pi. Here’s the thing I used, here’s an open source tool, whatever it is. And so I have some recurring segments that I have ideas for to make that fun.

I’m bringing on an awesome host [unintelligible 01:13:33.12] with me, because she has such a great, different perspective, and different experience than what I have, from running services in a different sort of environment, and with different constraints. So I’m really excited about that. And then just having those guests come on and learn from them about what products exist, whether they’re open source, or SaaS products, or just different ways of thinking about scaling things.

I used to also run a Twitter Space for reading white papers on infrastructure. I called it Paper Club, and it was a monthly “Let’s read a white paper, and then just talk about it.” It was like a book club for technical white papers. And that sort of deep dive into technology, and where technology comes from, has always been fascinating to me, because I can learn a lot about how or when I should use something based on what problem it solves when someone created it. Do you want to use Raft Consensus? Maybe, maybe not. What problem did they solve when they created Raft, that something else didn’t solve? And then you can maybe make a better decision about which tool is the right one for you.

So those sorts of deep technical topics are something I also would love to bring to the show, and have people come on and talk about them. I had on one of my Spaces Eric Brewer, writer of the CAP theorem, and we were literally reviewing one of his papers. Not about CAP theorem, but about scaling like AOL services. And he joined the Space, and I was blown away that I can just have access to someone like Eric Brewer on a Twitter Space. Like, are you kidding me? That amount of shrinking of what the internet is is fascinating to me, where it’s just like “Oh, that was what was great about Twitter in the heyday of everyone was just there a lot of times.” And he showed up, and people were discussing it. And I learned a lot. I read the paper, we were talking about it… I said “Hey, why did you do it this way?” He’s like “Oh, because of this other constraint.” We didn’t even talk about the paper. “Here, let me tell you.” “Oh, that’s great to know.” I love those conversations.

I’m looking forward to having more conversation on Ship It around those things, about “Hey, this is what we said in the blog post about the outage… But here’s the thing that we didn’t say, or the constraint that we didn’t know about at the time”, whatever it might be. Those are all areas that I would love to talk about.

Well, we’ve got a link to a LinkedIn post that I think is the only place you mentioned it thus far on the internet. I looked on Twitter and I didn’t see it there, in case I missed it…

I posted on BlueSky. I do a lot more on BlueSky now than –

Oh, you’re a BlueSky guy.

I’d love to hook you up with our Mastodon account for Ship It. Anyways, go ahead, Adam.

[01:15:52.06] I was gonna say, we’ll link it up in the show notes, just because there’s an invitation there. There’s an email address there, all that good stuff, you can pile on the comments… Just come there and celebrate bringing back this podcast, and encouraging Justin and co-host to do great jobs with this podcast… And obviously, Jerod and I will be here along the way as well, but… We want to hear from the community. What can we cover on this podcast that is interesting? What should we really talk about around git push, and applications being in production, and keeping them up, and what we’re learning, that kind of thing? So pile on that post, share your comments, share your thoughts, email us if you have topics… I’ve already seen a couple emails come in. Jerod, I know you got a DM or two… So definitely action happening already in terms of what we can talk about with Ship It. So that’s awesome.

Yeah, we’re planning on starting recording as soon as possible, and I would love to hear more ideas for topics around – if you’re running software, I want to hear about it. Because it’s fascinating how similar and different a lot of these environments are.

Yeah. I guess one more plug, because I just have to - it’s what I do - is we have a couple of sponsors already for this podcast. Sentry is thinking about bringing it onto their 2024 plan. And I know [unintelligible 01:17:01.00] already has committed to some episodes of Ship It. And so if you are at a company that can benefit from reaching more developers that Ship It reaches, and you want to sponsor this podcast, reach out and say hello. Too easy.

There you go. If you are a Changelog++ listener, don’t worry about it, you’re gonna just start getting fresh, good Ship It episodes right into your feed. If you are a Master feed subscriber, don’t worry about it; you’re also going to just get Ship It episodes. If you aren’t either of those, there’s no better time to subscribe to Ship It. We’re at ShipIt.show. There you’ll find an email subscribe, and of course, links to all of the popular platforms, as well as a direct RSS feed link for you to pop into your favorite podcast app. So do that… If you loved the old show, definitely give this a listen. Hopefully you’ll love it as well. If you didn’t like Gerhard’s British accent - well, we have a non-Brit here… So maybe it’s a good time to give it another go. Of course, Gerhard will be coming to a Kaizen near you very soon…

Very soon, yeah.

So he is definitely still very much involved. He just does not have the bandwidth for Ship It right now. Thankfully, Justin has the bandwidth. He also has the expertise and the desire to bring this awesome show back with us. So we couldn’t be more excited, Justin, and looking forward to what you come up with.

Yeah, and thanks to you for the opportunity. When I reached out, it was just – I was catching up on episodes, and I was like “This needs to exist. There should be a show about this.” I really appreciate both of you replying to the email and letting me know “Hey, this is where we’re at, and this is where we want to go with it. What are your ideas?” Because having that flexibility and being able to rely on the audience and the network you’ve already built for it… I already know people that are out there that want this content. And being able to continue on the great work that Gerhard has been doing is just – I’m blown away by being able to do that and not starting from scratch… Because that is so hard, to just do that for so long, starting from scratch. And you all have built such a great network here, and I wanted to be able to lean on all the community and people that are involved already with Changelog.

Yeah. That’s awesome. Let’s go from one to two.

Alright. Thanks, Justin. We really appreciate you telling your story with us, and like we said, looking forward to what you come up with.

Yeah, thank you.


Our transcripts are open source on GitHub. Improvements are welcome. 💚
