Ship It! – Episode #114

Deploying on a Friday

with Michael Gat


Michael Gat joins us for a look back on mainframes & why sometimes deploying on a Friday IS the right thing to do.


Sponsors

Coder.com – Instantly launch fully configured cloud development environments (CDE) and make your first commit in minutes. No need to traverse README files or await onboarding queues. Learn more at Coder.com

Neon – Fleets of Postgres! Enterprises use Neon to operate hundreds of thousands of Postgres databases: Automated, instant provisioning of the world’s most popular database.

Retool – The low-code platform for developers to build internal tools — Some of the best teams out there trust Retool…Brex, Coinbase, Plaid, Doordash, LegalGenius, Amazon, Allbirds, Peloton, and so many more – the developers at these teams trust Retool as the platform to build their internal tools. Try it free at retool.com/changelog

Notes & Links


Chapters

1 00:00 This is Ship It! 00:52
2 00:52 Sponsor: Coder.com 03:38
3 04:34 The opener 08:03
4 12:52 Michael ran everything 01:34
5 14:26 Defining a mainframe 01:13
6 15:39 Modern data centers vs mainframes 01:45
7 17:24 Old is new again 02:46
8 20:10 Offsite servers 02:42
9 22:53 Most companies suck... at infra 02:13
10 25:18 Sponsor: Neon 03:33
11 28:58 Why push on Fridays? 02:58
12 31:56 Fortnite update hype 03:20
13 35:16 Saturdays after deploying 00:48
14 36:04 User spike 02:54
15 38:58 Stock exchange as paper 02:27
16 41:24 Michael's training experience 06:02
17 47:27 Leaky extractions 03:14
18 50:41 Where to reach out 00:18
19 51:10 Sponsor: Retool 02:53
20 54:12 The closer 01:30
21 55:42 Where to find white papers 08:34
22 1:04:16 Outro 00:52

Transcript



Play the audio to listen along while you enjoy the transcript. 🎧

Hello and welcome to Ship It, the podcast all about what happens after you git push. I’m your host, Justin Garrison, and with me as always is Autumn Nash. How’s it going, Autumn?

I’m just…

Hey, that was amazing.

I love your [unintelligible 00:04:48.27] and now I don’t remember what I was going to say.

It’s just big.

It is ginormous.

I have some bad news, Autumn.

Oh, no… What now…?

I have some bad news. I listened to a part of one of our podcasts and…

Oh, no…

But all my podcasts play at 1.2 speed. It doesn’t sound like us. [laughs]

What do you mean? What does it sound like?

It’s just faster… I’m reminding myself to speak a little slower.

I don’t think my voice should be used at 1.5 speed, because I probably sound like a chipmunk, and now I’m just… I’m very afraid. I’m very afraid.

Most of the podcast apps that I use don’t change pitch. It’s just the speed side of it. Like, you’re not chipmunks. I don’t do 2X. If someone out there is listening to this at 2X, I am sorry. And how…?

Your face looked so apologetic.

I could just imagine myself… I get excited, and I’ll talk a little fast, and then I don’t know how someone would make any sense of it. It doesn’t make sense at regular speed…

I live excited and talking too fast, so I just… The poor podcast people.

I’m wondering if this is one of the shows that people listen to at a 1.1 or 1.2 speed.

Yeah. They’re like, “We’ve got to do regular speed for these people.”

“This is too much.”

How do you listen to your own voice? Because I’ve only listened to like –

I don’t, usually. I try not to.

It hurts my soul. Sometimes I just appreciate you guys for listening to my voice and putting up with me so much…

One thing that’s worse than listening to your own voice is editing your own video. Because then you’re listening to yourself and you’re watching yourself, and you’re like “Oh, man…”

That’s why I can’t do TikToks as much as you do… Because you’re always like – and I made like five TikTok videos and I’m like, I can only look at myself and listen to my voice and then I’m just like “Why are you so awkward? Why are you like that?”

Yeah. For me it’s like “Why’d you touch your face again? You didn’t have to touch your face.”

I love that that’s the only problem you had. I was like, there’s five different things that hurt my soul about watching my own TikTok videos. I’m just like “Ohh…” And then each one of them has a different problem, and I do another take, and I think they get worse as I go… So they kind of get better, and then they start getting worse, and then I’m just like…

Yeah. It’s difficult. Editing is very hard when you’re doing it to yourself. And audio and video both have problems.

What’s worse? Do you think it’s video or audio?

It depends on what I’m doing with it. If this is a monologue audio, that’s going to be difficult to listen to. If it’s like short clips of video, I have fun with it because I just pick out the most random little one or two seconds.

Yes. I love random ones, especially when we did the Scale thing and we were just like – because it was not of me most of the time. And it was just like, “Look how fun this is.” That’s true, though.

Yeah. But I’ve finally just finished… I literally edited a video and posted it on my YouTube last week for a video I recorded over two years ago.

[07:56] Dang, Justin. I thought I was behind.

No, it actually aligns with this podcast as well. I have a hard drive video on my YouTube channel that I recorded from 2011, 2012, that I posted last year. I did it as like – I wanted to do this when I worked for a website called HowToGeek.com. I was one of their editors and we wanted to do video, and I’m like “Hey, I have these ideas –”

How do you get these jobs where you just get to do nerdy projects, and have fun and record them?

I read HowToGeek.com, and they had an editor that left, and they just posted that like “Hey, we’re looking for a writer. We want them to be Linux-focused.” And I’m like, “This sounds awesome.” And I was already writing for my personal blog. I just wrote stuff. I always encourage people, go write on your personal blog. Go have a personal blog. Run a website. You will learn a lot of things.

How do you have time for all this? I’m so tired.

I don’t have three kids.

Good point.

But yeah, and over time, you just spread out a little bit. If I look back at the almost 20 years of writing at my blog, I averaged one post a month, which is incredible to me when I look back on it, but also I think like “I didn’t spend that much time doing it.” And like, no, no, no, most of them, a lot of the early ones were like one or two paragraphs. It doesn’t matter. It was before Twitter. This was like, I needed a place to put something, and it was short form… I just put it out there. And then I ran a newsletter for two years, which was weekly… And so that caught me up, because I posted all those to my blog, too.

I just got tired listening to you talk about this. I feel like I do too many things as is, and I don’t know about this… We’re gonna have to just summarize my Twitter, or something. I don’t know.

Well, and yeah, my wife definitely tells me I have no chill. I cannot relax. I am always doing something. Even when I wrote my book…

I don’t either, but I can’t tell whose no chill is worse. Your no chill isn’t – I don’t know how to sit down, and I always have a new hobby that turns into another job, or interest, or volunteer role, or… Then the whole board chair, of like – no, it’s just too much.

There’s a difference between ADHD no chill, and focused no chill. I have focused no chill. I can focus on something for long periods of time, like writing a book, or building a course, or doing that stuff.

Oh, I just figured out the difference between us. Mine is ADHD no chill, and yours is focused no chill.

Yep. I think that is a big win for me, is being able to accomplish larger things [unintelligible 00:10:24.20]

And you take naps, and I just drink obscene amounts of coffee…

Okay, I’ve been taking less naps, since I’ve been drinking less Dr. Pepper. Just saying. There’s probably some correlation here. But anyway, that’s not what we’re talking about. Everyone’s here to listen about software.

Are you drinking less Dr. Pepper and eating more candy, Justin?

No. I’m just trying to lose some weight. Just less sugar all around. That’s the goal.

I’m both so proud of you, and also like, what are you going to lose? Your beard?

No, I already did that and you didn’t like it.

It was horrible. Don’t ever do that to me again.

I tried to lose weight by shaving my head, and it wasn’t as much weight as I expected.

I was traumatized, okay? Please don’t ever do that again. Like, you need a hat, and like a beard, and I just… I need warning. I might need therapy after that. It was traumatic.

So on today’s show, we’re doing another retro episode with Michael Gat. And Michael has done a lot of things in his career, and I had an amazing time talking to him at Scale. He had a really good talk about cost management, which will be in the show notes… And when I was talking to him, it was just more and more things that I was like “Wow, I want to learn more from you about how things used to work”, because all these patterns just come up again. And I love it. So I invited him on the show, and this episode is loosely about mainframes, but also just about what it was like running software in a financial system from 30 plus years ago… And I love it.

I just love learning about how things were done, because it always kind of cycles through.

Yeah. It looks very familiar again. And there was batch processing at the end of the trading window for the stock exchange… It’s still just like, we’re just doing batch jobs on a bunch of computers now. And it’s just like, it used to be people; it used to be the night shift would come in and like fill out the forms, and now it’s computers.

And even when you get a more efficient, new process, it was influenced by a part of the old process that sucked, that we needed to automate, or do – you know what I mean? So they’re all very relatable.

Yeah. So let’s go ahead and jump over to that interview with Michael, and then we will talk to you all after.

Break: [12:39]

Alright, so thank you so much, Michael, for coming on the show today and talking to us all about – this is another retro episode, and I’m excited for it, because you were running software and infrastructure a while ago, right? We were talking – we actually met for the first time at Scale, right?

Yes, we did. All three of us were at AWS at the same time, but you know how big AWS is…

Yes. We never ran into each other at all. And then we just started talking, and I learned so much from your talk at Scale, which people should go check out. I’ll put that one in the show notes. But I did want to talk about what you used to do. So describe to me what infrastructure or software you used to run.

Well, basically everything that has existed, from roughly the late ’80s to present, or early ’90s to present, is something that I have probably touched, at least peripherally.

I love it. Michael’s like, “I helped build the universe.”

Yes. So the universe. If you were at Scale, you heard the concluding keynote - and I’m spacing on his name, but certainly a gentleman who’s older than me, who just went through his history, going back I think to the ’60s. Well, I can’t go that far back… But what I can say is when I started, there were still commonly mainframes in the mix.

Now, the reality today is if you went back to the same types of institutions - I worked for them, mostly large banks and brokerages - there are still lots of mainframes in that mix. There have to be.

Can you define a mainframe for people that don’t maybe know what a mainframe is?

That didn’t work in technology in the ’80s.

Okay, a mainframe is… And there was actually a really good interview - not on the Ship It channel, on one of the other channels - with a mainframe guy about a year ago. I can get you the details on that and you can add it to the notes. He really explains what it is… But it’s where computing started. It’s these - well, back in the day - huge boxes.

You could literally open a door and walk into one of these computers, that ran in massive data centers, with surrounding cabinets full of disk drives and other accessories. Today they run on 19-inch racks, just like everything else on the planet. But the idea behind it was you had a large central processing unit that was designed to do one thing and one thing only really well, and that was crunch numbers. Everything else is kind of outsourced to peripherals, subcontrollers, all sorts of subsystems, that do everything else so that that mainframe can focus on just crunching numbers.

What is the contrast if we look at a modern data center, versus a mainframe?

Well, the contrast I would come up with is the mainframe was designed from the start to be high availability, multi-tenant, designed to do basic business functions - business or scientific functions - which as I said, is crunching numbers. Everything else kind of fell by the wayside. When these things started, you were feeding punch cards into the machines as your storage, literally; storage on cardboard for computers. I don’t think we’re going to go back to that. Although tape is making a huge comeback…

[16:21] Tape is dense, right? Tape can store –

Well, the thing is tape is currently - because we haven’t done anything with it for so long, it’s not dense. So we can increase the density. What tape is, is it’s huge. You know, a disk drive… I had one over here a few minutes ago.

Oh. A disk drive is – you know, that’s the surface area. I’m holding a two and a half inch, but… A tape is about three and a quarter inches wide, and I think a thousand meters long, or 1,200 meters long, something like that. So your surface area is huge. Horrible for random access.

Yeah. [laughs] Linear access only, please.

But if you’re doing sequential access, reading a big data set, it rocks. If you’re doing backups - amazing. One of the things I was thinking about in this interview was a lot of what’s old really doesn’t get old. It just sort of changes.

Well, and it’s interesting… People will be like, “This is a new thing, and we have to do it with everything.” And then all of a sudden we’re like “No, we’re going back to the old thing. We’re going to do the old thing now.”

Well, we’re always doing the old thing. And I’ve got another example of this that I’ll bring up later. But to go back to - a mainframe is a big computer designed to centrally crunch numbers, sequentially. What happened, I guess, a little bit before my time, when the IBM PC came out, the first business personal computer, was the idea that rather than centralizing everything, we’ll distribute it. We’ll give everybody their own little personal workspace to do everything. And yeah, they may have to share it at some point. It may have to come together in a mainframe that combines everybody’s work after the fact… But everybody gets their own space. And then networks came along, and we realized that we could do this whole parallel thing.

And today we’ve gotten to the point where we now have clusters, and tools like Kubernetes, and all sorts of middleware, that have finally allowed us to build a high-availability, multi-tenant, resilient infrastructure that kind of sort of does the same thing that mainframes did in the 1960s. So what’s old is new again.

Yeah. If you squint at it, it all kind of has the same patterns, but with different…

Yeah, it’s always the same thing.

Well, actually, when you really squint at it, the thing that has the same patterns is stuff like – you’re beginning to see things like Oxide, which… They haven’t quite gone after the mainframe, but they’re going after what IBM used to call the mid-range computer, which was sort of a smaller version of that, that a normal company could afford.

Right. Sort of a rack size computer. They ship it to you, it’s an appliance, essentially, and you carve it up and use it as you want.

Yeah. And one of the key things about it, they own everything that’s in it.

You don’t have to run all of the various pieces that you would need to run. And I’m not going to list them here, but you know very well, on top – if you want to build a Kubernetes cluster and have it run well, there’s a ton of stuff you have to install.

You outsource the decision-making, right? You’re just like “Actually, I’m going to go buy their stack, because they have opinions, and their opinions sound good to me. I can just make fewer decisions.”

And that is exactly what you did if you bought an IBM mainframe back in the day. IBM owned everything, right down to handling it if there was a failure on a piece of the mainframe - and you could have localized failures. You can hot-swap CPUs on a mainframe. [laughs]

[20:10] I think that’s kind of where it’s going with cloud too, because people were like “Oh, we’re going to run our own data centers.” And they’re like “No, the cloud’s better.” Now people are owning infrastructure, but it’s being managed in someone else’s data – not infrastructure, but they’re running their own servers and racks, but it’s managed in someone else’s data center… Which is weird to me, because then it’s like, I feel like all the good things about the cloud is that you don’t have to wait to like get all the equipment, and then set it up, and then it goes and gets old… And I’m like, I don’t understand how that option has gotten so popular, you know? It’s all these different degrees of ownership, and… I don’t know, that’s weird.

Well, what’s interesting about the way it used to work back then – well, still does, for those who are running these beasts… It’s if you have an outage, or if you have a problem, your CPU or a disk or something needs to be hotswapped, we ran our own data center, but our IBM techs had access. They had a key card that got them in. And sometimes they would just show up and tell us “Oh, we need to fix something.” “You do?” “Yeah. The computer told us.” So they would know before we did, because they truly did own the stack.

That’s interesting.

I remember one of my first times as a sysadmin that I was on call, and I got paged because a hard drive failed. And I was like “Oh, cool. It’s a hard drive failure. That’s fine. It’s redundant. No big deal. I’m just going to show up and replace it on Monday.” It was a Saturday, or something like that. And I got a call 30 minutes later, and someone was like “Hey, I’m at your door.” And I’m like “What are you talking about?” And they’re like, “I have your hard drive.” I’m like, “I don’t want that hard drive right now. It is Saturday. I am out, doing something.” They’re like, “No, our SLA says you get a replacement within 45 minutes.” I’m like, “Physical replacement?” They’re like, “Yes. I have your hard drive. I need to plug it in.” I’m like, “Ohh…” So I drove in, I’m like, “Okay, poor you for driving [unintelligible 00:22:08.11] I didn’t know what that meant.

That is wild, that somebody drove you a hard drive…

They have local stores, and they’re like “I’m here to replace this hard drive, because our alert said it failed.” I’m like, “Oh yeah, no, this is how this works when you’re responsible for the hardware, and you have these SLAs and agreements, and like, 45 minutes is 45 minutes, it doesn’t matter the day or time.”

That is true to today. At AWS I dealt for a while with some networking hardware, and I can tell you that the companies that provided that networking hardware had similar SLAs, where whatever country we were in, they would show up with a spare, within a certain amount of time.

I feel like we’re in the era of like “Mess around and find out”, you know, and people are so into saving money, and I’m like, “Do you know how much work these big companies put into like those SLAs, and meeting and maintaining those things?” And everyone is just like “We’re going to do it ourself because we want to save two bucks.” And I’m like, “But is that two bucks worth it for the amount of people that you need–” You know what I mean? Like, I don’t think it’s so much that everyone needs to have a cloud, but I think they should – when you’re taking costs, don’t forget the human cost, and the time… You know? There’s so much more to it. And these big companies - they’re so big that they can do that. They can have the people to do that. They can make sure all of those racks are working. So I’m just like, I hope we’re taking the context of the DBAs, and like the people that are maintaining all that stuff when they’re doing these cost sheets.

Most companies suck at running infrastructure. [laughter]

Michael’s out here keeping it real.

Well, no, it’s true. Running a data center is really, really challenging.

It’s so much more work than I think people really, you know…

And it’s not like it’s a great career path, being a tech, running around hot-swapping things in a data center.

[24:09] Unless you’re gonna sell that skill set, right? And that is the case – people forget that like AWS, and Google, and Microsoft, they are the biggest on-prem users in the world, right? And they just happen to sell that on-prem to you, and they are good at running on-prem.

But that’s what I’m saying. I feel like people – and even just a DBA for a database, right? People are like “We’ll manage databases and we’ll do our own.” And then they’ll pick like a very complicated database, and they don’t remember… For instance, for Cassandra - like, that is a skill. There’s a lot of big companies that were using Cassandra back in the day, right? And now people are either migrating, a few still use it, but that is a skill that is very hard to find. You know what I mean?

Specific. It’s very niche.

Yes, it’s a very specific skill. There’s very specific things you need to know about scaling it, and like not having a node die… So when people are just like “I’m going to save some money, and we’re going to go back to this”, I’m like “You have to find the people that know how to do that, you know? And do it well.”

Break: [25:13]

Michael, going back to mainframes real quick… One of the things when we were talking about this episode was something you said that was surprising to me, and I wanted to learn more about. You said that with the mainframe, you always deployed on a Friday, right? Like, Friday was deploy day.

I’m not sure if that was so much a mainframe thing, or the fact that mainframes tend to be used in big financial institutions, where there are huge downstream effects to anything you change. You report your trades wrong to the stock exchange. Well, there could be multi-million or billion-dollar impacts to that. So you want to get it right. And one of the key things, at least back in the day - I don’t know how they do it today - was “We want to be able to recover.” Well, the nice thing about weekends in that financial world is everything’s closed on the weekends. It is still today.

Except for the ops people.

Exactly. If you screw up your processing on a Friday night, you have two days to recover. Nobody wants to work weekends. But the choice of “We’re going to be shut down for a day, because we screwed something up and we’re back recovering it”, versus very rarely somebody has to work a weekend - well, when you look at the monetary impact, yeah, some changes you do ahead of the weekend to give you that chance to see it work in prod, make sure it’s working right in prod, and recover from those downstream impacts if you need to… Because again, we’re not talking about a social media site of “Okay, just roll it back”, and for 15 minutes some people got a bad version of their timeline. It’s for 15 minutes trades were being reported wrong. That’s a very different world. I suspect that if I went back to that world today, it would not be all deploy on a Friday, because 90% of what they’re doing is not that core processing. 90% or more of what they’re doing is the various consumer-facing apps and other things that you would be familiar with from a Fidelity, or Schwab, or whoever it is you use… And that is stuff that you don’t need to deploy on Fridays.

In general, more software could benefit from office hours. Or like business hours, actually shutting down and saying “Hey, we have a planned time that the business is closed.” Do you remember when websites used to close?

Websites used to be offline at night, and they’re like “Oh yeah, that person has turned off their computer, and it’s just not around right now, so the website is down”, and that was fine.

[31:56] I think we’ve gotten so used to 24 hour access to things, though… There’s that study that if something takes two seconds longer to load, they’ll go somewhere else and buy it. But it’s also wild to me that, for instance Fortnite - my kids play Fortnite - and they’ll shut them down for 10 hours for an update, and they’re so excited about the update. And the game is so big, they can do that, and it’s fine.

Well, it’s interesting that in some cases like that you almost have to wonder, do they really need to shut it down? Or are they almost trying to build the anticipation for that? You almost are kind of building in excitement, and making it into an event where otherwise it would just be “Okay, the new one’s there.”

I do wonder that, because some kids will wait up till like 3 AM and play, but I can’t – I want to wonder if it’s like because it’s such a huge world change to a game, and then they’re having so much online play, and they’re onboarding all these new people to these changes to see if it’ll keep up, and if it’s easier to launch it at 3 AM when less people are playing… But also, they do get very excited. There’s new skins, new things to buy, and like a completely different world change. So it is interesting, is it because they need the downtime, or is it because it’s just going to make them more money?

Well, first of all, if anyone that runs the Fortnite infrastructure wants to come on the show and talk to us about those updates, email us, shipit [at] changelog.com, because that would be fun.

Yes. No, honestly, I find it so interesting, because so much of the world and SLAs are built on being highly available, because we’re gonna miss money. But honestly, I wonder, does it make them make more money? I just find it interesting that they can turn this whole thing off.

Well, and when you’re doing live updates – because a game server’s like you have to wait for someone to quit, so you can turn off the games, or all this stuff… Like, it would probably take them a month to roll out updates without just saying “Everyone disconnect during these hours”, and then they can go through and replace everything.

Well, that’s what I’m saying… But think about just the whole business model. For one, it’s a free game. It’s genius. But the skins are so expensive. Fortnite and the way that Epic has done it, they’ve almost revolutionized how they do a lot of the gameplay. And just watching the way that they’ve broken so many rules, and how they roll out things… Think about it, they’re not changing and updating a little bit, like Minecraft. It is a completely different world, with different things, and different – you know what I mean? So it’s just a very interesting concept to see, like, they’ve broken so many rules on like how we do gaming in general, and they’ve been so wildly successful. I would love to know what actually is underneath all of that. Also, I think Fortnite’s got components built in Java too, which makes it more interesting.

[laughs] I don’t know, I can say there have been times where I’ve been sitting there, waiting for sundown on Saturday in New York, so I can place my order with B&H Photo… Because they do shut down the ordering for the Jewish Sabbath.

That’s kind of interesting. Okay, so also for –

Yeah, for religious reasons they won’t take money sundown Friday to sundown Saturday.

I love that though, that they’re still successful, and they can take time off… Because that is something that we don’t see in corporate America often.

So when you would work Saturdays, or you’d deploy on Fridays with the mainframe, what was your Saturday like? Were you working the whole eight hours, or like the whole weekend? Or was it just like checking in and making sure it’s not on fire?

Typically when we would deploy on Fridays, the stuff that was used - the online stuff, the stuff facing a user - would be used on Fridays. So we’d deploy early Friday AM, before everybody came in, so we could monitor during the day. So it might mean an early start. But you would usually know very quickly, “Does this work? Are there issues?”

[35:55] Well, that’s also interesting, because you’re getting people to use it, but you’re not completely endangering your whole week. So that’s –

Right.

Although Fridays are usually heavy bank days, right? Like, payday’s there, and it feels like you’re gonna get a lot of – like a spike in users. Like “I need to go cash my check.”

How do you deal with that?

Well, remember, I was dealing mostly with stock market, and it was mostly institutional. And the big institutional traders - I don’t even know that they look at calendars other than earnings days, and the days they get their bonuses. So –

I also wonder if that changes now too, because the way – okay, so we have Robinhood and all these very different ways to trade now… You’re not calling in to your broker, like it was before… So I also wonder if that would change –

A lot has certainly changed.

And I think buying and selling the news is so much faster now… I wonder if the patterns of how people invest and do their trades would change the way that that would work.

Well, what I would remind you is then as now most of the volume that gets traded is by big institutional traders, not by you and me.

That is true.

And the stuff that you and me, the three of us, anybody we’re likely to know - unless in your time at Amazon you maybe knew Jeff, or maybe Andy - the type of trading we’re likely to do is not the way institutions trade.

That makes sense.

So very different world. But it certainly has changed. And right down to the point that a lot of the systems we were building, which were systems to do things like take the information from a trader who has just completed a trade by talking to somebody at another brokerage, on a phone, and then enter it into the system - that doesn’t happen anymore. So a lot of that has kind of fallen by the wayside, which is why I think they’re probably not deploying everything on Fridays anymore. I’m guessing there is certain core stuff, particularly the backend processing - the stuff that happens after a trade - that they might be favoring Fridays… But the timelines have shrunk. We just went to what’s known as T+1 settlement for securities in the United States, which is to say that everything completes related to the trade within one day. Within one day the securities move, the money moves, everything is done.

When I entered the business, I think it was still T+5. So you had five business days for the various brokers to confirm with each other that “Yes, this is the trade that we did with each other. And these are the instructions we’re sending to our related banks and clearing houses and other institutions to make sure that on the T+5 day everything moves correctly.”

What did that look like? Because I’m assuming in the ’80s, maybe early ’90s the stock exchange went through a lot of changes, just with technology.

Yes. We went through decimalization… Back in the day they traded stocks on eighths.

Oh, wow, that was the smallest you could get - an eighth.

Yeah, that was the smallest you could get. You’d buy a stock for – you know, there was no [unintelligible 00:39:19.18] A stock would trade at 12 and an eighth, and 12 and a quarter, 12 and three eighths. [laughs]

When we talk about a stock exchange today, it’s like this weird system of moving components. But back then, was there a moment in time when the stock exchange was paper, and there was like “This is the stack of paper that says that today’s stock exchange is this stack of paper, and we’re going to go through and we’re going to spend the next five days…”

There was. That’s way before my time.

Right. But I feel like the ’70s-’80s timeframe would have been the digitization of that. At some point they’re like “We need to store this in some sort of database.”

[40:01] Stop calling Michael old, Justin.

No, I’m fascinated by –

[laughs] No, I am. I am, first of all. But I’m sure there was. When I talked to some of the old time people who were like “Yeah, at the end of the day we’d just take this stack of”, what they called trade slips, where the traders would write what deals they did, and with who, and some clerk would sit there and interpret this, and enter it into a computer… And then you’d go through this five-day back and forth of everybody confirming that what my trader wrote down on the piece of paper to hand to a clerk, who may have typed it in wrong, confirms/aligns with what yours does… And by the way, what I have on file for this customer’s banking information aligns with what you do. That’s why it took five days, especially since a lot of this information, again, before my time, but I heard the stories, was really “Okay, at some point on the nightly processing cycle somebody would produce a tape”, and somebody would physically take that tape and run it down to some other institution on Wall Street, and they’d read it in, and that was what you did as business for the day.

It’s kind of cool that it shows how technology and making it more efficient kind of influenced the regulations a little bit more…

Oh, yeah.

But also, this just proves that ageism is total BS, because people always are like “Oh, well, things are changing, and things are new”, but because so much of old is really the most stable and best way to do things, even when you advance, it’s so many of the other principles and having that knowledge makes you better at it. It makes me so mad that so many people with like what’s going on right now are having a hard time getting jobs, and people are being judged by their age, and I’m like “Those are the people that you should want.” They’ve seen it get real, they’ve seen it change, they’ve seen so much… And it just makes me mad that it’s not valued.

I also think that we do everybody a disservice by assuming that everybody’s gonna come in with the knowledge they need, and we don’t need to train them… Which was – well, I was gonna say, the reason I know so much about data centers - which I do - is because back in the day I worked on mainframes, but they weren’t teaching mainframes in school. Even when I went to school, they were not teaching mainframes in school. I was working on PCs, I was working on Linux boxes… I had never seen a mainframe. They had to train you. So I actually joined something called a training program. They actually had these things. What did you do in this particular training program? …and I can’t speak for any other… Well, for six months you spent four hours a day in a classroom, or doing exercises in a classroom or in front of a computer. And for the other eight hours of the day you’d pay your dues. How did you pay your dues? Well, they’d put you on something you could do, which was doing stuff in the data center.

That’s awesome.

So I spent – in my case they shortened the training program by two months, because I actually came in with a CS degree, and a lot of people didn’t, so they accelerated me a bit… But I spent four months working as an operator in a data center. I know how to mount a tape in a data center. I know how to mount a disk drive.

But that’s so underrated though. Now that we’re trying to automate everything for efficiency and for money… But if you don’t build it, how do you fix it? If you don’t go through all the troubles of trying to figure out, and the failing and experimenting, you will not be as effective. So the next generation of developers - are we going to break them by overly automating and having things do everything, you know?

[44:08] And building with oversight, right? Because that whole training - it’s an apprenticeship. When I look at – my neighbor’s an electrician, and he’s like “Yeah, I have this year-long apprenticeship.” And I wish there was a way – because a lot of people are just like “This is guarded secrets now.” And it’s “Well, it’s not, really.” Don’t give them all the customer data, but like, there’s plenty of stuff that someone can do.

But it’s also wild though, because they just expect you to come out of college with all this knowledge, and college does not teach you Linux, or Kubernetes, or infrastructure.

Well, and knowledge is not experience.

Exactly.

I can have the numbers in the books, and they’re like “Oh–”

And theoretical and actually doing computer science are two wildly different things.

Yeah. And to this day I think that four months that I spent in the data center is some of the most valuable time I spent, because that was the time where – you talked about getting paged at two in the morning… I was the one who was sending the pages. I was the one who was getting the calls from the barely awake developer who had to fix something. And I was the one dealing with their frustrations with whatever operational issues they were dealing with, that I had to help them fix.

But that big picture of the empathy, and knowing how it works, and knowing how that system relates makes you so much better at what you do. The people that come – and they’re customer service engineers, and they work in like the helpdesk, and then they become engineers… You have such a leg up, because you’ve seen it break.

Yeah. And you see how it operates, and you understand the limitations on the operations.

Yes, that is so underrated.

And when it’s all kind of handed to us as “Oh yeah, this is abstracted away in the cloud”, we really miss some of that nuts and bolts of “Well–” There can be physical limitations to things. I remember when I was having a discussion in one of my roles at AWS, where I was dealing with a lot of networking issues. I’m on the load balancing team. And I had to finally tell this person that “No, we have a limitation called the speed of light”, because in US East 1, one data center to another could be 10 miles apart, connected with fiber. Well, there’s a minimum latency that you will have across those two data centers, that is defined by the speed of light through fiber, over whatever that 10-mile distance is… And no, there’s nothing I can do about it.

Well, you just need to storyboard out quantum entanglements.

[laughs] Exactly.

Oh, my goodness…

And I am sure there are people at AWS working on that, but… [laughs]
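For the curious, here’s a rough back-of-the-envelope version of Michael’s point. Light moves through fiber at about two-thirds of its vacuum speed (the exact figure depends on the fiber, and the 10-mile distance is just the example he gives):

```latex
% Speed of light in fiber (refractive index roughly 1.5):
v \approx \tfrac{2}{3}c \approx 2 \times 10^{8}\ \text{m/s}
% Distance between the two data centers:
d = 10\ \text{mi} \approx 1.6 \times 10^{4}\ \text{m}
% Minimum one-way latency, assuming perfect hardware and a straight fiber run:
t = \frac{d}{v} \approx \frac{1.6 \times 10^{4}}{2 \times 10^{8}}\ \text{s} \approx 80\ \mu\text{s}
```

So a request and its response between those two buildings spend roughly 160 microseconds just in flight, before any switching, queuing, or processing - a floor no load balancer can do anything about.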

Even just not even that, but like – I think platform teams are wonderful, but just the fact that people don’t set up their environments, you don’t understand your whole bigger picture of your environment, because you didn’t have to do anything with it, you know? So the more we abstract infrastructure and all these things, which are great, because you’re more effective/faster, but you also are less effective at fixing it and knowing what happened.

Thank you so much, Michael, for coming on the show and explaining all about the old things that are kind of the same, and they’re new again. And I’m still getting over hot-swapping CPUs. You just dropped that casually…

I’ll send you a link. But I do want to conclude on one story, or on one little tidbit that I was just thinking about yesterday as I was looking at some of the latest stuff going on in S3. S3, of course, being Simple Storage Service. But if you use it, you know it’s not really very simple. In fact, using it correctly is extraordinarily complicated. And it occurred to me that “Well, nothing here has really changed.” Because you can use S3 very simply, but you won’t be using it optimally.

[48:19] Well, back in the day on a mainframe I had to write stuff to disk. And to do that, you had to know the physical details of the disk, and tell it how to allocate space. Do you want to allocate tracks, a single track around a single platter, or a cylinder, which is a stack of tracks in the same – you know, lining up with each other? Did you want to make them contiguous? Did you not? The more efficient you made it, the less likely you were to find space in the system. You had to know all this weird, arcane stuff.

When I tell people this today, they tell me “Well, that sounds really weird that you would need to know all that.” But I was thinking, “Yeah, well, 30 years from now people are gonna think it’s really weird that I had to think about how big my puts and gets were going to be.” And whether I wanted to do parallel puts and gets, or any other things to make it more efficient… But there’ll be some new level of bizarre, arcane stuff that people have to do to manage storage. So yeah, things changed, but nothing changes.

Yeah. Even like five or eight years ago, I was doing a bunch of stuff in S3 and I didn’t know they have unique primary keys, because it’s not a [unintelligible 00:49:43.28]

Oh, absolutely.

And I wasn’t doing that. And I didn’t know how to use this. I was using it “simply”, and then I was hitting bottlenecks.

You need to distribute the keys to keep them from –

Exactly. I had a terabyte of data up there, and I’m like “Why can’t I fetch it? Why is this slow? It’s S3, it should be the cloud.” Like no, you need to distribute this over keys. I’m like “Oh… Well, that’s not simple.” That’s putting the architecture of the system on the developer, in exactly the same way the mainframe was putting the architecture of the hard drive on you as a developer. We just push that off, and the architecture underneath the thing is going to influence how you can use it.

Yeah. And that has not changed, and I don’t think that’s ever going to change.

Nope. Leaky abstractions are always leaky, so it’s always gonna happen. It’s just a different level of how low – like, I don’t care about the hard drive allocation in S3, but I do care about the architecture of the system and putting it together. So yeah…
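Here’s a minimal sketch of the key-distribution idea Justin is describing, in Python with boto3. The bucket and key names are hypothetical, and S3’s partitioning behavior has changed over the years, so treat this as illustrative rather than current guidance:

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

import boto3

s3 = boto3.client("s3")
BUCKET = "example-bucket"  # hypothetical bucket name

def prefixed_key(name: str) -> str:
    # S3 spreads request throughput across key prefixes. Keys that all
    # share one sequential prefix (timestamps, incrementing IDs) can pile
    # onto a single hot partition; a short hash prefix distributes them.
    h = hashlib.md5(name.encode()).hexdigest()[:4]
    return f"{h}/{name}"

def fetch(key: str) -> bytes:
    return s3.get_object(Bucket=BUCKET, Key=key)["Body"].read()

# Aggregate throughput comes from many concurrent requests against
# well-distributed keys, not from one big sequential read.
keys = [prefixed_key(f"records/part-{i:05d}.bin") for i in range(64)]
with ThreadPoolExecutor(max_workers=16) as pool:
    blobs = list(pool.map(fetch, keys))
```

For what it’s worth, AWS has since raised per-prefix request limits, so the hash-prefix trick matters less than it did when Justin was hitting those bottlenecks - the abstraction got a little less leaky, which is rather the point of the conversation above.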

Well, thank you so much. This was a lot of fun. And if people want to reach out to you, I know you have MichaelGat.com.

We’ll put that in the show notes. We’ll put a link to your talk and a couple other areas, and…

Okay. And I’ll send you a couple links related to some of the things we’ve talked about, and you can put that out there as well.

Awesome. Thank you so much.

Okay, thank you.

Break: [51:02]

Thank you so much, Michael, for coming on the show and talking to us all about some of the history of infrastructure, and hardware and data centers. I love this stuff. I read a lot of history, like historical technology… I think partially because I feel like I missed out. There’s like this historical FOMO for me.

If you guys could see the cutest excited little boy face that Justin is making right now when he talks about how he missed out on FOMO… It’s great. It is pure, like, just happiness. It’s great.

I love hearing the history of how these technologies developed… What was it called – Broad Band. Two words. Broad, as in like a group of women; band, like a music group… Broad Band was a great book about how women shaped a lot of internet technology. And they actually go into hey, some of the first mainframes – and the term computer was a woman’s job. They were computers, and then…

Will you remind people of that, please?

…which we will say again and again. Women are computers, alright? They always have been, and they are amazing at it. And even a lot of the mainframes, the early calculating computers, women were the people that worked on the systems, partially because they were small enough to get into the gear systems and replace things inside of the system, but also because they were wicked smart, and could debug these systems like no one else. It’s just like “Oh, this is how this thing works together. Here you go, we replaced that gear with a smaller one or whatever, and now your calculations work.” Fascinating. Anyway.

Today’s outro - I didn’t come up with a clever name, I’m sorry, because this is just a response to listeners’ questions. I was asked multiple times in the last couple of weeks about where we find white papers, and what are some of our favorite white papers. And so in today’s episode, I just want to talk about some of the sources where I have historically found white papers. And I love finding these white papers… Because to me, they talk about why the thing was created, and then we see later how it’s used. And those are not always the same thing. And seeing what problem a system was solving when it was created is a very different thing than actually seeing the application of it… And so I like going back to the source of things, when someone’s like “Oh, you need to use Cassandra.” I’m like “I’m gonna go find the Cassandra white paper”, and see what problem they were solving, and then go read why they solved this problem. And then I can tell you, “Maybe we do or do not have the same problem. And we’ll figure that out if we want this.”

That was my favorite part about being a solutions architect, was you need to figure out the use case for things, and when things need to be used… So when you read other white papers, and you’re kind of reading why they made the decisions they made when incorporating or building something, it helps you to make better decisions on what to use, or maybe when you want to build something, or when you’re advising people to build things, or when you’re designing a system as an engineer; when you read enough about the other decisions other people made, it’s almost like you’re learning from the people that came before you, so you make less mistakes… Because things always sound great in theory, until you try to build something and you’re like “Ugh!” That’s where they ran into that problem. I will avoid it by not doing a, b and c, you know?

And really, a lot of it comes down to just like not misusing tools, right? Because people will get a tool and like “This tool works for everything. I need everything to be Kubernetes.” I’m like “No, you don’t. That is a different problem set, that you’re not trying to solve right now.”

Can we talk about that? Working in databases, and especially NoSQL databases, people are like “I just want to use Dynamo for everything”, and I’m like “But why?” It’s the hammer…

Because of infinite scale, right?

Yes, it’s the hammer. People want to use the hammer for everything, and sometimes you need a screwdriver, and sometimes you need [unintelligible 00:57:46.16] And if you take a hammer – like, a hammer is a great tool, but using it in a situation it’s not qualified for, you are going to break something, or ruin something, or make it very expensive for no reason.

[58:01] So the main place that I used to get a lot of papers from was just the quote-unquote FAANG companies. Facebook, Google, Amazon, Netflix, and Microsoft… Actually, Microsoft is thrown in there. Apple is the other one, the other A. They all have research divisions, and they all publish papers. And so many of them also are running large-scale systems, they have data centers, or just complex use cases… Some of them have invented technologies that we use today. Go read the S3 paper. It’s a good read, and you can see why it was created. What was object store solving? And these are all things that are available on their research sites – we’ll have all the links in the show notes if you want to find one of those research divisions… But usually, what I was doing was I would find one paper that was interesting, and I would find the author for that paper, and I would go search on their system, or filter by that author, and say “What else did they write?” Because they probably worked on adjacent systems. And I’d kind of branch out from there.

So I remember there’s a couple that were like at Google, and one at Amazon that I was like “Oh, I’m just gonna read all their white papers”, because I find they worked on interesting problems. So that’s one way. And all these sites have good filtering. All the FAANG companies have really good filtering for white papers.

I really like MIT, and a lot of the big schools’ research papers… But they tend to be more theoretical, so I prefer FAANG, because FAANG is solving a problem, usually, and they’re solving a problem at scale… So their decision-making in how things went wrong and how – it is really precise. Also, Meta is really good. I know people think Meta is evil, but Meta has made so much open source stuff, and they are very good at talking about what they did, in public. Cassandra, I think, was born out of some stuff that Meta did. And just like the original Dynamo paper - it’s fire. I learned so much about databases from white papers.

Yeah, the fact that a lot of these white papers are coming from a product perspective, right? It’s not an education, it’s not a pure research environment. This is like –

Because sometimes theory - you get so deep in theory that you don’t understand how to use it. Or maybe you understand, but it doesn’t actually work. That’s like when people say best practices, but those people that made the best practices have never used it in real life, and you’re like “Bro…”

They’ve never practiced it.

It’s like “Well, your angle should be this.” I’m like “Why?” “Well, because…”

Not just that, but sometimes it’s impossible, and you’re like “That’s…”

Yeah, going for the optimal solution is not always possible.

Exactly. And I really appreciate – Dynamo has come so far from the Dynamo paper, so they’re really not exactly the same… But just seeing the progression is just amazing. Also, I learned how many things were built on Postgres by reading white papers, because there’s so many databases that are these new shiny databases, or very well-used databases, but they’re just layers of abstraction on top of Postgres. So it’s amazing. Postgres runs the world. Don’t sleep on Postgres.

Yeah. The other place I was going to point out that I have found a lot of the research papers that I enjoy was because I worked at Disney Animation for so long, we had a research group… And again, it’s one of those kind of like practical papers, where they would publish papers on the research they were doing for every movie. So for like Frozen, there’s a paper about how to render snow and light refraction inside of snow, which isn’t transparent, but it’s not a solid object. And so how does light get refracted inside of snow - they actually studied the physical properties, and then “How do we reproduce some of that in a computer?” And then how do we make that research practical, to actually ship a movie in time? That’s like the progression that you have to make.

That is so interesting, because I don’t think I would have ever thought about how to make snow… And it’s weird, because I don’t know where I will use that information, but I really want to read about it, you know…

Yeah. And there’s a lot of – like, gaming companies have it… I read a lot of the Disney and Pixar research papers, because I was fascinated with that, but the gaming industry is also the same way. Many of them are like “How do we reproduce this physical thing in the real world, in a computer?” But not perfectly. Because you have to have all these trade-offs about “Oh, I don’t have enough RAM to–”

But see, that’s what I’m saying. The trade-offs are the most underrated part of it, because that’s where you learn so much.

Yeah, for sure. And the last place I was gonna point out that I get papers from is acm.org. It has a bunch of journals. They run conferences that I enjoy. SIGGRAPH is a gaming animation sort of conference, and so there’s a lot of stuff that comes out of that from talks… But also, I watch a lot of talks at conferences online, on YouTube. If they’re recorded on YouTube and I think it’s interesting, I usually will find one that’s interesting at a conference, I will watch it, and then I will look at that person’s sources. Because a lot of them will be educational, or research-based, and they’ll say “Oh, we’ve built some of our foundational stuff on top of this paper.” So I’ll go find the paper, and then from that.

Same with books - if I’m reading a book, and they have a paper listed of like “Oh, we’ve got this research from here”, if I’m interested enough, and I feel like I have the time, I will go ahead and find that paper. And my flow for finding papers - I just find the PDF, and I put it in Apple Books, and I read it on my iPad with an Apple Pencil. And I just add notes to it, and that’s where I typically consume them.

I used to print them out, and at lunch I would go with a sharpie marker, and I would go sit outside, leave my computer or my phone at my desk, and I would just go out there with a white paper… I don’t do it to remember them, I don’t do it to like search through them again, I’m not looking at my notes, but just sometimes I want to like jog my memory, of like “Oh, what did I like out of here?” And I’ll find some highlights, and I’m like “Oh yeah, that was the key point that I thought was interesting.”

Thank you so much for listening to this episode. If you have people you would like to have on the show, or topics you’d like to have us talk about, please email us at shipit@Changelog.com. We do have a bit of a Plus Plus thing. Autumn and I go on a tangent here, talking about not just research papers, but just in general engineering blogs, and kind of how that relates to Dev Rel, and engineering, and some thoughts we have there. So stick around for the Plus Plus content, and we will talk to you all again next week.


Our transcripts are open source on GitHub. Improvements are welcome. 💚
