Changelog & Friends – Episode #12

You call it tech debt I call it malpractice

featuring Kris Brandow from Go Time

Go Time panelist (and semi-professional unpopular opinion maker) Kris Brandow joins us to discuss his deep-dive on the waterfall paper, his dislike of the “tech debt” analogy, why documentation matters so much & how everything is a distributed system.

Sponsors

Fastly – Our bandwidth partner. Fastly powers fast, secure, and scalable digital experiences. Move beyond your content delivery network to their powerful edge cloud platform. Learn more at fastly.com

Fly.io – The home of Changelog.com. Deploy your apps and databases close to your users. In minutes you can run your Ruby, Go, Node, Deno, Python, or Elixir app (and databases!) all over the world. No ops required. Learn more at fly.io/changelog and check out the speedrun in their docs.

Typesense – Lightning fast, globally distributed Search-as-a-Service that runs in memory. You literally can’t get any faster!

Chapters

1 00:00 Let's talk
2 00:38 Moar Go time friends!
3 00:58 Our GopherCon roots
4 02:17 2nd-gen software engineer
5 07:32 You don't learn through DNA
6 10:10 Always willing to read
7 12:53 Misconceptions of waterfalls
8 17:09 No appetite for rigor
9 22:12 Jerod justifies his nothing
10 23:49 Kris resigns from Go Time?!
11 25:52 Success hides problems
12 28:12 Startups are too slow
13 30:02 The tech debt analogy
14 33:42 Hard to quantify
15 40:26 Documentation matters
16 42:24 Tech malpractice
17 51:24 Auto-Librarian
18 54:53 Librarian tenure
19 56:50 Learning the skillset
20 58:58 The 'why' doc
21 1:01:44 The 'why not' doc
22 1:05:05 Too busy documenting
23 1:08:38 The value of worthless words
24 1:11:46 Change in the last 40 years
25 1:16:45 Announcing KrisOS?
26 1:21:22 What are distributed systems
27 1:22:53 How computers work
28 1:26:59 This is Kris's journey
29 1:29:53 A true full-stack engineer
30 1:32:37 The PCI-e rabbit hole
31 1:34:43 Malpractice is fightin' words
32 1:35:25 The spectrum
33 1:35:46 The malpractice umbrella
34 1:37:39 No time to make a shorter one
35 1:38:16 Coming up next

Transcript

Well, we are here with our friend, Kris Brandow. Kris, thanks for joining us on Changelog & Friends.

Thanks for having me.

I love this, Jerod… Getting to spread the host love even, you know… We’ve had Mat on, but never Kris…

That’s right. So Kris is a Go Time regular. He’s been on Go Time for a little while. I remember when I first met you, because it was at GopherCon, digital edition. I don’t know if you were physically there, but I certainly wasn’t… Were you there?

I think yeah. Because that was in - what, the winter of 2020, right?

Something like that.

Yeah. So I think I was there. I think I gave a talk at that GopherCon, so…

Okay. So we did a show. Mat hosted it, I was just the producer, the guy behind the guy… And the show was with you, and Angelica Hill, and Natalie Pistunovich, and Mat hosted, and that was when I first met you. And we needed some more panelists at the time, and so I was kind of there, listening, scouting different people, looking at talks, and just wondering who could be a good Go Time panelist… And literally, I just liked that conversation so much, and the three of you, that afterwards I think I just was like “Would all three of you like to be Go Time panelists?” And they all said yes, and we all lived happily ever after. So that’s how Kris came to be a Go Time panelist, at least from my perspective. Does that sound right? Does my memory serve me?

Yeah. I think I have actually been on an episode as a guest over the summer…

Oh, that’s right.

And then - yeah, you invited me to be on the show for that GopherCon, and then you were like “Yeah, y’all wanna be on?” and we’re all like “Sure. Yeah, that sounds like fun. Let’s do it.”

Yeah. And it has been fun. So one thing I’ve learned about you, Kris, through the last couple of years, is that you really shine in the Unpopular Opinions segment of the show. I might even consider that like your wheelhouse. Sometimes I think Kris is there just because he’s got a couple queued up, and he wants to unload his unpopular opinions… But another thing that’s interesting about you, which I’d love to hear more about, is I think that you’re from a software engineering background, right? Like, your parents, or your dad, at least… You have some of that in your background, but you didn’t want to do that, you wanted to be a writer. Is that right?

Yeah, exactly. So both my parents – well, they both studied computer science in college. That’s actually where they met. And my mom was actually the first one to start professionally working within the CS field. She was a programmer for a decade. And then my dad got his master’s, and then he’s been a software engineer ever since. He does a very different type of software engineering from me; he does very low-level stuff, so he’s writing a lot of C, a lot of Ada, that sort of stuff… And my mom – I remember my mom told me she quit programming because she spent some absurd number of hours, I think it was like 16, or 30 hours or something, trying to find a problem with the compilation of a program, and it turned out to be a missing semicolon somewhere, and she was just like “I’m done! Like, absolutely not!!”

The last straw…

Linters… Or something.

Even with that, she wound up going back and doing not CS stuff, but where she worked for quite a while was a psychiatric hospital, and she managed their entire phone system that was running on DOS for a very long time.

So even as she left programming, she was still within the wheelhouse… But as a result of that, and growing up around all of that, I was like “I don’t want to do this as a career…”

“No semicolons in my life, please…”

Every family gathering would be like “Oh, you’re gonna go into computers, like your dad?” and I’m like “No. I’m gonna be an author. That’s what I’m gonna do. Building computers is fun, video games is fun, but I want to be an author. Writing is my passion, writing is what I love.” And my dad gave me this wonderful advice when I was getting ready to go to college. He was like “I’m paying for college, so you go to college, and you choose a major for something that you love, and you will figure it out afterward.” My dad has lots of great advice, but that’s one of the best pieces of advice he ever gave me. So I was like “Okay, I’ll go.”

I went, I got my creative writing degree… After a bit of shuffling, I wound up with a second major of broadcasting and mass communication, and really fell in love with the world of not just writing, but also audio and video. The original plan for me going into college was actually to become an audio engineering major, but that major wasn’t really ready at the time, so I kind of did this shuffle and wound up getting what was effectively an audio minor.

But yeah, so I got my writing degree, and I was kind of building websites and things along the way for a while… And then it really ramped up end of my junior year, going into my senior year, because I actually got elected as the basically IT person for the television station that I worked at… Because we had this student-run television station that was like completely student-run; no adults in the room, really. So it was all of us maintaining this multimillion-dollar enterprise…

It’s kind of fun.

Yeah. I was like “We have this $4 million studio that–” Our broker was an alumnus of my college, so he had donated a bunch of money with NBC to build a studio; we have all this nice, shiny digital equipment, but we still managed all of our field equipment checkout with paper forms, and I was like “This is stupid. I don’t like this.” So I sat down over the summer and I built an equipment reservation system using Drupal; good old software. There was like a module available that basically did exactly what I wanted, so I went in, I configured it, set the whole thing up, and I said “Okay, great. Now we can do equipment reservations online.”

[06:00] And my college caught wind of this, and they’re like “We’ve been looking for a system like this for years, and we cannot find one that’s affordable. And you clearly just built one. Can you build one for us?” And I was like “Yes!” And my advisor was like “For money.” I was like “Yes, for money.” So that was my first paid software engineering gig.

It was building and configuring this system for –

Is it still in use?

No, it is not. They used it for a little bit… There was actually a snafu where I wrote a whole user manual, but they never gave the user manual to the person that was operating it, so he had no idea how the system worked. It was just a very miserable experience. I went back and saw him, and he’s like “This thing is not working”, and I’m like “I’m pretty sure I wrote down how to do this in the manual.” He’s like “Manual? What manual?” I’m like “Oh, God…”

And that’s when you learned the acronym, RTFM, for the first time.

Yeah…

[laughs] “What manual…?” That’s too bad.

I built that, I built a whole bunch of other websites, and then I was like “I kind of like this website thing. I like this Drupal thing. I think my options are be a writer”, which was fun in college, but professionally is a lot more challenging to do… Or go be a broadcast engineer, which would have been a lot of fun. A bunch of my friends were broadcast engineers… But I was like “This website thing… This website thing is fun. So I’m gonna keep doing this.” And then that’s when I also realized the irony of me spending so much time being like “I’m not gonna go into computers like my dad”, and then winding up in computers, like my dad.

That’s hilarious. When did you learn the practice? Did you learn during that Drupal project? Or did you kind of pick up things from your parents over the years? I mean, you don’t get it through DNA. You don’t learn to program through DNA. That’s my point.

You do not. It is definitely through a lot of environmental… So I think I built my first computer with my dad when I was like eight or nine. This was the ’90s, so it was like – I actually had two computers, when all my friends had like one computer for their whole family. So I was already in this ecosystem of being around computers, and building them, and understanding them… And then I still have the first programming book my dad bought me, but I think I was like in middle school when he bought that for me. So I started programming then. Didn’t particularly understand it, and I took C and C++ courses in high school. I did pretty well on them, but programming didn’t make any sense to me. I was always very confused about “What use is this?” I remember being like “Where’s the GUI for this? There’s no user interface. How does any of this work?” But it was in college when it actually finally clicked, when I was building that equipment reservation system, and some of the other websites I built, that I was like “Oh, this is what you can do with this technology. Now I understand how to use and how to do it.” And then from there, it was just like “I’m just going to dig as deeply into this space as I can.”

And so really, it wasn’t till like after I graduated college that I really started to write software. Even in all that time I was building that equipment reservation system, I think I built like six or seven other websites for various entities and people. I was only doing like Drupal themes. I had never written a module; I didn’t write a module till after I graduated, and really learned PHP till like after I had graduated. And it was just kind of this mad dash forward from there… Which also caused me to have this really weird career. I’ve said it to a whole bunch of people, that I’ve never been a junior software engineer, because obviously, when I was building those websites, it was just me; there was no one else around me. No one had any clue what I was doing.

Right.

And then over that course of time, I got so much knowledge in Drupal that my first couple of jobs, I was like the Drupal guy. I was the expert in this technology that we were using, and I got hired specifically because I knew so much about Drupal. So my whole career has been a lot of like architecture and principal level design and implementation stuff, like designing entire systems, fixing entire systems where there is no one to go to, no one to refer to, and I’m having to like mentor people about “This is how you build that Drupal module. This is how you should structure your code, this is how you should do these things.”

[10:08] Yeah. Well, that all sounds like it makes sense to me just knowing you for the brief amount of time that I have, because another thing besides the fact that you thrive on unpopular opinions is that you do seem to like to go very deep on specific subjects, specific technologies, even into specific conversations, like continuing to drill down deeper and deeper, which I appreciate. As a guy who’s – I’m a generalist, I always kind of have been… I have certain areas where I go deeper than others, but I’ve never gone all the way down a rabbit hole on a specific thing, because I’m always trying to get stuff done. I’m kind of pragmatic in that way. But because of that, I don’t do things like read papers. I was kind of razzing you because you have this blog post that came out of our Unpopular Opinions segment about the waterfall paper… And I was like “Wait, people read these papers? I thought you just write papers, and you publish papers, and then we reference papers. But do we read them as well? Because Kris does.” But I’m looking for the CliffsNotes always. Have you always been just like willing to put in the time to just sit there and read, long sessions of reading as well? Or how do you go about acquiring the deep dives?

Yeah, I have this deep desire to understand how things work, and my brain will not rest until I have like a satisfactory understanding of whatever it is I need to like learn. And I also have this deep appreciation for nuance, so I can’t take simplistic explanations of things and be like “Oh, okay. Yeah, that’s fine.”

With the waterfall paper, I was specifically – one of my friends actually pushed me to go read the whole paper. Because for years, I’d known waterfall is not how we describe it, it’s something that’s very different. The paper doesn’t really say what I said, but that was actually – when I first read the paper, it was really just to satisfy my curiosity around where did this diagram come from? What is this diagram? And I saw the other one in the paper, I’m like “Okay, that’s enough. This is an iterative development model”, and I like went on, so I was deep in some other rabbit hole, and this happened to be a tangent.

But I was talking to my friend a couple months ago, and I was talking about software development, and software engineering process, and he was like “Oh yeah, this thing is like the program design person in the waterfall paper.” And I was like “Program design person? What are you talking about?” So I went home after that and I sat down and read through the paper, and my mind was just like “Oh no, no, this is one of those things where everybody has gotten it wrong about what this paper says.” All of the summaries, all of the stuff, all of the kind of industry knowledge, I’m like “This paper does not say what people claim it says.” And that’s where it kind of just went from there.

Yeah, I wonder how that happens, besides - you know, I joke that people don’t read papers. Obviously, people do, I’m just not usually one of them. But it does seem like as we institutionalize, so to speak, knowledge to be passed down, which is what books and papers really are for, we’ve failed seemingly over and over as a software engineering industry to receive the passdown, and that’s why we find ourselves reinventing things from the ’70s; good ideas get lost, they get refound again…

One of the reasons why I like podcasts, by the way, is because it’s kind of a more approachable way to cross-pollinate ideas across the industry. Not quite as thorough, of course, but at least to give a seed that somebody can go and put the time in to dig deeper. But why do you think that we all think of waterfall so differently? I grew up in my career in the Agile era, and I learned about waterfall in college. I remember being so bored, because I didn’t have any context to apply it back then. I’m like “I’ve never written a program, and you’re teaching me about waterfall software methodology.” I was like “Okay, I can see the diagrams, but I have no frame of reference for how this would fit into a real thing.” And so then I quickly forgot it.

[14:05] But in the Agile era, it was just kind of like the sense is “Waterfall bad, Agile good.” That’s right, right? Like, waterfall bad, Agile good. Why? Well, waterfall - you try to design it all upfront, and you pass it to the next silo, and they do their thing, and they pass it to the next silo, and they do their thing, and then at the end you built something that you didn’t want in the first place. And with Agile we’re recognizing that we have to be able to change things as we go. That’s kind of a basic reason why “Waterfall bad, Agile good.” But I didn’t realize what you just told me a couple of weeks ago about the waterfall methodology actually being an iterative process. I thought it was a “1, 2, 3, 4, 5, we’re done.”

Yeah, that’s like the biggest misconception about the paper, and what Royce is advocating for… Yeah, it’s iterative, but it’s not like it’s – you know, when I first was thinking about this, I was like “Oh, perhaps it’s buried somewhere in the paper that it’s iterative, and he kind of talks about it as this linear, one step, then the next step, then the next step thing.” And it’s on the first page that he says it’s iterative. It’s literally – before you even get to the diagram showing that traditional waterfall with no backflow, the page before that says “Oh, this is an iterative process, and here’s the diagram showing the backward flow of steps.”

So it’s not like it’s buried in there, it’s just, I don’t know how it got missed so much. I guess I have an inkling of how it all got messed up… Because there’s a couple of other papers that – there’s probably tons of papers I could have used, but Wikipedia is usually a pretty good source for info. And so there were two other papers that are kind of talked about in the history of waterfall. One of them is where the term waterfall actually got coined from, where the person’s like “Oh, this is Royce’s waterfall method of development.” And the other is just describing a development process, -ish.

The last paper I mentioned, the one describing the development process, the author writes an editor’s note for that paper in the 1980s, and is like “I don’t know where it went wrong, but people got the idea that software is something you manufacture, instead of something you design and build and engineer.” And that seemed to be where part of the issue came from, because he also mentions this divergence of programmers from engineering methodology, which is also what was happening in like the ’70s, and ’80s, and ’90s, as we kind of get toward more of like the Google culture of engineering. And he attributes that as part of the problem as well, because it’s like “Oh, these people want to be special. They don’t want to be seen as engineers, they want to be seen as something else. And that’s led to all these problems, because now we’re trying to manufacture software, and you can’t really manufacture it… But also the people who are in charge of making sure that we do things well don’t want to follow any of this history, they don’t want to follow the engineering methodology. They feel–” I mean, he doesn’t say this, but it’s kind of implied that they kind of feel like it’s too constraining for them; they want to go do – they want to be more free in this way they build software.

Right. Well, a couple of things… First one is that, as a software developer or engineer, whatever you want to call us today, and let’s just call it in the 2000s era, I do not have an appetite for rigor. I’ll just confess. I really don’t. And even looking at this, I’m like “This looks like not what I thought it was.” And when I say “I’m looking at this”, I’m referring back to Royce’s paper, and specifically the steps in the process. He does say that there’s kind of two different kinds of projects, and at a certain scope size, you don’t need any of this stuff. And so I’m like “Maybe I’d just like to live right there and let you all do the rigorous stuff. There’s plenty of work to do in the world.”

[17:50] But I don’t think I’m all that different than anybody else when I say I just don’t have an appetite for the rigor, and I want to just iterate, and change… And like you said, I want to be more free as an engineer. And I feel like that’s typical. What do you think, Adam? Is that a sentiment that you hear people say? Or am I weird like that?

I don’t believe you, Jerod. [laughter] I think you do have an appetite for rigor. I think you lead yourself to believe wrong. I think that you are – you do make pragmatic choices, I believe…

But I believe that you have done it so long that you forget that you have rigor. It’s just built into the way you operate.

Maybe.

I believe that you operate with rigor, but you’re also very pragmatic when you apply your effort. And so in your effort you’re rigorous, and you probably have an appetite, despite what you say…

Well, I appreciate that.

So that’s what I think. And if you represent a population, then I think that others might have the same.

But you might also think that I’m pedantic, and Kris would say I’m nowhere near pedantic enough, because that’s another one of his unpopular opinions, is we think we’re pedantic, but we’re not as pedantic as we need to be as software engineers. So maybe you and I think I’m rigorous, even though I just said I don’t think I am.

I think I have certain things in which I apply rigor, but I don’t have an appetite for it. It’s not the kind of thing where I was like – like, I believe a lot of what Kris wrote in the piece about the waterfall, and your overarching theme is “We need more documentation”, Kris.

As a writer, I find that hilarious. But as a non-writer, I’m like “Yes, Kris is right. We need more documentation. And somebody else should do that.” [laughs]

“Somebody else should do that…”

I mean, there is this element of rigor I think being a little bit different than what people think it is. And I think that actually is what has led to a lot of the problems we have…

Okay, here we go.

…is that people think that rigor is like “We have to write all of these documents, and they have to be in this very, I guess, in a way, academic-like writing, and we have to be very formal in how we’re describing things and how we’re designing our systems.” And there is a place for that level of very precise documentation, which is like what Royce was getting at. Royce was building spaceships, and stuff; rockets that get us to space.

For sure. And those places need that.

Right. You need to document things very, very well, because there’s very low margins for error. And I think a lot of people are not in that space. So the type of documentation you need isn’t as much a “It has to be strictly formal, so we make the fewest mistakes possible.” It needs to be more of “Do you actually understand the problem that you’re trying to solve? And should you be solving the problem you’re trying to solve?” And like answering those base-level questions. And that’s kind of fundamentally what the engineering process is about, is like “Do we have enough information to actually proceed forward?” Which is one of the things that Benington in his paper highlights, is the top-down methodology that they’re all used to as engineers isn’t about “You write this doc, and then you write the next doc, and then you write the next doc, and then you write the next doc, and then you go build the software.” It’s “Okay, we’re trying to write this doc. Do we have enough information to actually write it? Or do we need to go do a prototype and experiment and acquire more information?” That’s the thing. I think that when we say “We need rigor”, that’s what we need. And I think that’s something that you, Jerod, do quite often. I think you do find information, and you use that information to decide “Should I do something? Should I not do something?” That’s your whole motto of “Keep the main thing the main thing.”

And I think a lot of software people wind up trying to build stuff without enough knowledge of what it is they’re trying to build. This leads to the estimation problem that we have, where it’s like, people cannot estimate how long something will take. And we say “Well, this is because estimation is so hard.” It’s like, estimation is not that hard if you have the proper information. But if you’re sitting in a room and one person’s like “This is going to take two days” and the other person is like “This is going to take two weeks”, you need some way of knowing which person is right. Or if they’re both right, why they’re both right, or if they’re both wrong, why they’re both wrong. But no matter what, if you have that type of discrepancy, that’s definitely a sign that you need to go do some more exploration, some prototyping, some investigation, something to acquire more information to pull out of that realm of just optimistic engineering judgment. Because they’re probably both wrong, and it’s probably gonna take you two months, so… You probably miss a lot of things in your thinking.

[22:08] Take your number, multiply it by three, and now you’re good to go. Well, I think in a sense - so one thing that I do a lot is I do nothing. So when I do nothing, I pitch that as me being lazy, or a procrastinator, and stuff. But I think as I observe, in my experience what I’m actually doing is information gathering, and delaying decision-making until we absolutely need it… Because I have found that when you make a decision too quickly, like you said, Kris, you’re probably going to be wrong. And the longer you can put that decision off… Now I understand, some people have – here’s where it hits the road, is like you’ve got budgets, and timelines, and pressures, and bosses, and demo days, and all this stuff where it’s like “I can’t delay any further. I have to make the decision today.” But anytime you can afford yourself more time, and just let the current thing be what it is, even though you know it’s not optimal, really feel the pain for as long as you can. I mean, and sometimes I feel it for so long… I mean, on our Kaizens we laugh. I’m like “Nope, still not ready. I’m still not done. I wasn’t ready for this.” I’m really just procrastinating, so to speak, but I am information-gathering. I want to make that decision with as much information as I can have… And depending on what we’re doing, what the feature is, what the project is, I don’t have all that much time. I can’t say “Well, I’m gonna spend 40 hours on this.” So I’m just going to like slow-play it and make that decision with as much information as I have.

Now, if I was full-time engineering like you are, Kris, I would obviously be putting the time in to make the decision by the end of the week, or by the end of the quarter, or whatever it has to be. But I think that in that way, it seems like I’m being lazy, but in a sense, I’m actually just being rigorous. So I like the way that that sounds, because now I sound better…

This has definitely been one of my experiences working in software engineering organizations. There’s this sense that you need to move a lot faster than you actually need to move. And it’s actually one of the things I’ve been thinking about a lot lately, because I’m in the process of trying to build my own thing, try a new career adventure…

Oh, really?

Yes, I’ve decided that I actually want to be a writer, and that writing is my true passion, and I think I can help the world a bit by doing that writing more so than I can with software engineering… Especially in distributed systems, which we can talk about in a little bit.

Is this your resignation from Go Time, or what is this? [laughter]

This is not a resignation – in a way. So an interesting little side bit… I’ve recently realized that I don’t see writing as strictly only typographic writing, which is writing on the page, and publishing, and all of that. I see writing in the sense of like the suffix of graph. So you actually have typographic, but you also have photographic, audio graphic, video graphic… So I consider all of those forms of writing that I want to engage in… Which is a long-winded way of saying “I like being on Go Time, because it’s a form of writing for me.”

There you go.

But as far as like actual software engineering is concerned, I’ve found myself continually walking into environments where everybody would be like “No, we need to move fast. We need to move fast. We need to make decisions.” And it’s just like “Y’all, this is a multi-billion-dollar company. We’ll be okay if we take an extra three days to figure this out.” Or “We’ll be okay if we take a month to get this path right, or get these details correct.” But there’s this sense that you always must be moving quickly, and you always must be moving forward… And I think there’s also this sense of loss aversion to an extreme degree.

I’ve been in environments where it’s just like nobody likes this process. Why are we still using it? We all know it needs to change. We all know it’s not giving us what we want.

Right…

People are like “Well, we use this process, and now we’re successful, so we should keep using this process.” And I’m like “That is filled with fallacies, and we need to get away from that, and away from that thinking.”

[25:52] Success hides a lot of problems. You think this process brought me the success, but oftentimes the success came from somewhere completely different, and the process just happened to be there along the way, whatever it is. It’s correlation and not causation.

I’ve seen that a lot as well. I’ve done it as well. I’m like “Is this actually what’s producing the goodness? Or are we just doing this thinking that it is producing the goodness? Because actually, it’s holding us back. What if we could ditch it and still have the goodness, and have a better process at the end of the day?”

Everyone’s just chasing things, right? In the case of a multibillion-dollar company maybe less so chasing, but they got bitten by that “We should be a startup” bug. “We should have many startups inside of our large Exxon-style company”, for example. I know people within Exxon, and they operate that way. Exxon is big, but their division or their small troop is very startup-esque. And so they feel they have to impress, and there’s some sort of urgency that is perceived by their bosses, higher-ups, or whomever, that makes them move at a pace that’s just not necessary, like you said, Kris.

And then literally, in startup land they are in a rush, because they haven’t built the thing. They’re in a rush to reach product-market fit in most cases, because only then should you apply more financial pressure, i.e. more funding to grow that model or whatever it might be; only then should you apply more heat to the fire, fuel to the fire, or whatever it might be. There’s this rush to build, and I think that’s sad that software engineering has to feel the major brunt of that effort, really, of that concern… Because it’s software that’s really eating the world.

It also becomes a bit ironic in that there’s not really – like, the guidance is kind of all over the place. It’s like “Well, we need to move really fast”, but if you look at all the historical sayings and tropes we have, it’s like “Measure twice, cut once.” If you have X amount of time to do something, spend 90% of it planning, and then the last bit of execution.

Abe Lincoln. He was successful…

Yeah. I mentioned that recently. Yeah.

Yeah, you did.

And he said “If I had–” I think it was six hours, five hours… Some number of hours to cut down a tree, I’ll spend the first 90%, sharpening my axe, essentially. I want to have a sharp axe, so that when I begin to cut, it’s a precise and measured cut.

Yeah. And this is kind of the same sort of thing that people like Rich Hickey put out there as well. It’s just like, if you have a very small amount of time, you need to be very diligent with your planning, because you have very little room to mess up. And I think that’s – I guess this is probably an unpopular opinion, because I always talk about unpopular opinions…

Oh, my gosh. Here we go…

Yeah, here we go.

…I think startups actually move far slower than they should be moving, because they refuse to admit that planning is required, documentation is required if you want to actually be able to sustainably move quickly. Everybody rushes into that first, like “We’re moving so fast for the first six months”, and then six months later everybody’s like “We don’t have any idea how any of this code works, and every time we touch it, we introduce seven bugs. What are we going to do?” And it starts kind of eating away at their company as they try and push forward, and they just start hitting wall after wall… And either they wind up getting acquired, or they go public, or they just disappear, because they just can’t maintain the speed that they had in the beginning. And I think a lot of times people just don’t understand why that is, or are just confused about why is it so hard to actually do something that’s supposed to be simple.

An example of this is at a previous job it took so long to spin up a new microservice that people started repurposing microservices that we were decommissioning. So it was like “Oh, we have this service, we don’t need it anymore” and people would be like “Oh, we’re just gonna rip the guts out and put the new thing in, because it takes so much effort to spin up a new repo and get it all working, and all of that.”

And this was on a greenfield project that was less than two years old. And we could not get a new microservice up and running… Which is kind of ironic, because the point of microservices is that they’re supposed to be small, and you should be able to get them spun up real quick, and all of that… But we just couldn’t do it.

Wow. So what you’re saying, Kris, is that they acquired a whole bunch of technical debt…

[30:07] I would say…

I’m pushing his buttons here. Technical debt is our next topic… This is another one. See, another thing that Kris likes to do - as a writer, as a graph person, he likes to think deeply about words, and their meanings, and analogies. And so him and I have that in common. We like to bikeshed the meaning of words, and “Is this a good analogy? Is this a good name, a bad name?” And you took issue with the tech debt analogy, which is why I was pushing your button there.

I think that the tech debt analogy - we should get rid of it, because I don’t think the thing that we’re talking about when we’re talking about tech debt is debt. I think it’s more akin to malpractice, and people are being irresponsible. Because I think most of the time when tech debt gets brought up, it’s like “Oh, we’re just gonna skip writing the tests, or skip writing documentation so that we can get this thing out the door faster. Or we’re just gonna code this in like a really messy way, so it gets out the door faster.” And I’m like, “That’s not debt. That’s you not doing your job properly. Please just write the comments, and the docs, and the tests.” It’s part of the job. You can’t cut out vital things. Or if you do, then you’re committing malpractice, and we should call it that, and that’s why I think it’s gonna be unpopular. So it’s not tech debt, it’s malpractice.

Which I think is a great analogy. And so one of the things I like to do is disagree with people…

The tech debt analogy is a good analogy, and the non-tech debt, the malpractice analogy is bad?

Well, there’s some nuance here. Kris, explain maybe your side of this stance, because I think it’s shifted slightly, because we’ve had some Slack conversations, but…

Yeah. I agree with you on one aspect, Jerod. I think tech debt is a good analogy. I think where I draw issue is that it is just an analogy. It’s a thing you use to help people understand something when they’re not in your contextual domain. The problem is we’re using that analogy not with people who don’t understand our contextual domain, but with people who do. So it’s like we use it with each other to try and explain our codebases and their state to each other, and that’s where I think it really starts to fall apart… Because if you’re not careful with an analogy - and I think this is something you learn as a writer, is you can only stretch an analogy so far before it just becomes ridiculous and it starts falling apart. And I think that’s basically where this tech debt analogy is, where it’s like people that use it want to use it to explain so much of software, so much of how you should make decisions about what choices you make, and abstractions you use, or how you build things… Like, I was listening back to the tech debt episode that you all did…

Yeah. With Jon Thornton.

Yeah. And one of the things that stood out to me was this whole “Oh, well, it’s tech debt to build this abstraction and only build the front end of the email service, and then not build the backend for it.” And I’m like “Is that tech debt? Or is that just planning stages?” It’s like “Oh, well, the first thing we want to build is the front end, and we want to get that right. And then the next thing we’re going to build is the thing that actually can deliver all of the emails.” And that just seems like bog standard planning and approach of a problem that gets you the biggest bang for your buck along the way. And it’s kind of weird, at least to me, to phrase that as debt. Because I think that is a misunderstanding of what debt in the financial sense is, and tech debt derives rather directly from the idea of financial debt. But once again, an analogy of financial debt, not like a debt itself.

Yes. And I do think that the best use of that analogy is exactly what you said, which is to describe a situation to somebody who’s not an engineer or on the software team, about the situation. I mean, historically it’s been very difficult to get buy-in from people who aren’t in the trenches of the codebase in order to spend time and money on making the code better. This is a problem for us. It’s like “How do I convince my boss, or my boss’s boss, or my boss’s boss’s boss that it’s worth spending two weeks just fixing things?” Because they don’t understand that, really. They’re like “Well, we need features. So you work on features, you don’t work on fixing things.”

[34:22] Well, tech debt’s a great analogy for them, because they do understand debt, they do understand finances, and they understand that you’ve got to pay that debt down or eventually you go bankrupt. Eventually, you can’t deploy a microservice; you’re taking over other people’s microservices. So that’s a great use of that phrase, or that analogy, is to explain “Okay, I get it. Like, we’ve taken on so much debt because we’ve been moving so fast, and making compromises along the way, and now we need to pay down some of that. And so I can spend some money on this, actual money that is in my budget as a manager or whatever, and say ‘Yeah, go ahead and take every Friday and have it be refactoring Friday for the next quarter’”, or whatever they decide.

But I do think it falls down a little bit, as every analogy does eventually… Like you said, the more you apply it, the more it falls down. Because financial debt is so measurable; it is so quantifiable. I can tell you my numbers, and you can be like “Okay, you’re in this much debt. How much money are you making?” I can tell you those numbers. In fact, we have TV shows, and radio shows where people call in and ask financial advice from people… Right? Dave Ramsey’s like “Get out of debt”, and people call him and they just tell him, and he’s like “How much are you making? How much is your debt? etc. What can we do about it?” You can’t do that with technical debt. I can’t quantify to you - I’m sure there are startups that are trying - how much debt we have as an organization. Can you?

I think the trouble here is that – I guess like two things. I guess first, this might just be that my experience is like a very unique one. Although talking to friends and other people, I don’t think so, but I will say maybe it could just be my experience. But I have had very little trouble getting business buy-in for why we need to fix things, or “pay down technical debt.” The issue I have run into, that I’ve seen happen quite often, is engineers going to the business and saying “Well, we need to pay down technical debt”, and they ask them, “Okay, well, what is it you want to do?” and the answer to that is “Pay down technical debt. What do you mean?” It’s like “No, no. Can you give me basic parameters?” Once again, can you quantify this as like how much time do you need? What are you going to do? What is it going to be like in the future after this? And I think because we don’t understand ourselves what this debt is, we have a lot of trouble trying to articulate it out.

Because I think that’s also where we kind of shoot ourselves in the foot with technical debt, because the way businesses understand debt is as this very complex tool that they can use to get things done. Whether you ever pay down your debt is an open question, and there’s all sorts of – there’s different kinds of debt, there’s different qualities of debt, there’s different timelines of debt… But also, debt is a thing where – it’s maybe a liability for you, but it’s an asset for somebody else. It’s like “Oh, me having this debt is bad for me as a company. But if I’ve given somebody else debt, that’s good for me, because I’m going to get the benefits of it.” And that’s where the analogy starts to really fall apart, because those things aren’t really true of the kind of debt that we’re talking about. We’re kind of just taking bits and pieces of financial information like interest rates and all of this and trying to apply it to our projects… When what the real problem is is that we need to just be better at making sure we’re being responsible in building out our software, which is why I equated it more to malpractice than to technical debt, in that “Why do we have all of this debt? Like, how did we acquire it?” Because if the finance department of a company is just like “Well, we have all this debt”, and the answer is “Well, where did it come from? Who decided it? Why did we take on this debt? What was the analysis that went into choosing this debt?” and they say “Well, we don’t know. We just wanted to move fast”, that’s gonna be a problem. You’re gonna get questioned, and you’re going to start losing trust.

[37:55] That, in my experience, has been the biggest problem, is that the business people, the product managers don’t trust the engineering department to be honest about how long things will take, or knowledgeable about how long things will take… Which is partly our own optimism. We say “Oh, this will take two hours”, and it takes two weeks, and they’re like “I can’t trust anything you’re going to say. So you’re telling me you need to pay down this tech debt, but you can’t quantify it, you can’t explain to me what it is. You can barely explain to me how we got it”, which is usually just “Oh, the people before us” or “It’s just how it all works.”

So it’s sort of like we’re trying to tell people that we need something that’s going to cost a lot of money, but we can’t actually explain to them in understandable terms, in just like timeline and dollar terms, why we have this, how we will pay it down, and how long it will take to do it.

And I think there are people that use “good tech debt”, but I don’t think that’s debt, once again. I think that is very much - you’re just doing planning; you’re doing documentation, you’re figuring things out, you’re making timelines, and you’re making trade-offs and choices.

Well, yeah, you could argue – I mean, any good trade-off in software could be aligned with a good loan that you take out in order to accomplish something. So I can see where you could make that line up. But I agree with you that a lot of our struggle is the ability to communicate… Period. [laughter] I was gonna qualify that, but then I was like “Let’s just stop right there”, because sometimes it’s really hard as a software engineer to be able to represent exactly what is the problem, and exactly what you need, how long it’s going to take. I mean, a lot of times, going back to the beginning of the conversation, that’s how we got here in the first place, is you asked me for a timeline, I had no idea how long it was gonna take; I made my best guess, then I tripled it, and I was still under… And now we just shipped a feature that was half-baked, and now you’re wanting to build another feature on top of that feature, and I’m sitting here thinking, “This is not going to end well.” And that’s just not easy to communicate in a way that doesn’t get you fired, or… You know, it’s very difficult to explain that.

I think it is hard to quantify, because I could have answers like “Well, we’re going to go back and we’re going to retro-fit a test suite around this. And then we’re going to rewrite this sub-section. And then I’m going to maybe swap this dependency for another one.” And then what? Well, are we done? Have we paid our debt down? It’s like, I doubt it… Like, no, because it can always get better. How long is that going to take? I don’t know… Maybe a week. Okay, well, at the end of the week I’ve only got step one done, and now it’s like “Hey…” I mean, it’s just, it’s fraught; it’s very, very challenging to succeed in this arena.

It does all go back to kind of the original point that I had, or the main point I have in the blog post, of like documentation. We don’t know how long things are going to take, because how do you actually acquire understanding of how long things take? Do we sit down and keep all of the documentation for the projects that we’ve done, so we know why we made the decisions we made, what the timelines actually look like, recorded things so that we know “Okay, well, in the beginning we said that this was gonna take two months. One month in we figured out it’s actually going to take four extra months. And then we made these decisions to try and get the timeline down. We did all of this…” If you don’t write any of that information down, you have nothing to go back to to refer to, which means that the next time you’re in that position, all you have is your intuition of what happened, which will almost always be incorrect.

This is why we write things down, because it gives you a tangible thing that you can go back and look at and be like “Oh, this is the timeline of events.” Because everybody’s gonna have a different perspective on why something happened. And there’s a whole bunch of people that have more knowledge than other people, and you kind of need to blend that all together and get everybody in a room and be like – like, if the product manager comes to you and says “Well, the last time you told me that this feature that seems like it’s gonna be similarly sized will take two months and it wound up taking six months - I don’t like that.” And if you can sit down and say, “Well, that’s because we started that project in September, we had the lull of December and January, and if we look at our trending data, we get less work done in the winter months, because of vacations and all of that… And then we have like the beginning of the new year, and we have to do all this extra budgeting stuff, and all this planning… And that’s why this project wound up taking six months. But now we’re in April when we’re applying this project, and we’re about to hit the boost of summer, so we know that we’ll be able to do this work in a focused way and deliver it within this timeline.” And if you can say that to people, they’re gonna be like “Oh, okay, no, that makes sense. Okay, let’s schedule it that way.” But if all we’re saying to them is like “Well, things happened”, then that’s not gonna be enough. They’re gonna be like “Okay, it’s gonna take six months then”, because you don’t have that information to be like “This is why it went this way.”

[42:23] You’re saying it’s called malpractice though, right? Are you advocating that it should not be called tech debt, it should be called tech malpractice? Can you label your specific perspective?

Can you document this for us, Kris? [laughter] Put this in writing…

Because I wanted to hear all of what you had to say before I brought my argument in… And it seems like you – I couldn’t quite tell if you had an issue with the analogy of tech debt being akin to financial debt, or if it was simply “Tech debt should be renamed to tech malpractice”, because you feel like the majority of what we call tech debt is a result of a version of malpractice, in some shape or form, not just simply pragmatism, or quick analysis, cutting corners… Because we do that in order to get through something, whether it’s dishes or not. Like, did you rinse them before you put them in the dishwasher? Okay, is that malpractice? No, not really; it’s just more like “It didn’t need rinsing, because my dishwasher’s powerful enough.”

I think there was a period of time, probably a couple decades ago, when it was like “Nah, this is just like we haven’t learned.” But by this point, enough of us - I’m specifically talking about people that are running projects - we’re people that have been in the industry for quite a while, and they’re planning projects out. At this point you should know better, right? The number of codebases I’ve walked into that lack not just commenting, but any type of coherent documentation about what this thing does… And then people are like “This thing is – I don’t want to touch it, because I feel like I’m gonna break it”, or whatever. And it’s like, we built this thing 12 months ago. There’s no excuse for why something we’ve just built from scratch should be in this state.

Yes, I agree with that.

We should have hygiene around our codebases. Not washing our hands as a doctor, and if you get someone sick, and if you cause lots of problems, that’s malpractice. If you as a lawyer don’t keep good notes and don’t keep track of things and you lose your case, that could be malpractice. And for us, if we greenfield a project, and in less than a year or less than two years we wind up back where we were, saying “Oh, we need to greenfield it again”, that is wholly on us. That’s wholly our fault, and our problem. And when we talk about it as debt, we kind of get to escape that accountability for ourselves. It’s like “Oh, well, we’ve just acquired this debt and now we have to pay it off”, and we don’t have to internalize as “We’re the ones that acquired this debt irresponsibly, and it is our fault, and we’re not using this tool responsibly.” So it’s just shifting the focus from the tool, which is a perfectly good tool… Like, should you write comments in your codebase? I think that there is no concise answer to that. Sometimes you need comments, sometimes you don’t, sometimes comments can be harmful… There’s a whole bunch of nuance that goes into there. But should we wind up with codebases that we don’t understand, because there’s not a single line of commenting in 15,000 lines of code? No, that’s clearly wrong. You’ve done something wrong; you didn’t have a plan of how you were going to document it, of how you were going to make sure that you can maintain it in the long run, continue to add new features to it, onboard new people… And we’re at a point in the industry where this shouldn’t be happening anymore. It shouldn’t be nearly impossible to onboard an engineer onto a project that is less than a year old; or really any project at all.

Right.

I think the key, though, is “as a result of.” I think it is tech debt, because there is a cost that has to be paid to resolve it, right? A literal cost in salary, a literal cost in time, a literal cost in effort, cognitive thinking, team effort, loss of focus on product etc. So there’s a cost. So debt does make sense. But it is a tech debt as a result of what? A malpractice scenario, or a pragmatic thinking scenario? So I think “as a result of” is more akin to what your argument should stand upon, not relabeling it.

[46:06] Yeah. And it’s less so that I think we shouldn’t – I mean, there’s a whole other reason why I think we shouldn’t use tech debt as an analogy anymore… But it’s not because of that. Part of malpractice is because of that.

I wanna hear that. I do like your idea around malpractice, and I think you’re spot on.

Just calling it that… Well, you can’t go to your boss and they’ll say “Why do you need this budget?” and you say, “Well, because of malpractice.”

Right. They’re gonna be like “What?!”

And then they’re like “Well, whose malpractice?” and you’re like “Well, it wasn’t me…”

“Whose head’s rollin’ here? We have a tech debt to pay, and it’s a result of these, these and these.” And internally, you may want to deem it as malpractice, insofar as that you’re trying to institute change; not to damn people in the past for their choices, which you may should do by not hiring them ever again, or never working with them on a team, or calling them out the next time they try to make the decision… But it’s more so to say “Okay, in the future can we just not do things like not document? Can we please document our code? Can we please at least at a low minimum, which is like a very low minimum, comment our code?” This is not a lot to ask of a team to do.

Well, you could institutionalize that thing. You can say “We have to have 95% test coverage.” And you could argue on the margins about the value of that. But in terms of actual concrete steps that we could do… So if the question is “How do we keep ending up here?”, there are a few mechanical changes you can just dictate top-down, such as “Code will have written comments.” And then of course, you have to then bikeshed exactly what that means. And “We will have test suites that are automated”, and then you have to bikeshed out exactly what that means. But those are things that could help you not end up there again. And you’d have to top down do that. Now, it could still happen…

Another factor that comes into here - and we’ll get back to your other reason, Kris - is churn is so high in our industry. You talk about onboarding new people… I mean, so many times I think when we’re coding - and now I’m just speaking personally again - “I understand the system and I will be here next year”, a lot of us can only say one half of that sentence. “I understand the system, but I might not be here next year” is more often the case. I mean, people move around a lot, and this goes back to what Jessica Kerr was talking to us about last year, which is knowledge transfer is one of the biggest challenges we have in our industry, because people are just hopping around so much that you’re constantly onboarding, you’re constantly offboarding…

It should be illegal. That’s my unpopular opinion - it should be illegal to do it.

And you don’t need comments – I mean, a lot of code doesn’t need to have that manual as you’re working on it and it’s living in your head and it makes total sense. And we think “And this is how it’s always going to be.” And it’s not.

One of the big reasons why I want to kind of push this idea of calling it malpractice is because I think it also shifts the accountability of the people in the room. Because I’ve seen it happen where the acquisition of the tech debt was not something most of the people in the room wanted to do. They knew it was bad when they were doing it, but they didn’t have language to push back and say “We should really not be doing this.” And they didn’t have – really, they didn’t want to make the political investment in pushing back on a staff, or a senior, or an SVP, or whatever, a staff- or principal- or SVP-level person, and saying “This is really bad. We shouldn’t be doing this.” Or “No, we can’t move this fast, because it’s going to acquire this debt.” If we can just say “We can’t move this fast because it will be malpractice”, that is then something where it gives people more tools for the future.

So I think the way we wind up in this situation so often is not enough people spoke up. And I’m someone that’s pretty outspoken. It tends to get me in a lot of hot water, because people don’t like it when you push back on them. And I’ll talk to people in private and they’ll be like “Yeah, no, that’s exactly it”, and then they’ll be in the room and they’ll be like “Okay”, and then no one wants to speak up, and everybody wants to be quiet. And then we acquire this tech debt, and then months later everybody’s like “Man, I hate that we have to deal with this.” It’s like “You should have spoken up in the room.” But I get that we as an industry don’t have that ability yet. But it also goes into – we all know that we move around a lot.

So if you are thinking that you don’t need to document this because you will be around, that is also a form of like malpractice. It’s like, you know that you will most likely not be around; the odds are not in your favor that you are going to be here five years from now.

So why is it that you think it’s okay for you to say “Well, I don’t need to document this, because I’ll be here in the future.” You won’t be; it’s an illogical thing to want to say and do. And so I think reframing it in this way can help us as an industry be a lot more responsible with this. And I think there’s – one other little offshoot of this, of… We produce an immense amount of information. And documenting it is hard, but there are some places where we’ve documented it very well, but it winds up being a wiki of some sort. And then we wind up in the area of knowledge management, information management, and we as an industry also, I think, kind of refuse to believe that other people are experts in this. We should be hiring librarians and archivists, and we just don’t. I think that’s also in line with this broader problem.

Librarians and archivists. Well, I think that I don’t disagree with that, in principle. I think in practice, no one’s going to do that, unless they have just huge margins, and just are looking for reasons to hire people that they don’t think are very valuable, but – like long-term thinking. But could software help solve some of that?

So if I’m thinking about the difficulties, the rigor of documentation, obviously a test suite not so much, but if we’re just focusing on docs, and manuals and whatnot… We’re getting to a point now where we can record all of our meetings, auto-transcribe all of our meetings, and make that all searchable. And I feel like a lot of value is right there, in terms of like “Why did we make this decision?” I know there’s practices around decision books… What do they call it? Decision –

Oh, like architecture decision documents, or things like that?

Yeah, exactly. So you can go back eight months from now and someone’s like “Why did we pick Tailscale?” And they’re like “Well, because - I don’t know. Bob picked it, and he doesn’t work here anymore.” But now you can go back to this document and say “Well, here’s why we picked Tailscale for X, Y, or Z.” Arbitrary example. Quasi arbitrary.

It’s a good example though.

But even that requires process, it requires the tooling, it requires people to buy in and actually do that thing… And it’s rigor. But maybe we’ll get to a point where we can just record everything, transcribe everything, and search everything. And then we have kind of an archivist, right? We have kind of a software librarian. Kris, I know you don’t like this, but…

Not quite… Because here’s the problem. The problem with wikis is that they become these areas of just like “Here’s some information.” What is the accuracy of this information? It’s like, I come across a page that documents something. Is this true to how it works? Is this correct? We had a meeting, we talked about some things… I can search it and I can bring that up, but is that correct context? Do I actually have a good understanding of how all this information fits together? Because the work that librarians and archivists do is not so much – it’s more about organizing the information so it is findable and useful. As we add more information – I would say, if you’re going to record all your meetings and actually start documenting things, I think that increases the need for you to have at least one of these two people… Because in order to actually utilize that information, you need to have people that can actually present it in a way that you can consume it.

[53:46] In theory, you could go and learn the Linux Kernel by just reading through all of the great comments that are in there… But it’s gonna be really hard to understand everything if you don’t have like a “Here’s where you start, and here’s the basic structure of how things work, and all of our categorization, and all the ways we use taxonomy to find – this is how we use this word in this specific way, and this is how it’s consistent.” There’s a lot of work you have to do around knowledge if you want to actually be able to use it.

And I would actually push back on the idea that hiring a librarian is something you do when you’re big. I think it’s far too late to hire these people way down the road. You need to hire them as soon as you start generating a lot of information, which is almost right away. Because once again, if you have organized information, your software engineers will be able to find what they need to find and make decisions much quicker, which will make them more effective. Librarians are force multipliers. Dev tool teams are force multipliers. But I think that librarians and archivists are force multipliers much earlier in the process than, say, a dev tools team.

Do you think the average tenure of a librarian would be any longer than the average tenure of a software engineer? …meaning, wouldn’t you have the exact same knowledge transfer problem? Like, who’s gonna be the archivist of this startup, and they’ll know everything inside out and point you in the right direction, and organize all the information if they’re gone in six months?

Well, I mean, that’s kind of the nice thing about specifically archivists. Because what an archivist does is they just go around and talk to people, and extract knowledge, and put it someplace, and organize it. And it’s – you know, this is a science that people get PhDs in. So there’s a known way to actually organize information very well. And it’s not that it’s just all in someone’s head, it’s written down. So even if you do have your archivist leave, you just hire another one and they continue the work. And it’s just continual work of talking to all of the people, and making sure that a) we don’t have knowledge that’s stuck in people’s brains, and b) that it’s organized in a way that it is searchable, findable, you can reference it easily, all of that. So it’s really like, the thing you get at the end of the day is this comprehensive system that you can use to find information and knowledge about what you have done. We just don’t have that.

So I’m sitting here saying “We need to write all this documentation”, but even if a company started today just documenting everything they have, if you don’t have a good way of organizing it, you’re not going to be able to use it. It’s like having a book without an index. And if you want to find a very specific piece of information without an index, you’ve gotta go read the whole book, you’ve gotta go through the whole thing. And that’s going to be a tremendously difficult thing to do the more books that you have, and the more things you have. So you want to build up that index, you want to build up that way to refer to things, and that’s the work that a librarian or an archivist does. So in addition to the documentation, you need to have a way of going back and finding the documentation when you need it… Which is a very challenging problem, but thankfully, one we’ve been working on for thousands of years, so we have some pretty good ideas of how to do that.

Yeah, alphabetical order. Easy.

[laughs]

And in both cases that makes sense.

Dewey Decimal, if you need it… I mean, come on.

I wonder if this idea that you have – it definitely is good, but I wonder if we need a specific person in the role. If it couldn’t just be a function of the team that gets done, until it’s proven so valuable that we can actually hire a person to dedicate towards it as point one.

I would say as long as you have the knowledge on the team of like the actual study… You’re not just gonna be like “The whole team is gonna share accounting responsibilities.” No, you need to have someone that knows accounting.

Well, there’s usually a PM on the team, there’s usually somebody to lead… There’s always somebody that’s in charge and responsible.

But you’re saying they must be actually trained in this area of discipline.

Yeah. If they’ve gone and read some books on library and information science, then yeah, sure, the PM can do it.

Well, I think that we as technologists, and those who work around technology tend to gather the skills necessary to do the job. So if we bolt on the need for a librarian, archivist-type skill set to someone in lead, or the team in general, that we care about these functions, then we all will begin to grow that skill set. By no means am I demeaning the PhD that some folks get to achieve that level of expertise, but we do have to start somewhere. And the pragmatic approach would be to, as a team, care, and then as a team, collect knowledge, and as a team institute that knowledge, and do that function till it makes sense to bring on a literal – because that’s how a lot of roles get formed, is over time we care enough, we do it enough, and then we’re like “Well, we really need somebody dedicated to this, because it’s so important.”

[58:23] Yeah. I think testing is a similar thing. In the early days it was just like – or ops. It’s not like you’re necessarily gonna hire DevOps people from the beginning. So it’s one of the things where as long as you – the actual getting the work done is what matters. Who does it is - as long as they’re competent in doing it, it does not matter nearly as much.

Right. Well, I agree. That’s why we podcast; we talk about these big points and we figure them out, and people listen, and they’re like “You know what? We’re gonna do that in our organization.” They come back three months later in our Slack and they’re like “We did this idea you guys talked about in happenstance on this one podcast way back when, and they wrote a paper, and then you’ve read it, and you’ve highlighted it, and everything.”

Your mention though, Jerod, of Tailwind and the why question, of why we make that choice, is I wonder if we couldn’t have the why document. If there’s a major decision, the why document essentially says why we made this choice, the pros and cons we considered… And the problem I think with documentation we may have is it does institute knowledge. And Kris, you mentioned the point of accuracy… We don’t have this lens in the future, looking at the past, which is documentation, saying “Who agreed with this document? Is this document accurate?” It may have been for a time. Because I’ll often go back to my documentation and be like “Well, I’m glad I wrote this, and it documented where I’m at then, but it’s not accurate now, because it needs to change.” I’m wondering if choices like Tailscale, or choices like using a certain database, or an architecture choice or whatever could be founded in a why document, and that why document essentially has a synopsis of why, the pros and cons considered, and who agreed with the ending results. Like, it’s signed off, for lack of better terms, by X, Y and Z. So in the future you know “Okay, this document was at least accurate then, and these four or five people who were in charge agreed with the choices made here… So then it made sense, but now it needs to change because different circumstances are now in play.”

I absolutely agree with you, I think we do need to have this document; I just think that it needs to be a lower-level thing. Because there’s always this problem of when does a decision reach the point where you should have this why document?

At deployment. If you put it in production, it should have a why document, right?

Yeah, but how many decisions get deployed? I mean, thousands.

That’s true. I think you can begin with major ones, like a CSS framework, or a paradigm like Tailwind, maybe…

“Why are we using this database?” Right.

I think that there’s – because yes, there’s the decisions you make where you actually do the thing, but there’s also the decisions you make where you don’t do the thing… Which I think actually wastes more time, and it’s harder to document than the things you do.

Oh, it’s true.

Oh, yeah…

So many places I’ve been like “Why is this thing this way?”

Couldn’t that be done in the why document of the what you did, though?

Why not? You need to document, so why not?

If there is a thing you did instead, yes. But sometimes you’re just like “We don’t want to do this”, or “We’re not going to do this now. Maybe in the future we can revisit this.” So there’s not always a thing you did instead that you can refer to. So there are these – and I think those documents are actually more valuable, because every new person that comes in will probably have a question of “Why don’t you do it this way? Or why don’t you do this thing?” There’s always some new technology. “Why aren’t we using Kubernetes?” “Oh, here’s the report that we wrote on the evaluation we did, and why we did it.” And that way too, if the circumstances have changed, and that new person comes with a new perspective, they can go consume that history and be like “Ah, this is how things have changed. I’m going to make this argument of why we do it.” You don’t have to rehash all of history all over again, which is typically what happens now.
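A "why" or "why not" document can be as small as a structured record. The sketch below is one hypothetical shape for it. The field names, the people, and the date are all invented for illustration and are not something described in the episode. It captures the question, the outcome (including "deferred"), who signed off, and when to revisit.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class DecisionRecord:
    """Hypothetical shape for a why / why-not document."""
    question: str
    outcome: str                      # e.g. "adopted", "rejected", "deferred"
    synopsis: str
    pros: list = field(default_factory=list)
    cons: list = field(default_factory=list)
    signed_off_by: list = field(default_factory=list)
    decided_on: Optional[date] = None
    revisit_when: str = ""

    def summary(self):
        who = ", ".join(self.signed_off_by) or "nobody yet"
        return f"{self.question} -> {self.outcome} on {self.decided_on} (agreed by {who})"


# Invented example values, for illustration only.
record = DecisionRecord(
    question="Why aren't we using Kubernetes?",
    outcome="deferred",
    synopsis="Evaluated it; current deployment needs are met without it.",
    pros=["standard tooling"],
    cons=["operational overhead for a small team"],
    signed_off_by=["Ana", "Ben"],
    decided_on=date(2023, 1, 15),
    revisit_when="team or service count grows substantially",
)
print(record.summary())
```

A newcomer who asks the same question can read the record, check whether the `revisit_when` condition now holds, and argue from there instead of rehashing the whole evaluation.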

I had a phone call, or a Zoom call with a listener of Practical AI a couple of weeks back; it was an enjoyable call, and he had a lot of good ideas just about the show, about us as a Changelog thing… And he kept saying “Have you thought of this? Have you thought of that? Why don’t you do this? Why don’t you do that?” And it was like that, where I was like – and no offense, because these are good ideas, and he didn’t have the domain context that I have. I’m like “I’ve literally thought deeply about each and every one of these, and decided not to.” Not all of them, but most.

[01:02:14.14] And had I been on vacation – put me in a large organization, that guy’s on vacation, somebody else is asking somebody else, “I don’t know, why didn’t we try DuckDB, that brand new database everyone’s talking about?” “You know what, we should try that out.” And then the guy who already spent six weeks deciding we shouldn’t use it - on vacation that day. But decision documents - that’s very much a thing in larger orgs. I’ve always been in tiny orgs, but… Kris, in the places you’ve worked have there been decision documents like this?

There have been, but there’s a threshold. And what people will tend to do is get around that threshold by getting just up to it, and then breaking projects up to smaller pieces so that it doesn’t reach that threshold, which is the dance that constantly happens. I think there was some rule about American banking, where it was like you can only deposit $10,000 in a day before you have to explain it. So people do like “I’ll just do five deposits of $9,999, and that’ll be good”, right? That’s the type of thing that people start doing if you have that kind of place where it’s like “Oh, now you have to –” Because people see it as extra work, right? It’s like “I have to do this additional thing of documenting why we made this decision, and I don’t want to be doing this. I want to do the other things.”

So I think an important part of the usefulness of these types of documents is when they’re just part of the normal flow with everything. And that’s the thing that so many organizations I’ve been a part of just cannot seem to figure out how to do. I say it to people, and they’ll be like “Oh, no, that makes sense… But that seems like a lot of extra work.” And it’s just like, you could just have it as a field in your JIRA ticket, or anywhere… Lower the bar, lower the barrier to documenting why you made a choice, or made a decision; I think it’s so much easier. After every meeting, write up an email summary and send it out to everybody. That’s some documentation. Or post those notes in a wiki somewhere.

When you do it in the small, you don’t have to necessarily do it in the big, because you can pull all of the small pieces together and you’ll have that granularity you want… Because there’s things you could miss in the big architecture decision document, or the big design document, especially historical things that can be in the small. And I think historical also is very important. Because Google Docs is terrible at this, and this is why I hate having design docs in Google Docs. Because I’m like “How did you get to this decision? How did you get to this paragraph being written in this way?” You go open the history and you’re like “I have no idea how to parse any of this.” So you’ve gotta go talk to people, and they’re like “Oh, we were thinking of this, and then we decided to do it this other way”, and blah, blah, blah… And you go look, and that doesn’t match with history, and they’re like “Oh, that’s something else I was thinking about actually–” So it’s a lot easier if you just like in the moment document things, so people can go back and go read an email thread, and go read the wiki, and look at the history of the wiki and be like “Okay, this is why this decision was made.”

So going back to estimation…

Uh-oh…

…what happens when the software doesn’t ship, because “Well, I was too busy documenting it”?

You don’t say that. You never say that. [laughter]

Well, in the world that you’re calling for that might become slightly more true, right? Like, if you add one more critical task - let’s call it a critical task, to document - and it’s not being done, and you’re calling for it to be done… I’m just wondering, in that world maybe people will have one more excuse why it doesn’t take four months, it takes six months; or why it took six months.

Yeah, and I think this is also about like kind of the making sure that we have an acceptable bar of what is the minimum that we will do as software people… Because yeah, if the floor is that it takes six months to build something, then the floor is it’ll take six months to build something, and we as an industry need to accept that. Because there’s a lot of – I think we as an industry are still in this kind of… We want to believe that the software we build doesn’t have severe impacts on people’s lives.

[01:06:03.11] I think a lot of people want to be like “Well, I’m not building software for an aeroplane, or for any of these things… So my software isn’t life or death.” I worked at a financial institution that was like a literal bank, and gave people loans, and credit cards, and all of this. And I remember one of my co-workers said to me “Well, what we’re doing doesn’t make that much of a difference.” And I’m like “We’re literally deciding whether someone gets a loan to buy something. That is a pretty major thing. And if our software messes up, they might not be able to buy something that they could have otherwise bought, or they might otherwise need.”

And so I think like a lot of us are like “Oh, no, what we build isn’t that important.” No, it is. It has very profound implications on society, and we should be very diligent in making sure that we’re taking the time necessary to make sure that we’re building things correctly, and that we’re not doing that malpractice… Which is a very harsh word, but I think it’s one that we need to start using… Because I think, especially if you look at the social media companies and some of the kind of egregious stuff that they’ve done in the past, it’s like, would we maybe have prevented that if we had better documentation, if we had better ways of raising our voices and things like that? I don’t know if the answer to that is yes, but it’s certain that the way we’re doing things now is not working, and we need to change something. And the thing that’s been constant since the ‘50s, that has been said, is “We need to document things more.” So I think that’s a good place to start, even if it does cost us a little bit more time. But I think also, it’s a planning thing. When you have more information - sure, this one project, maybe it takes an extra two months. But you might save six months on the next project because you have all this information to look back on. So it’s a thing that also builds over time.

Compound interest. It’s financial tooling.

Yeah. It’s a compounding knowledge. Yeah.

I love that. Compound interest.

Actually, there’s this interesting thing… I was thinking about what debt is in the financial sense, and debt is kind of this weird thing that it’s like at the core of how money actually works. How do you create money in an economic system? It’s with debt. You give your bank some of your money, they give it out to someone else, and they get more of it back, and that’s what bootstraps the entire system and keeps the whole thing running. So it’s like, debt is foundational to how money works. And I think in that same sense, knowledge is foundational to how software works, and we should build things around that. And I think there probably is some sort of equivalent to debt for software. I don’t know if we should call it debt directly, because I feel like it probably operates in a slightly different way… But I think there is something there that we’ve got to get ourselves toward, and I think that’s what I’m trying to do here, is like “What is the thing? What are the tools that we have to work with this knowledge?” Because I also have another hot take, unpopular opinion…

We’ve got time… Let’s hear it.

So I guess I’ll preface this with - in writing, one of the most annoying things to deal with is like prose. So if you’re writing a novel, the words in the novel are very, very expensive, because it’s very difficult to keep a story coherent, and all of that, when you’ve already written a bunch of words on the page. And the thing that I’ve realized, pondering this for a long time now, is that the most valuable words, the most valuable prose is prose that you consider to be worthless. Because if you consider it to be worthless, you have little problem with throwing it away. But as soon as you put value in those words, it becomes quite expensive. Because now you don’t want to get rid of them; now you’re falling into the sunk cost fallacy. And there’s a whole bunch of sayings around this, but I think the most popular one is “Kill your darlings”, which is basically saying “I know you love that sentence you wrote, but it don’t fit anymore. Just get rid of it.” “I know you like this character. It doesn’t fit in this story. Get rid of it.”

Right.

And I think the same applies to software, and specifically to code. I think code is valuable when we consider it worthless. Because when we consider it worthless, we are okay with throwing it away. But as soon as we assess it with value, now we want to try our best to not get rid of it. And where we are now, code is the thought stuff. It’s the way we prototype with code, we do designs with code, we do everything with code. So if every line of code we write, we see as having value, we will try and hang on to every line of code that we write. And it will be painful and difficult for us to delete that code. And I think this causes a whole host of problems. This is why our operating systems still have code in them from the ‘70s, for some reason, and it’s why we have all these multimillion-line codebases that we’re like “Well, we can’t really get rid of this thing, because we have all of this code invested.”

[01:10:29.09] And so my unpopular opinion there is we should consider code to be worthless, and that will derive value from it in the long run… Which is really weird. I don’t think that’s the best phrasing of it. I think I need a better phrasing, but yeah, I think we should believe that code is worthless, and we should not be so tightly integrated with it. We need to let it go much more often than we do.

Right. I agree with that. I think I’ve heard it said “Features are assets, and code is liability.” And worthless meaning “able to be removed.” So like the best code is no code, and the best feeling as a software engineer is deleting your code. So I’m with you on that. It is a weird way of saying, it’s kind of an interesting way of saying it, because if it’s worthless, how can it be valuable? It’s kind of a contradiction in terms, which maybe makes it sticky… I think we should write our code to be thrown away, for sure.

And there’s a lot of reasons why we don’t; we can get into that, psychoanalyze ourselves. But yeah, I’m with you. I think that the individual lines or characters or words need to be disposable, and when we hold on to them too tightly, we do ourselves a disservice.

Yeah. I’ve been thinking about this a lot. I have this analogy I’ve been working on, because I think – and this kind of goes into why…

Malpractice?

…I’m starting my own company, and all of this… Malpractice is an analogy I do like, yes, but not in this case… Because I think that we don’t understand the scale of how much things have changed in the last 40 or so years. So I’ve been working on this, and I started with me trying to do it based on square footage, but I figured out that doing it with money probably resonates with people a lot better. So if you take the Apple II, that was released in 1977, and it cost about $1,295, which is about $6,500 in today’s money, and you kind of do a nice little conversion with transistors and clock cycles, so you’re like “Okay, one transistor running at 50 kilohertz equals $1 per year”, let’s say. The equivalent resources of an Apple II are $60,000 in this conversion of transistors/clock speed. So you can be like “Yeah, $60,000 a year. That’s your budget for what you want to get done, what you can get done in a single clock cycle with the Apple II.” If you take a Mac Studio, that was released this year, and it costs roughly the same… If you spec it out with the max processor, max RAM, you wind up with a roughly $6,500 machine. The equivalent amount of money in that same transistor/clock speed is nine quadrillion dollars, which is 100 times the amount of money that exists in the world. So we went from having the resources of a decent middle class income, to all of the money in the world 100 times over. And I don’t think we as an industry understand that that’s how far hardware has moved in the last 46 years.
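The conversion can be worked through as plain arithmetic. The only rate taken from the conversation is "one transistor running at 50 kilohertz equals $1 per year". The transistor counts and clock speeds below are assumptions, not figures stated in the episode: roughly 3,000 transistors at 1 MHz for the Apple II's processor, and roughly 134 billion transistors at 3.5 GHz for a top-spec Mac Studio chip. They are round numbers chosen because they land on the $60,000 and nine-quadrillion figures quoted.

```python
BASE_HZ = 50_000  # "one transistor running at 50 kilohertz equals $1 per year"


def yearly_budget(transistors, clock_hz):
    """Dollars per year under the transistor/clock-speed conversion."""
    return transistors * clock_hz // BASE_HZ


# Assumed round figures, not stated in the episode.
apple_ii = yearly_budget(transistors=3_000, clock_hz=1_000_000)
mac_studio = yearly_budget(transistors=134_000_000_000, clock_hz=3_500_000_000)

print(f"Apple II:   ${apple_ii:,}")
print(f"Mac Studio: ${mac_studio:,}")
print(f"Ratio:      {mac_studio / apple_ii:,.0f}x")
```

Under these assumptions the Apple II comes out at $60,000 and the Mac Studio at about $9.4 quadrillion, a ratio on the order of a hundred billion to one.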

And I think the problem is that we basically just have unlimited money now, and you can’t use the same things that you used when you were budget constrained, when you have unlimited money. You can use an Excel spreadsheet to manage $60,000 a year; it might be a little tough, but you can do it. You cannot use an Excel spreadsheet to manage nine quadrillion dollars. You’d have to actually build something very, very customized and very advanced. I mean, you’re talking about managing world-level economies at that point, or even large government at that point.

[01:14:05.17] So the fact that so much of what we have today is rooted in the time when the resources were that constrained I think shows that we need to really start giving up things that we’ve had in the past, and really start rethinking the very basics of how we do things. I very much deeply appreciate Unix, but I think it’s a model that is kind of too stuck in the past, because we’re doing – a lot of the hardware that we are using is spent trying to emulate that old world, trying to pretend as if we’re still a single processor, running at maybe a megahertz or two megahertz, so that going and fetching RAM isn’t that problematic… But we truly live in a ridiculously different world than even 20 years ago.

There’s another analogy, or I guess comparison I’ve been working on, of roughly 20 years ago we got our first commercial multi-core processor. It was an Intel Xeon. Well, I think there was a consumer version, but the [unintelligible 01:15:02.04] Intel Xeon. It had two cores, four threads; two of them in the machine. You get a total of - what, like 16 threads? No, the total is eight threads, because it’s two processors, four threads each. So you have eight threads. And about a megabyte of cache, or two megabytes of cache.

AMD released a processor this year that has 96 cores, 192 threads per socket… So if you put two of them, you wind up with 192 cores and 384 threads, and each of those processors has a total of a gigabyte of cache. So you have two gigs of cache. That is an extremely different computing device than the one we had back in 2005.

And while we could stretch a lot of the parallelism… You know, we could just utilize four cores, or eight cores, or whatever, you can’t use the same thing –

Goroutines, baby.

Yeah, you can’t utilize the same thing to go – you can’t use those same resources to go use 384 threads and multiple gigs of cache. Because it’s also cache, too. Before, you had a little tiny bit of cache, so you couldn’t really keep a lot in there. Now you have a lot of cache. And if you write your software wrong, you’re going to be doing lots of cache expiration, and your program’s going to crawl to a halt. But you want to be able to use all those extra cores, because over that 20-year span, clock speeds have not increased at all. In fact, that AMD processor we were talking about has a slower clock speed than that original Intel Xeon. So we’re not getting faster, we’re just getting more stuff.

Right. Scaling horizontally.

Yeah. Which means we’re running into physics as well, because it takes time to move across the chip, so you have to deal with those latencies, and all of that.

That’s a distributed system, almost.

Not almost, Jerod. Not almost. It IS a distributed system.

[laughs] So are you here to announce your brand new operating system? I don’t understand… What are you going to work on in this space, Kris? A new OS? Throw Unix out and just start from greenfield?

Obviously, deciding we should throw Unix and all the Unix operating systems out is extremely impractical.

But the main thing I’ve been frustrated with is that I’ve been a distributed systems engineer for most of my career. I love distributed systems problems… But even I was puzzled when I first started looking into the literature around distributed systems and really reading it. And there were some things that I’ve found very confusing. I have this book that I bought called “Fault-tolerant message passing distributed systems.” And I’m like “Okay, this is an interesting title.” Why is message passing in there? Because aren’t all distributed systems message-passing ones? And I learned no, there’s shared memory distributed systems. I’m like “Okay, well, what’s fault-tolerant in there? Don’t all distributed systems need to be fault-tolerant?” And it’s like “No, there’s distributed systems that aren’t fault-tolerant.”

So I went on this journey over the past couple of years of learning a lot about distributed systems. And what I’ve found is that all of our computing devices are distributed systems. And not just in the sense of you have multiple cores, but you have PCIe. PCIe is a packet-switched network system. And that’s at the core of literally all of our computers.

[01:18:08.11] So yeah, you’ve got a little distributed system that’s running your computer right now, running the computer we’re all recording this episode on, which has some profound implications. But even your processor, once again, is its own distributed system, because you have all this cache coherency that you need to deal with, right? In C if I write 5 to the variable x, and then I read it the next clock cycle, I expect that to just be 5, because I just wrote 5. But if I got scheduled to a different core, that has a different value in this cache, then I might not get 5, and that’s gonna break my code.

So there’s a lot of hardware logic that’s done to actually synchronize all of that cache coherency that we have, and we call it atomicity when we’re talking about the processor, we’re talking about atomic memory, and all of that, and transactional memory, perhaps… But in distributed systems we call it linearizability, which is also kind of the gold standard for a lot of databases that they want to reach. And that’s at the core of every processor that we make that has multiple cores. So it’s not just that these are technically distributed systems when you look at them, it’s like, it’s using the same basis of research that we use to go build Kubernetes to go make your processor work… Which means that there isn’t a difference between when you’re programming locally and when you’re programming this kind of Kubernetes, or what have you. And the reason I have a frustration about this is because I’ve been to too many places where I say “Okay, well, we’ve gotta go solve this problem”, and I hear “Well, we want to avoid having distributed systems problems”, or “We don’t want to take on distributed systems problems right now.” And that annoys me, because I’m just like “You can’t not.”

It’s not just like the basic types of distributed systems either. One of the things I’ve learned about is Byzantine fault-tolerant systems, which Leslie Lamport, who I guess you had previously, he coined that term of like the Byzantine generals problem, and Byzantine consensus… I don’t know if he coined Byzantine consensus, but the whole Byzantine thing is his creation; his and some other people. And I was like “When am I ever going to need this?” And then like for one thing, blockchain is actually a Byzantine fault-tolerant distributed system, which is actually pretty cool…

That’s true.

But also, I realized through thinking about this that writing to disk is a Byzantine faulty distributed system, because the disk will lie to you about what it’s done. You might say, “Please write this to disk”, and then do an fsync and be like “No, no, no. Really write this to disk.” And it’ll be like “Yup, totally. I wrote this to disk.” And then you read it, and it’s like “I don’t know what you’re talking about. You didn’t ask me to write that to disk.” It lies to you. So even something as simple as writing to disk is a distributed systems problem.
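The "write it, fsync it, then check" sequence looks roughly like the sketch below. It uses Python's standard `os.fsync`; the file name and helper name are hypothetical. One caveat is important: the read-back here is very likely served from the operating system's cache, so a matching checksum does not prove the bytes reached the physical medium. It only shows the shape of the request the program is able to make.

```python
import hashlib
import os
import tempfile


def durable_write(path, data):
    """Write bytes, ask the OS to flush them to the device, then read back and compare."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()              # push Python's buffer to the OS
        os.fsync(f.fileno())   # ask the OS to push its cache to the device
    with open(path, "rb") as f:
        read_back = f.read()
    return hashlib.sha256(read_back).digest() == hashlib.sha256(data).digest()


with tempfile.TemporaryDirectory() as tmp:
    ok = durable_write(os.path.join(tmp, "record.bin"), b"please really write this")
    print("read-back matches:", ok)
```

Everything past the `os.fsync` call is outside the program's control, which is why the device has to be treated as another participant that may not behave as promised.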

You need to fire your hard drives, man…

Yeah, you’ve gotta be like “These lying hard drives… We need to get better ones.”

Right? Ask him better. Ask him again. So everything’s a distributed system, is what you’re saying.

And the thing here is I think a lot of it is not difficult to understand if we just sat down and taught people… But we’re still very focused on maintaining the illusion of the old world that we refuse to move into this new world. And so that’s what I’m trying to do, is be like how can I teach people about distributed systems so that it’s actually approachable, and you’re not going and reading academic textbooks and academic papers, trying to understand what even for someone like me that’s been doing distributed systems for ten years was just like “This is difficult to parse.”

Gotcha. Okay, so it’s not a new OS. Distributed systems - is it like a course, or like a book, or what are you actually going to manifest here?

As writing - I mean, right now, I want to start with kind of being on the internet, being a person that writes on the internet, and doing digital publishing, and see where that gets me, and kind of go on this journey. Obviously, that includes other things, like probably video, podcast, other stuff like that, because I have skills in that area. And I think my main goal is just to make it an approachable thing. Because I think right now it’s kind of the land of experts, and there’s a lot of experts that can only really speak in jargon about distributed systems.

I think you kind of ran into this when you were talking to Leslie Lamport, when he was trying to explain some things and I was like “Okay, there’s a leveling problem.” Leslie Lamport is obviously very well-versed in distributed systems, and Jerod, you’re knowledgeable, but not in that realm. And so I think there needs to be a way…

[01:22:20.20] I was wading in the same pool for like the first 45 minutes. And then once we hit minute 46, I was just hanging on for dear life.

Yeah. I remember you brought up Paxos… Was it Paxos and Bakery? And you were like “Are these related?” and I was just like “Oh…”

No, I asked if the Bakery algorithm had stuttering insensitivity…

Oh, right, right.

…which apparently was a very not smart question, but I didn’t realize it until he answered that, and I was like “Dang it.” So…

And I think it’s so challenging to understand –

I’m a generalist, Kris. I’m a generalist, okay?

But you know, this stuff is all around you. Would it be nice to have a better understanding of how it works? Or just to get rid of some misconceptions. One of the things – I’m working on a series right now about this, about basically helping people understand the misconceptions they have about how computers work… And one of them is about timing models. And the distributed systems that we have now – or really, the processes we have now, right? You have no guarantee of when your next clock cycle will be. If you’re writing a language that runs on a virtual machine or a runtime, you have that scheduler; you have the OS’s scheduler. If you’re running on any type of virtualized environment, you have the hypervisor that’s doing scheduling. And none of these things have to give you clock cycles. They’ll be like “I’ll give you clock cycles when I want to give you clock cycles.” And it has really weird implications about the type of code that you can write. For instance, you can’t depend on the passage of wall clock time in the code that you write. Because even if you do check and say “Oh, did this timer timeout?” Or – yeah, if you check if a timer timed out, then you might assume that the next clock cycle, you’re like “Oh, I’m operating within some small bounds of when that check was”, and it’s like, yeah, it could be running an hour later. And I had a friend who was working on this type of stuff with sleeps, and they would ask “Oh, Linux, sleep this process for 10 minutes”, and Linux would get around to waking it up 10 hours later. And it’s like, if your code assumed that it was actually going to wake up 10 minutes from now, and it woke up 10 hours from now, is your code actually prepared for that? Is it ready to handle that type of reality? And the bigger question is, does it need to handle that reality? And my conjecture is that no, it does not, and that we should move away from thinking about things, and timeouts, and sleeps, and wall clock times at all, and move to embracing what is asynchrony, which is not having time bounds on when you will execute, and things like that.
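The sleep anecdote can be reproduced without waiting ten hours by injecting the clock. In this sketch, `sleep_and_measure` and `FakeClock` are hypothetical names; the fake clock simulates a scheduler that wakes the process far later than requested. By default the function uses Python's `time.monotonic` and `time.sleep`, where a sleep is a lower bound on the delay and not an upper bound.

```python
import time


def sleep_and_measure(requested, clock=time.monotonic, sleep=time.sleep):
    """Sleep for `requested` seconds and report how long the wake-up actually took."""
    start = clock()
    sleep(requested)
    elapsed = clock() - start
    return elapsed, elapsed - requested


class FakeClock:
    """Simulated clock whose sleep overshoots by a fixed amount (hypothetical)."""

    def __init__(self, overshoot):
        self.now = 0.0
        self.overshoot = overshoot

    def time(self):
        return self.now

    def sleep(self, seconds):
        self.now += seconds + self.overshoot


# Ask for 10 minutes, wake up 10 hours later, as in the anecdote.
fake = FakeClock(overshoot=10 * 3600 - 10 * 60)
elapsed, late_by = sleep_and_measure(10 * 60, clock=fake.time, sleep=fake.sleep)
print(f"asked for 600s, woke after {elapsed:.0f}s, late by {late_by:.0f}s")
```

Code that reads `elapsed` after waking can at least notice the overshoot; code that assumes "it has been ten minutes" cannot.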

It’s tough to explain this to people, and difficult for people to wrap their brains around, but it could have some very big implications for the ability of us to build interesting systems. For example, one of the things I said for years, which I now feel kind of bad about, because I was just wrong, was that you can’t step-debug a distributed system, right? Because that’s one of the things I’d say, is I don’t use step debuggers because I’m a distributed system [unintelligible 01:24:58.16] You can’t step-debug a distributed system. But you very much could step-debug a distributed system if you don’t have timing assumptions, right? You can just pause your process, and if nothing else assumes that your process is going to get back to it in any time bounds, then it doesn’t matter how slow it’s executing, and you can step debug it all you want, and your whole system will work, and it will work correctly. But if you have a whole bunch of timeouts all over the place, and those things expire, now you’re in a different path than you would have been otherwise if your system was running at full speed. So once you start programming with asynchrony, now there’s problems you just don’t have to worry about.
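A toy model of the step-debugging point: the two hypothetical clients below receive the same reply, once at full speed and once after an hour-long pause of the kind a debugger breakpoint would cause. The delays are simulated numbers, not real waits, and the function names and return strings are invented for illustration.

```python
def client_with_timeout(reply_delay, timeout):
    """Takes a different code path when the reply is slower than the timeout."""
    if reply_delay > timeout:
        return "gave up and retried"
    return "processed reply"


def client_without_timeout(reply_delay):
    """Waits for the reply however long it takes, so the path never changes."""
    return "processed reply"


# Simulated delays in seconds: full speed, then paused in a debugger for an hour.
for delay in (0.01, 3600):
    print(delay, "|", client_with_timeout(delay, timeout=5), "|", client_without_timeout(delay))
```

With the timeout, pausing the peer changes which branch runs, so the debugged system is no longer the system you meant to observe. Without it, the slow run and the fast run take the same path.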

I think it’s similar to the local-first movement, and what Martin Kleppmann has been working on, where it’s just like - you know, we right now think of online and offline. And online is some low set of latencies, and offline is - you can see it as a high set of latencies. What if we just didn’t have that distinction, and we’re just like “No, there’s just latency.” And we can handle any amount of latency. We can handle 10 milliseconds of latency, we can handle 10 days of latency. And the whole system just keeps working. That eliminates so many bugs that you might have otherwise. But it’s hard to make that shift if there’s no buy-in for people, which is also the thing I’m trying to get at, is –
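
[Editor's sketch, not from the episode and not Martin Kleppmann's algorithm: a minimal Python illustration of state that ends up the same whether updates arrive after 10 milliseconds or 10 days. `Replica` and `deliver` are hypothetical names. The sketch assumes every operation carries a unique id and that operations commute.]

```python
from typing import Dict, List, Tuple

Op = Tuple[str, int]  # (unique operation id, amount to add)


class Replica:
    """Applies operations whenever they arrive; duplicates are ignored."""

    def __init__(self) -> None:
        self.seen: Dict[str, int] = {}

    def apply(self, op: Op) -> None:
        op_id, amount = op
        # Keyed by id, so re-delivery of the same op changes nothing.
        self.seen.setdefault(op_id, amount)

    @property
    def total(self) -> int:
        return sum(self.seen.values())


def deliver(ops: List[Op], latencies_s: List[float]) -> Replica:
    """Apply ops in arrival order, where arrival time is just each op's latency."""
    replica = Replica()
    for _, op in sorted(zip(latencies_s, ops), key=lambda pair: pair[0]):
        replica.apply(op)
    return replica


ops = [("a", 1), ("b", 2), ("c", 3)]
online = deliver(ops, [0.010, 0.010, 0.010])       # about 10 ms each
offline = deliver(ops, [864_000.0, 0.010, 3_600.0])  # 10 days, 10 ms, 1 hour
print(online.total, offline.total)
```

[Because addition commutes and ids deduplicate, there is no separate "offline" code path here. Latency only changes when the replica catches up, not what it converges to.]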

[01:26:12.08] Give us a good reason to, right?

Yeah, this is valuable for you as well. It makes your life better if you think in this model. It might be tough at first, but your life will be better if you think in this model.

Alright, it sounds cool. It sounds like a challenge. I think we definitely need more documentation in the world of distributed systems, so that more of us can understand… And like you said, it’s tough sledding, to go down those papers and to understand, and not to get lost… And even Leslie Lamport himself said on the show he’s not a very good teacher. He’s struggled to actually express his ideas, because they’re difficult to express, and he’s operating at a level that sometimes it’s hard to come down and speak to somebody who’s new to an idea, to a concept when you’re so ingrained in it, or when you invented it 40 years ago, you know?

So a valiant endeavor, Kris. I think this is cool. So you’re giving yourself full time to this. This is what you’re doing.

This is my journey now, of really just trying to be a writer. And it was me – It was a lot of me sitting down and thinking about who I am… Because I think part of the issue I’ve had in my career is that I am extremely ambitious in what I want to accomplish, and I would often find myself around people who would say they were ambitious, but were ambitious to a much lesser degree than I was. And I kept butting up against that. And I think I’ve kind of figured out that I just need to kind of operate my own thing, where I can be as ambitious as I would like to be, and actually put in the effort to do that… Because I found that I was holding myself back quite a bit, to kind of be in the environment with everybody else. Because there’s politics and all that you’ve gotta play, and it’s like “Oh, we should go do this big, ambitious thing”, and people would be like “Oh, no, that sounds like a lot of work, and maybe we shouldn’t do it.” And I’m like “No, no, but we can do it, and it’s gonna be –” And you don’t want to annoy your co-workers, and all of that, so you kind of give up on it… And I’ve given up on too many ideas, and I realized “Maybe if I just do this on my own, I can make it happen”, because I also write an absurd amount. I think on one of the Go Time episodes I mentioned how I in the past two years, with journaling alone - I’m a big thinking on the page type of person - at that point in time I had written 3.8 million words over the course of two years. So a little under 2 million words a year. I have actually increased it since then; this was my first month, and there’s three days left, so I think I will meet it… But that old system was me journaling 5,000 words a day, and now I’m up to 10,000 words a day.

Dang…

And I want to do more, because I truly do just think on the page. It’s like, some people think in their heads, some people think with drawings, and all of that… I think with words on the page, which is extremely valuable in this type of space, because if I think with words on the page and I go out there and I publish, I clean up those words some and I publish them out, people can follow that journey. Because I think part of the problem with distributed systems, with a lot of teaching, is that by the time you become a master, you forget the path you took to get there. So you have a lot of people that are like “But how do I get there?” It’s like “Ah, you just learn, just read this book.” But if you can see the trail of how someone got there, and the things they struggled with, and the things they thought about as they went along, then it becomes a lot easier for you to be like “Oh, okay”, and it becomes easier for you to also not feel like you’re dumb, or something, of like “Oh, this person that I respect a lot, they struggled with this when they were learning… So I can keep going, and it’s not like I’m some fool and they’re just a brilliant person.” So I think that’s kind of the contribution that I should make to the world, is - I can write at a volume that is truly absurd. [laughter] And those people cannot do that. So I should go forth and do that, and do what I said to myself when I was younger, and be a writer.

Go forth and write at extreme. You might need a personal librarian or archivist, just to point out the best of Kris’ writing, amongst the many, many, many words that he’s – the valueless, or what did you call it?

The worthless words…

The worthless words that he’s written. [laughs]

[01:30:08.01] Yes… It’s interesting, because I have picked up quite a few books on library and information science, and I’m getting myself there… Because I also have an absurd goal that I do not know if I will meet, but I’m gonna certainly try, which is I want to be a truly full-stack software engineer. And not the stack as in backend and frontend, but stack as in transistors and up. Like, I want to know how to design a processor, design a bootloader, a kernel, drivers, graphics engines, all of that. I want to be someone that knows that entire stack, because I want to know a) is it possible for one person to know the whole stack? And I don’t think I’ll be able to write a production operating system. Certainly not on my own. But can I write a functional operating system that works for me? Can I implement Ethernet that works for me? Can I implement a network stack? Can I design a chip? Can I implement, say, a RISC-V processor on an FPGA? These are things that I want to know and understand, because if I am successful in my endeavors to do writing, and be a person that publishes on the internet, I can help be the force that pushes us toward advancing our technology some more.

So I think part of the problem that we have is that not enough people can see the whole stack. And I think right now we’re in this position where we all think that it’s impossible for any one person to see the whole stack, and I want that not to be true, because I think there’s lots of changes we need to make, and that it does require someone that can see, or a group of people that can see the whole thing.

Well, I would push you towards Andreas Kling and the SerenityOS –

I was gonna say that, Jerod, yeah.

You were saying the same thing, Adam?

Yeah, I was gonna suggest that, too.

Yeah. It’s funny, because I already listened to that episode, and I was already a fan of his, because I think I saw his talk at the Web Browser Hack Conference. It just kind of appeared in my YouTube feed a couple of months ago… And actually, he kind of inspired me a bit, because he – I talked to my friends before, and they’re like “Oh, you can’t write an operating system. Don’t implement libc, that’s blah-blah-blah.” And seeing this one guy that was just like “Nah, I’m gonna build my own operating system, and my own web browser, and I’m just gonna do it”, that was super-inspiring to me. I was like “Yo someone’s actually doing it.” He’s kind of like the first person that does it. Because when you’re kind of alone in the world, you’re like “I’m gonna do this thing”, and everyone’s like “That’s insane. Why are you going to try and do that?” It takes a lot to push back. But when there’s someone out there that’s done it, it’s like “Oh, okay, well, I can follow in their footsteps a bit, or go on that journey along with them.”

Well, that is cool. Kris, thanks so much for hanging out with us. Adam, any other thoughts, questions, unpopular opinions?

I’ve enjoyed the ride, the ride of Kris. It’s been fun.

This has been a ride, yes.

I was super-curious about the PCIe distributedness of my own PC sitting here right now and how I can explore that, but… We can save that for maybe a blog post, or a future Go Time rant or something like that, because I’m just curious about this distributed system we all have in our hands, right here, right now.

Yeah. When I fell down the PCIe rabbit hole, I was like “This is amazing” and “Oh, this is awful.” Like, a little tidbit of it. Because of the way PCIe works, there’s a brief moment when any device that uses Thunderbolt - so your monitor, or theoretically your charger, or really anything, has direct access to all of the physical memory on your machine. And there’s things that happen so that doesn’t remain true all the time, but because of the way PCIe works, and the way it was designed as a direct memory access system, it can just access raw memory, which is terrifying when you kind of think about it… It’s not something people need to actively worry about, but an interesting quirk of history… That yes, I think would make a good blog post.

That’s why I say don’t plug your phone into some weird outlet. I’m sure that’s probably the case, where like you’re in the airport, and you’re like “Oh, that’s a USB charger.” It might be similar to that, where on the other end of that thing might be a PCIe slot that you’re getting funked from.

[01:33:59.09] Yeah. USB is a little bit harder, because it’s not directly PCIe; so USB isn’t direct memory access. But PCI – so Thunderbolt certainly is, so if you plug your computer in… But yeah, this is something else I, like, want to talk about.

So USB-C 3.0+, whatever, that’s not Thunderbolt-ready, or…

USB 4.0 is basically Thunderbolt 3.0, so it can put PCIe over it. So yeah, USB 4.0 and all of that… This section of things is also like a deep rabbit hole that I also want to talk about more, because I think it’s fascinating how much we cling to abstractions that don’t make sense anymore.

Well said.

Except for tech debt. That one is just perfect, so we’ll just leave it there. [laughter]

I do think you do have your work cut out for you renaming tech debt to something else.

Well, I assume a subsection was malpractice. Now, he admitted there’s certain proper use of the analogy, so we’ve found a middle ground.

That’s why I think you should cut tail to as a result of my idea. I think you should use my idea [unintelligible 01:34:59.14] As a result of malpractice, not –

It doesn’t have quite the same ring to it as malpractice, but… Malpractice is fighting words. You say that in a meeting.

I mean, that’s kind of the point of it though, right, Jerod? [laughter]

It is, it is.

It’s like, if you’re sitting there and it’s like “Oh, are we about to acquire some tech debt?” It’s one thing. “Are we about to perform malpractice?” “Oh, this is a different conversation.”

Right. There’s no positive spin.

I was gonna bring this up during the show, but we kind of like went further elsewhere… And I was thinking maybe just a spectrum, which is like “Okay, is this decision leaning more towards pragmatism, or is this decision leaning more towards malpractice?” Like, you need that spectrum, essentially; where on this spectrum does this decision fall, or this choice fall? And then in that case, you can point fingers.

Isn’t some malpractice – like, there’s no decision-making. Malpractice - I see it as two things. And maybe, maybe I see it differently than you guys. It could be negligence, or incompetence. That’s kind of an umbrella term for both of those, in my mind at least. So you’re not having a discussion about the decision being made if you’re in malpractice mode. You’re just like scooting on, aren’t you? You’re doing your thing. I don’t know.

It depends, because there’s individual malpractice, but there’s also like the whole of the – sometimes the whole of the industry malpractice. Like the whole thing, where doctors in the - what, 1500-1600s didn’t wash their hands, and they’d like do autopsies, and then go deliver babies, and they’re like “Why is our maternal mortality rate so high?” People are like “Because you aren’t washing your hands.” And they were like “No, no, our hands–”

It’s like, you’ve got formaldehyde on your hands, or something.

Yeah. And they were like “Our hands are golden. Our hands are from God. What are you talking about? We deliver babies, we’re amazing.” And everybody’s like “No, wash your hands.” So it’s like a collective thing, where it’s like individually you can’t necessarily blame anybody for not pushing back, because it could have ended their career… But as an industry; so that’s the type of malpractice I’m talking about in this sense.

Okay, so that could be under the category of ignorance, right? Like, they were ignorant of what they were doing. Collective ignorance.

Collective ignorance, and probably a bit of hubris in how you feel about your own skills and what you can do…

It turns out malpractice is a wide umbrella, and we can all fit in there if we want to.

The thing I like about malpractice too as an analogy is that it’s not like –

[laughs] “The thing I like about malpractice…” That’s gonna be a title, “The thing I like about malpractice.”

…as an analogy, is that it’s not a singular decision. You have to actually evaluate the circumstances and what happened to decide “Was this just a bad decision? Was this malpractice? Is this not malpractice at all, and it just had a bad outcome?” You have to actually dig into it a little bit more, which is another feature that I like about it. But it also is incendiary, and you know me, I love hot takes and unpopular opinions, so…

That’s true. Alright, to our listeners who made it this far, let us know what you think about malpractice, both as a practice and as an analogy.

And do you like it?

And do you like it or do you not like it? Alright, we should end this before we go off on another tangent…

You know me… We did have that title going, Jerod, of –

Oh, yeah. “Sorry this podcast wasn’t any shorter”, or how was it gonna go?

Yeah, we didn’t have time to make a shorter one…

Yeah, there you go.

I love it.

Alright, bye y’all.

Bye friends.

See you on the next one.

Changelog

Our transcripts are open source on GitHub. Improvements are welcome. 💚
