In this insight-filled episode, Bill Kennedy joins Johnny and Kris to discuss best practices around the design of software in Go. Bill talks through scenarios, lessons learned, and pitfalls to avoid in both architecture and coding of Go projects.
Bill Kennedy: Now, when we talk about Go - one of Go’s language design philosophies was being able to do more with less code. It’s coming from these – I believe. I can’t speak for Robert, and all… But I believe it’s coming from these ideas that if there’s less code you need to write, there’s going to be less bugs. Now, why did they say this? Because Stroustrup says “If you’re writing more code than you need, it results in ugly, large, and slow code”, where ugly means you’re leaving places for bugs to hide, large means you’re ensuring incomplete test coverage, and slow means you now start to make shortcuts and dirty tricks away from your frameworks and your patterns, because you’re moving fast and the code gets out of control… And these things absolutely happen.
So I think we’re talking about all of this… It all works together, and I think Go is tied into that. And we complain about Go’s error handling. We love to complain about it. But do you know that there was a study done where they looked at 48 critical failures that brought down systems? Hundreds of bugs in Cassandra, HBase, MapReduce, Redis… How many systems run on Redis today? And they found in this study that out of those 48 critical failures, 92% of them could have been avoided if error handling was done better. Failures from bad error handling.
So again, I think that Go designers knew this. They knew this, because they were developers themselves. They were not necessarily academics. They had to build software. They knew what the average developer needed, they knew where they were falling down. And I think Go comes in and solves these things.
[36:15] Personally, I think when somebody complains about error handling in Go, they’re complaining about – they want it easier to do, not easier to understand. [laughter] We come back again, right? So sometimes when you make things easier to understand, things have to be a little more tedious.
But here’s another design philosophy, Johnny? Two of them. One, you shouldn’t be writing code for yourself. You should be writing code for the next person that has to come along. Because if you don’t, if you’re not thinking about the next person and/or the average developer on your team, when you leave, that codebase leaves with you. It gets replaced. And the 3, 4, 5 years you spent on that ends up resulting in meaning nothing. I’ve got code that’s 20-something years old, 10-something years old in production right now. The 20-year-old code should go, because that’s way too long… But I think it’s there because I always wrote code with the understanding that somebody else has to be able to maintain this. It wasn’t about me, it was about the next person, and that allows that code to now not just have to be replaced, right? You need to have that design philosophy in your head; you need to be thinking about that, “Who’s the next person that’s gonna come along here?” And then you’re always writing code for the average developer on your team.
If you’re the average developer on your team, that means I can wake you up at 3 in the morning (God forbid) if I have to, and you can handle the bug. That’s the average developer. If I can’t wake you up at three in the morning, then you’re below average. So another question is “Why are you below average? Is it because I’m failing you, or are you just not coming up to speed?” And then for me, the next thing is the above-average developer. That’s scary, because those are the developers that tend to get bored, and instead of being able to write for the average developer, or bring the team up - that’s where the clever code comes in. That’s where we trip up.
And I tell people all the time, “When you’re hiring, evaluate who this person is for your team. Are they below, are they average, or are they above?” And consciously understand what you’re gonna need to do as an individual and a team to get this person in the right place. If they’re below average – which is great; let’s hire developers who are below average for our teams, so we can bring them up and we can create a stronger team. Those are the best developers in the world, because you can really teach and train them. And now you’ve got somebody who will stay a long time and really work hard and thank you for the opportunity.
But if you put me on a team that’s doing business APIs, I’m above average. If you put me on a team doing crypto, I’m below average. And if I wanted to learn crypto and you gave me that chance, I would be ecstatic, and I’d work hard, and we’d get there. But if you’re hiring somebody who’s above - and I’ve done it before - they can either be amazing mentors and coaches, which is why you’re hiring them, I hope, or they can create utter chaos and destruction, because everything they’re doing is not comprehensible to anybody else on the team, and you’ve gotta maintain it.
[39:40] So those are design philosophies around building teams, around the ideas of all of this stuff. And you wanna apply it back to micro-level decisions, like constructions, functions versus methods, to macro-level decisions around app layer, business layer, foundation layers of code. Policies for these. Import policies. Error handling policies. Who can shut down an application? Who can’t? Who can log? Who can’t? Who can wrap errors? Who can’t? Who can set certain import dependencies? Who can’t?
And you don’t have to have all of it day one. You have to develop it as “Suddenly, there’s a hole in the engineering decisions. Hm. We don’t know what to do here. Okay, that means we may not have a design philosophy here.” I get excited when that happens. I’m like, “Oh my God, we’re gonna have a design philosophy for this. Oh my God, we get to do something new! WOOOH!”
Now, you’ve always got some of your base, foundational, right? But those are exciting days. And it’s also exciting sometimes when somebody finds a hole in a design philosophy or policy, where we thought this was the right thing to do, and suddenly we’ve found an exception. And there’s exceptions to everything. There are some exceptions you just can’t take. I don’t really take exceptions between project layers. I’ll never let the foundational layer log. There’s no exception to that. If you have to log, you’re in the business. That’s it.
But then there are other exceptions… Here’s a good one, Johnny. Here’s one where you might take an exception. So baseline design philosophy - a type system is not to be shared. A type system exists for a package, which is a unit of code in Go, clearly a compile-time unit of code. A type system is designed to allow data to flow in and out of the package API, where a package has a purpose. So if the type system’s job is to allow data - if. There’s my philosophy. If you agree with this. You don’t have to agree with anything I’m saying today, by the way… It is totally fair. But if you believe that a type system’s job is to allow data to flow in and out of a package, then that type system is highly localized to that package and that package only. So now you have to make a decision about every API. When it comes to data flowing into an API, you have two choices. You could say “I want the API to accept data based on what it is.” This is what I would call concrete functions, accepting a concrete type. It can accept a user, and only a user. That’s what it is. But thanks to interfaces, we can write polymorphic functions that say “No, this API will accept concrete data based on what it can do.” And that’s a next level of refactoring, hopefully; I don’t wanna start there, but suddenly you realize “Not only can I work with a user, I can also work with a customer. Based on this common behavior, we make it polymorphic.” Okay. We all agree with that. You have both choices, and those are the only two choices you have. And those types should exist as types within the scope of that package.
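The two choices Bill describes for data flowing into an API can be sketched roughly as follows. This is a minimal illustration, not code from the episode: the `User`, `Customer`, and `Namer` names and the `DisplayName` method are hypothetical.

```go
package main

import "fmt"

// User and Customer are hypothetical concrete types used only for illustration.
type User struct{ Name string }
type Customer struct{ Company string }

// DisplayName gives both concrete types a common behavior.
func (u User) DisplayName() string     { return u.Name }
func (c Customer) DisplayName() string { return c.Company }

// GreetUser is a concrete function: it accepts data based on what it is.
// It can accept a User, and only a User.
func GreetUser(u User) string {
	return "hello, " + u.Name
}

// Namer (hypothetical) describes what data can do. It is declared in the
// package that consumes it, so the type stays local to this package's API.
type Namer interface {
	DisplayName() string
}

// Greet is a polymorphic function: it accepts any concrete data
// that has the DisplayName behavior.
func Greet(n Namer) string {
	return "hello, " + n.DisplayName()
}

func main() {
	fmt.Println(GreetUser(User{Name: "bill"}))
	fmt.Println(Greet(User{Name: "bill"}))
	fmt.Println(Greet(Customer{Company: "acme"}))
}
```

In this sketch `Greet` would be the later refactoring step: it only becomes worthwhile once a second concrete type with the same behavior shows up.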
Now, here’s where the fun begins… I have a strong rule that functions should only return concrete values. The function’s job is not to pre-decouple or wrap concrete data already in an interface; that is not the API’s responsibility. It is the caller’s responsibility to decide whether or not they need the decoupling or not. Not mine.
So minus the error interface, which is a whole other set of interesting design philosophies and things I have, I don’t wanna see a function that uses http.Handler as the return type. I don’t care if you know or think they’re gonna put it into a handler already. I don’t care, it’s not my job. My job is to give them the concrete value that they can then do with what they want.
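A small sketch of that rule, under stated assumptions: `Health`, `NewHealth`, and `Render` are hypothetical names invented for this illustration, while `http.Handler` and the `httptest` helpers are from the standard library. The constructor returns the concrete type, and the caller decides whether to treat it as an `http.Handler`.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// Health is a hypothetical concrete type that happens to satisfy http.Handler.
type Health struct {
	Status string
}

// ServeHTTP writes the current status; having this method is what
// makes *Health usable as an http.Handler.
func (h *Health) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	fmt.Fprint(w, h.Status)
}

// NewHealth returns the concrete *Health rather than http.Handler.
// It does not pre-decouple the value on the caller's behalf.
func NewHealth(status string) *Health {
	return &Health{Status: status}
}

// Render is a hypothetical stand-in for caller code that wants the
// decoupling: it accepts any http.Handler and returns the response body.
func Render(h http.Handler) string {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/health", nil))
	return rec.Body.String()
}

func main() {
	h := NewHealth("ok")

	// The caller still has the concrete value and its fields.
	h.Status = "degraded"

	// The caller chooses to decouple by passing it where a Handler is expected.
	fmt.Println(Render(h))
}
```

Had `NewHealth` returned `http.Handler`, the caller would need a type assertion to reach `Status`; returning the concrete value leaves both options open.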
“Well, Bill, then we’re leaking a –” No, you’re not leaking a type. They already imported your package. There’s no leaking there, what are you talking about? Stop trying to abstract for the caller. Let them do it. Now, there are two exceptions to this. One is the error interface; we’re handling errors in a decoupled state. There’s lots of reasons why we wanna do that.
[44:04] And until 1.18 comes out, there are times where you might need the empty interface. It should be a little bit of a smell, but let’s be real, I’ve had to write a function or two over the last six years where I was trying to be, for whatever good reason, generic. Maybe I was just doing some data flow… And we were using the empty interface, which - now in 1.18 we’ll be able to replace with a concrete type. [laughter] I mean, what is generics at the end of the day anyway? Generics is concrete, polymorphic functions, where the polymorphism isn’t happening at runtime, the polymorphism is happening at compile time. We’re choosing the concrete type, the data – because the only data that flows is concrete data anyway. We’re just choosing that at compile time. For me, it’s concrete polymorphism, as opposed to runtime polymorphism.
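To make the contrast concrete, here is a hedged sketch of the same small function written both ways. `FirstAny` and `First` are hypothetical names, and the generic version assumes Go 1.18 or later.

```go
package main

import "fmt"

// FirstAny is the pre-1.18 shape: the empty interface accepts anything,
// and the caller has to type-assert at runtime to get the data back.
func FirstAny(items []interface{}) interface{} {
	if len(items) == 0 {
		return nil
	}
	return items[0]
}

// First is the generic shape: T is chosen at compile time at each call
// site, so the value that comes back is already a concrete type.
func First[T any](items []T) (T, bool) {
	var zero T
	if len(items) == 0 {
		return zero, false
	}
	return items[0], true
}

func main() {
	v := FirstAny([]interface{}{"a", "b"})
	s, ok := v.(string) // runtime check; would report false for a non-string
	fmt.Println(s, ok)

	n, ok := First([]int{7, 8, 9}) // n is an int; no assertion needed
	fmt.Println(n+1, ok)
}
```

With `First`, a mismatch between the slice type and how the result is used is a compile error rather than a failed assertion at runtime.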
But there’s a philosophy - we shouldn’t be using the interface as a return type, minus those two exceptions, when they happen. And people disagree with me there, but… There it is. So if I see a function that’s returning an interface, it’s immediately a code review question: “What are we doing? Why are we doing this? Prove to me that we need to take an exception.” But it’s gonna be hard, because if I return the concrete type, that doesn’t prevent the caller from doing whatever it is they’re doing.