Ship It! – Episode #16

Optimize for smoothness not speed

with Justin Searls from Test Double


This week Gerhard is joined by Justin Searls, Test Double co-founder and CTO. Also a 🐞 magnet. They talk about how to deal with the pressure of shipping faster, why you should optimize for smoothness not speed, and why focusing on consistency is key. Understanding the real why behind what you do is also important. There’s a lot more to it, as it’s a nuanced and complex discussion, and well worth your time.

Expect a decade of learnings compressed into one hour, as well as disagreements on some ops and infrastructure topics — all good fun. In the show notes, you will find Gerhard’s favorite conference talks Justin gave a few years back.



Fly – Deploy your apps and databases close to your users. In minutes you can run your Ruby, Go, Node, Deno, Python, or Elixir app (and databases!) all over the world. No ops required. Learn more at and check out the speedrun in their docs.

LaunchDarkly – Ship fast. Rest easy. Deploy code at any time, even if a feature isn’t ready to be released to your users. Wrap code in feature flags to get the safety to test new features and infrastructure in prod without impacting the wrong end users.

SignalWire – Build what’s next in communications with video, voice, and messaging APIs powered by elastic cloud infrastructure. Try it today at and use code SHIPIT for $25 in developer credit.

Grafana Cloud – Our dashboard of choice. Grafana is the open and composable observability and data visualization platform. Visualize metrics, logs, and traces from multiple sources like Prometheus, Loki, Elasticsearch, InfluxDB, Postgres and many more.

Notes & Links


⭐️ Justin is giving the keynote at this year’s Reliable Web Summit. The topic is why distrust is at the heart of a lot of the issues we discussed and how to build trust as an individual, small team, and organization. ⭐️

Reliable Web Summit 2021, Justin Searls keynote prep

Justin Searls: If you’re a programmer and you’re happy with the work you’re doing, you’re growing in the ways you want, and you feel pushed to do your best work as opposed to work the most hours, congratulations! That’s all too rare.

Gerhard & Justin





So in my career, I have been part of many teams that just sling code, or features, or business value, depending on who you talk to. But sometimes that did not feel right, just slinging code, slinging stuff. Yes, you should ship and learn quickly, very important… Constantly challenge your assumptions, very important. But there is such a thing as doing it right and fast, and doing it bad and fast. So what is that difference, Justin? What do you think?

Yeah and that’s the sort of – I’ve been on both sides of this conversation; as an entry-level developer, feeling like I had just an infinite amount of pressure, both from on high, wanting more things shipped faster than was physically possible, pushing constantly to just get features out the door, or to get this thing delivered, where there was a failure to communicate between me and the people managing me… Especially early on, I didn’t know how to discuss things like software complexity or where my time was going. And to feel that pressure coming from above, feeling it kind of like sympathetically through my peers, who were feeling the same pressure and kind of pushing on one another to try to make that pain go away… And then the personal pressure on myself, where I was literally starting from a place of incompetence. And by incompetence, I mean could not independently build the thing I was being asked to build without significant help, significant research, significant learning.

[04:06] And I’m at a point now of relative competence, but it’s taken me 20 years to realize the software that I want to build, as I build. But until I got to that point, I needed the safety of being able – psychological safety, as well as the vulnerability in like a social term to be able to communicate with people around me about like “Hey, I need time to figure this out.” Or it needs to be okay for me to ask a question about how this works.

And so, in the beginning of my career, I viewed your question of just slinging code versus getting stuff right, almost entirely through the lens of these social pressures that others placed on me, that I imagined others placing on me, and that I placed on myself, and it was very difficult for me to escape that.

Later in my career, as I started to move into either non-technical roles, or helping teams in a way that was purely advisory, you’d see teams that even in the absence of pressure, they would still really struggle to get any kind of traction towards delivering anything.

And I would talk to very well-intended VPs of engineering or CTOs about “How do I, without downstream pressuring people, and giving them deadlines and cracking the whip, so to speak, get the outcomes that I want?” And the answer, then and now, seems to be that the autonomy needs to be met with some sort of healthy alignment, drive, engagement, excitement, positive energy around like just wanting to accomplish the thing together as a combined group. And unless those motivations are both present and healthy and rewarded and aligned, you can really struggle, I think, as a team, to find a good cadence.

I think there’s a reason why we keep talking about words like velocity, speed, “How fast can we go?” And I think to somebody who’s new, they might think that that’s all about how fast you can type, right? Or how fast you get features out the door. But really, I started to think about it in terms of not speed per se, but fluidity; how much friction is there day to day in people’s lives, and how organically are they able to take an idea, communicate it into a product feature, or aspect, or stakeholder concern, and then prioritize that and get it scheduled and worked on and delivered and shipped into production, and validated, and so on and so forth? How smooth of a process is that, versus how fractious?

And if we’re going to optimize for one thing, it’s probably smoothness over speed, per se. And it’s difficult, because it sounds like a little bit like woo, I think, to both developers who just want to focus on the technology, and to managers who just want their project done yesterday.

So I don’t know… Long-winded way to maybe not answer your question.

No, I think that was a very good one, because it just showed how much complexity there is, in that answer. And this is complexity that comes from experience, that comes from the real world, all the situations that you have been in personally, and I know that many can relate to you.

What I can relate to the most is that velocity. It really doesn’t matter how many points you deliver in a sprint; it’s not about that, it’s about how can you keep that consistent, not over a few weeks or a few months, but about across years. In a couple of years, how can you consistently maintain a speed that’s healthy, that you can build on top of? That complexity, when it comes - because it will always come - it doesn’t affect that consistency. That is what a healthy delivery mechanism or delivery team looks like to me. It’s never about how many points, it’s about month-on-month, year-on-year, can you keep that up? And if you can do that, well, the sky’s the limit.
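To make the consistency point concrete, here is a minimal Python sketch. It assumes a team records the points delivered per sprint and uses the coefficient of variation (standard deviation divided by mean) as one possible way to express how steady delivery is. The function name `consistency` and both data sets are hypothetical, not something mentioned in the conversation.

```python
from statistics import mean, pstdev


def consistency(points_per_sprint):
    """Coefficient of variation of sprint throughput.

    Lower values mean steadier delivery. This is one possible measure,
    chosen for illustration only.
    """
    avg = mean(points_per_sprint)
    if avg == 0:
        raise ValueError("no delivered work to measure")
    return pstdev(points_per_sprint) / avg


# Hypothetical data: both teams deliver the same total over six sprints.
bursty = [40, 5, 38, 2, 45, 10]
steady = [22, 24, 23, 21, 25, 25]

if __name__ == "__main__":
    print("bursty total:", sum(bursty), "variation:", round(consistency(bursty), 2))
    print("steady total:", sum(steady), "variation:", round(consistency(steady), 2))
```

Both hypothetical teams deliver the same number of points overall, so a points-per-sprint total cannot tell them apart; only the spread does.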

[08:16] I think to this, there’s another thing which keeps coming very often - going in the wrong direction, regardless of the speed, will always be wrong. So what would you say about that, about knowing where to point teams, especially the ones that have to collaborate?

Yes, that’s a great question. And I think that a guiding light for me on the most successful teams that I have either been a part of or that I have witnessed, has always been a shared and common just understanding of what their purpose was. So I was part of an organization, a consulting company, just prior to founding Test Double. So we founded Test Double in 2011, so it was like 2009-2010. And they were in that era where it was known as like an agile software consultancy. And so they were peddling, pushing their own kind of blend of agile engineering practices like Scrum and extreme programming. But they did an interesting thing in their sales process of really pushing business value.

And so if user stories rolled up into like Epics – Epics, in their sales parlance and also in how they practiced and delivered, would roll up into business value stories, or value stories. And we would start each engagement by actually getting the whole team in a room - developers, QA, product owners, business stakeholders alike… So it wasn’t behind some secret veil of like a product organization. I didn’t even know that might be considered desirable in certain organizations; I was sufficiently naive to this experience. And what was great about it was we would just have an open and honest put up on the board, like “Hey, executive or stakeholder or person who brought us in here to like build this thing, how is X, if delivered as conceived, going to make or save your company money?” And just boil it down.

And first of all, a lot of executives, it turns out, are uncomfortable with being put on the spot to answer what should be a simple question such as that… But when you really sat with it, and as a team forced the conversation out, and then you followed through, not just on – I don’t know… Here’s a project example that we did - currently, our system is so slow that sales reps who go to restaurants to sell food supplies, end up just spending multiple minutes just waiting for pages to load, and they could hit three or four more restaurants a day if it was fast. And that would result in like X dollars. And we’d follow through and be “Okay, so what is X? How would you measure X? How will we assess that X has been attained after we’ve delivered it?” And not only in the kind of initiation phases and discovery of the project, but how will we, on an ongoing basis, track that as the primary metric for success for this project, as opposed to arbitrary story points, right? Because there’s no way to know whether you’re going in the right direction or the wrong direction if you don’t have a shared understanding of what the point of the thing that you’re building is. And most software teams, they don’t know what the point of the thing that they’re building is. Or in this day and age, to know it would be to not want to work on it anymore… You know, whether for ethical reasons, or just because a lot of the stuff that gets built these days is kind of slimy.

And even though, in my practice at Test Double, our clients - they work on fantastic and wonderful products - I think that we have sort of been encouched into this default relationship where product throws “Here’s the features that we want” and “Here’s the things that we need.” There’s a disconnect at the developer level, at the team and engineering level, where we lose sight of or aren’t really bought into or aren’t really included in the discussion of “But why?”

[12:11] And I have seen teams where developers know the answer to “Why”, and when a product owner says, “Hey, here’s how these comps should go… And you click this and then you click this, and then you click this”, a developer who knows what the ultimate goal is, in terms of like business value or whatever the overall organizational objective trying to be met is, can successfully have a real two-way discussion with that product person and push back, or offer alternative ideas, or even find shortcuts that would make things faster. And in the absence of that, everyone just becomes an order taker… You know, I receive these marching orders and then I go and build the thing. And I think that sleight of hand is what actually facilitates and enables a lot of the negative externalities that we see in our industry.

This resonates with me at many different levels. I’ve seen a lot of what you’ve just said in Pivotal Labs. I’ve seen this in the IBM Bluemix Garage. These are the things that, you’re right, were the most important ones from the beginning. I like the engagements, that engagement mentality, I like the focus on business value and customer outcomes… So all that makes perfect sense, and that is a very powerful reason to do things and to ship software. And you can correlate those points to business value. That’s amazing.

However, I’ve also seen a different side of the coin, where you’re working on a software that gets shipped and others get to use, to implement their own things, like for example a database or a proxy, or whatever else, but it’s like more technology-oriented. What do you think the equivalent why and the equivalent business value is in that case?

If you’re building a developer-focused tool - and this could be a paid database, like a Snowflake, or something like that, or an API… Or it could be open source and it could be completely free - I think it’s still important to understand that when developers are your customer, they are still human, and should probably be treated much the same way as a naive non-technical user of a software system that serves naive non-technical users all the time.

In general, I suppose – to clarify your question, are you asking specifically about how this applies when the overall objective is less about making money and more about meeting somebody’s unmet need with technology?

Well, I think with the software that we write, everybody’s trying to make money. But I think sometimes the relationship between making money and writing the software is clearer… Such as, for example, when you write like for business-facing, customer-facing products. But if you have a software that you build that then gets used, and then you have, if you imagine, services attached to that.

So let’s take, for example, MySQL. Let’s say that you’re selling MySQL and you’re building MySQL. I mean, sure you have the licenses that MySQL has, or maybe you have a service that you offer, which MySQL is part of, but then the value is less clear, because you’re not building the software to, as I said, sell licenses. Someone is using it, a part of the service, to deliver value to other users. And in that case, I think the value is less clear. So do you see it differently?

I think that what you’re describing could be phrased as like a different vector, where some products are just obvious. Like, if I was hired as the Chief Product Officer of a company that made branded sweatpants, and you could put like any college name on those sweatpants that you wanted, my job as a product officer would be pretty straightforward, right? Like you can already imagine what that app would look like.

And so I wouldn’t have to really provide a tremendous amount of detail and subject matter expertise. If my business were to be all about and be focused on, and I’m hired as chief product officer to facilitate the FDA approval of highly regulated pharmaceuticals - that job sounds a lot harder, and I hope it pays a lot more, right?

So I think the same holds for kind of what you’re saying in terms of if I’m building a database engine. That is a very, very challenging product category, because it requires – and when you think about it, what are the things that are in common between that and the pharmaceutical case? It’s like, tremendously deep subject matter expertise, and probably a lot of vision, some big dream that a product person can articulate and get other people on board with and break down into smaller reducible problems… And sometimes our wires get crossed, because I think developers and software people, because we are users of a lot of the stuff, we’re able to dogfood, like use the tool as we’re building it, you know, sometimes in a bootstrap way, to build the tool itself… We can sometimes underrate the value of smart, thoughtful product as it pertains to technical solutions that we ourselves could very obviously see ourselves as consumers of.

I think that makes a lot of sense. And it just goes to show that sometimes the complexity in the code that you build, and everything around it can make it difficult to answer that “why”. I mean, you should still do it, it’s still very important, because if developers, software engineers, however you want to call them, are detached from the “why,” why they do what they do, then how can they find all the good things that make what they build good? And how can they get excited about it? How can they be creative and innovative about their work? So I think they go hand in hand and they’re very, very important.


Okay. If you were to describe a development pace that feels sustainable and healthy to you, what would that look like?

You know, that’s a really interesting question, because for me - and it might just be a function of getting older, of being around the bend a certain number of times, on the cadence of different projects of… I used to, especially earlier in my career, I’d feel the ups and downs that came with software development a lot more intensely, early on in my career, and I got married at the beginning of my career…

On Monday, say, I would grab a new feature, and I would immediately feel overwhelmed, and I’d feel like I was drowning in complexity around all this stuff that I didn’t know, and I would just panic. And on Monday night, I’d come home and my wife would see me in this state and she’d try to console me, right? And then on Tuesday, my asynchronous brain would have a chance to think about the problem, chew on it, and I’d make some kind of forward progress, somehow… And I’d feel the wind at my back, and I would feel hope and inspiration. I’d come home that evening and my wife would see me in a better mood, and you know, she’d be like, “Oh, great. He’s better. He’s over this hump.” And then by Wednesday I’d run into another blocker and I’d be in the same pit of despair again.

So what I noticed early on was that I’d have these really high highs and really low lows, and enough so that other loved ones in my life were able to kind of predict my mood based on what they’d seen from me the previous two or three days.

[19:58] And I say that because, to answer your question, I think that it’s a very – I want to like acknowledge and recognize that there are aspects to this work that are deep and creative, and require a lot of asynchronous chewing to successfully build and see the right solution. So even if you could just like, to your point, sling stuff really, really fast, sometimes features are a little bit better if you just take a more deliberate pace and allow yourself an overnight. Right now I’m in a role where I’m kind of split between duties, and I’ve found that it’s actually been really nice that I have a few focused hours to work on software in the morning, and then I get racked with a whole bunch of meetings, but in the asynchronous time where I’m not explicitly thinking about it, I can come at it the next day and have like a gust of inspiration. And if you think of the stuff that you write as being not just an inevitability of like percent to complete, but that the outcome actually changes based on a whole bunch of stuff that goes on in our brains that we don’t really understand, I’m almost trying to – and I feel like I’m almost, you know, describing some sort of acoustic singer-songwriter… You know, get a particular vibe going, you know… But I feel like what I would want to capture on a multi-person team level is a sense of that same sort of productivity, right? Like, you should feel challenged, you should end some days feeling like you’re up against the wall, and you should have enough time to give things a little bit of space to come at them from different angles the next day. But if like a feature is taking, I don’t know, a week or two weeks, other human factors sink in. You might just feel disillusioned, or disengaged, or dispirited, and other “dis” words.

And so I think that there is a boundary almost on like us as biological organisms. There’s probably an answer there, of different spectrums for different people, for sure, but there’s probably something about the cadence of just the way that our brains work, how we exist as social creatures… And that’s probably where I’d start digging to give a good answer to that question, which is probably very unsatisfying for a lot of people.

Break: [22:21]

So I’ve noticed, Justin, that you had just started a Twitter poll recently. And the Twitter poll is – this is the question: “Has the Emergence of DevOps Sped Up or Slowed Down Teams’ ability to Deliver Software Overall?” That was an interesting question. I’m wondering what the responses have been so far. I know we’ll still go for another hour, but first of all, what made you ask that question, and what are people replying?

Yeah, because I have not the most healthy relationship with distractions throughout the day, I have to admit, I’ve only kind of glanced at a few of the replies… But the reason that I asked the question is because I think a lot about how the advent of sort of mainstream open source software - and that began, I think, in the mid-aughts, when… You know, I experienced it in the Java community, because of what the Java vendors were selling; enterprise Java systems and stuff were not particularly well designed or usable, and it created an opening for a lot of open source Java tools, chief among them probably Spring, and the Spring family of brands - to be the first thing that a lot of people in large organizations used… And that was open source.

[24:35] So I got down that rabbit hole – of course, it was still incredibly hostile to actually try to contribute to these things, and if you weren’t a Unix hacker who was super-comfortable in mailing lists as a modality for how to communicate with humans, it was not at all welcoming. But the advent of GitHub, of course, changed all that. You know, once you got over the hump of learning Git.

So you had 227 votes so far…

I was one of them. And the majority, 44.5% are saying “Sped up.” That’s what the majority thinks, and that’s what I voted for as well. We may publish at the end of the poll the results in the show notes, so check them out when the episode comes out.

Okay. So do you think that the DevOps, but more importantly, the automation that seems to be abundant these days - do you think the automation made things better, or do you think it made things worse for shipping software?

So DevOps, just like so many things in open source, became a hot and trendy buzzword that was heavily marketed and associated with either products or sort of halo projects when it comes to recruiting in like big tech companies. And the original idea that DevOps would be like test-driven development, and if you just gave developers testing, they would incorporate it into their team room, they would automate away a lot of the pain around testing and quality assurance, and then the intrinsic quality would increase at a marginal decrease in that team’s ability to deliver things quickly… And in part, accelerated by the fact that they no longer had other people to have to communicate requirements to, so that things can be tested. So like the theory went, if we just did that with operations, we would get the same lift.

And to me - I had that experience, and it was called Heroku. You know, it was the most DevOps thing that I had ever used in my entire life, was being able to say “git push heroku”, not have to think about my operations at all, but know that it was like taken care of, that I had answers to every question about scale, and about adding on additional components, without necessarily having to turn it into my side hustle or my day job or my identity.

But DevOps as a term has changed, as I think the Agile era of the aughts sort of undervalued and played down the importance of operations as a practice. I think a lot of the people who are the Linux sysadmin archetype of the late ‘90s might be seen as sort of getting their comeuppance now or their day in the sun of lots and lots of new innovations and technology that are focused on meeting the same kind of just core desires… You know, some of it’s like “Hey, how big can we make things? How fast can we make these? How can we automate all of these very fancy and cool, but maybe a little arcane and unnecessary at small volumes and scales, like orchestration of like lots and lots of real and virtual systems up in the cloud?” So DevOps and automation tools have enabled and empowered lots and lots of really cool stuff.

And my experience, of “I just want to be able to “git push heroku” and have my app work in the cloud and not have to worry about it ever again” is, I think, still the pinnacle of what I would want as a developer. And of developers that I’ve talked to that have had that experience in real life, they all wish that we could still have that.

[28:10] And Heroku still exists and it’s still a thing, and I love the people there and I love the product, but clearly, it’s not a flavor of answer that the market is searching for, because everyone thinks that they’re going to need Google scale and Facebook scale kind of tools for the job that’s in front of their very straightforward CRUD app, with very few users. And this is all of a piece with sort of startup culture that everything needs to be a billion dollar unicorn to be valuable, and so you have to presume the conclusion that of course they’re going to reach that scale, so then you may as well just on day one reinvent the universe in AWS through all this automation.

So DevOps as an overall meme in the industry I think has been net negative, and slowed down a lot of teams by way of distracting them, where the fact that teams now have to hire a certain number of DevOps people, quote unquote, “to full time just keep the hamster wheel spinning of their cloud-based computing”, whereas before you might even have had an on-premises server that was just sort of sitting there and was just on and worked…

That’s what I, in spite of the poll results - I think like 44% of the people saying sped up… I think some percentage of those people are just people who like really geek out about DevOps technologies and kind of don’t care and are just team pro-DevOps… And some percentage are just people who like living in the ideology that we live in and probably just never had the experience of what if you could just set it and forget it and not have to worry about it again? Because if it’s a means to an end, why would you want the thing that required a ton of effort and thought and complexity and specialized skills and so forth, and constantly having to read up?

So I’m coming across as pretty anti-DevOps here, but I think that when you look at the replies, the number one point of contention is that no one has a shared understanding of like what we mean by the word “DevOps”. And so just to focus on automation here, it’s - yes, I love real automation, but I don’t think that what we’re typically describing around DevOps related activities is like actually automating anything, in terms of actually automating away a problem.

This specific question is something that I’m really passionate about, because I am in the DevOps camp, but for other reasons. So it’s not about the technology. I mean, there are some aspects of that, just to see how things are changing and how they’re improving… But I understand it at a very fundamental level, since I have been involved with it for, as you mentioned, in your case, 20 years, but my focus has been infrastructure. And I live and breathe it on every single team; I went into the Puppets, into the CFEngines, into the Chefs, into all those like infrastructure as code and configuration management, and so on and so forth.

One thing, which I would like to say, the first thing, is that git push is the pinnacle, you’re right. And that should not change. The setup itself has always been git push. We use Ansible, we use Docker, we’re on Kubernetes now… We’ll be using something else before long, I’m sure of it. It has always been git push, because that is the golden developer experience - push it and forget about it. It’s all the stuff that happens afterwards that makes a resilient system in production, and I think that’s what a lot of the DevOps folks or many DevOps folks forget about, because they get distracted by new and shiny, or “Let’s just keep changing things.”

I see a lot of parallels between test-driven development and testing, and DevOps and infrastructure, where you can see things right or wrong, and the outcome will be a result of your perception, of your principles, and eventually, your skill set as well.

So what I can say is that if your users are happy, latency is low, all the requests are going through, nothing is lost, data isn’t lost, you’re doing something right. And as long as developers, which by the way, are also users - as long as you can just git push and show them what is happening at all levels, whether it’s testing, whether it’s performance, whether it’s regressions, whatever it may be, and eventually running in production… As long as they have a good understanding of how the system works as a whole - well, you’ve achieved your task.

[32:12] So you’re right, Heroku had something for it, and there’s many things that have happened afterwards. But to be honest, not everybody cares about these things, nor should they. They shouldn’t really care; they should just git push or just use the service and be happy. That’s the end goal, to simplify it.

So let’s switch focus to something that I know you have a lot of experience in, which is testing. Just as a lot of advice out there about DevOps is bad, I know that a lot of the advice that’s out there about testing is bad. Why is that?

Yes. So in trying to connect the two themes, what you just shared about DevOps is 100% true and matches my experience as well. And where the analogy between the two struggles a little bit, is that if I want to have that git push Heroku experience and it cost me $30 a month, it is very difficult, I think, to be like a human who works on infrastructure and do literally any amount of customization or custom stuff, and compete with that on price. But because of the way that we consider the cost of software development - like, a lot of companies out there, as soon as you’re a full-time employee, your marginal cost on an hourly basis is $0. It’s like they become blind to just the actual expense of people’s time.

So I think of the failures of DevOps as being a failure to recognize the time sink that a lot of teams find themselves dumping lots and lots and lots of hours into when there’s commodity services that if you would only adhere to a set of conventions, would get the job done close to, or as well.
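The cost argument above can be sketched as simple arithmetic. The snippet below is a rough Python illustration: the $30 monthly price comes from the conversation, while the hourly rate, the hours spent, the infrastructure bill, and both function names are hypothetical values chosen only to show the shape of the comparison.

```python
def monthly_cost_self_managed(hours_per_month, hourly_rate, infra_bill):
    """Engineer time plus raw infrastructure spend for a self-managed setup."""
    return hours_per_month * hourly_rate + infra_bill


def break_even_hours(managed_price, hourly_rate, infra_bill=0.0):
    """Hours of engineer time per month at which self-managing costs
    the same as the managed service. Never negative."""
    return max(0.0, (managed_price - infra_bill) / hourly_rate)


MANAGED_PRICE = 30.0   # dollars per month, the figure mentioned above
HOURLY_RATE = 75.0     # hypothetical loaded cost of one engineer hour

if __name__ == "__main__":
    hours = break_even_hours(MANAGED_PRICE, HOURLY_RATE)
    print(f"break-even: {hours * 60:.0f} minutes of engineer time per month")
    # Hypothetical: 10 hours of upkeep and a 20 dollar infrastructure bill.
    print("self-managed:", monthly_cost_self_managed(10, HOURLY_RATE, 20.0))
```

Under these assumed numbers, less than half an hour of upkeep per month already costs as much as the managed service, which is the blind spot described when salaried time is treated as free.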

Testing has kind of the same core fallacy, which is that we talk a lot about the activity and the importance of it in a sort of boolean state, like “Are you DevOps? Are you not DevOps? Are you in the cloud? Are you not in the cloud? Are you tested? Are you not tested?” It’s sort of like the degree of sophistication, because these are secondary concerns to building a product that does the thing that it’s supposed to do. No one really has the mental and emotional bandwidth to consider, “Am I DevOps-ing good? Am I testing good?”

So simply, there’s usually some, if not a person, like a mood in the team that’s like “In order for us to be a moral and ethical and upstanding team, we should be able to check this box or that box.” And so I want to check the box that I’m doing DevOps on and check the box that I’m testing. And when we consider the bad advice about either, it’s often coming from people who either are operating under that sort of simplistic notion, or for some reason have an incentive to enable and perpetuate it.

And so what I think about the failings of either are when the team lacks an appreciation for the overall total cost of ownership, the overall return on investment of where their time is going, and what are they getting for that time, or you know, in terms of AWS, or if you’re running a bunch of servers somewhere to automate your CI build, and money. And if you appreciate that, then you can have a lot of really fun and interesting conversations about testing. How often are you seeing failures? When you see a failure, does it indicate an actual bug? Does it indicate somebody forgot to update a test? Is it brittle and flaky? Like, how long does it take to fix them? How many places do you have to fix the code or the tests in order to get back to a passing build? Like, how much time is lost in terms of the waiting to run the tests locally? Do you run the test locally? Do you run them in the cloud? And if you run them in the cloud, how long does it take until you get notified? And how many people get notified? Does the whole team get notified or just one person?

[36:00] And unless you know, in a quite data-driven way, the answers to a lot of these questions, general context-free advice that you see about the right way to run a test or the right tool to use is not necessarily going to help put you closer to the end goal, which is like the tests serve the team to accomplish what they were trying to do either better or faster.

That’s a great one. That’s a great one. I will have to do something – go back on the DevOps slot; I just can’t leave it. Let’s put it that way, I just can’t leave it.

So DevOps and automation - let’s just talk automation - is something that once you get to a certain… I wouldn’t even say like team size or certain complexity, a certain maturity - you have to do. And yes, you can delegate all of that to some service provider. But knowing how the service provider works and knowing how that service provider integrates with other service providers, whether it’s DNS, whether it’s certificates, whether it’s backups, whether it’s migrations for example, whether it’s a distributed database, like… Because they do fail; all these systems fail in weird and wonderful ways. What about your CI system? Even if you use a managed service, every single one of them, in my experience, have small quirks.

So having that operational knowledge of how these things work, and how they integrate, and what happens between your git push, and the code actually ending in production. And what happens between patching all the stuff that needs to be out there. And maybe - you know what, maybe it’s just like your code dependencies. But what does all that automation around the code look like to actually get the value continuously out there? And when something is wrong, detect it, notify it. And this is not just test, it’s everything else.

So this is like operating your software; there’s a lot of knowledge, even if you’re using every single provider under the sun, and you delegate, you offload all those tasks, they still combine at some point. And whether you know it or not – I don’t know of a platform that does it all, because they can’t; they’re just too big. “I’ve solved the operation of software”, you can’t say that. Just as you can’t say, “I’ve solved testing. Every single type of testing for every single platform.”

So there’s a lot of detail in how stuff runs and how stuff gets out there. And what happens when things fail? Because they do fail. How do systems degrade? So it’s more of that operational knowledge that I think you have to have, that you need to automate around, so that things are easy, so that things are resilient, they fail predictably. And that is the DevOps and the automation. There’s a bit of SRE there that I think about, which is a lot more complex than git push and to run it somewhere, “I don’t want to care about it”; the database, or the load balancer. “I don’t even know that I have a load balancer. “Just take care of it, Heroku.” It’s a bit more complicated than that, because there’s all those other elements that make the big picture.

There’s a burden of knowledge and experience that I bring about testing, that you bring about infrastructure to each new thing that you do or team that you join. And one of the things that I think we, as an industry, especially as we have created more sophisticated tools on every front, whether those are frameworks or language ecosystems or dependencies, or memes like DevOps or TDD, is that we haven’t done a good job, in spite of the fact that there are indeed very good tools for getting started with a brand new thing and slinging some code and proving out a concept - very fast. And there’s a lot of tools for how to with enough time and person power, rebuild Google’s infrastructure at scale.

What we fail to appreciate, I think, are the inflection points or really the step function, or what, in my brain, I envision as a literal cliff of what do you do when you’re transitioning from small enough to be able to use a commodity service and not really care about it, so that you can focus on the thing that you’re building, versus the stage two or stage three of the rocket, where after some gigantic chasm, “Oh yes, we – “ In your example, “We can’t use Heroku anymore, so we have to throw that entirely out and now we have to reinvent the universe while we’re continuing to operate and suddenly own all of these things.”

[40:20] And so I think about that in terms of slinging code and testing too, right? Like, if you’re able to build a proof of concept, get something out the door, there’s no tests at all… The same would go for applying rigorous architecture and design principles to the software. And then same would go for let’s say like to go faster, we build a server-side rendered traditional HTML templates with variables and stuff stored in a session or something, as opposed to a single page app that’s built in JavaScript that might have a snazzier user experience.

We might have done all of those kinds of things early, shed that complexity, to get out the door as fast as possible. But in each of those cases, once we reach that breaking point – like if I’ve got a server-side rendered application, I can’t just like flick a switch and then remake it as a single page application, just like I can’t snap my fingers and have sophisticated DevOps. Or if I have like a big mess of spaghetti code all over the place, I can’t just like overnight refactor that into well-formed, well-considered units of code, or write a test suite that is going to be well paired at appropriate levels of abstraction up and down the stack in terms of everything.

And appreciating that when we talk about scale, we are not talking about twisting a knob up or just getting more revenue… Like, we’re talking about very specific inflection points where you have to start caring about those deeper levels of knowledge that you’re speaking about. And that’s where I think there’s a lot of a failure in our community and our industry, to put a name to those things, and to actually have patterns that are successful for helping teams navigate those transitions.

You asked me before we started recording what I hope to achieve with this podcast. That’s one of the things. How do we share more of that? How do we bring those nuances out? How do we have those discussions and figure out how stuff is changing and how do we need to adapt to those changes? What makes sense for our specific setup? Heroku may make perfect sense; Kubernetes may make perfect sense. We just don’t know, because it’s all specific, and guess what - you have to figure it out for yourself. We can help along the way, maybe simplify some of the choices or make them clear, but at the end of the day, you have to choose and you have to combine those choices long-term. And it’s not a one-off. Continuously. And that’s what makes this really challenging.

Let’s take a very specific example about what I think is a test suite gone wrong. Imagine, Justin, that you have just joined a team of nine developers, so you’re developer number ten, and they’re all working on the same monolithic codebase. This team has constant test flakes, which means that the testing part of the pipeline that gets code, slinging code into production, it keeps failing for random reasons, multiple times per day. And they keep hitting the Rerun All Jobs button, adjusting timeouts, adding more retries to their tests, that type of thing. First of all, what are your thoughts about this specific situation?

Yes, well - I mean, unfortunately, it is all too common of a situation, and I think that it is challenging to write tests that are not susceptible to several specific things that contribute to what is commonly called brittleness or flakiness, right? The most important thing to understand as we’re approaching this question from the outside in is what is our goal to have this build in the first place? And the goal is probably to have some sort of confidence that things are working. And if we get one green build out of five and no source code changes in the process – like, my confidence is not high, right? And we know this based on kind of intuitive experience that if it passes one time out of five, it means that you have proven the system can work. And you’ve probably also proven not that the system has some sort of fundamental flaw and will break in production, so much as you figured out that there are environmental timing or ordering implications that can cause one or more of your tests to not work.

And I think the first thing that a team should consider when they are running into this problem is to get back to consistent green builds as fast as possible. Because again, if you’re thinking about testing as ROI, if all nine of those people are getting an email every single time that the build breaks, and then say three of those people just independently start racing to go and screw around with timeouts and stuff - that’s a lot of dollars flying out of a window, because of one little tiny thing that might not be well understood. And then no one woke up that day and was like “I’m going to really wrangle all of the flaky stuff in our test suite”, right? They’re all trying to do something else, and this is the thing blocking them from that thing. Not to say they’re half-hearted in their attempt to fix it, but what they’re really needing is a salve in that moment, as opposed to a solution.

So the first thing that I would do is I would lock down everything. Normally, I don’t like freezing time in the system, right? But I’d probably start with the common quick fixes that I can apply, like “Hey everyone, it is now 2019, August 3rd, it’s a Tuesday, and it’s 11:33 PM, and we’re just going to lock that whole server down that way”, whether we’re using a test tool to do that or Unix. “Hey, everyone, we’re also going to no longer randomize the test order, we’re going to use this particular seed that is known and good, that we’ve seen work.” “Hey, everyone, we’re going to change all of the directory globs that are currently in an unspecified order in Linux, and that’s why Linux builds are failing, but on our Mac where it’s the alpha order, those are all passing.” Like, “Hey, we’re just going to do a sort on all of those glob requests everywhere, so that we’re just loading alphabetically, because it doesn’t really matter…” And we’re going to like go through the half dozen or so quick hit things to just try to get to consistent builds as soon as possible. And hopefully, that’s one person one day, if ever.
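The three quick fixes mentioned here (a frozen clock, a fixed seed for test order, and sorted globs) could be sketched in Python roughly as follows. The names `FIXED_SEED`, `FROZEN_NOW`, `sorted_glob`, `seeded_order` and `FrozenClock` are hypothetical, and the specific seed and timestamp are arbitrary stand-ins; a real team would more likely reach for whatever its test framework already provides.

```python
import glob
import random
from datetime import datetime

# Assumed values for illustration: any known-good seed and any fixed instant work.
FIXED_SEED = 1234
FROZEN_NOW = datetime(2019, 8, 3, 23, 33)


def sorted_glob(pattern):
    """Return glob matches alphabetically instead of in filesystem order."""
    return sorted(glob.glob(pattern))


def seeded_order(test_names, seed=FIXED_SEED):
    """Shuffle with a dedicated seeded generator so every run uses the same order."""
    rng = random.Random(seed)
    order = list(test_names)
    rng.shuffle(order)
    return order


class FrozenClock:
    """A clock that always reports the same instant.

    Code under test receives this object instead of calling datetime.now().
    """

    def __init__(self, now=FROZEN_NOW):
        self._now = now

    def now(self):
        return self._now
```

None of this explains why a test is flaky. It only removes three common sources of run-to-run variation so the build can get back to consistently green while the real causes are investigated.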

Yeah. What does the consistent build mean? So we said one in five passing; very bad, very inconsistent. What does a consistent build mean to you?

[47:53] For me, the ideal and the asymptotic goal that I would have is anytime that I saw a build fail, it means that something is broken in the application. And by the way, I think this is actually the popular notion that managers who are told about testing and see a build on the wall - like, their intuitive notion is that if light goes red, it means that the system that they are ostensibly paying to have built is currently in a state where it does not work, right? That’s intuitive. But if you’re like this team that has become habituated to this environment, where things just break randomly, those developers will have lost confidence that red means that anything is broken at all.

But the business person is still thinking, “Wow, there’s like a lot of failures, and so it’s time well spent to go and fix that brokenness”, because like in their mind, in the business person’s mind, like any time spent fixing the build is time spent making my system work when it didn’t work. And so that seems valuable. But if you were to tell them that 95% of the reds in the build that were distracting the whole team were just bullshit implementation problems in the way that we wrote tests, because data gets polluted from one test to the other tests, depending on all sorts of different things - that business owner will probably be rather upset, right?

And we should be the same kind of upset, right? So the flakiness is one thing. I would only want my test to fail if something was actually broken. And I would go a step further and say, “I only want my build to fail if the production code doesn’t work.” So if a test was just somebody forgot to update the test, I don’t want to see that in the build. I want people to run tests locally. And if they’re not running tests locally, I’d want to figure out why. And then I want to make it fast enough so that people do that and they have the tools to want to do that. And so that’s the best answer I have to your question.

Yeah. What about – so there’s a follow-up which I think complicates things and makes them more real as well… What about a test suite that has to rely on integration tests, because the software that it tests is really complex, and you have to do black box testing… Because a lot of the stuff - like, you’re testing the correctness of the system at scale. So how does the system break, for example, when we expect it to break? So that’s one aspect.

The other aspect is not everyone can run it locally, because the stuff that the system has to provision, the setup is too big; it won’t fit on a development machine. It needs multiple machines just to basically orchestrate the system as a whole. That may be an over reliance on integration tests, but this type of knowledge to go to something like TLA+, spec-based testing - and there is another one which I’m blanking out on; not feature-based testing… Property-based testing. To do that type of testing, it requires a special type of knowledge and a special type of approach, especially when it comes to like a heavy data system.

So there’s that aspect, but the other one is around different CI systems flaking in different ways. So the same test runs in two separate CI systems, and not the same tests fail the same way in the two different CI systems. There’s nothing wrong with the tests; there may be a timing issue, but more importantly, it’s a resource contention issue. So what would you do in that case? Because it’s not the order of anything, it’s not like the globbing stuff, it’s not – nothing that you can do, like the simple fixes; it’s just like the sheer scale of the thing. And maybe a lot of approaches over the years, which maybe weren’t as good… So you’d think this is like a mature codebase, like a decade plus, which tends to happen, which happens to be a lot of the software, which gets more complex, more brittle, more – you know, you just have to tend to it.

There’s another aspect to what we were discussing earlier, about this sort of boolean mindset that people have - is it tested, is it not? And one of the things that ideology has led most teams, at least the majority that I run into, to conclude is that there is a single bucket for every app called test, and you just put tests in it, and you’re lucky if it has directories that are nested underneath there, in terms of organizing the tests.

[52:11] And there is a default sort of assumption – even on, I would say, highly competent teams, you might be able to expect that there are unit tests that will indeed run locally, of most things that are added, and there might be like one integration test that maybe will run the whole application and just prove that a feature works. You know, if you added widgets to an existing application, maybe you would expect that integration test to create a widget, read some widgets, update a widget and delete a widget, right? And just, again, do that CRUD flow for N times, for N features over the life of the system.

There’s two things that make tests very, very expensive to run, that you hit on. One, on the first bit is this logical organization failure on our parts. So what I would say is - okay, let’s say that you’re building a system that interacts with an early client of ours, interacts with the paging network on electrical grids to communicate to thermostats, to make them go up and down when it’s really hot outside in Texas. Like, we could, if we wanted to, write every single test, with the assumption that a thermostat on a breadboard with a serial number is plugged in and powered on and on this network, so that we can like actually interface with it through every test, even the tests that have nothing to do with that particular integration, even if our architecture could be built in such a way that we could just sort of like have a driver to that thing that we could easily mock out and like get away from that… But if we walk into the assumption that we need a maximally real test, every single thing this system does, we need to be able to prove scientifically that like we’re going to be able to go completely end to end - like, if that’s your orientation, whether explicitly, like you’re buying into it, or implicitly, like I don’t know, we’ve got a test directory, and every test just needs to – at least one test needs to talk to this thermostat. So we just have to assume that they all do… Like, that is a failure to organize based on the constraints that you’re under.

And so what I would encourage people to do is have multiple test suites and work backwards. So like what’s the most resource-contended environment that you might have? Maybe it’s spinning up 100 different servers and so forth, they are operating under a particular scale… Like, great, we’re going to do the bare minimum amount of testing to achieve confidence there, and then we’re going to break out a new bucket where you don’t have access to that, and everything else will default to that until we can find another really expensive resource-contended thing to do, and then we’re going to try to increasingly make the default place where people put tests into the least integrated, least complicated, necessary infrastructure for them to work.

So that’s, I think, at the end of the day, one of the answers to, I think, all these questions - you end up with trying to maximize the number of isolated units that can be tested in isolation, where you get really straightforward, not only fast feedback, but the feedback tells you exactly what line the failure was on kind of tests, and some number of locally integrated, everything’s talking to everything else, but all inside of that monorepo… Some number of contract tests that will actually just go and like validate assertions against a running instance of some server that you integrate with, some number of like driver style tests to that kind of thermostat… And then, you know, a hint, maybe just one, just a golden path of like “Hey, when we turn this all on in the real infrastructure that we really have, can I make a user and log in?” And maybe that’s the only thing that you actually needed to prove in that fully plugged together state. But then you’ll sidle up to teams where they have 1050 unit tests, and you add one marginal unit test just to make sure that emails are formatted correctly… And that’s running up and down the stack, and now the base time that is each individual test, like if we run like logically and serially, is like four minutes long, and every single time you add anything, it’s like this really outsized cost.
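One way to picture the "multiple test suites" idea is a small registry where every test lands in the cheapest bucket by default and has to opt in to anything more expensive. This is only a sketch in plain Python: the `Tier` names loosely mirror the kinds of tests listed above, while the linear cost ordering, the `suite` decorator, the `select` helper and the example test names are all assumptions made for illustration.

```python
from enum import IntEnum


class Tier(IntEnum):
    """Test buckets, ordered from cheapest to most expensive (assumed ordering)."""
    UNIT = 0               # isolated units, run locally
    LOCAL_INTEGRATION = 1  # everything wired together inside one repo
    CONTRACT = 2           # assertions against a running instance of a dependency
    DRIVER = 3             # talks to real hardware, such as the thermostat
    FULL_STACK = 4         # golden path on the real infrastructure


REGISTRY = []


def suite(tier=Tier.UNIT):
    """Register a test in a tier; the default is the least integrated bucket."""
    def decorator(fn):
        REGISTRY.append((tier, fn))
        return fn
    return decorator


def select(max_tier):
    """Names of the tests that can run with at most max_tier infrastructure."""
    return [fn.__name__ for tier, fn in REGISTRY if tier <= max_tier]


@suite()
def test_email_is_formatted_correctly():
    assert "{}@{}".format("user", "example.com") == "user@example.com"


@suite(Tier.DRIVER)
def test_thermostat_driver_responds():
    pass  # placeholder: would need the real device on the network


@suite(Tier.FULL_STACK)
def test_user_can_sign_up_and_log_in():
    pass  # placeholder: the single golden-path check
```

With this layout, `select(Tier.UNIT)` returns only the email formatting test, so adding one more unit test does not drag the thermostat or the full infrastructure into every local run.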

We have time for one last question, and it’s going to be a quick one…

…and I’m hoping, more importantly, a fun one. I’m curious how you describe, according to you, which is the most impressive Olympic event you’ve seen come out of Tokyo so far?

Alright, I study Japanese language, and so the only reason I have an answer to this is because I watch the Japanese news every time I’m on my bike. And so the most impressive thing that I saw was a 13-year-old young woman from Osaka winning the gold medal in skateboarding… And to see the level of excitement that had generated, because I believe she’s now the youngest gold medal winner in Japanese history, especially in a new event. So I thought that was pretty darn neat.

That’s great. I was thinking something else, but maybe we drop that in the show notes, because it’s too funny… I know that you could not have described that, but that’s what I was hoping would happen. That’s okay, it’ll be in the show notes, you can check it out. This has been a pleasure, Justin. I think we need to do another one. I mean, this just got me started. There’s so many more questions that I have for you. I’m looking forward to it. Thank you very much.

Absolutely. Take care. Thank you.


Our transcripts are open source on GitHub. Improvements are welcome. 💚
