This week Gerhard is joined by Justin Searls, Test Double co-founder and CTO. They talk about how to deal with the pressure of shipping faster, why you should optimize for smoothness, not speed, and why focusing on consistency is key. Understanding the real why behind what you do is also important. There's a lot more to it, as it's a nuanced and complex discussion, and well worth your time.
Expect a decade of learnings compressed into one hour, as well as disagreements on some ops and infrastructure topics - all good fun. In the show notes, you will find Gerhard's favorite conference talks Justin gave a few years back.
Sponsors
Fly - Deploy your apps and databases close to your users. In minutes you can run your Ruby, Go, Node, Deno, Python, or Elixir app (and databases!) all over the world. No ops required. Learn more at fly.io/changelog and check out the speedrun in their docs.
LaunchDarkly - Ship fast. Rest easy. Deploy code at any time, even if a feature isn't ready to be released to your users. Wrap code in feature flags to get the safety to test new features and infrastructure in prod without impacting the wrong end users.
SignalWire - Build what's next in communications with video, voice, and messaging APIs powered by elastic cloud infrastructure. Try it today at signalwire.com and use code SHIPIT for $25 in developer credit.
Grafana Cloud - Our dashboard of choice. Grafana is the open and composable observability and data visualization platform. Visualize metrics, logs, and traces from multiple sources like Prometheus, Loki, Elasticsearch, InfluxDB, Postgres and many more.
Notes & Links
Justin is giving the keynote at this year's Reliable Web Summit. The topic is why distrust is at the heart of a lot of the issues we discussed and how to build trust as an individual, small team, and organization.
- It all started with this Tweet
- 5 for 5000: Find your Leading Indicators
- The Selfish Programmer
- Running a business, demystified
- How to program
- The most impressive Olympic event I've seen out of Tokyo so far
Justin Searls: If you're a programmer and you're happy with the work you're doing, you're growing in the ways you want, and you feel pushed to do your best work as opposed to work the most hours, congratulations! That's all too rare.
Transcript
Play the audio to listen along while you enjoy the transcript.
So in my career, I have been part of many teams that just sling code, or features, or business value, depending on who you talk to. But sometimes that did not feel right, just slinging code, slinging stuff. Yes, you should ship and learn quickly, very important... Constantly challenge your assumptions, very important. But there is such a thing as doing it right and fast, and doing it badly and fast. So what is that difference, Justin? What do you think?
Yeah, and that's the sort of - I've been on both sides of this conversation; as an entry-level developer, feeling like I had just an infinite amount of pressure, both from on high, wanting more things shipped faster than was physically possible, pushing constantly to just get features out the door, or to get this thing delivered, where there was a failure to communicate between me and the people managing me... Especially early on, I didn't know how to discuss things like software complexity or where my time was going. And to feel that pressure coming from above, feeling it kind of like sympathetically through my peers, who were feeling the same pressure and kind of pushing on one another to try to make that pain go away... And then the personal pressure on myself, where I was literally starting from a place of incompetence. And by incompetence, I mean could not independently build the thing I was being asked to build without significant help, significant research, significant learning.
[04:06] And I'm at a point now of relative competence, but it's taken me 20 years to realize the software that I want to build as I build it. But until I got to that point, I needed the safety of being able - psychological safety, as well as the vulnerability, in like a social term - to be able to communicate with people around me about like "Hey, I need time to figure this out." Or it needs to be okay for me to ask a question about how this works.
And so, in the beginning of my career, I viewed your question of just slinging code versus getting stuff right, almost entirely through the lens of these social pressures that others placed on me, that I imagined others placing on me, and that I placed on myself, and it was very difficult for me to escape that.
Later in my career, as I started to move into either non-technical roles, or helping teams in a way that was purely advisory, you'd see teams that even in the absence of pressure, they would still really struggle to get any kind of traction towards delivering anything.
And I would talk to very well-intended VPs of engineering or CTOs about "How do I, without downstream pressuring people, and giving them deadlines and cracking the whip, so to speak, get the outcomes that I want?" And the answer, then and now, seems to be that the autonomy needs to be met with some sort of healthy alignment, drive, engagement, excitement, positive energy around like just wanting to accomplish the thing together as a combined group. And unless those motivations are both present and healthy and rewarded and aligned, you can really struggle, I think, as a team, to find a good cadence.
I think there's a reason why we keep talking about words like velocity, speed, "How fast can we go?" And I think to somebody who's new, they might think that that's all about how fast you can type, right? Or how fast you get features out the door. But really, I started to think about it in terms of not speed per se, but fluidity; how much friction is there day to day in people's lives, and how organically are they able to take an idea, communicate it into a product feature, or aspect, or stakeholder concern, and then prioritize that and get it scheduled and worked on and delivered and shipped into production, and validated, and so on and so forth? How smooth of a process is that, versus how fractious?
And if we're going to optimize for one thing, it's probably smoothness over speed, per se. And it's difficult, because it sounds a little bit like woo, I think, to both developers who just want to focus on the technology, and to managers who just want their project done yesterday.
Yeah.
So I don't know... Long-winded way to maybe not answer your question.
No, I think that was a very good one, because it just showed how much complexity there is in that answer. And this is complexity that comes from experience, that comes from the real world, all the situations that you have been in personally, and I know that many can relate to you.
What I can relate to the most is that velocity. It really doesn't matter how many points you deliver in a sprint; it's not about that, it's about how you can keep that consistent, not over a few weeks or a few months, but across years. Over a couple of years, how can you consistently maintain a speed that's healthy, that you can build on top of? That complexity, when it comes - because it will always come - it doesn't affect that consistency. That is what a healthy delivery mechanism or delivery team looks like to me. It's never about how many points, it's about month-on-month, year-on-year, can you keep that up? And if you can do that, well, the sky's the limit.
[08:16] I think to this, there's another thing which keeps coming up very often - going in the wrong direction, regardless of the speed, will always be wrong. So what would you say about that, about knowing where to point teams, especially the ones that have to collaborate?
Yes, that's a great question. And I think that a guiding light for me on the most successful teams that I have either been a part of or that I have witnessed, has always been a shared and common just understanding of what their purpose was. So I was part of an organization, a consulting company, just prior to founding Test Double. So we founded Test Double in 2011, so it was like 2009-2010. And they were in that era where it was known as like an agile software consultancy. And so they were peddling, pushing their own kind of blend of agile engineering practices like Scrum and extreme programming. But they did an interesting thing in their sales process of really pushing business value.
And so if user stories rolled up into like Epics - Epics, in their sales parlance and also in how they practiced and delivered, would roll up into business value stories, or value stories. And we would start each engagement by actually getting the whole team in a room - developers, QA, product owners, business stakeholders alike... So it wasn't behind some secret veil of like a product organization. I didn't even know that might be considered desirable in certain organizations; I was sufficiently naive to this experience. And what was great about it was we would just have an open and honest put up on the board, like "Hey, executive or stakeholder or person who brought us in here to like build this thing, how is X, if delivered as conceived, going to make or save your company money?" And just boil it down.
And first of all, a lot of executives, it turns out, are uncomfortable with being put on the spot to answer what should be a simple question such as that... But when you really sat with it, and as a team forced the conversation out, and then you followed through, not just on - I don't know... Here's a project example that we did - currently, our system is so slow that sales reps who go to restaurants to sell food supplies end up just spending multiple minutes waiting for pages to load, and they could hit three or four more restaurants a day if it was fast. And that would result in like X dollars. And we'd follow through and be "Okay, so what is X? How would you measure X? How will we assess that X has been attained after we've delivered it?" And not only in the kind of initiation phases and discovery of the project, but how will we, on an ongoing basis, track that as the primary metric for success for this project, as opposed to arbitrary story points, right? Because there's no way to know whether you're going in the right direction or the wrong direction if you don't have a shared understanding of what the point of the thing that you're building is. And most software teams don't know what the point of the thing that they're building is. Or in this day and age, to know it would be to not want to work on it anymore... You know, whether for ethical reasons, or just because a lot of the stuff that gets built these days is kind of slimy.
And even though, in my practice at Test Double, our clients - they work on fantastic and wonderful products - I think that we have sort of been ensconced in this default relationship where product throws over "Here's the features that we want" and "Here's the things that we need." There's a disconnect at the developer level, at the team and engineering level, where we lose sight of, or aren't really bought into, or aren't really included in the discussion of "But why?"
[12:11] And I have seen teams where developers know the answer to "Why", and when a product owner says, "Hey, here's how these comps should go... And you click this and then you click this, and then you click this", a developer who knows what the ultimate goal is, in terms of like business value or whatever overall organizational objective is trying to be met, can successfully have a real two-way discussion with that product person and push back, or offer alternative ideas, or even find shortcuts that would make things faster. And in the absence of that, everyone just becomes an order taker... You know, I receive these marching orders and then I go and build the thing. And I think that sleight of hand is what actually facilitates and enables a lot of the negative externalities that we see in our industry.
This resonates with me at many different levels. I've seen a lot of what you've just said in Pivotal Labs. I've seen this in the IBM Bluemix Garage. These are the things that, you're right, were the most important ones from the beginning. I like the engagements, that engagement mentality, I like the focus on business value and customer outcomes... So all that makes perfect sense, and that is a very powerful reason to do things and to ship software. And you can correlate those points to business value. That's amazing.
However, I've also seen a different side of the coin, where you're working on software that gets shipped and others get to use, to implement their own things - for example a database or a proxy, or whatever else - but it's more technology-oriented. What do you think the equivalent why and the equivalent business value is in that case?
If you're building a developer-focused tool - and this could be a paid database, like a Snowflake, or something like that, or an API... Or it could be open source and it could be completely free - I think it's still important to understand that when developers are your customer, they are still human, and should probably be treated much the same way as a naive non-technical user of a software system that serves naive non-technical users all the time.
In general, I suppose - to clarify your question, are you asking specifically about how this applies when the overall objective is less about making money and more about meeting somebody's unmet need with technology?
Well, I think with the software that we write, everybody's trying to make money. But I think sometimes the relationship between making money and writing the software is clearer... Such as when you write business-facing, customer-facing products. But if you have a software that you build that then gets used, and then you have, if you imagine, services attached to that.
So let's take, for example, MySQL. Let's say that you're selling MySQL and you're building MySQL. I mean, sure, you have the licenses that MySQL has, or maybe you have a service that you offer which MySQL is part of, but then the value is less clear, because you're not building the software to, as I said, sell licenses. Someone is using it, as part of the service, to deliver value to other users. And in that case, I think the value is less clear. So do you see it differently?
I think that what you're describing could be phrased as like a different vector, where some products are just obvious. Like, if I was hired as the Chief Product Officer of a company that made branded sweatpants, and you could put like any college name on those sweatpants that you wanted, my job as a product officer would be pretty straightforward, right? Like, you can already imagine what that app would look like.
[16:16] Yup.
And so I wouldn't have to really provide a tremendous amount of detail and subject matter expertise. If my business were to be all about and focused on - and I'm hired as Chief Product Officer to facilitate - the FDA approval of highly regulated pharmaceuticals, that job sounds a lot harder, and I hope it pays a lot more, right?
Yes.
So I think the same holds for kind of what you're saying in terms of if I'm building a database engine. That is a very, very challenging product category, because it requires - and when you think about it, what are the things that are in common between that and the pharmaceutical case? It's like, tremendously deep subject matter expertise, and probably a lot of vision, some big dream that a product person can articulate and get other people on board with and break down into smaller reducible problems... And sometimes our wires get crossed, because I think developers and software people, because we are users of a lot of the stuff, we're able to dogfood - like, use the tool as we're building it, you know, sometimes in a bootstrapped way, to build the tool itself... We can sometimes underrate the value of smart, thoughtful product as it pertains to technical solutions that we ourselves could very obviously see ourselves as consumers of.
I think that makes a lot of sense. And it just goes to show that sometimes the complexity in the code that you build, and everything around it, can make it difficult to answer that "why". I mean, you should still do it, it's still very important, because if developers, software engineers, whatever you want to call them, are detached from the "why", why they do what they do, then how can they find all the good things that make what they build good? And how can they get excited about it? How can they be creative and innovative about their work? So I think they go hand in hand and they're very, very important.
Totally.
Okay. If you were to describe a development pace that feels sustainable and healthy to you, what would that look like?
You know, that's a really interesting question, because for me - and it might just be a function of getting older, of being around the bend a certain number of times, on the cadence of different projects... I used to, especially earlier in my career, feel the ups and downs that came with software development a lot more intensely, and I got married at the beginning of my career...
On Monday, say, I would grab a new feature, and I would immediately feel overwhelmed, and I'd feel like I was drowning in complexity around all this stuff that I didn't know, and I would just panic. And on Monday night, I'd come home and my wife would see me in this state and she'd try to console me, right? And then on Tuesday, my asynchronous brain would have a chance to think about the problem, chew on it, and I'd make some kind of forward progress, somehow... And I'd feel the wind at my back, and I would feel hope and inspiration. I'd come home that evening and my wife would see me in a better mood, and you know, she'd be like, "Oh, great. He's better. He's over this hump." And then by Wednesday I'd run into another blocker and I'd be in the same pit of despair again.
So what I noticed early on was that I'd have these really high highs and really low lows, and enough so that other loved ones in my life were able to kind of predict my mood based on what they'd seen from me the previous two or three days.
[19:58] And I say that because, to answer your question - I want to acknowledge and recognize that there are aspects to this work that are deep and creative, and require a lot of asynchronous chewing to successfully build and see the right solution. So even if you could just, to your point, sling stuff really, really fast, sometimes features are a little bit better if you just take a more deliberate pace and allow yourself an overnight. Right now I'm in a role where I'm kind of split between duties, and I've found that it's actually been really nice that I have a few focused hours to work on software in the morning, and then I get racked with a whole bunch of meetings, but in the asynchronous time where I'm not explicitly thinking about it, I can come at it the next day and have like a gust of inspiration. And if you think of the stuff that you write as being not just an inevitability of like percent to complete, but that the outcome actually changes based on a whole bunch of stuff that goes on in our brains that we don't really understand - I'm almost trying to... and I feel like I'm almost, you know, describing some sort of acoustic singer-songwriter... You know, get a particular vibe going... But I feel like what I would want to capture on a multi-person team level is a sense of that same sort of productivity, right? Like, you should feel challenged, you should end some days feeling like you're up against the wall, and you should have enough time to give things a little bit of space to come at them from different angles the next day. But if like a feature is taking, I don't know, a week or two weeks, other human factors sink in. You might just feel disillusioned, or disengaged, or dispirited, and other "dis" words.
And so I think that there is a boundary almost on like us as biological organisms. There's probably an answer there, of different spectrums for different people, for sure, but there's probably something about the cadence of just the way that our brains work, how we exist as social creatures... And that's probably where I'd start digging to give a good answer to that question, which is probably very unsatisfying for a lot of people.
Break: [22:21]
So I've noticed, Justin, that you started a Twitter poll recently. And the Twitter poll asks this question: "Has the emergence of DevOps sped up or slowed down teams' ability to deliver software overall?" That was an interesting question. I'm wondering what the responses have been so far. I know we'll still go for another hour, but first of all, what made you ask that question, and what are people replying?
Yeah, because I have not the most healthy relationship with distractions throughout the day, I have to admit, I've only kind of glanced at a few of the replies... But the reason that I asked the question is because I think a lot about how the advent of sort of mainstream open source software - and that began, I think, in the mid-aughts, when... You know, I experienced it in the Java community, because of what the Java vendors were selling; enterprise Java systems and stuff were not particularly well designed or usable, and it created an opening for a lot of open source Java tools, chief among them probably Spring, and the Spring family of brands - to be the first thing that a lot of people in large organizations used... And that was open source.
[24:35] So I went down that rabbit hole - of course, it was still incredibly hostile to actually try to contribute to these things, and if you weren't a Unix hacker who was super-comfortable in mailing lists as a modality for how to communicate with humans, it was not at all welcoming. But the advent of GitHub, of course, changed all that. You know, once you got over the hump of learning Git.
So you had 227 votes so far...
Yes.
I was one of them. And the majority, 44.5%, are saying "Sped up." That's what the majority thinks, and that's what I voted for as well. We may publish the results in the show notes at the end of the poll, so check them out when the episode comes out.
Okay. So do you think that DevOps - but, more importantly, the automation that seems to be abundant these days - do you think the automation made things better, or do you think it made things worse for shipping software?
So DevOps, just like so many things in open source, became a hot and trendy buzzword that was heavily marketed and associated with either products or sort of halo projects when it comes to recruiting in like big tech companies. And the original idea was that DevOps would be like test-driven development: if you just gave developers testing, they would incorporate it into their team room, they would automate away a lot of the pain around testing and quality assurance, and then the intrinsic quality would increase at a marginal decrease in that team's ability to deliver things quickly... In part accelerated by the fact that they no longer had other people to have to communicate requirements to, so that things can be tested. So, like the theory went, if we just did that with operations, we would get the same lift.
And to me - I had that experience, and it was called Heroku. You know, the most DevOps thing that I had ever used in my entire life was being able to say "git push heroku", not have to think about my operations at all, but know that it was like taken care of, that I had answers to every question about scale, and about adding on additional components, without necessarily having to turn it into my side hustle or my day job or my identity.
But DevOps as a term has changed, as I think the Agile era of the aughts sort of undervalued and played down the importance of operations as a practice. I think a lot of the people who are the Linux sysadmin archetype of the late '90s might be seen as sort of getting their comeuppance now, or their day in the sun, with lots and lots of new innovations and technology that are focused on meeting the same kind of just core desires... You know, some of it's like "Hey, how big can we make things? How fast can we make these? How can we automate all of these very fancy and cool, but maybe a little arcane and unnecessary at small volumes and scales, like orchestration of lots and lots of real and virtual systems up in the cloud?" So DevOps and automation tools have enabled and empowered lots and lots of really cool stuff.
And my experience of "I just want to be able to 'git push heroku' and have my app work in the cloud and not have to worry about it ever again" is, I think, still the pinnacle of what I would want as a developer. And of developers that I've talked to that have had that experience in real life, they all wish that we could still have that.
[28:10] And Heroku still exists and it's still a thing, and I love the people there and I love the product, but clearly, it's not a flavor of answer that the market is searching for, because everyone thinks that they're going to need Google-scale and Facebook-scale kind of tools for the job that's in front of their very straightforward CRUD app, with very few users. And this is all of a piece with sort of startup culture, that everything needs to be a billion-dollar unicorn to be valuable, and so you have to presume the conclusion that of course they're going to reach that scale, so then you may as well just on day one reinvent the universe in AWS through all this automation.
So DevOps as an overall meme in the industry I think has been net negative, and has slowed down a lot of teams by way of distracting them - the fact that teams now have to hire a certain number of quote-unquote "DevOps people" to full-time just keep the hamster wheel of their cloud-based computing spinning, whereas before you might even have had an on-premises server that was just sort of sitting there and was just on and worked...
That's what I think, in spite of the poll results - like 44% of the people saying "sped up"... I think some percentage of those people are just people who really geek out about DevOps technologies and kind of don't care and are just team pro-DevOps... And some percentage are just people who like living in the ideology that we live in and probably just never had the experience of what if you could just set it and forget it and not have to worry about it again? Because if it's a means to an end, why would you want the thing that required a ton of effort and thought and complexity and specialized skills and so forth, and constantly having to read up?
So I'm coming across as pretty anti-DevOps here, but I think that when you look at the replies, the number one point of contention is that no one has a shared understanding of what we mean by the word "DevOps". And so just to focus on automation here - yes, I love real automation, but I don't think that what we're typically describing around DevOps-related activities is actually automating anything, in terms of actually automating away a problem.
This specific question is something that I'm really passionate about, because I am in the DevOps camp, but for other reasons. So it's not about the technology. I mean, there are some aspects of that, just to see how things are changing and how they're improving... But I understand it at a very fundamental level, since I have been involved with it for - as you mentioned, in your case - 20 years, but my focus has been infrastructure. And I live and breathe it on every single team; I went into the Puppets, into the CFEngines, into the Chefs, into all of those infrastructure as code and configuration management tools, and so on and so forth.
One thing which I would like to say, the first thing, is that git push is the pinnacle, you're right. And that should not change. Changelog.com, the setup itself, has always been git push. We use Ansible, we use Docker, we're on Kubernetes now... We'll be using something else before long, I'm sure of it. It has always been git push, because that is the golden developer experience - push it and forget about it. It's all the stuff that happens afterwards that makes a resilient system in production, and I think that's what a lot of DevOps folks, or many DevOps folks, forget about, because they get distracted by new and shiny, or "Let's just keep changing things."
I see a lot of parallels between test-driven development and testing, and DevOps and infrastructure, where you can do things right or wrong, and the outcome will be a result of your perception, of your principles, and eventually, your skill set as well.
So what I can say is that if your users are happy, latency is low, all the requests are going through, nothing is lost, data isn't lost - you're doing something right. And as long as developers - who, by the way, are also users - as long as you can just git push and show them what is happening at all levels, whether it's testing, whether it's performance, whether it's regressions, whatever it may be, and eventually running in production... As long as they have a good understanding of how the system works as a whole - well, you've achieved your task.
[32:12] So you're right, Heroku had something for it, and there's many things that have happened afterwards. But to be honest, not everybody cares about these things, nor should they. They shouldn't really care; they should just git push, or just use the service and be happy. That's the end goal, to simplify it.
So let's switch focus to something that I know you have a lot of experience in, which is testing. Just as a lot of advice out there about DevOps is bad, I know that a lot of the advice that's out there about testing is bad. Why is that?
Yes. So in trying to connect the two themes, what you just shared about DevOps is 100% true and matches my experience as well. And where the analogy between the two struggles a little bit is that if I want to have that git push Heroku experience and it costs me $30 a month, it is very difficult, I think, to be like a human who works on infrastructure and do literally any amount of customization or custom stuff, and compete with that on price. But because of the way that we consider the cost of software development - like, at a lot of companies out there, as soon as you're a full-time employee, your marginal cost on an hourly basis is $0. It's like they become blind to just the actual expense of people's time.
So I think of the failures of DevOps as being a failure to recognize the time sink that a lot of teams find themselves dumping lots and lots and lots of hours into, when there are commodity services that, if you would only adhere to a set of conventions, would get the job done close to as well, or as well.
Testing has kind of the same core fallacy, in that we talk a lot about the activity and the importance of it in a sort of boolean state, like "Are you DevOps? Are you not DevOps? Are you in the cloud? Are you not in the cloud? Are you tested? Are you not tested?" - as opposed to the degree of sophistication, because these are secondary concerns to building a product that does the thing that it's supposed to do. No one really has the mental and emotional bandwidth to consider, "Am I DevOps-ing good? Am I testing good?"
So simply, there's usually some - if not a person, like a mood in the team that's like "In order for us to be a moral and ethical and upstanding team, we should be able to check this box or that box." And so I want to check the box that I'm doing DevOps, and check the box that I'm testing. And when we consider the bad advice about either, it's often coming from people who either are operating under that sort of simplistic notion, or for some reason have an incentive to enable and perpetuate it.
And so what I think about the failings of either are when the team lacks an appreciation for the overall total cost of ownership, the overall return on investment of where their time is going, and what they are getting for that time - or, you know, in terms of AWS, or if you're running a bunch of servers somewhere to automate your CI build, money. And if you appreciate that, then you can have a lot of really fun and interesting conversations about testing. How often are you seeing failures? When you see a failure, does it indicate an actual bug? Does it indicate somebody forgot to update a test? Is it brittle and flaky? Like, how long does it take to fix them? How many places do you have to fix the code or the tests in order to get back to a passing build? How much time is lost waiting to run the tests locally? Do you run the tests locally? Do you run them in the cloud? And if you run them in the cloud, how long does it take until you get notified? And how many people get notified? Does the whole team get notified, or just one person?
[36:00] And unless you know, in a quite data-driven way, the answers to a lot of these questions, general context-free advice that you see about the right way to run a test or the right tool to use is not necessarily going to help put you closer to the end goal, which is like the tests serve the team to accomplish what they were trying to do either better or faster.
That's a great one. That's a great one. I will have to do something - go back to the DevOps slot; I just can't leave it. Let's put it that way, I just can't leave it.
So DevOps and automation - let's just talk automation - is something that once you get to a certain... I wouldn't even say team size; a certain complexity, a certain maturity - you have to do. And yes, you can delegate all of that to some service provider. But knowing how the service provider works, and knowing how that service provider integrates with other service providers, whether it's DNS, whether it's certificates, whether it's backups, whether it's migrations for example, whether it's a distributed database... Because they do fail; all these systems fail in weird and wonderful ways. What about your CI system? Even if you use a managed service, every single one of them, in my experience, has small quirks.
So having that operational knowledge of how these things work, and how they integrate, and what happens between your git push and the code actually ending up in production. And what happens between patching all the stuff that needs to be out there. And maybe - you know what, maybe it's just your code dependencies. But what does all that automation around the code look like, to actually get the value continuously out there? And when something is wrong, detect it, notify on it. And this is not just tests, it's everything else.
So this is like operating your software; there's a lot of knowledge, even if you're using every single provider under the sun, and you delegate, you offload all those tasks - they still combine at some point. And whether you know it or not - I don't know of a platform that does it all, because they can't; it's just too big. "I've solved the operation of software" - you can't say that. Just as you can't say, "I've solved testing. Every single type of testing for every single platform."
So there's a lot of detail in how stuff runs and how stuff gets out there. And what happens when things fail? Because they do fail. How do systems degrade? So it's more of that operational knowledge that I think you have to have, that you need to automate around, so that things are easy, so that things are resilient, so that they fail predictably. And that is the DevOps and the automation. There's a bit of SRE there that I think about, which is a lot more complex than git push and running it somewhere - "I don't want to care about it"; the database, or the load balancer. "I don't even know that I have a load balancer. Just take care of it, Heroku." It's a bit more complicated than that, because there's all those other elements that make the big picture.
There's a burden of knowledge and experience that I bring about testing, and that you bring about infrastructure, to each new thing that you do or team that you join. And one of the things that I think we, as an industry, have failed at - especially as we have created more sophisticated tools on every front, whether those are frameworks or language ecosystems or dependencies, or memes like DevOps or TDD - is that we haven't done a good job, in spite of the fact that there are indeed very good tools for getting started with a brand new thing and slinging some code and proving out a concept, very fast. And there are a lot of tools for how to, with enough time and person power, rebuild Google's infrastructure at scale.
What we fail to appreciate, I think, are the inflection points, or really the step function - what, in my brain, I envision as a literal cliff - of what do you do when you're transitioning from being small enough to be able to use a commodity service and not really care about it, so that you can focus on the thing that you're building, versus the stage two or stage three of the rocket, where after some gigantic chasm, "Oh yes - " in your example, "We can't use Heroku anymore, so we have to throw that entirely out, and now we have to reinvent the universe while we're continuing to operate, suddenly owning all of these things."
[40:20] And so I think about that in terms of slinging code and testing too, right? Like, if you're able to build a proof of concept, get something out the door, there's no tests at all... The same would go for applying rigorous architecture and design principles to the software. And the same would go for, let's say, to go faster, we build server-side rendered traditional HTML templates, with variables and stuff stored in a session or something, as opposed to a single-page app that's built in JavaScript that might have a snazzier user experience.
We might have done all of those kinds of things early, shed that complexity, to get out the door as fast as possible. But in each of those cases, once we reach that breaking point - like, if I've got a server-side rendered application, I can't just flick a switch and then remake it as a single-page application, just like I can't snap my fingers and have sophisticated DevOps. Or if I have like a big mess of spaghetti code all over the place, I can't just overnight refactor that into well-formed, well-considered units of code, or write a test suite that is going to be well paired at appropriate levels of abstraction up and down the stack in terms of everything.
And appreciating that when we talk about scale, we are not talking about twisting a knob up, or just getting more revenue... Like, we're talking about very specific inflection points where you have to start caring about those deeper levels of knowledge that you're speaking about. And that's where I think there's a lot of a failure in our community and our industry, to put a name to those things, and to actually have patterns that are successful for helping teams navigate those transitions.
You asked me before we started recording what I hope to achieve with this podcast. That's one of the things. How do we share more of that? How do we bring those nuances out? How do we have those discussions and figure out how stuff is changing, and how do we need to adapt to those changes? What makes sense for our specific setup? Heroku may make perfect sense; Kubernetes may make perfect sense. We just don't know, because it's all specific, and guess what - you have to figure it out for yourself. We can help along the way, maybe simplify some of the choices or make them clear, but at the end of the day, you have to choose, and you have to combine those choices long-term. And it's not a one-off. Continuously. And that's what makes this really challenging.
Let's take a very specific example about what I think is a test suite gone wrong. Imagine, Justin, that you have just joined a team of nine developers, so you're developer number ten, and they're all working on the same monolithic codebase. This team has constant test flakes, which means that the testing part of the pipeline that gets code into production keeps failing for random reasons, multiple times per day. And they keep hitting the Rerun All Jobs button, adjusting timeouts, adding more retries to their tests, that type of thing. First of all, what are your thoughts about this specific situation?
Yes, well - I mean, unfortunately, it is all too common of a situation, and I think that it is challenging to write tests that are not susceptible to several specific things that contribute to what is commonly called brittleness or flakiness, right? The most important thing to understand as we're approaching this question from the outside in is: what is our goal to have this build in the first place? And the goal is probably to have some sort of confidence that things are working. And if we get one green build out of five and no source code changes in the process - like, my confidence is not high, right? And we know this based on kind of intuitive experience: if it passes one time out of five, it means that you have proven the system can work. And you've probably also proven not that the system has some sort of fundamental flaw and will break in production, so much as you've figured out that there are environmental timing or ordering implications that can cause one or more of your tests to not work.
And I think the first thing that a team should consider when they are running into this problem is to get back to consistent green builds as fast as possible. Because again, if you're thinking about testing as ROI, if all nine of those people are getting an email every single time that the build breaks, and then say three of those people just independently start racing to go and screw around with timeouts and stuff - that's a lot of dollars flying out of a window, because of one little tiny thing that might not be well understood. And no one woke up that day and was like "I'm going to really wrangle all of the flaky stuff in our test suite", right? They're all trying to do something else, and this is the thing blocking them from that thing. Not to say they're half-hearted in their attempt to fix it, but what they're really needing is a salve in that moment, as opposed to a solution.
So the first thing that I would do is I would lock down everything. Normally, I don't like freezing time in the system, right? But I'd probably start with the common quick fixes that I can apply, like "Hey everyone, it is now 2019, August 3rd, it's a Tuesday, and it's 11:33 PM, and we're just going to lock that whole server down that way", whether we're using a test tool to do that or Unix. "Hey everyone, we're also going to no longer randomize the test order; we're going to use this particular seed that is known and good, that we've seen work." "Hey everyone, we're going to change all of the directory globs that are currently in an unspecified order in Linux - and that's why Linux builds are failing, but on our Mac, where it's the alpha order, those are all passing. We're just going to do a sort on all of those glob requests everywhere, so that we're loading alphabetically, because it doesn't really matter..." And we're going to go through the half dozen or so quick-hit things to just try to get to consistent builds as soon as possible. And hopefully, that's one person, one day, if ever.
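A minimal sketch of those lock-downs, assuming a Python suite with pytest and the freezegun library (the episode names the tactics, not the tools, so the library choices here are assumptions):

```python
# Pin the clock, pin the random seed, and sort directory globs so that
# load order stops depending on the filesystem (Linux and macOS differ).
import glob
import random

import pytest
from freezegun import freeze_time


@pytest.fixture(autouse=True)
def frozen_clock():
    # Every test now runs at one known instant, like the date in the episode.
    with freeze_time("2019-08-03 23:33:00"):
        yield


@pytest.fixture(autouse=True)
def known_seed():
    # A known-good seed instead of fresh randomness on every run.
    random.seed(1337)


def load_fixture_paths(pattern="fixtures/*.json"):
    # glob returns files in an unspecified order; sorting is deterministic.
    return sorted(glob.glob(pattern))
```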
Yeah. What does a consistent build mean? So we said one in five passing; very bad, very inconsistent. What does a consistent build mean to you?
[47:53] For me, the ideal and the asymptotic goal that I would have is: anytime that I saw a build fail, it means that something is broken in the application. And by the way, I think this is actually the popular notion that managers who are told about testing and see a build on the wall - like, their intuitive notion is that if the light goes red, it means that the system that they are extensively paying to have built is currently in a state where it does not work, right? That's intuitive. But if you're like this team that has become habituated to this environment, where things just break randomly, those developers will have lost confidence that red means that anything is broken at all.
Yes.
But the business person is still thinking, "Wow, there's like a lot of failures, and so it's time well spent to go and fix that brokenness", because in their mind, in the business person's mind, anytime spent fixing the build is time spent making my system work when it didn't work. And so that seems valuable. But if you were to tell them that 95% of the reds in the build that were distracting the whole team were just bullshit implementation problems in the way that we wrote tests, because data gets polluted from one test to the other tests, depending on all sorts of different things - that business owner will probably be rather upset, right?
Yes.
And we shouldn't be the same kind of upset, right? So the flakiness is one thing. I would only want my test to fail if something was actually broken. And I would go a step further and say, "I only want my build to fail if the production code doesn't work." So if a test failure was just somebody forgetting to update the test, I don't want to see that in the build. I want people to run tests locally. And if they're not running tests locally, I'd want to figure out why. And then I want to make it fast enough so that people do that and they have the tools to want to do that. And so that's the best answer I have to your question.
Yeah. What about - so there's a follow-up which I think complicates things and makes them more real as well... What about a test suite that has to rely on integration tests, because the software that it tests is really complex, and you have to do black box testing... Because a lot of the stuff - like, you're testing the correctness of the system at scale. So how does the system break, for example, when we expect it to break? So that's one aspect.
The other aspect is not everyone can run it locally, because the stuff that the system has to provision, the setup, is too big; it won't fit on a development machine. It needs multiple machines just to basically orchestrate the system as a whole. That may be an over-reliance on integration tests, but to go to something like TLA+, spec-based testing - and there is another one which I'm blanking on; not feature-based testing... Property-based testing. To do that type of testing requires a special type of knowledge and a special type of approach, especially when it comes to a heavy data system.
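To make the property-based idea concrete, here is a minimal sketch using Python's Hypothesis library (an assumption - the episode names only the technique, not the tool): instead of hand-picked examples, the framework generates many inputs and checks that a stated property holds for all of them.

```python
import json

from hypothesis import given, strategies as st


@given(st.dictionaries(st.text(), st.integers()))
def test_json_round_trip(record):
    # Property: for any generated record, serializing and then parsing
    # must return the original value - one test, many generated cases.
    assert json.loads(json.dumps(record)) == record
```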
So there's that aspect, but the other one is around different CI systems flaking in different ways. So the same tests run in two separate CI systems, and not the same tests fail the same way in the two different CI systems. There's nothing wrong with the tests; there may be a timing issue, but more importantly, it's a resource contention issue. So what would you do in that case? Because it's not the order of anything, it's not like the globbing stuff, it's not - nothing that you can do with the simple fixes; it's just the sheer scale of the thing. And maybe a lot of approaches over the years, which maybe weren't as good... So you'd think this is like a mature codebase, a decade plus, which tends to happen, which happens to be a lot of the software, which gets more complex, more brittle, more - you know, you just have to tend to it.
There's another aspect to what we were discussing earlier, about this sort of boolean mindset that people have - is it tested, is it not? And one of the things that ideology has led most teams, at least the majority that I run into, to conclude is that there is a single bucket for every app called test, and you just put tests in it, and you're lucky if it has directories that are nested underneath there, in terms of organizing the tests.
[52:11] And there is a default sort of assumption - even on, I would say, highly competent teams, you might be able to expect that there are unit tests that will indeed run locally, of most things that are added, and there might be like one integration test that maybe will run the whole application and just prove that a feature works. You know, if you added widgets to an existing application, maybe you would expect that integration test to create a widget, read some widgets, update a widget and delete a widget, right? And just, again, do that CRUD flow N times, for N features over the life of the system.
There's two things that make tests very, very expensive to run, that you hit on. The first bit is this logical organization failure on our part. So what I would say is - okay, let's say that you're building a system that - like an early client of ours - interacts with the paging network on electrical grids to communicate with thermostats, to make them go up and down when it's really hot outside in Texas. Like, we could, if we wanted to, write every single test with the assumption that a thermostat on a breadboard with a serial number is plugged in and powered on and on this network, so that we can actually interface with it through every test - even in the tests that have nothing to do with that particular integration, even if our architecture could be built in such a way that we could just have a driver to that thing that we could easily mock out and get away from that... But if we walk into the assumption that we need a maximally real test of every single thing this system does - we need to be able to prove scientifically that we're going to be able to go completely end to end - like, if that's your orientation, whether explicitly, like you're buying into it, or implicitly, like, I don't know, we've got a test directory, and every test just needs to - at least one test needs to talk to this thermostat, so we just have to assume that they all do... Like, that is a failure to organize based on the constraints that you're under.
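A sketch of the driver boundary Justin describes, in Python with hypothetical names: the thermostat network hides behind a small interface, so most tests substitute a cheap fake, and only a handful of driver-level tests ever touch real hardware.

```python
from typing import Dict, Iterable, Protocol


class ThermostatDriver(Protocol):
    # The one seam that knows about the paging network.
    def set_target(self, serial: str, fahrenheit: int) -> None: ...


class FakeThermostats:
    """In-memory stand-in used by the fast, isolated suite."""

    def __init__(self) -> None:
        self.targets: Dict[str, int] = {}

    def set_target(self, serial: str, fahrenheit: int) -> None:
        self.targets[serial] = fahrenheit


def shed_load(driver: ThermostatDriver, serials: Iterable[str],
              base: int = 72, bump: int = 4) -> None:
    # Business logic stays testable without a breadboard on the network:
    # raise every setpoint a few degrees to shed electrical load.
    for serial in serials:
        driver.set_target(serial, base + bump)


def test_shed_load_raises_setpoints():
    fake = FakeThermostats()
    shed_load(fake, ["TX-001", "TX-002"])
    assert fake.targets == {"TX-001": 76, "TX-002": 76}
```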
And so what I would encourage people to do is have multiple test suites, and work backwards. So what's the most resource-contended environment that you might have? Maybe it's spinning up 100 different servers and so forth, operating at a particular scale... Like, great - we're going to do the bare minimum amount of testing to achieve confidence there, and then we're going to break out a new bucket where you don't have access to that, and everything else will default to that until we can find another really expensive, resource-contended thing to do. And then we're going to try to increasingly make the default place where people put tests the least integrated, least complicated, necessary infrastructure for them to work.
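One way to get that "default to the cheapest bucket" behavior with pytest (a sketch; the marker name and layout are assumptions) is to make expensive tests opt-in, so a plain `pytest` run exercises only the isolated suite and CI invokes the integrated bucket explicitly:

```python
# conftest.py - tests marked `integration` (register the marker in pytest.ini)
# are skipped by default and run only via an explicit `pytest -m integration`.
import pytest


def pytest_collection_modifyitems(config, items):
    if config.getoption("markexpr"):
        return  # an explicit -m expression was given; run exactly that bucket
    skip_integration = pytest.mark.skip(reason="integration bucket runs separately")
    for item in items:
        if "integration" in item.keywords:
            item.add_marker(skip_integration)
```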
So that's, I think, at the end of the day, one of the answers to all these questions - you end up trying to maximize the number of isolated units that can be tested in isolation, where you get really straightforward, not only fast feedback, but feedback that tells you exactly what line the failure was on, kind of tests; and some number of locally integrated, everything's-talking-to-everything-else, but all inside of that monorepo... Some number of contract tests that will actually just go and validate assertions against a running instance of some server that you integrate with; some number of driver-style tests to that kind of thermostat... And then, you know, maybe just one, just a golden path of "Hey, when we turn this all on in the real infrastructure that we really have, can I make a user and log in?" And maybe that's the only thing that you actually needed to prove in that fully plugged-together state. But then you'll sidle up to teams where they have 1050 unit tests, and you add one marginal unit test just to make sure that emails are formatted correctly... And that's running up and down the stack, and now the base time that is each individual test - like, if we run them logically and serially - is like four minutes long, and every single time you add anything, it's this really outsized cost.
We have time for one last question, and it's going to be a quick one...
Okay.
...and I'm hoping, more importantly, a fun one. I'm curious - according to you, which is the most impressive Olympic event you've seen come out of Tokyo so far?
Alright, I study Japanese, and so the only reason I have an answer to this is because I watch the Japanese news every time I'm on my bike. And so the most impressive thing that I saw was a 13-year-old young woman from Osaka winning the gold medal in skateboarding... And to see the level of excitement that it generated, because I believe she's now the youngest gold medal winner in Japanese history, especially in a new event. So I thought that was pretty darn neat.
That's great. I was thinking of something else, but maybe we drop that in the show notes, because it's too funny... I know that you could not have described that, but that's what I was hoping would happen. That's okay, it'll be in the show notes, you can check it out. This has been a pleasure, Justin. I think we need to do another one. I mean, this just got me started. There's so many more questions that I have for you. I'm looking forward to it. Thank you very much.
Absolutely. Take care. Thank you.
Our transcripts are open source on GitHub. Improvements are welcome.