Changelog Interviews – Episode #415
Spotify's open platform for shipping at scale
with Jim Haughwout and Stefan Ålund
We’re joined by Jim Haughwout (Head of Infrastructure and Operations) and Stefan Ålund (Principal Product Manager) from Spotify to talk about how they manage hundreds of teams producing code and shipping at scale. Thanks to their recently open sourced open platform for building developer portals called Backstage, Spotify is able to keep engineering squads connected and shipping high-quality code quickly — without compromising autonomy.
Sponsors
Linode – Our cloud of choice and the home of Changelog.com. Get started on Linode today with $100 in free credit. You can find all the details at linode.com/changelog
Pixie – Pixie gives you a magical API to get instant debug data. The best part is this doesn’t involve changing code, there are no manual UIs, and this all lives inside Kubernetes. Pixie lives inside of your platform, harvests all the data that you need, and exposes a bunch of interfaces that you can ping to get the data you need. It’s a programmable edge intelligence platform which captures metrics, traces, logs and events, without any code changes.
Retool – Retool makes it super simple to build back-office apps in hours, not days. The tool is built by engineers, explicitly for engineers. Learn more and try it for free at retool.com/changelog
Fastly – Our bandwidth partner. Fastly powers fast, secure, and scalable digital experiences. Move beyond your content delivery network to their powerful edge cloud platform. Learn more at fastly.com.
Transcript
So we’re happy to have a couple of fellows from Spotify here. Jim Haughwout, Spotify’s head of infrastructure and operations, and Stefan Ålund, the principal product manager and head of the Backstage open source project, which we’re here to talk about. Guys, thanks so much for coming on the Changelog.
Thanks for having us.
Thank you, it’s a pleasure to be here.
We would love to get started by understanding a little bit about Spotify’s engineering culture and the scale at which you’re operating… Because Backstage, which we’re here to talk about, is an open platform for building developer portals. It’s really an infrastructure tool especially around organizations that have a lot of services, or microservices… This particular problem that comes at scale, and you guys solved it at scale. So tell us a little bit about the scale of the company - how many teams, how did the orgs break out… Give us an idea.
I’ll start off, and hand it to Stefan to talk about how Backstage has solved that. So within our culture, Spotify years ago rallied around the concept of giving teams autonomy, and unblocking them, to basically go as fast as possible, and empowering them to move quickly, with the idea that more teams building software in an unblocked manner leads to more discovery, and a better product, and basically a better experience for all of our customers.
[04:21] In terms of scale, we’ve been growing at an incredible clip. We’re now up to about 500 engineering squads in the company… And I say “about”, because those numbers change on a daily basis as teams reform and swarm on new opportunities. We do run at quite a high scale. Basically, we have over 2,000 microservices in production, over 4,000 data pipelines, which Backstage suits as well… We have several hundred websites that Backstage also produces. We run over 200 microfeatures in our client, if you use Spotify on Android or iOS, that are built as well. We also ship thousands of machine learning models… And on average, we’re pushing code to production about 2,000 times a day, and that’s interesting, given ten years ago there was the ten releases a day for the big DevOps kick-off. So it gives you an idea of the scale and the speed…
So coordinating all of that and letting people move quickly is a key challenge, and Stefan, you’ve been instrumental in helping us tackle that, so perhaps you can talk about how we’ve tackled that.
Yeah, so Backstage is the center around all of our software development inside Spotify. And as Jim said in the beginning, we really wanna have empowered and small software teams that can own a piece of a Spotify experience… And while we value autonomy, we also don’t want every single team to have to make a lot of different choices about the technology they use, and how they build the software.
So Backstage helps us really to kind of balance that autonomy and speed sustainably, by sort of introducing standardization in a nice way; standardization has kind of a bad rep for many engineers, but the way we look at it is that a team shouldn’t have to make some choices… And Backstage helps provide and drive towards fewer choices and more standardization, without sacrificing on the autonomy part.
So recently open sourced, but we’ve found that Backstage is not a new piece of software inside of Spotify. This is something that’s been baking in the oven for many years; is that correct?
Yeah, that’s correct. We actually started – I think the first versions were put into production almost six years ago. We actually started in the microservice domain, where we had these small teams owning their own microservices, like typical DevOps practices today, and the first need Spotify saw was to make it have a central place where we had a registry of all the software that we had in production. Like a service catalog. Interestingly though, what happened was that some clever infrastructure engineers started to add tooling on top of that read-only catalog. All of a sudden, you could click a button in Backstage to add capacity for your service, or redirect traffic to another region if one of our data centers was failing, or something like that. So it almost accidentally became a platform in that sense, for the microservice ecosystem that we started to build out.
Can you maybe speak to the internal tooling side of this thing? I think that a lot of engineering teams sometimes feel like working on internal tooling can be a waste, because they’re not building product, or they get kind of lost in the minutiae… It seems like this was maybe an internal tooling endeavor gone right.
[08:06] Yeah, I’d say so. What we saw was kind of interesting… We saw those patterns that infrastructure or platform engineers, like we can talk about them as, kind of started to add more and more use cases on top of this common system. And we sort of started seeing – when we started our data infrastructure organization, we started to broaden the scope of the platform also to cover data engineering practices… And what we really saw was that there was a lot of innovation that started to happen just by having one central place… And at that point, we kind of doubled down on this platform aspect. I said that it almost accidentally became a platform… So at that point we kind of saw that “Hey, we could probably build a centralized platform for all of Spotify, that covers all of our different infrastructure needs in all of our domains.”
So me and my team started building a more opinionated platform, more of like a plugin framework basically, where rather than having infrastructure engineers that wanted to really push the boundaries in their domains, what they did before Backstage was kind of build their own island of infrastructure… And now we kind of invited them to build a plugin instead in this central place… And that model kind of exploded in terms of the number of use cases. So now we have – there’s more than 140 different plugins that have been built internally, and more than 60 different teams have contributed at least one of those plugins.
I wouldn’t say that – you asked about internal teams not finding it interesting to work on internal tools… We have actually seen the opposite. There seems to be like an infinite number of possible things that we can add, and innovations that can happen on top of the central platform.
And one of our guiding principles is we wanted to empower engineers to go quickly, so we made purposeful investments in teams like Stefan’s and others, to build this tooling. But the plugin – the movement from Backstage to the plugin architecture was pretty interesting. At that time I was at my standing desk by the team that built Backstage, and we’d come in on Mondays and people over the weekends would build plugins. I remember, Stefan, the day you were like “Oh, we can now do machine learning model deployments”, on a Monday… So just bonus functionality that would come in for places.
That started to give us a sign that we had true community, where you’re not asking people to do things, but instead each day just new software shows up that’s expanding your platform, giving new capabilities, and people choosing – you know a platform is winning when people choose to use the tool because it’s an easier path for them from that perspective.
Similarly, at every hack week that we have at Spotify, which is an amazing experience - everybody stops for a week and can just self-form and swarm on teams - there’s always about half a dozen “build this in Backstage” teams. And Stefan, maybe there are a few of the hack week projects that pop out to you as interesting, that showed the strength of the community?
Yeah, as you said, we typically have a lot of different teams building stuff. What is really interesting is that it’s not only the infrastructure and platform teams that are building the plugins; we’re actually seeing our customers, the engineers at Spotify, who kind of scratch their own itch and see a need for something. It’s so simple to add a plugin that they can do that over a week, and so we’ll ship it out into production and get people trying it out, which is pretty cool.
[12:02] One of my favorite plugins, that I was actually a part of as well during a hack week project, came from identifying that teams were keeping track of their tech health using massive spreadsheets: “What are we deploying with here? What versions of Java are we running for this service?” etc. So we came together, a team, and sort of built that in as a plugin in Backstage, where you can get kind of a bird’s eye view, a heat map almost of your services and software, and see what’s the fragmentation we have in our team, essentially.
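To make the "heat map of fragmentation" idea concrete, here is a minimal sketch in Python. It is not the Spotify plugin: the service names, technologies and versions are invented for illustration, and the scoring rule (share of services not on the most common version) is an assumption rather than anything described in the conversation.

```python
from collections import Counter, defaultdict

# Hypothetical inventory of (service, technology, version) tuples.
# None of these names come from Spotify's systems.
SERVICES = [
    ("playlist-api", "java", "11"),
    ("search-api", "java", "8"),
    ("ads-api", "java", "11"),
    ("podcast-ingest", "python", "3.8"),
]


def version_fragmentation(services):
    """Count how many services run each version of each technology."""
    by_tech = defaultdict(Counter)
    for _name, tech, version in services:
        by_tech[tech][version] += 1
    return {tech: dict(counts) for tech, counts in by_tech.items()}


def fragmentation_score(counts):
    """Share of services NOT on the most common version (0.0 means aligned)."""
    total = sum(counts.values())
    return 1 - max(counts.values()) / total


report = version_fragmentation(SERVICES)
for tech, counts in report.items():
    print(tech, counts, round(fragmentation_score(counts), 2))
```

A spreadsheet answers the same question, but a view generated from live metadata stays current as services are rebuilt.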
Yup. So as you build and update software, your scores update automatically, and you can manage everything in one place. You had asked a little about how we went from internal to external, as an open source project… So we had this vibrant internal community; about 2.5 years ago we also joined the Cloud Native Computing Foundation and several other open source foundations, and in May of 2019 I was basically sitting in Barcelona at the End User SIG Group, talking about everything from what CI engine do you use, how do you do observability… And one of our senior staff engineers sitting next to me said “I’ll show you - here’s how we do a build”, and he popped up on his laptop in front of other users and opened up Backstage… And they go “What’s that?” And it’s like “Oh, this is a tool called Backstage.” And he clicks away, he did a build… [unintelligible 00:13:36.02] click along, and a few of the End User companies said “Well, this is pretty interesting. You should maybe show this at one of the demo days for the rest of the SIG. This might be something that’s worth open sourcing.”
That started us on a journey where Stefan got pulled in, we did additional demos with discussions, and after talking to about a dozen or so companies, we saw that there was an interesting need there. We didn’t wanna just throw software over the wall for open source, but we saw a purposeful need; we started to work with those companies to get feedback. And when we decided to go open source, because we saw this need, we took this as a purposeful go-to-market. So we came out with a website, with demos, with a mailing list, with community engagement, and trying to do things the right way - open source contributor agreements, a code of conduct, and the like, from that perspective. The community was so helpful to us, and we’ve been so excited with the reception from that process.
That’s really cool that you open sourced it. So the million-dollar question - I guess it might be a multi-million-dollar question - over the course of multiple years, with multiple engineers working on it, here you have scratched this internal need, and you’ve built a system which is serving Spotify very well, and is a platform inside of Spotify… You know, most people would look at that as like an incredible strategic competitive advantage, because you’ve solved a really hard problem in a way that is awesome. And giving that away to the world seems like maybe a bad idea. So were there conversations around like “Should we? Should we not?” or was it obvious, like “Of course we’re gonna open source it”? Because you could also just sell this thing to other companies, because there’s like a demographic of companies that would need this, that all have huge revenue streams… So I’m curious of the thought process behind open sourcing it.
It was a discussion, and we have had companies approach us that have wanted to do [unintelligible 00:15:43.21] service models around Backstage… We had this discussion, and there’s a few things… We have a fundamental desire to improve the developer experience, basically across the entire tech industry. The belief there is we’re competing on audio, and music, and podcasts. If we can create a great open source experience, we’re going to attract talent.
[16:10] Yes, some people will use Backstage as well, but people will come to Spotify because it’s a place where you can build amazing things like Backstage, you can open source them, you can get those open source credits from that perspective. Also, if we contribute to other open source projects, we benefit from that. We know we’re gonna get more back than we essentially lose by other people adopting this.
As Stefan mentioned, Backstage is like a central nervous system for our developer community; it’s entwined in so many of our systems… We’re in a good place where we’ve got a head start, and as new plugins come in, as new capabilities come in, we can pull those in, and it’s kind of a rising tide lifts all boats; we get people who come in on day one and they know how to push code in Backstage. People come to us for Backstage, and I think we’ve made our first hires through the open source project. Stefan, maybe you’ve got a few examples of some pretty interesting pull requests that came in on the open source, that helped us out. Maybe you could share that.
Yeah, I think we were kind of surprised and delighted by people coming in and starting to contribute to the project almost from day one, or actually on day one. We have solved a bunch of problems at Spotify, but we also think that by collaborating with the broader ecosystem of infrastructure engineers across other companies and other open source projects, we will eventually build a better experience in Backstage, and bring that back to our engineers.
I think it’s pretty cool already that a couple of weeks ago there was actually a third-party company that adopted Backstage, really keen on having a better experience for API documentation… And this is not something that we had a very good start on internally at Spotify, but they came in, had expertise in that area, and contributed functionality that we are now bringing back into Spotify already. So yeah, we’re reaping the benefits of collaborating with a broader set of developers.
We firmly believe that this will give us a net good, and it lines up with our values of trying to create a great collaborative tool for community, that empowers developers everywhere.
It’s important to hear that too, because I think when you see the Why - the Why of open source is often the mystery, to some degree… And to see that your heart is in the right place, that 1) you’ve gotten value from open source and you recognize that value, and 2) you wanna give back… And I think, going back to “Keeping the main thing the main thing” - you didn’t say that, we say that, but you kind of said it through your words… Audio, music, podcasts is your main thing, and you compete on that level; and rather than compete on this strategic advantage as Jerod had mentioned, you’re giving it away in open source, and in many ways propping up a community around that, because you recognized how important it was inside of your organization in terms of community…
But Jim, you mentioned the terminology “central nervous system”. And as Stefan, you guys are describing this, I’m thinking “This is kind of like a brain for your organization.” How often do you have “What do we have out there? Where is it at? What version is it on? Whose teams are managing it?” And this seems to solve that kind of problem.
Stefan, you’ve demoed this… Please talk to it.
[19:48] That’s definitely true. What we found by talking to other companies who have kind of tried to build a developer portal and tried to build a central repository of all their software was that sometimes that repository goes stale; it becomes like an accounting system, where teams have to go in and add “Here’s our services” etc. The really interesting secret sauce of Backstage is that we kind of integrated the tooling on top of that metadata that you have in your catalog… So we start off by, you know, you get features if you keep the metadata about your services etc. up to date. We have this really nice way of encouraging teams to keep the information up to date, and that means that we can build even more interesting tooling on top of that rich and up to date information about our whole software ecosystem. So that kind of feeds the cycle of plugins, and improving the discoverability of everything in the ecosystem.
And there’s a bit of a flywheel also. As you do a build, you get a prompt to update your documentation; you update your documentation, you show up on the What’s New, you’re exposed for service discovery, and documentation discovery… So if a new team comes in and they’re looking for X, they can find it, instead of building it themselves; they reuse yours, or they extend yours with a pull request.
But Adam, to your question, to try to find something - this is absolutely vital for our SRE teams, when basically monitoring or something fires off on a service; you just pop into Backstage and you can say “It’s this team. Here’s their Slack channel. Here is the PagerDuty integration.” So everything is connected. The time you would lose trying to figure out “Who owns this, how do I get them?” is gone; the answer is a button click away, and it moves right into a Slack channel. People either resolve items, and… That’s part of the reason that you see Spotify rarely interrupted, from that perspective.
So everything from developer reuse, to production operations, to even supporting our audit and compliance, is all in one place. That service catalog is the memory store for that brain analogy that you had.
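The "who owns this, how do I reach them" lookup described above can be sketched as a small catalog query. This is an illustrative Python sketch only: the `Owner` record, the `CATALOG` contents and the `who_owns` helper are hypothetical names, and no real Slack or PagerDuty API is called.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Owner:
    """Contact details for the team that owns a service (hypothetical shape)."""
    team: str
    slack_channel: str
    pager_service: str


# Hypothetical catalog contents; a real catalog would be populated from
# metadata kept next to each service's code.
CATALOG = {
    "playlist-api": Owner("squad-playlists", "#squad-playlists", "playlists-oncall"),
    "search-api": Owner("squad-search", "#squad-search", "search-oncall"),
}


def who_owns(service: str) -> Owner:
    """Return the owning team for a service, or fail loudly if it is unknown."""
    try:
        return CATALOG[service]
    except KeyError:
        raise LookupError(f"{service!r} is not registered in the catalog") from None


owner = who_owns("playlist-api")
print(f"Alert on playlist-api: page {owner.pager_service}, join {owner.slack_channel}")
```

Failing loudly on an unregistered service is a deliberate choice in this sketch: during an incident, a missing owner is itself something to fix.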
Sure. To some degree maybe even a social network. If you can see other teams solving problems that you haven’t even met yet… Because those engineers that are solving similar problems out there, that I haven’t even met yet in Spotify - or my org, if I adopt Backstage - then that gives me an opportunity to, as you had said before, community. And I think that’s something that kind of is missing in large organizations. Jerod, you mentioned “at scale”. I think smaller orgs may have less of this problem. Like, “I know you. I kind of know what you’re working on. I probably know who owns it, if it’s just you and I primarily…” But in large orgs, where it’s 500 squads, which I can imagine is more like 2000 engineers or more - this can be hard to know everybody and connect.
Yeah, exactly.
Yeah, and it becomes more important as we grow as a company. One of the things we track is how long it takes for a new engineer who joins Spotify to get productive, and we measure that by the time it takes to have merged your tenth pull request. It’s not a perfect metric, but it’s a metric that we look at. And what we were seeing before Backstage was essentially that that number was going up steadily. So we got slower; for every new engineer that joined Spotify, we got slightly slower, less efficient. And after rolling out Backstage and really doubling down on this centralized place to have all your technical information and all your tooling, we’ve actually cut that number by more than half; it has gone down 55% over the last couple of years.
Not all companies are onboarding engineers at the same pace as Spotify is. Basically, that is a fantastic proxy for how complex your ecosystem is. So if you reduce that, it also has benefits for your other engineers, because their life is easier; they can find stuff easier and be more productive as well.
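The onboarding metric mentioned here, time until the tenth merged pull request, is simple to compute. The Python sketch below is one possible reading of it; the function names are invented, and the dates and the 60-day and 27-day figures are made-up inputs chosen only to show how a 55% reduction would be calculated, not Spotify's actual numbers.

```python
from datetime import date, timedelta


def days_to_nth_merge(start_date, merge_dates, n=10):
    """Days from an engineer's start date to their n-th merged pull request.

    Returns None if fewer than n pull requests have been merged so far.
    """
    merged = sorted(merge_dates)
    if len(merged) < n:
        return None
    return (merged[n - 1] - start_date).days


def percent_reduction(before, after):
    """Relative improvement between two measurements, in percent."""
    return 100 * (before - after) / before


# Illustrative data: one merge every three days after joining.
start = date(2020, 1, 6)
merges = [start + timedelta(days=3 * i) for i in range(1, 11)]

print(days_to_nth_merge(start, merges))  # tenth merge lands 30 days after start
print(percent_reduction(60, 27))         # hypothetical 60 -> 27 days is a 55% cut
```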
Stefan, in the break you had mentioned – so the listeners don’t get the breaks, so give us a little bit of what you’ve just shared there in terms of how your orgs are structured. Jim, you’d mentioned earlier around 500 squads, quantifying that to product owners, managers, business folks… A lot of people involved in this, so in many ways it’s to some degree a social network, but also very much a nervous system for your organization. But if you have orgs out there or businesses trying to give small squads like that autonomy, startup-like features, Backstage gives them that. Go into further why that’s important for your teams to have that autonomy and that kind of drive, and sort of speed to innovation, and maybe even speed to the tenth commit coming in faster, rather than a lot slower.
Yeah, so we have a very interesting setup in the way we organize our engineering or R&D teams at Spotify. We call them squads, and basically all those small squads have a mission. That mission could be to do podcast ingestions, or recommendation systems, or work on our CDNs, or other small pieces that make up the Spotify experience.
All of those teams are almost set up like small startups, where they have one product manager who’s setting the direction, figuring out what to do… They have all the engineers that they need; they have frontend engineers, backend engineers, machine learning engineers… Some teams have designers, if they have user-facing features… And they have engineering managers as well. So they are like one tight-knit team that is there to build one part of Spotify. And we want them to be able to iterate as fast as possible on their domain, with ideally as few dependencies as possible, because that’s when they move fast and are empowered to solve the problems, because they know the domain, they know how to solve it…
And the obvious drawback of something like that, that kind of autonomy, is that it’s really hard to know what other teams are doing, and it’s easy to reinvent the wheel, with multiple teams building similar things… And that’s, once again, where Backstage comes in, and it allows us to work that autonomously, because there is one central place where you can go and get a bird’s eye view of all the different teams, what software exists, what libraries are out there, what technologies are used in production, so that you don’t have to reinvent things. You can build off of other people’s – what they’ve already learned, and you can use for example a [unintelligible 00:29:05.07] existing data that we have in our ecosystem. All that data is available to everyone in the company; you can see who produces the data and you can build higher-level algorithms on top of it, without starting from the beginning.
And then kind of scaling that out. In the days when we had a small number of squads, you could keep everything in your head. Once we got more squads than Dunbar’s number, that was very hard to keep all of that in your head, to the social networking analogy. Backstage created that directory where you could see who’s doing what, who do I need to connect to, how do I jump right into their Slack channel, where do I see their documentation and get started? If you look at these benefits about enabling a small team to not have to rebuild tools, and being able to have a mobile engineer, a backend engineer, a data scientist all work from the same toolset, you multiply that by 500 teams - that’s a ton of productivity. You add product managers, other business owners that are looking at things, looking at insights and data - that’s also another 20%-25% benefit.
[30:12] And then if you think about the onboarding, Spotify is rapidly growing. In any given year, 30% of our company has been here less than a year, just because we’re growing so quickly. So if you can come in on day one, get your training and education on our tech stack, start building things in Backstage, get to your tenth pull request 25 or 30 days sooner, those productivity gains come even faster. And then talking to other companies, talking to people who invest in tooling, we’ve found that once you hit about 100 microservices, you need something like this. You can keep it all in your head till things get kind of crazy.
Well, you’ve just answered one of our questions, which was like “What size orgs need something like Backstage?” So let’s skip that question altogether - 100 microservices or more - there’s your key - and let’s talk about how it does what it does. We’ve been talking about the benefits, how it’s helped you, how it’s helping other orgs as they adopt it in the open source world and the ecosystems built around it… But peel back the covers - how does Backstage do what it does, and then on the other side of that coin is “How do people interact with it and developers build what they wanna build with Backstage?” But first of all, Stefan, maybe tell us how it all works.
As Adam described it, it’s like there’s a central nervous system, and the brain is what we call the service catalog, or the software catalog, where we keep track of all the software, and the teams, and who owns what, essentially. And that model - what we do is that we keep a YAML file, a small configuration file, a metadata file, that regardless of where your software is stored, you keep that information together with your code. And then Backstage harvests that information and makes it available in a centralized repository, so that you can build on top of it.
That allows you to model and keep track of microservices, and also keep track of bigger monolithic applications; the application can be divided into multiple logical parts, that are owned by different teams, but still keep track with that metadata YAML file that stores the source of truth for information. The same goes for data pipelines, and machine learning models etc. So that’s kind of the starting point.
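The catalog model described above, a small metadata file stored with the code and harvested into a central index, can be sketched in a few lines of Python. The descriptors are written as dictionaries standing in for parsed YAML; the `kind`, `metadata.name`, `spec.type` and `spec.owner` fields loosely follow the open source Backstage descriptor format, while the `harvest` and `owned_by` helpers and all entity names are assumptions made for illustration.

```python
# Each dict stands in for one parsed metadata file living next to the code.
DESCRIPTORS = [
    {"kind": "Component", "metadata": {"name": "podcast-recs"},
     "spec": {"type": "service", "owner": "squad-podcasts"}},
    {"kind": "Component", "metadata": {"name": "listening-history"},
     "spec": {"type": "data-pipeline", "owner": "squad-insights"}},
    {"kind": "Component", "metadata": {"name": "podcast-ingest"},
     "spec": {"type": "service", "owner": "squad-podcasts"}},
]


def harvest(descriptors):
    """Build a name -> entity index, rejecting unowned or duplicate entries."""
    catalog = {}
    for descriptor in descriptors:
        name = descriptor["metadata"]["name"]
        spec = descriptor["spec"]
        if not spec.get("owner"):
            raise ValueError(f"{name}: every entity needs an owner")
        if name in catalog:
            raise ValueError(f"{name}: registered twice")
        catalog[name] = {"type": spec["type"], "owner": spec["owner"]}
    return catalog


def owned_by(catalog, team):
    """List everything a team owns, regardless of what type of software it is."""
    return sorted(name for name, entity in catalog.items() if entity["owner"] == team)


catalog = harvest(DESCRIPTORS)
print(owned_by(catalog, "squad-podcasts"))
```

Because services and data pipelines share one shape, the same ownership query works across both, which is the point of keeping a single source of truth.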
Then on top of that service catalog, what we do is that we integrate all these various tools that you need. So rather than going into your Jenkins machine and looking at your builds, you start from the service catalog in Backstage and you find your service. And when you click into your service, there’s a plugin there, a UI for showing your builds. So it’s kind of an information architecture. We want engineers to reason about the stuff that they own, the services that they own, rather than the tools themselves.
This is a very different approach to how many organizations adopt infrastructure… They add infrastructure to their organization, and then the engineers need to figure out “How do I wire these things together? How is tool A connected to tool B?” We take a pretty different approach. We basically build plugins then for all of those different infrastructure tools, and we integrate them into one place. And the plugin is essentially a small web application that one team can build, and iterate on by themselves.
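The plugin idea, where each infrastructure team contributes its own view and everything is reached from the service rather than from the tool, can be sketched as a small registry. In real Backstage a plugin is a web application; in this Python sketch it is reduced to a function that renders text for an entity, and `EntityPage`, `builds_plugin` and `deployments_plugin` are hypothetical names.

```python
from typing import Callable, Dict

Entity = Dict[str, str]
Plugin = Callable[[Entity], str]


class EntityPage:
    """One page per catalog entity; each team contributes a tab via a plugin."""

    def __init__(self) -> None:
        self._plugins: Dict[str, Plugin] = {}

    def register(self, tab: str, plugin: Plugin) -> None:
        if tab in self._plugins:
            raise ValueError(f"tab {tab!r} is already provided by another plugin")
        self._plugins[tab] = plugin

    def render(self, entity: Entity) -> Dict[str, str]:
        # Every plugin receives the same entity, so users start from the
        # thing they own instead of from each tool's own UI.
        return {tab: plugin(entity) for tab, plugin in self._plugins.items()}


def builds_plugin(entity: Entity) -> str:
    return f"Latest builds for {entity['name']}"


def deployments_plugin(entity: Entity) -> str:
    return f"Deployments of {entity['name']} (owner: {entity['owner']})"


page = EntityPage()
page.register("CI", builds_plugin)
page.register("Deployments", deployments_plugin)
tabs = page.render({"name": "podcast-recs", "owner": "squad-podcasts"})
print(tabs)
```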
For example, we have a team at Spotify who keeps track of our deployments and runs our Kubernetes clusters for the customer… And not only do they run the Kubernetes clusters, they also build a UI plugin in Backstage, to make it simple for engineers to see what services are running, and do rollbacks, and those kinds of things. So those are sort of the key parts of the architecture.
[34:14] Okay. So Kubernetes on the backend… And then is that container orchestration system agnostic, to where you go ahead and get that deployed into the world? How does it hook into some sort of a cloud?
Backstage kind of abstracts away all the different infrastructure pieces that you have. It could be your cloud tooling, it could be your CI environment, it could be your security scanning… All of those different tools that you normally interact with directly, they are integrated into one experience instead, and they are expressed and shown as a plugin.
Yeah, so basically the team that manages Kubernetes has a web service that’s invoked by Backstage, that essentially once it detects a build is done, can pull the code over and start the deploy process, and that allows that team to work unblocked. First it was deploy into our own orchestration system; then later as we moved to Kubernetes, we basically then triggered the deployments in the Kubernetes APIs based on the YAML that you had set up in Backstage, that you had set using standard templates, so you had consistency… And now that team is working on things like doing automated canary analysis, safe deployments, and they can work and build that capability, and it just gets extended into Backstage.
So essentially, you’ve got our Kubernetes team working unblocked, you’ve got the Backstage team surfacing, and then you have feature teams, people actually building features and deploying… And for them, it’s just simply one day they deploy, the next it’s like “Oh, I’m getting automated canary analysis”, the automated canary analysis is getting even better, and for them the ecosystem just gets richer and richer from that point of view.
Maybe walk us through the experience of an end user who’s an engineer. Say I’m working for Spotify, and I’ve been tasked with creating a new service that recommends the best developer podcasts in Spotify’s catalog… So I write this little Go program, which every time you ask it, it just returns the Changelog, no matter what you send it; just the best - obviously, the correct answer.
[laughs] Obviously.
I have my little Golang binary, I have my repo… How do I say “Okay, this is a service in Backstage that anybody can call”? Where do I go from there? Just like get the YAML metadata and I’m done?
Great question. We have a concept that we call Golden Paths at Spotify, which is basically a standardized way to create a piece of software. So we have one backend Golden Path, we have one data Golden Path, a machine learning Golden Path, a web Golden Path etc.
Okay.
So what you typically do is that you follow that path, and a lot of the steps in that path are done through Backstage. So before you even decide how you wanna build your service, you basically go into Backstage, and rather than picking a language and picking the framework, and deciding how to set up the CI etc. - you don’t have to do that at all; you just pick a predefined Golden Path template, and that template basically gives you a Hello World application, with Kubernetes deployment information, a CI configuration setup, it uses the best practices that we have at Spotify for how we build a microservice, and it kind of removes the choices. I talked about standardization before, and how Backstage can drive standardization - that’s the primary way. We go in and we make the experience of using the preferred path better, and it’s like everything is automated for you.
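A Golden Path template, as described, hands you a working Hello World plus the surrounding configuration so you do not make those choices yourself. The Python sketch below shows the general shape using the standard library's `string.Template`; the file names, file contents and the `scaffold` helper are all invented for illustration and are not the Backstage software templates implementation.

```python
from string import Template

# A hypothetical "backend" Golden Path: a handful of files with placeholders.
GOLDEN_PATH_BACKEND = {
    "main.py": Template('print("Hello from $service")\n'),
    "catalog-info.yaml": Template(
        "kind: Component\n"
        "metadata:\n"
        "  name: $service\n"
        "spec:\n"
        "  type: service\n"
        "  owner: $owner\n"
    ),
    "ci.yaml": Template("steps:\n  - test\n  - build\n  - deploy\n"),
}


def scaffold(template, service, owner):
    """Render every file in the template for a new service."""
    return {
        path: body.substitute(service=service, owner=owner)
        for path, body in template.items()
    }


files = scaffold(GOLDEN_PATH_BACKEND, "podcast-recs", "squad-podcasts")
for path in sorted(files):
    print(path)
```

Since the template is just data in a repository, changing the standard is a pull request against it, which matches the "strong opinions, loosely held" approach discussed later in the conversation.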
I love that, because instead of standardization slowing you down, standardization is actually speeding you up, right?
Exactly.
[38:07] Yeah, so you could literally do our onboarding, [unintelligible 00:38:07.20] Hello World, rename it to Podcast Recommendations… Probably your code in there is gonna be relatively simplistic…
Yeah, just a hardcoded string…
It’s just gonna be “on 1 return Changelog, deploy it, roll to production”. You’d automatically have your Grafana integration, your tracing and observability integration, you could check if it was a compliance-oriented or non-compliance type of a feature right out of the box; if it was a tier one service and it needed a global deployment, versus a single-region deployment - that would be covered as well. And if you wanted to expand or create a new Golden Path, then you would work out a Golden Path together and build that as a software template. That would be added, too. That’s what we’re working with people right now, as they adopt Backstage.
Yeah, because eventually someone’s gonna be like “That Golden Path is not my Golden Path. I have problems with that Golden Path.”
Right.
So I imagine, as we talk about Backstage, there’s a difference between maybe how Spotify uses it, and maybe what’s baked into Backstage as open source. So maybe help us toe that line, or at least define it as we bump up against it… Whenever you have these Golden Paths, I imagine since you’re an organization that cares about autonomy and ownership and stuff like that, there’s a way to push back on that and say “I think this is the wrong Golden Path. We’ll make a new one.” This is a consensus that you’ve come to, so this is organization-based, and less baked into Backstage; but Backstage enables it for anybody who uses it… Right? This Golden Path analogy, or this Golden Path opportunity.
Yeah, so if a company comes to Backstage open source, we don’t expect them to use our Golden Path, because we have our opinions, we have our way of building stuff at Spotify. But the Backstage open source software templates, as we call them - the point of them is that you make your Golden Path. You set up what is the preferred templates that you wanna have in your organization, and then you start to drive towards having that opinionated way of building your software.
What we have found though is that it’s really important for standardization not to become this top-down thing that we don’t really want… It’s important to have those opinions and Golden Path recommendations be very strong, but still have them held very loosely. They’re essentially code. And what happens internally at Spotify is that if an engineer feels like “Hey, this is actually not the best practices right now”, they can go in and make a pull request towards that software template, that kind of challenges the standard way… And if they can motivate that in a good enough way, through community discussions, that becomes the new standard, and from there on everyone uses that version.
So we don’t aim for these golden paths to cover everyone’s use case. We wanna cover 80%, and then leave 20% for teams that either don’t want to, or can’t–
You’re trying to enable speed. So if I wanna do what Jerod’s doing here, which is create this awesome new Go service that recommends only the Changelog for “best developer podcasts” - just saying that one more time, for everybody to know - I wanna be able to follow this Golden Path. I don’t wanna have to re-bake all the things. I might have opinions, but I can redefine those opinions, as you had just mentioned, but any new engineer can come and take that first step, or those first few steps into a new thing, with so much more assurance that the right steps is the right steps… Because they’re not having to think “Well, is my opinion different than everybody else? Is this gonna be accepted? Will I be shamed? Will I get a PR against my code, deleting it all, essentially?” These things can all be avoided by giving this consensus, this standard path, this Golden Path, as you mentioned.
[42:10] Yeah. And if you think about onboarding, and you’re walking into a platform with eight million transactions a second at the perimeter, 299+ million monthly active users…
It’s intimidating.
…and you’re gonna deploy a feature on day ten into production - that could be terrifying.
Yeah.
But Backstage essentially gives you the – you know, you’ve got the armor or the guard rails. You know you’re gonna go out and things are gonna be working well. On the flipside, we are always innovating. So a team that is pre-MVP, exploring a brand new area that that Golden Path doesn’t work - they build their own framework in place, and extend it in, and then they can get all the benefits of the tooling, the community, the alerts, the monitoring instrumentation, collecting events for basically observability and data analysis… But they’re not heavily constrained. They can create a new path. And if that MVP takes off, then as Stefan mentioned, that becomes something that gets adopted and grows in our own community, and helps us evolve.
And if you are a new company using Backstage, you’re gonna have your own opinionation. Backstage will let you make that opinionation friendly, easy, fast, and with a UI that’s pleasing and makes developers and product managers and the like able to move basically quickly and easily from that perspective.
Since we released Backstage open source, we’ve talked to a lot of companies, and this way of having standards seems to be very compelling to people. Basically, many companies when they grow to a big enough scale, they see the need to reduce fragmentation and insert a few standards and make sure that people are solving the right kind of problems.
I think Backstage has this nice way of introducing standardization in a way that it’s actually a benefit for the engineer, not something that ties your hands behind your back… And that seems to be resonating well with people that are looking at Backstage.
I really like that style - if there’s one place that you go, and that place has the standards baked into it, and it hands them to you, versus it dictates in some sort of documentation… Of course, we’ll talk about the docs too, which is a cool aspect of what’s in there… But the templates are there for you to get started, and you plug your secret sauce into what we already know as the standard, versus you having to go and read a spec, or follow a thing, a tutorial. I think that’s really cool.
So there’s two concepts in Backstage that you guys have on the website that I’m trying to plug our conversation into… The first one is plugins. I think that’s like the backend kind of thing, like the Kubernetes plugin, there’s maybe like a Postgres plugin, there’s CI… Is that the kind of thing that a plugin is? And then the templates - is that the Golden Path, is the templates?
Yeah, the templates are the Golden Path. The plugins are how you extend, and (I talked about before) how you integrate…
A functionality.
Yeah, how you integrate different pieces of infrastructure into your platform. So the long-term vision here is that we want there to be a thriving ecosystem of plugins, and those plugins ideally should be for every project that’s out there. So a company walks up to Backstage and has a gallery of existing plugin integrations to whatever infrastructure they’re using. So if they’re running on Amazon, if they’re running on Azure, DevOps pipelines, if they’re using Lighthouse to track their accessibility scores for their website, if they’re using Grafana, if they’re using Kubernetes etc. - our hope and what we’re starting to see now is that there is a plugin for that, and they can pick and mix the plugins in the ecosystem, and sort of install them into their version of Backstage, and make it their own.
[46:06] If the tools – everyone has homegrown infrastructure in their basements; not everyone is running on the latest cloud native stuff and the coolest stuff. And for those people, you can basically build your own plugins, custom ones, so that your Backstage deployment becomes a combination of open source plugins that you pick off the shelf, and then you build your own custom ones and roll them into your version of Backstage.
And a key aspect of the plugin architecture is it basically– [46:40] Without the plugin architecture, you would need this portal team to build things, or do integrations. With the plugin architecture, essentially anyone can build a plugin. When Backstage moved to plugins, we saw the explosion of contributions internally; it essentially lets everyone just move at whatever pace they want.
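The general pattern described here, a small core that any team can extend without going through a central portal team, can be sketched as a registry behind an interface. Everything below is hypothetical: the `Plugin` interface, the `Portal` type, and the plugin names are invented for illustration, and Backstage's real plugin API is not written in Go and is not reproduced here.

```go
package main

import (
	"fmt"
	"sort"
)

// Plugin is a hypothetical extension point: each plugin contributes one
// panel of information about a component.
type Plugin interface {
	Name() string
	Render(component string) string
}

// Portal is the small core that knows nothing about specific tools.
type Portal struct {
	plugins map[string]Plugin
}

func NewPortal() *Portal {
	return &Portal{plugins: make(map[string]Plugin)}
}

// Register adds a plugin; any team can call it, off-the-shelf or custom.
func (p *Portal) Register(pl Plugin) error {
	if _, dup := p.plugins[pl.Name()]; dup {
		return fmt.Errorf("plugin %q already registered", pl.Name())
	}
	p.plugins[pl.Name()] = pl
	return nil
}

// Page renders every registered plugin for a component, in name order.
func (p *Portal) Page(component string) []string {
	names := make([]string, 0, len(p.plugins))
	for n := range p.plugins {
		names = append(names, n)
	}
	sort.Strings(names)
	panels := make([]string, 0, len(names))
	for _, n := range names {
		panels = append(panels, p.plugins[n].Render(component))
	}
	return panels
}

// staticPlugin is a toy implementation used only for the demonstration.
type staticPlugin struct{ name, label string }

func (s staticPlugin) Name() string                   { return s.name }
func (s staticPlugin) Render(component string) string { return s.label + " for " + component }

func main() {
	portal := NewPortal()
	for _, pl := range []Plugin{
		staticPlugin{"kubernetes", "Deployment status"},
		staticPlugin{"ci", "CI status"},
	} {
		if err := portal.Register(pl); err != nil {
			panic(err)
		}
	}
	for _, panel := range portal.Page("podcast-recommendations") {
		fmt.Println(panel)
	}
}
```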
Guys, we’ve been talking around the docs… Let’s talk directly about the docs. Backstage has tech docs built right in, with markdown-based free documentation you get as part of this awesome infrastructure. Tell us about how this came about and how it fits into the Backstage story.
We did a survey of our internal engineers some years ago, asking them what are the main blockers for productivity… And one of the main things that came back from that was that it was really hard to find technical information at Spotify. Some teams put their documentation in markdown files, some put them in Google Docs, in spreadsheets, GitHub wikis etc. So what we’ve found was that engineers didn’t even know where to start looking for documentation. There was no starting point. So during another one of those Hackweek projects, basically, a small team of a few engineers and some of our tech writers came together and looked at that problem. What they built was a plugin in Backstage… It was basically two parts to it. We adopted a docs-like-code approach, where engineers keep documentation together with their code; it’s really easy to basically keep your documentation up to date, because if you change your code, you can change the documentation in the same pull request, so you don’t have to go into a separate system to update the documentation as well, and so go out of the flow. Engineers really love that.
[50:07] The other dimension of solving this problem was that we have all these teams building documentation together with the code. What we did was to bring all that – we basically built documentation sites during the CI process, integrated nicely with our system, and then publish all that documentation in one central place in Backstage. So it’s kind of a really nice combination of engineers keep their existing workflows [unintelligible 00:50:35.14] while still making all the documentation centrally available to everyone in the company, in one place.
That plugin or that system that we call TechDocs I think was one of the most successful internal projects that we’ve ever done, ever rolled out. We didn’t have to do any marketing of it internally; engineers just loved it from day one, and it just had tremendous adoption.
Can you give us maybe a primer on the docs-like-code methodology? What does that mean? For those who are not familiar with that, what does that really mean?
It essentially means that you treat documentation as code. What we mean by that is you write your documentation as markdown files; you typically have documentation together with your code for your website or for your service or for your data pipeline. In the same repository you have a Docs folder, and in that Docs folder there are markdown files, basically, that are different chapters. And what happens then during the CI process is that we use an open source project called MkDocs to basically translate those markdown files into HTML/web content. It’s those resulting websites (the content) that we then make available, integrated in Backstage.
So let’s say that I go to a website or data pipeline - I can see all the characteristics of it, and then the documentation for that system is just one click away for the person who wants to consume it.
And it’s the same documentation that’s in your GitHub repo, so you don’t have things getting out of sync. When you go to a site, you’ll see that this documentation was updated eight minutes ago, or something like that. So you know it’s current. You’re not going “Is this the documentation, or is there something else?” And the adoption rate - I think the Thousandth Documented Component, which is the name of something in our software catalog, was in under six months, which was just the fastest level of adoption I’ve seen in 25 years of trying to figure out how to solve this.
What about contribution to that documentation? Is it only from an engineering side, where it happens in code only, or is there an opportunity for other folks in the squad that may not be so much into code? Is there a way from inside of Backstage to contribute, or is it just a one-way path, where you just extract via MkDocs?
Basically, when you read the documentation and you wanna make a contribution to it or edit it etc. there’s a simple click of a button and then you end up editing that file in GitHub (GitHub Enterprise, in our case). It’s still an engineer-focused experience of writing the code, so it’s not for everyone. It’s pretty opinionated in that sense, it’s primarily for engineers. But what is really cool, I think, is that once you treat documentation as code, there’s a bunch of other code-related tooling things that you can build on top. So the team that built this out - they have innovated a lot on top of the documentation.
One example is let’s say someone reads documentation and there’s an error in it, and you wanna change it. What you can do is you can highlight the documentation, and then a pop-up appears, and by clicking a button you create a GitHub issue. So the documentation problem is then treated as a bug for the owners of that, and they can triage the bugs, and squash those bugs as they would in any software code. And all of those things contributed to documentation being kept up to date. There’s a feedback loop between people who read the documentation and the ones who own it.
[54:26] And if you think about – you know, one of the mantras is “Fix bugs before you build new features.” If your doc’s out of date, someone flags it, it’s a quick bug fix from that perspective. This model has taken on so quickly that our architecture documentation, our expectations of teams for how they operate software, how we respond to incidents are now all in TechDocs as well. So it’s all in one place.
And Jim, you mentioned this was something that was adopted inside of an organization faster than you’ve ever seen in the last 20 years, or something like that. Can you quantify that some more?
Yes. Basically, whenever you do doc– and there’s been tons of documentation tools, and the OpenAPI standard I’ve used over the years, but you’re always nagging people; it’s like “Well, I wanna ship a feature, I don’t wanna stop and do docs” from this case. Once we rolled this out and we actually built a squad around this team, as Stefan was referencing, we rolled this out and started to see in our logs and in our dashboards it was basically an exponential line of going up. Within six months we had 1,000 different components, websites, microservices that were documented. As you clicked on each, you could see the documentation was up to date with the last build, which is amazing. I’ve never seen that even at startups, at scale, at companies that are compliance-oriented; you have to hire whole teams of people to go do documentation.
This was something - as we started to look at open source, and we were talking to other companies, it’s a problem in the open source world, of trying to keep documentation up to date. You’ll find a cool repo, and the readme, and you go to the docs page and it’s an empty folder, and there’s nothing to do… Then you’re looking around on Stack Overflow to try to figure out how to use it, and then you wind up going to some other open source projects.
This was something – as we open sourced this component, we had thousands of hits on our blog about it on Backstage, we had a demo video and an earlier demo video of Backstage in general on the Spotify R&D channel on YouTube, and people were bookmarking the minute and second in the video where we were showing docs, and going on LinkedIn and saying “Take a look at this.” So it is one of those just amazing adoptions, and it shows what comes out of Hackweek, and then what an ecosystem like Backstage can raise to the front.
And now you get it natively when you adopt Backstage open source as well. The software templates that we talked about before, they come of course pre-wired with all the documentation, so an engineer doesn’t have to do anything to get documentation. It’s just there, scaffolded and ready to go.
Super-cool. I see why the adoption was so high inside of Spotify, because you’re really making people’s jobs much easier to do. Instead of it being friction to getting your documentation written and read, it’s just right there along with everything else.
How about adoption outside of the org? It’s been open source for a little while now… You all have a vision for this, which I thought was interesting; very intentional with your open source. You actually have a vision for the project, which you state is to become the trusted standard toolbox for the open source infrastructure landscape… How’s it going? It’s been out there for a little while, we had some CNCF stuff going on… But it’s a big ask to say – it’s not like “Try my library.” It’s not like “Here’s a cool command line tool that you should try out.” It’s like “Hey, run your company around this piece of software.” So I’m curious about adoption, if there’s been any struggles or challenges you’ve had to overcome to get other organizations involved.
[57:56] I can speak to that. From day one, I think people were pretty excited about the vision. It looked like we had done our due diligence and talked to a bunch of companies in similar situations as Spotify, and we’ve kind of identified that this infrastructure fragmentation and proliferation of tooling is a problem that not only Spotify was challenged with… So from day one, we had fantastic reception. However, people didn’t really understand what it was. They bought into the vision and got excited about it, but then since Backstage is so many different things, and as you said, Jerod, it’s like a complex piece of software, we were kind of struggling to explain what it does. It’s a toolbox; it can be anything you want by this plugin framework, but too much choice and too much ambiguity - people need to get something that they can relate to.
So we had to write some blog posts about what this is really, and tried to do some videos demonstrating the longer-term vision of how it could look further down the line, because the open source project is just getting started, and there’s a long way to go until you get something that looks like what we have at Spotify.
So the main thing that we heard when we released it was “So where’s the software catalog, the service catalog?” That was the main thing that came off when we talked to a bunch of companies. So they saw our videos and we demonstrated how the service catalog and the central nervous system (as you said, Adam) was crucial to starting building your developer portal. And that wasn’t part of the initial release.
So I wouldn’t say we pivoted, but we kind of doubled down on building out that service catalog, and tried to as quickly as possible plug that gap in the story. So now when you have Backstage, there’s a service catalog and you get value out of the box, whereas when we launched Backstage, it was kind of like “Hey, here’s this open framework, and it can become anything.” People didn’t really understand that.
And we were lucky we had the videos, that we could show what we were doing internally. That was real software, it wasn’t mockware. Go ahead, Stefan…
Yeah, I was just about to say - once we shipped the first version of that service catalog, then we started to get real adoption. People started to use it, they started to set it up internally at their organizations, do internal demos, and some teams even started building a bunch of different plugins as part of kicking the tires and trying out the platform. So I think the pivot towards focusing on releasing that service catalog was the key enabler for companies to come onboard and start using it.
This reminds me a lot - bear with my analogy - it reminds me a little bit of the integration of SAP. I live in Houston, which is the energy capital pretty much of the U.S. at least, and a lot of people here work in oil and gas. And everybody uses SAP. I have friends who are just simply project managers of integrations, and they take a long time. So this is very similar, in the way that it has those benefits; it can run an org, essentially. It’s very much like that central nervous system. And I don’t know if it’d be of benefit to you, but maybe study the wrongs they’ve done in SAP to not integrate well. Maybe that might be a homework item for you all to not – or to avoid, I suppose… Because it has so much power, but only when done right.
I understand the value of it; as Jerod said, it’s a big ask to integrate this, and it can have such benefit, but it can be so massive, in both adoption, as well as its ability to progress an organization forward.
[01:02:04.03] Yeah, I’ve been on a few multi-hundred-million-dollar SAP implementations, and it’s a big investment, it’s a lot of change. What we’re trying to do with Backstage is make it easy, and start with one service, add the next… Stefan had mentioned in our product marketing team we’re trying to treat this as a true go-to-market. So when we got that feedback, we’re trying to make the templates the getting started pack, the documentation easier, so you can get it out of the box. You can even install it very quickly and get started, just like any open source.
As we’ve been doing that, as Stefan mentioned, our adoption has gone up. We’ve talked to over 200 companies now, we have 15 committed adopters on our list, that include some very interesting companies that have shared what they’re doing from that perspective. At this point, Stefan, how many external contributors do we now have?
Well over a hundred. I think we’re at a point where somewhere between 40% and 50% of all pull requests are coming from engineers outside of Spotify. I think we’re also, with this increased adoption now, what we’re seeing - which is tremendous - is that companies are putting together a team inside their company to be the Backstage team, to be the evangelist of their platform, and they’re starting to build their plugins, they’re starting to evangelize the platform internally.
While we had people coming in and helping and contributing to the project already from day one, we’re seeing now a shift in the kinds of contributions that we’re getting. When people’s jobs are actually to build Backstage and to be the Backstage team inside different organizations, they also share back substantial improvements to the platform, fully working plugins, and just like a complete and definite level of maturity when it comes to the contributions we’re getting now.
And to the question you asked earlier, about “Why open source, and what’s the return?”, 100 contributors is like 20 squads. So we have one squad on Backstage that’s working, and now we have 20 open source squads equivalents that are contributing software to us. So it’s a pretty good payback.
Have we mentioned being sandboxed with CNCF yet?
Not probably in a clear way.
Yeah, let’s maybe quantify that, because my question really is like, you know, now that you’re at this stage with the CNCF, what’s the support from them, what’s the inertia that they bring to the table to help adoption with this?
Our CNCF discussions started 2.5 years ago. We at Spotify are big fans of open source. We had aspirations to give back to the community and we were looking for the right projects. Backstage was - we’re hoping - the first of many that’s had community interest, community adoption, and many CNCF companies helped us shape the case, understand the offering.
A key step to going to CNCF is when we spoke to companies, especially bigger ones. They had a fear, it’s like “What if we adopt Backstage and Stefan wins the lottery and he retires?” [laughter]
I go play golf, yeah.
Or Spotify decides to turn it off… From that case, it’s like “What could happen to the code?” So basically, one of our guiding principles is by bringing it to the CNCF, it becomes a community project. It is bigger and broader than us or any other company. We’re showing it’s a permanent commitment; it’s going to continue to live and have contributions.
Sandbox is the first stage, but it’s a key milestone, because now it is officially part of the CNCF ecosystem. You can adopt it without fear that you’re picking up something that could be proprietary or eventually closed software. You have the appropriate licenses in place, so you know someone’s not gonna come along and say “You’re using software that has a copyright infringement on someone else’s intellectual property.” That would get you in trouble. So that means it can be picked up by banks, pharmaceutical companies, airlines, tech startups that get bought out and there’s no particular ownership or IP strain that could get you from that perspective.
[01:06:25.03] It also for us gives an endorsement of trust, of community, of adoption, that will lead other people and that will let us go to the next stage, which is incubation. With some of the large-scale adopters we have, as they move along, we move to incubation. The rate of change will slow, will go more to an incremental improvement. That now shows it’s an even bigger, more serious project, it will drive adoption, and our goal is to bring it to graduated status, just like projects like Envoy and Kubernetes and Prometheus… And then it truly does become that vision, that standard of developer portal for a great developer experience, that can take the CNCF technologies or what technologies you have… That’s the vision around them, and that was just announced this week; it was a very proud moment for us.
Congratulations. That’s an awesome achievement, for sure. As you were describing the kind of businesses that could use this, I was thinking of (I suppose) the way the world is now, and the fact that we live and run on top of software. So if something like this can be deployed and used as you’ve demonstrated with Spotify elsewhere in the world, how much better may be me getting on an airplane, feeling more confident that it’s not gonna crash, or more confident that I will actually have a seat, and I’ll get there… Or rinse and repeat - my bank is more stable, my money is more secure, whatever it might be.
As you draw back on the bigger picture of what open source is and why you give this back to the world, I think that’s it - if we can all live on and build better software, that’s a better thing for the world.
Yeah. And it’s better software, faster, you’re comfortable with compliance, you’re comfortable logging and monitoring and traceability… So if it’s an airplane, you know that they’re able to deploy software, and fixes, and manage reservations faster… And even during Covid, we had a company Weaveworks, that used Backstage to help get radiographs to doctors faster, which is just a really kind of inspiring use we never expected. So it’s all of those, giving back in wonderful cases, and we’re looking forward to hearing more.
Also, we have a lot of engineers that are working from home now, and stressful situations, and what Backstage ultimately does is improve the lives of engineers, and improves the developer experience. So if we can help people enjoy their work, enjoy their distributed work in a better way, that’s a win as well.
And especially if you’re new and you’re joining, and it’s like “Who owns this?”, you can actually find the library, you can find the owner, you can find the Slack channel, you can find the documentation… So that’s an important social connective tissue in bringing on new developers at this time.
[01:09:11.22] What is the best URL to share via audio for our audience to bake into their mind and check this out? Is it backstage.io, is that the place to go?
Yeah, that’s a great starting point.
Yeah. And you’ve got links to videos, the source code, everything there.
We also try to keep it very – we try to engage with the community. We have a Discord chat channel where everyone can pop in and ask questions. We’re really trying to encourage a vibrant community. We spend a lot of time thinking about how we can make it inclusive and how anyone can participate. If we come back to the vision part of making Backstage an ecosystem of tools - we, Spotify, cannot build that ourselves. So for Backstage to become the product that we want it to become, there needs to be a thriving ecosystem where many companies contribute, not only Spotify. Spotify is just one out of hundreds of companies contributing.
How active is the newsletter you have there, so we can tell the audience to check that out? Because if it’s active, they should subscribe to that and pay attention.
Absolutely. We have a fantastic marketing team that makes sure that communication goes out, and that it’s high-quality and relatable as well.
Very cool.
Awesome, guys.
Well, is there anything else we haven’t asked you, that you’re like “Man, I really wish Jerod and Adam asked us that question, or talked about that scenario”? What have we not asked you that you can share?
We’ve talked about the volume of our deployment, we’ve talked about how we went to market and the lessons we’ve learned, the [unintelligible 01:10:48.12] An inclusive community is key to us… So I think we’ve covered the major points.
We’ve covered it all?
Yeah, I think so.
Awesome.
We got it all, I think.
Well, fellas, thank you so much for your time, it’s been an awesome conversation. Thank you for (I suppose) the four years’ worth of work, plus all the effort into this, and the examples you’ve given from Spotify’s point of view in terms of how this helps engineering teams… And having that position of thinking that this should be open source versus a paid product to other organizations… That’s something we obviously enjoy as individuals and here at this show, because that gives more teams out there software they can use.
Thank you so much for your time today, it’s been awesome.
Thank you for giving us a chance to talk about this. It was a great conversation.
For sure. Thanks a lot.
Our transcripts are open source on GitHub. Improvements are welcome. 💚