Today we welcome Matt Klein into our Maintainer Spotlight. Matt is the creator of Envoy, born inside of Lyft. It’s an edge and service proxy designed for cloud-native applications. Envoy was unexpectedly popular, and completely changed the way Lyft considers what and how to open source. While Matt has had several opportunities to turn Envoy into a commercial open source company, he didn’t. In today’s conversation with Matt we learn why he chose a completely different path for the project.
Alright, we’re here with Matt Klein, creator and maintainer of Envoy, and a software engineer at Lyft. Matt, thanks for joining us on Maintainer Spotlight.
Thanks so much for having me.
We are very happy to have you. Envoy is quite a big deal, and it’s been going on for a while now, so the start of this show was when you announced your four years with Envoy. In fact, we’ve had a few people over the years, Dan Cohen being one of them, tell us “You’ve gotta have Matt Klein on the Changelog.” We had you on our shortlist of people, and when I saw that tweet, I was like “Oh, good timing. He’s excited, I’m excited, let’s get him on the show.” So here we are.
So Envoy - a very cool project, CNCF-graduated. Started out of Lyft, and then handed off to the CNCF… Tell us the story of what Envoy is, and then kind of the genesis of the project inside of Lyft, and then we’ll fast-forward to the present.
Sure, of course. Envoy is a software proxy, so it would be most similar to projects that folks have probably heard of, things like NGINX and HAProxy. Envoy started within Lyft, at this point amazingly five-and-a-half years ago. It’s been quite some time, it’s been quite the journey. Envoy was originally started to help Lyft through its microservice journey. So like many companies like Lyft - Lyft started with a monolithic architecture, and prior to my joining Lyft they had probably like 15-20 different microservices… And the microservice roll-out was not going super-well. They were facing a lot of the common problems that folks face when they roll out microservices, whether that be primarily networking, or observability, or just general system stability. So I was hired at Lyft, and at the time there were only 80 software engineers, so it was quite a different company than it is today… And I was tasked with helping them to figure out how to actually do this migration.
[04:40] Based on previous experience over at Twitter, and prior to Twitter I was at AWS, so I’d been working on distributed system networking for quite some time, I had some understanding of how people did microservice edge networking, how people did microservice service-to-service networking, and when I came to Lyft and saw all these common problems, I thought “There’s a better way that we can potentially do this.” So in hindsight, again, to be totally honest with you, for such a small company, I was - as the phrase goes - given a lot of rope to hang myself with. They allowed me to start this project, as opposed to using something like NGINX or HAProxy, and I was very clear about what the goals were, and I thought that we could do better than what the status quo was so far.
The very brief story of Envoy is we actually started as an edge proxy. Now it gets a lot of talk from a “service mesh” or a service-to-service perspective, but Envoy is also widely used in the industry as an API gateway. That’s actually where it started at Lyft. So we replaced our load balancers with Envoy, and we did that so that we could get better observability, better load balancing, better access logs, and that allowed us to start understanding how the traffic was flowing within the microservice architecture.
And then from there, we used to run HAProxy on our monolith, and we used that to do various things around MongoDB. We ran into various problems with HAProxy, and we implemented some MongoDB protocol parsing, and some rate-limiting, and various things, and that was really the beginning of the service mesh at Lyft, because we ran Envoy at the edge, and we ran Envoy on our monolith… And the rest is incremental history. It was a very pragmatic, incremental project. We went from empty files to first deployment probably in 3-4 months, and then from there we went from the edge to an entire service mesh, we added lots of features, and then by the beginning of 2016 Envoy was fully deployed at Lyft. All services were routed through Envoy, it was all client-side load balancing, we were fully deployed at the edge, and then obviously we open sourced towards the end of 2016. Happy to go into more detail there, but that’s the very brief origin story.
If you were to give everybody an idea of foundationally, fundamentally, what makes Envoy different than an NGINX used as a proxy, or as HAProxy, what is that?
I think the biggest technical difference - and there’s lots of non-technical differences that are very interesting to talk about - is that Envoy was created to deal with modern elastic… You know, people call it cloud-native architecture. So things are auto-scaling, you have container runtimes, things are coming and going, things are always failing. So Envoy was built to have an eventually consistent configuration system, and now we have a suite of APIs which we call xDS. Those are a set of APIs that allows Envoy to dynamically fetch things like route configuration, or listener configuration, or cluster configuration.
[08:05] So in a simple config, Envoy can be statically configured, just like HAProxy and NGINX, but really from the beginning Envoy was built to be able to change all of its config on the fly, without having to reload. And that’s a really big difference from the way that historically HAProxy and NGINX have typically done their own config.
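Editor's sketch: the idea of changing configuration on the fly, with an eventually consistent model, can be illustrated in a few lines of Python. This is not Envoy's code or the real xDS API. The names `Snapshot`, `DynamicConfig`, `apply`, and `endpoints` are hypothetical, and a management server pushing versioned updates is assumed rather than implemented. The point is only that the running process swaps in a newer snapshot and ignores stale ones, so it converges on the latest state without a reload.

```python
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """One immutable, versioned view of the configuration (hypothetical type)."""
    version: int
    clusters: dict  # cluster name -> tuple of "host:port" endpoints


class DynamicConfig:
    """Holds the current snapshot and swaps it in place when an update arrives."""

    def __init__(self):
        self._lock = threading.Lock()
        self._current = Snapshot(version=0, clusters={})

    def apply(self, update: Snapshot) -> bool:
        """Accept an update only if it is newer than the current snapshot.

        Stale or duplicate updates are ignored, so updates that arrive late
        or out of order still converge on the newest version.
        """
        with self._lock:
            if update.version <= self._current.version:
                return False
            self._current = update
            return True

    def endpoints(self, cluster: str) -> tuple:
        # Readers see a whole snapshot, never a half-applied update.
        return self._current.clusters.get(cluster, ())


if __name__ == "__main__":
    config = DynamicConfig()
    config.apply(Snapshot(1, {"users": ("10.0.0.1:8080",)}))
    config.apply(Snapshot(3, {"users": ("10.0.0.2:8080", "10.0.0.3:8080")}))
    config.apply(Snapshot(2, {"users": ("10.0.0.9:8080",)}))  # stale, ignored
    print(config.endpoints("users"))  # ('10.0.0.2:8080', '10.0.0.3:8080')
```

A statically configured proxy would correspond to building one `Snapshot` at startup and never calling `apply` again.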
Now, since Envoy has come out, both NGINX and HAProxy have had to evolve in this area, also to work with these more modern architectures… But I think that’s the biggest technical difference. The other difference is that Envoy was really built from the ground up to be extensible, and we’ve seen that from a technical perspective - now we have filters that do all types of different things, we have different metrics plugins, and stats plugins, and tracing plugins… So I think those are probably the two main technical differences.
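Editor's sketch: the extensibility described here, where filters are plugged in to do different things, follows a general pattern that can be shown in Python. This is an illustration of a filter chain in the abstract, not Envoy's filter interface. The filter names, the dictionary-based request, and the response strings are all assumptions made for the example.

```python
from typing import Callable, Dict, List, Optional

Request = Dict[str, str]
# A filter inspects or mutates the request. Returning a string stops the chain
# with that response; returning None passes control to the next filter.
Filter = Callable[[Request], Optional[str]]


def add_request_id(request: Request) -> Optional[str]:
    # A fixed ID keeps the sketch deterministic; a real proxy would generate one.
    request.setdefault("x-request-id", "req-1")
    return None


def require_auth(request: Request) -> Optional[str]:
    if "authorization" not in request:
        return "401 unauthorized"
    return None


def run_chain(filters: List[Filter], request: Request) -> str:
    """Run each filter in order until one responds or the chain is exhausted."""
    for f in filters:
        response = f(request)
        if response is not None:
            return response
    return "200 forwarded upstream"


if __name__ == "__main__":
    chain = [add_request_id, require_auth]
    print(run_chain(chain, {"path": "/rides"}))  # 401 unauthorized
    print(run_chain(chain, {"path": "/rides", "authorization": "token"}))  # 200 forwarded upstream
```

Adding a new behavior means appending another function to `chain`, which is the property that makes this kind of design easy to extend.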
Thank you, that’s very helpful. So you mentioned maybe 5.5 years since you started coding, four years was the milestone of the open source. Was open source in the to-do list from the start, or was that a conversation inside of Lyft?
Yeah, this is a very interesting topic… I don’t think it was in the plan from the start. I think that Envoy is an iteration of technical work that I had done previously. I think that a group of us came over to Lyft from back at Twitter, and I think that based on some of the proprietary technology that we had developed at Twitter we were gonna develop some of it again, hopefully better the second time, and I think a bunch of us had some thinking that “Why not put this out there? Why not open source it?” It’s not in Lyft’s primary business interest, it makes sense to put it out there to be a good open source steward.
Now, I think a lot of people ask me “Did you plan on open sourcing it from the very first line of code?” and I think I had that in the back of my mind, but to be perfectly honest, my goal for the first year was to satisfy Lyft’s requirements. And I think slightly side-stepping, I think one of the reasons that Envoy became so popular so quickly is that it was not a vendor project, it was a project created by an end user for an explicit business use case, and by the time that we open sourced Envoy – you know, when I started at Lyft, it was probably 80 or so developers; by the time that we open sourced it was at least several hundred. Now it’s over a thousand. But we have battle-tested this code not only from a stability perspective, but every feature that was added was added to satisfy some particular business case. So I think having that captive customer, really building the software so that it was easy to operate, it was reliable, because we were on call for it - I think it really shows.
So our focus for the first 6-9 months, or at least my primary focus, was I didn’t really think about open sourcing at all. It was all about satisfying Lyft. And I think when we got to spring or summer of 2016 and we had a little bit of breathing room, then we started to think about “Well, we’ve put in a massive amount of engineering effort here. This has been a great success for Lyft.” Looking around at our peer companies, they were obviously all solving very similar problems… It would stand to reason that these other organizations would probably benefit.
So it was really in the summer of 2016 that I think we started to have serious conversations about open sourcing. And I think what is so interesting about those conversations - and again, I’m just being really honest here - is in hindsight how naive those conversations were. Because my personal history is, you know, I’m historically mostly a proprietary developer.
[12:02] I’ve done some open source contributions, I think I have one commit in the Linux Kernel, and I’ve obviously used open source. But prior to Envoy I had very little to no active experience doing maintenance of open source. I didn’t really understand what it took to have a successful project. And I don’t think Lyft management really understood that either.
So we had our ideas of what open sourcing would look like, and I think based on that we made some relatively naive decisions around “We’ve done this. Let’s go and throw it out there, and let’s figure out what to do, and how to do it”, but to be honest, I learned on the fly, and it was a difficult time because of this naive approach to open source and not really understanding all the time commitment, and what would go into making it a successful project. I think that we didn’t plan very well for what it would mean if the project really became successful, and what the time commitment would be to actually making it successful.
So at the end of 2016 and the beginning of 2017 were some of the most difficult times in my professional career. I came very close to complete and total burnout, trying to work two jobs. I was effectively leading our networking team at Lyft, still satisfying feature requirements, leading our team, and then the increasing demands of open source. I think that a lot of people think that a lot more rigorous thinking went into the open source process, and the funny thing is that I think some people have asked “Well, with the success of Envoy, has Lyft done more open source?” And if anything, I think it’s the inverse. Because of the success of Envoy I think we have a better understanding now of what it takes and how much that time commitment is.
And I’m a big believer that with open source – I actually think that most open source projects are a net negative. And I mean that in the sense where if you open source something, and you throw it out there, and you don’t maintain it, and you get some interactions online, I think from a corporate perspective you’re probably gonna put out more effort than you’re going to get back. Very few projects become successful enough where you get enough outside development that justifies the open sourcing effort. And I think that the experience of Envoy at Lyft - and we can obviously talk more about this - is just that it is a lot of work to have a successful open source project.
So I think that experience, that initial striking gold, if you will, has led us to a much better understanding of what are good opportunities, do we think that we can actually succeed. And if we can succeed, then let’s obviously go for it; but if we don’t think that we’re gonna get a net positive out of it, I think that’s something that we would have to think a lot more carefully about.
What is it you think you were actually trying to do then, when you first open sourced it? Give us an idea of what the naivety was? Not so much the details to that, but when you thought about open sourcing Envoy, what were you hoping would happen, I suppose?
Yeah, that’s a fantastic question. It’s actually funny in hindsight, because in the summer of 2016 when we were prepping for open source, I think my personal dream, or even Lyft’s dream was that “Let’s get one of our peer companies, one of our unicorn peer companies - whether that be a Slack or a Stripe or a Square, or something along those lines - let’s get one of these internet companies excited about Envoy, and let’s get them using Envoy, and let’s validate that this makes sense.”
[15:46] And I think when I open sourced Envoy, when we did that, I think that was our goal. I was really hopeful that maybe we could get one company using it, and that would be an amazing success, and I would feel really fantastic about that. And really quickly, that became unrealistic, because what happened prior to open sourcing - again, this is going back to summer of 2016 - is we went and had some conversations with these companies. We talked to Airbnb, we talked to Slack, we talked to Stripe… And everyone we talked to, we showed them the code, we showed them the documentation, they were all excited about it, they all had similar problems… But in hindsight it’s very obvious. Because if you go to a company that has a highly dynamic internet architecture, and they have three people running their networking or their services team, they’re not gonna swap out what they have with Envoy. That’s just crazy. They’re just not gonna do it.
So it became a little – I don’t wanna say sad, but disappointing, because it became clear that prior to open source it was a bit of a leap of faith, because we weren’t gonna get another company to be an early adopter. And what happened right after open sourcing - I mean, this is literally within a week or two of open sourcing - Google shows up, Apple shows up, Microsoft shows up. And that was what was shocking to me - I think that these companies who have been dealing with larger problems at scale, and might have had some frustrations with existing solutions, they saw what we were doing with Envoy, they saw the feature set, they saw the non-vendor commitment to open source, they saw a bunch of these things. So what ended up happening is that we got these big companies first. And that was super, super-surprising to me.
What happened over the course of 2017, as we got all these big companies, which really super-charged the overall project - we got lots of developers, lots of investment… And then by the end of 2017 we started to see massive adoption from these Slacks, and the Stripes, and those types of places. And I think that was not what we ended up expecting… But that’s just how these things evolve.
From a success perspective, if you were to ask me, again, what was my goal, and has that goal actually been satisfied – I mean, never in my wildest dreams would I have imagined that we would have the project that we have today. It’s just such an incredible success; it’s a once-in-a-lifetime thing.
And because the lines of investment can be blurred sometimes with open source, can you define what you mean by investment? You mentioned a lot of input, but also a lot of investment. Can you clarify that investment statement?
Investment comes in many forms. It comes from obviously code - you want people to write code and write features, and that’s the most obvious investment that people think of… But the reality is that – and this is the thing that I tell people from an open sourcing perspective… Starting a successful open source project is no different from starting an actual company. It requires a lot of things; it requires engineering, it requires PR, it requires marketing… It requires what I call hiring in terms of finding maintainers, finding contributors.
So investment is all things. It’s not just code. It’s people writing blog posts, it’s other people going and talking at conferences, and talking at meetups. So I think what we saw initially from the Istio perspective - that was a Google project that came on fairly early - is that was a huge bump to the project, because it was not just code. They had a lot of people going out and talking at conferences, and building hype.
So I think that those were the types of things that I’m thinking about. It’s documentation, it’s blog posts, it’s build tooling… All of the things that make something successful, that’s not just writing features.
I didn’t wanna not acknowledge that, but I did have that question about investment… So that is super-huge.
Yeah… Look, I don’t wanna say that this is the most important thing that I will do in my career, but it’s going to be hard to top this. I mean, that’s just the reality. This has been a monumental success, and it’s a combination of things. It’s not that I didn’t work hard, to the point of, as I said before, almost complete burnout, but there’s a lot of luck here. It’s being in the right place, at the right time, it’s finding the right partners, having the right opportunities… So I think like all massive successes, it’s gonna be a combination of luck and execution.
This is just one of those things that just came together really well.
Yeah. So what happens with big successes is they often generate more success. One thing that happened to you early on, especially when your end users of a project are the likes of Apple and Google and the like - more opportunities come. And something that seemed to happen to you early on was like a lot of investment opportunities… It seemed like there’s a lot of money-making opportunity right here at the front door, right?
Because you could start a company around this thing, and many have, and some have succeeded, some have failed… But it’s commonplace now - especially in the cloud-native space, where there’s lots of problems to solve and lots of money to spend - to start a platform company. And you came out against that in 2017, on a very interesting post. I’m sure you’ve put a lot of thought into that… And we’re a few years away from it, so I thought it may be a good time to reflect why you’re not gonna do that. You’re not gonna start a platform company. What’s your thoughts on that?
It’s actually funny that you bring that up, because I re-read that blog post recently for the first time in a while… And there’s a few things that actually struck me. One of them is there’s a quote in there where – this is in the early 2017 timeframe, and I think a lot of the venture capitalists who were pushing me to start a company, many of them said to me that Envoy will never be a successful open source project, unless I start a company. And I had written in that blog post that I thought that that was not true. And reading back through that blog post makes me so happy to know how right I was, and how wrong all of those VCs were about that particular statement. And I think at least for the first several years with the project having it be not vendor-driven, having it be community-first, technology-first, we’re gonna make technology-first decisions - I think that’s one of the things that made the project so absolutely successful, is that people never had to worry about the fact that we’re gonna deny features because we’re gonna have some paid premium product, or something like that.
And the other part of it that I thought was interesting is that I had stipulated or I had theorized in that blog post that the way potentially to make money on something like Envoy is not necessarily going to be in the service mesh domain. And I think that has also proven to be true, in the sense that if you look at some of the companies who were within this ecosystem, I think that these companies are actually going through and I think they’re building some interesting services and support businesses… But I think that the pain of what I had written about in that blog post, of it being a very difficult problem to take something like Envoy, which is a data plane that you have to actually sit there and you have to deploy it, whether it be with Consul, or Kubernetes, or virtual machines… It’s a very painful DevOps experience; I think that holds true.
And the other thing that I said in that blog post is that I do think that there’s a lot of interesting things that can be done in the API gateway space, and I think we are seeing that happen as well. So if you look at what we’re doing with Envoy Mobile, of running Envoy on the iOS and Android devices… I think there’s a lot of interesting things happening in the edge computing space, lots of interesting things happening in the security space…
[23:55] But at the end of the day, I do not regret my decision at all, just because I think that my involvement in the project, just being where I am and not working for a vendor has led to an overall nurturing. And I think that if I were working for a vendor – it’s probably less important now, because the project is so established… But in the beginning, being able to be neutral and actually nurture that project without people having to worry about what my motivations were, I think it’s fairly important.
And again, this is me talking about the luck, and being in the right place, and all of these things… I’m not poor. I’ve been a very privileged person to work at a bunch of pre-IPO companies, and it’s not like I’m hurting for money. So for me, I think having the privilege to not have to worry about those things to a certain extent, and to focus on the technology, again, has been one of the things that has allowed the project to grow a lot.
One thing you said in that post which echoes a lot of the sentiments you’re saying here, and is another angle to that, is at the end of the day when you looked at the different types of businesses that would most likely be successful around Envoy, you said “None of these businesses are technologically or personally interesting to me.” Now, of course, you are in the position where you can say “Yeah, I’ll pass on that. Not all of us are in that position”, like you acknowledged there. But I think that takes a certain amount of maturity and reflection, to stop yourself there when the huge opportunity and the money is there, and VCs are metaphorically throwing money at you, to say “That’s not really the life that I want though. That’s not where I’m interested. I wanna stay just writing (not “just”, but you know) running the software, writing the software. That’s what interests me.” That takes some maturity.
I think that when I talk to people about their careers, I think one of the things I typically say to people is that we individually choose our path, and that path is gonna be based on lots of different things. It can be based on what technology we’re working on, how much money we make, where we’re working, who we work with - all of these things. There’s 10 or 15 different things or more that people are gonna care about, and each individual chooses their own path. It’s not black and white, it’s shades of grey, and everyone figures out what the right balance is of all of these factors.
For me, I’m not gonna say that I’ve never been motivated by money, or I’ve never been motivated by title, or all of these things, because all of that would be lying. But as we choose our path through our careers, I think at the time that Envoy happened, I just tend to bias for impact. I tend to bias for solving big problems, and having the large impact that I can have, and that’s what personally drives me.
So from an open source perspective, the thing that I have found honestly most gratifying is the impact. It’s the ability to see the software be so widely deployed, and have it now – I mean, now it literally blows my mind to see all of the cases and all of the companies and all of the organizations that are using this thing, that it just continues to drive me… Because I think that this is really fantastic.
So you mentioned how you were a bit naive when you open sourced this; you didn’t expect maybe the success, you didn’t know what all goes into running a successful open source project, building a community… One thing I am impressed by is Envoy’s community; it seems very robust. The fact that – I have just found out about your EnvoyCon, and I was like “Cool! They’re having a con. They must have arrived.” And then I went to the web page and it’s like, this is the third one… And I was like “Oh, boy. Where have I been…?” So… Very successful in that regard.
What about some of the struggles along the way? Because you mentioned the burnout phase… You’re still here, so you must not have burned out, so you must have learned a few things… Maybe you can share some insights, and some struggles, for other people who have open source projects, are trying to maintain them, or trying to maybe even build the kind of hype that you got for Envoy.
Yeah, I think when we go back to 2016 and early 2017, I think the biggest initial learning was needing to have really open and honest conversations with my actual employer. Because as I said before, we open sourced from a fairly naive perspective of what the time commitment would be, what the requirements would be in terms of actually making it successful… And from an open source perspective, like I was saying, in a perfect world the few successful projects are where you end up getting more back than you actually put in. That’s what I mean by a net negative or a net positive.
And Envoy, by any definition, in the last four years has been extremely net positive. We have probably 30+ people that work on Envoy full-time around the industry. There’s probably one, or one-and-a-half at Lyft. So it’s just an incredible success in terms of getting that amount of resources. But in the beginning, those resources didn’t exist. We had to seed that. And a lot of that was done by me going out and speaking at conferences, and working on documentation, and doing a bunch of things external to the company that if we were successful would reap benefits. But those are delayed benefits, right? So for the first six or nine months I just didn’t know enough to know how to talk to management at Lyft about what would be reasonable in terms of time commitment. Like, how much time should I be spending doing this, versus working internally?
And from talking to other people, particularly people that make projects, and open source them, and go through this same type of transition, I think this is very common. And this is the thing that I really counsel people on - you really have to have these open dialogues. Because in 2017 I went through some pretty difficult times. It wasn’t just about burnout, it was about honestly differing expectations. I’m a senior engineer at Lyft. From a senior engineer at a company like Lyft, I obviously have a bunch of different things that I’m supposed to be doing, from mentoring junior engineers, to helping fix issues, to design reviews, to all of these things… And there’s only so many hours in the day. So I think particularly during that 2017 timeframe I went through some difficult times, where I don’t know that I was doing my Lyft work to the extent that Lyft maybe expected me to do it. That led to some difficult times, and I was making trade-offs between spending time on open source work, versus internal work.
[32:14] I made trade-offs in favor of the open source work. And I don’t regret those decisions, but I think if we had had some more open conversations, some more reasonable expectations about time, I think – that’s probably the first major issue.
So I think that as we went through this process, trying to bridge that gap for, say – let’s call the first year the investment, before we started really reaping those dividends. I think that that was a very difficult year, of trying to figure out how much work was okay to be doing outside of the work that directly impacted Lyft before we started seeing those returns.
So for anybody else – I mean, I guess you had to be in that position where you are building a project that either is going to be open source, or is open source, inside of an organization like Lyft, where you really wanna have those conversations right up front.
And this is exactly why I said that I think this is counter-intuitive to most people… That because of the success of Envoy I think we are more rigorous around open sourcing now. We are much less naive about it, and I force people to go through this thought exercise of “Are we in it to win it?” Again, it’s no different than starting a company - why are we doing this? What are the goals? How are we gonna get this to be net positive? And if we’re in it to win it, these are the things that we need to do. And then I think we can have a more honest business conversation about “Is this worthwhile?” I think many people don’t go through this thought exercise probably because they are naive like me, and they just don’t know any better. But it’s really important to think through very carefully what is gonna be required. And this is often not technical, it’s not writing code.
For me, I think it was clear to me from the beginning – you talked about this is our third EnvoyCon, and it is. We have two tracks this year, and it’s virtual… It’s amazing that we can get this type of interest… But building that community was so critical. And I spent a tremendous amount of effort early on, making sure that the communication style of the project was super-welcoming.
It’s funny, I think a lot of people talk to me, and they think that probably the thing that I am most proud of is the massive deployment of Envoy. And look, I am super, super proud of it. But if you were to actually ask me, if I were to pick one thing that I am most proud of, it is actually about community. And I have been told by a non-trivial number of people that they had sworn off open source, particularly infrastructure open source, because the communities were toxic, and people were mean etc, and that they love contributing to Envoy… Because of the welcoming community, and the communication style, and all of these things. Some of these people that have told me this, they are epic contributors. And thinking about the fact that they had thought that open source was garbage, or that everyone was mean, and that they weren’t contributing - it makes me sad.
So if you were to ask me again, “What excites me the most?” It’s the fact that we’ve been able to build this community across competing companies, and I think that’s just really fantastic, and it makes me feel great. It makes me feel like we are doing good work across the board, meaning we’re building great technology, we’re building it with a group of people who are both satisfying their corporate concerns, but also getting enjoyment out of contributing to open source, and that just really excites me.
[36:02] This open dialogue you mentioned you have with your employer - Lyft, in this example - did it involve things like IP, or control, or ownership? I know it’s open source, of course, and there’s a license involved, but… Intellectual property, things like that. Are these things you sort of guard against, or consult against, as you mentioned?
These are things that people certainly have to think about. In the case of Envoy it was a little less complicated, just in the sense that a) it would be difficult to file patents on Envoy specifically, just because Envoy is – from a technology perspective there’s not a lot novel in Envoy. There’s no one component in Envoy that is super-novel. The composition is actually quite novel. But from a patent portfolio perspective, not being [unintelligible 00:36:50.24] primary business, I don’t think the IP concern was a large concern, as opposed to if we had open sourced some mapping software, or something like that.
I think where it gets more interesting - and then obviously, we had open sourced using the Apache license, and that’s a very permissive license… What gets more interesting - and this is something that I think we’ve had more conversations on since we open sourced Envoy - is that when projects move into foundations, what a lot of people don’t understand… And it certainly depends on the foundation - but most software that moves into a foundation, the license had already been super-permissive. So it’s like Lyft – Envoy was already Apache 2 licensed; anyone can take the code, they can use it for basically any purpose.
What actually moved into the CNCF was the Envoy trademark. So Lyft lost the rights to use the name Envoy in a theoretical future project. And I think this is really misunderstood around the industry. I think a lot of people think about foundations as like this donation concept, or something along those lines… And it’s not really a donation. It’s more of a transfer. And it’s not even a transfer of IP, it’s typically a transfer of trademarks.
So it gets tricky there if you’re a company - and we see this more on the vendor side. Frankly, if you’re a vendor - particularly if you are a VC-backed vendor, you’re typically using open source in order to gain traction. It’s like you’re gonna be trying to figure out some type of business model, whether it be open core, or some type of SaaS service, or something along those lines… And you’re open sourcing not for the greater good of humanity, you’re open sourcing because it’s a good thing for your business. And where it gets tricky for those companies is that the company and the open source are typically intertwined from a trademark perspective. And trying to figure out how to disentangle the company from the trademark and potentially move the trademark over to the foundation - that can get quite tricky.
From a Lyft perspective, because Envoy is not really our business, and because we didn’t really have any plans to do anything with that mark, that was not something that we thought about a ton.
What about operationally? Let’s just operationally think about that process in terms of - what if you tomorrow get an offer from Google that you can’t refuse, and you decide “I’m gonna go work for Google”? What changes in Envoy’s life, what changes in Lyft’s life? We know what changes in your life - you switch jobs… Anything? Is it completely separate? Does that suck for Lyft, or does it not matter? …besides losing a great employee, of course.
Sure. I think that if we’re speaking honestly about it, I think it would have mattered more a few years ago, when the project was a lot less mature… Meaning, if I had gone and worked at Google three years ago – I mean, Google already are an incredible portion of the project. They have amazing engineers, they do amazing work. And I think if I had moved to Google three years ago, it would have been very clear that the project at that point was effectively owned by Google.
[40:10] And I actually believe - and again, I can’t actually prove this, and I’m not saying anything bad about Google; again, I wanna be clear - they have been absolutely fantastic partners, and done an incredible amount of work for Envoy… But I think that if I had moved to Google three years ago, the weight of the project early on would have been Google.
And I think that if you look at some of the other people that are making bets on Envoy - some of the major cloud providers, whether that be Microsoft, or Amazon, or VMware - I think it might have made them a bit less likely. I’m not saying that it wouldn’t have happened. It might have still happened, but I think it would have made it less likely.
And similarly to Lyft, as I mentioned to you, three years ago I was a lot more involved in Lyft, running the networking team, and being a bit more on-the-ground than I frankly am today. Fast-forward three years to now, my time at Lyft is probably split about 50/50. I spend about 50% of my time doing infrastructure leadership, but it’s a bit more high-level. And then I spend about 50% to 60% of my time doing open source leadership. But the project is so much more mature that if I were to go work at one of the cloud vendors, so much adoption has already happened that I don’t know that it would change the trajectory. It’s not like companies are gonna abandon their use of Envoy if I went and worked at one of them. Now, it’s a separate question as to whether I would actually do that, but I think that – and this actually comes back to what you were asking before, about not actually starting a company… It’s actually the same answer.
If I were to go and start a company today, I don’t know that it would substantially change the project… Meaning the project is so established at this point that if I were to go and start a company - sure, people might be a bit more wary in terms of actually telling me things, because I might be a competitor, but I don’t think it would change the project, because the adoption has been so huge, and people aren’t going to abandon it because I decide to go to work for one of the cloud vendors, or I decide to go start a company.
So if we could loop back around to the community discussion, the thing that you’re most proud of is the community. If you could give us and the listeners tips, how you did it… If you could say “This one thing”, or “These five things were the most impactful to building that community”, what would they be?
First, I will plug an OSCON talk that I did - I would encourage people to go look it up - where I gave an entire talk on this topic, just because it’s a topic that is so important to me. But I think the first thing that I did, which I already alluded to, is – and this sounds so obvious; when I say this, it sounds so obvious, but I think a lot of people don’t do it… It’s just critical to be welcoming and to be nice to people, particularly early-on in a project lifecycle. And I’ve really come to the realization through this process - and you can look at other open source communities and see how they evolve… But humans have a herd mentality, and by herd mentality I mean that when a precedent is set, human behavior typically follows that precedent. So the seeds of how people treat each other, the seeds of the process - they are really set by the people that form the core. And again, it’s no different – there’s open source, there’s corporate cultures… I mean, you see this all throughout human evolution or human history, how these communities get started - it is easier to follow the herd, it’s much harder to be an outlier.
[43:52] So if you start a project and you’re a mean a*****e to everyone, then people are gonna join the project and they’re all gonna be mean a******s, because that is the way that it’s done. And without naming names, we have seen that in some very notable open source projects. And I think I knew that I did not want to do that, and I knew how important it was to set that example from the get-go. And that ranges from taking meetings with people, trying to help them with their early deployments, to just being nice to people on GitHub, appreciating them for their work and all of this stuff that, again, it sounds obvious, but people don’t do it.
And I think that early on, setting those examples, the people that became maintainers - they followed that example, and then we have succeeded in building a set of maintainers and contributors that all act in this way. And I really do think that in a short period of time – I mean, I looked the other day, and at least on GitHub, just from a code contribution perspective, we’re almost at 600 contributors… Which for a fairly low-level – you know, C++; it’s not Go. It’s a pretty low-level project, so that is freakin’ incredible. It just blows my mind. And then you look at all of the vertical products and all these other things that are built on top.
Again, there’s a lot of reasons that people have done it, but I think that a big part of it is that we actually went through and we built a seed from the beginning of welcoming contributions, and making the community very important, and being nice to each other… And then I think those seeds spread.
Well, beyond anything else, I always tell people that just being nice and welcoming and seeding that community and having basic human decency is by far the most important thing. Beyond being nice - I mean, I think there are some other aspects, which come back to some of the things that I was saying, around a lot of… There’s a lot of non-technical reasons that Envoy succeeded. I invested a lot early on in documentation. I’m a pretty decent writer. So it’s like making sure that the documentation makes sense, and having some blog posts, and going and talking at conferences… And then, again, having people value quality documentation is something that I think is really important.
And basic things… Like, when we launched, we had a nice website with a logo, and it looked professionally done. I think a lot of – not a lot, but I think some people scoff at this idea that you can fake your way through open source, or you can have a very poor technical product, but you can have a great website and great documentation and all these things, then you’ll become a successful project. I take a more pragmatic view, which is that - as I said before - it’s like starting a company. There’s eight different things, that range from technical, to documentation, to PR, to marketing, to HR… And you have to win at all of them.
So to me, it’s not an and/or, it’s a both. And I think that’s what became really clear to me very early on, is that we need to invest in all of these things. And if you were just to sum up – I think some people ask me “What have you learned in the last 4+ years of open source?” And if I have to be honest with you, I have learned little to nothing technically. I worked on these proxy systems for a long time, and sure, I’ve learned a few things here and there, but I don’t even do that much coding anymore. I have learned so much about community building, open source leadership, and leading in a sense where – you know, open source is fascinating, because it’s basically anarchy. None of these people work for me, so it’s about building coalitions, negotiating, making sure that people are happy… And there’s been so many learnings here that have been really incredible.
It’s a shame that the bar is that low though, just to be nice… You know, get your success. I mean, like you said, it seems logical, but… To just be nice, or to be kind, or to be welcoming.
Yeah, it’s kind of crazy that it’s like that. That that’s your profound a-ha moment, or a-ha thing from this conversation or your journey… That that’s the thing, just be nice and welcoming and inclusive.
Yeah, I think when you say it like that, it sounds pretty bad. But I think you just have to look around the industry, and look at other examples of open source to realize that it’s clearly not that obvious, because it’s not done, in so many different cases.
You have to value it, for one, and then you have to be good at it, too. You have to value communication, and good communication, and human behavior, and being kind; you have to value that. But then you have to be good at executing on it, which is, I suppose, being talented, or having some sort of desire to skill build on the human front. Taking care of people, treating people well. It’s not easy…
And the time, as well. You talked about how much time you put in early. A lot of that is – it requires the time. Being nice sometimes takes longer than being not nice, right?
And look, I’ve been in this industry for over 20 years, and I have quite honestly made my fair share of mistakes around not being nice, or failing for non-technical reasons. In fact, my career is littered with non-technical failures. I don’t think I’ve actually ever had a technical failure. The technical stuff has never been super-difficult for me. It’s always been the non-technical side of things.
So my career has been a long list of failures in those areas, and learning from those failures. So some of what I brought to Envoy is an understanding of how important these things are. And like I was saying before, even within those confines, it was not without making mistakes. There were some difficult interactions at Lyft, partly because I was stressed, or I didn’t have enough time to devote to what I was supposed to be doing there… And you know, given what has happened with Envoy, life is full of trade-offs, and I don’t know that I would make different decisions today. Knowing what I know now, I’d probably handle it better… But that’s life. We learn these things as we go along. So I think that’s just the nature of things.
So you’ve got your third EnvoyCon upcoming… Surely, the community will gather, there will be things old and new shared there… What do the present and the future for Envoy, and for the community, and for Matt look like?
Sure. I’m very often asked – you didn’t ask this, but I’m very often asked this question about “What is Envoy’s roadmap?” Because I think people these days have an expectation, partly through vendor-driven open source, that there’s a roadmap, there’s a product manager. And as a community-driven, technology-first project, we don’t have a roadmap. I vaguely know what people are working on, but things happen incrementally, people add features as they need them in their deployments or the products…
So when it comes to EnvoyCon, I think it’s not like we’re announcing a version for this conference, or we’re announcing some big bang feature; I think we’re just seeing the iteration year-over-year of the project becoming more mature… And at this conference we have great talks, that range from large-scale control plane deployments, to large-scale API gateway edge deployments… WebAssembly I think is going to be huge; that is the biggest thing that I think is happening in Envoy right now. For those that don’t know, this is basically embedding a WebAssembly runtime within Envoy, and being able to run extensions there.
[51:50] Historically, the way that people have extended Envoy from a code perspective has been to write Lua in a very limited set of cases, and mostly C++. And apart from C++ being difficult for most developers, the main issue beyond the C++ is that just because of the way these extensions are linked into Envoy, they have to be compiled together… It’s a very painful thing. So WebAssembly is really an amazing ability for us to have a stable API, a stable ABI, allow people to write extensions in Rust, or C++, or Go, or TypeScript, or whatever they want, and have that be separate from Envoy.
So I think WebAssembly is going to be huge for Envoy, both from an edge computing standpoint, security standpoint, observability standpoint… I think it’s just – we have so many extensions today, but it’s just gonna super-charge the ability for people to write further extensions for Envoy. So we have a bunch of talks about WebAssembly.
The security investment in Envoy has been huge, mostly from Google. They’ve done a fantastic job. And that’s not just on fixing bugs, or filing CVEs. For example, there’s people at Google now that are working on software supply chains… And again, for those that don’t know what a software supply chain is, it’s this idea that we have Envoy, and Envoy is hundreds of thousands of lines of code. But Envoy depends on millions of lines of libraries. And if you don’t understand where those libraries come from and what their CVE process is, and what their maintenance process is, you can wind up in a lot of trouble. And this is not a problem just for Envoy. This is a huge problem across the industry, that people don’t have a great understanding of software provenance and what all the dependency chain is.
So we have some people that are working on some really interesting stuff around better tooling, making sure that we track our dependencies, various other security concepts, various other dev-focused talks… So I am obviously – I’m actually not a huge conference fan, but when it comes to EnvoyCon, for obvious reasons, I absolutely love it. So in this year that we’re in, I am sad that we are not gonna meet in-person. It really does make me sad, because for the last couple of years I have loved seeing people in person. So I am bummed that we are doing this virtually, but I’m still excited for the talk line-up.
We’ll be sad with you then, because we miss several conferences this year that we totally love, that are staples for us, OSCON being one, for example… And not being able to go there and high-five and hug the people we see each year is a real bummer. Back to the human connection thing - there’s nothing that replaces a good face-to-face with somebody. Being separated is so difficult, especially when community is such a crucial aspect to the enjoyment; not just the success, but the enjoyment.
And from the open source side of things even more, where these are people that I talk to on a daily basis. We interact so much, and I’ve become relatively close to them, at least from a work perspective… And I only get to see them typically once a year. So it’s like – just from that perspective, I think I do miss actually seeing people a lot.
Yeah. Matt, any closing thoughts to share? You’re a maintainer of an awesome project. Obviously, you’ve shared plenty, but is there anything left that we haven’t asked you, that you’re like “Man, I’ve gotta share this.”
I think the only thing that I would share is I think as an industry this is a topic that we need to talk about more. Just because I think my experience of trying to balance my corporate work with the open source work, and just the general topic of – open source is not what it was 30 years ago. It’s mostly not a bunch of hobbyists, meaning the major projects that we use today - they’re developed by professionally paid programmers that have lots of job opportunities… And trying to figure out “How do we compensate these people? How do we make it so they don’t burn out? What are the right structures?” is something that we do not understand from an industry perspective.
[56:01] I think that we see a lot of burnout among maintainers, and we don’t have a good idea of how to pay for this work… And I don’t have any solutions. If I had any solutions, I would have already implemented them, and I think this is a really thorny problem. So I think the only thing that I can say is that I think we have to have more conversations like this, where we actually talk about these issues. We talk about funding, and burnout, and toil, and just how much work is involved, so that people better understand how the software is actually made.
And I come back to the conference talk that I was talking about before, the one that I gave at OSCON… And I always chuckle, because I don’t give as many conference talks as I used to, but I used to go to a conference, and I would give a talk on Envoy, a technical talk, and I would get several hundred people in the audience. Everyone wants to come and learn about the technology. And I went to OSCON and I gave a talk about the open source sausage-making, and 20 people showed up. And it’s very indicative to me that people still - rightly so, they care about the technology. They want software that they can use, but they’re not really investing in understanding how the software is made. And for such critical infrastructure, I think it’s in our interest to better understand how it’s actually made.
Well, you’re here with kindred spirits, because we care, and that’s why we have this show, this flavor of The Changelog called Maintainer Spotlight. We love digging in to kind of peel back the layers of a developer’s/maintainer’s lifestyle, the choices they make, why they make them, why they didn’t or did build a company around an open source project, whether it’s even okay to make money from open source… There’s many different subject matters we’ve covered. We’ll link them up in the show notes. Check out the topic “Maintainer Spotlight” on our website.
Matt, thank you so much for sharing our desire to discuss this topic, and for talking through it with us, because we think it’s very important.
Thank you for having me, this is fantastic.
Our transcripts are open source on GitHub. Improvements are welcome. 💚