Open Source, Then and Now (Part 2) with Karl Fogel, author of Producing Open Source Software (Request For Commits #2)

Nadia Eghbal and Mikeal Rogers kick off Season 1 of Request For Commits with a two part conversation with Karl Fogel — a software developer who has been active in open source since its inception.

65 minutes
Recorded Aug 4, 2016
Published Aug 4, 2016

Featuring

Karl Fogel
Nadia Eghbal
Mikeal Rogers

Notes & Links

Karl served on the board of the Open Source Initiative, which coined the term “open source”, and helped write Subversion, a popular version control system that predates Git. Karl also wrote a popular book on managing open source projects called Producing Open Source Software. He’s currently a partner at Open Tech Strategies, a firm that helps major organizations use open source to achieve their goals.

Read Karl Fogel’s book — Producing Open Source Software
Make sure you start with part 1 with Karl Fogel where we kick off this conversation.

Transcript

I’m Nadia Eghbal.

And I’m Mikeal Rogers.

On today’s show, Mikeal and I continue with part two of our conversation with Karl Fogel, author of Producing Open Source Software. Karl served on the board of the Open Source Initiative, which coined the term ‘open source’, and helped write Subversion. He’s currently a partner at Open Tech Strategies, helping major organizations use open source to achieve their goals.

Our focus on today’s episode with Karl was around shifts in open source communities and governance. We talked about the role of casual contributors in a world of rising open source activity, and how different projects handle this increasing demand.

We also talked about cultural gaps between generations of open source and where it might all come together in the future. If you missed our first show with Karl, make sure you go back and listen to part one of this interview first.

So Karl, there are an order of magnitude more open source projects today than there were even ten years ago, when you wrote the first edition of your book. There are also more people learning to code than ever before. How has this difference in the scale of projects and resources changed the open source landscape?

Oh, what a big question. The part that I notice - and this is gonna sound very curmudgeonly old man-ish - is that it’s no longer possible to know everyone in open source. Of course, nobody ever knew everyone in open source, but you sort of at least knew their names or you were at most one degree of introduction removed from whoever was running that project over there, and that’s just no longer true. Open source is just this gigantic, teeming world of people, most of whom you will never meet, and that feels different - I like it, it’s like wandering out into a real world, instead of a clubby little tribe, but you have to sort of adjust and realize that you might as well throw away your Rolodex because you’re just never gonna meet everyone at this point.

So that’s a very personal answer, like how does it feel to me. How does it affect the dynamics of the ecosystem? Or is it even meaningful to say THE ecosystem as one unified thing now…? It’s funny, the effect it seems to have had is actually that it’s forced a greater uniformity and standardization of processes. It’s just like light sockets - once every building gets electrified, you really have to have a standard socket size; you’re going to use your devices in every building now, so you have to be able to plug in everywhere, and it’s the same… People need to be able to wander from project to project and just get stuff done, so they’re gonna look for a file named ‘readme’ and maybe it’s gonna have a .md extension. They’re gonna look for a file named ‘license’, they’re going to submit pull requests the same way, they’re going to look for a contributor’s guide, and it’s usually gonna be ‘contributing’ in the top level of the project tree, things like that. Those standards have become more important as the number of people involved and the diversity of types of projects and of levels of skill of programmer have changed, and I think that’s great.

Anything that can make open source more accessible to programmers of all types and all skill levels is a good thing, in my opinion.
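As an illustration of the file conventions Karl lists, here is a minimal Python sketch that checks a project's top-level directory for them. The function name `find_conventional_files` and the case-insensitive matching on the part of the name before the first dot are assumptions made for this example; they are not taken from the conversation or from any particular tool.

```python
from pathlib import Path
import tempfile

# Top-level file names mentioned in the conversation, lowercased and
# without extension. Treating them this way is an assumption of this sketch.
CONVENTIONAL_STEMS = ("readme", "license", "contributing")


def find_conventional_files(project_root):
    """Map each conventional stem to the first matching top-level file name, or None."""
    found = {stem: None for stem in CONVENTIONAL_STEMS}
    for entry in sorted(Path(project_root).iterdir()):
        if not entry.is_file():
            continue  # only top-level files count, not directories
        # "README.md" -> "readme", "LICENSE" -> "license"
        stem = entry.name.split(".", 1)[0].lower()
        if stem in found and found[stem] is None:
            found[stem] = entry.name
    return found


if __name__ == "__main__":
    # Build a throwaway project tree so the example runs anywhere.
    with tempfile.TemporaryDirectory() as root:
        (Path(root) / "README.md").write_text("# demo\n")
        (Path(root) / "LICENSE").write_text("placeholder\n")
        print(find_conventional_files(root))
```

A missing entry (here, the contributor's guide) shows up as `None`, which is roughly the signal a newcomer gets when wandering into an unfamiliar project.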

[03:48] I don’t know if this is an apt comparison, but it reminds me a little bit of something that Rod had said in our episode about the role of contribution - when you have tons and tons of people coming in and a higher volume of contributions, then it’s actually counter-intuitively harder to change things than if you had a BDFL [Benevolent Dictator For Life] or whatever; I think he was trying to say that people are afraid of giving up control or opening things up because they think the crowd is essentially gonna change the project, but actually when you have tons of people involved it becomes harder to change things. So your comment about how greater scale can actually help standardize things and help make projects more uniform is kind of a good thing, even if it doesn’t always seem that way.

Well, when you say ‘change things’ you mean like change the technical procedures of the project, not change in a technical direction, like feature changes or design changes.

Right. It’s almost like those structures become more codified, because you just have to deal with so much volume.

Yeah. It’s interesting, you made a contrast there between having a BDFL (benevolent dictator for life), the person who is the final arbiter of decisions when the group can’t come to consensus in an open source project, versus having a lot of diverse contributors. I actually don’t think there’s any contradiction there. In fact, a BDFL is probably, if anything, more likely to happen in a project that’s growing rapidly and where lots of people are coming in, because it’s a much faster way to resolve conflicts. Democracy - direct democracy especially - doesn’t work very well in a place where the electorate is constantly changing and it’s not even clear who the electorate is, whereas the one thing about the BDFL is it’s very clear who’s making the decisions when decisions need to be made. And it’s not real dictatorship; if people can fork it, then dictatorship becomes safe. It’s an okay option for governance now.

That’s an interesting comment. I do think that it’s more common for BDFLs or de facto BDFL models, because they’re never actually codified in projects like this, at this scale. But also, I keep thinking about the sustainability of those projects, and if one person is essentially responsible for all the decision-making and maintenance and they’re not growing other people that can handle that burden, in this new world where you have all of these casual contributions and all these drive-by contributors, how are you managing that increase and load? What’s a sustainable strategy to do that?

Well, that raises a really interesting question. One question I wonder about sometimes is “How do we know whether the Linux kernel is good or bad?” I mean, my box is running fine, I’m not worried about it crashing, but Linus has been so good at keeping the project unified that there hasn’t really been a serious attempt to fork, and because there hasn’t been, that makes it much harder for anyone who is contemplating it to actually do it. So with that positive feedback loop, they decide not to do it, so you never hear about it. But in a project like that, how do you know whether things are as good as they could be? Maybe there are gazillions of really good patches that just never get incorporated. I don’t know, I’m not involved in kernel development at all, but I think for projects like that it’s very hard to know unless you’re closely involved whether the project is actually being successful in its own terms, or as successful as it could be.

And also, the Linux kernel is not on GitHub, right? It’s not dealing with a flood of casual contributions like those that would come from GitHub.

I don’t think the fact that it’s not on GitHub is the reason it’s not dealing with a flood of casual contributions; I don’t think you can be a casual contributor to something as complex as the Linux kernel. There’s just too much to learn. I’m involved in another project, the Emacs text editor; I’m not one of the major developers at all, but I maintain a few of the Lisp packages in Emacs; [08:01] if you’re really gonna work on the internals of Emacs, there’s just a lot to learn, and they could put Emacs… It’s in Git, it’s not on GitHub, but it could be on GitHub and it wouldn’t make a difference in terms of… The obstacle to writing a code change is learning the incredibly intricate internals and coding conventions of the Emacs source code; it’s not where it’s hosted or what the PR process is. And that’s definitely more true for the Linux kernel.

I disagree with this in a couple ways, because I hear projects say that a lot, and usually that project has awful documentation and a shitty website. Those are things that are not very difficult for people to actually technically go and fix, but they’re not being fixed because the barrier to trying to fix them or engage in them is just too high. And because you don’t have people coming in at that level…

Okay… I have a counterexample for you.

… and because you’re not bringing in people to fix small doc changes or to fix the website, you’re not even growing a culture that is thinking about barriers to entry, so of course you’re gonna continue to develop conventions that are very hard to make it through. You’re right that we don’t have a fork of the Linux kernel to look at, but just look at FreeBSD. FreeBSD is a lot easier to contribute to, has done work to make it simpler to get involved in the community - at least compared to Linux - and while it’s not on GitHub, it does have a huge and thriving localization community; the number of users of it is significantly lower than Linux, but it actually does have a pretty enviable amount of contributors.

Yeah, that’s a really interesting point. It may be worthy of more study, because I haven’t looked closely at that project. I mean, I do believe that its user base relative to Linux has been going down, unfortunately… Or unfortunately for them; I don’t know if that’s unfortunate in the global sense or not.

So market share-wise yes, but market share-wise on servers everybody’s losing to Linux no matter what, right? I think that their growth rate relative to themselves - their growth last year - is actually looking well.

Well, I’m glad, because I think some degree of diversity is healthy. I don’t want all of the free software operating system eggs to be in the Linux basket either, even though I’m a long-time Linux user myself. But just to give you a quick counter-example, although I think what you said surely is true of some projects, but the ones I’m most familiar with, where I’ve looked closely or in one case I am a direct participant (Emacs) it is not the case that they have set up extra barriers. There is incredible documentation on not just how to contribute, but how to understand the internals. There’s a mailing list that is ready and willing to answer questions and does all the time, and for things like documentation and the website they accept changes all the time. The website just recently got revamped by a total out of the blue contribution from a volunteer who did a great job.

It looks great, yeah.

But the core of it is just hard, and it’s just not about being on GitHub, it’s just you have to understand how the C source code interacts with the Garbage Collection routines, what macros to use and how the redisplay engine works. You have to spend a lot of time studying it, and that’s not gonna happen for drive-by contributions. And I think that is also true for the Linux kernel, and there are some other projects where it’s true.

Yeah. I mean, it just doesn’t map with my own experience. We’ve just had areas in NodeJS that we never thought that we would end up getting contributions to, outside of a core group of people that were spending all their time working on Node, because they were just too technically complex. And the more that we opened up… Yes, we saw a flood of contributions for the things that were much easier to get involved in, but over time people leveled up into those areas.

[11:59] Yes, I didn’t mean to say that that doesn’t happen. That happens, but I think that that leveling up is going to happen - do you think that being on GitHub or not would have made a difference in that leveling up?

I think that one, you have to have good processes in place and a culture of mentorship to level people up. But if you think about it like a funnel, GitHub is gonna increase the size of the funnel coming into that process, unquestionably. So I do think that it would increase the number of people coming into the funnel; I don’t think that in and of itself it’s a solution though to leveling people up.

I can’t argue that that’s not true, I just don’t know. And certainly there are other dysfunctionalities in the way the Emacs project is run (although they have been improving a lot lately) that sort of confound this experiment, so it’s hard to know. But yeah, I think the idea of increasing the funnel makes a lot of sense. So if I was arguing with you, then I’ve officially stopped.

[laughs] I think maybe a good thing to study would be to look at projects that have a long history that then decided to move to GitHub, and see what happened there. I have a couple examples. For instance, jQuery was one of the first larger projects with a big history to come over, and at the time pull requests were barely a thing, but John Resig spent a lot of time looking at people’s forks… Like, literally they’re off in their own corner just doing something and he’s guiding them to do changes in a way that might actually be incorporated later…

Wow…

… so it actually did turn into… I mean, it was a lot of community management on his part, but it turned into a big change in the project and how many people were involved.

Well, but it’s also seeding a culture. Every one of those people where he was off helping them with their own fork, not even in the upstream core repository - they remember that experience, they carry it forward and most of them will help do that for someone else later on. I think it’s not so much the mechanism, it’s just the psychological message of paying attention to people and giving them feedback and encouraging them to do the same for the next people who come along; that’s what makes the real difference.

Yeah. I mean, even the people that weren’t directly involved see it, because it’s happening in public. When we had Rod on we talked about how it’s really important that every change comes through a pull request, even from people that have been contributing for years, because then everybody sees it, they see the same review process, and it just creates this culture of review and mentorship, and helping people along.

Yeah, making that visible is a huge, huge part of a healthy project, I completely agree. You know, one of the things you said about how… A really good thing to do would be to find long-running projects that switched over to GitHub without making any other major changes and see what effects that had. It reminds me of one of the unfortunate economic realities, which is there are a lot of really interesting research questions in open source, and there just is not that much funding to do them. Nadia, since you sometimes do have that funding, I hope that you’re able to do all this research, or at least some of it, because this is stuff like from my company… We help organizations and government agencies launch open source projects and manage them and run them and fix their contracting language, and stuff. But I’m always looking for the customer that is gonna magically pay for us to do that kind of research, and it’s very rare that happens.

Very rare, that’s true.

Yeah, there’s a couple projects that have gone from proprietary to open source on GitHub, and did so with good policies of accepting pull requests and mentoring. Surprisingly - well, not surprisingly anymore, but surprisingly if you talked to me ten years ago, Microsoft has been really good at this.

[16:04] Yeah, it’s been amazing.

Yeah, and ChakraCore as a JavaScript VM… It is a fairly complicated piece of technology, and their total contributions and the number of people involved shot up hugely when they went open source. But it wasn’t just putting it out on GitHub, it was also having a culture where the people internally that have been working on it for a while have time set aside to review code and mentor people and bring them into the project.

Yeah. Well, also if the change was going from closed source to open source… Sure, contributions shot up, because it wasn’t possible to contribute before. [laughter] The experiment there seems fairly clear cut. Open sourcing your code gets you more contributions. [laughter]

That is definitely true.

Some of your research for the revision of this book, you were looking at how the CLA landscape has changed. Can you tell us a little bit about what’s changed in the past ten years or so and how that dovetails with these casual contributors?

Yeah, I think the past ten or fifteen years was a time of experimentation with CLAs. For listeners who don’t know, a CLA is a Contributor License Agreement, which is basically like there’s some upstream open source project that you wanna contribute some code to - maybe you fixed a bug or added a new feature for them - you send in your changes, but they want you to also send in some kind of assertion, digitally signed or maybe fax in an actual real signature saying that you are giving the project this code donation and all future donations made via the same mechanism under a certain open source license or under certain terms, so that they can incorporate it safely into their codebase without any fear of legal repercussions later. The idea is later if you change your mind and you’re like “I didn’t give you that code to distribute under the GNU General Public License, I’m suing you now”, they can say “Well, we have the CLA that says you can’t turn around and do that.”

What some companies started doing - and they’ve mostly stopped now, because this got unpopular, although a few still do it - is they would have CLAs that would say “You agree that you’re donating this code to project XYZ and that we, company Q, which is a major sponsor or the founder of project XYZ, are allowed to redistribute your changes under any terms we want.” That includes the project’s open source license, but it also means that they could make a proprietary fork of the project. Some of these companies did that in order to retain the right to do a proprietary fork, or sell licenses to it, or something like that.

Those kinds of CLAs have gotten pretty unpopular, because a lot of developers just said, “Well, I’m not gonna give you asymmetrical rights. I’m giving you code under the license here, giving the world, including me, code under that license. Let’s just keep it symmetrical and not give you rights that I don’t have.” This became objectionable enough that when a company or a project would set up a CLA of that style, there would just immediately be noise.

What surprised me though - this is something I discovered during the research for the book, and I have to give a shout out to Bradley Kuhn, who follows this stuff; he’s at the Software Freedom Conservancy, and was able to tell me a lot about what had changed in the CLA landscape and point me to examples. It’s that not only have those particular odious kinds of CLAs become less popular, but CLAs in general have become less popular. More and more projects have just said, “Look, as long as you certify that you are the author of this code or that you have the right to contribute it under our license to the project, and that we can redistribute it under that license, then we’re good.” So that’s not really a licensing agreement for the contribution, it’s called more of a DCO (Developer Certificate of Origin) where you assert, and usually an email or maybe a digitally signed document of some kind is enough to just say “You have this code. Here’s my DCO, and now we don’t have to sign anything or have an agreement.”
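One widely used way to record a DCO is a `Signed-off-by:` trailer in each commit message, which `git commit -s` appends. Karl only mentions an email or a digitally signed document, so treating the trailer as the mechanism is an assumption of this sketch, as are the helper names `sign_offs` and `has_sign_off` and the deliberately loose email pattern. A minimal Python check might look like this:

```python
import re

# Matches a trailer line such as "Signed-off-by: Jane Doe <jane@example.org>".
# The email pattern is intentionally loose; it is an illustration, not a validator.
SIGN_OFF = re.compile(
    r"^Signed-off-by: (?P<name>[^<>\n]+?) <(?P<email>[^<>\s]+@[^<>\s]+)>$",
    re.MULTILINE,
)


def sign_offs(commit_message):
    """Return (name, email) pairs for every sign-off line in a commit message."""
    return [(m.group("name"), m.group("email")) for m in SIGN_OFF.finditer(commit_message)]


def has_sign_off(commit_message):
    """True if the message carries at least one well-formed sign-off line."""
    return bool(sign_offs(commit_message))


if __name__ == "__main__":
    # Hypothetical commit message used only for demonstration.
    message = (
        "Fix typo in contributor guide\n"
        "\n"
        "Signed-off-by: Jane Doe <jane@example.org>\n"
    )
    print(sign_offs(message))
    print(has_sign_off("Fix typo, no sign-off here\n"))
```

The point of the lightweight approach is visible in the code: the project only checks for an assertion attached to the contribution itself, rather than tracking a separately signed agreement per contributor.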

[20:20] So I think the world is moving more toward DCOs. There are still CLAs out there. Some projects have important reasons why they need a CLA; for example, apps that are gonna be distributed in the Apple App Store but are free software under a copyleft license, there are various things about the Apple build and production process that get things in the Apple Store where the terms that the project has to agree to with Apple are not compatible with the GPL, so the developers who contribute all have to sign a CLA with the project, where that gives the project enough of an exception to the GPL to be able to sign this agreement to get the thing in the Apple Store, but otherwise the project is under GPL.

There are some cases where CLAs are still necessary and many other cases where projects still use them and people generally agree with them, but I do see a general move away from complicated or onerous CLAs and towards simpler, more lightweight things like DCOs. I think - although I should stress that this is not legal advice to anyone - my business partner and friend James Vasile, who is a lawyer, has observed some of the same trend and clued me into it, so I should give him some credit for keeping tabs on this as well. Does that answer your question?

Yes, that’s a very good answer. And it dovetails great into what we need to talk about next, but first we’re gonna take a short break, and then we’re gonna come back and we’re gonna get a little more deep on governance policies.

We’re back with Karl Fogel. Karl, earlier you said that the scale of open source has led to the standardization of a lot of processes and policies. From my perspective, I haven’t seen a coalescing around particular governance models, at least not yet.

Oh, I’m glad you said that, I agree with you. I have not seen it either.

Yeah. Do you have any thoughts on why that might be?

Yeah, I do. I end up explaining this to our clients. Our clients are people who are much less familiar with open source than anyone on this call. For many of them it’s their first foray into this, and one of the things we always have to tell them is that governance is not the first thing - or even the fifth thing - that they should be thinking about. By the time they come to us they thought, “Okay, we’re gonna release this thing as open source software.” Maybe they’ve written it already or maybe they’re in the process of writing it, and the first thing on the agenda for that kickoff meeting with us is like, “Okay, we need to write down a governance policy, a clear membership structure and all this stuff for how the project’s gonna be governed”, and we’re always telling them “Don’t worry about that, don’t give it a thought. Just release the code, make sure that the developers’ managers are aware that the developers will need some time to deal with incoming questions and pull requests, and we’ll sort out the governance later”, and they’re always kind of shocked, because they brought us in as experts, they thought, on governance, and we’re telling them not to worry about it.

[24:16] The reason is, let’s do a thought experiment: why do we have government at all? This word ‘governance’ comes from the idea of authority structures, and those authority structures exist to help us make decisions about how to allocate scarce resources, right? We have private property and ownership of real estate and stuff, and the whole point of government is to quickly and definitively adjudicate disputes over the use and allocation of those non-replicable resources. But an open source project doesn’t fit that definition - it is replicable, you can fork it, and so you don’t need governance. To a first approximation, you don’t need governance at all, and that’s why BDFL works.

The reason to have governance is the non-replicable resource; the finite resource in an open source project is obviously not code, and it’s not the CPU cycles, it’s the developers’ attention. The scarce thing that might go away if there’s a fork is that everyone might start paying attention to this thing over here instead of that thing over there. And that is a decision that every individual whose attention is in play makes for themselves. So governance is really a form of marketing or persuasion. What you’re trying to do is convince every developer in the project that every other developer is going to stay here, so they might as well too, because nobody wants to do a fork where they’re the only one forking, right? That’s a losing proposition right from the gate.

This is a very cynical way of saying it and I don’t actually think of it this way, but it’s a kind of Stalinist move. How to become Joseph Stalin or any dictator? You convince everyone in the room that everyone else in the room will obey you. Once every person believes that about the people around them, they will obey you too, because it’s too dangerous not to. Well, open source is the nice version of that - how do you convince every developer that every other developer really believes in the current leadership structure and in the way things are going. Once you figure out how to do that, you’re gonna have a stable project.

So that is not really an exercise in governance. You don’t need a police force, you don’t need a national defense, you don’t need a courts system to make that work. You just need persuasion and personal skills.

Now, we can notice of course that many projects do evolve some kind of formal governance structure and sometimes it involves voting. Usually, voting is a fallback mechanism for when consensus cannot be reached. It’s not like they vote on every decision, but everyone knows that the potential to hold a vote is there, and so they will sense which way the wind is going and give a decision and just compromise and go with that, because they know they will lose the vote anyway or, conversely, win the vote.

So the reason I think that projects move toward those kinds of governance structures is that once a BDFL leaves - the charismatic founder of the project maybe goes off and does other things, or screws up in such a way that nobody trusts their judgment anymore, or whatever it is… Once that happens, there is not a clear answer for who should be in the driver’s seat now, right? And so the default answer, the solution that everyone can quickly agree on, and more importantly, the solution that everyone believes everyone else will agree on is “Oh, we’ll have some kind of democratic, consensus-based governance model”, and so that’s what they do, because it’s the proposal that everyone knows is gonna be accepted, so it almost doesn’t matter who makes the proposal.

[28:13] And it’s especially helpful when you have organizational participants. If you have corporations or governments or nonprofits who are investing money in the project, either through direct contribution or by donating developer time - or, we should say, investing developer time - the managers, the decision-makers at those organizations, they feel more comfortable when they see that kind of governance model, so it becomes a self-fulfilling prophecy. The investment energy is gonna go to a place whose governance structures make everyone comfortable, even if you never actually have to take a vote. And in practice, there’s usually a few people who have technical leadership just by default, because they know the code really well or they have good people skills, or a combination of those two.

So I think governance is very soft in open source projects; it’s mostly not necessary, and it’s usually not the most interesting topic. It’s way less interesting than figuring out the right workflow for incorporating contributions and things like that.

Do you think it doesn’t really matter what model we use? Is there really just no difference between BDFL and a meritocracy or whatever?

I guess I’d have to ask, matter for what? What’s the objective by it mattering?

Yeah, that’s a good point… Because I feel there is a difference in terms of… And I don’t know how to put my finger on it or articulate it, but sort of just like philosophy, or culture, or some other very soft word like that? Especially in how people think about welcoming new people and how they think about handling contributions. When it comes to decision making, I actually feel like everyone is kind of the same, in some shape or form. There’s always some sort of ultimate tie-breaker in how well it’s enforced and not.

That’s a good point. I think you’re right that projects that have a single leader who is the arbiter of stuff tend to - I think, this is sort of anecdata, but I think they tend not to concentrate as much on welcoming new developers and on making the contribution workflow easy etc., partly because when it’s a single person who feels responsible for steering the project, that person naturally falls back on dealing with the people that he or she is already most comfortable with, and those are the people who are already incorporated in the project and know the procedures.

Whereas when it’s a group, for a given person in the group, a good way to have an influence on the project and to make things - whether it’s out of desire for personal influence or a genuine idealism about keeping the project healthy or whatever the motivation is… One of the best things you can do visibly in a project is get more people into the project, as long as those people are good contributors and they play well in the sandbox, so to speak.

So for group-governed projects, there’s a natural feedback loop where the group wants to make it possible for new people to come into the group.

Yeah, and I don’t know if I would call it governance. I struggle with this… It’s about something else, I think. Participation models? Contribution models?

Yeah, you’re right… There’s no good word for this. I guess we’ll probably end up using governance as the word, but then everyone will misunderstand what we mean.

In Node we do have a separation between the governance of the project and the contribution policy of the project, because one is the formal structure for decision-making and the other one is like “How do we get contributions in?”, but…

I think every project makes that distinction.

[32:00] Right, exactly. So there is a distinction in those policies, but where I do think they meld together - and I really love your method of saying, “Government is there to allocate scarce resources”, so how do we identify what the scarce resources are? I think one of the shifts that happened is we have all of these contributions coming in that are small, we’re not lacking those resources; it’s a matter of how do we incorporate those, and then the scarce resource actually stops being the time and attention of a ton of people and is really just the time and attention of those people that are maintaining or trying to get things in. In a BDFL model, if the BDFL can handle all of that workload on their own, then there may not be a problem, but usually they’re spending time on more than one project, right? So maybe the BDFLs are involved in a ton of different projects and just don’t have the time to do all of that workload, so governance becomes a way to share that workload and to have a system by which we can share the workload and make decisions as a group, because it’s actually less effort on any one individual.

It’s just like management models, or something.

Right, right. So this is like a really good, basic economic model, but if you look at it in terms of behavioral economics and you’re like, okay, let’s assume that people are not always rational, I think that what we see is that a lot of people don’t move to these models. They stay in a BDFL model until they burn out, until it’s bad for them and bad for their project.

Yeah, that happens a lot. But that’s kind of like, “Okay… Well, whatever it takes.” Maybe the project has a rough year until they finally get it through their head that they can’t go on this way.

It happened on Linux. I mean, it’s still BDFL, but there was this sort of like time of reckoning where they were like, “Oh, we need to fix all these problems.”

Yeah, Linus is the BDFL in the sense that if there’s controversy, everyone will agree that he can resolve it about a decision, but he’s not the BDFL in the sense of he’s the only person who can incorporate patches in practice. He’s part of a group of people that he has appointed who are all now approved to put those patches in.

I’d like to hear more about this dumbbell effect that you’ve talked to us about before. You said that you were noticing in today’s open source an increase of more one-person projects on the one end, and then you have these very large B2B projects on the other side. If we’re talking a little bit more about different project models, however related that is to governance or not, what are the emerging norms around different project models that you’re seeing?

Yeah, so what I meant by the dumbbell effect… I think that’s partly a consequence of the standardization of the contribution workflow model around whatever GitHub promoted, which is basically the pull request model, which now even not on GitHub is basically the way other sites work, and also a result of there being largely one user-facing development and usage environment, which is the browser. I think that’s an underappreciated revolution. It used to be that if you were gonna write something that went on someone’s screen, you had all sorts of options, like which widget library, X Windows or some other graphical user interface widget library, were you going to use, how was it going to interface with the system; you had to make all sorts of decisions and your code would be incompatible interface-wise and perhaps library-wise with things that had made different kinds of decisions. And now there’s this one world; the browser is, if you take mobile platforms out of the picture, and even they are somewhat browser-based, the browser is like the only platform that matters. Nobody writes native apps anymore, with a few exceptions like LibreOffice and things like that.

[35:54] What that means is there’re all these users who started learning to do View Source and then they started learning that all that JavaScript is minified, and if they got the unminified copy, they could read it and understand it, and it’s this tremendous gateway for individual programmers to start making contributions to open source, because every company’s writing web programs, and they’ve gotta find people who can write web code and surely there’s tremendous demand; everyone knows that’s a promising route to go if you’re learning to code. And the result is you get this huge universe of JavaScript libraries and JavaScript-based projects that were started by a person, who suddenly finds themselves overwhelmed with contributions coming in from this huge number of programmers, because they all agreed that JavaScript and the browser was the way to go for programming. So that’s one side of the dumbbell, this swelling of that kind of project.

If you were just some normal journalist and were not really involved in this stuff, you could be forgiven for thinking that open source is essentially just JavaScript stuff on GitHub. That’s what it all looks like, right?

Yeah. [laughs]

And then the other side is this new thing where companies start using open source releases and projects as a strategic move in markets, where they open source things because they see for example that a competitor is moving in on something, and the first company realizes that if they get first mover advantage by releasing a decent library that’s open source - okay, their competitor will use it, but the first company has all the employees with the expertise, they have momentum, they will be able to run the project and maintain influence, and the second company won’t really have a choice, except to get on board. So at least now you’ve put them in a kind of parasitic position relative to yourself. You gave them free code, but you also hobbled them a little bit, you coupled them to you in a way that is advantageous for you. And that’s just one motivation, it’s not the only reason companies release open source.

The other end of the dumbbell is these large-scale, always salary-to-developers funded multi-company projects that as an individual contributor you’re not very likely to waltz into. I’m sure there are some people who are talented enough and have the time or the ability to go in and make some fundamental, important contribution to TensorFlow, but I have a feeling that most of the changes going into TensorFlow are from Google employees or from employees at other companies who are using TensorFlow and who have to make the changes as part of their job.

So that stuff, that requires a higher upfront investment and expertise, and it’s only sustainable because there are corporate dollars behind it.

And then the middle part of the dumbbell is thinning out a little bit, which is what I used to think of as most of the open source world, which is this kind of profusion of apps written in all different languages, for all different kinds of platforms, with different GUI widget toolkits and things… It’s not that they don’t exist, but as a percentage of open source activity, I think that’s going down.

Well, also that middle is… Even when those projects exist, they’re actually just collections of all of the smaller projects at the other end of the dumbbell, right?

Yeah, actually that’s another… I never thought of that as being part of the reason, but you’re right. Part of what’s going on is that there are so many libraries now, that most of what you used to have to write by hand you get from a library. So to get whatever done, you’re just writing less code to get that thing done. But that means that most of the actual open source activity is happening out in the things that are your dependencies, which is the left end, or the first end of the dumbbell.

[39:53] From a sustainability perspective it becomes really interesting that there are all these different emerging models, because on the company/corporate end I think the sustainability is not really an issue, it’s more of “Can we actually get people to use this project?” And on the very lowest end, of people having very small projects, sometimes it’s trivial to manage. But then there’s this awkward in-between of… It’s big, and there are a ton of people using it, depending on this thing, and I don’t exactly know what model it fits into for the future.

Right. It’s like, I know this thing has economic value for a lot of places, but I don’t have any clear path for channeling some of that into the maintenance and into supporting my work on it.

Right, and it’s not so big that… It goes into a foundation, or something.

Yeah. I mean, it seems to be one of the areas that you’ve been focusing on a lot in your writings, which I’m really glad to see, because there are a lot of important projects that fall in that in-between zone, where there’s just this burnt out lead developer who doesn’t know how to sustain this thing, and yet there are all these people depending on it.

Yeah. I think this is a great time for a quick break. Then we’re gonna dig into that middle section, and how these cultural shifts have affected sustainability in open source.

We’re back with Karl. Karl, we’ve talked a lot about projects in terms of differences in governance models, and I think that there’s… We’ve kind of been taking for granted this notion of starting a project or developing a plan, but I’m curious how older projects that have a set of policies are affected, because yes, we have this huge amount of growth in open source contributions, but it’s really happening in these new tools, in these newer models. How do we sustain existing projects?

I’m trying to think of some examples of what you’re calling the existing projects, just so I can draw on them and sort of focus the answer a little bit. Are you thinking of things like infrastructure projects, like the DNS servers?

Right, right… That’s a pretty extreme example. I mean, you could even think of languages like Python…

Oh, okay. I do think there is a sense in which software projects reach maturity and they just don’t need a huge amount of maintenance, or not as much as they used to. I get the feeling that the effort to reach Python 3, that was probably the last big push in Python. I’m not sure… Like, where would the language go from there, right? They’re always gonna have bugs to fix, and there will always be a core maintenance team, and there are plenty of companies depending on Python, so there will always be money to support that, but…

Well, but the world changes around you and the market changes around you. For instance, one of the problems with Python is that we’re moving towards this sort of microservice, dockerized world where we actually have fewer resources for each application process, and Python isn’t particularly good at resource utilization in that world, right? Java has some of the same problems as well. So unless they can get people to work on things that they had not traditionally worked on, like improving VM performance, they may not be as good of a fit as some of the new languages, right?

[44:12] And those are major overhauls…

Yeah, there are some projects that have long-standing or upcoming technical issues that one can see down the road, but I feel like the question of whether it is important for Python to solve that problem will be answered by whether Python solves that problem. If it’s really important to someone - by someone I mean some company or a group of companies; that’s the kind of resources it takes to pay for things like that - then of course it will get done. They’ll fork Python if they have to…

That’s sort of a tragedy of the commons kind of mentality though, right?

No, I don’t think so.

Like, someone out there will take care of it, but sometimes it doesn’t get taken care of.

No, I think the tragedy of the commons - to the extent that it exists - is a different thing; that’s when every company is sitting back, waiting for someone else to take care of it.

I think that’s what actually happens to a lot of projects. We might think, “Well, surely a company that depends on this will make it happen” and then I hear from maintainers, “Well, why is nobody stepping forward?”

Well, you mentioned Emacs earlier, and you said that Emacs has done… Emacs has made a lot of changes in how you contribute, and as the profile of an open source contributor has changed, Emacs has been able to continue to change along, and it’s certainly not done. I don’t think Emacs will ever be done, but it clearly is moving forward.

Oh no, I think it has been 30 years already and it’s still not done. But actually both Emacs and Python, and probably almost every project we’ve talked about - there’s a pattern that keeps happening in them that is really important, that I think answers some of these questions, which is that they all grow some form of extensibility mechanism; plugin systems, add-ons… In Emacs’ case, Emacs Lisp is a full programming language with the ability to just have modules that are separate from Emacs, and what happens is that a lot of the interesting creative work, the places where maintainership energy would normally go, ends up happening in these satellite projects. In Emacs, I would say some of the most interesting stuff is actually happening outside the Emacs tree, in the Org mode project. Org is growing by leaps and bounds, it’s got a lot of happy users, it’s got its own conference, I think… It’s incredible. That’s all happening in Emacs Lisp, the extension language that is used to program Emacs, but it’s not part of the Emacs project officially.

Similarly with Python - tremendous stuff has been happening for years in the scientific Python community… You know, the sort of big data and mathematics Python libraries communities, but do you consider that stuff to be part of Python, or are they separate projects?

I think what happens is as soon as you grow an extensibility mechanism, the energy moves out to these things that from the point of view of the central project are satellites, but are actually core things to their own communities.

Right. I think that one of the problems that you do run into though is that now that the energy has moved into this ecosystem and there’s a lot of smaller projects that are not centralized in this place, that new community that’s building around that has a very different set of expectations about what it’s like to contribute and what the barriers to entry might be and how easy that might be, and if the core that the ecosystem is built on remains really difficult to work on and doesn’t adjust any of its governance policies, then all of that energy that’s happening in the ecosystem may be happening on top of a project that’s not sustainable and can’t continue to move forward in the best interest of its users necessarily.

[48:11] Well, now we’re addressing a different question. If the idea is that the original core project is difficult to contribute to, specifically because of policies rather than just being technically difficult, that’s a different problem. But I don’t see that many projects actually in that situation. Maybe if you can give me some examples we could look at them, but generally these core projects, the reason that they lose maintenance energy is just because they’ve kind of reached maturity and the core is now very large, and if you wanna make a change to it you also have to add a regression test, and you gotta make sure your change passes all the existing regression tests, because there’s such a huge legacy installed base to take care of; the core is always gonna move more slowly over time.

Going back to the economic model, where you are competing for developer attention to some extent, if there is this huge ecosystem of projects that are easier to contribute to, aren’t they going to take a lot of the resources out that could potentially be dedicated to maintenance if the policies there don’t change? And I don’t mean this in terms of the policies have gotten worse over time, it’s just that the expectations of people have changed.

If there’s a core group of people that are really comfortable with the contribution policy and the rest of the world and even their own ecosystem moves on to policies that make it much easier, that core isn’t all that incentivized to change for the people that are already there. It’s really an opportunity cost that they’re missing out on.

Right. I think it’s sort of that uncaptured thing that we’re not seeing.

Well, but it sounds like we’re saying it is getting captured, just by someone else. Maybe people who would be fixing core Python are writing SciPy stuff instead. I mean, are we talking about a zero-sum game or a positive-sum game, I guess is my question?

At some point maybe it’s zero-sum, because we only have 24 hours in a day. I mean, you could work on multiple projects, but at some point you’ll only have so much time.

I guess the reason you’re hearing me resist the thesis is I don’t know of any good way to evaluate the question of whether a project is getting as much resources, as much developer time as it ‘should’, because I don’t know what the ‘should’…

Well, this is the hard thing about any sort of software infrastructure, the tension between do you just build something new or move on to the next thing when the old one has run its course, or do you try to reinvest back into older projects? And I get that software or anything digital will always move a lot faster than anything physical, but a part of me wonders whether we just accept that norm of “Oh, you know, we just move on to the next project” because there are no resources available for people to improve the existing ones?

Yeah, I don’t know how to make the argument that the places people are allocating the resources now are the wrong ones, and that they should be allocating them in some other way instead. Because whenever I look closely at how someone is allocating their time and attention, I can see the reasons why they chose to do that, and I don’t see a convincing way to say to them, “Oh, you should be doing this other thing instead.”

I don’t either, and I think that’s part of the problem.

I never dreamed that I would be making a market fundamentalist argument; I’m the last person to do that, but I guess I kind of am. [laughter] Yeah. Next thing you know, I’ll be voting for Rand Paul.

[51:57] Back to your point that if it gets bad enough there will be a fork - I think that’s sort of like the escape hatch, like the reason that “the market is gonna figure this out” is that if it gets bad enough there will be a fork.

Oh, yeah.

And certainly that has happened and could happen, but the problem with that being the main approach that we have, or the only recourse that we have, is that there is a fair amount of time that it takes for the situation to get bad enough that there’s a fork, and then it takes time for that fork to ever take over or get to the point where it will be merged back. So that time that you lose in that tension getting worse and worse and worse - during that time we could just move on to something else. And not because it was necessarily good for that community or good for that project, or that the technology had run its course, it was just that we had this particular artificial barrier that created this tension, and that meant that no work was happening for a particular amount of time.

And also, I’m not convinced that we necessarily move on to a new thing. If we move up the stack, we tend to just forget about things that are bad at the bottom end of the stack and we end up with problems like OpenSSL, right?

That’s a very good example, yeah.

Yeah, I mean we have very good methods by which we can forget about how badly a project might be run or what state it might be in in terms of sustainability by just isolating it, rather than dealing with the problem. And that’s a very problematic way to do it.

Yeah… You know, it’s funny, the argument I’ve been making reminds me of one I hate… George W. Bush actually made this argument when he was governor of Texas. Basically, there were these prisoners on death row, and some of them were innocent, and these nonprofit, volunteer-run student law clinic things would go into all this research and prove that the person was innocent and finally get them off death row after years and years and every obstacle in the world being thrown at them. And when that happened, the governor, who had not ever pardoned them or anything until the evidence was clear, would say, “See? The system works.”

Yeah, that’s you now, Karl. You’re like George W. Bush. [laughs]

Yes, so I’m sort of saying, “Look, things get really, really bad and people move heaven and earth to make the right outcome happen,” and it’s like, “Hey, the system worked,” because they could fork.

That’s my fundamental… Coming from a nonprofit background from way back in the day, it just sort of boggles my mind that… Yes, in the nonprofit sector for example, no one expects to get rich in nonprofits, but you get a salary, so there is money flowing into the nonprofit sector in some shape or form. To suggest that there should just be no money flowing in, or that it works because volunteers do everything and that’s fine… Sure, it might get done, but is that really the right way of doing things in the world?

Well, the question of whether something is done by volunteers seems to me to be a separate one. There are a number of projects where someone is maintaining something on the side, where that thing helps them do their day job, but it’s also kind of a personal project; they’re sort of a semi-volunteer, but not completely volunteer. But then there’s also a lot of open source that is under-resourced, but the resources it has are salaried.

Yeah, I think that’s why I’m still wrestling with the original question I’ve had coming into this space, of like, if you have money, where does it go? I still think it’s a very, very hard question to answer, and one that is extremely delicate. But I know that the answer is that there must be more that can go in there somewhere, and I wanna be really careful about where it goes.

[55:51] I completely agree that that’s a huge priority, and I’m glad you’re focusing on it. And my answer hasn’t been terribly helpful, I think, in providing any guidance on that.

Well, I think that we’re very good at fixing crises, right? If something hits a point of crisis, then we have mechanisms by which to fix it.

OpenSSL.

Right. We can deal with Heartbleed, but we can’t deal with the situation that OpenSSL was in ten years ago when it was obvious to everybody involved in the project that something was wrong.

I wish we had - and maybe Nadia should be the executive director and you all can be the board - an open source weather center, where you’re funded to just keep a lookout, keep in touch with a lot of projects and identify whenever a certain kind of intersection happens. An intersection between a project that a lot of people depend on, and that project showing signs of burnout or under-resourcing. Then just say, “Here are the warning signs, here are some people to talk to”, or you could be the go-to people for the foundations or companies who are looking for help.

I think CII is doing that a little bit, and I think that that works for projects that have been around for about ten years, or maybe even five. I think the troubling thing is that we’re coming into contact with this problem and recognizing this problem at the same time that we’re also recognizing that all projects are getting more distributed, that they’re becoming collections of all these little tiny things with these incredibly complicated dependency chains. So the idea that you could have a centralized weather center that then talked to centralized projects or looked at centralized projects to try and figure out what the state of the ecosystem was, it’s getting less and less practical as we move into that future.

Or maybe it’s inherently built into the nature of the solution that you’re envisioning, because if you’re going to allocate money somewhere, that’s an inherently centralizing thing to do. The money’s gotta land in some bank account somewhere, so the trick is just to identify of all these myriad moving parts and intricate interdependencies which are the key things - like OpenSSL or one of the JavaScript libraries that everyone depends on - which are the ones that are really going to be in pain and that everyone’s gonna feel that pain, where we can just see it coming five years ahead of time.

Some of it is also structural, right? It’s like, “Should we build this way?” or “Are there bigger ways of thinking about reorganizing the entire system, where that work should be happening and it’s not?” And also people are thinking about this around DevOps, for example… I’m trying to figure out, shouldn’t we have an XYZ set of projects that together can just make a better system, versus cobbling it together from all these different things right now? That’s sort of bigger work than any one project.

Yeah. Actually one of the reasons - I’m gonna try to squeeze in this observation because I know we’re running out of time - I’m focusing a lot of my work and our company’s work on helping governments get more involved in open source is that despite their reputation and despite what we see in the current US presidential campaign, governments are in some ways really good at focusing on long-term questions, and especially in the US and in similar systems it’s partly because there’s a civil service that has such good career and pension guarantees that people stay in their jobs for 30 or 40 years. Now, in the tech world we think that’s horrible, and we’re like “How can somebody possibly still stay skilled and relevant in tech for even 10 or 15 years, let alone 20 or 30?”, but from a long-term open source project sustainability perspective the more government dependency we have on open source, the more government engagement and funding of open source we have, the more there is a force for long-term trend observation and solving of problems in open source.

[59:55] One of the problems that we have is that open source is rooted in a personnel sense in the tech industry right now, and that’s people who switch jobs every three years, and that’s considered long. So the actual individuals involved, their priorities keep changing because their jobs keep changing, because it’s such a fertile field for new things happening, and there are not that many institutions that have long-term personnel involved in open source, and I think that we see that causing problems in open source as a whole.

Yeah, that’s a fantastic observation about software in general. It came from a very fast, very high growth, very capital flush sector, so that changes how we think about it. But then if you look at it in economic terms, open source doesn’t actually fit into that at all - it’s much more like a public good - and where is the institution that supports that? Government tends to do that for all other aspects of life, but it doesn’t do it here. And it’s not gonna be as easy as just being like, “Oh, now we have an agency that deals with open sourcing government”, because that would also be weird, in some ways probably awful. But trying to figure out how you get those longer term thinking institutions to care about something that is a longer term question within software that often gets overlooked, I think that’s the challenge.

Yeah. Like, what is the actual level of state dependence on the Debian project, versus the level of state funding that’s going into the Debian project? There’s probably a huge imbalance there.

Yeah.

Is part of the struggle in getting government into open source this differential, that they’re thinking longer-term and the communities that are trying to engage around open source are a little bit too short-term and distributed?

I don’t think that’s what’s preventing them from getting involved. I think it’s partly that the actual personnel (the people in IT and government) historically are not coming from a background that would have had them involved in open source. The managers don’t have that background, and especially the elected officials at the top of these command hierarchies, their main concern is risk-aversion. They don’t want to do anything that could embarrass them or give their opponents something to work with, and open source just has more exposure.

If you launch a technology project and it fails and only your department ever knows about it, that’s okay. But if you launch it on GitHub and then it fails, now some journalist can write a report about that, and you can end up in the next weekly news and then your opponent can hold it up at the next debate. So I think it’s more just the general culture of government is incompatible in some ways with open source.

Right. There’s a lot of churn on GitHub. It’s sort of baked into the system that if you do that many things, many of them are gonna fail.

Yeah. This was the exact argument… We actually saw this debate play out with Solyndra. The US government gave some form of loan guarantee, I don’t know the exact structure, but some kind of subsidy essentially, to a bunch of solar power and other clean energy companies - Solyndra was one of them - and in fact, the government turned out to have been a pretty good VC. Its successful investment ratio was not bad for those solar investments, but Solyndra was a pretty big fail in that set, so the administration got hugely slammed for a portfolio that any VC would have been happy to have. This just shows you how different the incentives are in government.

Man, I think that we have to leave it there, but I anticipate probably having you come back to talk just about this, government and open source.

Alright, I’d love to do a podcast on that. And you guys, I really feel bad for talking for so long, because when I let you speak you had such really interesting things to say, and good prompting questions. I love these conversations, so I’m happy to do it anytime.

Yes, that was fantastic. Thank you, guys.

Bye.

Thank you, I’ll talk to you later.
