In this episode, we’re joined by tech lawyer Luis Villa to explore the question: who owns code? The company, the engineer, the team? What about when you’re using AI, machine learning, GitHub Copilot… is that still your code?
Featuring
Sponsors
Square – Develop on the platform that sellers trust. There is a massive opportunity for developers to support Square sellers by building apps for today’s business needs. Learn more at changelog.com/square to dive into the docs, APIs, SDKs and to create your Square Developer account — tell them Changelog sent you.
FireHydrant – The reliability platform for every developer. Incidents impact everyone, not just SREs. FireHydrant gives teams the tools to maintain service catalogs, respond to incidents, communicate through status pages, and learn with retrospectives. Small teams up to 10 people can get started for free with all FireHydrant features included. No credit card required to sign up. Learn more at firehydrant.com/
Honeycomb – Guess less, know more. When production is running slow, it’s hard to know where problems originate: is it your application code, users, or the underlying systems? With Honeycomb you get a fast, unified, and clear understanding of the one thing driving your business: production. Join the swarm and try Honeycomb free today at honeycomb.io/changelog
Notes & Links
- Tidelift
- Luis’s new newsletter!
- NYTimes review of Dennis Duncan’s “Index, A History of the”
- The Berne Convention for the Protection of Literary and Artistic Works
- Google LLC v. Oracle America, Inc.
- Why Andy Warhol’s ‘Prince Series,’ the Subject of a Long-Term Copyright Dispute, Should Be Considered Fair Use After All
Chapters
Chapter Number | Chapter Start Time | Chapter Title | Chapter Duration |
1 | 00:00 | Opener | 00:32 |
2 | 00:32 | Sponsor: Square | 00:54 |
3 | 01:26 | Intro | 00:43 |
4 | 02:09 | Welcoming Luis | 01:51 |
5 | 04:00 | Getting to know Luis | 02:59 |
6 | 06:58 | Defining ownership in software | 01:47 |
7 | 08:45 | Can anyone own the idea of an API? | 03:28 |
8 | 12:14 | The core of copyright | 02:28 |
9 | 14:41 | How Oracle vs Google ended | 02:30 |
10 | 17:22 | Sponsor: FireHydrant | 01:18 |
11 | 18:56 | What does code ownership REALLY mean? | 02:08 |
12 | 21:04 | The importance of creativity under US Law | 01:22 |
13 | 22:26 | Copyright and the doctrine of "fair use" | 03:53 |
14 | 26:19 | Who's liable if copyrighted code causes damage? | 02:22 |
15 | 28:41 | Who's liable if AI causes damage? | 03:57 |
16 | 32:39 | Do code maintainers have liability? | 04:09 |
17 | 36:48 | Tidelift's approach to the problem | 01:29 |
18 | 38:17 | Is open source sustainable? | 02:14 |
19 | 40:31 | The actual purpose of copyright | 01:59 |
20 | 42:29 | On legislators being tech-challenged | 00:45 |
21 | 43:15 | Scary contractor contracts | 01:48 |
22 | 45:14 | Sponsor: Honeycomb | 01:25 |
23 | 46:53 | On legal systems and copyright challenges | 02:31 |
24 | 49:25 | Specific legal considerations for Go? | 02:55 |
25 | 52:19 | What happens when open source rules the world | 04:15 |
26 | 56:34 | Luis's new Open ML newsletter | 00:51 |
27 | 57:25 | Open source as proof of humanity's goodness | 01:29 |
28 | 58:54 | It's time for Unpopular Opinions! | 00:33 |
29 | 59:27 | Luis's unpop | 01:46 |
30 | 1:01:12 | Kris's unpop | 03:07 |
31 | 1:04:20 | Closing time! | 00:20 |
32 | 1:04:46 | Outro | 01:12 |
Transcript
Hello, and welcome to Go Time! Today we are going to be talking about who owns your code. A question that certainly has been on my mind. So we’re going to be exploring who owns the code - the company? Is it the engineer? Is it the team? Is it all of the open source contributors if it’s a project? How about when you’re using AI, machine learning, GitHub Copilot? Is it still your code? I’m really excited, we have a really brilliant guest with us today. We have Luis Villa, who is a programmer turned attorney who has been involved in open source since college. He’s worked at Mozilla, where he revised the Mozilla Public License, and at the Wikimedia Foundation, where he also briefly led the community team… And as a lawyer, he’s worked with Google, Amazon and many small startups. So currently, he’s the co-founder at Tidelift, which works to make open source better for everyone by paying the maintainers. We’ll hear more about that. But before that, I’d like to introduce our co-hosts. We have Kris. Hi, Kris. I haven’t seen you in a hot second…
Hello. I am back, after a very, very long, but much-needed break. So I’m feeling rested, and I’m ready to get into the Meta of this “Who owns code?” It’s gonna be fun.
You’re ready, I’m ready… It’s a very interesting topic. And the beautiful Natalie - I’ve seen you, I think, far too often for your own liking in the recent weeks… My wonderful co-host…
It’s like our weekly one-on-one, but it’s not one-on-one.
A weekly one-on-one with anyone who decides to tune into the live… [laughs]
Weekly anyone. I like that.
Weekly anyone. Beautiful. Me and Natalie don’t do one-on-ones, we do anyones. [laughs]
That sounds terrible… [laughter]
Oh, God… Luis, I would love to hear a little bit more about you and your thoughts on code ownership.
Well, it’s been a long time since I wrote anything approaching useful code… But I have been involved – I had an interviewer ask me while I was interviewing for my first law firm job out of law school, and they said, “Well, you seem to really like tech. Why did you leave tech?” and I said, “Look, I’m not leaving tech. I am only interviewing with law firms that are very much tech-forward, tech-first kind of law firms.” So the goal was never to leave tech. The goal was – I was at a startup, open source… Back in the first year of Linux desktop, I was at a small startup, we got acquired; during that acquisition process I worked with the attorneys, and I was arrogant, I was young, I was “Oh, I can do a better job than these people.” So after a little bit of experimenting, I took a couple of night school law classes that I enjoyed, and so I decided to go to law school. But the goal was always, all along, to continue to focus on tech, and specifically very much to focus on open law, because there seemed to me at the time to be a body of lawyers who were sophisticated about technology, but they came at it very much from a patent-first, control-first kind of mindset. And that was something that was already starting to break down at the time… So I was in law school 2006-2009, and it was beginning to be an understanding amongst legal academics that open was a thing.
I had attended a conference of legal academics where Creative Commons was announced; that was 2001. So there were some legal academics who got it. And in fact, I pretty much only applied to law schools that had at least one faculty member who had written something that indicated that they got it in the slightest amount. So that meant applications were easier, because there weren’t that many schools to apply to. And I think that has worked out well. It’s been a good career, it’s been a fun career, because I very much – the point was not that “Oh, I can make my piles of money and work 3,000 hours a year, or whatever.” It was, “I have friends who are open lawyers, who deserve better lawyering than the lawyering that they were getting at the time.”
[06:19] I think that sense of, “Hey, I’m doing this to help open people get better lawyering” has served me well as a sort of motto and mission, and has led to a lot of fun outcomes… Because open people are doing a lot of fun projects, have been doing a lot of fun projects, and that hasn’t changed, and I certainly don’t think it’s going to change anytime soon.
But a lot of it does come back to this question of “Who owns the thing?” And I admit, I have enough lawyer brainworms at this point that my immediate thought goes to “Okay, well, the contract of the–” and then one of you during the top prep said, “Well, what about like team ownership?” and I was “Oh, right.” We talk in law school about this analogy that ownership is – for reasons I don’t remember, or maybe never knew, we talk about ownership being a bundle of sticks. And the idea is that we sort of – we talk about it as if it’s like one big trunk. But it’s actually a lot of different small things. And ownership in the code world is very fragmented, because there’s this sense of well, okay, for almost all of you, if you’re working for a company, at the end of the day your company owns the code that you are writing, at least on the company time, unless you’re very careful to not do it on company time, not do it on company hardware, and to do it in areas that are unrelated to what the company is working on. If you’re doing those things, then you keep it, but otherwise, as a general rule of thumb, the company owns it.
So that’s like the sort of lawyer brain answer, like “Well, yeah, okay, we’re done here. It’s been a nice podcast. Glad to talk to you all.” But there are very much, of course, all these fractal little senses of “Well, okay, but what does it mean when the team – team versus individual ownership?” Because in a lawyerly sense, the company is the one who can sell it; but who has responsibility for it within the company? Like, that’s not a legal question. That’s a team norms, team behaviors, kind of question.
And there’s also these questions of what exactly is it that you own? Because I spent several years of my career – I am what we call a transactional lawyer. Basically, I do contracts. If the contract goes wrong, for some reason, that is somebody else’s problem to argue about it in court. And so I’ve only ever been to court for work once, which was a little case called Oracle v. Google, and you might have heard of that one… And the question at some level was about who owns, or can anyone own the idea of an API? And that’s probably not something you’re thinking about too much in your day to day, right? And your corporate lawyers probably aren’t most of the time, either. They’re not thinking about who owns the API. They’re just thinking about this file, or this binary that we’re distributing, or these days, often, this SaaS that we’re putting out over the internet. Your customers never actually see code, except for whatever’s JavaScripted, or WASMed, or whatever. And of course, that’s a whole other thing. Anyway, it’s just fractal, and we could talk about it for way more than an hour, but sort of my 10,000-foot overview is that there’s both this ownership in the legal sense, but very much also in the code and culture sense, and we can talk about any or all of those, including to some extent how Go is different. Its packaging system is one of those things that occasionally makes lawyers tear their hair out a little bit, because it’s not something – so many of our lawyers… Not our lawyers. Well, our lawyers too, but our licenses predate Go; they predate some modern language distribution practices, and sometimes that shows up. We wrote technology-specific things for like C or C++ into our licenses, and then somebody says, “Well, how does that work in the case of Go?” and the answer is, “I have no idea.”
[10:10] I guess those of you listening to this as a podcast can’t see the face I just made… So just assume, like, perplexed; search Giphy for your favorite perplexed GIF, and that was me just now. So yeah, so where do you all want to start with that? [laughs]
I also have a kind of meta view of some of this, because I think similar to the line of thought you were going down, where it’s like, okay, there’s the code that you’ve typed, the thing that you’ve written, but then there’s the knowledge of writing it and the idea of the thing, and that’s where the API question is very interesting… Because it’s like, “Oh, well, someone else–” Like, if you retype the same code, is that the same thing? Or what if you type it slightly differently, but it’s conceptually the same thing? How far do you have to get away from – how different does something have to be before it’s “Okay, now someone else can own this thing.”
I remember over the years talking to lawyers about all the non-competes and things that we tend to have, and one of the things that lawyers consistently told me, at least in New York, is that your employer has no right to the knowledge that you gain. So they can have ownership over the code that you write and the things that you produce, but they’re not allowed to say, “Well, we gave you this knowledge, so you can’t take it over there and use it at one of our competitors.” I’ve always found that very interesting as well, because it’s like “Oh, well, this is like another aspect of things.” It’s like, is an API knowledge, or is it the code that you’ve written? So there’s this really meta aspect, for me at least, to the whole idea of ownership.
Oh, yeah. Absolutely. And by the way, you specifically called out New York, and Natalie said in the pre-chat that she wanted to ask about the EU… We as open source programmers, and open source developers of various sorts, tend to make an assumption that we can write a license that applies across the entire world. And in most law, that’s like a completely laughable idea. It’s somewhere between laughable and actively considered harmful. And so we’re sort of lucky, in some ways, that the core concept of copyright, which is what applies to that actual written thing, distinct from the ideas, is actually a global standard. A treaty called the Berne Convention; that’s 1908, maybe, or 1903.
So copyright has been standardized across the entire world for 100 years, which makes it a good platform for lawyers to build a global system on. If you’re talking about databases, no global platform, no global legal platform… So writing a database license is responsible for a lot of these gray hairs; again, sorry, podcast listeners. You should watch it live next time. And similarly, with AI, we don’t know how some of this is going to play out. We can offer guesses, and frankly, very interesting guesses. This is a very fun theoretical game, but we don’t know how it’s gonna play out in court.
And by the way, Kris, I think the other meta thing that’s really interesting for programmers… I often like to remind developers that writing a contract is like writing code and then not executing it, at all. And you’re just sort of reading it, and like, we all agree more or less on what the output should be, but until it’s actually been executed, which happens through a court… Right? A court says, “This is what the thing is”, then we don’t actually know what the outcome is.
So everything that I say here is going to be based on like a handful of things. Part of the challenge in Oracle v. Google is that we’d all been operating under these assumptions about what copyright law was, but there had been no litigation over whether or not an API could be copyrightable since the ‘80s. So the closest analogy we had was Lotus 1-2-3 dropdown menus. And it turns out things have changed a lot since then. But because it hadn’t been executed by a court, we really didn’t know how this was going to turn out. And so that makes predictions – this is why lawyers’ favorite phrase is “It depends.”
[14:11] There’s one thing I want to stick a pin in before we move on, Angelica, and it’s that idea of AI… And I also want to call out code generation, and I really want to talk about that later, because I think that’s also like a very interesting thing. The first thing you said was “Oh, who’s typing the code at the end of the day? That’s how copyright is generated.” So I just want to make sure we circle back to that later.
Absolutely.
I will ask, for those of us who weren’t following that Google case step by step, every step of the way: once it had been litigated, what was the conclusion?
Well, so to take a step real quick back just to the very beginning, Oracle – well, Sun had created Java, and originally the Apache Project sort of funded by IBM had reimplemented Java. Complete clean room, very strict, very effective clean room, as best as we could tell from the pieces afterwards. Literally, just a handful of lines probably, out of several hundred thousand in that reimplementation that ended up looking like they were actually perhaps copied and pasted, rather than a true cleanroom reimplementation. And so Google used that Apache reimplementation in their Android phones.
Ultimately, what happened after literally a decade of litigation - I think I was on my fifth job by the time the case ended… Like, from when the case – actually, six jobs from when the case started to when the case ended. The case started with essentially Oracle claiming there’s – some other stuff I’m going to leave out for simplicity, but Oracle claimed that copying and reimplementing just the API headers was a copyright violation, and that therefore, all of Android should have to be licensed from Oracle for… They originally asked for 5 billion, and I think by the end of the case, they were asking for 9, or 15 billion, something like that.
The courts found essentially, through a series of rulings over the years in this case, that an API could be copyrightable independent of the implementation, but there was a plausible what we in the US call a fair use argument, that essentially, if you reimplement in a way that’s particularly transformative, like you’re doing something that is really different than what the original authors or copyright owners of the API intended to do with the API, then you have an argument that it’s okay to reuse that API in that way, through reimplementation.
Lawyers like to say that Fair Use is simply the right to get sued… It’s ambiguous; it’s one of these things where, again, you can’t know ahead of time what the outcome is going to be, and that, of course, makes it a playground mostly for large companies, unfortunately. So I think in some ways, that was not a great outcome. It was a better outcome for open source than what could have been, than what Oracle wanted to have… But it wasn’t an ideal outcome.
I wanna take one step back and ask: when we talk about code ownership, what exactly does it mean that I own the code? …whether I am an individual, a company or anything - does it mean I’m allowed to make money off it? Does it mean I can print it and hang it at home? Does it mean something else?
Well, I’m gonna give you my lawyer answer to that. Those of you whose GitHub accounts do things other than commit to other licenses - which is pretty much all I do these days with my GitHub account - will have better notions of code ownership as a cultural practice among programmers, like who’s responsible… I do want to talk a little bit about that one, but let me put a pin in that and come back to it.
The basic system since at least the ‘60s in the US - I’m not sure exactly the timeline in the EU, but I would imagine similar - is that… Well, actually, let me go back even further. Copyright is intended to protect creative works. So what do you have to do to get copyright in a thing? And I’ll explain what copyright is in a second, but let me start with what you have to do. And what you have to do is you have to write down something that’s creative. “Write down” can be broad, right? It can be sculpting, or – but you have to take it out of your head and put it out into the real world in some way. That can be typing it in a computer, it can be, like I said, sculpting it into a sculpture; sculptures can get copyright. It can be a work of art, so it can be an oil painting, or whatever; it can be a Vim poster… Honestly, these days my development environment is Word, but I used to be an Emacs guy…
So that is the key thing, is you are doing a creative thing, and it can be mediated by tools. And Kris, this gets to your point about the AI and where is the copyright there… It can be mediated by a typewriter, or a paintbrush, or I believe - though we don’t really know for certain yet - it can be mediated by an AI. But you are doing some creative something, and turning that into a fixed thing.
Alright, so what happens once you’ve done that? Actually, before I get into what happens once you’ve done that, because I think there’s an important exception… In the US at least, that creative – what does it mean to be creative is not zero. It’s pretty close to zero, but it’s not zero. There’s an important case called Feist vs. Rural Telephone, and the whole thing in that case is literally, telephone books aren’t creative, and so therefore they don’t get copyright. Because what’s the point of a telephone book? The point of a telephone book is to literally just mechanically go through a town and have phone numbers for everybody. So it’s hard work, but it’s not creative. And in the US, at least, you have to have some kind of creative something.
So if you do like a phone list of the 100 most awesome people in New York City - that’s creative; you had to select – one of the ways in which you can be creative under US copyright law is selection. If you pick those 100 people, then hey, you’ve done something creative, and your list of 100 people is copyrightable. But if you’re just “Every single person who lives in Manhattan”, that’s not creative; you don’t get protection. And that plays into questions of databases, and ultimately, I think - and we might not have time to get to this today, but the question of the models themselves. Because there’s both the output of models, what’s the copyright on that, and the models themselves. We don’t actually know if they’re copyrightable. That may be too esoteric; you might have to invite me back for another one for that.
[22:25] But okay, so you’ve created this thing… So now what do you do? So now you’ve got copyright. What does copyright let you do? Copyright lets you control what others can do with it. It lets you decide who gets to use it, who gets to redistribute it, who gets to modify it, within certain limits. But it’s pretty strong.
So the limits include what’s called First Sale doctrine, which is, “Hey, I sold it to somebody. They can usually sell it to one other person.” First Sale doctrine made a lot more sense in the era of like books. That’s what creates used books stores, is First Sale doctrine; it means that I bought the copyrighted thing, and now I can give it to a used books store and they can resell it. In the digital age, First Sale doctrine is a little more complicated… But suffice to say, that’s one of the limitations.
Similarly, fair use says, “Hey, if you’re using this for education, if you’re using this for nonprofit purposes…” I’m oversimplifying a little bit here; the tests around fair use can be a little complicated. Critically, in our digital age, fair use in the US has expanded quite a bit to include what’s called transformative use, which is to say, “Hey, you’re doing something super-new, super-different.” Courts are often going to allow that in the name of sort of not impeding progress.
So for example, Google Book Search is in some sense the biggest copyright violation in all of history, because it’s literally copied systematically millions of books, made these digital copies. But then a court said, “Well, but actually, it’s so different. It’s so great.” And they put strict controls around, you know, you can only get a few pages at a time, and authors can opt out if they want, after the copying has been done… So like Google Book Search is a good example of what transformation means, and potentially analogous to what Copilot is doing. But we don’t know.
The flipside of this is that we just had court cases – we had a court case a couple of years ago about the song Blurred Lines, some of you might have heard… And courts there actually said that even just sort of copying the style of the artist could potentially be a copyright infringement, which was a big surprise to a lot of lawyers. A lot of lawyers are still unhappy about that case.
Next week there’s going to be – or no, tomorrow morning, actually, maybe… There’s going to be a case about Andy Warhol doing – a photograph of Prince that Andy Warhol transformed into one of his Andy Warhol canvases. And the Supreme Court – it’s a little weird, but I think that case might actually have a lot of impact on artificial intelligence… Because we’ve all done, we’ve all played with Stable Diffusion, or Midjourney, or OpenAI, or whatever, to create foo in the style of bar. Well, if bar is still alive, and still has a valid copyright, maybe that’s a problem. We don’t really know yet.
I saw a research paper yesterday that said, “If you prompt Copilot to do code in the style of –” I’m forgetting the guy’s name. Petrov, I think… A top Python programmer - that you actually get fewer vulnerabilities in your code if you prompt Copilot with the name of a top maintainer. And the flipside, the paper’s author was honest enough to note that they prompted with their own name, and the number of vulnerabilities went up. I thought that was nice and humble of them.
So style is an issue that could potentially come up in code as well. That was a very long-winded answer to your question, Natalie. I apologize.
No, it was interesting. So you said that for code ownership it basically means who is allowed to sell and profit off that, who is allowed to give it their own personal interpretation…
[26:16] Yeah.
It’s also who’s there to answer in case of a problem, right? I wrote a piece of code that made my work lose a lot of money. Is the ownership on me?
Yeah, so that’s where it gets complicated… And we have really good answers for that in the case of things like – if you manufactured a car wheel, and the car wheel explodes because you used bad materials in that case, in the car wheel, then we have some well-developed laws and intuitions around “Okay, well, we sue the car wheel manufacturer.” Software doesn’t have any of that really yet. We’ve sort of operated in this rules-free zone where everybody was like “Software is so cool! I guess we’ll just let it happen.” And I think that age is coming to an end, to be perfectly honest, actually.
I think the European Union has published in the past year - including one last week, two weeks ago - papers on liability for software. The idea of - if a car wheel explodes and causes a car accident, we have a very clear idea of how we should figure out who is liable for that. If an AI goes wrong, or software goes wrong, and the car goes off the road and causes the exact same accident, we actually have very little idea how we should apportion liability. And it’s not necessarily about – for practical reasons, those kinds of things tend to look the same… Because at the end of the day, the company that commissioned the code is also the company that sold the product. So you tend to see those things tied together. But there’s no formal reason for that, right? Copyright law doesn’t – especially because copyright law historically was about things that didn’t cause car crashes. Historically, copyright law – like, literally, modern copyright law in the US is in large part because of player pianos; like scrolling wheels… Like what you saw in Westworld, like scrolling wheels with little punch holes that cause the piano to do things… Pianos didn’t run off and kill people, so copyright law doesn’t really have much to say inherently about product liability. And that’s something that we are, I think, screaming towards at very high velocity, at least in the EU, and I suspect because of AI in the US soon as well.
I feel like there’s an interesting component to that as well, because when you think about what we create, we’re just creating words on a page. The manufacturing process of turning that into something that does something is not necessarily something that the person who wrote the code does. So it’s like something that somebody else does. And then there’s all of the “Well, the machine you run it on, like if a bug does happen with like a car that’s driving - is it the fault of the person who wrote the code? Is it the fault of the machine? Is there a problem with the machine?” Who gets the blame? And I think that it gets extremely murky, because we’re dealing with such new stuff, that we have never had in the existence of humanity.
Yeah. The original history of this in the English and US law systems was that literally, to get to that point where “Hey, the car wheel explodes. We should sue the car manufacturer” involved a lot of people dying in train accidents, and the train companies being like, “Oh, but that’s not our fault. We just laid the tracks, bought the train, bought the coal… It’s the guy who was driving. It’s their fault. So you can’t sue us.” And that actually, as a matter of law, was like a good argument for decades. And then the number of accidents as trains became ever more present as part of our economy, as part of simply how things moved around - well, the technological change drove a change in understanding. Because that rule was originally from “Hey, some dude on a horse. It’s not my fault.” If he’s my squire, or whatever, pick your ancient British legal term - if they’re out on my horse, like genuinely, it’s sort of not my problem if they caused an accident. I mean, yeah, I own the horse, but…
[30:29] And so the train companies for a long time were like “Well, look, it’s just like a horse. This train is just like a horse. I can’t be responsible.” So at some point, the legal system was like, “Actually, this is ludicrous.” And so a combination of courts and Congress changed the rules to make the train companies more liable. No surprise, the trains then started getting safer.
Yeah, Angelica, I’ve actually got a British person on this podcast, I believe… So why am I not asking you the proper terms here?
Yeah. Footman’s fault.
Exactly. It’s the footman’s fault. So I think what we’re seeing is both an exciting and a terrifying time for lawyers; we are in the midst of one of these very rare technological changes. I think AI in particular is going to be that new train. Nobody really understands, nobody wants to take responsibility, for - Kris, as you say - super-good reasons. This stuff is literally the most complicated systems ever built by humankind. We even in the best case have only the vaguest sense of how it works… And good luck explaining it to a judge.
I had a conversation with some lawyer friends last week that was like “How would you explain…?” And these are fairly sophisticated – most either are programmers, or one of them is married to a programmer… And we’ve all been doing tech law for cumulatively many decades. How would you explain machine learning to a judge? And we all just sort of shuddered in horror at that thought… Because it’s a really – again, even for the programmers who haven’t thought about it, it can be really hard, unintuitive; the vocabulary is changing all the time, the technology is changing all the time… And to try to explain it to Congress, or to a judge, is a scary proposition.
It’s an exciting – whoever gets to do it first, that’s going to be a super-great lawyer job for somebody, but also - boy, when you screw it up… And we felt a lot of pressure in the Google Oracle trial, that this was something that if we got it wrong, it would really hurt open source… And I suspect every good lawyer, of course, cares about their client, but some clients are just – represent one client. And other clients represent these big systemic changes. And you feel that weight as a lawyer.
I feel like there’s the other side of the problem as well, because if you try and assign blame to the person who owns the copyright of code, there is a huge amount of code that we all depend on all the time that’s maintained by some random dude in Nebraska… Like, individual people. And it’s like, well, if you can sue to get to them, because something they wrote caused some problems somewhere down the chain, then that’s obviously a problem… So I can kind of see the chain where it’s just like, “Well, who actually gets the blame for the bug that was written, or the problem that happened with the code?” because you can easily just like keep tracing that back further and further and further, by passing on the blame.
So it’s like, once again, with the train, it’s like, “Well, I didn’t create that wheel. Someone else created that wheel, so that’s their problem.” Or with the Spectre/Meltdown hardware problems, where it’s like, “Oh, well it’s not my fault that there was a breach. The processor shouldn’t have been speculatively executing.” There’s so many weird arguments that you have because of this stuff that we don’t really understand what it is right now.
Yeah. And I think applying the old models is probably going to get us some very bad outcomes… And unfortunately, the way the legal system learns sometimes is by having bad outcomes. Everybody stubs their toe on it, and then you sort of fix that up as we go. But some people end up being caught in the middle. That guy in Nebraska - I assume all the listeners here have seen the XKCD comic about the guy in Nebraska. The problem is, of course, it’s not even just one guy in Nebraska, it is a tower of 10,000 guys in Nebraska.
[34:10] And so I do want to talk a little bit about the day job here, because I co-founded a company called Tidelift. And Tidelift’s mission, as we said at the top of the show, is to make open source better for everyone, in part by paying the maintainers. Because what we’re seeing happen all the time - we saw it happen a couple times this week just in the JavaScript community - is the solution to this kind of problem that you’ve identified, Kris, so far is “We’ll just start applying standards.” So we’ve got the OpenSSF security scorecard, we’ve got slsa.dev, which is a different kind of security scorecard… GitHub caused some controversy by saying “Hey, we’ve identified the most popular - I think only npm for right now - projects, and we’re sending you all a free two-factor authentication key. And also, we’re mandatorily turning on two-factor authentication for everybody.” And a couple of maintainers, for various reasons, were just like, “That’s too much work. That’s gonna complicate my life, it’s gonna break my build scripts…”
And we can go back and forth about like whether or not two-factor authentication in some of these cases is a good idea… But I want to step out – I mean, I generally think two-factor authentication is a good idea, don’t get me wrong. But that’s the easy case. It just gets harder from there, right? Like, “Okay, what do we need to do to sign our binaries?” In Go, I understand that signing modules is mostly a solved problem, and in a lot of other language ecosystems it’s not. So okay, so there’s extra work, and we’ve just created this extra work, Kris, as you say, on some guy in Nebraska, or actually a stack of 10,000-some guys… And I apologize to listeners, it’s probably grating for me to say 10,000 guys, but I think it’s worth both admitting that this is a problem, and I think saying that part of the problem of the gendering of open source is very much that in a world where women are often called on to take on more than their share of household duties, and home care, child care, elder care, if we’re not paying people to do open source - well, guess what, that’s part of how we get guys doing it… Because guys, for various cultural reasons, have more free time. That’s a side note that I think is important to say.
Anyway, we’re putting all these new requirements on people, Kris, because of exactly this intuition you’ve had about – but we’re doing nothing to make open source more fun, easier… Like, all we’re doing is loading more work on top, and I think at some point – I think we’re already starting to see it in some communities, where people are gonna snap, people are gonna walk away, people are gonna say–
So what do we do? So what Tidelift does to help address this problem is we go to our customers and we say, “You’re going to get more predictable, more reliable open source if developers follow these standards. If they’re not going to follow these standards on their own, you should pay them. So if you want the stuff you use to follow those standards, write us a check; we will go out, find those developers for you - hopefully, we already have a contract with them from other customers, but… We’ll go out, find them and pay them, in order to get some of this work done.”
Now, there’s a lot of challenges around this, in part because - guess what? Nobody wants to pay for open source. It should be free. And well, guess what? If you’re liable, all of a sudden maybe it’s not free. And I think one of the interesting things that we’re gonna see in discussion about these EU regulations, for example - they contain exceptions for open source. The definition of open source sort of looks like it excludes commercially-sponsored open source, which as we know, is a lot of open source these days. We don’t know how that’s gonna play out, we don’t know what it really means. Like, their definition is vague enough that maybe it only includes a small slice of open source, or maybe it includes a lot of open source; we don’t really know yet. I’m sure that’s gonna be lobbied over… And in fact, I’m going to publish something, hopefully tomorrow or Thursday, on dev.to about what open source developers can do to help lobby the US government on this topic, but a lot of the same thing’s going to apply to the EU government as well.
[38:16] Yeah, one of the thoughts I had during what you were saying is - I express this in private to some people, and I always get kind of the “You’ve just said heresy” look or comments… But I have been wondering, is open source sustainable as the method of how we do things in this industry? Is this focus on sharing so much code actually going to be something we can continue doing in the future, since there’s so much ambiguity around all of this? And quite frankly, I think it also atrophies the whole industry, because we’re not rewriting things. We’re not reimagining things. I think that’s one of the core problems with copyright in general - there’s this whole thing that Disney has done, where it’s just like… Copyright used to be like 20 years. Now it’s almost forever; or as Disney would like to have it, actually forever. So it’s like, “Oh, well, now these things are just kind of sticking around, and it’s so much harder to move things forward.”
If I remember the kind of genesis of copyright, or the vague genesis, it’s like “We want to protect people, making it so they can profit off of their creative work for some time”, but then it goes back into the general pool of things so we can kind of continue making progress forward.
Yeah… Boy, that sustainability question is a big one, Kris, and I really don’t know. I’d like to say that we have a real clear answer to it. Certainly, I think that Tidelift is part of the answer to that… But I think it’s a really good question to be asking, and that heresy - it is an elephant in the room that a lot of people… I’ve gotta say, I get a little frustrated when an employee of a trillion-dollar company is like “I don’t know… Paying people… Open source seems to work fine for me.” I’m like, “Well, yeah, because you literally work for a trillion-dollar company, but a lot of the software you rely on, and certainly that your customers rely on - they don’t have that luxury.”
It is the kind of thing – you know, you’ve got a puppy… Puppies are often more fun than replying to pull requests from automated bots. And, Kris, I think one of the interesting things - we’ve been a little backward-focused, but I think there’s a lot of cool stuff… I know we’re flying through this time, but I think there’s a lot of interesting questions about this future-looking around machine learning, Copilot, things like that… And part of those go to what you’re saying – Kris, you were saying “Well, the original motivation for copyright was to–” There were sort of three original motivations for copyright that vary depending on who you’re talking to.
So in the US, in the Constitution, it says that the purpose of copyright is to encourage authors. So it is a very utilitarian, like, “We’re going to give you this copyright, and as a result, you’re going to create more stuff, and that’s a bargain that we’re gonna have. We’re gonna give you this monopoly”, which otherwise the founders were super against monopolies, but they created the copyright monopoly specifically in order for the rest of us to benefit from this incentive of a whole bunch of stuff being created. So that’s one story.
In the EU, and really actually most of the rest of the world except the US, it’s more like, “Your creativity is like a part of you. It is part of your human – your human nature is to create”, and so there’s often what are called moral rights; the idea that inherently you have some control over the thing that you’ve created, even if it’s not productive, right? Even if there’s no social value to it. And we’re going to be running headlong into that with all of the foo in the style of bar. Bar is going to be really irritated that their moral rights were infringed.
And by the way, the third historic - like, the original copyright, was literally just basically a tool of censorship for the UK government in the early 1600s. It was a way for them to control printers. And I think, Kris, there’s a really interesting discussion we’re going to have over how does open source – it probably won’t be mediated through copyright, but as you were saying about liability, and security… How does government interact with this? Because it’s one thing if these big businesses are running around saying, “Hey, we should make this stuff more secure.” It’s very different if governments are running around saying “The whole world needs you to be more secure.” That’s the big thing that we still haven’t really wrapped our heads around.
[42:29] Yeah. It doesn’t help that our legislators aren’t very tech-savvy, and they tend to write a lot of laws that you’re like “This makes no sense.” Or ask questions in hearings that are questionable at best.
Yeah. There’s two parts of that. That’s the one – and again, I’ll try to be quick here. That’s the one that everybody thinks about, because we’ve seen our legislators on TV, and it’s terrifying. But there’s also this thing where – so legislators sort of provide you like a rough draft, and then the courts are used to refine that. But because litigation has gotten so expensive, and everybody hates it, we don’t do the refining of the rough drafts anymore. Like, it sort of becomes industry convention, and that sticks us with a lot of cruft, that I think is a problem. Natalie, you had a question you wanted to…?
I wanted to say that – that’s a couple of topics back, but I had a… I’m a contractor, so I have clients around the world, I see all sorts of different contracts… And one time I had a contract with a California-based company, and there was a clause that said that any damage that I caused, I am responsible for it. So back to the conversation about that Nebraska person…
I mean, contracts can say that… The good news/bad news is that you were probably what is known as not a deep pocket… Like, they’re unlikely to sue you, because what are you going to give them? Your collection of goldies would not be worth much to them.
I was really terrified. Exactly.
You probably should have negotiated that out, but that’s one of these ways in which the legal system is very imbalanced, unfortunately… And that’s a whole other rant, for a whole other show.
Especially the American one. But the thing is, some friends who work in California said that this is actually a normal clause, that they had this in the contract in the past. I think nobody working in California is on this panel, so maybe somebody listening can keep me honest here… But I’ve been just told “Yeah, don’t worry. It’s always there. Just don’t take it seriously.”
Well, there’s a whole other thing about cruft in – again, another rant for another day… How lawyers deal with cruft could learn a lot from how programmers deal with cruft, because a lot of the stuff that is – we don’t have any sense of dependencies, or module reuse or anything like that in law… And so you get stuff that literally just gets copy-pasted. Like, imagine if you copy-paste all your code all the time… We as programmers know, of course that creates errors. And we have linters, and dependencies, and we have all that kind of stuff. Lawyers have none of that, and that is a problem… Though I’m curious to see if machine learning helps us with that in the future.
I was gonna say that, like, when you were talking about how we have the laws get written, and then they don’t get tested or refined, I’m like, “That’s kind of like writing code, but without tests.” It’s just kind of like, “I don’t know, it’s just running out there, and we have no idea if it’s doing the right thing, or doing what we intended. We just wrote it.”
There’s no test frameworks, there’s no linters, there’s no – by the way, when you compile it, there will be somebody else trying to persuade the compiler to do things totally different from what you intended… It’s a very adversarial system that is not set up for robustness. Don’t get me wrong, it works reasonably well in a lot of cases, but it’s mostly because of humans – this was one thing that drove me nuts about all the smart contract stuff… Like, contracts only work because humans are around to smooth off the rough edges, bridge the gaps… Like, as soon as you start making contracts into code, you’re just doomed to failure, because of all the failure modes of code that we as programmers know, and that don’t go away when you have contracts; you just have a more forgiving, at the end of the day, execution environment… Because at the end of the day, humans are in the loop, in a way that they aren’t with smart contracts, and in a way that they aren’t with code.
I find it interesting how it also causes some class problems as well… I have legal insurance, so every time I get a contract of any sort, I send it to lawyers to review. I’m like, “Is any of this weird?” But also knowing that I can just like take a pen and just strike through anything I don’t like in a contract, when so many people think “Oh no, I’ve gotten this thing. This is like set in stone, and I can’t do any – I have to take it or leave it.” And it’s like, “That’s not how this really works.” So knowledge also makes it so the legal system’s kind of like a little bit more wonky than I think it would be otherwise.
Yeah. Well, and on the flipside, of course, especially in the US, Natalie, I think you’re correct to say that this is less of a problem in the EU, though definitely not unknown as a problem… You get lawyers who end up working more defensively than they might otherwise… Because they’re thinking of the person like you, Kris, and so extra effort gets put in, extra layers get put in… And that’s not necessarily a bad thing, but it does mean – there were virtues to the days of the handshake deal. High trust environments versus low trust environments is a real thing in law, unfortunately, for better or for worse… And it’s absolutely – there’s all kinds of class and privilege issues around that. Again, another rant, another day.
And that seems to be the case with every episode we do, Natalie. We’re gonna have to get you back for a part two, for a deeper dive. I have one more question that I want to dive into before we go to the unpopular opinion section, which is - and you alluded to this earlier… Are there specific considerations when we’re talking about Go? I know you talked about kind of package management, but what are the specific legal areas when it comes to ownership of code that are brought in specific to the Go software engineering language?
[49:54] I think there’s one – I’m getting a little over my skis here, but Go is very much a – because of the way that you all have done packaging, it has some real implications I think for… It breaks the brain of a lot of people who come out of like other language package managers… And it has some stuff that really made sense internally to Google… There’s this real question - Kris, you were alluding to it earlier, the guy in Nebraska, XKCD… My impression of it from an outside perspective is that Go wants to make some of that not a problem by like, “Oh, we’ll just grab this specific revision from this specific repo, and voila… We know exactly what this code is, and that guy in Nebraska can’t hurt us anymore. He can’t upgrade, because we know exactly what it is that we’ve got here.” But of course, that ends up being a sort of adversarial relationship with that person. It means that when that person brings knowledge to the table, brings new versions to the table, there’s this sort of assumption…
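As an illustration of the pinning described above, here is a minimal sketch of a `go.mod` file that requires one exact revision of a dependency. The module paths, the version string, and the commit hash are hypothetical placeholders, not something mentioned in the episode.

```text
// go.mod for a hypothetical consumer module
module example.com/consumer

go 1.19

// Pin one exact commit of a hypothetical dependency using a pseudo-version:
// base version, UTC commit timestamp, 12-character commit hash prefix.
require example.com/nebraska/leftpad v0.0.0-20220901120000-0123456789ab
```

With a requirement like this, builds keep using that revision until someone edits the file or runs `go get` to move it, and the accompanying `go.sum` file records checksums so that changed contents are detected.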
We never really got to it here, Natalie, but I think there’s this penumbra of – it’s not ownership, but it’s sort of entitlement, almost… And this is certainly not Go-specific, but this sense of like, “Well, I’m using it, and so I’m going to treat it a little bit like I have a support contract, like traditional software.” And in fact, open source has sort of encouraged this, because people – it started from this very collaborative, community-based culture, and so the norms are like, “Hey, I’m gonna help everybody who shows up in my issue tracker.” And of course, at some point you get too popular and that breaks. But we haven’t really acknowledged that that’s – I mean, literally, the word “entitlement” comes out of some of the same roots, the same Latin legal roots as ownership. When you own a thing, you are entitled to do x. And we end up with entitlements without the ownership part, without the payments part, without the labor part, and that really is what sends a lot of things sideways, I think, in open source. Again, that’s where Tidelift comes in, and I think also Go, I think, has tried to cabin some of that off with its module ownership, but sometimes where that makes technical sense, it may not always make social sense. And with more time, some other time, that would be a fun conversation to have in more depth with some of the folks who know more on the packaging side than I do, that’s for sure.
So, I guess the unpopular part is not Go-specific, but the responsibility definitely lies with all of us, because we specifically decided in open source that “Well, I’m not owning this thing in like some sense”, but we very much decided in other senses that… Like, one way of putting it is we decided that use of the code translates into ownership of somebody else’s time… And that is not – when open source runs the entire world, which it really does at this point, that doesn’t scale, and we don’t know yet, Kris, to your earlier point of heresy… We don’t know how that’s gonna continue scaling. We don’t know what happens when we ask everybody to do really complicated, multifactor signatures on everything… With very few exceptions, people don’t hate this stuff. They’re software developers; like, they know security matters. They know signatures help. But they also know they’ve only got so much time in the world, and sometimes they come home from work, and they really just want to pitch their laptop into the lake. And that does not help you close out your issues if you’ve pitched your laptop in the lake. So we’re gonna have to figure that out. It might involve buying some people some laptops, or maybe writing them a check, or maybe it’ll involve helping them with AI, which we really didn’t get to at all, but again, maybe next time, so…
I remember a conversation that happened I think among a smaller group of people within the community, but when modules were being designed and developed, one of the comments that kept coming up was “This is biasing toward the consumer instead of the provider, the maintainer. Is that a thing we really want to do in the Go community? What effect is that going to have on people’s ability to actually maintain and build open source things?” And I think we’re really starting to see some of the outcomes of that, like what Ben Johnson has been doing, where he’s just like, “I don’t want contributions. I don’t have enough bandwidth to kind of deal with some of these things.”
[54:14] I think we’re gonna see a lot more people that just are like, “Well, I can’t –” Like, if there is a bug, and now I have almost no recourse to fix it, or I can’t get people off old versions of things, that does kind of erode the ability of people to do open source really, which erodes our ability to maintain the modern world, in a sense.
Yeah, absolutely. And that’s one thing I would say to the Go folks - you’re not alone in that. Every ecosystem is struggling with that. There are different flavors caused by different technical choices and different cultural choices along the way… But the core problem - we did a Tidelift conference right before the pandemic. I can’t wait to do our next one, because very much a theme is going to be “How can people across many language ecosystems share notes, figure out what this looks like?” Because it’s very much not – if you feel alone, if you’re having a bad day and you want to chuck your private keys away, and never log into GitHub again, you’re not alone in that. That is a common thing. And GitHub is going to bust their butts. I will give them credit, they’re trying to do a lot to make some of this easier for maintainers… But ultimately, Kris, as you say, big companies are going to bias towards doing the right thing for the consumer.
Microsoft has done a lot of amazing things for the open source community, which 1997 me is like aghast that I’m saying that out loud… But at the end of the day, when push comes to shove, the decisions are often going to be made in favor of the consumer, and we do need to have some of those honest discussions about what that looks like… Because Natalie, to get back to some of what you’re saying, and the overall theme of the show - legal ownership is only part of the story here. Cultural ownership, responsibility, entitlement - all these things are related to, but cannot be solved just by our legal systems. And I think maybe that’s my one regret. I have very, very few regrets about going to law school. It was a lot of fun, I met a lot of great people, but people often come to me seeking legal solutions for what are ultimately cultural problems. And I can only do – the best lawyers know how to straddle that gap. And I like to think that that is certainly my biggest strength as a lawyer, especially in this space, is how to straddle that gap. But it’s not easy. Which actually, by the way, reminds me - side project, fun project, and then we’ll leave it… And I know we’re running out of time here. I am writing a newsletter called OpenML.fyi. It is new. I literally sort of launched it to some friends a couple of weeks ago, and more broadly yesterday… But it is literally about these questions about where open and machine learning overlap, which includes questions like “Is this the end of no warranties?” Because every open source license, as you point out, Kris, has like big, all-caps text that says “No warranties. If you break it, you buy it.” And what does that mean in light of EU regulation? We’ll be talking about a lot of this stuff, like Copilot, there as well, so I’d love to have you all back, but for those of you who are curious about that topic, it’s a Ghost AGPL-powered newsletter, OpenML.fyi.
We’ll add that in the show notes.
I want to end with – because I know we’ve got to get to unpopular opinions… But I just want to say, one of the things I always think about when we get into these conversations is like people tend to think like humans are like transactional, and they’re kind of mean to each other, and we want a war, and all of that… And I feel like the existence of open source and the existence of our industry as a whole proves that people are a lot of times selfless, and will sacrifice a lot just to make other people feel good; just for the happiness of other people. And I think that shows how incredible and how collaborative we are as a species. And I think more of us need to remember that, especially in the times we are now. We are not necessarily this always angry at each other, always warring, always territorial species; quite often, most of us are just like these – we just want to help our fellow people out.
It’s been too long since I worked at Wikipedia, but… I mean, here’s this thing, it’s this amazing cultural treasure, and anyone can go and graffiti on it at any time. And something like 1 in 1,000 edits are spam. Think about what that says as – to exactly your point, Kris… Actually, most of the time, most people - we all want to make this work, right? And open source, open data are very much, I think, like, genuinely amazing… That’s why I enjoy doing it. There are a lot more lucrative things probably all of us could be doing with our lives. But yeah, it’s human and humane in a great way.
Lovely note to end the episode on. We’re gonna have to get you back for a part two, for sure. But before we let you go, we’re gonna be doing a little bit of unpopular opinions.
So over to you, Luis… What is your Go Time unpopular opinion?
Oh, boy… The one I have in the show notes is absolutely one I already nailed, which is “Hey, we should all be paying for this, right? We got it for free for a long time, and that train is running out”, for very human, decent reasons. It’s not like I think companies are bad for having used this stuff. But as Kris was saying, sometimes you raise that employee company… I will never forget - so there was this project that I was invited to come to a meeting about. I won’t be specific, but it was one of many, many, many, many, many open source metadata projects. And people went on for like about 45-50 minutes, and I was like, “Okay, but why are volunteers going to create all this metadata for you?” Quiet. Silence. Quiet. Silence. “Okay, but why? What’s their motivation?” “I don’t know, it’s probably just gonna happen.”
Needless to say, that project has not really gone much of anywhere. But I was treated as like a pariah, and literally not invited to future meetings for a while, because I had dared to ask this question of “Why would people do this?” Unfortunately, I still get that all too often. I think, to be fair, lots of people are getting the message, finally, but it’s taken longer than it should have. That’s my, sadly, unpopular opinion today.
It’s like that meme about that guy that’s being thrown outside of the window… [laughter]
Wait, which – thrown outside of the window? Now I’ve gotta google this…
Where it’s like, they’re all in the meeting…
It’s like this comic strip that –
Oh, yeah. Yeah. Yup… I’ve been that guy.
Intrigued to see how unpopular or in fact popular your opinion is. And then I want to ask you, Kris, for an unpopular opinion, given that we’re just getting you back… I’m sure you have something on your mind; you always do.
I have so many unpopular opinions… I don’t think this is gonna be unpopular… So I think most people probably agree… But it’s like a thing I want to put out into the universe more, and that is that every tech company larger than probably 20 or 30 people should hire a librarian.
We create ridiculous amounts of information, but then we usually just dump it into a wiki, and then we’re like “We’ll be able to find it. Just use the search functionality.” Or we like try and make a docs page, and we’re like, “Users will be able to find stuff.” And it’s like, there’s an actual degree program of people who get doctorates in how to arrange information so people can find it. Go hire them.
They’re not even that expensive to hire… Just go get a couple, a librarian and an archivist, and then make your data and your information just much more clean and much more organized. It will probably help you make a lot more money in the long run, and make your engineers less frustrated with the world.
How come not all database companies, and Google, and so on, are hiring librarians and archivists to do this?
I’m assuming people don’t hire librarians because they’re just never – a) I think most people don’t know what librarians actually do. I think most people just think librarians are the people that can help you find books in the library. And they don’t think much more about that. They don’t think about “But how do they help you find the books?” They’re just like, “Yeah, they just helped me find stuff.” So I think that’s part of it.
And unless you sit down and think about what the problem is, I don’t think it’s like that kind of clear thing. You’re not gonna look necessarily outside of the world you exist in. You’re gonna be like, “Oh no, this is the world… We can do this with computers. We can just write some code that will do some indexing, and that’ll work.” Like I always look at books, and I always look at the indexes they have, and I’m like, “Someone is trained, probably has like a high-level degree in how to actually pick what words go in an index.” That’s like a really challenging job, because there’s a crapload of words in a book. Like, well, which ones do I pick and put in that index? It’s like, well, no, that’s like a hard job. And yet, books forever – well, not maybe forever-forever, but for a very long time, had indexes. And it’s like, well, we should probably get those people.
But yeah, I think most of the time, we as technologists are just like, “No, no, our technology will just do it for us. We’ll write some stat stuff, or some ML, or AI, or whatever, and it can obviously replace the thing that humans have been doing very well, for a very long time, even though we have no idea of what that degree program or industry is at all.” Typical things that we do.
The Times published a book review yesterday, of a book on the history of indexes, which apparently has like three separate indexes, this book on indexes… So it looks really interesting.
And I promise I did not tee that up. It’s not an overt company ad. Yeah, The New York Times actually does some really good work… [laughs]
I genuinely forgot that there.
I’ll drop your check off later. [laughs]
Just a discount on my subscription, that’s all I ask.
We’ll chat. It has been an absolute pleasure having you on this show. Thank you so much for joining us. It’s also wonderful to have you back, Kris, and wonderful as always to have you co-presenting with me, Natalie. And regrettably, we’re now going to have to say goodbye… So thank you all. I’m hoping to have everyone together again soon.
Yeah, absolutely.
Thank you!