In this episode, we’re joined by tech lawyer Luis Villa to explore the question: who owns code? The company, the engineer, the team? What about when you’re using AI, machine learning, GitHub Copilot… is that still your code?
Luis Villa: Well, I’m gonna give you my lawyer answer to that. Those of you whose GitHub accounts do things other than commit to other licenses - which is pretty much all I do these days with my GitHub account - will have better notions of code ownership as a cultural practice among programmers, like who’s responsible… I do want to talk a little bit about that one, but let me put a pin in that and come back to it.
The basic system since at least the ‘60s in the US - I’m not sure exactly the timeline in the EU, but I would imagine similar - is that… Well, actually, let me go back even further. Copyright is intended to protect creative works. So what do you have to do to get copyright in a thing? And I’ll explain what copyright is in a second, but let me start with what you have to do. And what you have to do is you have to write down something that’s creative. “Write down” can be broad, right? It can be sculpting, or – but you have to take it out of your head and put it out into the real world in some way. That can be typing it in a computer, it can be, like I said, sculpting it into a sculpture; sculptures can get copyright. It can be a work of art, so it can be an oil painting, or whatever; it can be a Vim poster… Honestly, these days my development environment is Word, but I used to be an Emacs guy…
So that is the key thing: you are doing a creative thing, and it can be mediated by tools. And Kris, this gets to your point about the AI and where copyright is there… It can be mediated by a typewriter, or a paintbrush, or I believe - though we don’t really know for certain yet - it can be mediated by an AI. But you are doing some creative something, and turning that into a fixed thing.
Alright, so what happens once you’ve done that? Actually, before I get into what happens once you’ve done that, because I think there’s an important exception… In the US at least, that creative – what does it mean to be creative is not zero. It’s pretty close to zero, but it’s not zero. There’s an important case called Feist vs. Rural Telephone, and the whole thing in that case is literally, telephone books aren’t creative, and so therefore they don’t get copyright. Because what’s the point of a telephone book? The point of a telephone book is to literally just mechanically go through a town and have phone numbers for everybody. So it’s hard work, but it’s not creative. And in the US, at least, you have to have some kind of creative something.
So if you do like a phone list of the 100 most awesome people in New York City - that’s creative; you had to select – one of the ways which you can be creative under US Copyright law is selection. If you pick those 100 people, then hey, you’ve done something creative, and your list of 100 people is copyrightable. But if you’re just “Every single person who lives in Manhattan”, that’s not creative; you don’t get protection. And that plays into questions of databases, and ultimately, I think - and we might not have time to get to this today, but the question of the models themselves. Because there’s both the output of models, what’s the copyright on that, and the models themselves. We don’t actually know if they’re copyrightable. That may be too esoteric; you might have to invite me back for another one for that.
[22:25] But okay, so you’ve created this thing… So now what do you do? So now you’ve got copyright. What does copyright let you do? Copyright lets you control what others can do with it. It lets you decide who gets to use it, who gets to redistribute it, who gets to modify it, within certain limits. But it’s pretty strong.
So the limits include what’s called First Sale doctrine, which is, “Hey, I sold it to somebody. They can usually sell it to one other person.” First Sale doctrine made a lot more sense in the era of like books. That’s what creates used bookstores, is First Sale doctrine; it means that I bought the copyrighted thing, and now I can give it to a used bookstore and they can resell it. In the digital age, First Sale doctrine is a little more complicated… But suffice to say, that’s one of the limitations.
Similarly, fair use says, “Hey, if you’re using this for education, if you’re using this for nonprofit purposes…” I’m oversimplifying a little bit here; the tests around fair use can be a little complicated. Critically, in our digital age, fair use in the US has expanded quite a bit to include what’s called transformative use, which is to say, “Hey, you’re doing something super-new, super-different.” Courts are often going to allow that in the name of sort of not impeding progress.
So for example, Google Book Search is in some sense the biggest copyright violation in all of history, because it’s literally copied systematically millions of books, made these digital copies. But then a court said, “Well, but actually, it’s so different. It’s so great.” And they put strict controls around, you know, you can only get a few pages at a time, and authors can opt out if they want, after the copying has been done… So like Google Book Search is a good example of what transformation means, and potentially analogous to what Copilot is doing. But we don’t know.
The flipside of this is that we just had court cases – we had a court case a couple of years ago about the song Blurred Lines, some of you might have heard… And courts there actually said that even just sort of copying the style of the artist could potentially be a copyright infringement, which was a big surprise to a lot of lawyers. A lot of lawyers are still unhappy about that case.
Next week there’s going to be – or no, tomorrow morning, actually, maybe… There’s going to be a case about Andy Warhol doing – a photograph of Prince that Andy Warhol transformed into one of his Andy Warhol canvases. And the Supreme Court – it’s a little weird, but I think that case might actually have a lot of impact on artificial intelligence… Because we’ve all done, we’ve all played with Stable Diffusion, or Midjourney, or OpenAI, or whatever, to create foo in the style of bar. Well, if bar is still alive, and still has a valid copyright, maybe that’s a problem. We don’t really know yet.
I saw a research paper yesterday that said, “If you prompt Copilot to do code in the style of –” I’m forgetting the guy’s name. Petrov, I think… A top Python programmer - that you actually get fewer vulnerabilities in your code if you prompt Copilot with the name of a top maintainer. And the flipside, the paper’s author was honest enough to note that they prompted with their own name, and the number of vulnerabilities went up. I thought that was nice and humble of them.
So style is an issue that could potentially come up in code as well. That was a very long-winded answer to your question, Natalie. I apologize.