In this episode, we’re joined by tech lawyer Luis Villa to explore the question, who owns code? The company, the engineer, the team? What about when you’re using AI, machine learning, GitHub Copilot… is that still your code?
Luis Villa: Well, I’m gonna give you my lawyer answer to that. Those of you whose GitHub accounts do things other than commit to other licenses - which is pretty much all I do these days with my GitHub account - will have better notions of code ownership as a cultural practice among programmers, like who’s responsible… I do want to talk a little bit about that one, but let me put a pin in that and come back to it.
The basic system since at least the ’60s in the US - I’m not sure exactly the timeline in the EU, but I would imagine similar - is that… Well, actually, let me go back even further. Copyright is intended to protect creative works. So what do you have to do to get copyright in a thing? And I’ll explain what copyright is in a second, but let me start with what you have to do. And what you have to do is you have to write down something that’s creative. “Write down” can be broad, right? It can be sculpting, or - but you have to take it out of your head and put it out into the real world in some way. That can be typing it in a computer, it can be, like I said, sculpting it into a sculpture; sculptures can get copyright. It can be a work of art, so it can be an oil painting, or whatever; it can be a Vim poster… Honestly, these days my development environment is Word, but I used to be an Emacs guy…
So that is the key thing, is you are doing a creative thing, and it can be mediated by tools. And Kris, this gets to your point about the AI and where is copyright there… It can be mediated by a typewriter, or a paintbrush, or I believe - though we don’t really know for certain yet - it can be mediated by an AI. But you are doing some creative something, and turning that into a fixed thing.
Alright, so what happens once you’ve done that? Actually, before I get into what happens once you’ve done that, because I think there’s an important exception… In the US at least, that creative - what does it mean to be creative is not zero. It’s pretty close to zero, but it’s not zero. There’s an important case called Feist vs. Rural Telephone, and the whole thing in that case is literally, telephone books aren’t creative, and so therefore they don’t get copyright. Because what’s the point of a telephone book? The point of a telephone book is to literally just mechanically go through a town and have phone numbers for everybody. So it’s hard work, but it’s not creative. And in the US, at least, you have to have some kind of creative something.
So if you do like a phone list of the 100 most awesome people in New York City - that’s creative; you had to select - one of the ways which you can be creative under US Copyright law is selection. If you pick those 100 people, then hey, you’ve done something creative, and your list of 100 people is copyrightable. But if you’re just “Every single person who lives in Manhattan”, that’s not creative; you don’t get protection. And that plays into questions of databases, and ultimately, I think - and we might not have time to get to this today, but the question of the models themselves. Because there’s both the output of models, what’s the copyright on that, and the models themselves. We don’t actually know if they’re copyrightable. That may be too esoteric; you might have to invite me back for another one for that.
[22:25] But okay, so you’ve created this thing… So now what do you do? So now you’ve got copyright. What does copyright let you do? Copyright lets you control what others can do with it. It lets you decide who gets to use it, who gets to redistribute it, who gets to modify it, within certain limits. But it’s pretty strong.
So the limits include what’s called First Sale doctrine, which is, “Hey, I sold it to somebody. They can usually sell it to one other person.” First Sale doctrine made a lot more sense in the era of like books. That’s what creates used bookstores, is First Sale doctrine; it means that I bought the copyrighted thing, and now I can give it to a used bookstore and they can resell it. In the digital age, First Sale doctrine is a little more complicated… But suffice to say, that’s one of the limitations.
Similarly, fair use says, “Hey, if you’re using this for education, if you’re using this for nonprofit purposes…” I’m oversimplifying a little bit here; the tests around fair use can be a little complicated. Critically, in our digital age, fair use in the US has expanded quite a bit to include what’s called transformative use, which is to say, “Hey, you’re doing something super-new, super-different.” Courts are often going to allow that in the name of sort of not impeding progress.
So for example, Google Book Search is in some sense the biggest copyright violation in all of history, because it’s literally copied systematically millions of books, made these digital copies. But then a court said, “Well, but actually, it’s so different. It’s so great.” And they put strict controls around, you know, you can only get a few pages at a time, and authors can opt out if they want, after the copying has been done… So like Google Book Search is a good example of what transformation means, and potentially analogous to what Copilot is doing. But we don’t know.
The flipside of this is that we just had court cases - we had a court case a couple of years ago about the song Blurred Lines, some of you might have heard… And courts there actually said that even just sort of copying the style of the artist could potentially be a copyright infringement, which was a big surprise to a lot of lawyers. A lot of lawyers are still unhappy about that case.
Next week there’s going to be - or no, tomorrow morning, actually, maybe… There’s going to be a case about Andy Warhol doing - a photograph of Prince that Andy Warhol transformed into one of his Andy Warhol canvases. And the Supreme Court - it’s a little weird, but I think that case might actually have a lot of impact on artificial intelligence… Because we’ve all done, we’ve all played with Stable Diffusion, or Midjourney, or OpenAI, or whatever, to create foo in the style of bar. Well, if bar is still alive, and still has a valid copyright, maybe that’s a problem. We don’t really know yet.
I saw a research paper yesterday that said, “If you prompt Copilot to do code in the style of -” I’m forgetting the guy’s name. Petrov, I think… A top Python programmer - that you actually get fewer vulnerabilities in your code if you prompt Copilot with the name of a top maintainer. And the flipside, the paper’s author was honest enough to note that they prompted with their own name, and the number of vulnerabilities went up. I thought that was nice and humble of them.
So style is an issue that could potentially come up in code as well. That was a very long-winded answer to your question, Natalie. I apologize.