Predrag Gruevski and Chris Krycho joined the show to talk about SemVer. We explore the challenges and the advantages of semantic versioning (aka SemVer), the need for improving the tooling around SemVer, where semantic versioning really shines and where it’s needed, Types and SemVer, whether or not there’s a better way, and why it’s not as simple as just opting out.
Chris Krycho: I think the answer is yes. And one of the reasons that I’m more bullish on sticking with SemVer and putting tooling around it is because I did survey the rest of the world as it were when it comes to versioning, and there are a lot of approaches that just say “Ah, these problems with SemVer are fundamental. Scrap it.” One of them, SoloVer, is just have one version number - it’s 1, 2, 3, etc, just go up. And that has a certain appeal to it, but the actual fundamental issue there hasn’t changed. All it does is take the burden off of the maintainer of a library, and put it on all of the users. It says “Okay, now you’re responsible anytime any one of your dependencies changes, including transitively, anywhere in your dependency tree. So go read the release notes”, which tend to encode things like breaking changes in the release notes. Because again, communication problem, right? We want to know “What did this do?” Also, as an aside, all of those proposals include things like “Well, you can also stick like pre-release numbers on the end.” And I’m like “Hold on, hold on… It seems kind of like we’re backing our way back toward this whole SemVer thing now, aren’t we? Shouldn’t your pre-release just be another number?”
I think there is a sense in which there is a maybe fundamental local maximum. Maybe it’s local, but the hill’s so big that we’re not going to find a different path. I could be wrong about that. But when I go looking around, the things that seem like they might change the calculus here don’t so much eliminate the value of SemVer as they do build on it. So a good example here is what the Unison programming language does. Pretty small language, but it is aimed at industry. It’s not pure research. And they do something that’s really wacky, in the best way. You don’t store your code as plain text. Instead, they take advantage of the fact that they’re a pure functional programming language, with really well specified semantics, and they say “Okay, we can take your code, normalize it, hash it, and store the compiled output of it with a pointer to it”, which means a whole bunch of interesting things… But for the purposes of versioning, it means that when I make a breaking change, the original version is still there, because that hashed, compiled version of it got committed to a database instead of to plain text. And that database version is what anybody who depends on it sees.
[00:34:06.12] So when I add a new parameter to my function, the consumers are still pointing to the old function, which means they can pull this update and say “Okay, I can progressively switch over to the new function signature, but I can do that at will, and the two can live next to each other”, and because it is a pure functional programming language with no side effects that aren’t managed off in the runtime, etc, etc. You know, leave all that aside. Suffice it to say because of that choice, they can just “ship a breaking change” without ever breaking anyone.
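To make the idea concrete, here is a minimal Python sketch of a content-addressed store for definitions. It is not how Unison is implemented: the `CodeStore` class, its methods, and the `greet` function are hypothetical names invented for illustration, and the sketch hashes a normalized syntax tree of the source rather than compiled output. It only shows why a consumer that pins a hash keeps working after a "breaking" change is published under the same name.

```python
import ast
import hashlib


class CodeStore:
    """Toy content-addressed store (hypothetical, for illustration only)."""

    def __init__(self):
        self._definitions = {}  # hash -> source text; entries are never overwritten
        self.names = {}         # human-readable name -> latest hash

    def add(self, name, source):
        # Normalize: the dumped syntax tree ignores whitespace and comments.
        normalized = ast.dump(ast.parse(source))
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:12]
        self._definitions.setdefault(digest, source)
        self.names[name] = digest  # only the name moves; old hashes stay valid
        return digest

    def load(self, digest, func_name):
        # Rebuild the function from the stored definition it was pinned to.
        namespace = {}
        exec(self._definitions[digest], namespace)
        return namespace[func_name]


store = CodeStore()

# First release: a consumer pins the hash, not the name.
v1 = store.add("greet", "def greet(name):\n    return 'hello ' + name\n")
consumer_pin = v1

# "Breaking" change: a new required parameter, published under the same name.
v2 = store.add(
    "greet",
    "def greet(name, punctuation):\n    return 'hello ' + name + punctuation\n",
)

old = store.load(consumer_pin, "greet")  # still the one-argument version
new = store.load(v2, "greet")            # opt in to the new signature at will
```

Both versions live side by side in the store, so the consumer can switch from `consumer_pin` to `v2` whenever they choose to.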
The reason you still want SemVer here though is because SemVer is a communication tool. And so SemVer lets you say, “Okay, there are these new features in the library. Here’s a bug fix. You’re going to want this one.” And even though that means you need to actually go update which compiled version of this function you’re pointing to, you’re getting data from that, and when you go to publish your library, you want to be able to use that information. Even knowing that it’s not going to break your users in the same way, it does let you then say “Oh, I didn’t actually mean to make a breaking change here. I wanted this to be compatible and to just keep working forward.”
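As a small illustration of SemVer as a communication tool, the following Python sketch reads the maintainer's intent from two version strings. The function names `parse` and `describe_bump` are made up for this example, and it assumes plain `MAJOR.MINOR.PATCH` versions, ignoring pre-release tags, build metadata, and the looser rules SemVer applies to `0.x` releases.

```python
def parse(version):
    """Split 'MAJOR.MINOR.PATCH' into a tuple of ints (no pre-release support)."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def describe_bump(old, new):
    """Return what the maintainer is communicating by moving from old to new."""
    old_parts, new_parts = parse(old), parse(new)
    if new_parts <= old_parts:
        raise ValueError(f"{new} is not newer than {old}")
    if new_parts[0] != old_parts[0]:
        return "breaking"  # major bump: incompatible API change
    if new_parts[1] != old_parts[1]:
        return "feature"   # minor bump: backwards-compatible additions
    return "fix"           # patch bump: backwards-compatible bug fix


for candidate in ["1.4.3", "1.5.0", "2.0.0"]:
    print(candidate, describe_bump("1.4.2", candidate))
```

With a single ever-increasing version number, the same function would have nothing to inspect, and the reader would have to go to the release notes for every update.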
So things like that, I think, are pointers in the right direction. There’s also a couple of papers out there from folks at the Nova University of Lisbon, who are asking “What happens if you bake versions as types into Java?” Java because it’s the kind of default language to do this kind of research on. Their proposal is very interesting from a type theoretic and versioning perspective, and would never get adopted in industry in a million years, because it’s just way too much boilerplate… But it does the same thing we’re talking about; it bakes this notion of backwards compatibility in, in a way that I think if you were going to actually ship something like that in an industrial programming language, you would actually want SemVer as basically how you do it. And the type system that they slap on top of Java effectively encodes SemVer with keywords. It’s upgrades, and replaces, and things like that.
So I think there’s work to be done here, but I don’t think it’s going to be in the near term, for one. So we’re going to need the tooling. And even if and when we see something like that type system on top of Java, or what Unison is doing, becoming more widespread, I think those kinds of things lower the risks in really interesting and important ways… But they would still really benefit from the kinds of tooling that we’re talking about. They also though highlight, I think, one of the things that’s easy to miss in these kinds of discussions, which is a lot of times people like me, who are type theory nerds, etc. like to go looking for that kind of a solution to a problem. It has two limitations. One is that’s never going to work for Ruby. I say “never”, but you could imagine a world in which type adoption for Ruby is at 100%, but that world seems very unlikely to me… Not least because a lot of people who love Ruby love it because it’s dynamically typed.
And second, doing all of those things purely at that, like “Let’s bake it into the type system level” has costs, because it turns out that itself then becomes a thing that you need to think about in terms of the versioning of your language. Because one of the things that shows up is that the more foundational, whatever your tool is - like, if you’re an app, and you just have consumers, it’s not that big of a deal. Your versioning is basically purely marketing. If you’re a library, you have a bunch of apps that use you and maybe some other libraries. If you’re a framework that everybody else builds on, how well you do this now affects everybody else in the entire ecosystem. If you’re a programming language, you’re kind of doing it at the maximum level, and you still have to communicate those versioning constraints to other people. And the more complicated your type system is, the harder it is to actually understand what the implications are for versioning.
[00:38:09.28] So the tendency that people like me have, to say “Ah, bake it into the types, and it’ll be rigorous and checked forever” can actually undermine your net goals here, because now you’ve made it harder to think about this fundamental communication problem.