The Changelog – Episode #266
The Future of RethinkDB
with Mike Glukhovsky
Mike Glukhovsky joined the show to talk about the future of RethinkDB. Mike was a co-founder of RethinkDB alongside Slava Akhmechet. RethinkDB officially shut down a year ago, on October 5, 2016, and today we’re talking through all the details with Mike: the shutdown, getting purchased by the CNCF, relicensing, buying back their IP and source code, community and governance, and some specific features that Mike and the rest of the community are excited about.
CircleCI – CircleCI is a continuous integration and delivery platform that helps software teams rapidly release code with confidence by automating the build, test, and deploy process. Check out the recently launched CircleCI 2.0!
Bugsnag – Mission control for software quality! Monitor website or mobile app errors that impact your customers. Our listeners can try all the features free for 60 days ($118 value).
Linode – Our cloud server of choice. Get one of the fastest, most efficient SSD cloud servers for only $5/mo. Use the code changelog2017 to get 4 months free!
Fastly – Our bandwidth partner. Fastly powers fast, secure, and scalable digital experiences. Move beyond your content delivery network to their powerful edge cloud platform. Learn more at fastly.com.
Notes & Links
- The Changelog #114 (with Slava Akhmechet)
- The Changelog #181 (with Slava Akhmechet)
- Proposal: Modifier functions · Issue #5813 · rethinkdb/rethinkdb
- The ReQL query language
- RethinkDB joins The Linux Foundation
- RethinkDB Documentation
- RethinkDB is shutting down
- Contribute to RethinkDB
- From Slava Akhmechet on why RethinkDB failed
- Announcing RethinkDB 2.3.6: the first release under community governance
- CNCF (Cloud Native Computing Foundation) Members
- The liberation of RethinkDB
- Apache License, Version 2.0
Mike, we’ve had quite a journey with RethinkDB; we’ve covered on episode 181 and 114 with Slava, kind of the background of Rethink as a technology, the company, and we’re now at a point where we’re looking back and Rethink the company had failed, went defunct, and ultimately has moved to this newer community-driven project. For those who are catching up, help people understand who you are in this role here. You’ve co-founded RethinkDB with Slava… What’s your portion of the story?
Thanks for having me on The Changelog, I really appreciate it. I have spent a lot of time working on RethinkDB. Looking back, Slava and I started RethinkDB back in 2009, so almost eight years of my life. It’s been pretty incredible. Slava and I started RethinkDB as a technology startup. We were backed by Y Combinator back in the summer of 2009, and we decided to build something designed for modern hardware, for solid state drives, for multi-core CPUs; we wanted to build really low-level systems technology. When we originally started the company, we were focused on building a storage engine for databases.
As time went on, we really started to build not just a storage engine, but a full database. Eventually, we decided to open source it. When we first released it to the world, it took us almost 4-5 years to be able to build RethinkDB before we were ready to share it with the world and get it tested, get it out in the community. We really decided to flip that switch and become an open source company.
My background is that – Slava and I both studied computer science; his background is in distributed systems and storage technology. Mine is in user experience and design, so we come from very different backgrounds, but between the two of us, you’ve got RethinkDB, which is a very capable, very powerful database. [04:01] It scales linearly, it’s open source, distributed, it allows you to run hundreds of thousands of real-time streams in parallel and be able to solve some really interesting and really complex use cases.
I was really focused on making the database useable, making it friendly, making it something that people could engage with. When we first looked at building RethinkDB, often times people look at systems software as being cold; you interface with it through a terminal, you read really complex and arcane manuals, you spend a lot of time deciphering really complicated query languages and APIs trying to understand how the thing works, and they tend not to help you along the way. But when we designed RethinkDB, we really wanted it to feel very much like a consumer product, to feel accessible to developers who were just starting out, building their first application.
So I spent a lot of time working on things like the dashboard that’s in RethinkDB, I spent many late nights designing and building the user experience of the database, thinking about how people use it… I built the community operations for RethinkDB; we had a really amazing worldwide community that was deeply engaged in the open source experience. Open source manifests as being a software development model, as a license type, as a community, and I spent a lot of time thinking about how to make RethinkDB really accessible in terms of our community as well. A lot of the work that we did in making RethinkDB really friendly and accessible and capable has borne through as the company shut down. Because when the company actually shut down, the community just came up in arms and said “We’re gonna keep this thing going. Whatever it takes, we’re gonna champion the needs of RethinkDB and we’re gonna continue building it.”
So my experience has been basically anything that required doing, I worked on at Rethink. These days, I have taken a lot of the experiences I’ve had in the open source community and I work on developer relations at Stripe, which builds developer tools for commerce and for finance.
Right. You’re at Stripe now - is it because of the transition of engineers and whatnot that was… I guess at that time transitioned to Stripe; Stripe stepped up and said “Hey, we’ll hire…” - is that why you’re there, or are you there for other reasons? Is that what got you there?
Yeah, so our team is pretty incredible. We have a really incredible team of engineers who built RethinkDB, and when the company shut down, we had lots of options of where we could try to take our team to be able to feel like we could continue working together, continue building things together, and Stripe was by far and away one of the most exciting options available to us. I personally am really excited and dedicated to what Stripe is working on, because having worked on developer tools for a long time, there’s immense power in allowing creators and builders to be able to use technology to be able to solve problems that previously required teams of people to solve.
Stripe is really building an abstraction for the world of commerce, for the world of business, to be able to allow people to redefine how they build the companies, to be able to redefine how they build payments infrastructure around APIs and around developer experience. And the whole RethinkDB team has spent a lot of time thinking about how to abstract away the world. Software represents abstractions about the real world, and we’ve spent a lot of time thinking about how to help developers represent the real world and to be able to use those abstractions to build really powerful things… So Stripe was a really natural fit. When you look at the developer tools, companies out there, there are a few of them, but Stripe is far and away one of the most exciting companies out there for developer tools. So it’s been a really comfortable home for the RethinkDB team.
[08:03] Gotcha. So to paint some of the picture too for timelines, you said that Rethink began in 2009 - that’s roughly around when we started The Changelog too, by the way, so that’s kind of crazy in terms of overlapping there… Operated as a company for several years, took VC out of Y Combinator; I’m kind of glossing over some of the details, but then in 2016 came the “RethinkDB is shutting down” post, kind of describing some of the things you’ve just mentioned here, and Slava even penned his own “Why RethinkDB Failed” and kind of put out some thoughts on developer tools and how he had some speculation on the market and whatnot, and shared some hindsight notes about how you chose a terrible market and optimized for the wrong metrics… There were some downsides so to speak, but that’s roughly the story there. And now we’re transitioning from a failed company - to put that loosely; I don’t wanna say you all messed up… Just easily said, failed company, transitioning into a community-driven open source project. It’s always been open source but now it’s community-driven. So help us paint the picture from where you failed as a company to CNCF stepping in, even a lot of the details there around getting the code back and being able to use that… Help us fill in the gaps on those details.
Yeah, absolutely. So it’s really interesting to look at open source development – a comparison that I always really appreciate is the cathedral and the bazaar, which people are probably familiar with, but I’ll repeat it for those who aren’t. Open source is the bazaar - it’s messy, it’s loud, it’s complicated, there’s lots of things happening, it’s hard to keep track of what’s going on. In contrast, the cathedral is closed source; it’s very polished software development that comes out with very precise, very clear products that are shipped, and you don’t have a lot of interaction between people who are using it. It’s very prescribed.
What’s interesting about when you build open source as a company is that you can kind of strike this happy medium between the two, where essentially you have the resources of what you would call the cathedral model, and yet you’re able to still work with the bazaar and to open and invite people to be able to contribute, collaborate and work on the project with you. So RethinkDB has like hundreds of thousands of users around the world who use the project, and throughout the company’s history, people regularly were contributing major features to it. We would have pull requests that would come in, we had an open development model on GitHub; we did everything out in the open. There were no private trackers, no private communications that were significant. Everything we did was in the public, in the open, on GitHub.
So the community would come in and work with us and be able to build features together, close a product feedback loop, be able to help us ship things more quickly because we understood what their needs were. And when the company shut down, the intellectual property was in a state where it was held by a third-party, because when companies shut down, typically investors will hold on to certain assets of the company. So we found ourselves in this position where the open source license was AGPL, so people could continue using the project and feel comfortable using it. But we were unable to change the license or make any significant changes, because the intellectual property was in the hands of a third-party.
The first thing that happened was that people said “We’re not gonna let this die. It really doesn’t matter what you guys do, because it’s open source, so we’re gonna keep going with this”, but if we’re able to retain control of the assets, we can make some changes we wanna make. For example, when the Linux Foundation and the CNCF ultimately ended up acquiring the assets of RethinkDB, they changed the license to the Apache Software License, which some people consider to be more permissive, because it places less restrictions on how you use the software. That was one change that they really desired, and I understand why.
[12:02] They also acquired a lot of the closed source assets of the company. We had a version of the database called RethinkDB Enterprise, which had only a small handful of features, things that were useful to very large organizations, like audit logging, audit trails, things like that… And we’re working on sharing those more broadly with the community and actually getting into the open source version.
So essentially, people stepped up and said “We’re not gonna let this go away”, so I agreed with them. I didn’t want RethinkDB to stay in limbo, so I worked very closely with the open source leadership team, which was comprised of former RethinkDB team members - including myself - and a number of folks from the open source community who really had made this a part of their lives. They had worked with us for years. We had very close connections and relationships in that community, and they were our friends; we all share the same passion, to be able to keep the project going.
After a few months, we decided to be as public as we could about the process, and we held regular meetings. We had a Slack channel where over 1,500 people all joined to try to figure out how to move forward with the project. After a few months we were approached by Dan Kohn and Bryan Cantrill, who suggested that they would be interested in acquiring the IP and assets of the software project. So we worked with them to close the deal, to be able to move over the control of the assets and be able to relicense the project.
Yesterday actually, we just shipped the first release for the open source project under the hands of the community, which is 2.3.6, which sums up a series of stability fixes, bug fixes and improvements, and it’s also the first official release that has been fully under the Apache Software License. And we’re steadily working on releasing more of the things that we’ve built, including some of the closed source features that we have built in the past eight years, and other things… We had a full-time resident artist-illustrator who is creating art around the open source project. This is something that I really cared about, because to me software is a vehicle for ideas, but art is also a vehicle for ideas; it helps you express the way you reason about the world, the way you think about the world. So in building a community project, we had a lot of art that we created around it, which really helps people understand the ideas that we’re communicating.
So all that art, and we’re working on also open sourcing a lot of the projects that we had internally for build testing, performance testing, load testing. They’re all getting steadily open sourced. So the challenge and the opportunity for the project is to figure out how to move to this community-based model.
You guys have great artwork absolutely, so hats off in that direction. Take us into the nitty-gritty a little bit and help us understand from the outside the legal ramifications of what you said back there with the AGPL and acquiring the assets. If I understand you correctly, because it’s AGPL, the community could have forked it, because it was open source and open license enough that they could fork it, rename it and continue on. But there was things that were drawbacks to that, such as (as you said) the artwork assets, other things, the name of course - other aspects of the intellectual property that because the open source project was operated by a company that had creditors (people who had invested into the company), they had ownership of the intellectual property? Am I following this correctly?
[15:45] Yeah. So RethinkDB started as a closed source company back in the day, so the terms under which we were working with investors were very different than our needs four or five years into the company when we really open sourced everything. So the way license and copyright work - they’re very different. The license is the terms under which you can use the project. When it’s AGPL, that means that someone could very well fork it, rename it, do whatever they like. In fact, lots of people have forked RethinkDB, and that’s encouraged; that’s the goal of copyleft software. Every time a fork happens, the terms of the AGPL will continue to those forks.
While that’s something that was perfectly workable for users, a lot of developers wanted to work on it but they weren’t necessarily comfortable with the AGPL. I have a neutral opinion on the AGPL - I see both its pros and cons - but the role of the community was very strong; a lot of folks were committed to working on it, especially if it was under the Apache software license. Because a third-party owned the actual copyright to the code, we were unable to change the license.
And it’s not even about whether it’s a change from AGPL to Apache Software License or anything like that, it’s about thinking what happens in ten years or 15 years when a new license type emerges and we decide that we may want to shift the license for whatever reason, or give ourselves any sort of forward flexibility. So being able to actually retain the name, the common law trademark usage of RethinkDB and not having to rename it, being able to keep our website, being able to keep our documentation, being able to keep domain names, all the collaboration and communication tools, the GitHub organization, the repository of issues that we had built up for thousands of issues that were recorded - all of those details… It’s best for the forward velocity of the community project if the Linux Foundation and CNCF have that. So when the LF acquired the IP and assets of the software project, it really made a lot of problems go away.
Otherwise you would have had to start at ground zero again, right? You’d have to go back to – I mean, you would have the code to fork, but you’d have to…
A lot of stuff you’d have to leave behind, and a lot of…
You’d have to rethink everything.
Tell us about the Cloud Native Computing Foundation (CNCF), which is associated with the Linux Foundation (or is the Linux Foundation). Help us understand who that is and how they can swoop in and buy the assets… Tell us about that.
The CNCF is very interesting. It’s part of the Linux Foundation; the Linux Foundation has a number of foundations under its aegis and umbrella, and what they really focused on are technologies that are designed for the cloud-based environments that we’re using today, so that’s why they’re called the Cloud Native Computing Foundation.
Examples of projects that they work on are things like Kubernetes, which you guys may be familiar with, which to me is one of the most exciting (if not THE most exciting) infrastructure change or project that’s popped up in the past ten years… And a lot of the projects that they work on are designed to help build the infrastructure necessary for the next ten years of what we’re gonna need for cloud computing. Bryan Cantrill has been a long-time member of our community - we’ve collaborated with him on several projects and he’s been a joy to work with… So he approached us and said that the CNCF would be willing to work on acquiring these assets. And between them and the LF, we’ve been able to make this change very quickly.
They had a lot of enthusiasm, energy for completing the transaction and for helping establish the future of the project. Right now we are part of the Linux Foundation, but we are hoping to establish governance soon (community governance), to be able to decide whether we want to join the CNCF or remain as part of the Linux Foundation. The CNCF has a certain structure to it, certain ways they want projects to be run. So that’s a decision that needs to be made as a community, together.
[20:07] So the CNCF/Bryan Cantrill found motivation to approach you to say “Hey, let’s help you acquire your assets”, which requires dropping a check, basically, and taking care of creditors to get that IP back and to be able to do all the things you just said there. Where do you think that motivation came from? Why is RethinkDB so important to the community, but also to those individuals and organizations involved to drop some money down to acquire the IP back and have full control of it?
We also were at the forefront of a lot of the changes that happened in infrastructure software. Being able to add real-time streams to databases - thankfully, that idea has really started to pop up in other parts of the community. People are starting to move to the idea of real-time subscriptions, and thinking about how databases can be shaped differently. There’s the query language which is really incredible, because the way the software is built, it essentially is a giant distributed computing environment that allows you to run functions that operate in parallel, in a distributed cluster.
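To make the idea of real-time subscriptions concrete, here is a minimal sketch in Python. It is not RethinkDB's implementation and does not use the RethinkDB driver; `ToyTable`, `subscribe`, `upsert` and `delete` are hypothetical names invented for illustration. The only detail borrowed from RethinkDB is the shape of a change notification, a pair of `old_val` and `new_val` documents, which is how its changefeeds describe a write.

```python
from typing import Any, Callable, Dict, List, Optional

# A change notification: the document before and after a write.
Change = Dict[str, Optional[Dict[str, Any]]]


class ToyTable:
    """Hypothetical in-memory table that pushes every write to subscribers."""

    def __init__(self) -> None:
        self._rows: Dict[Any, Dict[str, Any]] = {}
        self._subscribers: List[Callable[[Change], None]] = []

    def subscribe(self, callback: Callable[[Change], None]) -> None:
        """Register a callback that receives each change as it happens."""
        self._subscribers.append(callback)

    def _publish(self, old: Optional[dict], new: Optional[dict]) -> None:
        change: Change = {"old_val": old, "new_val": new}
        for callback in self._subscribers:
            callback(change)

    def upsert(self, row: Dict[str, Any]) -> None:
        """Insert or replace a row keyed by its "id" field."""
        old = self._rows.get(row["id"])
        self._rows[row["id"]] = dict(row)
        self._publish(old, dict(row))

    def delete(self, key: Any) -> None:
        """Remove a row; subscribers see new_val set to None."""
        old = self._rows.pop(key, None)
        if old is not None:
            self._publish(old, None)


table = ToyTable()
seen: List[Change] = []
table.subscribe(seen.append)

table.upsert({"id": 1, "score": 10})  # insert: old_val is None
table.upsert({"id": 1, "score": 12})  # update: both values present
table.delete(1)                       # delete: new_val is None
```

The point of the sketch is the direction of data flow: the application does not poll the table, the table pushes each change to whoever subscribed. A real database has to do the same thing across a cluster and over the network, which this toy ignores entirely.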
All of that is really, really neat technology, so to look at that and to consider that for a marginal cost you can buy back the ability to direct the future of the project - I think that the CNCF and the Linux Foundation really felt like it was the most reasonable of investments.
In terms of why it’s exciting, I think it’s because for a long time RethinkDB has been thinking about where databases should go, what the future of databases should look like, and being able to continue exploring those ideas as a community seems just deeply powerful and deeply useful to continue as an entity. It’s really exciting, because even in the next version of RethinkDB that we have planned, it already unlocks so many more ways that RethinkDB can be used to solve all sorts of different types of problems. That will be RethinkDB 2.4, which is being worked on right now.
Coming up, at this point in the story for Rethink - the company has shut down, creditors are holding the IP and the source code, and meanwhile, the community just wants things to carry on, they want the software to carry on. We cover what the actual value of the IP and source code is worth, what they end up spending to buy it back from their creditors, and we also talk through the new governance model and how they can move forward as a community-led project. Stay tuned.
So Rethink shut down October(ish) 2016, creditors held the source code, and clearly, based on the story you’ve shared with us, Mike, the community and everyone else wanted things to move forward, and for the reasons you’ve mentioned before of like not wanting to restart over, or fork the code, or think of a new name… You wanted to reuse this past seven(ish) years of work to move forward, but that required someone to come in, or basically to buy the assets back from the creditors. That’s VC, that’s whomever held the IP when the company went defunct. Can you share roughly what that figure might have been? I’m sure the community is thinking like “Geez, this must be worth a lot of money!” What are we talking about here?
Yeah, so Bryan approached us and was super excited to try to help figure out how to take next steps for the project. I think he really personally cared about RethinkDB’s mission and figuring out what it would take to unstick it. Thankfully, the number was not that high, I think largely because the folks involved who held the IP really recognized that the people who are most interested in the future of RethinkDB that really were dedicated to it were the community. They were the ones that were really asking for this shift to happen. So we were able to negotiate a very reasonable number, which was $20,000.
I think that is a testament to the fact that the community was very vocal about its needs, and that it was clear it wasn’t a corporation that was acquiring the assets… That it was something for the community’s benefit. They made it really easy to amicably make the transition, and thankfully, the Linux Foundation and CNCF stepped forward to be able to facilitate that. They’ve given us a lot of the resources we need to be able to continue, like the infrastructure that you generally need to be able to do things like accept donations for future software development of the project, or be able to help us with hosting of domains and things like that, and be able to really give us the legs we need to stand on as a community-based software project going forward.
[28:18] Well, I guess mad respect to the negotiators, as well as to the creditors, for negotiating that price… Because that’s an excellent price for what the world got out of it, which is a free and clear RethinkDB that can be handed over to the community.
Tell us what the community looks like, because everybody’s got one, but they all are different. I guess with RethinkDB it’s probably even a little bit tough to make that actual transition, because it was a product from a product team inside of a company for so long, even though you all were developing it open source-style; like you said, it was a middle ground between the cathedral and the bazaar… So now you’re going full bazaar, and you’ve gotta figure out what your bazaar is gonna be shaped like. What does it mean when you say “community governance” for RethinkDB?
That’s an excellent question. We were lucky to have a lot of people care about the project, as I mentioned. It’s funny, because people used to say that after checking Twitter and Reddit, they would go to GitHub and they would just catch up on all the emails that were being sent between our developers, to sort of just sit and observe how the project was unfolding. On GitHub we have north of 19,000 stars, a few thousand people I think who watch the project, and when you watch a project, it essentially means that every email that gets sent out gets shared with every single person who’s watching it.
It’s funny, because we kind of had this silent audience that was observing everything we were doing, and watching everything unfold, just eating popcorn and watching as it progressed. [laughter] So now those voices started to make themselves heard, and in the open RethinkDB channel on Slack, as I mentioned, we had like 1,000+ people all advocating for the future of the project, and while it was in limbo, we were trying to work things out. People were debating should we fork it, should we try to organize development somehow? Should we try to build a software development fund?
I think that the nice thing about being under the aegis of the Linux Foundation is that it allows us to have a very safe home from a legal perspective, from a structural perspective, but what we really are trying to figure out now is how to decide what decisions we’re gonna make for the future of RethinkDB in terms of its technology. So the most important thing that we’re going to be building is a technology steering committee, to be able to decide what features are going to go into the next version, what is something that should go into the API and shouldn’t go into the API. A lot of these decisions only happen through lots of conversation, lots of structure, lots of discussion.
At RethinkDB we spend a lot of time – you could even just go, if you look at the issues for RethinkDB, take a look for example at the dates and times issues where we added date support to RethinkDB.
Dates are notoriously complicated, because everyone disagrees on what calendar you’re using… Especially if you go back through history, calendars change based upon political rule, or based upon particular standards that would emerge. If you look at timezones, there’s even timezones that are half an hour off, 45 minutes off… So these things get very complicated, and how you represent it in the database is very important, because you’re gonna do a lot of queries based upon dates and times. And if you just go to that thread and you just observe the hundreds of comments that unfolded, just trying to figure out every single angle, every single nook and cranny of how they should be added to the database - that kind of agitating force is what’s really powerful about open source.
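As a rough illustration of why the half-hour and 45-minute offsets matter for storage and queries, here is a small Python sketch. It is one possible representation, assumed here for illustration and not taken from the RethinkDB thread: each time is reduced to seconds since the epoch in UTC plus the original offset in minutes. The helper names `to_stored` and `from_stored` are hypothetical, and India (UTC+05:30) and Nepal (UTC+05:45) are examples chosen for the sketch, not places mentioned in the episode.

```python
from datetime import datetime, timedelta, timezone

# Example fixed offsets (an assumption for this sketch):
# India is UTC+05:30 and Nepal is UTC+05:45.
INDIA = timezone(timedelta(hours=5, minutes=30))
NEPAL = timezone(timedelta(hours=5, minutes=45))


def to_stored(dt: datetime) -> tuple:
    """Reduce an aware datetime to (epoch seconds in UTC, offset in minutes)."""
    offset_minutes = int(dt.utcoffset().total_seconds() // 60)
    return dt.timestamp(), offset_minutes


def from_stored(epoch_seconds: float, offset_minutes: int) -> datetime:
    """Rebuild the original wall-clock time from the stored pair."""
    tz = timezone(timedelta(minutes=offset_minutes))
    return datetime.fromtimestamp(epoch_seconds, tz)


noon_delhi = datetime(2017, 7, 17, 12, 0, tzinfo=INDIA)
noon_kathmandu = datetime(2017, 7, 17, 12, 0, tzinfo=NEPAL)

delhi_epoch, delhi_offset = to_stored(noon_delhi)
kathmandu_epoch, kathmandu_offset = to_stored(noon_kathmandu)

# "Noon" in the two places is not the same instant: comparing the
# epoch values orders them correctly even though the wall clocks match.
gap_seconds = delhi_epoch - kathmandu_epoch

# The offset is kept so the original local time can be shown again.
round_trip = from_stored(delhi_epoch, delhi_offset)
```

Sorting and range queries work on the epoch value alone, while the stored offset only affects how a time is displayed. An offset measured in whole hours would not be able to represent either of these two zones, which is the kind of corner the discussion had to cover.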
[31:53] A lot of voices come to the conversation, they are able to agree on what the best thing to do is only through letting the ideas do violent conflict sometimes before you emerge with the best result. So building a group, a committee that will be able to think about these things and do so on a regular fashion is really important to the open source future.
I’m here on the issue tracker, and I’m trying to find that issue but there is 1,400(ish) issues. That’s quite a bit. Is that potentially because of this cycle we’ve been in since the last October to get to where we’re at now? Is that why there’s so many issues? Or what’s the state of issues, I’m just curious…
Issues are a funny thing on GitHub, because people think about issues differently for each project.
Right. Like a to-do list, or something like that. It shouldn’t be clear, basically.
Yeah, and so basically, every time that we have had an idea on what to do with the database, we would just create an issue for it. Then it would allow the conversation to unfold for some crazy speculative idea that wouldn’t see the light for even sometimes two or three years. So if you look at a lot of the open issues, not all of them are bugs. A lot of them are like “I can’t build in this particular environment” or “I’m having difficulties with this particular workload.” Aside from that, many of them are speculative, like “We should build some new feature” and if you look at the number closed, which is 4,500, it dwarfs it significantly.
But if you just search, for example, for dates and times – if you search for dates it should pop up… You should be able to see just these proposals, that are titled for example “ReQL Proposal.” In a proposal, we basically start exploring a new idea to be added to the query language, and it’s tagged with a particular tag, and then we just let the discussion roll for a long time until all the ideas are exercised. That’s why there are so many issues on RethinkDB. Sometimes people look at it and they wonder “Is there a problem with the database?” and the reality is that that’s just how we approach…
Or stagnation… I mean, when you get those numbers, you kind of think like “Is it being managed? Is somebody triaging these things?” It’s almost as if you have to have the precursor of like “This is how we do issues on GitHub” kind of thing, like you just said there, basically. Because otherwise you come to it and you’re thinking like “Is somebody triaging these issues? Is it really being tracked?” that kind of thing.
Yeah, and that’s a lot of the work that folks on the leadership team are doing today - working through those issues, replying and triaging - and obviously, we would appreciate a lot of help from more folks in the community in helping keep track of all these issues.
Easy call out there. So you’d mentioned some things around governance TC; it sounds a lot like you’re going through with RethinkDB what Node went through roughly a year and a half ago, when there was a major fork… In the case of RethinkDB had the acquisition of the IP not happened, there could have been an io.js/Node.js kind of scenario. I’m assuming here, but who are you leaning on or what communities have you leaned on to get the insight you’re getting to go the way forward that you’re going?
We’re getting a lot of advice from folks at the CNCF and the Linux Foundation; we also have some folks who have been involved in a number of other projects, like Chris Abrams and Ross Kukulinski, who have seen things happen in the Node world, and have a lot of deep connections in the open source community.
[35:58] So we’re not looking to innovate as much as take the minimum viable organization that we need to be able to ship new releases on a regular basis and really feel comfortable for the next bit. Unlike Node, it’s not terribly contentious, it’s more collaborative. So many people used Node.js and so many people cared about the future of it that that split between io and Node happened because people were dissatisfied with a lot of how things were unfolding. And in the case of RethinkDB, it’s a lot more collaborative. Folks are just trying to understand the best way that we can all build this together going forward, rather than arguing over its future.
Just going back, I think Adam said the word ‘stagnation’, and when I think about the cathedral and the bazaar metaphor, the one thing about a cathedral is you don’t have to ask for anybody’s opinion, you just push forward the way that you think is best inside of the product team. And with the bazaar, like you said, things have to come out of violent arguments, or sometimes violent agreement; whatever it is, ideas have to prove themselves first, so there’s a lot more conversation happening, and sometimes that can result in a lot less agreements. If all you have is disagreements, then you’re not exactly going anywhere.
I’m curious what your thoughts are on that, and the reason why I ask is because - and I’m not trying to hold you to the fire or anything, because we’re software people; we know about ship dates and stuff… But in the Linux Foundation post back in February it said that RethinkDB 2.4 was coming a few days later, and as you said earlier, 2.3.6 just shipped 17th July. So there was something that happened there, and I’m wondering if the transition may have caused some stagnation, or if there was actually potentially a problem of disagreements and generally slower-moving product because of the community governance.
I agree with you, it’s definitely slower-moving than I personally would like. One of the things that RethinkDB really pioneered in its development model is just rapidly-shipping releases. So we had a goal of shipping a new version of the database every 6-8 weeks, and each time it had major features, like secondary indexes, or adding a distributed cluster using Raft… Each of these releases was the product of extremely careful scheduling, and trying to figure out how to keep the pulse of the database moving quickly, because it is the cathedral.
I personally think that a slower pace is a healthy thing for a database. RethinkDB has only really been production-ready for 3 or 4 years now, and you look at something like Postgres, or even MongoDB, and it takes a long time to let databases really bake. Once they’re built, they don’t really need to have – you could argue that dramatic shifts for a database is not necessarily a good thing, because people are relying on it. It’s like the most core part of their infrastructure. Just very different from something like Node, where you’re building it for a community where the web is moving so quickly; people have framework fatigue, new features are being released, async/await - people desire it yesterday, so there’s a lot of pressure and a lot of loud voices.
In the case of databases, I would argue that moving to a community process - it’s probably healthy to slow down the pace of software releases. Stagnation is very different; that means that you just stop work, right? One of the biggest problems that we faced when we wanted to ship a new release for RethinkDB was that we no longer had the test and build infrastructure that we had built. We had everything when we were a company - racks of servers, some with like 32 cores on each machine, just working on producing builds for the database, working on testing every night, automated testing infrastructure… All of these things are really useful to have, and when you are going to rebuild as a community project, you need to rebuild all of them, because the servers just don’t physically exist anymore.
[39:56] So a lot of the effort that went into shipping 2.3.6 was letting all of the work that we needed to just catch up and then decide “Okay, now we can ship our first community release for RethinkDB.” So it’s really monumental in the amount of effort that’s gone on behind the scenes. And just to call out some people who really put a lot of work into this, I would definitely name Marshall Cottrell who’s been a long-time member of the project, and Etienne Laurin, who was a core member of the RethinkDB team and has just tirelessly continued to work on it, along with Sam Hughes and Ryan Paul. These folks have just put a lot of effort into steadily, quietly working on tidying up all of the issues that go into shipping a new release to multiple platforms for distributions. 2.3.6 also included packages for new releases of Ubuntu for Yakkety and Zesty, and unifying and just making sure that all of the bug fixes and stability improvements went into it.
We also shipped multiple release candidates to give the community time to be able to try to air it out, because we are leaning more heavily on them for the build and test aspects of this. And as time goes on we plan to speed this up, because we’re gonna have a lot of the community processes down. But I think that moving wisely and steadily is better than moving quickly when it comes to system software.
Yeah, that’s a great answer on the other side of my question about stagnation; there was another question about instability, and perhaps not technical instability, but as the case has been over the last year with RethinkDB, at least since October 2016 when the shutdown was announced, it was uncertainty of like “What’s going to happen with this thing that we love or rely upon?” and the last thing you want in terms of instability or uncertainty is with your data store, right? So I think the silence, as the transaction was happening, the negotiation that was happening between October and February had probably a lot of people wondering what was going to happen. But since the joining of the Linux Foundation and this transition and a move into the community governance, a bit of a return to normalcy - and maybe that normalcy is a little bit slower than it was previously, but at a healthy pace - is probably providing a lot of certainty and answers for people who are either running RethinkDB in production or are considering picking it up for their next project.
Yeah. I had a lot of folks in the community ask me “Should I be worried?” when the company shut down. And quite honestly, my answer has always been very clear, that there is not a commercial entity that is behind the project, and if that affects your choice as to whether to use the database, you should factor that into your consideration. But you should also weigh it against the fact that it’s been worked on for eight years by an extremely competent and really deeply technical team, and it’s been vetted and tested in some of the most rigorous and difficult benchmarks in the industry.
The distributed systems test that was performed by Aphyr to test the new Raft implementation for our distributed clusters passed with flying colors after our team worked on it for a year. This is not something that goes away when the company goes away.
The other side of it is that there are a lot of teams that are still building for it. As an example, there are some teams at IBM that are doing a lot of work on RethinkDB, despite the company background (that’s irrelevant to their projects). For example, porting it to PPC or Prezi. Those teams are just working on it on the assumption that it’s a solid database, it continues to power a lot of infrastructure on the web, and there are really no extraordinary problems people have been encountering.
Up next we get into some specific features of RethinkDB that Mike and the rest of the community are pretty excited about. We talk through the power of RethinkDB’s query language ReQL and the idea of modifier functions, which let you embed ReQL natively into your programming language for the database to execute on every write operation, and also how they differ from stored procedures. And finally, we talk about the future of RethinkDB, the role of CNCF in that future and how you can get involved. Stay tuned!
Mike, you’ve taken us through a little bit of the history of RethinkDB, you’ve taken us through the transition from the company running RethinkDB the project, to the community-led governance for RethinkDB… We haven’t talked much technical about the database itself, except for its merits and why so many people have come to use and rely upon it. We wanna look into the future a little bit and see where it’s headed now that it has this complete freedom from IP constraints, and this community-led governance. What are some technical aspects of the software that are coming down the pipeline, things that maybe you’re interested or excited about?
One of the things that I’m most excited about is a new feature that we have had in the works (it’s mostly implemented at this point), that will improve the query language by adding what we call modifier functions. Basically, for those who aren’t familiar, one of the biggest strengths of RethinkDB is that it has this query language that in most databases when you write a query, you normally will write it in something like SQL - or in the case of MongoDB, some custom JSON - and you will compose using some of their syntax, and then send this query to the database and return a response.
In RethinkDB our query language - which we call ReQL - allows you to basically write functional programming calls that embed natively into your programming language. As an example, r.table.filter.map.reduce, and each of those (what we call) terms like “filter”, “map” and “reduce” are just functions, and each function can accept things inside of it. You can do embedded functions, and what this essentially feels like is a functional programming language that’s baked into the database, and is also embedded into your native programming language, which is really useful because you can do things like use your debugging tools and it doesn’t feel like you’re switching context. And at the end of these queries you just send off the whole command stream to the database, and then it will run in a distributed environment and return the computation back to you. This just looks very simple on the face of it, like you’re just writing inserts and gets and you’re doing very fluent commands to be able to manipulate the data. On any of those queries - or on many of those queries - you can then open a stream by just adding changes at the end.
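To make the "build the chain first, send it off at the end" idea concrete, here is a minimal Python sketch. It is a toy model, not the RethinkDB driver: the `Query` class, its `run` method and the in-memory `database` dictionary are all hypothetical names invented for this illustration. Each term only records a step, and nothing is evaluated until `run` is called.

```python
from functools import reduce as _reduce


class Query:
    """Toy stand-in for a chain of query terms; NOT the RethinkDB driver."""

    def __init__(self, table_name, steps=()):
        self.table_name = table_name
        self.steps = tuple(steps)  # recorded terms; nothing has executed yet

    def _then(self, name, fn):
        # Every term returns a new Query with one more recorded step.
        return Query(self.table_name, self.steps + ((name, fn),))

    def filter(self, predicate):
        return self._then("filter", predicate)

    def map(self, fn):
        return self._then("map", fn)

    def reduce(self, fn):
        return self._then("reduce", fn)

    def run(self, database):
        # Only here is the whole recorded chain handed over and evaluated.
        data = list(database[self.table_name])
        for name, fn in self.steps:
            if name == "filter":
                data = [row for row in data if fn(row)]
            elif name == "map":
                data = [fn(row) for row in data]
            elif name == "reduce":
                data = _reduce(fn, data)
        return data


# Hypothetical in-memory "database", used only for this illustration.
database = {
    "orders": [
        {"id": 1, "status": "paid", "total": 30},
        {"id": 2, "status": "open", "total": 99},
        {"id": 3, "status": "paid", "total": 12},
    ]
}

query = (
    Query("orders")
    .filter(lambda row: row["status"] == "paid")
    .map(lambda row: row["total"])
    .reduce(lambda a, b: a + b)
)
paid_total = query.run(database)  # sum of totals for paid orders
```

In the real database the recorded chain would be serialized and executed inside the cluster; this sketch evaluates it locally to show the deferred, composable shape of the calls.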
[47:56] So the query language is really innovative; it feels really comfortable. People say that it takes a little bit of time to really grok it, to really feel like they understand that these functions are running in this environment within the cluster, but once they do, they get addicted to it and they just wanna use it all the time, everywhere. I genuinely think it’s one of the best parts of the database; it’s one of the things that we’re proudest of… A bunch of programming language design geeks got together and basically built this really beautiful functional programming language that allows you to operate on data, in this distributed computation environment, and that’s really exciting.
But one of the things that we got as a consequence of building a really well-designed query language is that every time we added a feature to the database we were able to reuse it in lots of different ways. Every single thing that we added enhanced the rest of the database. Modifier functions are like this, because it’s basically a function that is applied to each write, so every time that you perform a write on the database, you can then use a modifier function to do something. What this allows you to do is essentially add things like schema validation… So every time you perform a write, you can check if it matches a schema. It also lets you rewrite documents or add additional fields whenever you write something to the database; you’re able to do document expirations - you can say “If this document is older than a certain date or time, then it should be deleted”, and to do things like security rules. For example, you can imagine adding a modifier function that would say “Only inserts are allowed to run. Otherwise, any type of write will be discarded.”
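A rough Python sketch of the idea follows. It is a toy in-memory model, not RethinkDB code and not the proposed API. The `Table` class, the `WriteRejected` exception and the modifier signature (old document and new document in, document to store out) are assumptions made for this illustration. It combines three of the uses mentioned: an insert-only rule, a schema check, and adding a field on write.

```python
class WriteRejected(Exception):
    """Raised by a modifier to refuse a write (hypothetical convention)."""


class Table:
    """Toy in-memory table; a stand-in, not RethinkDB's implementation."""

    def __init__(self, modifier=None):
        self.rows = {}
        self.modifier = modifier  # called on every write, if set

    def write(self, key, new_doc):
        old_doc = self.rows.get(key)  # None when the key does not exist yet
        if self.modifier is not None:
            # The modifier may rewrite the document or raise to reject it.
            new_doc = self.modifier(old_doc, new_doc)
        if new_doc is None:
            self.rows.pop(key, None)  # None is treated as a delete
        else:
            self.rows[key] = new_doc


def insert_only_with_schema(old_doc, new_doc):
    # Security rule: only inserts are allowed; updates and deletes are refused.
    if old_doc is not None:
        raise WriteRejected("only inserts are allowed")
    # Schema validation: require a string 'email' field.
    if new_doc is None or not isinstance(new_doc.get("email"), str):
        raise WriteRejected("document must have a string 'email' field")
    # Rewrite: add an extra field to every stored document.
    return {**new_doc, "version": 1}


users = Table(modifier=insert_only_with_schema)
users.write("u1", {"email": "a@example.com"})

try:
    users.write("u1", {"email": "b@example.com"})  # an update, so it is refused
    rejected = None
except WriteRejected as err:
    rejected = str(err)
```

Here a refused write raises an exception rather than being silently discarded; either behaviour fits the description, and which one a real implementation chooses is not something this sketch settles.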
So basically, this is the building block that will allow us to add things that people have been asking for for a very long time, like document expiration and schema validation. We expect that, like any good piece of technology, people will use it and reuse it in ways we don’t expect, and we’ll discover all sorts of things that they’re pushing the query language to do that it just wasn’t capable of doing before. And when that happens, we can try to build porcelain commands that will very simply allow these features to be exposed and they’ll be implemented in terms of these modifier functions.
So we’re giving this toolkit out to the community to see what they’re gonna play with and what they can come up with to solve the use cases that maybe we hadn’t realized RethinkDB was capable of helping with.
So where are you authoring these functions or these modifiers? Is it inside of the tools that you’ve built, the web interface? How do you actually go about adding those to your RethinkDB database?
So basically you do it through the query language itself. You can specify that the modifier function will live on a table. We may eventually expose that in the web interface as well. If you look at a corollary, it might be the statistics that we built into the database. So we added a system table, which is essentially a table that tracks lots of details around the cluster and around the database, and it’s exposed just as a table in RethinkDB. That’s really powerful, because you could then query that table using the query language; you can open a real-time stream on any aspect of that table.
You can say, for example, “Watch for whenever a certain threshold gets hit, and then add more replicas within the cluster to react to that change.” That can all be done with RethinkDB’s query language. And we also exposed it within the web interface, and it’s just implemented in terms of RethinkDB’s queries.
That’s really powerful, and you can sort of see what I mean by building a building block and then reusing it in lots of different ways. So by allowing modifier functions to be declared on tables, you can see how people can then use that to build on top of it things within the web UI, command line tools and other RethinkDB commands.
Right, so coming from kind of an old-school RDBMS angle… To me this sounds like stored procedures, so how is this different than what I am thinking these functions are?
Stored procedures are close, because stored procedures are run whenever a command will happen, and the procedure is stored in the database. This is just RethinkDB’s version of that, and it’s expressed in terms of modifier functions, because we are a functional programming environment. But it’s very similar. You can do a lot of the things that you want to with stored procedures with these modifier functions.
[52:06] So the age-old question is like “How much logic do you put into the database and how much do you keep inside of your application code?” I’m not asking you to answer that necessarily, because I think that’s fit for the case-by-case scenario, or even developer-by-developer perhaps. But it sounds like with the way that RethinkDB’s query language works, you could be writing these modifier functions but it almost feels like your application code. Is that what you’re trying to say? Because of the style of it and the fluency of the ReQL?
Yeah, absolutely. Take expiration as an example - if you want it to be able to expire documents today within RethinkDB, what you can do is very simply open a change feed and subscribe to all documents that are older than a certain date or time, and then delete it whenever they pop up, whenever a new change appears within the table. And that is done across the database application barriers. You have to have some sort of application process watching the database, subscribe to that stream, and then just periodically deleting it. That’s how a lot of people implement expiration today; it’s totally reasonable, it works very well, but there’s some efficiency to be gained by doing that in the database, because it has more knowledge about what’s happening, and you don’t have to build as much infrastructure outside of it. I’m not arguing that you should always do things within the database - every use case is different - it’s about how much baggage you’re willing to pick up and how much work you’re willing to do.
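The application-side approach described here can be sketched in a few lines of Python. This is a simplified stand-in under stated assumptions: the table is a plain dictionary, the change feed is replaced by calling a sweep function directly, and `expire_old_documents` and the `created_at` field are hypothetical names chosen for the example.

```python
from datetime import datetime, timedelta, timezone


def expire_old_documents(table, now, max_age):
    """Delete documents whose 'created_at' is older than max_age.

    Returns the ids that were removed. In a real deployment this logic
    would live in a separate process watching the database.
    """
    cutoff = now - max_age
    expired = [doc_id for doc_id, doc in table.items()
               if doc["created_at"] < cutoff]
    for doc_id in expired:
        del table[doc_id]
    return expired


# Fixed timestamp so the example is deterministic.
now = datetime(2017, 9, 1, 12, 0, tzinfo=timezone.utc)
sessions = {
    "a": {"created_at": now - timedelta(hours=30)},  # older than one day
    "b": {"created_at": now - timedelta(hours=2)},   # still fresh
}

removed = expire_old_documents(sessions, now, timedelta(hours=24))
```

The tradeoff discussed in the conversation shows up even in this toy: the sweep has to be scheduled and kept running by something outside the data store, which is the extra infrastructure that doing the same work inside the database would avoid.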
It’s about tradeoffs, right? Options.
Exactly. The pragmatic programmer likes tradeoffs.
[laughs] Yes, and we’re all pragmatic, aren’t we?
Cool. So when is the modifier function – is that something that people can expect soon? Because as a building block, you kind of want those out there asap, so that people can start doing things you never thought of.
Yeah, they’re available in a branch on GitHub. I think there’s still a little bit of work that’s left to do on it, but there is an issue that’s tracking them right now. If folks are interested in learning more about it, they’re welcome to join our Slack channel and just ask in #Open RethinkDB, and I’d be happy to direct them to it if they wanna test that feature.
And in terms of diving into the background details too, you mentioned too that a lot of things propagate through issues, and there’s an issue for this - we’ll link in the show notes - that’s got the specification for the introduction to this proposed feature, and then 23 comments of kind of going back and forth… And this is a closed issue, so it’s something that I guess people can kind of come to and say “Let’s learn the history of this feature”, basically.
Cool. Anything else, Mike, that you’re interested in that’s in development, coming soon?
There’s a number of ports that people are working on. I mentioned that people at IBM are working on porting to PPC and a couple other environments, and there’s also a very excited group of users who are working on porting to ARM64, because they wanna be able to run on ARM environments like Raspberry Pi and things like that, which is really awesome to see; people are really diving into figuring out how to bring RethinkDB to their environment… Which at this point, it’s a pretty mature and stable project, so that’s like the question is “How do I get to use it more often and what can’t it do?”
The other thing is really just open sourcing a lot of the features that were baked into RethinkDB enterprise and other things that were in development but not finished up. One example of this is audit logging. Being able to tell all of the queries that were run on this system is something that a lot of people have asked for and we’re excited to be able to just drop it into the next release.
So as we look to the future of the project, one thing that we always have to think about with the future of anything is “How is it going to GET to the future?”, so we talk about sustainability. You mentioned earlier under the umbrella of the Linux Foundation - they help you guys get set up to take donations and have some infrastructure around that… Tell us about that and how people (and who) is helping financially or with developer hours or however it helps support the future of RethinkDB.
[56:02] Right now the Linux Foundation has enabled us to accept donations, and folks have really stepped up; we’ve had a number of people contribute to the project to be able to help maintain servers, keep the downloads available for the public, and hopefully in the future establish a way to maybe even hire developers to work on certain features for the project. We’ve explored ideas like open source bounty systems and other ways of organizing, through tools like Patreon.
Luckily, Stripe has been very generous and is matching up to $25,000 in donations from users. They’ve already helped us with the first set of donations that we’ve received. Every bit obviously helps, but the thing that matters more than financial support is strong C++ engineers who really want to participate in what is a mature, powerful database. It’s one of the largest projects on GitHub right now, and it’s really deep, intricate, and people generally say a very tidy and well-designed codebase.
There are some really awesome features we wanna build, especially around the real-time technology that is baked into the database, and being able to have more help from good systems engineers who really care about figuring out the future of databases would be really fantastic. Outside of that, any technical contributions that people make are always appreciated. People speak with their words, but code matters more; when you can build things that are helpful for the project, it just moves the state of the art forward.
If you wanna just support us financially, we would appreciate anything that you guys can offer.
On your site you’ve got /contribute (rethinkdb.com/contribute), you’ve got some of the things you just mentioned there in the “Become a contributor” and even a form you can fill out. I’m not really sure what this form is; I guess it’s just to say “Hey, this is me. I wanna be involved somehow”, and you’ve also got the option to donate, and you mentioned that Stripe where you work at currently, which acquired many of the engineers, has generously agreed to donate (or match) $25,000 in donations. When I read that, I was thinking “Super awesome”, number one, and then number two was “How close are we?”, so how close are we?
I definitely wanna shout out to DNSimple because we use them as well, and knowing that they support this is super cool, because we love them.
I’m kind of curious about the future of donations - is this just the start? What can people expect? A lot of people tend to either use Open Collective, or (as you mentioned) Patreon; there’s other obvious models… Is there a way that the Linux Foundation/CNCF requires you to go about donating? What’s the future of how you’ll be able to sustain financially, at least?
There are no requirements. We’ve had some companies who use RethinkDB offer to either donate a percentage of the profits that they make using Rethink, or provide some sort of recurring subscription, and the only thing that’s really stopped us from doing that so far is just wanting to get our bearings as an open source project first. So we’re excited to explore lots of options, but we also want to move steadily and carefully and be able to offer a way for people to provide financial support where we know exactly that it’s going to be going to a development fund, or hosting services, and figure out what that looks like. But we’re excited to explore a lot of different options.
[01:00:18.08] Great. You mentioned earlier the Technical Steering Committee - what are the plans in terms of that side of sustainability, I guess the community-driven part of it? Where are we at with that?
We’re doing well. We have a number of folks who have been thinking about how to structure it, but our goal and priority right now is on shipping 2.4, because we want to get that out the door, to get a lot of the technical changes that were already underway and agreed on, and then we’re going to start drafting the group that will pilot and shepherd the future for RethinkDB.
Very cool. So 2.4 roughly when? Just curious…
I don’t wanna commit to any dates right now, because this is happening on a lot of people’s personal time… [laughter]
It was sort of a tongue-in-cheek question, honestly…
If I was a company I’d be, like, sure… I’d be speaking for people who are donating their time, and I don’t wanna do that.
Gotcha. Are you planning like six months from now, this year? That’s what I’m kind of asking… Not so much like a date, but roughly when.
Yeah, it should be this year.
Okay, cool. So that kind of paints a picture - so you’ll get 2.4 out sometime this year, and at that time (maybe early next year) start to look at the Steering Committee, forming that, getting governance in place. That’s such a good pace for those listening purveying what’s happening here with RethinkDB. Very cool.
What’s some call to arms…? I guess you’ve kind of mentioned, so we don’t really have to ask you directly that question, but asking you directly, what’s some ways…?
C++ developers, get up there.
Oh yes, C++ developers…
I’m just saying, that’s the big one. Listener, if you’re out there and you’re a database or a systems person, you’ve got C or C++ skills, it sounds like that’s the best way to help out, but I’ll let Mike expand and contract on that.
Yeah, I mean one of the great things about open source is that open source always finds a way; life finds a way.
Jurassic Park line, nice one.
There you go. [laughter] Open source provides a space where people can explore the source code of a project and understand not just what it is, but how it was built, and we learn from the state of art in software development. So the codebase that’s there - it’s so much work that can be learned from, so much deep system software… Things like the coroutine engine that we built before coroutine engines were features in modern programming languages, or the serializer and the storage engine that was designed for multicore CPUs and for solid-state drives - this stuff is all really interesting, really super cool, and if people are really good systems engineers or are students and want to understand how these things are built, there’s a lot to learn from in the code standards that we used, in the solutions that we provided. Open source allows people to examine it and look at it from every single angle and to think about how they can improve it, how they can see some change they can provide or add. And even just taking one issue on the project, and just examining it, learning “Why is this an issue?” Not even thinking about maybe how to solve it, but just learning about how the issue is produced, how the system operates.
It’s just an incredible learning exercise. It really allows you to further your skills as an engineer and really learn how you can make a small change that can have drastic improvements for hundreds of thousands of people. So if you are somebody who is a systems engineer or you even just love a particular programming language and wanna think about how to bring that functional programming experience from the database level into a language that you really care about… What should RethinkDB look like for the future of Node.js? What should it look like for Go or for Haskell? Your help is just deeply appreciated, and an opportunity to really grow as an engineer and to help a project that really helps a lot of people.
[01:04:07.15] So if someone out there is hearing this call-to-arms from you, the best place to go would be /contributor? Or what would be the best place to go and say “Hey, I’m one of these people… How can I step in?” Or do they just step into issues and start planting their ideas?
I would go to RethinkDB.com/contribute to figure out how you can become a contributor. Also, just dive into an issue. GitHub is super open, so go into an issue and say “Hey, I wanna take a look at this.” Ask for help when you need it; if you need help getting your build environment set up, there’s lots of documentation on how to do that, but we’re also very friendly on Slack, so if you wanna just come and say “Hey, I wanna tackle this issue. Can someone help me get my build environment set up?”, we’re more than happy to help because we recognize that the more people involved, the better the open source future we’ll have for the project going forward.
Well, we’re seven, maybe eight(ish) years down the road since the beginning, a lot of people to thank (I’m sure) along the way, but maybe specifically to this last year, is there anybody or any particular group you should wanna give a shoutout to that’s like “Hey, without your help, we wouldn’t be where we’re at today”? I’m kind of putting you on the spot with that I’m sure, but anybody you wanna give a shoutout to?
Yes, I really wanna thank the Linux Foundation and the Cloud Native Computing Foundation in particular for really stepping up and taking action. We are deeply appreciative. They removed a whole lot of work from our plates and really enabled an open source future for the project… Namely Bryan Cantrill and Dan Kohn. These two guys really did what was necessary to be able to make RethinkDB’s future secure.
And in terms of the open RethinkDB leadership team, to just name a few people - I mentioned them earlier, but Marshall Cottrell, Christina Keelan, Etienne Laurin, Chris Abrams, Ross Kukulinski, Ryan Paul, Sam Hughes, [unintelligible 01:06:00.28] and core members of the RethinkDB team that have stepped in to pitch their help in.
Aside from that, I’m really appreciative to Stripe, because they really think about RethinkDB’s open source future. The fact that there are a few developer tools companies out there, we really regard each other with great respect. It’s really funny, because Stripe and RethinkDB both were in the same Y Combinator class back in 2009, so we’ve known each other and watched each other grow up for a while, and they’ve really enabled every bit of support that they can offer, including actual donations, to be able to help RethinkDB have a stable future.
And all the folks in the Open RethinkDB channel, of which there are hundreds, hundreds of people who would not let this go even when it was hard, and have done nothing but support us and really celebrate all the work that people put into it for years. It’s the community that makes this really possible; that’s the only reason that RethinkDB is alive today, is because the community cares. [01:07:06.13] That gives me great confidence in its future, because I know that no single person has a role in its destiny, but everyone together is building a future for it that is bright… Very bright, and really quite moving. So I really extend just my heartfelt thanks to everybody who just has donated a dollar or even just said “I care about this and I want this to survive”, so thank you.
That’s awesome. I know from our perspective, we kind of feel like we’ve been on this journey with you. As we mentioned, we’ve had Slava on a couple times to kind of talk about the history of Rethink over its years, and it just made sense to have you back on to kind of cover this transition from company to open source community-led project, and its future… So I kind of feel like we’ve been on this journey with you to some degree - maybe not all the years, but at least some of the years - and I can definitely echo that thanks too, because without a thriving community behind something like this and that kind of passion, you just don’t have much. It’s about the people. The code and the product that comes from it is obviously the point, but it’s the people who really make everything happen, so that’s awesome.
Mike, thanks so much, man. It was a pleasure to have you on the show finally, and to hear your side of the story and to share the future of RethinkDB. Thank you, man.
Thanks a lot, I really appreciate it.
Our transcripts are open source on GitHub. Improvements are welcome. 💚