Changelog Interviews – Episode #461
Fauna is rethinking the database
with Evan Weaver, Co-founder and CTO at Fauna
This week we’re talking with Evan Weaver about Fauna — the database for a new generation of applications. Fauna is a transactional database delivered as a secure and scalable cloud API with native GraphQL. It’s the first implementation of its kind based on the Calvin paper as opposed to Spanner. We cover Evan’s history leading up to Fauna, deep details on the Calvin algorithm, the CAP theorem for databases, what it means for Fauna to be temporal native, applications well suited for Fauna, and what’s to come in the near future.
Sponsors
InfluxDB – InfluxDays NA 2021 Virtual Experience (October 26-27): InfluxDays is an event focused on the impact of time series data. Find out why time series databases are the fastest growing database segment providing real-time observability of your solutions. Get practical advice and insight from the engineers and developers behind InfluxDB, the leading time series database. Our listeners get $50 off the Hands-on Flux Training - use the code changelog21. Learn more and register for free at influxdays.com
LaunchDarkly – Ship fast. Rest easy. Deploy code at any time, even if a feature isn’t ready to be released to your users. Wrap code in feature flags to get the safety to test new features and infrastructure in prod without impacting the wrong end users.
Teleport – Teleport Access Plane lets you access any computing resource anywhere. Engineers and security teams can unify access to SSH servers, Kubernetes clusters, web applications, and databases across all environments. Try Teleport today in the cloud, self-hosted, or open source at goteleport.com
Fastly – Our bandwidth partner. Fastly powers fast, secure, and scalable digital experiences. Move beyond your content delivery network to their powerful edge cloud platform. Learn more at fastly.com
Transcript
So we’ve had a lot of conversations recently around computing on the edge. I’m not talking about IoT or smartphones necessarily, but smart CDNs…? I don’t know, you have your Lambda, you have your Netlify and Fastly functions, you have your Cloudflare workers… We have a lot of people trying to push our server-side logic as close to our users as possible, and reaping the benefits of that. Every time I’ve had that conversation, I tend to interject with “Yeah, but what about my database?” Because caching is great, and running logic at the edge of my CDN is great, but if all of my data is centralized, then aren’t I gonna be roundtripping really far away eventually anyways? And I’ve had at least three people say to me when I present them with that, they say “Fauna is working on it.” And I say “Fauna is working on it?” And they say “Yup.”
So there are people working on this… Hopefully we’ll get there. And they keep bringing up Fauna. So Evan, welcome to the show. Tell me, is this the problem that Fauna is working on, or are people misspeaking?
They’re not misspeaking. It’s one of several problems we’re working on. At Fauna historically we’ve had the “boil the ocean” kind of attitude. The problem we’re working on is the problem of the imperfection of the operational database. Edge latency is definitely a big part of that.
Okay. The imperfection of the operational database… I love that phrase. Can you unpack it and tell us what it means?
Yeah, so the genesis of Fauna really came from me and my co-founder’s experience at Twitter. I was employee 15. I ran what we called the software infrastructure team there from 2008 through the end of 2011. And that was the early days of NoSQL. MySQL and Postgres were big. We started with a MySQL cluster… Cluster is a strong word; we started with a server, it became a cluster over time. Then we had to carry on from there, as Twitter went through hyper-growth.
[04:20] We didn’t go to Twitter as distributed systems experts, we didn’t go as DBAs. We went there as essentially Rails developers. We were frustrated that we couldn’t find any off-the-shelf data system that could meet Twitter’s needs, delivering a real-time type product at global scale to a consumer audience.
We looked at Mongo, we looked at Cassandra and invested quite a bit in Cassandra open source, and quite a few other solutions, and ended up building a whole bunch of custom stuff in-house… But we never quite got to the general-purpose data platform that we wanted to have. That dream never died for us. People who work on databases typically work on databases out of frustration and rage, not out of love for the existing tools. We felt eventually if we didn’t try to attack this problem from the ground-up, it wasn’t gonna get solved. That led to the early version of Fauna.
Fauna is essentially a database as an API, trying to get rid of everything about the operational experience, everything about the metaphor, the physical computer that interferes with your ability to access your data. One of those things is latency variance based on the physical location of the client.
So my dream as a developer is I have objects in my code - if I’m coding object-oriented - which represent data and logic… And those objects are always there, and they’re available, and I can use them, and I can set them aside and pick them back up again… And that’s it. I don’t have to write to the database, read from the database… I can grab some objects, I can throw some objects away… Is that kind of what you’re aiming at when you say “Removing the concept of a database?” Are you talking about “Only think about code that’s just ever-persistent and available to me”? Or are you saying like “Use an API client instead of a database client in my code?”
A little bit of both. The concept you’re describing is sort of the tuple space concept, Linda being one of the very early examples in the ‘80s, where you would have a giant, globally-available heap, and you could change those records. So it turns out that’s not enough in the real world for data access. People want structure, they want constraints enforced, they want transparency as to what that data is, they want the ability to index and query it beyond the single object level. Fauna has inherited some of those concepts in particular in the way we offer serverless functions in the database itself. We would have called them stored procedures in the past; stored procedures have a host of operational and developer productivity issues. They only run on the primary node, that kind of thing.
So effectively, you can write business logic which runs collocated in the database next to the data, it has that transparent access, like you’re talking about… So sort of our goal now – and this works in Fauna today; people do it all the time, they really like it… Our goal now is to make that experience more seamless, closer to, in particular, developing with JavaScript and GraphQL, so that you can get that world we’re talking about. You can have business logic on the client, that interacts with familiar interfaces to the data. You can also write business logic in the database that uses the same interfaces, that has better consistency and availability and scalability properties, and you can stop thinking about “I provisioned a server and I have to think about what it’s gonna do.”
[07:57] Nor do I have to think about the geographic location of said server, or any of those problems, when we talk about specifically that edge database, or that edge access layer for my data… And if I’m running a Lambda function that happens to be in Singapore, having my data collocated there - that’s also part of what Fauna is doing, right?
Yeah. Fauna offers global and regional deployments. People don’t necessarily want all their data available everywhere for compliance and performance reasons… But you know, we ensure that whatever topology you choose for your database, all your clients will access the nearest region within that topology, and have a consistent performance and correctness experience by doing so… It’s automatic, you don’t have to think about “I’m querying the primary, I’m querying the secondary”, or “I set up my shard key to make sure this data is clustered together, otherwise it can’t perform, or I can’t do a transaction”, that kind of thing.
So here’s a really simple question that probably has a really complicated answer… How do you accomplish this?
We do quite a few unique things in the database world, in particular for the transactional algorithm. We grew up a little bit on NoSQL; we were experienced with Mongo and Cassandra, and those kind of things… In that era where people said “You have to scale, because if you don’t scale, your business will die.” Well, if you’re gonna scale, you have to give up transactions, you have to give up correctness, or the ability to rely on your database to really enforce anything at all about the data, except make a best effort at replicating it sometime, somewhere.
That wasn’t really acceptable at Twitter at the time, but we had to tolerate it anyway. It’s not acceptable in particular for developer productivity, for the general application, or for higher-risk data than arguably tweets usually are. Your usual suspects - ticket reservations, banking, crypto, that kind of thing. You really wanna know if your data is accurate; especially with private data, you have to make sure that security rules are enforced in a transactional way, and that kind of thing.
That led us when we were prototyping Fauna to pick up an algorithm called Calvin. At the time - this was about four years ago - there were really only two serious algorithms available in the industry for doing multi-region strict serializability for read-write transactions… And strict serializability is the optimal level of correctness. It’s the same level of correctness you would experience if you were doing everything on a single machine sequentially, without any concurrency at all. And that’s easy to reason about for a developer, it’s easy to think about… You get that happens-before relationship, as long as you know there’s no overlapping in the read-write effects of those individual actions or groups of actions.
The first algorithm in the industry is the Google Spanner algorithm, and that relies on access to physical atomic clocks to sequence your transactions. Those are hard to get; not as hard as they used to be, but still not generally available. It also relies on bounded latency for accessing those atomic clocks… Because it’s not enough to have the clock if you can’t guarantee that you can check that time in a specific latency window; then you don’t really know what time it is, even though the actual source data was correct.
And also, it can be potentially slow, because for a lot of transactions you have to make multiple roundtrips to multiple shards, to drop locks into the records and then clean them up once you’ve made some writes to that data.
We felt, based on our Twitter experience – Twitter was a global system; there wasn’t natural partitioning in the user base the way there was for Facebook when Facebook rolled out school by school in the early days, and you couldn’t actually communicate with people outside of your cluster. Twitter was never like that. Twitter was always global.
[11:53] We wanted a data system which would support that kind of global access and would still give you an optimal latency experience… And that led us to pick up Calvin, which came out of Dr. Abadi’s lab at Yale. Dr. Abadi is one of our advisors now. And that algorithm is a unique algorithm. It’s a single-phase transactional consistency protocol which doesn’t rely on knowing what the transactions are before it puts them in order.
There are a couple key things that have led people in the industry to kind of reject that algorithm originally. The first was that the paper was very opaque, and kind of scary. There’s a lot of sections where things are left as an exercise to the reader, there’s handwaving about putting locks everywhere, which sounds slow and error-prone. It doesn’t have the kind of brand backing of Google, who can say “We did this in production. It works.” Whether it took 20,000 engineers five years to make it work is kind of beside the point, but it happened, so people believe it. Calvin was not like that. But what Calvin offered was a system which was uniquely adapted to NoSQL. It’s harder to do SQL over Calvin. You can do it, but it has some performance implications that take extra work to work around on the part of the database vendor.
But in Calvin, if you submit the transaction as a pure function over the current state of the data, so not like begin transaction, do stuff database-side, do stuff application-side, then commit transaction, but if you submit it only as work that can happen in the database as a single expression, then Calvin will order those in a globally partitioned and replicated log, tell you what the order is, and then apply them to the replicas, which can be anywhere, and tell you what the data is. So that order is inverted from the typical lock-based transaction system you’d find in something like Spanner or Postgres. And that means it can do everything in a single roundtrip in the data center quorum, and it gives you optimal latency effectively. All reads can happen from the closest data center without further coordination, so it’s a very good edge experience. All writes are just one roundtrip for whatever the majority of the regional cluster is.
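[Editor's sketch] The "order first, then apply" idea described above can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not Fauna's or Calvin's actual implementation: the names `Txn`, `transfer`, and `Replica` are hypothetical, and a plain Python list stands in for the partitioned, replicated log. The point it shows is that when transactions are pure functions and the order is agreed up front, every replica that replays the log reaches the same state without locking or further coordination.

```python
from typing import Callable, Dict, List

State = Dict[str, int]
# A transaction is a pure function: it reads the current state and returns
# the writes to apply. No side effects, no client round trips mid-transaction.
Txn = Callable[[State], State]


def transfer(src: str, dst: str, amount: int) -> Txn:
    """Build a transaction that moves `amount` only if `src` can afford it."""
    def txn(state: State) -> State:
        if state.get(src, 0) < amount:
            return {}  # deterministic abort: no writes
        return {src: state[src] - amount, dst: state.get(dst, 0) + amount}
    return txn


class Replica:
    """Replays the shared log in order; it makes no ordering decisions."""

    def __init__(self, initial: State) -> None:
        self.state: State = dict(initial)
        self.applied = 0  # index of the next log entry to apply

    def catch_up(self, log: List[Txn]) -> None:
        while self.applied < len(log):
            writes = log[self.applied](self.state)
            self.state.update(writes)
            self.applied += 1


# Step 1: fix the order by appending to one agreed log.
# (Hypothetical stand-in for a partitioned, replicated log.)
log: List[Txn] = []
log.append(transfer("alice", "bob", 70))
log.append(transfer("alice", "carol", 70))  # aborts: alice has only 30 left

# Step 2: replicas replay the same log independently, at their own pace.
initial = {"alice": 100, "bob": 0, "carol": 0}
us, eu = Replica(initial), Replica(initial)
us.catch_up(log)
eu.catch_up(log[:1])  # eu lags behind for a while...
eu.catch_up(log)      # ...then catches up to the same position

print(us.state == eu.state)
```

Because the second transfer sees the effect of the first on every replica, both replicas reject it in the same way, which is why no lock hand-off between regions is needed in this model.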
Anybody else out there that you know has grabbed Calvin and run with it like you guys have?
Yeah, there’s Yandex… After we did our work, Yandex eventually released a system that they had built internally initially, which does SQL with a Calvin-inspired system… Then also, Facebook has an internal system which shares some similarities that also popped up somewhat concurrently, maybe a little bit after we were first publishing what we were doing; I forget the name of that system. It’s not available to the public, it’s not open source.
Well, I was googling Calvin while you were talking about it and I’ve found a nice blog post on – I think it’s called Fauna.com, called “Spanner vs. Calvin: Distributed consistency at scale” by Daniel Abadi, back in 2017. We will link that one up for people who want that comparison… Because I haven’t heard of Calvin; I’ve definitely heard of Spanner, probably because of the marketing prowess of Google, and just the fact that when Google does a thing, it gets out there and is talked about by developers all around the world.
For a while we had a serious technical marketing challenge here, because if you remember the NoSQL vendors, like Datastax and those guys, they would bang on forever about how distributed transactions were literally impossible, and you should just abandon hope of enforcing transactional consistency in your database, and you just need to make the application detect when the data is corrupt, and make a best effort to clean it up. And if you lose some transactions, who really cares? It was only a few transactions; hopefully they weren’t big ones… But that’s your problem now. That was the party line from Datastax, from Mongo, from Couchbase, from many others.
[15:56] At the same time, you have the Postgres crowd, you have the Redis crowd saying “Well, you don’t need scale. Just get a really big server, do everything with a really big lock. Locks will get faster over time. Moore’s Law will never end.” It did end, but putting that aside, it theoretically could start up again, and you can get more and more hardware… And if it goes down, the down time isn’t too bad, and it’s all worth it for the cost of transactions… And that meant when we first started publishing what we were working on, people didn’t believe it. Even Google had some challenges to overcome with some of their papers about Spanner, about Chubby, and some of these underlying strongly-consistent systems where people said “Well, the CAP theorem says you can’t do that. I don’t need to read this, because you’re trying to do something impossible. This is a perpetual motion machine”, that kind of attitude.
So Google sort of paved the way and convinced people with specialized hardware and a ton of gruntwork you could actually get something which was better than the primary/secondary replication system for transactional data. But then we had to extend that and prove, through our blogs, through the Jepsen report, that kind of thing, that Calvin actually works in an industrial context, and that you can do better than Spanner, you can do better than these multi-phase commits, you can do better than the hardware dependencies, and kind of get the best of both worlds in terms of the NoSQL experience of scale and the RDBMS experience of transactional consistency.
For the uninitiated, can you break down what the CAP theorem is?
So the CAP theorem says that you can’t have consistency, availability and partition tolerance all at the same time. Basically, you’ve got a bunch of nodes on the network, you want them to be perfectly synchronized - well, if you lose your network link, then either they become unsynchronized because they can’t replicate to each other, or they become unavailable because they refuse to accept transactions, because they know they can’t replicate to each other.
The P is kind of weird, because you can’t opt out of partition tolerance. Saying you don’t need to be partition-tolerant means you have a perfect network that can never partition, which is not the real world. But people read this as a theory; it’s in the name, the CAP theorem. It’s not a physical limitation on how data can replicate… And in particular, if you have enough machines on widely enough distributed commodity hardware, with enough replication links between them and enough algorithmic sophistication to handle those faults, you can effectively approach something which feels like a fully CAP-compliant system.
So probably the best way to describe Fauna is big C, small A, big P. So we’ll never give up consistency; if worse comes to worst, consistency will be maintained, while availability is sacrificed. So in practice, availability is essentially never sacrificed, because the algorithms are fault-tolerant. They can route to other nodes, and that kind of thing. And when you have the client routing to the closest nodes and regions all the time, and they’re doing the same thing, you can actually dodge the typical network partitions you have in the modern, hyper-scale public cloud.
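[Editor's sketch] The "big C, small A" trade-off described above can be shown with a toy majority-quorum register. This is an illustrative sketch only: `QuorumRegister`, `Unavailable`, and the replica names are hypothetical and say nothing about how Fauna is built. It assumes a single client on one side of the partition and shows the consistency-first choice: while a majority of replicas is reachable the system keeps accepting writes, and once it is not, the system refuses the write rather than let replicas diverge.

```python
from typing import Dict, Optional, Set, Tuple


class Unavailable(Exception):
    """Raised when an operation cannot reach a majority of replicas."""


class QuorumRegister:
    """Toy replicated value that prefers consistency over availability."""

    def __init__(self, replicas: Set[str]) -> None:
        self.all: Set[str] = set(replicas)
        self.reachable: Set[str] = set(replicas)
        # Each replica stores (version, value).
        self.store: Dict[str, Tuple[int, Optional[str]]] = {
            name: (0, None) for name in replicas
        }

    def _require_quorum(self) -> None:
        majority = len(self.all) // 2 + 1
        if len(self.reachable) < majority:
            raise Unavailable("no majority reachable; refusing the operation")

    def partition(self, cut_off: Set[str]) -> None:
        """Simulate a network fault that isolates the given replicas."""
        self.reachable -= cut_off

    def heal(self) -> None:
        """Simulate the network links coming back."""
        self.reachable = set(self.all)

    def write(self, value: str) -> int:
        self._require_quorum()
        version = max(self.store[r][0] for r in self.reachable) + 1
        for r in self.reachable:
            self.store[r] = (version, value)
        return version

    def read(self) -> Optional[str]:
        self._require_quorum()
        # Any majority overlaps the last write's majority, so the highest
        # version among reachable replicas is the latest committed one.
        newest = max((self.store[r] for r in self.reachable), key=lambda p: p[0])
        return newest[1]


reg = QuorumRegister({"us", "eu", "asia"})
reg.write("v1")
reg.partition({"asia"})
reg.write("v2")            # 2 of 3 reachable: still a majority
reg.partition({"eu"})
try:
    reg.write("v3")
    refused = False
except Unavailable:
    refused = True         # 1 of 3 reachable: refuse rather than diverge
reg.heal()
print(refused, reg.read())
```

With three replicas, losing one costs nothing in availability, which matches the point that availability is only sacrificed in the rarer case where a majority cannot be reached.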
That’s interesting, you have kind of two angles of routing around, from two perspectives. When you combine those from the client-side routing, like the network routing of the client to the database network side, you’re saying that you basically can just minimize those to where it’s rarely a problem.
Yeah. A key thing about making this work is making sure that every step of the communication process knows how correct it is. One of the unique things about Fauna is that Fauna is natively temporal. So all data has a versioned timestamp, and you can look back in history for audit purposes, or show a chat history, or that kind of thing. That also means any query from any client knows how fresh that query was. Transactions, when they’re being written, they know how fresh their dependent data was, and at every step you can check and make sure that you’re not trying to do something which could fail if fresher data potentially existed. That means you’re never wondering, “Do I have an up-to-date view or not?” You know how up-to-date it is and you know whether you can rely on it.
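[Editor's sketch] To make the "natively temporal" idea above concrete, here is a small in-memory model, offered as a sketch under assumptions rather than a description of Fauna's storage engine. `TemporalStore`, `read_at`, and `write_if_fresh` are hypothetical names, and a simple counter stands in for a real transaction timestamp. It shows two things from the passage: every version carries a timestamp so history can be read back, and a write can check whether the data it depended on is still the freshest version.

```python
import bisect
from typing import Any, Dict, List, Tuple


class TemporalStore:
    """Toy key-value store that keeps every version with its timestamp."""

    def __init__(self) -> None:
        self._times: Dict[str, List[int]] = {}
        self._values: Dict[str, List[Any]] = {}
        self.clock = 0  # logical timestamp (assumption: a simple counter)

    def write(self, key: str, value: Any) -> int:
        """Append a new version and return its timestamp."""
        self.clock += 1
        self._times.setdefault(key, []).append(self.clock)
        self._values.setdefault(key, []).append(value)
        return self.clock

    def read_at(self, key: str, ts: int) -> Tuple[Any, int]:
        """Return (value, version timestamp) visible at `ts`, or (None, 0)."""
        times = self._times.get(key, [])
        i = bisect.bisect_right(times, ts)  # versions with timestamp <= ts
        if i == 0:
            return None, 0
        return self._values[key][i - 1], times[i - 1]

    def write_if_fresh(self, key: str, value: Any, read_ts: int) -> bool:
        """Reject the write if a version newer than `read_ts` already exists."""
        times = self._times.get(key, [])
        latest = times[-1] if times else 0
        if latest > read_ts:
            return False
        self.write(key, value)
        return True


store = TemporalStore()
t1 = store.write("room:1", "hello")
t2 = store.write("room:1", "hello, world")

old_view = store.read_at("room:1", t1)        # look back in history
stale_ok = store.write_if_fresh("room:1", "stale edit", read_ts=t1)
fresh_ok = store.write_if_fresh("room:1", "fresh edit", read_ts=t2)
current = store.read_at("room:1", store.clock)
print(old_view, stale_ok, fresh_ok, current)
```

The reader never has to guess whether a view is up to date: each read returns the version timestamp it was based on, and the store can compare that against what it holds now.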
[20:16] Any chance you’re a Silicon Valley fan, Evan? The show Silicon Valley…
I saw a couple of episodes… It was pretty close to home in terms of my Twitter experience, so I don’t think I’ve found it as humorous as others… There were some early Twitter engineers who can summon a –
A little painful to watch?
Jerod loves it when I bring it up, because he’s not a big fan, or he hasn’t gone through (I guess) all the seasons, and I – it’s just so close to home, really… Because this is a big thing that they did, they solved for at least, with this algorithm that was essentially the plot theory of the whole show… But they never envisioned an internet where you can have so many devices essentially to take that P part of it, to make it possible… Because you just have so much non-latency between devices, and this possibility… So I’m just curious if you saw that, because that was a big thing they solved for there - they never envisioned where you would have this many connected edge devices, in this case… Smartphones in everybody’s pocket, with data, that were ten times more powerful than the computer that took us to the moon, for example. This amount of computing in everybody’s pocket, with data, with network, globally. That seems like, with the partition part, the P of the CAP theorem, is the big part of it; if we can minimize that latency between so many nodes in such a big network, then you open up a world of possibilities, essentially.
Yeah, that’s accurate. It’s not enough to say the database is available or not either. It’s a much more nuanced real-world question, like “Is it fast enough? What level of correctness did you explicitly request? In what period of time do you want your data to be searched from?” That kind of thing.
The history of operational databases, I think, is a little – like, database development lags other infrastructure software development, because it’s harder. It’s one thing if your compute node or something craps out and you have to start up a new one. You lost a couple of requests, but that’s basically it. If your data is unavailable, if your data is corrupt, which is even worse, that has permanent impacts on the health of the business, on the customer experience, that kind of thing. And making systems that are reliable is very, very difficult.
So what we ended up with was – you know, the RDBMS is basically designed to be put in a closet, in a physical office building and accessed from PCs. That was sort of the Microsoft Access model, the SQL server model, the Oracle model. You’d run these rich clients on desktops, which would have a relatively reliable network; and if it wasn’t reliable, you could walk down the hall and pester somebody to go make it reliable on your behalf, and then your problem would be fixed. Well, that’s not the world anymore, and we’ve tried to extend these systems which were designed for much smaller deployments, with physically accessible, low-latency links between them, for a cloud world with, like you said, people on smartphones, in cars, all kinds of crazy smart devices accessing –
Refrigerators even. My washing machine in the other room has got Wi-Fi access, you know? I’ve got it on a VLAN, of course, because I don’t want anybody hacking my house through my LG model L – I’m just kidding around. Giving access key essentially to my network, but… It’s on a VLAN, but the point is, you’ve got edge devices everywhere.
Yeah, and they move. Think about a corporate deployment in like a store, or something… Like Hertz. Hertz knows which cars and which team members, and who’s running from which site, most of the time; the data doesn’t move around at high frequency, but when you have people playing mobile games, and doing social media stuff, or even using Salesforce, they’re flying all over the place, and they want their data to be quick and correct from anywhere. That was sort of the great unsolved problem of operational databases until very recently… But there are a lot of other problems that came along with that kind of legacy, physical deployment model. We lifted and shifted it to the cloud, and you’ve got your VM instead of your physical HP big iron, which improved a bunch of things. You didn’t have to go to steak dinners to buy your Itanium chip anymore; you could deploy something immediately by clicking a button…
[24:26] But you still have to think about what it is, like, “How much capacity do I need?” And no one knows how much capacity they need. So you either provision way too much and you pay for all this wasted capacity, wasted resources, you literally waste electricity keeping those things on… Or you don’t deploy enough, and then you have some kind of event, positive or negative, that damages the experience of people using your product, whether it’s an in-house IT thing or something for the consumer, the public…
All these problems are problems of the metaphor, the physical machine. And if you use something like Stripe, for example – you know, you never think about “Which Stripe node am I gonna deploy, so I can accept credit cards?” The concept doesn’t make sense. And we want that concept to disappear for data too, so that it stops making sense to think about where a physical piece of data is linked to a physical machine.
So when you say Fauna is an API - I think we all at this point know what that means in terms of how I’m then using it. It also has a database layer. It brings me a little bit of apprehension, because it’s kind of like, I get access to an API, and then I get my access removed to an API… And my data is precious, and sometimes it’s my business. It is the business, in some cases, at the end of the day… So it kind of gives me a little bit of the apprehension of like, “Well, if that API goes away, for whatever reason, my database is gone.” Same thing with Stripe, right? So these are things that we have to deal with as developers and as decision-makers, of what makes sense for our circumstances…
But open up with Fauna is an API, unpack it some more, and then let us know what that all means. What does that end up meaning for me as a user?
Yeah, the API experience really is the web experience. This idea that you can use standard interfaces to access data from anywhere. You don’t care where you are, and you don’t care where the server you’re talking to is. You don’t have to go over a secure link, you don’t have to be within a special network perimeter; you don’t have to go get your special Lotus Notes credential and install a special app, and use a special protocol. It’s the web; that’s what makes the web interoperable. It’s what makes it ubiquitous, it’s what makes it so productive, both to develop with and to consume, for SaaS and consumer products.
[27:51] We want the database to be just like using any other web service. You’re right, that comes with some downsides; in particular, operational transparency is not a given when you’re not deploying your own server. You don’t have any administrative access to the underlying hardware, you can’t go inspect the VM, you can’t go back up the physical bits on your own. That means it behooves us to build that transparency back into the system to give you access to resource consumption metrics, to give you access to a backup system where you can make your own decisions about restoring your data, to give you access to the history of your data, where the temporality features are often useful, and also to give you performance transparency into how things are operating… Because do you really care that much how Stripe performs? Like, a little bit… You don’t want it to take five seconds to check out; but if it’s 10 milliseconds or 100 milliseconds, that’s probably not a huge deal for you. If your database performance is that variable, that is a problem.
Right.
So it means we have a very high bar as a cloud operator, because we developed the software and we operate it ourselves. Fauna is hosted on AWS and GCP right now, and we’ll go to other cloud providers shortly.
We’ve taken that burden off you, and it’s part of the value you get as a customer, but we also want to make sure we don’t eliminate the transparency there that you would get from a managed cloud solution, or something you’re physically operating yourself.
Yeah, I noticed you have an entire subdomain on the website called trust.fauna.com, and I think that’s really what it comes down to, right? I’m kind of having a looser hold over things that are historically precious to me, or held tight, in terms of, like you said, operational transparency. I used to be able to walk over to the server and pop in a new drive, pop that drive out, pop this one in, or run that backup script manually, or whatever we used to do… And I don’t want to do those things, and I don’t want to have to do those things, but I do not know if I can trust not being able to do those things.
So like you said, you have a very high burden of trust, and like you said, it behooves you guys to be as transparent as possible. It looks like you have all sorts of white papers, and reports, and compliance things… And the entire goal, I’m assuming, of that subdomain and of these efforts is like, a) being trustworthy, and then b) proving to us that you are trustworthy.
Right. Yeah, you don’t wanna do the work anymore, but you still wanna supervise that work. If you pop the drive out and it’s the wrong drive - well, that’s your own fault. Honestly, that feels a little bit better than if someone else popped out the wrong drive on your behalf. [laughs]
Yeah, exactly.
Like, “What the heck? Who are these idiots?”
Yeah. Like, I could blame myself, but – yeah, somebody else is way worse.
Right. And then you’re like, “I learned something. It was all worth it.” Well, we’ll see. We need to make it possible to supervise that operation to understand how everything works. You need to understand it so you can communicate effectively with our own support team if you’re having an issue, and that kind of thing. So it’s not the same level of opacity as you might get from something which is lower risk, potentially lower value, like, you know, a domain-focused, vertically integrated API… Because the way – we’re sort of going through a series of transitions, as we usually do in the industry. We had all on-prem everything; even for payments, you would go get your physical payment block, and you’d sign a contract with Braintree for thousands of dollars, and you’d be able to take some credit cards… And then things like Stripe and Square moved that into APIs. The same way for software infrastructure - we had on-prem deployments. Then we got managed cloud, then we moved to more dynamic managed cloud, with VMs and fast provisioning, and even more dynamic with containers, and now we’re moving into purely API-driven infrastructure solutions where there’s no physicality at all… But databases lag, because they’re harder.
So we’re in transition… Especially, you see it with things in the serverless space; you mentioned Netlify… Netlify and Vercel, their deployment and hosting systems. A lot of the work that edge providers are doing with eventually consistent data and caching, and that kind of thing. Lambda, and moving to more dynamic interfaces which aren’t based on POSIX anymore, like WebAssembly - all of that is driving us to this world where physicality and the computer metaphor don’t matter anymore. We need to get databases there too, but like you’re saying, you can’t get there without trust.
[32:33] Yeah, you say no until you say yes, basically, and that’s where trust comes in. You mentioned before around technical marketing, you had some challenges… In especially the outlook on database, you have to think so far down the road to do what you’re doing with Fauna, because you’re sort of like bypassing a lot of things even… Like, turning this into an API, it turns off a particular developer – there’s some apprehension, as Jerod had said… What are the challenges you see currently then, technical-wise, that take that from a hard no to maybe even a yes for developers listening to this show? What is it that makes people trust that you can solve this problem? Calvin as the algorithm; it’s newer, or newer-known, so it’s – you say no until you can say yes, and that yes comes from trust… So how do developers begin to trust that you’ve solved this problem?
Yes, choosing to adopt Fauna or something like it - there’s really nothing else like it. But imagine some day there is – you know, choosing to develop a system which is that novel really comes down to two things; it comes down to that trust, it comes down to having an implementation and architecture, transparency about that architecture, and the feature set in particular, the security model, which is pretty unique in Fauna, that makes it safe to access data, either in a secure or insecure environment like the web… And it also comes down to usability and adoptability. It still has to fit into the toolchain you currently use, and the toolchain you wanna adopt. That’s where Fauna features like the GraphQL interface in particular come in, so that – you know, we’re not a SQL database, but we can still be familiar and approachable to people in particular who know GraphQL and JavaScript.
Yeah, I was just gonna ask about the interface itself, because like you said, you’re not doing SQL, and you come from a NoSQL kind of roots, from the founding team… So what is the API? Is it Mongo-esque, is it a brand new thing?
So we offer a GraphQL API which is compliant with most of the GraphQL standard. It lets you get up and running very quickly. And GraphQL is great, but it’s also incomplete as a query language. It doesn’t support mutations directly. It’s about composing existing services and datasets on the read side, and kind of mixing and matching the response you wanna get on a smart client, like a dynamic SPA on the web or a mobile device.
We also offer our proprietary language called FQL, which is a functionally-oriented query language which lets you write very complex business logic and mutation statements, and has a rich standard library. For that, there are DSLs for JavaScript, and C#, and Java, and Scala, and Go, and Python, and what have you, that make it easy to compose those more sophisticated queries within your application, or to attach them to GraphQL resolvers as functions, as stored procedures that let you expose them over a GraphQL API.
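[Editor's sketch, not from the conversation] To make the "read side" point concrete, here is roughly how a smart client composes a GraphQL read and serializes it for an HTTP POST. The `query` plus `variables` JSON shape is the common GraphQL-over-HTTP convention; the schema and field names (`findUserByID`, `posts`) are hypothetical and are not taken from Fauna's documentation. Nothing is sent over the network.

```python
import json

# Hypothetical schema: these field names are illustrative assumptions only.
QUERY = """
query UserWithPosts($id: ID!) {
  findUserByID(id: $id) {
    name
    posts { title }
  }
}
"""

def build_graphql_request(query: str, variables: dict) -> str:
    """Serialize a GraphQL operation as the JSON body most HTTP clients send:
    an object with `query` and `variables` keys."""
    return json.dumps({"query": query.strip(), "variables": variables})

# The client picks exactly the fields it wants back; the server does the rest.
body = build_graphql_request(QUERY, {"id": "1234"})
payload = json.loads(body)
```

The same body would be posted with an authorization header to whatever GraphQL endpoint the service exposes; that endpoint and the auth scheme are deliberately left out here.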
Yeah, I’m looking at this FQL, and this is very much in the vein of what SQL folk would at least look at… I mean, select all, select where, select where not, alter table, truncate table… At least at the very outset, it seems familiar, even though it’s its own thing and proprietary.
At its heart, Fauna is a document-relational database, so the relational concepts are there. You have foreign keys, you have unique indexes, you have constraints, you have views, you have stored procedures… But like you said earlier on the podcast, you’re developing an app, you’re probably doing it in an object-oriented way… We wanna support that object-oriented way directly, without forcing you to go through an ORM or something else that translates that to a tabular model that isn’t actually what you want.
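[Editor's sketch, not from the conversation] One way to picture relational concepts living on top of documents is a toy in-memory model: documents are nested dictionaries, a collection can enforce a unique index on one field, and a reference to another collection is checked like a foreign key. Every name here (`Collection`, `create_with_ref`, the exceptions) is hypothetical; this is not FQL and says nothing about how Fauna implements these features.

```python
class UniqueViolation(Exception):
    """Raised when a document would duplicate a uniquely indexed value."""

class DanglingReference(Exception):
    """Raised when a document points at an id that does not exist."""

class Collection:
    def __init__(self, name, unique_field=None):
        self.name = name
        self.unique_field = unique_field
        self.docs = {}       # id -> document (a plain, possibly nested dict)
        self._unique = {}    # unique field value -> id
        self._next_id = 1

    def create(self, data):
        # Enforce the unique index before storing anything.
        if self.unique_field is not None:
            key = data[self.unique_field]
            if key in self._unique:
                raise UniqueViolation(f"{self.unique_field}={key!r} already exists")
        doc_id = self._next_id
        self._next_id += 1
        if self.unique_field is not None:
            self._unique[data[self.unique_field]] = doc_id
        self.docs[doc_id] = dict(data)
        return doc_id

def create_with_ref(target, data, ref_field, source):
    """Foreign-key style check: the referenced id must exist in `source`."""
    if data[ref_field] not in source.docs:
        raise DanglingReference(f"{ref_field}={data[ref_field]!r} not in {source.name}")
    return target.create(data)

users = Collection("users", unique_field="email")
posts = Collection("posts")
alice = users.create({"email": "alice@example.com", "profile": {"name": "Alice"}})
post = create_with_ref(posts, {"title": "Hello", "author": alice}, "author", users)
```

The application keeps working with nested objects, while the store still rejects duplicate emails and posts that point at missing users.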
[36:24] So what are some perfect use cases for this, if you were to describe either an application that we all know, or a business, or even - you could make one up if you like, where it’s like, “These people should be using Fauna, and here’s why” or “This is using Fauna, and here’s why”, or “I would use Fauna to build this, and here’s why.” You can’t say everybody… [laughter]
Not everybody. Obviously, a lot of Fauna’s features were inspired by things that we wanted to have at Twitter, and were forced to develop on our own. Fauna is really designed for the modern web 2.0+ application world. With SaaS, in particular - I would say the majority of our customers are building some kind of SaaS app, with a business purpose. Or consumer-oriented applications. And then I think that the third category, which somewhat overlaps with the first, is blockchain-adjacent applications, things that use crypto for public transactional purposes, but also store additional data for application purposes.
The thing that these all have in common is that – you know, there’s a wide variety of customers and people interacting with datasets that interact with it in a soft, real-time way; they interact with it from the web, from mobile applications… You know, it’s all the apps you use today.
What we don’t do is analytics. We’re not an OLAP database, we’re not a data warehouse. We’re not a cache for some other database. The transactional consistency does have a cost in throughput and latency, so if all you want is a cache, you should go get Memcached or something like that. We’re not an information retrieval system; we don’t replace Elasticsearch. We’re not a queue. You can go get Kafka or something else like that for those purposes. It’s really sort of the dream of MySQL, like - we wanna be to the serverless era, and JAMstack, and kind of the API infrastructure era what MySQL was to the web 1.0 era, where this is a general-purpose operational data platform. It’s very easy to use, it’s very easy to adopt… No startup costs, develop on your laptop. It does a very good job; I mean, we can argue about whether MySQL did a good job, but it was a better job than others at the time, because it existed. It does a very good job at that core, short request transactional user data, mission-critical data, constrained indexed use cases… And then it does a decent job at everything else you need to build a fully-featured application, so that you can get started without having to have a whole bunch of tools all mixed up in your tool chain.
We fundamentally don’t really believe in the classic polyglot persistence attitude where you pick the best tool for every single kind of query pattern you might have in your app. Databases are heavy pieces of infrastructure. It’s hard to move data around. You don’t wanna have too many of them. So the more general-purpose that it can be, the less you have to use. We do have an advantage in the cloud though, that we can connect and integrate more easily with adjacent systems in a way that takes the integration burden off the user. So that’s one of the things we’re working on going forward, making it seamless to link up to the analytics database you wanna use, the queue you wanna use, and that kind of thing.
So Evan, it seems, based upon your resume, that you’ve been doing this for a while. You mentioned your time at Twitter, employee 15, 2008 to 2011… I’m a LinkedIn stalker, I do it quite well, so I saw that Fauna Research was there, May 2012, which was obviously just after Twitter, to January 2016… And I think we’ve talked to many people like you who have solved big problems like this and they began with pain… So probably pain is better… You mentioned how you couldn’t solve all these problems there… And then I’m curious what Fauna Research was, what that timeframe represents, and how you got to where you are now, given – it just seems like you’ve been working on this problem for a very long time. Is that true?
That is true. Most of my – although I studied bioinformatics in grad school and I worked on gene orthologs in chickens, most of my career (really all my career) has been working around problems in the data systems.
After grad school I worked at SAP briefly, and then I worked at CNET Networks and I did chow.com and urbanbaby.com. And Urban Baby was a threaded, asynchronous, real-time chat for moms, which has a lot of similarities with Twitter, if you stop limiting the audience only to moms…
[laughs]
It was hard to scale that on MySQL, and then Twitter was also scaling on MySQL, and we saw the problems in a number of ways there. After Twitter, we weren’t sure. Me and my co-founder, Matt Freels, we wanted to start a company, but we weren’t really sure what we wanted to build… So we did consulting for almost four years in the data space. I had two kids, my co-founder had another kid, so just kind of low and slow, just exploring the market. We didn’t raise venture capital, we didn’t move into product development until 2016… But we kept our eye on what people were doing and we saw that everyone was running a half dozen different databases at single-digit resource utilization, struggling to integrate their data, struggling to scale things up and down, struggling to keep their data consistent… Having the same kind of problems we had at Twitter.
[44:07] Did they need a purpose-built social graph that could do millions of requests per second? Probably not. So commercializing this stuff we had literally built at Twitter didn’t make a lot of sense, but we started to get this idea of a better data platform and a better data system…
And I think one of the things which is a little bit unique about Fauna - there are more deep tech startups now… The last couple of years have changed things in terms of the funding market for companies that are based on real hardcore technology, and focused on solving those problems first before bringing them to market… But 4-5 years ago it was rare to be looking for venture capital for a deep tech infrastructure company. People believed Amazon had solved every problem that the market would ever want, and the only thing to do is business model innovation, and if you were really good at marketing then it didn’t matter how good your code was… Sort of the Mongo or the Redis story, that kind of thing.
Luckily, we got funded early on, and we got the time to invest in solving these problems that remained unsolved, foundational problems in computer science like the distributed consistency problem, and also the opportunity to bring it to market. That also meant – you know, we were a little too early to market. When we first launched the serverless product as an alpha in 2017, people were scared of serverless. Lambda had just come out; there were no other serverless data systems.
The idea that you’d access a data API that scaled on your behalf without your intervention, without you having to go twiddle knobs and that kind of thing was weird. People didn’t want it. So it took us some time to both mature the technology and figure out how to go to market, wait for the market to be ready for us… Now serverless is big, JAMstack is big, people are becoming familiar with these development models… And the vision for Fauna has never changed. What has changed is the market readiness.
Yeah, it’s like a perfect-ish storm. To just touch on your funding… 2016 was your seed round, 2017 it seems like, at least based on Crunchbase data - if this isn’t accurate, you can tell me and I’ll go back and edit it and make it correct if it’s not… But early 2017 was a series A; another series A I guess in 2020… That would be the series B technically, right? Wouldn’t that be that? But it seems like almost six million in funding to get to where you’re at right now… And even what you said too with the funding models and the capital available for a deep tech company like yourself, it seems like now it’s available, it’s becoming more and more common… The market is matured to the needs that you’re bringing to market, so it seems like a perfect(ish) storm for you to be where you’re at right now.
Yeah, I think that’s true, in particular Mongo, and then Elastic, and then Snowflake really changed things in terms of the capital markets appetite for doing real deep tech infrastructure software. We’ve raised 16 million dollars, we’ve brought on Bob Muglia as chairman at the end of 2019, we’ve brought on Eric Berg as the new CEO replacing me last year… We got more professional management around the table, so my co-founder and I could focus on technical problems… Because I was always just the least technical co-founder. I was never really the business guy, so to speak…
Oh…
And we were kind of surprised… There were a few other companies that set out to solve this same problem around the same time, in particular Cockroach and Yugabyte. But they also found – a different technology than us, but they also found the market wasn’t really ready for this kind of interface model. But their solution was to fall back and build another SQL database. And that’s fine, I guess; there’s 30 cloud PostgreSQL things you can use, and it’s hard to differentiate among them… But if you can carve out a niche, you can make a business there.
We didn’t wanna replicate those old interfaces; we really wanted to build an interface which was for where the world was headed, where the new stack was being built; for people who were building these dynamic edge and mobile and SPA browser applications…
[48:11] In particular, we fit well with blockchain and crypto stuff. There’s no commitment to SQL in that world. People are looking for the newer, better language, the newer interaction model, and these kinds of things. It’s easy to adopt Fauna if you already use GraphQL in your organization in particular, because we offer a native GraphQL interface.
I think one mistake people sometimes make is they’re like, “Well, how do I migrate my existing Postgres cluster to Fauna?” You can if you really want, but it’s not what we’re really designed to be. We’re designed for new workloads and for augmenting the existing systems. So if you have a big Postgres cluster, whether it’s in the cloud or whatever, leave it alone. That’s okay. If it works, great. But maybe you don’t wanna continue investing in it, you don’t wanna run the risk of altering tables, you don’t wanna deal with provisioning more hardware for new use cases and that kind of thing. Use GraphQL in a federated system, like Apollo, to augment that with Fauna, and put your new data, your new applications, your new product features in the new stack, and let the old stuff alone. That’s more the Fauna model.
You know, that’s a fair line in the sand to draw, especially considering you’re taking on large problems, and the difficulty of providing migration paths for the older technologies, or adaptors, or whatever it would be, will take you off of where you’re trying to go with Fauna, and saying “Yeah, well, we’re for new things, and you could slowly adopt us by doing the set of things that you could do to keep that thing running, and augment, and put new stuff here… Maybe slowly transition… Maybe never completely transition off of it”, but have brown (is that what you call them?) brown path…
Green field, brown field…
Green field, yeah.
Leave your brown field alone, and here’s your green field over here, and it’s built on Fauna. I think that’s a decent place to position yourself. But what about all the free offerings out there? So Fauna maybe has some free as in beer for a little while… I actually didn’t look at your pricing page yet, so let me know exactly how that works. But free as in freedom is also important, and transparency… To some folks, Postgres is a different kind of database, but definitely the same in terms of trying to be your primary general purpose data store that Fauna wants to be; not built anywhere near the same way… But the price tag there is zero. There’s a lot of that out there… And for databases as a service, or data APIs as a service, as it grows and becomes productive and useful, it’s gonna cost you. So how do you compete with free?
So we are also free. In fact, we can be more free than other cloud database vendors, in particular like the – the Fauna architecture, it’s a true API, it’s multi-tenant, the isolation is internal to the database kernel… So you’re never paying for idle capacity.
With most cloud databases you sign up and you get like a 90-day trial, because they deployed some container or VM and it’s costing them $100/month. That means some salesperson looked at your email, decided you were worth spending $300 on a database date and you get your 90 days. Then after that, they call you and they’re like, “Do you wanna pay or do you wanna go away?”
Fauna - we have no fixed costs for a new user. We only have to pay for resources actually consumed. So anyone can sign up for Fauna for a free forever database. You don’t need a credit card, or anything. Then if you start to scale, then you can start to pay for it by the resources consumed. So the actual economics of it are much better, both for us as the vendor and for the customer, than your typical managed or containerized deployment of any kind.
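[Editor's sketch, not from the conversation] The economics argument is simple arithmetic: a provisioned trial costs the vendor a fixed amount per month whether or not anyone uses it, while a metered, multi-tenant service costs nothing when idle. The $100/month figure is from the conversation; the metered rate (50 cents per million operations) is a made-up number for illustration and is not Fauna's pricing.

```python
def provisioned_cost_cents(monthly_instance_cents: int, months: int) -> int:
    """Fixed cost of keeping a dedicated container or VM running."""
    return monthly_instance_cents * months

def metered_cost_cents(ops: int, cents_per_million_ops: int) -> int:
    """Usage-based cost: proportional to operations actually consumed."""
    return ops * cents_per_million_ops // 1_000_000

# A trial instance at $100/month for roughly 90 days, used or not.
trial_cost = provisioned_cost_cents(100_00, 3)

# Hypothetical metered rate: 50 cents per million operations.
idle_cost = metered_cost_cents(0, 50)            # a signed-up user who does nothing
light_cost = metered_cost_cents(2_000_000, 50)   # a small hobby project
```

Under these assumptions the idle user costs nothing and the light user costs about a dollar, which is why a no-credit-card free tier can be sustainable on a metered architecture.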
[52:06] I think some people do care about open source. Fauna is not open source, it’s only cloud; it’s only proprietary. But the majority of the market, in our experience, cares about the interface more than they care about the codebase. The number of people who are gonna crack open their database and make some fix is very, very small. Most people treat databases as some kind of artifact that’s handed down from the gods… Opening it up and changing the implementation is the last thing you would ever try to do. You’ll exhaust every other opportunity to fix your problem before you would take that risk.
Did you all patch MySQL at Twitter?
Oh, yeah… [laughs]
Yeah, I’m sure you did. So at a certain scale, those people do exist. But I agree, they’re on the margins. But they’re also big customers, right? I mean, Twitter would have been a huge customer for Fauna.
Twitter would be a great customer, but in reality, that class of company is nobody’s customer. Google, Twitter, Facebook, Salesforce, LinkedIn, whatever - they all have the capacity to build completely custom databases in-house if need be. So you don’t get the kind of vendor-customer relationship you do with someone who really needs you, which is the vast majority of the market. And I don’t mean that in like a leverage – not in like an Oracle way, like “Oh, you’re stuck with us now, so now you have to buy us a yacht.” Just like a collaborative way where you’re working together to solve the problem in the platform together. And I think a lot of people aspire – they want their companies to grow and be a Twitter, but if you get to that point, you’re gonna be building custom stuff most of the time anyway. We’re designed for the typical company. The small team which is trying to get something to market quickly, the mid-market company which has a lot of products that they need to extend and augment, the large company which may have an internal system which is custom, but it’s also building other apps like IT apps, or new projects, and they need something which is easier to deploy in particular.
We see a lot of usage where there is kind of a classic IT organization – this is similar to the Mongo story, in some ways… There’s a classic IT organization, they have the official way to do things, but it takes a lot of work; file your JIRA tickets, get your machine provisioned, you have to justify everything to everyone, and there’s no place to run experiments in that world. If you wanna build something new quickly, you get off-the-shelf tools, and we’re trying to be the most usable and fastest to market off the shelf tool for people building modern applications.
Did you evaluate the open source nature of it? There’s some companies who’ll use it as a – I guess community adoption, there’s a lot of things around there. You’d said that nobody is gonna crack open their database codebase and start wielding it… Jerod used Twitter as an example - sure, at that scale you probably would… But did you evaluate the goodness you get from being open source? Like, the public good almost; the commons that people talk about and refer to often. Did you evaluate that? Because you’ve gotta think about it from a business standpoint, right? You’re building a business primarily; you’re not necessarily building an open source product, you’re trying to build a business. So when you evaluate that, you think “Well, could some of this or should some of this be open source, 1) as market leverage, and 2) maybe developer adoption?” But if you can do the free and no cost to you and just have it simply metered, maybe that’s the best of both worlds, but did you evaluate the criteria of open source deeply and just simply say it wasn’t required to build the company you wanna build?
We did, and we continue to evaluate it… Because the market changes, and what people need changes. We decided what people want are the benefits – there’s a certain section of the market which is religious about being open source. Like, fine. Those people don’t use Amazon either. They might use hardware and deploy Postgres to it, but they’re not using Aurora, they’re not using Dynamo, they’re not using Azure Cosmos DB in Microsoft’s cloud. And those are all effectively proprietary systems.
Yeah.
[56:06] The benefits you get from open source, the things that really made it take off, especially in the ‘90s with LAMP, were that it was free to try, and it fit with the rest of your development environment. And that fit really means standard interfaces. So the things we value about open source that we try to replicate are that free to try experience, the local development experience; we do have a Docker image so you can run a single-node copy of Fauna on your own machine, to develop against it without having to deal with the cloud… And the interface standardization, which we’re working to improve, both in GraphQL and in FQL, and most likely eventually other query languages, too. If you have that, who cares if you have the exact same code your cloud provider is running? You have code you can run locally if you need a local edition. You have interfaces that people are familiar with and understand, and because of the unique architecture and so on, you have economic benefits from the deployment model, and the vendor pricing, and so on.
Good answer. Anything else, Adam?
Not necessarily… I can somewhat agree with you. I mean, there’s no real benefit, because you have to think about what you’re optimizing for as a company; what you’re optimizing for as a company is to build a successful product, a successful database that solves the technical challenges first, not the “must be open source” challenges as well. It’s just very common for a database, because of security and different things involved in it, whether you want contributions or you think it’s viable for people to see the code or not, it seems to be “the way”, even if you become source available, like with SSPL or the Business Source License, where the code is visible. I was just curious about how that shifted for you and how that played out for you.
Yeah, there’s a slow trend away from that model. The thing that people really hang on to is the sense of transparency and trust they get from the open source nature of the database, the portability, the idea that you can switch from one vendor to another is important to a lot of people. In reality, switching an operational database is painful no matter what, even if in theory – you know, going from one version of Postgres to another version has problems, let alone going to a completely different implementation, or from one cloud vendor to another. Everything people use in AWS RDS or Aurora - it’s heavily customized, to the point of being unrecognizable at this point, compared to the open source editions of the database… But people definitely still value those aspects of the open source experience, and occasionally they ask for it…
But I think as we move in particular to a wider variety of cloud databases composed around these standard and proprietary interfaces in particular, one of the things that we’re working on that we’re excited about is better ability to query the same data from different query languages in the same database. At that point you don’t care so much whether that particular implementation is open source, and we’ve found people value the time to market, the operational experience, the pricing and cost benefits, the unique capabilities a lot more than they value being able to fix their own bug, or being able to put the source code in a vault in case some data vendor goes away.
What do you think the biggest challenge is for you right now, given the place of the market? Business-wise, even though you’re CTO now and you’ve hired for a CEO, despite that - not saying you can’t play a role in those by any means, but… Talk about future funding… What’s the biggest challenge you face technically or business-wise right now?
[59:49] I think at this point the biggest challenge is really keeping up with our customers. Building a database is a slow process. It’s not that kind of slapdash code development you would typically see at an early stage startup. But that doesn’t mean the market goes slower to match you. We have tons of growth on the platform, lots of people pushing it in new and unique ways, and they also want a lot of new capabilities, stuff that’s been on our roadmap for a long time that we still have to deliver.
There’s no one else really doing what we’re doing in the market, and that means the bottleneck to our growth, to satisfying our customers, to giving everyone a better Fauna experience is really us and our ability to execute on the vision that we laid out several years ago.
So it’s time to accelerate, basically…
Yeah, exactly.
What’s on the horizon then? You mentioned your customers are not slowing down by any means; that means you have to move faster to keep up, even though you can’t move fast, because – or not so much can’t, but it’s not by nature the way you build a database… What’s on the near horizon? You mentioned some features that are specific, that customers want. What’s on the six month roadmap that might be coming to fruition sometime soon? Give us a tease of what’s the future like.
Our focus is twofold. It’s on maturity and resiliency for customers who are already successful on Fauna. We launched the Region Groups feature earlier this year; more region groups launching… We’ll have a better backup and restore capability, that’s more under direct user control, that kind of thing. More compliance for different regulated industries… We did GDPR, we did SOC, and there’ll be other ones coming. So the kind of things that help you grow once you’re at scale, and like you said, have trust in the database.
Then at the same time, the other area of focus is really the adoptability. Making FQL easier to use, making GraphQL more standards-compliant, eventually building other popular query standards on top of the same database kernel. Making sure that Fauna is always the easiest thing, both operationally and in terms of the development experience to build your new application or your new feature on.
Is there anything that we haven’t asked you that you’re like “Man, I really just wish they would ask me about these things”? You’re speaking to a developer audience, potentially future customers, or at least curious about what you’ll solve in the future. They’ll pay attention… Is there anything we didn’t ask you that you wanna close on?
Yeah. We talked a lot about trust, like “the database is this scary thing that can never be changed.” There’s no risk to trying it out. So I’d just encourage people to go to the website and click the sign-up button. Database provisioning is instantaneous. You can go through the tutorial, play around with the GraphQL and the FQL interface and see if you like it, and give us feedback if you don’t.
You’ve mentioned the Free Forever before… So you’ve got a Free Forever monthly plan, you’d mentioned the Docker image you could use locally… Does that Docker image locally require a sign-up, or is that something you can just pull down from Docker Hub, or whatever?
It’s not an authenticated package. You can get it and run it.
Okay. You can try without signing up if you want to then, through the Docker image.
You can. It’s just the email, or you can use GitHub or Vercel or Netlify identity to sign up as well.
Cool. Evan, thanks for the deep-dive into all things Fauna. We really appreciate these technical deep-dives. Going back to the white paper, Dr. Abadi that you’ve mentioned as a board member for you… We’ll link up the blog post that we’ve kind of referenced to some degree in this call here, in our show notes. The Trust page, of course, and any other links we can think of that make sense… But Evan, thank you for your time. It’s been awesome.
You’re welcome. Great to be on the show, great to meet you.