Go Time – Episode #103

All about caching

with Manish Jain & Karl McGuire

Guests

All Episodes

Manish Jain and Karl McGuire of Dgraph join Johnny and Jon to discuss caching in Go. What are caches, hit rates, admission policies, and why do they matter? How can you get started using a cache in your applications?

Featuring

Sponsors

KubeCon + CloudNativeCon – The Cloud Native Computing Foundation’s flagship Kubernetes community conference which gathers adopters and technologists from leading open source and cloud native communities. Learn more and register — get 10% off with the code KCNACHANGELOG19 Feel free to use the Convince Your Boss letter in part or in full so you can your team can attend.

TeamCity by JetBrains – Build and release your software faster with TeamCity — a self-hosted continuous integration and delivery server developed by JetBrains. TeamCity is super-smart at running incremental builds, reusing artifacts, and building only what needs to be built, which can save over 30% of the daily build time. Learn more at teamcity.com/gotime.

LinodeOur cloud server of choice. Deploy a fast, efficient, native SSD cloud server for only $5/month. Get 4 months free using the code changelog2019. Start your server - head to linode.com/changelog.

FastlyOur bandwidth partner. Fastly powers fast, secure, and scalable digital experiences. Move beyond your content delivery network to their powerful edge cloud platform. Learn more at fastly.com.

Notes & Links

Edit on GitHub

  • ristretto - a high performance open source Go cache
  • caffeine - a high performance caching library for Java that was part inspiration for ristretto.
  • TinyLFU - a paper discussing a highly efficient cache admission policy adopted in many modern high performance caches
  • BP-Wrapper - a paper discussing a way to improve lock contention for caches and databases

Transcript

Edit on GitHub

Hello, everybody! Welcome to Go Time. I’m here with Manish Jain [Jane], or Jain [Jean]…

Manish Jain [Jane].

Jain [Jane], sorry. And then I’m also here with Karl McGuire. Karl, do you wanna say hi?

Hey, everyone.

And Johnny Boursiquot.

Hello there! Good to be back.

And I am Jon Calhoun. Today we’re gonna be talking about caching. We just wanna talk a little bit about what it is, to start, why it’s useful, that sort of thing. And then Manish and Karl are both from Dgraph, and they’ve recently released a caching library, I believe… Is it a library?

Yes, it is a library.

Yes. So they released a caching library that we wanna talk about a little bit, so you guys can learn a little bit about what they learned building it, why they built it, what problems it solves, that sort of thing. Okay, so to get started, do you guys wanna tell us - or anybody, I guess - talk about what caching is and why it’s useful?

Computer systems these days are limited by the speed of the internal components, and the fastest component that any computer system has tends to be the RAM. After that, lower than RAM would be SSDs, and then comes hard disks. Systems in general have a problem of trying to store the data in a cheap possible way, while also trying to make the requests as fast as possible… So you are doing juggling between keeping data in a RAM, which is more expensive, quite limited, versus keeping data on disk, which is cheaper and you can fit a lot of data in there.

So the job of a good cache is to try to keep the data in RAM, so that any future requests can be served faster than having to read it back again from any disk. Caches are typically judged by – the terms used are hit ratios or miss ratios. A typical hit to miss ratio would show how effective a cache was in serving that request from the RAM, instead of going back to the disk, or any other system outside.

One of the things that is worth also level-setting here is that we’re talking about a caching library, not a caching server. A lot of developers are typically in the mindset of thinking that “Well, maybe I’ll use Redis”, which is a popular caching server, “…or maybe I’ll use some other thing”, along those lines. But what we’re talking about here is not something that’s gonna go for the network, this is something that’s on host, correct?

[00:04:06.07] That is correct. The idea of ristretto was to be used within our other systems, like Badger, which is the embedded key-value database, and more importantly in Dgraph, which is a server which you can go over the network with… But again, we wanna make sure that we are being effective in our request resolution.

Now, as you mentioned, there is Redis, there is Memcached, which are essentially caches, but over a network interface, so you can dedicate an entire system just for the cache itself… And funny enough, Google’s web search index, the top tier of the index is running in this thing called Mustang, which is completely in RAM as well. I would say a good cache like ristretto could be made to work as a network system, but that’s not what it does out of the box.

We talk about using an in-memory cache… It’s not necessarily new, but I feel like more recently people are starting to use them for much, much larger datasets. Do you think that just has to do with the fact that RAM is getting cheaper, and it’s possible to stick much larger datasets into a cache, or are there other reasons for that?

I would say RAM has definitely gotten a lot cheaper than before. At the same time, I feel people are just more willing to dump the data into cache these days because of how advanced these systems have become. Redis can do quite a lot of things; it can literally become your data structure, it can add to lists, it can do maps, inserts… Not that I have personally used Redis at all, but I think a lot of it probably also comes from how effectively Facebook would use Memcached, and use it in front of all of their SQL queries. I think just by how willing the big companies have been and how generous they have been using in their caches, people are more willing to use the cache as well.

You talked about having a cache in front of a SQL database… In this day and age, where a lot of people talk about NoSQL and things like that that scale more, is that as much of a concern, now that you can realistically use a cache of some sort, rather than jumping straight to a NoSQL database? I guess what I’m saying is is the database decision, trying to get something that’s highly scalable, as important now that you probably could realistically get pretty large just using a SQL database, and throwing caches in front of that?

Caching would only take you so far. Actually, any good multiple version concurrency control system - it becomes very hard to use cache in systems like those… Which includes Dgraph. Because every transaction could return slightly different results based upon what happened just before. So I would say at least Dgraph - and I think any good database - would try to avoid doing query-level caching; they would only do some data-level caching, and even then would have to be sophisticated about it.

Now, I think the argument about “Hey, why don’t I just use a cache in front of SQL?” instead of having to use NoSQL, or having to use a graph system - they provide different things. The functionality of a graph database, for example, can be quite (I would say) evolved. I’m obviously biased, I don’t wanna upset any SQL people… But it just gives you a lot more functionality, and it’s hard to achieve that. Caching would not get you there.

[00:07:51.08] On top of that, caching across multiple systems is also a hard thing because of just the race conditions involved, and so on and so forth. Memcached, for example, gives you a CAS - compare and set - counter, so you know that if two different systems are trying to update the same key, one of them would fail. It almost becomes like a transaction, but at a lot more atomic level, at a key level.

So then if you’re putting your cache across systems, you have to deal with those kinds of issues, and the more you deal with these things, the more complex your code becomes, and so on and so forth. I think caching helps, but it is not a replacement for the different functionality offered by different databases, and the scale of these databases, and so on and so forth.

Say I have an application that is a service, and on my host it’s using the caching library to cache something… If I have multiple services that each have their own cache, is it possible that I’m - depending on how you use it, I would imagine, but is it possible that if I hit one service, it’s gonna have data that another service might not have… But because you can’t control which host you’re gonna hit, therefore you can’t control which data you’re gonna retrieve from which cache. So how do ensure that the same data is in every node, when you’re dealing with the cache on the host itself?

I think a good cache + database system, let’s say running on a single server, should appear seamless to the caller. So even if they’re calling multiple different servers for (let’s say) multiple different sets of data, the cache should be smart enough to make sure that you are getting the latest version of the data without the systems having to know about the cache. So the systems themselves should be completely unaware that the other system might be using a cache. That’s how I think a good cache should work like. Now, obviously if you’re running cache servers which are running outside of these systems, things become a bit more complicated, with the race conditions etc. But if you’re actually putting cache on the host itself, you as an outside entity, outside client, or on the server, you would just make the calls as you would, as if there was no cache, and you should expect the same results.

So from an application developer perspective, I should expect that it’s quite possible that if a particular client happens to hit a service that’s in a host that hasn’t perhaps cached a particular piece of data yet, that it’s gonna be a little bit of latency while the data is retrieved and put into memory and then returned, and then subsequent hits from the client could hit a server that either already has or doesn’t have the data, right? So that should be part of how I should think about this as a developer.

That’s right, yeah. And sometimes if you play with (let’s say) Postgres, and you will shoot a query to Postgres, you can see the first query tends to be relatively slow, but then the queries after that become extremely fast, and that is just the magic of the cache. I’ve seen this setup on systems where people would build this cache warm-up mechanism when they’d run their servers, so that it would pick up what they think would be a decent initial set of data, and then over time it would just improve to hopefully increase the hit ratios, essentially. I think that’s what any cache is going for - to be utilized as frequently as possible.

To be clear, things like hit ratios and stuff like that only truly come into play when you don’t have enough RAM to store everything, correct?

That is correct.

Okay. So for anybody who’s not familiar with caching, sometimes you can be lucky early on, where you can query an API and get some sort of data, or whatever it is that happens to be pretty static, and if you can store it all in memory, your cache implementation almost doesn’t matter that much at that point, because it’s literally just “throw it in memory and keep it there.”

[00:12:02.19] I’ve actually done this myself, where I’m hitting a couple of things and pulling the data, and then I’m basically rendering markdown that’s rendered in HTML from that point on, so I can just store the HTML and I never have to hit that API again. So the first query is kind of like what Johnny was saying - it’s slower hitting a SQL database, but after that point it’s very fast. So when Manish starts talking about having good hit ratios and stuff like that, what he’s referring to is the fact that when you get to a point that not everything fits, you have to decide “What do I throw out and what do I keep?” and that becomes a really complicated problem, because you never truly know what people are gonna need next.

Yeah. Along those lines, I’m hoping you’re gonna get into the caching validation strategies that you use as well to do that performant jettison that Jon was talking about. I think there are some stories there you can probably tell with regards to the latency that’s involved in there.

Jon, I must say that I envy you when you say that your cache did not hit capacity, and you could just store everything in there. That would be a great world to live in, where you can just put everything into RAM and never have to worry about it. All the queries are super-fast, everybody is happy… But yes, unfortunately that’s not the case. I will give you an example - in Dgraph we deal with terabytes of data, and the RAMs, even the most generous RAMs, would be (let’s say) 64 GB, and some of them have had about 128 GB of RAM available in the system. Now, that’s actually pretty generous, right? I wouldn’t expect every person to give us a system with 64 GB of RAM. In any case, it is still limited, and that’s when we run into the capacity of the cache, and that’s when we have to figure out clever ways to determine what we keep and what we kick out.

Predicting the future is extremely hard, but you basically just learn from the past and try to see what would be valuable. That’s what we have tried to do with ristretto in our implementation.

Can we start with some history? Can we talk about some of the more basic caches that people started trying, started out with, to figure this stuff out? I think one that most people have probably heard of is just a Least Recently Used cache, which is a relatively simple idea of something in memory that whatever object has been used least recently, that’s what you evict whenever you need to replace it with something. I think that one’s even common enough that I’ve seen it pop up in interview questions, which is slightly crazy… But it does pop up in interview questions, and I think Java even has a linked HashMap in the standard library, which is essentially a Least Recently Used cache. It might not be the most efficient one in the world, I’m not sure, but it essentially serves purpose.

So obviously that’s a model people can use… Why does that not work at scale? Why is that something that – it’s relatively simple to understand, I think, where you’re just keeping track of what items were used more recently… But why does that not end up working at scale when you’re getting in large datasets?

I think before we begin the discussion I should probably explain the scale. In this case, by scale at the internal system memory level we’re talking about scaling in terms of the number of cores, the number of goroutines, the number of concurrent lookups that could be happening… As opposed to when we talk about database scale, we talk about different machines and how much terabytes of data you can keep. So scale in this case is the number of concurrent accesses that could happen…

[00:15:59.05] So we tried in Dgraph a bunch of different techniques. The simplest thing that anybody could do is take a map in Go, put a mutex lock around it, and then for every get you just acquire the lock and you do the retrieval. Now, that would work, and that works very nicely for some basic use cases with low concurrency, but it becomes a hard challenge on what to evict and when. If you do it badly, you will directly affect your hit ratios, which means that things would actually slow down… Because note that a cache can also slow things down, right? Cache is an extra step that you have to do. Not only do you have to retrieve the data from the underlying hard disk or system, you also have to first check in the cache if the data exists, and then later on put it into the cache. That lock acquisition and release can become a source of contention, as we’ve found in Dgraph.

In Dgraph what we had done was we took the LRU implementation by groupcache, run by Brad Fitzpatrick of Memcached team and obviously the Go team. It was obviously a very nice implementation of LRU cache that we picked up. We put a lock around it and we started using it, and we knew that we’d have to optimize it at some point, but we did not realize how bad it was.

At some point I was looking at a particular query - this was one year after implementing the system - and we realized that if we were to remove the cache, our queries would improve by five to ten times… Even a 30% query improvement is a good day for an engineer, but when you increase it ten times, that’s just incredible. So we immediately removed the cache and we started to look around to see what we could use. That’s when the whole idea for ristretto started.

Obviously, you’ve built this for your specific needs… But I’m assuming that you also thought of this as like a more general purpose library as well. How did you go about deciding “How are we gonna test this? What metrics matter the most for us?” Because I’m assuming it’s like most software, where there are some trade-offs. It’s really hard to have the best of everything. So when you were trying to design that, was it just mostly focused on your specific needs, of a lot of concurrency and a lot of queries like that, or did you just sit down and come up with a generic set of requirements?

We felt like if we were to solve this problem, we should do it in a generic enough way that it would be generally useful to the Go community. A lot of times I tell my engineers that we stand on the shoulders of giants. There are people who have already solved a lot of these problems, and our job is to learn from them and then decide how much of that we should be using, and if we should be introducing new things of our own.

We’ve done that, for example, for distributed transactions in Dgraph; we picked up from multiple different papers - from Spanner, from HBase, from Bigtable transactions, and so on and so forth. And then we ended up devising something which is a mix strategy of all of these. In caching it was not different. We came upon caffeine, which is an extremely efficient, fast, concurrent cache in Java… And it’s being used by multiple databases in Java, including Cassandra, Neo4j, and any big Java system. We reached out to the author of that cache Ben Manes, and Ben has been extremely helpful in helping us understand his implementations. He’s written multiple papers about it… And also to help us write our version of caffeine, which is what we’re calling ristretto.

[00:20:12.13] Now, we did not pick up everything from caffeine, because caffeine had been around for a while and they’re more sophisticated, I would say, than ristretto is… But we came up with an initial good set for ristretto, and I think some of the benchmarks that caffeine had already done around concurrency, around hit ratios etc. - we learned from that.

Now, we wrote a blog post about this before we started talking to Ben about the state of caching in Go, and for that we just showcased all the different caches that are available in the Go ecosystem, and just compared them, and we wrote some benchmarks for that, which were around throughput of the cache etc. So we sort of improved those benchmarks, we picked up more benchmarks from Ben, wrote them in Go, and that became sort of our guiding light.

So I would say ristretto is designed in a way where it is generally useful for the entire Go ecosystem. That’s when Karl came into the picture - he was recommended by Ben, and he came in and just started executing.

Break

[00:21:34.13]

So Karl, how exactly did you start executing? What were you working on?

Well, actually Ben found me on GitHub, and one of the papers that Ben co-authored with a few other people was called TinyLFU. So we’re talking about the cache metadata as far as determining item value, like what you should evict, what you should let in… TinyLFU was published late 2015, and it’s called an admission policy, which I haven’t really seen much of as far as in the literature. We all have heard the LRU eviction policies, and then the TinyLFU paper was basically a new way of deciding what you let into the cache, with a small memory footprint, and the eviction policy wouldn’t even matter; it would just increase the hit ratio.

I was writing my own implementation. Of course, Ben was looking around on GitHub, and I got linked up with Dgraph. Since then, we’ve kept the TinyLFU admission policy, and we’re actually using the same counters for admission and eviction. So rather than doing just standard LRU eviction, we’re doing the sampled LFU eviction, which we’ve seen some work done in Redis along those lines, and I think it’s performing pretty well so far.

Talk a little bit more about the admission policy decision… I must admit, that sounds very unusual from what I’m used to in caching systems. Loosely, what is it based on? Is it the frequency or the likelihood that something’s gonna be asked for, or what is that?

It’s based on the access counters. You can think of it – each item, when you try to set a new item, it could either be accepted or rejected. So the TinyLFU admission policy will reject the items that it doesn’t deem valuable. And to do that, we keep access counters for probably – I guess you’d call it like a ghost cache, some metadata for items that aren’t necessarily in the cache. So if we see an item that someone tries to add in multiple times, and we see that it’s valuable enough, eventually we’ll let it in.

The idea is that the eviction policy doesn’t exactly matter. As long as the eviction policy is good enough, the TinyLFU admission will give us a 10% boost on the hit ratio.

To make sure I understand this right, that would generally mean that if you have some sort of new data, that was just introduced in some way, that likely the admission policy is gonna reject it the first couple times, so you won’t see any performance gains. But at some point, if people keep trying to hit that… I guess a good example would be if you had a new top story on Hacker News and everybody is trying to hit it, the first few times it might not be, but at some point it’s gonna end up getting cached, and then because it’s kind of learning “Oh, this is important. This is something I need to cache…” That’s how it would work?

Yeah. And the TinyLFU paper also has this – it’s a freshness mechanism. So if you think of an item that – well, if you just think of the long tail distributions, the really popular items, new items wouldn’t really be able to compete with them. So the freshness mechanism essentially halves all the access counters for each period - which doesn’t really matter… So we have the counters, and eventually new items do get the chance to go in.

It’s about 12 bits of overhead for the amount of counters we have, for each counter, and Ben Manes has done a lot of documentation and research on it, and I think the benefits are pretty interesting… Because like you said, the admission policy isn’t really anything that I’ve seen, and I think for a modern cache it’s pretty much a no-brainer.

[00:27:54.09] I find the concept pretty interesting, because like we see with Reddit, with Hacker News, with websites like that, they’re essentially doing the same thing, but it’s more of like at a visual level, like for people, to make sure it’s something that they actually care about. But I guess it’s unique to see that applied somewhere else, like in caching, where you might not see it, but realistically it does sound like something that would make a lot of sense, because what the data people care about today is not necessarily the data they’ll care about in two weeks, especially for some websites.

As far as that stuff goes, do you allow users who are using – like, if I’m using a caching library, is that the type of thing that I could customize? That refresh period, that sort of thing? Or is this something that you fine-tune once and just work with it?

We obviously have a configuration for ristretto. You can configure the number of counters, which – since we keep metadata for items that aren’t in the cache… So you could have so much ghost counters, I guess, that it might increase your hit ratio. I guess you can sort of fine-tune it… Right now we found that the amount of items you expect to be in the cache, if you multiply that by ten, so 12 bits for each item, you find a pretty good boost on the hit ratio.

But just to answer that question, yes, I think we do allow a bunch of different options in how you can configure your cache. We have this concept of lossy buffers… Because again, I think the big thing about ristretto is that it scales really well, which means that if you’re doing a lot of concurrent accesses, the cache should not slow down your system… Which is the biggest issue we were seeing with Dgraph; when there were a lot of concurrent accesses - and Dgraph is a highly concurrent system, and graph queries can return millions of results in the intermediate steps.

So you’re trying to access millions of keys concurrently, the locking on the cache becomes a bottleneck. And one of the big things that we wanted to avoid with ristretto was to even in the case of high contention and high concurrency, the cache should deteriorate in terms of hit ratios, but not in terms of the speed of the cache. So we allow options of how many things that you need to batch up before they get applied.

For example, when you’re doing tracking of the access counters for the gets in the cache, for every get you need to update a counter. Now, if you were to do it in the simplest possible way, you would acquire a lock, you would update the counter, you would release the lock. Obviously, that’s not gonna scale if you have a lot of concurrent gets, so one thing that Karl did there which was really interesting was that he used Sync.Pool to build up sort of like a stripe system for a buffer of gets.

One of the options that is present in ristretto is that you can buffer up 64 gets before the stripe gets applied internally by acquiring a lock. And I think the throughput of that call was pretty high compared to some other things, right?

Yeah. Compared to just a naive channel implementation, the sync pool was probably five or ten times the throughput, just because of the – well, we have a pretty unique use case, but the sync pool internally uses thread-local storage, and per processor, so we don’t really have access to that outside of the standard library. So the sync pool for our use case, which is basically we get a buffer, we get a stripe of the gets, and then eventually we drain it; draining is essentially acquiring the lock and incrementing the counters. The sync pool works very well for that.

[00:32:06.12] And actually there were some GitHub issues that we pointed to another blog post, where people are asking for that thread-local storage, and of course they can’t have it, so hey, the next best thing is to use what Go people have written, which is Sync.Pool.

Ristretto actually is an interesting collection of a bunch of these – do I want to say “hacks”? They’re not really hacks, but they are just interesting ways to get around some of the limitations of the Go language, to increase performance, I would say.

When you have to do these things, like when performance is absolutely necessary and you’re trying to make all of this work as well as possible, one of the things I think – I mean, you guys aren’t on the show, but Johnny and I talk a lot about making your code readable and easy to maintain. Would you guys say that your code suffers from that a little bit as a result?

No. And I can say that very confidently, because I am actually a big – I hate technical debt; in fact, the way we run things in Dgraph and all of our projects is that we consider user feedback to be the top priority. Then comes bugs, then comes refactoring, and then comes features. So if we have a choice between refactoring a code versus adding a new feature, we will go refactor the code first… And if your code is clean, features just fit in; they just fit in like a block. So we spend a lot of effort on doing code reviews. I personally do a lot of code reviews for the growing team of Dgraph… And we’re always trying to find the simplest possible way.

Even these interesting, nifty things that we’ve done in ristretto, if you look at the code, the code is extremely simple to understand. In fact, I think that other engineers could potentially pick up some of these techniques in their own code, and learn from our little design things and implement it in their own codebases.

You’ve talked about some of the things you’ve learned, like reading the TinyLFU papers, and talking with Ben you’ve learned some stuff… I suspect you’ve also learned things on the other end of the spectrum, like things that you shouldn’t do, or you probably tried some things and then realized that didn’t work the way we expected… Do any stories or experiences stick out in your memory, anything that you’d like to share?

One thing that he mentioned was Hacker News before - we said a new entry comes in Hacker News and you obviously want it to be serviced quickly. Now, I think if you were to look at the distribution of keys in that case, or distribution of excesses in that case, you realize that the top ten or the frontpage of Hacker News has exponentially more clicks than the second page of Hackers News, or the third page of Hacker News. And one of the big things that we learned a while - even before we started building ristretto - was that there is a Zipfian distribution of keys, which means that the most frequent keys are accessed exponentially more than the less frequent keys… And therein lies most of the downsides of current caches; they would end up hitting – even if you were to shard your data, let’s say… You shard it, you put 32 shards, and you have a lock around it, you will end up hitting the same shard over and over again, because the few keys which are being accessed exponentially more times will actually end up on that chart.

So some of the typical strategies of “Hey, okay, we have a LRU cache. Why don’t we just split it up into 32 LRU caches, and we’re gonna use that?”, you end up going to the same shard, which means you end up having the same contention. So one of the things that we wanted to avoid was for a Zipfian distribution of keys - we are able to spread that around nicely.

[00:36:14.03] Some of the things we did with Sync.Pool - even if you are hitting the same key over and over again, you don’t end up in the same shard or the same buffer on Sync.Pool… Because Sync.Pool is gonna give you something randomly. It’s gonna just pick from one of the items that it has, it’s gonna give it back… So we avoid that contention at that level. So these are some of the things that we learned.

The other thing that we learned was, again, going back to the Go runtime - it’s such a beautiful, marvelous thing - we wanted a fast way to get a hash. So instead of using – I think we were using a form of farmhash by Damian Gryski and we’re using it in many places in Dgraph… We realized that if we were to hook into the MemHash that Go uses internally, things are a lot faster. And once we had that hash, we are now using it for many different things by just doing a modulo of that. So it’s just these nifty things that we applied to solve these common problems.

Given a scenario where you are lucky enough to know ahead of time that you’re about to get a massive spike in traffic, and you’d like to absorb that as gracefully as possible, is it fair to want to be able to pre-populate your cache and actually get the benefits that we’ve been talking about using ristretto?

If you knew - yes, you would absolutely go ahead and do the sets upfront, so that you will just get the accesses. But I might argue that you probably would get them pretty quickly, because again, of the Zipfian distribution of the keys. So I think the first time that ristretto sees a key, its counter is zero. It has never seen this key before, it doesn’t know about this, so the chances of this getting admitted would be zero. But if it comes a million times over, pretty soon it’s going to exceed anything else that the cache has, and that would happen pretty quickly. So it would come into the cache quick enough that you wouldn’t have to do anything specific at your end. It should happen naturally as a system sees this load.

That’s pretty cool.

It’s pretty cool, especially because even at companies like Google I’ve seen some weird practices around – like, when you know a website’s about to get a massive surge of traffic, engineers will do some weird things at times. The one example I can remember is Google Code Jam. I helped one year organize things and run it a little bit, and right before it was about to go live with the competitions, they actually ran a little script that I’m pretty sure was just hitting the server, to sort of get it ready for that influx of requests… And I think that was just a quick, hacky “This will get it ready. It’s fine. We don’t have to do anything else.” But you know, I could definitely see that not being scalable all the time… Like, in that one specific case where it’s once a year or something, it’s not too bad. But the other ones, it’d be much trickier, so it’s nice to have options available.

And talking about predicting future, I think one of the good ways of figuring out how well a cache is doing is we talk about hit ratios, right? So Ben had written this particular future-predicting system, which cannot be built practically, but for tests it’s a great thing… And Karl actually applied that, and called it Clairvoyant. Karl, you might wanna talk about that.

[00:39:57.02] Yeah, I think there’s a Wikipedia article on it, but it’s called Bélády’s theoretical optimum. Basically, the idea is you’d play a trace over this implementation, and then you would run it back and figure out – you can use the future knowledge to essentially calculate the absolute optimal eviction candidates. We don’t have the luxury of that information in the real world, but with the Bélády’s algorithm, when you run it back, you essentially figure out the optimal hit ratio. So when we’re graphing all of these different cache implementations and ristretto’s hit ratio performance, we can use that ideal hit ratio to see how we’re doing and how close we are to the optimal. It’s been really useful. Ben pointed us to that. And caffeine has been really close to it… We’re trying to catch up.

One of the features that I happen to really like with systems like Redis is the automatic expiry of data that is not frequently accessed. It sounds like you’ve got something a bit different going on here because of the admission policy, and the ways you’re choosing to eject data out of the system. Can you talk a little bit about that, how you handle - or whether you even handle expiry at all?

I think at a very high level what you wanna do is you want to evict – if you’re running at capacity, you want to evict something which has lower value than what is coming in… Because you’re always trying to optimize the value of your cache. Now, what is value? That could mean different things for different people. And for ristretto, the value means the chances that we will see this key come again.

In LRU cache you say that the one that was least recently used, we would not see it again. In the LFU, which is the Least Frequently Used cache, we say that if this wasn’t seen as frequently, we have less chances of seeing it. So we set the value to be the estimate of the counter. The biggest thing a TinyLFU counter gives you is an ability to store millions of keys with very little RAM usage. I think it uses, if I’m not wrong, four bits per counter. Let’s say you think about two hundred million keys, you can store their counters in a 100 MB RAM, which is quite a lot. So the more you know about the universal set of keys, the better you can estimate their value.

So, cache running at capacity, everything that comes in should have a higher value than everything that gets out. So the juggling thing that ristretto is doing is that for every incoming we figured out what the estimate is. If we are at capacity, we try to create a sample set of what could be evicted, and try to find the one with the minimum value. And if the entry has a higher value than the one with a minimum value, we’ll admit the incoming and evict the one. Otherwise, if this one has a lower value than the one which is going to get evicted, we will reject the incoming. I think that’s the novel concept that is not present in typical caches, including LRU… Because in LRU, at the moment something comes in, it’s admitted, because it’s the most recently accessed, and then it would evict something out. But to actually get better hit ratios, you really wanna be judicious about who you let in.

Break

[00:43:53.18]

If I’m understanding this correctly, when you say you get a sample you’re not looking at all the data, you’re just getting a small subset of it and looking at that. I assume that what that essentially means is even if you don’t admit something right then, if it keeps getting hit a couple more times - because if it is actually popular, that’s gonna happen - at some point that sample will actually show you something where it can get let in. So while it might not be the absolute optimal performance, it’s gonna be pretty good, especially considering that checking everything in your cache is not feasible at all (that would take up way too much time). So the idea here is to kind of play the statistical “We’re trying to be at like 90% or something like that, without wasting a lot of time getting it.” Is that correct?

That is correct. There’s two different things happening here. One is the incoming - one thing that we do is that irrespective of whether we admit a key which is incoming or not, whether we reject it or admit it, we would always update its counter. So we can keep track of how often we have seen this thing, so that it would keep on building its value within our system. So at some point, once the value of this key is higher than the eviction candidate, it can be emitted. So everything just keeps on building value.

The second thing is that - and that comes back to the idea of, “Hey, how complex is our code?” - one way to figure out the eviction candidate is to keep track of all the values of every key, and do maybe a priority key or something, and find the key with the minimum. Obviously, more code, it might be slower, it might have issues because the values are constantly changing… So all we did was we said “You know, Go maps gives you a pseudo-random access to the keys; we already know that, right? It’s not completely random in Go; people have done some tests and they show that it prefers certain keys over others… But it is still random, in some level. So we were like “Hey, why don’t we pick (let’s say) five of these keys that are coming to us at random, and use that to find the eviction candidate with the minimum value?”

As you can imagine, the code is really simple to find five things from a map. We just loop over it five times. But that gives us a pretty good hit ratio, as Karl’s benchmarks showed. So we work within 1% of what would be a priority queue approach to finding the eviction candidate.

I find that aspect of it really interesting, because if you’re studying algorithms or any of that stuff, you learn about things like the traveling salesman problem, and these things that realistically solving them perfectly are not possible. It takes way too much time and it’s way too hard to do. But as engineers, we’ve realized that if you can get within 10% of the best solution, usually the difference is so minimal that it just does not matter. And it sounds like you guys are taking the same type of approach, where for caching it might not be optimal, but optimal is gonna take so much time to verify and to make sure that it’s always there, that being optimal is not actually faster because of all that extra work. So it sounds like that’s a really unique approach, and it sounds like it’s working really well, which is cool.

[00:48:15.20] Right. And I think that’s the one thing that we keep on doing - we like to go for good design, but at the same time we also like to be judicious about “Is this extra design worth the extra code complexity?” So the juggling act of maintaining simplicity of the code, with the performance of the design - that’s very crucial for us at Dgraph, and you will see it across all the different things, including Dgraph the database, Badger, as well as ristretto.

Today ristretto is a library, it’s something that you can import and use into your code, but in my mind’s eye I could definitely see a server implementation of this, even with the network hop, I think it would still be efficient, given certain circumstances. Is there a plan around having a server model for this?

We have been asked about this… If it’s useful to the Go community, or in general to the wider dev community, we would be open to writing something like that. It should be relatively straightforward, because all we have to do is put a network thing on top of it. But then I wonder, “Hey, we already have Redis, we already have Memcached, people are pretty happy with that… Is it worth it?” We just don’t know. We could be convinced.

You might be underestimating developer’s desire for novelty. [laughter]

Yeah, if there’s enough demand for it, we would love to build something.

So when we talk about this type of caching, where we’re getting into slightly more complicated – I know that you’d mentioned that from the developer’s perspective you’d kind of like it to be almost like they don’t know a lot of the details, so that they don’t have to worry about them… Is that true?

That’s the idea. We keep the options to just what they really can understand, and nothing more, yeah.

So that would mean that realistically there’s no harm in using this over, say, some other Least Recently Used cache, or some other naive approach that they could implement themselves… If that’s what they wanna do, I guess implementing it themselves has some merit. But if they’re gonna pull in a library, at that point it doesn’t really make a difference which library they pull in, because they all should realistically be making it easy, so it’s just a matter of the most performant one.

Absolutely. And I feel like the problems that we run into are general enough problems that other developers could learn from, or could benefit from. Again, we are not the only ones, because caffeine already exists, there’s multiple papers about caffeine, it’s already being used…

I think one of the things when we were starting this project was, you know, we are a small company, with limited engineering resources, and what we should be prioritizing is a highly debated thing. But one thing that kept us going about building this cache is that we felt like Java has a lot of interesting things that Go does not have… For example a low-class map, which runs at atomic level. Now, the current throughput that you can get from a Java’s low-class map implementation I don’t think can be matched in Go… But at the same time, we all love Go. Go is an amazing language; it is so simple, it is so easy to use, the code is so readable… What else would you use? You wanna use Java. So a part of our effort was “Let’s bring the Go ecosystem closer to Java’s.”

[00:52:02.09] I always joke in the company that Go is like the Wild West. It has a lot of opportunities. At the same time, if you want something, you have to go build it… So this was an attempt by us to get the Go ecosystem to be at the same level as where Java is.

So would you say today that if I’m your average Go developer and I’m building an application or a service that could benefit from cache, that I should definitely consider ristretto where I would typically rely on other libraries that perhaps have been used for a while? What’s the requirement, or rather how should I be thinking about when to use ristretto?

Use it. Use it if you have one goroutine, use it if you have 20 goroutines or 100 goroutines. I think ristretto is ready to be used. I think we have some bugs in the system, that we already know about, that we’re already working upon, but I think the idea for ristretto was to unite the Go community around a cache which is designed for scalability, designed for performance, designed for better hit ratios… All the things that a cache should aim for, ristretto is going for that. Over time, I have no doubt that ristretto will become the default choice for the Go ecosystem.

More generally speaking, I don’t think that cache is always the best choice for – let’s say I’m throwing together a web application. Realistically, there’s some point where you need to start thinking about caching. And I think that you even said that for you guys, you weren’t using a cache necessarily the whole time, or you had one and it was slowing you down more than it was helping you. So if you were talking to somebody who’s sort of starting up something new and trying to pick and choose where to spend their time, around when do you recommend they start looking into caching options, and that sort of stuff.

I think ideally the system has been built in a way, with a good design, that the latency of the requests to the system are fast enough that you do not need a cache for a while. Dgraph currently does not have a cache. And we released version 1.0, we are at version 1.1, and so on and so forth, and we still don’t have a cache… And it’s performing really well, actually. It’s outperforming a lot of other databases.

So ideally, you build a system in a way where you can go without a cache for a while, because the introduction of cache – caching is a hard problem. You introduce correctness issues, you introduce contention issues, you introduce 20 other things that you don’t even know about. But it’s more like a double-edged sword. It can really get you going really fast, it can really improve the latency that you have; at the same time, you could end up returning the wrong results.

So in general, I’d say just be careful around using cache, but once you know that caching can really improve your latency, go ahead and use it. But do a lot of correctness testing.

As I mentioned earlier in this talk, in a multiple-version concurrency control system, caching becomes particularly hard, because each version has a different state for the same key. Then you need to be even more careful.

One of the things that we’re gonna do with Dgraph is as we introduce ristretto with Dgraph, we’re gonna be running Jepsen tests on it to make sure that we haven’t introduced any new correctness issues to the database. However, some of the initial benchmarks that we are doing do show a very positive impact on the latency numbers. The latency actually is improving because of the cache..

[00:55:57.24] When you talk about running a test this way, it’s not like your standard unit test where you can just test one thing in isolation, I assume. I assume this is something where you have to orchestrate a whole lot of things working at the same time. I guess we have a tiny bit of time left, if you guys have a couple more minutes… Can you talk a little bit about how you went about that testing, and how you made it reproducible, how you made it useful? Because I know from my experience the more pieces you get involved in a system test, the harder it is to reproduce it to actually figure out what went wrong. They become way less useful in some ways, because it’s just hard to actually figure out what broke. So have you guys learned anything from that process, or is there anything new coming from that?

We have multiple levels of tests. We have tests within ristretto for ristretto’s own correctness, and then we have tests within Dgraph written by us, which are around Dgraph’s correctness. Then we have Jepsen tests, which is the third-party distributed systems test for the database, which tests correctness while introducing a whole bunch of edge case scenarios, like network partitions, machines getting lost, processes crashing etc. I think you need all of those, really. You need to test for correctness at multiple different levels, and the thing about correctness testing is that a lot of times you don’t know what you’re looking for. You’re just throwing things at it, expecting them to run. For example, Badger has a bank test, and we run it for eight hours every night, to move money around between accounts and make sure that the total amount in the bank has not changed.

Some of these things you can do directly on the component itself, but I think there’s a lot of value in having a higher-level test, which does not care about any particular component, but just lets you know if something is broken in terms of correctness.

This is kind of like – I think it was a couple weeks ago, Mat talked about security, and in there they talked about fuzzing, and just sending random data. So the idea is to come up with something that can be verified… Like you mentioned, if you have a bank, there’s a total balance that should realistically stay the same. Then from there it’s a matter of just throwing whatever you can at it. So there are ways you can verify that things are working, but at the same time it’s random enough that you can test things that you don’t even know what you’re testing for, which makes that unique in that sense.

Exactly. And some of these tests - they will not tell you which part is broken. And that’s not the job of these tests. The job of these tests is to tell you if that entire system has an issue or not; just that, and nothing more. And then you have individual tests, which will tell you “Okay, this particular part is not working correctly.” And then we go to distributed tracing and all that stuff, which can help you identify the issues… But I think you do testing at multiple different levels, and I find a lot of value in this black box testing, I would say, where the system should act in a certain way, and that’s it.

[00:59:09.02] I completely agree with you. I think it’s also hard when you’re on the other end of it, where it’s broken and you don’t know why it’s broken, so you just wanna bang your head off a wall for a while… So it’s not that I don’t see value in those – I see value in those tests; it’s more of trying to figure out how you actually take the fact that you know a test is wrong, and turn it into something to act upon and fix. That sometimes become a challenge.

I would give you a story about exactly that scenario. I think my last years were spent on trying to fix some of the Jepsen tests. Jepsen is this black box testing scenario, where it would just tell you that the system has a problem, but it does not help you in any way in identifying what the problem is. And I think me and my engineers - we’ve spent many months trying to figure out all of the issues there, and why that issue was being caused. Ultimately, we introduced OpenCensus tracing into Dgraph, and by connecting that to Jepsen’s own tests, we could track it all the way down to the last part. That helped us get an insight into what might be going on, and we had to write these crazy scripts to figure out what was the state of the system at that time. So we had to do a lot of things to be able to understand why the test was failing. It took us a while, but if that test was not even there, we would think that it’s working just fine.

I’m a big fan of OpenCensus. Having that, and open tracing etc. all those things are just incredible. So one way to deal with some of these issues is to add more instrumentation.

Alright. Well, I think that about wraps up this episode of Go time. Thank you, Manish, thank you, Karl, and thank you Johnny for joining us. If you have any other questions, you guys can definitely ask in the GoTime Slack. Manish and Karl, if you guys want to check out that Slack channel and answer some questions, I’m assuming you guys can… But yeah, that sums it all up.

We will. Thanks, guys, for having us.

Thank you.

Changelog

Our transcripts are open source on GitHub. Improvements are welcome. 💚

0:00 / 0:00