In this episode Matt joins Kris & Jon to discuss Kafka. During their discussion they cover topics like what problems Kafka helps solve, when a company should start considering Kafka, how throwing tech like Kafka at a problem won't fix everything if there are underlying issues, complexities of using Kafka, managing payload schemas, and more.
Matthew Boyle: So I think [unintelligible 00:55:00.06] good or bad, but I guess just kind of a brief story of how we've been using it at Cloudflare, which has been interesting… So firstly, I did mention that we've been using Protobuf for our schemas. The support for Protobuf and gRPC in Go is excellent. It's first-class. So that was a good fit and a good choice, and I would make that choice again. So for schema management, Protobuf's definitely worth looking at, especially if you are a predominantly Go place.
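For illustration, here's a minimal sketch of what Protobuf-backed payloads look like from the Go side. The generated package and message type (`eventspb.PageView`) are hypothetical stand-ins rather than Cloudflare's actual schemas; only `google.golang.org/protobuf/proto` is a real dependency.

```go
// A minimal sketch of Protobuf-backed payloads from the Go side. The
// generated package "eventspb" is hypothetical; it would come from a .proto
// file roughly like:
//
//	syntax = "proto3";
//	message PageView {
//	  string url         = 1;
//	  int32  status_code = 2;
//	}
package main

import (
	"fmt"
	"log"

	"google.golang.org/protobuf/proto"

	eventspb "example.com/gen/eventspb" // hypothetical generated code
)

func main() {
	// Build a strongly-typed event; the schema lives in the .proto file,
	// so producers and consumers agree on the wire format.
	ev := &eventspb.PageView{Url: "https://example.com", StatusCode: 200}

	// proto.Marshal produces the compact binary payload that goes onto a Kafka topic.
	payload, err := proto.Marshal(ev)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("encoded %d bytes\n", len(payload))
}
```

The appeal is that producers and consumers compile against the same generated types, so the schema lives in one place instead of in ad-hoc JSON.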
Something else we did is we created a Go library that we call Message Bus. Very creative. And effectively, what we do in this is we use a Go library called Sarama. Sarama was created by Shopify, and it basically allows you to do pretty much everything with Kafka. And I actually credit that library with an awful lot of the adoption of Kafka, both at Cloudflare and elsewhere where Go was involved, because it enabled you to basically do everything that the Java libraries were doing.
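As a rough illustration of how little code a basic Sarama producer takes (the broker address and topic name below are placeholders, and the library now lives under `github.com/IBM/sarama`):

```go
// A minimal Sarama synchronous producer sketch.
package main

import (
	"log"

	"github.com/IBM/sarama"
)

func main() {
	cfg := sarama.NewConfig()
	cfg.Producer.Return.Successes = true // required when using SyncProducer

	producer, err := sarama.NewSyncProducer([]string{"localhost:9092"}, cfg)
	if err != nil {
		log.Fatal(err)
	}
	defer producer.Close()

	// Send a single message and wait for the broker to acknowledge it.
	partition, offset, err := producer.SendMessage(&sarama.ProducerMessage{
		Topic: "events",
		Value: sarama.ByteEncoder([]byte("hello")),
	})
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("wrote to partition %d at offset %d", partition, offset)
}
```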
One really hard thing about Kafka, as we talked about, is configuration. So one good choice that we made as a company, I think, is we made it an opinionated library that kind of sets up a very good set of default settings and constraints for how we think you should interact with Kafka at Cloudflare, and made it as easy as possible for you to do it. It's got a bunch of power user settings, if you will, where you can override what we deem to be the best settings, but that was a pretty good choice, I think. And we added a bunch of Prometheus metrics within that library as well, so it means that everybody who pulls in our library gets this dashboard for free of how their Kafka service is performing… Which was very, very helpful, and again, is another thing I'd recommend doing. It's not Go-specific, you can do it in any language, but we were able to do it with Sarama.
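Here's a sketch of what that kind of opinionated wrapper can look like in Go: pinned defaults, a small escape hatch for power users, and metrics registered once so every service gets the same dashboard. The package, option, and metric names are invented for illustration; this is not Cloudflare's internal Message Bus library.

```go
// Sketch of an opinionated Kafka wrapper built on Sarama and the Prometheus client.
package messagebus

import (
	"github.com/IBM/sarama"
	"github.com/prometheus/client_golang/prometheus"
)

// Registered once, so every service importing the library exposes the same
// metric names and gets the shared dashboard "for free".
var messagesProduced = prometheus.NewCounterVec(
	prometheus.CounterOpts{
		Name: "messagebus_messages_produced_total",
		Help: "Messages successfully produced, by topic.",
	},
	[]string{"topic"},
)

func init() { prometheus.MustRegister(messagesProduced) }

// Option is the "power user" escape hatch over the pinned defaults.
type Option func(*sarama.Config)

// WithRequiredAcks is an example override for teams that know what they're doing.
func WithRequiredAcks(acks sarama.RequiredAcks) Option {
	return func(c *sarama.Config) { c.Producer.RequiredAcks = acks }
}

// Producer wraps a Sarama producer so metrics are recorded automatically.
type Producer struct {
	inner sarama.SyncProducer
}

// NewProducer applies opinionated defaults, then any overrides.
func NewProducer(brokers []string, opts ...Option) (*Producer, error) {
	cfg := sarama.NewConfig()
	cfg.Producer.Return.Successes = true                // needed by SyncProducer
	cfg.Producer.RequiredAcks = sarama.WaitForAll       // durable by default
	cfg.Producer.Compression = sarama.CompressionSnappy // smaller payloads by default
	for _, opt := range opts {
		opt(cfg)
	}
	p, err := sarama.NewSyncProducer(brokers, cfg)
	if err != nil {
		return nil, err
	}
	return &Producer{inner: p}, nil
}

// Send publishes raw bytes (e.g. a marshalled Protobuf message) and counts it.
func (p *Producer) Send(topic string, value []byte) error {
	_, _, err := p.inner.SendMessage(&sarama.ProducerMessage{
		Topic: topic,
		Value: sarama.ByteEncoder(value),
	})
	if err == nil {
		messagesProduced.WithLabelValues(topic).Inc()
	}
	return err
}
```

The functional-options pattern is what lets power users override specific settings without being handed the entire Sarama config.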
Slight tangent, but Sarama actually got picked up by IBM. So IBM is now responsible for the maintenance of Sarama, because it turns out that Shopify aren't using it too much anymore. So IBM have taken over stewardship of it. So that was a really cool thing to do… I haven't checked in on the project in a while, to see how it's progressing, but it was excellent that they put their hand up to carry on stewardship.
And then the final thing that we have been using Go and Kafka for is we built this thing called - we call them connectors, and they're built on… There's a framework called Kafka Connect, which effectively allows you to plug some code into your database, [unintelligible 00:57:09.24] into Kafka, and then it just moves the data between the two. So when people are trying to take things out of a database and push it to Kafka, connectors are a pretty common way to do that.
We built our own framework that we also call Connectors; it's all written in Go, and effectively, with a very small configuration file you write in YAML, we allow you to specify a reader, some transformations to apply, and then a writer. And so what this means is teams can deploy a very simple service that reads from a database, applies some transformation to a Protobuf format, and writes it to a Kafka topic. They can do it without actually writing any code; they just create some environment variables and deploy it. And same thing - we've got Prometheus metrics, you get a dashboard for free, and you get some alerts around it for free, and stuff.
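The general shape of that reader, transform, writer pipeline might look something like the Go sketch below. These interface and type names are illustrative only; Cloudflare's internal Connectors framework isn't public, so this just shows the pattern.

```go
// A sketch of a reader -> transformer -> writer connector pipeline.
package connector

import "context"

// Message is the unit flowing through a connector (e.g. a Protobuf payload).
type Message struct {
	Key   []byte
	Value []byte
}

// Reader pulls records from a source, such as a database table.
type Reader interface {
	Read(ctx context.Context) (Message, error)
}

// Transformer converts a record, e.g. into a Protobuf-encoded payload.
type Transformer interface {
	Transform(ctx context.Context, m Message) (Message, error)
}

// Writer pushes records to a sink, such as a Kafka topic.
type Writer interface {
	Write(ctx context.Context, m Message) error
}

// Run wires the three stages together. Teams only pick the pieces (typically
// via configuration); they don't write this loop themselves.
func Run(ctx context.Context, r Reader, t Transformer, w Writer) error {
	for {
		msg, err := r.Read(ctx)
		if err != nil {
			return err
		}
		out, err := t.Transform(ctx, msg)
		if err != nil {
			return err
		}
		if err := w.Write(ctx, out); err != nil {
			return err
		}
	}
}
```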
[00:57:49.17] So all of these things have really helped with Kafka adoption. And I think if you've got the resource to deploy Kafka at your company, I would really consider having a team like mine, a platform team that provides tools and services that make it easy for other teams to do the right thing, and to teach them, too. I think a huge part of our team's job is just teaching as well, and just making sure people are following the right patterns when using some of these things. And it can help overcome some of these barriers to entry, but obviously, it's a large investment.
One of the reasons Cloudflare picked Go in the first place, and we continue to use it, is it just scales so well. We've had a couple of issues with Kafka consumers not being able to keep up with the number of messages being passed through, but after some small tweaks that you would have to make in any language, we've been very easily able to scale a bunch of our services to tens of thousands of messages being read a second, without too much heartache. We haven't had to do anything clever, we haven't had to write any sort of crazy code to do so. It's just kind of standard optimizations that - you know, a linter will probably help you with most of them, if I'm honest. So Go has been fantastic for that… And even people who join Cloudflare who haven't got experience with Kafka and haven't got experience with Go, we're usually able to get them productive and writing decent Go in a week or two, just because of 1) how easy Go is to learn, and 2) how easy it is to read, which I think is really important and often overlooked with Go. Being able to read it… I can pick up pretty much any Go service and I can follow it through, and I can roughly figure out what it's trying to do. I can't promise you the same thing for Java or PHP, where there's lots of auto-wiring and magic, and you have to understand the framework a little bit.
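For context on what that scaling usually looks like in practice: consumers in the same Kafka consumer group split a topic's partitions between them, so you scale reads by running more instances, up to the partition count. Here's a minimal Sarama consumer-group sketch; the broker address, group ID, and topic name are placeholders.

```go
// A minimal Sarama consumer-group sketch.
package main

import (
	"context"
	"log"

	"github.com/IBM/sarama"
)

// handler implements sarama.ConsumerGroupHandler; each instance of this
// program that joins the same group is assigned a share of the partitions.
type handler struct{}

func (handler) Setup(sarama.ConsumerGroupSession) error   { return nil }
func (handler) Cleanup(sarama.ConsumerGroupSession) error { return nil }

func (handler) ConsumeClaim(sess sarama.ConsumerGroupSession, claim sarama.ConsumerGroupClaim) error {
	for msg := range claim.Messages() {
		// Process msg.Value here (e.g. proto.Unmarshal into a generated type).
		sess.MarkMessage(msg, "") // record progress so the offset can be committed
	}
	return nil
}

func main() {
	cfg := sarama.NewConfig()
	cfg.Version = sarama.V2_1_0_0 // consumer groups need a modern protocol version

	group, err := sarama.NewConsumerGroup([]string{"localhost:9092"}, "example-group", cfg)
	if err != nil {
		log.Fatal(err)
	}
	defer group.Close()

	ctx := context.Background()
	for {
		// Consume blocks until a rebalance or an error, then we rejoin the group.
		if err := group.Consume(ctx, []string{"events"}, handler{}); err != nil {
			log.Fatal(err)
		}
	}
}
```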
So that's been really, really powerful and useful for us in terms of adoption, too. And generally, the performance of the Go apps. We are a cloud, so we pay a lot of attention to resource utilization, and containerized Go services… I think they are so tiny in comparison to some of the other things that we have to run. And even quite complicated applications that we have running, that are processing a lot of data - their footprint is tiny. So if we were to do this all again, maybe [unintelligible 00:59:44.25] some other languages, and to be completely clear, especially in the context of the conversation we've been having, there has been more and more adoption of Rust at Cloudflare. More teams are definitely starting to dip their toe in and figure out if that's a good fit for them. And TypeScript, too. There has been a lot of TypeScript usage, especially in Cloudflare Workers, because it's natively supported and it's fantastic. But Go isn't going anywhere. We see new Go services deployed every day, because it just does what we need to do incredibly well.
I think one of the huge drawbacks of picking Go and taking this approach that we had - and it's something that we're sort of still reevaluating - is, as you can probably infer from what I've said, we've invested a lot of time in tooling for teams that write Go. If you don't like Go, we haven't actually got a whole bunch to help you right now. We haven't rolled the same libraries in Python, we haven't rolled them in Rust. So we're actually kind of making it harder for teams to blaze a trail and potentially do what's right for them, because using Go is easy for them… Which is kind of by design. We want them to stick with Go until maybe it doesn't make sense for them anymore, because we've got all this great tooling that's battle-tested in production and it works.
But one thing we'd love to support in the future is the same sort of patterns and ideas for other languages. And so we've been exploring some interesting things, like could we put gRPC in front of Kafka, and therefore generate bindings for other languages? We could then benefit from the same tooling, which would sit just behind our gRPC server, and the teams who interact with Kafka would need to know even less about Kafka, because we'll handle the hard configuration for them. And then we can support these other languages, too. And the thing that keeps causing me to pause is exactly what Kris was talking about: if we do this, we're going to remove the need for teams to understand Kafka at all, if we do it well. And that sounds like a great thing, and it maybe is in the short term, but I just feel like it will bite us a lot in the long term if we don't - this fundamental piece of infrastructure, if it becomes one team, like mine, who knows how everything's configured and connected to it, and another team's kind of passing through it without ever really truly understanding it, I don't know if that's actually optimal in the long term. So we're still trying to battle that and figure that out, but… As of right now, it's the only path I can see to scale all this great tooling [unintelligible 01:01:50.17] to a way that we can support all these other languages that are starting to appear in the Cloudflare ecosystem.
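To make the idea being discussed a bit more concrete, here's a rough sketch of what such a thin produce service could look like: a gRPC server that owns the Kafka configuration, with clients in any language talking to it through generated bindings. The `producepb` package, its RPC, and every name in it are hypothetical; this illustrates the design idea, not an existing Cloudflare service.

```go
// A sketch of putting gRPC in front of Kafka: the server owns the Kafka
// configuration; clients only need generated gRPC bindings.
package main

import (
	"context"
	"log"
	"net"

	"github.com/IBM/sarama"
	"google.golang.org/grpc"

	producepb "example.com/gen/producepb" // hypothetical generated bindings
)

type server struct {
	producepb.UnimplementedProducerServer
	producer sarama.SyncProducer
}

// Produce hides all Kafka configuration behind a single RPC.
func (s *server) Produce(ctx context.Context, req *producepb.ProduceRequest) (*producepb.ProduceResponse, error) {
	_, _, err := s.producer.SendMessage(&sarama.ProducerMessage{
		Topic: req.GetTopic(),
		Value: sarama.ByteEncoder(req.GetPayload()),
	})
	if err != nil {
		return nil, err
	}
	return &producepb.ProduceResponse{}, nil
}

func main() {
	cfg := sarama.NewConfig()
	cfg.Producer.Return.Successes = true
	producer, err := sarama.NewSyncProducer([]string{"localhost:9092"}, cfg)
	if err != nil {
		log.Fatal(err)
	}

	lis, err := net.Listen("tcp", ":8080")
	if err != nil {
		log.Fatal(err)
	}
	s := grpc.NewServer()
	producepb.RegisterProducerServer(s, &server{producer: producer})
	log.Fatal(s.Serve(lis))
}
```

The trade-off is exactly the one raised above: client teams get bindings in their own language and never touch Kafka configuration, but the knowledge of how the pipeline actually works concentrates in the team running the gRPC layer.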