Ship It! – Episode #67
All your network are belong to eBPF
featuring Liz Rice & Thomas Graf
A few weeks ago, Jerod spoke with Liz Rice about the power of eBPF on The Changelog. Today, we have the pleasure of speaking with both Liz Rice, Chief Open Source Officer at Isovalent, and Thomas Graf, CTO & co-founder at Isovalent, the creators of Cilium.
Around 2014, Facebook achieved a 10x performance improvement by replacing their traditional load balancers with eBPF. In 2017, every single packet that went to Facebook was processed by eBPF. Nowadays, every Android phone is using it. Truth be told, if it’s network-related and it matters, eBPF is most likely a part of it.
Notes & Links
- 📘 What is eBPF? - written by Liz Rice, free to download
- 📕 Security Observability with eBPF - written by Natália Réka Ivánkó and Jed Salazar, free to download
- 🎬 A Guided Tour of Cilium Service Mesh - Liz Rice, Isovalent - KubeCon 2022 EU
- 🎬 eCHO Episode 51: Life of a Packet with Cilium - Duffie Cooley
- Learn more about Cilium and eBPF from the technology experts
- ✨ Twinkly - Liz’s wall neon
| # | Time | Chapter |
|---|------|---------|
| 4 | 04:30 | When was day 1? |
| 5 | 06:53 | eBPF is everywhere |
| 6 | 08:36 | 5 years later |
| 7 | 11:19 | eBPF in security |
| 8 | 16:32 | Is this module secure? |
| 9 | 17:22 | Being open source is so helpful |
| 11 | 24:22 | HTTP2 vs HTTP1 |
| 13 | 31:25 | Liz's Star Wars demo |
| 14 | 37:03 | James Webb telescope |
| 15 | 40:10 | Why the hexagon? |
| 16 | 41:13 | Why is Hubble important? |
| 17 | 46:20 | What is Tetragon? |
| 19 | 51:30 | Security Observability with eBPF |
| 20 | 56:16 | Who else is in eBPF? |
| 21 | 59:07 | How to get started in eBPF |
| 24 | 1:03:22 | What is eBPF though? |
| 26 | 1:07:45 | How could we work with Liz? |
| 27 | 1:09:12 | What's up next? |
| 28 | 1:10:22 | How about you, Thomas? |
| 29 | 1:12:17 | Wrap up / Neon lights |
Hi, Liz. Welcome back to Ship It.
Hi! Thanks for having me again.
Second time. Thomas, first time. Welcome to Ship It.
Thanks a lot for having me as well. It’s great to be here.
We briefly spoke at KubeCon EU 2022, the one that just happened. Thank you, Thomas, for taking the time and for sharing your eBPF excitement with me. I could feel it, but not just coming from you. 20 to 25 people, various people that I spoke with, that’s like half the conversations that I had at KubeCon, everyone keeps going on about eBPF. “This thing is so cool…” People just love it. And I’m wondering, is it Liz? Is Liz an amazing chief open source officer, and she talks to everybody? Is that what’s happening here? What do you think, Liz? Or Thomas?
It’s really the technology. And I got very excited about eBPF over the last few years. Thomas has been involved in it since day one, but for me, when I first came across it, I could really – it was an eye-opener about what this could do, and how this could change… “We can change the behavior of the kernel.” That’s pretty cool. And I remember seeing Thomas presenting at – I think it was at DockerCon that I’ve got the poster up behind me on the wall as we record, showing Cilium, and I remember thinking “This eBPF thing… I’m gonna keep an eye on that.” And I’ve certainly, you know, over the past few years got more and more excited about it, and now pretty much 100% of my focus is on Cilium, and eBPF, and the amazing things that eBPF enables. And if my excitement is…
Yeah, exactly. If that’s coming across to other people, then - well, it’s genuine. I genuinely think it’s a game changer, and I wouldn’t be working in it if I didn’t think it was so exciting.
I think it is very much so. The people make a huge difference, and if it started with Thomas - because I think that’s what I’ve heard here; Thomas was there since day one… When was day one, Thomas? Because by the way, the poster on Liz’s wall is DockerCon 2017. I’m sure day one was long before that. So can you tell us a bit more about that, Thomas?
Yeah, absolutely. Day one when it was called eBPF is around 2014, very close to when the first Kubernetes commit happened as well. The origins of eBPF are a little bit older. But 2014 is when the integration of eBPF - it was not called eBPF back then, but the integration of the idea into the Linux kernel, that was 2014. And back then it was Linux kernel, low-level only… The discussion was among kernel developers, not the eBPF with this broad ecosystem that we talk about today. So it’s been quite a number of years, and it’s been very exciting to watch the excitement levels go up and up, starting out initially by kernel engineers only, and the hyperscalers out there, like Facebook and Google, with their own kernel teams, and what they build… And then Facebook coming out and saying, “Hey, here, we have replaced our entire load balancer, and we saw a 10x improvement” - the light bulbs started going on. And what we have now is like an entire cloud-native ecosystem getting excited about it. It’s very, very wild.
There was a fantastic little fact that Daniel [unintelligible 00:05:49.14] presented - I can’t remember which talk it was, but I think it was 2017, or maybe 2018, one of those two; quite a few years ago. Every single packet that goes to Facebook - I guess now Meta - goes through XDP, which is part of eBPF; it’s kind of eBPF’s sort of network packet handling hook. And I think that’s astonishing, every single packet of the I don’t know how many countless billions that go to Facebook for the last four years, processed by eBPF. It’s pretty cool.
Wow. Now, that sounds amazing. When the big players are using it successfully for so long, you know there’s something there. Kubernetes started a bit like that, even though it was very different when it began. And now almost like the whole world is using it, I think, or at least has heard of it, and the majority considers it table stakes. I think it reached peak maturity. And I think eBPF definitely crossed the chasm. Definitely, from what I’m seeing. So eight years in the making? A bit longer than that?
Yeah. I would even say the majority of us are actually using eBPF every day, we just don’t know it. For example, if you have an Android phone in your pocket, the little statistic that shows how much traffic each of your apps is consuming or producing - that’s done by eBPF. Nobody knows… It was used as this lower-level magical technology for such a long time… What we’re now seeing is this getting exposed to more and more of the industry and the public itself, which is super-exciting, but I think actually from a – as Liz was saying, eBPF has been used for years and years and years now, in many, many use cases that we rely on every day.
I think the real changing point, for the cloud-native community at least, is that everybody is now running kernels in production that are new enough that they have a pretty significant amount of eBPF capabilities built-in. So if you want to run eBPF-based tooling, you need a certain level of kernel support, and pretty much everyone has that now in their production deployments. And I think that’s why we’ve seen this uptick in interest and excitement, and just adoption really, from people who are using not just Cilium, but also the observability tools like Pixie, or Parca or - there’s dozens of tools that people are now able to use. And they’re seeing great gains in performance, they’re seeing how effective eBPF is at giving you visibility, giving you high-performance… It’s a revolution.
It is five years later, Thomas, right? This is a reference to a talk that you gave, where if you want a feature in a kernel, you can ask for it, and five years later - boom! You have it. So is it five years later now, that people have been asking for this, and it’s everywhere, in every single kernel? I think that’s what’s happening here… That was your eBPF day talk, which was really, really good. Now, I only watched the first few minutes, so can you tell me the rest, and how does it continue? [laughs]
Yeah. So you mentioned Red Hat, ten-year kernels… In my mind, if there’s anyone really serious about kernels, it’s going to be Red Hat, with their enterprise distributions, with their super-hardened systems… So security-wise, eBPF makes sense. Why does it make sense security-wise? Why are the big companies adopting it, and saying “Yep, this is good. This will not compromise our systems. Or the context, the boundaries around it are safe, are proven. We’re okay.” Why is that?
There’s two aspects here. One is eBPF as a technology to build really, really good security tooling. And typically, you can build really good security tooling if you have the most amount of control and visibility. So being able to see everything is great, and then being able to control as much as possible is great. eBPF is exactly this. It can see everything, from the lowest levels, like a driver level, really close to the hardware, all the way into observing and tracing applications themselves, the function calls they make, and everything in the middle. That’s incredibly powerful and it unlocks security tooling.
At the same time, any tool that’s used for security purposes obviously has to be secure itself, and so does the foundation it is built on top of. So there’s a lot of emphasis on the security model of eBPF itself, because what eBPF enables is not completely new. If you have been using a Linux kernel, in particular earlier on, you may be familiar with the concept of a Linux kernel module, which allows you to load additional kernel code. And it was invented for drivers. So if you have, let’s say, bought a new laptop and you needed a new driver for your graphics card, then you would load that kernel module. And kernel modules are completely insecure. If they crash, if they have insecure code in them, they would bring down the entire kernel, and they’ve also traditionally been used essentially as a way of invading the kernel as well, to load malicious code. eBPF builds in a lot of additional safety, while enabling that same programmability. So there is a verification component that ensures that only safe and secure code can be run. eBPF itself is bound to capabilities, so you need the specific eBPF capability to even load eBPF code itself. And then it even goes as far as to implement concepts such as constant blinding, to make it harder to abuse the code-loading aspect as well.
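To make the "verify before you run" idea concrete, here is a minimal sketch in Python. It is not the eBPF verifier, which works on real bytecode and does far more; the three-instruction language, the `MAX_INSNS` limit and the function names are all invented for this illustration. It only shows the general shape: a static pass rejects anything that might loop forever or jump outside the program, and the runner refuses unverified input.

```python
MAX_INSNS = 16  # hypothetical size limit, chosen only for this toy


def verify(program):
    """Statically check a toy program before it is ever executed.

    Instructions are tuples: ("add", n), ("jmp", offset) or ("exit",).
    Returns (ok, reason).
    """
    if not program or len(program) > MAX_INSNS:
        return False, "empty or too long"
    if program[-1] != ("exit",):
        return False, "does not end with exit"
    for pc, insn in enumerate(program):
        op = insn[0] if insn else None
        if op == "exit" and len(insn) == 1:
            continue
        if op in ("add", "jmp") and len(insn) == 2 and isinstance(insn[1], int):
            if op == "jmp":
                if insn[1] < 0:
                    # Backward jumps could loop forever, so this toy bans them.
                    return False, f"backward jump at {pc}"
                if pc + 1 + insn[1] >= len(program):
                    return False, f"jump out of bounds at {pc}"
            continue
        return False, f"malformed instruction at {pc}"
    return True, "ok"


def run(program):
    """Execute a program, but only if it passed verification."""
    ok, reason = verify(program)
    if not ok:
        raise ValueError(reason)
    acc, pc = 0, 0
    while True:
        insn = program[pc]
        if insn[0] == "exit":
            return acc
        if insn[0] == "add":
            acc += insn[1]
            pc += 1
        else:  # verified forward jump
            pc += 1 + insn[1]


good = [("add", 2), ("jmp", 1), ("add", 100), ("add", 3), ("exit",)]
looping = [("add", 1), ("jmp", -2), ("exit",)]
```

Because every accepted jump moves forward and the last instruction is always `exit`, `run(good)` is guaranteed to terminate, while `looping` is rejected before a single instruction executes.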
So now we’re talking about, as kind of a next generation, code signing: making sure that the kernel will only execute an eBPF program if its signature still matches the one that was generated when the program was written, and so on. And there’s eBPF being used to instrument, to observe what eBPF programs are being loaded; eBPF can be used to restrict what programs you can run, and so on. So I think there’s both the aspect of this really powerful capability, what you can do with eBPF, and then eBPF also really focuses on being a secure runtime, which is obviously used and required if companies like Facebook and Google use this at massive scale, for everything that they do in their own infrastructure layers.
I would say when people first sort of get the idea of eBPF, one of their first questions is “Wait a minute, this is all powerful. Is it safe?” And the verifier that Thomas has been talking about is a huge part of making sure that it is… But also, users need to treat it with the same respect that they treat root privileges. It is all powerful, and so is root, and that’s why we are very careful about who we allow to have root access to a machine. The same should be true for eBPF tooling as well. You need to be running – don’t download an eBPF program from the internet if you don’t know where it came from, the same way that you wouldn’t download any other code from the internet; you shouldn’t just download any other code from the internet and run it without knowing what it is. And things like the signing will really help to give people confidence about that.
I think this is an incredibly important point. I think it was Alexei Starovoitov, the other co-maintainer of eBPF working at Facebook, who said at one of the eBPF summits that you should treat eBPF programs as if you had written kernel code, merged that into the kernel and then shipped it to millions of users. And if that is the assumption, if you essentially – this is now part of the kernel, and you would do the same sort of vetting of the code that goes in, then eBPF is massively more secure than actually writing that kernel code, because it does run in a sandbox environment, it goes through the verifier, which is actually why some of the hyperscalers use eBPF to, for example, solve zero-day exploits, because they can literally not reboot all the machines that are affected quickly enough. It takes them weeks to actually reboot a machine with a kernel fix. So they’re using eBPF as a way to address zero-days in the kernel, and treat it as a better, more secure way of writing kernel code. So this point is extremely important - an eBPF program becomes part of the kernel. So it should not be untrusted; don’t load untrusted eBPF code into your Linux kernel.
So how can someone check whether the eBPF module that they got, or the extension that they got is secure, and it’s okay to run it?
This is where signing comes in, right? As a user, you could of course disassemble the bytecode and figure out what it was doing. But just like when you are running a non-kernel program with capabilities, you need to trust the source that provided that application. And once you trust that source, you need to make sure that you’re actually running the code that the source provided you. This is where signing comes in. And I think this is where open source plays a major role. So by using and relying on open source, the development process and the code itself are open and public and can be reviewed by everybody, so you’re not running just a proprietary binary in the end.
I know that you’ve done a lot of work on security layers, container security specifically, and I’m sure that if you think the right fundamentals are there, that is a huge thumbs-up to how the entire process works. Because having people like you review the process and understand the components - again, it gives us confidence that someone who really knows what they’re doing is saying “Yep, this is good.” And having peer reviews, having the open source community look at it, having different companies look at it… It gives everyone else the confidence that “Yup, this passed the test.”
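The "make sure you're running the code the source provided" step can be sketched in a few lines of Python. This is an illustration only, not how the kernel signs or checks eBPF programs: the shared `PUBLISHER_KEY`, the function names and the placeholder bytes are assumptions made for the example, and a real scheme would normally use asymmetric signatures rather than a shared secret.

```python
import hashlib
import hmac

# Hypothetical key shared by publisher and loader (assumption for this sketch).
PUBLISHER_KEY = b"demo-key-not-for-production"


def sign_program(bytecode: bytes, key: bytes = PUBLISHER_KEY) -> str:
    """Return a hex HMAC-SHA256 tag computed over the program bytes."""
    return hmac.new(key, bytecode, hashlib.sha256).hexdigest()


def is_trusted(bytecode: bytes, signature: str, key: bytes = PUBLISHER_KEY) -> bool:
    """Check that the bytes still match the tag produced at signing time."""
    expected = sign_program(bytecode, key)
    # compare_digest avoids leaking where the two tags first differ.
    return hmac.compare_digest(expected, signature)


def load_if_trusted(bytecode: bytes, signature: str) -> str:
    """Stand-in for a loader: refuse anything whose tag does not verify."""
    return "loaded" if is_trusted(bytecode, signature) else "rejected"


original = b"\x95\x00\x00\x00\x00\x00\x00\x00"  # placeholder bytes, not a vetted program
tag = sign_program(original)
tampered = original + b"\x00"
```

Here `load_if_trusted(original, tag)` accepts the program, while the same tag presented with `tampered` is rejected, because even a one-byte change produces a different digest.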
Yeah, I think that’s fundamental to open source processes, isn’t it? It’s many pairs of eyes, many people’s thought processes contributing to not just developing code, but also sort of reviewing it and thinking about it, and thinking about the threat model, and thinking about how things could be abused. Yeah, certainly one of the reasons why I got excited about eBPF was that well, this really deep visibility has to be useful for security tooling. And I’ve worked in security tooling for quite a while, and you can see that any given approach has its pros and its cons. The fact that we have this ability with eBPF to see everything in a given machine - just to remind everyone that if you’re running containerized code, all the containers that run on a given virtual machine are sharing one single kernel. And the ability to instrument the kernel rather than instrumenting individual containers - it certainly strikes me as a much more… I very much believe in defense in depth but if you’re only going to observe from one place, then the kernel would be the place I would choose, because of that breadth of visibility.
Yeah. So I remember – that’s the one talk which I did have time to watch… The one that you gave, Liz, at KubeCon, the recent one. It’s on the Cilium service mesh.
And I really like how you talk about this concept of a service mesh, which is really important to security. You have mutual TLS, you have basically creating like network links between these disparate hosts… There’s a lot of things going on there. We used to do them in the application. Then it moved to the service mesh, to the sidecar, and now it’s moving to the kernel, where arguably it should have been in the first place, because hey, all networking happens in the kernel.
Well, in recent years, for most of us, that’s where we remember that networking happens. And I’m starting to see, eBPF - as Thomas said, it’s pretty much everywhere, even places we don’t know. And some things that we may have been doing wrong, with like service meshes - like, not wrong, but suboptimal - there is a better way. And I can start seeing this convergence of all things coming together. But how do you think of service meshes, and where do you see Cilium in this context? Because there’s something really powerful there, where all this tooling is starting to come together in a nice way. And sure, you can write your own; everyone is free to do that. But maybe have a look at how it’s done in Cilium service mesh, which - it’s the first time when I’ve seen it. Service meshes - I always had this “I’m not sure whether I need it”, but it started making sense. So can you tell us a bit more about that?
Yeah. I think for me as well, for a long time. You think about service mesh, but Kubernetes has services as a native resource type. It’s a native concept in Kubernetes. So the idea of having a whole extra control plane to connect services always struck me as just like “Is this really what we need?” And I didn’t have a solution for that, but it always struck me as a layer of complexity that seemed kind of maybe a bit overblown. And I think there’s a very interesting analogy… If we think of Kubernetes as a distributed operating system, and we’re running workloads in Kubernetes across a cluster of machines, or even multiple clusters of machines, service mesh starts looking like it’s the networking layer for that distributed operating system. But that doesn’t mean that it has to be separated out from the native, what we traditionally think of as networking. And that’s where I think Cilium service mesh is super-interesting and it becomes much more efficient. Because we stop saying that the networking layer that connects machines, or that connects individual pods is in any way different from the networking layer that connects services. They’re all using the same physical or virtual Ethernet connections basically underneath them.
And Cilium service mesh compresses our view of like pod-to-pod network, and service-to-service network, into one concept. And that’s possible because eBPF has visibility over all of the workloads on any given machine. So if we’re connecting one pod to another pod in a sidecar model, the pods don’t have visibility of each other directly. But with eBPF connecting them, they can have visibility directly. We can still have the service abstraction, but without actually having to implement it as a whole separate layer of networking. Being able to make those much more direct connections between the endpoints, whether they’re on the same node or on a different node - we don’t have to kind of route through two proxies to get there.
There was a slide which really resonated with me, and that shows the path that packets take, all the way from the application, down through all the layers… They have like a sidecar, there’s like a user space process, then it eventually hits the kernel, then goes back up again, through another proxy, as you mentioned… And it’s very, very messy. While, really, once it enters the kernel, it could just stay there, including for encryption, which - I find that fascinating. Like, TLS, all that, it can happen in the kernel, where it should, rather than going back to user space. And that makes it so much more efficient latency-wise, observability-wise, and that’s very important. You don’t have all the systems that you have to instrument; there’s just one thing, and you just look at that one thing, and it’s very obvious what is happening. Layer three, layer four, even layer seven, which I find fascinating. And HTTP/2 - that is a very hard protocol. HTTP/1 is so simple in comparison. People that have to do anything for HTTP/2, they realize there’s so much there. So I’m imagining – I’ve never done this, but I’m imagining observability for HTTP/2 is more complicated, maybe, than HTTP/1. What do you think?
It is. And actually, it’s a perfect example, because when we started looking at what of this service mesh functionality could we do natively in eBPF, we immediately came to the conclusion “Well, all we have to solve is HTTP/2 parsing, and we can be pretty confident that we can, from a protocol parsing complexity, we can solve most of it.” However, I think the best way to look at this is to understand there is kind of two very different perspectives. As I mentioned, we’re coming from the kernel development background, so our world is the kernel, and we have been looking at applications that we don’t really care what the applications are doing, we are providing services to applications - connectivity, TCP, security, VPN, IPsec, and so on. A lot of that is very similar to what service meshes started to do, but they have been coming down from the application level, as you mentioned. Initially, this was embedded into the application code. And to me, that’s very similar to the era where we were running applications on physical servers. And then, that migrated or changed into sidecars, where we started to move that out of the application, so we don’t have to write that functionality for every application framework. That’s very similar to the VM age, where we said, “Okay, now we have virtual machines; they’re completely separate from each other, and we’re essentially running a separate copy of the operating system of Linux in every virtual machine.” That’s exactly the same as what a sidecar is doing - you’re running a separate proxy in every application pod… And that makes a ton of sense if you’re coming from the application developer era, because what’s down here in the kernel level is very mysterious. Like, what’s going on there? I don’t quite fully understand.
So we have this both – we’re coming from the bottom of this very kernel-focused view, and everything in the kernel is simple for us. And for application teams - yes, deploying in a sidecar or proxy, that’s easy. I don’t want to deal with this kernel level. And now we have these two kind of layers converging together, and we’re seeing the evolution of what makes sense conceptually, which is all these service mesh values - resiliency, visibility, security, connectivity… That should be transparent to the application. And traditionally, the operating system has been where this has happened. And we actually did this shift before, from running multiple copies, kind of the virtual machines, to the shared operating system, which are containers.
So essentially, what we are doing with Cilium service mesh is to essentially provide both options. If you want to run the sidecar copies, you can - we have that as part of Cilium as well via our Istio integration - but then work towards moving as much of that as possible to the operating system, where it becomes as transparent, as invisible as TCP is today, where applications can just run on an operating system and they get the service mesh values. Because most of them are not completely new from what they provide, they just provide this at a different level, like for networking people that’s like instead of doing it at layer four, where we care about TCP and UDP, you do this at HTTP, or gRPC, or a different application protocol level.
I really like this TCP analogy, because you could theoretically run your own TCP stack in user space if you wanted to today. That is still possible, but nobody does it. And I think in some amount of time we will feel the same about sidecar proxies… Like, well, you could run one like that, but why would you choose to do so when there’s a much more efficient
And maybe this is also a good point to maybe expand a little bit on the gluing power of eBPF. Because one of the main questions I get is often “Well, can’t you do X with eBPF? But what about this? What are the limitations of eBPF?” And actually, it doesn’t matter that much, because the true power of eBPF is to glue existing things together. The example that Liz mentioned is amazing. There was a time when user space TCP/IP stacks were becoming more popular during the virtualization age, because we have frameworks like DPDK, and things like this, that offered better performance by going into user space, by moving out of the kernel, because the kernel became so hard to change. And eBPF has now introduced a tooling framework that allows us to glue individual layers in the kernel and in user space together, and find more efficient paths to connect dots of existing functionality such as the well-proven TCP/IP stack, which has been evolving for the last 30+ years, and is probably the best TCP/IP implementation that is out there… With for example an Envoy proxy you’re running in user space.
So eBPF - it’s less about solving every single problem in eBPF. That’s not the goal at all of the eBPF ecosystem in general. It’s about we want to use the pieces that we have, that we want, that are well-proven and working, cut out the pieces that maybe are no longer efficient or no longer needed, and find the best possible, shortest path for users to gain value out of that. And if we look at this from this perspective, it actually becomes often less important whether every single problem can be solved in eBPF itself.
So I would like to talk about the demo next that shows how all these things come together. I’m thinking Liz’s demo from KubeCon, but if you know a better one or a newer one, we can talk about that as well. Which way are we going?
[laughs] Was that my Tetragon demo, or my–
Star Wars. The Star Wars. The TIE Fighters, the exhaust ports… Star Trek fans - it’s okay if you drop off; not a problem. Or not drop off. Maybe fast-forward a few minutes.
Oh, we could cross the streams, we could totally do some Star Trek ones as well if people want that.
Can we? So yeah, let’s do – exactly, I love that, because this analogy works in any universe, whether it’s a Star Trek one, whether it’s a Star Wars one, it doesn’t really matter. Dune, any Dune fans… The Expanse, whatever. The point is, it’s a really good demo that showcases how all these things come together, how they show layer three, layer four, the observability element… It was really well made security-wise… It was amazing. Like, in a few minutes, users/listeners can actually see it. And I’ll drop a link in the show notes. But can you describe to us, Liz, the demo, roughly how it works, and what are the components, and how do they work together to showcase what is possible when all the different ones, or like all the components are put together in something that users can use?
I’m trying to remember exactly what I demoed, because we have quite a few Star Wars-themed things that fit together, and we pull different bits in and out for different demos… But essentially, imagine you are the security officer on the Death Star, and you have an API for the Death Star… And you maybe want to allow Empire ships to be able to access that API, but probably not anybody who is not part of the Empire ecosystem. So as the security officer, or maybe just the platform team for the Empire, you make sure that every Empire ship is labeled with Empire, in their Kubernetes YAML. It’s just a label to say “I am an Empire vessel.”
How do we verify that they’re an Empire vessel? Do they just say “Hey, I’m Empire”? “Okay, you’re Empire…” [laughs]
There’s a whole other level of certificates that we could add into this. You have to go to Darth Vader as a certificate authority to get him to give you a certificate.
Okay, that makes sense. [laughter]
But for now, let’s just stick with the labeling. So you might, for example, use an ingress to terminate that TLS connection, and as somebody enters the galaxy far, far away, you could terminate that TLS connection and verify that they were who they said they were, and truly they had a certificate issued by Darth Vader. But I’m stretching my analogy…
Speed of light restrictions do not apply. Everything happens instantaneously, right away, or near-instantaneously. Okay.
Yeah. And Cilium - because Cilium has, for every endpoint, be that the Death Star itself, or the Empire ships, or rebel ships, it has an endpoint identifier. It knows about – actually, it’s running eBPF programs to check policy; so we have network policy that’s looking either at layer three/four, or using those Kubernetes identities… Possibly even looking at the path. So perhaps your Empire-based ships - they’re labeled Empire, but even still, even if they’re the Empire and you’re going to allow them to land on the Death Star, you do not want them putting something into the exhaust port API. You want to make sure that if that’s the API call they want to make at layer seven, that they’re not permitted to do so. And we can do that inspection for every single packet that’s coming into the galaxy. I’m still going with this metaphor… [laughter]
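The two checks in that story, identity by label and then the specific API call at layer seven, can be sketched as a tiny policy evaluator in Python. This is a toy model for illustration, not Cilium's policy format or API (Cilium enforces policy in eBPF programs and its built-in Envoy proxy); the label strings, paths and class names below are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Request:
    source_labels: frozenset  # labels on the calling workload, e.g. {"org=empire"}
    method: str
    path: str


@dataclass
class Policy:
    required_label: str                       # identity check, based on labels
    denied: set = field(default_factory=set)  # (method, path) pairs blocked at layer 7

    def allows(self, req: Request) -> bool:
        # Identity first: callers without the label never reach the API.
        if self.required_label not in req.source_labels:
            return False
        # Then the layer 7 rule: even labeled callers may not make denied calls.
        return (req.method, req.path) not in self.denied


# Hypothetical label and paths, chosen to match the Death Star story.
deathstar_policy = Policy(
    required_label="org=empire",
    denied={("PUT", "/v1/exhaust-port")},
)

tie_fighter = frozenset({"org=empire", "class=tiefighter"})
x_wing = frozenset({"org=alliance", "class=xwing"})
```

With this policy, a TIE fighter can request landing, the same TIE fighter is refused on the exhaust port call, and an X-wing is refused before the path is even considered.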
This is the Cilium CNI that does it? Is that the component responsible for it?
Yeah, it is the Cilium CNI, yes. Yes, exactly. And we’ve had these – I remember when I first joined Isovalent, actually, Thomas telling me about how we already have an 80% service mesh, because we already have load balancing, we already have observability at layer three, four, and layer seven; we already have network policy at layer three, four, and layer seven. We have encryption. There’s not much more to go from the CNI that has all these capabilities to enforcing that with service mesh primitives like what we started with Kubernetes ingress. And we were already using the Envoy proxy for some of that layer seven capability, and we still do. We have Envoy proxy built into the Cilium agent that runs on every node. So at any point, if we have to terminate a layer seven connection, an Envoy proxy is doing that, and extremely capable of doing that. But a lot of – you know, we don’t have to terminate every single packet through the proxy. A lot of the time we can deal with everything entirely within eBPF, and that’s where a lot of the performance gains come from.
Okay. So if we were to take a cleanser break. Have you seen the Webb images (Webb with double b) from the galaxies, with the Webb telescope?
Oh, yes…! Yeah, the incredible zoom in to really detailed galaxies. Yeah.
With all the detail. What do you think of those?
Amazing. I mean…
Yeah, deep observability. Yeah, that’s eBPF-level observability. [laughs]
Thomas, what do you think? Have you managed to see them?
Absolutely. I’m a huge space fan. And even though I understand quite a bit of physics, I think the ability to look back in time with a telescope - it’s still very hard to really make sense of this, that you can essentially use a telescope and look back in time, like millions and millions of years, and see what appears to be an incredibly clear picture, just to be surprised again 5-10 years later, when the next generation of telescopes comes around, and you can see it even in more detail. So it’s incredibly fascinating, because it’s a role that we don’t see in our daily lives at all. It’s so different. It’s super-fascinating. I’m a huge space fan, and I’ve been following, of course, the launch, and the reveal of the pictures very, very closely.
And you know, Thomas was probably responsible for the fact that the observability component in Cilium is called Hubble… And I think we talked about - when we announced Tetragon, I think internally we even talked about whether we could name it as Webb, because of even more visibility… But at the time when we were first thinking about it, we thought “Nobody knows what the James Webb Telescope is.” And now I’m thinking, “Oh, maybe we should have gone that way.” I don’t know. Everybody knows, now that they’ve seen the pictures.
Hand on heart, when I’ve seen those images, the first thing which I thought about was “Is Isovalent going to use this name for one of their products, because of Hubble?”
Seriously, my mind went instantly – I don’t know why… There was like some eBPF tweets, and then there was this, and maybe my mind just like made the connections, and the thought that popped in my head… “Where is Webb? Why does Isovalent not have a Webb?” So Thomas, what can we do about that? [laughs]
Probably what we have today is probably definitely not the last project we have created around eBPF… And every time we see that telescope, it can actually – in the mirror, you can find the Cilium logo. And I think that’s – we’re so connected.
Oh, yes. See, I didn’t even realize it until you mentioned it. You’re right. The Cilium logo is in the Webb – like, all the mirrors, they have the same shape. Why is that? Is there a connection there? Was that intentional, or did it just happen?
Well, you could argue who was there first. Did they get influence from us, or the other way around?
The bees. The bees were there first. They created the structure of nature. [laughs] Okay… So why the hexagon, by the way? I’m wondering why the hexagon. Is there some story to it?
The story is that hexagons - you can fit them together very nicely, and tightly. So I think they’re actually a really good representation of containers and microservices and cloud-native. And also, if you look at bees, you have these hard-working bees, creating hexagon hives, hexagon-shaped hives… So from a theme perspective, that made a ton of sense why you see the bee as the logo for eBPF. Cilium is using a hexagon hive as its own logo… So as a theme, it made a ton of sense overall, and it’s why we started out this way.
And it’s very nice that it is a shape that gets used in space quite a lot. I remember – so a million years ago I was an intern at a company that made satellites, and I remember things like the insulation all done as a honeycomb shape. So it’s a very strong structure, a strong and light structure.
Okay. So coming back to Hubble - I know that when I asked you in (I think) episode 26, when we last talked, Liz, you mentioned about Hubble, about the visibility… And I didn’t realize just how much it exposes. Webb would be amazing in the future, for sure. But Hubble, the one that we have today in this ecosystem - what does it enable users to see, and why is it important?
So you can get visibility into every single network packet. So you can use the Hubble – either the command line or the UI to see all the packets that you want to see, or filter them if you don’t want to see all of them… You can build up a service map, all of this being generated by eBPF code. So it’s extremely performant; those eBPF programs are very lightweight. And also, generating metrics in Prometheus format, for example, or OpenTelemetry. But those metrics - and we have some standard dashboards, and you can build your own dashboards as well - amazing level of detail that you can get through Prometheus and then Grafana. If you want to see latency, or you want to see where packets are being dropped, you want to see how many packets are being dropped… We had a really interesting internal demo yesterday about using those metrics to see whether your network policies are working correctly or not, because you can see whether or not packets are being dropped. And perhaps if you were trying to build policies that allowed packets to flow, you don’t want to see those drops. So you can see whether your policy is chained correctly using those Grafana outputs. It’s really nice. We’ll have to figure out how we get that demo into the public domain, because it was a really, really good demo that we saw there.
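The idea of using drop counts to check whether a policy behaves as intended can be sketched as follows. This is a rough illustration only: the flow records, their field names, and the service names are hypothetical and far simpler than what Hubble actually emits; the only thing borrowed from the conversation is that each flow ends up either forwarded or dropped.

```python
from collections import Counter

# Hypothetical flow records; real Hubble flows carry many more fields.
flows = [
    {"source": "frontend", "destination": "backend", "verdict": "FORWARDED"},
    {"source": "frontend", "destination": "backend", "verdict": "FORWARDED"},
    {"source": "frontend", "destination": "database", "verdict": "DROPPED"},
    {"source": "batch-job", "destination": "backend", "verdict": "DROPPED"},
    {"source": "frontend", "destination": "database", "verdict": "DROPPED"},
]


def drops_by_pair(records):
    """Count dropped flows per (source, destination) pair."""
    return Counter(
        (r["source"], r["destination"])
        for r in records
        if r["verdict"] == "DROPPED"
    )


def unexpected_drops(records, should_be_allowed):
    """Return pairs the policy was meant to allow but that still see drops."""
    drops = drops_by_pair(records)
    return {pair: n for pair, n in drops.items() if pair in should_be_allowed}


# Pairs the (hypothetical) policy author intended to allow.
intended = {("frontend", "backend"), ("frontend", "database")}
problems = unexpected_drops(flows, intended)
```

Here `problems` flags the `frontend` to `database` pair, which was meant to be allowed but still shows drops, while the `batch-job` drops are ignored because that traffic was never meant to flow.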
Yeah. I mean, people really respond to visual elements like that. When they see, they finally understand. And there are so many things to see. The kernel is amazing, what it does. Networking – and just like a small segment; there’s the CPU, there’s the memory, there’s so many things there. But even like just networking - all the layers, there’s so much detail there. So I’m wondering, is there a way that problems could be surfaced without having to build dashboards, without having to try and understand all the potential things that could go wrong?
I think this is what we might call Webb next.
If you look at the evolution, initially we had like very raw exposure of visibility. Like, “Oh, I want to see every network packet. And I would use tcpdump. Or give me a very raw metric of something like the number of packets received, or the number of HTTP requests being done”, and so on. And then companies like Google and Twitter, they wanted better metrics; that was actually discussed around the time when eBPF got introduced, those were like the [unintelligible 00:44:11.06] metrics, and there was a large, large discussion in the kernel community, should the really advanced users be allowed to merge additional counters and metrics into the Linux kernel for something that is not really applicable for the vast majority of actual Linux users, outside of the hyperscalers? And the solution back then was eBPF to make this programmable. And what Hubble is built on is this ability to have intelligence in how to collect the metrics. Instead of just exposing the raw information, actually for example create a histogram, or collect stack traces, or in the kernel correlate CPU consumption with a particular event, for example. I’m observing TCP retransmission events, so a packet had to be retransmitted - is that because of the CPU load, is it because of a network policy drop? And so on.
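Building a histogram at the point of collection, instead of exporting every raw sample, can be sketched like this. It is a plain Python stand-in, not eBPF code and not how Hubble is implemented: the `Log2Histogram` class and the sample latencies are assumptions for illustration. The point is that only a small, fixed array of counters needs to be kept, however many events are observed.

```python
class Log2Histogram:
    """Fixed-size power-of-two histogram: memory use does not grow with events."""

    def __init__(self, num_buckets: int = 32):
        self.buckets = [0] * num_buckets

    @staticmethod
    def bucket_index(value: int) -> int:
        # 0 -> bucket 0, 1 -> bucket 1, 2..3 -> bucket 2, 4..7 -> bucket 3, ...
        return max(value, 0).bit_length()

    def observe(self, value: int) -> None:
        # Clamp very large values into the last bucket.
        idx = min(self.bucket_index(value), len(self.buckets) - 1)
        self.buckets[idx] += 1

    def total(self) -> int:
        return sum(self.buckets)


hist = Log2Histogram()
for latency_us in [0, 1, 2, 3, 4, 7, 8, 1000]:  # made-up latencies in microseconds
    hist.observe(latency_us)
```

A consumer then reads the handful of bucket counters periodically rather than receiving one message per packet or request.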
The next wave will be to actually build even more intelligence into the kernel with eBPF, where we actually identify problems. For example, we have stock exchanges using eBPF via Cilium, where they observe so called micro-bursts. So they actually want to understand is my application subjected to a small burst in data, or gaps, or so called TCP zero-window events? So very short, like microseconds, where the application is not receiving data. And these things are incredibly hard to observe via metrics, because you need a human to correlate and look at the graphic, and spot the problem. Computers are actually better at identifying a variety of these problems. That will be the next step. But this can typically not be done based on just a metric; you need to be very, very close to the source, because observing this is incredibly costly. So this is where eBPF comes in; it’s enabling exactly this.
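To make the micro-burst and gap idea concrete, here is a minimal sketch that works on a list of packet arrival times. It is an assumption-laden toy, not what Cilium or any exchange runs: the function names, the fixed 100-microsecond windows, and the thresholds are invented for the example, and a real detector would run next to the data source rather than on a recorded list.

```python
from collections import defaultdict


def find_microbursts(events, window_us, byte_threshold):
    """events: iterable of (timestamp_us, size_bytes).

    Returns the sorted start times of fixed windows whose total bytes
    exceed byte_threshold.
    """
    per_window = defaultdict(int)
    for ts, size in events:
        per_window[(ts // window_us) * window_us] += size
    return sorted(start for start, total in per_window.items() if total > byte_threshold)


def find_gaps(timestamps, max_gap_us):
    """Return (previous, next) timestamp pairs separated by more than max_gap_us."""
    ordered = sorted(timestamps)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > max_gap_us]


# Made-up arrivals: (timestamp in microseconds, bytes received).
events = [(0, 100), (10, 100), (150, 5000), (160, 6000), (400, 100)]
bursts = find_microbursts(events, window_us=100, byte_threshold=10_000)
gaps = find_gaps([ts for ts, _ in events], max_gap_us=200)
```

Averaged over a second, this traffic looks unremarkable; only at microsecond resolution do the burst in the window starting at 100 and the silence between 160 and 400 show up.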
So this is probably going to be the next level of observability that will be created with eBPF. We’ve done quite a bit of that already in our open source Tetragon project, and I think this space will evolve massively, where instead of having raw metrics, we actually have very intelligent sensors that give you a much higher-level signal of what is going on, what is going wrong, what could be the problem.
You mentioned Tetragon, and we mentioned it a few times… What is Tetragon?
Tetragon is I think all of our eBPF experience funneled into a runtime security and runtime observability project. So essentially, what we have done with Cilium on the network side, we’re doing with Tetragon on the runtime security side. So it’s an open source project that uses eBPF to give you a visibility that is primarily security-focused. So we can see, for example, which application is accessing a storage device, or which process or application is accessing a certain file, or when are capabilities escalated, for example; when is the process gaining CAP_SYS_ADMIN capabilities? Or when does it become root? What system calls is it making? What child processes is it invoking? And so on. And we can also enforce rules based on that visibility as well, and actually restrict what is allowed and what is not allowed to improve, for example, the isolation of a container runtime, or to monitor and enforce namespacing, so the container isolation boundaries, violations of that, and give that visibility, that enforcement at a very, very low cost.
Again, thanks to eBPF, all of this runs in the kernel, very close to where the actual action is going on, compared to prior solutions, which primarily used a very small kernel level probe or sensor that just exposes visibility to user space, and then does the intelligence in user space. So it’s moving more logic into the kernel, where it is more efficient, less costly, and when it is less costly, it means that our users are able to actually enforce better, finer-grained visibility or enforcement rules, because it costs them less. That, in a nutshell, is Tetragon.
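The difference between shipping every event to user space and deciding at the source can be illustrated with a small sketch. This is not Tetragon's API or rule format: the `Event` type, the `matches` rule, the sensitive path prefixes, and the sample events are all hypothetical, and the filtering here runs in ordinary Python rather than in the kernel.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    pid: int
    binary: str
    kind: str    # "file_open", "cap_change" or "exec" (hypothetical kinds)
    detail: str  # file path, capability name, or child binary


# Hypothetical rule: report sensitive file access and CAP_SYS_ADMIN gains.
SENSITIVE_PREFIXES = ("/etc/shadow", "/root/")


def matches(event: Event) -> bool:
    if event.kind == "cap_change" and event.detail == "CAP_SYS_ADMIN":
        return True
    if event.kind == "file_open" and event.detail.startswith(SENSITIVE_PREFIXES):
        return True
    return False


def filter_at_source(events):
    """Keep only events the rule cares about, before anything is exported."""
    return [e for e in events if matches(e)]


observed = [
    Event(101, "/usr/bin/cat", "file_open", "/tmp/notes.txt"),
    Event(102, "/usr/bin/cat", "file_open", "/etc/shadow"),
    Event(103, "/usr/bin/app", "exec", "/bin/sh"),
    Event(104, "/usr/bin/app", "cap_change", "CAP_SYS_ADMIN"),
]
reported = filter_at_source(observed)
```

Of the four observed events only two cross the boundary to the consumer, which is the cost saving being described: the less that has to be copied out and analysed elsewhere, the finer-grained the rules can afford to be.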
Hearing you talk about this, Thomas, I remember that there is a book that two of the Isovalent folks co-authored, I think… Do you remember the title?
Yeah, Security Observability with eBPF, I think it’s called… That Natália and Jed wrote.
How can our listeners go and get the book? Do they just go to O’Reilly and buy it? What does that look like?
So that one’s actually what O’Reilly called a report, which essentially means it’s not sold through kind of bookstores. You can’t get it from Amazon. You can get it through O’Reilly’s own learning platform. So if you’ve got an O’Reilly subscription, that’s one way to get it. You can also download it from the Isovalent website.
Okay. Link in the show notes.
Where you can also download my “What is eBPF?” report, which is–
I was gonna mention that next, there’s also “What is eBPF?” So if hearing us talk made you more curious and you want to dig into more, “What is eBPF?” by Liz, that is a great one. Signed copies… I don’t know what’s going to happen next. There was such a huge queue at KubeCon, people just waiting – hundreds, I think; hundreds of people just wanting to get a signed copy. Are you going to do that again, Liz? Signed copies?
Oh, yeah. Yeah. If people want –
When is the next time?
It’s certainly KubeCon Detroit. We might find another opportunity before then. I think maybe also Container Days in Hamburg is a potential opportunity. Yeah.
Is there like an eBPF-related conference? I think there is one. There’s like this eBPF day, but that is part of KubeCon. But there’s also another one, I think… Is it virtual?
The eBPF Summit, which is virtual, yes. So it’s going to continue to be virtual this year. It arose during the pandemic years when there wasn’t a choice, but it was so popular, so well-attended from people all around the world that doing it virtually - it enables a lot of people to participate. So the CFP, as we’re recording, is open. We’re seeing some really interesting submissions coming in. It’s going to be September the 28th and 29th, so block those dates in your diary. It’ll be two days… Short days. So we try and time it for evening in Europe, morning on the West Coast, so that as many people as possible can join us in their waking hours, with apologies to folks in Asia. And it’ll be four or five hours of jam-packed eBPF content. It was so much fun last year… Yeah, I really hope we can pull off – if we can pull off as equally fun this year, it’ll be excellent.
Are digitally signed copies an option?
Oh, of the book?
Like, is there a way, Liz, to get digitally-signed copies for “What is eBPF?” and then “Security observability with eBPF”, the two books?
We haven’t come up with a solution for that yet… I’m quite old-school about this. I feel like having something that you’ve written on with pen - it’s got a bit more of a physical, tangible feel to it than something digital… But yeah, maybe we need to come up with something, yeah…
Always the preferred option, for sure. But I’m thinking double-signed. You can get like a digital copy signed, maybe, and then you get like an actual one signed, and signed for real. So you can get both; you can enjoy one while you’re like on a train or on a plane, and you want to get the book out, and the other one when you’re at home, in the lounge, or out in the sun, and you can read an actual physical book, and go to an actual physical conference.
This is somehow reminding me of the excellent cert-manager folks at their booth at KubeCon. They were giving you physical certificates, so – in the same way they will generate you a certificate online, they would generate you a physical one… It’s like a little bit of card with a QR code on it - it was pretty nice - to verify that you have been at their booth.
Yup, that is a good one. That is a good one. So yeah, so physical books… I still need to get mine. I still need to get mine. Can I get one through Isovalent? Can I go and – or is it just like digital, and I can download it from there, the books.
Just digital from there, yeah.
Just digital, okay. And physical ones - I think at KubeCon, you mentioned.
For sure. At KubeCon, yeah.
Okay. Okay. We talked a lot about eBPF, the community… Maybe not so much the community as much as the projects, because there’s a lot of things… And we only covered a small portion. eBPF is a huge, huge space. So who else beyond Cilium and beyond Isovalent is in this eBPF community?
eBPF kind of sparked out of the Linux kernel community. So in the beginning, around 2014, the majority of the bigger kernel contributors were Google, Facebook, Netflix, Red Hat and so on. And then as eBPF started to evolve, we saw an entire ecosystem being built around it, from SDKs, and libraries that actually allow you to write eBPF code in higher-level languages, and then a set of end user projects. Cilium was one of them, Falco, BCC, bpftrace, Tetragon, Hubble, and so on. And all of that now makes the eBPF ecosystem overall, from kernel level – we even saw Microsoft port eBPF over to the Windows kernel in the last year… And when that started to happen, it started to make sense to think about who should be involved in a broader sense, outside of the Linux kernel community, which was kind of the governing structure for eBPF as the technology itself. So we have last year created the eBPF Foundation. It is part of the Linux Foundation. Founding members were Google, Facebook, Netflix, Microsoft and Isovalent. Since then we have gained a lot of additional members, including Red Hat, as well as a variety of different security vendors, and so on… And this is now essentially kind of forming the governance body, that comes together, and standardizes eBPF, and discusses security models, organizes some of the conferences, and so on.
So if you’re interested in actually engaging in eBPF, outside of just a purely code level contribution - which you can of course do completely independently, and many do - you can obviously engage through the Linux Foundation, via the eBPF Foundation as well.
So today I think it’s a well-established technology, with many really big industry players relying on it, not only from a product perspective, as Isovalent does, but also just using it as a core technology for infrastructure. And all of this shared care and attention is now centralized and managed through the eBPF Foundation.
So if as a listener I’m an open source enthusiast, I participate in Kubernetes, or I’m interested in Kubernetes… I’ve heard that eBPF is also huge outside of Kubernetes, so by the way, if you’re thinking this is just Kubernetes-specific - no, no, no; that is like maybe the easy mode that many go for. But there’s also hard mode. There’s definitely hard mode. So how can people get involved, get started with eBPF, the eBPF ecosystem, so that first they understand just how big it is, and what is possible?
I think the best way to get started is actually to attend an eBPF Summit, or to watch recordings from a prior eBPF Summit. It shows the breadth of the ecosystem, from talks by the eBPF maintainers on the security model or the verifier, to users talking about their story - why did they choose this particular eBPF program or project, what problems did they solve - to new upcoming eBPF projects being kind of first talked about, to research being done in eBPF, and so on. It shows the full breadth, and it also shows you a lot of points where you can get involved on whatever level that is, whether this is “I want to start getting involved in the project on the documentation or code level”, or “I just want to be part of it and try it out and learn with others together” - the eBPF Summit is a great way of getting involved there.
We also have an eBPF Slack, with thousands and thousands of eBPF folks that want to collaborate together, on all sorts of different levels, like from code level, deep down to “I’m a Cilium user”, “I’m a Falco user”, “I’m a Pixie user”, you will find that on eBPF.io. That’s the community eBPF site; it has a Slack link, as well as a link to the recordings of prior conferences, whether it’s eBPF summit, which is more a higher level one, all the way down to the BPF developer conferences, where the lower-level details are being discussed.
I just have it open right now… I’m looking at ebpf.io, and it’s a really nice website. I don’t know who built it, I’m just gonna have to go scroll to the bottom… No, it doesn’t say that. Who’s behind it? Because this is like “What is eBPF?” Project Landscape, right there at the beginning, a nice diagram… Just the right amount of text that makes good progress… This is really good. Who’s involved with this? Do you know?
We have sparked the idea and we are maintaining it together with the eBPF ecosystem; it’s all the contributors to eBPF, together with all the different folks that have been involved in eBPF since the early days, from Daniel Borkmann that Liz mentioned, one of the co-maintainers, Alexei, Brendan Gregg, of course, more and more employees from our side… So it’s a collaborative effort across all of the eBPF community.
There’s a similar one, eBPF.foundation, and I’m thinking about Isaac Asimov (I have to) whenever I see Foundation… So like keeping it in the science fiction theme… And there is like an even wider view, slightly different, but still similar… eBPF Summit 2021, videos are available, Watch now… Daniel Borkmann - that’s like a video right there… Okay, this is really good. I mean, just skimming it, I already know what it is and what the options are and where I want to go next. So this is great. Apart from the website - you mentioned Slack… Are there community weeks, or community hours that are being run? I know there’s also like e – eCHO… Is it eCHO, Liz? Tell us about that.
Yes. So eCHO very loosely stands for the eBPF and Cilium office hours…
That’s a good one. That’s a good one.
Yeah. [laughs] It’s a weekly livestream that I host, along with Duffie Cooley, who many people will know from TGIK. We were very inspired by that livestream. So the idea with eCHO is that we’ll explore anything related to eBPF or Cilium. And we had some amazing guests showing off what they’ve been doing in eBPF projects… We’ve had some really interesting demos of tools, demos of things you can do with Cilium, walkthroughs of different tutorials… Duffie did a really interesting one the other week about the life of a packet in Cilium, which was really great. So yeah, we cover tons of different topics. We’d love people to come and join us when it’s live, and ask questions, because the kind of community aspect of that keeps it really, really fun.
So I really like this community aspect, I really like that there’s a lot of activity around it, there’s like whole summits, huge names are part of the foundation, amazing contributions from everywhere… I don’t know what – is eBPF a utility? It’s not really a utility. How would you call it? What is it? I’m trying to find a word that describes it, because it’s everywhere. And I don’t wanna call it air, because it’s not air, but it’s like air.
I quite often call it a technology platform. I don’t know how accurate that is.
It’s a – maybe the highest level one is just it’s a programming language for the operating system.
Yeah. So from an open source perspective, I see a very healthy ecosystem. From a business perspective, there’s the big names, obviously; there’s also like the smaller companies that you can go to, and one of them is Isovalent. And Isovalent - I mean, you’ve been involved with eBPF since before Isovalent. So if I have a business, and if I’m depending on eBPF, what would prompt me or direct me towards Isovalent? What is the value behind engaging with Isovalent? And by the way, if as a listener you’re still like “No, no, no, this is not an ad. I’m genuinely curious”, and just skip a few minutes, it’s okay.
Before we even started the company, we created Cilium, and we saw just the huge potential of the technology itself, there was this huge urge, “We want to create something amazing with eBPF.” And maybe different to how eBPF was used so far. Actually create something that is usable for the masses, which is why we created Cilium. Like, okay, we want to bring eBPF to the cloud-native world and bring all of its powers to end users. Because eBPF itself, it’s really on the programming language level; it’s an assembly, bytecode language; you need to be almost - not quite, but almost a kernel level [unintelligible 01:05:44.13] to consume and use eBPF directly… Which is doable for companies like Facebook, and Google, and Netflix, and LinkedIn. They’re really big names out there with their own kernel teams. That’s not typically the case for your standard enterprise, or your SMB. It’s a low-level, deep low-level technology.
So the reason we created Isovalent when we founded Isovalent is to bring and help enterprises get to the value of eBPF via Cilium. So we have products based on Cilium that solve a bit more than what Cilium OSS can do, focusing really on the enterprise, maybe compliance-specific use cases. So you can of course completely run Cilium OSS on your own, and as you dive into the more very enterprise-specific use cases, with the very concrete compliance requirements, that’s typically when you start looking into our enterprise distribution.
Of course, there’s also a support angle, so if you are happy with Cilium, you’ll start relying on Cilium, you’re building your applications on top of Cilium and eBPF… You want to be able to call a company for support if something goes wrong.
That is the big one, that we keep coming back to. Like, when you have a problem, who are you going to call? The Ghostbusters? You will need Ghostbusters to go down in the kernel and really understand this eBPF thing. And sure, the forums are there, the community is there, but can you afford to do that? I mean, most of you maybe can, but I know that some of you will need this. So it’s there for when you need it, and when you need it, you will know it. So no one will need to sell you on it, but it’s there.
Yes. I believe it’s quite simple… If you build great products, customers will love the products and they will be glad to pay for them. And eBPF allows us to build amazing products. So that’s really all that we focus on, building great, amazing products. We’ve always believed that the rest is coming, and we have had more than enough success based on what we’ve built so far. So we’re not worried at all on the business model side in terms of what eBPF and our open source ecosystem allows us to do.
I know that some of the listeners, some of our listeners may be thinking, “Oh, I wish I could work with Liz. I wish I could work with Duffie.” How could they do that? And this is not a leading question. We didn’t talk about this; like, in the spur of the moment… Because genuinely, if you like eBPF, and if you’ve been maybe a contributor, or have been close to the ecosystem - how can you get closer to Isovalent, Cilium and just work on it full-time? What does that look like?
We have many, many, many openings right now, from Go software engineers, eBPF engineers… So you don’t need to know eBPF right now; even if you’re interested in eBPF and maybe you have some Golang knowledge, you have some Kubernetes knowledge, some security knowledge, feel free to check out isovalent.com. We have a careers page, with many offerings in engineering, marketing, solution architecture, community roles… We’re growing pretty quickly right now, so if you’re interested, have a look. There might be an opening that is interesting for you.
Yeah, that’s a good one. Okay. Thank you for that, Thomas. As we prepare to wrap up, I’m going to ask a different question than I normally ask. I normally ask about the key takeaway, but I think we had so many, starting with eBPF is everywhere, and you don’t even know it… I’m sure eBPF is somewhere in the path of us recording this episode… There must be somewhere, some eBPF running, and people don’t even know it. It’s ubiquitous at this point. What do you have coming up? It’s summer, the holiday is coming up, the summits… I mean, I know that Liz enjoyed some time off. Thank you, Liz, for sharing so many great pictures on Twitter. I remember one, like, very nice blue water… What else do you have happening this summer, and as we go into autumn?
Well personally, the next two weeks I’m going to be doing jury service, which is going to be a bit of an eye-opener, I think. So that’s definitely very different from my day job; it could be an interesting insight into the criminal justice system… And yeah, then when I come back, I’ll be back for a bit, then I’ve got a couple of weeks of proper vacation, and then we come back into the autumn, queued up for eBPF Summit and the autumn conference season. Things like Open Source Summit, obviously KubeCon… It’s always busy in that autumn period.
And before you know it it’s Christmas. So maybe I ask you about your Christmas present closer to Christmas, because it’s weird; it’s summer right now, but I know time just flies. What about you, Thomas? What is coming up for you this summer and autumn?
I’m definitely already looking forward to kind of the holidays around Christmas, because I’ve seen the Christmas present that we’ll be giving out to our own employees, and it’s going to be amazing. Short-term, I’m looking forward to spending time in the mountains. I love nature. It’s the reason why even though I’ve always worked for American companies in my entire career, I’ve never left Switzerland. It’s the mountains. I cannot do skiing – or technically I could, but I’m not going to do skiing, but like hiking, trail running, spending time with the family in the mountains. I’m looking forward to that. I think the Alps, Switzerland is just amazing for this.
Wow. Alright, you’ve just touched a very soft spot… But we’ll leave that for another conversation. And that’s exactly the way you’re supposed to plan. You pick your Christmas presents in summer. And you plan your summer holidays in winter. That’s exactly how it’s supposed to work. If you’re organized and you know what you want… I also subscribe to that idea. So I also know what’s happening for Christmas, including the holidays and everything. So yeah, some of us just are wired that way.
Anything else that you’d like us to cover? There’s so many follow-up questions, I just have to contain myself, and I was containing myself as much as I could, but anything else that we didn’t mention, that you want to mention?
I think we’ve pretty much covered everything. If people do have more questions, eBPF Summit is not going to be that far away, and that will be an amazing forum to have some questions. Also, just joining the Slack channel, where Thomas is there, I’m there, but more importantly, our whole community of thousands of people who are interested in eBPF are there, and really helpful. There’s a really good spirit on that Slack channel.
So the one thing, the last thing which I want to mention is this hexagon-shaped neon on Liz’s wall. There’s a screenshot in the show notes… I’m just fascinated by it. It looks amazing, and I want one. Can you imagine how great it would look on that blank wall? On those foam tiles? Okay, I think – okay, Liz, I will talk… Oh, actually, no; can you tell us as we record, so that listeners can know where to get one?
Yeah, so that was actually my birthday present from my husband, but I know where it came from. It’s called Twinkly LED. And they have a variety of different kind of light formats that you can program. It will appeal to our audience here, I think, because you can lay out – I’ve done a hexagon… You can lay it out in any format, and then you scan the lights with your phone using an app, and then you can kind of program the color scheme to match the layout. It’s extremely cool. I love it.
We need a referral link. I’m going to tweet that referral link, and use it.
I will find the link and I will post it, yes.
And we need one for Thomas, too. Maybe. If Thomas wants some LED lights on his walls. But that’s great… Thank you, Liz. Thank you, Thomas. I had a lot of fun. Looking forward to eBPF Summit and seeing you in autumn.
Thanks so much for having us.
Our transcripts are open source on GitHub. Improvements are welcome. 💚