This is the post-KubeCon CloudNativeCon EU 2022 week. Gerhard is talking to Matt Moore, founder & CTO of Chainguard, about all things Knative and Sigstore.
The most important topic is swag, because no one has better stickers than Chainguard.
The other topic is the equivalent of Let's Encrypt for securing software.
Matt Moore: [20:08] And in particular, there's sort of two classes of apps. There's apps that ignore signals and they stick around for what's called the termination grace period, which is between the SIGTERM and the SIGKILL, which luckily defaults to like 30 seconds. So it's not sitting around forever.
And then the other class of people are the people who - well, there's a third, which is they do it properly, but that's like super-niche. The second big category of people are like, "Okay, well, I'm going to do signal handling. When I get SIGTERM, I'm just going to quit", right? And that's actually not what you want to do, right? You want to handle SIGTERM by starting to fail readiness probes, but all your normal requests will be handled properly, because it takes time from when you start to fail readiness probes until your pod is marked not ready. That's the failure threshold on the readiness probe. And then once your pod's marked not ready, that has to roll out to all of the network programming, right? So your pod's endpoint has to be removed from the endpoints on the API server. So the service controller has to see that your pod's not ready, remove it from endpoints… But you're not done there, right? Those endpoints then have to be propagated, in vanilla Kubernetes, to all of the nodes, which have to reprogram their IP tables, or if you're in mesh mode, every single pod sidecar now needs to know that like, okay, that endpoint is no longer available, right?
So in some cases and some scales of clusters, I don't think that 30 seconds is even necessarily long enough. But the reason I bring it up is we did a whole bunch of magic in Knative, since we know it's an HTTP-based service, to make it so that it is really hard to get that wrong. Because it's really hard to get it right in vanilla Kubernetes, but it's actually really, really hard to get that wrong in Knative.
One of the things we do is we have a pre-stop hook where we do something somewhat magical: the pre-stop hook is on one container, but the place it gets sent is the other container. So we have a proxy that sits in front of the application container, and when Kubernetes is going to go stop the pod, instead of actually sending any signal to the user container, it sends it to our sidecar first, and our sidecar starts to fail probes, and do it properly, so that you don't have to.
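As a rough illustration of that trick, a pod spec along these lines would put the pre-stop hook on the sidecar rather than the application container. The container names, image names, port, and path below are invented for the sketch; they are not Knative's actual values.

```yaml
containers:
  - name: user-container      # the application; gets no early warning
    image: example.com/app
  - name: queue-proxy         # sidecar proxying traffic to the app
    image: example.com/proxy
    lifecycle:
      preStop:
        httpGet:              # on pod deletion, Kubernetes hits the sidecar
          port: 8022          # first; the sidecar fails probes and waits
          path: /wait-for-drain
```

Because Kubernetes blocks the container's SIGTERM until its preStop hook returns, the sidecar gets to drain traffic before the application ever sees the signal.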
So if you're in the first camp of folks who doesn't really handle the signaling at all and just continues to serve traffic normally, you will still drain properly, because our - what we call the queue-proxy - will actually handle that for you. And if you're in the second camp, where you just do what I call a YOLO exit, you're like, "I've got the SIGTERM. I'm out", you're still good as well, because since we have that pre-stop hook, we get the signal first, we make sure traffic has drained, and then by the time you're actually getting that signal, traffic's been routed away from that instance of your application. And so it's really, really hard, actually, within the context of Knative, to handle that wrong. And I think that's a really important thing to get right if you're using any sort of auto-scaled application, because when you scale up, there's a window where the new pod's coming up, and if it reports ready before it's really ready, you're in trouble; you're going to serve 500s. And when you're scaling down, if traffic continues to go to those pods after they've started to shut down, you're going to get 500s, right? So the goal is zero 500s, and we have all kinds of tests in Knative where we're like, "No, there should be zero 500s."
[24:05] The other thing that we do that is really hard is - and the networking layers make this incredibly hard to do, and we work around all kinds of stuff in basically every Ingress provider - is ready means ready, right? Everyone at the networking level is like, "Yeah, it's eventually consistent. It'll get there at some point." But it's like, no, if we roll out a new revision, we want to know, when we tell the user like, "Yeah, yeah, you've got your new code", that we're not lying, right? And so Knative does all of this fun stuff where we actually inject hashes of the network programming into the network configuration, in ways that elements of our data path will respond with the header that's being injected by the network programming, and then the components we have can actually probe different things to understand what version of the network programming has been rolled out. And then once it's been rolled out everywhere - we can't do this in mesh mode, because we can't probe mesh sidecars, but we do this for probing the pool of envoys if you're running outside of mesh mode. So for instance, traffic serving off cluster, we can probe and make sure that once we fully roll things out and we say it's rolled out, you should never get the old version. It is at the new version, because we've confirmed all the network programming is there.