Rob Barnes (a.k.a. Devops Rob) and Rosemary Wang (author of Infrastructure as Code - Patterns & Practices) are joining us today to talk about infrastructure secrets.
What do Rosemary and Rob think about committing encrypted secrets into a repository? How do they suggest that we improve on storing secrets in LastPass? And if we were to choose HashiCorp Vault, what do we need to know?
Thank you Thomas Eckert for the intro. Thank you Nabeel Sulieman (ep. 46) & Kelsey Hightower (ep. 44) for your gentle nudges towards improving our infra secrets management.
Featuring
Sponsors
FireHydrant – The reliability platform for every developer. Incidents impact everyone, not just SREs. FireHydrant gives teams the tools to maintain service catalogs, respond to incidents, communicate through status pages, and learn with retrospectives. Small teams up to 10 people can get started for free with all FireHydrant features included. No credit card required to sign up. Learn more at firehydrant.io
MongoDB – An integrated suite of cloud database and services — They have a FREE forever tier, so you can prove to yourself and to your team that they have everything you need. Check it out today at mongodb.com/changelog
Chronosphere – Chronosphere is the observability platform for cloud-native teams operating at scale. When it comes to observability, teams need a reliable, scalable, and efficient solution so they can know about issues well before their customers do. Teams choose Chronosphere to help them move faster than the competition. Learn more and get a demo at chronosphere.io.
Sentry – Working code means happy customers. That’s exactly why teams choose Sentry. From error tracking to performance monitoring, Sentry helps teams see what actually matters, resolve problems quicker, and learn continuously about their applications - from the frontend to the backend. Use the code SHIPIT and get the team plan free for three months.
Notes & Links
- Bitnami Sealed Secrets
- age CLI
- Mozilla SOPS
- Experimental LastPass provider for Kubernetes Secrets Store
- HashiCorp Vault
- 📖 Infrastructure as Code, Patterns and Practices - Rosemary Wang, July 2022
- 🎬 Cloud Identity with HashiCorp Vault - Rob Barnes, DevOps Exchange London, March 2022
- 🎬 Developing a Secrets Engine for HashiCorp Vault - Rosemary Wang, August 2021
Transcript
Hi, Rosemary. Welcome to Ship It!
Hello!
Hey, hey! How’s it going?
It’s going well. It’s going really well, actually. Today was a very weird day weather-wise. It’s almost summer, but not quite… And I know that you’re in the U.K. as well, Rob. Rosemary, is it summer where you are? Is summer happening?
It’s sunny, it’s summer, I have the air conditioning on… I had to turn it off, but yeah, it’s warm enough.
That’s amazing. The only thing that hasn’t happened today is snow, and I hope it stays that way; it rained, it was windy… All sorts of things. Cold… Mostly warm – anyway, it was a mix of things. So things are good. Whenever there’s some sun, you have to be on your sun lounger, and Rob knows a thing or two about that, because in the UK we don’t get a lot of it; so when the sun is out, sun’s out buns out, that’s the saying. I’m not saying that; that’s the saying.
Barbecues…
Barbecues, that as well.
Summer is a myth in the U.K. It’s one of those things… It’s almost like a - maybe a unicorn is a good phrase; it kind of doesn’t exist. But make-believe…
Exactly. It’s behind the cloud somewhere. So by the way, just a correction: I said buns, not bums. Buns, okay…?
[laughs]
Moving on… For the barbecue. Buns for the barbecue, that’s what I meant. So I would like to start by thanking Thomas Eckert, one of our listeners, for the intro, because he’s the one that made this happen. So thank you, Thomas. Shout-out to you. And he’s been hearing us talk about secrets, and how we use secrets in the context of the Changelog infrastructure, and I’ve been meaning to do something about this for actually a few years. So how do we store and distribute credentials for the changelog.com setup? And it kept coming up in episode 44, the one with Kelsey, and episode 50 with Adam and Jerod… So this is the episode where we get to dig into it.
[04:24] Okay, so we may have been doing secrets wrong… We don’t know, but we’ll find out today. So I’ll get straight into it: I’ll tell you what we do today, and you tell me what would be a good next step. So today, all our secrets are stored in LastPass. Credentials, Twitter, GitHub, client API - all those things that we use in our setup to integrate with all the services, they’re stored in LastPass. We currently run our setup on fly.io, which is - think of it like a PaaS. And there’s a command that we run locally, on my machine (ironic), to synchronize the secrets between LastPass and fly.io. And when we were running on Kubernetes, I was doing the same thing: running a command that got all the secrets from LastPass and put them into Kubernetes, in that case. So how do we improve on this? I’m not sure who wants to start, but I know that we can improve on this, and I’m wondering how.
I don’t mind kicking off. So I think what you have at the moment is centralized secrets management, right? So that was always gonna be the first step. What you also have in that centralized secrets management is long-lived credentials. Am I right? Essentially, you have API keys, and so on and so forth.
Yes.
They just exist, so you go and manually rotate them at the target platform and then update LastPass with that, right?
Exactly.
Okay, cool. So at the moment, you’re at the point where you have reduced your attack surface, because you now don’t have the credentials in your actual workloads. They’re coming from some kind of secrets management platform. But where you can improve that is dependent on what the secrets are. So if they are credentials for like a cloud platform, or a database or something like that, we don’t actually need to have long-lived credentials for those. In fact, what we can do is we can have short-lived, on-demand credentials. So actually, the credentials don’t exist until the application needs them. When it needs them, it says, “Hey, can I have some credentials?” and it will go ahead and work with those credentials; the workload performs its function, and after a TTL has expired, those credentials are revoked from the target platform. And all of this happens automatically in the background, right?
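To make that lifecycle concrete, here is a minimal Python sketch of a hypothetical issuer that creates a credential only when asked and revokes it once its TTL elapses. `DynamicCredentialIssuer`, `Lease`, and the in-memory `active` map are illustrative stand-ins, not Vault's API; a real platform would create and delete the user on the target system at the marked points.

```python
import secrets
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Lease:
    """A short-lived credential and the moment it stops being valid."""
    username: str
    password: str
    expires_at: float


@dataclass
class DynamicCredentialIssuer:
    """Hypothetical issuer: a credential exists only from issue() until its TTL ends.

    `clock` is injectable so expiry can be exercised without sleeping.
    """
    ttl_seconds: float
    clock: Callable[[], float] = time.monotonic
    active: Dict[str, Lease] = field(default_factory=dict)

    def issue(self, role: str) -> Lease:
        # Create a brand-new credential on demand; nothing exists beforehand.
        username = f"{role}-{secrets.token_hex(4)}"
        lease = Lease(username, secrets.token_urlsafe(24), self.clock() + self.ttl_seconds)
        # A real platform would create this user on the target system here.
        self.active[username] = lease
        return lease

    def revoke_expired(self) -> List[str]:
        # Revoke every credential whose TTL has elapsed.
        now = self.clock()
        expired = [name for name, lease in self.active.items() if lease.expires_at <= now]
        for name in expired:
            # A real platform would delete this user on the target system here.
            del self.active[name]
        return expired
```

In Vault, this role is played by dynamic secrets engines such as the database secrets engine, which attach a lease with a TTL to every credential they generate.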
So when we talk about people going on their secrets management journey, people may hear of things like ephemeral secrets, and think “Oh, that’s what we need to get to.” Actually, it’s just another step. The first thing you want to do is exactly what you’ve done. Take the step of centralizing your secrets management, take the step of ensuring that all of your long-lived credentials are in one place, and then, bit by bit, you can start to say, “Okay, we don’t want this one long-lived anymore. How do we start to do that?”
Now, in terms of how you can implement ephemeral secrets - well, that would depend on your secrets management platform, whether it supports that or not. Generally speaking, that’s where I would say the next step is if you want to increase your security posture. Is there anything you’d add to that, Rosemary?
Yeah, I was going to say, it’s pretty good… [laughter] Given you’ve already started with centralized secrets management, I will say, there are a lot of different patterns you could entertain. That’s actually pretty good for one person taking something down and pushing things out.
One question I wanted to clarify was what happens if one of those API tokens expires right now? What does your workflow look like? How would you update that in Kubernetes, or one of your target platforms?
So I don’t know about a secret expiring. But what happened… The secret was – I’m not sure whether it was compromised, but we had to rotate it. So it was a forced secret rotation. So the way we did it is we updated – actually, we created a new credential. We updated the one in LastPass, we ran the command manually to update it on the target platform, and then we had to restart the workload, because it wouldn’t automatically pick up that rotation.
[08:23] And actually, if you look in our repo, there is a make target for how to rotate secrets, and it will give you seven or eight step-by-step instructions for what you do to actually rotate the secret. Now, that only works on Kubernetes. I haven’t updated it for fly.io; it’s fairly similar, because the concepts are the same… But that’s something which I would like to improve. I would like to not be – like, a human shouldn’t really be part of that. I think, if a secret gets updated, there should be a system that notices, “Oh, there’s an update to the secrets” and it rolls everything through. Is that a step too far? What do you think, Rosemary?
No, it is not a step too far. I mean, I think we all want that, right? Because it’s difficult to maintain secrets manually in this manner. I can definitely empathize with it, because there have been times where we’ve had to rotate a secret or a secret’s been compromised. And we had to manually push this and restart every pipeline that the secret was associated with. It was very hard.
I think the problem right now and some of the challenges you’re encountering have to do partly with manual referencing, manual replacement… Revocation is done manually as well… And so all of this adds friction to the process, right? Really, you just want to develop – you couldn’t care less about the secrets. In fact, it’s probably better you don’t know about the secrets, right, Rob?
In this case, actually it’s Jerod that cares and knows about the secrets, and he says like “How do I do this?” So I had to add that make target for him to basically capture what he needs to do. And you’re right, in this case he wouldn’t need to care about that. He would just need to know, “Okay, I maybe trigger something somewhere, and it automatically flows through.”
But rather than basically encapsulating these steps in a pipeline, I’m wondering if there’s something better that we can do; if there’s a system that can do this for us, which has been built and optimized for rotating secrets. That’s what I’m wondering. It’s not a leading question, because I don’t know if it exists. I haven’t checked Vault. I really haven’t. I know it’s great, but I’m wondering if it can do that, so you tell me if it can or it can’t.
Yeah, well I think you hit the nail on the head. So just in case anyone doesn’t know, Rosemary and I work for a company called HashiCorp. You may have heard of us. I’m one of the–
Just Vault. I’ve only heard of Vault. [laughs]
He’s only heard of Vault. He probably hasn’t heard of Terraform. Imagine that… [laughter]
Oh, that’s a great story about – okay, let’s leave it for later. Sorry… [laughter]
So I think that one of the things that – so I’ve been working with Vault for many years, and how I got into Vault was every time – so I used to be a consultant, I used to help my customers increase their security posture when they moved to the cloud, and just extract more value from the cloud. And with a lot of the challenges that my customers used to come up against from a security perspective, I’d always kind of look out there and see which solutions tick all the boxes to deliver the business value they need. And Vault - it just kept coming up. So I thought, “Hey, I’d better really dig into this and understand why it keeps ticking the boxes, and understand the inner workings of it.”
So when we talk about taking away some of these manual steps - and yeah, what you’ve done is you’ve taken the approach that a true DevOps professional should take. So instead of trying to just build automation, you kind of automate the little pieces, and then you automate the automation, right? And that’s kind of the approach you’ve taken. And I think that’s good for a lot of things. But where it comes to security, I think we need to start to put systems in place to protect ourselves from ourselves, right? And this is where things like Vault can help. Because when we talk about the automation, rather than us having to write scripts, the automation is built into the functionality of the secrets management platform. In this case, we are talking about Vault.
[12:19] So we briefly discussed the concept of ephemeral secrets… That’s one form of automation. So it will create the secret and it will revoke it after a TTL has expired, which is going in and making the API call to the target platform, and doing it again to revoke it, and so on and so forth.
You also have things like – okay, so you have your secrets now in Vault, in this case. How do your secrets get from Vault to your application? And there are different patterns there. The common pattern, the favorite pattern, I’d say, in my opinion, is one where your application doesn’t even need to know about the secrets management platform, right? It doesn’t need to know about that. It just needs to know that I need to look somewhere to get the information I need. Maybe it’s a mounted volume, or something like that, whatever.
Environment variable… I’m pretty sure everyone uses environment variables these days, right? I remember writing the functionality in the app itself to first check if there is a mounted secrets path, like the volume. And if not, just fall back to an environment variable.
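A minimal sketch of that file-first, environment-second lookup, assuming one file per secret in a directory such as `/run/secrets` (a common Docker convention, used here only as an assumed default) and an environment variable with the same name as the fallback. `read_secret` is a hypothetical helper, not part of any library.

```python
import os
from pathlib import Path
from typing import Optional


def read_secret(name: str, secrets_dir: str = "/run/secrets") -> Optional[str]:
    """Return secret `name`, preferring a mounted file over an environment variable.

    Assumes one file per secret (e.g. /run/secrets/DB_PASSWORD); returns None
    when neither the file nor the environment variable exists.
    """
    path = Path(secrets_dir) / name
    if path.is_file():
        # Many tools write a trailing newline into secret files; drop it.
        return path.read_text().rstrip("\n")
    # Fallback: an environment variable with the same name.
    return os.environ.get(name)
```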
Exactly.
I think there’s a topic right there… Whether one is better than the other. And I think we already know the answer to this. So why is it so hard to use mounted volumes versus environment variables? Everyone goes to environment variables. Mounted volumes - a pain in the ass for a lot of people. I don’t know why.
I mean, even when I’m writing integrations, and some of these integrations need secrets, rather than just putting in an argument for you to enter a secret, I’d always read from environment variables. I think a lot of people don’t understand the attack surface around environment variables. I myself only really recently started to gain a better understanding of it. But essentially, the process that has that environment variable - any kind of child processes that it kicks off have access to the same environment variable. So this can leave you open to rogue processes - or sub-processes in this case - having access to that information and doing whatever they are designed to do with it.
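The inheritance Rob mentions is easy to demonstrate. This Python sketch, using a made-up `API_TOKEN`, starts a child interpreter twice: once with the default inherited environment, and once with that variable stripped from the environment passed to the child.

```python
import os
import subprocess
import sys

# Hypothetical secret placed in the parent's environment.
os.environ["API_TOKEN"] = "s3cr3t"

# The child just prints what it can see of API_TOKEN.
CHILD = "import os; print(os.environ.get('API_TOKEN', '<unset>'))"


def child_sees_token(env=None) -> str:
    """Run CHILD with the given environment (None means inherit the parent's)."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD], env=env, capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


inherited = child_sees_token()  # default: the child inherits API_TOKEN
scrubbed = child_sees_token({k: v for k, v in os.environ.items() if k != "API_TOKEN"})
print(inherited, scrubbed)  # prints "s3cr3t <unset>"
```

Passing an explicit, minimal `env` to child processes narrows the exposure, but it does not remove it: the parent process itself still holds the value.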
So when we talk about this thing, about mounted volumes versus environment variables, this is kind of, in my opinion, where the debate exists, is because people know about some of these risks, and they say, “Well, what’s a better way of doing it?” and then someone will come along and say mounted volumes. Obviously, we know that that’s got its own associated risks as well. And I think the biggest lesson I’ve learned throughout my career is when it comes to security, nothing is ever absolute. It’s always about striking a balance that the individual organization is comfortable with, and making the decision that supports the strategy and security posture of that organization.
So should you use environment variables? Should you use mounted volumes? I really don’t know the answer to that. The real question is how your organization feels about both, and why. But from Vault’s perspective, it doesn’t really matter. If your application wants to read from environment variables - well, it can render secrets to environment variables, and your application doesn’t need to know about Vault; if it wants to read from a mounted volume, Vault can write them to a file there. And that’s one of the beautiful things, because you have this thing where - okay, great, you’ve got secrets management, you start to onboard your long-lived secrets, and then you start to turn them into ephemeral secrets, right? But it’s not really valuable to your organization unless your applications can start to use it, right? So there’s this whole thing - how do you onboard your applications to this process, this new way of working, right?
[15:59] This is sort of the in-between day one and day two challenges of adoption. And essentially, I would always recommend to organizations to choose the path of least resistance. So if you don’t have to refactor your application code, that to me sounds like a pretty sweet deal. As a developer, I don’t want to have to sit there and refactor my application to do all these things. It’s only in the extreme scenarios where, okay, we have to make our application Vault-aware in this instance.
So when we talk about “Are you doing things the right way?” and why you would use something like Vault, it’s because a lot of the automating of the small pieces, and the automating of the automation, is done for you. You just have to tell it what pieces of automation you want to utilize.
How do you reckon I did on that one, Rosemary? I’m quite impressed with that answer, actually. I quite like that.
Yeah, it was a good answer. I also – just to add a little more color, I also think it’s worth pointing out historically the way we’ve developed and the way we thought about the development lifecycle for software involved environment variables, right? If you’re doing testing, very rarely does a framework tell you - unless you’re doing like something like .NET, Spring, where there’s a config.properties, or some kind of file-based configuration approach… Very rarely do we actually test in that manner; most of the time we’re injecting environment variables, and then from there, that was how we’ve always done it.
I don’t think people started understanding a file-based approach or a volume-based approach until Kubernetes and containers came along, and you had to find some way to inject the information… And of course, environment variables were an easy way to do it, partly because that’s how we test. And Rob is absolutely right, there is a path of least resistance to this. If you can avoid an application refactor, you’re going to do it. Unfortunately, when you move to the more, I would say, dynamic secrets or ephemeral secrets route, you still have to do a little refactor. But if you can add, from a software perspective, a layer of abstraction between whatever your secrets manager is, Vault, and your application, then your application can do whatever it needs to do, function the way it’s always functioned, whether it be environment variables, or a file-based approach. But some frameworks do support a file-based approach, which is nice.
Yeah.
I just wanted to kind of take it back to basics, just to kind of paint the context here. So when we talk about these credentials… So in your secrets management platform, you’re storing API keys, you’re storing database credentials, whatever it is, right? What do these credentials represent? And that’s the thing that we always need to get our head around - they represent an application identity. So if you have a specific service that needs to read and write data from a certain table or something in a database - well, the credentials that are used to authenticate to that database there represent that application’s identity. This is the payments application, for example.
So when you start to share that credential with other services that maybe need to read and write – I’m no database designer or anything like that, so maybe this is a bad pattern in terms of database design… But if you have another application that needs to read and write data, maybe to a different table within that database, it’s using the same application identity as the other one, right? So you’re sharing identity. That, in my opinion, is an anti-pattern, right? I think you’re always going to want to have the applications have an identity of their own.
A lot of the major cloud providers as well are putting in features that allow you to assign identities to workloads, because they understand this principle here. And essentially, how you implement your identity and access management strategy is always going to be using sort of the principle of least privilege. I just gave this example, the payments application needs to read and write data from a specific table in this database here. So that’s the exact permissions it should have. Nothing more, nothing less. If something else needs to do something with a different set of tables, then whatever permissions are required to perform that function, that’s what it should have. Nothing more, nothing less.
So we come back to the core construct: the application and its identity. How do we attest that identity? And then we assign the identity once we can make that attestation.
I really like the way you think about this, both of you, but there’s something it triggered – I think it was Rob that mentioned this… The identity of applications, that was it. That construct, that phrasing made me realize that whenever I create a credential, I do two things. I capture what that credential is for - and it’s that identity. Is it me? Is it the application? Is it someone else on our team? Who is it that that credential is for? And the second thing is the date when that credential was created. The date is there, so it tells me how long this credential has been in use. And sometimes you can’t expire the credentials, sometimes you’re forced to expire them; like, let’s say, six months is the longest that you can keep certain credentials active for… And I think that’s a good approach, but also very annoying, for very many reasons. And all these struggles come from the fact that the way we interact with credentials feels very static, feels very manual, like there is a person of trust that will do this… There’s always like this human element. And even I was thinking about it the same way - there’s a person that will create the credential for you, and you’ll get the credential, and you will use it. And this manual process creates so many frictions and so many complications…
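One way to capture the two pieces of metadata described above, who a credential is for and when it was created, is a small record that also flags when a credential has outlived a maximum age. The `CredentialRecord` type, the identity-plus-date naming scheme, and the roughly six-month default are illustrative assumptions, not something from the changelog.com setup.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class CredentialRecord:
    """Metadata kept next to a credential: who it is for and when it was made."""
    identity: str  # an application, a person, or a team member
    created: date
    max_age: timedelta = timedelta(days=182)  # assumption: roughly six months

    def label(self) -> str:
        # Encode identity and creation date in the credential's name.
        return f"{self.identity}-{self.created.isoformat()}"

    def is_due_for_rotation(self, today: date) -> bool:
        # Due once the credential has been in use for at least max_age.
        return today - self.created >= self.max_age
```

A record like this still relies on someone acting on the flag; the ephemeral-credential approach discussed earlier removes that human step altogether.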
I think ticketing systems were built for this. “Can I please get a password for a database?” So how do you ask that question? Well, you need a ticketing system. And then… Maybe. I don’t know. Crazy idea. But the point is that there’s this identity, and I really like that idea. And the date in my case was just like the ephemeral nature of secrets; and they should be ephemeral, you’re right. They should be rotated.
But there’s something else that Rosemary said around environment variables and those volumes which get mounted, which made me realize that the applications that we write today are expected to read the secret when they boot, whether it is from the environment or from a file, and then that’s it; they don’t read it again. So how could we tell our application, notify it, “Hey, the secret changed. Please re-read it and then continue running as you were”? Is there a way, can you imagine a way that we can tell applications to do that? An environment variable can’t be changed from outside a running process, so that’s not going to work. A volume - I’m not so sure about it. I don’t know; maybe, maybe not. But is there another way that applications can integrate with something like Vault, so that when the secret changes, they get notified? So they don’t have to check it on every single request: Is it still the same? Has it changed? What do you think about that? Rosemary, what are your thoughts?
So there are frameworks that do have this built in, which is great.
Okay…
Yes. So if you have something like Spring, or if you have something like .NET, ASP.NET Core, for example, you can implement sort of a hot reload configuration within your code. And what that will do is that if there’s a configuration file that it’s supposed to be reading from, if it detects a difference, it will self-reload the application. So some frameworks support that; that’s something that you have to turn on within the application.
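Framework hot-reload features differ in how they detect changes. As a framework-neutral sketch of this pull model, the hypothetical `FileReloader` below re-reads a `KEY=VALUE` secrets file whenever its contents change and hands the new values to a callback. It compares content hashes on each poll; the file format and the polling approach are assumptions, not any framework's API.

```python
import hashlib
from typing import Callable, Dict, Optional


def parse_env_file(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines (an assumed format); other lines are ignored."""
    pairs = (line.strip().split("=", 1) for line in text.splitlines() if "=" in line)
    return {key: value for key, value in pairs}


class FileReloader:
    """Pull model: re-read a secrets file whenever its contents change.

    Call poll() on a timer. Real frameworks may use OS file-change
    notifications instead of polling.
    """

    def __init__(self, path: str, on_change: Callable[[Dict[str, str]], None]):
        self.path = path
        self.on_change = on_change
        self._last_digest: Optional[str] = None

    def poll(self) -> bool:
        """Return True if the file changed and on_change was called."""
        with open(self.path, "rb") as f:
            raw = f.read()
        # Hash the contents so an unchanged file never triggers a reload.
        digest = hashlib.sha256(raw).hexdigest()
        if digest == self._last_digest:
            return False
        self._last_digest = digest
        self.on_change(parse_env_file(raw.decode("utf-8")))
        return True
```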
Now, if we don’t want to make changes to the application, there are ways that you can do this outside of it. For Vault’s situation, Vault offers a sidecar container, which is called Vault Agent. And what Vault Agent is doing is handling the process of reading information from Vault, specifically the credentials and the secrets, and then writing them to a file, or writing them to a target entity of your choice. It could be anything for that matter. But file is the most common basis for this.
It writes it to a file, and in the case of Kubernetes specifically, you can append an annotation to your application deployment manifest, for example, that allows Vault Agent to issue a reload command to your application. It could be a SIGHUP, it could be something else; it forces the pod to reload, or forces the application to reload.
Now, there is some downtime associated with this, because you are reloading your application. At that point in time, unless your application has that functionality built-in, it won’t be serving requests. So there are two ways to go about it. You can think about one as the pull model from the application perspective, and the other as the push model from the Vault Agent perspective. But irrespective of which secrets manager you use, whether it be Vault or something else: if you continue using LastPass, you have to build that automation piece yourself to read the information, check for the diff from, let’s say, a secret changing in the secrets manager, and then issue a reload signal to the application.
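The automation piece described here, for a setup without an agent, can be sketched as a fetch, diff, push, and reload step. `fetch`, `push`, and `reload` are hypothetical callables you would supply (for example a wrapper around the LastPass CLI, a call to the hosting platform, and a restart or signal); the sketch only shows the diffing logic.

```python
import hashlib
import json
from typing import Callable, Dict, Optional


def fingerprint(values: Dict[str, str]) -> str:
    """Stable hash of the whole secret set, so any added, removed, or changed key shows up."""
    canonical = json.dumps(values, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


class SecretSync:
    """Push model: propagate secrets and reload the workload only when they change."""

    def __init__(
        self,
        fetch: Callable[[], Dict[str, str]],
        push: Callable[[Dict[str, str]], None],
        reload: Callable[[], None],
    ):
        self.fetch = fetch
        self.push = push
        self.reload = reload
        self._last: Optional[str] = None

    def run_once(self) -> bool:
        """Return True if a change was pushed and a reload was issued."""
        current = self.fetch()
        digest = fingerprint(current)
        if digest == self._last:
            return False  # nothing changed: no push, no restart
        self.push(current)
        self.reload()
        # Record the fingerprint only after push and reload succeed.
        self._last = digest
        return True
```

Because the fingerprint is recorded only after `push` and `reload` succeed, a failed run is retried on the next pass instead of being silently skipped.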
[26:27] Interesting. So are those signals configurable? Because reloading an application, while I know that that works really well, and it has stood the test of time all these years, I know that some applications take a bit of time to reload… And I’m thinking 15, 20 seconds, even 30 seconds, depending on what’s happening… They have to shut down, they have to drain… So when certain applications go down, there’s a lot of things that need to happen in the background. And what if we could send the application a signal that tells it, “Hey, just reload your secrets” or something like that? Or maybe that config file could be modified, and the application is watching the config file where the secrets are stored; then maybe it can just reload the file, which triggers the repopulation of the secrets, for example, without the application needing to go down. Is that possible?
That is, that is. But it has to be built into the application itself. The application has to be able to handle that. That’s why some frameworks offer that hot reload, where it detects the change from that config file. Other times, applications that don’t necessarily do that will find that the only option they have is to issue an external signal and do a reload. But in the case of Kubernetes specifically, or if you are in a containerized situation where you have multiple instances – and let’s say, theoretically, you cannot refactor that application to read the diff from a preset config file, and you must do an external reload – in that situation you should probably think about rolling updates, so that it’s not the case that once the secret is rotated, they all of a sudden reload at once. [laughs]
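For an application that can handle it, the signal can mean "re-read your secrets" instead of "restart". A minimal POSIX-only Python sketch: `SecretHolder` installs a `SIGHUP` handler that swaps in freshly loaded secrets, and `load` is a hypothetical callable that reads whatever the sidecar rendered.

```python
import signal
from typing import Callable, Dict


class SecretHolder:
    """Keeps secrets in memory and re-reads them on SIGHUP (POSIX only).

    `load` is a hypothetical callable that reads the secrets a sidecar rendered.
    """

    def __init__(self, load: Callable[[], Dict[str, str]]):
        self._load = load
        self._secrets = load()
        # Must be called from the main thread; Python only installs handlers there.
        signal.signal(signal.SIGHUP, self._on_sighup)

    def _on_sighup(self, signum, frame) -> None:
        # Swap in a fresh dict with a single assignment instead of taking a lock.
        self._secrets = self._load()

    def get(self, name: str) -> str:
        return self._secrets[name]
```

From outside the process, `kill -HUP <pid>` triggers the reload. The handler replaces the whole dictionary in one assignment rather than taking a lock, because Python runs signal handlers on the main thread, where a handler waiting on a lock already held by the interrupted code would deadlock.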
Oh, yes.
You’ll want to make sure to stagger it a little bit more.
Yeah, for sure. Yeah, that’s the other thing. If you have a single instance, then things get a bit tricky. Maybe you have like a blue/green thing going on where the new one gets brought up, and then when it’s healthy, it gets promoted to live. So that makes sense. If you have a bunch of them, obviously you don’t want to restart everything at once. And if you have like 100 instances, it’s probably a bad time to restart all of them, depending on what they do. And imagine if you have a couple of secrets updating at the same time, or things are very in flux; then things are coming and going all the time, and that puts a lot of pressure on the CPU, on disks, on whatever the case may be. So you have these storms of applications restarting… But yeah, you’re right, it’s a good problem to have, let’s put it that way. It’s a good problem to have. [29:09]
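Staggering restarts can be sketched as restarting instances in small batches and halting if a batch does not come back healthy. `rolling_restart`, `restart`, and `is_healthy` are hypothetical hooks; a real rollout (for example a Kubernetes rolling update) would also poll health with a timeout rather than checking once.

```python
import time
from typing import Callable, List, Sequence


def rolling_restart(
    instances: Sequence[str],
    restart: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    batch_size: int = 1,
    pause_seconds: float = 0.0,
) -> List[List[str]]:
    """Restart instances in small batches instead of all at once.

    Each batch must pass a health check before the next batch starts, so a bad
    secret stops the rollout instead of taking down every instance.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batches = [list(instances[i:i + batch_size]) for i in range(0, len(instances), batch_size)]
    for batch in batches:
        for name in batch:
            restart(name)
        unhealthy = [name for name in batch if not is_healthy(name)]
        if unhealthy:
            # Halt rather than restart more instances.
            raise RuntimeError(f"rollout halted, unhealthy: {unhealthy}")
        time.sleep(pause_seconds)  # spread the restart load out over time
    return batches
```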
So I know that this keeps coming up a lot, and I’m wondering about your thoughts on committing encrypted secrets into Git repositories. A lot of people do that, and say “Oh, this is amazing! Bitnami Sealed Secrets! Yeah, this is great!” Or SOPS from Mozilla, or age… And I’ve used all those tools, and I think they’re okay, but I think there’s something fundamentally wrong with that approach… And I’m wondering if it’s just me. Or maybe I’m wrong, I don’t know. What do you think, Rob?
You know, the first time I came across that pattern was – I don’t even know if it’s still up, but do you remember this thing called Keybase?
Yes, I do. I still have it maybe, I don’t know.
I’ve probably still got it on my phone as well. But they had this thing with encrypted Git repositories. And the idea was you were supposed to be able to just have secrets in there because the repositories were encrypted; if there’s anyone out there from Keybase that wants to correct me on that, I am happy to be schooled. But that was my understanding of it. So that was the first time I kind of came across that concept.
But kind of thinking about it, the idea of putting secrets in a public domain, whether it’s encrypted, whether it’s ciphertext or whether it’s plain text, is something that I myself am not and probably never will be comfortable with. I think it’s a huge attack vector that you’re opening up.
[30:26] I always felt like your applications shouldn’t need to have these things in source control; they should always come from a system that is designed to securely store this data. Now, Git, for example, all the different flavors of Git - they are wonderful tools. They do their job excellently. It couldn’t be any better. But they are not designed to be secrets management tools, and we shouldn’t pretend that they are, we shouldn’t treat them as if they are, and we shouldn’t expect the same guarantees and insurance policies from them that we get from a secrets management platform.
I know Rosemary has done a lot of work and research in this area recently, so I definitely want to kick it to Rosemary and get some additional thoughts there… But that’s kind of my opinion; if you’re asking me, that is not DevOps Rob approved. Definitely not.
Yeah, I’m in the same vein. [laughs] It’s uncomfortable, and you know it is a pattern, you recognize it is done… But I think every time I’ve done it at least, there’s almost a lack of scale from a management perspective. You have so many controls you have to have in place. You have to make sure that every developer has the encryption tool, whatever they’re using; they have to figure out a way to get the right encryption key… If someone accidentally does it in plain text, you have to make sure that there’s a revocation process… And it doesn’t scale well, at least in my opinion. I mean, I know some folks who do it on a larger scale, and they use Sealed Secrets etc. But the amount of control that you have to have from a development perspective - it’s just really difficult. So it’s very uncomfortable for me as well, not just from a security standpoint, which - Rob, actually, you’ve pointed out is not great at all. But even from a management perspective, or from a development friction perspective, I just have never seen it work on such a large scale. And it’s distinctly uncomfortable to use.
Yeah… So I remember the first time when I did that, and I was thinking, “Well, what if someone basically unencrypts the secret somehow?” Because you don’t know… Like, what is there to give away when that secret gets used? There is no audit trail, there is nothing like that. It just happens, and you never know about it. And you’re almost like opening yourself up in a way that you won’t even know when you were compromised. And I think that’s the scary thought. You’ve put something out there, and when it happens, you’ll have no idea that it happened.
And that’s yet another reason why we say that things like Git are great at what they do, but they’re not secret management platforms. Because when the worst happens, you look for these breadcrumbs, you look for the trail. You always need to try and figure out how it happened in order to learn from it and to close the gaps, if you like. You’re not going to get that from a centralized source control manager. You will get that from a secrets management platform; most of them have audit logging built in as standard. You have to choose your weapon when it comes to these things, and honestly, Git is not a lifesaver for this one.
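To make the “breadcrumbs” point concrete, here is a minimal sketch of searching a secrets manager’s audit log for every read of one secret. It assumes a JSON-lines log whose field names (type, time, auth.display_name, request.operation, request.path, request.remote_address) are modelled on Vault’s file audit device as we understand it; check them against the version you actually run. The log entries and secret path are made up for illustration.

```python
import json
from typing import Iterable, Iterator


def secret_reads(audit_lines: Iterable[str], secret_path: str) -> Iterator[dict]:
    """Yield who read `secret_path`, when, and from where, using JSON-lines audit entries.

    Field names are modelled on Vault's file audit device; verify them for your version.
    """
    for line in audit_lines:
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        # Each request is logged twice (a "request" and a "response" entry); keep one of them.
        if entry.get("type") != "response":
            continue
        request = entry.get("request", {})
        if request.get("path") == secret_path and request.get("operation") == "read":
            yield {
                "time": entry.get("time"),
                "who": entry.get("auth", {}).get("display_name"),
                "from": request.get("remote_address"),
            }


# Hypothetical audit entries: one expected read by CI, one read nobody can explain.
sample_log = [
    '{"time": "2022-06-01T10:00:00Z", "type": "request", "auth": {"display_name": "ci-runner"}, "request": {"operation": "read", "path": "secret/data/deploy-key", "remote_address": "10.0.0.5"}}',
    '{"time": "2022-06-01T10:00:00Z", "type": "response", "auth": {"display_name": "ci-runner"}, "request": {"operation": "read", "path": "secret/data/deploy-key", "remote_address": "10.0.0.5"}}',
    '{"time": "2022-06-02T03:12:00Z", "type": "response", "auth": {"display_name": "unknown-token"}, "request": {"operation": "read", "path": "secret/data/deploy-key", "remote_address": "203.0.113.9"}}',
]

for hit in secret_reads(sample_log, "secret/data/deploy-key"):
    print(hit)
```

An encrypted file in Git offers nothing equivalent: once someone holds the key, decryption leaves no trace anywhere you can search.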
Okay, so do you want me to tell you – I shouldn’t even ask this. I’ll just go ahead, because of course you do… [laughs] I mean, now we want to know. You’ve like staged it, and now we want to know.
Exactly, you have to know. So it took me a few weeks to understand how this crypto miner got into our infrastructure. I will not say where… [laughter] So what happened - those secrets that were committed to the repo, they were encrypted, but the key had leaked, and we didn’t know. And this crypto miner appeared on this VM with root privileges, because one of those secrets was a private SSH key. If you have that, you can SSH into the VM as root, set up whatever you want, and off you go.
[34:20] It was a very small DigitalOcean VM, so there wasn’t really any reason to run a crypto miner on it, but you know, people do it… I mean, not even a GPU… Come on. Anyway, people do that; the CPU sat at 100% for weeks, and we couldn’t explain what the hell was going on. And then after a few weeks, we realized how the leak happened. And it was the encrypted secrets. We had committed them, the key was leaked, we didn’t know, the private key was discovered, someone SSH-ed in, they set up their crypto miner, and there you go. Good luck figuring out how that happened.
I think there are a couple of things there. Obviously, the first thing we talked about is the whole encrypted-secrets-in-Git thing. I think we’ve covered that. The second thing is your cryptographic system. Implementing cryptography in your applications - it’s hard. It’s really, really hard; very, very, very hard. It’s error-prone, there are loads of steps along the way where you can easily make mistakes… And I’m willing to bet that the average developer out there doesn’t want to touch cryptography, right? They’re not interested in it; in fact, they’re scared of it. But from a business perspective, there are reasons you have to implement cryptography in your applications: there are laws we have to comply with, and one of the easiest ways to comply with them is to encrypt PII, for example.
So you have this thing where, okay, there’s data, and you need to protect that data in transit and at rest. So we are probably talking about encryption. You have the process of encrypting it, which can happen within your application. And then you have the key part of cryptography - how do you store and manage the keys? Again, another difficult problem to solve.
The ideal solution for a developer is to offload all the complexity and the responsibility of that to something else… Which is one of the things I love about Vault, right? It’s something you can delegate that responsibility to. As an application developer, all I need to do is tell it the name of a key in Vault - a payment key, or something like that - and I’m presented with a simple API. I make API calls, so I can encrypt data and decrypt data according to my permissions. So if this application is only supposed to encrypt data, then that’s all it will be able to do.
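The “simple API” described here is Vault’s transit secrets engine. The sketch below only builds the HTTP requests (it never sends them) so the shape of the contract is visible: the application names a key, sends base64-encoded plaintext to `/v1/transit/encrypt/<key>`, and gets back a `vault:vN:...` ciphertext it can later send to `/v1/transit/decrypt/<key>`. The address, token and the key name `payment` are placeholders, and the engine is assumed to be mounted at the default `transit/` path.

```python
import base64
import json
import urllib.request


def transit_request(vault_addr: str, token: str, op: str,
                    key_name: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST to /v1/transit/<op>/<key_name>."""
    return urllib.request.Request(
        f"{vault_addr.rstrip('/')}/v1/transit/{op}/{key_name}",
        data=json.dumps(payload).encode(),
        method="POST",
        headers={"X-Vault-Token": token, "Content-Type": "application/json"},
    )


def encrypt_request(vault_addr: str, token: str, key_name: str,
                    plaintext: bytes) -> urllib.request.Request:
    # Transit expects base64-encoded plaintext; the key material itself never leaves Vault.
    encoded = base64.b64encode(plaintext).decode()
    return transit_request(vault_addr, token, "encrypt", key_name, {"plaintext": encoded})


def decrypt_request(vault_addr: str, token: str, key_name: str,
                    ciphertext: str) -> urllib.request.Request:
    # `ciphertext` is the "vault:vN:..." string returned by an earlier encrypt call.
    return transit_request(vault_addr, token, "decrypt", key_name, {"ciphertext": ciphertext})


# Hypothetical values: a local Vault, a placeholder token and a key named "payment".
req = encrypt_request("http://127.0.0.1:8200", "app-token", "payment", b"4111 1111 1111 1111")
print(req.get_method(), req.full_url)
```

Sending it with `urllib.request.urlopen(req)` against a real server should return JSON with the ciphertext under `data.ciphertext`. The “only encrypt” restriction lives in Vault policy rather than in code: granting the application’s token `update` on `transit/encrypt/payment` and nothing on `transit/decrypt/payment` means decrypt calls are refused.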
Now, in terms of rotating a key, that’s the other nice thing - you can build in automation. Let’s just say your organization says “Every 30 days, cryptographic keys need to be rotated.” That’s fine. You can rotate the key in Vault. And the data that is encrypted with the old version of the key - you can easily rewrap it. You can even specify the version of the key, so you’re not going to have any downtime: you can keep using the older version until you rewrap the data, you can build automation to do the rewrapping as well, and then you will always have new ciphertext, encrypted with the new version of the key.
From a developer’s perspective, they don’t even care about all of that stuff. They don’t need to. From an ops perspective, I’d say building in that automation is minimal effort, because a lot of the hard work is already done within Vault. You just need to point it to where your data is stored and run the rewrap operation.
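Transit ciphertext carries its key version in a `vault:v<N>:` prefix, which is what makes the rewrap automation cheap: after a rotation (`POST /v1/transit/keys/<name>/rotate`), a job can scan stored ciphertext, pick out anything below the key’s `latest_version`, and send those values to `/v1/transit/rewrap/<name>`, which re-encrypts them with the newest version without the plaintext ever reaching the job. This sketch covers only the selection step; the `stored` list is a hypothetical stand-in for rows in your database.

```python
import re
from typing import Iterable

# Transit ciphertext looks like "vault:v3:<base64>"; the number is the key version used.
_CIPHERTEXT = re.compile(r"^vault:v(\d+):")


def key_version(ciphertext: str) -> int:
    match = _CIPHERTEXT.match(ciphertext)
    if match is None:
        raise ValueError(f"not a transit ciphertext: {ciphertext[:20]!r}")
    return int(match.group(1))


def needs_rewrap(records: Iterable[tuple[str, str]], latest_version: int) -> list[str]:
    """Return the IDs of records still encrypted with an older key version."""
    return [record_id for record_id, ciphertext in records
            if key_version(ciphertext) < latest_version]


# Hypothetical stored data after the key was rotated up to version 3.
stored = [
    ("customer-1", "vault:v1:AbCdEf=="),
    ("customer-2", "vault:v3:GhIjKl=="),
    ("customer-3", "vault:v2:MnOpQr=="),
]
print(needs_rewrap(stored, latest_version=3))  # ['customer-1', 'customer-3']
```

Because decryption still works for older versions until you raise the key’s minimum decryption version, the rewrap job can run gradually in the background, which is the “no downtime” property described above.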
But that’s the key thing… So when we’re talking about how this leak happened - cool, it was encrypted and stored in source control management. I understand that. But the second part of the problem was that it probably wasn’t a robust enough cryptographic implementation, which is quite common if you don’t offload it to a proven system that takes care of that for you.
[38:09] So I don’t want to sound like a salesperson for Vault, because that’s exactly what I’m not. I’m an advocate for developers. So if you’re a developer and you are listening to this, I understand your frustrations, I understand the pain that you’re going through, and I’m telling you, you don’t need to go through that pain; just offload it to a system. That’s what you have to tell your technical decision makers: choose a system that gives you encryption as a service.
Okay, so let’s imagine that we are a small startup - well, we actually are a small startup… But I think many of the listeners are at fairly small startups with fairly small and simple systems… And even Kubernetes can be a bit too much for some, because of all the overhead of managing it. Even if it’s a managed service, there are still things you have to do, upgrades you have to run, things you have to figure out; the complexity, the surface area, is really, really big.
So when it comes to Vault - let’s say I want to use Vault. How can I start really, really simply to get going with Vault? What is the first thing that I do?
There is a managed Vault offering. You can go to HashiCorp Cloud Platform (HCP) Vault and sign up for a trial. It basically gives you a Vault server, a Vault cluster, to try out. You’ll be able to test Vault interactions with it: creating authentication methods, which allow your application or you to authenticate to Vault, and setting up secrets engines, which rotate secrets for you for certain target APIs.
So that’s the simplest way to get started. If you prefer to run it yourself, there’s the open source version; you can pull down the binary and run it in dev mode, which - it’s up to you whether you feel comfortable with dev mode or not. If you’re doing this as a proof of concept, if you’re just trying it out, you can run Vault in dev mode locally. If you are using Kubernetes - again, some people probably aren’t - there’s also a Helm chart, which will allow you to deploy a small Vault cluster onto your Kubernetes cluster. So there are a couple of different options, depending on what you’re familiar with, as well as your target platform of choice.
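The reason dev mode needs a judgment call: `vault server -dev` keeps everything in memory, starts already unsealed with a root token, and listens on `http://127.0.0.1:8200` over plain HTTP, so it is fine for a trial but loses all data on restart. A first sanity check when trying it out is the unauthenticated `/v1/sys/health` endpoint. This is a minimal sketch that reads the standard `VAULT_ADDR` variable (falling back to the dev default) and translates the main status codes; only `check_health` touches the network, and it needs a running server.

```python
import os
import urllib.error
import urllib.request
from typing import Mapping, Optional

# `vault server -dev` listens here over plain HTTP unless told otherwise.
DEFAULT_DEV_ADDR = "http://127.0.0.1:8200"

# Meaning of the main /v1/sys/health status codes (see Vault's HTTP API docs).
HEALTH_STATUS = {
    200: "initialized, unsealed and active",
    429: "unsealed standby",
    501: "not initialized",
    503: "sealed",
}


def health_url(env: Optional[Mapping[str, str]] = None) -> str:
    """Resolve the health endpoint from VAULT_ADDR, falling back to the dev default."""
    env = os.environ if env is None else env
    addr = env.get("VAULT_ADDR", DEFAULT_DEV_ADDR)
    return f"{addr.rstrip('/')}/v1/sys/health"


def describe(status_code: int) -> str:
    return HEALTH_STATUS.get(status_code, f"unexpected status {status_code}")


def check_health(timeout: float = 2.0) -> str:
    """Ask a running Vault server how it is doing."""
    try:
        with urllib.request.urlopen(health_url(), timeout=timeout) as resp:
            return describe(resp.status)
    except urllib.error.HTTPError as err:
        # Non-200 health codes arrive as HTTPError, but they are still valid answers.
        return describe(err.code)
    except urllib.error.URLError as err:
        return f"unreachable: {err.reason}"
```

With a dev server running in another terminal, `print(check_health())` should report an active, unsealed node; pointing `VAULT_ADDR` at an HCP Vault cluster works the same way.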
Okay. So let’s imagine that I’m running it myself. I know that’s an option many would go for. Open source and all that; let’s start simple, let’s see how well it works and whether I understand it… Until you get to the point of “You know what? Actually, I don’t want to run this,” and you move to the managed service. I know that many go down this path. So if I was to run it myself, how easy is it to do upgrades, or backups of secrets? Because what happens if everything gets deleted…? How can you get your secrets back? What does that journey look like?
So in terms of backups, we have functionality built in to take snapshots, for example, depending on your storage backend. The way you’ve got to think about Vault is that its entire architecture is decoupled. You have the storage backend, which is where all the secrets are stored, encrypted. And then you have Vault itself, which is the thing you interact with, the thing that does all the encryption, decryption and secrets management for you, and it interfaces with the storage backend.
Now, we used to recommend Consul as a storage backend. We now have an integrated storage backend in Vault, so it’s not as decoupled as it used to be. You used to have Consul as a storage backend, and you’d have to manage that additional thing. It’s another system. And if Vault is the business value you’re trying to extract, then no one really wants to manage an additional system just to get their secrets management, right? Which is where the whole argument for integrated storage comes in.
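With integrated (Raft) storage, the snapshot mentioned above can be taken through `GET /v1/sys/storage/raft/snapshot` (the CLI equivalent is `vault operator raft snapshot save <file>`). The sketch below builds that request and adds two hypothetical helpers for a scheduled backup job: a sortable timestamped file name and a “keep the newest N” retention rule. The naming scheme and retention policy are our assumptions, not anything Vault prescribes.

```python
import urllib.request
from datetime import datetime, timezone


def snapshot_request(vault_addr: str, token: str) -> urllib.request.Request:
    # Integrated-storage (Raft) snapshot endpoint; the response body is a binary snapshot.
    return urllib.request.Request(
        f"{vault_addr.rstrip('/')}/v1/sys/storage/raft/snapshot",
        headers={"X-Vault-Token": token},
        method="GET",
    )


def snapshot_name(now: datetime) -> str:
    # Hypothetical naming scheme: UTC timestamps sort chronologically as plain strings.
    return now.astimezone(timezone.utc).strftime("vault-%Y%m%dT%H%M%SZ.snap")


def snapshots_to_delete(names: list[str], keep: int) -> list[str]:
    """Keep the `keep` newest snapshots and return the rest for deletion."""
    newest_first = sorted(names, reverse=True)
    return sorted(newest_first[keep:])


print(snapshot_name(datetime(2022, 6, 1, 3, 4, 5, tzinfo=timezone.utc)))
```

A job would write the body of `urllib.request.urlopen(snapshot_request(...))` to the generated file name, copy it somewhere outside the cluster, and prune old copies. Restoring uses the matching restore endpoint or `vault operator raft snapshot restore`, which is worth rehearsing before you need it.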
So in terms of that, it is a clustered approach, and it is highly available. If you lose a node, there are other nodes to take its place; there are leader elections, and so on and so forth; and the secrets are replicated between the nodes as well. Then it just comes down to implementing a good, fault-tolerant design for your Vault cluster: you’d spread the nodes across different fault domains, and so on.
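The leader elections and replication come from the Raft consensus protocol behind integrated storage, and the arithmetic of quorum is what drives cluster sizing. A write or an election needs a majority of nodes, so a cluster of n nodes survives the loss of n minus that majority. This short sketch just computes those numbers.

```python
def quorum(nodes: int) -> int:
    """Votes Raft needs to elect a leader or commit a write."""
    if nodes < 1:
        raise ValueError("a cluster needs at least one node")
    return nodes // 2 + 1


def failure_tolerance(nodes: int) -> int:
    """How many nodes can be lost while a quorum remains."""
    return nodes - quorum(nodes)


for n in (1, 3, 4, 5):
    print(n, quorum(n), failure_tolerance(n))
# 1 1 0
# 3 2 1
# 4 3 1
# 5 3 2
```

Going from three to four nodes adds no fault tolerance, which is why odd-sized clusters such as three or five nodes, spread across separate fault domains, are the usual choice.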
[42:22] So that’s the first part. In terms of things like upgrades and general operational overhead, I’m not gonna lie to people out there, there are a lot of things to do. If you’re running it on a VM, for example, then you have to think about patching the underlying operating system (that is your responsibility), and protecting the underlying operating system as well. In general, if I remember correctly from the Vault hardening guide, you wouldn’t even have SSH access to those machines. So you need to think about what your consumption patterns are there. Are you going to throw away a VM and deploy a new one with Vault, which then joins the cluster? And so on and so forth.
And then you have upgrades of Vault itself. Now, it depends on how you approach this. If you are someone who stays on top of upgrades - so when a new Vault version comes out, you upgrade in some kind of development environment, you test out the application, you make sure it all works, and then you roll it out to the rest of your environments - then upgrades can be, dare I say, straightforward. But honestly speaking, as someone who has worked with many organizations that were hosting Vault, I don’t think I came across a single organization that was on top of upgrades.
So what happens is they are on a specific version, and then one day they run into an issue, and they think “Oh, it’s a bug.” And maybe it is a bug, right? How do you fix that? “Oh yeah, we’ve fixed that already. It’s in this version here.” Okay, so you need to upgrade to that version. How do you get through all of these minor, intermediate and major versions to do that upgrade?
Sometimes the upgrade path is not clear. Do you have to go to the next version, and then the next version, and then the next version? Or can you do a single hop to your target version? And sometimes the reason you upgrade isn’t even an issue; it’s that there’s new functionality you want to use to make your lives a bit easier.
So if I’m being honest, from my point of view, I don’t want to manage all of that stuff. I don’t care enough for that type of operational work. In fact, if someone will do it for me, I’d much prefer that, because things like your SLAs for downtime and so on become your responsibility and accountability… Whereas there are professionals who can look after these things for you, and then it becomes their responsibility and accountability. And that sounds super-sweet to me. Let me just focus on building the thing that makes the business profitable and builds market share, rather than thinking about things like Vault. That’s not really what I want to do. I’m just an application developer. It turns out I need to encrypt some stuff, it turns out I need to access a few secrets to get to different parts of our platform… I just need to know where to get that information, or my application needs to know where to get that information, and let’s go. Let’s go and build greatness.
That’s why Rosemary says the best way to get started is HCP. It’s actually interesting, because I remember years ago one of my startup ideas was basically HCP. I was like, “Oh, imagine if I could just run people’s Vaults for them. That’d be super-cool.” And you know, I never really got to the drawing board or figured out what that would look like, and so on. And I’ll tell you what, the people here at HashiCorp have done such a tremendous job. A far, far better job than I ever could have imagined. But the reason I came up with that idea is that, honestly, managing platforms - there are a lot of things to do. So if there’s one less thing you have to manage, especially when it underpins your entire security, I don’t see why you wouldn’t want to offload it.
I’m on board with that. I can definitely see the value in offloading that concern. It’s a huge concern. And you only realize it six months down the line, a year down the line; then you have to worry about the migrations, so how do we migrate… So now you have two problems. And if you delay it, then you have three problems… And we know where this is going.
[46:12] So I like that idea… And I’m wondering, Rosemary - let’s imagine that we have this code, we’re building greatness, as Rob said, and you want to combine all these things. We want to take HCP Vault, we want to take a platform, a service from there - we want to combine these services, software as a service, and we want to consume them in a way that is encoded somewhere. It could be documentation, but I would like to think there’s something more. So how can we encode all the components that we use? And some of these are not even infrastructure, really; it’s like services. And the combination of all those services is basically our setup. Can you think of a way that we could do this? Are we there yet, with this setup?
We are… I think it depends on the tool, though. One practice you could adopt is managing it all as code, in which you express the configurations you need… Because the definition of infrastructure is actually much larger than it was traditionally, right? We thought about it as a data center, plugging in network switches, and the like. But in reality, infrastructure can include managed services now, as you pointed out. It can include fly.io; it includes any number of third-party systems that you don’t directly manage, like HCP, which someone else manages for you but you’re still using.
And at the end of the day, even if you’re using DigitalOcean, HCP, all of these managed services, you need to express that configuration somewhere. Most of these platforms have APIs now for you to configure whatever you need to use out of that platform, and fortunately, most have integrated with some kind of infrastructure as code tool. We can pick Terraform, just because we’re talking a lot about HashiCorp today, but a lot of this configuration can be managed as code. Even in Vault, the things you need to set up secrets for an application can be managed as code. Be declarative, be specific, be prescriptive, and then you’re not breaking the contract your application depends on for those infrastructure components and configurations. That way, you have it expressed end-to-end in one place, encoded in a single manner, and you can manage it without worrying about breaking one thing because of some schema change, and then having to fix a downstream dependency, or something.
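“Be declarative” means you describe the end state and let the tool work out the steps. A toy version of what tools like Terraform do at plan time is to diff declared resources against what actually exists and emit create, update and delete actions. This is a deliberately simplified sketch: the resource names and attributes are hypothetical, and real tools also track dependencies, replacements and provider-specific details.

```python
from typing import Dict, List, Tuple

Config = Dict[str, dict]  # resource name -> attributes


def plan(desired: Config, actual: Config) -> List[Tuple[str, str]]:
    """Compare declared state with what exists and list the actions needed to converge."""
    actions = []
    for name in sorted(desired.keys() | actual.keys()):
        if name not in actual:
            actions.append(("create", name))
        elif name not in desired:
            actions.append(("delete", name))   # exists, but is no longer declared
        elif desired[name] != actual[name]:
            actions.append(("update", name))
    return actions


# Hypothetical declared setup spanning several providers.
desired = {
    "vault_cluster": {"tier": "dev"},
    "droplet": {"size": "s-1vcpu-1gb"},
    "kv_mount": {"path": "secret"},
}
actual = {
    "droplet": {"size": "s-1vcpu-512mb"},
    "old_bucket": {"region": "fra1"},
    "kv_mount": {"path": "secret"},
}
print(plan(desired, actual))
# [('update', 'droplet'), ('delete', 'old_bucket'), ('create', 'vault_cluster')]
```

Note the `delete` for `old_bucket`: removing something from the declared configuration is itself a destructive instruction, which is exactly the risk discussed later about letting pipelines apply changes unattended.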
So I know, Rosemary, that you wrote a book exactly a year ago - well, a year and one month ago, May 2021 - “Infrastructure as Code, Patterns and Practices.” It’s a book from Manning, it’s finished, it’s done, it’s out there. I’m wondering if in this book you cover how to combine those services… So how to get a Vault via HCP, how to get a VM from DigitalOcean, how to get your code from GitHub or GitLab and wire everything together… Do you have such an example in this book?
I do. It does not use Vault specifically though, unfortunately… But actually, Vault gets a nice nod as a secrets manager.
V2? It’s a request for V2?
Yeah, V2. Exactly. I’ll be more specific in V2. But it does point out how to manage secrets in infrastructure, as well as for managed services… Because secrets play a huge role, not just in configuring services, but in accessing those services. In infrastructure as code, or as-code-anything, you need an API token to configure anything. Similarly, you need a password if you want to configure, let’s say, an Amazon database.
So all of these things are linked up in the book, end to end. You’ll see everything from writing clean infrastructure as code and protecting your secrets, all the way to delivery pipelines: understanding how to push changes and revert them, the best ways to modularize and combine, implementing some kind of dependency injection to decouple dependencies in infrastructure as code, plus managed services as well. So it’s all in there as patterns.
Okay, I don’t know where exactly the book is in my reading queue, but it is there. I haven’t checked recently. So I think this conversation just brought it a bit higher up to the top, to the beginning of the queue, the head of the queue. Okay. It’s a FIFO queue, just to be clear, but in this case, we’re just like changing the order a little bit. But the first one in the queue gets read first.
So when it comes to automating the runs - like, you make a change in your configuration, you have this captured in code, then it gets applied out there… What are your thoughts about having something like a CI system? I’m not sure whether it’s a CI or CD system, because you’re not really deploying anything, but you’re changing something which is active. Would you trust a CI/CD system to roll out changes without human approval? Would you trust the system to do that?
I’m gonna let Rob go first. [laughs] For those who are listening but you can’t see the video necessarily, Rob has a look on his face…
I think what we have to start looking at is instead of validating system components, we start validating consumption patterns. Generally speaking, when we think about a CI/CD system, it is integrated with some kind of source control. So when a specific event happens, then it triggers off pipelines, or runs, or whatever. So when we talk about would you trust a CI/CD system or platform to make these changes for you, it’s not really that system that you’re asking the question of; it’s the changes in the source control. Do you trust your process for updating your source code, the pull request, the approval process? That’s the real question here.
[54:14] In terms of the level of trust you have in the system that’s actually executing the runs, that goes more into what Rosemary was talking about earlier, with the secrets involved in infrastructure as code: how do you actually authenticate into the target platforms to make these changes, and so on. I think that’s a slightly different conversation. But in terms of the overall consumption pattern, you have to agree as an organization on your process for developers and engineers to make updates to these things, and how a change gets from their laptop into production. What are the quality gates they need to pass, and what are the approval steps? And that’s really the question - do you trust what you have? If the answer’s no, then as an organization you need to look at what that process should look like, while avoiding bottlenecks. You don’t want to go through so many approval gates that you can never really get something into production, but at the same time, you want a good balance: something that you know and you trust. So I think you have to validate your consumption pattern, rather than the components within a system.
Yeah. I’m thinking more about managing your infrastructure, managing services. So you have the CI/CD system, which is not pushing code; it’s making changes to your setup, to your infrastructure, and that can be vast… A mistake could mean that your Vault service gets deleted, and you lose all the secrets. Or you wrote the code wrong, the configuration wrong, you missed something, and it just tears it down. And I know that Terraform has this, where it will tell you what’s going to happen: is it going to be recreated, is it going to be just updated? What will happen.
But if you say - yeah, sure, just go through the CI/CD; there are too many changes, whatever… Everything’s automated… Then there’s the potential of taking something down when we don’t mean to… So how automatable - if that’s a word - can you make these things? Because there are always humans involved, like with terraform apply, where you read the plan and ask: is it safe to say yes? Or shall I say no? I don’t know.
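One middle ground between “a human approves every apply” and “everything goes” is to let the pipeline read the plan and only stop for destructive changes. Terraform can emit a saved plan as JSON (`terraform plan -out=tfplan`, then `terraform show -json tfplan`), where each entry in `resource_changes` has an `address` and a `change.actions` list; a replacement shows up as `["delete", "create"]` or `["create", "delete"]`. This sketch applies that rule; the example plan is hand-written with illustrative resource names, and the policy itself (auto-apply unless something is deleted) is our assumption to adapt.

```python
import json


def destructive_changes(plan_json: str) -> list[str]:
    """Addresses of resources the plan would delete, including replacements."""
    plan = json.loads(plan_json)
    return [
        rc["address"]
        for rc in plan.get("resource_changes", [])
        # A replacement appears as ["delete", "create"] or ["create", "delete"].
        if "delete" in rc["change"]["actions"]
    ]


def needs_human_approval(plan_json: str) -> bool:
    """Let additive and in-place changes flow; stop for anything destructive."""
    return bool(destructive_changes(plan_json))


# Hand-written excerpt in the shape of `terraform show -json` output.
example_plan = json.dumps({
    "resource_changes": [
        {"address": "digitalocean_droplet.web", "change": {"actions": ["update"]}},
        {"address": "hcp_vault_cluster.main", "change": {"actions": ["delete", "create"]}},
        {"address": "digitalocean_spaces_bucket.backups", "change": {"actions": ["no-op"]}},
    ]
})
print(destructive_changes(example_plan))   # ['hcp_vault_cluster.main']
print(needs_human_approval(example_plan))  # True
```

Here the accidental replacement of the Vault cluster is caught before apply, while a routine droplet update could have gone through without anyone reading the plan.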
And usually, you make those changes manually, or at least that’s what I’m used to when it comes to infrastructure… Because you can take a whole cluster down; you get a lot of power. So how can you build confidence in your CI/CD system running these changes, which can have a potentially huge impact? Forget all your backups, they’re gone. You deleted a bucket, so forget about it… Or whatever. That’s what I’m thinking about: what’s the worst thing that can happen? And it can be pretty bad…
It can be really bad. But having a CI/CD system doesn’t mean you forego testing, right? It doesn’t necessarily mean you forego development environments either. Just as software development includes a pre-prod environment, or a testing environment, or a QA environment, hopefully, if cost is not too much of a factor, you have a development environment, so that you can stage these changes and, to a certain degree, run automated tests that recognize when some of these changes may be really impactful.
So let’s think about the worst-case scenario, which is you take down a database, right? You forgot, in Terraform or in your infrastructure as code configuration, that you don’t want it deleted; you didn’t set deletion protection to true, for example. There are multiple ways from a testing standpoint that you can gate this, which is what Rob was mentioning before. You can add a gate there that says, “Hey, if your deletion protection is false, you should change it to true.” So that’s more of a static analysis or unit testing perspective.
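The static gate described here can be a few lines run against the JSON form of a Terraform plan (`terraform show -json <planfile>`): for every database resource, look at its planned `change.after` values and fail if `deletion_protection` is not true. This is a sketch under stated assumptions: the set of resource types is an illustrative choice (both `aws_db_instance` and `aws_rds_cluster` accept a `deletion_protection` argument), and other providers name the setting differently, so extend the table for your stack.

```python
import json

# Assumption: resource types whose `deletion_protection` argument we enforce; extend per provider.
PROTECTED_TYPES = {"aws_db_instance", "aws_rds_cluster"}


def unprotected_databases(plan_json: str) -> list[str]:
    """Addresses of databases that would exist after apply without deletion protection."""
    plan = json.loads(plan_json)
    offenders = []
    for rc in plan.get("resource_changes", []):
        if rc["type"] not in PROTECTED_TYPES:
            continue
        # `after` is null when the resource is being destroyed; a separate delete check covers that.
        after = rc["change"].get("after") or {}
        if after and not after.get("deletion_protection", False):
            offenders.append(rc["address"])
    return offenders


# Hand-written excerpt in the shape of `terraform show -json` output.
example_plan = json.dumps({"resource_changes": [
    {"address": "aws_db_instance.orders", "type": "aws_db_instance",
     "change": {"actions": ["create"], "after": {"deletion_protection": False}}},
    {"address": "aws_db_instance.users", "type": "aws_db_instance",
     "change": {"actions": ["update"], "after": {"deletion_protection": True}}},
    {"address": "aws_s3_bucket.logs", "type": "aws_s3_bucket",
     "change": {"actions": ["create"], "after": {"bucket": "logs"}}},
]})

offenders = unprotected_databases(example_plan)
if offenders:
    print("Set deletion_protection = true on:", ", ".join(offenders))
```

In CI, a non-empty result would fail the job before anything is applied; policy-as-code tools can express the same rule, but a check like this is enough to start with.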
[57:55] And then there’s the end-to-end view, where you’re like, “Oh no, I ran it in development, and I deleted the database.” And you almost treat it as practice, right? “Oh, I deleted the database. This is a worst-case scenario. How do I recover it?” And that helps you before it goes to production… First, it stops it from going to production. But second, it helps you understand what kind of roll-forward plan you should have, right? What should you be doing with your infrastructure as code to reproduce that database properly? And I think that’s a mistake folks sometimes make with CI/CD, where they’re like, “Oh, we’ll just keep pushing the changes.” You don’t want every change to automatically go to production. CI/CD doesn’t mean you eliminate development or testing environments, and it doesn’t mean you stop testing. In fact, it means you should really emphasize testing of infrastructure. You can’t account for everything, but it’s a good way to start accounting for the most important, most critical infrastructure you have.
Now, what I will say is that in more recent years there’s been this whole GitOps thing… And I have to raise this, because now we are actually doing more automated deployment and automated reconciliation of infrastructure components and services than we did before. You’re shifting human review to the very early part of the process: in the case of Kubernetes, you make the change to a YAML manifest, and you just let Kubernetes figure out what it needs to do with it. There’s no intervention in between, there’s no manual approval for you to say “Kubernetes, don’t do this.” Once you make that change and commit it, it goes straight to Kubernetes. So I think we’re moving to an era where we want more of that automation, more continuous deployment, but in reality, I think it’s fair to say that you should probably stick with continuous delivery for a while. You don’t want to move to the extreme case unless you’re, as Rob pointed out, very confident, and you trust your testing, your environments, and the discipline of rolling forward all of your changes.
So if you had to run an application in production, Rob, would you push straight to production? And if not, what would that look like?
Yeah, [unintelligible 01:00:09.25] is straight to production, YOLO, right? Nah, not really… So the ideal pattern, at least in my view, is that you have your code, you build your software artifacts, and those run in the development environment. You go through your series of tests, and assuming the happy path, you promote this artifact to the next environment. Maybe that’s staging, where you can run a different set of tests, or whatever it is… And so on, until you get to your end goal. That’s the ideal pattern.
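The promotion pattern can be sketched in a few lines: one immutable artifact moves through an ordered list of environments, each environment has its own gate, and a failed gate stops promotion so later environments never see that build. The environment names, the artifact reference and the gate functions are hypothetical stand-ins for a real pipeline’s deploy and test steps.

```python
from typing import Callable, Dict, List

# Hypothetical environment order; a real pipeline would read this from its config.
PIPELINE = ["development", "staging", "production"]


def promote(artifact: str, gates: Dict[str, Callable[[str], bool]]) -> List[str]:
    """Return the environments the artifact was deployed to, in order.

    The same artifact is deployed to each environment and that environment's gate
    is run; it only moves on to the next environment when the gate passes.
    """
    deployed = []
    for env in PIPELINE:
        deployed.append(env)         # a hypothetical deploy(artifact, env) call goes here
        if not gates[env](artifact):
            break                    # stop promotion; later environments never see it
    return deployed


# Hypothetical gates: staging's integration tests fail for this build.
gates = {
    "development": lambda artifact: True,
    "staging": lambda artifact: False,
    "production": lambda artifact: True,
}
print(promote("app@sha256:abc123", gates))  # ['development', 'staging']
```

Promoting the same artifact, rather than rebuilding per environment, is what lets the tests run in development and staging say something about what will actually reach production; the same structure applies to infrastructure changes.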
And I couldn’t help but think to myself, when Rosemary was describing the whole workflow of going through testing on a dev environment before it gets to production - there are so many similarities between how we should approach infrastructure as code and how we should approach application code and artifacts, right? It’s the same kind of thing - we still want to test, we still want some sort of gates to make sure that it’s happening the right way.
And ultimately, the thing that we need to understand is that when you’re making changes directly to production, people think it’s about infrastructure. But it’s actually more than that. It’s also about security. When you talk about the pillars of – I was actually talking about this on my podcast earlier this week, episode nine.
Nine. Okay, great. We’ll put a link in the show notes, we’ll make sure.
Absolutely. Essentially, there are three pillars of security, right? So you have confidentiality - I think it kind of speaks for itself; you want to protect the value of that data there. Integrity - so you want to make sure that the data that you have is actually correct, that it’s the right data, it’s not been manipulated in any way… And the key one here, in terms of what we’re talking about is availability, right? If any of these pillars are compromised, then your security is compromised.
[01:02:10.15] So when you start to think about your infrastructure as code and pushing straight to production, you are risking the availability of your platform. Because if it’s going to take out something which has implications for your application being able to effectively run and serve customer requests, you’ve just compromised your security.
If you look at the nature of some of the security attacks out there, everyone’s thinking about data breaches, and so on. But sometimes the attack actually comes in the form of trying to take down applications or the underlying infrastructure. And that threat doesn’t just come from outside; it also comes from within, either from a malicious actor, maybe a disgruntled employee, or from simple human error, because we are humans and we make mistakes.
I’ll go back to a phrase I said earlier in this recording - we have to put systems in place to protect ourselves from ourselves, right? Think about the workflow, think about how you’re testing, think about the gates and how you check that you haven’t made a mistake. Because most of the time, these things happen because of a misconfiguration, not malice or anything like that. So we need to try and protect ourselves from those slips.
Okay. Here it comes… My approach all these years has been to push straight to production. Whenever I see other hoops to jump through, I see that as a form of inventory, and I really don’t like that. Things are always in some other stage than where you want them, and that stage is production: live users using it, users giving you feedback, directly or indirectly, on what you’re working on. If you don’t have such a system, you should try to get to that point, which requires a lot of confidence building and a lot of understanding of how the components fit together… But you should aim for it, because the reason you want it is to have a nice flow from your laptop into production… Think minutes. It shouldn’t be an hour, it shouldn’t be two, it shouldn’t be days; heaven forbid it’s weeks. If it is, you’re not in a very good place.
So if you can get code into production in minutes, that’s a very good place to be. If there are any issues getting code, or even changes like your infrastructure, into production, if it takes more than minutes - why is that? Why do you have those issues? Try to address the underlying problems, because what you want is for your users to see changes as soon as possible, and it shouldn’t take weeks or days. Rosemary, what do you think about that? Crazy idea?
It’s not a crazy idea. I think there are folks, whole organizations with more than one developer, more than five developers, hundreds of developers, doing this. But again, they have a lot of these systems and fail-safes in place, and a lot of structure within teams to understand the risk of, let’s say, accidentally doing something they weren’t supposed to do. And I think it also depends on the industry, and on who your user is, right? If your users are pretty tolerant of these changes in general, then it’s probably not the biggest deal. But if you’re watching a video and you really don’t want it to have any latency, you don’t want it to stop, because you’re watching a live concert or something, you’re not going to have much tolerance when someone pushes a change out and it goes down.
So again, I think we put a lot of these gates, a lot of this friction, in place partly because there are financial, legal, and general functional ramifications to bringing down a system. We’re slowly moving to a mindset where we try to be more tolerant of downtime, where engineers understand how to debug a system and recover it quickly… And, to a certain degree, find ways to make it more resilient, so that it’s not affecting an end user and we can still be comfortable pushing to production. But I think it takes a lot of work.
[01:06:13.21] Actually, pushing straight to production works okay for a small team, to be honest. It works fine. I mean, I have nothing against it. But when you start getting to hundreds of people trying to coordinate changes, really large systems supporting things 24 hours a day, seven days a week, with really high expectations from a user perspective, that’s where you might need to add a few pieces of automation, or even more manual control, just to make sure people are disciplined in the process of pushing changes out.
Yeah.
I don’t have anything against it.
You used a word there, you said “confidence”, right? And that’s what it all centers around: how much confidence you have. In Rosemary’s example, she says that a lot of the people pushing straight to production have all these fail-safes. That’s what those fail-safes do: they give them confidence that, should the worst happen, these things will kick in, and so on and so forth. So that’s the way to think about it: you have confidence that when you push straight to production, the worst isn’t going to happen, because if it does, these are the things that are going to take care of it… Which is fine.
Yeah, that’s a big one for sure. And security - when it gets compromised, you know about it. Let’s start there. [laughs]
Absolutely.
If someone gets your credentials, at least you should know about it, and then you should be able to rotate them, whether manually or some other way. Okay. So as we prepare to wrap this up, what are the key takeaways you’d like our listeners to have? Rob, do you want to go first?
Yeah, sure. I think we’ve spoken a lot about tools and how they can solve the problems. What I’d like people to do is, rather than thinking about the tools, think more about the workflows, right? Don’t try to build workflows around the tools; choose the right tools to fit your workflows, right? And try to take that 30,000-foot view.
So we talked about things like credentials, and how they represent identities for applications, and so on, and so forth. And it amazes me how many people I speak to that have never even thought of it like that, right? And it’s like, the moment you think about it from that high level, it almost changes your perspective on how you’re gonna approach your implementation of anything concerned with identity and access management. So I always like to start high level, and then go deep; go broad and go deep.
Essentially, what I’m trying to say is that the key takeaway here for me is: don’t think about tools, think about workflows, and try to validate consumption patterns rather than specific use cases, right? If you do those things, you get a good balance between putting gates in the right places and not adding friction for your developers. In Terraform, for example, if you have approved modules, then rather than someone having to approve a pull request for each piece of infrastructure, developers can just fill out a form, in the form of a Terraform module, and it’s already approved, so you already know that it conforms… The pattern itself has been approved, rather than the individual pull requests.
So it’s at those points that you can start to think, maybe we can get a bit more confidence in letting that go straight to whatever the target environment is, and so on and so forth. So just try to think about validating the patterns, rather than the individual instances and occurrences. I think if you do those things, then you get a good balance between control and lessening the friction.
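Editor’s sketch: one way to picture “approve the pattern, not the pull request” is an automated check that lets a change through when every module it calls comes from a pre-approved list, and routes everything else to manual review. This is a minimal illustration, not something described in the episode; the allowlist entries, the `find_unapproved_modules` and `can_skip_manual_review` helpers, and the assumption that module sources have already been extracted from the Terraform configuration into a name-to-source mapping are all hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical allowlist of pre-approved module sources (private registry paths).
APPROVED_MODULE_SOURCES: Set[str] = {
    "app.terraform.io/example-org/network/aws",
    "app.terraform.io/example-org/database/aws",
}


def find_unapproved_modules(
    module_sources: Dict[str, str],
    approved: Set[str] = APPROVED_MODULE_SOURCES,
) -> List[str]:
    """Return the names of module calls whose source is not on the allowlist."""
    return sorted(name for name, source in module_sources.items() if source not in approved)


def can_skip_manual_review(module_sources: Dict[str, str]) -> bool:
    """A change built only from approved patterns can go straight through."""
    return not find_unapproved_modules(module_sources)


if __name__ == "__main__":
    # Hypothetical change: one approved module, one unvetted module.
    change = {
        "vpc": "app.terraform.io/example-org/network/aws",
        "cache": "github.com/someone/redis-module",
    }
    print(find_unapproved_modules(change))  # ['cache']
    print(can_skip_manual_review(change))   # False
```

The review effort moves to the allowlist itself: a module is vetted once, and every later use of it inherits that approval.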
Rosemary?
[01:09:47.21] If I were to choose one takeaway, it would be: think about improving your security and availability through immutability. One principle we didn’t really talk about was immutability… The idea that you can replace a secret with a new secret doesn’t sound so hard, but why is it so hard in our systems to revoke secrets and give applications a bunch of new ones? It’s because we never thought about security from an immutable perspective, right? Similarly, with infrastructure, we talked about how if you express it as code, you can reproduce it, right? You can think about how you’re testing and controlling changes. But if something has really gone wrong, why not just take the whole environment down and reproduce it?
And so I think we’re moving to an era where we’re more comfortable with the principle of immutability, where we take something away and create something completely new to replace it. We’re not upgrading in place, and we’re not doing bespoke configurations or going in with a screwdriver trying to make it work anymore. As a result, there are a lot of workflows, automation, and tools that can help you achieve this. And as you move toward this mindset, you can accommodate any number of tools and platforms, and ultimately help the development workflow in general… Because again, why not just replace the whole thing with something you know works, rather than trying to fix something you know is completely broken and will take time to fix?
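Editor’s sketch: the idea of replacing a secret rather than editing it in place can be shown with a toy, in-memory versioned store. Each rotation mints a brand-new value as a new version, repoints “current” at it, and revokes the previous version; no existing version is ever modified. The `ImmutableSecretStore` class and its methods are hypothetical and are not the API of Vault or any other tool mentioned in the episode. A real system would also persist versions and usually allow a grace period before revoking the old one.

```python
import secrets
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass(frozen=True)
class SecretVersion:
    """A single secret version; frozen so it cannot be edited after creation."""
    version: int
    value: str


@dataclass
class ImmutableSecretStore:
    """Toy store: versions are only ever added and revoked, never modified."""
    versions: Dict[int, SecretVersion] = field(default_factory=dict)
    current: int = 0
    revoked: Set[int] = field(default_factory=set)

    def rotate(self) -> SecretVersion:
        # Mint a completely new secret instead of changing the existing one.
        new = SecretVersion(self.current + 1, secrets.token_urlsafe(32))
        self.versions[new.version] = new
        previous = self.current
        self.current = new.version
        if previous:
            # Revoke the old version immediately (no grace period in this sketch).
            self.revoked.add(previous)
        return new

    def is_valid(self, value: str) -> bool:
        # Constant-time comparison against every non-revoked version.
        return any(
            secrets.compare_digest(v.value.encode(), value.encode())
            for v in self.versions.values()
            if v.version not in self.revoked
        )


if __name__ == "__main__":
    store = ImmutableSecretStore()
    first = store.rotate()
    second = store.rotate()
    print(store.is_valid(first.value))   # False: replaced and revoked
    print(store.is_valid(second.value))  # True: the current version
```

Because revocation and replacement are the only operations, “something has gone wrong” always has the same answer: rotate again.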
Huge plus one. Huge times two, huge cosign on everything you’ve just said, Rosemary. Huge, huge, huge.
Thank you.
Nice. Nice, nice, nice. Wow, okay… So my takeaway is that I can tell you two have been thinking about these problems for a really long time, and spending a lot of time talking about them. I recognize experience when I hear it, and I really enjoyed having this beginning of a conversation. That’s what it felt like. There’s so much to it; there’s no way we could cover it all in one hour. I think we tried, we did our best… And I’m looking forward to the follow-up. Thank you very much, Rob, thank you very much, Rosemary, and I’m looking forward to next time. This was great. Thank you.
My pleasure. Thank you.
Thank you.
Our transcripts are open source on GitHub. Improvements are welcome. 💚