This is our 9th Kaizen with Adam & Jerod. We start today’s conversation with the most important thing: embracing change. For Gerhard, this means putting Ship It on hold after this episode. It also means making more time to experiment, maybe try a few of those small bets that we recently talked about with Daniel. Kaizen will continue; we’re thinking on the Changelog. Stick around to hear the rest.
Fastly – Our bandwidth partner. Fastly powers fast, secure, and scalable digital experiences. Move beyond your content delivery network to their powerful edge cloud platform. Learn more at fastly.com
Fly.io – The home of Changelog.com — Deploy your apps and databases close to your users. In minutes you can run your Ruby, Go, Node, Deno, Python, or Elixir app (and databases!) all over the world. No ops required. Learn more at fly.io/changelog and check out the speedrun in their docs.
Changelog++ – You love our content and you want to take it to the next level by showing your support. We’ll take you closer to the metal with extended episodes, make the ads disappear, and increment your audio quality with higher bitrate mp3s. Let’s do this!
Notes & Links
All episode notes are in 🐙 GitHub discussion changelog.com#440. Feel free to add your thoughts / questions!
2 | 01:10 | Breaking the big news
4 | 13:22 | Staying plugged in
5 | 16:36 | How did we get here?
7 | 20:13 | Dagger with Changelog
9 | 33:00 | Who else does this?
10 | 39:47 | Rotating all our secrets
12 | 56:55 | What if this works?
14 | 1:10:01 | Where is Kaizen going?
Change is constant, and the one thing, the one lesson which really helped me was to not fight it, but embrace it. Some may think, “Oh, this sounds very agile-ish, and I thought we are post agile”, but this is one constant, right? Change will always happen. And if anyone has been paying attention to the world, things have changed so many times in the last couple of years. So that’s the one thing that will always be constant - change. So with that in mind, me embracing change and change being constant, I’ll be taking a break from Ship It after this episode.
That’s a gut punch…
It is a little bit… [laughter] But that’s why I want to make it sound as positive as it can be, because it is. So if you remember when we started, I was experimenting so much, and trying so many things, crazy ideas, like “Let’s use Kubernetes for Changelog.” Remember that one?
I do recall. I do.
And then Jerod came and said “No, let’s use Fly”, and we tried that as well. So we were experimenting quite a lot before Ship It, or I was experimenting quite a lot before Ship It. And then, Ship It was taking more and more of my time, to the point that I was rushing from one thing to another thing, to the next episode, the next episode… And I had less time to experiment. So I would like to do more of that.
More experimenting, less shipping of Ship It.
Less shipping of Ship It episodes, yes. That’s right. But definitely shipping. So things will still continue changing on the Changelog side; the improvements will not stop. And if anything, a couple of other areas are already picking up, like Dagger, for example, for me, which means I need more of my headspace, and more of my A game for that thing.
Embracing the change. So the big Why, if we say why in general - it’s because you were stretched too thin to do the experimentation that you love, and you need some headspace. Dagger taking off, taking over, and Ship It being very much your passion project, a side project for you… It had some financial stability, but was never going to be - at least in its current form - a full-time thing… And something had to give, because you were burning the candle at both ends, and we don’t want you to burn out. And so there you have it.
That’s right. I was checking myself, basically… And it’s really important to know when to stop and what to stop. And to know how to rearrange things. And everything is temporary. I think that’s something that is worth emphasizing. Nothing will last forever, not even us.
But hopefully, we’ve had some great time together. More amazing things will come, because this is not the end of it. It’s just a pause, and we don’t know how it will continue, in what shape or form… I don’t think that’s the approach - nothing wrong with the approach. But we can improve on it some more. Some video would be nice… There’s so many videos that we shot in the last two years since we had Ship It, but we published very few of those. Like working with various people, experimenting… But we never had time.
I remember episode 33, Merry Shipmas; recorded with the Upbound folks, recorded with the Dagger folks at the time, because I wasn’t part of Dagger back then… And the third thing was Parca. We were profiling our app, and everything was running in Kubernetes at the time, to understand where the CPU time is spent. And Parca has improved so much since, but we haven’t installed it in the new world, which for us is fly.io. So that’s maybe one thing worth bringing back. I don’t know. We’ll see. But I know that we have many more ideas of things to improve. So small bets; more small bets. More trying things out, seeing what sticks, and embracing change.
So this is episode 90. You made it to 90 episodes before this hiatus, this pause, so congrats on 90 episodes. Most podcasts do not even make it that far. Unfortunately not 100, which would have been a coup de grâce; it would have been perfect.
However, if it had been 100, it would have felt more like the end. And this is not the end, right? So 90. Like, who stops at 90? Obviously, something else is going to come after 90. It’s not a natural place to stop. 100 would be like “That’s it. The book is done.”
Right. We would call it a grand finale, and you would sail off into the sunset. Well, for me, I am a little – of course, embrace the change. I’m a little bit sad. I know we have a lot of listeners who truly love this show. It’s a unique show in our catalog, in Changelog’s catalog. You talk about things that we don’t talk about elsewhere, in ways that we can’t talk about… And so, of course, we will miss it. For me, selfishly perhaps, my favorite episodes are divisible by 10. I like the Kaizens, maybe because I get to listen to myself… No, that’s just a joke. I just enjoy catching up with you, and…
[06:28] Not a joke. [laughs]
No, I do like it. I’m starting to like it.
You have a nice voice, Jerod. That’s what it is. Let’s be honest.
It’s not what I say, it’s how I say it. No, I’m really joking.
It’s how you hear it.
Yeah. [laughter] It’s not my voice that’s great, it’s the things I’m saying. That’s the best. Just kidding. But I love our Kaizens. If the interviews never came back, I could get over it. If the Kaizens never continued, I don’t think I could get over it. So we don’t know exactly what’s coming next, but I think Kaizen needs to continue to be a thing that exists in our world. And we don’t know what form that’s going to take; maybe it’ll be on the Changelog, maybe it’ll be on some show that doesn’t exist yet… Maybe it’ll just be a show called Kaizen. I don’t know. But we don’t want to lose you entirely, Gerhard. We want you to continue to experiment, and push forward our operations here, our platform, pushing us into new things so we can learn along the way, and sharing that, at least the navel gazing part of Ship It. What do you think?
I love it.
If you remember, one of the ideas for the show titles before Ship It was Kaizen.
That’s how – it’s so embedded within me… I mean, I never see myself stop doing that. And the fact that we can talk about it - I think it’s great. The cadence makes sense. It fits with everything.
Right. And in fact, your idea to us, your pitch for this show was basically just the Kaizen stuff. And I said, “Nobody wants to listen to us every week talk about our platform every week. We need to mix in some interviews.” And so that became Ship It. It was the interview shows, and then I thought you picked a pretty good cadence, of every ten, every two and a half months… Almost quarterly, but using the episode numbers brilliantly to map out a Kaizen episode that made sense. I think if we would have come out and done a weekly Kaizen with us three, I don’t think it’d be the show that it has been. And so I think that was a good collaboration by us, to realize that, but also, you were definitely on to something in terms of just an enjoyable format that people do like to follow and say “These crazy guys just air their dirty infrastructure laundry, right here on the air, for us to learn from.” And I think that’s cool.
Yeah, I think so, too. And I really liked the new GitHub discussions… I mean, we had the one for Kaizen eight, now we have #440, which is the discussion for Kaizen nine, which is this episode… And it captures all the things. I think that works really, really well. You have the written format, you have it in GitHub, you have pull requests, issues, all things connected… I think it’s something worth celebrating. And while we don’t ship only once every two and a half months, because that would be crazy, we do talk about the highlights. And I think that is a nice forcing function to always keep moving forward. Always keep improving. It keeps reminding us of what we’ve accomplished.
Adam, do you wanna chime in here? You’ve been nodding along, but you haven’t said anything.
I think he’s too sad.
I am a little too sad, honestly. I was having trouble coming up with words, because you know, ending is always challenging. I guess pausing is a little easier. But it’s bittersweet for me, because there’s a lot to like about it, obviously, and there’s a lot that came from our deeper relationship, and everything… But I’m also about quitting when it makes sense. The Dip by Seth Godin was, by far, one of my favorite books in terms of self-development. And that book isn’t really about quitting necessarily (I guess it might be), it’s about knowing the right time to quit, I suppose; or pause even something. And that’s a challenge, because too often we’ll push ourselves beyond our limits, and things break. Sometimes those things that break are really important to us, and that’s called regret. And so none of us want to live with regret. I don’t want you to live with regret. I want to do great things together, but not at the expense of the things that are important to you and to us. And I think from a listenership, I would love the listeners to come to this and say, “That’s really awesome, to know when to pause.”
[10:38] I mean, for a while there I had to pause Founders Talk, and other things that were way back in the day, to make sure that we can focus on the Changelog podcast. A couple years back Mireille and I paused Brain Science because it was just too fast of a clip for us; we were both really busy… We’re still in the midst of bringing that show back, but we have great ambition and great plans… But you have to look at what you’re capable of, and what you want to achieve, and kind of pair the two up, and say, “Is this sustainable?” And if it’s not, be wise and put your no down. Because too often do we say yes when we should just say no.
On the note of more video stuff though, and this experimentation, and this Kaizen, and some of it… It sounds like what we really wanted from this was the experimentation and the freedom, and then the cadence of the actual podcast… Which, I agree, a weekly podcast is incredibly hard to do. If you’re listening to this right now, anybody who’s shipping a show weekly, for years, they’re not quite superheroes, but they’re darn close, because it takes a lot to show up every single week, and do something that is worthwhile. And if you have a growing audience, like we’ve had… And this show has been part of that. That’s a big, big challenge.
However, even like on today’s topic, like DHH, and cloud, that conversation out there, like this backlash against the cloud… Like, I would have loved if – that show was great, by the way. I loved that episode. But like in terms of experimentation and videos on YouTube, I would love to see – because you don’t have to have like a rhythm; you can just do it when you want… A deep-dive or a peek behind the veil of their non-cloud cloud; their own infra. Like, what does that mean, to stand up your own infrastructure? …and just have a 20-minute DHH screen-share with you, and you guys just hammer it out for like 20 minutes. That’d be cool for me, every couple months. Like, nothing that’s weekly; just something that’s like “Show me behind the screen. Give me a peek at your infra. What are your choices, why’d you make them? How does it work?” etc. That’d be cool to me. And with no necessary cadence; just like whenever it makes sense. And that kind of fits into your desire to explore. Because you’re an explorer, Gerhard, you know? You like to push the boundaries of you on the edge… But I think this show may have limited you from doing that, potentially.
Adam, you just said behind the screen. Was that a slip of the tongue, or are you workshopping a new title scheme? [laughter]
You know, always, Jerod. Always.
I like where this is going… [laughter] Behind the keyboard.
Have you done that on purpose, or…?
Not away from the keyboard; behind keyboard, behind the screen, behind the camera.
There you go. So that’s the big news. That’s probably a surprise to most, if not all, in terms of Ship It subscribers. A lot of these people are like - they listen to Ship It every week, and they just heard this, and they’re like “Well, that sucks for me.” Touchpoints - like, we’re talking about potential experimentation; how can they stay plugged in with you, what you’re doing, and maybe with the future of the show… Obviously, don’t unsubscribe from your feed reader, unless you’re a super clean freak, because there might be new things getting published into the feed. Just go ahead and let it go inactive, and if we ever publish here again, you’ll just automatically get them. So I’ll say that much myself, subscribe to the Changelog; it probably would be a good idea. But I’ll just throw that in there as a shameless self promotion. But for you, Gerhard, how can people who want to stay connected with you personally, beyond Ship It, where should they go?
[14:14] Yeah. So I’m still on Twitter. It’s still a thing. I’m on Changelog.social, even though I haven’t tweeted anything yet, if that’s even the thing to do there…
I haven’t tooted, there we go. Sorry.
You toot there.
See? I’m not up to date on all these things, so I think that’s an area worth improving.
No one wants to be up to date with that word.
Yeah. I’m still very much on the Changelog Slack, on the Changelog GitHub… That’s where I intend to spend more time, since this whole Kaizen thing behind the scenes for Changelog is not going to stop. We’ll still be improving things, there’s pull requests, there’s issues, there’s all sorts of things happening there… Maybe even discussions. I mean, we had this second GitHub discussion, where everyone is welcome to participate, where we’re talking specifically about what we are going to improve about Changelog. So I’m not sure how Chris Eggert knew how to jump in and help out, and do that improvement, or Jarvis Yang, and there’s a couple of others. Or Noah… How Noah Betson knew how to do this, and a couple of others. But this is still going on. We are still on GitHub; we’re still doing things. We’re still on Slack, on the Changelog Slack. So we’re still there, it’s just the show, the cadence, the weekly cadence - we are pausing that until we figure out, or I figure out what comes next… Which would still be with listeners, with people… I really like Adam’s idea. It’s closer to what I had in mind a couple of years back. And I’m craving to experiment more, and only put an episode out there, maybe in a different format, when it’s ready. It doesn’t mean once a year, but it means less than once a week. So somewhere between once a week and once a year - that’s the sweet spot, which I have yet to discover.
There you go. So not continuous delivery, but some sort of delivery…
Not of episodes, because there are so many other things, right? I mean, it has to be meaningful. I remember, for example, the Merry Shipmas, episode 33. That took a lot of early mornings, late nights and weekends. I have no idea how I could make time at that point for it. It was crazy. I no longer have that time now, which means that I no longer can do those things, which means that it’s all in the episodes and the few hours here and there, which is just not making me happy. Anyways… We are improving that.
It might make sense to say how we got here, which I think if you listened to this show since the beginning, you know kind of how we got here… But how we got here originally was like you, Gerhard, was our SRE for hire, essentially. You helped us stand up our infrastructure way back in 2016, when –
…when Jerod was exploring delivering and deploying an Elixir application to production. I’m paraphrasing the story, of course, but how we got here was by shipping, and we would talk about that once a year on the Changelog podcast. We liked doing that so much… We’re essentially just regressing back to the original blueprint, essentially, right?
Not once a year, though. More than once a year.
Well, maybe less than once a year, but back to the blueprint of you’re still working with us on our infrastructure; that’s not changing. We’re gonna still keep improving that; that’s not changing. We’ll keep developing partnerships. One of the ones we’ve just formed recently was Typesense. Behind the scenes Jerod and Jason Bosco are like hammering out some cool stuff with Typesense for our search, and that’s so cool. But these things are gonna keep continuing, we’re gonna pause the podcast, essentially. The extra is changing, and we’re regressing back to the normality, essentially. The opportunity to put your explorer hat back on, put a smile back on your face, and leverage your time so wisely.
[17:49] Exactly. That’s exactly right. And in a way, we are kind of going back to the beginning from the shipping side of things, because we have a huge improvement that went out in the last two and a half months… And there’s even more amazing stuff coming out in the next two and a half months, on the next Kaizen, in that time period. And it means that I will have more time to do a better job of that; focus more, do more… And obviously, that means for me CI/CD as code. So we are going back to the initial idea of “Hey, how do we get Changelog out there?” For example, back in the day it was Docker, for deploying on Docker Swarm, running on Linode, set up with Terraform. Or was it Ansible? I think it was Ansible.
It was Ansible and Concourse CI.
There we go. Concourse CI. Exactly. So in a way, we are back there, right? It’s the continuation of Concourse CI, it’s the continuation of that… There is a PaaS now, which is Fly… But again, it’s going to be a lot more. Integration with services… And I know that Jerod is missing certain things… And stuff is coming, but for that, we need more time.
So describe to us this big update, this big improvement that you did over the last two and a half months. I think we touched on it in Kaizen 8, but it wasn’t finished… Now, this was a Dagger version 0.3, I believe… First of all, explain what the improvement is, and then you can get into what you had to do to pull this off, and where it’s going from there.
So Merry Shipmas - I keep coming back to that, episode 33 - we introduced Dagger in the context of Changelog. What that meant is that we were migrating from Circle CI to GitHub Actions. Rather than trading one YAML for another YAML, I thought “Wouldn’t it be nice if we had CI running locally first, and remotely next?” And remotely would be via a very thin interface. That interface was Dagger. You can run it locally, you run it in whatever CI you have, invoking the same command, and the same things will happen, because your CI now runs in containers. And by CI I mean the actual operations. That was November 2021.
Beginning of 2022 I joined Dagger. We did a lot of improvements, and at the end of last year, which was just a few months ago, we released SDKs, which means that you can write your CI/CD system, your pipelines, in code. Whether it’s Python, whether it’s Go, whether it’s Node.js - it’s no more YAML, no more strange configuration languages that some perceive as weird… It’s the code that you know and love. So what that means is that now you can write proper code that declares your pipeline, all the things…
[21:56] And I say “declares” because it’s lots of function calls. Sort of like lazy chaining, which eventually gets translated into a dag, hence Dagger, the name. And then, everything gets materialized behind the scenes. Some things are cached, naturally, other things aren’t.
So that means that right now we are in the phase where, from Dagger 0.1, which is using CUE, we now have Go in our codebase. And I want to know how do you feel about that, Jerod? How do you feel about having your Elixir spoiled (hopefully not) by some Go code?
No, I feel good about it. I feel like a renaissance man. We have all these different things; we taste of the best Elixirs, and we also can just pull in some Go when we want to… I mean, that’s diversity, that’s inclusion… I’m happy about it.
That’s amazing. So no more YAML…
Also happy about that…
No more CUE… No more makefiles.
I was going to learn CUE. I don’t have to learn CUE now.
Exactly. You have to learn Go…
No more makefiles. Zero makefiles.
Now you got me.
Yeah. The top one went, and the others will disappear as well from the subdirectories when we finish the migration. So there’s no more top makefile.
Okay, so where do I go? I look for a .go file, it’s in there somewhere, to look at what’s going on.
So everything Dagger-related is in mage files.
Okay. And mage is Go’s version of make, or rake, or like a task runner thing?
It’s just like to invoke things, just to have different entry points… So for example, right now we have three entry points. The first entry point is the Dagger version 0.1 legacy one, where we can run the old pipeline. Running 0.1 alongside 0.3 - that was one PR. So we had PR 446, where we run the Dagger 0.1 pipeline, the CUE one, and 0.3 using the Go SDK. So the entry point is Dagger version 0.1 :shipit. And that wraps the old pipeline.
There’s also a new one - again, this is mage, so it exposes… I mean, you can think of those like subcommands. It all bundles up in a binary, and it has different subcommands. And if you don’t provide any command, it’ll show you “Hey, you can run these things.” That’s in essence what it is.
So we have image as a namespace, and runtime within it. So we can now build the runtime image using Dagger version 0.3. Not only build it, but also publish it to GHCR. And that is pull request 450. So now we are building and publishing the Changelog runtime image using GitHub Actions, within GitHub Actions, via a very thin Dagger layer. And all it does is basically just go run. Go run, the main Go file, and the command is image runtime, and off it goes to GHCR. So if you go to ghcr.io/thechangelog/changelog-runtime, you will see our image in all its beauty. What does that mean? It has a very nice description; we’re making use of certain labels that the open container spec has. So there’s a specific label to show the description in GHCR.
So GHCR - that’s GitHub’s deal, right? That’s their registry.
GitHub’s Container Registry. That’s it.
Okay. I haven’t used this before, so I’m a newb here. I’m used to Docker Hub. So this is like GitHub’s version.
Oh, I’m looking at this Changelog runtime, and it has an emoji next to it…
How beautiful is that? [laughter]
Gerhard got some emoji in there… So you’re already talking my language…
Elixir version 1.14.2, so you see the description… I mean, you can see the version that we use in the actual tag… And that’s what we’re using in production right now. That went out this weekend.
So we’re using that runtime image.
Okay. And this was built via Dagger, inside GitHub Actions?
That’s right. Yup.
And you can also run it locally, if you want.
When you run it locally, are you running it inside Dagger? What’s the terminology here?
[26:04] Okay, so you’re running it – so it runs Go on the outside, it provisions a Dagger engine inside Docker… Because if you have Docker, it needs to provision like the brains, if you wish, of where things will run… So by default, if you have Docker, it knows how to provision itself. When the Dagger engine spins up, all the operations run inside the Dagger engine. The really cool thing is, if anything has been cached, it won’t run it again. So imagine our image, when you pull down our image… So when we build this runtime image, obviously we have to pull down the base one, which is based on the hexpm image, and that’s from Docker Hub, then it needs to install a bunch of dependencies… And by the way, all that stuff - I mean, if you look at… I have to show you the code. This is too cool, Jerod. Check this out. So if you go to pull request 450, and if you look at the mage files - image, image.go - look at lines 50 to 61.
‘build.Elixir().WithAptPackages().WithGit().WithImagemagick()’. So this is like a chain of function calls that you’ve named nicely…
That’s it. And you can mix and match them in whichever way you want. So when, for example, we convert the rest of our pipeline to Dagger 0.3, we’ll do build, we’ll take Elixir, with packages, and whatever else we want. And when we want to publish the image, we can chain, again, the function calls however we want. For example, we do not want WithNodejs when we publish our image, but we do want WithNodejs when we build or compile our assets. So this way, we can chain all the functions, get all the bits from the various containers, various layers, assemble it, and make sure that all dependencies will be the same. Because WithNodejs knows exactly which Node.js version we use; and it doesn’t matter where you call it from. And because all the operations are cached, they won’t rerun. Some of these can take a really long time, by the way… Anyway, so I’m super-excited about this. And by the way, Noah, if you’re listening to this, I’m very curious to know how much easier it is to bump our dependencies with the new approach.
I was just going to ask that, because I’m looking at line 16, it says elixir version equals, and then it’s a string, 1.14.2.
Can I just change that string?
And that’s it?!
That’s it. Change the string, commit and push, and the CI will take care of the rest.
Whooo-weee!! Now we’re talking.
Oh yeah, baby.
I’ve asked you for this for years. Like, can I go to one place in the code and just change the version, and it’ll be done?
That’s it. And there’s more and more stuff that we can add on top of that. For example, we can change the local files. You know, we still have, in contribute.md - if you go there… by the way, that was updated as well to tell you how you change things. So that was updated to reference the new files. Those steps, we can start removing them, because we can automate more and more of that stuff. So we can, for example, go and update the Elixir version in the readme, in contribute.md, wherever we have it. It’s all code, at the end of the day. And it’s not scripting.
Meaning it’s only in the readme? Like, you could have it in the readme only?
Meaning that it will only be in image.go. That’s it. When you bump it in image.go, and the pipeline runs, it will update all the other places.
Oh, it’ll update the readme for you.
I was gonna say, it’d be crazy if you actually just had that version in the readme, and image.go read it from there… Which you probably could do, because it’s Go code.
It could do that. Yeah, it could do that.
That doesn’t sound smart, but it just would be interesting.
[29:45] Yeah, no. You want it to be in code. You want it in code. And not to mention that when it’s in code, by the way, we can have – again, we still need to figure this part out, I suppose… But we could have things that automatically bump it. When a new version comes out, it bumps it in code, the pipeline bumps it everywhere… And because the pipeline runs, it checks if the new version works.
And then opens up PR and then we can just merge?
That’s it, Jerod. That’s it. That’s it.
See, it’s stuff like this that gets me really excited. [laughs]
You’re getting me.
Okay. So that’s cool. How does that play into the other thing which happened recently, thanks to Chris – and by the way, by the time this episode goes out, we will have shipped an episode of the Changelog with Brigit Murtaugh from the Dev Containers spec, from the VS Code team, talking all about this, in which Chris gets multiple shout outs. So he’s probably getting sick of hearing us talking about him at this point. He opened up a pull request allowing us to run our codebase on Codespaces by adding a devcontainer.json. So thanks to him for that. He’s using a Docker Compose file and a little bit of JSON, and you can just like say, “Open in Codespaces”, and it’s super cool. How do these changes affect his work, if at all, or what’s the integration there? Because now we have like a dev environment, we have this image that you’re changing the way it works…
Yeah. It all builds on top of it. This is brilliant.
This is brilliant… [laughs]
It is. And it’s not me, it’s the combination of people that came together, right? I wasn’t expecting Chris to come along.
That was great, it was amazing. So based on that - that was pull request 437 in our code base - I did a follow-up, 449, which basically changes the Dev Container to reference our runtime image, now pulled from GHCR. And because we’re running in GitHub Codespaces, that will be very fast. Much faster than if you pulled it from any other registry. So that was another reason to go to GHCR.
So that works currently?
That’s how it works currently. If you go and open the file - come on, let’s check it out.
Because I just did it last week in preparation for that conversation with Brigit, and one thing I noticed is pulling from Docker Hub - just the entire first-run Codespaces experience… I mean, it’s probably five to seven minutes, you know…
That has improved. The pull request that I mentioned, 449 - it no longer builds it; it references the already built runtime image. If you check out in the Dev Containers directory, if you look at the Docker Compose file, line five, now it has the image reference. So the runtime image is no longer built; the runtime image reference is pulled. So it shouldn’t take six, seven minutes anymore. It should be instant.
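The shape of that change, sketched below; the tag is illustrative, only the ghcr.io path comes from the conversation:

```yaml
# .devcontainer/docker-compose.yml (sketch)
services:
  app:
    # Reference the prebuilt runtime image from GHCR instead of building
    # it on every Codespace start:
    image: ghcr.io/thechangelog/changelog-runtime:latest
```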
I’ll try that again.
There you go. Let me know how it works. But if not, we’ll work on it some more. And all this stuff, all these things, we can start templating. Once we get it in the pipeline, there will be a single place where we declare those versions. As soon as the image builds successfully, and because we go through the process in the pipeline, we can start modifying all these other places, then build the production image, try and deploy it, and if it works, we’re done. Merge the PR… We’re good.
Who else is doing it like this? How state of the art is this?
I don’t know. I would say it’s pretty cutting edge… Because we are redefining the CI/CD with Dagger. We really are. I mean, the CI/CD as code - forget like any weird languages… And some of the stuff that we have coming - I can’t talk about all the things… But I’m like six months ahead, and I’m so excited to be there.
For example, last Friday - it was just a few days ago - we shipped services support. It’s an experimental feature. If you’re listening to this, you’re not supposed to use it, so please don’t, because it may be broken in a number of ways we don’t know… But Changelog will be the first one to use the services support in Dagger. What that means is that we will be spinning up a PostgreSQL container that we need for our tests inside Dagger, inside the Dagger engine, because it now has a runtime.
And what are the ramifications of that?
[33:57] Well, you spin up containers in code. Just as you write your code, you can say, “Spin me up a PostgreSQL container”, and when it’s spun up, connect it to this other container where the test will run. You can have the waiting – I mean, we used to do nc, netcat, for heaven’s sake, to wait for the PostgreSQL container to be available. There’s like services support, there’s like ugly YAML… All sorts of weird things.
Let’s not knock on netcat, Gerhard. Come on. Sweet tool.
No, it’s amazing. I love it. It is old school. It’s amazing. But what’s not amazing is that you have to – you’re forced to combine scripting and YAML.
To wait. Yeah, you’re waiting for a service to be ready for you.
In a weird way. Exactly. Rather than doing it in code. Why wouldn’t you do all these things in code? Because now we can start orchestrating containers. But orchestrating for the purpose of CI/CD. Let’s be clear about that.
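Replacing the netcat-in-YAML wait with plain code might look something like this - a minimal Python sketch of the idea, not Dagger's actual services API:

```python
import socket
import time

def wait_for_port(host: str, port: int, timeout: float = 10.0) -> bool:
    """Poll until a TCP service (e.g. PostgreSQL on 5432) accepts
    connections, or the timeout expires. This is the same job the
    shell loop around netcat used to do inside CI YAML, expressed
    as ordinary code."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            time.sleep(0.2)
    return False
```

The point is less the function itself and more where it lives: in code, next to the container orchestration, instead of stitched together from scripting and YAML.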
So we’re going to be like a poster child for Dagger, aren’t we? I mean, these people have to love us. We’re using all the bleeding – I mean, by these people, I mean you people.
I love you. I’m Dagger.
I know you are. [laughter] That’s cool, man. I love that we’re a testbed for cool new things. And we’re definitely right there on the edge… I wonder how much bleeding we’re gonna do. Well, we are defining it. Well, we’ll find out… And by the way, you have the right person to fix it, who does the work. [laughs] Isn’t that the whole point?
Yes. Alright, cool. Exciting times. I’ve always wanted to have one string in my codebase, in which I could update the version of Elixir.
And then docs, too. That’s so cool. Updating docs is a cool thing. Still, docs suck; especially a readme. Like, when you go to the readme – I’ve gone there recently with other things I’m working on… It’s referencing the old release, for example, in the installation instructions, which you go to immediately. But if you go to releases, there are like two new ones. The documentation is out of date.
It could always be outdated.
So is every – so because we do basically master-branch-based deploying, is every push to master a release, effectively?
Yeah. That hasn’t changed in years. Since I’ve been around, that hasn’t changed.
Right. What about on PRs and branches? How does that work?
We don’t deploy. So we now run tests, by the way… We didn’t use to run tests in pull requests. Oh, dang it, I don’t know how I overlooked that thing…
We just close them all, yeah. [laughs]
Yeah, yeah, yeah. So that was actually one of the first things, pull request 436. So since pull request 436, which by the way, happened in the same Kaizen, since Kaizen 8… We are now running tests for every pull request. And we do that by basically leveraging the built-in Docker engine in GitHub Actions… Which is a bit slow, and it doesn’t have any caching… But it means that we are running all the pipelines, including building a runtime image, but not publishing it, because there aren’t credentials to do that, with every pull request. So while we don’t deploy on every pull request, we could…
Which would give us deployment previews, effectively.
We absolutely could. That’s it. That’s it, yup. And the nice thing would be - I think I’m very keen to try and do that in Dagger. The reason why I’m keen to do that is because of the services support. I’m pretty sure when they were designed no one thought about this, but we can have longer-running environments. So basically, we have a CI that is like one action which won’t stop until you’re okay with it. So how do we figure out routing? I don’t know. I’m really keen to explore that.
We could run a very lightweight version of the Changelog in the context of the CI/CD, in the context of the pull request. Because it doesn’t have to serve a lot of traffic, it doesn’t need to be anything big… The CI/CD is already there. You have a VM where you’re running the actual code for your tests. So why wouldn’t you run a longer-running process that exposes Changelog?
You’re blowing my mind, Gerhard. I’m not even –
[38:00] That’s a crazy idea, right? No one has thought about that before. [laughs]
See, I told you - six months from now. It’s the future.
Okay. Well, that’s exciting.
So when a pull request opens, basically, the GitHub runner that runs all the various checks, one of them, we basically keep it running for longer; or we don’t even use GitHub runners at that point. So one of the things which we run - we spin up a Changelog, a preview one - we still need to figure out the data part - that will be accessible publicly. We get a random URL that you can hit, and then you can connect to that instance. And that instance runs within one of the CI workers. When the pull request is merged - I mean, one of the checks… Again, I still need to figure out how to do this, but one of the checks, basically, will not finish until the pull request is merged. And that check in GitHub Actions - that’s the one where you can access the Changelog, the preview version.
So literally, you’re running a preview in CI/CD.
I’m going to need a new diagram…
Infrastructure.md is the place to go in our repo to see how everything wires together, and that’s the one that I intend to update as we add this new stuff. So infrastructure.md is fairly accurate right now. I think the only thing missing is GHCR, and the reason why it’s missing is because I’m migrating the rest of the stuff to GHCR. And until that completes, it would be weird to see both Docker Hub and GHCR. So we’re in a transition period. Once the dust settles, the diagram will be up to date. But again, that’s the only thing which is missing. Everything else is accurate. Fly, Honeycomb, Sentry… Everything.
Very cool. Very cool.
So what about you, Jerod? I know that you’ve had some improvements in mind. Some of them I think you’ve already done since Kaizen 8…
Which ones do you want to talk about? There’s many, I can tell you that.
So a lot of my time, Gerhard, as you know, has been spent on rotating all of our secrets, first of all.
Oh, my goodness me. There were so many. [laughter]
So LastPass, thanks for nothing… Well, thanks for a few good years; and then we’ve lost confidence. So we are 1Password users as a team now, which we talked about for a few Kaizens, and finally made that migration. And then we decided, because of the LastPass leak, and the fact that we’re all on 1Password now, it’s a great time to just go through and do a key rotation, right? Just rotate all of the things… Which was just a lot of things. Like, man, we’ve got a lot of secrets in there, lots of integrations… And mostly harmless. There’s a few fallouts, as there tends to be, with just that many changes; things that went wrong because of that. The biggest one was our stats system went down for a few days, because AWS credentials existed in one place correctly, but the other place incorrectly, I think… And then secondly, Changelog Nightly actually stopped sending, because I didn’t update the Campaign Monitor API key on Nightly, which is an old Digital Ocean box from way back; it still just runs dutifully, every night, on a Digital Ocean box…
So I updated our Campaign Monitor API key inside of our app, and in Campaign Monitor, but I didn’t rotate it over on the other server. And so it failed to send. It was still generating the emails, just not sending them, which is key; it’s a key part of it. So there was like a few nights where Nightly didn’t go out until I realized it, and I was like “Oh, that one makes total sense.” You and I also teamed up on a few things…
…which is always fun.
Issue 442, for anyone that wants to see all the things we had to go through. We had 79 tasks to complete. And some of the work went quick, but just untangling all that… We cleaned up a lot of stuff, and again, it was almost like a spring clean; even though it was January, it was definitely a spring clean for secrets.
[42:13] Yeah. You don’t realize just how many service integrations you have until you go to rotate all your secrets. And then it’s like “Holy cow. Slack. Campaign Monitor. GitHub. Fastly. AWS. GitHub.”
Yeah. GitHub twice, by the way. You said GitHub twice, because GitHub is used twice; you have an API token [unintelligible 00:42:30.06]
Same thing with Slack. There’s like two different Slack APIs that we use. One’s for the invites, which is like this old legacy thing that was never an official API, how you actually generate an invite. And then everything else is like for logbot, which is our Slack bot that does a few things. Yeah, there’s just so many of them. And then it’s just like – it’s just an arduous process. So this is why my personal private key is years old at this point, embarrassingly.
We have to rotate it again. You won’t be able to SSH into things. Good thing is you don’t need to SSH anymore. Isn’t that a relief?
That is nice. We’re getting better on that front.
Flyctl ssh console…
I do enjoy that, yes. So that was one big piece of work… The other thing - Adam, you mentioned it; it’s in flight right now - we’re swapping out Algolia for Typesense, which is a very cool C++ based open source search engine that we had on the Changelog… Jason Bosco - we had him on the Changelog last year. I really liked the guy, got really interested in the product. We were on Algolia, and we still are on the Algolia open source plan, which sets us a limit… And we’ve hit that limit, and we’ve been putting new things into the Algolia index ever since, but it won’t search them until we upgrade our plan… So we’re happy to be replacing Algolia with Typesense. Of course, that’s an open source thing, but we’re working on a partnership with Jason and his team, so that we’ll be using Typesense Cloud. All that’s very close to at least being swap-out-ready, and then we’re going to build from there and start to use some of the things that make Typesense interesting. So I’ve been coding that…
And then the third thing is trying to rejigger the way that our feeds are generated and cached and stored in order to get to this clustered world of multiple nodes running the apps, without having to change the way we use Erlang’s built-in caching system, because I’ve just had some issues with that… And I just started thinking, “Why are we caching stuff if we have a very fast application, that can just run close to the user? Let’s just figure out a way not to cache stuff as much.” But we have these very expensive pages, specifically the feeds: Master feed, Changelog feed… I mean, the XML that gets generated is like 2.3 megabytes. It’s not going to be fast on any system, unless it’s literally pre-computed.
So I started thinking about different ways of pre-computing and storing files on S3, and fronting that… And there’s just lots of concerns with publishing immediately; we like to publish fast. And we even had a problem - thanks to a listener who pointed it out - with our Overcast ping, because Overcast as a specific app allows you to ping it immediately on publish, and they’ll just push-notify, and people will get their things immediately… Which some people really like. I’m always surprised - there’s some listeners who listen right when it drops, and there’s others who listen like six months later. And that’s all well and good, but for the ones who want it now - it’s cool, we added the Overcast ping. Well, there’s an issue there, because Overcast pings, but we’re caching our feeds for a few minutes, maybe just a minute. And so Overcast says there’s a new episode, and so you click on it, and you go there, and there isn’t a new episode. And then you refresh, it’s not there, then you refresh, it’s not there, then you refresh and it is there, and it was like 60 seconds… Because we’re caching.
[46:14] So I just turned that thing off and thought, “Well, people can just wait for Overcast to crawl us again, for now, but I would love to solve that problem…” And so then I started thinking, you know, we already have a place where we store data, that’s a single instance, but is a service, so to speak, and it’s called Postgres. And instead of adding like a memcached, or Redis, or figuring out these caching issues inside of the Erlang system, which was not trivial in my research, I was like “What if we just precompute and throw stuff into Postgres?” And I did a test run of that, the feeds; just the feeds. And just turn off all other caching, because I don’t think we actually need any other caching. It’s just like, I already had caching setup, so I cached a few popular pages… But what if I just did it on the feeds? And every time you publish, you just blow it away, rerun it, and put it in Postgres. And you just serve it as static content out of Postgres.
I did some initial testing on that locally, and it’s like consistently 50-millisecond responses with like Apache Bench, it was not a problem. It’s never super-fast, like what you get with Erlang, where it’s like microseconds… Which I always like to see those stats. But that’s not what we need, right? Consistently 50 milliseconds is great.
Without any caching layer. I mean, you’re basically just pulling it out of Postgres and serving it. Very few code changes… It just felt “Okay, this is kind of a silly idea, using Postgres as a cache effectively, but what if it just works, and it’s simple, and we don’t have to add any infrastructure?”
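The precompute-on-publish idea could be sketched like this - a toy Python sketch using in-memory SQLite standing in for Postgres, with made-up names, just to show the shape of cache-on-write:

```python
import sqlite3

# In-memory SQLite stands in for the real Postgres instance here.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE feed_cache (slug TEXT PRIMARY KEY, xml TEXT)")

def generate_feed(slug: str) -> str:
    # Stand-in for the expensive XML generation (the real master
    # feed is ~2.3 MB of XML).
    return f"<rss><title>{slug}</title></rss>"

def publish(slug: str) -> None:
    # Cache on write: whenever content changes, blow the entry
    # away and recompute it.
    db.execute(
        "INSERT INTO feed_cache (slug, xml) VALUES (?, ?) "
        "ON CONFLICT(slug) DO UPDATE SET xml = excluded.xml",
        (slug, generate_feed(slug)),
    )

def serve(slug: str):
    # Reads are a single primary-key lookup: effectively static
    # content served straight out of the database.
    row = db.execute(
        "SELECT xml FROM feed_cache WHERE slug = ?", (slug,)
    ).fetchone()
    return row[0] if row else None
```

No extra infrastructure: the database that is already the single source of data doubles as the cache, and publishes are immediately visible because the write happens at publish time, not on first request.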
So I want to test that sort of in production, I kind of want to roll it out and run it, and then easily roll it back if it’s not going to actually work in production… But I don’t really have the metrics, I don’t have the observability. I have Fastly observability through Honeycomb, but I’m lacking the app responses [unintelligible 00:48:10.20] observability, which is really what we want. We don’t want Fastly to be waiting on the app all of a sudden, and the app to be just bogged down on other requests. And so that’s where I came back to you and said, “This is what I would like to see… Can we get Phoenix talking to Honeycomb in some sort of native fashion?” And then I found this OpenTelemetry thing, and I stopped right there. So I will let you respond after that long monologue.
No, no, I mean, that’s exactly it. I mean, we knew we wanted to do that. It’s like another experiment which I wanted to continue with… And I’m so keen to get back to it, to see how that integration could work. That was on my list for as long as I can remember, and I’m so excited to be finally doing it. We’re finally in a good place to do that integration, and I’m fairly confident that we’ll be able to talk about it at the next Kaizen.
Ha-ha! He said it.
[laughs] On the next Kaizen…
There you go. In the next Kaizen.
Okay, so we have it on record; there will be another Kaizen.
Not just a hope and a dream.
We just need to figure out where.
So if I understand this correctly, Jerod, you’ve done this work, but you haven’t done it in production. So you need a way to test it in production, essentially, to see how it responds.
I spiked it out on a branch, and then it was just like “Okay, this is certainly feasible.” And then I did some rudimentary benchmarking of that branch, just to make sure it’s not crazy dumb… And then I’m like “Okay, this is feasible, and I know how to bring this into official code.” I can definitely transition what I coded, or even just rewrite it in a way that’s maintainable if we decide to do it. But I’d really like to know if it’s gonna be really dumb, or just kind of dumb. I feel like it’s just dumb enough that it just might work… And be so simple, and solve a problem in a way that’s just awesomely dumb. But I don’t want it to be so dumb that it’s not gonna work… [laughs]
[50:10] That’s the real spirit of Ship It. We literally have to get it out to see if it works. Like, what happens.
And then I was like “Well, what I lack is metrics.” So I can observe it for a few hours, get some confidence, leave it in, or be like “Holy cow. It worked great in dev, but it’s not going to work with a real load.”
I have a question for Adam… So Adam, I think this may be the moment to tell us again about the benefits of feature flags.
I almost mentioned it there. I was like “I don’t want to have egg on my face by mentioning feature flags…” Because I know Jerod has sort of been resistant to some degree against it… But there may be a simpler way to do this, but I think that that’s essentially what you want to do. You want to test this in production, on a limited set of users. So it could be scoped to admins only, for example.
No, because I want to load-test it. I want the full load, is my issue.
But it could be like maybe 50% of the requests, and you can compare them. So 50% of the requests, 50/50…
…going to the old one, 50 to the new implementation, and see how do they compare over the course of maybe a few days…
Yeah, we can do that.
So Adam, how do we get feature flags? What do you think?
Where do you stand on that?
Well, if we’re doing 50/50, can’t we just do like an if statement, with like random divided by two? [laughter]
Sure. “If it’s an even second, do this. And if it’s an uneven second, do the other thing.” [laughs]
If it’s an imperial unit, or if it’s the metric system… Is this the metric system, or which system are we going to use here?
Luckily, seconds only exist in one… [laughs]
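Joking aside, the even-second split really is only a couple of lines - a purely illustrative Python sketch:

```python
import time

def variant(now=None):
    """Route a request to the new or old implementation based on
    whether the current second is even - the simplest possible
    50/50 'feature flag'. `now` is injectable for testing."""
    ts = int(time.time() if now is None else now)
    return "new" if ts % 2 == 0 else "old"
```

For a one-off experiment where both halves of the traffic are observable downstream, something this crude can genuinely be enough; a real feature-flag system earns its keep when there are many flags, gradual rollouts, or per-user targeting.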
I know Adam’s been keen on feature flags, and I feel like this is his big moment to introduce some sort of subsystem.
I think so too.
I mean, I don’t feel like I have a system to pitch here… [laughter]
No, I remember the conversation, Jerod. That’s why I keep going back to it. Because we didn’t have a good answer for Adam, and we were both against it. So maybe now it’s coming back, and maybe now it’s a yes, because it was a definite no back then.
We were premature. When I tried to pitch –
The insider story here, listeners, is that my initial pitch for us using feature flags fell on deaf ears, essentially, because we were premature. We just didn’t have the need for it. We were trying to find a use for it, and if you follow Kaizen, and Ship It, and what we’ve done, then you know our application is pretty simple. We don’t have a lot of developers developing on it, so there’s not a real need for an immense feature flags feature and/or service to use. LaunchDarkly was our friend for a while there… I’d still say they’re friendly, but they’re not friends. We’re not working with them directly anymore.
We do have a new sponsor coming on board, DevCycle, which is in the feature flag business, which - you know, if you wanted to use it for this one instance, I’m sure we could do something. So I mean, there is an opportunity there, but… That would be my pitch. I feel like if it’s just this one-off though, then the if statement probably works.
Well, I’ll let you know when I get this far. What we need first, I think, is the observability. Because either way, if we do it 50/50, we want to see both results.
And so right now I can’t see any results, besides sit there and stare at the log files, and look at the request responses… Which was a side effect, actually, of one of our recent changes - our log files just stopped logging. I got it fixed, but that was funny. So I’m like “Wait a second, there aren’t any logs.”
How can the Changelog not log?
That’s just like against the laws of nature, essentially.
Well, I’m not gonna git blame that one on the air, because I don’t want to embarrass Gerhard, but… I fixed it.
That’s okay, I can’t get embarrassed. [laughter] I can’t, because I’m going to learn something new out of this.
There you go.
So tell me the commit where this was introduced, so that I can understand my mistake. Seriously.
[54:00] So the code that fixes it is in commit f19c9cf, where I basically changed the application file to basically turn the logger back on. So I think you were overly aggressive when you were – you were removing a few things… We removed PromEx, because we’re not really using Grafana anymore… And you just deleted too much code. And the code that you deleted would, if we’re not in IEx, turn on the default logger. But you deleted it, so there wasn’t a default logger, and so it wouldn’t log anything in prod at all…
…and you didn’t notice.
Yeah, that’s right.
And I didn’t notice, and so I just thought, “Well, I’ll just go see what’s going on in production”, and there was no logs there. So I actually just put that code back in, that you had deleted, is all.
Right. So hang on, let me try and understand this code… That’s what’s happening right now. I’m trying to understand some Elixir code live, as we are recording this… I’m looking at application.ex, line 32: ‘unless Code.ensure_loaded?(IEx) && IEx.started?() do’. Which of those two lines disables logging? The 33 or the 35 one? Oban telemetry attach default logger?
No, that’s not the line. Look at endpoint.ex line 60. Plug.telemetry. That’s the line where you basically remove the telemetry plug.
Okay, okay, okay. I see. So the telemetry plug logs.
I see. Okay.
The logger uses the telemetry plug to do its thing.
Right, right. If it would have been plug log. I don’t think I would have made that mistake.
But yeah, cool. Okay. That’s good to know.
So yeah, it was an easy mistake to make. And I know how it is when you’re removing stuff. You’re like “Oh, this we don’t need. This we don’t need.” And I think it was just that one line…
…just turned that off, and we didn’t notice because we weren’t really looking at production. Now, had we been sending it over to Honeycomb and observing it, we probably would have seen the drop-off immediately, because Telemetry would have been turned off there.
Yeah, that’s right.
So I think the Honeycomb integration will use this OpenTelemetry plug as well, when we do it. So that was the line that did it; it wasn’t the other one. There was a few other things that you also removed, I put them back in, but that was like Oban stuff. Not a big deal. It was just over-aggressive deletion, which is totally normal when we’re like “Let’s –”
Probably. I deleted too much.
Yeah. When you’re in like “Let’s delete stuff” mode… I know how it is, because it feels so good.
Okay, okay. Okay, okay.
So there you go.
Cool. That’s good to know. So who reviewed my PR?
Do you see where this is going? [laughter] Cool, great.
Well, it wasn’t me… Clearly…
I merged it, but I didn’t review it.
I think I waited for a while and said, “You know what - I’m just gonna push this through”, because that’s how we roll.
There you go.
No, that’s fine. That’s fine.
No, even if I reviewed it, I must have not reviewed it very well, so… You know…
That’s okay. Yeah, it was an honest mistake.
On both our parts.
On both our parts.
I want to chase that rabbit down… I’ve got a question for you. So once we put this experiment into production, Jerod, what’s going to happen? Can you come back to the beginning, where if we get this potentially smart Postgres feature out there… Let’s say it’s successful. What happens? What happens as a result of that being successful?
So what happens is every single request that goes to one of our feeds will be served live from Postgres, from what I call like a feeds cache inside our Postgres instance. So it’s effectively – it’s as if it was reading off disk, but we don’t have a disk, because we’re in Fly land… But it’s just on disk inside of Postgres. And so it goes out of Postgres, goes out live, so every request is immediate… And then every time that we change something that’s going to change the feeds, we blow that one away, and we rewrite it, and so we recompute the feed. It’s basically a cache inside of Postgres, because that’s already our single source of data. Whereas if we did it anywhere else, we’d have to have a shared data source etc.
[58:03] I think what’s more important is that this enables us to run more than one instance of Changelog.
Right now, because of how caching is done, we can only have one instance of Changelog. And we have been on this journey for quite some time now. Right? If you remember, we had a persistent disk. So we did have a local disk. But when we had that, it meant that we could only have a single instance, because all our media assets were stored on that one disk. So we pushed the media assets to S3, and now we could have more than one. But then the next thing was like “Oh, dang it. The caching.” So once we solve the caching, we can run more than one instance, we can spread them across the world, we can serve dynamic requests from where users are, rather than everything going through the CDN - and the CDN really only caches the static stuff. And even then, it has to time out. That’s also why we have that delay, because the CDN also caches for about 60 seconds.
Right. Yeah, the other thing this lets us do is serve different feeds to different requesters. And here’s why this might be interesting… Spotify specifically supports, allegedly - I haven’t seen it working very much… They support chapters if you put them as text in your show notes, using the YouTube-style timestamps thing. So I just put it in for everybody at this point. But it’s silly to put it into the show notes for listeners who have regular podcast apps that support chapters the way that you should, not the Spotify way.
Well, we could just serve them using this system. We could have two different versions of the feed, both put into Postgres, use the request header to identify Spotify, because it sends a standard request, and serve a slightly different feed to Spotify than we serve to everybody else, and give them those timestamps. So you get the chapters over there, but you don’t clutter up the feed for everybody else. And you can’t do that very well with caching, because it’s like “Well, we’ve got a cached version”, right? And the requests never hit our server; they just hit Fastly. And maybe you can put that logic inside of Fastly, but now you have to point it to different places, and manage that whole deal…
And so this also enables that, where you can basically have N caches per request, and serve the right one dynamically, but still have it precomputed. So it’s kind of the best of both worlds. By the way, to our listener, I realized this is kind of a dumb way of doing it. If it’s super-dumb, and you have reasons why, please, tell me, because I’m about to roll it out… [laughs]
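The per-requester routing could be as simple as deriving a cache key from the request - a hypothetical Python sketch; the real check would live in the Phoenix app, and matching on "spotify" in the User-Agent is an assumption, not Spotify's documented behavior:

```python
def feed_variant(user_agent):
    """Pick which precomputed feed to serve. Spotify's fetcher is
    assumed to identify itself in the User-Agent header; everyone
    else gets the default feed, without the inline chapter text."""
    ua = (user_agent or "").lower()
    return "spotify" if "spotify" in ua else "default"
```

The returned variant name would then be part of the primary key in the feed cache, so each requester class gets its own precomputed document.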
“I’m about to roll it out…!”
I don’t think it is.
Why is it dumb? Why do you keep saying this? Why do you think it’s dumb? What’s the logic behind it being dumb?
Storing precomputed text inside of Postgres - it’s somewhat large. I read some – like, how big is too big, and it’s like 2.3 megabytes in a Postgres record. It seems like it’s fine, actually, but once you start getting up to like 100 megabytes, now you’re in trouble. We’re not going to make it there with any of our documents. But maybe even at 2.3 megabytes, at scale it’s just going to read too slow. I don’t know, it seems like a very low-tech, kind of silly way of doing it… And so maybe it’s just lack of confidence, is why I think it sounds dumb.
I think this is a step in the right direction, because Fly brings the app closer to the users.
And Fly really makes it less necessary to run a CDN, or maybe completely unnecessary, depending on the case. If we want to depend less on the CDN, which I think is a good idea, and if we distribute our apps around the world, that means that we can rely less on the CDN - which, by the way, had all sorts of issues which we are yet to solve - and serve directly from our app… So basically, we are reverting the decision of putting changelog.com behind the CDN. We had to do that because we had a single instance, and we had all sorts of issues related to that… But now, if we have multiple instances, one per continent - again, depending on where our users are - we no longer need to depend on the CDN as much as we did before.
[01:02:12.28] And by the way, Fly itself, it has a proxy, it has a global proxy, which means that depending on where you are, those edge instances, they will connect to the app instance which is closest to the edge. So then we are pulling more of that stuff in our app, which makes us be able to code more things, as Jerod mentioned, pull more of that smarts in code, rather than in CDN configuration or other things… Which are very difficult to understand, very difficult to troubleshoot… I mean, we’ve had so many hair-pulling moments. That’s why we have so little hair [unintelligible 01:02:46.00] sections, going like “Why the hell? How does this varnish even work, because it doesn’t make any sense?”
Right. And we built our own little version control inside of Fastly, between Gerhard and me, by adding a comment noting who last edited it… which we would love to replace with our actual programming tooling.
It seems smart…
If it takes us to where we wanna go, I agree with you 100% that having our app be its own CDN, so to speak, closer to all the users - which is what Fastly is giving us - at the app level, then it can be dynamic in ways that are possible with Fastly, but are just cumbersome to this day.
Yeah. And I guess one more layer here is we haven’t truly embodied the vision of Fly, which is our app close to our users, because of this cache issue. This is full circle; the whole reason for this cache experiment was to be able to bring to fruition that actual dream with no ops, or very, very little ops… But we haven’t been able to do that because of this cache layer.
Well, our app does run close to our users in the greater Houston area… [laughter]
Yeah… It’s actually in Virginia.
Oh, is it?
Well. It shows what I know.
It’s the IAD data center. Yeah.
Yeah. Well, all that to say, getting in this direction is challenging. I think the logic in this Postgres approach sounds fine. I mean, if we were, like you had said, above a larger threshold… A couple megs - not that big of a deal. And if the app is close to the user, and there’s one – I’m assuming there’s probably like one or two primary Postgres instances for writes, and then the rest are reads, right? That’s how it would be set up, naturally, with Postgres on Fly…
Yeah, the writes would actually happen on publish. The writes happen on edit, not on first request, which is what happens now with typical caching. First request, we calculate it once, and then we don’t calculate it again for 60 seconds, and then we calculate it once more. Here the compute actually happens on write, which is what we wanted to move to.
The other option is to put this on a static file server like S3, and then manage and blow away different files. But then I started thinking, like, we actually like our URLs, how they are, and so then our app would be reading from S3 and responding as a proxy… And it’s like “Well, it was already proxy to Postgres.” I don’t know. But yeah, we would cache on write versus on read, which makes us have immediate changes. There’s no 60-second delay, or five minutes, or whatever you send it to.
And I’m in that camp. I mean, I listen to our show immediately, as soon as we ship The Changelog at least… I mean, as just a crazy person, whenever you ship something, you want to make sure it’s in production. And the only way to do it is like to test it. And the app I use is Overcast primarily. I don’t think I have notifications on, because I just hate notifications just generally. If I don’t have to have notifications on for an application, they’re off, for sure. But when I do go there, I usually test it on the master feed directly, because… I listen to Master, like you should be. Hey, listener, if you’re not listening on Master, you’re wrong. Or Plus Plus; then you’d be even better…
[01:06:06.04] …because it’s better… But I’m a Master feed subscriber in that regard, and pull to refresh, and it does take a bit for the new episodes to get there, for me at least. So I’m not like I ship it and 30 seconds or a minute later it’s in Overcast. It takes longer than I’ve counted, let’s just say. I haven’t actually sat there and counted. It’s like “Oh, it’s not there. I’ll come back later”, and come back and it’s there.
The one thing about this which gets me really excited is that we will double down on PostgreSQL. So we talked about this for a while… Crunchy Data is what I’m thinking. But it’s not the only way.
In what regard are you thinking Crunchy Data?
I’m thinking PostgreSQL as a service that scales really, really well, so then the app is all Fly, and PostgreSQL is managed via Crunchy Data. We have a global presence, nicely replicated, all that nice stuff. And then we consume PostgreSQL as a service at a global scale. Our app runs at a global scale, on Fly, and the database the same, but with someone else. Because the PostgreSQL in Fly - it’s not a managed one. It’s easy, convenient, we have a lot of advantages, and it’s been holding up really well since we set it up. No issues. But we can – I mean, if the app is distributed, and if the app gets this level of attention, I think so should our database, because now these are the two important pieces. We scale the app, we should scale the database. I mean, if for example we have all these app instances that connect to the same PostgreSQL instance back in the US, that’s not going to be any good. Right? Reading all those megabytes across continents… That’s going to be slow.
Isn’t that the point though for like the read servers that are distributed?
So we could add multiple PostgreSQL read replicas in Fly; we could do that. Maybe tune them… Maybe. I don’t know. Maybe try and understand better what they do… But maybe, rather than doing that, we can mature our approach to databases, and go with someone that does this as a service. I know PlanetScale comes up as well… There are a couple we could use for PostgreSQL as a service.
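The read-replica setup being discussed boils down to simple routing: all writes funnel to one primary, reads go to whichever replica is closest to the requester. A toy sketch of that routing logic, with made-up region names standing in for whatever Fly regions or a managed provider would actually use:

```python
class Router:
    """Toy read/write splitter for a replicated Postgres setup.

    Writes go to the primary region; reads are served from a replica
    in the client's region when one exists, else fall back to primary.
    """

    def __init__(self, primary_region, replica_regions):
        self.primary = primary_region
        self.replicas = set(replica_regions)

    def route(self, operation, client_region):
        if operation == "write":
            return self.primary       # all writes funnel to one primary
        if client_region in self.replicas:
            return client_region      # serve reads locally, avoiding
                                      # cross-continent round trips
        return self.primary           # no nearby replica -> primary
```

This is why an app that is globally distributed but talks to a single Postgres instance "back in the US" loses much of the benefit: every read still crosses the ocean.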
But that’s MySQL, PlanetScale.
There’s one which I know is PostgreSQL. Maybe it’s not PlanetScale… What was it…?
I think it’s Supabase. I think it’s Supabase. I think that’s what I’m thinking. Yeah. See? Not enough time to experiment. [laughs]
There is a conversation, let’s just say there’s a conversation. So we may be meeting in the middle, let’s just say. Don’t wanna give too much away.
But dreams… We are dreaming together.
Exactly. And we need to experiment a lot. So that’s the whole point, right? We need to try a couple of things out, see what makes sense… I know Jerod loves his PostgreSQL, the vanilla one, the open source one…
You know, as unaltered as they come.
We’re actually coming out with a T-shirt, Gerhard. It says “Postgres-compatible is not Postgres.” [laughter]
Really?! Okay, I wasn’t aware of that… Okay.
No, not really.
We want to.
Is that the Jerod tagline?
No, that’s actually a Craig Kerstiens tagline.
I do like “Just Postgres” as a T-shirt.
“Just Postgres.” Yeah.
We will be doubling down on that. That’s what matters. And we’ll be improving that part as well. All this is leading us into that direction, and that’s really exciting.
That’s why I wrote this right here… I was writing it right there.
There you go. On a napkin? It’s a thing!
Okay! Now we have a plan.
That’s how all dreams start, on a napkin.
Mm-hm. I’ve been doodling while we’re having this call.
Put some B’s and some dollars as well, while you’re at it.
Yeah, put some dollars on there.
Step one, Postgres. Step two, question mark. Step three, profit.
[01:09:52.19]Or Postgres, change the s into $. That’d be good.
That’s right, I’ll do that.
That’s our business plan. We’re gonna turn Postgres into dollars.
Well, let’s say somebody’s listened this far, and they’re thinking, “Man, this really sucks, okay?”
“I’m here at the end of this amazing episode–” Well, I’m gonna tell you what sucks. I’m gonna tell you. They’re gonna be like “I liked this show. Come on, guys… What’s going on here?” Can we dream a little bit to where this might go, the next version of Kaizen? Can we give them some prescription? Versus just wait and see? Jerod, you mentioned subscribing to the Changelog, which I think is a great next step after this…
Well, I think it makes sense to do our next Kaizen on the Changelog if we don’t have anywhere else to do it…
That’s right. Yeah.
Which is probably likely, right? I mean, we could cross-post it to the Ship It feed, I guess…
Or episode 91 will be Kaizen in two and a half months. [laughter]
Yeah. And so will 92.
That’s also possible. And so will 92, yeah. Or we go straight to 100, and then people are like “What the hell? Where’s all the rest?”
So it’ll be 90, 100… It will be just going 10 to 10. We were just talking about Fahrenheit and Celsius… [laughter]
That’s more of a Celsius thing… 100 is hot. I would say we would publish our next Kaizen on the Changelog feed. Ain’t that safe? That’s probably the safest bet today.
I think so. It’s what makes most sense to me, too.
And stay tuned for more. We’ll have more to say on that episode.
Well, I have one thing which I really have to say, and I have to mention this, because I’ve been trying to get through to someone from 1Password since January 15th, when I sent my email, and I haven’t heard back… So if someone knows someone within 1Password that can help with their service accounts… This is so that we can use secrets from 1Password without needing to run the Connect server. I mean, we will set up a Connect server if we need to, but hopefully we’ll be able to access the secrets using this new beta feature, which as far as I’m aware is called Service Accounts, which allows us to use the secrets programmatically in CI systems. Right now, we can’t do that without the Connect server. And ideally, I would like to use the Go SDK - and you see where I’m going with this… To use it directly in code, so that our CI will never see the secrets. It’s just code that connects to the 1Password instance and pulls the secret just in time as the code runs. So if anyone knows someone, I would very much like to talk to them to try this beta feature and see how it works. Alternatively, how do you feel about a migration from 1Password? [laughs]
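The pattern Gerhard describes - code pulling secrets just in time, so the CI system itself never holds them - looks roughly like this. To be clear, `SecretsClient`, `resolve`, and `deploy` are all made-up names standing in for whatever the 1Password SDK and service accounts actually expose; only the `op://vault/item/field` reference style is borrowed from 1Password’s scheme:

```python
class SecretsClient:
    """Stand-in for a secrets-manager SDK (hypothetical API).

    The real thing would authenticate with a service-account token and
    fetch over the network; this one reads from a local dict so the
    sketch stays self-contained."""

    def __init__(self, token, vault):
        if not token:
            raise ValueError("service-account token required")
        self.vault = vault

    def resolve(self, ref):
        # e.g. ref = "op://changelog/postgres/password"
        return self.vault[ref]

def deploy(client):
    # The secret only ever exists inside this process, at the moment
    # of use -- it is never written into CI config, logs, or env vars.
    db_password = client.resolve("op://changelog/postgres/password")
    return f"connected with a {len(db_password)}-char password"
```

The point of the design: the only credential CI holds is the service-account token, and each secret is resolved in-process at run time, so “no more secrets in GitHub.”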
Rotating secrets is my favorite thing to do… Yes, I mean - we want something that works, and works well, so…
We can set up a Connect server. I mean, it’s so easy to set anything up on Fly these days, so maybe we’ll just do that… Which will act as a gateway to 1Password.
[01:13:04.23] Well, we can make something happen with 1Password, there is some opportunity there. So…
Great. That’s the one thing which was on my list.
Let me go to work, you know?
I’m a big fan of 1Password.
I like it too, very much.
And I root for them, in all ways. I’ve been using them for more than a decade. I mean, like just basically forever. They’re embedded in my operations. And now with SSH integrations, and stuff like that - I just love biometrically… And thank you for removing all of our SSH needs, Changelog.com infrastructure-wise, but I still have LAN infrastructure that I have to log into, and biometrically logging in via SSH is just – it’s the way to go.
Yeah, for sure. Yeah. And I was reading this blog post on the 1Password blog about passwordless systems. I’m just going to double-check the title… So the blog post is “Passkeys in 1Password - the future of passwordless.” And it was published on November 17th, 2022. So not that long ago. And it was mentioned a couple more times.
So I think that’s a really cool idea… So I really like where 1Password is, and where they’re going… If we can only figure this thing out, it will be even more amazing for us. So no more secrets in GitHub. Yes, baby! That’s what I want.
Should we call it a pod?
I think we should call it a pod. Someone needs to sing something, I feel like… It’s my birthday tomorrow, so…
Happy Trails to you…
See? Told ya.
That’s all you’re getting… Until we meet again.
He tried to sing Semisonic on the –
…on the & friends episode we did. Yeah, you started singing Closing Time. I edited you right out of that, man. I didn’t want you embarrassed… You did not do a good job. [laughs]
All I said was “You don’t go home, but you can’t stay here.”
Well, that’s what happened in the one that shipped.
Behind the scenes, it was worse. I’m just messing with you, Jerod. I’m just being silly.
I don’t even believe you.
With all this time that I’m going to have from not shipping a Ship It episode every week - do you know what I’m going to do instead? I’m going to go Dan-Tan! [laughter] That’s what’s happening…
Oh, my gosh. Dan-Tan… Comes again!!
Every week, I’ll go Dan-Tan. [laughs]
So that’s what’s up.
Oh, my gosh…
I love it.
I’ve got my kids saying Dan-Tan now.
There we go.
Never telling that story again.
Everyone is on it.
Everyone’s saying it.
So that’s my plan.
Sounds good, Gerhard.
It has been good. Thank you.
Always a pleasure. There will be a next one, two and a half months away. Right? Roughly. So I don’t know exactly when, but two and a half months away. It will be warm and nice where you are, I’m sure.
I’m looking forward to that… Kaizen!
Our transcripts are open source on GitHub. Improvements are welcome. 💚