The code behind Changelog's infrastructure

In 2016 we open-sourced the code behind the new changelog.com, and we spoke about the related infrastructure work in Episode #254, but we never open-sourced the infrastructure code itself. The primary reason was that making it open-source friendly was low on the backlog and never made the cut. While we wrote good commit messages, and user stories with the right amount of what/why, the high-level context remained diluted and fragmented.

Today, we are open-sourcing the code behind Changelog’s infrastructure as is, and are giving you the opportunity to influence the way we continue investing our time. We need your help to decide on one of the following directions:

Make the current approach in thechangelog/infrastructure easy to understand and re-use.
Start from the beginning, with today’s best tools, services and practices. All code will be public, developed iteratively, open and available to everyone willing to follow along. It’s a fresh start with your involvement at every step.

How can you give us feedback?

Post your comment to this issue on our Ping repo. Or, if you want to chat in real-time, you can find us in the #dev channel in our community Slack. (Joining is easy and free.)

Why are we doing this now?

After two years, I am making time to update Changelog’s infrastructure and want to use this opportunity in the best way possible, which means putting the community’s needs first. I don’t know what you will find most valuable, so I’m doing the obvious thing and asking you.

What makes Changelog’s infrastructure so special?

It’s easy to assume that managing a production service is mostly about pushing code. While this is the essence of it, it’s just the tip of the tip of the iceberg. Whether it’s Heroku, Cloud Foundry or Kubernetes, an immense number of considerations are required long before, and long after, this recurring event. With the amount of hype surrounding devops/cloudops/serverless, it’s too easy to fall into time traps. From my perspective, these are the things that matter:

Being able to push new code with ease
Have new code in production within minutes
Prevent bad code from taking the service down
New code goes into production automatically
Mistakes are small, easy to understand and easy to fix
I can easily understand the changes being made, preferably locally and instantly
I can test incompatible changes with ease
If everything gets lost, including code, data, infrastructure, it can be recreated within the hour
All components can be upgraded with minimal effort - bonus points for no downtime
All dependencies are explicit and locked down, meaning that we always know what versions are running where
I am aware of all errors that happen in production
I get instant notifications if the production service gets degraded
It’s impossible for the system to be completely down; in the worst case, stale static content gets served and no writes are possible
I have a good idea of the experience that my users have
I am aware when my service is running low on resources
Resources change based on demand, without my intervention
I am always aware of my current burn rate and have a clear end-of-billing cycle estimate
The right amount of resources is provisioned at all times, waste is avoided
The system degrades predictably, redundancies are layered
Everything is simple, without compromising availability

This list is not long for the sake of being long; if anything, it’s incomplete, but it’s sufficient to make my point.
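
To make one of these items concrete, here is a minimal sketch of an end-of-billing-cycle estimate based on the current burn rate. It is an illustration only, not code from thechangelog/infrastructure: the function name `project_cycle_cost` is hypothetical, the billing cycle is assumed to be a calendar month, and the projection is a simple straight line from spend so far.

```python
import calendar
from datetime import date


def project_cycle_cost(spent_so_far: float, today: date) -> float:
    """Estimate the total cost for the current calendar month.

    Assumption: the billing cycle is a calendar month and spend is
    roughly constant per day, so a linear projection is good enough.
    """
    # monthrange returns (weekday of first day, number of days in month)
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    # Count today as a full day of spend to avoid dividing by zero
    days_elapsed = today.day
    daily_burn = spent_so_far / days_elapsed
    return daily_burn * days_in_month


# Made-up figures: 150.0 spent by the 10th of a 30-day month
estimate = project_cycle_cost(150.0, date(2018, 4, 10))
print(f"Projected end-of-cycle cost: {estimate:.2f}")
```

A real setup would pull the spend figure from the provider’s billing data and account for usage that is not evenly spread, but even a projection this crude turns the burn rate into a number that can be alerted on.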

My proposal is to leave all hype in the news section, and show you the pragmatic solution that works for changelog.com. Who knows, it might work for you as well.