In late 2016 we relaunched changelog.com as a new Phoenix application and improved the deployment process. Last year we open-sourced the infrastructure code, and now we are going one better: we will go over the new changelog.com setup for 2019.
The new features are so good that we cannot keep them to ourselves. They leverage the best that our partners offer and allow us to focus on content. Thank you all for making it so easy, especially Marques Johansson for sharing Linode’s newest developer utilities 💚
changelog.com is a simple 3-tier web app
A NodeBalancer connects the application running on Linode to the global content distribution network run by Fastly. The same NodeBalancer terminates SSL. Everything downstream runs as a container in CoreOS Container Linux. We love the OS auto-updates and the immutable filesystem, as well as the everything-is-a-container approach.
The NodeBalancer forwards all requests to a proxy container. The proxy bundles legacy assets and redirect rules that never change, and it also serves media files from a Block Storage volume mounted read-only. This block device is the single source of truth for all changelog.com media.
Any request that the proxy cannot serve gets forwarded to the app container. The app bundles and serves static assets. Fastly caches them and then serves them directly, meaning no more trips to the app container. Media is saved to the same Block Storage volume used by the proxy. Once cached by the CDN, media gets served from the edge location closest to our listeners. All other state is stored in a PostgreSQL database kept on a separate Block Storage volume.
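As a sketch, the proxy’s routing rules might look like the following nginx config (nginx is the proxy we run; the paths, upstream name, and port here are illustrative, not copied from the real config):

```nginx
server {
    listen 80;

    # Media is served straight from the Block Storage volume,
    # which is mounted read-only into the proxy container.
    location /uploads/ {
        root /var/www;
        try_files $uri =404;
    }

    # Anything the proxy cannot serve itself goes to the app container.
    location / {
        proxy_pass http://app:4000;   # Phoenix's default port, assumed here
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
```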
changelog.com is continuously deployed & monitored
Code commits go to GitHub
All master branch commits trigger a CircleCI workflow that ends in a publish stage. If all previous stages pass, a Docker image gets published to Docker Hub. This image contains everything required to run an instance of changelog.com.
CircleCI builds & publishes Docker images
CircleCI stops at Docker Hub. It has no access to production and it has no knowledge of any credentials used in production. We draw great confidence from this separation. The CI is in no way coupled to where changelog.com runs. Any image that gets published to our Docker Hub repository is a valid production candidate.
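The gating described above can be sketched in CircleCI config along these lines (a hypothetical fragment; job names and structure are illustrative):

```yaml
version: 2
workflows:
  version: 2
  build_test_publish:
    jobs:
      - build
      - test:
          requires: [build]
      - publish:
          requires: [test]        # publish only runs if all previous stages pass
          filters:
            branches:
              only: master        # only master commits reach Docker Hub
```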
Docker service manages app lifecycle
The component that updates changelog.com is docker service update running in a loop. Docker manages the entire app update lifecycle. This includes promotion to live when the new instance becomes healthy. The running version is not stopped until the new version proves itself healthy. Since rollback involves instant routing updates in the Docker networking layer, a bad deploy is at most a few failed web requests.
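The updater can be sketched as a small shell loop (the image and service names are illustrative; when the resolved image digest has not changed, Swarm leaves the running tasks alone, which is what makes the loop safe to repeat):

```shell
# Hypothetical sketch of the app updater loop, not the actual script.
while true; do
  docker service update \
    --image changelog/changelog.com:latest \
    --update-failure-action rollback \
    changelog_app
  sleep 60
done
```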
Rollbar tracks app deploys and errors
When a new changelog.com app instance starts, it notifies Rollbar of a new deploy. If the new app instance fails to deploy, the healthy one remains running. The new version will enter a loop of start, notify Rollbar, then stop. Because all deploys show up in our #dev Slack channel, we have the visibility and incentive to fix bad deploys. With end-to-end automation, it’s as easy as committing and pushing to master.
Netdata for real-time system metrics
Netdata enables real-time system metrics with per-second resolution. The level of visibility that we get from this open-source product is second to none. If you are curious to see what changelog.com system metrics look like in real-time, check out netdata.changelog.com.
Papertrail manages all logs
Logs get aggregated and forwarded to Papertrail. It’s a service that we have used for many years and still find invaluable, especially when things go wrong. Loki from Grafana Labs is intriguing; we plan to explore it in the next release.
Monitored by Pingdom
Pingdom reports all changelog.com downtime via e-mail and in our #sre Slack channel. Some users appreciate our brief comments on service outage notifications. If you are curious, uptime & response times are available at status.changelog.com.
changelog.com setup is declarative & idempotent
There are no infrastructure operators at changelog.com: developers are operators.
The app repository vendors all infrastructure configuration and tooling. A Makefile captures everything required to run and manage changelog.com. Anyone on the team can run make iaas and converge the entire infrastructure. All automation is declarative and idempotent; there is no risk of unexpected side-effects.
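A minimal sketch of what such a target can look like (the real Makefile lives in the repository; the recipe contents here are illustrative):

```make
.PHONY: iaas
iaas: ## Converge the entire infrastructure to its declared state
	terraform init
	terraform apply
```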
You can learn about the changelog.com make targets - some are very good! - by running make in the cloned repository. The output lists all public targets with aliases and short descriptions. To find private make targets and other goodies, browse the Makefile.
The new Linode Terraform provider that Marques contributed makes it easy to manage our entire Linode setup. For DNS, we use the DNSimple Terraform provider. A Docker Stack captures all services that make up changelog.com. A local stack variant enables everyone on the team to run an exact replica of production on their machine. LastPass acts as the single source of truth for credentials. In production, credentials are exposed only as Docker secrets, never as environment variables.
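Together, the two providers let one Terraform run manage compute and DNS. A hypothetical sketch (resource labels, region, instance type, and record values are illustrative, not our actual configuration):

```hcl
provider "linode" {}
provider "dnsimple" {}

# A Container Linux instance to run the Docker stack on.
resource "linode_instance" "app" {
  label  = "changelog-app"
  region = "us-east"
  image  = "linode/containerlinux"
  type   = "g6-standard-4"
}

# DNS record in DNSimple pointing at the Linode instance.
resource "dnsimple_record" "www" {
  domain = "changelog.com"
  name   = "www"
  type   = "A"
  value  = linode_instance.app.ip_address
}
```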
The single most important take-away is that changelog.com setup is declarative and idempotent. Control loops are constantly converging on the desired state. CircleCI ensures that all commits to master result in a production-ready Docker image. The App Updater ensures that the latest Docker image is running in production. The Docker daemon ensures that all services are present and healthy. If everything is in the desired state, nothing needs to happen. If a layer deviates from what is expected, control loops will reconcile any differences.
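The converge pattern behind all of these loops fits in a few lines of shell (a hypothetical illustration of the principle, not code from the repository):

```shell
# Observe actual state, compare with desired state, act only on drift.
# Running it twice changes nothing the second time -- that is idempotency.
desired="app:v2"
state_file="$(mktemp)"            # stands in for "what is currently running"
echo "app:v1" > "$state_file"

converge() {
  actual="$(cat "$state_file")"
  if [ "$actual" = "$desired" ]; then
    echo "in sync"                # desired state reached: nothing to do
  else
    echo "$desired" > "$state_file"
    echo "reconciled $actual -> $desired"
  fi
}

converge    # first run reconciles the drift
converge    # second run is a no-op
```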
The new changelog.com setup for 2019 enables anyone to contribute in a few commands. Clone the repository and run make contrib to start a development version of changelog.com. Pull request #246 has all the context.
The future changelog.com
As amazing as the new changelog.com is, we know that it can be better. In no specific order, some of the things that we have on our radar for 2020:
When we started using the new Linode tooling, provisioning a Kubernetes cluster was not as easy as it is today. Jumping from the setup that we had run since 2016 straight to Kubernetes was too big of a step. We had to find a middle ground that captured most of the benefits without being too disruptive to the way we did things. Focusing on what mattered to the business was the strongest reason to defer the Kubernetes dream for one more release. With all the learnings gained from this round of improvements, it was definitely the right decision.
Metrics via Prometheus & Grafana
Netdata is great for real-time system metrics, but Prometheus is best for long-term & business metrics. Add alerting and Grafana visualisation and we have the perfect way of tracking all metrics that matter to us. The plan is to aggregate all metrics in Prometheus and display them via purpose-built Grafana dashboards.
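A starting point for that aggregation could be a scrape config along these lines (a hypothetical prometheus.yml fragment; job names, targets, and the app’s metrics endpoint are assumptions — Netdata, however, really does export in Prometheus format via its allmetrics API):

```yaml
scrape_configs:
  - job_name: changelog_app
    scrape_interval: 15s
    static_configs:
      - targets: ["app:4000"]        # assumes the app exposes /metrics

  - job_name: netdata
    metrics_path: /api/v1/allmetrics
    params:
      format: [prometheus]           # Netdata's Prometheus export format
    static_configs:
      - targets: ["netdata:19999"]
```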
Integrate logs with metrics via Grafana & Loki
With the addition of Loki to Prometheus & Grafana, there has never been a better time to run metrics and logging using this setup. We have yet to try Loki, but if Grafana is any indication of what to expect, we are in for a treat.
Improve image builds
The changelog.com Docker image is based on the CircleCI image that builds dependencies and runs tests. We would like to have a production-specific image that uses mix release. We also want to sign Docker images so that we have higher confidence in what production runs.
Automate service updates
We do not automatically build and/or update service images. For example, when there is a new PostgreSQL image released, the stack should auto-update with no service disruption. Moving to a Kubernetes cluster should make this easy. We would trust the platform’s primitives to take care of always running the latest stable version, without us having to care how it actually happens.
Roll out HTTP/2 & IPv6
HTTPS is becoming the norm, and HTTP/2 is not far behind. Thanks to Fastly, all cdn.changelog.com requests are already served as HTTP/2, but the Linode NodeBalancer is HTTP/1.1 only. Since the Phoenix application & nginx already support HTTP/2, we are not that far from full HTTP/2 support. This would make the website quicker for all HTTP/2 clients, which means the majority of our visitors.
Even though IPv6 brings less value to our users than HTTP/2, it would be nice to do our part in modernising the internet. Our last attempt to enable IPv6 was surprisingly easy and low-effort, but it resulted in a TCP4 socket leak in Docker. We want to give this area of the stack more time to mature before we try it again.
SSL via Let’s Encrypt
changelog.com currently uses a Comodo Essential SSL Wildcard certificate which is due to expire in June 2020. We believe that SSL is an essential building block of a secure internet that should be free, a belief shared by everyone behind Let’s Encrypt. It only makes sense to join this community of like-minded people and support them the best we can.
Improve CDN integration
Our CDN integration is only partial: changelog.com is still susceptible to service degradation even though the bulk of requests are for static content. As rare as incidents are, serving potentially stale content is preferable to 500s.
We also know that with the always-changing open-source landscape, there will be many new tools and services appearing that will make us better at what we do. If you know any that are worth sharing, please do so in the comments below, or via our GitHub repository.
I want the same for my product/company - can you help?
Most of my time is spent working on RabbitMQ, CloudFoundry & now Kubernetes within Pivotal. I like what Changelog stands for, so I help them be more successful. I take some number of days every year to improve the setup. I have shared more about this in my Not working together talk, as well as the Deploying changelog.com episode. As straightforward as today’s changelog.com setup seems, it has been a couple of years in the making. If you enjoyed this post and think that your product would benefit from my expertise, reach out to email@example.com. Until next time, Gerhard.