In late 2016 we relaunched changelog.com as a new Phoenix application and improved the deployment process. Last year we open-sourced the infrastructure code, and now we are going one better: we will go over the new changelog.com setup for 2019.
The new features are so good that we cannot keep them to ourselves. They leverage the best that our partners offer and allow us to focus on content. Thank you all for making it so easy, especially Marques Johansson for sharing Linode’s newest developer utilities 💚
changelog.com is a simple 3-tier web app
A NodeBalancer connects the application running on Linode to the global content distribution network run by Fastly. The same NodeBalancer terminates SSL. Everything downstream runs as a container in CoreOS Container Linux. We love the OS auto-updates and the immutable filesystem, as well as the everything-is-a-container approach.
The NodeBalancer forwards all requests to a proxy container. The proxy bundles legacy assets and redirect rules that never change, and it also serves media files from a Block Storage volume mounted read-only. This block device is the single source of truth for all changelog.com media.
Any request that the proxy cannot serve gets forwarded to the app container. The app bundles and serves static assets. Fastly caches them and then serves them directly, meaning no more trips to the app container. Media is saved to the same Block Storage volume used by the proxy. Once cached by the CDN, media gets served from the edge location closest to our listeners. All other state is stored in a PostgreSQL database kept on a separate Block Storage volume.
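As a sketch, the proxy’s routing rules might look like the following nginx config (nginx is the proxy we run; the paths, upstream name, and port here are illustrative, not copied from the real config):

```nginx
server {
    listen 80;

    # Media is served straight from the Block Storage volume,
    # which is mounted read-only into the proxy container.
    location /uploads/ {
        root /var/www;
        try_files $uri =404;
    }

    # Anything the proxy cannot serve itself goes to the app container.
    location / {
        proxy_pass http://app:4000;   # Phoenix's default port, assumed here
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
```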
changelog.com is continuously deployed & monitored
Code commits go to GitHub
All master branch commits trigger a CircleCI workflow that ends in a publish stage. If all previous stages pass, a Docker image gets published to Docker Hub. This image contains everything required to run an instance of changelog.com.
CircleCI builds & publishes Docker images
CircleCI stops at Docker Hub. It has no access to production and it has no knowledge of any credentials used in production. We draw great confidence from this separation. The CI is in no way coupled to where changelog.com runs. Any image that gets published to our Docker Hub repository is a valid production candidate.
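The gating described above can be sketched in CircleCI config along these lines (a hypothetical fragment; job names and structure are illustrative):

```yaml
version: 2
workflows:
  version: 2
  build_test_publish:
    jobs:
      - build
      - test:
          requires: [build]
      - publish:
          requires: [test]        # publish only runs if all previous stages pass
          filters:
            branches:
              only: master        # only master commits reach Docker Hub
```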
Docker service manages app lifecycle
The component that updates changelog.com is docker service update running in a loop. Docker manages the entire app update lifecycle. This includes promotion to live when the new instance becomes healthy. The running version is not stopped until the new version proves itself healthy. Since rollback involves instant routing updates in the Docker networking layer, a bad deploy is at most a few failed web requests.
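The updater can be sketched as a small shell loop (the image and service names are illustrative; when the resolved image digest has not changed, Swarm leaves the running tasks alone, which is what makes the loop safe to repeat):

```shell
# Hypothetical sketch of the app updater loop, not the actual script.
while true; do
  docker service update \
    --image changelog/changelog.com:latest \
    --update-failure-action rollback \
    changelog_app
  sleep 60
done
```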
Rollbar tracks app deploys and errors
When a new changelog.com app instance starts, it notifies Rollbar of a new deploy. If the new app instance fails to deploy, the healthy one remains running. The new version will enter a loop of start, notify Rollbar, then stop. Because all deploys show up in our #dev Slack channel, we have the visibility and incentive to fix bad deploys. With end-to-end automation, it’s as easy as committing and pushing to master.
Netdata for real-time system metrics
Netdata enables real-time system metrics with per-second resolution. The level of visibility that we get from this open-source product is second to none. If you are curious to see what changelog.com system metrics look like in real-time, check out netdata.changelog.com.
Papertrail manages all logs
Logs get aggregated and forwarded to Papertrail. It’s a service that we have used for many years and still find invaluable, especially when things go wrong. Loki from Grafana Labs is intriguing; we plan to explore it in the next release.
Monitored by Pingdom
Pingdom reports all changelog.com downtime via e-mail and in our #sre Slack channel. Some users appreciate our brief comments on service outage notifications. If you are curious, uptime & response times are available at status.changelog.com.
changelog.com setup is declarative & idempotent
There are no infrastructure operators at changelog.com: developers are operators.
The app repository vendors all infrastructure configuration and tooling. A Makefile captures everything required to run and manage changelog.com. Anyone on the team can run make iaas and converge the entire infrastructure. All automation is declarative and idempotent; there is no risk of unexpected side-effects.
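A minimal sketch of what such a target can look like (the real Makefile lives in the repository; the recipe contents here are illustrative):

```make
.PHONY: iaas
iaas: ## Converge the entire infrastructure to its declared state
	terraform init
	terraform apply
```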
You can learn about the changelog.com make targets - some are very good! - by running make in the cloned repository. The output lists all public targets with aliases and short descriptions. To find private make targets and other goodies, browse the Makefile.
The new Linode Terraform provider that Marques contributed makes it easy to manage our entire Linode setup. For DNS, we use the DNSimple Terraform provider. A Docker Stack captures all services that make up changelog.com. A local stack variant enables everyone on the team to run an exact replica of production on their machine. LastPass acts as the single source of truth for credentials. In production, credentials are exposed only as Docker secrets, never as environment variables.
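Together, the two providers let one Terraform run manage compute and DNS. A hypothetical sketch (resource labels, region, instance type, and record values are illustrative, not our actual configuration):

```hcl
provider "linode" {}
provider "dnsimple" {}

# A Container Linux instance to run the Docker stack on.
resource "linode_instance" "app" {
  label  = "changelog-app"
  region = "us-east"
  image  = "linode/containerlinux"
  type   = "g6-standard-4"
}

# DNS record in DNSimple pointing at the Linode instance.
resource "dnsimple_record" "www" {
  domain = "changelog.com"
  name   = "www"
  type   = "A"
  value  = linode_instance.app.ip_address
}
```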
The single most important take-away is that changelog.com setup is declarative and idempotent. Control loops are constantly converging on the desired state. CircleCI ensures that all commits to master result in a production-ready Docker image. The App Updater ensures that the latest Docker image is running in production. The Docker daemon ensures that all services are present and healthy. If everything is in the desired state, nothing needs to happen. If a layer deviates from what is expected, control loops will reconcile any differences.
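The converge pattern behind all of these loops fits in a few lines of shell (a hypothetical illustration of the principle, not code from the repository):

```shell
# Observe actual state, compare with desired state, act only on drift.
# Running it twice changes nothing the second time -- that is idempotency.
desired="app:v2"
state_file="$(mktemp)"            # stands in for "what is currently running"
echo "app:v1" > "$state_file"

converge() {
  actual="$(cat "$state_file")"
  if [ "$actual" = "$desired" ]; then
    echo "in sync"                # desired state reached: nothing to do
  else
    echo "$desired" > "$state_file"
    echo "reconciled $actual -> $desired"
  fi
}

converge    # first run reconciles the drift
converge    # second run is a no-op
```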
The new changelog.com setup for 2019 enables anyone to contribute in a few commands. Clone the repository and run make contrib to start a development version of changelog.com. Pull request #246 has all the context.
The future changelog.com
As amazing as the new changelog.com is, we know that it can be better. In no specific order, some of the things that we have on our radar for 2020:
When we started using the new Linode tooling, provisioning a Kubernetes cluster was not as easy as it is today. Jumping from the setup that we had run since 2016 straight to Kubernetes was too big of a step. We had to find a middle ground that captured most of the benefits without being too disruptive to the way we did things. Focusing on what mattered to the business was the strongest reason to defer the Kubernetes dream for one more release. With all the learnings gained from this round of improvements, it was definitely the right decision.
Metrics via Prometheus & Grafana
Netdata is great for real-time system metrics, but Prometheus is best for long-term & business metrics. Add alerting and Grafana visualisation and we have the perfect way of tracking all metrics that matter to us. The plan is to aggregate all metrics in Prometheus and display them via purpose-built Grafana dashboards.
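A starting point for that aggregation could be a scrape config along these lines (a hypothetical prometheus.yml fragment; job names, targets, and the app’s metrics endpoint are assumptions — Netdata, however, really does export in Prometheus format via its allmetrics API):

```yaml
scrape_configs:
  - job_name: changelog_app
    scrape_interval: 15s
    static_configs:
      - targets: ["app:4000"]        # assumes the app exposes /metrics

  - job_name: netdata
    metrics_path: /api/v1/allmetrics
    params:
      format: [prometheus]           # Netdata's Prometheus export format
    static_configs:
      - targets: ["netdata:19999"]
```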
Integrate logs with metrics via Grafana & Loki
With the addition of Loki to Prometheus & Grafana, there has never been a better time to run metrics and logging using this setup. We have yet to try Loki, but if Grafana is any indication of what to expect, we are in for a treat.
Improve image builds
The changelog.com Docker image is based on the CircleCI image that builds dependencies and runs tests. We would like to have a production-specific image that uses mix release. We also want to sign Docker images so that we have higher confidence in what production runs.
Automate service updates
We do not automatically build and/or update service images. For example, when there is a new PostgreSQL image released, the stack should auto-update with no service disruption. Moving to a Kubernetes cluster should make this easy. We would trust the platform’s primitives to take care of always running the latest stable version, without us having to care how it actually happens.
Roll out HTTP/2 & IPv6
HTTPS is becoming the norm, and HTTP/2 is not far behind. Thanks to Fastly, all cdn.changelog.com requests are already served as HTTP/2, but the Linode NodeBalancer is HTTP/1.1 only. Since the Phoenix application & nginx already support HTTP/2, we are not that far from full HTTP/2 support. This would make the website quicker for all HTTP/2 clients, which means the majority of our visitors.
Even though IPv6 brings less value to our users than HTTP/2, it would be nice to do our part in modernising the internet. Our last attempt to enable IPv6 was surprisingly easy and low-effort, but it resulted in a TCP4 socket leak in Docker. We want to give this area of the stack more time to mature before we try it again.
SSL via Let’s Encrypt
changelog.com currently uses a Comodo Essential SSL Wildcard certificate which is due to expire in June 2020. We believe that SSL is an essential building block of a secure internet that should be free, a belief shared by everyone behind Let’s Encrypt. It only makes sense to join this community of like-minded people and support them the best we can.
Improve CDN integration
Our CDN integration is only partial: changelog.com is still susceptible to service degradation even though the bulk of requests are for static content. As rare as incidents are, serving potentially stale content is preferable to 500s.
We also know that with the always-changing open-source landscape, there will be many new tools and services appearing that will make us better at what we do. If you know any that are worth sharing, please do so in the comments below, or via our GitHub repository.
I want the same for my product/company - can you help?
Most of my time is spent working on RabbitMQ, CloudFoundry & now Kubernetes within Pivotal. I like what Changelog stands for, so I help them be more successful. I take some number of days every year to improve the setup. I have shared more about this in my Not working together talk, as well as the Deploying changelog.com episode. As straightforward as today’s changelog.com setup seems, it has been a couple of years in the making. If you enjoyed this post and think that your product would benefit from my expertise, reach out to email@example.com. Until next time, Gerhard.