
Ops

DevOps, infrastructure, etc.
13 Stories

Pēteris Caune healthchecks.io

Healthchecks – a watchdog for your cron jobs

I've wanted this for years, but apparently never enough to build it myself: A passive monitoring tool written in Python & Django. Set up your cron jobs, backup scripts, weekly email sending scripts, nightly data import jobs etc. to ping this service when they complete. When they don't send a ping on time, you receive an alert. The service offers a generous 20 free checks before you start paying. And since it's an open source Django app, you can set it up to run on your own infrastructure too.
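
Here is a minimal sketch of both halves of that arrangement in Python, the language Healthchecks itself is written in. It assumes a ping URL of the form https://hc-ping.com/<uuid>, where the UUID below is a hypothetical placeholder. It models the watchdog side with two hypothetical parameters: an expected period between runs and a grace allowance for slow jobs. It illustrates the idea only and is not Healthchecks' own implementation.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional
import urllib.request

# Hypothetical placeholder: copy the real ping URL for your check from the dashboard.
PING_URL = "https://hc-ping.com/your-check-uuid"


def ping(url: str = PING_URL, timeout: float = 10.0) -> bool:
    """Job side: send an HTTP GET to the ping URL once the job has finished.

    Returns True if the request succeeded. Network errors are swallowed so a
    monitoring hiccup never makes the job itself fail.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except OSError:  # URLError, HTTPError and socket timeouts all subclass OSError
        return False


def is_overdue(
    last_ping: Optional[datetime],
    period: timedelta,
    grace: timedelta,
    now: datetime,
) -> bool:
    """Watchdog side: a check is late once no ping has arrived within period + grace."""
    if last_ping is None:
        # Assumption: a check that has never pinged is treated as "new", not down.
        return False
    return now > last_ping + period + grace


def nightly_import() -> None:
    """Hypothetical job: do the real work first, and ping only if it completed."""
    # ... import data here; an exception would skip the ping below ...
    ping()
```

Because the job pings only after its work succeeds, a crashed script and a cron entry that never fired look the same to the watchdog. Both produce silence, and once period plus grace has elapsed, that silence becomes an alert.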

Josh Kalderimis Travis CI Blog

travis-ci.com now supports open source projects

Travis CI announced the merging of their worlds to combine their .org (open source) and .com (paid) efforts under one roof. Smart move! "Over time we found two platforms lead to confusion for people using travis-ci.org extensively, or together with travis-ci.com ... when we decided to move our GitHub integration to GitHub Apps at the beginning of this year, we realized it was a great opportunity to dive into merging travis-ci.org and travis-ci.com into a single platform."

Evelyn Van Kelle O'Reilly Media

Strong feedback loops make strong software teams

I'm a huge fan of well-designed feedback loops. In software creation, feedback loops prove to be one of the most important, often overlooked, artifacts of the development lifecycle. Evelyn Van Kelle writes on the O'Reilly Ideas blog: "There is a false dichotomy between full automation and human intervention. Successful quality control combines tool-based measurement with manual review and discussion. At the end of the day, the most effective feedback loops are a mixture of daily best practices, automation, tools, and human intervention."

Hongli Lai joyfulbikeshedding.com

Netdata for simple server monitoring

Hongli Lai, co-founder of Phusion and Passenger engineer, shares his quest for an easy-to-use monitoring solution for Phusion's servers. "Unlike the other solutions I've checked out, Netdata provides real-time, per-second monitoring. You can see the CPU/memory slider update in real time. Netdata also provides alerting and installs a ton of alerts by default. By default Netdata stores collected stats on the same server. This is very convenient if you are just getting started. It can also be configured to send stats to a central server." Also, go back in time to 2015 when we talked with Hongli on The Changelog #136.

pagerduty.com

Why write postmortems?

Postmortems are a healthy exercise to do after an incident to learn the specifics of why it happened and what needs to be done to prevent it from happening again. A good report captures the risks of current services, and helps Product and Engineering to more proactively prioritize work on services. Someone from outside your team should be able to read your postmortem report and answer these five questions...

blog.agilebits.com

Terraforming 1Password

Last weekend the folks at 1Password put out a tweet saying they were going down for a few hours to replace AWS CloudFormation with HashiCorp Terraform. The tweet got a lot of attention, particularly from people running or managing their services. "It is like creating a brand new universe, from scratch. This post will go into technical details and I apologize in advance if I explain things too quickly. I tried to make up for this by including some pretty pictures but most of them ended up being code snippets."