That Sinking Feeling (The #HugOps Song)
Doing ops properly is hard. Most of us are failing & learning every day. Some of us manage to have fun too.
This sounds too good to be true, because it kind of is. There is no escaping the cloud (because of email trust) or the requirement of sysadmin’ing this setup (sending/receiving email is critical). If you slack on the details or upkeep, it’s your email.
I have been on an ongoing quest to free myself from cloud services for years now. During this time, I have hosted my personal email (@bloomqu.ist) on a Google Workspace (formerly Google Apps, then G Suite) account, which, while convenient, also means that my personal emails are at the whims of one of the world’s most privacy-hostile companies.
Don’t get me wrong – what Zach shared is quite possible, but it’s still too time-consuming and difficult to host your own email. It’s untenable long-term. There’s a billion-dollar business there waiting for someone to seriously compete with Google on email, and not be evil. Fastmail comes to mind. I could be wrong, but I would characterize them as being an alternative, not seriously competing with Google.
A perspective on incidents that makes a lot of sense actually, and captures the “Why?” perfectly. My highlights:
- Incidents involve more people than we think. Tooling just makes it really hard for them to help.
- We have more incidents than we realise. We just don’t hear about them.
- Your whole team, on the same team.
- Practice makes perfect.
The New Stack has a solid summary of what’s new in Grafana 8. Shiny! ✨
At Channable we use Nix to build and deploy our services and to manage our development environments. This was not always the case: in the past we used a combination of ecosystem-specific tools and custom scripts to glue them together. Consolidating everything with Nix has helped us standardize development and deployment workflows, eliminate “works on my machine”-problems, and avoid unnecessary rebuilds. In this post we want to share what problems we encountered before adopting Nix, how Nix solves those, and how we gradually introduced Nix into our workflows.
If Nix is intriguing to you, you’re going to love an upcoming episode of The Changelog. 😉
Hamza Tahir on HackerNoon:
By now, chances are you’ve read the famous paper about hidden technical debt by Sculley et al. from 2015. As a field, we have accepted that the actual share of Machine Learning is only a fraction of the work going into successful ML projects. The resulting complexity, especially in the transition to “live” environments, leads to large numbers of failed ML projects never reaching production.
Productionizing ML workflows has been a trending topic on Practical AI lately…
Chip Huyen:
While looking for these MLOps tools, I discovered some interesting points about the MLOps landscape:
- Increasing focus on deployment
- The Bay Area is still the epicenter of machine learning, but not the only hub
- MLOps infrastructures in the US and China are diverging
- More interests in machine learning production from academia
If MLOps is new to you, Practical AI did a deep dive on the topic that will help you sort it out. Or if you’d prefer a shallow dive… just watch this.
In this post I share the latest 2020 and beyond details for changelog.com’s infrastructure.
Why Kubernetes? How is Kubernetes simpler than what we had before? What was our journey to running production on Kubernetes? What worked well? What could have been better? What comes next for changelog.com? Read this post and listen to episode #419 to learn all the details.
Tempo is cost-efficient, requiring only object storage to operate, and is deeply integrated with Grafana, Prometheus, and Loki. Tempo can be used with any of the open source tracing protocols, including Jaeger, Zipkin, and OpenTelemetry. It supports key/value lookup only and is designed to work in concert with logs and metrics (exemplars) for discovery.
Add this to the incredibly impressive open source portfolio at Grafana Labs.
This segment will be included in a podcast near you soon enough, but we thought it’d be fun to share the video as a standalone since we watched the whole thing play out via K9s.
kubectl is the new SSH. If you are using it to update production workloads, you are doing it wrong. See examples on how to automate application updates.
We’re using this in our new Kubernetes-based infrastructure (more details on that coming to a podcast near you). Keel runs as a single container, scanning Kubernetes and Helm releases for outdated images. Super cool stuff, and even has a web interface (which we’re not using yet, but should).
Chris Toomey shares a good idea (especially for read-heavy apps) around how you can do scheduled maintenance without taking your entire app offline (i.e., Heroku’s maintenance page).
His solution is Rails-specific, but the general concept applies to any web app with a similar use case.
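To make the "general concept" concrete outside of Rails, here is a minimal sketch in Python. It assumes the idea boils down to a read-only mode: reads keep working, writes get a maintenance response. The blurb above does not spell out the mechanics, so treat this as one possible reading and not Chris's implementation. The class name `ReadOnlyMaintenance`, the `enabled` callable, and the `Retry-After` value are all hypothetical.

```python
# A sketch of read-only maintenance as WSGI middleware.
# Assumption: "maintenance" means rejecting writes while still serving reads.

SAFE_METHODS = {"GET", "HEAD", "OPTIONS"}


class ReadOnlyMaintenance:
    """Wrap a WSGI app and answer 503 to non-read requests while enabled."""

    def __init__(self, app, enabled=lambda: False):
        self.app = app
        # `enabled` is called on every request, so the switch can be backed
        # by an environment variable, a feature flag, a file on disk, etc.
        self.enabled = enabled

    def __call__(self, environ, start_response):
        method = environ.get("REQUEST_METHOD", "GET")
        if self.enabled() and method not in SAFE_METHODS:
            body = b"Down for maintenance: the site is temporarily read-only.\n"
            start_response(
                "503 Service Unavailable",
                [
                    ("Content-Type", "text/plain; charset=utf-8"),
                    ("Content-Length", str(len(body))),
                    ("Retry-After", "600"),  # hypothetical maintenance window
                ],
            )
            return [body]
        # Reads (and everything else when the switch is off) pass through.
        return self.app(environ, start_response)
```

Wrapping an existing app would look something like `app = ReadOnlyMaintenance(app, enabled=lambda: os.environ.get("READ_ONLY") == "1")`, where `READ_ONLY` is again a made-up variable name.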
Everyone’s (or at least my) favorite system monitoring tool is still alive and kickin’ with a big 3.0 release. In addition to a new display option to show CPU frequency in CPU meters, optional vim key mapping mode, and many other goodies, the big news is this:
New maintainers - after a prolonged period of inactivity from Hisham, the creator and original maintainer, a team of community maintainers have volunteered to take over a fork at htop.dev and github.com/htop-dev to keep the project going.
Open source FTW!
More good news: Hisham has agreed to join us on Maintainer Spotlight!
How do you respond when someone asks:
Is Kubernetes right for us?
Where do you start? Let’s talk about IT modernisation, beginning with the problem that needs to be solved, and exploring any constraints that are obvious.
Leszek Zalewski:
The impact of COVID-19 is multifaceted. Our infrastructure team observed an exhaustion of our server resource pool for auto scaling due to a drastic traffic increase! Learn how we achieved 2× faster application run with only 1/3 of the servers by tuning auto scaling rules and switching to Puma threads.
Monitoror is a single file app written in Go. It can run on Linux, macOS, or Windows. You can view a live demo here.
This tool is surrounded by mountains of marketing speak, but it does seem like it offers a quick way to spin up different dev environments, which is cool. It has built-in recipes for WordPress, Drupal, LAMP, MEAN, and more. Here’s how you get started on Drupal 7, for example:
lando init \
--source remote \
--remote-url https://ftp.drupal.org/files/projects/drupal-7.59.tar.gz \
--remote-options="--strip-components 1" \
--recipe drupal7 --webroot . \
--name hello-drupal7
You can use these out of the box or start with a base language and mix in the things you need from there. Kinda like Docker Compose? Yeah, kinda like Docker Compose:
You can think of Lando as both an abstraction layer and superset of Docker Compose as well as a Docker Compose utility.
Arijit Mukherji on The New Stack:
We all have our favorite urban legends. From cow tipping to chupacabras, these myths persist despite a lack of definitive proof (and often evidence to the contrary). Technology isn’t immune to this phenomenon. It has its own set of urban legends and myths that emerge alongside new technologies and continue well into mass adoption. As organizations consider the shift from monitoring to Observability, I hear three common misperceptions. It’s time to debunk the myths.
Includes interview questions, notes, and useful links to other resources to continue your learning.
If you are a system administrator, or just a regular Linux user, there is a very high chance that you worked with Syslog, at least one time. On your Linux system, pretty much everything related to system logging is linked to the Syslog protocol. Designed in the early 80’s by Eric Allman (from Berkeley University), the syslog protocol is a specification that defines a standard for message logging on any system.
This is pitched as “everything that you need to know about Syslog.” From what I can tell, it might just live up to that pitch. It’s high quality and thorough.
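For a small taste of what the protocol standardizes: every syslog message starts with a priority value (the number in angle brackets, like `<34>`), which packs a facility and a severity into one integer as facility × 8 + severity. The sketch below shows that arithmetic in Python. The helper names `encode_pri` and `decode_pri` are made up for illustration, and only a subset of the facility codes is listed.

```python
# Facility and severity codes as defined by the syslog RFCs (subset of facilities).
FACILITIES = {"kern": 0, "user": 1, "mail": 2, "daemon": 3, "auth": 4,
              "local0": 16, "local4": 20}
SEVERITIES = {"emerg": 0, "alert": 1, "crit": 2, "err": 3,
              "warning": 4, "notice": 5, "info": 6, "debug": 7}


def encode_pri(facility, severity):
    """Combine a facility name and severity name into a PRI value."""
    return FACILITIES[facility] * 8 + SEVERITIES[severity]


def decode_pri(pri):
    """Split a PRI value back into (facility_code, severity_code)."""
    return divmod(pri, 8)


print(encode_pri("auth", "crit"))   # 4 * 8 + 2
print(decode_pri(165))              # facility 20 (local4), severity 5 (notice)
```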
Almost any slog can be turned into a do-nothing script. A do-nothing script is a script that encodes the instructions of a slog, encapsulating each step in a function. For the example procedure above, we could write the following do-nothing script:
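The example procedure and script from the original post are not reproduced here, so what follows is only a minimal sketch of the shape such a script might take. The steps (`create_key_pair`, `commit_public_key`, `notify_user`) and their wording are hypothetical placeholders. The point is the structure: each step is a function that describes what to do, and nothing is automated yet.

```python
# A do-nothing script sketch: each step only tells the operator what to do.
# The steps below are hypothetical and stand in for a real manual procedure.


def create_key_pair(ctx):
    return f"Create an SSH key pair for {ctx['username']} and store it somewhere safe."


def commit_public_key(ctx):
    return f"Commit the public key for {ctx['username']} to the keys repository."


def notify_user(ctx):
    return f"Send {ctx['username']} instructions for retrieving the private key."


STEPS = [create_key_pair, commit_public_key, notify_user]


def run(steps, ctx, wait=input):
    """Print each step's instructions and pause until the operator confirms.

    Returns the names of the steps that were walked through.
    """
    done = []
    for number, step in enumerate(steps, start=1):
        print(f"Step {number}: {step(ctx)}")
        wait("Press Enter when done: ")
        done.append(step.__name__)
    return done
```

Calling `run(STEPS, {"username": "alice"})` walks through the steps one at a time. Because each step is already its own function, any one of them can later be swapped for code that actually does the work, without touching the rest.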
Containerization technologies are one of the trendiest topics in the cloud economy and the IT ecosystem. The container ecosystem can be confusing at times, so this post may help you understand some confusing concepts about Docker and containers. We are also going to see how the containerization ecosystem evolved and the state of containerization in 2019.
Put on your swimming suit, because this is a deep dive. 🏊♀️🏊
An interview with James Shubin (DevOps/config-management hacker and physiologist from Canada who works on a next-generation config-management project he started called mgmt) about his open source work and how it contributes to the cloud native landscape.
Two new terms have recently emerged around software delivery: Software Defined Delivery and Progressive Delivery. Why? How do they relate to Continuous Delivery?
Several forces today make delivery increasingly complex. Notably, proliferation of repositories, with hundreds of small projects replacing a handful of monoliths; desire for greater automation to realize the full potential of CD across multiple environments; the rise of feature flagging; and increased evidence (such as the Equifax debacle) of the need to bake security into the delivery process.
There’s another gorilla to consider for container orchestration.
Kubernetes is the 800-pound gorilla of container orchestration. It powers some of the biggest deployments worldwide, but it comes with a price tag.
Especially for smaller teams, it can be time-consuming to maintain and has a steep learning curve. For what our team of four wanted to achieve at trivago, it added too much overhead. So we looked into alternatives — and fell in love with Nomad.
From the Nomad website:
HashiCorp Nomad is a single binary that schedules applications and services on Linux, Windows, and Mac. It is an open source scheduler that uses a declarative job file for scheduling virtualized, containerized, and standalone applications.
Anyone from the community with experience using Nomad? Let us know in the discussion below.