Ops podcast episodes

Managing Meta's millions of machines

Anita Zhang is here to tell us how Meta manages millions of bare metal Linux hosts and containers. We also discuss the Twine white paper and how AI is changing their requirements.

Ship It! #101

Let's go back to AOL chat rooms

In this episode Justin and Autumn are joined by Mandi Walls to take you back to a time before the cloud. Before Kubernetes. When a/s/l was common and servers were made of metal. Back to the days of AOL to discuss how chat rooms worked.

Ship It! #100

Bluesky apps

Paul Frazee joins the show to tell us all about how Bluesky builds, tests, and deploys mobile and web applications from the same code base.

Ship It! #99

From Kubernetes to Nix

Why would you want to switch your developer environments from containers to nix? Ádám from LastPass has a few reasons.

Ship It! #98

Deploying projects vs products

Verónica López, Kubernetes SIG Release tech lead & distributed systems engineer, joins Justin & Autumn to share her experiences deploying services at scale.

Ship It! #97

SoCal Linux Expo

Justin & Autumn take you with them to the 2024 SoCal Linux Expo where they asked six fellow attendees about their favorite open source projects and their least favorite commands.

Ship It! #96

Productivity engineering at Netflix

What’s the difference between productivity engineering and platform engineering? How can you continue to re-platform with a moving target? On this episode, we’re joined by Andy Glover, who spent ten years productivity engineering at Netflix, to discuss.

Ship It! #95

Containers on a diet

Kyle Quest joins the show to tell Autumn & Justin all about the evolution of DockerSlim & minimal container images. Why are small container images important? What are different strategies to make containers smaller? Let’s find out!

Ship It! #94

Scoring your project’s security

Autumn and Justin are joined by Chris Swan to discuss tech industry trends like AI and sustainability, gamifying the software development process and motivating devs to write more secure code, OpenSSF Scorecards and how they offer a way to measure and improve the security and compliance of GitHub repos, the scoring system, and the security posture of a repository.

Ship It! #93

Hybrid infrastructure load balancing

Wanny Morellato & Deepak Mohandas from Kong join Justin & Autumn to discuss building, testing & running a load balancer that can run anywhere.

Ship It! #92

Shipping in SPAAAACCEEE

What do you do when your infrastructure runs 1000 miles away and you only have access every 90 minutes? Find out from Andrew Guenther from Orbital Sidekick.

Ship It! #91

Building containers without Docker

We’re back! Jason Hall joins the show to tell Justin & Autumn all about how Chainguard builds hundreds of containers without a single Dockerfile.

Changelog & Friends #27

The state of homelab tech (2024)

Techno Tim is back with Adam to discuss the state of homelab in 2024 and the trends happening within homelab tech. They discuss homelab environments providing a safe place for experimentation and learning, network improvement as a gateway to homelab, trends in network connection speeds, to Unifi or not, storage trends, ZFS configurations, TrueNAS, cameras, home automation, connectivity, routers, pfSense, and more.

Umm, should we make these conversations between Adam and Tim more frequent?

Practical AI #243

Self-hosting & scaling models

We’re excited to have Tuhin join us on the show once again to talk about self-hosting open access models. Tuhin’s company Baseten specializes in model deployment and monitoring at any scale, and it was a privilege to talk with him about the trends he is seeing in both tooling and usage of open access models. We were able to touch on the common use cases for integrating self-hosted models and how the boom in generative AI has influenced that ecosystem.

Practical AI #240

Generative models: exploration to deployment

What is the model lifecycle like for experimenting with and then deploying generative AI models? Although there are some similarities, this lifecycle differs somewhat from previous data science practices in that models are typically not trained from scratch (or even fine-tuned). Chris and Daniel give a high level overview in this effort and discuss model optimization and serving.

Changelog & Friends #10

Kaizen! S3 R2 B2 D2

Gerhard joins us for the 11th Kaizen and this one might contain the most improvements ever. We’re on Fly Apps V2, we’ve moved from S3 to R2 & we have a status page now, just to name a few.

Changelog & Friends #6

Even the best rides come to an end

On Monday, Kelsey Hightower announced his retirement from Google. On Tuesday, he sat down with us to discuss why, how & what’s next.

Along the way, Kelsey teaches us how not to suck at work, analyzes his magical demos, fights off the haters (again) & opines on System Initiative, Dagger & 37Signals moving off the cloud.

Changelog Interviews #545

Rebuilding DevOps from the ground up

This week we’re joined by Adam Jacob and we’re talking about his mission at System Initiative to rebuild DevOps. They are out of stealth mode and ready to show off their transformative new power tool that reimagines what’s possible from DevOps. It’s an intelligent automation platform that allows DevOps teams to build detailed interactive simulations of their infrastructure and use them to rapidly update their production environments.

Ship It! #90

Kaizen! Embracing change 🌟

This is our 9th Kaizen with Adam & Jerod. We start today’s conversation with the most important thing: embracing change. For Gerhard, this means putting Ship It on hold after this episode. It also means making more time to experiment, maybe try a few of those small bets that we recently talked about with Daniel. Kaizen will continue, we are thinking on the Changelog. Stick around to hear the rest.

Ship It! #89

Rust efficiencies at AWS scale

Tim McNamara is known as New Zealand’s Rust guy. He is the author of Rust in Action, and also a Senior Software Engineer at AWS, where he helps other builders with all things Rust.

The main reason why Gerhard is intrigued by Rust is the incredible resource frugality. Fewer CPUs means less energy used, which is good for the planet, and good for the monthly bill. This becomes most noticeable at Amazon’s scale, when S3, Lambda, CloudFront and other services start adding Rust components.

Practical AI #211

Serverless GPUs

We’ve been hearing about “serverless” CPUs for some time, but it’s taken a while to get to serverless GPUs. In this episode, Erik from Banana explains why its taken so long, and he helps us understand how these new workflows are unlocking state-of-the-art AI for application developers. Forget about servers, but don’t forget to listen to this one!

Practical AI #210

MLOps is alive and well

Worlds are colliding! This week we join forces with the hosts of the MLOps.Community podcast to discuss all things machine learning operations. We talk about how the recent explosion of foundation models and generative models is influencing the world of MLOps, and we discuss related tooling, workflows, perceptions, etc.

Ship It! #88

Treat ideas like cattle, not pets

In our ops & infra world, we learn to optimise for redundancy, for mean time to recovery and for graceful degradation. We instinctively recognise single points of failure, and try to mitigate the risks associated with them.

For some years now, Daniel Vassallo has been doing the same, but in the context of life & work. Daniel talks about the role of randomness, about learning from small wins & about optimising for a lifestyle that matches your true preferences,. Apparently, ideas too should be treated like cattle, not pets.

Ship It! #87

Why we switched to serverless containers

Last September, at the 🇨🇭 Swiss Cloud Native Day, Florian Forster, co-founder & CEO of ZITADEL, talked about why they switched to serverless containers. ZITADEL has a really interesting workload that is both CPU intensive and latency sensitive. On top of this, their users are global, and traffic is bursty. Florian talks about how they evaluated AWS, GCP & Azure before they settled on the platform that met their requirements.

Ship It! #86

Human scale deployments

Lars is big on Elixir. Think apps that scale really well, tend to be monolithic, and have one of the most mature deployment models: self-contained releases & built-in hot code reloading. In episode 7, Gerhard talked to Lars about “Why Kubernetes”. There is a follow-up YouTube stream that showed how to automate deploys for an Elixir app using K3s & ArgoCD.

More than a year later, how does Lars think about running applications in production? What does simple & straightforward mean to him? Gerhard’s favourite: what is “human scale deployments”?