Managing Meta's millions of machines
Anita Zhang is here to tell us how Meta manages millions of bare metal Linux hosts and containers. We also discuss the Twine white paper and how AI is changing their requirements.
In this episode Justin and Autumn are joined by Mandi Walls to take you back to a time before the cloud. Before Kubernetes. When a/s/l was common and servers were made of metal. Back to the days of AOL to discuss how chat rooms worked.
Paul Frazee joins the show to tell us all about how Bluesky builds, tests, and deploys mobile and web applications from the same code base.
Why would you want to switch your developer environments from containers to nix? Ádám from LastPass has a few reasons.
Verónica López, Kubernetes SIG Release tech lead & distributed systems engineer, joins Justin & Autumn to share her experiences deploying services at scale.
Justin & Autumn take you with them to the 2024 SoCal Linux Expo where they asked six fellow attendees about their favorite open source projects and their least favorite commands.
What’s the difference between productivity engineering and platform engineering? How can you continue to re-platform with a moving target? On this episode, we’re joined by Andy Glover, who spent ten years productivity engineering at Netflix, to discuss.
Kyle Quest joins the show to tell Autumn & Justin all about the evolution of DockerSlim & minimal container images. Why are small container images important? What are different strategies to make containers smaller? Let’s find out!
Autumn and Justin are joined by Chris Swan to discuss tech industry trends like AI and sustainability, gamifying the software development process and motivating devs to write more secure code, OpenSSF Scorecards and how they offer a way to measure and improve the security and compliance of GitHub repos, the scoring system, and the security posture of a repository.
Wanny Morellato & Deepak Mohandas from Kong join Justin & Autumn to discuss building, testing & running a load balancer that can run anywhere.
What do you do when your infrastructure runs 1000 miles away and you only have access every 90 minutes? Find out from Andrew Guenther from Orbital Sidekick.
We’re back! Jason Hall joins the show to tell Justin & Autumn all about how Chainguard builds hundreds of containers without a single Dockerfile.
Techno Tim is back with Adam to discuss the state of homelab in 2024 and the trends happening within homelab tech. They discuss homelab environments providing a safe place for experimentation and learning, network improvement as a gateway to homelab, trends in network connection speeds, to Unifi or not, storage trends, ZFS configurations, TrueNAS, cameras, home automation, connectivity, routers, pfSense, and more.
Umm, should we make these conversations between Adam and Tim more frequent?
We’re excited to have Tuhin join us on the show once again to talk about self-hosting open access models. Tuhin’s company Baseten specializes in model deployment and monitoring at any scale, and it was a privilege to talk with him about the trends he is seeing in both tooling and usage of open access models. We were able to touch on the common use cases for integrating self-hosted models and how the boom in generative AI has influenced that ecosystem.
What is the model lifecycle like for experimenting with and then deploying generative AI models? Although there are some similarities, this lifecycle differs somewhat from previous data science practices in that models are typically not trained from scratch (or even fine-tuned). Chris and Daniel give a high-level overview of this effort and discuss model optimization and serving.
Gerhard joins us for the 11th Kaizen and this one might contain the most improvements ever. We’re on Fly Apps V2, we’ve moved from S3 to R2 & we have a status page now, just to name a few.
On Monday, Kelsey Hightower announced his retirement from Google. On Tuesday, he sat down with us to discuss why, how & what’s next.
Along the way, Kelsey teaches us how not to suck at work, analyzes his magical demos, fights off the haters (again) & opines on System Initiative, Dagger & 37Signals moving off the cloud.
This week we’re joined by Adam Jacob and we’re talking about his mission at System Initiative to rebuild DevOps. They are out of stealth mode and ready to show off their transformative new power tool that reimagines what’s possible from DevOps. It’s an intelligent automation platform that allows DevOps teams to build detailed interactive simulations of their infrastructure and use them to rapidly update their production environments.
This is our 9th Kaizen with Adam & Jerod. We start today’s conversation with the most important thing: embracing change. For Gerhard, this means putting Ship It on hold after this episode. It also means making more time to experiment, maybe try a few of those small bets that we recently talked about with Daniel. Kaizen will continue; we are thinking on the Changelog. Stick around to hear the rest.
Tim McNamara is known as New Zealand’s Rust guy. He is the author of Rust in Action, and also a Senior Software Engineer at AWS, where he helps other builders with all things Rust.
The main reason why Gerhard is intrigued by Rust is the incredible resource frugality. Fewer CPUs means less energy used, which is good for the planet, and good for the monthly bill. This becomes most noticeable at Amazon’s scale, when S3, Lambda, CloudFront and other services start adding Rust components.
We’ve been hearing about “serverless” CPUs for some time, but it’s taken a while to get to serverless GPUs. In this episode, Erik from Banana explains why it’s taken so long, and he helps us understand how these new workflows are unlocking state-of-the-art AI for application developers. Forget about servers, but don’t forget to listen to this one!
Worlds are colliding! This week we join forces with the hosts of the MLOps.Community podcast to discuss all things machine learning operations. We talk about how the recent explosion of foundation models and generative models is influencing the world of MLOps, and we discuss related tooling, workflows, perceptions, etc.
In our ops & infra world, we learn to optimise for redundancy, for mean time to recovery and for graceful degradation. We instinctively recognise single points of failure, and try to mitigate the risks associated with them.
For some years now, Daniel Vassallo has been doing the same, but in the context of life & work. Daniel talks about the role of randomness, about learning from small wins & about optimising for a lifestyle that matches your true preferences. Apparently, ideas too should be treated like cattle, not pets.
Last September, at the 🇨🇭 Swiss Cloud Native Day, Florian Forster, co-founder & CEO of ZITADEL, talked about why they switched to serverless containers. ZITADEL has a really interesting workload that is both CPU intensive and latency sensitive. On top of this, their users are global, and traffic is bursty. Florian talks about how they evaluated AWS, GCP & Azure before they settled on the platform that met their requirements.
Lars is big on Elixir. Think apps that scale really well, tend to be monolithic, and have one of the most mature deployment models: self-contained releases & built-in hot code reloading. In episode 7, Gerhard talked to Lars about “Why Kubernetes”. There is a follow-up YouTube stream that showed how to automate deploys for an Elixir app using K3s & ArgoCD.
More than a year later, how does Lars think about running applications in production? What does simple & straightforward mean to him? Gerhard’s favourite: what is “human scale deployments”?