SRE

SRE is what you get when you treat operations as if it’s a software problem.

Ops matthewtejo.substack.com

Why Twitter didn’t go down (from a real Twitter SRE)

Matthew Tejo:

Twitter supposedly lost around 80% of its workforce. Whatever the real number is, there are whole teams without engineers on them now. Yet, the website goes on and the tweets keep coming. This left a lot wondering what exactly was going on with all those engineers and made it seem like it was all just bloat. I’d like to explain my little corner of Twitter (though it wasn’t so little) and some of the work that went on that kept this thing running.

This is a detailed post about Twitter’s caching system that Matthew and others built while working there, and it’s brilliantly summed up by commenter Johnny Manu40:

When everything works fine, they wonder why they hired you. When everything stops working, they wonder why they hired you. I.T. in a nutshell.

Ship It! #69

The cloud native ecosystem

Maybe it’s the Californian sun. Or perhaps it’s the time spent at Disney Studios, the home of the best stories. One thing is for sure: Taylor Dolezal is one of the happiest cloud native people that Gerhard knows.

As a former Lead SRE for Disney Studios, Taylor has significant hands-on experience running cloud native technologies in a large company. After a few years as a HashiCorp Developer Advocate, Taylor is now Head of End User Ecosystem at CNCF. In his current role, he helps enable cloud native success for end users like Boeing, Mercedes-Benz and many others.

Founders Talk #92

Enabling a world where all software is reliable

This week Adam is joined by Robert Ross, founder and CEO of FireHydrant, the glue layer between your tech stack and your teams to mitigate and resolve incidents at scale.

Robert shares his journey to becoming a software engineer, his time at DigitalOcean, the idea of incident management as a platform, and how he shifted his focus from creating courses on incident management to recognizing the value of the software he was creating for the course, which is now known as FireHydrant. We also talk through his first experience in raising capital, what happens when the bar is raised on the reliability of the world’s software, and why their mantra is “Hire great people, who build, sell and market a great product, and you’ll have a great company.”

Ship It! #21

Learning from incidents

Things go wrong all the time. We all make mistakes. And that is okay. What is not okay is to think that it won’t happen, or that there will be someone else around when it does. In that moment, it doesn’t matter who wrote that module, package or microservice. But there is a better way to think about this, and there is an approach that makes people actually look forward to incidents.

It all starts with thinking of incidents as opportunities to learn, and then sharing those learnings with everyone, so that you can all improve. In this episode, Gerhard is joined by Stephen Whitworth and Chris Evans, incident.io co-founders and former Staff Engineers at Monzo.

They get it, we get it, and now you can get it too.
