Ship It! – Episode #128

Infosec & OpenTelemetry

with Austin Parker


Maybe Jira for your kids’ chores is a good idea… Probably not.


Sponsors

Sentry – Code breaks, fix it faster. Don’t just observe. Take action. Sentry is the only app monitoring platform built for developers that gets to the root cause for every issue. 100,000+ growing teams use Sentry to find problems fast. Use the code CHANGELOG when you sign up to get $100 OFF the team plan.

Fly.io – The home of Changelog.com. Deploy your apps close to your users: global Anycast load-balancing, zero-configuration private networking, hardware isolation, and instant WireGuard VPN connections. Push-button deployments that scale to thousands of instances. Check out the speedrun to get started in minutes.

Timescale – Real-time analytics on Postgres, seriously fast. Over 3 million Timescale databases power IoT, sensors, AI, dev tools, crypto, and finance apps, all on Postgres. Postgres, for everything.


Chapters

1 00:00 This is Ship It! 00:27
2 00:52 Sponsor: Sentry 03:17
3 04:17 Pre-show 09:56
4 14:13 Welcome Austin Parker 00:23
5 14:37 Software kids 02:04
6 16:41 Deserted Island DevOps 01:59
7 18:40 Conference schedules 02:35
8 21:22 Sponsor: Fly.io 03:32
9 24:57 Password security 02:28
10 27:26 Real world practicality 03:27
11 30:53 Nuances at scale 04:42
12 35:35 Tools make borders 04:20
13 39:55 Introducing OpenTelemetry 05:27
14 45:36 Sponsor: Timescale 02:17
15 48:05 What OpenTelemetry is doing 02:40
16 50:45 Grabbing data securely 02:02
17 52:47 Challenges building OpenTelemetry 01:42
18 54:29 System and app telemetry overlap 06:52
19 1:01:22 aparker.io and running your own website 05:54
20 1:07:16 Socials 00:36
21 1:07:51 Outro 00:52

Transcript


I’m your host, Justin Garrison, and with me as always is Autumn Nash.

See, we ended up going two hours yesterday, and now we’re like, we started before the intro.

I think we’re just going to roll most of that intro right now… It’s a whole conversation about kids learning how to project-manage and automate.

My daughter is absolutely smitten with our smart – you know, turn the lights on and off with her voice… And she will run up to – when Mandy, my wife is on FaceTime with someone, she’ll run up and yell “Turn off the lights!” to try to turn their lights off on the other side of the phone… Because she has an aunt that will do it for her. When she comes up, she’ll –

In pretend?

Well, no, she’ll actually – Alexa lights, so when Dylan runs up, and she knows, she’ll say “Alexa, turn off the lights”, and the lights will turn off, and Dylan’s just “Yeah…”

Over the phone. That’s great.

Yeah. Because she hasn’t figured out – like, she keeps trying, but she’s just not good enough at the words yet for Siri to pick up on it…

Like, she’s trying, but Siri just does not –

Life gets real when they can understand your children, because all of a sudden you have six things in your Amazon cart that you don’t know where they came from.

Yeah, I’m not really looking forward to that moment of the diction getting good.

Don’t let them find out that there’s a fart song and a bunch of other stuff, because… Oh, my God.

Yeah, I’m probably going to have to rethink my opinion about smart homes as soon as – in another year or so.

Yeah, my kids would for an hour spend – just changing the color of all the lights. They’re “Wait, make them blue. Oh, now make them orange.”

Somebody put the longest song ever, or what is it?

The song that never ends.

Yes. On Alexa. Why, though? Why? I was “Do you just hate us parents? What is wrong with you?”

I just think about how much money – like, I have YouTube premium because we will play YouTube Kids, and it’s limited to certain channels. And I’m just thinking, how much money has Miss Rachel made off of me?

Oh, Miss Rachel is balling. Forget that. How much money has Ryan made off of all of us? That kid’s got twenty one million dollars and he’s seven.

Yeah. The economics of children’s YouTubes… I don’t want to think about it, because…

I just try to ban my kids from most YouTube, because I’m tired of watching it, and it makes me mad that they really believe half of it… And I’m “That’s fake.”

Yeah… My daughter’s three, so she’s still pretty young, and it’s really just “Okay, please, watch the tablet.”

While you try to survive.

Yeah. We’re in survival mode, so it’s kind of “Okay, you get Miss Rachel, you get PBS, and you get super-simple songs. Go over here while we’re trying to do something.”

That used to be us and Blippi, but thank God Blippi now is on Amazon Prime.

Oh, yeah. We never got Blippi, thankfully.

Don’t do it. It’s so creepy.

My youngest wants Danny Go right now.

Oh, yeah. I’m hoping that we can stick to the just normal, or the not weird corner of children’s YouTube.

It’s so weird.

Yeah. It’ll probably last until she gets older and then her wonderful little friends at school will tell her about all these fun things she doesn’t get.

My youngest watches – they watch a Danny Go YouTube video in class. And so he comes home and he’s “I have to watch Danny Go.”

Great. Thank you, educational system.

[00:08:04.03] I’ve banned them to Epic, but they even watch YouTubes, and PE, and different Doug and all these different questions things, which [unintelligible 00:08:11.14] school that they’re educational… But I’m “If you can’t find it on Epic or PBS, you’re out of luck.” They learned how to Google, though, and ask Alexa how to do stuff, and they’ve found out so many ways around system child locks that at this point they should hire my four-year-old to work for the damn NSA. He figured out PlayStation’s entire child lock thing… And he can’t play Plants vs. Zombies, and he can’t play regular games, but he’s figured out how to redownload Fortnite six times. Who made it?

The amazing thing about children is the level of motivation for highly specific tasks…

Yes. I can’t get them to clean their rooms, but let me tell you, they’ve found a way how to get on Google and Amazon through my child protection locks. And they’re “Mom, this is the Halloween outfit costume I want. And then here’s this thing.” And I’m “How did you –” Like, at first you’re “No, not that.” And then you’re “Wait, but how did you get there?” Like, it hits you…

Yeah, it reminds me of that bit that went around a few years ago where it’s some girl who was “Oh, I got banned. My parents took my phone, so I’m tweeting from my fridge.”

And I have the Twitter fridge. That’s some s**t my kids would figure out. I just sat there for a minute and I thought about how many screens are in my kitchen and the living room, because it’s one big open, like, room… And it’s like, at one point my kids were watching YouTube on the Alexa show. Then they’re watching it on the fridge, and then on the TV. And I was “You can see each screen from each wall. What are we doing?”

Oh, yeah. The worst is when they watch different things on [unintelligible 00:09:52.18]

Oh, my God. My kid does that. He’ll be watching TV…

Yup, she’ll have PBS on the TV, and then she’ll go over and get the tablet and watch Miss Rachel on the tablet, and then do one of these… Just watch the TV and watch the tablet at the same time.

Yeah. My oldest is playing Minecraft, watching a YouTube video, and on his tablet. And I’m “What are you doing?”

Amazing level of multitasking.

Okay, my kid will build a game in Scratch, while he’s watching TV, making his brother play one game that he made as his beta testers that he’s yelling at on his laptop… And I’m just “You can’t yell at your testers, bro. What are you doing?”

Your brother’s going to unionize.

I swear. I hope they do. I hope they’re “We want one piece of candy per game testing.”

Collective voice now…

Yeah, collective bargaining. It’s never too early to teach labor relations.

I’m telling you. I was like, first of all, those are your beta testers. And then he was like, they were asking for different things and to change it, and he was “That’s not how you play it.” And I was “Then you didn’t write it clear enough.”

Welcome to users.

He was so mad. And he was yelling at him…

“We’re gonna teach you JIRA now…”

Yes. And I was “That’s not how this works.”

You should just use this as the cold open. This entire conversation, just drop it in at the beginning.

People are “What show is this?” Welcome to Ship It, the podcast all about your kids.

Teaching your kids about project management. It’s never too early.

Okay, but hear me out… We should do one where we just have toddlers, sitting here talking about tech and Minecraft… It would be so funny. It’ll be the faces of Ship It, but Mini Ship It, and we’ll put the headphones on…

You get one of them to have a Scratch app, and all of them talking about how they’re using it wrong.

That kid gets mad downloads on Scratch. He gets mad downloads.

A podcast of eight year olds talking about Minecraft mods would probably be utterly fascinating.

[00:12:08.20] It would also get our kids off our backs. At one point, when we all worked at the same company, we were “Okay, if we get one Minecraft server, we can get through the summer.” Why didn’t you make the server, Justin?

Because Logan doesn’t care anymore. He just went and bought one on a friend’s server.

Logan’s a grown man, okay? What did you do?

He has his first dance tonight.

He is going to a dance. I know.

He sent me a picture of Logan the other day, and I was “What grown man is with you at the hockey game?”

I’m a few years out from all that, thankfully.

Oh, it happens quickly. I have a ten-year-old, a seven-year old and a five-year-old, and he was “Mom, don’t help me get out of the car. My friends are watching.” I was “I wipe your butt. What do you mean?”

[unintelligible 00:13:00.29] just the one and she’s three, so…

Just don’t get outnumbered. Just FYI.

Trust me, trust me. After the first mental breakdown I had in the bathroom, when she was six months old, I was “We’re done.”

You’re going to have more of those.

Oh, I’m sure I’m gonna have mental breakdowns, but it’s just “You know what? It’s going to be two on one for the rest of this.”

But hear me out, though… I don’t even got to be good at video games, because the three of them will beat up on teenagers, and they carry my ass in Fortnite. It’s great.

There are advantages.

I don’t even carry groceries anymore. I instacarted it, I paid for it…

Carry me to the [unintelligible 00:13:42.02]

My kid was “Why don’t they just automate taking out the trash?” And I was “We did. I had you.”

Congratulations. You figured it out.

You are the automation. Also, it would be my kid that says “We should automate taking out the trash.” It’s like, I broke you.

Break: [00:14:11.03]

Autumn, this is our 30th show, which is awesome.

We’ve been doing this for just a little while now, and that’s been awesome… Today on the show we have Austin Parker, who - Austin, you work at Honeycomb. I don’t actually know your title.

Director of open source.

Director of open source. And we’re going to talk a lot about OpenTelemetry today, which is one of the things that I was interested to hear more about from you, because that is a topic that I know very little about, and I always want to learn more. But again, we went through this whole background of kids, and building their own stuff, and automation… And I think at the end of the day we ended up with “Kids, you’re on the screen too much, so you need to start a podcast.” And that’s how we’re going to solve some of this problem.

Well, we also have come to the conclusion that we’ve created our own baby version of the nerds that we are.

I think that that’s your prerogative as a parent, right? Your prerogative as a parent is “Okay, I have all these menial household tasks, and so I’m going to invest in a child, so that within 10 years they can take the trash out and get mad at me about it.”

I asked one of them to clean the room… You would have thought that I was murdering them. And I asked my son to wear a jacket to school. It’s 45 degrees outside. He was so mad that he couldn’t wear shorts… And I was like “I’m sorry that I’m trying to keep you from dying of frostbite.” He was like “I’m going to get heatstroke when it goes up to 50 today”, and I was like…

Kids and users have a lot in common.

It’s like arguing with tiny terrorists.

Yeah. You can fire customers, but you can’t fire your own children, which is…

I know…! Bad decisions were made. This was not thought out.

[00:15:55.15] But it reminds me of all those times you see people on Twitter, or whatever… It’s like “Oh, we both work in software, and so we have an Asana or a Jira for our household.” I’m just like “Who hurt you?”

Okay, don’t get me wrong, at one point I did make my kid write a one-pager on why he wanted a bearded dragon… And then I was like “Oh, we’ve gone too far.”

No, that’s too much.

I think you’ve gone officially beyond the pale when you start doing OKRs for your children… Which is honestly such a great – I mean, it’s terrible that I said it… I’ve birthed this into the world, and at some point we’ll probably end up doing –

Someone’s going to do it now. It’s your fault.

It is. Which is really the story of my life, is –

You say something and regret it later when it happens?

That’s how the Deserted Island DevOps happened. I made a joke on Twitter, and then I’m the Animal Crossing DevOps conference guy.

Is that still going?

Wait, Animal Crossing DevOps? Go back!

Deserted Island DevOps. It was the world’s first virtual DevOps conference held in Animal Crossing. We ran three years during the pandemic.

I’m going to buy a Switch right now.

You can watch it on YouTube. Deserted Island DevOps on YouTube. You can watch all the talks. It was great.

Were they animals?

I mean, we used the multiplayer in… Oh, God, whatever the newest one is. And people would bring their villagers onto the island, and then I set up a little room that looked like a conference room, a little podium… And they would give their talks, and I livestreamed the Switch, and muxed in the audio from them talking over Zoom, and then overlaid their slides… It’s very cute. Go look it up on YouTube.

This is the most amazing thing I’ve ever heard of.

We’ll have it in the show notes.

I was just going to say, these are going to be the most fire show notes ever.

Yeah, DesertedIsland.Club, or on YouTube, Deserted Island DevOps. Everything should still be up there. But yeah, we ran the last one a couple of years ago, and it was very fun. It was an inversion where because people were traveling again, we found an island in Michigan called Mackinac Island, that had a hotel on it, and an event space. And so all the speakers came in, and the speakers were all in the same place, and then we still livestreamed the Switch, but people were all in the same room together… It was a nice way to kind of put a bow on that little experiment.

You should bring Animal Crossing DevOps to DevOps Days LA at Scale, and it would be so much fun. I’ll just sit there and play Animal Crossing.

It might come back at some point. I keep meaning to go out to Scale, but the DevOps Days LA - it’s just never at a great time for me.

March in Pasadena, though. I mean, come on.

It’s the perfect time.

It’s usually either so close to KubeCon, or it’s scheduled against KubeCon EU, that the travel doesn’t work out.

Yeah. You’re traveling twice for two different conferences, in different directions… It gets difficult.

Yeah. Especially Europe, and then back, West Coast… I live on the East Coast.

The whole time I was like “Take me to Paris with you”, because everybody was going to KubeCon Paris. I was like “Put me in your suitcase. I just want a croissant.”

The croissants were really good.

Dude, everybody was eating croissants on Twitter the next week, and I was like “I hate you all.”

The croissants were really good. Don’t worry, we’re in London in 2025, so… Much love to all my friends in the UK, but British cuisine is an oxymoron.

I wouldn’t be as excited about the food.

Lovely people, though.

I do want to go to London. I want to go see all the sights.

I just went a few weeks ago and I drove a Mini Cooper, a classic Mini Cooper for the first time. Rented one. Fantastic.

Did you drive on the opposite side of the road, too?

I did. The steering wheel was on the right side, we were driving on the right side… It had a stick, it was manual, so I had to shift with my left hand for the first time… And I did okay. I think I got my first speeding ticket. I saw a flash of light as I was going off to return. I’m still waiting for it to come.

[00:20:07.08] It’s like, “Wait, how are we not sure?”

Yeah. They have a lot of automated speeding ticket cameras throughout.

They love their automated cameras over there.

Yes. And so I was going to return the car, and I went a little faster on what I thought was a highway… But it turns out the speed limit was not highway speeds. So I saw a flash of light, and I’m still waiting to see if the ticket comes in the mail.

You went to London and this is the story that you came back with? You’re like “I went and I got my first…”

If they get it across the pond to you, then that’s impressive.

I mean, it’s licensed under the rental company’s name, so I told them, I was like “Hey, by the way, I may have got a ticket. I saw a flash of light. If you get it, email it to me.” I was like “If they have a picture, I want it.”

Yeah, I want it. I want to frame it. I want to put on my wall.

It’s gotta be framed.

Justin’s the only person who will be like “Hey, I could get out of the speeding ticket, but I want the picture, so here’s my contact information. Please come after me, because I just want the picture of it.”

My first speeding ticket. It’s fine. It’s great.

I drove on the wrong side of the road, and all I got was this speeding ticket…

This is also the man that has pictures of his broken arm on the Internet.

Well, I had to put it somewhere…

What else is the Internet for…?

Break: [00:21:16.24]

Austin, I think we started this conversation on BlueSky as you were ranting about security…

And it was a while ago, and I don’t even remember what the context was, but I remember it was enough to say you wanna come on the show and talk about it.

Yeah. I think my actual rant was something around the biggest obstacle to actual security is actually just security policy and controls. It’s super-interesting to me, because if you think about – to extend the metaphor a little bit, right? Or by way of example: password policies. Everyone knows “Oh, you have the password. You can’t use the same password twice. It can’t have your date of birth in it, it can’t have your name in it, it can’t have all this stuff in it.” And for years, we just all kind of accepted “Okay, my password is gonna be this one random string I memorize, and then every month I increment something at the end, or I add a number to it, or whatever.” And it doesn’t actually increase security, because I’m using the same password everywhere. And so slowly, over time, the standards change, and now NIST says “Oh, you don’t actually have to do – like, password rotation every month for users is not recommended.” But I think you see this in so many areas of security, especially. You give someone, it’s like “Oh, we wanna make sure that people have super-locked down corporate devices, so we put all these agents, and we have all of these timeouts” etc. And what do people do? It’s like, they want to get stuff done. They see the security controls as an obstacle to getting stuff done, so they just completely bypass the security controls. They use personal devices, they use shadow IT etc.
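Editor's sketch: the password point above can be made concrete with a few lines of Python. This is a minimal illustration, loosely following the direction of NIST SP 800-63B as described in the conversation (length and a blocklist matter; composition rules and calendar-based rotation do not). The names `COMMON_PASSWORDS`, `check_password`, and `needs_rotation` are hypothetical, and the tiny blocklist stands in for a real breached-password list.

```python
# Minimal sketch of a password check without composition rules or forced rotation.
# COMMON_PASSWORDS is a tiny hypothetical stand-in for a real breached-password list.
COMMON_PASSWORDS = {"password", "123456", "qwerty", "letmein"}

MIN_LENGTH = 8   # minimum length for a user-chosen password
MAX_LENGTH = 64  # this sketch caps at 64; verifiers are asked to allow at least that many


def check_password(candidate: str) -> list[str]:
    """Return a list of problems; an empty list means the password is accepted."""
    problems = []
    if len(candidate) < MIN_LENGTH:
        problems.append("too short")
    if len(candidate) > MAX_LENGTH:
        problems.append("too long")
    if candidate.lower() in COMMON_PASSWORDS:
        problems.append("found in common-password list")
    # Deliberately absent: "must contain a digit/symbol" rules and scheduled expiry.
    return problems


def needs_rotation(days_since_change: int, known_compromised: bool) -> bool:
    """Ask for a new password only on evidence of compromise, not on a calendar."""
    return known_compromised
```

A long passphrase such as `"correct horse battery staple"` passes with no problems reported, while `"password"` is rejected by the blocklist even though it meets the length requirement.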

I’m not saying that those people should do that. I don’t work in security. And you should never take my advice about it.

But I think it’s actually super-instructive to the rest of how we think about software systems and systems in general, is there’s so many best practices and things that we do and say where it’s like “Well, you just do it this way. You just do this thing, and that’s gonna solve your problem.” But the more friction that those things add to the experience of whatever you’re trying to actually accomplish, then the more people will cut around them in order to make it easier for themselves to do what they actually want to do or need to do.

And it’s not just security. It happens in observability. It happens in CI/CD. It happens in DevOps. It happens in anything you wanna point to. The more controls you put in a way, if you’re not thinking about “How do these actually fit into how work gets done?”, then you’re gonna run into people bypassing the controls.

That’s so true. Going from a solutions architect to actually being a developer in production, it is wild. When you’re trying to give people best practices, and then seeing how things work in production, and how… You want to do best practices, but at the same time, there’s so many things that don’t work the way that best practices and architectures are usually diagrammed. I almost wish people put more thought into that, because they make it really hard when they kind of imagine these controls, or these – I don’t know how you’d explain it, but these different advice.

What the ideal world would look like if we could implement this thing.

[00:28:03.13] Yes. It’s almost doing it in a world that doesn’t actually exist, and I’m like “Can we get something that’s a little bit more reality-based, so that way we can actually follow those rules?”

I mean, a lot of it is just intro to physics stuff. When you think – go back to high school or college, and you think about physics education, where it’s like, okay, assume a frictionless plane, and then do this math, or this formula. But it’s like, okay, but in the real world I can’t assume that. And the reason that we say “Assume we’re frictionless”, when we’re teaching people, when we’re educating, I think a lot of times we do want to take that complexity, or we want to take these extrinsic forces and sort of say “Okay, set that aside for a moment.” And the hope is that by taking that part away, by taking all that complexity away, we can get at what’s really important, like calculating velocity, or these other really core foundational concepts.

The problem is that in software especially, I feel we kind of – we miss that second part, which is why do we assume a frictionless plane? We assume a frictionless plane because if you don’t get that higher-level understanding – the friction, the plane is not actually important to the concept you’re trying to get through, like how to calculate the velocity, or decay, or whatever, based on mass. And if you don’t understand that concept, then you’re never going to get to the point where adding in the friction and adding in hills and all that other stuff is going to actually make a difference.

But then we go into the software world and we start out by assuming everything must be a frictionless plane. It must be a perfectly spherical cow. Security, observability, logging, deployment, whatever you want to point at, we package all of our lessons up in this way that is completely ignorant of the context surrounding how work gets done at organizations, how different one system is from another, even systems built in the same tech stack, and what are the things that are not just driving value, but the decisions that lead to what gets prioritized, what drives value, not just for a customer, but what drives value in the engineering organization. What is the thing that gets you promoted, or gets you a good review, or hits your OKR. All of that - I think that’s the plane. That’s all the extrinsic forces bouncing up against our best practices. And unless we kind of think about that as engineers and as people working in production or people talking about how you should work in production, then we’re just going to keep running into this problem of telling you all the 101 stuff over and over again, and then when it doesn’t work, being like “Well, you didn’t do enough best practices.”

“You’re doing it wrong.”

“You should have best-practiced harder.”

I think that’s the most frustrating part about being an engineer. There’s so many things that will teach you the concepts of writing code, or the concepts of doing general things, but there’s so many more nuances and contexts that you need to actually write production code at scale… And we do a horrible job at teaching people that, and just giving people the documentation. And sometimes you run through documentation and you’re like “Did you even do this to see if it worked before you released this on the internet, and told people this was going to be how people use your product?” Some of the most successful databases are successful because they have good documentation and it’s easy, not necessarily because it’s the best tool for it. You know?

Especially docs. They make so many assumptions about – they all have to focus on an individual, because there’s one person reading the docs… And I had this problem over and over again at Amazon, where all of our docs were like “You should be able to go do this thing and deploy it to Kubernetes.” And I’m like “Wait a minute… This works for me at Amazon, because I’m an admin on this AWS account.” Go to any other company of someone trying to do this and they’re gonna have to go email four people to get access to do that thing that you just told them “Oh, this will be easy.” Like “This is one step.” I’m like “No, no, no.” Let’s break this up into “I have your admin account credentials and you have this amount of access to a cluster. Now go follow your docs and tell me where it’s going to fall apart”, because you have to switch back and forth to an organization complexity of like now there’s four teams.

[00:32:12.09] One manages the database, one manages network, one manages the cluster, one’s AWS account. Now how are you doing this? None of that practice works anymore, and that’s exactly also what you’re talking about. Now I have to make a friend over on that other team and say “Hey, can you do this for me? I’m not gonna file a ticket. I’m not gonna go through all the steps. I just need to get this done.”

I think there’s a really interesting corollary here with stuff like Open Policy Agent in the – and again, I’m not a security person, but OPA, in my understanding, is a way to write manifests about security policies for code. And I think there are analogs to this from various clouds. And it’s actually - it’s a cool idea. It’s basically like, okay, here’s a manifest file that describes all the different sort of security stuff for this service… And what ports should it talk on, and what routes should it have, and what are the ACLs etc. But when you get down to it, a lot of it looks like a bunch of security people kind of went and looked at all this code that the developers were writing and said “Hey, I don’t really understand all this code. I want to take all the things that I care about, I want to put them in this little security box over here, and then I want to write tooling that interacts with the security box. And then I’m gonna make all the developers figure out how to translate what they’re doing into things that fit in my security box.”
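Editor's sketch: the "security box" idea described above, a manifest listing a service's allowed ports and routes plus tooling that checks against it, can be illustrated in plain Python. This is not Open Policy Agent's actual policy language or API; the manifest fields and the `violations` function are hypothetical names chosen only to show the shape of policy-as-code.

```python
# Hypothetical manifest for one service. Field names are illustrative, not OPA syntax.
MANIFEST = {
    "service": "checkout",
    "allowed_ports": [443, 8443],
    "allowed_routes": ["/cart", "/pay"],
}


def violations(manifest: dict, observed: dict) -> list[str]:
    """Compare what a service actually exposes against what its manifest allows."""
    found = []
    for port in observed.get("ports", []):
        if port not in manifest["allowed_ports"]:
            found.append(f"port {port} not allowed")
    for route in observed.get("routes", []):
        if route not in manifest["allowed_routes"]:
            found.append(f"route {route} not allowed")
    return found
```

The friction the speaker describes lives outside this function: someone on the development side still has to translate what their service does into the `observed` structure the security tooling expects.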

And in a lot of ways, I think that the legacy of sort of the cloud transformation has been that, in a lot of ways, where we build everything on top of abstractions, but we try to make the lines between those abstractions a little less gray, a little less porous than maybe they would have been in the past, so that we can say “Okay, build and deployment - those are CI/CD problems. Those are DevOps problems, and they go in the DevOps box, and the DevOps people use this stuff. And security stuff - that all goes into the security box. And the dev stuff goes into the dev box.” And in doing so, we’ve kind of lost the entire lesson of agile and DevOps to begin with, which was it’s about working code, over process. It’s about responding to needs. The entire idea of DevOps is –

Reducing friction.

Reduce friction. Like, you should own your code as a developer, and you should own it on its entire journey into production, and even after production. I wrote a blog about this a while ago… I think I called it “The commodification of DevOps.” And SRE is going through this similar sort of thing, where you’re seeing now – you see AI-powered SREs, that you just turn all these AI agents open on your cloud account and they’ll do all this stuff for you. Every single movement in software engineering and software design and architecture and a lot of this stuff is really this super-interesting cycle of people, human beings get this great idea of “Hey, what if we didn’t have all of these policies and procedures, and we didn’t put everything into these boxes and we just said ‘We’re going to make good software and we’re gonna ship it to people and we’re gonna own that entire process’?” And then them doing that and getting good results, and then someone coming in later and saying “But what if I could make money off of this process that you have created?” And coming up with things like the Deloitte agile train, coming up with AI-based SRE, coming up with observability in a box.

Someone will always make money off your complexity.

Right. Anytime you introduce abstractions, anytime you create value, someone will try to multiply that, and will capture that value for themselves.

Charity I think has always said this, where tools make borders. It’s like, if this team’s using this tool and that team’s using that tool, you now have two different teams.

Which is so weird, because as a solutions architect, we would talk to companies, but there were a lot of big companies that you couldn’t – the two teams could not talk to each other. They couldn’t know that they were both using the same database, or different databases… And I’m just like, so you will pay enterprise support money to get advice when you could just talk to each other within your own teams…

[00:36:06.05] Oh yeah, no, absolutely. Consultants make so much money just because they can transcend the borders of the teams.

Yes. It baffles me. I think abstraction and complexity are all part of the game, and sometimes they can make things better, but it’s wild how we are creating our own abstraction and loss of context. Even some ways that we use automation is loss of context, you know what I mean? There’s so many ways that we’re losing context with abstraction, automation, and just stupid rules that we’ve put in place. And it’s wild, because humans made this worse, for other humans, intentionally. I don’t understand.

So just by way of background, I used to work for a company called LightStep, that a couple years ago got bought by ServiceNow, and it was actually super-interesting… I stayed there for a couple of years, and it was super-interesting getting to talk to ServiceNow customers. Because if you’re familiar, ServiceNow is a big enterprisy place, and it sells to big enterprisy places. Fortune 100, massive multinational orgs. And talking to them about observability was super-fascinating, because they would be talking about like “Oh yeah, we have incidents –” It’s like, their incidents would run for weeks. It wasn’t like “Oh, this is an incident, and we fix it in an hour or a day.” It’s like “No, this has been going on for like two months.” And I’m like “Whaaat?”

“We’re working towards our first nine. We’re going to get there.”

And then you go and it’s like “Well, talk to me. How does it happen? How do you solve these?” And the most illuminating thing was talking to someone and they were talking about like “Yeah, so –” This was a telecom company, major telecom. And I’m like “Well, when these things happen, who’s getting alerted, who owns the problem? Who’s getting alerted?” And they walked me through, they had this very, very comprehensive incident response process, where it’s like “We have incident commanders, and these are all the different steps.”

And it’s like “Okay, I understand you have this great process. Cool. But it’s still taking you months to resolve things.” How are people getting information? How are people getting data? And it’s like “Well, yeah, that’s part of the problem, because the people that can actually change the code don’t have access to the logs.” I’m like “What?” And then coming to find out the process involves teams in different parts of the country, different parts of the globe, where literally because of various security and access controls, would go into – one team would push a change, and then the next day another team would go pull logs, redact them on their laptop, and then send snippets of those logs in a Teams message back to the first team. And that’s how they’re sharing data. Because the access controls won’t let them do it any other way. And it’s like, “I’m not entirely sure your problem is one of technology here… Just going to be honest.”

But process is so hard to change. This is a heavily regulated company. Changing these processes would require probably tens or hundreds of months in terms of work, across hundreds of employees across the globe, just to make these little changes. And it’s like “Oh, gosh, this is a much bigger problem.” This is not something that a tool fixes.

Yeah. Well, but people are trying to buy DevOps, right?

[00:39:53.24] They are. They would love to buy DevOps.

How do you then lead them into “OpenTelemetry is going to solve a problem for you”?

So the thing that I actually got out of that entire experience was that what fixes this in a lot of ways are standards. Technology itself – you can’t buy a tool and expect the tool is going to solve these kind of very thorny problems, because tools come with their own language. Tools make their own silos. But if you can get everyone speaking the same language, and you can get everyone kind of speaking the same dialect, and you can get everyone kind of aligned around what’s important, what data do we need to figure out those important things, how is that data structured, how is that data meaningful, and can we have a standard for what gives the data meaning, then that actually goes a long way towards solving the problem, because everyone is kind of – instead of being in their 40 or 50 different silos, now we’re all talking about the same thing. And just being able to talk about the same thing and having that shared layer of understanding about what data is available, what format it’s in, what it actually means, unlocks a lot of potential solutions for us to get past these kind of human problems.

Because if everyone just gets to go and do it on their own, if everyone’s creating their own logs, everyone’s creating their own metrics, their own traces, their own whatever, then you have these understanding gaps that you have to bridge. Because my logs are going to be whatever and your logs are going to be whatever. And if we kind of give everyone the ability to go and do it on their own, then my team’s going to make a metric that kind of looks good for us, and then your team’s going to make a metric that looks good for you, and everyone’s going to have their vanity metrics that they talk about… OpenTelemetry steps into this as a way to unify these various things. It says, “Hey, here’s a standard way to structure observability data. Here’s a standard API for the instrumentation of code. Here’s a standard set of semantic values that tell you what the data you’re emitting means, and here’s this whole ecosystem of tools, open source, commercial, whatever, that it plugs into to help you analyze and better understand your systems.” And that, I think, is the promise of OpenTelemetry, is that in the future everything’s using OpenTelemetry, and you don’t have to think about what logging API I’m using, or do I have a metric or do I not have a metric? It just exists because Otel is built in. And if Otel is built in, then you’re getting this stuff for free. And that’s the promise. And that is why I work on it, and I’m really excited about it.

Now - I mean, there’s been standards before, right? I mean, intentional or accidental standards of things like Apache log format, right? The Apache log format, and Nginx log format… Those are assumed standards for a lot of people, because everything else just conformed to this not great happenstance way of accessing and grepping for logs. And it’s like, I had regex memorized for years on how to pull out certain things from every single one of those, because everything emitted a similar type format. But at the end of the day, that ends up as some dashboard that someone’s like “Oh, now I’m validating myself against the dashboard.” But then once another team is like “Oh, we want to use the same data, but I care about different things”, they fork the dashboard. And then we had a different conversation still. We’re like “Oh, at the end of the day we’re not actually talking about Apache logs. We’re talking about the metric that you care about versus the metric I care about, which was more important.”

Yeah. I do think there’s a couple of important differences in how Otel kind of approaches this. One is we have this idea of telemetry being kind of sparse, right? And being highly contextual. So what I mean by that is we don’t want to necessarily emit a bunch of telemetry about the same thing that means that it’s completely identical to each other. We want telemetry to layer on top of itself, and when we turn it from these three pillars - people might have heard of the idea of traces, metrics, logs, the three pillars of observability. The Otel concept is more like “What if all this telemetry was just kind of this one interlinked braid, where it was all self-referential, where my metrics refer to my traces, which refer to my logs, which refer back to my metrics, and I can have this kind of shared context value at a technical level that helps me associate all these different things?”
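As a rough illustration of the "interlinked braid" idea, the sketch below uses plain Python dictionaries rather than the real OpenTelemetry SDK. The record shapes, the `trace_id` field, the metric name, and the `braid` helper are all assumptions invented for this example. The only point is that a shared context value lets a span, a log line, and a metric sample be looked up together.

```python
from collections import defaultdict

# Hypothetical, simplified records; real OpenTelemetry data is richer than this.
spans = [
    {"trace_id": "abc123", "name": "GET /checkout", "duration_ms": 840},
    {"trace_id": "def456", "name": "GET /health", "duration_ms": 3},
]
logs = [
    {"trace_id": "abc123", "body": "payment provider timed out"},
]
metric_samples = [
    # A measurement that remembers which trace produced it.
    {"trace_id": "abc123", "name": "request_duration_ms", "value": 840},
]


def braid(spans, logs, metric_samples):
    """Group every signal by the trace ID it carries."""
    by_trace = defaultdict(lambda: {"spans": [], "logs": [], "metrics": []})
    for span in spans:
        by_trace[span["trace_id"]]["spans"].append(span)
    for log in logs:
        by_trace[log["trace_id"]]["logs"].append(log)
    for sample in metric_samples:
        by_trace[sample["trace_id"]]["metrics"].append(sample)
    return dict(by_trace)


linked = braid(spans, logs, metric_samples)
slow = linked["abc123"]
print(slow["spans"][0]["name"], "|", slow["logs"][0]["body"])
```

Starting from the slow span, the matching log line and metric sample are one dictionary lookup away, which is the kind of pivot the shared context value is meant to enable.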

[00:44:06.21] And then also, going a step further, of saying “Okay, there’s that core base level of telemetry that all my stuff should emit. But then I do need ways to process it. I do need ways to transform it for different use cases. Maybe some of it needs to go into different – a security team wants something different than an app dev wants, which is something different than a front end dev wants, which is something different than a DevOps team wants, whatever. That data needs to go to different places. That data might need to be transformed in certain ways as it goes to those different places.” But as long as we’re all working off the same base level, and if we’re able to separate those concerns, or we’ll say “The telemetry is down here, and then these kind of telemetry pipelines are one step above it”, then it becomes much easier to kind of satisfy everyone’s needs while not fundamentally changing the nature of what we’re doing.

An important part of Otel is the idea that we support lossless transformation. So when you create a metric in OpenTelemetry, you can actually translate that metric into other forms. So you can change from a delta, you can change how it’s aggregated, or how it’s displayed, using views, using other kind of techniques… And that is not a lossy transformation. You can actually work that backwards, depending on how you have your stuff set up.
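One way to picture "you can work that backwards" is the relationship between delta and cumulative reporting of the same counter. The sketch below is plain arithmetic, not the OpenTelemetry view mechanism. The function names and sample values are assumptions for illustration, and the round trip only holds if every data point is kept and the counter never resets.

```python
from itertools import accumulate


def deltas_to_cumulative(deltas, start=0):
    """Turn per-interval increments into a running total."""
    return [start + total for total in accumulate(deltas)]


def cumulative_to_deltas(cumulative, start=0):
    """Recover per-interval increments from a running total."""
    previous = [start] + cumulative[:-1]
    return [current - prior for current, prior in zip(cumulative, previous)]


# Hypothetical request counts reported once per interval.
request_deltas = [5, 0, 12, 3]
running_total = deltas_to_cumulative(request_deltas)
round_trip = cumulative_to_deltas(running_total)

print(running_total)
print(round_trip == request_deltas)
```

Dropping intermediate points, or rolling them up into a coarser interval, would break the round trip, which fits the caveat that reversibility depends on how things are set up.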

Break: [00:45:25.10]

When I think of Elasticsearch, and it was like “Oh, I can make metrics from all of these logs that I’m gathering, and I can then roll those up into a metric”, and I’m going the opposite direction where I’m taking this really verbose, low-level thing into like “I want a number now”, Otel’s not doing that, right?

Right. Otel isn’t doing that part for you. What it’s doing is it’s giving you structured data that has all of the information you need to do that yourself. So either you as someone creating an observability platform… A good example of this is at Honeycomb we support OpenTelemetry, and we accept all this OpenTelemetry data. Now, on the backend, Honeycomb is kind of designed around this concept of arbitrarily wide events, which is not – like, Otel gives you a span. It gives you a metric, it gives you a log message. So what we can do, though, is we can use those hints that are in the data. We can look at the structure as it comes in and we can do interesting tricks to turn metrics into things that are better able to be queried in Honeycomb, by looking at that metadata that OpenTelemetry is giving us.

And other people could do this too, right? Someone could go and say – just hypothetically, let’s say a Grafana or whoever had a metric database, and wanted to only allow you to run valid queries on certain types of metrics, not letting you do a rate for a gauge, or whatever, or not letting you do certain transformations that wouldn’t make sense based on the type of metric it is… Then they could actually enforce that by looking at the hints from Otel about “What is this structured data about this metric? Okay these are the valid query types for that.” Now, not a lot of people do this. In fact, I’m not sure anyone actually does this right now… And that’s totally natural. It’s going to take a while before the analysis side kind of catches up, I think, to what Otel is providing… Because Otel is a very low-level sort of foundational part of this, and I understand why vendors and other people aren’t necessarily trying to build on top of this fast-moving open source project. Totally valid.
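The hypothetical above, a backend refusing queries that make no sense for a metric's type, could look something like the following sketch. Nothing here comes from a real query engine or from OpenTelemetry itself: the kind names, the operation names, and `check_query` are all made up for illustration.

```python
# Hypothetical mapping from metric kind to the query operations that make sense for it.
# Neither the kind names nor the operation names come from a real query engine.
VALID_OPERATIONS = {
    "counter": {"rate", "sum"},
    "gauge": {"avg", "min", "max", "last"},
    "histogram": {"percentile", "count"},
}


def check_query(metric, operation):
    """Raise ValueError if the operation is not meaningful for the metric's kind."""
    allowed = VALID_OPERATIONS.get(metric["kind"], set())
    if operation not in allowed:
        raise ValueError(
            f"{operation}() is not valid for {metric['kind']} metric {metric['name']!r}"
        )
    return True


requests_total = {"name": "requests_total", "kind": "counter"}
cpu_temperature = {"name": "cpu_temperature", "kind": "gauge"}

print(check_query(requests_total, "rate"))
try:
    check_query(cpu_temperature, "rate")
except ValueError as error:
    print(error)
```

The check relies only on metadata that travels with the metric, so it can run before any data is scanned.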

So when I think about what the future looks like for Otel, it actually doesn’t look a lot like today. It looks more like thinking about what the next sort of generation of observability tooling looks like, which is a super-interesting question that I don’t think we’ve seen a lot of people start to explore yet.

I think observability and metrics are so underrated, and trying to troubleshoot and really have good insight into what you’re building… When you’re trying to gather all these different metrics and data and metadata, did you run into a lot of issues with trying to keep it secure, but also to get all that data and put it in one place?

It depends… I think there’s such a tension between gathering the data you need to actually understand your system in production and understand what’s happening, and give you enough fidelity and resolution on that data to be able to kind of pinpoint problems, and also not collecting too much, right? Because if you get too much data, then - the way I like to talk about it is there’s really only kind of three states a system can be in. One is everything is absolutely fine, which never happens, but everything’s absolutely perfect and we don’t care. Or, conversely, everything is broken, and we also don’t care, because everything is broken.

Now, most of the time, over a majority of time, no system is in a “Everything is perfect” or “Everything is broken” state. You’re always in the middle, especially in a distributed system or a decentralized system. Some services will respond more slowly than others. Errors will just happen. These are things that we have to program around. We have to code defensively, and have backoffs, and retries, and pressure relief valves etc. So what is the right amount of data? It kind of depends on a lot of factors that have less to do with the data and more to do with “What do you care about and prioritize as an engineering team? What are the things that you are optimizing towards?”

[00:52:18.26] If you’ve read Charity and Liz’s book on observability engineering, it talks a lot about measuring things from the perspective of the customer. Who is your user? It’s their experience you should care about. And I really like that framing, because why else are you building software, if not for people?

I think people really forget that. We get into building the cool thing, or building something different, or what we want, and not thinking about who your actual audience and customer is.

What was your biggest challenge when building OpenTelemetry, as far as a technical challenge? Or even if it’s not technical.

Well, I’m not going to take credit for building OpenTelemetry. It’s definitely been a huge effort by hundreds and thousands of contributors, honestly. We’re the second biggest project in the CNCF behind Kubernetes. Last time I checked, we’re one of the top 25 open source projects on GitHub, just in terms of size… Which is incredibly rewarding and validating just to see how much interest there’s been.

I think the hardest part of any very large project is just kind of keeping everyone motivated and going in the same direction, especially when you’ve got something that’s – if you think about the scope of OpenTelemetry, we have APIs and SDKs in 13 different languages, we have this kind of tooling ecosystem built in Go, we have an overwhelming amount of documentation that needs to be written and updated, we have a really vibrant community, but communities need to be nurtured, and you need to give people ways to grow in that. I think definitely the hardest part has been governance and trying to keep all that energy positive, and going in the right direction, making sure that when people come to work on Otel, Otel is their first team, and not whoever they work for, especially when you’re dealing with – in the observability space, there’s a ton of companies…

Cross-company requirements, and demands, and “We’re doing this for the projects.” Yeah.

Right. Making sure that everyone kind of thinks about OpenTelemetry first, and not their own employer.

As a systems person, I always saw Otel as the – it’s the APM replacement. It’s a thing that the devs do in the apps, and it’s not for my systems. I know we want logs, but usually I see that being talked about as “I want application logs.” I don’t care about system logs.

I know there are Otel agents or things that emit data from a system level. Is that still the intention? Are people implementing systemd unit files with Otel in them built in? Because there’s no such thing as a span on a systemd service, right? So I have different things that I can even do. Where is that overlap going to be? Because when the dev says “Hey, my app is getting this. In Otel I have this metric, it’s bad”, I still need to correlate that a level below into the OS.

So absolutely, OpenTelemetry is intended for both application and system-level telemetry. We have a tool called the Collector, which is kind of a Swiss army knife that will suck data up from almost anything, including journald, and it will get host metrics, host logs… If you have files and you want to point it at files, it’ll slurp those up…

[unintelligible 00:55:38.19] Just get it and send it all over, right?

[00:55:43.01] Right. It’s very customizable. One thing that we can do is that we can try really hard to associate – in a collector we can actually associate those particular application-level metrics and logs and traces with system-level data as well. So either directly, if we have a context value in both places, like a trace ID or something, then you can associate that way. But we also have ways to kind of associate this concept called a resource. So in Otel you have these special attributes that are called resource attributes, and they identify unique sources. So what you can do is you have – imagine I have a Kubernetes cluster, or just a fleet of EC2 VMs, and I’ve got a hundred instances of my application running. By looking at those resources and kind of aggregating across resource, I can see “Okay, yes, there was this problem in the app, and I see which node it came from, which pod it came from”, and then I can also see “Okay, what were the system metrics and logs and so on and so forth from that resource?” And then if I have tooling for it that can kind of give me a unified view of this, then I can really easily pivot between those two things.
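The pivot described here, from an application problem to the system metrics of the machine it ran on, can be sketched with plain Python. The attribute keys `host.name` and `service.name` follow OpenTelemetry's resource attribute naming, but the record shapes, the values, and the `pivot_by_host` helper are assumptions invented for this example and not part of the Collector.

```python
from collections import defaultdict

# Hypothetical records. "host.name" and "service.name" follow OpenTelemetry's
# resource attribute naming, but the values and record shapes are invented.
app_errors = [
    {"resource": {"service.name": "checkout", "host.name": "node-7"}, "message": "upstream timeout"},
    {"resource": {"service.name": "checkout", "host.name": "node-2"}, "message": "upstream timeout"},
    {"resource": {"service.name": "checkout", "host.name": "node-7"}, "message": "connection reset"},
]
host_metrics = [
    {"resource": {"host.name": "node-2"}, "name": "cpu_utilization", "value": 0.31},
    {"resource": {"host.name": "node-7"}, "name": "cpu_utilization", "value": 0.97},
]


def pivot_by_host(app_events, system_metrics):
    """Line up application events with system metrics from the same host."""
    view = defaultdict(lambda: {"app": [], "system": []})
    for event in app_events:
        view[event["resource"]["host.name"]]["app"].append(event["message"])
    for metric in system_metrics:
        view[metric["resource"]["host.name"]]["system"].append((metric["name"], metric["value"]))
    return dict(view)


view = pivot_by_host(app_errors, host_metrics)
# The host with the most application errors, alongside its system metrics.
noisiest = max(view, key=lambda host: len(view[host]["app"]))
print(noisiest, view[noisiest]["system"])
```

Because both kinds of record describe their source with the same attribute key, no trace ID is needed to line them up; the shared resource does the work.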

Again, this is something that the tools need to kind of support. You have to have an experience for this, either in a tool you’ve built yourself or something you’ve kind of stitched together. We do provide the raw data, but it’s up to consumers to kind of figure out how to take that raw data and to get it into a usable form, or do analysis with it. I shouldn’t say usable form.

JSON is very usable, right? I’ve got jq, we’re good.

Yeah. The data is usable, it’s just, it’s heavy data. It’s dense. I think that’s one of the biggest complaints we get about Otel, and I don’t think it’s wrong, is that it’s super low-level. And it’s something that we hear a lot, and we’re trying to – it’s something we think a lot about as a project from a governance perspective, and also at an individual language level, is “How do we make it easier to use?”

One thing that we started recently is this new special interest group on developer experience, which is really aimed at “Okay, what are the things that we need to do as a project to make OpenTelemetry easier to use as a developer, or as a systems administrator?” Where are we missing the mark? Because the goal is that Otel should be native to your system. It should be native to your framework. If you’re using ExpressJS, or you’re using C#, or whatever, it should just kind of be there.

And so we have to write at a low level to integrate at a low level. But a lot of people, when they think about observability, they don’t think at that low level. They think more at the “Oh, I just dropped this agent in, and it does all this stuff for me.” And we have stuff that is kind of drop-in agent and it does all this stuff for you, but it’s really more of a – that’s like a bootstrap. That’s like we had to do this for people to actually use it, because nobody’s going to go through all the trouble of writing all this instrumentation themselves. They need the Easy button.

The Hello World for observability.

And honestly, I do think there’s a pretty good argument that if you don’t have anything, then something is better than nothing. And so having an Easy button that’s just like “Yeah, this shows you all of your database calls, and your HTTP calls…” That is really useful, especially if you didn’t have that before.

I think it’s important to have different levels of easy, and then more customizable, so people can kind of use their use case in their understanding of your product.

Even going back to remembering the first time that I ran dmesg on a Linux host, and I’m just like “Oh, it’s–” I wasn’t doing anything on the server, and it’s doing a lot of stuff. I don’t know what it’s doing, but it’s doing it.

Yeah, there’s a lot going on. That’s everything in software. It’s the good and bad part about… I reflect sometimes - like, when I was a child, when I was 10, when I was younger than that, I learned BASIC, and I wrote a little database in BASIC to keep track of my baseball cards. And it was fun. And I got older and I wanted to get – I had a Macintosh and I wanted to do GUI programming.

[01:00:05.00] And so I got Prolog books, I got C, C++ books, and I tried to learn this, the GUI stuff and the event loop, and I was just… Oh boy, that did not go well. But it was because it was so – it’s like, things were hard. Even with the libraries and the system-level stuff, [unintelligible 01:00:21.04] gave you, you had to do quite a bit of work yourself to write a GUI. And now it is a billion times easier.

And they work a lot better.

HTML, CSS, all this…. If I wanted to write a graphical program, I don’t need to learn super-low level [unintelligible 01:00:40.03] whatever. I can just create a div and say “Oh, here’s this Tailwind class”, and boom. I have a box. That makes it easier to use, and that’s great… But all of that creates abstractions that we might not necessarily need to completely understand what’s going on under the hood… But what is going on under the hood does matter, and it can impact performance and it can impact our people having a good experience. And so there’s a really fine line, I think, with instrumentation and telemetry of how far down do you really need to go to get the data you need to do the analysis to figure out “Are people having a good time”?

Going full circle there with your GUI Mac apps… Your website is awesome. Aparker.io is a classic Mac-looking WordPress thing. It’s on top of – it’s a theme you have that looks like a classic Mac. And we had Dave Eddy on a few weeks ago, you could curl his websites… And I love promoting anyone running their own websites, because I think that that’s still the backbone of the internet, is people that have a website and they update it occasionally, and they put some content there. Whatever it is that you want to do, you just do it.

Your website is so cool.

Actually, I’m in the middle of my – my spare time project has been rewriting that. I started doing this before the entire WordPress thing started happening… But just to go even more full circle, Justin, we started this conversation – I am here because of a conversation on BlueSky, the world’s… Hold on - currently maybe America’s number two social network in the app store. I need to check. Yeah. Interesting times in text-based social media. But one thing that’s super-cool to me about BlueSky is that it’s built on this thing called AT Protocol. And AT Protocol - there’s two things I really love. One is that it uses domains as handles.

Identity.

Yeah. Your domain is the cornerstone of your identity. And it also is very big on “You should have control of your content online.” And so I’ve been rewriting that blog to run off of an AT Proto PDS, personal data server. And it’s actually been – it’s super-interesting, because it’s, one, always fun to try out new things… But two, it gives you this really interesting vision of the future where it’s like, I could have a single server that this is my identity. This server has all these different things, and it speaks all these different protocols. If I talk to it via HTTP, I get a website. If I talk to it via some other protocol, I get a microblog. And you can verify all this stuff because it’s all cryptographically signed. And so if someone sends you a link to something, you could prove that “Yes, this is a blog that I wrote.” And if I edit it, you could verify that it’s been edited.

I don’t know, everything’s been very static, I think, in terms of the internet for quite a while… And I’m starting to believe that people are kind of seeing that the very managed sort of corporate internet that we have all been using quite frequently over the past 20 years isn’t really sustainable. That you can’t actually run a business off of just selling ads. I mean, you can, but…

[01:04:11.07] I mean, but also, just owning some things on the internet makes it more fun.

Yeah. If you own stuff, if you have control of stuff, it’s good. That’s the internet. You should have control of stuff.

And with that comes responsibility.

You have to be responsible for what that looks like on a global network of things.

Yeah. I think we have lost that spirit of – you think back to the ‘90s or 2000s and you go to someone’s GeoCities site and you would see just amazing things. It’s why I apologize to anyone that’s really big into the idea of decentralized finance, but it’s why when you heard about people talking about the Metaverse and buying virtual real estate - I’m like, this is so antithetical to the internet. The point of the internet is that you don’t run out of space.

Yeah, don’t make it scarce. Except for IP addresses. That was just a bad –

Yeah, well… Yeah. We’ve got V6 now. We’re good. People trying to impose scarcity, I think, on the idea of space on the internet is just very – it doesn’t sit well with me, and it’s why I’m so excited about stuff like AT Protocol, and this idea of a web and an internet that you invest in. This is your space online. This is your part of this global network of human beings, and you aren’t just renting time from some large company, or a billionaire, right?

I’ll be glad to be able to invest in one place and not have to worry about losing everything every time it’s hijacked by some rich guy.

Exactly.

I mean, the most important thing of that is the place should be the thing you own. You’re not investing in someone else’s place. You have to buy a domain, and you invest in your space, and just as Austin said, if you own a website, you are constantly in a state of rewriting it, or moving it, or doing something else to it… Because that’s why it’s fun.

Sometimes I think I’ll never get it done though, if it’s all mine.

That’s fine. It doesn’t matter. It’s fun.

Yeah. And I also want to say, obviously, that’s not for everybody. I think it’s a great thing. I do think that there should totally be on-ramps for people that aren’t interested in that. But conversely, not everyone needs a project car. You can go buy a Corolla and get around.

Some people are just fine with Uber all the time.

Yeah. Some people are fine with Uber or public transit. We have to support everyone in their journey, but I think that in a lot of ways that spirit of “This is my little corner of the Internet” is what needs to come back. I don’t see it necessarily as a – I see it as a spiritual thing more than anything else. Take it or leave it, but it really is – you have to have some level of investment. The Internet is a global network of human beings, and like any network, it is a community. And you get out of a community what you put into it, right? Don’t be a bad neighbor and throw your branches up against someone else’s fence. I don’t know.

I feel that. Where can people find you?

You can find me around the world at various events… You can find me mostly online on BlueSky, @aparker.io. You can find my blog at aparker.io.

What do you know? The same thing.

Yeah, the same thing. You can find me on GitHub at Austin L. Parker, and I’m on various Slacks and other places.

We’ll have some links in the show notes. Thank you so much, Austin, for coming on the show.

Thanks for having me. This was super-fun.

Thank you. It was nice talking to you.

Great talking to you both.


Our transcripts are open source on GitHub. Improvements are welcome. 💚
