Ship It! – Episode #130

Hosting Hachyderm

with Preston Doster


Preston Doster joins the show to tell us what it takes to run a Mastodon server with 55,000 accounts and 11,000 monthly active users.

Featuring

Sponsors

Sentry – Code breaks, fix it faster. Use the code CHANGELOG when you sign up to get $100 off the team plan. Don’t just observe. Take action. Sentry is the only app monitoring platform built for developers that gets to the root cause for every issue. 100,000+ growing teams use Sentry to find problems fast.

System Initiative – The future of DevOps automation is here! System Initiative is an intuitive, powerful, collaborative replacement for Infrastructure as Code (IaC). The free tier is awesome (no credit card required) and you can get started in 3 clicks.

Timescale – Purpose-built performance for AI. Build RAG, search, and AI agents in the cloud with PostgreSQL and purpose-built extensions for AI: pgvector, pgvectorscale, and pgai.

Notes & Links


Interview

One correction about the cost to run Hachyderm: Preston said it’s about $600/mo, and after the interview he emailed me to let me know it’s closer to $1,000/mo.

Chapters

1 00:00 This is Ship It! 00:37
2 00:52 Sponsor: Sentry 02:15
3 03:12 Scaling decentralized things 00:59
4 04:10 What are Webrings? 00:21
5 04:32 How Justin met his wife 03:24
6 07:56 What does Twilio do? 02:14
7 10:10 Start of Hachyderm 02:37
8 12:47 Traditional scaling 02:27
9 15:14 Extra stress 04:02
10 19:16 When did you need a CDN? 01:00
11 20:16 Where does funding come from? 00:39
12 20:55 Nivenly's other projects 00:52
13 21:47 Nivenly spending & Hachyderm cost 02:23
14 24:20 Sponsor: System Initiative 03:31
15 27:59 The Hachyderm team 01:22
16 29:21 Redis licensing 00:30
17 29:50 Manual deployment 01:57
18 31:47 Who gets paged? 02:21
19 34:08 How many active users? 00:25
20 34:33 The next thing to fall over 02:59
21 37:32 Most painful part of infrastructure 02:12
22 39:44 Data residency 03:39
23 43:22 Why Germany? 01:17
24 44:39 Why are DMCAs an issue? 00:59
25 45:38 The intent 00:27
26 46:12 Sponsor: Timescale 02:17
27 48:33 Blocking other servers? 02:21
28 50:54 Mastodon server blacklist? 01:47
29 52:41 Performance costs of Threads 02:48
30 55:29 Threads metadata 03:42
31 59:10 Individual blocklist & migrations 05:32
32 1:04:42 Figuring out archiving 01:05
33 1:05:47 Long term risks 00:31
34 1:06:18 Always looking for volunteers 01:52
35 1:08:10 Thanks for joining us! 01:10
36 1:09:20 Outro 01:12

Transcript



Play the audio to listen along while you enjoy the transcript. 🎧

Hello, and welcome to Ship It, the podcast all about what happens after you git push. I’m your host, Justin Garrison, and with me, as always, is Autumn Nash. How’s it going, Autumn?

Hey, everyone.

Today on the show we have a conversation I’ve been looking forward to for quite a while, because I’ve been interested in ActivityPub and Mastodon and specifically how that infrastructure runs, and how to scale that, and just how the servers talk to each other… And I like decentralized things in the non-scammy financial ways. It was kind of something that I –

I love that you had to be specific about that.

I love the internet. And the internet is the most decentralized system you can get, and it scales really well, for a lot of reasons… And then I see this Mastodon thing that’s like “Hey, we can kind of do that internet thing, but like portable, on social networks, and let people connect.” I’ve been blogging for a very long time. I used to have a list of WebRing sites that I also would send people to, if people are old enough to remember WebRings…

Wait, what’s a WebRing?

WebRings weren’t like a way to communicate between sites, but they were a way to like list out sites you’re also reading. It was like a recommendation engine for people that had websites. They’re like “I have a blog. If you like my blog, you’ll probably like these other people’s blogs”, and they would link to other people. So it was like a discovery engine before things had discovery engines. It was all like personally curated, and then no one ever updated them.

You’re like my favorite old best friend.

It’s weird, because I never grew up with a computer…

I know, I’ve had a computer since I was like – I fell in love with computers in elementary school, and you’re like “I got one in college…” How, sir? Did you just not come out of your dorm room at some point? Did you just lock yourself in there and you found every – how did you have time to date Beth? It’s a miracle that you got married.

We met because I fixed her computer.

That is the cutest nerd thing ever.

I needed a job in college, and I got a job because there was a lab across the hall – the dorm room had a lab, which is what I used all day, every day, because I had no computer myself.

Can we talk about how cute and nerdy that is? You fixed her computer, and then you were married for 20 years.

I didn’t fix her computer, though. I worked on her computer.

Did you break it?

No, I didn’t sabotage it. It was a whole thing where –

He was like “She has to come back if it doesn’t turn on ever.”

I worked at the IT department. It was a student-run IT thing, and so we would repair computers. We also were one of the first colleges that had Wi-Fi throughout the whole school. And so every year when people would come, they would come with desktops. No one had laptops in 2001. And so they would come with desktops, and be like “I need to get on the network.” “We’re going to install a Wi-Fi card for you.”

You went to college when there were no laptops?

Yeah, we had a laptop rental program, so that people could rent laptops if they wanted to. And those came with optional Wi-Fi. It was like – yeah, it was the thing. But she came with a desktop that was an old MacTel.

Like, she carried a desktop into your –

Yeah, everyone moved a desktop. Everyone came with desktops. And she needed a Wi-Fi card to connect… And it was in our shop or whatever, and they were like “Oh, we need to go deliver this to the customer. We have to go bring it back to the dorm room.” And they’re like “Okay–” I was working at the time. “Hey, where is it?” And someone else knew her. Someone’s like “Oh, it’s Beth.” I was like “Who’s this? I don’t know where this person is.” “Oh, yeah, she has a shaved head.” And I’m like “I’ve seen her around. I want to meet her. Can I please bring that computer back?” And I literally was like “Let me take over this ticket.” I delivered the computer, and I did none of the work.

Okay, but this is the cutest thing I’ve ever heard of. He was like “Bro, this is my ticket. Don’t even look at the ticket.”

I’m literally – I’m pointing over on my left and I can see the computer. I still have the computer. The computer, the reason we met. I still have the computer. It is a terrible knockoff Mac, when Apple used to license their operating system.

This is like the cutest thing ever. This is the most warm and fuzzy… It’s like, okay, I believe in love again.

That is how we met. Because I – I didn’t fix her computer, I delivered her computer after it was fixed.

That is the most adorable story ever. It’s so cute. I might have like a small tear right there. It’s so warm and fuzzy. This is the cutest nerd story.

This was way before Bumble and dating apps.

Yeah. I mean, I have to fix people’s computer to find my soulmate? This is bullshit.

If you can fix someone’s printer, that might be like a new –

I want someone who can fix my printer.

When’s your 3D printer coming?

Never…

Sorry. Anyway. That’s not what we’re here to talk –

I did a chargeback, okay? I’m so bitter… Do you see the filament on the ground? Do you see it? Do you see it? It’s just haunting me.

Boxes of filament.

Tons of filament, but you’re sad, and you have no 3D printer.

You’re over there with like a blow dryer, trying to melt it into some sort of like figure… Anyway, sorry. That’s not what we’re here to talk about. Preston Doster is on the call. You are an infrastructure architect at Twilio, which is not at all what we want to talk about.

[00:08:02.29] No, not today.

I have questions, though. What does Twilio do?

What does Twilio do? So I would call us a telecommunications company. We kind of grew up by building APIs that abstract the complexity of doing things like making phone calls, or sending SMS… We’ve grown into a lot of different channels now, and into almost like the contact center space.

Like, who uses Twilio and for like what?

I’m a paying customer, just FYI.

Yeah, yeah. So we started out really marketing to engineers. So people who wanted to build apps to send, say, SMS. And what we’ve grown into is this CPaaS - communications platform as a service - really, where it’s all about – so like when you order from Uber Eats, or from DoorDash, or whoever, and you get that text that says “Hey, your dasher is waiting for your order.” If it sends it to you by SMS, there’s a potential that’s going through Twilio. Because they would use our API to abstract all that complexity. 2FA… Yeah, exactly.

The thing that I’ve been wanting - and I’ve been paying for a Twilio number for years…

Awesome, thank you.

He’s like “Thank you.”

…because I generated a number, and it’s like a buck a year. It’s like “Yeah, you keep the phone number for a year.” And it was an L.A. number, a 323 number, and it was a good… I don’t remember what the number was. I’d have to look it up now. But I’m just like, every year I’m like “Yeah, I’ll pay the dollar. I’ll keep it.” It’s like a domain name, basically.

Yeah, exactly.

I love how engineers just collect domains, and phone numbers…

They’ve got all the good ones, right?

But the thing I’ve been wanting forever from Twilio is iMessage or RCS support. And I know you can’t talk about roadmaps or anything like that, but I ended up going to another one, SendBlue. I’m pretty sure they’re built on top of Twilio, and they implemented the iMessage stuff on top of like the base number handling. But it does iMessage properly.

And so I was always like “Oh, I want RCS.” Like, I just want like straight – now that Apple’s iOS supports RCS too, I’m like “How do I get a good number RCS interaction that’s a little more portable than iMessage?”

Yeah, totally. I can send you some docs on RCS stuff.

Is it working?

The last time I looked at it was last year.

Yeah, I’ll send you some stuff.

Sweet. Man, not another project… I’ve gotta finish that project I shelved a year ago. Cool. But we wanted to talk about Hachyderm. You are part of the infrastructure team at Hachyderm.io, which is one of the largest Mastodon instances that I know of. I know there are some big ones out there. I mean, obviously, Mastodon Social I think is the largest, still, by probably orders of magnitude…

I’m on your server.

You’re on Hachyderm?

And so I wanted to talk about like – you’ve had that for a little while now. I know Kris Nova and friends started it, and it’s been going for a long time… And one of the things I loved about how you’re running it is how open you are with the data. And you’re like “This is how this works, and this is how the metrics are, and this is how we scale it, and this is –” All that stuff is really cool to see from the outside. I’m like “Oh, cool. I don’t have to run it to see your dashboards.”

Yeah, absolutely. I was super-fortunate to cross paths with Kris while she was with us, while I was at Twilio… What was it - probably about like two years ago now, when there was the large Twitter exodus…

One of the first waves.

I know. It seems to be like another one every six months.

Yeah, yeah. We’re on like wave seven now, or something like that.

But yeah, Kris had started Hachyderm literally running it out of her basement. There was an old Dell PowerEdge named Alice that got us through like the first days of Hachyderm. And then we realized with that really big wave from Twitter I think we went from something like 500 people to - I think we had a peak of like thirty six thousand over the course of like a handful of weeks. So you know, ridiculous growth. Which kind of prompted us to say “Hey, we’ve actually got to get out of the basement and build some proper infrastructure.” So like I said, I crossed paths with Kris, and she was like “Hey, do you want to come help out?” And the rest is kind of history.

So it’s been fun riding the growth waves, especially using purely open source software, purely decentralized software like Mastodon is… And yeah, just really learning it, and how it scales, and building our infra and tuning it to serve the folks that are on it today.

She was so cool, watching her do like the different – she would do kind of like… I don’t know if it would be an AMA, but she would talk about the different infrastructure and how she was building it, and all that. That was the coolest part about Mastodon; just the different ways that she would be so open and talk about the different servers that she was using, and… She would talk about how she’s going to like do maintenance, or… You know, all that cool stuff. And it was really cool that she would –

Just live stream it. She was just like “I’ve got some stuff to do. Let me jump in front of a camera and we’re going to talk through it.”

That was the best part of Mastodon. That’s probably the only few things that I would look forward to on there.

Hm. Yeah.

So how has that – you’re using the official Mastodon instance, right? Because there are forks, there are other options. I used one for a little while that Cloudflare wrote, Wildebeest, which was all integrated into their workers, and stuff like that… And I thought it was really cool, because it was like a “serverless” Mastodon, but it was like extremely difficult to upgrade and to change things. But you’re using the official Mastodon software, which is a Ruby on Rails app, with like a Postgres database, right?

Yup. You got it. That’s kind of the core. Yup.

Yeah. And if anyone has ever run Ruby on Rails, you’re going to deal with Sidekiq, you’re going to deal with PgBouncer, things like that, or like the traditional ways of scaling. Can you describe that path of like “Hey, we went from 500 users to over 30,000”?

Yeah, totally. What’s awesome is you actually hit the main kind of actors in the ecosystem. So really looking at it from the bottom up… We went from Alice, which was one server that was hosting everything. The plan was then to break things out, and like you said, Postgres kind of at the bottom, as the database. What we ended up doing there is we purchased and we actually run metal servers in Hetzner, in Germany, in their Frankfurt data center. So Postgres sits on that. So it’s a metal server. It’s quite large. Plenty of spare capacity for growth, those sorts of things. And there’s supposed to be more. This is one of our big projects that we’ve got to do: build up our secondary replica, so that we’ve got some good failover, good redundancy there. But that’s kind of at the bottom. And as you mentioned, as we go up the stack, PgBouncer, absolutely, for connection pooling, and making sure that we’re not overloading the database in that sense.
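For readers who want to picture the PgBouncer layer just described, here’s a minimal sketch of a transaction-pooling setup in front of a Mastodon database. The addresses, pool sizes, and database name are illustrative assumptions, not Hachyderm’s actual configuration:

```ini
; pgbouncer.ini -- illustrative sketch only, not Hachyderm's real config
[databases]
; route the app through the pooler to the real Postgres host (address assumed)
mastodon_production = host=10.0.0.5 port=5432 dbname=mastodon_production

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
; transaction pooling lets many Rails/Sidekiq connections share
; a small pool of real server connections
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 20
auth_type = scram-sha-256
auth_file = /etc/pgbouncer/userlist.txt
```

One wrinkle worth knowing: in transaction-pooling mode Rails prepared statements don’t work, which is why Mastodon exposes a `PREPARED_STATEMENTS=false` environment variable for deployments that sit behind PgBouncer.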

There are a pair of Redises that do both ephemeral caching and a bit more like permanent type caching, and that help speed up interactions through the web server. As we go up from there, you mentioned Sidekiq. Sidekiq is really the heart of how all the decentralized stuff works. If you can imagine, at any point in time in the Fediverse, so many events are happening, and you’ve got to both advertise events out, as things are happening on your server, you’ve got to be able to pull events in, that are happening in the world, and process those in a reasonable amount of time.

So lots of Sidekiq. So a good amount of our time and energy, and some of the server capacity… It was just making sure those jobs are doing what they need to do. And we’ve got a lot of different queues involved, a lot of parallelism involved there as well…
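To make “a lot of different queues, a lot of parallelism” concrete: Mastodon splits its background work across named Sidekiq queues (`default`, `push`, `pull`, `ingress`, `mailers`, `scheduler`), and a common scaling pattern is to run dedicated worker processes per queue group. A sketch, with thread counts as illustrative assumptions rather than Hachyderm’s actual tuning:

```shell
# Dedicated Sidekiq processes per queue group (thread counts assumed)
bundle exec sidekiq -c 25 -q default          # local activity and fan-out
bundle exec sidekiq -c 25 -q push -q pull     # delivering to / fetching from remote servers
bundle exec sidekiq -c 25 -q ingress          # processing inbound federation traffic
bundle exec sidekiq -c 5  -q mailers -q scheduler
```

One caveat from Mastodon’s own scaling docs: the `scheduler` queue should only ever run in a single process.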

And I think that, at least from my understanding, that’s like the core difference to me between something like Mastodon and something like Bluesky, where all of the heavy lifting of how the world is interacting is on individual Mastodon instances, where every single one of them… I like a post, I send an update, I do something - that needs to go out to all of my followers on different servers, and then I need to hear from all of those other people on all the different servers. So like on a single server, it seems like it’s pretty constrained. Like, if everyone only followed people on Hachyderm, you’re probably fine. Like, you’re not doing a lot of network traffic… I can do internal processes, and do that stuff. But once you start having dozens of other servers, or lots of people following other people on other instances, that seems to cause a lot of extra stress on your instance.

[00:16:03.19] Absolutely. Yeah. Yeah. Yeah. Federation work is fun, because - yeah, as you can imagine, as that social graph grows, you’ve got to interact with increasingly more and more endpoints around the world. We’ve got some folks who have some really large followings, and what’s fun is sometimes that causes spikes. You can go look at Grafana, you can go look at our [unintelligible 00:16:26.05] and see where certain events are happening just because there’s so much of that kind of chat and back and forth happening.

I remember a long time ago Twitter had – they said they had a whole dedicated server rack for Justin Bieber.

I wouldn’t be surprised.

Because of the architecture, whenever someone with a big following was sending out an update, it caused so much traffic they had to scale it up, and for a long time they were on prem, they had bare metal stuff, and it was just like “We have a rack for Justin Bieber.” And it’s just like “Wow!” That was amazing, just to think of that, of like “Okay, how often is he interacting?” it doesn’t matter. It’s just, because we want that to go out to the millions of people that want to see the thing he’s saying, and then they come back and interact with it, and all that stuff… And all of those jobs just have to process in some order. They have to coordinate, and they have to resync, and everything. Wow.

Yeah, exactly.

That’s one of the best stories, or I guess use cases in –

Data-driven applications?

Yes. Designing Data-Intensive Applications. They talked about when they had to split the architecture for really popular Twitter users, versus like regular Twitter users… And it’s amazing how much work and how different each user was.

Tim Banks on the show just a few months ago or whatever was talking about that, with like the hot girl problem.

Well, he was talking about the Tinder version, which is wild, because we think of – it’s even just funnier, because people use travel mode so much. You know what I mean? And like just moving from place to place… Who would have thought that would be a whole different reason that you had to change architecture? It’s crazy the way that we have changed the way that we use the internet and interact with different applications, and applications for different things, and how that causes different architectures.

Am I the only one who – I’ll go download an app or I’ll go to websites and something slow or something acts weird, and I try to figure out what the architecture behind it is… Because I feel like a lot of times – I’m like “Oh…” I was like “I bet this is like a Sidekiq problem.” Like, they’re trying to scale up something – like, there’s a queue somewhere.

Trying to diagnose random apps…

I totally do it. I’m just like “I bet this is like a queuing problem somewhere, that someone is like “Oh, you didn’t scale up this portion of it”, or retries, or something like that.

Yeah. Yeah. Let’s see… So other stuff in the architecture… So Mastodon Web is a Puma application. So in our ecosystem, we actually – we only have one instance of that. And again, that’s another little mini project that we’ve got, is to build some redundancy in there, and a scale point there. And then the last edge for us is what we call our CDN edge. That’s something we’re hosting in Linode, or Linode, I guess, depending on how you like to say that. And we’ve got points of presence across the world, just for caching for speed…

Linode got bought by Akamai, right?

They did. Yes. Yeah. I’m still stuck on the old name.

Oh, I’m pretty sure the branding still stays. They have like Linode by Akamai, or something like that.

Yeah. So we have those points of presence, so like West/East U.S., Europe, Japan, and occasionally light those up.

At what point did you feel like you had to have a CDN?

So actually, when we did the move from Alice, from the basement, into kind of our current architecture, we implemented it then. And actually, a big part of it was that a lot of the traffic involved in Mastodon is media files. So images, movies, audio, whatever. And the stress that it put on just like the core server, even with caching, to provide that better experience… We were like “Okay, we’ve got to do CDN. We have to have something that’s closer to our folks.” And it’s distributing that cache to take the load off the core server. So that was already in the architecture there.

[00:19:52.19] If I remember correctly, when it was on Alice, the limiting – there was NFS for that storage, right?

Oh, yes. Yeah.

It was NFS on the backend that was storing that. Where is that stored now? Because that’s a lot of just data. Because it’s on a CDN, but you’ve got - what, hundreds of gigs maybe that are coming in?

Oh, terabytes. Easily. So that’s all in DigitalOcean now. So we’re using their Spaces product.

That’s right. Okay.

It’s an S3-compatible API.
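Since Spaces speaks the S3 API, pointing Mastodon’s media storage at it is just environment configuration. A sketch of the relevant `.env.production` settings – the bucket, region, and hostnames below are made-up placeholders:

```shell
# .env.production -- media on an S3-compatible store (all values are placeholders)
S3_ENABLED=true
S3_PROTOCOL=https
S3_BUCKET=example-media
S3_REGION=fra1
S3_ENDPOINT=https://fra1.digitaloceanspaces.com
S3_HOSTNAME=example-media.fra1.digitaloceanspaces.com
AWS_ACCESS_KEY_ID=<spaces-access-key>
AWS_SECRET_ACCESS_KEY=<spaces-secret-key>
# serve media through the CDN edge instead of straight from the bucket
S3_ALIAS_HOST=media.example.social
```

The `S3_ALIAS_HOST` line is how the CDN layer discussed earlier ends up in front of the bucket: media URLs get rewritten to the CDN hostname, and the CDN pulls from Spaces on a cache miss.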

Not to distract from like where this is going, but how do you guys fund this?

That is a super-great question. So we were super-fortunate in that throughout the lifetime of Hachyderm we are part of what we call the Nivenly Foundation, which is a kind of open source nonprofit foundation that Kris founded, again, about two years ago. I’m a member of the board there. And that’s kind of like the umbrella organization. And we have folks who are kind enough to contribute to us, who believe in the mission. And part of those contributions go to Hachyderm as well. So we have people who are directly contributing to Hachyderm, we have people who are contributing to Nivenly, and that’s where that comes from.

What does Nivenly do outside of Hachyderm?

Yeah, totally. So Nivenly - we’re all about… And again, this is a lot of Kris’s vision, was building a foundation that can help open source projects really identify what we call the pattern of patterns. So how do we start new open source projects? How do we fund them? How do we help them self-govern, so that the contributors, the maintainers of those open source projects can retain control and direction and vision for their individual projects? So really about teaching and finding different unique ways that we can support them. So again, open source contributors can go do the open source thing, right? And the dream would really be to be able to support them in such a way that maybe they don’t have to have their normal day job.

It’s really interesting.

Yeah, totally. And we’ve got a couple of projects under that right now, Hachyderm being one of them.

I know Nivenly’s been around for a little while, and Kris and I were talking about it like back in the day. Are you open about the funding for that? Because I know there’s a donation side of it, and some of it is like actual “I pay dollars for it”, and there’s some of like sponsorship; DigitalOcean is sponsoring the object storage for Hachyderm. Do you break out where the spend goes? Is there a line that says “Hachyderm costs this much money”?

Yeah. So funny enough, I’m the treasurer for Nivenly.

So yes. So we do track all that on the backside. We have not - or I will actually even say I have not done a good job of publishing that regularly. But our intent is to publish that on a quarterly basis, and be fully transparent about what types of in-kind donations we’re getting, what types of membership donations we’re getting, and then where those costs break down.

So how much does Hachyderm cost?

So Hachyderm right now probably runs us just over six hundred dollars a month. So the vast majority of that spend goes towards media storage.

Well, the CDN portion, right? Because the DigitalOcean side is sponsored, right?

It actually isn’t.

Oh, okay.

So I think right now we’re getting like a little bit of a discount. So yeah, DigitalOcean, if you’re listening, sponsor – we’d be happy to talk. But yeah, that all comes out of Hachyderm’s funding. Yeah.

So did you guys buy the hardware up front with like donations?

Yeah. So we’re actually leasing that through Hetzner. So they have actually – Hetzner has incredibly reasonable pricing, both for metal and their cloud products.

Who’d have thought?

Yeah. We lease a few servers from them, and - yeah, that’s how we’re operating today. So we actually don’t own any of the hardware at this point.

I didn’t even know that they had a cloud. Like, I knew that they did leasing for like bare metal, but that’s really interesting.

Yeah, totally. And that’s actually one of the interesting quirks or maybe features of Hachyderm… As we talked through the stack there, all these pieces are in different clouds. We explicitly chose not to go with one of the larger clouds, and put all our eggs in one basket… But the idea being that, because again, the Mastodon footprint is relatively straightforward and simple. Simple is probably the wrong word, but –

Simple if you’re familiar with Rails, right?

Exactly. So there’s effectively one application that we’re hosting. We said “Hey, we’re going to go multi-cloud, so in the event that we have an issue with one of them, we can easily flip that piece that’s being hosted there elsewhere.”

Break: [00:24:12.26]

How big is the team that manages Hachyderm?

It’s a rotating cast of folks. We’ve got some core folks, I would say probably like four or five core folks who are around, and then volunteers who kind of come and go. And what’s nice about the infrastructure that we’ve set up so far is that it’s – I mean, it’s relatively stable. Probably the main stressors that would cause it to become metastable would be maybe like another Twitter exodus. Another big surge of growth. But even with that, we’ve kind of planned for the future. We have got spare capacity in kind of like the key scale points right now.

So yeah, largely the software just kind of runs. The big activities end up being stuff like major upgrades, so version bumps from Mastodon upstream… So we just did 4.3.0 last week, which is probably our biggest upgrade that we’ve done in a while… And then we did 4.3.1, which was basically nothing, because it was super-simple. But generally, the hard upgrades are upgrades that involve database migrations.

Those are always the hard ones.

Exactly. Exactly. It’s just because you’re touching the real data. There’s always a little bit of fear there, but we’ve got some good mitigations in case –

Is it still Rake?

Yeah. So it’s just core Rails stuff.

Rake db:migrate. Okay. Cool.

There you go.

Has the Redis licensing affected the way that Hachyderm runs at all?

It hasn’t… But we have had conversations about moving to something that is truly open source. And kind of in the same vein as Terraform going to the BSL license… We talked about OpenTofu… Actually, at this point we’ve been – I forget which version of Terraform it is. But we’ve pinned the very last version of Terraform that was under the open source license. And then again, at some point another mini project could be “Let’s go OpenTofu.”
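Pinning to a pre-BSL Terraform is a one-line constraint in the config. A sketch – 1.5.7 is an assumption here (the BSL change landed in Terraform 1.6.0, making 1.5.x the last open source series), since Preston doesn’t recall the exact version they pinned:

```hcl
terraform {
  # Pin to the last MPL-licensed Terraform release; the exact
  # version number is an assumption for illustration.
  required_version = "= 1.5.7"
}
```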

You have multiple clouds, which Terraform is just great at – it’s just like “Hey, we’ll make this look like one thing, basically.” But then you only have a few servers in Hetzner. How are you actually deploying that? Is it like Terraform create my server or manage my server, and then like Ansible provision the application? Is it like a Kubernetes cluster on those servers? What are you doing?

Yeah, so less glorious than that. So we actually – again, another key infrastructure decision that we made pretty early on was no Kubernetes, because we wanted to avoid really the complexity that that introduces. And again, don’t get me wrong; I use Kubernetes all day at my day job. But for this, we made a key decision. We’re like “Hey, it adds layers of indirection”, especially when we think about the network and the abstractions that you have to deal with through your Kube API.

And we said, “Hey, we’re going to treat this just like a simple application.” And right now - I will be super-honest - deployments are pretty manual. So it’s SSH in, go do the thing.

We’ve done some light Ansible work around kind of the post-provisioning after a host boots… But yeah, for the most part it’s actually manual, with the core infra around the hosts themselves being Terraform.

Yeah. You’re like “I get an IP address and I’m going to go do the thing as an admin that I know how to do.” I think a lot of people overemphasize “Everything needs to be automated.” It’s like, no. I’ve run a lot of infrastructure in a lot of places, and having manual steps is actually really important for some things…

Yeah, totally. And there are certain runbooks for activities that happen a lot. So again, Mastodon upgrades. There’s a standard script that we run every single time we do a Mastodon upgrade. And it’s relatively – like, it’s not that it’s unattended, it’s just very automated, and a human watches it, and when it’s done, it’s done.

And yeah, maybe an upgrade takes – again, without a database migration, maybe an upgrade takes 30 minutes. Even when we did the fourth [unintelligible 00:31:35.17] upgrade, I think that we did that over the course of like maybe two hours. And again, largely uneventful, pretty straightforward. It just involved a database migration that took a little bit longer.
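For a sense of what such a standard upgrade script does, here’s a rough sketch of a source-installed Mastodon upgrade, based on Mastodon’s published release instructions rather than Hachyderm’s actual runbook (paths, service names, and the version tag are assumptions):

```shell
# Illustrative Mastodon upgrade runbook -- not Hachyderm's actual script
cd /home/mastodon/live
git fetch --tags && git checkout v4.3.0
bundle install && yarn install

# Run only the safe pre-deployment migrations first, per the release notes
SKIP_POST_DEPLOYMENT_MIGRATIONS=true RAILS_ENV=production bundle exec rails db:migrate

RAILS_ENV=production bundle exec rails assets:precompile
sudo systemctl restart mastodon-web mastodon-sidekiq mastodon-streaming

# Once the new code is serving, finish the schema changes
RAILS_ENV=production bundle exec rails db:migrate
```

The two-phase migration is what keeps database-migration upgrades from requiring downtime: the old code keeps running against a schema both versions understand until the post-deployment migrations land.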

Do you guys have like a team of people that get paged for Mastodon?

We actually don’t. So the way that we’ve got it set up right now is we’ve got Uptime Robot that will kind of do our synthetic checks to our public-facing properties. That will alert into Discord. We’ve got folks kind of around the world, and we’ve got an infra channel. If somebody sees something, they will go do something. So nothing as fancy as actually getting paged.

[00:32:13.23] I’m on the infrastructure team for Southern California Linux Expo, and we have a Slack channel that has a pager – not PagerDuty; a Datadog alert that’s like “Hey, guess what? The site’s down.” Someone’s like “Oh, I’m on. I’ll check it.”

Yeah, absolutely. And then, to kind of continue the observability stack again, another mini project coming up here in a second… So we’ve got Grafana. There’s a public side of Grafana. So if you actually want to look at our public dashboards, you can just go to grafana.hachyderm.io. You can see the public dashboard that kind of has like the key vitals. Not quite at the level of like SLO, more like an SLI. We’re looking at very specific health attributes of the different services that we provide.

That requires a login.

I might have to send you the public dashboard specifically. Yeah, it’s probably –

I was going to put it in the notes, but yeah.

Yeah. I’ll send it to you afterward. But yeah, you can look at kind of the key metrics. And again, we talked about Sidekiq earlier… That’s probably one of the main things that we watch, is how hot are the Sidekiq queues. Are they getting super-deep? Are they getting backed up for some reason?

What if it gets stuck and you have to kick it?

You’ve gotta do something. So Grafana kind of is that main visualization layer, backed by Prometheus for metrics… We’re manually registering servers there, for scrapes, and things like that. Not too long ago we set up Loki and [unintelligible 00:33:27.26] so they’re available on Grafana as well. And - what else from [unintelligible 00:33:33.16] Oh, the mini project that’s coming up is in Mastodon 4.3.0 there’s finally support for OpenTelemetry. So the next little mini project is get [unintelligible 00:33:43.29] and start presenting that… Which is actually going to be pretty important, because as of 4.3.0 they also deprecated the StatsD integration, I guess.
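As a sketch of the “how hot are the Sidekiq queues” check in Prometheus alerting-rule form – the metric name `sidekiq_queue_size` and the threshold are assumptions that depend on which Sidekiq exporter is wired up:

```yaml
# Illustrative Prometheus rule; metric name and threshold are assumptions
groups:
  - name: sidekiq
    rules:
      - alert: SidekiqQueueBackedUp
        expr: sidekiq_queue_size{queue=~"push|pull|ingress"} > 10000
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Sidekiq queue {{ $labels.queue }} has been deep for 10 minutes"
```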

Oh… We just had Austin Parker on the show, who’s on the OTel board, which was a really fun conversation. So yeah.

Awesome. Super-cool. So that’s going to be a fun little mini project. That might be something I’d try to do this weekend, or something. So yeah, totally.

You said you’ve been going through these waves of people leaving everything, and you went from that 500 to over 30,000… How many active users are on it today?

So I think the last time I checked monthly active users was somewhere around 11,000 to 12,000. So that’s still a lot of folks. So I think total accounts was something like 55,000. That’s on the public dashboard as well. I think monthly active was like 11k or so. Yeah.

I mean, it’s been running okay, you’ve scaled up the things… The things that – like, the Redis queues and the PgBouncers… Are there other areas that you think like “This is the next thing that’s going to fall over”? If you get to 100,000 users or 30,000 active monthly, what’s the next thing that’s going to break?

Yeah, totally. I think insofar as just like scaling… It’s probably Sidekiq, where there’s probably good opportunity for us to do a bit better – maybe like reactive auto-scaling even, with Sidekiq, where it’s like “Hey, we have a predefined threshold where we go spin up another maybe machine that goes and runs X number of Sidekiq queues or replicas to go handle load whenever we’re a certain percentage behind, or a certain percentage deep in the queues.” But I mean, that generally tends to be the thing that gets stressed the most. That’s where we see things get hot. The website - I mentioned earlier there’s a project to put some redundancy there. That’s more from like a redundancy and a failure mitigation perspective, and less from like a scale or a stress point. It actually tends to do pretty okay. And then – I mean, I’m always afraid of the database, just because the data is the data… So I think there’s some good opportunities there.

Again, less from like a scaling perspective, because Postgres does a freaking awesome job of just like running so well. So it’d be more, again, from like a read replica perspective, maybe offload a little bit from the primary, but more from like a “Hey, if the big database goes down, do we have a place to flip quickly without having to restore from backup?” That sort of thing.
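The reactive scaling idea Preston floats could key off Sidekiq’s own introspection API, which reports how far behind each queue is. A minimal Ruby sketch – the threshold and the scale-up hook are hypothetical, not something Hachyderm runs:

```ruby
# Hypothetical queue-latency check; threshold and scale-up action are assumptions.
require "sidekiq/api"

LATENCY_THRESHOLD_SECONDS = 300

%w[push pull ingress].each do |name|
  # latency is the age, in seconds, of the oldest job waiting in the queue
  latency = Sidekiq::Queue.new(name).latency
  if latency > LATENCY_THRESHOLD_SECONDS
    puts "queue #{name} is #{latency.round}s behind; provision more workers"
    # e.g. call out to Terraform/Hetzner's API, or just page a human
  end
end
```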

How many backups do you have currently?

[00:36:07.05] So right now we do a weekly full backup. I’ll have to go double-check how long we’re keeping those, but I think we keep the full backups for three weeks at a time, 21 days. And then we do a daily incremental diff. So we’re actually in a pretty good position from a backup perspective.
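The transcript doesn’t say which backup tool is in play; as one illustration of a weekly-full-plus-daily-incremental schedule, pgBackRest (a common Postgres choice) could be croned like this – the stanza name and timings are assumptions:

```shell
# Illustrative cron entries using pgBackRest -- tool choice and settings assumed
# Weekly full backup, Sundays at 02:00
0 2 * * 0    pgbackrest --stanza=main --type=full backup
# Daily incremental the rest of the week
0 2 * * 1-6  pgbackrest --stanza=main --type=incr backup
```

With weekly fulls, a retention setting of three full backups (`repo1-retention-full=3`) would roughly match the 21-day window described.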

That is actually really good, especially for how small your team is and the money you’re spending per month.

Exactly. Well, and for us too, we can have that small team, and have some solace in the fact that “Hey, if something really, really bad happened”, like let’s say all the disks for whatever failed on that primary, we can get back and have a reasonable kind of like RPO, RTO.

I think it’s also because you have a very talented team. I don’t know if it could be run the same with the amount of people you have if you guys didn’t have so much experience.

Absolutely. Yeah, don’t get me wrong, the folks that we had building the initial architecture to get us here that actually executed on it, the folks that have been helping us over the years, even the folks who have just showed up for like a month or two, done a little bit and then gone and done the next thing - everybody’s been amazing. And we absolutely wouldn’t be here, we wouldn’t be able to run this without those folks.

Well, you’re one of the talented people, too. But…

I try to help, you know? But just think about other scale points, again… Like, I just always come back to Sidekiq, because that is, again, as the Fediverse graph grows, that’s the thing that gets really, really stressed.

Has Sidekiq and database been the most painful part of your infrastructure?

Yes. Yes, period.

I mean, Redis is pretty commonly known, and just kind of works, and then Puma is like “Yeah, Puma’s fine.”

“It’s there.” Yeah, exactly. Yeah, when I think back to incidents that we’ve had, or challenges that we’ve had, or like the scary moments that we’ve had, it’s generally been - yeah, Sidekiq queues are way backed up. What’s up? And then really figuring out like the tuning… Hazel Weakly did a lot of really great work around tuning of those queues, parallelism, number of queues, where they’re actually deployed, that sort of stuff. And then the database. It’s just like –

The Postgres.

Yeah. For Postgres. And then it was essentially just “Let’s give it a lot of power, a lot of CPU, a lot of memory, fast disks, and let’s over-provision, so that we know it’s ready for it.”

Which is interesting too, because the vertical scaling side of it when you’re using a metal server is just so much more cost-efficient than “Oh, let me scale up this EC2 instance”, which is like my cost is going to go up four times.

But that’s hard with relational databases, because they take so much CPU.

They’ll take what you give them, right?

Yeah, exactly.

So everybody’s like “We can just do everything on Postgres”, which - Postgres is a beast. But I think we’re also going to learn why we have non-relational databases or NoSQL databases, because everyone’s just like “Put everything in.” And I’m like “I think you forgot the scaling problems that people have on a relational database.” But Postgres is like the GOAT, and everything runs on it.

Yeah, exactly. And the other thing that we have to keep in mind is because they’re metal servers, if we want another one… If we say we’re going to go add another server, that is a process that takes a week.

That’s actually fast, though.

Yeah, don’t get me wrong. That’s fast. Hetzner has been very responsive, very reactive from that perspective. But it’s not “Hey, I go to EC2 and go do a run instances, and suddenly I’ve got another machine in a couple of minutes.”

Yeah, exactly.

So we’ve got to have some pre-planning and foresight there. And again, it was just like “Hey, let’s get a big box.”

Yeah. You mentioned Hazel, and we’re going to have her on the show in a couple of weeks… But she and I were talking about just like the progression of database infrastructure, and I thought one of the most fascinating things was when it was running off of Alice, Kris kept getting calls from her ISP… Like, you’re doing a lot of traffic. You need to get a business plan, right? So that was part of the motivator of like “We actually – we can’t run this big social network from someone’s house.”

[00:40:08.18] Yeah, exactly. That was a big part of it. Transferring it to like a foundation – again, [unintelligible 00:40:12.03] basement was a big part of it as well… And then also starting to even think about things like data residency, and just thinking about the different legal jurisdictions that we operate in, where we have folks who have data… And it was like “Okay, hey, we’re going to move to the EU very specifically, as a very specific choice, and Germany as a very specific choice”, because of the protections offered there, and the strength of those laws.

I remember she posted a picture of Alice in her basement, and it was so cool… You almost forget how much servers do for social media, because just everything lives in your phone and in the cloud, and to see a physical server and everybody like being connected on it… It was so cool.

And the fact that that one physical server from someone’s house could handle 500-ish people on “Here’s my server. Like, this is fine.” Before the ISPs come knocking at the door, before scaling out, and you don’t need PgBouncer, and all that stuff… You’re like “I have enough disk capacity, with some manifest maybe, to do this.” It’s maybe not amazing, but it works.

And people underestimate, like “I want to go start a blog. I want to go do something –” Like, just go run it from your website, or from your house. You have a public IP address. You can DNS that, and you can even rotate, all that stuff. And you can just do it from your house and start, without needing to like go pay for all this other stuff.

Yeah, absolutely. And one thing the Mastodon project has done really well… I know I said we don’t use Kubernetes, or containerized Mastodon… They’ve done an excellent job of building this little micro ecosystem, where - yeah, if you run Kubernetes, you can very easily just say “I want a Mastodon”, and you go apply the deployment and you’ve got one.

Which is super-cool from like an approachability, a low barrier to entry perspective.

As long as you know Kubernetes.

As long as you know Kubernetes.

The bar is actually kind of high… But also the same with Rails, right? If you know how to run a Rails app, then you’re like “Oh, this is all familiar.” That was the whole point of Rails, was just to make it familiar to everyone. If you go from one Rails app to the other - guess what? This is going to scale the same way, and it’s going to be very familiar to debug it and to see what’s going on.
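For the Kubernetes path mentioned a moment ago, the Mastodon project maintains a Helm chart, so “apply the deployment and you’ve got one” really is close to the whole story. A sketch of standing an instance up from a checkout of that chart – the domain and the value name are assumptions to check against the chart’s own values file:

```shell
# Illustrative only: Mastodon on Kubernetes via the project's Helm chart
git clone https://github.com/mastodon/chart.git mastodon-chart
cd mastodon-chart
helm dependency update
helm install mastodon . \
  --set mastodon.local_domain=example.social   # hypothetical domain
```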

Okay, but why do they use end so many times?

Why do they use what so many times?

For Ruby, when they do like end, end, end, end, end, instead of the brackets… It drives me nuts. I just want to know who did this, and why are they so mean.

Do you want your curly braces?

Give me curly braces, yeah.

Because you close the braces. [unintelligible 00:42:40.05] I’ve spent so many hours with a missing end… Like, I have flashbacks, like…

Yes, I am not a Ruby person, but I’ve learned enough to be dangerous and to at least keep Mastodon running.

I mean, I would rather tune Sidekiq than try to dive into the JVM again. The two big apps I’ve done in my life are JVMs or Ruby, and I’m like “No, give me Ruby.”

I’ll take the brackets and the structure, because I hate unstructured –

You like sentence long functions?

Yes…

It’s cool. Do you want to [unintelligible 00:43:15.28]

But we know like specifically what we’re asking it to do. Okay, so what were some of the laws that made you pick the EU? And Germany, specifically. What were some of the things that motivated you for that?

Yeah, absolutely. So GDPR, obviously, is probably like the big prevailing one. And again, the type of data we have is really interesting, because in a lot of ways it’s – I’m not going to say it’s PII, in the sense it’s like personally identifiable information…

Sometimes it is, when it’s on social media…

[00:43:45.29] But it could be. People could put interesting PII out there. But it’s information that is very personal to the people who are posting it. Again, the intent was for us to be in a place where we could feel safe that maybe the prevailing government or whatever wouldn’t show up and be like “Yo, Hachyderm, you’re hosting this type of content. We’re going to shut you down.”

Mastodon instances have gone down for that. That’s not even like a theoretical “Oh, no, this maybe is happening.” It was fascinating talking to Hazel about it. One of the reasons that it’s under Nivenly as a foundation is to protect the people that are doing the work. Because it’s like “Oh, you’re the web admin on this thing? You’re coming to court.”

“We’re coming to you.” Yeah, exactly. Especially in a world where it’s like DMCA takedown notices… Think of just all the different legal ways that people can go after you.

Can you describe why that’s specifically an issue for something like Mastodon? …where it’s like, if I run a website, that’s not a problem.

Sure. Like DMCA –

I mean just like the content. Why is other people posting content on other servers a problem for you?

Sure. Yeah. So part of it is – you know, people can post whatever. And again, depending on the location of the server, depending on the authorities, the prevailing government, and the type of content that’s being posted, we could be put in the crosshairs where the government would be like “Hey, you’re hosting this content that in our jurisdiction is illegal, or offensive, or whatever. You need to take care of it.” Or “We may block your site” or “Hey, we may go after your company, or the officers of your company, because again, you’re potentially doing something illegal in our jurisdiction.” So finding kind of that sweet spot, and again, Germany was it when we did the move. It was super-important for us.

It seems like you put a lot of care and thought into it, which is cool.

Yeah, absolutely. I mean the intent was – it started in Kris’s basement. The intent is for this to be a thing that’s around for a really long time. And again, that’s where the technical architecture, the legal architecture, even though we’re not lawyers… We spent that energy putting all that thought in there because we wanted to be around, in five, ten, however many years.

Break: [00:46:07.02]

How do you go about – because Mastodon also caches from other servers. Like, I follow something from another server, and then I’m just going to get random content from that other server, even if I’m not following the person. And again, it’s not even just your users. Like, your users, you could kick out. Like, someone keeps posting something that’s illegal, you’re like “Actually, you can’t do that here. Go somewhere else”, and you can just ban them, basically. But other people posting things on other servers - you can’t ban the person over there. Is the only recourse to block all access to that other server?

Yeah, so that’s actually – so I’ll say it two different ways. So it’s something that Mastodon is doing well, but can continue to do better. Because they’ve actually started to build some good moderation tools, where like you said – like, locally, you can absolutely say “Hey, this member, they’re banned because they posted whatever; they’ve offended the server rules, or whatever.” But you can actually also do that for people who are not on your server. So if somebody from Mastodon.social is spamming, let’s say… Which actually happens very, very often. You can individually block that account, even though they’re on a remote server.

If you find that the server as a whole – let’s say there’s an unmoderated server, or a server with questionable content, or positions that don’t agree with your kind of code of conduct, you can actually de-federate from that server entirely. So that means anyone posting on that, any activity that’s happening on there is no longer relayed through your server.
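Both levers Preston describes - blocking a single remote account and defederating from a whole server - are exposed through Mastodon’s admin tooling. As a sketch, a domain-level block via the admin REST API; the target domain and token are placeholders, and this needs an admin-scoped access token:

```shell
# Illustrative: suspend (defederate from) a remote server via the admin API
curl -X POST https://hachyderm.io/api/v1/admin/domain_blocks \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -d "domain=spam.example" \
  -d "severity=suspend"   # Mastodon also supports "silence" and "noop"
```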

And honestly, as much as we’re talking about the infra here, to me the real magic of Hachyderm is the moderation team.

Every social network is the moderation. That’s the thing that is important.

Exactly. Ultimately, the network and the experience for the people on your network is only as good as your moderation team, and that’s something where our folks, again, spent a lot of time and thought and energy just thinking about “Okay, what does it mean to moderate? What does the moderation experience look like? What should that look like for folks who are on Hachyderm?” …or even off Hachyderm. Because there’s always the potential that somebody joins Hachyderm and starts spamming from us. How are we on top of that? How are we treating people with respect, but at the same time making sure that we’ve got good boundaries, so that we’re keeping a healthy community for our folks?

Because you mentioned that relay blocking thing - that immediately to me sounds like what I would do with a Pi-hole. Like, I’ve got this big host file, where I’m just like “No, you don’t reach any of that.” Is that like a subscription? Are there known lists of Mastodon servers that you subscribe to, a black hole list of “We’re not allowing that here”?

Yeah, totally. So I think this is a big evolution point for Mastodon, and potentially the Fediverse as a whole, is actually figuring that out. So there are a number of services or organizations that do provide things like that, where you can say “Okay, I want to subscribe into a defederation list.” And again, this is actually a really big, active debate that’s going on in the community right now, is “Should they be allow lists? Should they be deny lists that you’re subscribing to? How do the subscriptions work, both on a technical or a social level?”

So yes, you can… And again, depending on how your moderators like to run your servers, some people do, some people don’t. But right now it’s not something that’s built in – subscribing to a defederation list is not something that’s built into Mastodon core right now.

Not ActivityPub.

Yeah, correct. Yeah, so Mastodon has like the Upstream Mastodon project.

Yeah. Because I know other Mastodon servers that don’t have those features, and they don’t even do relays. It’s like a super-lightweight – like, I can run this for myself. I don’t do – a lot of the things that are really heavy about a Mastodon server, I don’t do it. And it’s fine, and you don’t have to do those things, but the actual subscription, “Let’s make sure that this works across services” is mainly just for the upstream Mastodon server, official – I don’t know if it’s… I guess it’s official. It’s the one at Mastodon.social.

How does this relate to Threads? Threads is peering with ActivityPub, Tumblr is starting to do that… Are you seeing any uptick in traffic, or new places you have to scale or block or moderate because there’s these – Facebook has a lot more infrastructure than you do… And they can just throw more money and more servers at the problem to say like “Sidekiq? You just get a whole data center. I don’t care. It doesn’t matter to us.”

Exactly.

How are you – is that causing extra stress on your instances?

So far, no. Right now. That being said, particularly with Threads… And again, I’m actually not super-familiar with their internal infrastructure, how they’re set up… But they’re starting to engage in limited federation.

They’re very piecemeal about how – they’re like “We want this feature from this thing, and make you feel warm and fuzzy that we have ActivityPub.”

Exactly. Exactly. Yeah. Super early on, when Threads first announced “Hey, we’re going to federate into the Fediverse…” Again, big conversation, big controversy about “Should servers even allow that? Is it okay for Meta to start to push things into the Fediverse?” Big discussion there. From a technical perspective, just like you were saying, Justin, the concern is “Okay, well, what if they turn it on for literally everybody?”

You would just blow it out of the water.

It just destroys - yeah, exactly - the Fediverse. And some people thinking “Hey, that’s the plan. That’s how they’re going to destroy the Fediverse and take it over.” But yeah, to this point so far, again, because I think they’ve taken kind of this limited, measured approach to federation, we haven’t seen anything, from like a Sidekiq perspective, or just a general scale perspective, that’s broken anything at this point. Now, is that going to change the future? I don’t know. But so far, so good.

I heard a recent podcast with whoever’s running Threads over there, and they were asking “Are you ever going to turn on, by default, Fediverse interaction?” He’s like “Absolutely not. We’re never enabling that by default.” And immediately you’re like, okay, cool. We’re never going to get double-digit percentage of Threads users to like go find the obscure setting to say like “Yeah, federate all my stuff across all of ActivityPub, all the time.” It’s just not going to happen because it’s not going to be there by default, and they have no incentive to do that… Because they’re the ones selling the ads.

I don’t think most people even know that it exists.

Right. And that’s the point. It’s a feature to get the nerds on board.

Exactly.

Yeah. And you mentioned ads… That was always a big concern, again, with the federation model… It was like “Okay, is Meta going to start trying to push ads through federation?” And then I’m on Hachyderm, and I see an ad for something. So again, to this point I haven’t seen that. Obviously, as the situation evolves, you might change your stance…

I never really thought about how that might work… Because by default, there’s only users that are doing the inbox/outbox sort of handshake between instances. But is there other metadata that gets sent, that they could try to push content to you to say like “This is–”?

Yeah. So depending on the scope that you’re looking at… So normally, when you log into Mastodon, you’re seeing your feed. It’s like “Okay, who are the people and hashtags that I’m following, or I’m listening to, or I’m watching?” You can actually change scopes to look at “What is happening in the Fediverse?” for all the things that your server relays to. So in that case, they could push a post. They could have a user, quote-unquote user, that could push a post, that’s an ad. It happens to come from threads.net. And it shows up in kind of that global Fediverse view. So that could be one way.
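For a sense of what actually moves between servers here: federation is signed HTTPS POSTs of JSON-LD activities to inboxes. A minimal sketch of a Create/Note payload of the kind a server could push – the actor, IDs, and domain are made up, and the required HTTP signature headers are omitted:

```json
{
  "@context": "https://www.w3.org/ns/activitystreams",
  "id": "https://threads.example/activities/123",
  "type": "Create",
  "actor": "https://threads.example/users/someuser",
  "to": ["https://www.w3.org/ns/activitystreams#Public"],
  "object": {
    "id": "https://threads.example/notes/123",
    "type": "Note",
    "attributedTo": "https://threads.example/users/someuser",
    "content": "<p>Hello, fediverse</p>",
    "to": ["https://www.w3.org/ns/activitystreams#Public"]
  }
}
```

The receiving server can verify that the payload really came from the claimed actor (via the signature), but any engagement numbers attached to remote objects are just the origin server’s word - which is exactly the trust problem discussed next.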

[00:56:13.29] You just have to trust them at that point too, because they could say “This post has a million likes”, and you have no way to verify or do anything around “Oh, yeah, obviously, that’s the most popular thing in the world right now. We should show it to more people.”

Yeah. And with the way that Mastodon works right now, there is no algorithm, in the sense that things aren’t weighted… There are things like trending hashtags that might show, but in terms of like your feed and your posts, it’s linear. It’s like first in, first out. But yeah, if they say it has a billion likes, that’s what the ActivityPub message said, so it must. Right?

And I guess they could also just send you a bunch of posts that have a hashtag to get that in trending. Just like “Oh, even if you’re not showing this to people, here’s a bunch of stuff”, and you have no way to verify it. You don’t have any way, because it’s their infrastructure sending stuff to you, and you’re just like “Alright…”

Yeah, that’s the interesting thing about the federation model… Or really just the Fediverse in general. In the end it becomes a web of trust. And it’s like “Do you trust these other servers to send you things that you want your people to see?”

It’s a web of trust, and it’s affecting your site, which is the weird thing… Because my blog, my WebRing, to go back to – my WebRing, I trusted those people because I was reading their content for a while. But anything they posted didn’t actually affect me. It didn’t show up on my site. And I haven’t had comments on my websites for a long, long time because I was like “I don’t want your stuff showing up here. If I want to hear your opinion, I’m going to go out to social media, I’m going to find the opinion out there. But I don’t want it muddying the water of like –” Someone just read this – even like YouTube comments. I’m glad they hide most of them by default now, because most of them are like “This is trash”, or whatever. It’s just like, it’s going to taint someone else’s view of “Did I just read something good? Did I watch something I liked?” It doesn’t matter, because someone else that was popular said “This sucks.” And like, okay, well, maybe 9 out of 10 comments say this sucks, so I’m the wrong one. It’s like, it doesn’t actually matter. So I turn that stuff off from my sites, but that’s fascinating that you now have to trust some other website to send you something that you can’t verify, and it might affect how people see stuff that is not controlled by you, but is somehow aggregated and shown.

Yeah, absolutely. Again, it goes back to moderation, having good moderation practices, consistent moderation practices…

Moderation is key in being able to block people.

Yeah. Like, absolutely. And content absolutely does affect reputation… If we take too long to take action on something, or potentially even if we take what’s perceived as the wrong action… People might say “Oh, well, Hachyderm moderation - y’all aren’t doing it right.”

And I do find that interesting, that the Mastodon model of moderation is the server, basically. There are individual block lists, but eventually you have to land on a community server that is well run, where you trust the moderators are doing things in your best interests… And then your only recourse, if that isn’t the case, is to find another community that you align with a little better. Mastodon doesn’t have anything – Bluesky has block lists that are individually maintained.

Someone’s like “Hey, these are harassers. These are people that I don’t like.” If I don’t like a list of five people, you can subscribe to my block list and you can also not have those five people. But Mastodon doesn’t have something like that at the individual level, right?

So at the individual level, I don’t think so. I’d have to go double-check. I know there are scripts that will let people import them, and kind of like do individual API requests to do blocks like that.

Twitter had a similar thing. You could subscribe to like a block bot or whatever, and it would automatically – it had access to your account by going in and saying “Okay, yeah, I’m going to block all those things that this other person also blocked.”
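Those block scripts are simple to sketch. Here’s a hedged example of what one might look like against Mastodon’s REST API – /api/v1/accounts/lookup and /api/v1/accounts/:id/block are real endpoints, but the instance URL, token, and blocklist file here are placeholders:

```python
import requests

INSTANCE = "https://example.social"  # placeholder instance URL
TOKEN = "YOUR_ACCESS_TOKEN"          # needs the write:blocks scope
HEADERS = {"Authorization": f"Bearer {TOKEN}"}

def block_handle(acct: str) -> None:
    """Resolve a handle like 'spammer@other.server' to this
    instance's account ID, then block it -- one request per entry."""
    lookup = requests.get(f"{INSTANCE}/api/v1/accounts/lookup",
                          params={"acct": acct}, headers=HEADERS)
    lookup.raise_for_status()
    account_id = lookup.json()["id"]
    requests.post(f"{INSTANCE}/api/v1/accounts/{account_id}/block",
                  headers=HEADERS).raise_for_status()

# A shared block list is then just a text file of handles, one per line.
for line in open("blocklist.txt"):
    if handle := line.strip():
        block_handle(handle)
```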

Yeah. At this point – I know at the server level there’s no notion of a block list where you can just say “Hey, I want to go use Mastodon.social’s block list”, or whatever like that. That doesn’t exist. And again, there are multiple proposals for how to do that kind of flying around right now… And then yeah, at an individual level I don’t think that exists. Again, I’d have to double-check.

[01:00:06.18] But to your point - yeah, if you find yourself in a community that’s not aligning to what you want, you can use the migration feature to move. But obviously, that’s a heavy toll.

Well, and migration isn’t “Bring your content.” It’s “Bring your metadata, bring your followers.”

Bring your identity and your followers, kind of…

And it’s not even like bring your identity. It’s like redirect your identity.

Yeah. It’s like “Oh, hey, redirect over here.”
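Under the hood, that redirect is roughly two pieces of ActivityPub: the new account declares the old one as an alias, and the old server broadcasts a Move activity so followers re-follow the new address. A rough sketch, with hypothetical actor URLs:

```python
# A rough sketch of the two halves of a Mastodon-style migration,
# with hypothetical actor URLs. First, the new account declares the
# old one as an alias via the alsoKnownAs property:
new_actor = {
    "id": "https://new.server/users/alice",
    "type": "Person",
    "alsoKnownAs": ["https://old.server/users/alice"],
}

# Then the old server broadcasts a Move activity, prompting followers'
# servers to re-follow the target. Note that no post content travels
# with this -- it redirects identity and followers, nothing else.
move_activity = {
    "@context": "https://www.w3.org/ns/activitystreams",
    "type": "Move",
    "actor": "https://old.server/users/alice",
    "object": "https://old.server/users/alice",
    "target": "https://new.server/users/alice",
}
```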

Right. I equate it to kind of like if I move my house, my new house is somewhere else, and I can tell the post office “Forward my mail for 30 days.” And it’s like “Hey, keep sending me this mail if it comes here. It’s actually for me. Please just do that for me.” Do you see that happening a lot? Do you see people going between servers frequently?

It depends. I mean, that’s again the fun part about the Fediverse, is like the graph is always changing. Servers are coming and going. So I would say in general, yes. We do see that happening a lot.

And that was one of my fears, of like these servers get shut down because they’re either too expensive, legal reasons, people aren’t interested in running them… Whatever reason. And that’s like my post office shutting down when I told them to forward my mail. That post office doesn’t exist anymore. Nobody has that rule to forward your mail. Sorry. Like, you’re just a new person.

Yeah, exactly. It’s just like “Oh, hey, X site is offline.” That domain name doesn’t respond anymore, so there’s nothing there, right?

Immediately I thought you meant like X, the site X.com.

Oh yes, sorry.

I was like “Is X down again?”

I meant that as a generic domain.

Yeah. If X is down, you get more users on your site, so…

Yes. I keep forgetting that that’s the domain now.

I still call it Twitter.

I do, too.

I call it Twitter when I want to talk about features that I liked.

It’s awesome.

X is like the derogatory term… And it’s scary to lose all your like content on Mastodon.

There is an export feature that can give you your content. It gives you a zip, similar to what – I mean, because people forget that X has that, and Facebook has that. And this came out when – what was it called? Google Plus? Yeah, Google Plus.

The waves, that – I forget.

Yeah. It was the social network… There was this huge legal battle over making companies allow you to export your content. And all these big social networks ended up with this – it was called Takeout. That’s what it’s called. Google Takeout. Google was the driver of this, because they were like “We want people to move to Google Plus, and so we’re going to make this a legal battle to make Facebook do it, we’re gonna make Twitter do it, we’re gonna make Instagram do it…” And so you can go to all those sites and there’s a legal reason, at least in the United States; I don’t know about the rest of the world. But you can go download your content, and they give you a zip. And the Twitter export, if you export your data, you get like a little mini website, and it shows you all your stuff in one place. And the idea was that you could then go somewhere else and import that into another social network. It’s like “Hey, here’s all my stuff. Please import this for me like it was yours.”
And I know there were some projects that started up around that, that would be like “Oh, we’re going to take your Twitter and put it in Tumblr. We’re going to do those sorts of things.” But it never really caught on at any real scale, and then Google got bored, and they turned off Google Plus, and all the stuff that normally happens.
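Circling back to the Mastodon export mentioned a moment ago – under the hood it’s ActivityPub JSON, so once you’ve extracted the archive it’s easy to poke at. A minimal sketch, assuming the standard outbox.json file that Mastodon’s archive export produces:

```python
import json

# Minimal sketch: inspect the outbox.json from an extracted Mastodon
# archive, which is an ActivityStreams OrderedCollection of activities.
with open("outbox.json") as f:
    outbox = json.load(f)

print("total activities:", outbox.get("totalItems"))

# Posts are Create activities wrapping Note objects; boosts are Announces.
items = outbox.get("orderedItems", [])
posts = [a for a in items if a.get("type") == "Create"]
boosts = [a for a in items if a.get("type") == "Announce"]
print(f"{len(posts)} posts, {len(boosts)} boosts")
```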

I might go export my Twitter feed…

I deleted all of my Twitter, and I kept an export. So I have an export.

Can you mass-delete your Twitter without deleting –

Not anymore because of the API limits. You used to be able to. So I did it before the API limits came out. And so you can pay for a service to mass-delete your X posts… Because they’re like “We will pay to get more API quota for you.”

There’s also some free scripts you can run, that’ll be like “Hey, delete my last 2000.” And then the next week you have to run it again…

Over the next day, yeah.
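Those delete scripts all boil down to the same loop. A hedged sketch against the X API v2 delete-tweet endpoint, assuming you already have a user-context OAuth token and a list of your tweet IDs (say, from an export); the strict rate limits are exactly why the free scripts have to chip away in batches:

```python
import time
import requests

TOKEN = "YOUR_OAUTH_USER_TOKEN"  # deleting requires a user-context token
HEADERS = {"Authorization": f"Bearer {TOKEN}"}

def delete_tweets(tweet_ids: list[str], pause: float = 3.0) -> None:
    """Delete tweets one API call at a time. The sleep (and the early
    exit on 429) is the whole trick -- rate limits are why these
    scripts end up running in daily batches."""
    for tweet_id in tweet_ids:
        resp = requests.delete(
            f"https://api.twitter.com/2/tweets/{tweet_id}",
            headers=HEADERS,
        )
        if resp.status_code == 429:  # rate-limited: stop, retry later
            print("Rate limit hit; run again tomorrow.")
            break
        time.sleep(pause)

# The IDs might come from the tweets.js file in your account export.
```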

Yeah. Now that I’ve moved so much over to Blue Sky, I think I might just mass-delete my Twitter, because just the shady things that that man is doing. Did you see the thing today about how he says he talks to Putin all the time? Just the amount of data that he has on all of us, and…

[01:04:02.29] Yeah, I deleted 16 years of content. I was on that site, I really enjoyed it, and I was just like “Nope, not doing it. I have to get out.”

I just don’t want to contribute to the data and to the things that he’s doing that are wrong, especially after just seeing like how much icky stuff shows up for my business account that follows nobody… It’s just very clear where we’re going.

I am still waiting to shut down my business account – I run my company’s X account. I’m like “As soon as we get more followers somewhere else, I’m just shutting this down. I don’t want it here.”

Yeah. As soon as my Blue Sky started to outpace Twitter, I was like “No. It’s icky. I don’t want it anymore.”

Done. Let’s go. Yeah. Going back to – you know, the largest cost for Hachyderm is generally storage, and it’s media storage. So one thing we’ll have to figure out probably sometime soon is what’s the cost of infinite, unbounded storage.

If we just kept everything forever, we wouldn’t be able to run Hachyderm. So at some point, what does archiving look like? And then what’s the user experience impact? It’s like, okay, if I wanted to go export 16 years of Hachyderm history, do I lose the first 14 years of images and things that I posted?

That’s true.

Yeah. That would make me sad.

Yeah, exactly. But then at the same time, we can’t keep that stuff forever, because then we’ll be paying $10,000 a year on S3 storage or whatever.

Yeah. That’s fascinating. Like, you want it to live for a long, long time, and that storage isn’t going to get smaller…

Yeah. It’s like, what’s that balance.
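One common way to strike that balance is object lifecycle rules: keep recent media hot, shift older media into cheaper tiers, and eventually expire it. A sketch using boto3 against S3-compatible storage – the bucket name, prefix, and day thresholds are all made up for illustration:

```python
import boto3

s3 = boto3.client("s3")

# Hypothetical lifecycle policy for a media bucket: recent uploads stay
# in the standard (hot) tier, older media transitions to cheaper
# infrequent-access and archive tiers, and the oldest is expired.
s3.put_bucket_lifecycle_configuration(
    Bucket="example-media-bucket",  # placeholder bucket name
    LifecycleConfiguration={
        "Rules": [{
            "ID": "tier-old-media",
            "Status": "Enabled",
            "Filter": {"Prefix": "media_attachments/"},  # placeholder prefix
            "Transitions": [
                {"Days": 90, "StorageClass": "STANDARD_IA"},  # warm
                {"Days": 365, "StorageClass": "GLACIER"},     # cold
            ],
            "Expiration": {"Days": 1825},  # drop after ~5 years
        }],
    },
)
```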

I mean, there’s definitely a balance of how much you keep in a CDN, but there’s also the question of those hot, warm, cold storage tiers. Are there any other long-term risks, do you think?

Other long-term risks… I mean, from a customer experience perspective, or a consumer experience perspective, I think that’s probably the main one. The cost of storage just keeps going up. Other big stuff could be political jurisdiction, legal jurisdiction type stuff changing… That’s always a thing. To me, those would be the big ones. But I think as far as just core infrastructure goes, we can run this thing for a while. We’re always looking for volunteers… Again, like I said, even if it’s like you’re going to rotate in for a couple of weeks, and do a little project here or there, I’m always happy to have folks come help out.

I frequently recommend this to people who are like “I want to get into technology. I want to do infrastructure. I don’t have any experience.” I’m like “Go find an open source project like Hachyderm. Help out.” I send people to the Fedora project, the Kubernetes project… I’ve sent people over to Hachyderm… I’m like “If you want to know how this stuff runs, there are people in Discord that will literally just sit with you, let you watch them do some work, and then hand over the keys and say ‘Hey, let’s do this together.’” And then you’re going to get actual professional resume experience, where you can say “I helped with scaling this project”, that kind of thing.

So if you’re trying to get into this space, there are open source projects, there are foundations like Nivenly, that create and run infrastructure on the public internet that will be beneficial for you to understand, and you can get it for free if you just spend your time there.

It’s a great learning experience.

Absolutely. Absolutely. So yeah, if you want to do Linux, sysadmin work, Postgres work, Ruby on Rails scaling, distributed systems scaling… Yeah, come on.

Yeah. I see it so often on LinkedIn, Reddit, wherever – people are like “How do I get into infrastructure with no experience?” I’m like “Go find some experience. You can do it at your home lab, that’s fine… But open source projects are asking for help.” And you can go help out. You can go say “I’m going to show up here once a week, twice a week, whatever time you have”, and they’re like “Great. Here’s some docs. Let us know where there are problems in the docs.” Even that’s helpful in so many cases.

Yeah. Even just documentation work.

If I move out of engineering, I think I’ll definitely join one of the Kubernetes release teams just to like not lose experience doing [unintelligible 01:07:58.20]

Yeah. Kubernetes – I was on the release team before, and it’s a lot. There’s so many sub-projects that roll up into this thing. Yeah, it’s a lot of fun. So Preston, thank you so much for coming on the show, thank you for talking to us all about Hachyderm… If people want to find you, what’s your Hachyderm handle? Where should they…?

So I am esk@Hachyderm.io. I don’t post a whole heck of a lot. And when I do, it’s usually like synthesizer stuff.

No one can see your background right now, but I was looking at all those synthesizers and wires back there. I’m like “That looks like fun.” Like, I don’t need another hobby…

Yeah, it’s a fun hobby. It’s kind of my medium in between doing, I guess, real-world engineering – where I get to work with electricity and stuff like that – and it blends that with creativity, because you get to sometimes create music, usually create noise.

It’s basically engineering, too.

There you go. Exactly. No difference.

You’ve described all my Git commits. This is fine.

Yeah, exactly. Exactly. It at least gets me off a computer screen, how about that?

A different keyboard.

Different keyboard. Yeah.

I don’t have a mechanical keyboard. Mine has [01:09:04.01]

Exactly. Exactly.

Alright, thank you so much for coming on the show, and we’ll talk to you again soon.

Absolutely. Thank you so much.

