Adam is on location at ZEIT Day talking with Jessica Rose about burnout, Henry Zhu about his passions and pursuit of open source, and Simon Willison about data and his passion for interesting datasets in the world.
Rollbar – Our error monitoring partner. Rollbar provides real-time error monitoring, alerting, and analytics to help us resolve production errors in minutes. To start resolving errors in minutes, and deploying with confidence - head to rollbar.com/changelog
Linode – Our cloud server of choice. Deploy a fast, efficient, native SSD cloud server for only $5/month. Get 4 months free using the code changelog2018. Start your server - head to linode.com/changelog
GoCD – GoCD is an on-premise open source continuous delivery server created by ThoughtWorks that lets you automate and streamline your build-test-release cycle for reliable, continuous delivery of your product.
ZEIT – ZEIT is on a mission to make cloud computing as easy and accessible as mobile computing. Special thanks to the team at ZEIT for inviting us to work with them on ZEIT Day. We're honored to be involved.
- Henry Zhu: In Pursuit of Open Source at ZEIT Day 2018
- Henry Zhu on Patreon
- Babel on Open Collective
- The React Podcast #4: Babel and open source sustainability
- RFC #18: Maintaining a Popular Project and Sponsored Time
I personally connected pretty deeply with your (pod) with your talk, because -- I almost said "podcast", because it's on the brain...
Because I think everybody's experienced some level of burnout, whether they admit it or not, and I kind of empathize with you because you said you're getting older, and I'm also realizing that I'm not immortal and I'm getting older, so I realize that things start to hurt, or it's harder to wake up and be excited, even though I run the thing, I'm in control of what I'm doing... Aside from not really being in control.
Yeah, it's one of those things where as we're getting a little bit older - and I think part of that isn't just getting older, but I think it's getting either this space within the industry, or sort of to backbone ourselves to say "Oh no, wait, I am tired. Oh no, wait, this IS hurting me." So the folks I see really impacted by burnout and swept off their feet are the young ones who feel like they're invincible. It's like, "Oh no, I'm going to be fine. I can work these 80-hour weeks..."
Weren't you there, though? I was there at one point in my life; I was the young one who was invincible...
I've been old for forever.
I can remember pulling all-nighters often... I remember having insomnia and not being able to sleep just because, and still getting up and doing the work. I can't say I was the best at it all the time, but I remember days where I felt young and invincible...
And I think that's a really important point. When you look at doing meaningful work, your ability to do meaningful work ends after six or seven - eight, if you wanna go by standard workday hours - but that ends a lot earlier, and people keep working.
I saw a really interesting -- someone said something really valuable to me, and I think they got it off the internet, where they say "If you're going at 150%, you're really just taking out a loan from your future self."
Yeah. Can you talk about your very visual -- so it's probably difficult for a podcast, but this visual process of like memory load... Can you just remind me what the term was for it?
Oh, fantastic! So I'm really into pop psychology, and I'm really into pop psychology from the '60s, because I'm a very specific kind of nerd, and cognitive psychology was sort of a -- someone is gonna hear this and fuss at me for what I got wrong...
No, I'm delighted for it... I'm pretty laid back about being wrong. Cognitive psychology was a period -- it was a specific branch of the study of psychology, and folks started thinking about and talking about the brain's organic processes and psychological processes in computer terms.
So when you talk about working memory from cognitive psychology...
Working memory, that's right.
...it's exactly what you think it is - it's the amount of memory, it's the amount of space you have available for mental processes, whereas cognitive load is the amount of working memory you have booked up; it's how many mental processes you're doing and how much of that cognitive load they eat.
[00:04:16.26] I often like to ask people to visualize having too many browser tabs open - that's eating up too much of your browser's working memory.
Right, right. Last time I actually had a bout of this - I opened up Slack with the intention of going to a particular person's private conversation to pull back... Essentially what e-mail is, very similar. I'm going into Slack to get some information I know they shared with me; I do a couple scrolls (I knew I had to do that), but before I knew it, I found out later, something else grabbed my attention, and I'm in Slack and I'm like "What am I doing?" I didn't even do the thing I was supposed to be doing, so...
I was pretty tired last night. I flew in from Houston to San Francisco, I went to bed early, when I normally don't go to bed very early - maybe 10, 11, 12, or whatever, but I was--
Wait, 12 being going to bed early?
If my wife's listening to this, she knows I'm lying, because it's more like 1 or 2... I'm stretching it a little bit. I'm just trying to like not embarrass myself, basically...
Well, it's one of those things where it's like "Do as I say, not as I do", to be like "Oh yeah, take care of yourself kids, but I'll stay up all night and worry."
My wife's like "Do you have to work tonight?" I'm like "I don't have to, but I probably could do one or two things to make tomorrow easier..." And it probably makes it a little easier, but it still probably makes it just as hard...
I love how -- listeners can't possibly catch this, but as soon as you said that, you made clearly the face your wife would have made, that sort of gentle eye roll...
Yes, yes... And she realizes that I try my best to do balance. It's tough, it's not always easy, but I can feel like I teeter on burnout... And I kind of say it's okay in seasons. Have you ever heard of this term?
I have not heard this term.
I could accept being burnt out for a season, because I know I've got a lot going on... Say, the first quarter of the year, or the first half of the year... Or some kind of commitment where I say "It's okay for this length of time", but then once I get there and I'm starting to do what you said in your talk, I start saying no to things, cutting things out, in positive ways, that don't hurt you. But I'm making it about me; I wanna get back to you, and parts of your talk.
Something you said in your talk was like -- and I think what a good message might be to share is like, it's okay to be overloaded, but to be realizing it and then say... Don't just like, eject all the things, and then screw your life up, you know? Pull out things that -- you said it in such a way, basically, that it seemed like "Don't hurt yourself in the process of saying no to things", and removing things that are bad for you to do and taking up too much of your time in your life.
Even when you're critically burnt out, there are some things that you have to do to sort of save your future self. For me, when stuff got really bad... Internet grocery shopping is absolutely my savior. Buying a bunch of "healthy" frozen burritos, getting an accountant to make sure I was paying all the things that needed to be paid, and even outsourcing some stuff like having some folks help with cleaning and laundry, which is a massive privilege, and I feel so lucky to have been able to do... But just either getting other people to help, or me myself making sure that the stuff I'm dropping isn't gonna make my life much more difficult in two months.
But I think a lot of our dear listeners may not have sort of the funds or the flexibility to do it. The advice I would still give is when everything seems overwhelming, "I've gotta do this, and the other thing's due" - really triage what's going to doom you... Well, everything feels like doom when you're overloaded, but what's going to make your life more difficult if you don't solve it. You probably don't have to call your aunt back. You probably do have to pay your tax bill. Like, yeah, just triage where you can.
[00:07:57.29] There's a book by Brian Tracy, that may not be the best prescription for this, but I think it's called Eat That Frog, or Eat The Frog, or something like that. Have you heard of that?
Essentially, it's not a great to-do list type of a thing, but essentially, do the thing that you have to do that day, first. It doesn't mean literally eat a frog, it just means that if you've got this one big thing to do today and today's a success because you did that thing, do that thing. Don't wait two or three days to do that thing and call your aunt back, or whatever...
Instead of taking my approach, where it's like "Oh, I'm really dreading this, so I'll just put it over in the corner..." Do the dread thing first.
That's his basic advice. I'm not sure if it's the best advice, because it's difficult to do that, but I think what he means is just don't put the things off that matter most too far. Do them pretty quickly.
You shared this story too about -- Peter may not realize this, but right now it sounds like you may be in a case of burnout; not so much from our conversation, but from your talk.
No, I was really critically burnt out...
...seven, eight months ago, and I got really lucky. Being able to successfully work through burnout is really rare. But I'm working over at FutureLearn now. We're like an edtech platform, and it's the most reasonable place I think I've ever worked. I'm managing a team of really brilliant engineers. I go home right at five, and I don't answer e-mails, and I don't do Slack, and I get to fuss at my team too, like "Go home right at five, or at six", and not do... The only thing I've ever fussed folks out for is like "I saw you were emailing on a weekend!"
Yes, I like that! We've been doing that... I'm not the best at it personally, however I do like it when my team is that way, because it lets me know they care about balanced life. I don't ever wanna make them feel like they have to do something that's for us, outside of like normal hours. It's just like "I don't want you to feel that way."
But an important part of leading in that way is making sure you don't do stuff that they see...
I know, I'm kind of a bad example. I'm working on it. So that's part of burnout, too - you may not be able to get back to like perfect you in this burnout stage, to get back to some equal balance... I think it takes time. I'm a big fan of iteration. I realize that today's delivery might not be perfect, but it's good enough; it's enough to put out there, get feedback on it, and learn how to better course-correct for the next iteration. To me, that's a pretty profound thing.
So what does burnout feel like? What would you describe burnout as? If someone's listening to this and they're thinking "It kind of sounds like I might be burnt out..." What do you think burnout is? What was it to you?
Fantastically, there's a ton of research around this that says burnout is effectively indistinguishable from clinical depression. Yay, it's just great!
That's pretty common though, right? Being depressed. Or clinical depression... Is it different?
Clinical depression is just diagnosable depression.
It's really challenging, because the symptoms of occupational burnout both mirror depression and other mental illness issues, and can trigger them.
So it's like a one-two whammy of misery. If you are listening and you're concerned, like "Wow, maybe I'm burnt out", the Mayo Clinic has a really fantastic checklist, which you can search for online and go through.
A checklist... I'm assuming if I took that checklist right now, they'd probably say "Yes."
If you were in my talk, if you said yes to any of those questions...
Yeah, I said yes to a couple things.
I was [unintelligible 00:11:26.15] I didn't wanna say yes to any of those things, but I couldn't help but doing so. There's times when I get up -- I love what I do, and it's a shame, because I do really enjoy what I do... But not every day. I don't always enjoy it the first thing in the morning. And I was really listening closely when you said you have this bug, or this thing you're dealing with, with a code problem, or whatever... "Oh, by the way, I've gotta take care of the laundry, I've gotta help with this..." - that's what happens to me sometimes. I heard myself in your talk today, basically.
[00:12:04.23] And everybody's brain works really differently, and neuro diversity is one of the most exciting things about the way people engage with the world around them, and mental processes and social processes are so cool... I think wedding self-care into the way we view our own mental processes and approach the world is absolutely critical.
What is that? What do you mean by that?
For me, when I was burnt out and coming through recovery, one of the hardest things was you aren't running on all cylinders cognitively, so I had very serious memory issues, which is quite common; I was consistently despairing and miserable, and I had a really hard time seeing success in my own work. So I'd do something, I'd do a measurably good job, and I'd be like "Well, I guess that's okay..."
And for me, if you've heard me wandering around the halls today, you've probably heard me sort of [unintelligible 00:12:58.11] "Everything is fine..."
To other people as well, when they're like "Oh, this thing...!", I'd be like "Everything is fine..." So I had to kind of -- not quite home CBT, but work to build new pathways. If I was like "This is terrible, and everything is bad", I'd be like "You know, it's probably fine..."
It reminds me of something that happened on my trip here. I had to drop off my car in parking and take the -- I guess it's some sort of bus over to the departures... And I was in a rush; she was only supposed to take C, but she decided to pick up A and B as well, and then everybody started piling on, and then I had to scoot over and share half of my seat with somebody.
Long story short, I was just like -- she thought I was complaining, but I wasn't... But I liked her response to her thinking I was being an abrasive person. I may have been, but I definitely wasn't trying to be... But she says "It's a beautiful day." And I was like, "I'm not a bomb, but you just defused a bomb." Because you can say that thing -- you say it to yourself, it's not like you say it to other people too, but it's that one thing you can say that's like "You know, everything's fine. It's a beautiful day." It's just something you can say to sort of like take away the stress and crazy chaos that might be coming. What do you think?
I was gonna say, I've got something that sounds very nice and placid, but actually means "Piss right off!" and I [unintelligible 00:14:21.02]
Sure, you can do that.
One of my favorite things is "Oh, well perhaps you know best...", which does mean like "Um, please."
She's mouthing something. [laughter] We try to remove the explicit tag for our podcast, for our young hacker fans out there, but she's...
I think if I'm ever in a situation with somebody being terribly unpleasant, I'd be like, "Oh, well I suppose you know best." Folks know what you mean, I hope...
The one thing I would say about burnout, because burnout so closely mirrors depression, is that if anybody listening to this thinks "Oh wow, maybe that is an issue for me", one thing I'd encourage people to do almost immediately (well, immediately) is seek some kind of healthcare. So go and talk to your doctor, talk to them about how you're feeling... Just because it could be burnout, it could be depression, it could be one of a dozen physical or mental health issues. Just going and making sure that you're not listening to a podcast and going "Oh, well Jess said just balance my--" No, no! Go see your doctor, please.
Right, right. This is not medical advice; this is only two people talking about -- sharing war stories of burnout. Jess, in your talk you mentioned that your doctor had actually written your prescription to quit your job. Can you talk about that?
It was very jokingly.
So that wasn't real.
No, she did.
It did happen, okay. Can you share the story?
[00:15:47.27] So I went in to talk to my GP, and I'd been working as a contracting consultant for a while... And I went in to chat with her because I was having a really hard time sleeping, and I was lethargic... I was just pretty miserable, and my GP (general doctor) is just absolutely glorious; she's very sarcastic and absolutely delightful. She was like, "Okay, okay... Okay... Well, I can give you this antidepressant, this antidepressant, or this antidepressant, but you should probably quit your job and we can skip all of those." And I was like "Are you telling me that's your medical advice?" She was like "Yeah, yeah, look!" and then grabbed a prescription pad and just wrote "Please quit your job."
Wow, I mean... Technically, she did write it on paper. That was her medical pad -- I don't know what you call those things; prescription pads. It's a legit thing, right?
It was just a bit of stationery, I don't think it was a proper prescription.
Oh, okay. Cool. I was taking it a little further, but...
No, you want it to be like a sticky pad, where they tear off at the top -- no, no, not quite so romantic.
Okay. So did you quit your job?
It was a contract, so I got to the end of my contract, and then yeah, I got into something that was less stressful... Which isn't fair to the contract. It was a completely reasonable job, and completely reasonable people, but had 60%-70% travel, and the early stage startups, you know...?
Yeah... Just unhealthy for you. Not so much unhealthy in general, like to somebody else who could do the role. Just not the best for you.
Well, I think developer relations folks talk about there being -- you could do it maybe two years... And at the time I'd been doing it four or five.
There's actually several people I know that have been in head of something, of dev evangelists, or relations - whatever name you wanna apply to that role; essentially, telling people about products to encourage you to use them through developer means... Whether it's how to use something, or a demonstration, or meetups, or whatever. There's a lot of people I've seen do that job that eventually do for sure burn out.
Oh yeah, we just drop off the map for a year and a half, two years, go to Thailand, backpack... Yeah, folks do tend to take a chunk of time off.
Well, it's fun to do, right? You may go into it thinking "It won't be me. I can conquer this job. I can travel the world and not get burnt out. I can put in 15-hour days, seven days a week", or whatever the time constraint is... What's a normal day? Is it like 10-hour days, 15-hour days?
It really depends. I'm such a ham. I was doing a lot of traveling and speaking. I think I was in maybe 50 countries in two years. I was doing maybe two or three conference talks a week. My husband switched to being a house husband just because I would come home and need somebody to like swap out laundry and run my life admin. I don't necessarily recommend -- I love DevRel, I absolutely recommend doing it; I don't necessarily recommend that level of travel. I absolutely recommend house husbands, though. They're just glorious.
Is that right?
Yeah. Everyone deserves one.
[laughs] Well, how do you get one?
I suppose you marry someone who is more caring than job-focused.
Okay. Interesting. Parting advice - that's always good. What's on the horizon for you? Where are you going? What's next for you? What's some good advice to give back to people? How can we tail this out?
Oh dear, that's so many questions.
You've got options.
So for me, I'm doing a lot less traveling and speaking, and instead I've been focusing on stuff that's scalable and I can do from my flat... So I run a podcast myself, the Pursuit Podcast...
...which you've gotta scroll down a bit, because there's also a lot of church podcasts with similar names.
The Pursuit Podcast - yeah, I guess so. So if you're searching for it, you may find a bunch, but if you go to...
It's the blue one.
What's the website?
If you go to pursuit.tech, you'll find us there.
It's a good domain.
It is, pursuit.tech.
If you wanna fuss at me directly, I'm at jessica.tech. Always very on-brand.
Okay. What is the podcast about?
So we tend to focus on a different topic each month, and we just talk to folks in that space, and about their experiences and about what kind of advice they could give. So we talked about wearables this past month, and then next month we're gonna be talking -- oh dear, that's like tomorrow, isn't it? In May we'll be talking about different types of funding. We'll be talking to somebody about bootstrapping, we'll be talking to somebody about working with VCs...
Yeah, I'm pretty excited for it.
When you read the summary of it, what's the promise of it?
Oh, man... I wrote that like two years ago, so I think it's "Your guide to getting the things you wanted..." Oh dear, oh no... Um, it says some stuff.
It says some stuff. Okay, well listen to Jess' podcast and learn some stuff - Pursuit Podcast. Is it pursuit.tech?
Pursuit.tech, or @pursuitpod on Twitter.
Okay, easy enough. And parting advice...
Parting advice is absolutely fall in love with saying no. The way I got so burnt out was like "Oh yeah, I'd love to come to your conference. I'd love to help with this project. Of course I'll work overtime." And if you learn to say no judiciously earlier, you can make your life a lot easier.
Saying no is the toughest two-letter word in the English language. It's so tough, but I can agree with you that saying no has its fruits, because... I've said no to a lot of stuff, and have seen the rewards of that, so it's definitely got a good thing for it.
And once you get used to it... Being a little bit burnt out - I have fallen in love with it.
Are you familiar with Derek Sivers, by any chance?
I am not.
Derek Sivers wrote several blog posts... He started CD Baby; he's got a really interesting life. He's a really interesting person, let's just say that... And he has this way of saying... It's either -- when he says, you know, a decision to do something for him is either "Heck yes!", or "No." He's either really excited about it, or he's definitely "No." Because if he's not really excited about it, he doesn't have time. Almost like in your talk, where you don't have enough hacks for something - it's like that for him; it's like "I don't have enough to do this all-in like I do things, so then it's a no."
That makes quite a bit of sense.
That's his way. I kind of like that one.
I might politely disagree a little bit, because some of the stuff I've found was rewarding is stuff I never thought I should do or that I'd enjoy.
I could kind of agree with that too, because there's been times I've said yes to things I'm like "Ain't gonna work out... It's not gonna be fun." It worked out, and it was fun.
It was great, yeah. Kind of the night out principle - when you're going for a night out with your friends, the nights where you're like "Yeah, this is gonna be great!" are often duds. And the times where you're like "Oh, I'm not sure I wanna go out..."
Best time of your life.
Best time of your life.
Yeah, I could agree with that.
Henry, you're in the middle of an experiment which I think to some might seem risky, but you've got it kind of fleshed out. You've been down this road, you've been doing open source for a while; you've even said you've accidentally become a maintainer at one point, I believe, either on the React Podcast, or Request For Commits -- somewhere I've heard it, but I'm not really sure where. Where is this leading you to? You're in open source full-time now, but solo. A company is not paying you to do it, you're doing it yourself.
Right. The reason why I left - part of it was that I don't think a company would pay me to do things I want to explore and experiment with. Mostly they're gonna pay you to work on features, or just the things that they need for their business. If you talk about "How do we build community or bring new contributors?" - they want that, but they're not gonna pay you specifically to do that. Or like "What does it mean to do mentorship?" when it's like an intern, but they wanna hire them to work on other things, not open source. All these companies don't wanna take ownership of open source (or Babel, for me) and it's like "Where can I actually do that?", and I feel the only way to do that is doing it on my own then.
It makes sense... A business has to have some sort of value from the exchange. At some point, you'll be your own business, so you may have to have similar considerations of how you spend your time - is it valuable? Does it help me? Or you may actually have to make a choice of being able to earn money or serve open source - what do you think about that?
Right, because the things I wanna do aren't things that people would normally pay for... So it's like, either I have to make money doing freelance, which I don't wanna do, or get enough money from doing maybe consulting, or support contracts, or workshops for Babel, such that I can do the other work.
This is why I made the Patreon, because then people can support me for whatever I'm doing. They wanna support the person, rather than the project. But I found out for most companies they only wanna support the project, because it's just better for them from a business point of view, or just talking about it to their boss; they don't wanna support a person.
But individuals would rather support a person, because they understand who you are, and you're literally supporting their livelihood. I think that's more like something they wanna contribute to, rather than a project, where it's like - it could be a lot of people, you don't know how you're spending it (What are they doing with this money?).
Learning to figure out how to spend the money for a project is really difficult. "What does that conversation look like?" and stuff.
So let's not assume that the listening audience is familiar with your story. Give me a quick version of not so much your getting started, but this transition - you worked at Behance, you decided to make the shift... Give us that, and we'll go further from there.
Yeah, so first I got my job at Behance because of open source. I got involved in their project, and then they e-mailed me "Hey, do you wanna move to New York and work on this team?" So I was already wanting to do open source more, so I was like "If they found me through it, that means they care about it." So I was kind of doing it, but I was working on the product team, because at Behance they're mostly focused on that... So it didn't really make sense for them to have someone working on it, like just for open source.
But halfway -- not halfway through it... Like a year later, I'm working on Babel in my free time; we use it at work, so why aren't we -- like, we shouldn't wait for me to work on it if we need something... So like "Well, why don't you let me work on it at work then?"
I think I asked for full-time and they gave me half, which was already amazing... So I was like, "That's cool. Okay, I'm good" and I kind of just started working on that. But the problem was that that's always gonna be hard, like "What does it mean to do 50%? Is it like half of my sprints are Babel and the other half is [unintelligible 00:27:53.13]?"
[00:27:54.17] But how did that work out actually practically? Did it work out? Were you still kind of saying "My responsibility is just like -- oh, now half of it really is this, but it's not."
Yeah... I think it's kind of like that. It was an experiment for them too, because they'd never done that with other people.
I mean, kudos to them too for trying. Obviously, there's no hard feelings, but it's just like... That's risky for a business that's like brand new for them, even.
I think it's just that a lot of people there have previous experience and culture in open source, in other projects like jQuery and other projects, so they understand what that's like... They see where my passions are and they wanna support that, because I'm still working on other things, and then the things I'm doing only help us, too. So they were able to justify it.
It's just that in the end I would have my own guilt of not feeling good about doing open source even though it is my job, and... We have deadlines, so it's like, I would always tend to -- I wanted to do open source, but I'd be doing my other work. It's just like a weird situation, and... You know, everyone was really supportive, it's just like mentally I don't know if I could take doing it and then doing this other thing. So in the end I was like, I think I need to just figure out "Do I actually want to do this?"
I had lots of conversations with my boss about what is it that I actually want; do I just say I wanna do open source because it sounds like a good idea full-time? The reality is a lot of people don't wanna do it full-time. It should be just like something that's fine, and you shouldn't have to worry about it your whole day, every day, and dealing with community and all these other things which are not just purely coding. But then I realized that's the part that I like; I like working on those non-technical parts.
Anyway... On the board, when I said I wanted to leave, when I made the decision, they knew I was sure. They knew I wasn't just making it up, or something. My boss was like, "If anyone can do it, I think you can do it."
Were you concerned in those moments? I mean, I can just imagine me - I'm not you, obviously... I would be nervous. I would be like "Gosh, what are they gonna think? Is the world gonna think I'm an idiot for doing this? Is this the wrong move?" Take us through some of the morning of like, "I know today I'm gonna give my resignation..." How is it gonna go?
I don't think I ever thought "People think it's gonna be dumb or bad." I actually never thought that. It was just more like "What's gonna happen to me? Am I just seeking after something and taking the risky road, instead of just having a safe, comfortable job and working on it half-time? Maybe I should be okay with just working on it half-time, which is already better than a lot of people." But I guess in the end I felt so convicted to wanna do it that much that I decided that was the right thing to do, and when I went there I felt pretty confident in going in and doing it.
In the end, I didn't really have any issues, when I got to that point. But before, there was a lot of just not even wanting to think about it in the first place, because you're like "That's such a dream" or "That's such an impossible thing", that I never just thought about it seriously. So trying to do that, thinking about the taxes and insurance and all that... You know, just trying to make it seem more realistic; not in terms of like it's easy or hard, just that "Those are things I'm gonna need to think about if I'm gonna do this."
Maybe I'm just missing it, but what is a day like in your life? Not so much Henry's life, like "Get up, brush your teeth" kind of thing, but what is being in open source to you? You said you're excited about community, the things that they weren't willing to pay for, or the intangibles that are not so much code level, human level... What's that for you?
Yeah, that's something that -- I kind of talked about this in my talk; it's something I always wanted to do, but I never really did it. I would kind of half do it. It's like, you have a picture in your mind of what a maintainer should be, and usually it's about code. So even if I wanted to do something else, I knew like "Oh, I have like this box" where it's like "This is what a maintainer should be", so I just focused on code.
[00:32:11.00] But now it's like, okay, I don't really know what it looks like, but I want to figure out those things, so that means like -- right now I'm just focusing on getting funding and that kind of thing, but I do wanna spend my time like "Okay, should we make a Babel meetup, where instead of giving talks, it's just a meetup where we come... I'm there, and I'll teach you about how to contribute to open source or get involved in our project", instead of waiting for people to randomly show up in our GitHub, making a PR or an issue. Obviously, everyone is very intimidated, so if I'm there, then people will feel at least willing to show up and see what it's about.
Maybe we make a podcast, maybe we do livestreaming, or we start doing more like video chat instead of just talking on Slack or Twitter... Like, just things that make it so that I seem more accessible, and human, instead of just like some person... And also, so that people know that there's a few people working on this, not just a huge company, a huge team, or something like that; it's like a few volunteers, but yet you're using it... And then people feel like "Oh, I'm not good enough", but it's like, I got started in the same way, not knowing anything, accidentally, and I'm here, and I wanna help other people get to that, if they want to.
So in a lot of ways you're the on-ramps to Babel/future open source work...
Yeah, I think so.
And that on-ramp is defined by either face-to-face meetup workshops, intro to open source, first contribution to Babel or XYZ project, or "Here's how you use GitHub", or you name it.
It's hard, because then if you're doing that one on one, or that kind of thing, it doesn't really scale... But I think maybe we get caught up in this idea of just like wanting to tell everyone about open source, and "Everyone should do it, everyone should be a maintainer", but I don't think it means that -- people can, it's just that I don't know if they know what they're getting into... [laughter] So we wanna make sure that they -- it's not like a prepare, it's just like...
In a lot of cases, your warning signs...
Not in a bad way, but you have to be committed. It's not that the road is bad or hard, it's just a matter of like, you've gotta be committed to do that kind of job.
Yeah, and I think it's also fine if people wanna do those kinds of -- I guess we call them "drive-by PRs", or whatever.
Drive-by contributors, yeah.
Yeah, maybe you don't have time and you can't do it at work, or you have kids, you have family - all these other priorities, but you still might wanna do open source, and there should be a way for you to contribute, like maybe once a month or whatever, and that's fine... It's just that, if you go about it in a certain way, you might find that you're taking on too much and you don't realize... You're like "Wow, why did I suddenly make my own project and millions of people are using it?"
How do you think Babel will benefit from now you being essentially -- is anybody else full-time on Babel? Is there a full-time on Babel at all for the project?
No, the other main contributor, Logan - he isn't full-time, but he left his job a while ago and he's my other partner in doing all this... And everyone else is a volunteer. So we don't have any full-time people.
I mean, it's weird - I say full-time, but it's full-time for the project, but it's not full-time for the code. Right now, I'm basically not coding anything. I'm just trying to raise awareness, or get funding, which are like sales, I guess... And all these other things.
Yeah, business development; that's still part of it, it's just not what people normally think. And I'm mostly just doing like reviewing, and very meta level things, that I hope are like long-term things that we would never have considered before. Before, when you're at work and you have limited time, you're always thinking about putting out fires, like these short-term things; but if you're full-time, then why aren't we pursuing these longer-term goals, like getting contributors, investing in people, so that maybe some of them will be committed, instead of just like asking a hundred people to make a good first PR.
[00:36:10.15] So you've decided to go with the Patreon route, and I'm not trying to get in the politics of the different routes you could go to seek funding, so I'm not asking you to share the advantages of this platform versus another, but what is it about -- not so much Patreon, but how are you incentivizing people and/or corporations or companies to support you? Who is the customer that you're seeking? If you're doing business development, you're seeking out "Who can fund you? Who can believe in this mission? Who can help sustain you to make it happen?" Who is that person and/or company?
I should probably have a better, specific group of people, but there's a lot of different people that it could be. First, I'll say that we have an Open Collective which is another crowdfunding site that's for the project itself, so I can take the money from there, which is good, because there's not enough from Patreon right now.
The reason why I picked Patreon is because it's a way for certain people to be able to invest in a person, rather than in a project. I think there's gonna be people that would just rather donate to a person, other people would rather donate to a company, maybe companies would rather donate to the Open Collective for Babel itself.
In terms of incentives, I don't really wanna go out of my way to make incentives, and I think that's true for anyone that does Patreon, because it becomes a job in itself, of like getting people to--
Yeah, it's like the Kickstarter where you say "I'll just give you a bunch of free stuff, just give me the thing. I don't want the swag, I just want the thing."
Yeah, and then you're like, am I gonna ship people stickers? That's like a lot of work that you have to do. So all of my incentives -- I'm kind of trying to make it funny, actually... I have one incentive that's like $11/month, and it's like the ping-pong tier, because at ping-pong you play to 11, so it's like "If you donate to this, I'll play you in ping-pong." It doesn't mean I'm not gonna play ping-pong with you, it's just like -- that's like a reason for you to pick that.
A bonus, yeah.
I did one for video games, like Mario Kart. I had a 50 CC tier... So I'm just trying to get creative... You don't really get anything per se; it's just like, if you wanna donate that much, you can. And I did one for like board games. Just the things that I'm interested in, and maybe they're interested in that, too. So those are just individual people, and I guess it's gonna be difficult to get that, because it's more of an awareness thing.
I feel like once people know, they might decide to do it if they know me... But for the higher tiers, I might have like $100, $500 or $1,000/month, and that goes to the -- basically, emulating what we did in Open Collective, where it's more like, we'll put your logo on our website... It's more like an advertising thing. That is kind of -- it's not risky, it's just that I'm gonna have to keep pitching, because once someone's like "Oh, we're not getting enough out of this logo for $1,000, we don't need you anymore, and then we're not gonna donate."
So what happened was that a few days ago I had like $3,000/month, and then suddenly two people dropped out, they both were giving $1,000 for this next month, so now I'm down to only like $1,000. That's because these two companies - they're probably not that big, they're not like Google; it's just like two people, or something.
The goodwill, or charity almost... In a way.
Yeah, and maybe they wanna support me for a month, or they have their own money issues... So it's like, I don't expect them to do that. But it does suck when you're living off of this number, where like one day it's all gone, and then you're like "Okay, now I've gotta think about that."
Wow... Yeah, the ups and downs of crowd-- I don't wanna say crowdfunding, but I couldn't think of any other way to say it... Like, donation-based living, I don't know. You have that number, you look at your own Patreon, you see what's coming in or what you think is gonna come in, or what's expected, and you probably plan life around that.
It should also be said that you live in New York City, right? It's not exactly cheap to live there. I'm sure you've got fairly -- even if you trimmed your expenses, you'd be like... What size is your apartment now? I don't wanna say a smaller apartment, because maybe you're in the smallest one already, I don't know.
[00:40:09.02] Yeah, it's not enough to cover the rent, and then I have to pay for insurance, too. So even the base of that is already very high... But the good thing is that our Open Collective has a good amount of money there now, so I can at least use that. Then the other plan is to reach out to companies, which I've done - and hopefully those go through soon - and it's like, okay, one idea is this idea of a support contract, which is something that Webpack is doing, where you pay a certain amount, say it's like $1,000/month, and we'll give you two hours of time to help you with whatever.
Then that means I can work a lot less hours and you can still get paid. If you can convince them to do that, that would be really good. And you don't have to do that for a ton of companies, but the problem is [unintelligible 00:40:58.19] there's like a huge potential, but right now there's none, so it's like... Where is that?
So after hearing that, I would say maybe you could treat the human side of this equation for you, that these companies are also humans, right? Companies aren't just companies, there's humans behind that; users of Babel, potentially even contributors of Babel, right? Potentially, who knows? And that maybe this is a growth hack that you're using to 1) keep in touch with your constituents, and also potentially get some (for a lack of better terms) sales.
Yeah, I think it's definitely a good way in... And it doesn't have to be two hours; it could be more, and they could just pay more... I have a lot of other ideas around "What if they wanna be contributors? Well, you can pay me to help you get into open source", although I'll probably still have the meetup for free... But then for companies, it's like, well, we can charge them.
"Come to a workshop - how to do open source."
Some companies would be interested... I mean, I think most of them would just be like "Oh, can you give a talk at our company?"
Yeah. Well, you just put it into numbers for me, so that's why I maybe think about that, which is like, if you can say "Well, I need (for a lack of better terms) to have ten companies committed to doing two hours, because it equates to this number", because then you can start to determine it's gonna be there and start to live like it's gonna be there, and have some certainty that you don't have now. Well, how do you reverse that? Have some relationships.
What I've learned from our business, and you probably know this already, but the biggest part that makes us successful is that we care deeply about not just our listeners, but also our sponsors, which we really call them partners, but the industry accepted term is sponsor. We really call them partners - people who work with us, that sponsor our content, are our partners. We work with them, we form a great relationship, and it's only beneficial if we add value to them. Sponsoring our podcast isn't charity; they get value, and it's up to us to help them understand how they get value, and help them TO get value, not just read an ad.
So in your case, maybe it's the relationship side of the equation... Invest a little in those people, to get them to invest in you a little. That's just one way, though; that's just one of the things I'm thinking about.
That's interesting though, man... The uncertainty... Do you sleep well? I mean, given -- I'm not saying you're in a bad situation, because you're not, but you know, there's some uncertainty... Do you fret, do you get upset? How's life for you right now?
[00:44:01.11] It's interesting... I guess, oddly, I'm not too worried. In the end, I do have -- the backup is just like "Get another job", and it's fine. That's totally--
It's an experiment.
Yeah, and I'm willing to take it as far as I can, and I don't really wanna think of that as an option at all, really.
Do you have what they call runway? Are you like a mini Henry Zhu startup and you've got runway?
I have some savings, so I think I'm good, but unfortunately there's not that much rest in terms of like thinking about what's coming up, and yeah, there's definitely uncertainty... I don't know, I guess I'm very optimistic.
This is your choice, though. You chose to do this, so clearly you can't be that upset about your circumstances, because you chose the circumstances.
Right. No one forced me to do this... People would probably tell me not to, and I have to go out of my way to be like, "Okay, I think I truly believe this is a thing", and maybe writing the blog post or giving this talk helps me to form my thoughts better, to think "Okay, this is something I wanna pursue, and a goal that I want not just for me, but for other people, too."
Okay, other people, too. So are you in a position yet to advocate others to follow in your footsteps? Are you still in an experimental stage, only to the point where you're like "Let me try this out for a bit and then I'll let you know?" I know that Feross Aboukhadijeh is also doing it... There's several others. We had a small list of people we actually wanna do a panel show with - you were one of them - around this topic of going full-time crowdfunded open source maintainer perspective...
No, I actually don't think people should do it. For me, I spent a whole year thinking about whether this should be a thing or not... And I didn't wanna go rash, and just like "I quit my job! I don't wanna do this anymore!" and just do my thing. I wanna really know that this is what I want, before I do it. But I didn't figure it all out before making decisions, because I felt like then it would take forever.
You had to at some point take a leap.
Yeah, and that's what I did. And I don't think my goal with doing it is to convince everyone else to quit their job and do full-time open source. That's definitely there, but I just don't see that as that viable at the moment. I would rather help people to be more aware of open source in general, and how we can support people that are doing open source, whether they're full-time or not... And probably the most important thing is "How do we get companies to either sponsor projects they use, or allow their employees to work on open source at work?"
That's probably the best way, because they're usually unwilling to give money, but they're willing to let their employees work on the things that they want if they're going to ask for it. But the problem is that a lot of employees don't, and there's a lot of reasons, whether it's being intimidated, or not knowing what to contribute to, or not knowing how to say it... But I feel like there's a lot of people that would want to do that, especially if they're busy outside of work. Not everyone has the privilege or even wants to work on code. You're coding all day, and then you go home - are you gonna code again? It makes sense, you wouldn't wanna do it.
You may have answered some of this there, but... I'm not gonna ask you for the five-year plan, but I'm gonna ask you for some -- project for me a time range... This is how I kind of work - "What is success?" A year down the road, a year-and-a-half down the road? What does success look like to you? Like, you've done this, it's successful, you can advocate others to try it, given some circumstances, for example... Maybe have put in some time, have relationship with the community; not just jump ship, of course, but give me maybe what do you think success is to you a year, a year-and-a-half down the road? Or just whatever timeframe makes sense for you.
Well, one measure of success that I might think of is the idea that I could just go away, leave the project, and everything is good. So I could leave it now, and -- I kind of talked about this in the talk... You have this sense of like pride, where like "If I leave, everything's gonna go bad, and no one's gonna take it up", but that's not the case. I'm not that irreplaceable.
[00:48:10.26] But obviously, I still feel like I'm needed (or whatever that means), and it's like "When do we get to a point where we have enough contributors?" I don't wanna say the bus factor... My boss - he said this is the lottery factor; that's like a better way of putting it.
Okay, I haven't heard of this one.
Yeah, it's really cool. It's like "Oh, those people won the lottery, now they're not gonna work on this anymore, because they can do whatever they want."
Okay. So they didn't die, they just hit it big.
So how do we increase that, so that people can leave-- actually, it doesn't have to be I can leave the project entirely, it's just more like, we all feel comfortable, everyone feels comfortable to just take a break... Not just because mentally they were able to overcome their issues with that, but that everyone feels good that like "Oh, there's a lot of people working on this, and everything seems to be going well", that kind of state. I don't know what that really looks like...
So if I understand you correctly, it sounds like you're projecting that success is Babel being in a state where you can leave without any concerns of it imploding, for lack of better terms.
Yeah, that's one way to say it. And when I say that, it doesn't mean I don't wanna work on it anymore. I still really wanna work on it. I just think it's a good attitude to have when we're doing this kind of thing, and I think that is a good level of success, because it's saying that I don't have to be around, and that it still moves on. Then you won't have the anxiety about whatever the issues are, and maybe you can work on other open source...
I think for me I really like working on Babel, but I think another thing I like is just thinking about open source in general. It doesn't have to be tied to this project. I just happen to really appreciate what this project does, because it's unique.
That's interesting. I guess my perspective so far has been you full-time on open source means Babel involvement; it sounds like that's not exactly... Not that you have plans, but just that your mind is open to one day Babel not needing you, that you may have skills and abilities to be put elsewhere, whether that's another project, or just a new thing, or whatever.
I always get this guilt-free kind of perspective when I think about it like bands. Sometimes a band might tour with another band, or this band, or the lead singer might go to this band and do a cross-over. It's never like they're leaving their band - unless they actually do leave their band, but it gives you the freedom to sort of cross-pollinate. I think there's a lot of under-appreciated opportunities in cross-pollination.
Yeah, and I think at least I should be trying to look -- if I have the time, then I can look into what are the projects that are related to Babel. It doesn't even have to be a different language, or something crazy like that... It's just like "Oh, Webpack is used with Babel, so maybe I should learn more about how they do things and we can work together" or Vue and React... I think that's a good way to kind of like naturally look into other projects.
[00:52:05.11] Yeah. Anything I didn't ask you that you wanna share, that I just missed?
I don't know, I guess my talk today was just about how we can think differently about open source... Not just like getting things for free, but actually helping and serving people. And I think the values that are there - maybe we don't really emphasize much, and we kind of get influenced by how non-open source works... So whether it's like adding transactions, or thinking about things in a very robotic way, when open source has its own views on things. We kind of just take those things and we copy them, because that's just the way we do things, and that's probably why we keep seeing all this behavior.
We can change the medium in which we do open source such that it is building up people, it is bringing community, and those kinds of things. I think we should be thinking about that more. I don't really have an answer, and it's gonna be really hard, because this is all non-technical, versus the code itself... But yeah, I like to think, what are the habits or things that we can create such that we can reinforce the ideas that we want? In our minds are the ones we want, but in reality we don't act those out.
Sitting down, talking to Simon Willison here, getting in the groove, talking about data. You are so - more than everybody I've ever met - excited about data, bro. Earlier I didn't tell you this because I was just enjoying the moment, but you got really excited about some data.
Oh, yeah. So back in 2009-2010 I was working for The Guardian newspaper in London; basically, I was hanging out with journalists, helping solve data problems... So it was in the world of data journalism.
At The Guardian, yeah.
This is like the Mecca of like -- you're from the U.K., right?
Yeah. It's a very, very high-quality U.K. newspaper.
Right. So if you're working for a newspaper there, or anything journalistic, The Guardian is a good name to have on your resume.
Yeah, that's definitely true.
And when I joined The Guardian, they had this fascinating onboarding process where they basically set you up on coffee dates with people from all sorts of different departments around the newspaper. So you have coffee with the sub-editor, then you have coffee with somebody who's involved in the print presses, and so on and so forth. And those people will talk to you and they'll introduce you to other people.
After a few of these meetings, a bunch of people said "You know, you really need to talk to this guy Simon Rogers, one of the journalists", because Simon Rogers was the journalist at The Guardian responsible for the infographics, where you publish a graph in the newspaper, and somebody has to phone up a bunch of government agencies and gather the data. He'd been doing this for years and was really good at it. He could get data on anything, and all of this data -- I asked him where it was, and he said "Oh, I've got it in Excel spreadsheets on the computer under my desk." And I'm like "This is gold!"
[00:56:11.11] So we got together, we started [unintelligible 00:56:13.13] and we ended up thinking, okay, the easiest thing we can do is start a blog - we'll start The Guardian Data Blog and we'll publish these things as Google spreadsheets. We did this, and The Guardian still has a data blog today where they're publishing data behind the stories.
That was really exciting, except Google Spreadsheets always felt like a bit of a shortcut to me... I mean, it worked, and all credit to it, that was fine, but I wanted to do something better. So last year I was still thinking about this problem. I've moved on from journalism and had all sorts of other adventures since then, and I realized that the combination of SQLite as a database format and ZEIT Now as an immutable hosting platform was actually a really interesting opportunity for publishing data. Because if you can take any data at all, you can always wrap it up into a relational database (they're really good at that). If you wrap it up into a SQLite database file and then publish it, you can build APIs on the top, you can build an interface on the top, you can start sharing data that way.
So essentially, I've been building the software that I wish I'd had back in 2010 when I was working at The Guardian.
Really good for databases that don't change.
Meaning like, this is data that's never gonna change. It's in stone, it's done.
It's facts about the world. Of the examples I showed in the talk today, one of my all-time favorite datasets is the list of trees in San Francisco. It's maintained by the Department of Public Works; they publish it through their open data portal. It's basically a .csv file with 190,000 trees, and each tree - it's got the species, and the location, and often when it was planted, and who looks after it... And it's just sat there - this gorgeous data.
So I took that, I turned it into a SQLite database, and I've published that as an API, and then started building things on top of it. I've got a website sf-trees.com, which is a search engine for trees. So you type "cherry" and click a button, and it'll show you all of the cherry trees in San Francisco. It turns out that's the website I always wanted to build, and I just never knew until the data was in front of me.
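As a rough illustration of the kind of query a "type cherry, click a button" search could run against such a database, here is a minimal sketch using only Python's standard library. The table name `trees`, its columns, and the sample rows are made up for the example; this is not the code behind the site described above.

```python
import sqlite3


def search_trees(conn, term):
    """Return (species, address) rows whose species contains the search term."""
    pattern = f"%{term}%"
    # SQLite's LIKE is case-insensitive for ASCII text by default.
    return conn.execute(
        "SELECT species, address FROM trees WHERE species LIKE ? ORDER BY address",
        (pattern,),
    ).fetchall()


# Tiny made-up sample standing in for the real dataset.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trees (species TEXT, address TEXT)")
conn.executemany(
    "INSERT INTO trees VALUES (?, ?)",
    [
        ("Cherry Plum", "100 Example St"),
        ("Coast Live Oak", "200 Example St"),
        ("Flowering Cherry", "300 Example St"),
    ],
)

cherry_trees = search_trees(conn, "cherry")
for species, address in cherry_trees:
    print(species, "at", address)
```

A plain `LIKE` scan is the simplest option; for a large table, a full-text index would likely be a better fit.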
A question about the data, I guess, is that if the data doesn't so much change from the past, but there's new data, how do you deal with new data?
So when you're deploying with static hosting like ZEIT Now, you just deploy a new copy. The tree data is actually updated pretty frequently, so every now and then I'll pull in a new copy of the .csv file, I'll turn it into a SQLite database...
Publish a new diff, basically.
I'll just overwrite the old thing with the new thing, and it works. I think it's about 80 MB when you deploy it, which is small enough that it doesn't really matter, and you can just keep on shipping new versions.
So for the uninitiated out there, who are not very familiar with ZEIT Now, you mentioned it was immutable, so that means that you can't write back to a database. So the databases are immutable.
Right. ZEIT Now is hosting where everything is fire and forget. You bundle up your code and other assets, it turns it into a Docker container, you fire it up to them, they deploy it and they give it a URL which will never change... So it will always live at that URL, and in ten years' time if somebody hits that URL, they'll spin it up and they'll start serving [unintelligible 00:59:13.13]
I think Guillermo said it was called immutable deploys - is that right?
Immutable deploys, exactly. And then you can also set an alias. So you can say "You know what - okay, that URL is gonna stay the same, but I'm gonna point sf-trees.com at whatever the current version is."
So you get atomic deploys, as well. You deploy a new version, you test it, and then you switch the alias over once you're certain that it's gonna work. And it's a really nice way of working. The downside is it doesn't give you a regular database that you can write to. You can get those, but you have to get those from another vendor, like Compose.io or Heroku Postgres, or something like that. But if your data doesn't change, if it's, say, a list of 190,000 trees in San Francisco, you can package that up in a SQLite database and deploy it alongside the rest of your code, just as part of that regular deploy process. And that, it turns out, is a really cunning trick for doing all sorts of exciting things with semi-static data.
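The deploy-then-alias idea can be modeled in a few lines. The sketch below is a toy in-memory model, not ZEIT's API: the `DeployHost` class, its methods, the `sf-trees.example` name, and the choice to derive each URL from a hash of the content are all assumptions made for illustration.

```python
import hashlib


class DeployHost:
    """Toy model of immutable deploys plus a movable alias (hypothetical)."""

    def __init__(self):
        self.deployments = {}  # url -> content, never overwritten
        self.aliases = {}      # friendly name -> url, can be repointed

    def deploy(self, content: bytes) -> str:
        # Assumption for this sketch: the URL is derived from the content hash.
        url = "deploy-" + hashlib.sha256(content).hexdigest()[:12]
        self.deployments.setdefault(url, content)
        return url

    def alias(self, name: str, url: str) -> None:
        if url not in self.deployments:
            raise KeyError(url)
        # A single assignment switches every visitor to the new version.
        self.aliases[name] = url

    def fetch(self, name_or_url: str) -> bytes:
        url = self.aliases.get(name_or_url, name_or_url)
        return self.deployments[url]


host = DeployHost()
v1 = host.deploy(b"trees database, January copy")
host.alias("sf-trees.example", v1)

v2 = host.deploy(b"trees database, February copy")
before_switch = host.fetch("sf-trees.example")  # still the January copy
host.alias("sf-trees.example", v2)
after_switch = host.fetch("sf-trees.example")   # now the February copy
old_still_served = host.fetch(v1)               # the old URL keeps working
```

The point of the model is that publishing new data never mutates an existing deployment; only the alias moves.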
That was a slightly different project, yeah.
Yeah, slightly different project, but you're trying to essentially take datasets, deploy now, and be able to perform searches on those.
And in this case you wanted to auto-deploy a new site based on search parameters, essentially.
I've built a few different tools. They're all open source, they're available on GitHub... The first one is this script called csvs-to-sqlite, which is just a command line tool that takes .csv files and turns them into a SQLite database. That's all it does, it's like a one-shot pony. It can do a few extra things - you can tell it "extract this column out into a separate table", you can tell it to make things indexable using SQLite's full-text search, but essentially you run a command, and a .csv goes in one end and a .db file comes out the other.
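To make the ".csv goes in one end and a .db file comes out the other" idea concrete, here is a minimal sketch using only Python's standard library. It is not the csvs-to-sqlite implementation; the `csv_to_sqlite` helper, the `trees` table name, and the sample rows are hypothetical, and every column is simply stored as text.

```python
import csv
import io
import sqlite3


def csv_to_sqlite(csv_file, conn, table):
    """Load a CSV file object into a new table, one TEXT column per header.

    Assumes simple header names (no embedded double quotes).
    """
    reader = csv.reader(csv_file)
    headers = next(reader)
    columns = ", ".join(f'"{h}" TEXT' for h in headers)
    placeholders = ", ".join("?" for _ in headers)
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    # The reader yields one list per remaining row, which sqlite3 accepts.
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)
    conn.commit()


# Made-up sample data standing in for a real .csv file on disk.
sample = io.StringIO(
    "species,address\n"
    "Cherry Plum,100 Example St\n"
    "Coast Live Oak,200 Example St\n"
)
conn = sqlite3.connect(":memory:")  # pass a filename instead to get a .db file
csv_to_sqlite(sample, conn, "trees")
row_count = conn.execute("SELECT COUNT(*) FROM trees").fetchone()[0]
```

A real tool would also want to infer column types and handle awkward header names; this sketch skips both.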
The final piece of this is the datasette publish command, which is a command line tool that will take that SQLite database, publish it on the internet with ZEIT, add the Datasette application itself, and essentially in one go wrap the whole thing up and turn it into data that you can access at a URL, with a JSON API and an HTML interface.
Then on top of that, I built another tool called Datasette Publish, which is a web app that does all of this for you, so you don't have to install any software on your computer at all. You go to publish.datasette.com, you upload some .csv files, click a button and it will deploy to your own ZEIT account the combined database and all of the code, and just get things up and running. So the idea is that if you're somebody who isn't comfortable installing Python command line tools, you can still use this suite to take your data and turn it into something that's browsable and explorable.
You mentioned in your talk that you're really passionate... I think either it was in the past, or you're still currently passionate about this -- but data journalism. You mentioned The Guardian as your past - this is something where journalists are not typically programmers, they're not typically familiar with these tools.
It turns out some of them are...
They have to be.
So data journalism is a very specific sort of subset of journalism where you've got journalists who are all about Python and R and Jupyter [unintelligible 01:02:46.20] They're very familiar with this stuff, and they have conferences where they all get together and share tips. These are my people; I'm a huge fan of what they're doing, and that intersection of skills I think is really interesting. But most newspapers can't afford to hire people who have that software engineering background. The really big papers - The New York Times, The Washington Post, they're all doing this stuff... But if you're a little local newspaper, the chances that you can hire somebody with software engineering and journalism skills are pretty slim.
What does it mean then, since you're building these kinds of tools...? I mean, one, you're having fun, you're here at ZEIT Day, you're showing off some cool tools that you've built, obviously showcasing the performance and abilities of ZEIT - or ZEIT Now, as a matter of fact... When you start to look at how this applies, some of the tooling you're building applies to data journalists, or just people who are curious about data... It seems like you're taking insights from data; what are you trying to build for them with what you're doing?
So the starting point, I think, is that .csv is actually a pretty awful format for sharing data.
But it's what everyone uses, because...
It's a standard.
It's ubiquitous, let's say that.
It is ubiquitous. Excel can produce it and consume it, and so forth... So that's fine, but it's not the most useful format, and actually, when you look at .csv files, where they really fall apart is when you are dealing with something a bit more relational. You get .csv files where a bunch of the columns are duplicated hundreds of times, because they didn't have the ability to bundle it with a second .csv file that's just got one set of the data in.
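The duplicated-column problem is what a relational format fixes: the repeated values move into their own table and each row keeps only a small reference. Below is a minimal standard-library sketch of that step. The `extract_column` helper, the `caretaker` column, and the sample rows are hypothetical, and the original text column is left in place rather than dropped.

```python
import sqlite3


def extract_column(conn, table, column):
    """Copy a repeated text column into a lookup table and add an
    integer "<column>_id" column pointing at it (original column is kept)."""
    lookup = f"{column}_lookup"
    conn.execute(
        f'CREATE TABLE "{lookup}" (id INTEGER PRIMARY KEY, value TEXT UNIQUE)'
    )
    # One lookup row per distinct value, inserted in alphabetical order.
    conn.execute(
        f'INSERT INTO "{lookup}" (value) '
        f'SELECT DISTINCT "{column}" FROM "{table}" ORDER BY "{column}"'
    )
    conn.execute(f'ALTER TABLE "{table}" ADD COLUMN "{column}_id" INTEGER')
    conn.execute(
        f'UPDATE "{table}" SET "{column}_id" = '
        f'(SELECT id FROM "{lookup}" WHERE value = "{table}"."{column}")'
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trees (address TEXT, caretaker TEXT)")
conn.executemany(
    "INSERT INTO trees VALUES (?, ?)",
    [
        ("100 Example St", "Private"),
        ("200 Example St", "DPW"),
        ("300 Example St", "Private"),
    ],
)
extract_column(conn, "trees", "caretaker")

lookup_rows = conn.execute(
    "SELECT id, value FROM caretaker_lookup ORDER BY id"
).fetchall()
tree_ids = [
    row[0]
    for row in conn.execute("SELECT caretaker_id FROM trees ORDER BY address")
]
```

With thousands of rows sharing a handful of values, the lookup table stores each value once, which is the saving a single flat .csv file cannot express.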
[01:04:17.03] SQLite databases, I think, are the perfect format for this kind of stuff. SQLite itself is crazy stable. A SQLite database from ten years ago can still be read today. It's ubiquitous, it's one of the best-tested pieces of software I've ever seen, and it's in everything; my mobile phone runs SQLite, my watch it turns out has a SQLite database in it that tracks my step counts, which I can't get access to because I have to jailbreak my phone in order to get the database out of my watch, which is very frustrating.
Come on, now...
But you know, it's everywhere, and so if I can help show people that if you're gonna share data, sharing it as a SQLite database is actually a much more powerful and efficient way of doing it than just as regular .csv files, that's fantastic, especially if I can provide a toolkit that will take that database and spit .csv back out of it, so you're not losing anything by using SQLite; you're gaining stuff and you still get that .csv export as well.
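Getting .csv back out of a SQLite table is a short operation, which is part of why nothing is lost by sharing the database instead. A minimal standard-library sketch follows; the `table_to_csv` helper and the sample table are hypothetical and are not part of any tool mentioned here.

```python
import csv
import io
import sqlite3


def table_to_csv(conn, table, out_file):
    """Write every row of a table to out_file as CSV, header row first."""
    cursor = conn.execute(f'SELECT * FROM "{table}"')
    writer = csv.writer(out_file, lineterminator="\n")
    # cursor.description holds one tuple per column; item 0 is the name.
    writer.writerow([column[0] for column in cursor.description])
    writer.writerows(cursor)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trees (species TEXT, address TEXT)")
conn.execute("INSERT INTO trees VALUES ('Cherry Plum', '100 Example St')")

buffer = io.StringIO()  # stands in for a real file opened with newline=""
table_to_csv(conn, "trees", buffer)
csv_text = buffer.getvalue()
print(csv_text)
```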
So you've got .csv to SQLite... What about the other way around?
Not yet. That's very high on my to-do list, that feature.
Okay. Prominent in open source, you're doing some cool stuff there... So the repo you mentioned, that is open source right?
Yes. The open source tools - csvs-to-sqlite is open source, Datasette is open source; they're both under the Apache 2 license.
Explain Datasette real quick, the spelling at least, so people don't go to...
It's spelled like the Commodore 64 tape drive, Datasette (like a cassette, but for data).
Datasette, cassette... So think about that when you go there. I will have some show notes, so don't worry about that... Simon, what is it that gets you so excited about this data? Maybe a better question might be what is it about datasets like this that gets you excited?
The San Francisco trees is probably my favorite dataset. One of the other ones I really like is one from the USGS, it's a dataset of polar bear ear tags in Alaska. In 2009-2011 they stuck GPS trackers in a bunch of polar bears and had them wander around Alaska, and they got back latitudes and longitudes and battery levels and temperatures and all of this stuff, and they published it online as .csv files.
So I grabbed that .csv file with 40,000 known locations of polar bears, I loaded it into Datasette, and then I used it as the demo for the first one of my Datasette visualization plugins... So I've been building up this plugin infrastructure so we can add additional features, and the first feature I built was one which looks for latitudes and longitudes and then sticks them on a giant map with clustered markers so you can map the whole lot at once.
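One way the "looks for latitudes and longitudes" step could work is sketched below. This is an assumption for illustration, not the plugin's actual code: the accepted column names, the helper `find_lat_lon_columns` and the example tables are all hypothetical.

```python
import sqlite3

# Assumed naming conventions, chosen for this sketch only.
LAT_NAMES = {"latitude", "lat"}
LON_NAMES = {"longitude", "lon", "lng", "long"}


def find_lat_lon_columns(conn, table):
    """Return (lat_column, lon_column) if the table has both, else None."""
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    columns = [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]
    lat = next((c for c in columns if c.lower() in LAT_NAMES), None)
    lon = next((c for c in columns if c.lower() in LON_NAMES), None)
    return (lat, lon) if lat and lon else None


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE bear_locations "
    "(bear_id TEXT, Latitude REAL, Longitude REAL, battery REAL)"
)
conn.execute("CREATE TABLE notes (id INTEGER, body TEXT)")

mappable = find_lat_lon_columns(conn, "bear_locations")
not_mappable = find_lat_lon_columns(conn, "notes")
```

A table that passes this check could then be handed to whatever map widget does the clustering; that part is outside the scope of the sketch.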
And when you put 40,000 polar bear ear tags on the map - and we'll definitely have a link to this in the show notes - one thing that's interesting that shows up is that most of the polar bears are in Alaska, but about 200 of those trackings come from Seattle... And that was an interesting mystery; I was wondering what these polar bears were doing in Seattle, so I zoomed right in on the map where these ear tags were, correlated it to Google Maps, and realized that there's a company called Wildlife Computers in an office park right there who sell ear tags for scientists to track polar bears... So evidently, they'd been testing the ear tags at the home office, and that data ended up in the data that was published as part of this survey.
I don't know if the scientists who published the data would even notice, because they might not have that same visualization that shows them where the clusters are.
So they could be counting fake data.
Possibly. Who knows...? I need to figure out how to get in touch with them and see if they'd spotted this.
So here's something I thought about while you were sharing that story - if you reverse the scenario... Like, as a data person, you're concerned about analyzing polar bears, for example; if you had started out with your desire to have the data, they gave away, or just had available massive amounts of data that was so valuable to you - that was just not really valuable to them?
[01:08:06.14] Well, a wonderful thing about the U.S. government is that they release an awful lot of stuff. I think the default within government is often to release the data...
Right, it's open.
...and it's wonderful.
It's part of the Open Gov initiative.
It's that, but also -- so I have an interest in zeppelins and airships, and it turns out that the U.S. Navy had some really cool zeppelins in the 1930s, and the photographs of those are all freely available, because every photograph taken by somebody in the Navy is copyrighted in a way that it can just be released by the government. So if you want photographs of awesome airships in the 1930s, the U.S. Navy photo archive has all of this stuff for free.
I don't think about data quite like you do, but it sounds like there is just a plethora of opportunity. Where do you see -- open datasets, immutable data now... What have you. Some things you've done here... Where can you see interesting projects that are exciting to you, or things that could be done - not so much tomorrow or next year, but over half a decade, a decade? Where do you see some of these ideas you're shaping, going to utilize public data in useful ways to analyze society and give back answers to make a better future?
So I feel like we've spent a lot of time as an industry dreaming of the semantic web, trying to say "Okay, if we can get all of the data in one standardized format, then we could build wonderful things with it." We've been trying that for ten years, and it hasn't really worked that well so far, but I think that's because that's sort of a boil-the-ocean approach - trying to come up with the perfect standard for data, and then to get everyone to do that... What actually does work is publishing the data in a format that people can use, and then watching people integrate with that and say "Okay, well I'm gonna do a custom query against this, something custom against this, and then combine those together and build something myself." So one of the things I wanted to do with Datasette is not so much establish a standard, but just make it as easy as possible to put data out there in a way that people can automatically query it, they can pull from a JSON API, and then watch what people do with it once it becomes available to them, even if they have to do a little bit of work to clean it up and reformat it for their purposes.
I'm just amazed that you can find such use with such a simple .csv, for example... So much data out there available. We've done shows in the past about open government, we've done shows in the past about open cities - like the city of Chicago talking about manhole covers, and stuff like that... Just interesting--
Did they have a .csv file full of manhole covers?
I'm sure they do, because they have to register that. Those things are expensive.
I hope I'm gonna get that...
Those things are like $500 each, or more... The manhole covers are super thick, they're completely metal, and if they get stolen, they've gotta replace it, so think about the costs. So they have to track them... I remember talking about this around -- it was years ago; I can't recall the exact show, but I will put it in the show notes if we can... But just like, all this public data out there, and someone like you that gets excited about it like that being able to make it useful to people who are not exactly programmers, you know? And then advocating for ways to say "Share data in .csv, share it in whatever format you can, just to make sure that you can share it with whomever, so that it can be reused, in these cases."
I mean, with all of this stuff, the real delight is when people build things that you weren't expecting. And when somebody takes data that you have made available and uses it to solve problems in an interesting way, that's always a delight. If you're building open source software, the best part is when somebody uses your software for something that you'd never imagined it would be used for. That's certainly something that gets me very excited.
Maybe some parting advice then for anybody listening to this segment here that is half as excited as you are about some of this stuff; what are some good starting places, what are some skills you've developed over the years that have made it easier for you to work with and munge data to do some of the things you've described here?
[01:12:01.18] So as a Python programmer, one of the things I love most about Python is it's got an interactive prompt, and this is true of all of the other programming languages that I like working with. Interactive prompts make it so much easier to manipulate data, to suck things down, to reformat them... I've recently been learning my way around pandas, which is the Python library for dealing with tabular data... And if you combine pandas with something like Jupyter Notebooks, it's an incredibly productive way of sucking data in, previewing it, formatting it, and then you can write it out to a SQLite database and publish it in that way.
I think the data science community has all of these tools as well which are constantly being developed, so keeping up with what's going on over there is super useful for those tools, for pulling things down, for cleaning up and for generating and manipulating things.
Then the other one is just good old SQL. It's been around since 1979, and it turns out it's still a fantastically powerful tool for doing data analysis.
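The pandas-then-SQLite-then-SQL workflow described here might look roughly like the following. It is a minimal sketch that assumes pandas is installed; the sample rows, the `trees` table name and the cleanup steps are made up for illustration.

```python
import io
import sqlite3

import pandas as pd

# Made-up sample rows, purely for illustration.
RAW_CSV = """name,height_m
 Oak ,12.5
Elm,
Pine,20.0
"""

# Pull the data in and tidy it up, as you might interactively in a notebook.
df = pd.read_csv(io.StringIO(RAW_CSV))
df["name"] = df["name"].str.strip()   # remove stray whitespace
df = df.dropna(subset=["height_m"])   # drop rows with no measurement

# Write the cleaned table out to SQLite (in memory here; a file path works too).
conn = sqlite3.connect(":memory:")
df.to_sql("trees", conn, index=False, if_exists="replace")

# Plain SQL then does the analysis.
tallest = conn.execute(
    "SELECT name, height_m FROM trees ORDER BY height_m DESC LIMIT 1"
).fetchone()
```

Passing a file name instead of `":memory:"` to `sqlite3.connect` would leave a database file on disk that could be published.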
Any help needed on your projects in open source that you can give a shout-out to?
Yeah, absolutely. So both csvs-to-sqlite and Datasette are active open source projects on GitHub which are very keen on accepting contributions. I've been trying to label issues with "Good first contribution", and so on. So I'm very keen on having people dig in there, but more importantly, I want people to use the software and give me feedback. I need to know what works, what doesn't work, what are the features that would help you solve problems that I haven't even imagined yet.
On your issues, do you share any -- not so much roadmap, but bigger ideas that you don't have time to tackle, that if someone did have time to tackle, that you would take a contribution that way, or do you only have the kind of issues you described?
I have a few issues that are some of those bigger ideas, but in Datasette the thing I'm most excited about right now is the plugin ecosystem. It's now possible to build plugins for Datasette that add additional functionality. That means that I can be completely hands-off, and if you want to invent a fantastic new visualization mechanism that does, like, time charts against columns automatically, or whatever, you can build that right now today without even talking to me, and you can start using it and shipping it and sharing with other people.
So I'd love to see people starting to dig in and build plugins, but also give me feedback on what other hooks are needed to make plugins more productive.
Who is Datasette for? Is it for developers? Is it for developer-like people? Who's the user for Datasette?
So my three targets for Datasette are data journalists, anyone who's collecting interesting data and wants to be able to share it. I'm really interested in museums, because it turns out museums have huge amounts of metadata around their collections, which often is locked up somewhere, and I'd love to help museums start publishing that.
The Metropolitan Museum of Art in New York has a spreadsheet with 400,000 items in it which I've turned into a Datasette instance... And it's that kind of thing I think is really exciting.
And then the third one is civic institutions. There are all of these governments out there that are publishing data in various different formats. If I can help them publish it more effectively and in a way that's more useful to people, that I'd find really exciting.
We were talking earlier - not in this podcast, but earlier today - that interface you were showing off where you can create a database (I think we were even talking about the museum example), was that Datasette? Because you showed me a couple things... That was Datasette?
You were able to dig into a query and then even augment the SQLite or the SQL queries, I believe.
This is one of the most interesting things about doing things read-only; if you've got a read-only SQL database, I can just open it up to select statements, because it's not like anyone can cause any harm. They can't modify the file on disk.
Because it's on Now.
Well, because it's on Now, and also because when I open the database, I use SQLite's Immutable option, so SQLite will disable any writes that could possibly happen.
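For readers who want to try the immutable option, here is a small sketch using Python's built-in `sqlite3` module and SQLite's URI filename syntax. The file name and table are throwaway examples, and this is not necessarily how Datasette itself opens its files.

```python
import os
import sqlite3
import tempfile

# Build a small throwaway database file to stand in for published data.
path = os.path.join(tempfile.mkdtemp(), "demo.db")
writer = sqlite3.connect(path)
writer.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, label TEXT)")
writer.execute("INSERT INTO items (label) VALUES ('example')")
writer.commit()
writer.close()

# Reopen it through a URI filename with immutable=1, which tells SQLite
# the file cannot change and opens it read-only.
reader = sqlite3.connect(f"file:{path}?immutable=1", uri=True)
rows = reader.execute("SELECT label FROM items").fetchall()

# SELECT statements work; any attempt to write is rejected.
try:
    reader.execute("INSERT INTO items (label) VALUES ('nope')")
    write_blocked = False
except sqlite3.OperationalError:
    write_blocked = True
```

The design point is the one made in the conversation: if nothing can modify the file, arbitrary SELECT statements from visitors are much less of a worry.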
You were showing me (you got really excited about this, too) the example of Australia and dogs.
Yes. There are eight different councils in Australia who publish lists of dog registrations, when people go to register for a dog license, and they've got the breed and they've got the name... So I've got one little tool where I combined all eight of those .csv files, I've got them all in a Datasette instance, and now I can say "Okay, for Golden Retrievers, what's the most common dog name?" So you can search by species, sum by the count of each name, and see what the most popular name for different types of dogs is. The answer is always Bella, it turns out. No matter what species of dog you are, Bella is the most common dog name in Australia. But it's still kind of fun to see the different naming trends.
And then for pugs I believe it was Ruby, right?
Yeah, Ruby beat Bella, actually, for pugs.
Only for pugs, though.
Only for pugs, and I think Pugsley came fourth.
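The "most common name per breed" question boils down to a GROUP BY query. A minimal sketch follows; the `dogs` table and its rows are invented for illustration and are not the real registration data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dogs (name TEXT, breed TEXT)")
# Made-up rows for illustration only.
conn.executemany(
    "INSERT INTO dogs (name, breed) VALUES (?, ?)",
    [
        ("Bella", "Golden Retriever"),
        ("Bella", "Golden Retriever"),
        ("Max", "Golden Retriever"),
        ("Ruby", "Pug"),
        ("Ruby", "Pug"),
        ("Bella", "Pug"),
    ],
)


def most_common_names(conn, breed, limit=3):
    """Count each name within one breed, most frequent first."""
    return conn.execute(
        "SELECT name, COUNT(*) AS n FROM dogs WHERE breed = ? "
        "GROUP BY name ORDER BY n DESC, name LIMIT ?",
        (breed, limit),
    ).fetchall()


pug_names = most_common_names(conn, "Pug")
retriever_names = most_common_names(conn, "Golden Retriever")
```

Because the query is plain SQL, the same statement could be typed into any read-only SQL interface over a table shaped like this one.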
So what I find interesting about that is -- here's some obscure dataset that maybe nobody's paying attention to, and because of what you've done with Datasette, you're able to query in these ways and find out this information... Maybe it's just for the curious, but I find that kind of interesting to me, that you can just play with this data like that... And of the entire (not even country) continent of Australia, right...?
Right. And the dog thing is kind of -- it's amusing, but not necessarily useful. A much more useful one is -- it turns out the members of Parliament in the U.K. have to register their conflicts of interest with the Parliament in the Register of Members' Interests, and it's public data... And there's an organization called mySociety who turned that into XML; I took their XML and I loaded that into a SQLite database, so now I've got a tool that lets you search 1.3 million line items of MPs saying who paid them money, who invited them to speak, who gave them a free watch... It turns out that the Sultan of Brunei hands out Christmas hampers to MPs every Christmas, and you can see which MP has had the most Christmas hampers from him. It's super fun, and actually this is newsworthy, right?
If you're wondering why a certain MP behaves in certain ways toward different countries, you can dig through all of this stuff and say "Oh well, it turns out they've been giving him a lot of free watches."
And it's undeniable data, because it's from the Parliament, right?
Yeah, it's official data.
It's official data.
They publish it as a rather ugly set of web pages, but with a little bit of work you can turn that into a queryable database.
And how easy is it to refresh that work? If it's so hard to, you know--
Most of the work has been done for me by this organization mySociety, who've been scraping this for ten years, and dumping the data from that into these XML files... So I just wrote the thing that turns XML into a SQLite database and built on top of that.
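A converter of this kind can be quite small. The sketch below is only an illustration of the general XML-to-SQLite idea: the XML layout, element names and sample entries are hypothetical and do not reflect the real files or the actual script.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical XML layout invented for this sketch; the real files may differ.
SAMPLE_XML = """<register>
  <member name="Example MP A">
    <item category="Gifts">Hamper from an overseas donor</item>
    <item category="Speaking">Fee for a conference talk</item>
  </member>
  <member name="Example MP B">
    <item category="Gifts">Watch from a company</item>
  </member>
</register>"""


def xml_to_sqlite(xml_text, conn):
    """Flatten each <item> under each <member> into one row of a table."""
    conn.execute(
        "CREATE TABLE items (member TEXT, category TEXT, description TEXT)"
    )
    root = ET.fromstring(xml_text)
    for member in root.iter("member"):
        for item in member.iter("item"):
            conn.execute(
                "INSERT INTO items VALUES (?, ?, ?)",
                (
                    member.get("name"),
                    item.get("category"),
                    (item.text or "").strip(),
                ),
            )


conn = sqlite3.connect(":memory:")
xml_to_sqlite(SAMPLE_XML, conn)

# Once the data is in a table, questions become one-line queries.
gifts = conn.execute(
    "SELECT member, description FROM items "
    "WHERE category = 'Gifts' ORDER BY member"
).fetchall()
```

Re-running a converter like this against freshly scraped XML is one plausible way to refresh the database.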
Simon, I'm sure we could talk for hours; I've promised you 20... It was a good 20, for sure, so thank you so much for sharing your story. Where can people find you on the internet?
Mainly on Twitter. I'm at twitter.com/simonw, and I've got a blog at SimonWillison.net as well, where I write about a lot of these kinds of things.