On today’s show Nadia and Mikeal are joined by Eric Holscher to discuss non-code contributions, how they are regarded in open source culture, their value, and how to incentivize this type of work. They also talked about how Read the Docs grew a documentation community, contribution guides, and why this work matters.
I’m Nadia Eghbal.
And I’m Mikeal Rogers.
On today’s show, Mikeal and I talk with Eric Holscher, creator of Read the Docs, which hosts documentation for thousands of open source projects. Eric also created Write the Docs, a community for people to meet and talk about writing good documentation.
Our focus on today’s episode is documentation. We talked about Eric’s experience in the Python and Django worlds, where he learned to value documentation, and why he built a community around it.
We also talked about why documentation matters, how to incentivize these types of contributions, how documentation changes as projects grow, and what managing Read the Docs looks like from the inside.
So I’m kind of curious, before we talk about documentation, to talk about how you first got involved in Python, and then also Django. Because I know you lived in Kansas at some point, working on Django, and I think that’s where you built Read the Docs.
Yeah, definitely. Do you want the medium or the long version of the story?
I kind of want long.
Okay, cool. So in high school I started using Linux and Red Hat and all that kind of stuff, and learned Perl as my first language. Then I went to university to get a computer science degree, but kind of realized that Perl wasn’t really what I wanted to be doing, so I ended up doing a senior project in Python and Django, and so that’s really when I learned the Python-Django ecosystem, and read blogs from a bunch of people like James Bennett, Jacob Kaplan-Moss and all these folks.
Then when I was graduating, I was like, “I need to get a job!” In hindsight, there was this really fascinating moment where I went to school in Fredericksburg, Virginia, which, if you’re a Zope person, it's actually the headquarters of Zope.
I had these two job offers coming out of university, one in the town I was living in. I had a really cool apartment, a bunch of friends... You know, it was in Virginia where my family is. Then this other one, in the middle of Kansas, working at a newspaper. But then I ended up actually – they flew me out to... Because they were like, “Nobody moves to Kansas without coming and seeing it first”, because it’s actually... Lawrence is a really, really cool small town. It’s in the liberal part of the red state, so I ended up landing in Lawrence, and just being blown away in the three days I spent there, with the amazing people, and being the home of Django, really just kind of seeing that iteration of Python technology.
Zope definitely felt like the old school, and Django was the new school. So that’s how I ended up in Lawrence. And then, Read the Docs was actually a Django Dash project. That was a 48-hour coding competition. I kind of ended up doing a lot of Python development in Django. Django has always focused super heavily on documentation, and that’s part of the culture of the Python and Django communities. I had some source code that had some okay documentation, and it was really just scratching my own itch, right? I had a cron job, running on the server every hour, just pulling down my git repo and then building documentation from that, and hosting it. We really have better tools, and better technology for this. It really just spawned a 48-hour like, “Let’s have a thing that listens to GitHub webhooks and auto-generates documentation whenever we commit, so it’s always up to date”, and then we kind of layered the whole version control paradigm on top of that. So you use tags and branches to track docs along with the source code, building on top of the development workflow that you’re already using for tagging and branching and all that stuff, so your docs stay up to date, but you can also host old versions of documentation, that kind of stuff. So that was the long answer to that one. [laughter]
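As a rough illustration of the workflow Eric describes (a push arrives, and the matching version of the docs gets rebuilt), here is a minimal Python sketch. The function names and the slug scheme ("latest" for the default branch) are assumptions made up for this example, not Read the Docs' actual implementation. The one detail borrowed from GitHub is that a push payload carries a `ref` such as `refs/heads/main` or `refs/tags/v1.0`.

```python
from typing import Optional

# Hypothetical helpers: none of these names come from Read the Docs itself.
BRANCH_PREFIX = "refs/heads/"
TAG_PREFIX = "refs/tags/"


def doc_version_for_ref(ref: str, default_branch: str = "main") -> Optional[str]:
    """Return the docs version slug to rebuild for a pushed git ref."""
    if ref.startswith(BRANCH_PREFIX):
        branch = ref[len(BRANCH_PREFIX):]
        # Assumption: the default branch is published as "latest".
        return "latest" if branch == default_branch else branch
    if ref.startswith(TAG_PREFIX):
        # Tags become frozen, versioned copies of the docs.
        return ref[len(TAG_PREFIX):]
    # Anything else (notes, pull refs, ...) does not trigger a build.
    return None


def handle_push(payload: dict) -> Optional[str]:
    """Pick the docs version to rebuild from a push-style webhook payload."""
    return doc_version_for_ref(payload.get("ref", ""))
```

A real service would queue a build for the returned version; here the return value simply shows how branches and tags can map onto hosted doc versions.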
[00:04:24.06] You mentioned that Django had amazing documentation from the start and that’s very true, it’s beautiful. Do you want to talk a little bit about what prompted them to have such great documentation, and some of the values there, where that came from?
Well, I think one of the big ones is that it came out of a newspaper, right? And the people that created Django were journalists, and English majors, and really people that valued the written word, right? They really were good writers and they really valued that part of the world. I think coming out of a newspaper really does enforce that editorial power. I think really in many communities the values of the founders get set early and then they attract people who agree with those values, and then it just builds over time.
Both Adrian and Jacob were two of the co-founders along with Simon Willison, and Wilson Miner, who did the admin, he was the designer. They all really cared about that set of values, so I think the community just picked it up and ran with it from there.
So I saw when you first created Read the Docs, you got a hundred thousand views in that first month, and now obviously it’s grown to be so much more. I’m sure there are people who are curious about how to grow a project - do you have a sense of how you found those early users, and why so many people started using Read the Docs?
So I think one of the key things is that we noticed that Sphinx, the documentation generator, was really a de facto standard in the Python community, so we really were able to build on top of that. Read the Docs basically just hosts and builds Sphinx documentation automatically. Obviously, Sphinx at this point has grown – well, maybe not obviously, but it has grown much beyond the Python community, to be used by many different parts of the programming world. But we were able to just build on top of an existing toolset, so it was really, really easy for people to switch, right?
If you’re already building, writing documentation, putting it in your repo, you basically just have to go to Read the Docs and click the Import Repository button, and we automatically build your documentation. We pull it down, we build it. Having the standard tooling underneath really allowed that to go forward and get momentum really quickly, whereas I tried to do this maybe one or two years prior with testing, basically what Travis CI is today, but in a much sillier and less useful form. But the Python world hadn’t standardized on any kind of testing. There was nose, there was py.test, there was unittest, there were three or four different ways; there was no standard interface to starting or running... There’s no way to just be like, “I know how to run the tests when I get a repository”, so it was really hard to bootstrap any kind of standardized services on top, because there was no shared platform. Sphinx really enabled us to build on top of that, and then I would just... At that point I had given a couple talks at DjangoCon, so I was kind of somewhat known in the community. I actually started giving talks around testing, so people kind of knew me, they trusted me. I was hanging out in their IRC channel; they’d ping me on IRC and be like, “Hey, can you help me set this up?” when it took five minutes. Just like, “Put this repo here, and click the button”, and now you have matchable documentation on the internet.
[00:08:08.16] I think the other really big thing is that it was really well-designed from the start. Read the Docs was created by myself, but also Charles Leifer and Bobby Grace. Bobby was our designer, and he actually went on to become the design lead at Trello, and he’s done a bunch of other really amazing stuff. Having him have a really good design aesthetic, and putting your docs on Read the Docs, and you've got this really pretty theme - I think it was another huge driver of adoption, where it was like, “Hey, my docs look really ugly when I build them locally, but when I put them on this web service they look really pretty.”
That’s a really good point. I’m curious, when you were between two projects, two opportunities, and you showed up in Kansas and you checked it out, and you were like, “I can live here”, did you know that Django... Was Django’s background in coming from a newspaper and their interest in documentation and all these other aspects of a project - was that something that had appealed to you before you even moved there that you were aware of, or was it something that once you moved there, you just sort of fell into?
I definitely think it grew on me while I was there. Kind of the origin story and the founder myth, right? Like, “I was drawn to the newspaper by wanting to build more open access in the world”, but really, no. I was drawn by the technology. And then actually working in a newspaper and working with journalists...
And Lawrence, Kansas is this town of 100,000 people. Back in 2008 when I went there, it was like, they had a website that still trumped, or trumps any newspaper website in any major metropolitan area. They had a really amazing local events calendar, they had so much, so much amazing technology. I was really – the big reason I went there is I was drawn by the people. I’d been reading their blog posts, and there was this little group of people that I really respected, and getting to work with them was a huge part of it. I grew to really appreciate and understand the news industry while I worked there. The whole perfectionist with deadlines thing is very much a news-driven... It's like, "We have a huge room full of presses over there, they’re going to start printing at this time." It’s a very real production environment. There was just a lot of local coverage, and being able to have that credential when you would be talking to people and you’re like, “Oh, I work at the Journal-World”, they'd be like, "Ooh!" Feeling the power of a news organization within society first-hand is really, really cool.
My story with documentation is very similar. When I started Read the Docs, it was scratching my own itch - I was a programmer, I just wanted to solve a problem. But then, especially when we started creating the conference and got into that, I really started to appreciate the importance and the value of it, once I really became entrenched in that world. I was definitely, from the outside, not the documentation crusader who was trying to fix the documentation world at the start, but it transformed into something – obviously not that, but more akin to that, once I really learned and understood the problem, and was really able to think about it and understand it more deeply than when I started. I'm getting into the things that I’m doing and then actually understanding why they’re important.
[00:11:52.27] You mentioned really quickly Write the Docs, your conference. Could you tell us a little bit more about how that started, the early successes and what it’s turned into now?
Sure. It was back in 2012 and I was at a local café with a few people. In Portland we do these weekly coder meet-ups that are more social than getting anything done, and I was talking with Troy Howard, who has done a few other conferences like Node PDX, JSConf in China, and a couple other things. I was just lamenting that Read the Docs didn’t have a community. It’s like, we have a bunch of users, we have a bunch of people who use our software, but we don’t have the sense of community where they’re getting together, they’re sharing best practices. Really, there was this general – nobody knows how to write documentation. This was a huge pain point for developers and nobody’s really talking about it. He was like, “Alright...” I was just complaining, and his answer’s just like, “Just start a conference.” That’s his answer for everything. I was like, “Yeah, I mean... Okay, I guess.” I kind of left it at that and it disappeared from my mind. Then, two weeks later he sends me an e-mail that was like, “Hey, I built a conference website for the conference we’re doing” and I was like, “Oh...” Because I had no interest in community organizing or conference/event organizing; I’d never done any of it. But I was like, “Alright, let’s see what happens...”
We built the website, and I went up and wrote a blog post and put it on Twitter, and then it hit the front page of Hacker News. We were envisioning this little 75-person Portland regional event, maybe from Seattle and San Francisco, but just about 75 people in an office on a Saturday or Sunday, or something; a free venue for a really cheap, free conference, but it hit the internet and it exploded. I think we got like two or three hundred sign-ups to our mailing list the first day. Everybody on Hacker News was like, “Oh, this is something that should exist! Why did this not exist?" So it really got more momentum behind it than we really were expecting.
The first year was a 200-person conference here in Portland, and we had people from all over the country, and a few from out of the country, who were in the States for other reasons, but swung by. That was four years ago, and then just a couple of weeks ago, actually in May, we had our fourth year, which was 400 people. It was at the Crystal Ballroom here in Portland, which is a really, really amazing music venue, where The Grateful Dead, Willie Nelson, and a bunch of other people have played. It’s always a trip to see your own little conference on the marquee.
We now have a European version in Prague, it’s going to be about 250 this year. It’s really starting to build this more global community of people that care about documentation. It’s something we really wanted to think more about once it really started to expand beyond Read the Docs users. It’s expanding to just beyond developers.
We had a bunch of people come the first year who were tech writers, and it’s this whole community that I didn’t even know existed. From there, we're trying to keep it very cross-disciplinary, where there’s a lot of support-type people, that do support work. Now they have their own other set of conferences, very similar, like sub-conf and user-driven. There’s a couple of them, and it's a very similar thing to Write the Docs, but for doing support, where it’s like, “Hey, this is a part of the industry that’s not valued nearly as much as it should be, and we need to build this group of people, this community, this force that’s able to stand up and really make people think more deeply about this topic.”
[00:15:51.02] That’s really where I see the conference today - it’s the documentation arm of the software world, or the constituency of people who care about documentation. Some of that’s tech writers, some of that’s developers, some of that’s support staff. A lot of people this year actually were devrel, evangelism-type people, because they’re starting to really see the value in documentation as well. I think about it as raising the profile of documentation within the software industry as a whole.
That’s really where we are now, just trying to build best practices, build out learning materials in an open source fashion, that’s free and on GitHub, that’s like, “Hey, you want to write documentation? That’s great! Here’s how to do it. Here’s where you could start. Here’s some good resources”, and really trying to help people along that path, because I think it’s something that a lot of people aren’t confident in. They don’t write documentation because they feel that they’re not good at it. I know I put off things that I’m not good at forever, right? “I’ll just do it tomorrow, because I’m not confident, I don’t know where to start.” We’re really trying to break down that barrier, where it’s like, “Hey, here’s how you start getting good documentation for your project, here’s how you maintain it, here’s how you develop the processes in your project, to make sure it stays up to date”, and that kind of stuff. That is the long answer to that question.
Take us back a little bit to the beginning, back in your day when you first got involved. What was the state of documentation like? What was it? How was it being culturally perceived? And then over the years, especially as you’ve been doing Write the Docs events, how has that been changing culturally?
So much of it is really hard to know, because I'm sure I have my own little filter bubble. "My Twitter feed cares a lot about documentation..." I do sense a general trend in the software industry of caring more about documentation. Probably a small part of that is the work that we’ve done with the conferences in the community, but it’s also just a general, collective raising of empathy. One of the things I really think a lot about in terms of documentation is the onboarding into software, right? Where it’s like, “Hey, you just went to this three-month boot camp, and now you have basic coding skills, but now you want to start using projects on your own or building something for a job application, or just trying to get involved in open source.” And documentation is the first thing that you run into, right? If there’s not good docs, it really does dissuade a large number of those people. I think, especially as more and more people are coming in from non–beating their head against software routes to programming... There’s actually more formal education, more schooling, more industry trying to push that, so the people that are coming into the industry value documentation more and more.
To answer the question, I think back in 2008, 2010, testing was undergoing this transformation. Maybe in 2005 testing was this thing that software - it might have, it might not, some people thought it was important, there was a lot of people talking about it, but it wasn’t this accepted best practice, and I think in 2016, pretty much every developer says, “Tests are good. We should be doing this, it’s an accepted best practice.”
I think documentation is undergoing a similar transformation, but just a few years later. I think you’re starting to see every open source project that gets announced will have documentation. If they actually want people to use it, it’ll have a reasonable set of documentation, whereas 2008, 2010 you'd have so many projects that were just released with a marketing page and a source code link, or something.
[00:19:59.23] I think that’s really my metric that I really think about, it's how many people look at documentation as one of the first one or two things to look at on a project to decide if they’ll use it or not. I think that's true for a number of people in the Python community - that has always been high, but I think also in the general programming world that number is growing. That’s one of the values that I think the Python community has had for a long time.
But we weren’t actually on the testing train as early, right? When I look back in history I see Ruby as really, really testing-focused and Python is really doc-focused, and now we’re both starting to merge into the other and get excited about... They’re both important parts of software development.
So I see more and more projects that care about documentation, more people talking about it, people actually writing it and caring about writing it. That’s really the metric, right? The number of projects with documentation that people are actually focusing on and putting time towards, and obviously that’s incredibly hard to determine at the GitHub level, at least in my personal experience. When I click on a link in Hacker News, it's like, how prominent are the documentation links, and do they actually have more than two pages?
Well, we’re hitting time for our first break. We really enjoyed hearing all of your background experiences. When we come back from the break, we’ll dive deeper into the nuts and bolts of documentation.
We’re back from the break with Eric Holscher, who is the creator of Read the Docs. We’re just going to dive into the nuts and bolts of documentation. I thought we’d start just by making the case for anyone who might be listening to this and isn’t convinced that documentation is important, why does documentation matter? What are the practical benefits to a project maintainer and to the community?
Totally. Almost every talk that I give, I do a little five minutes at the beginning, because I think giving people the words - even if they’re already convinced, but using the words to convince others... I think it’s really important to have these arguments actually thought out. One of my favorite ones is for actual programmers, which is like a selfish appeal, right? Which is, if you’re using your code six months from now, it’s going to be indistinguishable from code someone else wrote.
I think about documentation as serializing your mental state into words, so that it can be loaded back in faster than reading source code. Reading source code is one way to put a program into your brain, but actually writing down your design decisions and code comments and doc strings allows you to basically reload what you did, what you were thinking, why you made these trade-offs in your brain in a faster way, and it allows other people to do that too, right? It’s useful for anyone who’s reading that code.
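To make the "serializing your mental state" idea concrete, here is a small Python sketch of a docstring that records why the code looks the way it does, not just what it does. The function, its parameters, and the design notes are invented for illustration and do not come from any project discussed in the episode.

```python
def retry_delay(attempt: int, base: float = 0.5, cap: float = 30.0) -> float:
    """Return how many seconds to wait before retry number ``attempt``.

    Design notes (the reasoning a future reader would otherwise have to
    reconstruct from the code):

    * Delays double on each attempt so a struggling server gets more room.
    * The delay is capped so a long outage never leaves a caller waiting
      for minutes between attempts.
    * ``attempt`` is zero-based: the first retry waits ``base`` seconds.
    """
    if attempt < 0:
        raise ValueError("attempt must be non-negative")
    # Exponential growth, limited by the cap described above.
    return min(cap, base * (2 ** attempt))
```

Six months later, `help(retry_delay)` reloads the trade-offs faster than re-deriving them from the one-line body.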
[00:23:53.29] In terms of project maintainers, I think documentation is the best marketing, right? I know a lot of developers who hear “marketing” and shudder... It’s like one of those words that “Thou shall not say”, but really, if you want people to use your software, they have to know what it does, they have to know how to install it, they have to know what it’s good for, they have to understand what the other competitors are. If you build that into your docs and you’re just like, “Hey, my web crawler does these things, it supports these types of – ignoring URLs and it follows robots.txt. You could use Siege or Wget or curl. Here’s the landscape.” Just providing that context for your project and its reason for existence is how you get people to actually use the software, right? It does what I want. It's going to work for me, it’s maintained, people care about it. It’s a huge part of the adoption of software.
That’s true for closed source code, as well as open source, especially if you’re in a larger company, right? You have six different divisions that are all basically writing the same software, and they’re not sharing anything, and you actually want to have other people use the software that you write, which is like one of the fundamental reasons that open source is so cool - having people use stuff that you’ve written, and within companies too, right? You have to document it so that it gets used, so people know that they need it, right? When you land on a GitHub page that has no readme, who uses that project?
Nobody uses that project.
I would fire the developer who used that project. I don’t want to work with the guy that uses that project, or the gal.
In fact, I’ve actually not put readmes on things as a sign to say please don’t use this yet. I’ll put a readme on it when I want people to use it.
That’s great. I think one of my favorite appeals to everyone in software is that writing words is 80% of the job of software development. You have e-mails, you have commit messages, you have GitHub issues, you have chat, you have Slack, you have IRC, you have Twitter, you have your marketing content, you have your documentation, code comments, all this stuff is the written word, communicating with other humans. Writing documentation and becoming a better writer is a fundamental part of being a good software developer. Knowing how to communicate about technical topics, how to write about them, how to use your documentation tooling – that is a tool of the trade that you need to know how to use just as well as Git, or something like that. Having those writing skills is really just a fundamental part of being a good engineer, and being able to communicate with your team to build software.
That’s a great line.
I think there’s something similar here with the test-driven development, which is that you write a test for something so you could see how people use it. But then we’ve created so many of these test frameworks that you get so obfuscated from how people actually use it, whereas documentation - you really are just saying, “This is how you use it.” You’re trying to describe it simply, with English. If it sounds too complicated, you can actually rethink how you’re implementing that and how you’re going about it.
Right, and rethink without re-implementing. I get to re-architect the code without throwing away a bunch of work, right? That’s the beauty of test-driven development and the kind of readme-driven and documentation-driven development, it is really that thinking through the API that’s going to be public facing before you write the code. I love starting a project with a readme and having a code example that’s like, “These are the three most common public API calls that will be used in this library, and here’s what the interface looks like, and here’s how you import them”, like really thinking through that public API. Because I find if I don’t do that, then the implementation leaks out into the public API, and really thinking about it top down, from what problems does it solve, what other solutions exist, how is this one different, and how do you use it, really informs the architecture of the code and allows you to write better code with better APIs, and it takes less time, right? You don’t have to go and refactor it once you’ve implemented it, because you realize that the public API takes some random object that nobody cares about. You can really think through the high-level usage of the system with documentation and testing with those code examples before actually writing any code.
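One way to picture readme-driven development in Python is to write the usage example first, as a doctest, and only then fill in the implementation until the example passes. The module and function below (`slugger`, `from_title`) are hypothetical names invented for this sketch.

```python
"""slugger: turn titles into URL-friendly slugs.

Usage, written down before any implementation existed:

    >>> from_title("Hello, World!")
    'hello-world'
    >>> from_title("  Read the Docs  ")
    'read-the-docs'
"""
import re


def from_title(title: str) -> str:
    """Lowercase ``title`` and join its alphanumeric runs with hyphens."""
    # The public API takes and returns plain strings, as the usage
    # example above promised; no internal objects leak out.
    words = re.findall(r"[a-z0-9]+", title.lower())
    return "-".join(words)
```

Saved as `slugger.py`, the examples can be checked with `python -m doctest slugger.py`, so the readme-style usage and the code cannot silently drift apart.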
[00:28:27.29] I’m curious for people who are writing documentation for different types of projects, do you find that the needs for documentation are different for different types of projects, different communities, if it’s a big project or a small project? At what point should they be making that investment into documentation? And how much do they really need to write?
That’s a lot of questions. [laughs]
Sorry. Basic question is, is documentation different for different types of projects?
Totally, totally. One of the themes in the writing world, in the conferences, is know your audience, right? That’s one of the things that’s always true about software and writing in general: who’s going to use it, right? The type of documentation that you write for a kernel module in C or C++ is going to be very different than for a Python library that has a command line interface. I think there’s definitely a point – we’re about to get into one of my favorite topics, it’s incredibly contentious. [laughs]
I agree that as projects grow, their needs for documentation change, right? I know in the Node world particularly, there’s this small module philosophy (Unix philosophy) “Do one thing and do it well.” There are cultures that basically just write readmes, right? As I’ve heard it expressed, and Mikeal could probably explain this better, but if you need more than a readme to explain this simple module, it’s probably too complex and it should be two modules. That’s one of the ways that I’ve heard that world view expressed, and in that world readmes are a great tool, in that development concept. But when you have something like Django or Rails, or these huge, multi-thousand line or multi-thousand file projects, you obviously need something much larger and much bigger.
This is actually a trap that I think a lot of people fall into with documentation, is that they start off and they’re like, “All right, we just need three or four pages, right? We need a support page, install page, couple other pages”, right? And it really doesn’t make sense to invest in a lot of documentation tooling or infrastructure or anything, but then if your project is actually successful, you start to grow out and it gains more functionality, and it gets bigger and bigger until your tools start to break at the seams. The stuff that you need when you have 50 or 75 different pages is very different than when you have 5 or 7 different pages.
When you just have a few pages on a website, Markdown is a wonderful tool and it works really well for that, but once you actually start to be documenting really large API references and a bunch of other inter-referenced code, that’s when something like AsciiDoc or reStructuredText or these more powerful languages, combined with real documentation tooling like Sphinx or Asciidoctor, start to make more sense. It’s a hard tradeoff, right? Because you don’t want to over-engineer it from the start, but you also don’t want to be having a tool that’s meant for seven pages once you have 700 pages, or whatever. I really do think it comes down to just thinking about your goals for the project and what it’s going to look like over time, and making sure that your tool choices and the actual audiences that you’re writing for keep up with that.
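For a sense of what "real documentation tooling" adds once a project outgrows a handful of pages, here is a minimal sketch of a Sphinx `conf.py`. The project name is hypothetical; the two extensions and the `intersphinx_mapping` setting are standard Sphinx features, shown here only as one plausible starting point.

```python
# conf.py: a minimal Sphinx configuration sketch.
# The project name is a placeholder invented for this example.
project = "example-project"

extensions = [
    "sphinx.ext.autodoc",      # build API reference pages from docstrings
    "sphinx.ext.intersphinx",  # cross-reference objects in other projects' docs
]

# Lets a reference such as :py:class:`dict` resolve against the Python docs.
intersphinx_mapping = {
    "python": ("https://docs.python.org/3", None),
}
```

With something like this in place, cross-references between pages and into other projects' documentation are resolved by the tool rather than maintained by hand, which is where plain Markdown files tend to run out of steam.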
[00:32:11.24] I guess the other facet of that is really building out documentation for specific audiences, like writing API documentation for people who are using it as a library, while also having the tutorials for people who are just coming in and want to figure out which project’s right for them and get started, and then topical guides for explaining where your project fits into the world, and talking about competitors and just the high-level concepts.
A lot of these are actually from Jacob Kaplan-Moss. He has this almost seminal work on documentation called "Writing Great Documentation" on his blog, and I think at this point it’s eight years old, but it is still one of the de facto references, which shows you how fast documentation culture is moving in programming. A lot of his architecture and way of viewing documentation is what I was brought up in, and I learned how to document software in that style. Hopefully that answered your question.
Yeah, yeah. I think you hit on something really interesting there, which is that there are different types of learners, and they’re going to require different resources in order to learn. When you were mentioning the Node.js community – yes, there is this culture about, “Do one thing and do it well,” so every module that is really in use has a readme, and usually it’s a pretty good readme, but the documentation on how to put all those modules together is actually - it either doesn’t exist or is spread out in various blog posts and things like that.
People tend to find them through googling, but this is a general problem. I think that’s why there’s so many meet-up talks and conference talks about putting these things together. The boot camps have a really good curriculum about putting together Node stuff as well, because it is a hole and there’s not a central place to fill it, right? There’s a lot of decisions to make when you decide which of these components to put together and nobody wants to officially endorse a way a lot of the time. Whereas Django - if you decide to use Django, you’re making a decision to use this whole stack. That’s a really good place to build a great guide around it.
Totally. And there’s a recommended way this all fits together, and there are documented ways of using other tools, but no, everything else is now going to not integrate well. So there’s another interesting example there, which is Pyramid in the Python world which is... Django is like, “We’re going to build everything, and it’s going to be one big thing, and it’s going to integrate.” Pyramid is actually, “We’re going to take all of these best-of-class tools, combine them together and write some glue on top.”
Those are two very different worldviews about how to build software, but they change and inform how your documentation has to work, right? When you’re writing Django documentation, you can assume people are using the ORM, and the model structure, and the template language, and all of that stuff. Having all those higher level integration guides doesn’t make sense for Django, right? It's all already integrated, but if you’re in the Node ecosystem or doing something like Pyramid, then actually being like, “Alright, here’s how you integrate all these together, here's how we recommend this best practice and that kind of stuff." That’s really tricky, especially in Node I would imagine, where you don’t have that place or that project to do that. I would imagine over time there’ll be different sets of people who have different worldviews on how to put things together, and then they’ll start to build a set of resources and documentation around how they recommend doing things.
[00:35:51.11] In the Django world there’s a book called “Two Scoops of Django,” which is basically that as well, right? Even with this highly integrated ecosystem, there’s still a lot of different ways to do things, and that’s their best practice guide for how to put all this stuff together, and here’s the recommended way, and how that all works. That’s a really interesting documentation problem.
I’m curious, in light of there being different approaches and methodologies, how much you can automate? Because I think maybe in the greater good sense of there just needs to be clear documentation on how to use a project, and maybe some people care a lot about different methodologies or not. Can you just automate everything as much as possible, so that someone... I’m saying two things here, of teaching people that documentation matters and helping share best practices, but then how much of it can you automate for people who don’t care, but the world still needs to know how to use their project? Does that make sense? Am I just going off the rails here?
No, no. One of the values behind why we created Read the Docs was this intuition that every decision that you have to make along the path of doing something - it’s like a marketing funnel or a sales funnel, right? Each step you lose people, right? So you have to be like, “I’m gonna sit down and write documentation. Alright, what tool am I going to use? Oh, alright, where am I going to – how am I going to write it? Alright, what am I going to write? Oh, alright, where am I going to host it? Oh but then I have to make an Amazon account and put it on S3." Each step adds complexity, right?
That was the view of Read the Docs. It's like a well-paved path towards documentation, right? You use this tool set, you host it here, you build it in this format, here’s the guide... And the part that’s always been missing is the actual what to write, and how to write it, and the actual act. We’re really good as programmers with building these tools and these things around the real meat, but at the end of the day you still have to sit down and write the thing. I think that’s really the hard part, because it doesn’t matter how much you automate, you still have to convey the information. But I think you can really standardize on a set of tools and a set of processes that remove the distractions of tooling, and allow people to actually write, and know what they need to write.
Yeah. I mean, I think you are still definitely talking about writing documentation as an act of writing how to use the software, rather than the documentation being embedded in the code and auto-generated that way. I have a fairly low opinion of that kind of documentation. I don’t know what your thoughts are there.
Yeah, I agree. The Javadoc world, right, where it’s just like, “Here’s an alphabetically listed set of classes in your software.” That’s great for a very specific use case: if I already know how your code works, and I just want to know the arguments to this function, and for some reason I’m not looking at the source code... And for proprietary code, that's super valuable, where you can’t see the source code, and you’re like, “Here’s the signature for this method.” I actually really need that. For open source, it makes less sense.
This is actually one of the things that I think Sphinx did really well: it allows you to intersperse prose content with auto-generated content. You’ll see this in the Django documentation, you’ll see it in a bunch of other Python world things, where once you put the documentation in your repository - so you have a docs directory and a code directory - you’re able to magically pull the doc strings and comments from that code into your documentation. But it doesn’t have to be one big auto-generated alphabetic listing of classes. It actually allows you to basically say "In a reStructuredText file I’m going to write a bunch of words, and then I'm gonna pull in the auto-generated documentation for this method as part of that prose content." That allows you to contextualize and add value on top of just pure reference, but it also magically stays up to date with the source code.
[00:40:15.07] So when the definition of the program or of the function changes, you don’t have to go back and update every piece of documentation that’s referencing that function. You can actually pull that in dynamically, and then you’re able to mix prose content with live source code content that is always up to date. I’ve seen a few different ways of doing this, and that’s the best that I’ve seen in terms of building a cohesive, reasonable narrative where you’re actually communicating with humans and also pulling stuff out of the source code so that it’s always up to date, and you’re still getting value out of your doc strings and your code comments.
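To make the idea concrete, here is a minimal sketch of prose mixed with pulled-in reference material. The `.. autofunction::` directive is provided by Sphinx's `sphinx.ext.autodoc` extension; everything else (the `slugify` function, the `mypackage.text` path, the page text, and the `render_preview` helper) is hypothetical and only imitates the effect. It is not how Sphinx is implemented.

```python
import inspect


def slugify(title: str) -> str:
    """Return a lowercase, hyphen-separated version of ``title``."""
    return "-".join(title.lower().split())


# Hypothetical docs/usage.rst: hand-written prose with one autodoc directive.
# In a real Sphinx project, "sphinx.ext.autodoc" would be listed under
# `extensions` in conf.py and Sphinx would expand the directive at build time.
USAGE_RST = """\
Building URLs
=============

Page addresses are derived from titles, so keep titles short.

.. autofunction:: mypackage.text.slugify

Punctuation is left untouched.
"""


def render_preview(rst, registry):
    """Crude stand-in for autodoc: swap each directive for the live
    signature and docstring of the function it names."""
    out = []
    for line in rst.splitlines():
        if line.startswith(".. autofunction::"):
            name = line.split("::", 1)[1].strip()
            func = registry[name]
            out.append(f"{func.__name__}{inspect.signature(func)}")
            out.append(f"    {inspect.getdoc(func)}")
        else:
            out.append(line)
    return "\n".join(out)


# Map the dotted name used in the docs to the actual function object.
preview = render_preview(USAGE_RST, {"mypackage.text.slugify": slugify})
print(preview)
```

Because the signature and docstring are read from the function at build time, changing `slugify` changes the rendered page without anyone editing the prose around it.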
I think humans, in general, are the theme of this podcast, probably. Let’s talk about the people that are part of it.
Yeah. I’m really interested – when we had Jan Lehnardt on, we talked a lot about contributor funnels, and getting people involved in a project and having kind of a ladder. We did talk a bit about documentation and working on docs as being a great first step to get involved in those projects. I’m wondering if there’s any kind of tension between that and this professionalization of documentation that you’ve been working on. A lot of what Write the Docs is doing is really establishing that this is a core skill that you can have and people can get very, very good at just this one thing. If you’re professionalizing it, while at the same time saying that it’s a good first contribution, how do you tease that apart?
Interesting. Well, I mean, I think so many times when I hear about documentation as a contribution, it’s the whole beginner-mind argument, where it’s like, “Hey, new person. You’re able to explain stuff to me, the expert, that I’ve already forgotten about and integrate it into the abstractions that exist in my mind, right?” I think that’s one of the really big ways that onboarding people through documentation to contribution is like, “Hey, we value beginners. You have a different perspective. Yes, we have experts in writing code, but all of the people who are veterans of the project have a completely different worldview and understanding.” And like, “Oh, that guide that we wrote on why this project exists in this space makes no sense if you don’t know what the space is”, or things like that.
I don’t know if Write the Docs is a hundred percent professionalizing it, it’s more just saying it’s valuable and it’s a skill that we all need to have. Yes, there’s people whose job it is to write, but every developer, their job is also to write. Regardless of if you’re doing open source work or something else, being able to contribute documentation to a project really does increase your skill as a developer; I think that’s another way.
I always see documentation framed as a non-code contribution, which is just really weird, like othering of anything but code, it’s like NoSQL - we’re defining what we’re doing in the negative, right? Non-code contributions are not somehow lesser than code contributions. They’re contributions to the project, and the fact that we even have to call them that shows a broken culture. But I think that starting to value those – GitHub’s little activity tracker was only counting code commits, and they just updated it to include other things, but I think there’s much larger cultural things around not valuing contributions that aren't code nearly as much. I think that’s super hard to change, but it’s slowly starting to.
[00:44:13.25] There’s an ongoing theme here of the things that you value in your community are the things that people show up to do. That’s how you get contributors to actually value those kinds of skill sets.
We’re starting to head into time for a break right now. We’ll return shortly with Eric Holscher and we’re going to go a little deeper on getting user feedback around the documentation.
Alright, we’re back with Eric Holscher, creator of Read the Docs and Write the Docs. So Eric, we talked about valuing documentation and valuing documentation skill sets. Are there some really specific things that you do, or that you’ve seen work for signaling that you care about that documentation and building a community around it?
Yeah, so I realize my background is really Python-influenced, but I think Django has done this the best of anywhere I’ve seen. They have multiple core contributors to the project who have come in through documentation contributions. One of the big things that they did is they basically require documentation, along with tests, for every piece of code that gets merged into the project.
So you’re signaling that tests and documentation are just as important as written code when we’re thinking about deploying features, right? If you put this in the codebase and nobody knows it’s there, because it’s not documented, it means that it’s not a complete pull request, it’s not a complete feature.
One of the other really interesting things that Django does as well is they have a policy of, if something is not documented, then it’s not supported. So if you start using features and they’re not in the documentation, that’s kind of an implicit gesture, or an implicit acknowledgement that it’s not supported. They are actually treating documentation as the canonical source of release maintenance and supportability over time.
So it has more influence than just being words about code, which is the canonical repository of the project, and the thing that really matters, right? The documentation is viewed as its own product, that has its own value independently, as a broader open source thing. Having tags in your issue tracker for documentation needed, or easy ways – saying it in your readme, like, “If you would like to contribute, here are some open issues that need fixing in the code, and here are some open issues in the tests and the docs that need to be improved on”, and really just providing that on ramp. And giving people commit access and core developer status for writing documentation.
[00:47:51.17] Django has a design BDFL, they have a documentation lead, and a design lead. There’s code leads as well, but there’s actually management structure within the project that shows that they value these things. I think that’s really the thing - there’s so many implicit signals that come from caring about something, and it’s really easy... Like, "How big is the page in the design of the landing page of the site? Do the links actually go to documentation or do they go to blog posts from third parties? Or do they go to rendered Markdown files in a GitHub repo somewhere? Or are they actually branded with the real project branding on the site, integrated and kept up to date and all this kind of stuff?"
Yeah, we did this in the Node.js project too, when we liberalized commit access. We started giving commit bits for just solely documentation, and one of the really noticeable things is that retention was really high with a lot of those people. A lot of times when people show up to casually contribute to documentation it’s like, “Oh, I noticed this problem and then I fixed it", and then they kind of go away forever. But when we started actually onboarding them into becoming a committer, they stuck around quite a bit more. We tried to do that whole, “If it’s not documented it’s not supported” thing, but too many people started relying on undocumented features and when we broke them, they got very angry, so we had to back off of that.
Yeah, it’s a cultural thing. You have to ease that in, right? Something else I’d be really, really curious about is the diversity of the people coming in through documentation contributions versus code; my guess would be that it’s higher.
Yeah, yeah, yeah. The first woman given a commit bit in the project started with documentation. Actually, technically I think she started writing up the blog post for the evangelism working group, and then became a committer on the website, and then started working on documentation on the core and got a commit bit, and then now she is actually doing some code work in the core.
Yeah, I think that’s so important. If you don’t want the same group of people working on the thing, you have to find new ways to bring them in. I think documentation, Write the Docs in general, is basically gender neutral; it’s 50/50... Every year we’ve had 50/50 speakers. The entire industry, as far as I can tell, is representative of national averages, in terms of gender. Of course, there’s other diversity things that we need to deal with, but I think so many people fell out of development into these auxiliary roles: support, writing, design, UX. Bringing those people into your communities is going to increase diversity just because of the structural issues in the industry, right?
Right, right. And also, I mean, giving them commit bits and bringing them into leadership is really important as well, right?
Yeah, because you need those voices around. Especially, just generally, people who are thinking more about the end user experience of using it than having their head in the code implementation. They need to be enabled with the same kind of voting privileges and the direction of the project.
Yeah. The empathic love-bond that happens to me at the conference every year because technical writers are so much -- they’re real humans who communicate. They’re, I would say, way above average in the empathy scales, and not all developers necessarily are. It’s attracting a different type of skills and a different type of person, and I think it’s super interesting to see these other communities around software, and where a lot of the more diverse folks have landed because they felt excluded or just – getting into software is really hard. And there are structural issues, right? It’s not just hard.
I guess from that I’m curious how we incentivize documentation and other non-code contributions, and actually recruit that talent and reach out. Do you have to look outside the communities that are actively contributing right now?
[00:52:11.09] Yeah, so that’s slightly tangential, but it's one of the things that I’m trying to do this year with the Write the Docs world, is to have a stable of speakers who are able to go to other events, because it felt – I’m worried that we’re getting a little echo chamber-y, where it’s like, “Hey we're just a bunch of people that like docs, talking about docs.” The goal here is to build a community and then push out, and that’s really my goal for the next few years - using this base and then starting to evangelize out.
I really want to put together a set of speakers who, it’s like, “Hey, we want a documentation talk at our conference.” I think PyCon last year, I gave them - for not having any documentation talks... It's like, “Hey, you’re PyCon, how is this not happening?” I'm really trying to be able to influence the conferences, because obviously my worldview sees the conferences as influential. I do conferences, it’s an obvious place to start, and I have a set of speakers who are really good at talking about documentation. So yeah, just trying to get that out and then starting to cross-pollinate. If we have an open source project that wants contributors for documentation, it’s like, “Hey, we have a list of people that are interested in open source and documentation.”
I think one of the other big structural issues is that developers get value out of open source contributions that they do for free in hiring and career advancement, but I think other professions don’t have nearly the representative value, right? If I am a technical writer and I contribute to Django’s documentation or something, that’s not necessarily going to be a resume item or an interview question in nearly the same way that as a programmer I would have that.
I think that’s one of the other things, is trying to figure out how we increase the value to non-programmers who are also working on open source projects, and that’s one I don’t know how to solve.
The conference talks are a really good idea, actually. We’ve been doing that a little bit in the Node project around liberal contribution agreements and open governance, because we want that to persist out there. Essentially, what we’re really doing is just talking about, "Look at the level of success that we’ve had with these policies and connecting that to problems that all of these other projects have, like attracting more contributors and retaining people and stuff like that." Obviously, you’ve got a lot of success that you can talk about at conferences and have other people talk about it.
Right, and I think viewing these open source communities as really crazy incubators of ideas, and once we discover something and find something out, we really have to go and share it. I think – we haven’t talked about this much, but Nadia’s work in open source sustainability is the same way. It’s like, “Hey, all these different people, you have Ruby Together, and we’re doing some stuff with Read the Docs," and there’s a bunch of other different funding initiatives and different ways of viewing sustainability and it’s like, once you find something that works, people don’t automatically know about it, right? You have to really do that work to go and talk about it, and you really need to get out there and say, “Hey folks, we made money! We’re sustainable! This is how we did it, and you can do it too.”
I think that’s the same way. It’s like, “Hey, we built this really popular open source project, the docs were amazing, and here’s the process that we followed. You can do that too! Here’s how we got more contributors.”
[00:55:36.13] There’s so many different things. PyCon does this with its diversity outreach, where it’s like, “Hey everybody, we have 40% female speakers at a tech conference, and five years ago it was 3%, and here’s how we did it. You can do it, too.” Spreading those messages... I think that so much of the value of the community that we’re building is these experimental areas to play with stuff. We have to talk about it once we have success and really spread it out, make it bigger.
Totally, and you’re the epitome of this. I think you’ve experimented with more things than any other project I know. Just for people who aren’t familiar, off the top of my head, you went through a start-up accelerator, which is actually how I first met you. You got a grant from Mozilla, you have enterprise clients who pay for stuff, you’ve been monetizing through Write the Docs and conference, I assume, crowdfunding your life, and ad space on Read the Docs. You’ve tried literally everything, and I’ve really appreciated how transparent you’ve been in documenting that stuff so that other people can learn about it. It was really nice, even for me, when people say, “What’s a project that’s tried whatever?”, I could be like, "Oh, Read the Docs has tried everything. Go look at what they did."
I don’t know if it’s a compliment to have tried everything, because that means obviously nothing worked. [laughter]
Maybe, maybe... Which is also a great question and something that I’ve wondered about. I know that you and I have talked a little bit about the project management side of things like this, and Read the Docs is maybe more unusual in the sense of, you’re a service and a platform. I’m very curious to hear Mikeal’s take on this, because I know we’ve also talked about this - there are types of contributions where you can incentivize people to casually contribute to as volunteers, to do that in their spare time. Stuff like project management requires deep familiarity with the project. You can’t just jump in and help manage Read the Docs, but it also doesn’t involve code, it doesn’t involve a salary, there’s no money in it, so it’s really hard to recruit a contributor for something like that.
Yeah, and it involves an enormous time commitment, right? And all of those benefits that you were talking about earlier, like employers wanting you and things like that - project managers don’t get recruited that way by looking at the open source projects.
“Look at this issue that I wrote, it’s so good!” [laughter]
Yeah, exactly. How do you incentivize that? Besides just giving people salaries.
So this is what Django does, right? Django has - I’m not sure what they call it, but they have the one paid staff...
...that's like the project manager. And they were like, “This is not a volunteer thing, and every release we have a volunteer who does it, and they burn themselves out doing it.” Because it is a lot of work. It’s like the release manager, project manager role - it’s incredibly stressful, it’s really hard, and it’s very thankless.
That’s a really, really tricky one, and this is actually where we’ve settled with Read the Docs, as well. We are a service, people use us, it’s free on the internet. We are also open source. That last part is the least important for most people. They’re like, “You’re a website that I use. You host my documentation. My code has to be open source, but I don’t care about your code because you’re a web service.” We’ve actually had a really, really hard time getting contributions because one, it’s a substantial contribution. It’s a big website that has background processes, and the external dependencies, and it’s hard to set up, but also it’s a service, so people aren’t using it and they’re not scratching their own itch nearly the same way as building a library, or building a programming environment.
There are things that people could do to add features, but in reality that almost never happens. We have a very, very low rate of actual contribution. In terms of sustainability, being open source is actually a detriment to the sustainability of the project, because it limits our commercial potential. Something like GitHub is incredibly valuable, but then they’re closed source, so people pay them for GitHub Enterprise.
[01:00:11.26] GitLab is doing some very, very interesting stuff here. I haven’t fully viewed how their business works, but it seems complicated and interesting. I think it’s a really frustrating thing where we have this open source thing, we’re supporting the open source community, but there’s no obvious way to sustain it. A lot of the previous models have failed or are not accessible, because we are open source. There’s very few open source services that exist that are sustainably funded that I can think of, right? There’s so many that are traditionally viewed as open source, like GitHub or Bitbucket - these things that enable open source as a business model, but aren’t actually open source projects.
I think you hit it. You can’t actually incentivize some of these contributions. If you can't and you need them and you have to pay for them, then how do you get the money to pay for them? How do you tie the money to the benefit that these roles actually fulfill?
Yeah, I think that’s one of the big outstanding problems in open source, right? I think lots of people are trying to solve the problem with money, because it is... As you do more and more free work, eventually you realize, “Hey, maybe I should get paid for this.”
Obviously, we don’t have the time to talk about all the different things that we’ve done, but so much of... The reason that Read the Docs still exists and it’s open source and running is because I’ve been on call for six years for free, basically. It’s something that I really truly care about, and it’s a true labor of love. We’re starting to make above poverty wages through the different schemes that we’re working on, for having the classic private hosting model, and then consulting, contracting, conferences and all that stuff, but it’s a huge struggle. Making a market rate salary is really nowhere in sight at this point, while being used by basically every major corporation that’s based in San Francisco.
Right, right. Being that we don’t have time to get into all of them, let’s talk about some of the things that didn’t work. Let’s talk about one of them that didn’t work, and that everybody thought would work, it was obvious, but actually didn’t, for whatever reason. I’m very interested in why these didn’t pan out.
I think the classic one in open source that is tried a million times and fails is the Red Hat model. It’s like, “Hey! Build this thing, and then open source it, and people will pay for support.” Basically, all we found is people ask for free support.
Yeah, yeah. There’s never going to be another Red Hat. That worked once, and it’s still working great, good for them; it’s not portable.
When they’re doing databases and operating systems, right? Your internal documentation server is not something you’re going to pay Red Hat prices for. We have a few support contracts with different people, but that was not a scalable model. Most people, they just see it’s open source, install it locally, yell at us when it doesn’t work, and then we just have this stream of sadness in our GitHub issue tracker of like people trying to use our product for free, yelling at us when it doesn’t work, and not paying us for support... Which is probably I think what most projects experience, right? People getting paid to yell at you, or not supporting your free code.
It almost makes it worse, because then people think that you are getting paid by somebody or compensated in some way, so they don’t even have the empathy for people that are maintaining a project in their spare time.
[01:03:54.06] My favorite is when at conferences people come up and are like, “Oh, we installed Read the Docs locally and it’s so good! We’re getting so much value out of it, what a great product.” And I’m like, “Oh cool. Have you ever contributed anything?” and they’re like, “Oh...” And I’m like, “Oh, do you want to contribute to us to support ongoing development?” and they’re like, “Oh, it already works, it pretty much does what we need it to do...”, and end of conversation. It is a fascinatingly frustrating thing.
Yeah, and that’s why we’re looking at advertising as the latest thing, because it's like, we’re a free large website on the internet and there’s exactly one business model that has been proven to work. What we’re trying to do is do advertising properly, in the style of The Deck, where we don’t track users, we host everything, we don’t share data. We basically run newspaper advertising, right? We are building a newspaper advertising business on the internet, where it’s like "We’re going to put a thing on the page, and we think some people are going to look at it."
Really, there’s so much ad tech that's been built to try and understand this incredibly complex data that then gets schooled and tricked and ad fraud and all this stuff... And it's like, "Hey, what if we just put an image on the page? We think most people are going to look at it, and then you just pay us money and then it’ll work fine." That worked for hundreds of years. We’ll see...
We have a bunch of traffic, we have decent users... Rolling that out was really stressful because we were really worried about alienating people and making them upset, because people do really view their documentation as part of their product, but people have been really understanding about "Yes, you need to get paid. We value this service." So hopefully that’s kind of the latest thing, and hopefully we’ll find a way to make that work and be scalable.
Bringing it back to the Django roots I guess... Newspapers...
Well, that seems like a great point to stop, actually. It’s been really great talking with you.
We learned so much, thank you.
Yeah, definitely. Let’s do this again in like six months and I’m sure I’ll have some other harebrained scheme to talk about. [laughter]
That would be great.
Alright, cool, and thanks to you all for having me on the podcast. I think it should be an awesome one.
Our transcripts are open source on GitHub. Improvements are welcome. 💚