Today we’re talking about uses for Go in the medical industry. Tim Stiles develops and maintains a Go package for synthetic biology and molecular biology called Poly. It has broad applications for biotech R&D, but also has very direct applications to medicine.
Sourcegraph – Transform your code into a queryable database to create customizable visual dashboards in seconds. Sourcegraph recently launched Code Insights — now you can track what really matters to you and your team in your codebase. See how other teams are using this awesome feature at about.sourcegraph.com/code-insights
Square – Develop on the platform that sellers trust. There is a massive opportunity for developers to support Square sellers by building apps for today’s business needs. Learn more at changelog.com/square to dive into the docs, APIs, SDKs and to create your Square Developer account — tell them Changelog sent you.
Retool – The low-code platform for developers to build internal tools — Some of the best teams out there trust Retool…Brex, Coinbase, Plaid, Doordash, LegalGenius, Amazon, Allbirds, Peloton, and so many more – the developers at these teams trust Retool as the platform to build their internal tools. Try it free at retool.com/changelog
| Chapter Number | Chapter Start Time | Chapter Title |
|---|---|---|
| 3 | 02:14 | It's Go Time! |
| 5 | 03:28 | Tim's biotech & Go history |
| 6 | 08:11 | Biotech's lack of software tooling |
| 7 | 10:41 | How Tim ended up at Harvard Med |
| 8 | 12:59 | Tim's wife got him into biology |
| 9 | 15:27 | Ending up in a Garage Lab to learn biology |
| 10 | 17:08 | Poly - a Go package for DNA engineering |
| 11 | 19:54 | Applications of Poly |
| 13 | 24:35 | Poly's DNA simulation |
| 14 | 28:47 | The price of DNA modeling |
| 15 | 30:47 | The future of bioengineering |
| 16 | 34:41 | Is Go part of this future? Why? |
| 17 | 36:13 | Choosing Go for Poly |
| 18 | 39:48 | The bright future of bioengineering |
| 19 | 45:28 | Natalie as a biotech expert? |
| 20 | 46:01 | AI as a biotech expert? |
| 22 | 51:54 | Biotech is 10 years behind software trends |
| 23 | 52:56 | Tim's inverse law of software quality |
| 24 | 54:52 | Owning software is not like owning land |
| 25 | 56:00 | How can the community get involved? |
| 28 | 1:04:57 | Time to go! |
Hey, Ian, how are you doing?
I’m doing great. How about you?
I am doing well. I’m excited about our episode today. We’re gonna talk about Go in biology and medicine. And our guest is Tim Stiles. Hi.
Hey, how’s it going?
Good. We are very excited that you’re here to talk with us about something that I personally haven’t done enough of since high school.
Hah! This is great. I’ve taught a lot of people this; this is wonderful. I don’t have any slides, I don’t have any whiteboards, but I’ll try my best.
[laughs] So Tim, you are doing biology with Go. How did that happen? Why? Tell us everything.
So I guess I’ll start with the short story, and then you can start asking me, “How did you even become a software engineer in biology?” Starting with Go: I’d been writing biology software for a while, and I’d come upon this new project that was essentially a version control system for cell lines. So think of it as Git for a cell: when you’re programming a cell to do a new thing, you sequence the thing, the cell may or may not do it, and you try to track all the changes you’ve made across these cell lines as you engineer them. I actually went into the original Git code to learn this, and I’d written Git servers before, so for me it was familiar territory. But the thing was that Python is an interpreted language; it’s not really built for command-line tools. And I was like, “Well, what is?” It came down to Go, which I’d played around with a little in undergrad a few years before, and Rust, and pretty much my decision was, “Well, Go’s about as fast, and much easier to use, so I’m gonna go with that.”
So that’s sort of how I started with Go in biology. And then it spun into something more, because at that time I was getting tired of biotech software… It’s a hard topic: you have to know how to write software, and you need to know how to do biology, and the intersection is very, very small. People will tell you this all across the industry. I was at a biotech software meetup last night, and everyone was like, “Why do the biologists not know how to code? Why do the programmers not know how to do biology?” And I’m like, “Yo, I’m right here, guys. Don’t forget about me…” Because maybe only a handful of people can do both.
And so what happened was I was getting ready to leave for tech, because I was burnt out on this lack of tooling… And a friend of mine who was working for a professor at Stanford called me on his drive down from Stanford to LA, where his parents lived. He was only 20 at the time; he was quitting his job at Stanford and going back to live with his parents to work on this idea for a startup. And he’s like, “Hey, Tim, before I leave my job at Stanford, I need to take all of these genetic parts that people have designed and convert them from the JSON we have internally back into GenBank, which is this weird, esoteric sequence data format that we’ve been using since 1970, and it’s like THE sequence data format. This is what the government ships.” The NIH and various agencies in Europe and Japan have this consortium of groups that have their own, I guess you could call it, GitHub for DNA, or DNA parts, or sequences; it’s not quite the same, it’s a little bit more esoteric and bureaucratic… But it essentially functions in a similar way.
[00:06:06.21] And so there’s this really esoteric format, designed in 1978, predating even XML, that’s all whitespace-based, and he’s like, “Hey, I can’t convert this JSON to GenBank, and GenBank to JSON and back. I can’t do it.” I was like, “You’ve got to be kidding me. This is the world’s most common data interchange format, JSON, and the world’s most common DNA sequence format, GenBank, and you can’t find a reliable way to convert between the two?” And he’s like, “No. I can’t.”
Wait, in any programming language, or in Go?
He was trying to do it in Python, and the thing that was happening is that he kept getting this fatal error on several sequences, where instead of going [unintelligible 00:06:40.29], you know, letting you handle the error yourself, it would just kill the whole run… And so the run would take forever, and you’d be like six hours in, and it’d be like, “I don’t like this specific sequence. No.” Or, “I don’t like this specific metadata. No.” It would kill the whole run. And at that time – he’s a much better programmer now. He’s a great programmer. He spent a lot of time developing this with me; he’s been more on the biologist side, but he’s learned a lot of software engineering working with me… And it’s turned into this real project where… Like, I announced I’d made a GenBank parser, and people were into it. I was like, “Hey, I’m done with this.” It took me three weeks of going through various poor forms of documentation. I found one European website that I think is an official government site, that had bad SSL certificates…
So it gave me warnings when I tried to click on it in Firefox… Like, “Don’t go to this bug-infested site from the government.” I was like, “Okay…” But I got to it. And that’s the only specification I could find for the file format. It took me three weeks to write this parser. It’s not like I haven’t written a parser before; I’ve written plenty of web scrapers in my lifetime, so it’s not new to me. But it became this real thing where - I tell people often, for biotech it’s a big deal if you put it in JSON, because they have been handling data formats since before the public internet; think Usenet era, 1978. We don’t even know who made the most – maybe we can find out who came up with JSON, but GenBank, I’ve tried to figure out who came up with GenBank. I can’t. All I know is there’s some council of elders that got together in the deserts of New Mexico in 1978 and decided on this format. And there’s no record of who these people were. And we’ve all been living with the consequences ever since.
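To make the parsing and error-handling points concrete, here is a minimal Go sketch. It is not Poly’s parser, and the `Entry` type, its JSON field names, and the accepted characters are assumptions for illustration. It reads only the LOCUS name and the ORIGIN sequence (a position number followed by blocks of ten bases, ending at `//`), and when converting a batch it skips and reports bad records instead of aborting the whole run. It accepts only A, C, G, T, and N, so real files using other IUPAC ambiguity codes would be rejected, and `errors.Join` needs Go 1.20 or later.

```go
package main

import (
	"bufio"
	"encoding/json"
	"errors"
	"fmt"
	"strings"
	"unicode"
)

// Entry is a hypothetical JSON shape for illustration; it is not Poly's data model.
type Entry struct {
	Locus    string `json:"locus"`
	Sequence string `json:"sequence"`
}

// parseGenBank pulls the LOCUS name and the ORIGIN sequence out of one
// GenBank-style record, ignoring FEATURES and every other section.
func parseGenBank(text string) (Entry, error) {
	var e Entry
	var seq strings.Builder
	inOrigin := false
	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		line := sc.Text()
		switch {
		case strings.HasPrefix(line, "LOCUS"):
			fields := strings.Fields(line)
			if len(fields) < 2 {
				return e, fmt.Errorf("malformed LOCUS line %q", line)
			}
			e.Locus = fields[1]
		case strings.HasPrefix(line, "ORIGIN"):
			inOrigin = true
		case strings.HasPrefix(line, "//"):
			inOrigin = false
		case inOrigin:
			// Sequence lines hold a position number, then blocks of 10 bases.
			for _, r := range line {
				switch {
				case unicode.IsDigit(r) || unicode.IsSpace(r):
					// skip the numbering and whitespace
				case strings.ContainsRune("acgtnACGTN", r):
					seq.WriteRune(unicode.ToUpper(r))
				default:
					return e, fmt.Errorf("locus %q: unexpected character %q", e.Locus, r)
				}
			}
		}
	}
	if err := sc.Err(); err != nil {
		return e, err
	}
	if e.Locus == "" {
		return e, errors.New("no LOCUS line")
	}
	e.Sequence = seq.String()
	return e, nil
}

// convertAll converts every record it can and reports the rest, instead of
// letting one bad record kill the whole run.
func convertAll(records []string) ([]Entry, error) {
	var good []Entry
	var errs []error
	for i, rec := range records {
		e, err := parseGenBank(rec)
		if err != nil {
			errs = append(errs, fmt.Errorf("record %d: %w", i, err))
			continue // keep going with the next record
		}
		good = append(good, e)
	}
	return good, errors.Join(errs...) // nil when every record parsed
}

func main() {
	records := []string{
		"LOCUS       pGood  24 bp  DNA  circular\nORIGIN\n        1 atgcatgcat gcatgcatgc atgc\n//\n",
		"LOCUS       pBad   10 bp  DNA  circular\nORIGIN\n        1 atgc?tgcat\n//\n",
	}
	good, err := convertAll(records)
	for _, e := range good {
		out, _ := json.Marshal(e)
		fmt.Println(string(out))
	}
	if err != nil {
		fmt.Println("skipped:", err)
	}
}
```

Returning an `error` per record lets the caller decide whether one strange sequence is fatal, which is the opposite of the six-hours-in crash described above.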
This is really mind-blowing. I had no idea… Wow.
It starts with the complexity of this – I guess you’d call it legacy data, or legacy code or software… Because bioinformatics has existed since before the web. We’ve definitely evolved a lot since the internet came about, but scientists had been using the internet before everyone else by at least 15 years, maybe… So there’s a lot to parse there. The most recent file format I’ve seen GenBank, the United States NIH-backed database, use is something called ASN.1, which is like a precursor to XML. It’s to XML what XML was to JSON. It’s super-weird. And someone asked me, “Have you ever written a parser for this?” I’m like, “I didn’t even know this was a thing. This is amazing. But also, no, I never want to do this.”
And so a lot of biotech software is limited by the fact that there are these data formats that the government uses, or some repository uses, and most people doing this work are scientists, not software engineers. There’s a big push right now for scientists to learn a little bit of software engineering and DevOps practices… I’ll make a post like, “This is what unit testing is. Have you ever heard of example tests? They’re really great in Go,” and they’ll be like, “You saved my life. I can actually write my code.” It’s wild.
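Here is roughly what such a post might show: a Go example test, sketched around a hypothetical `GCContent` helper (not taken from Poly). `go test` runs any `ExampleXxx` function in a `_test.go` file and fails if what it prints doesn’t match its `// Output:` comment; it is folded into one file here only so the sketch runs with `go run`.

```go
package main

import "fmt"

// GCContent returns the fraction of bases in seq that are G or C
// (case-insensitive). It assumes an ASCII sequence.
func GCContent(seq string) float64 {
	if len(seq) == 0 {
		return 0
	}
	gc := 0
	for _, c := range seq {
		switch c {
		case 'G', 'C', 'g', 'c':
			gc++
		}
	}
	return float64(gc) / float64(len(seq))
}

// ExampleGCContent would normally live in gc_test.go. `go test` runs it and
// compares what it prints against the Output comment below.
func ExampleGCContent() {
	fmt.Println(GCContent("ATGC"))
	// Output: 0.5
}

func main() {
	ExampleGCContent() // prints 0.5
}
```

The example doubles as documentation: `go doc` and pkg.go.dev show it next to the function it exercises.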
So there’s this whole discussion right now in the field of, “How do we write better software?” and I’ve found myself in the center of it, because I wanted to write this to make cool stuff. Like, it does have medical applications, but I’m thinking of stuff like flying seaweed, and [unintelligible 00:09:44.01] trees and all this other goofy stuff that I think of in my spare time, where I’m like, “Someday I’m gonna need the software to do that.” But it just didn’t exist. The tools you mostly find are either old C or Python packages, or companies that rewrap them in a nice GUI and sell them to scientists for a good price… And you know, they do that, but it’s not really a programmatic approach; you can’t do much with it other than drag and drop.
[00:10:10.28] So for example, DNA synthesis, which is what we use to make new DNA for certain genes that we’re trying to engineer - there are only like two companies that have an API for that, out of the dozens that exist. Most of the time, if you want to send something to them, you have to drag and drop into their GUI. I’ve talked to a lot of vendors about this, and yeah, there are only two or three I know of that have an API. With one of them, you have to send your DNA in Excel format. They don’t take JSON. It’s Excel format. I know, wild. But yeah, we can get off this diatribe for now… It feels a little wild.
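Since the problem described is vendors wanting spreadsheets rather than an API, here is a hedged sketch of generating an order sheet from code with Go’s `encoding/csv`. The `Order` type and its column names are hypothetical, every vendor’s template differs, and CSV is not Excel’s native .xlsx format, although spreadsheet programs open it.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// Order is a hypothetical synthesis order line; real vendor templates differ.
type Order struct {
	Name     string
	Sequence string
}

// orderCSV renders orders as CSV text with a header row.
func orderCSV(orders []Order) (string, error) {
	var sb strings.Builder
	w := csv.NewWriter(&sb)
	if err := w.Write([]string{"Name", "Sequence"}); err != nil {
		return "", err
	}
	for _, o := range orders {
		if err := w.Write([]string{o.Name, o.Sequence}); err != nil {
			return "", err
		}
	}
	w.Flush() // csv.Writer buffers; flush before reading the result
	if err := w.Error(); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	out, err := orderCSV([]Order{
		{Name: "gene_001", Sequence: "ATGAAACGC"},
		{Name: "gene_002", Sequence: "ATGGCTTAA"},
	})
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}
```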
So that’s the tech side of things. How’d you get into the biology side of things? I think before the show you said you had a degree in computer science, but how did the biology come in?
That’s true. So this is really funny… Back when I was a student, I first started as a mechanical engineering student, then I switched to being a design student, and then I got really into robotics. I was playing around with 3D printers, and I was doing a lot of computer vision stuff… That was my jam for most of undergrad. At the time, the NSF here in the United States was giving out grants to undergraduates to do computational biology and bioinformatics research. And so I was like, “Well, I like money, and I want to get my undergrad paid for,” so I found a couple of professors to work with, I tried a few things, and I kind of stuck with computer vision until I got this job that bridged out from undergrad, where I was working at Harvard Med as a research assistant doing computer vision for endothelial cell morphology. That’s a fancy term for how the cells that line your blood vessels move and shape and morph as they respond to hypoxic cells. If you have a cell that doesn’t have enough oxygen, it releases something called vascular endothelial growth factor, VEGF, which is a chemical that goes through all of the nice little fluids and tissues in your arm, or your body… And when it hits the endothelial cell, the endothelial cell goes, “Hey, that cell’s in trouble,” and it coordinates all the other nearby endothelial cells to shift and morph and change the rigidity of their cell membranes to sort of push their way towards this hypoxic cell to deliver oxygen. And the reason why this is important to study - I mean, obviously, it’s just cool, first off… But the reason why a lot of people like to fund it is that this dynamic is very central to tumor growth in cancer research… Because what tumors and cancerous cells do is put out this VEGF factor to an extreme; they’re constantly saying they’re hypoxic, even though they’re not, and recruiting all these endothelial cells to feed them more and more blood, essentially, to deliver oxygen and let them grow rapidly.
So usually, the line of research there is “Can we find a way to mess with that to stymie tumor growth?” And the answer is, “Yeah, a little bit.” But cancer is a very complex disease, with lots of different ways of presenting itself, and it’s not the end-all there.
So I got started in that, but at the same time, my spouse, Ren - we were dating then, but now we’re married - was working at a lab at Harvard Med under George Church, who is a famous synthetic biologist. He’s, like, in Wired Magazine… He’s a really nice guy. He says yes to everything. So if you have a synthetic biology startup, he’ll be on your board. I don’t know if he’ll have enough time for you, but he will.
And so she was working in his lab doing cell fate programming, which is this concept… Have you ever heard of stem cells? Do you know what stem cells are?
So stem cells - the idea is that they’re – cell fate programming is the idea that you can turn a stem cell into another kind of cell: a blood cell, a tissue cell, an eye cell… You know, a different cell. And you do this by introducing various chemicals to it, and there are different stages. So you’re programming it with various enzymes, or hormones, or some odd factor, and it becomes a different cell. And the big thing that people have been trying to do is figure out how to take cells that have gone from stem cell to fibroblast cell, or skin cell, and turn them back into stem cells. Because if you can figure out how to do that, and culture those, it has a lot of applications for, say, tissue engineering, and replacing degenerated tissues, and things like that. It’s still highly – I wouldn’t say speculative, but it’s new. We’ve been doing it for a while, maybe there are a few therapies for it, but I’m actually not a medical guy; I am a little squeamish, which is hilarious to a lot of people, but it’s true. If I’m not familiar with it, I may get a little nauseous talking about it.
[00:14:26.26] But how I got into biology is that eventually, doing this, I’d walk home with my wife. We were in the same area, so I’d walk 50 minutes to her office on my way home, and we’d walk home together, and she’d be explaining her work to me… And she was mentioning these things called plasmids, which we’re gonna talk about a little bit more later, but essentially, they’re these little circular tokens of DNA. Consider them like tiny functions that contain a gene, and maybe some things to promote the expression of this gene, and a couple of other things, like maybe some sort of resistance to an antibiotic, which is useful in lab plating… It’s called plate selection. I can talk about that, too… But essentially, a plasmid contains like two or three genes, it’s like 10,000 base pairs, which is tiny compared to your whole genome. They occur in nature, in bacteria, as a way of getting around the fact that bacteria reproduce asexually and need some genetic variation… But we’ve figured out how to use them in other places to do sort of small genetic testing. So they’re like tokens, but she would call them circular. And I was like, “Wait, when you project it, isn’t the helix kind of circular, when you look at it from a certain projection?” She’s like, “No, no, no, no, no, no, no, no.”
And so I was having a hard time – there was some problem with staining, where I couldn’t get quite the information I wanted out of the data I had. There was something going on there. It’s been so long since I looked at this, but essentially, I was like, “I need to be able to explain what colors I need, and how it works for staining.” And I can’t rely just on the wet lab biologist to kind of bridge this gap for me; it’s not going to work that way.
And so eventually, she got tired of explaining things to me, and very gently suggested I go to this garage lab in Somerville, Massachusetts called BOSS Lab, which has been around forever in some way, shape, or form… And I showed up there, and I was like, “Hey, I need to learn biology for my work,” and they’re like, “Yeah, we were gonna make a class, but we haven’t done that yet. So if you teach the class, we’ll teach you how to teach the class.” And so I spent, I’d say, six weeks in this lab, nights and weekends, working on what’s called the heat shock transformation protocol to put these plasmid tokens in the bacteria, and kind of learning the basics of biology that way… It was kind of like a world-class education in a sparse environment, because the people I was messaging for help were PhDs from Institut Pasteur, and UC Berkeley, and UW Madison… And there’s a software engineer who’s the VP of engineering at DataRobot, and they’d all be giving me advice over Slack, like, “Oh, this over here, this over here…” But I was also kind of on my own, because none of them had made this protocol before. And from there, I kind of just got too into it, kept getting contracts, and kept doing things, and eventually, I was kind of struggling and I wanted to get out of the field, and I just stumbled on something that worked. And now I make my living as a consultant, doing something that works, which is really great.
You have also built Poly, which is a Go package for DNA…
Yes, so for engineering DNA, which is very specifically – it’s very different from what a lot of other people do with DNA. A lot of times they’re looking for diagnostics, they’re looking for variants, they’re trying to assemble it into a whole genome for reference… What my package does is try to engineer DNA to be used for certain experiments and certain designs. And so what my package is mostly focused on is going from designing DNA as a concept to something that you can put into cells… Which is surprisingly rare. You’d think there’d be more of this, and there is; there are a lot of companies that write this internally for themselves. Imagine they’re writing their own React framework and then not sharing it… And that’s what’s going on in almost every company that does this. They burn millions of dollars doing this, and in most cases they don’t share it back. And so I wrote this thinking, “I want to be doing this for the next 10 years, I want a really stable library, I want it to be really well tested, and I want it to have the features that I need to create cool stuff.”
[00:18:10.25] And so I started from sort of that principle, and it started with just parsers, which every bioinformatician will tell you is the first thing they have to do: write parsers. It’s like the bane of our existence, taking government data and putting it into JSON. I wish I was kidding. But after you get past that, then you get into stuff like, “Well, how do you manage having a hash ID for a circular sequence?” Because the string is circular, there’s no official place where it starts. So it turns out there’s this old algorithm called [unintelligible 00:18:39.27] that, given a circular sequence, will always rotate it to the same deterministic starting point. And we use that to create hash IDs for these plasmids, because otherwise it would take an inordinate amount of time to figure this out, because then you’re using pairwise alignment algorithms… And these are not linear algorithms; they’re kind of gnarly, and not my specialty. People actually specialize in string alignment algorithms for this specific stuff. [unintelligible 00:19:01.16], a professor at Harvard Med, specializes in this, and his software runs the world quietly. He does not get enough credit for how much bioinformatics relies on his stuff. The practitioners know, but the people downstream have no idea that this one Harvard professor is keeping all of our – you know, if you’ve ever used 23andMe, Ancestry or anything else, or any genomic testing ever, your DNA has probably gone through one of his algorithms. He’s very prolific. And unlike him – you know, I still have a small user community. I have some very hardcore users that have their own startups, that are raising money for it, and I have a handful of people that give me feedback, and I’ve even had large companies go, “Hey, we’re thinking about using this,” but I still feel like a small player in comparison to people who have come before me.
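The algorithm’s name is lost in the recording, so what follows is a generic illustration rather than Poly’s actual hashing scheme. One standard way to give a circular sequence a deterministic starting point is to rotate it to its lexicographically smallest rotation (Booth’s algorithm finds that in linear time; this sketch uses a simpler quadratic scan). Because a plasmid is double-stranded, the sketch also considers the reverse complement before hashing with SHA-256. It assumes sequences contain only A, C, G, and T.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// reverseComplement returns the opposite strand of an uppercase A/C/G/T
// sequence; any other character becomes 'N'.
func reverseComplement(seq string) string {
	out := make([]byte, len(seq))
	for i := 0; i < len(seq); i++ {
		var c byte
		switch seq[i] {
		case 'A':
			c = 'T'
		case 'T':
			c = 'A'
		case 'G':
			c = 'C'
		case 'C':
			c = 'G'
		default:
			c = 'N'
		}
		out[len(seq)-1-i] = c
	}
	return string(out)
}

// minRotation returns the lexicographically smallest rotation of s using a
// simple O(n^2) scan; Booth's algorithm finds the same rotation in O(n).
func minRotation(s string) string {
	best := s
	for i := 1; i < len(s); i++ {
		if r := s[i:] + s[:i]; r < best {
			best = r
		}
	}
	return best
}

// circularID hashes a canonical form of a circular, double-stranded sequence:
// the smaller of the minimal rotations of the two strands, so the ID does not
// depend on where the circle was "cut" or which strand was written down.
func circularID(seq string) string {
	seq = strings.ToUpper(seq)
	canon := minRotation(seq)
	if rc := minRotation(reverseComplement(seq)); rc < canon {
		canon = rc
	}
	sum := sha256.Sum256([]byte(canon))
	return hex.EncodeToString(sum[:])
}

func main() {
	// Same plasmid written from two different starting points.
	fmt.Println(circularID("ATGCGT") == circularID("CGTATG"))
}
```

Any rotation of the same plasmid, or its other strand, yields the same ID, so two files describing one plasmid can be matched with a hash lookup instead of pairwise alignment.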
Again, I wrote it because I want to engineer DNA, and I think it’s super-cool, and it was hard to do that with the tools available.
So I want to get into the nitty-gritty details of it, but first…
…can we talk at a high level about what it does? What kind of projects could you do with it? Are there projects you’ve used it for? …just to kind of get some context here.
Okay. The thing is, it’s a very vast library, despite how much I’ve underplayed it so far… So some real things: I know that a couple of consultants have actually engineered microbes as a potential therapeutic, and I’ve done that as a consulting gig for a couple of YC companies that I probably shouldn’t name, so I will not. There’s another company that’s trying to automate DNA synthesis and cloning, which is actually a big(ish) business. There are plenty of companies trying to do it, and he’s like a main contributor. They just closed $3.5 million – or are about to; I don’t know, actually. But if you’re interested in writing Go and doing synthetic biology, check out [unintelligible 00:20:42.28] in San Francisco. They’re really great. I keep my surfboard there. I can recommend them wholeheartedly.
But there are things that people could be doing with it that they aren’t yet. One is designing primer tests, like the PCR tests you’ve seen for COVID. There’s very explicit tooling to be able to do that, for designing primers, and looking at – I’m currently working on a project where I’m essentially taking all this GenBank data that I told you about, the antiquated GitHub of DNA, and putting it into a graph database with all these little fun extras, and part of that is that I could hopefully design primer tests for almost every viral string of DNA we have… Which is really cool.
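As a taste of what primer tooling computes, here is a minimal sketch that is not Poly’s API. It takes a naive forward and reverse primer from the ends of a template (the reverse primer is the reverse complement of the template’s last bases) and estimates melting temperature with the Wallace rule, Tm ≈ 2 °C × (A + T) + 4 °C × (G + C). That rule is only a rough guide for short primers; real primer design uses nearest-neighbor thermodynamics and also checks for hairpins, primer dimers, and off-target binding.

```go
package main

import "fmt"

// wallaceTm estimates a short primer's melting temperature in °C using the
// Wallace rule: 2 °C per A or T plus 4 °C per G or C. Other characters are ignored.
func wallaceTm(primer string) int {
	tm := 0
	for _, c := range primer {
		switch c {
		case 'A', 'T', 'a', 't':
			tm += 2
		case 'G', 'C', 'g', 'c':
			tm += 4
		}
	}
	return tm
}

// revComp returns the reverse complement of an uppercase A/C/G/T sequence;
// any other character becomes 'N'.
func revComp(s string) string {
	out := make([]byte, len(s))
	for i := 0; i < len(s); i++ {
		var c byte
		switch s[i] {
		case 'A':
			c = 'T'
		case 'T':
			c = 'A'
		case 'G':
			c = 'C'
		case 'C':
			c = 'G'
		default:
			c = 'N'
		}
		out[len(s)-1-i] = c
	}
	return string(out)
}

// primerPair naively takes the first n bases as the forward primer and the
// reverse complement of the last n bases as the reverse primer.
func primerPair(template string, n int) (fwd, rev string, err error) {
	if n <= 0 || n > len(template) {
		return "", "", fmt.Errorf("primer length %d out of range for template of length %d", n, len(template))
	}
	return template[:n], revComp(template[len(template)-n:]), nil
}

func main() {
	template := "ATGCATGCATTTGGCCAAGGTTCCGGAA"
	fwd, rev, err := primerPair(template, 10)
	if err != nil {
		panic(err)
	}
	fmt.Printf("forward %s  Tm≈%d°C\n", fwd, wallaceTm(fwd))
	fmt.Printf("reverse %s  Tm≈%d°C\n", rev, wallaceTm(rev))
}
```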
So there are a lot of medical applications, a lot of biomedical applications, which most people are very interested in when they talk to me, because it’s innately human. Everyone’s had a medical condition, or knows someone who has one. And one of the examples that I should have started with - I don’t know why I didn’t - is… If you’re a diabetic, or you know anyone who’s diabetic, they are taking insulin that was produced in yeast, not the classic way, where it was derived from pigs. In the 1970s, I believe it was Eli Lilly that engineered yeast to produce human-compatible insulin.
[00:21:58.19] And this is how synthetic biology works, at least at the early stages, for a lot of companies: they engineer yeast, or E. coli - not the dangerous kind, not the Chipotle [unintelligible 00:22:04.13] kind, but lab-safe; like the difference between a [unintelligible 00:22:07.18] and a domestic cow… And they engineer it to produce some molecule, some chemical, some drug, something valuable, and then they bring it to a contract brewer in Wisconsin, and they go, “Hey, can you take this strain and brew a ton of it, so we can extract the thing of value?” And that’s how insulin has been produced since the ’70s. It’s actually part of the reason why I think it’s so criminal how expensive it is, because it’s actually really cheap to produce.
But the thing is that this is a technology that’s been around for longer than I’ve been alive. It’s been around for 40 or 50 years, but it’s sort of become this new thing in the public eye of, “Oh, I can do it, too.” It’s not just the realm of Monsanto, and Eli Lilly, and pharmaceutical companies; it’s now getting to the realm of people like me, who have created this library that companies use. And I wrote this in the comfort of my living room. I haven’t had access to a lab in too long. I want to build my own lab again. But it’s at the point where now anyone with a computer can contribute in some way, even if they don’t quite understand biology.
Sorry, I’m processing for a second there…
Yeah, yeah, sorry.
Yeah. A lot of information. Super-interesting.
So that kind of made sense… [laughter] To my limited mind. So what does it do? Are you running simulations? Is it statistical modeling? Is it about how DNA reproduces?
Not statistical modeling, it’s more construction.
So think of it this way… If you need to engineer a DNA construct, like these little plasmid tokens I’ve told you about, these little, essentially swappable functions we’d like to put into different cell types, you have to construct that. They exist in nature, in bacteria, but for our purposes, we need to make sure that it’s human-compatible, that it will express the gene, and all this other stuff. And some people take essentially templates where they just swap out the gene for a different one, and it’ll express in the organism of choice… But my software makes it easy to design thousands of these at a time. Traditionally, the way most people do this is they design them one at a time, with CAD-like GUI software.
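A minimal sketch of the template idea, assuming a hypothetical `Template` type that marks a single insertion position in a backbone: one backbone plus a map of genes yields one construct per gene, which is what makes designing thousands at a time a loop rather than a GUI session. Real cloning design also has to account for the assembly method (restriction sites, overlaps, and so on), which this deliberately ignores.

```go
package main

import (
	"fmt"
	"sort"
)

// Template is a hypothetical plasmid backbone with one slot for a gene:
// bases before InsertAt come first, then the gene, then the rest.
type Template struct {
	Name     string
	Backbone string
	InsertAt int
}

// Build returns one construct per gene, keyed by "<template>_<gene>".
func (t Template) Build(genes map[string]string) (map[string]string, error) {
	if t.InsertAt < 0 || t.InsertAt > len(t.Backbone) {
		return nil, fmt.Errorf("insert position %d outside backbone of length %d", t.InsertAt, len(t.Backbone))
	}
	out := make(map[string]string, len(genes))
	for name, gene := range genes {
		out[t.Name+"_"+name] = t.Backbone[:t.InsertAt] + gene + t.Backbone[t.InsertAt:]
	}
	return out, nil
}

func main() {
	tmpl := Template{Name: "pDemo", Backbone: "GGGGCCCC", InsertAt: 4}
	constructs, err := tmpl.Build(map[string]string{"geneA": "ATGAAA", "geneB": "ATGTTT"})
	if err != nil {
		panic(err)
	}
	// Sort names so the output order is stable (map iteration is random).
	names := make([]string, 0, len(constructs))
	for name := range constructs {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		fmt.Println(name, constructs[name])
	}
}
```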
Say you want to run a whole DNA synthesis operation - my software does that end-to-end. You can design the software, you can design the construct of interest, you can simulate how you’re going to construct it, you can simulate these special ways of printing DNA… DNA synthesis is a little funny, in that there are some sequences we can’t print, but we can print analog sequences that are similar enough that it doesn’t matter… But we have to know that. And so my software, for example, fixes that little quirk; it optimizes the sequence to be expressed in the cell, because different strings of DNA will be expressed differently in different organisms. Like, they’ll express in both, but at different rates, depending on the usage of what we call codons, which… I should have just started with a lecture on the Central Dogma, but you can look it up at home. It’s called “the central dogma of molecular biology,” if you’re listening now. It’s the core tenet of the biology you need to understand this.
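Codon optimization can be sketched as back-translation with a preference table. In this hypothetical example, `codonWeights` covers only three amino acids plus stop, with made-up weights; a real table would come from the target organism’s measured codon usage and cover all 20 amino acids. The synonymous codons themselves (AAA/AAG for lysine, TTT/TTC for phenylalanine, ATG for methionine, TAA/TAG/TGA for stop) follow the standard genetic code.

```go
package main

import (
	"fmt"
	"strings"
)

// codonWeights is an illustrative preference table with MADE-UP weights,
// keyed by one-letter amino acid code ('*' = stop). Not real organism data.
var codonWeights = map[rune]map[string]float64{
	'M': {"ATG": 1.0},
	'K': {"AAA": 0.7, "AAG": 0.3},
	'F': {"TTT": 0.4, "TTC": 0.6},
	'*': {"TAA": 0.6, "TAG": 0.1, "TGA": 0.3},
}

// optimize back-translates a protein by choosing, for each amino acid, the
// highest-weight codon (ties broken alphabetically so output is deterministic).
// Many tools instead sample codons in proportion to their weights.
func optimize(protein string) (string, error) {
	var dna strings.Builder
	for i, aa := range protein {
		options, ok := codonWeights[aa]
		if !ok {
			return "", fmt.Errorf("no codons for %q at position %d", aa, i)
		}
		best, bestW := "", -1.0
		for codon, w := range options {
			if w > bestW || (w == bestW && codon < best) {
				best, bestW = codon, w
			}
		}
		dna.WriteString(best)
	}
	return dna.String(), nil
}

func main() {
	dna, err := optimize("MKF*")
	if err != nil {
		panic(err)
	}
	fmt.Println(dna)
}
```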
[00:26:15.03] So essentially, my focus was just on making it so you could engineer a DNA string to do what you want, and to express faithfully, and you wouldn’t have problems when you send it to a DNA synthesis vendor, for them to go like, “Oh, it’s not the right sequence.” And so you could do it at scale, instead of like one at a time, via one of these many CAD-like tools; you could engineer thousands at a time, which is what a lot of scientists do. I’m not sure how they do it; I think they’re mostly also writing Python, which is not fun for them, because they’re writing a lot of this from scratch… But that’s like the method of production here, is you either use CAD software and you write one or two, or maybe you can get more out of it, or you write a script that generates thousands of primers for, say, PCR tests that you’re trying to develop, or thousands of variants of a plasmid, these little tokens I’ve been talking about, that will express something for an experiment. Or you have to do it one at a time.
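A pre-submission check along these lines could look like the sketch below. The limits (`maxHomopolymer`, `minGC`, `maxGC`) are hypothetical placeholders, since each vendor publishes its own rules; the point is that a script can flag problems across thousands of sequences before an order goes out. It assumes uppercase sequences.

```go
package main

import "fmt"

// Hypothetical limits; each synthesis vendor publishes its own rules.
const (
	maxHomopolymer = 8
	minGC          = 0.25
	maxGC          = 0.65
)

// synthesisProblems returns reasons a sequence might be rejected by a vendor,
// or nil if it passes these (hypothetical) checks.
func synthesisProblems(seq string) []string {
	if len(seq) == 0 {
		return []string{"empty sequence"}
	}
	var problems []string
	gc, run := 0, 1
	for i := 0; i < len(seq); i++ {
		if seq[i] == 'G' || seq[i] == 'C' {
			gc++
		}
		if i > 0 && seq[i] == seq[i-1] {
			run++
			// Report each over-long run once, at the position where it starts.
			if run == maxHomopolymer+1 {
				problems = append(problems, fmt.Sprintf("run of %c longer than %d starting at position %d", seq[i], maxHomopolymer, i-maxHomopolymer))
			}
		} else {
			run = 1
		}
	}
	frac := float64(gc) / float64(len(seq))
	if frac < minGC || frac > maxGC {
		problems = append(problems, fmt.Sprintf("GC content %.2f outside [%.2f, %.2f]", frac, minGC, maxGC))
	}
	return problems
}

func main() {
	for _, seq := range []string{"ATGCATGCAT", "AAAAAAAAAGC"} {
		fmt.Println(seq, synthesisProblems(seq))
	}
}
```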
And so my software is really the only software out there that’s fast and stable; it can do this reliably with high unit test coverage. There are some - I would call them predecessors - that are really impressive. And there are also really impressive open source GUI tools for doing this one at a time.
My project is like the first that makes it really possible to do this at a scale where you’re not looking at bad compute bills… I mean, this is the thing… Like, synthetic biology shouldn’t have high compute bills. It’s not like we’re looking at whole genomes all the time, unless you’re working on human-specific stuff, which not everyone in synthetic biology is. A lot of people are, obviously, because that’s where a lot of the money is, but the strings you’re working with are on the order of like 10,000 - kilobase pairs is what we call it, instead of kilobytes, which is [unintelligible 00:27:47.22] to like 4 million, I think is what E. coli can be, and like 10 million maybe for yeast… I forget. Some biologist out there, if they’re listening, is probably screaming at me for getting this number wrong. But it also varies between these strains. I’m running out of words, so I think I’ll let you guide the topic to something…
Yeah… Could I repeat that back to you and just see if –
Yeah, of course. Yeah, this is a biology lesson for everybody. I wish I was talking more about Go and my practices there… I should have done that too, but yeah…
We’ll do Go right after this, I promise.
So basically, someone has engineered this like plasmid that they want to be able to produce, right? …to do whatever X thing, make a new protein, or whatever.
Yup. They’ve literally just whiteboarded it. It’s a concept on like a napkin.
Okay. Yeah. And so they take your software - or this is one thing it can do at least, is run simulations to try to find actual DNA that we can produce… Like, synthesize in the real world. But we don’t have to touch the real world, we can just simulate it, and yours can be like, “Yeah, we can make this.”
Okay, that’s cool. Okay.
It’s really valuable, because for each one of those plasmids I’ve been talking about - the price is always going down for DNA synthesis, but right now, it’s not uncommon to hear of a $500 to $700 bill per plasmid. I have a customer I’m working with right now who makes DNA libraries, and every time they send out an order, it’s $20,000 to $40,000. So for them, it’s worth it to have a little extra confidence that instead of getting their grad student or whoever they – a lot of scientists don’t know how to hire programmers; we can get into that dynamic a little bit too. It’s a really fascinating one. But essentially, instead of having to worry about [unintelligible 00:29:24.19] and maybe blowing $30,000 out of his research budget, he just comes to me and says, “Hey, can you write a pipeline for this out of the software that you wrote?” And I go, “Yeah, I can do that.” And it’s cheaper than what it would cost him to hire his own people, and it’s just a nice deal all around, and he gets the feeling of, “Oh, this is secure. There’s testing behind this.”
Because a lot of scientists - there’s a lot of prayer. I mean, if you ask scientists what they do before they send out a sample to be sequenced, or go through [unintelligible 00:29:56.25] - there’s this false dichotomy between religion and science, but a lot of them, even if they aren’t religious, do a little prayer, like, “I hope, I hope this works…”
[00:30:10.13] Because scientists aren’t engineers; it’s a difference in practice. Scientists are trying to discover something new, and they’re doing it the startuppy way, like “Gotta do it as fast as possible, plant the flag, get the research, get the data, boom!” And engineers, we can be a little slower, we can be a little bit more pragmatic, we can make a little bit more stable sort of maintenance-conscious decisions. And I think that’s sort of what my role is with scientists, is I make these maintenance-conscious decisions where they can go, “Oh, that’s a good library. I can really trust it, I can really believe in it. I know that Tim isn’t trying to run as fast as possible towards this goal; he’s trying to build for the long-term.”
That is really interesting. We talked about what it is, what led to that, what it is doing now… Tell us about the future of bioengineering as you see it.
So this is a conversation I get into a lot. And actually, I have friends who - we are like polar opposites. It’s one of those things where we both agree where the future’s going, but we disagree on how the future’s gonna get there, and what the practice of it is going to look like. So for me, the practice of it in my mind is, you know, software engineers who have decent enough biological experience. All the theory behind it is easy to learn, hard to master. It’s a lot of little tiny variables, which is - computer scientists and engineers, we know; we have lots of tiny little variables all over our code. We’re aware of this. We’re gonna have to keep track of lots of little tiny details. But there’s this whole breadth of research, and every once in a while I get dragged into a field that’s adjacent, that I know nothing – like, I had to learn about immunology recently. Immunology is like its own thing, where it’s heavily related to like genetic engineering practice, especially with these new immunotherapies we’re doing for cancer, and stuff. But I had to go and learn about immunology as like a biological basis in like human medicine.
But the thing is that we’re gonna get to a point where hopefully there’s enough software engineers with biology experience that can talk to these people that have this experience in more pointed, niche parts of the field, like immunology, or plant physiology, or these different biological topics, where they can work together and someone will say, “Hey, I need to go probe this.” And they’ll talk to the engineer, and the engineer will get it enough that they can work with them. And that’s what I would like to see, is two scientists with like 20 software engineers, writing some awesome code, doing stuff.
And then my friend - he’s entirely the opposite. He thinks we should have a few software engineers writing - not GUIs, but kind of like that; you know, report generators maybe. And there’s like 20 scientists analyzing them. We have entirely different outlooks on how it’s going to be. So what I’m betting on is that in the future we’re going to have a lot of software engineers coming from languages like Go, and Rust, coming into biology and sort of speeding things up.
One of the things that’s fun is that every time I rewrite an algorithm from Python into Go, it’s like 25 times faster by default. We have the world’s fastest DNA synthesis-fixing function by several orders of magnitude, which is kind of nuts. Like, we weren’t aiming for that. That’s just kind of how it happened.
So there’s a lot of work to be done there, and I think the real hurdle for most software engineers is that the biology is intimidating. And I think the one thing if there’s a software engineer listening to this right now who’s interested in biology - don’t be so intimidated by the biology. Seek to learn it and be humble when you ask questions to scientists, because they’ve had a lot of physicists and software engineers over the years coming into the field and acting cocky because they see this code that they’ve been working with, and they’re like, “Oh, these guys don’t know how to code. They don’t know nothing.” It’s like, “No, they know something. It’s just they’re not software engineers.” And you’ve really got to listen to them. You’ve got to be humble; you can’t come in with this attitude that you’re going to solve all their problems, because you’re not.
My favorite is – do you guys know the three-body problem? …like, this classic problem of physics that’s been around forever…
…where essentially we don’t have the math to model three bodies orbiting each other, like all related to each other, in closed form; we just don’t have the math for it. And so that’s what the protein folding problem is. That’s why everyone gets excited when there’s a new – I mean, people here probably have heard of AlphaFold, maybe; maybe you haven’t. But every time AlphaFold comes up with something new, it gets really exciting, because they’re fighting the three-body problem with huge gobs of data and models… But I’ve had mathematicians and software engineers go [unintelligible 00:34:06.15] “Oh, that? That’s a simple heat model” and spend like 45 minutes trying to explain to five biologists why they’re wrong, and it’s easy. I’m like, “No, we all know it’s – dude, we took freshman physics, too. We know what the three-body problem is. Like, stop.”
[00:34:20.27] That’s the one piece of advice, if there’s a software engineer who wants to get into like genetic engineering or DNA synthesis or anything out there, it’s just - remember to be humble, because you’ll learn something new literally every day, and you’ll be like “Wow, this should be some basic stuff.” But that’s kind of how it goes; this broad layer of basic stuff, and then you get into the really fine details.
So you mentioned Go, and Rust, and the future there… Do you think Go has like a prominent future there?
Yes, I do.
Usability is the biggest thing there. And also just the DevOps tools are amazing. There’s nothing that makes my life easier than writing in Go. First off, almost every tool, every DevOps tool is written in Go now. Almost every single one. For my project I regularly use – Gitpod’s a real favorite of mine personally, but…
That’s a German thing.
Yeah, no, they’re great.
I really love them. Oh, by the way, anyone listening out there, if you haven’t been to Gitpod’s Discord, go. They’re super-friendly. Just show them your work, with whatever you’re doing. They’re super-excited, supportive and ask a lot of questions. They’re really, really nice.
But when I got started, Go is like test-first; all the things you need… Like example tests are actually a great thing, where Python or a lot of other languages – like, maybe there’s a library for it, but in Go it’s just standard. Whenever I write an example test, it runs every time I run my tests, and it’s an example that never does doc rot. Doc rot is a big problem with a lot of scientific software, because people will write these huge things, the documentation, then they’ll change the API like two versions down, and then they won’t change the documentation.
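For readers who haven't used them, this is roughly what a Go example test looks like. `ReverseComplement` is a hypothetical helper written for illustration, not taken from Poly. In a real package the `Example...` function would sit in a file ending in `_test.go`, where `go test` runs it and fails if what it prints differs from the `// Output:` comment; here it is wired to `main` only so the snippet runs on its own.

```go
package main

import "fmt"

// complement maps each DNA base to its Watson-Crick partner.
var complement = map[byte]byte{'A': 'T', 'T': 'A', 'G': 'C', 'C': 'G'}

// ReverseComplement returns the reverse complement of an uppercase DNA
// sequence. Bases outside A/C/G/T are mapped to 'N'.
func ReverseComplement(seq string) string {
	out := make([]byte, len(seq))
	for i := 0; i < len(seq); i++ {
		base, ok := complement[seq[i]]
		if !ok {
			base = 'N'
		}
		out[len(seq)-1-i] = base
	}
	return string(out)
}

// ExampleReverseComplement would normally live in a _test.go file. There,
// `go test` compares its printed output with the Output comment below, so
// this documentation example breaks loudly instead of silently rotting.
func ExampleReverseComplement() {
	fmt.Println(ReverseComplement("ATGC"))
	// Output: GCAT
}

func main() {
	ExampleReverseComplement()
}
```

Because `go doc` and pkg.go.dev also render example functions from test files next to the function they document, the same code serves as both test and documentation.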
And so I think the fact that Go has all these opinions, which if you’re like a – I guess you would call it like a gray beard or white beard software engineer… Like, maybe you have different opinions, but if you’re just starting out and you don’t really know where you’re going DevOps-wise, Go’s defaults are beautiful; they work. And yeah, you may struggle with generics, which Go has obviously been working on. I thought scientists would have a bit of a hard time going into typed languages… but they don’t have that much of a hard time. But when I was originally considering writing Poly, I was looking at what could be compiled to a binary, because I thought it was gonna be more of a command line tool. Eventually, I was like “Wait, this doesn’t make any sense as a command line tool.” But originally, I thought it was gonna be more of a command line tool… So I was like “What compiles to a binary, what’s easy to learn, what’s fast, and what has like a good DevOps ecosystem?” And it pretty much came down to like Rust was a little faster, but Go won on everything else. And that’s kind of what I needed.
I don’t know if you’ve ever done like string manipulation in Rust, but it’s hard. It’s not something you want to teach the people coming from Python. And maybe there’s a good reason for that, but I did not personally enjoy it. So when I was testing those two, it came down to Go winning on almost everything. And I think it’s also just a great step from what would be considered these scripting languages, where the syntax is easy enough, the concepts are still there. I mean, yeah, you can get into concurrency and all this other stuff, but coming from R and Python, you have to learn that anyways. You’re not learning multi-threaded stuff in what most scientists use for languages, which are – usually, most scientists are using Python, R, MATLAB and Julia, which - actually, Julia does have multi-threading, a lot of modern features. But Go just has a bigger breadth, and a bigger DevOps community, and a better community of people that use it, which actually makes it really easy to find odd functions that I need.
One of my contributors convinced me that we need to use [unintelligible 00:37:33.08] as the default, and so Go’s crypto library doesn’t have that yet, so some guy wrote a working implementation and we just used that. Lua didn’t have that, so my friend who wanted to implement it in Lua had to go and write BLAKE3 by hand. He’s like, “I don’t understand this. It’s all a bunch of math and little single-letter variables.” I was like, “So now do you realize why I make you write all your variables as whole words?” He’s like –
[00:37:59.16] That’s something that – some of my contributors are like “Why do you make us do whole words?” I’m like, “You’ll thank me in six months.” And Go has all these great tools for detecting like data race conditions… And that’s something I learned recently. I’m still learning various things… There’s a tool where you can find unnecessary conversions, and there are code coverage tools that are just native, and it’s all easy to throw into like a GitHub Action, or any other CI/CD thing you’re doing. And so for me, it’s this thing where there are all these wheels that are built in, and then this community of people that really care about how usable and shippable their code is… Which makes it just so much easier to integrate their stuff into mine.
So a lot of scientists, they heavily rely on Docker and Jupyter Notebooks. Like super, super, super heavily. And usually they ask, “What do you do to containerize your software?” I was like, “Well, Docker is also written in Go, so I just do what they do, which is ship binaries if it’s an application. [unintelligible 00:38:49.23] or CI/CD thing that puts out every architecture and operating system that it possibly can.” And I think the only way – like, if you find a system that can’t run that, you’re advanced enough that you can figure out how to package it yourself. Like, that’s it. If I’m doing – was it Fedora, Ubuntu, Arch Linux – like, you have enough.
And so I just think as a programming language, it’s just really powerful and very suitable for people that are trying to build something stable, that may not have had that experience, or – you know, people in biotech have this insecurity of feeling like they’re not real software engineers. That’s not necessarily true, it’s just sort of they’re working from this corpus of work that is, you know, as I’ve said earlier in the show, since the ’70s. There’s a lot of legacy there that we’re still dealing with. And part of this project is kind of jumping over the legacy and going to something faster and easier.
The stuff I’ve written has mostly been for DNA engineering, but next I want to get into metabolic pathway engineering, and protein engineering. That’s sort of the whole reason I’ve gotten into this, is I want to make – proteins, for those listening, you’ve probably heard this in bodybuilding terms, of “You need enough protein to build muscle mass.” But proteins, believe it or not, [unintelligible 00:40:07.29] They’re actually nanobots. Believe it or not, you are the gray goo. There’s all of these proteins that are coded in your DNA, that are expressed all the time, that do little functions, from moving DNA from one part of the cell to the other, to helping the cells split, that are super-specialized, and they even – like, have you ever seen like a bunch of magnets when you throw them on a table, and they kind of just [unintelligible 00:40:25.15] all together? That’s kind of how protein folding works, in a little bit of a way. They’re all like little molecular magnets folding together and doing this thing. So what I’d like to do is I’d like to make the nanobots. There are a lot of really interesting applications: for energy, for plant lifecycles, for almost anything. I mean, they’re nanobots; what can you do with these?
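To make "proteins that are coded in your DNA" concrete: a gene is read three bases (one codon) at a time, and each codon specifies one amino acid until a stop codon ends the protein. The sketch below does that translation in Go. The codon table is a deliberately incomplete, hypothetical excerpt of the standard genetic code (codons missing from it become `X`), and real translation also involves transcription into RNA and finding a start codon, both skipped here.

```go
package main

import "fmt"

// codonTable is a deliberately incomplete excerpt of the standard genetic
// code, using one-letter amino acid codes; '*' marks a stop codon.
var codonTable = map[string]byte{
	"ATG": 'M', "GCC": 'A', "GCT": 'A', "AAA": 'K', "AAG": 'K',
	"TGG": 'W', "TTT": 'F', "TAA": '*', "TAG": '*', "TGA": '*',
}

// translate reads seq three bases at a time from the first base and returns
// the protein, stopping at the first stop codon. Codons missing from the
// excerpt become 'X'; a trailing partial codon is ignored.
func translate(seq string) string {
	var protein []byte
	for i := 0; i+3 <= len(seq); i += 3 {
		aa, ok := codonTable[seq[i:i+3]]
		if !ok {
			aa = 'X'
		}
		if aa == '*' {
			break
		}
		protein = append(protein, aa)
	}
	return string(protein)
}

func main() {
	fmt.Println(translate("ATGGCCAAGTGGTAA")) // ATG GCC AAG TGG TAA -> MAKW
}
```

Because several codons map to the same amino acid, many different DNA sequences produce the same protein, which is the freedom protein and DNA engineering tools exploit.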
And so I’m really excited about that, but I’m also excited about the macro stuff; the stuff that we haven’t touched on. Traditionally, with genetic engineering, the moneymakers are first pharma, then ag tech, and then other. And I’m really, really, really interested in the other part. I think that’s the coolest part. Like, can we make plants as infrastructure? You may have seen the Glowing Plants project, which is –
Yeah, like the mushroom bricks that you may have seen. Why can’t we just have the mushrooms grow to house size? Is there some sort of fundamental physical limit there, or have we just not tried hard enough yet? I saw a great clip on Twitter where someone was talking to a group of kids, like “So are you working on –” I think the kid asked “Can we make elephants fly?” And they’re like, “I don’t really know.” He’s like, “What, so you’re just gonna give up?” And it’s like “Yeah, why don’t we try?” I’m a little more squeamish about working with whole organisms, like big – the kind of organisms that vegans would not want to work with. I get – oh my God, mouse work in biology is scarring. If anyone’s out there thinking about working in a biology lab, just don’t do… Like, I hate to say this, because mouse work’s important, and it’s a big part of science… Europe actually tried to ban mouse work at one point; there was talk about it, and like a bunch of Nobel Prize winners were like “I’m really sorry, guys, but we can’t – like, that would be a huge cost to society. We can’t do it yet.”
[00:42:08.20] But with plant work, for the most part, there’s a lot more to be done there with like working in morphology, and like expressing different genes… You can actually grow plants that produce medicine, which people have been working on… My personal favorite is people really want to build infrastructure, like they want to have – you may have seen on Instagram these trees, where people have guided roots across rivers for generations, and it becomes a tree bridge… And it’s like, wouldn’t it be cool if we could just plant a tree and it was just like “Yeah, I want it to go over this little thing and make a bridge.” That’d be great. That’d be super-cool. And that’s like the sort of future utopia that a lot of people might feel they’re looking towards, is like the “other” stuff.
But we also know that pharmaceuticals are super-important, and it’s also a very meaningful thing, because we’ve all had family in hospitals before, so it’s something that we all respect. And you know, if you’re squeamish like me, I try to avoid it, but also, it’s important. And the ag tech stuff - Monsanto ruined the whole – so the reason why we call ourselves synthetic biologists instead of genetic engineers… Monsanto - their lawyers and PR team just really messed up. I don’t blame the scientists at all for what’s going on, but their PR and legal team - they really messed up. And that’s why we call ourselves synthetic biologists now, for everyone at home… Because we didn’t want to have the same flack, because we weren’t doing the same stuff.
So yeah, I think that’s the future of biotech, maybe… There’s a lot more to it. But you’ll see like a lot more – oh, specialized cancer therapy. Specialized gene therapy is another thing that is probably going to be in the near future. We’re already starting to have it… There’s a lot of immuno – essentially, what we call CAR T-cell programming, or reprogramming, where we take a sample of a patient… There’s probably someone in immunology and CAR T-cell therapy listening to this that’s like “You’re so wrong, Tim. Don’t get this wrong.” Essentially, what we do is you take the patient’s cells, we take their T-cells, we reengineer them into CAR T-cells that target the cancer cells that we’ve also sequenced, like we do this in the lab, then we reintroduce these CAR T-cells to attack the cancerous cells, and not the healthy cells. Because one of the issues with cancer is essentially your immune system doesn’t know how to attack, how to specify, like, things in the body. I mean, there are some cancers where it figures it out, but the problem with cancer is that this is your own cells; they’re very close to what your regular healthy cells would be doing. So if your immune system would be like “Hey, that’s bad”, it’s kind of hard for it to pick up. And there are a couple of reasons, based on how the adaptive immune system works, why that has to be… But if we can take these people’s T-cells, these immune cells, out, and engineer them to attack cancer cells without attacking the – that’s the crux, that’s the dangerous part, is if you don’t engineer it right, it attacks the healthy cells, which is not good. But the idea being there are already therapies that don’t involve chemotherapy, I’m pretty sure. Again, I’m not a medical expert, this is not medical advice. Please talk to your oncologist about any cancer-related needs.
Actually, that’s one of the big things about being a biologist, is you find yourself in this position where you’re explaining biology and they’re like “Hey, should I do that, like health-wise?” Like, “No, no, no. Go talk to a doctor. They have liability insurance, they’ve been trained in patient-related matters, they know what drugs do… I’m just the guy that does the biology, that gives them the tools to do that.” That’s the difference here.
I’ve gone to the doctor, I’m like, “My stomach hurts. Is there something wrong with my microbiome?” and they’re like “No, goofy, you have acid reflux disease. Take this [unintelligible 00:45:14.01] and stop eating spicy food after eight.” I was like “Okay, cool.” And they were right. And that’s the thing - again, when you’re being a biologist, you just have to be humble. There’s always going to be another expert in some specific niche that you’re working in.
Will I be one of those experts?
You can absolutely be one of those experts. Actually, that’s been a joke that I’ve had for –
Can AI - like, can artificial intelligence…?
Oh, AI. I thought you said “I, personally.” I said YOU can be –
But I didn’t even dream of going that direction… [laughs]
No, you can do it. I’m not kidding. I had one coworker that’s like “How long would it take me to be a master in alignment software?” I was like “Master? A long time. Second-best, maybe third-best? It depends on how fast you read, but it could be this year.” It really depends on what your focus is and what you’re doing. We all have our unique specialties.
[00:46:01.00] But AI is this weird thing where - you know, machine learning is so obvious; everyone in tech knows it’s obviously overhyped, right? It’s like this thing where now we’re having all these general models, which are awesome… I was writing basic genetic programming with generative models for a weird little Pokémon a decade ago, but now we have full artistic renderings of – for example, my Twitter banner is a flying seaweed monster. I had to tweak it a bunch, and be like, “Flying Spaghetti Monster, but with seaweed, and some other stuff”, and eventually, I got it to do the thing. But we’re at that point where we have these generative models that - it’s been a long time since I’ve been in deep learning, so not only has this field changed a lot, but the terminology they use is changing a lot. Every five years they have to reinvent every term for linear algebra. I don’t know why they do this, but they do. It’s a weird thing.
I had a similar problem when I was first learning it like five years ago. I think it was “What’s a linear filter?” Isn’t this just a kernel? And then the PhD [unintelligible 00:46:53.03] “Yeah, we just do that… I’m sorry.” And that’s a similar thing. Now there’s all these different terms, I’m like, “Oh, is this basically a convolutional neural net?” They’re like “Yeah, but it’s different…” And I’m like, “Uhm, okay…”
So with machine learning and AI I think there’s a lot of opportunity to get around what have been these traditional problems in modeling, where we don’t have the math for it; we don’t have – like I said with the three-body problem, we don’t have the math. No one’s [unintelligible 00:47:16.13] model three bodies orbiting each other. It’s just, we haven’t done it yet. But David Baker, who is a professor at the University of Washington - his lab came up with this game called Foldit, where essentially they’d get all these molecular biology students in college, and maybe high school even, to play this game where – humans are actually pretty good at figuring out from like a basic string and some mechanical principles, like a 3D model of this protein laid flat, how to fold it like origami into a shape that’s what we expect at the end. So they’d have the flat sequence, and then the expected structure they’d get experimentally, and they’d have these students essentially be Mechanical Turks for fun, to train this model to figure out how proteins fold… Which is super-useful.
So in this case, there’s some real roadblocks, where machine learning is absolutely vital, and you do need machine learning for it. But there’s also a lot of people that do machine learning models, and then you’re like “Yeah, you didn’t need machine learning for that.” It really depends.
But the folding stuff and the molecular dynamics stuff, stuff that is molecular interactions - I see a lot of potential there. There’s a lot of work that can be done there, especially with – so there’s something called ligand binding, which is like a fancy term for “We take a small molecule, we’ll make it attach to some protein of interest, or something, disrupt some function, that helps cure some disease.” It’s all downstream… Like, if you tried to do like a stack trace of like how medicine works, instead of a bug failure, it’s pretty brutal to be able to do that. We’re still at the point in life where we don’t really have that. The closest we get is my software, which is - yeah, that’s not good enough, I’m gonna be honest. Go’s stack trace cannot heal you. Sorry, that’s not a good endorsement of the language, but we’ll get there someday.
But that sort of space, of figuring out what molecules interact with each other and how they fit, and how they puzzle-piece together - those are some really good use cases for machine learning, where you’re just [unintelligible 00:49:06.15] structures against each other, you’re doing - I call it subsampling; or people call it fingerprinting, where you decide, “Oh, this is the only place where we really care about seeing [unintelligible 00:49:15.10] because that’s the therapeutic effect.” And so there’s a lot of research there in that, and that’s super-valuable, and I don’t think there’s going to be many other routes to do it. Of course, there’s these classical deterministic ways, which are good shorthands, and they’re probably faster, and you don’t require like a big, beefy supercomputer to pull it off. But I think that’s the place where machine learning is really the – that’s the real strong suit there.
There are things with – people have figured out how to do DNA synthesis fixing, like I do, but like with machine learning, and it’s like… I don’t know, mine works like 99% of the time, or something, and there’s no machine learning involved… And I haven’t really looked at the other people’s models, mostly because I don’t even think they’re on Hugging Face. I mean, I’ve gotta give the Hugging Face guys super props. There’s also a machine learning library in Go that directly connects to Hugging Face. I wish I remembered the name of it right now, because I want to give their developer a shout-out. You should have him on the show, actually.
We’ll add them in the show notes, for sure.
[00:50:09.08] Oh, yeah. And so - yeah, machine learning has a place, but I feel like you still need the software engineers and biologists to understand each other’s fields; you can’t just kind of play it off on the hopes that this mushing of matrices will get the product you want, because you still have to figure out how to set up the mushing of matrices; you still have to have some concept of the data you’re putting in. And the data cleaning part, as many people listening in probably know, regardless of field, is a lot. It’s like 90% of the job [unintelligible 00:50:35.23] engineering to make sure you can shove it into a matrix, or shove it into some acceptable format for this to work. There’s so much of that still, and I think it’s still super-critical, and we haven’t figured out a way to generate data, or manage data in a way that works, to put it in these models correctly. I think we’re getting better at it, obviously, but we’re still working on it.
From what I understand, a lot of the innovation in machine learning in the last five years has been around that piece you were talking about; like, we’re making sure the data is usable.
So this is a really interesting thing about biotech that’s held true, is that biotech is ten years behind on any trend when it comes to software. So in five years I’ll believe it, but right now we’re still in the weeds, my man. We’re still there; we’re still trying our best. I mean, I’m working on a project, again, where we’re essentially making a big graph database server thing, just to have all these protein and DNA sequences and metabolic pathways in one place, so you can query them and sort of engineer around it… And I’m hoping that’s the innovation that’s coming in five years for us as a field, is that we have this same amount of data cleaning that the rest - I would call it the rest of the field that doesn’t rely on heavy domain expertise, someone who has a PhD in this stuff to kind of reason about what you’re working with - I hope we can catch up, because I feel like that’s just the lag.
One of the things that I would say for a lot of people that are listening is I have a – egotistically, I call this Tim’s Inverse Law of Software Quality. The more important the thing, the worse the code. And there’s some organizational psychology theory around this, but essentially, the code behind Instagram is infinitely better engineered than the code behind a NASA rocket ship. Or your Volkswagen car. It’s just how it is. And there’s a couple of reasons why, but essentially, if you can find some super-niche thing, like the operating system for a car, or… Maybe get the clearance to work with nuclear reactors; there’s a lot of work to be done there. Nuclear reactors still run on Fortran and COBOL, man; they need some help. Bank software still runs on COBOL. I have a friend whose parents came out of retirement because they couldn’t find any people to program bank software in COBOL.
And so there’s a lot to be done in writing sort of these niche libraries that are open source, where people can see it… Because the thing for me is that if I didn’t make Poly open source, I wouldn’t have nearly as much business. People wouldn’t be able to believe in me. I don’t have a PhD. The proof in this field is you either have to have a PhD or a corpus of work that is obviously there.
[00:54:04.14] And so I guess what I’d like to see in the future for Go developers is writing scientific-esque or heavy industry software that’s critical, in Go, or any other language that’s appropriate, and it’s well-documented in open source. And maybe with the nuclear reactor stuff - you probably couldn’t make that totally open source; the US government may have something to say about it. But I think there’s some real value there, because then you start having clients come to you and go, “Hey, can you adapt this for our needs, our specific car?” And that’s how I’ve made my career. People would be like “Hey, it’s really close to what we need. Can you adapt this for my needs?” And I think that’s a real path to making a living as an open source developer. Not as a startup, where you’re GitLab and you’re exiting for 7 billion… Or was it 11? 7 billion? More than GitHub, hilariously enough… But it’s a path to making a decent consulting business and a living as an open source – mostly open source developer.
Some clients, they’re like “Hey, I just need this, and I need to query for it.” I’m like, “Cool. I’ll write this pipeline, and I’ll keep most of it for myself, and make it open source, and you get this one query that I promise never to tell anyone what this query was, and it’s your data forever, or as long as you’re still a company, or something.” And that’s totally reasonable to a lot of companies, because a lot of biotech software - people are starting to learn that they don’t really have a business owning software. People think that owning software is like owning land, and that it always appreciates… But no, you have to maintain it. Like, you can’t just clean it up at the end and call it a day. It’s like a constant – it’s kind of like surfing, it looks easy. But if you’ve ever surfed before… Like, I’ve just started, because I live in California, and it’s what everyone does here, apparently… Like, you think, “Oh, the waves are gentle.” Like, you get your – I’m trying not to swear on this podcast… I grew up in New England, so that’s all we do, is just speak in swears… But you just get your butt kicked by the ocean by standing still. And that’s kind of what it’s like to own software; the waves keep coming in, and you keep getting your butt kicked, and you’re just hoping that you can figure out how to not have your butt continually kicked by these changing tides.
No, that makes sense. So we’ll do one last quick question before we do the unpopular opinion… But this is all open source, and you said that we need more software engineers in biotech. How can the community get involved?
That’s an amazing question. So there’s a bunch of issues on Poly right now, but what I truly need from engineers or software engineers right now is like - I don’t know how to deploy large, scalable things. I write really efficient DNA manipulation algorithms. That’s my jam. That’s what I’m really good at. But now I’m getting this data problem where we’re doing this system where we have at least five terabytes of legacy data that I’ve figured out how to write a really fast parser for, but now we have to store it in a database…
I’m using SurrealDB right now; I contributed to their Go client… Like a minor one, but still one. And it’s been so far a great experience with them, and I think they have like a real product going on there, that’s really nice, and a proper graph database… So if there’s someone out there that knows how to deploy things at scale, that are reliable as a DevOps [unintelligible 00:57:00.17] that’d be great.
The other thing is, I’m actually writing tutorials for Poly using Gitpod and an awesome plugin that has like 10 stars on GitHub, where essentially it lets you just put in the configs, like breakpoints for code, for the debugger… So the whole tutorial idea is that you open up this Gitpod instance and it brings you to the first tutorial, and it’s just the debugger there, and you hit Run, and you just go through each checkpoint or breakpoint and go, “Oh, this is what it looks like here. This is what it looks like here”, and just keep going through that. Because I really love Go’s – like the Godoc examples, and that you can run them… But since a lot of this stuff is parsers, and parser-based, you can’t really get everything you want out of that… So that’s why I kind of made the tutorials for that. But I’ve only made the first one. There’s like five others that I’d like to have, that I’ve put in like little “Oh, please, someone help me. Write it here.”
So if you wanna learn biology, like I did with this wet lab, where I was like “Hey, I want to learn biology”, and they’re like “We’ll teach you, but you’ve got to make the class…” Like, if you want to do a similar thing, but on the internet, that would be super-helpful.
[00:58:02.10] And if anyone’s listening and you’re into like learning how to do the wet lab biology yourself, there are plenty of garage labs all over the world, except for Germany. I’m so sorry, Natalie; Germany has a law against this. I wish I was kidding… But the rest of the world has these. I know, it’s super-weird, super-random, but every time I’ve talked to someone in the community that’s German, they’re like “I can’t even [unintelligible 00:58:16.26] my house. This is ridiculous.”
You can try… Just don’t tell anyone, apparently.
But there’s plenty of these garages; they’re called community labs, or biohacker spaces. They’re all over the world, and they’re used to software engineers just showing up and being like “Hey, what’s all this biology stuff about?” And they’ll take you in. They’ll be nice. They’ll show you the ropes. But if you want to get involved in the software, tutorials and deployment are like the big things that I really truly need help with.
And it’s a good time to remind everyone that, at least today, when we’re recording the episode, it’s still October, so there’s still time for Hacktoberfest.
You also get something for that contribution.
I added a tag in there, you’re right. Do you get a T-shirt? What do you get for that? I’ve never gotten the swag.
I don’t know what this year’s swag is.
I have a disappointingly small amount of tech swag… Like, I’m gonna be honest, I don’t have enough. And every time I get like a Google hoodie or something, I just give it to my dad. He thinks it’s hilarious.
So… Time for Unpopular Opinions!
I like the harmony there… Okay, unpopular opinion. This is probably not unpopular to people listening, but maybe it is… But open source has always been sustainable, and I see no business difference between that and closed source software, personally.
Now’s the time to drop that mic. [laughs]
Really? Okay… Yeah, so that’s my hot take: I don’t see a difference between them business-wise. And I know a lot of people do, and this is weird… There’s something weird about open source in people’s heads. When they hear I’m an open source developer, they’re like “Oh, you’re a fan of Richard Stallman, and you don’t have proper hygiene etc. You don’t believe in money.” And it’s like, maybe two of those things are true, or maybe one. But the point is that there’s this weird little stereotype that open source can’t make money. Most people, and I think most engineers, especially those coming from my realm of academia and biotech software, don’t understand that there has to be a sales team; there has to be a person out there selling the software, getting the customers… And as a purveyor of the service, you are the expert. They’re not just paying for your code, they’re paying for your expertise that’s embodied in the code, and in your ability to deploy it for them in meaningful and useful ways.
So for me, if you look at MongoDB, obviously, GitLab, and… What’s the other one that I’ve already forgotten? Red Hat, obviously. They sell services; they’re the experts in these things. They help people deploy it in a way that’s scalable, that probably saves them money compared to doing it themselves, and they keep it mostly open source or open core. The point is that they have sales teams; they have people going out, finding customers, and doing all this other stuff. But I meet a lot of engineers who believe “If you build it, they will come.” That might work for consumer-level stuff, but that’s not what biotech is; biotech is very business-to-business. I mean, there are some consumer biotech end products, and there are some consumers, like students, or people that buy the home kits, or things like that… But it’s a very business-to-business thing. People ask me about my code being open source, “Aren’t you afraid someone’s gonna copy your code?” And it’s like, “No. I’m the guy who wrote it.” Sure, you could use my code, but why would you pay someone twice as much to do half as much with it as I would, when you can just hire me?
I think that’s something a lot of people don’t recognize about open source: it’s very viable, and it has almost no effect on a company’s actual business model. That’s what’s always been surprising to me. Unless your business model is to not develop something novel… Like a lot of SaaS’es, obviously, different projects, just [unintelligible 01:02:08.19] open source packages put together in some sort of microservice framework that blah-blah-blah… And sometimes novel stuff comes out of that, and sometimes it’s just a re-skinning of other people’s open source stuff, but with a fancier GUI, or something…
But if you’re writing something truly novel, that no one else has done before, and you’re the expert, you really have no reason to fear anyone copying you. You do have to market yourself, though; that’s the big thing. You can’t be the quiet guy in the corner. You can’t just say “Oh, I wrote this thing”, and then let some super marketing guy come in and say “Ah, this is my thing now. I made it” etc. You can’t let them do that; you have to have some hold there. But I don’t see much of a difference between a company whose product is open source and one whose product isn’t; it’s still the same thing for the most part. And you do get one benefit as an open source company: people can believe that your software is good, because they can literally see it. The proof is in the pudding there. They can look at it.
Whereas with a lot of biotech software, they’re like “Just believe me that it’s good. Please, give me $100,000 to start, before we even show you the code. Just believe me.” So it’s a big difference there. And then there’s the obvious code quality benefit, and you have people submitting pull requests and little bug fixes, little documentation fixes… It’s not like you have a real engineering force behind that, but it does add up. It also gives you some marketing power. People start going “Hey, have you heard about…” Word of mouth can happen there a little bit. I show up to meetups here in the Bay Area, and people are like, “Oh, yeah, you’re the guy who wrote Polymerase.” I was like, “Nice to meet you.”
So there is some real value to it, especially if you’re not the traditional purveyor of truth, if you’re not the PhD who’s gone to Harvard, one of the fancy grad students… It’s a great way to show that you’re the best, and that you’ve got the right stuff.
So I feel like there are gonna be more companies, hopefully, like GitLab, Red Hat, MongoDB… And I’m hoping companies like Gitpod and Nextflow get to this IPO stage too, where people can see the engineer, they can see the service, they can see the value in it, and they don’t care that it’s open source, because they know it would cost them 60k to have a dev do it internally, or 5k to 10k to hire the guys who wrote it. And I think that’s just sort of… I hate to use VC terminology; I’ve been spoiled by the city… That’s “the moat”. You are the moat. You’re the creator. You’re the person who made this. You are the artist who brought it to life, and there’s some real value behind that.
I agree with you very much.
Really? Okay. So it’s not controversial, and I’ve ruined the segment.
I think it might be controversial…
Yeah. So there will be a vote… As with every unpopular opinion, we will put this on our Twitter and see how many people agree or disagree with you. But to summarize, your unpopular opinion is that open source has always been sustainable, and… Yeah, we’ll see how that goes. I have to say, this must be the episode with the most show notes we’ve ever had, at least for me…
Oh God, I’m so sorry… [laughter]
No, that’s amazing. You taught us so much, and I want to say thank you very much for that.
I wish I had a whiteboard. I’m gonna have to go and make a new PowerPoint presentation…
I learned one million things today.
Okay. Alright. If anyone is out there listening to this, or you, Natalie, or Ian wanna ask more questions about biology, I’m always available. I may get tired of it eventually…
And your Twitter is mentioned here, for sure.
Oh, yeah. This is the Twitter. If you jump in the Discord, there’s gonna be hopefully some nerd that’ll be like “I know that bit of biology.” I love my Discord for that. Someone can be like “Yeah, I can explain that.” I’m like, “Great. I had no idea.”
Perfect. We’ll add the Discord to the show notes as well. And Ian, thank you so much for joining, and co-learning here, live. Tim, thank you for teaching us so many things. This was really fascinating.
Thank you for having me. I hope I did alright…
It was amazing.
Thanks, everyone who joined.
Thank you very much. Talk to you later. Bye!
Our transcripts are open source on GitHub. Improvements are welcome. 💚