Gerhard’s transition to senior engineer started 10 years ago, when he embraced the Vim mindset and the functional core, imperative shell pattern, and was inspired to seek simplicity in his code & infrastructure. Most of it can be traced back to one person: Gary Bernhardt, the creator of Execute Program, Destroy All Software and the now-famous Wat talk.
Few stick around long enough to understand the long-term impact of their decisions on production systems. Even fewer are able to talk about them as well as Gary does.
Matched from the episode's transcript 👇
Gary Bernhardt: So Execute Program - very quick summary of what it is, just because some of this is going to be relevant… So it’s an interactive platform for learning programming languages and other tools. The lessons mix text with lots of interactive code examples, which is going to be important, because that code has to run somewhere. So it’s a very unusual sort of infrastructure requirement. It’s been a commercial product for three years, and the code started maybe five years ago in its early forms. A maximum of four people have ever worked on it, so this is a small product, although most products in the world are small, even though mostly we hear about the big ones, which is sort of a distortion in the way that we talk about things. And it’s a bootstrapped company. So it makes real money, but it’s small. It’s not a giant unicorn or whatever. So that’s the product we’re talking about.
Here’s the architecture. The primary database is Postgres. I love Postgres. I think it’s great. It can do almost anything you need any database to do, ever, unless you’re at truly huge scale. The backend servers are at Heroku. It’s a monolithic backend. One repo, one server process, that’s it. It has some workers, with a queue, and the workers auto-scale as needed to accommodate load, just like the web processes do. And the workers in the queue are used for things like transactional emails, reminder emails, interacting with third-party APIs where we just want to sort of shield ourselves from those APIs if they’re flaky, or slow, or whatever. We receive one type of incoming webhook, and that’s from Stripe. And the reason that exists is if we create a subscription and the underlying credit card later expires or something, then we need to know that, so we can remove that person’s access, because they’re no longer paying. So Stripe hits us with a webhook for that.
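The episode doesn’t include any code, but the Stripe flow Gary describes maps onto a small, well-known pattern. Here is a minimal sketch of what that single incoming webhook handler might look like, assuming an Express app and the official `stripe` Node library; the route, the event handled, and the `revokeAccess` helper are illustrative, not Execute Program’s actual code.

```typescript
// Sketch of the one incoming webhook described above: Stripe telling us a
// subscription has lapsed so we can remove that person's access. Assumes an
// Express app and the official `stripe` Node library; `revokeAccess` is a
// hypothetical application helper, not anything shown in the episode.
import express from "express";
import Stripe from "stripe";

const stripe = new Stripe(process.env.STRIPE_SECRET_KEY!);
const app = express();

// Stripe signs every webhook; verifying the signature needs the raw body.
app.post(
  "/webhooks/stripe",
  express.raw({ type: "application/json" }),
  (req, res) => {
    let event: Stripe.Event;
    try {
      event = stripe.webhooks.constructEvent(
        req.body,
        req.headers["stripe-signature"] as string,
        process.env.STRIPE_WEBHOOK_SECRET!
      );
    } catch {
      res.status(400).send("invalid signature");
      return;
    }

    // A deleted subscription means the customer is no longer paying,
    // so their access should be removed.
    if (event.type === "customer.subscription.deleted") {
      const sub = event.data.object as Stripe.Subscription;
      revokeAccess(sub.customer as string);
    }

    res.sendStatus(200);
  }
);

// Hypothetical helper: look up the user by Stripe customer ID and
// disable their access, e.g. with an UPDATE against Postgres.
function revokeAccess(stripeCustomerId: string): void {
  console.log(`revoking access for ${stripeCustomerId}`);
}

app.listen(3000);
```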
And then we get to the weird part, which is how do we execute the code that the user is putting into these exercises? So we have a fleet of executor VMs that exist only for this purpose. They scale up and down as needed, to handle whatever user load we have, because you know, if there’s a peak, it could be a lot of these VMs, and they’re expensive. And it’s a very difficult process, because they have to be security-hardened, because they’re running completely arbitrary user code, ultimately. And if we don’t harden them (you know, firewalls and all that stuff, and sandboxes), then people are going to send spam, or mine Bitcoin, or do all kinds of nefarious things.
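Gary doesn’t go into the hardening details here, so the following is only a rough illustration of the kind of containment he’s alluding to: running untrusted code inside a locked-down Docker container, with no network, capped resources, and a hard timeout. The Docker flags are real options, but this is an assumed setup, not Execute Program’s actual sandbox.

```typescript
// Illustrative only: one way to contain arbitrary user code, in the spirit
// of the hardening described above. Layers standard Docker isolation:
// no network (so no spam or Bitcoin mining), capped CPU and memory, a
// read-only filesystem, a process limit, and a hard timeout.
import { execFile } from "node:child_process";

function runUntrusted(code: string): Promise<string> {
  return new Promise((resolve, reject) => {
    execFile(
      "docker",
      [
        "run", "--rm",
        "--network=none",   // no outbound traffic at all
        "--memory=128m",    // cap memory
        "--cpus=0.5",       // cap CPU
        "--read-only",      // no writes to the container filesystem
        "--pids-limit=64",  // blunt fork bombs
        "node:20-alpine",
        "node", "-e", code, // the arbitrary user code
      ],
      { timeout: 5_000 },   // kill anything that runs too long
      (err, stdout, stderr) => {
        if (err) reject(new Error(stderr || err.message));
        else resolve(stdout);
      }
    );
  });
}

// Usage:
runUntrusted("console.log(1 + 1)").then(console.log); // "2"
```

Spawning a fresh container per request like this would be too slow for interactive exercises, which is presumably part of why a pre-warmed fleet of VMs that scales with load makes sense here.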
They also have the wrinkle that as the code is executing, they’re putting tracing information into the queue, which ultimately gets aggregated into the database, so that we can debug things when things go wrong… Because it’s ultimately a distributed system executing arbitrary code; it’s quite a complex problem. And this, of course, is the most difficult part of the architecture.
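The episode describes that tracing flow but not its implementation, so the shape below is assumed for illustration: executors push trace events onto the queue, and a worker aggregates them into Postgres so one execution’s trail can be reconstructed later. The `TraceEvent` shape, the queue interface, and the `execution_traces` table are all hypothetical.

```typescript
// Sketch of the tracing flow described above: executors push trace events
// onto the queue, and a worker aggregates them into Postgres for debugging.
// The event shape, queue interface, and `execution_traces` table are all
// assumptions; the episode describes the flow, not the implementation.
import { Pool } from "pg";

interface TraceEvent {
  executionId: string; // correlates every event from one code run
  vmId: string;        // which executor VM produced the event
  recordedAt: string;  // ISO timestamp on the executor
  message: string;
}

const db = new Pool({ connectionString: process.env.DATABASE_URL });

// Executor side: emit a trace event as the user's code runs. `queue`
// stands in for whatever queue the workers already consume from.
function emitTrace(queue: { push(e: TraceEvent): void }, event: TraceEvent) {
  queue.push(event);
}

// Worker side: drain events into Postgres, so one execution's whole trail
// can be reconstructed later with a single query on execution_id.
async function aggregateTraces(events: TraceEvent[]): Promise<void> {
  for (const e of events) {
    await db.query(
      `INSERT INTO execution_traces (execution_id, vm_id, recorded_at, message)
       VALUES ($1, $2, $3, $4)`,
      [e.executionId, e.vmId, e.recordedAt, e.message]
    );
  }
}
```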
So it’s Postgres, a single backend, workers with a queue, Stripe WebHooks coming in, and executor VMs. That’s basically the architecture… Which I think is like – this is pretty normal, I would say. Well, the executor VMs are weird, because it’s a specific property of our problem space. But I think this design is pretty normal and not particularly complex.