Ship It! – Episode #75
How vex.dev runs on AWS, Fly.io & GCP
with Jason Carter, founder of Vex
Few genuinely need a multi-cloud setup. There is plenty of advice out there, and it mostly boils down to "don't do it - you will be worse off." Vex.dev is a startup that provides APIs for video and audio streaming. The hard part is real-time combined with massive scale - think hundreds of thousands of concurrent connections. They achieve this by using a combination of Fly.io, AWS and GCP. Jason Carter, founder of Vex Communications, joins us today to talk about the multi-cloud setup that vex.dev runs on.
FireHydrant – The reliability platform for every developer. Incidents impact everyone, not just SREs. FireHydrant gives teams the tools to maintain service catalogs, respond to incidents, communicate through status pages, and learn with retrospectives. Small teams up to 10 people can get started for free with all FireHydrant features included. No credit card required to sign up. Learn more at firehydrant.com/
Sourcegraph – Transform your code into a queryable database to create customizable visual dashboards in seconds. Sourcegraph recently launched Code Insights — now you can track what really matters to you and your team in your codebase. See how other teams are using this awesome feature at about.sourcegraph.com/code-insights
Sentry – Working code means happy customers. That’s exactly why teams choose Sentry. From error tracking to performance monitoring, Sentry helps teams see what actually matters, resolve problems quicker, and learn continuously about their applications - from the frontend to the backend. Use the code CHANGELOG and get the team plan free for three months.
Notes & Links
- vex.dev - Stream your meeting, or your moon landing 💥 Demo
- HLS - HTTP Live Streaming
- Typescript SDK for working with the Vex API
- 🎬 When to Choose Rust for Your Cloud Native App - Tim McNamara - #swisscnd 2022
- Postgres WASM
- 4 [05:19] What about resilience?
- 5 [11:27] Now THAT's a lot of CPUs
- 6 [17:41] HLS and WebRTC
- 7 [19:08] Bandwidth for 500K CPUs
- 8 [22:56] Not going with a cloud provider
- 11 [32:40] Wow, who wrote that?
- 12 [36:13] Using Vex or building on it?
- 13 [39:35] What do you use to run vex.dev
- 14 [41:14] Go provides performance
- 15 [46:55] Who is Jason?
- 19 [1:02:15] The next 6 months
Click here to listen along while you enjoy the transcript. 🎧
The majority of companies use a single cloud provider, and it’s usually one of the big three - it’s AWS, GCP, or Azure. Few genuinely need multi-cloud. Jason does, for his startup. And the more I learned about it, the more intrigued I became, like “Wow, really? Seriously?” Jason, welcome to Ship It.
Thanks, Gerhard. It’s great to be here.
So that’s what I’m really curious about, why do you need multi-cloud? What is the story behind that?
Yeah. So I’ll start by talking a little bit about Vex, and what it is, and how our unique requirements make it not only advantageous to be multi-cloud, but almost a requirement. So Vex provides APIs for video and audio streaming. So if you’re a developer looking to build a video call into your app, or a remote podcasting service, you probably need to use something called WebRTC, which is a set of standards for recording and transmitting video and audio peer-to-peer. And it’s very tricky to scale WebRTC to larger and larger audience sizes. Very few platforms are able to do massive scale, hundreds of thousands of listeners.
Most of the time, if you go to something like Twitch or YouTube, you’re using HLS to stream at scale that way, but you lose the immediacy of it; there’s a ton of latency involved. And so for us, trying to build something that scaled really well without customers meant that we had to do a lot of scalability testing, and kind of see, “Hey, what’s the best way to make this scale well? How can we make it super-reliable?” And at the beginning, almost out of necessity, in order to do the sorts of numbers of scale that we wanted to, we had to split our infrastructure across a couple of different cloud providers. And it turned out that that was actually very advantageous as well, because if you’re doing a super-large meeting on our platform, you want redundancy, you want reliability. And so having the ability to kind of run anywhere, whether that’s on Google, where we ran the majority of our services, because we had a lot of free credits there, or AWS, or put certain things on Fly, certain things on local machines… We’ve just really tried to be kind of flexible, so that we can provide the most stable and reliable service we can.
Yeah. So you mentioned one thing about reliability that I wasn’t expecting you to say. So first of all, it’s scale - with certain cloud providers you cannot achieve certain scale as quickly as you may need it, and I imagine that is a limiting factor. But what about the resiliency? What happens if, for example, Google was to be unavailable? I don’t think that happens often, but if it did, do you load-balance between cloud providers for that resiliency? How does that work? That’s very interesting.
So when we were first talking to some initial folks that are building - you know, in the context of large virtual events, they would have hundreds of thousands of people joining live on a WebRTC connection. And they really wanted that reliability of “Hey, if a cloud provider goes down, or a region goes down - that’s happened to us, and we’re totally hosed.” And you can’t have that when a lot of your business is on the line for a meeting or an event of that size.
[06:20] I see.
So we don’t yet have the ability to kind of fail over gracefully if something’s happening, like Google goes down - which again, very rare - but we do naively load-balance between the two when we’ve got our system deployed in kind of a multi-cloud mode. A lot of the times we just run on Google, because as you can imagine, there are various difficulties with networking to kind of connect those two together… But we’ve kind of imagined being able to allow customers to choose where they want things deployed. Some customers that we’ve talked to - they have very specific requirements for their clients. For example, a client might not be able to even stream their data through particular regions. And so we’re not yet able to provide kind of a choose your own adventure of where you want everything to go, but that’s kind of been a goal for us, is to have that capability.
So that sounds really challenging, because when I’m thinking of building a product, just like starting out, I’m thinking “Make it work. Make it nice, make it good, and then make it fast.” But you seem to be starting from the “Make it fast” angle, right? Because you need that reliability, you need that scale. That’s almost one of the unique value propositions that you bring. That must be really challenging, to start from there… And from a scalability perspective – how do you even simulate that many connections? You said hundreds of thousands… That is a lot of data. How does that work?
Yeah. And in fact, it was almost trickier to build that testing framework than to kind of get the initial prototype up. And it is kind of backwards. Yeah, I think the best way to describe it is that for us, scale is the core feature that we wanted to start with. A lot of other providers in the space have participant limits - you can have a certain number of people in a live meeting, but then you have to redirect everyone else to a YouTube stream, or say, “Sorry, watch the recording afterwards.” And so we really wanted to have no limits on that; you can have as many participants as you want… You’d have to limit a little bit who can send audio and video, otherwise it would be just an absolutely crazy meeting… But we decided that that was super-important to us, and some of the initial people we talked to were really interested in it.
So to test something like that, you have to do two things. You have to scale up your infrastructure to be able to handle that amount of traffic… And there’s really sort of two things that you’re watching out for with video and audio streaming. It’s CPU load of just transcoding, or in a lot of cases forwarding the media from one user through some servers to others… And it’s bandwidth. So in order to figure out what is the cost that it would take us, per user, to run a large meeting, we had to scale up and test it. No one’s going to trust you to run a meeting of that size if you’re not able to prove that you can do it well. And so we tried a lot of different things over the last six months or so. We started out with headless browsers, so deploying hundreds or thousands of Google Chrome instances, scripting them to connect to our application, and tell some of them to turn on their cameras and microphones, tell others to just listen… We found that we were able to kind of deploy that on Kubernetes, and orchestrate lots of them… But there’s a huge resource cost just to the overhead of a browser.
[09:59] So the next thing that we did is we started going lower level. What if we can just connect WebRTC’s process called signaling - that is essentially how you get two peers in a call to know about each other, and establish a connection. So we wrote a much lighter-weight script in Python that could handle the signaling, and then optionally send audio and video from a file. And that got us to around 50,000. But again, the CPU cost was very high. And when you’re bootstrapping a startup, you can’t really afford to spin up lots of servers. So we were able to get a lot of credits in Google Cloud and AWS. There’s great programs out there for startups to get free credit, and we were able to then kind of rewrite our script in Go, to again reduce the CPU cost. And eventually, the largest test that we’ve done to date is 500,000 users receiving video and audio from a couple of presenters.
These are, by the way, simultaneous connections, and that’s really important. It’s not 500,000 requests per second spread over I don’t know how many seconds. These are genuinely constant, right? The connection remains open. So these are long-running - and you’ll tell me what long-running means; minutes, maybe? The duration of the test. And they have to be simultaneous.
Yeah, that’s correct.
So how many CPUs are we talking about? How much bandwidth are we talking about? Can you give us some numbers?
Yeah, so to run tests of that size - again, 500,000 simultaneous connections - we had two sets of things that we needed to scale up: both the load test system and the actual media servers. And so we ended up running somewhere in the neighborhood of 15,000 CPUs for the load test users, and about 16,000 CPUs for the system that was actually forwarding the media - the streaming platform itself.
So that’s more than 30,000 CPUs, right?
Wow. 30,000 CPUs. Now, that is a very expensive load test.
Yes. And so you can imagine that we’d do them very quickly, right? We were only running that for 30 minutes or so. We sort of staggered the joins, so that we’re not sending 500,000 connection requests in a single second - maybe over the course of several minutes. But at the end, all those connections are established. There are really two connections involved: the WebSocket signaling connection, which goes to our application servers - basically, “Hey, someone joined. You should subscribe to them. Here’s how to subscribe to them”, that kind of thing. And then the WebRTC connection to the media server.
That sounds like an awful lot of capacity. Did you have to give some notice to Google about, “Hey, we need like 30,000 CPUs?” How did that work?
Yeah. So Google has – like other cloud providers, they all have a quota system and the ability to request more quota. And so we sort of had to step it up over time, because we’ve been working on it for quite a while; we didn’t yet have a Google rep to talk to. And so we started out by taking a Google project and requesting the max CPUs of a particular CPU architecture in a particular region. After we got that, we’d run that for a while, and kind of show that we were actually using that capacity. And then we’d ask, “Hey, we want the same amount of CPUs, but in a different region.” And so over time, we were able to scale out across East, West and Central, in several different regions and availability zones, and many different CPU types as well; we took advantage of the CPU families Google offers - E2, N2, N2D… So we were just saying “Give us 1,500 of those, 1,500 of those, 1,500 of those, across all the regions.”
[14:05] And so what was kind of interesting is that even though it was all the same Google account, we would get different results in different Google projects. So we might have our load testing project, and we could easily get more CPUs there. But then we’d have the other project and it’d be much more difficult. So it was a lot of trial and error, a lot of ramping up quota requests until they got approved, and then pushing our luck… And we kind of maxed out. We believe that we can scale the system even larger, but we sort of hit a point of, “Well, Google’s not going to give us any more CPUs, and 500,000 is probably good enough to demonstrate what we’re hoping to demonstrate.”
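Because quotas are granted per region and per machine family, the total capacity you can assemble is just the sum over all those grants. A toy sketch of that bookkeeping - the region list and per-family counts are illustrative, mirroring the "1,500 of each" approach described above:

```go
package main

import "fmt"

// totalCPUs sums quota grants keyed by region, then machine family.
func totalCPUs(quotas map[string]map[string]int) int {
	sum := 0
	for _, families := range quotas {
		for _, cpus := range families {
			sum += cpus
		}
	}
	return sum
}

func main() {
	quotas := map[string]map[string]int{}
	// Hypothetical grants: 1,500 CPUs per family, per region.
	for _, region := range []string{"us-east1", "us-west1", "us-central1"} {
		quotas[region] = map[string]int{"e2": 1500, "n2": 1500, "n2d": 1500}
	}
	fmt.Println(totalCPUs(quotas)) // 3 regions × 3 families × 1,500 = 13500
}
```

Scaling the region list and per-family grants up is how a few modest quota approvals compound into the ~30,000-CPU footprint mentioned here.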
Can you imagine a conference or an event that requires 500,000 simultaneous connections? Which event is big enough? I’m thinking KubeCon, and KubeCon is tens of thousands, maybe up to 30,000… What event requires 500,000 simultaneous connections? NFL? NBA? That’s the only one I can think of.
Yeah. So for many large events - you can think of a trade show. We just had Dreamforce in San Francisco recently; they’re broadcasting out to lots of people. The size could be 10,000 people, 100,000 people, and anywhere in between. And in that case, you’re mostly just trying to present that information without any interaction to those types of people.
Where you might want real live connections is if you have kind of a much more interactive experience, where “Hey, any one of those 500,000 people watching Dreamforce could raise their hand and say, “Hey, I’d like to ask my question on stage.” So there are some folks where they really want that sense of interaction. And so being able to kind of convert that just sort of read-only connection to an active connection that can send and receive audio on the fly, and video, is kind of the sorts of people that would be really excited about what we’re doing. But for a lot of cases, you really don’t need that.
But having that flexibility to – you don’t have to run two systems, one for people who might interact and say, “Well, you’re on the stream that’s lagging behind, and you want to ask a question; now we’ve got to convert you to this other connection and figure that out.” There’s some play… Especially, I think, in the virtual events category, hybrid events, anything where every little second of latency matters, you want to have high interactivity.
Another interesting example is online real-time auctions. They have a lot of people connecting to a call, or just a live auction, and so you need to be able to bid very quickly. Any kind of delay is going to make that a problem. Or sports betting. There’s lots of kind of interesting applications if you can transmit video and audio really quickly, in a very high-performance manner.
The one thing which I didn’t realize is that we’re not talking about one event, we’re talking about many simultaneous events. When you add them up, you could have hundreds of thousands of simultaneous participants, because they’re all running on the same platform, on the Vex platform, but they’re different events, happening at different times. And there are spikes, and then dips, and you have to adjust to the traffic, and all of that. Okay, okay.
So you mentioned HLS. I didn’t know about HLS and WebRTC. But you did mention that WebRTC is really important for the low latency. How low are we talking here? What is the difference in latency between HLS and WebRTC?
[17:59] Yeah, so the way that HLS works is - to simplify it - as you’re producing video and audio, you’re essentially writing chunks of files. And then someone who’s actually consuming it is grabbing each chunk of the stream and playing it back. So there are a lot of steps involved to get that video up to - usually - a CDN, so that it can be downloaded and streamed over time. So with HLS - I think they do have lower-latency versions now, but you’re generally looking at the 5- to 20-second range. With WebRTC, you can get down to under 200 milliseconds, even under 100 milliseconds. And that’s because it’s much more of a direct path. You have your own connection, you’re receiving your own RTP packets of the video and audio in real time, directly to you, and you’re decoding them there, instead of them going through this process of going to a CDN and being consumed that way, in a typical implementation.
So the CPU is one issue - obviously you want to have lots of CPUs and fast CPUs - that’s one issue, one challenge… But the other one is the bandwidth. So what are the bandwidth requirements to service 500,000 simultaneous connections? We must be talking terabits per second, I think. And the resolution matters as well, because it depends how big the video is, but I think 720p is the standard. I can’t imagine people having a good experience on 480p. Even 4K nowadays - that’s the cutting edge, I suppose; if you’re delivering 4K, you are doing really well. But that is a lot of data.
Yeah. A lot of times platforms will charge more for those higher-resolution streams, just because that is the major cost, if you think about per minute. So one example that we had was if we had an event with 100,000 people consuming a stream from one user - you could think of a keynote - it would cost about $10 per minute to run that. And so the vast majority of that is actually the bandwidth cost. We’re talking – certainly, once you have that many people going, you’re talking gigabits per second of traffic, spread out across a whole network.
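At that audience size the aggregate egress dominates everything. A back-of-the-envelope sketch of the arithmetic - the per-viewer bitrate here is an assumed figure, and this is not meant to reproduce the $10-per-minute example exactly:

```go
package main

import "fmt"

// aggregateGbps estimates total egress when `viewers` each pull a
// stream at bitrateKbps. Per-viewer bitrate is an assumption; real
// streams adapt up and down with network conditions.
func aggregateGbps(viewers int, bitrateKbps float64) float64 {
	return float64(viewers) * bitrateKbps / 1e6 // kbps → Gbps
}

func main() {
	// 100,000 viewers at ~1.5 Mbps of video+audio each:
	fmt.Printf("%.0f Gbps\n", aggregateGbps(100_000, 1500)) // 150 Gbps
}
```

Multiply a number like that by per-GB egress pricing and it is easy to see why bandwidth, not CPU, sets the per-minute price of a large event.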
Okay. So now terabits, gigabits. And gigabits per second – I know that the bandwidth is one of the biggest costs, and it’s a hidden cost of cloud providers, because people don’t realize just how expensive that stuff is, especially when you have a global audience. Someone from Asia accessing something in the US is a lot more expensive; or Oceania, Australia, New Zealand… It’s very, very expensive. And latency is what it is, but still. That’s when CDNs help; it helps if you have a CDN, so that they access the data from there. But in your case, that wouldn’t work, right? So someone would need to have a direct connection, with the latency specific to it, which would be, I think, 200 milliseconds, roughly, thereabouts…
Ideally, yeah. Well, plus obviously anything that the protocol adds; you mentioned that’s another 200 milliseconds or so. But the speed of light is a constant, right? You can’t exceed that. And even then, you get maybe 80% of that in real terms. So the bandwidth costs are significant, and I think they are, by far, the biggest share of the cost per minute of actually doing this. It’s not the CPUs, it’s not the memory… There’s no storage involved, because everything is real time; nothing is stored, it’s just transmitted. Do you know roughly what the ratio of CPU to bandwidth is when it comes to the cost to stream this data? Are we talking 10% CPU, 90% bandwidth? Or whereabouts are we?
[22:12] It’s much more like 95% bandwidth.
Yeah, it’s very high. One example - “Hey, let’s have one person streaming to one person receiving” - for a whole month of that, it would cost about $4.04 in bandwidth, but $0.09 in CPU. That was a strange way to say that, but…
Much, much cheaper. Yeah.
Wow. That’s crazy. So the ratio is like – in that case, it’s even more than 95%. Right?
Yeah, it’s more like 99% once you get up to that size, yeah.
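The one-to-one figures quoted a moment ago make that ratio easy to check:

```go
package main

import "fmt"

// bandwidthShare computes the fraction of total cost that is bandwidth,
// using the $4.04 bandwidth vs. $0.09 CPU monthly figures from the episode.
func bandwidthShare(bandwidthUSD, cpuUSD float64) float64 {
	return bandwidthUSD / (bandwidthUSD + cpuUSD)
}

func main() {
	fmt.Printf("%.1f%%\n", 100*bandwidthShare(4.04, 0.09)) // 97.8%
}
```

And since CPU is amortized better than bandwidth as streams fan out to more receivers, the share climbs toward the 99% mentioned here at larger scale.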
That’s crazy. Okay. Okay. Was the cost of bandwidth a factor for you to choose a specific cloud provider? Google versus AWS is what I’m thinking here.
It’s often a case, actually, to choose to not go with a cloud provider. One of the main reasons that we started building on Google is that we had access to those credits. And when we started, we didn’t quite understand, “Hey, what is the bandwidth cost going to be?” A lot of folks in the industry will tell you that’s one of the first things you realize - once you hit a very large scale, you’re actually at risk of saturating connections, even. The bandwidth is going to be pretty good within Google’s network, and so we’re able to not spend as much by deploying a lot of our load test workers in Google, so that it’s all within Google networking… But in a lot of cases, to scale these sorts of things, you start on the cloud, and then you end up deploying your own servers. It’s not only so that you can have cheaper CPU costs, but mainly for the bandwidth cost. And you can imagine going from one provider to another - all of a sudden that bandwidth is leaving a cloud provider’s network, and that becomes more expensive… So you really have to start to think about - and we’re very early on this; excited to see what we come up with… But efficient transmission of these things.
Let’s say that we have a stream that needs to come in and then branch out to five different relay servers. Can we sort of bundle up the connections in a smart way? Can we compress them in a smart way? Do we really need to send the full resolution stream to someone who’s viewing the presentation, versus someone who’s actively participating? So I think that’s how a lot of these providers figure this stuff out, is they run a lot of tests, they gracefully step up and down resolution as they deal with bandwidth congestion… And there’s also stuff in the WebRTC protocol that can help with that, too.
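Stepping resolution up and down per receiver is what WebRTC simulcast enables: the sender publishes several encodings of the same stream, and the forwarding server picks the largest one that fits each receiver's estimated bandwidth. A hedged sketch of that selection logic - the layer bitrates are illustrative:

```go
package main

import "fmt"

// pickLayer returns the highest simulcast layer whose bitrate fits the
// receiver's available bandwidth, falling back to the lowest layer when
// nothing fits. Bitrates are in kbps and purely illustrative.
func pickLayer(layersKbps []int, availableKbps int) int {
	best := layersKbps[0] // fallback: always send something
	for _, l := range layersKbps {
		if l <= availableKbps && l > best {
			best = l
		}
	}
	return best
}

func main() {
	layers := []int{150, 500, 1500} // low / medium / high encodings
	fmt.Println(pickLayer(layers, 800))  // 500: medium fits, high doesn't
	fmt.Println(pickLayer(layers, 2000)) // 1500: plenty of headroom
	fmt.Println(pickLayer(layers, 100))  // 150: fall back to the lowest
}
```

A real server re-evaluates this continuously from congestion feedback, which is the "gracefully step up and down resolution" behavior described above.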
Do you imagine yourself running your own bare-metal hosts at some point? Do you see that in your future?
Yeah, I think so. I think being able to have much cleaner control over the network is super-important. I used Kubernetes for a long time, and we found that we ended up having to go straight to VMs, so that we could really understand and not throw a lot of extra-complicated networking layer in between… I’ve seen some very interesting systems where they do kind of sort of point-to-point VPNs for all their traffic, and so that way they’re able to have a combination of on-premise hardware and a combination of cloud… But I think the key advantage is obviously cost, and in some cases being able to put servers in very specific areas, and kind of keep traffic localized there.
Another thing that we’re kind of excited to explore is - let’s say you have a company that requires video and audio infrastructure. They’re generally going to need it for a lot of different reasons. Maybe it’s for internal meetings, maybe it’s for broadcasting out to events… And so if you can provide a system where for all the internal meetings they can just run it kind of in their own network… Mostly, people are going to be in that area, connecting to that office; maybe you want to keep your recordings there, that sort of a thing. I think it’d be very cool to provide that hybrid approach to these sorts of companies, where “Hey, for the most part, you can use the super-cheap, really, really fast server, it’s really close to you, but then if you do need to scale out because you’re doing an event, you can kind of branch out to the cloud, where it gets a little bit more expensive.” But having that flexibility is super-important to me, because I think it opens up just kind of a lot of interesting opportunities for us and for folks that might use our services.
That’s a really interesting point that you are alluding to here, the data privacy. So when a user uses Zoom, or Skype, or Microsoft Teams, or whatever you use to communicate with your team, you don’t really know which way that communication is flowing, which way the data is flowing. So what if you are required by law to keep all the data within the EU? How do you solve that problem? Do you not use them? Do not use those tools? What are the alternatives? And I don’t know if many alternatives exist, but is this something that you’re thinking about, data privacy and how they just route it when it comes to communication within teams and companies?
[30:21] Yeah, it’s something that I find really interesting and important to think about. Everyone’s probably been through a Zoom call where you get the “This meeting is being recorded”, right? Where is that recording happening? Who has access to those recordings? How secure is that? And as I mentioned, there are folks that do have specific needs, whether it’s GDPR reasons, or HIPAA compliance reasons… And there are some options out there. Zoom has a HIPAA-compliant version. I haven’t actually been able to test it yet, but I’ve heard that it has significantly fewer features, because there’s so much that these systems enable by being spread out all over the place, and having a lot of different kind of components to them.
And if you’re working in video and audio, I think about how that’s super – it’s just very private information. It’s something that you want to be absolutely sure that that’s safe and secure. And so I think for certain use cases, being able to say, “Hey, we can guarantee that your data is in this region of this cloud provider”, or “Your data is in a combination of your own infrastructure and the cloud provider.” Or being able to say – typically, how recording works is you might have a media server that’s writing out essentially the packets and frames that are coming in, transcoded, you composite them together to kind of get that nice, Brady Bunch squares set at the end… It would be awesome to be able to say, “Hey–” Kind of like with the tool that we’re using right now - you have cloud recording versus local recording. What if you had cloud recording versus my own servers recording, versus local?
So especially as people had to move online for the pandemic, it feels to me like if we’re going to spend a lot of our time in these calls, I want to be sure as a consumer that that data is private, I can use it how I want… I just don’t want to open up this door of “Who has access to all this stuff?” And that’s just me personally, right? That’s very different if you have compliance reasons for that, as well.
Because you mentioned that, I’m going to read something, and the question to you is who wrote it. “Customers deserve online spaces that aren’t isolating, but invigorating. That are real-time. With a lot less “I think you’re muted”, and a lot more “It just works.” You shouldn’t have to download and configure another video call application to join your next remote event or happy hour.” Who wrote that?
Ah, I did.
You did. That’s it. That’s right. So I think most of us relate to this, right? “Is this working? Can you hear me?” I mean, later, by the way, I may heckle you. “Jason, I can’t hear you. Can you repeat that question?” And we’ll leave that in the recording. Not now… It’s coming. [laughs] But that happens a lot, right? “Hey, can you hear me? Is my audio set up correctly? Is my camera on?” And there are issues on people’s sides… There’s always something in these calls, and you waste a lot of time, every single time, configuring it. So how are you thinking about this problem? Because obviously, it’s on your mind. I was reading this - I think you wrote this five months ago…
Yeah. It’s interesting, because you have to think about it differently if you’re a platform provider of the technology. So if you’re Zoom, you control the entire experience, right? You have a dedicated mobile app development team, dedicated desktop team etc. If you’re a company like Vex, you can sort of provide components for people that, “Hey, if you use this on your application, then you will get a really great preview page, and the devices are going to work really well no matter what platform you’re on.” And that’s very challenging.
[34:24] I’m always trying to join our kind of internal demo from any device that I can, just to see, “Hey, how does it work on this?” I just got the folding Samsung smartphone, and I’ve been having a lot of fun playing with that… And that trips up the site quite a lot. People aren’t designing for that sort of a system.
So the way that I see that it could work is providing super-great tools, whether that’s SDKs that work on lots of different platforms, with just sort of like training wheels and safety built-in, where “Hey, if you want to get started and build a better application, just clone this open source repo that has examples of how it works.” You could just run that yourself, if you want; toss in an API key, and you’re off to the races. Or “Hey, here’s a really good audio picker. And we know that it’s cross-platform, we know that it works well… Just take that if all you need is an audio picker”, versus “Hey, here’s a video grid that works super-well across all sorts of systems.” And it’s kind of a moving target in the browser as well, because there are constant updates and changes to WebRTC, right? Safari did not support it for quite some time, until, if I recall correctly, Apple added FaceTime to the web. And then all of a sudden, they fixed up their WebRTC support so they could support their own product.
So long story short, it’s challenging, but I think the best way you can do it is make it so that it’s much simpler to provide those experiences, so that if someone is kind of using Vex as we hope they would, with our components, then they’ll have a good time, and hopefully, that spreads out to more and more applications.
Okay. So are you imagining your users building things on top of Vex? Or are you imagining users consuming it as end users, more similar to Zoom? Is it both? Is it one versus the other? Which one is it?
It’s definitely more of a platform that folks would build things on top of. So we provide currently a web SDK, so that you can really quickly add to a website video and audio calling. We hope to provide mobile SDKs soon as well; that’s very common with other providers in this space. But I think in order to be good at that, you have to kind of build your own applications, too. So we’re always experimenting with things, and we’ve been dogfooding our own built-on-Vex conferencing system for quite some time… And I’d like to kind of offer both. “Hey, here’s a way that you could – if you just want to replace Zoom at your company, and you’re comfortable deploying an application, here you go. Plug in the API, and your tokens, and it will work fine.” But that’s not really our goal, right? Anything that we put into an application like that, we’d want to extract out and make a kind of more generic and available kind of toolkit for folks to use to build things.
Yeah, I really like that model. I really like that model, because then you have the freedom of mixing and matching, however you want them, so you’re providing building blocks for others to build. They are mostly open source, right? Apart from the platform stuff that you need to run… And I’m sure that at some point you can have like the enterprise version to run it yourself, but there’s a bunch of things that you need to be aware of… But having those components for you to build using your applications - then it has your look and feel. It’s personalized, and you know how you want to combine it. That sounds like the builder’s dream, I would say; just like Tailwind, right? Tailwind is a little bit like that.
[38:13] I’d love to provide a UI component system for video calling; it’s surprisingly tough to find a really good video grid that responds super-well, and handles all the different aspect ratios and things… There’s just a lot of work involved in that, and different folks take different approaches of “Hey, here’s an API, and that’s it. You can build anything that you want to with this API. Good luck.” And someone says, “Hey, we built an API and you can’t customize it, but we’ve also created sort of a WYSIWYG type interface to it, where you can drag and drop components, and create roles, and stuff. It does a lot more of the work for you.” And I think there’s sort of an interim, kind of stepped up approach of, “Hey, at the top level, just deploy this thing. At the bottom level, build whatever you want on top of this API”, things that we can imagine. And then in the middle, a component library, a functionality; like, grab the chat component, grab the video component. That’s kind of what I would like to see. And ideally, it shouldn’t be that those components only work with Vex, right? It should work with any WebRTC system. So that’s kind of how I’m thinking about it and what I hope to be able to do over time.
When it comes to your tech stack, what are you currently using to run Vex.dev? Because if you go there, there’s just like a landing page, there’s the GitHub where some of these components are available, there’s a GitHub org… We will share some of those links in the show notes. But when it comes to your tech stack, what you run, what does that look like today?
Yeah, so I hope to – by the time this is published, we’ll actually have a link to sign up for our alpha, which I think will be pretty exciting. But right now, those are kind of hidden. In our tech stack at the moment we’re big fans of Elixir. So we use Elixir and Phoenix for the majority of the web servers, for the UI; we’ve basically built most of our things in LiveView. For the media servers, we use a system called Janus, which is an absolutely fantastic open source project that is fairly difficult to scale well; it kind of leaves it as an exercise to the reader. And so we’ve been able to, I think, do something pretty special by combining Elixir and Phoenix, and that process model, and scalability and distribution, with kind of a much more black box kind of system, and sort of use the Elixir/Phoenix code to orchestrate the fastest media server that we could find.
Do you feel like Go is giving you all the performance that you may need from a CPU perspective, from a memory perspective? Or are you tempted to go further than Go?
I think that Go provides pretty great performance. We got a 20x improvement over our same Python code. There’s also an absolutely fantastic project called Pion. That’s a Go WebRTC implementation; you can build all sorts of crazy things on top of it. I think one of the reasons we might switch out of Go is that we’d love to be able to kind of orchestrate Go processes from Elixir a little bit better. So there’s really great ways to bridge from Elixir to Rust, and that’s a very common path that people take… And there are ways to do it with Go. It’s a little bit trickier. We haven’t explored it as much as we should. But I think that, performance-wise, maybe Rust would be a lot faster, maybe not. I think they’d probably be fairly comparable. But for us, it’s like, “How can we use Elixir as much as we can, but delegate the really mission-critical, high-performance things to a faster, lower-level language, but still kind of control it from Elixir?”
[42:30] I was at a conference recently, it was the Swiss Cloud Native day in Bern… And there was this speaker, Tim McNamara - now, he’s pretty big on Rust. And I learned a few things about Rust which I didn’t know, and I was genuinely impressed with some of those. He blogs, he writes books, he gives talks, and there’s a talk which is recorded, I will put it in the show notes, from the Swiss Cloud Native day in Bern… And he talks about Rust and why there are certain advantages that only Rust has. And Go is great for the majority of things. But knowing what I know about Erlang and Elixir, and Rustler is what I’m thinking for Rust, that integration specifically… I think WhatsApp was involved with it, because they need, at their scale, certain things. And Erlang is not that great when it comes to computationally-intensive tasks. And that’s when you want something which is better. And C is used quite a bit with Erlang, because they have the same heritage… But that can be a bit awkward when it comes to integrating. It does happen, but you need some very specific knowledge, and usually legacy systems do that.
But with the growing popularity of Rust - I mean, shipping in the Linux Kernel itself, that is big. Right? I mean, you can get Rust support in the Linux Kernel… What? I wasn’t expecting that… I think it’s worth checking out. And again, some conversations with Tim, knowing about Rustler… There’s like a lot of hints that Rust is worth exploring, especially at very large scales, where a 10%, a 20% improvement can mean millions of dollars. And that’s when you start seeing the difference, where you say 10%, 20% is not that much different, but at 500,000 users, it makes a big difference. That actually means an extra 100,000. Right? You can do 600,000 for free, with the same resources.
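The back-of-the-envelope math here is simple enough to sketch out (the 500,000-user figure is the one from the conversation; everything else is illustrative):

```python
# Back-of-the-envelope: what a 10-20% efficiency gain buys you at scale.
# The 500,000-user baseline comes from the conversation; the rest is
# purely illustrative arithmetic.

def extra_capacity(current_users: int, improvement: float) -> int:
    """Additional users the same hardware can serve after a fractional speedup."""
    return int(current_users * improvement)

if __name__ == "__main__":
    users = 500_000
    for gain in (0.10, 0.20):
        print(f"{gain:.0%} improvement at {users:,} users "
              f"-> ~{extra_capacity(users, gain):,} extra users")
```

At 20%, that is the "extra 100,000 for free" mentioned above - 600,000 users on the same resources.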
And the approach to memory in Rust is very interesting. There’s no garbage collection, there’s none of that. And for like real-time, it makes a difference when real-time is a priority. So I’m just mentioning it, putting it out there, food for thought. Something which I’ll check out.
Yeah, I totally agree. I think the other things that are really interesting to me about Rust is kind of how much focus is put on sort of the WebAssembly side of it as well. One dream that I have - in our testing service, we’ve built this system to load-test our own system, and we think it might be useful to other folks that are trying to kind of build these types of systems… One of the things that makes it really effective is if you’re able to place these load test workers all over the world, in different network conditions. And so one thought I had would be, how cool would it be if you could just open up a browser tab and say, “Okay, now I’ve connected this worker, it’s running in WebAssembly, it’s very high-performance, it’s able to go and actually connect and run tests on your behalf… What if I could deploy a media server in someone’s browser?” That’s maybe a little crazy, but “Oh, hey, we need extra capacity for this meeting.” Someone uses their laptop and opens up a tab, and now you’ve got a media server running in your browser.
[46:07] I don’t think it’s crazy… I’ve seen PostgreSQL being shipped in the browser via WebAssembly. I was like, “What?! What are these people doing?” Like, PostgreSQL in the browser? Apparently, yes. So I don’t think it’s as crazy these days. Yeah, I was reading – and that’s a very good article, well written; I think Supabase wrote it. “Supabase, WASM, PostgreSQL in the browser.” Wasm.supabase.com. That is crazy. We’ll put the link in the show notes.
So I don’t think it’s a crazy idea anymore. It might have been last year, or even six months ago, but not anymore. Now, I’m genuinely surprised about some of the things that I’m seeing in the WebAssembly space; I was not expecting them. And there you go.
Who is Jason? There aren’t a lot of articles online; I haven’t seen any videos… Maybe I haven’t done my research well enough, but I’m seeing that you’re a former senior software engineer at Adobe, a senior engineer at Geometer, and for our listeners, that was episode 66, that was the intro. So thank you, Rob, for that. And now you’re a CEO at Vex Communications. But that said, I’m sure there’s a lot more to Jason, so who is Jason? Who are you, basically?
Man, that’s a…
It’s a tough one, I know.
Yeah… So I’ve always kind of been very interested in technology from a young age. I grew up in a super-large family, where my dad was always buying the latest consoles to play, and things like that… And kind of, from there, learned a great love of technology, and games… And kind of my first favorite introductions to music, for example, was kind of these soundtracks for things that I was playing… And so these days, I’ve been very focused on work, since I moved to San Francisco, obviously, but I love playing music, learning, I play piano, trying to get back into that, get better at that… I’m a big e-biker, so I bought an e-bike at the beginning of the pandemic, and it was a Rad PowerBike, and I started to kind of tweak it, and like “Oh, maybe I could upgrade the motor on this thing. Maybe I can upgrade the battery, and the controller, and all that.” And I love skiing, all that kind of stuff. But yeah, I’m much more confident and comfortable talking about tech than about myself.
You did not – you searched just fine. I tend to have a very low social media presence; that’s been something that I will have to adjust as I kind of grow into this role a little bit more.
So why Vex? Why were you attracted to this problem space? Because the journey - you explained a few things; like, you mentioned piano, you seem to be into arts, into music… But why Vex? It’s really interesting. Not many people choose WebRTC as their problem space, but you have… So what’s the story behind it?
Yeah, so as I’ve kind of got into programming – I’m a self-taught engineer; I went to a coding bootcamp, and that was sort of how I got my start. So I’ve always just been really attracted to complicated problems, and kind of learning as much as I can. I just love learning things; it’s part of why I’ve moved from engineering into a CEO position, like “Hey, I want to try that out.” But as I kind of worked my way from startup to startup, I got to Adobe, and I really liked working there. There was a lot of interesting challenges. I think it’s very cool to work at a large business, and sort of see how the sausage is made, as it were… But I got kind of bored. I felt “Hey, I really enjoy working with small teams, I really enjoy working on different problem sets.”
And so, you kind of mentioned, but Geometer is an incubator, or a venture studio that was founded by Rob Mee of Pivotal, who was on your show quite recently… He kind of reached out and said, “Hey, we’re working on this stuff in the WebRTC space. Are you interested?” I didn’t really know much about WebRTC, but – oh, hey, I stream my games sometimes when I’m playing with my friends. I’m like, “How does Discord do it? How do they make this work?” And I just spent the last year, like everyone else, working remotely, and just felt like there’s gotta be a better way to do this… And so I was, I think, at a really good point in my life, where I moved from Utah, where I was living at the time, to San Francisco, and I just started immersing myself in this problem, because of its challenge; because I think it can impact a lot of people if you provide better tools for folks to make more engaging spaces online… And how cool is it to just kind of work with video all day, right?
[52:15] As you mentioned, I play music, and things, so being able to kind of try out “Hey, what if I stream my keyboard into this room? How does that work? Or “What if I set up this webcam on a Raspberry Pi and stick it over here? How does that work?” It’s just fun. It’s just fun to play with, it’s fun to kind of – every time you try a new video tool, you get ideas on what’s good, what’s bad, you sort of see how things are working… I’m probably one of the few guests that’s just absolutely fascinated by how we’re recording this podcast… That kind of stuff is really interesting to me.
So it’s a combination of just like this insatiable thirst to learn new things and try new things… I never want to sit still, I always want to keep expanding my capabilities… And just – video is a very, very cool problem, and it affects a lot of people. And if you can do it well, you can make a big difference.
That is an amazing attitude. I have to say, it’s very inspiring listening to you talk like that. And the one thing which I’m really picking up on is that fun aspect. This is fun, right? That’s how it’s supposed to feel. It doesn’t matter what you do, if it feels fun, and you’re genuinely captivated by it, keep going. That’s it. You’ve found it. There’s nothing else, other than you continuing on the journey. So I can really relate to that. It resonates very deeply with me.
Yeah, I’m glad to hear that. I think especially if you come from that sort of self-taught, like bootcamp route, right? You’re just sort of thrown to the wolves; you get to your first job and you have no idea what’s going on, and I think the only way that you can be successful from that – or one of the ways, I shouldn’t say the only… It’s to find the fun where you can, and just keep trying to learn stuff.
My introduction to DevOps, for example, was at Mavenlink, a startup that I worked at, and I was trying to build a bot that would help us with our deploy email automation… Kind of a little boring problem, but I thought it was really interesting. I got to learn about Slack bots, and all this stuff… And it got to the point where I was ready to deploy it, and I talked to the ops team, and said, “Hey, I’d like to deploy this little bot. What do you think?” And they said, “Oh, well, we have this Kubernetes development cluster that we’ve been trying to get people to use. Why don’t you deploy it there?” And I’m like, “What’s Kubernetes?”
Wow… [laughs] That is steep. Right? This little bot… “Oh, by the way, there’s Kubernetes.” Wow… So how did you make that work? That is fascinating, as an entry point. It’s crazy.
[56:12] Because when I find something cool, I just get really excited to share it with people. “Hey, check this out. See what we can do.” And I think there are some developers that once they’ve seen the things that they can build when they’re masters of both their programming domain and also the operations domain, they get really excited; they realize, “Wow, that’s the true full-stack engineer.” And there are plenty that don’t, too. I don’t expect everyone to go and learn all these things just because they wanted to deploy a little bot. But to me, it’s just fun. And the harder the challenge, the more fun it is for me.
So we talked during our multi-cloud discussion about GCP quite a bit, Google, AWS… But we only mentioned Fly.io. So what made you go to Fly.io? What is the story there? That must have been – I mean, you were using Kubernetes, and you still are using Kubernetes… So how come you added Fly.io into the mix?
Yeah, that’s a great question. So we originally ran basically everything on Kubernetes, and we found that – and we still run chunks of the system on it. But we found that for certain things, what we wanted was lower-level access; like, we’d like to build the machine image, deploy that ourselves, set up the firewall rules ourselves, etc. And then for certain things - you can imagine a frontend site, a demo site using the technology - like, you don’t need all that fancy stuff. The heavy-lifting is done by the backend service that’s running all these servers. And again, with something like Kubernetes, or even learning how to do things with a VM - say, using Packer to build these images, and Pulumi, which we use to kind of orchestrate a lot of this stuff - there’s a huge learning curve involved there. And not everyone wants to do that; they want to ship code, and build things. And Fly just happens to be, I think, one of the easiest ways to do that, especially with Heroku kind of getting rid of some of their free tiers. It’s also very, very cheap.
We also like to support other companies that are doing things with Elixir, and you can imagine – I’d say there’s two ways that we use Fly today. The first is any little app that we want to build and deploy, it’s super-fast. We can get something running on Fly in an hour, right? As opposed to – it’s a lot more complicated when you use a more heavy-duty deployment system.
And then things where you want really great regionality. So for example, in the case I mentioned about load testing - a good test of the scalability of a system, and if you want to see “Hey, what’s our latency look like across the system?”, you’re not going to get the most accurate measure if you run all of your load test bots and your service in one cloud, right? They’re mostly just going to talk on Google’s backbone within the same system. So if I want to deploy a bunch of bots all around the world, anyone that’s worked with cloud providers knows it’s really hard to do regionality; it’s hard to set up the appropriate networks, the appropriate regions, kind of wire it all together… With Fly it’s just deploy it, and say “I want it in all these places. Go.” So that’s really cool.
Fly has its own challenges, too. That’s, I think, maybe where there’s a bit of a learning curve, if you want to do super-advanced things… We do a lot of UDP traffic, so I’m still kind of learning how that works on Fly, which has made it so we haven’t wanted to deploy the media server workloads there… But hey, we want to throw up a docs site? Boom, do it in Fly in seconds. We want to make it so our app servers scale across regions? Boom, do it in Fly, super-quick.
I’m really excited to try out their Machine API, that allows you really quick access to booting machines… Because you can imagine, if you’re doing a video call or a large event, figuring out how to scale that up and down is challenging. And one of the things that’s really a problem with some of these solutions like Kubernetes is what’s the cold boot time of a new node booting, registering with a cluster, and then being schedulable, pulling a Docker image and booting it up, right?
Minutes, right. You can do it quite a bit faster with just the VMs themselves. And Fly says that their system is just wicked fast. So I’m really excited to kind of play with that and see.
200 milliseconds, according to the docs. I really want to try that out. Machines - it will be the same VM, but spinning up in 200 milliseconds? That’s crazy quick. Crazy, crazy quick. We’re talking minutes versus milliseconds. That is containers versus VMs. Huge, huge difference.
Yeah… As I understand it, they have their own hardware, and then they’re using a tool called Firecracker, which allows you to create super-lightweight – they call them micro VMs. So it’s pretty interesting; if you’re trying to set up Firecracker, it’s really like building a VM the hard way. It’s like, “Here’s a little micro thing.” Okay, now I have to figure out how to set up the network interfaces correctly… And so I’m imagining that they’ve really kind of nailed that, where they just have capacity where – booting a Firecracker VM is super, super-fast. And so that’s something that – there’s always more things to play with. I’d love to figure that out ourselves. That’s something we thought about when you asked, “Hey, would we run on-premise someday?” Yeah. And I think it would be something similar to Fly, where you have your own hardware, and you’re using Firecracker VMs to create little chunks of work that you can do.
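To make the Machines API idea above a bit more concrete, a call to boot a machine on demand might look roughly like this. This is a hedged sketch only: the endpoint, app name, image, and payload fields are assumptions based on Fly’s public docs at the time, not something stated in this conversation, and the API shape may have changed since.

```python
import json
import urllib.request

# Assumed public endpoint for the Fly Machines API; verify against Fly's docs.
API_BASE = "https://api.machines.dev/v1"

def machine_config(image: str, region: str,
                   cpus: int = 1, memory_mb: int = 256) -> dict:
    """Build a minimal machine-create payload (field names are assumptions)."""
    return {
        "region": region,
        "config": {
            "image": image,
            "guest": {"cpu_kind": "shared", "cpus": cpus,
                      "memory_mb": memory_mb},
        },
    }

def create_machine(app: str, token: str, payload: dict) -> dict:
    """POST the payload; on Fly's side this boots a Firecracker micro-VM."""
    req = urllib.request.Request(
        f"{API_BASE}/apps/{app}/machines",
        data=json.dumps(payload).encode(),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

if __name__ == "__main__":
    # Hypothetical image and region, just to show the payload shape.
    payload = machine_config("registry.fly.io/my-media-server:latest", "ams")
    print(json.dumps(payload, indent=2))
    # create_machine("my-app", "FLY_API_TOKEN", payload)  # needs a real app + token
```

The appeal for the scale-up/scale-down scenario described here is that each call returns in the hundreds of milliseconds rather than the minutes a fresh Kubernetes node would take.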
That’s really interesting. Speaking about things that you want to do in the future, what does the next six months look like for Vex.dev?
Yeah. So we are really close to launching our private alpha. In fact, it should be open to the public when this episode airs.
Yes, please. If you’re shipping it, that’s amazing. That’s the best outcome I can hope for. So yes, please; two thumbs up from me. Let’s do it. Let’s do it. Yup.
So I think the hope for us is that we will kind of start getting some folks who want to build some things, and really moving quickly with those people to – especially if someone has a need for super-high scale, or if any of the things that I’ve talked about - deploying servers in your own system, or this sort of hybrid approach… We want to talk to those types of people. We’re going to just keep improving the service that we have, and kind of start out “Hey, you can use this for free. Check it out. Talk to us.” But I’d hope that pretty soon we can find and kind of build a really great – partner with someone and help them build their awesome platform.
So a lot of fun days of user research, trying these things out… It’s always interesting to put an API-based product in particular out there. There’s a lot of different techniques for usability testing and research testing when you’re hoping that developers find it easy to use, versus a user clicking around finding it easy to use… And I think we’re really going to lean into a lot of the sort of scalability, cross-cloud sorts of things that we’ve talked about, right? I’d love to be able to go as a developer and see that my call’s running, and click in and be like “Where’s this being hosted? Oh, it’s in these regions. And I have 20 people connected from here, and 20 people connected from here…” It’ll really kind of allow you to not have to operate all this complicated stuff yourself, not have to learn lots of cloud providers, but still feel like you have very solid visibility into what’s going on.
[01:04:22.12] I think we will focus on stability, performance, monitoring, because I think that’s where we can really make a difference. It will be a little bit bare bones at the beginning, but I think you’ll find that we move pretty quickly and are excited to deliver just an awesome, scalable, reliable product.
That’s what matters. Keep shipping it, keep improving, week on week… It always should get better, and you’re always learning; that should be constant. And then everything else will take care of itself. That’s at least how I’m thinking of it.
What about the team? Are you growing the team? How many are you? We haven’t spoken about that. Is it just you? I don’t think it is just you. So tell us a little bit about that.
Yeah, so I’ve been working with Geometer for quite some time, and we’ve actually had some of their engineers working on this project with us. So at the moment, there’s me and my co-founder, Sam Pearson, who is an awesome engineer. So he and I are the Vex team. And we’ve been working with three or four folks from Geometer as well.
Now, Geometer - the one that brought me in, that we talked about - is kind of about to make a few changes themselves. We’ve learned a lot as a team building and operating these services, and so Rob is going to be spinning out a new company; it’s currently codenamed P3, but kind of a spiritual successor to Pivotal Labs. And I think a lot of what they’re trying to do is not only do things the same way as the Pivotal Labs brand - pairing and kind of working with customers and helping them to build things - but also operating. You know, “Hey, we’ll build something for you, and we don’t want to then hand it over and give you that burden. We’ll operate it for you, too.” So unfortunately, Vex is going to be losing some of our great engineers to kind of go work on that… But as soon as we start kind of landing a few initial customers, we’ll be hiring, especially looking for folks to help us with our frontend SDKs, help us kind of start building out that component library… You always need super-talented WebRTC folks, since that’s just an amazing, complicated domain in and of itself. But we’re kind of hoping too that as Geometer kind of spins out these other companies, and has this new kind of consulting arm, that we might continue to work with them, or even drum up business for them, and vice versa. “Hey, client, do you need a video conferencing system? It turns out we’ve got a really great one that we could build on top of for you.”
That sounds pretty exciting. As we are preparing to wrap our conversation up, what would you say is the one key takeaway that you would like our listeners to have, the ones that stuck with us all the way to the end?
[01:07:18.10] I’d say that building a great product, or – like, building a startup, it sounds so scary. And it is, in a lot of ways. But like I said before, it’s really that curiosity of – you know, I didn’t know much about WebRTC before I started this. Right now, it’s my whole job. And it sounds hard, and it always is, but if you have that right attitude of being in it for the learning, in it to help others and work with others, that takes you really far.
I’m just a bootcamp kid that was lucky to have a lot of great mentors along the way, and make his way into various companies, and just riding that as far as I can. There’s nothing special about that. I just kind of learned these things as I went along. And as long as you’re having joy in doing that, and finding that - that’s what you want to do. Don’t focus on the total comp, don’t focus on the prestige of “Oh, I’m a startup guy”, or whatever. Focus on having fun, focus on doing good things, with good people. I think that’s taken me far, and I think it takes other people far.
And if you’re into video and audio streaming, check out WebRTC For the Curious. That’s one of the best sort of initial introductions. So if any of this stuff sounds interesting and exciting to you and you want to learn more, check that out. That’s kind of where I got my start in this field, and it’s just still a great resource.
That’s all really good. Thank you, Jason. I’m so curious to see what you do next. I think it’s going to be amazing once you get it out there, once you get the feedback, once you realize all the things that you didn’t even know were a thing. I mean, that’s just a magical moment, right? All the things that you could be doing, that you should be doing, and then starting to figure out what is your next step, and what is the most important, and so on, and so forth. And then from there, it’s like a rocket ship, most of the time. It’s just like reacting, and things are happening… It’s just so exciting.
Yeah, I think there’s a phrase, “No battle plan survives contact with the enemy”, and I think that’s very true in startups in this world, too. We have a kind of idea – we sort of focused on high scale as our initial value proposition. It could be that people are like “Hey, that’s cool and all, but we really need this other thing. We’re doing crazy stuff with virtual reality” or “We have lots of little devices that we want to stream…” So I’m just excited about that. I think that’s the fun part. You can only expect so much, and then once you actually start working with people and seeing what they want to build, they just always surprise you.
I’m really excited about you. I’m really excited about what you do next, about what Vex does next. I’ll be following it closely, and who knows, maybe in six months’ time or a year’s time we do this again, and we will share all those learnings and all those highlights, and the lows, the lessons learned, all of that. So I’m super-excited. Thank you, Jason, for joining us today. Looking forward to next time.
Thanks, Gerhard. It’s been an absolute pleasure.
Our transcripts are open source on GitHub. Improvements are welcome. 💚