Typesense is truly open source search with Jason Bosco, co-founder & CEO at Typesense Search (Changelog Interviews #505)

This week we’re joined by Jason Bosco, co-founder and CEO of Typesense — the open source Algolia alternative and the easier to use ElasticSearch alternative. For years we’ve used Algolia as our search engine, so we come to this conversation with skin in the game and the scars to prove it. Jason shared how he and his co-founder got started on Typesense, why and how they are “all in” on open source, the options and the paths developers can take to add search to their project, how Typesense compares to ElasticSearch and Algolia, he walks us through getting started, the story of Typesense Cloud, and why they have resisted Venture Capital.

Changelog++ members get a bonus 5 minutes at the end of this episode and zero ads. Join!

80 minutes
Recorded Sep 7, 2022
Published Sep 9, 2022

Featuring

Jason Bosco – Website, GitHub, LinkedIn, X
Adam Stacoviak – Website, GitHub, LinkedIn, Mastodon, X
Jerod Santo – GitHub, LinkedIn, Mastodon, X

Sponsors

Fly.io – Deploy your apps and databases close to your users. In minutes you can run your Ruby, Go, Node, Deno, Python, or Elixir app (and databases!) all over the world. No ops required. Learn more at fly.io/changelog and check out the speedrun in their docs.

Sourcegraph – Transform your code into a queryable database to create customizable visual dashboards in seconds. Sourcegraph recently launched Code Insights — now you can track what really matters to you and your team in your codebase. See how other teams are using this awesome feature at about.sourcegraph.com/code-insights

InfluxData – All of the open source software InfluxData creates is either MIT-licensed or Apache2-licensed. These are very permissive licenses. But why are they all for permissive licenses? Paul Dix shares his thoughts on the spirit of open source and why freedom, evolution, and impact drive them to license InfluxData’s open source software as permissively as possible. Learn more at influxdata.com/changelog

Retool – The low-code platform for developers to build internal tools — Some of the best teams out there trust Retool…Brex, Coinbase, Plaid, Doordash, LegalGenius, Amazon, Allbirds, Peloton, and so many more – the developers at these teams trust Retool as the platform to build their internal tools. Try it free at retool.com/changelog

Notes & Links

Chapters

Chapter Number	Chapter Start Time	Chapter Title	Chapter Duration
1	00:01	This week on The Changelog	01:22
2	01:23	Sponsor: Fly.io	02:01
3	03:44	Welcome Jason	00:29
4	04:13	Search landscape and getting started	06:45
5	11:02	Good compression!	00:22
6	11:26	It can't be THAT hard	03:51
7	15:18	Typesense is GPLv3	03:49
8	19:07	The license of Linux	01:26
9	20:45	Sponsor: Sourcegraph	01:42
10	22:36	Strategies for implementing search	06:13
11	28:49	Can't the data live in one place?	03:22
12	32:12	Typesense vs Algolia vs ElasticSearch	02:31
13	34:43	Using the pain of ElasticSearch to improve Typesense	03:23
14	38:07	How do I use it?	02:07
15	40:14	Typesense admin UI	01:26
16	41:41	On prem?	01:53
17	43:47	Sponsor: InfluxData	01:36
18	45:23	Sponsor: Retool	00:56
19	46:37	Typesense Cloud and resisting VC	06:23
20	53:00	The tension VC can bring on creating value	01:49
21	54:50	The sheer size of the search market	03:27
22	58:16	Transparent pricing for Typesense Cloud	04:36
23	1:02:52	Typesense Cloud helps improve the open source binary	03:04
24	1:05:57	Algolia's give it for free to open source playbook	03:10
25	1:09:07	What happens when AWS offers Typesense as a service?	03:55
26	1:13:03	What's on the horizon? What's next?	02:40
27	1:15:43	Closing out the show	00:36
28	1:16:22	Outro plus a clip from #353 with Adam Jacob	03:46

Transcript

So Jason Bosco is here to school us, I guess give us a glimpse into building a search engine, the algorithms behind it, not taking venture, making it open source… A ton of fun stuff. One of the co-founders behind Typesense. Jason, nice to see you. Welcome to the show.

Thanks, Adam. Thank you for having me. This is exciting to be on the show.

We are excited to get schooled about search engines, open source things, and all the stuff Adam just listed, specifically what’s going on in search engine land. It seems like there’s lots of interest and hype around open source search engines, ElasticSearch etc. And I don’t know, my thumb’s not on the pulse of search, like what’s going on these days… Typesense looks cool. I wonder what else is out there. People are always working on making better wheels, and we’ve had plenty of them along the years. Jason, maybe tell us how you got into search, and then give us maybe the lay of the land of what’s going on and what’s kind of innovative in the search space.

Yeah. So we got into Typesense in 2015, and the lay of the land back then was that ElasticSearch was, and maybe still is, the dominant player in the search space… So pretty much think about anything related to search and you will eventually land on ElasticSearch, because they have so much content out there… And it’s a super well-adopted product.

So that’s where we were in 2015… I was working at Dollar Shave Club. My co-founder, Kishore, he was working at another company in the search space, or in the space which required search as one of the tools that they needed… And we were just quite frustrated with how complicated it was to get ElasticSearch up and running, and scaling it, and fine-tuning it. In my personal experience I’ve had at least two engineers spend a week or so every two to three months fine-tuning our ElasticSearch clusters as we scaled it… And it seemed like there was just too much machinery that needed to be handled to get search working. And our use case at Dollar Shave Club was seemingly pretty simple, which was to be able to search for customer names, emails, addresses, phone numbers when they write in or call in, for our support agents to look up customers. So it seemed like a pretty simple use case, but then the amount of effort it involved to get that going seemed out of whack with how simple the feature was – or seemed to be.

[06:08] So anyway, that’s how we started out with the idea for Typesense. So it was more what would it take to build our own search engine? Something that’s simple to use. It was a little naive at that point; it’s something like “Can I build my own huge database, or a huge piece of software that people have spent decades working on? Can we build our own?” But we stuck with it, we started reading up papers on how such algorithms work, and what goes into building a search engine… And now, looking back, I see how much work is involved in doing search engine…

It’s been a long time since 2015.

Oh, yeah. It’s been seven, eight years now. So now I know how much work is involved, so I’m glad that naivety is what helped us bridge the gap of “Okay, let’s just stick with it.” And it started as more like an R&D project, as a nights and weekends thing… there were no time commitments, or deadlines we were trying to hit… It was just chipping away, little by little… And so even though we started working on it in 2015, 2018 is when we actually got to a stage where we were like “Okay, now it’s good enough to be used by maybe someone other than just the two of us.” And 2018 is when we open sourced it.

One of the bets that we took at that point in time was we wanted to put all the search indices in memory, whereas in 2015 disk-based search indices were the norm. That’s what ElasticSearch was doing, and there’s another search engine called Solr, which actually predates ElasticSearch, and everyone used disk-based indices, because disk was cheap, RAM was expensive. But then what we figured at that point was RAM was only going to get cheaper as years rolled by… And we said, “Let’s put the entire index in memory.” Of course, the upside there is that you get fast search, because if you put everything in memory, it’s as good as it’s gonna get in terms of speed. But the trade-off is if you have petabyte-scale data, then there is no petabyte-scale RAM available today, unless you fan it out across multiple machines. Of course, AWS, for example, has a 24-terabyte RAM VM that you can spin up… But it’s still expensive compared to a 24-terabyte disk.

So I think that’s how we figured out the sweet spot where Typesense would fit: if you have massive volumes of data, like for example application logs or tons of analytics data, that would be very expensive to put in RAM, then use disk-based search… And that’s where ElasticSearch and Solr play. If you want instant search (we call it “search-as-you-type”), that’s where something like Typesense fits in; you can put everything in memory and get fast search results.

So that’s how we started working on Typesense, and after that, once we open sourced it in 2018, it was just a matter of – not just a matter of, but we were just listening to what users were telling us, and just adding features one by one.

Another interesting thing that happened in parallel is that there’s another search player called Algolia, and they have pretty – they’re a closed source SaaS product, but then they have very good visibility among developers, because many documentation sites use Algolia, because they give it out for free for open source documentation sites… So for any documentation, it’s usually - you’ll see a little “Powered by Algolia” logo, and that has worked very well. And Algolia is a fantastic product, but something that ended up happening was they kept raising their prices, and then Algolia users started discovering Typesense, and started asking us for features that existed in Algolia… And then we started adding those, and then we eventually got to a stage where we were like “Okay, I think now we have a sufficient number of features where we can call ourselves an open source alternative to Algolia.” And I think that resonated with a lot of people, because Algolia is a very good product, very well known, and solves for actually many of the pain points that ElasticSearch has, from a developer experience point of view.

[10:09] So they essentially simplified search, and then spread the word around that “Hey, search need not be this complicated.” And then once we started saying, “We’re an open source alternative to Algolia”, people quickly connected that, “Okay, this is what we’re trying to do with Typesense as well, which is a good developer experience, fast search, and easy to use, to get up and running with search.”

So then we started seeing good traction, and then people started asking us for a paid version, where we host it for them, because they didn’t want to host it… That’s when we realized that we have a business model in front of us, and people are telling us that they will pay us if we had this… And we just couldn’t let the opportunity go by, and we quit our full-time jobs and started working on it full-time in 2020.

Okay. Exciting.

Very exciting.

Yeah, yeah. I compressed I think five years’ worth of ups and downs…

That’s a good compression algorithm you’ve got there on your history…

It’s about one minute per year. Good job. [laughter]

But yeah, it was a fun journey.

Yeah. If your code is anywhere near as good as this, then you’ll be in good hands with Typesense. So gosh, where do we go from there…? First of all, I was just thinking back that you mentioned the naivety of like “It couldn’t be that hard.” Like, how many businesses are started with such statements…? Which reminded me - I mean before we started recording, you were talking about the Hacker News crowd, and how often you see those kinds of statements on Hacker News when somebody releases their product. “I could build this in a weekend.” And it’s like, first of all, no, you couldn’t. But second of all, we all understand that sentiment, because that core thing of course you could do in 48 hours, maybe 72 hours, or whatever it is… But you’re so far from finished at that point. You almost have to have - whether it’s arrogance, or naivety, or a combination, whatever it is, to say “I’m actually going to try this” and get started, and get going… And have a different idea, like “Hey, what if we put everything in memory?” to even start on a journey that’s gonna take seven years; and I’m sure it’s just getting started, right? Like, you guys aren’t at your finish line; you’re just barely off the starting line.

So it’s always cool to see when a story like that comes to fruition… Even though it’s like so often that is the story. It’s like, “Yeah, it couldn’t be that hard.” And now seven years later, you’re like “Actually, it was really hard. And it still is.”

[laughs] Yeah, yeah… I think what I realized - like you said, maybe the core of stuff you can get done in a weekend or whatever unit of time which is smaller; it’s maybe a little closer to what you have in mind. But the iterations on top of that, to actually make it a product that someone other than you can use - that is what takes so much effort. And it’s not even just effort on your side; of course, you have to invest a lot of time, but it’s also interacting with people who are using the product other than you, getting that feedback, and then iterating based on that feedback. I think that is what takes a lot of effort and time. So even if you were to iterate by yourself, for whatever X amount of years, I don’t think the product will be as mature as being able to iterate with actual people using it and giving you feedback.

Case in point, for example, for us. At one point, we tried an open core model with Typesense, where there were some features that we held back from the open source version and said, “You have to pay for a premium version.” And then eventually we did away with it, because what we realized was the features that were in the open source version - more people were using them and giving us feedback, so they were generally more stable and more feature-rich than the features that we held back, because fewer people were actually paying for those and giving us feedback.

[14:03] So ironically, the closed source features that people are paying for ended up being the ones that had a little less stability, and less maturity… And that’s when I realized, “Okay, this is hurting us by keeping some parts closed source, because people are just not adopting it as well as we’d like.” And at that point, we just open sourced 100% of Typesense. And after that, we uncovered a series of bugs in what used to be the closed source features and we quickly addressed them, and people started asking us for more features in line with what we already had, like improvements on those features, and it suddenly skyrocketed the amount of how useful those closed source features were, because people kept asking for more things on top of that.

So I feel like that is actually a good example of - product maturity comes from actually talking to users, and iterating based on that, rather than just building it yourself and thinking that it’s gonna be awesome. I think that’s needed in the beginning, because you need to have a point of view on what it is you’re building and defend that, but after that point, I think talking to users and getting feedback, and building based on that - I feel like that has been our super-power; our not-so-secret superpower, I guess.

Yeah. Since we’re on the note of (I guess) licensing, to some degree, it’s GPL v3 licensed.

Yeah. So we initially started out with GPL v2, and then someone pointed out that GPL v2 was not compatible with some other license, so we changed it to GPL v3. But still we stuck with GPL instead of MIT or Apache, because at least in my opinion, GPL is an open source license which encourages other people modifying the source code to contribute that back. And of course, that’s a big debate, what is open source… But my philosophy at least is that if you’re taking advantage of an open source software, and if you’re modifying that software, then it’s only fair to ask you to contribute that back to the community, versus taking it closed source… Versus something like an MIT license, or Apache - what I’ve seen happen is open source projects end up getting modified, and then that modified version ends up getting closed source, which kind of goes against – it’s almost like a take and not give back model. So that’s why we’ve kind of stuck with GPL.

And of course, the most stringent version of it, which is AGPL, and that, it seems like people tend to avoid as much as possible. I’ve heard, for example, at Google they just don’t use AGPL license for anything…

I’ve heard that as well.

Yeah. And ironically, I was on that side of the table at Dollar Shave Club, for example… I was the one who had to say, “No AGPL license software.” Because just during a round of fundraising, for example, the lawyers would ask us, “Give us a list of all your open source software you’re using, all the licenses”, and if there’s anything that’s AGPL, or anything that’s a little off Apache we’ll get asked questions, “Why are you using this?” and then more discovery into “Are you using it the right way? Did you modify it?” Just a lot of conversations need to be had when you use anything that’s AGPL. So that’s one reason we haven’t gone down the AGPL path.

So far, it’s worked out well… And I guess the best model for that is the Linux Kernel; it’s as popular as an open source project is gonna get, and they use GPL for a license, and it’s worked out well for them. And that’s what I usually tend to point to developers sometimes when they ask “Hey, if it’s MIT, I’d be more inclined to use it.” Then I point out that, “Hey, you’re probably using something Linux-related, and that is GPL.” So it’s a very similar model. I think there’s a lot of misunderstanding about how GPL works in the industry, and that is definitely a friction point… But I think the benefits outweigh the risks, I guess, for us to change the license.

[18:18] Yeah… There’s kind of like a freedom of ignorance with the MIT license, where it’s like – this one, and the Apache too, and the BSD license. It’s the very permissive ones, where it’s like, I don’t have to think about it, I’m just good. Where it’s like, “Okay, the GPL and the AGPL - I need to understand what exactly I’m getting myself into.” And once you do, it’s not that hard to understand the implications. I mean, it can get hairy, especially if you’re trying to build businesses and stuff… But I think the “I’ll just MIT it and forget about it” kind of thing is kind of throwing caution to the wind. And it’s nice for adoption, because you can just green-list or whatever, and go ahead… And all these MIT-licensed projects are just good to go; you don’t have to think about it.

So I can definitely understand that; you have a good example of a GPL project that’s massively adopted and popular. I wonder how often we don’t think about Linux in our infrastructure as much as we think about a database, or a search engine… Even though Linux is the underpinnings most of the time, for all that stuff. But for some reason, it’s almost like so low-level that you don’t even consider the licenses of your operating system maybe.

Right, right. Yeah, and I think that’s probably the success of the GPL showing itself, where a product is so popular that it seems like it’s everywhere, but then there are different flavors of it, all coming from the core source, and it still didn’t hurt the adoption of the Linux Kernel… So it kind of shows that GPL can also be a very successful model. And I’d say that maybe also helped the core project mature much faster, because all these modifications that were done were being contributed back into the open… And that helped evolve the product much faster, versus a bunch of people forking it into private forks, and then making their own modifications without contributing back. Who knows, maybe that might have hurt how fast the core Linux project evolved over time. But again, this is just my hypothesis.

Yeah. The hard part is we can’t fork history and run both experiments in parallel. If we could just do that…

Wouldn’t that be nice…

That would be nice. We need some version control systems inside of our timelines.

So let’s go back to search now. We were kind of on the licensing beat… But if we go back to just thinking about search - any organization that has interesting data, like if it exists long enough, there’s going to be a request for search, right? Otherwise, the data is just not interesting; because everybody wants to poke at what they have, and learn things from it. As an indie dev and as like a small teams/small customers developer most of my days, I kind of had two strategies for search. Strategy one was, “Can I do it inside of Postgres?” Like, I can [23:05] some full-text search inside there. Is that good enough? And for a lot of cases, that’s just good enough.

And then it gets really hard from there… And I was never going to do an ElasticSearch, or like add another appendage to my infrastructure. So from there, I’d go straight over to services. So it’d be like “Can I do it in Postgres? Or is it going to be an Algolia?” Or there’s one called Swiftype; I’m not sure if they’re still around… But you know.

They got acquired by ElasticSearch.

Oh, okay. So they’re gone now. They were cool for a minute. I liked what they were up to. I think I actually had my blog on Swiftype for a little while; they just provided probably a lot of the stuff that Typesense provides. But that was basically it. And I’m wondering, what are other options? Like, is that the fork in the road for most people? …like, “Well, ElasticSearch or Apache Solr”, with infrastructure needs.

[24:00] When I looked at it – it’s not like I’m just afraid of adding things to the infrastructure. It’s like, you know, I’m not a DBA or an ElasticSearch BA… It seems hard. And one thing I’m liking about Typesense just reading about it is it seems pretty simple. No dependencies, C++, compile it… It seems like it’s pretty easy to run. But I’m just wondering how, from your vantage point, working in larger companies than I usually work with, is it basically that, like ElasticSearch or Solr, or a service? Or shove it in your RDBMS of choice, or what does Mongo have built-in, etc?

Yeah, so I think most people just start out with database search, and your standard LIKE SQL queries on Postgres and MySQL… And it works for relatively small datasets. A “starts with” LIKE query can use the index if you set an index on the field, at least in MySQL; but if you’re trying to search in the middle of a string in a field, things like that, it’s basically scanning the entire table, and you just start seeing performance issues.
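To illustrate the difference between a “starts with” lookup and a search in the middle of a string, here is a minimal Python sketch. It is not MySQL or Postgres code: a sorted list stands in for a B-tree index as an assumption, and the function names and sample data are hypothetical.

```python
import bisect

def prefix_search(sorted_names, prefix):
    """'Starts with' lookup: binary search on a sorted list,
    roughly how an ordered index can serve LIKE 'ali%'."""
    lo = bisect.bisect_left(sorted_names, prefix)
    # "\uffff" sorts after ordinary characters, so this marks the end of the prefix range
    hi = bisect.bisect_left(sorted_names, prefix + "\uffff")
    return sorted_names[lo:hi]

def substring_search(names, needle):
    """'Contains' lookup: every entry must be examined,
    similar to the full table scan behind LIKE '%lic%'."""
    return [name for name in names if needle in name]

# Hypothetical sample data
names = sorted(["alice", "alicia", "bob", "malice", "zoe"])
prefix_hits = prefix_search(names, "ali")
substring_hits = substring_search(names, "lic")
```

The prefix lookup touches only a small slice of the sorted data, while the substring lookup visits every row, which is why it slows down as the table grows.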

So once your dataset is large enough, plus you need to do more standard things that search engines typically do, like a thing that’s called faceting… So in the results, if you want to say “This many results have a status of active, this many results have a status of inactive”, or whatever your field is, if you want to group like that… So you combine those. And then eventually there will come a need for doing some sort of fuzzy searching… So you want to account for typos, to make sure that misspellings still fetch the results that you’re expecting… So as you add each of these, you can still do a lot of this with Postgres, for example, but performance is the key thing that starts taking a hit once you have a sizable amount of data.
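As a rough sketch of the two features mentioned here, the Python below counts results per field value (faceting) and matches words within a small edit distance (typo tolerance). It is a naive illustration and not how Typesense implements either feature; the function names, the `max_typos` parameter, and the sample records are assumptions.

```python
from collections import Counter

def facet_counts(records, field):
    """Faceting: count how many records carry each value of a field."""
    return Counter(record[field] for record in records)

def edit_distance(a, b):
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

def fuzzy_match(records, field, query, max_typos=1):
    """Keep records where some word in the field is within max_typos edits of the query."""
    q = query.lower()
    return [r for r in records
            if any(edit_distance(q, word) <= max_typos
                   for word in r[field].lower().split())]

# Hypothetical sample data
customers = [
    {"name": "Jason Smith", "status": "active"},
    {"name": "Jasmine Lee", "status": "inactive"},
    {"name": "Mason Jones", "status": "active"},
]
status_facets = facet_counts(customers, "status")
typo_hits = fuzzy_match(customers, "name", "jasen")  # misspelling of "jason"
```

This version compares the query against every word of every record, so it has the same scaling problem described above; a dedicated engine precomputes index structures to avoid that scan.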

So that’s the point when a search engine can help, where you do have to then build something to take the data from your Postgres or MySQL, or whatever database you have, and then sync a copy of that into your search engine. And what a search engine essentially does is it builds indices that are custom, or are optimized specifically for full-text search, with typo tolerance, and faceting and the standard things that you need with search.
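One common index structure behind full-text search is the inverted index, which maps each word to the documents that contain it. The sketch below builds one in Python from a copy of some rows; it assumes whitespace tokenization and AND semantics for multi-word queries, and it is not a description of Typesense’s internal index. All names and data are hypothetical.

```python
from collections import defaultdict

def build_inverted_index(documents):
    """Map each lowercase token to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index

def search(index, query):
    """Return IDs of documents that contain every token in the query."""
    tokens = query.lower().split()
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result

# Hypothetical rows synced from a primary database
rows = {
    1: "Dollar Shave Club razor",
    2: "razor blades refill",
    3: "shave cream",
}
index = build_inverted_index(rows)
razor_hits = search(index, "razor")
phrase_hits = search(index, "shave club")
```

A lookup is a dictionary access per query word rather than a scan over every row, which is the kind of trade a search engine makes by keeping a second, search-shaped copy of the data.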

So because it’s optimized for that, it’s going to return results fast, whereas a database is more concerned about consistency, and making sure your data never gets lost, and transactions, making sure parallel writes still end up with a consistent copy of the data, and things like that… Which is why usually we say search engines are not your primary data store. Instead, it’s a secondary data store where you sync a copy of the data from your primary data store.

Now, interestingly, like you said, once you have data, you eventually need to search on it, or run some sort of aggregations on it. And I think, over time, databases also have realized that, which is why you see something like Postgres add full-text search support within it. And then I know for example MongoDB added full-text support within it… And even Redis added full-text support–

Really?

…yeah, in one of the latest versions. So it seems like everyone is realizing that full-text support is a thing that databases need… But then the type of indices that you need to build to support both a full-text mode and your standard data storage model is different. And that’s where you have dedicated search engines that do that one thing well, versus databases trying to offer everything that works reasonably well for the full-text search use case as well, but then again, it’s not optimized specifically for fast full-text search. So once you run into that, that’s when you take the hit of “Okay, I need to build some machinery to get the data from my primary store into my search engine”, and then you hit your search engine for search results.

[28:02] Another interesting use case though is for – even though we call it a search engine, search engines typically also have filtering capabilities, where you can say, “Get me all records which have this particular value for this field.” I know some users for Typesense, for example, are using it as essentially like a caching JSON store, because you can just push a bunch of JSON, you can search on that JSON, and you can also get JSON documents by ID. And since they’re anyway replicating a copy of the data into Typesense to search on it, some users are actually using it as just another JSON store in front of the database so that they don’t have to hit the database for any heavy queries, which is another interesting use case for Typesense.

That is interesting. I have felt the pain of marshaling – I don’t think marshaling is the right term here… Syncing data over to a search store. And I’m wondering if there’s ever been an effort, or are there projects that just say, “Don’t send your data over to the search; just point your search at your database, and then maybe configure it for what you want”, and it can exist in one place, and this could be a proxy; like you said, you could use it however you want, and it has maybe read-only access or something, so it’s safe and it’s not gonna like destroy stuff… Or does that have performance implications that are massive?

So in fact there are projects which do this; Airbyte, for example, is one company that I know is doing it. They’re actually building an open source way to transport data from one source to a different destination. And I think Fivetran does it… There’s a bunch of different startups that have attempted to do this. But when it comes to search engines, usually if you replicate an exact copy of the data into your search engine, you’re probably going to be replicating things that you don’t want to search on… Or you might want to change the shape of the data a little bit before putting it into your search engine, so that it’s more optimized for the types of search queries you’re running, instead of replicating a structure that works more for your application querying the data.

What I’ve seen is even though there are many of these frameworks out there - another one is the Singer framework, I think; that’s another open source project that does this. But even though there are a couple of these out there, it seems like you eventually end up having to transform the data a little bit, so that it’s more optimized for your search use case. At that point you have to customize that logic yourself, and eventually people end up writing their own transformation layer, and building it themselves, maybe on top of one of these. But still, there is some customization needed.

So I don’t think – given that the access patterns are different, just mirroring your entire dataset usually will mean that you’re probably storing more in your search engine than is actually needed, which might increase your costs, you have to deal with more data going through the wire, so consistency issues, for example… So eventually, people end up building their own custom sync scripts.

So it’s sort of unavoidable, because you’re either going to do it upfront, or you’re gonna do it slowly, probably not as well, eventually, as you use it anyhow.

Right, right.

Okay, that’s too bad… It’d be great if you could just point it and be like “Hey, just index this thing differently.” It’d be awesome.

Oh yeah, I wish maybe one of these frameworks allowed you to also set up transformation rules on the fly…

Yeah, exactly.

…especially if they allow you to join – that’s the most common transformation that I’ve seen, is joining data from two different tables in your relational database, and putting it into one flattened structure. Because in a search engine, you typically flatten out your data, because if you do runtime joins, it’s gonna slow down the search.

Right…

So if they allow you to set up joins at transformation time, I think that’ll be an amazing product.
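The join-then-flatten transformation described here can be sketched in a few lines of Python. This is an illustration of what a custom sync script might do before sending documents to a search engine, not code from Typesense, Airbyte, or Singer; the table shapes, field names, and the `flatten_orders` function are hypothetical.

```python
def flatten_orders(customers, orders):
    """Join orders to customers at sync time and emit one flat document per order."""
    customers_by_id = {c["id"]: c for c in customers}
    documents = []
    for order in orders:
        customer = customers_by_id.get(order["customer_id"])
        if customer is None:
            continue  # skip orders whose customer row is missing
        documents.append({
            "id": str(order["id"]),
            "item": order["item"],
            "customer_name": customer["name"],
            "customer_email": customer["email"],
        })
    return documents

# Hypothetical rows from two relational tables
customers_table = [
    {"id": 1, "name": "Ada", "email": "ada@example.com"},
    {"id": 2, "name": "Bo", "email": "bo@example.com"},
]
orders_table = [
    {"id": 10, "customer_id": 1, "item": "razor"},
    {"id": 11, "customer_id": 2, "item": "blades"},
    {"id": 12, "customer_id": 99, "item": "cream"},
]
search_documents = flatten_orders(customers_table, orders_table)
```

Because the join happens once at sync time, each search document already carries the customer fields, so no join is needed when a query runs.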

Add it to your roadmap, Jason. C’mon.

[32:02] [laughs] Yup, yup. Yeah, I think we have – there are a lot of search – core search use cases for features… So yup.

So you said 2015 was your begin date? It’s, by my math, 2022 now, so that’s seven years…

Good math, good math…

You’re compared to Algolia, you’re compared to ElasticSearch… How well do you think you compare to Algolia and to ElasticSearch? Do you think you’re a pretty good one-to-one? Do you win in most cases? What makes you win? What’s your differentiator?

Yeah, so I would say it depends on the use case. So if you’re looking at feature parity, I would say we’re – because we’re closer in spirit to Algolia, I would say we’re at 85% feature parity with Algolia. Most of the features that we don’t have today are things related to AI, or any machine learning-related features that Algolia has out of the box. With Typesense you have to bring your own machine learning model and integrate that into the search engine. So with Algolia we’re at 85% feature parity, and even with that, a good number of Algolia users are switching over on a regular basis.

ElasticSearch though is a different type of a beast, in that they do app and site search, which is what Typesense and Algolia do, so a search bar on your website or apps… They also do things like logs search, they also do anomaly detection, they do security incident monitoring, they do analytics, and visualizations if you’re using a Kibana stack… So they have a whole bunch of umbrella of search-related use cases that’s of course built on the core Lucene engine, but it’s still customized very well for a whole plethora of use cases.

So I wouldn’t say we’re at feature parity with ElasticSearch by any stretch, because they do a whole bunch of different things. What we’ve done with Typesense is essentially just taken the site and app search piece and we’re trying to simplify that, and have an opinionated take on what sort of features or parameters are allowed to be configured, and we’ll choose defaults for you. So this is an opinionated take on app and site Search.

So given that our goal is not to be at feature parity with ElasticSearch, even if it’s just site and app search, if we reach feature parity with ElasticSearch, then we’ll also invite the same level of complexity, so that is not our end goal. Instead, we want to see what use cases people are using Typesense for, and then build an opinionated thing that works out of the box for, say, 80% of the use cases.

So I’d say we’re nowhere close to feature parity with ElasticSearch, to answer your question, but that’s by design, because if we did do that, then we’d end up becoming another ElasticSearch, and that’s not what we wanna do.

Yeah. You also said the frustration you had early on was maintaining the ElasticSearch instance; not just the code behind it, what made the code work and be able to be a great algorithm to search and transform data, and be real-time, or whatever the needs are for the engine… You mentioned maintaining the actual ElasticSearch infrastructure took hours every couple of months. Can you talk about how you’ve changed, how you’ve used that pain to change things with Typesense?

Yeah. So with ElasticSearch, part of the complexity comes with the fact that it runs on top of the JVM, and fine-tuning the JVM itself is such a big task. And then you have to configure ElasticSearch’s parameters on top of that. I was recently – I actually grepped the ElasticSearch codebase for the number of configuration parameters that they have… It’s almost 3,000 different configuration parameters to do various things… And you need to figure out which of those parameters apply in your specific use case, to fine-tune that on top of, of course, the JVM configuration parameters.

[35:52] So that dependency on the JVM was one big thing that we avoided with Typesense, because we built it in C++, so there are no other runtime dependencies. It’s a single binary, so you just use a package manager to install it, or download and install the binary, with zero other dependencies. So it’s a single binary that you start up, and it scales without any fine-tuning, and that’s something we’ve done in Typesense; we’ve set the sane defaults for many of the configuration parameters, so that it scales out of the box without you having to tweak some parameters.

For example, I’ve seen users do – without any fine-tuning, there was one use case where this one user did almost 2,500 requests per second on their particular dataset; it was only 4,000 records, but still, on a 2 vCPU node with just 512 MB of RAM they were able to get almost 2,500 requests per second from a Typesense cluster, without fine-tuning anything; just installing it, indexing the records, and running a benchmark against it.

So that’s what we optimize for, which is out of the box no finagling with all the knobs; it just scales out of the box. So you throw more CPU at Typesense, it just takes advantage of it without you having to do more work to take advantage of all the cores. So it’s - use all the resources available that you provide Typesense; that’s the model that we’ve gone for with Typesense… Versus ElasticSearch - in addition to adding resources, you need to make sure it’s configured to take advantage of them in the best way possible.

And with Algolia you don’t know.

Oh, great. Yeah. With Algolia I don’t think they allow you to benchmark their services. Plus, because they charge by the number of requests that you send them, even if they allow benchmarking, you’d probably have to pay a ton of money just to run the benchmarks. For example, if you’re doing 2,500 requests per second, you’re paying $2.50 per second, for however long you run your benchmarking. That’s based on their public pricing. So it will be very expensive to run benchmarks on Algolia.
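As a rough worked example of that arithmetic: the $2.50 per second figure implies a price of about $1 per 1,000 requests. The sketch below assumes that rate (inferred from the numbers quoted in this conversation, not taken from a current price list) and a hypothetical ten-minute benchmark run.

```python
# Rough cost of benchmarking a service that bills per request.
# ASSUMPTION: $1.00 per 1,000 requests, inferred from the figures quoted above
# (2,500 requests/second -> $2.50/second); not taken from a current price list.
PRICE_PER_1000_REQUESTS = 1.00


def benchmark_cost(requests_per_second: float, duration_seconds: float) -> float:
    """Return the estimated bill in dollars for a sustained benchmark run."""
    total_requests = requests_per_second * duration_seconds
    return total_requests / 1000 * PRICE_PER_1000_REQUESTS


if __name__ == "__main__":
    # The ten-minute duration is a made-up example, not a figure from the interview.
    print(f"per second: ${benchmark_cost(2500, 1):.2f}")
    print(f"ten-minute run: ${benchmark_cost(2500, 10 * 60):,.2f}")
```

Under these assumptions even a short sustained run adds up quickly, which is the point being made about per-request billing.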

So let’s say you yum-install Typesense, or dpkg install, or whatever it is with Homebrew, pick your distro choice and do the standard package management installation… Then what do you do? Does it provide an API that listens on a port? How do you start to use the thing? Let’s just say I have a typical 12-factor web app with like a database. What do I do from there? I have a Typesense now, I’m sure it’s registered as a service or something on the operating system, so it’s going to start when the OS boots, and it’s going to turn off, and stuff… How do I use it?

Yeah, so Typesense will start listening by default on port 8108, which is the standard port that we’ve chosen… And an API key is auto-generated for you if you use one of the package managers to start Typesense. So you get the API key from the config file, and then you look at the documentation and just use curl to first create a collection, and then you send JSON data into it in another curl command… And then that’s it; it’s indexed, and then you call the Search API endpoint, again via curl, or at that point you directly start building a search UI, and have the search UI make search calls out to Typesense with an API key that you generate just for search purposes.

So roughly, it’s just two steps to get the data into Typesense, create a collection, and then index your JSON data. And then the third step can be as complicated or as simple as you need it to be, but at that point the data is ready to be searched, either via curl or through a UI that you build.
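The steps described above can be sketched in Python instead of curl. This is a minimal illustration, not official client code: it assumes a local server on the default port 8108, and the API key, the `books` collection, and its fields are placeholders invented for the example. The functions only build the HTTP requests; `send()` is what you would call once a server is actually running.

```python
import json
import urllib.parse
import urllib.request

# ASSUMPTIONS: a local Typesense server on the default port 8108; the API key,
# the "books" collection and its fields are placeholders made up for this example.
BASE_URL = "http://localhost:8108"
API_KEY = "replace-with-key-from-your-config-file"


def build_request(method, path, payload=None, params=None):
    """Build (but do not send) an HTTP request carrying the API key header."""
    url = BASE_URL + path
    if params:
        url += "?" + urllib.parse.urlencode(params)
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    headers = {"X-TYPESENSE-API-KEY": API_KEY, "Content-Type": "application/json"}
    return urllib.request.Request(url, data=data, headers=headers, method=method)


def create_collection():
    """Step 1: declare a collection and the fields to index."""
    schema = {
        "name": "books",
        "fields": [
            {"name": "title", "type": "string"},
            {"name": "year", "type": "int32"},
        ],
    }
    return build_request("POST", "/collections", payload=schema)


def index_document(doc):
    """Step 2: send one JSON document into the collection."""
    return build_request("POST", "/collections/books/documents", payload=doc)


def search(query):
    """Step 3: query the indexed data; the response comes back as JSON."""
    params = {"q": query, "query_by": "title"}
    return build_request("GET", "/collections/books/documents/search", params=params)


def create_search_only_key():
    """Optional: ask for a key limited to searching, for use from a search UI."""
    payload = {
        "description": "Search-only key",
        "actions": ["documents:search"],
        "collections": ["books"],
    }
    return build_request("POST", "/keys", payload=payload)


def send(request):
    """Send a request to a running server and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    # Print what would be sent; pass each request to send() once a server is running.
    steps = [
        create_collection(),
        index_document({"title": "The Hobbit", "year": 1937}),
        search("hobbit"),
        create_search_only_key(),
    ]
    for request in steps:
        print(request.get_method(), request.full_url)
```

Keeping request construction separate from sending makes it easy to see that each step is a single JSON call, which matches the "two steps to get the data in, one to search" description.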

Okay. So it’s all just JSON – let’s say the data is in there already, and I’m doing queries against it; it’s just gonna send JSON back and forth.

Correct. Yeah. It’s all JSON, and a RESTful(ish) API.

RESTful(ish). Doesn’t RESTful already have the -ish in it? That’s the “-ful” part, right?

That’s a good point, yes. [laughter]

But I know what you mean, because REST is not exactly what we all think of it; when you look at the full thing, there’s a lot there.

[40:12] Yeah.

Okay, cool. What about administration? Is there any sort of UI for Typesense itself? Is there an admin, or is there – I know it’s supposed to be sane defaults, but what if I do decide I want to save some RAM, or I don’t know, whatever… I’m sure you have some configuration.

Yup. So on the self-hosted version, it’s an API-only thing. We don’t publish a UI. But there is a community project where people have built a UI that you can basically hit all the API endpoints. So it’s almost like a Postman, but on top of that there’s a nice UI to look at the collection schema, and things like that.

And then on Typesense Cloud we do have a UI that’s built in, that’s built by the Typesense team, and that comes with things like role-based access control, so you don’t have to share API keys, and permissions, and all that good stuff that might be useful if you’re in a team setting; we put that in, at least on the UI front, in Typesense Cloud. But we actually run the same open source binaries that we publish on Typesense Cloud as well. So it’s exactly the same versions that we publish that we run on Typesense Cloud.

Yeah, that’s super-cool. And I think hosting is an obvious business model. Obviously, it’s working well so far, better than the open core, which was giving you probably indigestion… At least that’s how I think of it; like, to decide where to put stuff, and then as you confessed to earlier, the open source stuff was more solid than the proprietary stuff because of the fact that more people were using it. Have you considered on-prem as another way of going about it? Because a lot of orgs, I would assume, want search, but they don’t want hosted search, because their data is precious, and they may have regulations, or they may have security concerns… And you think you could make money with an on-premise version, even though I could just, you know, yum-install and run it myself… But I don’t know, maybe the tooling around it that y’all are building for the hosted version could be value-add for larger orgs.

Yeah, we did consider it. I guess we just didn’t go down that path because of the complexity of maintaining on-prem installations. Because on Typesense Cloud we have like full visibility into the entire infrastructure… And we’ve built monitoring infrastructure; those are not really directly related to Typesense, but still, monitoring tooling that helps us monitor the Typesense Cloud clusters - installing something like that on an on-prem environment… I mean, it’s possible, we can probably set up VPCs and private networks and all that stuff, but it’s just added complexity that we didn’t want to take on just yet. So I think it’s just maybe a matter of time if enough people ask us for it.

Today, it seems like - you know, if people say, “Hey, we need to be HIPAA-compliant”, for example. We’re not HIPAA-compliant on Typesense Cloud… Then the only option is to self-host. I tell them “If you need additional support, we can do like a support agreement separately, and help you”, but then being on-call and doing production-level support for stuff running on someone else’s infrastructure where you don’t have complete visibility… I haven’t yet come to a point where I can digest doing that… Unless we figure out more ways to make that efficient, I guess.

Right. Or the number has to be good enough, right?

[laughs] True.

Like, it’s gotta be that worth it to go through the headache.

So Jason, you mentioned Typesense Cloud for the first time in this conversation… Now, I assume that I see a pricing tab, and this is hosted, this is your ability to make money, your ability to resist venture, potentially attract venture… This started as a nights and weekends project… How did you get to – did you ever think you’d be here? You know, launch cloud, and self-fund… What’s the story there?

Yeah, so I think Typesense Cloud is a product that our users essentially pulled out of us… Because when we started working on Typesense – I mean, we didn’t think we’d build a company around it in 2015, if you had asked me… But eventually, once we open sourced, and let’s say in 2018-2019 we figured, “Okay, we probably need to figure out a business model here to make sure this is a sustainable open source project.” And then we tried the open core model, and that didn’t go too well… And then people eventually told us that they will pay us if we hosted Typesense for them. So that’s essentially people telling us that they are ready to pay, if only we had a hosted version. So that is how this came about; then we started building Typesense Cloud just based on people asking us for it… Which is, I’d say, a nice place to be in. So me and my co-founder have probably built like 12 or 13 different products in the last 15 years, and some of them did well, some of them didn’t get too much traction… But every product in the past, we would build the product first and then hope it makes money… And that used to be our operating model. But with Typesense we were in a different place, where people were telling us that, “Hey, do this and we will pay you.” So it was nice that when we launched, that week we had people paying us already, once we launched Typesense Cloud.

[48:26] So that’s when we realized there is a real problem that people are willing to pay to have solved for them… So we started just mentioning Typesense Cloud in different places in the documentation and on our landing page, saying that this exists, and people kind of organically started signing up for it and using it… And we also made sure that the product is full-featured in the open source and in the hosted version as well… So it was nice to be able to tell people that “Hey, we’re doing this only if you don’t want to worry about servers and if you don’t have an infrastructure team. We’ll take care of that for you, and that’s what we’re charging you for.” So it was very easy to explain to users what the benefit we’re giving with Typesense Cloud is, which is we’re essentially like an extended infrastructure team for them, so they don’t have to worry about servers…

So that worked out pretty well, I’d say… To answer your question, I’m pleasantly surprised with how many folks opt to use Typesense Cloud, especially – it seems like serverless is a thing that is getting a lot of adoption these days, so people generally don’t have any other… Or where Typesense Cloud fits in is if people don’t have any other VMs that they run in their infrastructure, and they don’t want to deal with hosting anything themselves; then Typesense Cloud is a nice fit there. So that also means that we now have revenue to sustain ourselves while working on Typesense…

With some of the attention we got on Hacker News, etc., we had inbound interest from almost 30 different VCs at this point, asking us if we’d be interested, if we’re considering it, etc. But for me personally – so I’ve worked at venture-backed companies in the past, so I kind of know the song and dance of what it takes to run a venture-scale business… And the realization that I had eventually was that in a venture-backed company, you’re essentially selling stock to your investors. And stock is, if you think of it, just like another product line that you have, and your customers here are your investors. So in addition to selling your core product to your customers, to your users of the core product, you’re also selling a new product line, which is your company’s stock, to your investor group of customers.

So once I started seeing it that way, the value that your investor group of customers get from the product that they’re buying, which is the company stock, is appreciating stock value. So to keep them happy, you have to do things to increase your company’s stock value. And sometimes, some of the things that you do there might not sit well with the core group of your customers who are buying your core product… And that tension is what – I’ve seen that play out in the past; I keep seeing that play out in other SaaS companies that are VC-backed, where the eventual cycle seems to be that they price their product super-low, and subsidize it, to gain massive adoption… And then eventually, they work their way up to like Fortune 5000, Fortune 1000 companies, and start looking at million-dollar deals… And suddenly, once you have a million-dollar deal on your radar, your $15/month paying customer seems like a tiny drop in your revenue bucket, and your priorities as a company completely shift.

So that is what I’d hate to have happen with a product like Typesense, because one of my goals with Typesense is to make sure that it is available to as many people as possible, without cost being an issue. And that’s why it’s open source, it’s freely accessible… And I felt like – or at least this is my current thinking… I felt like the venture model kind of doesn’t sit well with that goal of making sure that as many people have access to Typesense as possible, or in any case it doesn’t make that goal easy to achieve without conflicts of interest here and there, at different points as you grow the company.

[52:26] So that’s one big reason I’ve actually said no to all the VCs who reached out so far. And who knows – I mean, at least that’s my current state of mind. If something changes… And then we’ve been able to sustain with this model, so it’s working out very well for us.

Jerod knows I’ve been one to say absolutes, what we will and won’t do… Only years or days potentially even later changing my mind, or having my opinion change, and sort of walking back that hard absolute I’d said before. One thing you said was the appreciation, right? The appreciation of the stock to the investor. Isn’t that the name of the game for business anyways? Don’t you want your business to appreciate? So how does the tension with an investor involved change the game for you?

Yeah, that’s a good point. So I’d say the value of a business – you know, there’s building value into the core product that you’re selling, and providing that value to the customers who are paying for that core product. That’s one way to grow the value of the business.

Now, of course, if you’re looking at it from the perspective of stock prices, to be able to maybe sell the company later on, then building value in the core product is not going to be as financially rewarding as selling stock to investors. But I’m wondering if maybe – once you have a sufficiently large adoption of your core product, I’m wondering if that will help translate to also… You know, not that we’re looking to do this, but if we were to do like a crowdfunded fundraising eventually, maybe that core value that the product delivers is what determines our stock prices, if we ever were to do a crowdfunded round… Rather than – today, it feels to me like the way stock prices increase in a VC-backed model is that it’s only by raising your next round of funding. So once you get on a train, to keep your latest round of investors happy for the valuation that they paid, you have to raise the next round of funding, or go public, or have some sort of a liquidity event, so that the latest round of investors make good returns on that investment. So that’s what leads to increasing valuations; you just keep having to raise additional rounds of funding to keep that group of “customers” happy.

Good point. One more question on this front… I mean, 2015 to 2018 isn’t a far stretch. ElasticSearch IPOed in 2018. You had to see the possibilities of this space in terms of a business, right? Algolia was well-funded. ElasticSearch IPOed. You had to see the possibility of you taking a large portion, or even a large small portion of that market share, and capitalize on it.

Oh, yeah, for sure. I think search, like we were discussing in the beginning, is something that is an evergreen problem, something that didn’t start yesterday, and is not going to stop being a problem suddenly. So I’d say definitely something that we consciously chose is to choose a market that’s big enough, so that even if we capture a very tiny portion of that market, it’s still a good investment of our time.

So the space was such that there are not actually that many search players in the market. Now, there are a bunch of closed source SaaS search providers, which more likely than not many of them are maybe using ElasticSearch, for example. I’d say Algolia is at least one that I know of that has built their own search algorithms. But for the most part, people just use Lucene, and build on top of that.

[56:09] So the space didn’t have too many players, so that was the second thing. The first thing was the large, evergreen problem that’s not going to go away, and the second thing was not many players in the market trying to solve this problem. So I think that’s why we were like “Okay, maybe we’ll find our way through to making money with some business model eventually”, that’s the thought we had in mind. We probably wouldn’t have – if it was any other SaaS product, I would say, like a SaaS closed source product, or even an open source product in a different market, we would have probably done a little bit more research before jumping into building a business around it. But I think this space was, again – and oh, I should say, the third thing is search, as we’ve learned, is also a very hard problem to solve, which is why you don’t see many search engines around in the market. And if you want to call it like the technical moat, I guess, it’s a huge gap to jump, to figure out search as a problem domain, get up to speed with it, and see what everyone else is doing, and then see where you can improve it. That is a huge chasm to jump before you build a product, and even if you do that in a couple of weeks, you’re still polishing it and then bringing it to market, and then telling developers “This is why your product is special”. So it’s a lot of effort to cross that big gap. So all of this was in our mind, for sure, and we thought this is a good bet, worth taking.

With all of the other ideas we’ve had, our focus was always going after very niche things, like things that no one else would probably have an interest in going after, mainly because it’s so niche, and it’s not really directly related to the day to day technologies that you might be using. We basically took boring, old spaces for all the other past products… And this one was modern, cutting-edge, and the target audience happened to be developers who - you know, we’re both engineers first, my co-founder and I are engineers, so we were able to speak the same language as our target audience… So all of these put together made it seem like this is like a once in a lifetime type of an idea that we just have to execute on.

So I really dig your transparent pricing for the cloud, and the way that it calculates out. Do you want to just tell folks how that works? And you mentioned you want to bring this to as many people as possible, and it seems like being able to pay as you go, get exactly what you need and scale up as your needs scale up is a great way of doing that. Of course, a lot of the public clouds have this kind of pricing as well, but you’ve got a configurator right there on the pricing page… Do you wanna tell us how you came up with this and how it all works?

Yeah, so we came up with it mainly to mirror the cost of running the service with how much we charge users. So that’s one core principle that we held on to, because - sure, from a business perspective, that’s probably not the best idea, because you’re very closely tied to your costs… But that’s what we chose in service of trying to make sure that we offer something that’s as affordable as possible.

So if you were to run, for example, Typesense on your own cloud accounts, we wanted the cost to be somewhat similar. And where we get savings is from economies of scale, essentially, like running thousands of clusters ourselves, so both the management effort involved, and the savings that you get with high spend. So that’s what we capitalize on. And then we pass on – you know, some of the savings we get, if you want to call them that – back to users, instead of trying to do value-based pricing, which is what I’ve seen some other SaaS companies do. Now, that does make the pricing a little bit more complicated, because people have to know how to calculate RAM, how to calculate how much CPU they need… And that’s why we added a little calculator which says “Just plug in the number of records you have, or the size of every record”, and then we’ll calculate and roughly give you an estimate of how much RAM you might need. That works out well for most use cases.

[01:00:08.08] If people choose x as the size of their dataset, Typesense typically takes 2x to 3x RAM, and that’s given out as the recommendation in that calculator. And then for CPU, we just tell people, pick the lowest CPU available for that RAM capacity, and then as you start adding traffic, you’ll see how much CPU is being used and we can scale you up from there. Or we say run benchmarks – if you already have high traffic in production, run benchmarks with similar kinds of traffic in staging environments, see how much CPU is used, and then pick a suitable CPU. So that does make it a little bit more complicated to calculate CPU.
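The RAM rule of thumb just described (dataset size x, recommended RAM 2x to 3x) is simple enough to sketch. The function name and the example figures below are hypothetical; the real calculator on the pricing page may take other factors into account.

```python
from typing import Tuple


def estimate_ram_bytes(num_records: int, avg_record_bytes: int) -> Tuple[int, int]:
    """Return (low, high) RAM estimates using the 2x-3x rule of thumb.

    ASSUMPTION: dataset size is approximated as records * average record size.
    """
    dataset_bytes = num_records * avg_record_bytes
    return 2 * dataset_bytes, 3 * dataset_bytes


if __name__ == "__main__":
    # Hypothetical dataset: one million records of roughly 1 KB each.
    low, high = estimate_ram_bytes(1_000_000, 1_000)
    print(f"estimated RAM: {low / 1e9:.1f} GB to {high / 1e9:.1f} GB")
```

CPU has no equivalent formula in the conversation; the advice given is to start small and size it from observed or benchmarked usage.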

And then the other configuration parameters, like you can turn on high availability, meaning that we’ll spin up three nodes in three different data centers, and automatically replicate the data between those, and then load-balance the search traffic that’s coming, search and write traffic between all the three nodes. So flick of a button, you have a HA service.

And then we have this thing called Search Delivery Network which we built in Typesense Cloud, where we essentially replicate the dataset to different geographic regions. So you could have one node running in Oregon, one node running in Virginia, one node running in Frankfurt, another one running in Sydney etc. And anytime a request originates, we will automatically route it to the node that’s closest to the user.

It’s similar to a CDN, except that in a CDN they only cache the most frequently used data, whereas here, we replicate the entire search index to each of those nodes sitting at different locations. So it’s as good as it’s going to get in terms of reducing latency for users. In fact, this search delivery network is what prompted some users to use Typesense as a distributed caching JSON store. So instead of having to replicate your primary database, which is probably sitting in one location, out to different regions, which is a hard thing to do, they instead send a copy of the data into Typesense, and have Typesense replicate the data to different regions, and then hit Typesense directly as a distributed cache. So that’s an interesting use case that people have used Typesense for.
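To illustrate the idea of routing each request to the closest full replica, here is a small sketch that picks the nearest region by great-circle distance. How Typesense Cloud actually routes requests is not described in this conversation, so geographic distance is only a stand-in, and the region coordinates are rough approximations chosen for the example.

```python
import math

# ASSUMPTION: approximate (latitude, longitude) for the regions named above.
NODES = {
    "oregon": (45.8, -119.7),
    "virginia": (39.0, -77.5),
    "frankfurt": (50.1, 8.7),
    "sydney": (-33.9, 151.2),
}

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def nearest_node(user_location, nodes=NODES):
    """Return the name of the node geographically closest to the user."""
    return min(nodes, key=lambda name: haversine_km(user_location, nodes[name]))


if __name__ == "__main__":
    # Every node holds the full index, so any of them can answer the query;
    # the only choice to make is which one is closest.
    for city, location in {"Paris": (48.9, 2.4), "Auckland": (-36.8, 174.8)}.items():
        print(city, "->", nearest_node(location))
```

Because every node holds the whole index rather than a cache of popular entries, the routing decision never has to consider whether the chosen node has the data.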

So yeah, these are the different pricing angles. And I think when people realize that, “Oh, if I were to host this on AWS, or GCP, this is the incremental amount I’d have to spend with Typesense Cloud” - when that delta is tiny, when people realize that, that’s when hopefully that’s a convincing case for people to let us deal with the infrastructure stuff, rather than having to spend time on it yourself, and spend that engineering time and bandwidth. However tiny that might be, we still take care of that on an ongoing basis.

So for the true DIYers who are doing it at scale, is the clustering stuff - are those things that are in Typesense, and you’re implementing it in your cloud, and they could also go about doing it for themselves, or is that stuff that’s outside of the binary and is only in the cloud?

No, the clustering is also something that’s available in the open source version. So it’s the same binary that you can run multiple copies of on different machines, and set up a configuration file that points each node to the IP addresses of the other nodes, and it’ll automatically start replicating that out. So we, again, run the same Typesense binary in Typesense Cloud as well. In fact, any improvements that we make in Typesense Cloud, once we observe people using it at scale, actually make their way back into the open source version. And that actually has helped in a nice little feedback loop where because we have first-hand visibility into Typesense running in production, at scale, with Typesense Cloud, we were able to then improve the open source product with that experience, because from what I’ve realized now running Typesense Cloud, writing software is one thing, but watching the software run in production, and observing how it works with different datasets, different types of traffic patterns, query patterns, shape of the data - you get so much more visibility into how your software performs… And I’d say that has been a nice side benefit of Typesense Cloud, besides of course the revenue, to keep improving the open source product as well through the hosted version.
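As a sketch of the configuration file mentioned here: the helper below generates the contents of a nodes file for a three-node cluster. It assumes the comma-separated `ip:peering_port:api_port` format and the default ports 8107 and 8108; the IP addresses are made up, and the exact format should be checked against the Typesense documentation for your version.

```python
from typing import Iterable


def nodes_file_contents(
    ips: Iterable[str], peering_port: int = 8107, api_port: int = 8108
) -> str:
    """Build a single-line node list naming every node in the cluster.

    ASSUMPTION: entries use the form ip:peering_port:api_port, separated by
    commas, and the same file is given to each node.
    """
    return ",".join(f"{ip}:{peering_port}:{api_port}" for ip in ips)


if __name__ == "__main__":
    # Hypothetical private IP addresses for a three-node cluster.
    cluster = ["192.168.12.1", "192.168.12.2", "192.168.12.3"]
    print(nodes_file_contents(cluster))
```

Each node is started from the same binary with its own copy of this list, which is what lets the nodes find each other and begin replicating.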

[01:04:28.17] Is that a commitment of yours, to always give back to the open source through cloud, or is this just a natural byproduct that has happened? Is it a commitment, or is it just a sort of an accident, I guess? I don’t want to downplay it, by any means.

So I guess when we started out with Typesense Cloud, we didn’t intend for this side effect that I mentioned to happen, which is our experience from Typesense Cloud also benefiting open source… But now that I see it happen and see how that benefits the open source product - and I shouldn’t even say open source product, because it benefits Typesense the product. Because Typesense the core product, like the API, is fully open source, and the fact that we’re able to use our experience from Typesense Cloud to improve Typesense the product is amazing to me. So I don’t think we’ll ever stop doing that. Because if the product improves, whether you’re self hosting it or not, I’d love for Typesense to be adopted. Like, if people say – today, if people think about search, most developers, backend developers at least, tend to think about ElasticSearch. I’d love for Typesense to be the thing people think of when they think search, especially for site and app search. And because that’s one big goal that I have, I’d hate to not contribute things back into the product, open source or not… Because that does a disservice to what we’re trying to do in the long-term with Typesense, which is good adoption for a product that works well out of the box.

Right. So in light of that, have you considered stealing or borrowing a page out of Algolia’s playbook? Because they’ve become that because they’re willing to offer that open source free tier and become kind of the starting place for many people who have the money, maybe in the business context, but on the personal side they don’t etc. Usually, that’s the kind of move that VC money allows you to do. So I’m wondering where you stand on that, because you’ve got a whole lot of users - now, they’re not giving you any money, but if you want to be that default that people think about, that’s one move. It worked for them.

Yeah, so people do ask us regularly for an unlimited free tier in Typesense Cloud. Right now we give out a free tier for 30 days, and after that you have to start paying… But I think the difference between Algolia and Typesense is that Typesense is open source. So if you wanted it for free, you could definitely run it yourself, and it’s fully featured, and there’s a community UI. You can basically run this whole thing if you were willing to put in a little bit of effort; you can get this for free, for an unlimited amount of time. So I’d say that is kind of equivalent to Algolia’s unlimited free tier, which does have a lot of restrictions, and you can only put so many records, so many searches… With Typesense, this “free” tier - it’s unlimited everything, except for, of course, the infrastructure costs that you’ll have to pay any cloud provider… Or if you wanna run it on your machine, then it’s going to be completely free, except for the electricity. So that’s how I think about it. So if someone says they absolutely want a free tier, I just tell them, maybe sign up for one of the cloud providers. They offer a better free tier for at least like a year, or give you free credits if you open new accounts, and then just run Typesense under your own cloud account and you get it for free.

Yeah. I think that’s logical, reasonable and fair. But what is not is a go-to market strategy, whereby Typesense can become the default. I agree with you, and that’s a nice answer, and that’s probably what I would say as well… But if you did have the VC money, you can say “But we’ll also, for open source, do this”, and people will just use it. And then you would become –

[01:08:08.04] Yeah, that’s fair. Fair, fair.

I definitely understand what you’re saying… And that step of like “Host it yourself” - you’re gonna cut off like 80% of the people that would use it. And I’m not saying you can’t get there. You can totally get there. But it limits you in certain ways.

Yeah, for sure. Yeah. I think being able to subsidize some of the - I guess you could call this a form of marketing…

Yeah, it is.

…that’s definitely one downside of not having marketing funds available as large as a VC-funded company would have… But I guess that’s the trade-off.

Well, the upside is you don’t lose sleep at night while you’re burning through somebody else’s money, you know?

Truth.

True, true. And the fact - I’d say every morning just waking up and saying that the only thing that I need to focus on today is what the majority of my users are telling me as the next important feature, and I just need to focus on that, and keep chugging through that, and everything else is falling into place - like, that is such a satisfying feeling for me, I should say, at this point.

At some point, Typesense is going to become ubiquitous enough to gather the attention of somebody else that gathered the attention of ElasticSearch. And we had a conversation a year ago when ElasticSearch changed their licensing because of the Goliath in the room, basically… What happens when AWS decides to offer Typesense? What would happen then? Are you prepared? Are you ready for that day? Have you business-planned enough? Have you license-planned enough? What are you going to do?

Yeah, I would say that if that happens, that would be a good thing, because AWS has reached such far depths of where technology can be used. I’m talking about (I don’t know) like oil and gas, and industries that I probably don’t have a first-hand experience in, and that we might have to hire a huge sales force just to be able to get into some of these large, traditional industries that AWS has already spent a ton of time and money getting into. And for example working with the government agencies, and things like that. So they’ve done a lot of this legwork, and if they were to offer Typesense under that umbrella, it only works well for Typesense’s adoption at that point.

From a revenue perspective, I think the mindset that maybe ElasticSearch has is that they need to capture all the value that they’re creating, which is understandable, I guess. I mean, I can see that point of view as well. But my point of view on that is that we’re creating value, but then we’re also creating this value together with the community, even if it’s just people asking us questions and giving us feedback and asking feature requests and telling us that “Here’s how we’re using Typesense”, or “How best can we use it?” All this is feedback that has collectively gone into building Typesense the product. And that’s the nature of open source.

So my opinion is that when you’ve built a product like that, standing on the shoulders of your community, and on other like dependencies that you’re probably using - we’ve already built this together with shared value; let’s spread this value around, rather than trying to capture it all within one commercial entity. So I would actually love it if additional cloud providers start offering Typesense as a service, because that’s how you get to be a Linux, and not an ElasticSearch, I should say… There’s so many flavors of Linux, and so many people use Linux, and it’s become the foundation. And I’d rather become a Linux, rather than an ElasticSearch, at least from a licensing adoption perspective.

We didn’t ask if you’re a listener of this show before we brought you on this show… But if you are not a listener of this show, I would suggest you go back and listen to Adam Jacob talk about this, because… I asked you that question thinking “What is he going to say?” Because I kind of know what your answer might be, but I’m kind of hoping that it is in light of what Adam Jacob said… Which is, essentially, they’re your marketing funnel, right? Why get upset when AWS offers your thing as a service, because they’ve just blessed you, essentially, as worthy. Worthy to use, worthy to try, and let the tech, the usefulness of the tech and the community behind it and the people behind it be the support mechanism to say “This is worth continuing to use.” Versus “What is Typesense? Who are they?” when AWS hasn’t chosen you yet. You’re just in a sea of obscurity, essentially, of search land. And if they blessed you in that way, then it’s like, wow. That’s a better go-to-market strategy, potentially, than the free tier of Algolia.

[01:12:45.09] True.

Maybe.

Yup, yup. Yeah, for sure. I think AWS’ breadth of adoption - you’re just riding on its coattails if they end up offering you as a service, like you said.

A hundred percent.

So yeah, that’s exactly how I look at the world as well.

Let me bring a question over that I ask on Founders Talk often, which I think I’ll ask here as a closer for this show… Which is what’s on the horizon? We’ve talked a lot about Typesense Cloud, your commitment to open source, your commitment to the community, the unintended consequence of being so faithful to the sturdiness and stability of the open source to give back from the advances you’ve made in cloud, to bring them back to the binary that everybody else gets… What’s on the horizon? What do we not know about today that you could share here on the show?

Yeah, so I think this is the first time I’m gonna mention this publicly, but we’ve been working on vector search in Typesense. So essentially, you can bring your embeddings from any ML model that you have into Typesense, and have that be mixed with text relevance as well. And you could do things like, in the context of eCommerce, for example, you could say, “Get me all products that are similar to this product.” Or get me products that I’d recommend based on this user behavior in the past, or… Whatever you construct your model on, you can bring your embeddings into Typesense and have Typesense do a nearest neighbor search… And this is actually another example of something that users asked us for, and essentially said, “We’d have to start looking at using two different services if it’s not built into Typesense.” We started looking into it, and decided – we’re essentially right now building it actively with users, so I’m super-excited about that, and I think it’s going to open up a whole… So far, I’ve always had to tell people that, “Hey, we don’t have any AI or ML-related features”, and that is going to change very shortly, so I’m super-excited about that.

Awesome. Sounds cool. When does it drop, Jason? When does it drop?

[laughs] Oh, it’s actually already available in an RC build. We just selectively give it out to folks. So if anyone’s listening and wants to try our vector search in Typesense, I’d love to get some feedback before we publicly launch it. But we don’t have a fixed timeline for releases. That’s another maybe unique thing we do. We just essentially collect sufficient volume of features, and then once we think “Okay, this is a good chunk of volume to put out as the next release”, we promote the latest RC build as the GA release. So it varies from two months to sometimes four months between GA releases.

What’s the best way to get in touch with you if they want to try that out?

I’d say just sending an email to support@typesense.org. That’d be good. Or just DM me on Twitter. I have my DMs open; my Twitter handle is @JasonBosco. I’m happy to – or join our Slack community, of course, and then mention it there.

What’s left? What have we not asked you? Is there anything we haven’t asked you yet that you want to share before we let you go?

I think we’ve covered good ground here… Yeah, I think we’ve covered…

Everything. We’ve covered it all.

Yeah. [laughs] I can’t think of anything that we haven’t talked about. You guys did good.

Cool.

We did our job, Adam…

Nice.

It was a good breadth of topics.

No stones unturned, all the crevices examined… Jason, thank you so much for your time. Thank you for your commitment to open source. Thank you for coming on this show and sharing your wisdom. We appreciate you.

Of course, yeah. Thank you for having me on. This was a great conversation, Adam. And thanks, Jerod.

Thank you.

Our transcripts are open source on GitHub. Improvements are welcome. 💚
