JS Party – Episode #276

The ORMazing show

with Joist ORM creator Stephen Haberman

Nick & KBall sit down with the brilliant Stephen Haberman to discuss all things ORMs! 💻🔥

From the advantages and disadvantages of ORMs in general, to delving into the intricacies of his innovative project Joist, which brings a fresh, idiomatic, ActiveRecord-esque approach to TypeScript. 🚀

So sit back, relax, and let’s dive deep into the world of ORMs with the experts!

Sponsors

Fastly – Our bandwidth partner. Fastly powers fast, secure, and scalable digital experiences. Move beyond your content delivery network to their powerful edge cloud platform. Learn more at fastly.com

Fly.io – The home of Changelog.com. Deploy your apps and databases close to your users. In minutes you can run your Ruby, Go, Node, Deno, Python, or Elixir app (and databases!) all over the world. No ops required. Learn more at fly.io/changelog and check out the speedrun in their docs.

KBall Coaching – Free exploratory coaching sessions from JS Party co-host KBall!

Chapters

1 00:00 It's party time, y'all 00:56
2 00:56 Welcoming Stephen 01:41
3 02:37 Did you hear that, Jerod? 00:39
4 03:16 Defining "ORM" 01:53
5 05:10 Why write a new ORM? 06:37
6 11:47 The data loader pattern 05:20
7 17:07 Joist vs ActiveRecord 04:42
8 21:49 Why Joist uses type hints 07:07
9 28:56 When Joist uses promises 03:20
10 32:16 Joist's API details 03:34
11 35:58 Sponsor: KBall Coaching 00:34
12 36:43 Where the biz logic goes 03:33
13 40:16 Lifecycle hooks 04:26
14 44:42 No cycles please 01:21
15 46:03 Batching & ergonomics 04:52
16 50:55 Arguments against ORMs 04:14
17 55:09 Thinking in entities 05:18
18 1:00:27 Reactivity in Joist 06:30
19 1:06:57 KBall is excited 02:54
20 1:09:51 Wrapping up 01:18
21 1:11:15 Next up on the pod 00:55

Transcript

šŸ“ Edit Transcript

Changelog

Play the audio to listen along while you enjoy the transcript. šŸŽ§

Hoy-hoy. Welcome to another exciting JS Party. I’m your host this week, Nick Nisi, hoy-hoy, and I am here to do an ORMazing show with KBall. KBall, how’s it going?

I do not have a pun planned ahead to one-up you there, so I’m gonna let you win that… But I’m excited. I like ORMs. I think there are a lot of interesting conversations right now about interfacing with SQL; especially, I’ve been playing around with using AI to talk to SQL, and things like that… So I think we have some opportunity for conversations about what the right levels of abstraction are for talking to our data. So I’m looking forward to it.

Yeah, I like that. And because I can barely spell ORM, we do have a guest to help us out with that as well. He’s the creator of Joist ORM, Stephen Haberman. Stephen, how’s it going?

Yeah, I’m doing fine. Thanks for having me.

Awesome. Tell us about yourself.

Sure. I’m a software engineer, I’ve been doing this for a little while… And yeah, I don’t know, I got started back in the day doing Java. Maybe that’s gonna –

We won’t hold it against you, it’s okay.

Oh, don’t hold it against me. But I actually did quite a bit of that for a while, and enjoyed it… And yeah, about four or five years ago I found this little thing called TypeScript, and I was just really surprised. I love the type system. And back in the day, seeing people do Node circa 2010, the only thing you had was callbacks. It was crazy. But the JavaScript language, and TypeScript on top of it, has grown so much, with the evolution of async/await, and just everything that they’ve thrown into the language. It’s been really impressive. And so I’ve been doing that lately, and I really enjoy it.

Did you hear that, Jerod? Did you hear all of that praise for TypeScript? [“I’m not listening… I’m not listening…”]

That’s right. It used to be you were trying to convince me, but I’ve been a convert for the last couple of years, so… Jerod might be the one regular on the show who is still a holdout.

Ah yeah. Well, that’s fair. I mean, a lot of what TypeScript has been able to do, for sure, is built on top of the evolution of core JavaScript itself. All of the async/await stuff in the core language has certainly benefited TypeScript a lot; they got a lot from that.

Now, you’re joining us to talk about ORMs… So maybe it’d be appropriate to start with a definition of what that is?

Oh, sure. Yeah. I mean, it’s an Object Relational Mapper, so basically you’ve got a relational database floating around that you’d like to talk to as ergonomically as possible; you know, kind of the OG way of doing that is raw SQL strings. Every time you want to talk to your database through SQL, it’s select star from this, and this… Actually, I’m pretty – we can get into the pros and cons of ORMs later, but for the more esoteric SQL that you do, I’m a big fan of doing custom SQL when we need to. But for the 80% to 90% of your code that is like “I just want this author entity. I just want this book –” Well, maybe I’m playing my hand a little bit by saying the word “entity”, I guess, but “I just want this author row, or the book rows for this author.” I think it can start to get kind of tedious to do SQL selects for every single little boilerplate thing. And so I think that’s where ORMs come in. Back in the day, you could see ones that tried to do everything. But I think if you stay in the circle of competence, of just treating your database as an object graph for 80% of things, then you can very ergonomically get your data out of your relational database… There are reads, but then also writes: writing back what you’ve changed. I guess, yeah, there’s a spectrum of how fancy ORMs go, insofar as “Is it just doing queries, or is it also doing the reads and the writes back?” But as you go along the spectrum to fancier ORMs, you’ll start to see a lot of things like applying business logic around your writes, such as validation rules and lifecycle hooks; you can get into some of the reactivity that Joist is bringing to the table, and really leaning in on encapsulating business logic and business rules in a central location, beyond just “I’m reading and writing data.”

Well, can we start then with – since there are so many different directions that ORMs take, and different trade-offs that they can make, what inspired you to write a new one?

Yeah, that’s a good question. So I think Joist came around probably circa 2019 or so… The problem we were solving at the time was standing up a new tech stack, a very stereotypical tech stack at the time, where we had GraphQL on the backend and React on the frontend, and we were using Apollo and Postgres. I love Postgres. And yeah, we were just trying to find the most ergonomic way of standing up our backend. And when looking around for other tools at the time, there’s the class of tools out there, like the Hasuras and the PostGraphiles, that are super-ergonomic in terms of directly mapping your database schema to your GraphQL public API. Super-amazing. But you know, I kind of touched on that business logic thing… We had been wanting to find a way to do 80% to 90% of it by just taking your database schema and making your GraphQL API out of that, for the common case… But there’s always this last 10% to 20% where, with the Hasuras and the PostGraphiles, you can start to miss out: “Well, I don’t want my GraphQL API to exactly be my database schema.” And so that ruled those out… And yeah, just from my past, I’d used enough ORMs kind of like those. I had used TypeORM in the past, and I was just looking to use something else.

[06:37] But we started out with MikroORM, which is actually really great. I still like it. And you can tell: probably 60% to 70% of Joist’s API matches Mikro, because our codebase was on Mikro for probably six to nine months before we flipped over to Joist.

And the big reason for moving away from Mikro - I mean, there was nothing wrong with it, but we were very much in the GraphQL environment, where it’s so easy to do N+1s. And so we were really looking for an ORM that would build in data loading, the whole Facebook DataLoader pattern, where you wait until the next event tick to see whatever happened, and then at the end of the event tick you’re like “Oh, you asked for 10 authors in this one event tick. Instead of 10 SQL calls, I’m just gonna do one SQL call for all 10 authors, with a ‘WHERE IN’.” And I actually had a pull request into Mikro to start to do that… And it worked, but Mikro was already mature enough at the time; it was probably already – I really haven’t kept up on it. I liked the Mikro author. He was great to work with. But it was probably v – I’m gonna make up v3 or v4… I don’t know, it was a little while ago. But it was already a pretty mature codebase. So it just wasn’t as easy to wander in and put data loader into the guts of the ORM after it had already been established, and that sort of thing.

And so really, that was it. That, and then the other thing was from my days working on TypeORM. And again, I haven’t worked with TypeORM for five years at this point… But the biggest frustration I remember, with the disclaimer that I have no idea what it looks like these days… was that it was incredibly opaque whether your collections were loaded or not, or even any relation. So you might go get an author, and so – oh, to go back to one of the things that ORMs are really good at, which is lazy-loading parts of your object graph as your business logic needs them. So you might start at an author, your endpoint is like “Do something with an author”, and so you get the author, and then you do some business logic, and you’re like “Oh, I need the books.” And then you do some more business logic, and you – oh, you need some book reviews. And so ORMs are really good at making it ergonomic to load more and more of your little subgraph as you go. But the trade-off is that you start out with it not loaded. So you start out with an author, and you don’t have the books yet, and you don’t have the book reviews. And my recollection of TypeORM was that it didn’t have a way of representing these two states in the type system. And I’m trying to remember; I think you could do things like, for the author and the books, tell it that the books are always loaded. But that would mean every time you touched an author, you de facto brought back the books. But then at least in the type system you were guaranteed for the books to have been loaded… But you rarely want that, precisely because lazy-loading the object model is de facto what ORMs are good at.

And so, tangenting back to why Joist came around… The two a-ha’s that kicked off Joist were: “I want to build in a data loader from day one, for every single lazy-loaded call.” And the other one was figuring out a way in the type system to represent the two states of a collection that’s not loaded until I ask for it to be loaded with a populate hint, or load hint. The load hints and populate hints are not novel to Joist; they go back to Active Record, and I’m sure other ones before, where you start with an author, but before you go in and play with the guts of the author, you say “By the way, I know I’m gonna want the books and the book reviews loaded”, so you give a little hint upfront, like “Please, go get those for me”, and then your business logic after that can use them.

[10:02] So you’ve always had to do that with ORMs, like the Active Records and TypeORMs of the world. What was novel with Joist at the time was that this transition changed the types. So you would start with the author, and you couldn’t do books.get; you would have to do books.load, which was a promise, and then for every book you’d have to do book.reviews.load, and that would be a promise… So it’s safe by default, which is one of the things I didn’t like about TypeORM: by default, the collections would look like you could call get, but it would turn into a runtime exception if you hadn’t made extra-sure that you had done a populate hint 10 or 20 lines up, or even in a completely separate method. That’s where I think this can really break down. I remember – even Active Record still kind of has problems with this, where you might have an endpoint that kicks off and loads an author in Active Record, and then eventually you get into abstractions; you call this helper method, and this helper method, and at some point one of these helper methods is going to need data that you didn’t remember to populate, not only 10 or 20 lines up, but way off in some other method, in your endpoint method.

Yeah, so anyway… So with Joist – TypeScript mapped types are just so neat. So I’d been playing around with a prototype of like “Can I have a domain model that is inherently unloaded, and then overlay this type hint of like ‘Please ask the ORM to go load the data’?”, which both goes and loads the data from the SQL database and marks in the type system that all of these are now loaded, so you can do gets.
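The mapped-type idea can be sketched in TypeScript. This is a minimal illustration under stated assumptions, not Joist’s actual API: `Collection`, `LoadedCollection`, `LazyCollection`, `Loaded`, and `populate` are hypothetical names, and the hint here is a flat list of relation names rather than a nested hint. An unloaded `Author` only exposes `books.load()`; after `populate`, the same object is re-typed so that `books.get` compiles.

```typescript
// Hypothetical relation types for illustration; Joist's real names and details differ.
interface Collection<T> {
  load(): Promise<T[]>;
}

// After populating, the same relation also offers synchronous access.
interface LoadedCollection<T> extends Collection<T> {
  readonly get: T[];
}

class LazyCollection<T> implements LoadedCollection<T> {
  private loaded: T[] | undefined;

  constructor(private readonly fetchRows: () => Promise<T[]>) {}

  async load(): Promise<T[]> {
    if (this.loaded === undefined) this.loaded = await this.fetchRows();
    return this.loaded;
  }

  // The types only expose this after populate(); the runtime check is a backstop for casts.
  get get(): T[] {
    if (this.loaded === undefined) throw new Error("collection not loaded");
    return this.loaded;
  }
}

interface Book {
  title: string;
}

interface Author {
  name: string;
  books: Collection<Book>; // unloaded by default: only .load() is visible
}

// Mapped type: every relation named in the hint H becomes a LoadedCollection.
type Loaded<T, H extends keyof T> = Omit<T, H> & {
  [K in H]: T[K] extends Collection<infer U> ? LoadedCollection<U> : T[K];
};

// Loads the hinted relations, then re-types the same object as loaded.
async function populate<T, H extends keyof T>(entity: T, hint: readonly H[]): Promise<Loaded<T, H>> {
  await Promise.all(hint.map((key) => (entity[key] as unknown as Collection<unknown>).load()));
  return entity as unknown as Loaded<T, H>;
}
```

Writing `author.books.get` on a plain `Author` is a compile-time error, because `Collection<Book>` has no `get`; `(await populate(author, ["books"])).books.get` type-checks and is synchronous.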

So I went on super-long tangents, but those are the two… You know, once I had those figured out, both of those seemed novel enough to be like “Okay, okay, now it’s worth taking what had been musings and turning them into an actual project”, and that’s what kicked off Joist.

There’s a number of threads there that we could pull on… I want to start first maybe by going down the data loader thread. So I’m pretty familiar with the data loader pattern, but I don’t know that everyone listening to this podcast will be. So can you kind of explain what that pattern is, and what it enables?

Yeah, sure. So data loader is just a way of doing automatic batching of almost anything in your system. And it comes out of Facebook… And maybe the niftiest thing about data loader is that it just works on top of raw promises. So I don’t know if this is maybe too academic of a tangent, but the concept of data loading is not super new. Facebook has a library called Haxl that builds this concept in Haskell… And that was from 10 or more years ago. And then somebody had written a version of that in Scala; they had kind of ported it to Scala, and I’m gonna forget the name of it… There were maybe two libraries in Scala that did it… And we had used it, again, about 10 years ago, when I was at a Scala shop; at least in terms of the patterns. So I think it’s kind of easy to talk about the non-JavaScript version of the patterns. Instead of directly loading the data, you would make little – I’m gonna go ahead and use the word promise, because that’s what it ends up being in JavaScript… Which is why it works so well. But back in the Scala days, you would say “Well, I’m going to have an author, and I’m going to go loop over the author’s books, and I want this book, this book, this book.” But in these old libraries, instead of literally getting the book, you would make a little AST of like a promise to get this book, a promise to get the next book, a promise to get the next book, and you’d kind of have a layer of your intention to get the data, and then you would ask the runtime system, like the Haxl runtime system, or whatever it’s called, “Okay, now please go do that.” And it would have the smarts to say “Oh, I can see that you’re trying to load 10 books.” Instead of making a book wire call, book wire call, book wire call, which is an N+1 of a whole bunch of IO calls, a whole bunch of wire calls, Haxl and the other libraries would recognize that you’re doing essentially the same operation, just for a different where clause: “So can I do one operation, one wire call, for all 10 books?” And the Facebook DataLoader library - yeah, it does exactly the same thing.
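The older “describe, then run” approach can be illustrated with a toy TypeScript sketch. It is not Haxl’s API or any Scala library’s: `FetchBook`, `FetchMany`, `program`, and `run` are hypothetical names. Building the program performs no I/O; the runtime sees every request before running any of them, dedupes the ids, and makes a single batched call.

```typescript
// A request describes work to do; constructing one performs no I/O.
interface FetchBook {
  kind: "book";
  id: number;
}

// Hypothetical batched backend call: one round trip for many ids.
type FetchMany = (ids: number[]) => Map<number, string>;

// The "program": a plain list of intended fetches, including a duplicate id.
const program: FetchBook[] = [1, 2, 3, 2].map((id) => ({ kind: "book" as const, id }));

// The "runtime": inspects all requests first, dedupes ids, and makes one call.
function run(requests: FetchBook[], fetchMany: FetchMany): string[] {
  const uniqueIds = Array.from(new Set(requests.map((r) => r.id)));
  const results = fetchMany(uniqueIds);
  // Hand each request its own answer, in the order the program asked.
  return requests.map((r) => results.get(r.id) ?? "missing");
}
```

Running `program` makes one `fetchMany` call with ids `1, 2, 3`, yet still returns four answers, one per request. DataLoader gets the same effect without an explicit interpreter by letting promises and the event loop play the runtime’s role.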

[14:02] The innovation there is that they realized you can use the event loop for what, in other languages, needs some sort of little interpreter thingy. I don’t know if this is meandering and kind of technical, but the idea is that, sure, if I’m going to take an author, and I want to loop over the books… Of course, I want to load each book; each of those is going to be a promise. So I go to load book one, I get a promise; load book two, get a promise; load book three, get a promise; load book four, get a promise.

And what data loader lets library authors do is - since the promise has to wait anyway, you don’t get the book right away. The promise is like “Well, I’ll give it to you in a little bit.” And what data loader does is, before it even kicks off the wire call - so you ask for book one, book two, book three, book four - it doesn’t immediately do the wire call, but it puts a flush onto the event loop that says “Okay, do this wire call at the end of the event loop”, which will come up right away, because event loops are just going to spin, spin, spin. And so it puts basically a little flush event at the end of the event loop that, when it gets hit, lets the library author, or whoever’s using data loader, realize “Oh, okay, there have been 10 promises that asked for books. Before I actually kick off the wire calls, I’m gonna condense those down into one wire call that will load 10 books all at once.”

And data loader is really pretty small… It’s probably, I don’t know, 115 lines of JavaScript, maybe something like that… And it could auto-batch anything; it could do it for microservice calls, it could do it for SQL calls… It’s really very generic, and it’s really awesome how well it fits into the event loop and into JavaScript.
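The collect-then-flush behavior can be sketched in a few dozen lines. This is a simplified illustration, not DataLoader’s source: `BatchLoader`, `fetchAuthorsByIds`, and the in-memory `authorsTable` are hypothetical, the wire call is simulated with a counter, and the flush is scheduled as a single microtask, so it only coalesces loads issued in the same synchronous run (DataLoader’s own scheduling, caching, and error handling are more involved). The batch function also does the “demultiplexing” step, re-aligning rows with the requested keys, since a `WHERE id IN (...)` query does not promise any particular order.

```typescript
// Hypothetical row type and in-memory table standing in for a real `authors` table.
interface AuthorRow {
  id: number;
  name: string;
}

const authorsTable: AuthorRow[] = Array.from({ length: 10 }, (_, i) => ({
  id: i + 1,
  name: `author ${i + 1}`,
}));

let wireCalls = 0; // counts simulated round trips to the database

// Stand-in for `SELECT * FROM authors WHERE id IN (...)`; row order is not guaranteed.
async function fetchAuthorsByIds(ids: readonly number[]): Promise<AuthorRow[]> {
  wireCalls++;
  return authorsTable.filter((row) => ids.includes(row.id));
}

interface Pending<K, V> {
  key: K;
  resolve: (value: V) => void;
  reject: (err: unknown) => void;
}

// Minimal batching loader in the spirit of DataLoader (not its real implementation).
class BatchLoader<K, V> {
  private queue: Pending<K, V>[] = [];

  constructor(private readonly batchFn: (keys: readonly K[]) => Promise<V[]>) {}

  load(key: K): Promise<V> {
    return new Promise<V>((resolve, reject) => {
      this.queue.push({ key, resolve, reject });
      // The first load of a batch schedules one flush as a microtask;
      // every other load issued in the same synchronous run just joins the queue.
      if (this.queue.length === 1) {
        void Promise.resolve().then(() => this.flush());
      }
    });
  }

  private async flush(): Promise<void> {
    const batch = this.queue;
    this.queue = [];
    try {
      // One call for all collected keys; batchFn must return values in key order.
      const values = await this.batchFn(batch.map((p) => p.key));
      batch.forEach((p, i) => p.resolve(values[i]));
    } catch (err) {
      batch.forEach((p) => p.reject(err));
    }
  }
}

// Batch function: one query, then re-align rows with the requested keys.
const authorLoader = new BatchLoader<number, AuthorRow | undefined>(async (ids) => {
  const rows = await fetchAuthorsByIds(ids);
  const byId = new Map(rows.map((row): [number, AuthorRow] => [row.id, row]));
  return ids.map((id) => byId.get(id));
});
```

Calling `authorLoader.load(id)` for ids 1 through 10 inside one `Promise.all` produces ten promises but only one `fetchAuthorsByIds` call; an unknown id simply resolves to `undefined`.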

So if I were to kind of replay that back and simplify it a little bit, it’s taking the concept that any sort of call that goes over the wire, whether it’s to a database or an API, is slow, and has a sort of intrinsic latency, whether you’re fetching one book, to use your books example, or 10 books. And so it says, “Okay, instead of optimistically fetching as soon as you ask for it, I’m going to accumulate the set of requests that you have into an array, and wait till the next time that data is requested, or rather the end of the event loop, and now I have a list of things; I’m going to try to condense that list of things into a single query and pull it back.” And I think with data loader you actually define how that condensing happens. So you get a list of keys, passed off to a function you define, and you say, “Okay, I have a list of keys, I know how to batch this into a single query, grab that, and then I hand back that list.” And so data loader is kind of letting you multiplex a set of queries into a single query that goes over the wire, and then demultiplex that data back out to all the requesters that asked for it.

Yeah, that’s a great point, that you have to both combine and then uncombine, because that’s also important on the other side. You get back one big batch response for all your 10 books, but you’re like “Well, I’ve got to give the first book to the first caller, and the second book to the second caller”, and do that bookkeeping. Yeah, that’s a great explanation.

So one thing that leads me to - this is a question kind of specifically for Joist… And this goes back, I think, to how Active Record would do this. So Active Record did not have the data loader pattern, but it still would lazy-load. And so within a particular kind of query, you could accumulate a set of queries. So you might say, “Hey, I’m loading a bunch of authors, and for each of them I want their books, and each of whatever”, and then at the end, when you actually go to load that data, that’s when it issues that query. And it tried to do something smart, where sometimes it would issue a set of queries, and sometimes it would issue a single query with joins.

Where I have seen data loader used in the past, it was usually accumulating on a per-table basis; it was not trying to do any of that joining. But I’m curious: with Joist you potentially have more of a global view, or a global way of hooking this in. Are you able to accumulate queries across tables?

[18:02] Yeah. Well, actually, we could. We don’t right now. We could, in some cases. You don’t always know what the next level of table you would need is; it depends on whether you give it an explicit populate hint. So one of the great things I think about Joist is that you don’t actually need to use populate hints. So as far as I understand with Rails, in your example, if you knew your business logic was gonna - you know, “I have an author, and I’m gonna go through all the books, and go through all the reviews”, if you wanted to make that… And you’re saying you can either make one giant query to join in all three levels of data, or you could do one query per level.

Exactly.
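The difference between one query per row and one query per level can be counted with a small sketch. Everything here is hypothetical: the in-memory tables stand in for SQL tables, `booksFor` and `reviewsFor` stand in for `WHERE author_id IN (...)` and `WHERE book_id IN (...)` queries, and they are synchronous for brevity. This is not Joist or Active Record code.

```typescript
// Hypothetical in-memory tables standing in for SQL tables.
interface Book {
  id: number;
  authorId: number;
}
interface Review {
  id: number;
  bookId: number;
}

const bookRows: Book[] = [];
const reviewRows: Review[] = [];
// 3 authors, 2 books each, 2 reviews per book.
for (let a = 1; a <= 3; a++) {
  for (let n = 0; n < 2; n++) {
    const bookId = a * 10 + n;
    bookRows.push({ id: bookId, authorId: a });
    reviewRows.push({ id: bookId * 10, bookId }, { id: bookId * 10 + 1, bookId });
  }
}

let queryCount = 0;

// Each call simulates one round trip: SELECT ... FROM books WHERE author_id IN (...)
function booksFor(authorIds: readonly number[]): Book[] {
  queryCount++;
  return bookRows.filter((b) => authorIds.includes(b.authorId));
}

// Each call simulates one round trip: SELECT ... FROM reviews WHERE book_id IN (...)
function reviewsFor(bookIds: readonly number[]): Review[] {
  queryCount++;
  return reviewRows.filter((r) => bookIds.includes(r.bookId));
}

// N+1 style: one query per parent row, at every level.
function countReviewsNaive(authorIds: readonly number[]): number {
  let total = 0;
  for (const id of authorIds) {
    for (const book of booksFor([id])) total += reviewsFor([book.id]).length;
  }
  return total;
}

// Per-level style: one batched query per level of the graph.
function countReviewsPerLevel(authorIds: readonly number[]): number {
  const books = booksFor(authorIds);
  return reviewsFor(books.map((b) => b.id)).length;
}
```

For authors 1, 2 and 3, both functions count 12 reviews, but the naive version issues 9 queries and the per-level version issues 2. A single query joining all three levels would make it one round trip, at the cost of knowing the full shape of the load up front.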

So the nice thing with Joist is - I’ll go to maybe why we don’t do that right now… With Joist you don’t even need that upfront populate hint. Because of data loader, you can just say load the author, and then –

Right, you can accumulate all the levels asynchronously.

Right. And so if you do that, we don’t know what you’re going to need ahead of time. After we’ve loaded all the books, you go in and you’re like “Oh, I also need the book reviews”, and we’re like “Oh, you also need the book reviews. We’ll get those for you.” The populate hints are what would tell us ahead of time that we need multiple levels of data. So if you’re not using them, then no, we couldn’t. But if you do use them, we could, technically. We don’t right now, for two reasons. One, we just haven’t gotten to it yet. But the other reason goes back to the Hasuras of the world that take the database exactly and map it out to your GraphQL API. Joist also wants to do that for 80% of things, but there’s this other 20% of things in your domain model that are not necessarily strictly in the database, so derived properties and calculations that you might just want to do in-memory. And for the most part in Joist those look the same, in kind of a good way. If you’re consuming the domain model, you can’t always tell which collections are literally from the database and which are more of a derived value… And it just ended up being easier for our initial implementation of the populate logic to not have to subdivide those, because to do the join thing, you really have to be getting everything literally from the database. And we could go back and implement that, but we’d have to split the populate hint into like “Okay, here’s the half of the populate hint that is literally coming from the database, and this other part of the populate hint is derived collections and derived fields that we can’t literally put in the join.”
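Why derived values complicate a JOIN-based populate can be seen in a sketch where stored and derived relations share one interface. `Relation`, `StoredRelation`, and `DerivedRelation` are hypothetical names, not Joist classes, and the stored relation’s “query” is just a resolved promise. A consumer cannot tell the two apart, but only the stored one corresponds to SQL; the derived one runs in memory on top of another relation, so it has nothing to contribute to a join.

```typescript
// Hypothetical read-only relation interface shared by stored and derived relations.
interface Relation<T> {
  load(): Promise<T[]>;
}

interface Review {
  rating: number;
}
interface Book {
  title: string;
  reviews: Review[];
}

// Backed by the database (simulated here by an async query function).
class StoredRelation<T> implements Relation<T> {
  constructor(private readonly query: () => Promise<T[]>) {}
  load(): Promise<T[]> {
    return this.query();
  }
}

// Computed in memory from another relation; it has no SQL of its own.
class DerivedRelation<S, T> implements Relation<T> {
  constructor(
    private readonly source: Relation<S>,
    private readonly derive: (rows: S[]) => T[],
  ) {}
  async load(): Promise<T[]> {
    return this.derive(await this.source.load());
  }
}

const books = new StoredRelation<Book>(async () => [
  { title: "b1", reviews: [{ rating: 5 }] },
  { title: "b2", reviews: [{ rating: 2 }] },
]);

const wellReviewed = new DerivedRelation<Book, Book>(books, (rows) =>
  rows.filter((b) => b.reviews.some((r) => r.rating >= 4)),
);

// The consumer cannot tell which relation is stored and which is derived.
async function titlesOf(rel: Relation<Book>): Promise<string[]> {
  return (await rel.load()).map((b) => b.title);
}
```

A populate step that wanted to emit one JOIN would have to separate hint entries backed by `StoredRelation` from those backed by `DerivedRelation`, which is the split described in the answer above.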

Interesting. So if I understand correctly, you’re kind of allowing someone to do a JavaScript-level derived view or computed field of some sort. And at the Joist level, you’re actually able to treat those pretty much the same way.

Yeah. A lot of the Active Record things, like has-many-through – we have terms that kind of come from Active Record, but they don’t map one to one… So I’m forgetting if I’m saying the Joist one or the Active Record one. But has-many-through, or has-one-derived, has-many-derived, and these sorts of things. Little abstractions that are super-commonly used in your domain model, that are little shortcuts. Almost like Rails scopes. I hadn’t done a lot with Rails scopes. Actually, I haven’t done a lot with Rails. I worked in Rails for maybe six months, circa 2008, or something like this… And then I was in a shop that was moving off of a Rails monolith onto other things. So I haven’t done a lot in it, but I did enough to be like “Oh, wow, this is nice.” Just the ergonomics of Rails are –

The ergonomics of Active Record are, as far as I’ve seen, unparalleled.

Totally agree.

It’s really powerful.

And so I’ve done just enough to know that I want to – I come back to Ruby… At least in the past, it didn’t have types. I’m enough of a static typing bigot that I like to play in the languages that have types. But for sure, 100%, I use Active Record as an inspiration, as the gold standard.

[21:49] Another direction I’d like to dig into a little bit is this piece you were talking about, being able to check with the types whether something is preloaded or not. And I guess a part of me is wondering why you need it; let me flesh that out a little bit more. One of the benefits of going to a fully asynchronous model, which is what you need to do to take advantage of data loader, is that I as a caller don’t need to know whether you’ve loaded that data or not. I treat it the same way. Because my code already has to be asynchronous, I’m already having to treat it as if it might come back at some undetermined time in the future; I already have to await it if I need it now. So why do I need a type hint to say “This is preloaded versus not”? I can treat it the same, regardless.

Yeah. No, that’s a great question. I think it’s just that async programming can be pretty tedious once you get down to – it’s great for this sort of dynamic discovery, where your system (or at least the data loading side of it) evolves or whatnot. But I think the best example I can think of is, like Nick from the talk a week or two ago, if you just look at “I’m gonna have a book, and I’m gonna go over the – I have an author, and I’m gonna go over the books”, and you do books.load, okay, there’s an await… And that’s not too bad. But now you’re gonna have to do .map, and now each of your book.reviews.load calls is going to be a promise… And all of a sudden, once you’ve got for loops, and maps, and filters that are doing asyncs, it starts to get kind of ugly.

So when you know you’ve got, like, 10 lines of business logic that you want to write right here, in scope, but it’s two or three levels deep, and you’re gonna await Promise.all, and do an async map or an async filter for every single little operation, it just gets super-tedious. And I think – I don’t have any examples at hand, but I like to show off the before and after of like “Yeah, sure, here’s how you can do it with a bunch of .loads.loads.awaits.awaits”, and all these other sorts of things. But if you do the preload hint, now it’s just .gets, and it’s all synchronous; you can do flat maps, and filters, and all sorts of things, just like your regular collections, your regular lists and maps, as if they’re in memory. All of a sudden code that is gnarly goes to literally two or three lines of functional programming. I think it’s a huge, night and day difference.
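The before-and-after described here can be sketched with hypothetical shapes: `LazyAuthor` and `LazyBook` expose a promise at each hop, while `LoadedAuthor` and `LoadedBook` expose plain arrays, as they would after a preload hint. These are illustrative types, not Joist’s. Both functions return the titles of books with at least one review rated 4 or higher.

```typescript
interface Review {
  rating: number;
}

// Unloaded graph (hypothetical shapes): each hop is a promise.
interface LazyBook {
  title: string;
  loadReviews(): Promise<Review[]>;
}
interface LazyAuthor {
  loadBooks(): Promise<LazyBook[]>;
}

// Preloaded graph: each hop is a plain array.
interface LoadedBook {
  title: string;
  reviews: Review[];
}
interface LoadedAuthor {
  books: LoadedBook[];
}

// Async version: await and Promise.all at every level before any filtering.
async function wellReviewedTitlesAsync(author: LazyAuthor): Promise<string[]> {
  const books = await author.loadBooks();
  const pairs = await Promise.all(
    books.map(async (book) => ({ book, reviews: await book.loadReviews() })),
  );
  return pairs
    .filter(({ reviews }) => reviews.some((r) => r.rating >= 4))
    .map(({ book }) => book.title);
}

// Synchronous version, once books and reviews have been preloaded.
function wellReviewedTitles(author: LoadedAuthor): string[] {
  return author.books
    .filter((book) => book.reviews.some((r) => r.rating >= 4))
    .map((book) => book.title);
}
```

The logic is identical; the async version needs an extra `Promise.all` and intermediate pairs just to get the reviews in hand, and each additional level of the graph adds another round of that.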

That makes sense. Well, and it makes me wonder if there’s room for a utility that basically does that, even in a non-preloaded case, where it’s like “I’ve got all this data. It might be loaded, it might not be loaded. I want to flatten it out into synchronous, so that I can just work with it.”

That sounds a lot like what Joist does, yeah. Granted, it’s super-baked into an ORM. So if you’re using – kind of like how generic data loader is; could you do the same thing for – I mean, to go back to your question of why we don’t do the SQL loads for joining three levels of data… Which I do want to do someday, but - it is actually pretty generic. Right now the Joist code that does this little preloading thing is, in a good way, pretty agnostic about the fact that it’s asking for SQL to be executed; it just sees them as promises that it then – anyway.

It does seem like – it feels like an interesting problem to make that aggregation into a join work, even potentially in the async case. Because if you had Joist as a sort of central pivot point for data loader, you could write a single data loader function that receives all of your different classes of loads and aggregates them, and it could distribute them out into distinct underlying fetches… Or it could say “Hey, you know what? These things are connected. I can join them, because I know the relationships between the parts of the objects.”

Yeah, I hadn’t thought of it that way in terms of doing it… Yeah, I think when I thought of how we would implement it in Joist, I was leaning more into “Well, we’ve got the populate hint, and we’ll lean into the populate hint to know ahead of time what we would need to do.” While you’re loading books, you know “Oh, I’m going to need book reviews.” That might be where it may not work without the populate hints. Unless you’ve got a hint explicitly telling you it’s going to want the next level of data, the runtime doesn’t actually know yet. Nobody has called book.reviews.load, so the data loaders wouldn’t have been primed to know that they need that yet.

[26:14] Oh, because you have to load the original author or book level before you have an object that you can call that loader on.

Yeah, 100%.

Yeah. And if you wanted to get around that, you’d have to fake out promises, and then it gets really complex… [laughter] Like, I could imagine returning a customized thing that wraps a promise, that is like “This is a book promise, and I can call on it to load book reviews, and it knows now to aggregate things.” But that’s probably more complexity than it’s worth.

Yeah, you’re like speculatively executing the next layer of business logic, I guess, to see what it’s going to ask for, something like that.

That’s kind of what I was thinking. It’s kind of extrapolating the data loader pattern of not fetching the data until the absolute moment you actually need a piece of data. And when you have the structure of the tables, which you do, you understand the structure of the relationships here; theoretically I don’t need to load the data on a book to understand how to load the associated book reviews. And so once again, in theory, you could create this – I don’t know whether to call it beautiful, or ridiculous, but like this crazy structure that is keeping track of “Okay, here’s a thing that’s going to represent –” Back to your point, like, “This is going to represent a book. It doesn’t yet”, but you can expose these operations on it, which will then follow down the relationships and do the correct joining and all of that mess. I don’t know that it’s worth it at all, but that’s where my head got to.

Yeah, I think the trick there would be like – I would imagine that the logic that you would execute on this pseudo book would expect it to look and smell so much like a book; it would want to do like “Is your title this?”, interacting with it in a way that – yeah, you could return a fake book, a null book, that just returned empty titles, and that sort of thing. But since you’re executing actual user business logic code against it, it seems like it could pretty easily trip up.

Yeah, I think the moment that you have to actually access data, now youā€™ve got to issue your queries and load. But you could, in theory, delay ā€“ and once again, I donā€™t know how much this would show up, but you could in theory delay up until that point, and maybe by then youā€™ve also told the system, ā€œHey, I want the reviews, and I want these other things.ā€ I donā€™t know that itā€™s worth it, but I just seemed like an interesting direction.

Iā€™d love to see the experimentation with data loader, because I feel like asynchronous data access patterns that are easy to use are an innovation that has happened, but are not widely spread. And so kind of exploring what those possibilities look like is fun.

Yeah, yeah, I agree.

Nick, I’ve been talking a lot, and you’re sort of sitting there, nodding. What’s on your mind? What are you thinking?

I’m mostly trying to keep up, I think. I think you answered the question that I had coming up about that example you had with loading books, and then loading reviews for the book… And I’m just curious how that looks from a types perspective. When you call load on the books, would you get that data, but then the reviews is like a promise that will resolve to that data, or an object that you have to call load on again? How does that work?

Well, so in the vanilla case, which is not using preload hints – yeah, you would say “I’ve got a book, and I’m gonna go read the reviews, and I’ll load the reviews”; that’s a promise. That gets you the book reviews back, and they just look like objects now. And so that’s pretty vanilla.

Now, I guess we kind of already started a second layer down in the graph, because now it’s our – let’s say book reviews have comments. And so if you had books and you load a bunch of book reviews, accessing the book reviews’ comments is going to look like a promise, because we’re not using any of the preload hints, and so to Joist it’s just a book review; there’s nothing that has told Joist to load the comments ahead of time. So you would do bookReview1.comments.load(), do an await, and that sort of thing.
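The vanilla case can be made concrete with a small, self-contained sketch. Here a relation is a holder object rather than an array: `load()` returns a promise, and a nested relation (a review's comments) needs its own `load()`. The classes `Book`, `BookReview`, `ReviewComment`, `LazyCollection` and the in-memory `db` are hypothetical stand-ins. They are not Joist's generated code, and this sketch does not batch anything.

```typescript
// Hypothetical in-memory rows standing in for the book_reviews and comments tables.
type Row = { id: number; parentId: number };
const db: { reviews: Row[]; comments: Row[] } = {
  reviews: [{ id: 1, parentId: 1 }, { id: 2, parentId: 1 }],
  comments: [{ id: 10, parentId: 1 }, { id: 11, parentId: 1 }, { id: 12, parentId: 2 }],
};

// A relation is a holder object, not an array: reading it means awaiting load().
class LazyCollection<T> {
  constructor(private readonly loader: () => Promise<T[]>) {}
  load(): Promise<T[]> {
    return this.loader();
  }
}

class ReviewComment {
  constructor(readonly id: number) {}
}

class BookReview {
  readonly comments: LazyCollection<ReviewComment>;
  constructor(readonly id: number) {
    this.comments = new LazyCollection(async () =>
      db.comments.filter((r) => r.parentId === this.id).map((r) => new ReviewComment(r.id)),
    );
  }
}

class Book {
  readonly reviews: LazyCollection<BookReview>;
  constructor(readonly id: number) {
    this.reviews = new LazyCollection(async () =>
      db.reviews.filter((r) => r.parentId === this.id).map((r) => new BookReview(r.id)),
    );
  }
}

// Without a populate hint, every level down the graph is another await.
async function commentCounts(book: Book): Promise<number[]> {
  const reviews = await book.reviews.load();
  return Promise.all(reviews.map(async (review) => (await review.comments.load()).length));
}
```

Written naively like this, each review's `load()` is a separate fetch. Batching those fetches is a separate concern that a data loader can layer underneath without changing this calling code.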

[30:20] But if you had done a thing where, like, in a book I want you to populate your book reviews, and I’m going to pass a little hash of populate the book reviews and their comments, a little nested hash of book reviews colon comments – that’s going to change the type of your book itself. Granted, you do technically have to have a new variable for that, because maybe you’re – although if you know this ahead of time, you can do const b = load this book for this primary key, and then pass your populate hint as the third argument to load, and then your b variable will already have the type overlaid on it. So there’s just the normal book type that has a title, and it has comments… But with the normal one, you have to load the comments as a promise. When you do this, passing the reviews-and-comments hint, you basically get a type overlaid on top of it, which all it does is add .get methods to the relations. So if you’ve told it that you know you want the comments now, when you do book.reviews… Or actually, since you’ve told it you want reviews and comments, you can now do book.reviews.get. So you do book.reviews.get, no promise, sub zero, it’s just a regular array, and you could do comments, and then you could do – and so now this comments is no longer just a… Comments would be a one-to-many, with a load. But with the type overlaid on top, we’ve added the get method to the comments property. So now instead of doing comments.load and awaiting that, you can do comments.get.length, or whatever. So it’s added a bunch of synchronous getters onto what would otherwise be relations that you have to .load and await. I don’t know if that was maybe what you were asking.
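The type overlay described here can be approximated in plain TypeScript. The sketch uses a relation type that only has `load()`, a loaded variant that adds a synchronous `get`, and a mapped type that upgrades only the relations named in a hint. `Collection`, `LoadedCollection`, `Loaded`, `Relation` and `populate` are names assumed for illustration. Joist's real types and hint syntax are richer than this single-level version; for example, they support nested hints such as reviews with their comments.

```typescript
// A relation that may not be in memory yet: only async access is allowed.
interface Collection<T> {
  load(): Promise<T[]>;
}

// The same relation once it is known to be loaded: adds a synchronous getter.
interface LoadedCollection<T> extends Collection<T> {
  readonly get: T[];
}

// Upgrade only the relations named in the hint H to their loaded form.
type Loaded<E, H extends keyof E> = E & {
  [K in H]: E[K] extends Collection<infer U> ? LoadedCollection<U> : E[K];
};

// One runtime class backs both views; the static type decides whether `get` is visible.
class Relation<T> implements LoadedCollection<T> {
  private items: T[] | undefined;
  constructor(private readonly loader: () => Promise<T[]>) {}

  async load(): Promise<T[]> {
    if (this.items === undefined) this.items = await this.loader();
    return this.items;
  }

  get get(): T[] {
    // Guard for code that bypasses the types; typed callers never reach this throw.
    if (this.items === undefined) throw new Error("relation not loaded; call load() first");
    return this.items;
  }
}

interface Book {
  title: string;
  reviews: Collection<string>;
}

// Await every relation named in the hint, then return the same object with a narrower type.
async function populate<E, H extends keyof E>(entity: E, ...hint: H[]): Promise<Loaded<E, H>> {
  for (const key of hint) {
    await (entity[key] as unknown as Collection<unknown>).load();
  }
  return entity as Loaded<E, H>;
}

async function reviewCount(book: Book): Promise<number> {
  // book.reviews.get would not compile here: Collection<string> has no `get`.
  const b = await populate(book, "reviews");
  return b.reviews.get.length; // synchronous access, no await
}
```

The runtime object is identical before and after `populate`. Only the static type changes, which is what makes calling `.get` on an unloaded relation a compile error rather than a runtime surprise.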

Yeah, yeah, I think so.

So the relations themselves are an object, and to access the underlying array, if it’s a one-to-many, or something like that, you use either .get, or .load and await.

Right. Which I think has thrown some people off ergonomically with Joist. Like, they would expect “If I take a book, and I’ve got the book reviews”, they want the book reviews to be literally an array of book reviews. And to your point, no, it’s not actually that. For book and book reviews it’s a – we call it a property, but it’s like a holder of them, that doesn’t have them yet. Or maybe it does. Like, if it doesn’t have them yet, you have to call .load.

Right. But that lets you abstract around that.

Yeah. It lets us change the type. Because if we couldn’t do that, it becomes hard to – yeah, book.bookReviews would sometimes be just an array, because you know it’s in memory… But sometimes it’s a promise of an array, when it’s not. And by making it a property – once you get used to it, it just blends into the background… But you do have to do .load if it’s not in memory, and .get if it is, and then it’s just as if it’s a regular array.

So if it’s not in memory, can you do .get and it will forcibly run the async, basically force you to await there?

No, it’ll blow up and say you can’t do that. And the assertion is that that will be a compile error; like, you’re never allowed to call .get in a way that would blow up at runtime without it being a compile error, because the type system knows it’s not loaded.

Got it. So it forces you to be explicit about when the data is going to be loaded…

…and not sort of accidentally force synchronization when you don’t want it.

[33:58] Right. Yeah, which works out really well. The thing I love the most about it is that it’s orthogonal to the N+1 prevention. So I think in a lot of ORMs this whole thing of like “Well, populating so you can do the .get” is super-ergonomic, which is great, but you have to do that to get N+1 prevention. You have to tell the ORM upfront that you want three levels of things. But with Joist, you can still do that; you can still use populate hints to get .gets, and use it in memory, and that’s super great. But you don’t have to. If your business logic is scattered over multiple functions that get called in a loop, and all of these other sorts of things, it’s still going to end up with N+1 prevention.

Right. Because if you don’t do the populate, then you’re forced to program in an async style. But that moves you into data loader land, which could do that aggregation for you, and you’re golden regardless. That’s beautiful. I like that. So that container package, where the relation is actually an object wrapping it – is that the same when you’re doing like a one-to-one relationship?
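The async style still avoids N+1 because of the data loader pattern. Every `load()` issued in the same tick is queued, and a single batched query runs for the whole queue. Below is a minimal hand-rolled loader that batches by microtask. It is not the `dataloader` npm package and not Joist's internals. `queryCount` and `reviewsByBook` are hypothetical stand-ins for database round trips and rows.

```typescript
// Minimal batching loader: keys requested in the same tick are resolved by one batch call.
class BatchLoader<K, V> {
  private queue: { key: K; resolve: (v: V) => void; reject: (e: unknown) => void }[] = [];
  constructor(private readonly batchFn: (keys: K[]) => Promise<V[]>) {}

  load(key: K): Promise<V> {
    return new Promise<V>((resolve, reject) => {
      this.queue.push({ key, resolve, reject });
      // The first key of a batch schedules the dispatch; later keys in the same tick just queue up.
      if (this.queue.length === 1) void Promise.resolve().then(() => this.dispatch());
    });
  }

  private async dispatch(): Promise<void> {
    const batch = this.queue;
    this.queue = [];
    try {
      const values = await this.batchFn(batch.map((b) => b.key));
      batch.forEach((b, i) => b.resolve(values[i] as V));
    } catch (e) {
      batch.forEach((b) => b.reject(e));
    }
  }
}

// Hypothetical stand-in for one "SELECT ... WHERE book_id = ANY($1)" round trip.
let queryCount = 0;
const reviewsByBook: Record<number, string[]> = { 1: ["a"], 2: ["b", "c"], 3: [] };
const reviewLoader = new BatchLoader<number, string[]>(async (bookIds) => {
  queryCount++;
  return bookIds.map((id) => reviewsByBook[id] ?? []);
});

// Per-book business logic, called in a loop, with no populate hint anywhere.
async function reviewCount(bookId: number): Promise<number> {
  return (await reviewLoader.load(bookId)).length;
}
```

Calling `reviewCount` for three books in the same tick issues one batched query rather than three. The per-book function never mentions batching, which is the "orthogonal" property described above.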

We have different implementations of those. Our term is just a – it’s a relation, anytime you point to another entity or collection of entities. And so you can have two categories of relations; some point to a single entity, which is a reference to another entity, and others are collections of other entities. And on the point-to-one side, you’ve got either a many-to-one, which is just a foreign key, or a one-to-one, which is an incoming foreign key. And then on the collections side, you can have one-to-manys, or many-to-manys. So yeah, we have four or five implementations, one for each of those things, that know how to have get methods if it’s in memory, and if it’s not, you call load… They each have their own data loader implementation, because batching a many-to-many is slightly different than batching a one-to-one, which is slightly different than batching each of the others. So they each have their own data loader implementation that will do the N+1 prevention.
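One way to see why each relation kind needs its own loader is to write down the single batched query each kind might issue for a batch of ids. Then split the result back out per parent. This sketch is illustrative only. The table and column names (`publishers`, `author_images`, `books.author_id`, `authors_to_tags`) and the Postgres `= ANY($1)` style are assumptions. This is not how Joist actually builds its SQL.

```typescript
type RelationKind = "manyToOne" | "oneToOne" | "oneToMany" | "manyToMany";

// One batched query per relation kind, for the ids a data loader collected in one tick.
// Table and column names are hypothetical.
const batchSql: Record<RelationKind, string> = {
  // Outgoing foreign key (authors.publisher_id): the batched ids are publisher ids.
  manyToOne: "SELECT * FROM publishers WHERE id = ANY($1)",
  // Incoming foreign key with at most one child row per parent.
  oneToOne: "SELECT * FROM author_images WHERE author_id = ANY($1)",
  // Incoming foreign key with many child rows per parent.
  oneToMany: "SELECT * FROM books WHERE author_id = ANY($1)",
  // A join table sits between the two sides.
  manyToMany:
    "SELECT t.*, j.author_id FROM authors_to_tags j JOIN tags t ON t.id = j.tag_id WHERE j.author_id = ANY($1)",
};

interface ChildRow {
  id: number;
  parentId: number;
}

// Split one batched result back out per requested parent, preserving request order.
function splitOneToMany(parentIds: number[], rows: ChildRow[]): ChildRow[][] {
  return parentIds.map((pid) => rows.filter((r) => r.parentId === pid));
}

// Same split, but enforce "at most one row" per parent.
function splitOneToOne(parentIds: number[], rows: ChildRow[]): (ChildRow | undefined)[] {
  return parentIds.map((pid) => {
    const matches = rows.filter((r) => r.parentId === pid);
    if (matches.length > 1) throw new Error(`parent ${pid} has ${matches.length} rows in a one-to-one`);
    return matches[0];
  });
}
```

The query shape and the per-parent result shape both differ by kind: a many-to-one needs no grouping, while a one-to-one must reject duplicates. That is a plausible reason for one loader implementation per kind rather than a single generic one.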

Cool.

Break: [35:50]

Can we talk now a little bit about how you enable putting business logic in? And I thought it was interesting looking at what Joist does compared to Active Record, where you want your ORM to be able to keep up with data migrations, right? Table changes, things like that. And in the Active Record world, this is all done magically behind the scenes; you don’t see it. Whereas if I was understanding it correctly, in Joist you actually have a sort of generated class that is visible, that presumably I don’t touch, because you’re going to be modifying that to keep up with the schema. And then you have another class that is where I can put my custom logic. Is that right?

Yeah. 100%.

I was gonna say, that might relate to a question I had, which was how you define those relationships.

Yeah, we can cover both. So the relationships, at least all of the ones that we’ve talked about, are defined just – they’re foreign keys. And so Joist will look at your database schema – which is also what Active Record does; we’ll get to your point, KBall… But yeah, so we have a codegen step, where we look at the database and all of the tables, and pull in foreign keys and figure out pointers, and those sorts of things.

[37:55] And then yeah, we generate two – there’s two files. So there’s an author.ts, which we will create once, and then never again, so it’s safe for you to put stuff in there… But that author.ts extends AuthorCodegen, which has all of the boilerplate that just mirrors your database: getters and setters for first name, getters and setters for last name… But also for these relations. And AuthorCodegen is where – like, we know books is a one-to-many, but author has a publisher, so that’s a many-to-one. And just all of those things go in this AuthorCodegen class, which – I think we could check it in, actually… But it’s generated code; you’re not supposed to touch it, so you don’t have to. And every time you run your database migrations, you run your migrations and then you run the codegen step, which regenerates AuthorCodegen, BookCodegen, and BookReviewCodegen to exactly match your database schema.
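The two-file layout can be sketched as a pair of classes. One is a generated base class that only mirrors the schema. The other is a hand-written subclass that codegen creates once and then leaves to you. The names here (`AuthorCodegen`, `firstName`, `lastName`, `fullName`) are illustrative. Real Joist codegen output contains far more, such as relations, metadata, and change tracking.

```typescript
// ---- AuthorCodegen (generated): rewritten after every migration; never edit by hand. ----
abstract class AuthorCodegen {
  // Mirrors the columns authors.first_name and authors.last_name.
  protected data: { firstName: string; lastName: string | undefined };

  constructor(firstName: string, lastName?: string) {
    this.data = { firstName, lastName };
  }

  get firstName(): string {
    return this.data.firstName;
  }
  set firstName(v: string) {
    this.data.firstName = v;
  }

  get lastName(): string | undefined {
    return this.data.lastName;
  }
  set lastName(v: string | undefined) {
    this.data.lastName = v;
  }
}

// ---- Author (hand-written): created once by codegen, then yours for business logic. ----
class Author extends AuthorCodegen {
  // A derived getter built on top of the generated ones.
  get fullName(): string {
    return this.lastName ? `${this.firstName} ${this.lastName}` : this.firstName;
  }
}
```

Because regeneration only touches the base class, a migration that adds or renames a column updates the generated accessors without clobbering `fullName`. Any hand-written code that depends on a removed column then fails at compile time.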

You mentioned Active Record – that’s actually extremely similar to what Active Record does. It’s just that we do it as an explicit codegen step, and they do it when your Ruby process boots.

Right. Yeah, they’re doing it at runtime, and I never see that code.

Yes. Right. And we do it at codegen time. So there’s a slight assertion that if you change your migration once, and then you’re gonna run your test loop 50 times, moving the schema loading and all of that stuff to happen only once would amortize the cost, potentially. But the biggest reason we do it is just so TypeScript can see it.

That’s what I was gonna guess. You get types available to the compiler. They do not.

Yes, 100%. That’s 95% of why we do it. But yeah, and so – just to follow on that a little bit, the AuthorCodegen always exactly matches the database, but the author – to go back to Active Record in like 2008, when you pop open an Active Record model, it’s class Author, and like, that is it. Versus all of the Java Hibernate annotations and gyrations you had to do; compared to that, it’s just magical. And so that’s what Joist tries to provide, tries to mimic.

But then that author.ts – yeah, it gives you a place to add business logic. It lets you add other getters and setters that are built on top of the other ones, or other relations that are built on top of the other ones, or validation rules, or lifecycle hooks, and all of the real meat of your domain model, separate from the boilerplate of what my getters and setters are.

Can you quickly go into what the lifecycle hooks are that you expose, and how they interact with your validations, and other pieces?

Yeah, sure. So let’s see… I think we only have three or four. So before flush – actually, we should start with before create and before update. So before create only runs on creates, before update only runs on updates; before flush runs on either. So that’s for business logic that wants to run on either create or update. And then we have before delete. And then I think we’ve got like an after commit, where the idea is, okay, after I commit, I want to make a wire call to another system, but I need my primary key to be set first before I make that wire call, some of these things. We don’t use after commit very often, but I think we’ve got that. And there might be one other one… But really, the meat and potatoes is before update, before create and before flush.

And we also have validation rules, and the difference there is that – so the lifecycle hooks can have side effects, like you’re allowed to, in an author before create, be like “Oh, I want a book by default”, or something like this. Or you can tell that I already have some books, but I need to auto-publish them, or I don’t know, whatever your business logic is. Then the validation rules – those don’t have side effects.
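The following sketch shows how the hook categories and validation rules from this answer might be dispatched for a single entity. The `EntityConfig` class and its method names mirror the spoken names but are assumptions for illustration, not Joist's configuration API. Hooks may mutate the entity. Rules only inspect it and return an error message or `undefined`.

```typescript
type Op = "create" | "update";
type Hook<E> = (entity: E) => void;
type Rule<E> = (entity: E) => string | undefined;

class EntityConfig<E> {
  private hooks: { when: Op | "flush"; fn: Hook<E> }[] = [];
  private rules: Rule<E>[] = [];

  beforeCreate(fn: Hook<E>): void {
    this.hooks.push({ when: "create", fn });
  }
  beforeUpdate(fn: Hook<E>): void {
    this.hooks.push({ when: "update", fn });
  }
  // Runs on both creates and updates.
  beforeFlush(fn: Hook<E>): void {
    this.hooks.push({ when: "flush", fn });
  }
  addRule(fn: Rule<E>): void {
    this.rules.push(fn);
  }

  // Side-effecting hooks run first; side-effect-free rules then run and report errors.
  run(entity: E, op: Op): string[] {
    for (const h of this.hooks) {
      if (h.when === op || h.when === "flush") h.fn(entity);
    }
    return this.rules.map((r) => r(entity)).filter((e): e is string => e !== undefined);
  }
}

interface Author {
  firstName: string;
  books: string[];
  updatedCount: number;
}

const config = new EntityConfig<Author>();
// A hook may have side effects, e.g. giving a new author a default book.
config.beforeCreate((a) => {
  if (a.books.length === 0) a.books.push("Untitled");
});
config.beforeUpdate((a) => {
  a.updatedCount++;
});
// A rule only inspects the entity.
config.addRule((a) => (a.firstName.trim() === "" ? "firstName is required" : undefined));
```

"Before delete" and "after commit" would be additional `when` values in the same shape. The sketch leaves them out to stay small.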

So we use a unit of work pattern, which also throws people off a little bit, like it’s a little bit different… Actually, that is where we depart from just being an Active Record clone: in Active Record you’ll do things like author.save, and it literally issues the SQL right there. Or author.update, and it literally issues the SQL right there.

To the point of data loader, we want to wait a little bit anyway, but also, we use this unit of work pattern… It’s called entity manager. Again, kudos to MikroORM, because that’s what it was called in MikroORM… But it’s basically a per-request cache of data you’ve already loaded from the database.
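At its core, a per-request entity manager is an identity map. If you load the same row twice in one request, you get back the same object (with any in-memory edits) and pay for only one query. Below is a minimal synchronous sketch. It assumes a hypothetical `fetchRow` function in place of real SQL, and creates a fresh `EntityManager` per request so nothing is shared across requests.

```typescript
interface AuthorRow {
  id: number;
  firstName: string;
}

// Per-request identity map: one in-memory instance per id for the life of one request.
class EntityManager {
  private readonly cache = new Map<number, AuthorRow>();
  queries = 0; // counts simulated round trips to the database

  constructor(private readonly fetchRow: (id: number) => AuthorRow) {}

  load(id: number): AuthorRow {
    let entity = this.cache.get(id);
    if (entity === undefined) {
      this.queries++;
      entity = { ...this.fetchRow(id) }; // copy, so in-memory edits never touch the "table"
      this.cache.set(id, entity);
    }
    return entity;
  }
}
```

The cache is discarded with the request, so there is no cross-request staleness to manage. That is the trade-off against the process-level caches of the Hibernate era.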

[42:08] So if you go back to the old days of Hibernate, and these sorts of things… Like, caching was a really big thing, but they would do caching at the machine level, or the process level, and all of a sudden you have to worry about staleness, and all of these other sorts of things… And that’s just too hard for me to really be excited about. And so Joist right now doesn’t do any sort of process-level cache; it just does per request, which is usually pretty safe. Like, for this request, I’ve already got this author. And if you have, any changes that you’ve made to it are in memory. So anyway, why was I talking about the entity manager?

So do you delay flushes until after you respond to the user?

Yes, right.

That’s brilliant.

So it doesn’t live across requests, but in my save-author mutation or endpoint you’re going to make an entity manager for this request, and then do a bunch of stuff. You’re going to load some authors, you’re gonna load some books, and you’re gonna mutate some authors, and you’re gonna mutate some books, in memory, just by touching the objects. And then when you do em.flush, it’s like “Oh, let me go diff” whatever you’ve changed in your domain objects, and figure out “Oh, I should save all of the authors.” The really neat thing there is, if you’ve touched 10 authors, it’ll batch those into a single save. So it batches inserts and updates, just like it does loads.
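The flush step can be sketched in two parts. First, diff each tracked entity against a snapshot taken at load time. Then group the changes into at most one insert and one update per table, however many entities were touched. The `UnitOfWork` class, its `track`/`flush` methods, and the placeholder SQL strings are assumptions for this sketch. A real ORM would emit parameterized bulk statements and reset its snapshots after a successful flush, which this sketch omits.

```typescript
interface Entity {
  id?: number; // undefined means "not inserted yet"
  table: string;
  fields: Record<string, string>;
}

// True if any field differs from the snapshot taken when the entity was tracked.
function changed(current: Record<string, string>, original: Record<string, string>): boolean {
  return Object.keys(current).some((k) => current[k] !== original[k]);
}

class UnitOfWork {
  private tracked: { entity: Entity; original: Record<string, string> | undefined }[] = [];

  // Existing rows remember their loaded state; new entities have no snapshot.
  track(entity: Entity): Entity {
    const original = entity.id === undefined ? undefined : { ...entity.fields };
    this.tracked.push({ entity, original });
    return entity;
  }

  // Returns the statements that would run: at most one INSERT and one UPDATE per table.
  flush(): string[] {
    const inserts = new Map<string, number>();
    const updates = new Map<string, number>();
    for (const { entity, original } of this.tracked) {
      if (original === undefined) {
        inserts.set(entity.table, (inserts.get(entity.table) ?? 0) + 1);
      } else if (changed(entity.fields, original)) {
        updates.set(entity.table, (updates.get(entity.table) ?? 0) + 1);
      }
    }
    const sql: string[] = [];
    inserts.forEach((n, table) => sql.push(`INSERT INTO ${table} ... (${n} rows)`));
    updates.forEach((n, table) => sql.push(`UPDATE ${table} ... (${n} rows)`));
    return sql;
  }
}
```

Entities that were loaded but not modified produce no statement at all. This is the "diff whatever you've changed" behavior described above.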

That is a very nice improvement.

Yeah. Right. I was talking to a friend the other day who was like “Oh, this entity manager thing was a little weird”, but then the first time they went to think about it, it was like “Oh, yeah, I don’t want to have 100 little different inserts.” So it turns out that was nice.

But the other thing that em.flush does is run our hooks and our validations. So it technically runs the hooks first, because since the hooks can have side effects, there might be other entities that it then needs to run hooks on. Like, if you had an author, and its hook touched a book – well, okay, now I need to loop back around and do the… I don’t want to do the author hooks ever again, because then you get infinite loops, but I’ll at least do the book hooks. And so there’s this kind of stepwise loop where you handle running hooks on anything that might have been touched by your other hooks. But then once all of that is done, then you just run all of the validation rules as a single – you know, just one pass, because they’re not allowed any side effects, now that all the hooks have settled down, and we can just validate anything that has changed… Which could be anything from not-nulls, to formats of fields, or anything that you’d want to write as a validation rule. Someday we can talk – or not someday; maybe later we can talk about reactivity. Reactive validation rules. But anyway, I run all of those, and assuming they pass, batch your inserts and updates and… Bob’s your uncle. I haven’t said that in a while, but Bob’s your uncle.

Okay, two questions on that. So one is, how do you make sure there are no cycles, if you’re allowing these hooks to update things and trigger other hooks? I mean, I could write code that creates a cycle, so how do you make sure that doesn’t happen?

Yeah, kudos to Zack, who wrote this… We just keep track of the entities that we’ve run hooks on before. So we do have a loop, but the first iteration of the loop is “Oh, I ran it on these five entities. Put them in a ‘hooks have already run’ list, never run them again, and okay, go on to my next iteration, discover any new entities, and if there are new entities, run on those in the second iteration.” But just keep putting things in the list of “never run the hook on this author again.” For this em.flush anyway.
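The hook loop and its cycle guard can be sketched as described. Run hooks on every dirty entity and record each one in an "already ran" set. Repeat for any entities the hooks newly touched. Only then run validation rules, in a single pass. `Ent` and `flushHooks` are hypothetical names for illustration, not Joist internals.

```typescript
interface Ent {
  name: string;
  hook?: (touch: (other: Ent) => void) => void; // a hook may touch (dirty) other entities
  validate?: () => string | undefined; // rules have no side effects
}

function flushHooks(initiallyDirty: Ent[]): { hookOrder: string[]; errors: string[] } {
  const ran = new Set<Ent>();
  const dirty = new Set<Ent>(initiallyDirty);
  const hookOrder: string[] = [];
  let pending = Array.from(dirty);

  while (pending.length > 0) {
    const touchedThisRound: Ent[] = [];
    for (const e of pending) {
      ran.add(e); // never run this entity's hook again during this flush
      hookOrder.push(e.name);
      e.hook?.((other) => {
        dirty.add(other);
        touchedThisRound.push(other);
      });
    }
    // Next iteration: only newly touched entities whose hooks have not run yet.
    pending = Array.from(new Set(touchedThisRound)).filter((e) => !ran.has(e));
  }

  // Hooks have settled, so one validation pass over everything dirty is enough.
  const errors = Array.from(dirty)
    .map((e) => e.validate?.())
    .filter((m): m is string => m !== undefined);
  return { hookOrder, errors };
}

// An author hook that touches a book, and a book hook that touches the author back.
const author: Ent = { name: "author" };
const book: Ent = { name: "book", validate: () => "book needs a title" };
author.hook = (touch) => touch(book);
book.hook = (touch) => touch(author); // would loop forever without the `ran` set
```

The cost of the guard is that the author's hook runs once. It never gets to react to what the book's hook did in the same flush.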

So I couldn’t – if I were to create a rule where I had a hook on authors that updated books, and then a hook on books that updated authors, it would not work as expected; like, the authors one would run, and then the books would update, but then their hook to touch the authors wouldn’t?

Yeah, the author would not have a chance to respond to whatever the book had done.

Got it.

You’re right. So both will run; the author will run, the book will run, but if for whatever reason your author wanted to see the latest changes, then –

See that change and react to it, they couldn’t. Got it. Okay, so that answers one question. The other question I had is – so you’re talking about batching… When does that batch go out? Like, does it wait until I’ve done a user response, or does it just wait for some amount of work? What tells the flush that “Hey, I’ve got enough things to do an actual write over the wire”?

[46:20] Whenever you, in your endpoint, decide to call em.flush, that’s when the SQL calls happen. So it’s kind of up to you –

Got it. So it’s explicit.

Yeah, it’s an explicit call to em.flush. It’s not put on the event loop to do – and typically, we do a lot of GraphQL mutations, and it’ll be: get the input, load the entity, put the input into the entity, and then call em.flush, and then return the results.

Got it. But in some ways, you could start streaming that result back out, and then put the flush in like an after-request hook, or something like that…?

Yeah, yeah. Yeah. Joist does not have any opinions on when you would start returning data to your users, versus when you try and commit things to the database.

Do you have visibility into the requests, such that you can catch if there are unflushed changes that are still there at the end of a request?

Off the top of my head, no. We do have some – so Joist tries to be, in general, pretty unopinionated… Well, I mean, it has opinions. But in terms of your APIs, it doesn’t really care about whether you’re using RPC, gRPC, or GraphQL, or Fastify, Express, or all of those sorts of things. So I think because of that, it doesn’t really have any visibility into any of those.

My one caveat is just because we personally use GraphQL, we’ve built a fair amount of scaffolding and glue on top of Joist for living in a GraphQL world… But there’s a pretty clear delineation between the core ORM that could work with anything, and then our little GraphQL polish, which is pretty neat, but optional on top of it. So yeah, for your musing of drilling into the request – yeah, it would be an exercise for the reader, I guess.

Yeah. Well, so how do you have per-request caching then, if you don’t have visibility into the request?

In the GraphQL server you get a little hook for making a context…

Okay, so it’s outside of the ORM core.

Yeah, yeah. Right. So when we boot up our GraphQL server, it’s gonna ask, “A new request came in, and it’s a Fastify request. How do you want to turn this into a GraphQL context parameter for all of the resolvers to use?” and that sort of thing. And that’s where we’re like “Oh, okay, this is a new request; let’s make a new entity manager.” And we put it on the GraphQL context. Fastify lets you put things onto their request, too. Actually, we do that first. So we’ve got middleware that first sets request.context, because we do have some Fastify handlers that are not GraphQL… So they also have access to the request context, and hence the entity manager, if they want to do anything. But then most of our stuff ends up being GraphQL mutations that use that context parameter to get the entity manager out of the request context, and do any loads, or do any saves.

Got it. So basically, you insert middleware, and that’s what sets up the entity manager and its caches for that request, go, go, go. So you could actually, in theory, at the end of your middleware, wrapping around on the other side of the request, say “Are there unflushed changes? Maybe I need to throw an error”, or something like that.

Yeah, actually, the system that I built back in the Java days did do that. Our middleware created whatever our version of unit of work was back in the day, and auto-flushed, or auto-committed, whatever the – surely it was Hibernate back in the day… But the middleware did it. Our current middleware doesn’t. I guess I don’t have a really good reason for why… But no, yeah, you’re right, it could.
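Because the entity manager's lifecycle lives outside the ORM core, per-request wiring is just middleware. The sketch below is framework-agnostic and deliberately avoids Fastify's or any GraphQL server's real API. It creates a fresh entity manager per request and passes it to the handler through a context object. It can also fail the request when changes are left unflushed, which is the check suggested above. `MiniEntityManager`, `withRequestContext`, `RequestContext` and `Handler` are hypothetical names.

```typescript
class MiniEntityManager {
  private dirty = new Set<object>();
  flushedCount = 0;

  markDirty(entity: object): void {
    this.dirty.add(entity);
  }
  get hasUnflushedChanges(): boolean {
    return this.dirty.size > 0;
  }
  async flush(): Promise<void> {
    this.flushedCount += this.dirty.size; // a real ORM would issue batched SQL here
    this.dirty.clear();
  }
}

interface RequestContext {
  em: MiniEntityManager;
}
type Handler<Req, Res> = (req: Req, ctx: RequestContext) => Promise<Res>;

// Wraps a handler so each request gets its own entity manager, and so its own cache.
function withRequestContext<Req, Res>(
  handler: Handler<Req, Res>,
  failOnUnflushed = true,
): (req: Req) => Promise<Res> {
  return async (req) => {
    const ctx: RequestContext = { em: new MiniEntityManager() };
    const res = await handler(req, ctx);
    if (failOnUnflushed && ctx.em.hasUnflushedChanges) {
      throw new Error("request finished with unflushed changes; did you forget em.flush()?");
    }
    return res;
  };
}
```

Auto-flushing instead of throwing, as in the Java-era system mentioned above, would be a one-line change at the same spot. Throwing keeps the flush explicit.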

[50:07] Nick, I feel like I’ve been talking a long time again… What’s going on? [laughter]

Y’all are going deep…

Sorry, I get excited about this stuff.

No, I love it. Yeah, yeah, this is great. You’re asking way deeper questions than I know as a casual ORM user who usually just yells at it and bangs my head against the wall until it starts working…

Well, you haven’t used Joist then.

Not yet, no. The last one was TypeORM, and that’s why I have those opinions, I guess… [laughs]

Yeah, I was talking to a friend, a mutual friend, who was like “You can’t really understand the ergonomics of an ORM until you just use it in hate”, or hopefully not hate… But it’s hard to know until you get into the guts of whatever you’re building, and you’re like “Oh, this actually does or does not suck.”

So I guess to take it up a little higher level, as an ORM creator, and just like a user of them too, what are some arguments against ORMs that youā€™ve commonly heard?

Yeah, sure. I think that – well, and I’m going to insert my bias into these answers obviously, because I’m kind of an ORM apologist…

Please.

…but I can think of two things. One is that – and actually, a different mutual friend and I were talking about this the other week… If you go way back in time, there’s the paper from the guy, “ORMs are the Vietnam of computer science”, I forget his name… Which made the rounds pre-Hacker News. And his assertion – actually, it’s been so long, I forget the assertions of his paper, but it was one of the first ORM pushback things back in the day. And my recollection at the time – and I should go read it – was that if you go back to circa 2000–2005, which was maybe peak ORMs, but also peak ORM hate and kind of pushback… What I saw, the people who hated ORMs – and again, this is where I’m gonna read my bias into it as kind of an ORM apologist – was ORMs would try to work with a database schema that just fundamentally was not great. So circa 2000, whatever, typically you’re doing Java Enterprise stuff in the back office of an insurance company, that has a schema designed in 1970, or something like this, and you’re like “Wow, this schema has fields that are like four characters long in all caps”, or something like this. Which – that in and of itself was not terrible. But ORMs there in the mid ’90s and into the 2000s tried to make their pitch of taking generally bad – I don’t wanna say bad; legacy, let’s use legacy – relational schemas, and then crafting them into this beautiful object model that was genuinely different from whatever the database schema was, and patch over the sins of the past, and these sorts of things. And I think that’s just hard, and difficult. That’s where you get a lot of this mismatch, the whole object-relational mismatch; that’s where all of this comes from. And I assert that if you don’t try to do that, if you just make sure you join companies and jobs where you get a greenfield database schema – disclaimer that that’s cheating – then that object-relational mismatch goes away.

Because if you’ve got a nice schema, and you just treat your tables as entities, or even as a little graph… Joist thinks in terms of graphs; it thinks that this entity points to this other entity. And granted, the edges in the database are not explicit entities; it’s not a true graph, where you can go to a node and put attributes… Or sorry, you can’t go to an edge and put attributes on an edge, on a foreign key. But otherwise, you really can think of the entities in your database as a graph. And that, to me, is just kind of this – if you squint and you look at a relational database, you can see a graph, or you can look at a graph and squint and see a relational database. And look at an ORM and squint and see either of those…

[54:00] So as long as you’re doing a one-to-one mapping between those domains, I think the object-relational mismatch doesn’t exist. Maybe that’s a strong statement, but at least not in the way that annoyed so many people back in the day. So that’s one of my big, main assertions as an ORM apologist: just don’t try to do that; don’t try to do all of the crazy stuff that they did circa 2000s, of crafting your database schema into something better and different.

One of the big arguments that I have seen and been on both sides of in the past was ORMs make it really easy to write non-performant SQL. And I cannot tell you how many N+1 loops I cleaned up back in the day. However, it seems as though you may have solved that with Joist.

I was gonna say. Right.

So that is a big argument in favor, actually. And one of the things – when I saw what you were doing with Dataloader and Joist… Because to be honest, I looked up Joist this morning, in preparation for this episode… But I saw that and I was like “Whoa, they solved that problem. That’s amazing. That’s really, really good.”

I do think there is a question around – so one of the things that ORMs do is they push you to make some simplifying assumptions about the structure of tables, and one of the examples is that the default, easy way to do it is every table has a single ID, that’s a sequence, or an incrementing number, or something like that… And I have seen the argument – and I don’t know how I feel about it, but I’m curious about your take – that those basically allow you to get away from really deeply thinking about how your data naturally should be structured. Now, my argument is most of the people writing these database migrations are not data scientists or data engineers, and those simplifying assumptions, on average, lead to better data, or better data structures, than when somebody is trying to do every custom one… But I’m kind of curious what you’ve seen in that domain.

Yeah, I mean, again, maybe this is just my bias or my background talking, but I’m just so bought into the concept of entities… Like, thinking about your domain model in terms of authors, and books, and book reviews, and claims, or whatever domain you’re interested in, is figuring out the logical identities of the entities involved. That’s just so much how I think that, of course, I’m gonna just have tables with ID primary keys that match my entities, and that sort of thing.

So I guess – and maybe I guide myself to be working in jobs that do that; like, I’m just building another web backend in my version of Rails, or whatever, that I think it works out really well. I’m sure there are niche use cases out there of – I do so much OLTP, but like the OLAPs of the world, which are all metrics, and those sorts of things… Like, I could see that when you really start caring about storing data in columns, and columnar databases, and all of these – well, although you could maybe represent those as entities… But yeah, I think I can buy that for certain super-optimized niche cases, but I would assert that the benefit of the doubt is that it’s probably an entity, and the onus would be on the other side to prove why it’s not, and that they need some sort of esoteric way of storing it.

But the other thing is that if you’ve got that rationale, then go for it. You don’t have to use an ORM for everything – which was going to be my other point. The other criticism that I often hear – and kind of back to your queries – is that ORMs almost always have query builders, and once you start to get to a certain esoteric level of a query builder, that does aggregates, and group bys, and inner queries, and all of these other sorts of things, like, is it really that much simpler than writing SQL? And I agree, it’s not.

[58:00] So once you’ve hit a complexity of a SQL query that is basically doing group bys and aggregates, I personally am not interested in writing a query builder to represent that, especially in a type-safe way… Except for – and I’m gonna say it wrong, but Kysely… However you say that. They’ve done great. So go use that. Like, if you’ve got a sufficiently complex query that needs all of these sorts of things, get as close to the metal in terms of SQL as you can.

But if I look at the app that we’ve got in production, probably 90% of our queries and mutations don’t need that, and can go through the ORM for basically free data access. And that last 10% – yeah, don’t treat an ORM like the hammer that you have to use for everything. For that last 10%, where you need the escape hatch to Kysely, or even just a raw SQL string – that’s fine. Take it if you need it. But I think a lot of anti-ORM people see those use cases, and now they’re like, “Well, I can’t use the ORM for the other 90%”, that is just like select star from books where ID is whatever.

Yeah, I think that’s a big touchy point for me, too… When it’s like, “I could do this in SQL, and I know how to do it there, but now I have to learn this seemingly esoteric way of doing the exact same thing.” That kind of turns me off from it.

The thing that I love about ORMs, that was one of the reasons I was advocating ORMs in a previous place that was somewhat anti-ORM, is the standardization of lifecycles and validation.

Because when you go to raw SQL, or a raw query builder, or something like that, thereā€™s no obvious place to put that validation in a way that you know no matter how this thing is being updated, no matter whatā€™s going on, weā€™re going to make sure that the things that need to be true or should be true, are true.

Yeah, your business invariance. A hundred percent.

Exactly. And so what an ORM does is it provides a standardized way of representing and talking about and writing invariants that donā€™t live in the database.

A hundred percent.

And thatā€™s incredibly valuable.

Yeah, I agree. Obviously, Iā€™ve heard a lot of the feedback and assertions of the subset of the community or the set of the community that likes to use raw SQL. I go to the same thing; where do you validate that your author title is what you want it to be every time you touch an author? And I donā€™t knowā€¦ Yeah, I miss it.

Is there anything that we haven’t talked about that you would like to share with folks listening to this episode, whether it’s about Joist or ORMs in general?

Well, yeah, actually, I was thinking, the one thing that we haven’t talked about too much - and we can try and keep it short, because it could get… Well, I don’t know if it could get long. But Joist’s concept of reactivity, basically. So a lot of what we’ve talked about in Joist so far is – I mean, it’s cool, but not necessarily novel, per se. I guess a dataloader baked in all the way is probably novel.

I’ve found it novel.

The data loader baked in all the way - that’s great. I love it.

The type marking the collections as loaded or not… What’s novel - actually, in MikroORM, I believe [unintelligible 01:01:03.29] has a form of doing that now, which is great… He was – I’m trying to think if I had done a PR to do a version of that in MikroORM… But I was chatting with the MikroORM maintainer, and I took a lot of stuff from MikroORM, and he’s like “I’m gonna take the type hints from Joist and put them in–”, which was… The cross-pollination was great. But I believe that kind of the last sort of novel thing that Joist has is the concept of reactive validation rules and reactive derived fields. So what these are is – so a lot of lifecycle hooks and validation rules by default only run if you’re touching the entity itself, right? Let’s take an author, and you’ve got some business rule that the first name has to be this or that, or this or that. But what if you’ve got a business rule on an author that depends on entities that it touches? Like, I can only have ten books. Or none of my books can be published until I’m published, or until I’m over the age of 18, or something like this. And so Joist has a way of defining both validation rules, and then also derived fields in the database. The quickest one that comes to mind is like author and number of books. Let’s say you’ve got an author search, you want to search a bunch of authors, and one of the things you want to search on is number of books is greater than ten, or greater than a hundred. And you can do that by doing the join into the books, and then doing the sum, and aggregate, and all of that. But a lot of times with these dashboard pages, it’s great to just have an int column. I just want an int column in the database.

[01:02:28.26] Yeah, you have a counter cache, basically, that’s like caching the count as it –

Right, exactly. And it makes it super-easy. And so let’s say that we want to do these. So in Joist you can define one of these, like “I’m going to define a number of books”, and we take those populate hints that we use to tell us the subgraph of entities to populate so I can get data to them - we take that same concept and we put it into our number of books declaration, and say like “Populate this subgraph of the author, the author and the books, so I can count them, and return ten.” And that’s all you do. And you’re like “Okay, that’s great. But what does that get me?” What that gets you is that Joist has the capability to reverse that load hint, and now find any place that would basically invalidate that number, and have it walk back –
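The “reverse the load hint” idea can be sketched in a few lines, assuming plain in-memory objects. Everything here (`inverseOf`, `numberOfBooks`, `bookChanged`, `flush`) is a hypothetical illustration, not how Joist implements it: the derived field declares the relation it reads (`books`), the relation’s inverse (`Book.author`) is used to walk from a changed book back to the stale author, and only stale authors are recalculated.

```typescript
// Hypothetical sketch; names and structure are illustrative, not Joist's API.
class Author {
  books: Book[] = [];
  numberOfBooks = 0; // the persisted "counter cache" column
  constructor(public name: string) {}
}

class Book {
  constructor(public title: string, public author: Author) {}
}

// Each relation knows its inverse: Author.books <-> Book.author.
const inverseOf = { books: (book: Book): Author => book.author };

// The derived field declares which relation it reads (its load hint).
const numberOfBooks = {
  hint: "books" as const,
  compute: (author: Author): number => author.books.length,
};

// Reversing the hint: a changed Book is walked back to the stale Author.
function bookChanged(book: Book, stale: Set<Author>): void {
  stale.add(inverseOf[numberOfBooks.hint](book));
}

// At flush time, only the stale authors are recalculated.
function flush(stale: Set<Author>): void {
  stale.forEach((author) => {
    author.numberOfBooks = numberOfBooks.compute(author);
  });
  stale.clear();
}
```

The design point is that the author of the derived field never writes “when a book is created, update the author”; that trigger is derived mechanically from the hint.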

Right, you have a book created; let me go back and load this and update it.

I love that.

Whenever a book is created, whenever a book is deleted… Also, in this little reactive hint, you can say like - maybe you only want to count the books that are published, or something like that. Some attribute of the book. And so then if you go back to the book, and only when that attribute changes - like the book moves to published or whatnot… It’s like “Oh, I know that the author’s derived field (or it could be a validation rule) depends on this field changing; so I will go find my author - even though he’s not loaded into memory yet, I will go load him into memory and tell him to run as part of the em.flush.” Another big win for the em.flush is if you were just doing –
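A sketch of the field-level filtering described here, assuming in-memory maps stand in for the database. `UnitOfWork`, `updateBook`, and `watchedBookFields` are hypothetical names for illustration only. The reactive hint names the one book field the derived value reads (`published`); a title edit leaves the author alone, while a change to `published` marks the author stale, and the flush then “loads” that author even though the caller only ever touched the book.

```typescript
// Hypothetical sketch, not Joist's API; Maps stand in for database tables.
interface AuthorRow { id: string; numberOfPublishedBooks: number }
interface BookRow { id: string; authorId: string; title: string; published: boolean }

const authors = new Map<string, AuthorRow>();
const books = new Map<string, BookRow>();

// The reactive hint: this derived field only reads Book.published.
const watchedBookFields = new Set<keyof BookRow>(["published"]);

class UnitOfWork {
  private staleAuthorIds = new Set<string>();

  updateBook(id: string, changes: Partial<BookRow>): void {
    const book = books.get(id);
    if (!book) throw new Error(`no book ${id}`);
    const touchedWatched = (Object.keys(changes) as (keyof BookRow)[]).some(
      (k) => watchedBookFields.has(k) && changes[k] !== book[k],
    );
    Object.assign(book, changes);
    // Only mark the author stale when a field the derived value reads changed.
    if (touchedWatched) this.staleAuthorIds.add(book.authorId);
  }

  // Returns how many authors were recalculated.
  flush(): number {
    let recalculated = 0;
    this.staleAuthorIds.forEach((authorId) => {
      // "Load" the author even though the caller only touched a book.
      const author = authors.get(authorId);
      if (!author) return;
      author.numberOfPublishedBooks = Array.from(books.values()).filter(
        (b) => b.authorId === authorId && b.published,
      ).length;
      recalculated++;
    });
    this.staleAuthorIds.clear();
    return recalculated;
  }
}
```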

It will autosave these sorts of derived entities.

That’s really cool.

Yeah, right. Like, even though you’ve only loaded the book, and you’ve only touched the book, you’ll get into the flush, and it’ll be like “Oh, I need to do these other 20 things.”

That’s beautiful.

And load them up and tell them to recalc, or revalidate… And that is slick. It’s slick.

That is… Because that allows you to do something we’ve gotten very used to in the frontend world, at least in Vue and other things, where you’re basically creating these declarative interfaces, where you’re like “These are the things that must be true. Go and make it so. I don’t want to have to worry about them.” But doing that at the data layer is not something that I’ve seen done very often.

No. Yeah, the only thing that I can think of that’s somewhat similar is like the super-big data guys; if you get into like Materialize, they’ll sit on top of Kafka queues, and watch for the – or the other place there is, if Postgres ever gets fully reactive views. You can declare a materialized view, but the materialized view refreshes every ten minutes or so. And there are certain places out there that are trying to get materialized views that incrementally update, and they can tell whenever the components of the materialized view update, and then do incremental updates. Because a full update would be expensive. So Materialize does it, and then people have played around with doing it in Postgres. But other than that…

But this is at the application layer.

Because there are ways that you can do these with triggers within your database, and whatever, whatever… And I don’t want to be mucking around with stored procedures in my database. I want application logic that I can test with my unit tests, and just make it happen. Okay, so what are the limitations on that? Can I bridge through entities? Can I say, “Okay, this entity depends on that entity, which depends on this following entity?”

Yeah. Yes. [laughs]

Okay. Just yes. Okay.

Okay, only because we do have one boundary case that came up just last week, where – or maybe it was two weeks ago. We’ve got it written up – we’ve mentioned this difference in Joist between the core database relations, whether it’s a foreign key, a many-to-many, or a many-to-one, right? So you can string together these reactive things with the core database relations, and everything 100% works.

[01:06:10.23] You can also go through one of your derived relations, where it’s not quite in the database – it’s not in the database, but you want to have like number of books depends on… We’ve got an example of this; number of books depends on the book being public. And that in and of itself is a derived value. But if you put two of these in a row, we’ve got a bug there we need to go fix. Like, going through and stacking these non-core database things… So just being upfront and honest, we’ve got a known issue with that. But I think that’s just –

That’s a bug, not a core limitation.

It’s a bug. The core model, the core mental model works for anything you can think of, asterisk that guy, which we’ll get fixed here next hack day, or something like that.

I’m excited to go and build some stuff with Joist now, because I have been looking for a JavaScript or TypeScript ORM that got anywhere close to what I remember from Active Record.

Yeah, me too. [laughs] The other disclaimer that we should – we really like tagged IDs. And I don’t know if this is – tagged IDs are awesome. So usually, your entities come out with – it’s an int in the database, and then you expect the author.id to be a number… Ours are strings, and we need a flag for that. Like, we love that, because you see – and we put the… Every entity gets a little tag; like author gets abbreviated to A, and book gets abbreviated to B, and these sorts of things. And so those will show up in your IDs… Which actually we did selfishly, because we were doing GraphQL, which wants string IDs anyway… And so instead of every time we put an ID on the wire, we’re like “int to string whatever”. Anyway… So it was just so ergonomic for us to have the ID getters and setters take strings… And then we took that opportunity to sneak in prefixes, which once you – like, I’m super-addicted to that now. Seeing an ID go by in the log and being like “Oh, that’s BR26. I know that’s a book review.” And it also keeps your IDs from accidentally switching. Like “Oh, I got a response, and I got an ID of two. But then I put it back into the system. And it’s technically a book ID, but I put it back into the system as a book review ID.” Well, book review two technically worked, and so my foreign key didn’t catch it. But if you tag your IDs, Joist will notice that the tag is wrong, and it will blow up at runtime.
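The tagged-ID guard can be sketched as a pair of conversion functions, assuming a hypothetical tag table and a `tag:number` string format (the exact format and tags Joist generates may differ; the transcript only mentions IDs like “BR26”). The database keeps a plain integer; the tag exists only in the application layer, and stripping it with the wrong expected entity throws instead of silently hitting the wrong row.

```typescript
// Hypothetical tag table and format; Joist derives its own tags per entity.
const tags = { Author: "a", Book: "b", BookReview: "br" } as const;
type EntityName = keyof typeof tags;

// The database still stores a plain integer; the tag exists only in the app layer.
function tagId(entity: EntityName, dbId: number): string {
  return `${tags[entity]}:${dbId}`;
}

// Strip the tag before writing SQL, failing loudly if it belongs to another entity.
function untagId(entity: EntityName, taggedId: string): number {
  const [tag, raw] = taggedId.split(":");
  if (tag !== tags[entity]) {
    throw new Error(`Expected a ${entity} id (tag "${tags[entity]}"), got "${taggedId}"`);
  }
  const n = Number(raw);
  if (!Number.isInteger(n)) throw new Error(`Malformed id "${taggedId}"`);
  return n;
}
```

Usage: `untagId("BookReview", "b:2")` throws, which is exactly the book-ID-passed-as-book-review-ID mix-up described above; a bare `2` would have satisfied the foreign key.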

I mention that just in terms of – this is a tangent, but if you go play with it, I think most people probably do expect the IDs to be numbers, and we totally want to support that, because it’s a very valid way to use an ORM… But I just haven’t gotten around to adding that config flag yet, so… Hopefully, you don’t mind IDs that are strings. Or if you really don’t like it, you can submit a pull request to support the numbers.

So, question - are they strings in the database level as well?

Okay, got it. So you’re using a sequential integer or something, but when you pull it out, you put the tag at the front of it, and that’s how it’s represented within Joist. Okay. Got it.

Yeah, right. I mean, that would be weird, to do strings –

That was my initial reaction, because like “Wait, is he saying what I think he’s saying?” But no, it’s – okay, that’s entirely at the Joist layer.

Yeah, we do support UUID columns. If you want to use UUIDs as primary keys, you [unintelligible 01:09:24.25] so we snuck that one in, but…

And those are also tagged in the Joist layer?

You can choose… You can choose to tag your UUID, or you can choose not to. And I guess – I don’t think we give you that choice with just the regular numbers… We definitely don’t give you the choice to use a plain number. But I guess why not let you use – technically, it’s a string with an untagged number in it.

I agree that Joist is awesome. I saw you present about it at Nebraska.js, and I’m excited to have an excuse to get away from TypeORM, or to explore more of what’s out there.

Yeah, for sure.

I don’t have the same history with Active Record, or Rails, or anything like that… Yeah, but it’s super-interesting how much thought goes into these, and it really seems like you’re doing really well with the ideas in Joist. So I’m excited about it.

Yeah, good. For sure, go kick the tires, let us know…

Yeah.

We’ve got a little link to Slack that I think probably works… For a while it did not work, so let me know if that doesn’t work, or file an issue.

Definitely. We will have links to all of this in the show notes… And Stephen, thank you so much for joining us this week. I learned a lot, and I learned that there’s a lot that I don’t know as well.

Oh, no, thanks for the invite. I had a ton of fun. That went really well. Kudos for driving the agenda really well, with great questions. I loved them.

Well, let us know what you think –

This was delightful. Thank you for putting up with my interest in diving down into weird niches.

No, no, itā€™s great.

Yeah, it was a fun conversation. Alright, we will see you next time.

Changelog

Our transcripts are open source on GitHub. Improvements are welcome. 💚
