The ORMazing show
Nick & KBall sit down with the brilliant Stephen Haberman to discuss all things ORMs! đ»đ
From the advantages and disadvantages of ORMs in general, to delving into the intricacies of his innovative project Joist, which brings a fresh, idiomatic, ActiveRecord-esque approach to TypeScript. đ
So sit back, relax, and letâs dive deep into the world of ORMs with the experts!
Discussion
Sign in or Join to comment or subscribe
Matt
2023-05-22T00:46:49Z ago
on the âcan Joist do the joinsâ section of the discussion (around minutes 25-30), you may be interested in this approach which
customer.queryInvoices().queryLines().queryTrack().whereName(P.startsWith('C'))
Example: https://observablehq.com/[@tantaman](/person/tantaman/aphrodite-chinook
Iâve mostly abandoned that ORM to focus on https://vlcn.io but Iâm free to discuss how the ORM works, how to build a query planner and optimizer, how to optimize query plans for different backends, etc.
Stephen Haberman
Omaha, NE
2023-05-24T01:03:29Z ago
Yo! Amazing that you pinged. I had noticed vlcn-orm while poking at around! :-)
In terms of your #2, I believe we have that (if you just mean deep where clauses), it would look like:
em.find(Customer, { invoices: { lines: { track: { name: { startsWith: âcâ } } } } });
(âŠdisclaimer Iâm realizing we donât actually have a
startsWith
operator, we just have like/ilike so farâŠ)But Iâm definitely curious what sort of optimizations we might do? Currently the ^ turns into:
SELECT * FROM customers JOIN invoices/lines/tracks/etc. WHERE t.name LIKE âc%â
Granted, this is for WHERE clauses; the âJoist can do joinsâ segment iirc was also discussing less of WHERE clauses per se, and more loading multiple entities/tables in 1 query, i.e.:
const author = await em.load(Author, âa:1â, { books: âreviewsâ });
Could/should (it does not today) do a single
SELECT a.*, b.*, r.* FROM ...
that joins into books + reviews to bring back three levels of entities all in 1 SQL query (as long as both âbooksâ and âreviewsâ database-based relations and not domain-layer abstractions).This is specifically what we donât have implemented yet, although https://github.com/stephenh/joist-ts/blob/main/packages/orm/src/dataloaders/lensDataLoader.ts (which does something different, but related) has the guts of the âtake a load hint and find out which tables to walk/join acrossâ infraâwe just havenât taught
em.populate
(our preload hint implementation) how to: a) split load hints into âdb-basedâ vs. âmodel-basedâ relations, and then b) for the db-based relations, build theSELECT a.*, b.*, r.* FROM ...
statement.But also happy to learn of any optimizations we could/should be doing there.
In terms of vlnc-orm/Aphrodite, I just found SQLExpression and the concept of hoisting, but admittedly in like 1 minute of reading, Iâm not entirely sure what optimizations itâs doing. Iâll keep looking at it, but would love a tldr/overview. :-)
If youâd like to reply here, thatâs great, but also feel free to follow up at stephen.haberman@gmail.com or drop an issue/discussion into the joist-ts github project. Thank you!
Matt
2023-05-25T15:32:00Z ago
Is this finding the
Customers
that have purchased tracks that start with âcâ?I was thinking of the case where you have a specific customer node and want to go from that customer to all their purchased tracks that start with âcâ.
From the interview I got the impression that that wasnât possible in joist and that thereâd be many round-trips. E.g.,
Thatâs where you all got into the conversation of stubbing out promises so you could know the intent of the user to load more data.
The âstubbing out promisesâ idea is approximately related to the route I was taking in Aphrodite.
For each model I generate two classes:
The Query class allows the user to express their full intent of everything theyâd like to load / the full graph of objects theyâd like to fetch and then work with.
getters
for each field andqueryX
for each edge from that modelqueryX
,whereX
,orderByX
for each edge and field.queryX
on a query class takes an prior query as its input rather than an actually loaded model.E.g.,
Says:
âStarting from customerId, fetch me their purchased tracks by traversing through their invoices and the invoice line itemsâ
Looks like this is the same idea as the lens data loader you linked.
Examples of the generated code:
The expression abstraction was to allow different backends besides SQL. The idea is that a SourceExpression consumes all the steps of a query it encounters (created by the query classes) and converts them to operations against the backend represented by that expression.
The big idea was to support entities that come from different backends (SQL / Redis / ZippyDB / Memory) and have edges to one another. Imagine having a single interface to gather and relate all your data, even across data stores.
Anyway, if youâd like to keep reading đ I wrote a bit about queries and expressions: https://tantaman.com/2022-05-26-query-planning.html
The simplest Expression is the MemorySourceExpression which just allows querying objects that reside in-memory.
Stephen Haberman
Omaha, NE
2023-05-26T02:07:18Z ago
Ah yep, it isâŠ
Oh sure. Yes, if doing
await customer.populate({ invoices: { lineItems: "tracks" } })
and then doing gets/maps/flatMaps against the in-memory data structures, then, yes, it will load all ~3-4 levels into memory with one-per-level SQL queries. This is generally a very idiomatic pattern for us/Joist, and so far one-query-per-level vs. the obviously worse N+1 has been enough of an optimization for us.That said, we also have lens-based path navigation, which is closer to what you mentioned, like:
But currently they donât support conditions (something like
lines.tracks.if(t => t.name.startsWith(...))
), but we can do the hops via just SQL (so hoisting, if Iâm following your terminologyâthat code is thelensDataLoader
from my previous comment), however the SQL-joined based âfast pathâ is behind an opt-in flag, like:The rationale is that traditionally Joist prioritizes a consistent view of any WIP changes in the unit-of-work/loaded graph, so if we were to execute ^ as SQL by default, then
tracks
might be missingTrack
s that have WIP mutations to anInvoiceLineItem
, or a newly createdInvoice
that has lines + tracks that would also match the condition, but are not yet flushed to SQL for the SQL join to find them.So, I wrote the
{ sql: true }
flag as a proof-of-concept optimization for our path navigation, but we have so much business logic in our app that relies on this âconsistent view of WIP changesâ assumption/principle, that weâre generally scared/hesitant to actually use the SQL-only feature, except in opt-in cases where we know there are no WIPs.âŠgranted, we have all the info to detect whether any
Track.invoiceLine
orInvoiceLine.invoice
Fks have been mutated during this UOW, so we could potentially execute the fast-path method if we know for sure there are no WIP changes that affect that path.Even then, weâd discussed this, but were nervous about the non-obvious performance difference from the same exact code, i.e. it opaquely changing from âfasterâ to âslowerâ because of out-of-sight WIP changes in the UOW.
But, perhaps weâre being too cautiousâŠ
Ah! Thatâs interesting. We originally coupled our find/query DSL directly to a SQL query library (Knex), but eventually put an AST between the two, and that has been great + driven a lot of features that previously would have been a PITA (the big win being auto-batching of SELECT queries issued in a loop).
Weâve not yet gone to non-SQL backends, and honestly probably will notâI do really share the insight/opinion that the domain/entity model is a graph, and so business logic/graph traversal logic can be pleasantly uncoupled from the backend SQL/NoSQL/InMemory/etc.
But, that said, nearly all of Joist has been driven by day-to-day needs at the day job, where the pragmatism of âmeh weâll just assume postgresâ (not just assume SQL :-)) has been an admittedly lazy but very pleasant simplification.
Ooh! Yeah, that is fancy, and tempting⊠Because, yes, a lot of Joistâs principles (declarative business logic, cross-entity validation rules, etc.) are today implemented with a âjust use a monolith and itâll be fineâ assumption/naivety, but the holy grail, for me, is, yes, applying the same principles across an enterpriseâs entire domain modelâwhich today requires programmers to use various glue, events, background jobs, RPCs, etc. to keep in sync. All of which is the same spaghetti code that frontend engineers wrote before declarative reactivity. But, that is a big fish to fry. Maybe someday. :-)
Matt
2023-05-26T13:45:16Z ago
Oh interesting. Iâd never considered that but I see how that could be a real issue.
I ended up doing exactly the same thing, even starting with Knex đ
Anyway, Iâve fully embraced SQL at this point (w/ some syntax extensionsâŠ) but maybe Iâll find myself back in the ORM space one day.
If you havenât seen it, you might be interested in browsing https://entgo.io/ for ideas.
Itâs another ORM modeled after Facebookâs internal Ent Framework and represents your data as a graph.
Btw, how does Joist differentiate itself from things like Drizzle and Prisma?
Stephen Haberman
Omaha, NE
2023-05-26T19:19:09Z ago
Yeah! Agreed it looks interesting. I think their most impressive feature, for me, was the ability of middleware to not just add lifecycle hooks (on author save, on book insert), but also do rewrites of queries before they hit the wire.
I.e. Joist has âautomatically ignore soft deleted rowsâ as a feature, but we just hard-coded the âAND deleted_at != nullâ into the few places internally that build out queries, where as Entgo can implement soft-deletes a generic middleware layer that can modify the query AST. Which is neat! Iâd like to clone that approach at some point.
Their migration approach is cool tooâhistorically Iâve been very satisfied with âjust hand-write migrations that do each incremental changeâ, vs. approaches where you declare your desired schema, and the tool auto-generates diff from current schema -> desired schema. But entgoâs âdesired schemaâ is built-up with a DSL that looks neat, and can do some of the same âsoft delete as a mixinâ type reuse.
Those were my two main takeawaysâotherwise the Go syntax throws me off just enough that it takes me awhile to dig through and really find/grok any ânot just a regular ORMâ features to use as inspiration. Definitely lmk if you think there are any particularly unique/interesting features that it has.
Afaict both Drizzle and Prisma are more query builders than âtraditional ORMsâ, which traditional ORM have things like rich domain models with validation rules and business logic. At least Prisma in particular afaiu the only place to add âvalidate this Authorâ business logic is in generic middleware? Not sureâŠI havenât used it on a project to really deeply understand/kick the tires.
But, generally, I think Joistâs three-four killer/differentiating features are: