The ORMazing show

2023-05-22T00:46:49Z ago

on the “can Joist do the joins” section of the discussion (around minutes 25-30), you may be interested in this approach which

Can speak to any arbitrary backend (SQL, Rest, gRPC, etc.)
Allows one to express the full depth of their query. E.g., customer.queryInvoices().queryLines().queryTrack().whereName(P.startsWith('C'))
For SQL backends, can optimize those queries to issue joins

Example: https://observablehq.com/[@tantaman](/person/tantaman/aphrodite-chinook

I’ve mostly abandoned that ORM to focus on https://vlcn.io but I’m free to discuss how the ORM works, how to build a query planner and optimizer, how to optimize query plans for different backends, etc.

2023-05-24T01:03:29Z ago

Yo! Amazing that you pinged. I had noticed vlcn-orm while poking at around! :-)

In terms of your #2, I believe we have that (if you just mean deep where clauses), it would look like:

em.find(Customer, { invoices: { lines: { track: { name: { startsWith: “c” } } } } });

(…disclaimer I’m realizing we don’t actually have a startsWith operator, we just have like/ilike so far…)

But I’m definitely curious what sort of optimizations we might do? Currently the ^ turns into:

SELECT * FROM customers JOIN invoices/lines/tracks/etc. WHERE t.name LIKE ‘c%’

Granted, this is for WHERE clauses; the “Joist can do joins” segment iirc was also discussing less of WHERE clauses per se, and more loading multiple entities/tables in 1 query, i.e.:

const author = await em.load(Author, “a:1”, { books: “reviews” });

Could/should (it does not today) do a single SELECT a.*, b.*, r.* FROM ... that joins into books + reviews to bring back three levels of entities all in 1 SQL query (as long as both “books” and “reviews” database-based relations and not domain-layer abstractions).

This is specifically what we don’t have implemented yet, although https://github.com/stephenh/joist-ts/blob/main/packages/orm/src/dataloaders/lensDataLoader.ts (which does something different, but related) has the guts of the “take a load hint and find out which tables to walk/join across” infra–we just haven’t taught em.populate (our preload hint implementation) how to: a) split load hints into “db-based” vs. “model-based” relations, and then b) for the db-based relations, build the SELECT a.*, b.*, r.* FROM ... statement.

But also happy to learn of any optimizations we could/should be doing there.

In terms of vlnc-orm/Aphrodite, I just found SQLExpression and the concept of hoisting, but admittedly in like 1 minute of reading, I’m not entirely sure what optimizations it’s doing. I’ll keep looking at it, but would love a tldr/overview. :-)

If you’d like to reply here, that’s great, but also feel free to follow up at stephen.haberman@gmail.com or drop an issue/discussion into the joist-ts github project. Thank you!

2023-05-25T15:32:00Z ago

em.find(Customer, { invoices: { lines: { track: { name: { startsWith: “c” } } } } });

Is this finding the Customers that have purchased tracks that start with “c”?

I was thinking of the case where you have a specific customer node and want to go from that customer to all their purchased tracks that start with “c”.

From the interview I got the impression that that wasn’t possible in joist and that there’d be many round-trips. E.g.,

Trip 1: Load the Customer entity
Trip 2: Load their invoices
Trip 3: For each invoice, load the linked line items (this step gets batched by data loader)
Trip 4: For each line, load the related track (also batched by data loader) with the startsWith filter

That’s where you all got into the conversation of stubbing out promises so you could know the intent of the user to load more data.

The “stubbing out promises” idea is approximately related to the route I was taking in Aphrodite.

For each model I generate two classes:

The model class
A query class (the analog to the “stubbed promise”?)

The Query class allows the user to express their full intent of everything they’d like to load / the full graph of objects they’d like to fetch and then work with.

A ModelClass has getters for each field and queryX for each edge from that model
A QueryClass has queryX, whereX, orderByX for each edge and field. queryX on a query class takes an prior query as its input rather than an actually loaded model.

E.g.,

Customer.queryById(customerId).queryInvoices().queryLines().queryTrack()

Says:
“Starting from customerId, fetch me their purchased tracks by traversing through their invoices and the invoice line items”

Looks like this is the same idea as the lens data loader you linked.

Examples of the generated code:

Query class: CustomerQuery
Model class: CustomerBase

Note that the predicates and “model field getters” you see are for the cases where an operation provided by the user cannot be hoisted into a SQLExpression. In those cases the filter will be applied in JavaScript rather than SQL. This allows some interesting features like filtering by a lambda: TodoList.queryAll().whereLambda(() => ...)

In terms of vlnc-orm/Aphrodite, I just found SQLExpression and the concept of hoisting

The expression abstraction was to allow different backends besides SQL. The idea is that a SourceExpression consumes all the steps of a query it encounters (created by the query classes) and converts them to operations against the backend represented by that expression.

The big idea was to support entities that come from different backends (SQL / Redis / ZippyDB / Memory) and have edges to one another. Imagine having a single interface to gather and relate all your data, even across data stores.

Anyway, if you’d like to keep reading 😅 I wrote a bit about queries and expressions: https://tantaman.com/2022-05-26-query-planning.html

The simplest Expression is the MemorySourceExpression which just allows querying objects that reside in-memory.

2023-05-26T02:07:18Z ago

Is this finding the Customers that have purchased tracks that start with “c”?

Ah yep, it is…

I was thinking of the case where you have a specific customer node and want to
go from that customer to all their purchased tracks that start with “c”.

Oh sure. Yes, if doing await customer.populate({ invoices: { lineItems: "tracks" } }) and then doing gets/maps/flatMaps against the in-memory data structures, then, yes, it will load all ~3-4 levels into memory with one-per-level SQL queries. This is generally a very idiomatic pattern for us/Joist, and so far one-query-per-level vs. the obviously worse N+1 has been enough of an optimization for us.

That said, we also have lens-based path navigation, which is closer to what you mentioned, like:

const tracks = await customer.load(c => c.invoices.lines.tracks);

But currently they don’t support conditions (something like lines.tracks.if(t => t.name.startsWith(...))), but we can do the hops via just SQL (so hoisting, if I’m following your terminology–that code is the lensDataLoader from my previous comment), however the SQL-joined based “fast path” is behind an opt-in flag, like:

const tracks = await customer.load(c => c.invoices.lines.tracks, { sql: true });

The rationale is that traditionally Joist prioritizes a consistent view of any WIP changes in the unit-of-work/loaded graph, so if we were to execute ^ as SQL by default, then tracks might be missing Tracks that have WIP mutations to an InvoiceLineItem, or a newly created Invoice that has lines + tracks that would also match the condition, but are not yet flushed to SQL for the SQL join to find them.

So, I wrote the { sql: true } flag as a proof-of-concept optimization for our path navigation, but we have so much business logic in our app that relies on this “consistent view of WIP changes” assumption/principle, that we’re generally scared/hesitant to actually use the SQL-only feature, except in opt-in cases where we know there are no WIPs.

…granted, we have all the info to detect whether any Track.invoiceLine or InvoiceLine.invoice Fks have been mutated during this UOW, so we could potentially execute the fast-path method if we know for sure there are no WIP changes that affect that path.

Even then, we’d discussed this, but were nervous about the non-obvious performance difference from the same exact code, i.e. it opaquely changing from “faster” to “slower” because of out-of-sight WIP changes in the UOW.

But, perhaps we’re being too cautious…

The idea is that a SourceExpression consumes all the steps of a query it encounters

Ah! That’s interesting. We originally coupled our find/query DSL directly to a SQL query library (Knex), but eventually put an AST between the two, and that has been great + driven a lot of features that previously would have been a PITA (the big win being auto-batching of SELECT queries issued in a loop).

We’ve not yet gone to non-SQL backends, and honestly probably will not–I do really share the insight/opinion that the domain/entity model is a graph, and so business logic/graph traversal logic can be pleasantly uncoupled from the backend SQL/NoSQL/InMemory/etc.

But, that said, nearly all of Joist has been driven by day-to-day needs at the day job, where the pragmatism of “meh we’ll just assume postgres” (not just assume SQL :-)) has been an admittedly lazy but very pleasant simplification.

relate all your data, even across data stores.

Ooh! Yeah, that is fancy, and tempting… Because, yes, a lot of Joist’s principles (declarative business logic, cross-entity validation rules, etc.) are today implemented with a “just use a monolith and it’ll be fine” assumption/naivety, but the holy grail, for me, is, yes, applying the same principles across an enterprise’s entire domain model–which today requires programmers to use various glue, events, background jobs, RPCs, etc. to keep in sync. All of which is the same spaghetti code that frontend engineers wrote before declarative reactivity. But, that is a big fish to fry. Maybe someday. :-)

2023-05-26T13:45:16Z ago

The rationale is that traditionally Joist prioritizes a consistent view of any WIP changes in the unit-of-work/loaded graph, so if we were to execute ^ as SQL by default, then tracks might be missing Tracks that have WIP mutations to an InvoiceLineItem, or a newly created Invoice that has lines + tracks that would also match the condition, but are not yet flushed to SQL for the SQL join to find them.

Oh interesting. I’d never considered that but I see how that could be a real issue.

Ah! That’s interesting. We originally coupled our find/query DSL directly to a SQL query library (Knex),

I ended up doing exactly the same thing, even starting with Knex 😅

Anyway, I’ve fully embraced SQL at this point (w/ some syntax extensions…) but maybe I’ll find myself back in the ORM space one day.

If you haven’t seen it, you might be interested in browsing https://entgo.io/ for ideas.

It’s another ORM modeled after Facebook’s internal Ent Framework and represents your data as a graph.

Btw, how does Joist differentiate itself from things like Drizzle and Prisma?

2023-05-26T19:19:09Z ago

entgo

Yeah! Agreed it looks interesting. I think their most impressive feature, for me, was the ability of middleware to not just add lifecycle hooks (on author save, on book insert), but also do rewrites of queries before they hit the wire.

I.e. Joist has “automatically ignore soft deleted rows” as a feature, but we just hard-coded the “AND deleted_at != null” into the few places internally that build out queries, where as Entgo can implement soft-deletes a generic middleware layer that can modify the query AST. Which is neat! I’d like to clone that approach at some point.

Their migration approach is cool too–historically I’ve been very satisfied with “just hand-write migrations that do each incremental change”, vs. approaches where you declare your desired schema, and the tool auto-generates diff from current schema -> desired schema. But entgo’s “desired schema” is built-up with a DSL that looks neat, and can do some of the same “soft delete as a mixin” type reuse.

Those were my two main takeaways–otherwise the Go syntax throws me off just enough that it takes me awhile to dig through and really find/grok any “not just a regular ORM” features to use as inspiration. Definitely lmk if you think there are any particularly unique/interesting features that it has.

Btw, how does Joist differentiate itself from things like Drizzle and Prisma?

Afaict both Drizzle and Prisma are more query builders than “traditional ORMs”, which traditional ORM have things like rich domain models with validation rules and business logic. At least Prisma in particular afaiu the only place to add “validate this Author” business logic is in generic middleware? Not sure…I haven’t used it on a project to really deeply understand/kick the tires.

But, generally, I think Joist’s three-four killer/differentiating features are:

Guaranteed N+1 prevention, not just within a single query, but across all queries/abstractions in your app (i.e. either accidentally or purposefully lazy loading the object graph in a loop)
Business logic & validation rules
Great support for domain model-layer abstractions like derived fields, derived collections, etc., that have business logic that is not literally columns in your database, but that you can load/preload seamlessly with the rest of the actual-in-the-database fields & relations (big win/required imo vs. approaches that go direct db -> API and have no place to put the ~10-20% of stuff that is not “just your database columns”)
Novel support for automatically/reactively running cross-entity validation rules & derived fields (basically “bringing reactivity to the backend”)