FROM guests SELECT Andrew
Andrew Atkinson joins Autumn & Justin to tell them why folks should (and are) picking PostgreSQL as their database in 2024 and how to scale it.
Discussion
Matt
2024-05-20T13:08:03Z
To answer the question on vector vs. graph DBs:
They're 100% different things. A vector DB holds arrays of numbers that look like a blob to a human. The queries you run against a vector DB are mostly "what are the vectors most similar to this input vector?", where your input vector is also an array of numbers.
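As a rough sketch of that query, assume a tiny in-memory store of made-up 3-dimensional vectors and brute-force Euclidean (L2) distance. The ids and values are hypothetical. Production vector DBs use approximate indexes and may default to cosine or inner-product distance instead.

```python
import math

# Hypothetical toy vector store: id -> embedding. Real embeddings have hundreds of dimensions.
STORE = {
    "doc-a": [0.9, 0.1, 0.0],
    "doc-b": [0.8, 0.2, 0.1],
    "doc-c": [0.0, 0.1, 0.9],
}


def nearest(query, store, k=2):
    """Return the ids of the k stored vectors with the smallest L2 distance to `query`.

    Brute force: every stored vector is scored, which is what an index lets a real DB avoid.
    """
    ranked = sorted(store, key=lambda key: math.dist(query, store[key]))
    return ranked[:k]


print(nearest([1.0, 0.0, 0.0], STORE))  # ['doc-a', 'doc-b']
```

In PostgreSQL, the pgvector extension expresses the same question in SQL, roughly `SELECT id FROM items ORDER BY embedding <-> $1 LIMIT 5`. Here `<->` is pgvector's L2-distance operator, and `items`/`embedding` are hypothetical table and column names.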
A graph DB, on the other hand, is not much more than a relational DB. Nodes are like rows in tables, and edges are like foreign keys. Graph DBs store the run-of-the-mill information we typically think of storing in a relational DB.
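The following sketch makes the analogy concrete using Python's built-in `sqlite3` module. Nodes live in one table, and each edge is a row holding two foreign keys. The table names and the `alice`/`bob`/`carol` data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces REFERENCES when this is on
conn.executescript("""
    CREATE TABLE nodes (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- Each edge row points at two node rows via foreign keys.
    CREATE TABLE edges (
        src INTEGER NOT NULL REFERENCES nodes(id),
        dst INTEGER NOT NULL REFERENCES nodes(id)
    );
    INSERT INTO nodes (id, name) VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
    INSERT INTO edges (src, dst) VALUES (1, 2), (2, 3);
""")


def neighbors(name):
    """Names of nodes reachable from `name` by following one outgoing edge."""
    rows = conn.execute(
        """
        SELECT dst_node.name
        FROM nodes AS src_node
        JOIN edges ON edges.src = src_node.id
        JOIN nodes AS dst_node ON dst_node.id = edges.dst
        WHERE src_node.name = ?
        ORDER BY dst_node.name
        """,
        (name,),
    )
    return [row[0] for row in rows]


print(neighbors("bob"))  # ['carol']
```

Following edges more than one hop needs a recursive query (`WITH RECURSIVE` in both SQLite and PostgreSQL). Graph DBs are built to make that kind of traversal the primary operation.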
Matt
2024-05-20T13:17:30Z
…where your* input vector…
Justin Garrison
2024-05-20T13:39:32Z
Thanks for the explanation. Why is finding similar vectors important? Wouldn't finding closely connected nodes be relevant? Do you have an example of a query or dataset I can experiment with?
Matt
2024-05-20T14:15:28Z
Vectors, in a vector DB context, represent "embeddings."
An embedding is a representation of some problem/concept/idea as an N-dimensional vector. Usually these are generated by some ML algorithm. Helpful explanation, I know.
One of the most common embeddings is word2vec, where words in natural language are converted into vectors. Why would someone convert natural language into a mathematical object?
The nice property that falls out is that you can then start applying normal math operations to the sentence.
Something simpler to think about is hex codes for colors.
Red and green have their own hex codes, 0xFF0000 and 0x00FF00. Since they're represented as numbers, we can add red + green and get yellow (0xFFFF00).
And we can ask other questions about colors, like "is orange (0xFFA500) more similar to yellow (0xFFFF00) or to blue?"
Clearly orange is much more similar to yellow than to blue, as the magnitude of the difference between orange and yellow is smaller.
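Here is a minimal sketch of the color example. It assumes each hex code is split into an (R, G, B) vector of three channels from 0 to 255. `hex_to_rgb` is a hypothetical helper, and Euclidean distance stands in for "the magnitude of the difference."

```python
import math


def hex_to_rgb(code):
    """Split a 24-bit color such as 0xFFA500 into an (R, G, B) tuple of 0-255 ints."""
    return ((code >> 16) & 0xFF, (code >> 8) & 0xFF, code & 0xFF)


RED, GREEN, BLUE = 0xFF0000, 0x00FF00, 0x0000FF
YELLOW, ORANGE = 0xFFFF00, 0xFFA500

# Adding red and green channel by channel gives yellow.
mixed = tuple(r + g for r, g in zip(hex_to_rgb(RED), hex_to_rgb(GREEN)))
print(mixed == hex_to_rgb(YELLOW))  # True

# Smaller Euclidean distance between RGB vectors means more similar colors.
d_yellow = math.dist(hex_to_rgb(ORANGE), hex_to_rgb(YELLOW))
d_blue = math.dist(hex_to_rgb(ORANGE), hex_to_rgb(BLUE))
print(d_yellow)           # 90.0
print(d_yellow < d_blue)  # True
```

Channel-wise addition works cleanly here only because red and green occupy different channels. Mixing colors that share a channel would need clamping at 255.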
Back to word2vec:
Suffice it to say, some researchers figured out what values to give words to produce vectors for which a small cosine distance indicates semantically similar text.
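Cosine distance is one minus the cosine of the angle between two vectors. The following minimal sketch uses hand-picked, hypothetical 3-dimensional "word vectors." Real word2vec vectors are learned from text and typically have a few hundred dimensions.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity: 0 for same direction, 1 for orthogonal, 2 for opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


# Hypothetical toy vectors, chosen by hand so that "cat" and "kitten" point the same way.
cat = [0.9, 0.8, 0.1]
kitten = [0.85, 0.75, 0.2]
car = [0.1, 0.2, 0.95]

print(cosine_distance(cat, kitten) < cosine_distance(cat, car))  # True
```

Because it depends only on the angle, cosine distance ignores vector length, which is one reason it is a common choice for comparing embeddings.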
There are embeddings everywhere these days, from general content embeddings that you can use to find similar content, to domain-specific embeddings that pick out things like "abusive content," "hate speech," violence, gore, etc. (sorry for the negative slant, I used to work on abuse @ Meta).
Vector DBs provide an efficient way to search all the vectors you may be generating for your site. As an example, every photo uploaded @ Meta gets a set of vectors generated, which are then checked against a vector DB of known harmful images to see whether the new upload is similar to any known bad or banned imagery.
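Here is a minimal sketch of that kind of check. It assumes cosine similarity, a hypothetical threshold of 0.95, and a couple of made-up "known bad" vectors. It only illustrates the idea and is not how Meta's system is implemented. At real scale, the linear scan would be replaced by a vector index.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


# Hypothetical embeddings of previously banned images.
KNOWN_BAD = {
    "banned-001": [0.2, 0.9, 0.4],
    "banned-002": [0.7, 0.1, 0.7],
}


def match_known_bad(upload_vec, known=KNOWN_BAD, threshold=0.95):
    """Return ids of known-bad vectors whose similarity to the upload is >= threshold."""
    return [key for key, vec in known.items() if cosine_similarity(upload_vec, vec) >= threshold]


print(match_known_bad([0.21, 0.88, 0.41]))  # ['banned-001']
print(match_known_bad([1.0, 0.0, 0.0]))     # []
```

Raising `threshold` cuts false positives but lets more heavily edited near-duplicates slip through, and lowering it does the opposite.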
Justin Garrison
2024-05-20T23:58:35Z
Very insightful. Thank you.
Andrew Atkinson
Minneapolis, MN, USA
2024-05-21T05:02:44Z
Hey Justin, I'm a big fan of tmux as well (mentioned in the outro) and use it daily. I learned it from Brian Hogan's book a decade back. He's got a new version coming (see below). Potential guest?!
https://x.com/bphogan/status/1783939076149621216