## Discussion


Matt

2024-05-20T13:08:03Z

To answer the question on Vector vs Graph DB:

They’re 100% different things. A vector DB holds arrays of numbers that look like blobs to a human. The queries you run against a vector DB are mostly “which vectors are most similar to this input vector?”, where your input vector is also an array of numbers.

A Graph DB, on the other hand, is not much more than a relational DB: nodes are like rows in tables, and edges are like foreign keys. Graph DBs store the run-of-the-mill information we typically think of when we think of storing things in a relational DB.
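A minimal sketch of that analogy using Python’s built-in sqlite3 (the table and column names here are made up for illustration): each edge row is just a pair of foreign keys into the node table, and a “who is connected to whom” graph query becomes a join through the edge table.

```python
# "Nodes are like rows, edges are like foreign keys" — sketched with sqlite3 (stdlib).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE nodes (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE edges (                      -- each edge row is two foreign keys
        src INTEGER REFERENCES nodes(id),
        dst INTEGER REFERENCES nodes(id)
    );
    INSERT INTO nodes VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
    INSERT INTO edges VALUES (1, 2), (2, 3); -- alice->bob, bob->carol
""")

# "Which nodes does alice connect to?" is just a join across the edge table.
rows = conn.execute("""
    SELECT n2.name FROM nodes n1
    JOIN edges e  ON e.src = n1.id
    JOIN nodes n2 ON n2.id = e.dst
    WHERE n1.name = 'alice'
""").fetchall()
print(rows)  # [('bob',)]
```

Dedicated graph DBs optimize exactly this kind of traversal, but the data model really is this close to relational.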


Justin Garrison

2024-05-20T13:39:32Z

Thanks for the explanation. Why is finding similar vectors important? Wouldn’t finding closely connected nodes be relevant? Do you have an example of a query or dataset I can experiment with?

Matt

2024-05-20T14:15:28Z

Vectors, in a vector DB context, represent “embeddings.”

An embedding is a representation of some problem/concept/idea as an N-dimensional vector. Usually these are generated by some ML algorithm. Helpful explanation, I know.

One of the most common embeddings is word2vec, where a word in natural language is converted into a vector. Why would someone convert natural language to a mathematical concept?

The nice property that falls out is that you can then start applying normal math operations to the text.

Something simpler to think about are hex codes for colors. `red` and `green` have their own hex codes:

red: 0xFF0000

green: 0x00FF00

Since they’re represented as numbers, we can add `red + green` and get `yellow`:

red 0xFF0000
+ green 0x00FF00
= yellow 0xFFFF00

And we can ask other questions about colors like “is orange (0xFFA500) more similar to yellow (0xFFFF00) or blue?”

Clearly orange is much more similar to yellow than blue, as the magnitude of the difference between the two colors is smaller.

Note: color is really a 3-dimensional vector here (R part, G part, B part), so it would be more accurate to model each color as a vector and find the cosine similarity between the vectors. The cosine similarity of orange and yellow is nearly 1 (very similar); the cosine similarity of orange and blue is 0 (no similarity).
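The color analogy can be sketched in a few lines of plain Python (no vector DB needed): add the hex components, then compute cosine similarity between colors treated as (R, G, B) vectors.

```python
# Colors as 3-d vectors: addition and cosine similarity, stdlib only.
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1 = same direction, 0 = orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

red    = (0xFF, 0x00, 0x00)
green  = (0x00, 0xFF, 0x00)
yellow = tuple(r + g for r, g in zip(red, green))   # (255, 255, 0) == 0xFFFF00
orange = (0xFF, 0xA5, 0x00)
blue   = (0x00, 0x00, 0xFF)

print(cosine_similarity(orange, yellow))  # ~0.98, very similar
print(cosine_similarity(orange, blue))    # 0.0, no similarity
```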

Back to word2vec:

Suffice it to say, some researchers figured out how to assign values to words so that vectors with a small cosine distance between them correspond to semantically similar text.
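A toy sketch of that idea, with made-up 3-dimensional vectors standing in for real word2vec embeddings: ranking a vocabulary by cosine similarity to a query word surfaces the semantically closest words first.

```python
# Toy illustration: the embedding values below are invented, not real word2vec
# output. Related words get vectors pointing in similar directions, so
# nearest-by-cosine roughly means nearest-in-meaning.
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

embeddings = {
    "cat": (0.9, 0.8, 0.1),
    "dog": (0.8, 0.9, 0.2),
    "car": (0.1, 0.2, 0.9),
}

query = embeddings["cat"]
ranked = sorted(embeddings, key=lambda w: cosine_similarity(embeddings[w], query),
                reverse=True)
print(ranked)  # "dog" ranks above "car" for the query "cat"
```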

There are embeddings everywhere these days: from general content embeddings that you can use to find similar content, to domain-specific embeddings for finding things like “abusive content”, “hate speech”, violence, gore, etc. (sorry for the negative slant, I used to work on abuse @ Meta).

Vector DBs provide an efficient way to search all the vectors you may be generating for your site. As an example, every photo upload @ Meta gets a set of vectors generated, which is then checked against a vector DB of known harmful images to see whether the new upload is similar to any known bad / banned imagery.
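What that lookup boils down to, as a brute-force Python sketch (the store, embedding values, and threshold here are all made up for illustration; real vector DBs use approximate-nearest-neighbor indexes such as HNSW so they don’t have to scan every vector):

```python
# Brute-force version of a vector DB similarity query: find the stored vector
# most similar to the query, and report a match if it clears a threshold.
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# Hypothetical store of embeddings for previously banned images.
known_bad = {
    "banned_image_1": (0.1, 0.9, 0.3),
    "banned_image_2": (0.7, 0.2, 0.6),
}

def most_similar(query, store, threshold=0.95):
    best_id, best_score = None, -1.0
    for item_id, vec in store.items():
        score = cosine_similarity(query, vec)
        if score > best_score:
            best_id, best_score = item_id, score
    return (best_id, best_score) if best_score >= threshold else (None, best_score)

upload_embedding = (0.12, 0.88, 0.31)  # deliberately close to banned_image_1
match, score = most_similar(upload_embedding, known_bad)
print(match, score)  # matches banned_image_1 with similarity > 0.95
```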

Justin Garrison

2024-05-20T23:58:35Z

Very insightful. Thank you.

Andrew Atkinson

Minneapolis, MN, USA

2024-05-21T05:02:44Z

Hey Justin - I’m a big fan of tmux as well (mentioned in outro) and use it daily. I learned from Brian Hogan’s book a decade back. He’s got a new version coming (see below). Potential guest?!

https://x.com/bphogan/status/1783939076149621216