PGVector integration #3351

BjornMoren · 2024-12-12T06:24:36Z

It would be nice to be able to send an embedding vector directly to PostgreSQL without having to serialize it into a string first.

// Example with Transformers.js library and PGVector extension
const transformers = await import('@xenova/transformers');
const similarityEmbedder = await transformers.pipeline('feature-extraction', 'Xenova/paraphrase-MiniLM-L6-v2');
const result = await similarityEmbedder(text, { pooling: 'mean', normalize: true });

const embedding = '[' + [...result.data].join(',') + ']';   // Please get rid of this step so we can send the Float32Array directly 

await client.query(
   `INSERT INTO post_embedding (post_id, embedding) 
   VALUES ($1, $2)`,
   [postID, embedding]);


dannyb101 · 2025-02-21T14:38:15Z

I would like to see this feature as well

brianc · 2025-02-27T21:14:38Z

definitely happy to work on this! if you can submit a self-contained snippet of code that works w/o needing to call any 3rd party APIs I can probably whip this out extremely quickly.

BjornMoren · 2025-02-27T21:39:07Z

Sounds great brianc. Will the below work for you?

The embedding column below is of type Vector.

// const transformers = await import('@xenova/transformers');
// const similarityEmbedder = await transformers.pipeline('feature-extraction', 'Xenova/paraphrase-MiniLM-L6-v2');
// const result = await similarityEmbedder("The cat sat on a mat.", { pooling: 'mean', normalize: true });
// const embedding = result.data;

// Simulated 384 vector embedding from Xenova/paraphrase-MiniLM-L6-v2 
const embedding = new Float32Array(384).map(() => Math.random() * 0.4 - 0.2);

await client.query(
   `INSERT INTO post_embedding (post_id, embedding) 
   VALUES ($1, $2)`,
   [postID, embedding]);

brianc · 2025-02-27T21:40:35Z

preferably something I don't need to npm install - looks like that has a dependency on @xenova/transformers & I'm not sure what that does exactly. Basically I want to avoid taking deps on things when writing unit tests for the feature.

BjornMoren · 2025-02-27T21:41:10Z

No those lines are commented out.

const embedding = new Float32Array(384).map(() => Math.random() * 0.4 - 0.2);

await client.query(
   `INSERT INTO post_embedding (post_id, embedding) 
   VALUES ($1, $2)`,
   [postID, embedding]);

brianc · 2025-02-27T21:43:02Z

ohhh nice - i see i see. what about the create temp table statement for the query? Would be useful to know the types of the columns. The idea is this needs to run in CI - so the more greased we can make the rails for me to dig into it the easier I can turn it around.

BjornMoren · 2025-02-27T21:48:13Z

All good?

CREATE TABLE IF NOT EXISTS public.post_embedding
(
    post_id integer NOT NULL,
    embedding vector(384) NOT NULL
)

TABLESPACE pg_default;

ALTER TABLE IF EXISTS public.post_embedding
    OWNER to postgres;

brianc · 2025-02-27T21:49:41Z

aye ty - i'll take a look at this soon!


BjornMoren · 2025-02-27T21:53:52Z

Sorry, I should have mentioned that your PG installation must have the PGVector extension installed, or the column type "vector" is not defined.

https://github.com/pgvector/pgvector
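
For the CI case discussed above, the extension, table, and insert could be collected into one dependency-free script. This is only a sketch under assumptions: `buildRepro` and `runRepro` are hypothetical names, the connecting role is assumed to be allowed to run `CREATE EXTENSION`, and a temporary table stands in for `public.post_embedding` so nothing persists after the session. It keeps the string serialization from the original post, since that is the form known to work today.

```javascript
// Hypothetical helper: builds the statements for a self-contained repro.
function buildRepro(dimensions = 384) {
  if (!Number.isInteger(dimensions) || dimensions < 1) {
    throw new RangeError('dimensions must be a positive integer');
  }
  // Simulated embedding; a real one would come from an embedding model.
  const embedding = new Float32Array(dimensions).map(() => Math.random() * 0.4 - 0.2);
  return [
    // The "vector" column type only exists once the extension is installed.
    { text: 'CREATE EXTENSION IF NOT EXISTS vector' },
    {
      text: `CREATE TEMP TABLE post_embedding (
        post_id integer NOT NULL,
        embedding vector(${dimensions}) NOT NULL
      )`,
    },
    {
      text: 'INSERT INTO post_embedding (post_id, embedding) VALUES ($1, $2)',
      // The serialization step this issue asks to get rid of.
      values: [1, '[' + Array.from(embedding).join(',') + ']'],
    },
  ];
}

// Runs the statements in order on anything exposing query(text, values),
// for example a connected pg Client.
async function runRepro(client) {
  for (const { text, values } of buildRepro()) {
    await client.query(text, values);
  }
}
```

Swapping the serialized string for the raw `Float32Array` in the last statement would then give the failing case the feature request is about.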

brianc · 2025-02-27T22:03:22Z

hmmm this is not straightforward & I'm not sure there's a backwards compatible way to do this actually. I just tried but it doesn't work because node-postgres already automatically converts ArrayBuffers (and typed arrays) to be compatible with the BYTEA column type (binary). You can try mapping your typed array to a normal javascript array of numbers....that might work?
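
One way to sidestep the BYTEA conversion without changing node-postgres might be to wrap the typed array in a plain object that defines a `toPostgres` method, which node-postgres calls when preparing object parameters. A minimal sketch, where `PgVector` is a hypothetical class name and not part of node-postgres or pgvector:

```javascript
// Hypothetical wrapper; not part of node-postgres or pgvector.
class PgVector {
  constructor(values) {
    // Copy into a plain array so later changes to the source do not leak in.
    this.values = Array.from(values);
    if (!this.values.every(Number.isFinite)) {
      throw new TypeError('PgVector expects only finite numbers');
    }
  }

  // node-postgres calls toPostgres() on object parameters that define it.
  // pgvector's text format is a bracketed, comma-separated list: [1,2,3]
  toPostgres() {
    return '[' + this.values.join(',') + ']';
  }
}

// Small, exactly representable values keep the output easy to read.
const embedding = new Float32Array([1, 0.5, -2]);
console.log(new PgVector(embedding).toPostgres()); // prints [1,0.5,-2]
```

With a connected client this would be passed as `[postID, new PgVector(embedding)]` in place of the hand-built string. The value still travels as text, so this hides the serialization step rather than removing it.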

BjornMoren · 2025-02-27T22:20:06Z

I can't get that to work unfortunately.

What my example above shows is a standard operation when working with LLMs (Large Language Models), which are becoming very popular due to the explosion of AI. They process large amounts of data: in this case a vector of 384 dimensions, but other, more popular models have 2000 dimensions or more. This kind of workload is becoming more common and needs an efficient way to send vectors to the database.

On the other hand, when high performance is needed you normally move these operations out of Node and install a proper LLM server instead. I guess it is your call whether this is something you think your PG library should have.

charmander added the feature request label Dec 13, 2024
