Cloud SQL for PostgreSQL Vector Database
The Cloud SQL for PostgreSQL plugin provides indexer and retriever implementations that use PostgreSQL with the pgvector extension for vector similarity search.
Google Cloud SQL for PostgreSQL with the pgvector extension provides a fully managed PostgreSQL database with vector search capabilities. It combines the reliability and scalability of Google Cloud with the power of PostgreSQL and pgvector, making it ideal for production AI applications that need managed vector storage with enterprise-grade features.
Installation
Section titled “Installation”npm i --save genkitx-cloud-sql-pg
Configuration
Section titled “Configuration”To use this plugin, first create a PostgresEngine
instance:
import { PostgresEngine } from 'genkitx-cloud-sql-pg';
// Create PostgresEngine instanceconst engine = await PostgresEngine.fromInstance('my-project', 'us-central1', 'my-instance', 'my-database');
// Create the vector store tableawait engine.initVectorstoreTable('my-documents', 768);
// Or create a custom vector store tableawait engine.initVectorstoreTable('my-documents', 768, { schemaName: 'public', contentColumn: 'content', embeddingColumn: 'embedding', idColumn: 'custom_id', // Custom ID column name metadataColumns: [ { name: 'source', dataType: 'TEXT' }, { name: 'category', dataType: 'TEXT' }, ], metadataJsonColumn: 'metadata', storeMetadata: true, overwriteExisting: true,});
Then, specify the plugin when you initialize Genkit:
import { genkit } from 'genkit';import { postgres } from 'genkitx-cloud-sql-pg';import { vertexAI } from '@genkit-ai/vertexai';
const ai = genkit({ plugins: [ postgres([ { tableName: 'my-documents', engine: engine, embedder: vertexAI.embedder('gemini-embedding-001'), // Use additional fields to connect to a custom vector store table // schemaName: 'public', // contentColumn: 'custom_content', // embeddingColumn: 'custom_embedding', // idColumn: 'custom_id', // Match the ID column from table creation // metadataColumns: ['source', 'category'], // metadataJsonColumn: 'my_json_metadata', }, ]), ],});
// To use the table you configured when you loaded the plugin:await ai.index({ indexer: postgresIndexerRef, documents: [ { content: [{ text: 'The product features include...' }], metadata: { source: 'website', category: 'product-docs', custom_id: 'doc-123', // This will be used as the document ID }, }, ],});
// To retrieve from the configured table:const query = 'What are the key features of the product?';let docs = await ai.retrieve({ retriever: postgresRetrieverRef, query, options: { k: 5, filter: { category: 'product-docs', source: 'website', }, },});
Import retriever and indexer references like so:
import { postgresRetrieverRef, postgresIndexerRef } from 'genkitx-cloud-sql-pg';
Index Documents
Section titled “Index Documents”You can create reusable references for your indexers:
export const myDocumentsIndexer = postgresIndexerRef({ tableName: 'my-custom-documents', idColumn: 'custom_id', metadataColumns: ['source', 'category'],});
Then use them to index documents:
// Index with custom ID from metadataconst docWithCustomId = new Document({ content: [{ text: 'Document with custom ID' }], metadata: { source: 'test', category: 'docs', custom_id: 'custom-123', },});
await ai.index({ indexer: myDocumentsIndexer, documents: [docWithCustomId],});
// Index with custom batch sizeawait ai.index({ indexer: myDocumentsIndexer, documents: [ { content: [{ text: 'The product features include...' }], metadata: { source: 'website', category: 'product-docs', custom_id: 'doc-456', }, }, ], options: { batchSize: 10 },});
Indexing Options
Section titled “Indexing Options”The indexer supports:
- batchSize: Number of documents to process at once
- Custom ID and metadata handling through table configuration
Retrieve Documents
Section titled “Retrieve Documents”You can create reusable references for your retrievers:
export const myDocumentsRetriever = postgresRetrieverRef({ tableName: 'my-documents', idColumn: 'custom_id', metadataColumns: ['source', 'category'],});
Then use them to retrieve documents:
// Basic retrievalconst query = 'What are the key features of the product?';let docs = await ai.retrieve({ retriever: myDocumentsRetriever, query, options: { k: 5, // Number of documents to return (default: 4, max: 1000) filter: "source = 'website'", // Optional SQL WHERE clause },});
// Access retrieved documents and their metadataconsole.log(docs.documents[0].content); // Document contentconsole.log(docs.documents[0].metadata.source); // Metadata fieldsconsole.log(docs.documents[0].metadata.category);
Retriever Options
Section titled “Retriever Options”The retriever supports the following options:
k: Number of documents to return (default: 4, max: 1000) filter: SQL WHERE clause to filter results (e.g., “category = ‘docs’ AND source = ‘website’“)
Distance Strategies
Section titled “Distance Strategies”The retriever supports different distance strategies for vector similarity search:
import { DistanceStrategy } from 'genkitx-cloud-sql-pg';
// Configure retriever with specific distance strategyconst myDocumentsRetriever = postgresRetrieverRef({ tableName: 'my-documents', distanceStrategy: DistanceStrategy.COSINE_DISTANCE, // or EUCLIDEAN_DISTANCE});
Available strategies:
- COSINE_DISTANCE: Cosine similarity (default)
- EUCLIDEAN_DISTANCE: Euclidean distance
- DOT_PRODUCT: Dot product similarity
Metadata Handling
Section titled “Metadata Handling”The retriever preserves all metadata fields when returning documents. You can access both individual metadata columns and the JSON metadata column:
// Example 1: Search for product documentationconst productQuery = 'How do I configure the API rate limits?';const productDocs = await ai.retrieve({ retriever: myDocumentsRetriever, query: productQuery, options: { k: 3, filter: "category = 'api-docs' AND source = 'product-manual'", },});
// Example 2: Search for customer support articlesconst supportQuery = 'What are the troubleshooting steps for connection issues?';const supportDocs = await ai.retrieve({ retriever: myDocumentsRetriever, query: supportQuery, options: { k: 5, filter: "category = 'troubleshooting' AND source = 'support-kb'", },});
// Access retrieved documents and their metadataconsole.log(productDocs.documents[0].content); // Document contentconsole.log(productDocs.documents[0].metadata.source); // e.g., "product-manual"console.log(productDocs.documents[0].metadata.category); // e.g., "api-docs"console.log(productDocs.documents[0].metadata.lastUpdated); // e.g., "2024-03-15"
See the Retrieval-augmented generation page for a general discussion on indexers and retrievers.
The Postgresql plugin provides the retriever implementation to search a Cloud SQL for Postgresql database using the pgvector extension.
Google Cloud SQL for PostgreSQL with the pgvector extension provides a fully managed PostgreSQL database with vector search capabilities. It combines the reliability and scalability of Google Cloud with the power of PostgreSQL and pgvector, making it ideal for production AI applications that need managed vector storage with enterprise-grade features.
Configuration
Section titled “Configuration”To use this plugin, follow these steps:
-
Import the plugin
import "github.com/firebase/genkit/go/plugins/postgresql" -
Create a
PostgresEngine
instance:-
Using basic authentication
pEngine, err := NewPostgresEngine(ctx,WithUser('user'),WithPassword('password'),WithCloudSQLInstance('my-project', 'us-central1', 'my-instance'),WithDatabase('my-database') -
Using email authentication
pEngine, err := NewPostgresEngine(ctx,WithCloudSQLInstance('my-project', 'us-central1', 'my-instance'),WithDatabase('my-database'),WithIAMAccountEmail('mail@company.com')) -
Using custom pool
pool, err := pgxpool.New(ctx, "add_your_connection_string")if err != nil {return err}pEngine, err := NewPostgresEngine(ctx,WithDatabase("db_test"),WithPool(pool))
-
-
Create the Postgres plugin
-
Using plugin method Init
postgres := &postgresql.Postgres{engine: pEngine,}if err := (postgres).Init(ctx, g); err != nil {return err} -
Using the genkit method init
postgres := &postgresql.Postgres{engine: pEngine,}g, err := genkit.Init(ctx, genkit.WithPlugins(postgres))if err != nil {return err}
-
To add documents to a Postgresql index, first create a retrieve definition that specifies the features of the table:
cfg := &postgresql.Config{ TableName: 'documents', SchemaName: 'public', ContentColumn: "content", EmbeddingColumn: "embedding", MetadataColumns: []string{"source", "category"}, IDColumn: "custom_id", MetadataJSONColumn: "custom_metadata", Embedder: embedder, EmbedderOptions: nil,}
doc, retriever, err := postgresql.DefineRetriever(ctx, g, postgres, cfg)if err != nil { return err}
docs := []*ai.Document{{ Content: []*ai.Part{{ Kind: ai.PartText, ContentType: "text/plain", Text: "The product features include...", }}, Metadata: map[string]any{"source": "website", "category": "product-docs", "custom_id": "doc-123"}, }}
if err := doc.Index(ctx, docs); err != nil { return err}
Similarly, to retrieve documents from an index, use the retrieve method:
d2 := ai.DocumentFromText( "The product features include..." , nil)
resp, err := retriever.Retrieve(ctx, &ai.RetrieverRequest{ Query: d2, k:5, filter: "source='website' AND category='product-docs'"})
if err != nil { return err}
It’s also possible to use the Retrieve method from Retriever
_, retriever, err := postgresql.DefineRetriever(ctx, g, postgres, cfg)if err != nil { return err}
d2 := ai.DocumentFromText( "The product features include..." , nil)
retrieverOptions := &postgresql.RetrieverOptions{ k:5, filter: "source='website' AND category='product-docs'"}
resp, err := ai.Retrieve(ctx, retriever,ai.WithDocs(d2), &ai.WithConfig(retrieverOptions))if err != nil { return err}
See the Retrieval-augmented generation page for a general discussion on using retrievers for RAG.