Cloud Firestore Vector Search
The Firebase plugin provides vector search integration with Cloud Firestore, enabling you to build intelligent RAG (Retrieval-Augmented Generation) applications with scalable document indexing and retrieval.
Installation
Install the Firebase plugin with npm:
```bash
npm install @genkit-ai/firebase
```
Prerequisites
Firebase Project Setup
- All Firebase products require a Firebase project. You can create a new project or enable Firebase in an existing Google Cloud project using the Firebase console.
- If you deploy flows with Cloud Functions, upgrade your Firebase project to the Blaze plan.
Firebase Admin SDK Initialization
You must initialize the Firebase Admin SDK in your application. This is not handled automatically by the plugin.
```typescript
import { initializeApp } from 'firebase-admin/app';

initializeApp({
  projectId: 'your-project-id',
});
```
The plugin requires you to specify your Firebase project ID. You can do so in either of the following ways:
- Set `projectId` in the `initializeApp()` configuration object, as shown in the snippet above.
- Set the `GCLOUD_PROJECT` environment variable. If you're running your flow from a Google Cloud environment (Cloud Functions, Cloud Run, and so on), `GCLOUD_PROJECT` is automatically set to the project ID of the environment.
  If you set `GCLOUD_PROJECT`, you can omit the configuration parameter in `initializeApp()`.
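The two ways of supplying the project ID can be checked up front, before `initializeApp()` runs. The following is a minimal sketch, not part of the plugin or the Admin SDK: `resolveProjectId` and the `Env` type are hypothetical names, and preferring the explicit value over the environment variable is an assumption made for illustration.

```typescript
// Hypothetical helper: not part of @genkit-ai/firebase or firebase-admin.
type Env = Record<string, string | undefined>;

function resolveProjectId(explicit: string | undefined, env: Env): string {
  // Assumption: an explicitly configured projectId wins over GCLOUD_PROJECT.
  const projectId = explicit || env.GCLOUD_PROJECT;
  if (!projectId) {
    throw new Error(
      'No Firebase project ID found: pass projectId to initializeApp() or set GCLOUD_PROJECT.',
    );
  }
  return projectId;
}

// Example usage with a stand-in environment object instead of process.env.
const fromEnv = resolveProjectId(undefined, { GCLOUD_PROJECT: 'env-project' });
const fromConfig = resolveProjectId('explicit-project', { GCLOUD_PROJECT: 'env-project' });
console.log(fromEnv, fromConfig);
```

In a real application you would pass `process.env` as the second argument and fail fast at startup if neither source is set.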
Credentials
To provide Firebase credentials, you also need to set up Google Cloud Application Default Credentials. To specify your credentials:
- If you're running your flow from a Google Cloud environment (Cloud Functions, Cloud Run, and so on), credentials are set automatically.
- For other environments:
  - Generate service account credentials for your Firebase project and download the JSON key file. You can do so on the Service account page of the Firebase console.
  - Set the environment variable `GOOGLE_APPLICATION_CREDENTIALS` to the file path of the JSON file that contains your service account key, or set the environment variable `GCLOUD_SERVICE_ACCOUNT_CREDS` to the content of the JSON file.
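When debugging credential problems, it can help to log which of the sources above is configured. The sketch below is one way to do that. `describeCredentialSource` is a hypothetical helper, and the order in which it checks the variables is an assumption for reporting purposes only; it does not claim to reflect how the plugin or the Admin SDK resolves credentials.

```typescript
// Hypothetical diagnostic helper: not part of the plugin or the Admin SDK.
type Env = Record<string, string | undefined>;

type CredentialSource =
  | { kind: 'inline-json'; clientEmail: string }
  | { kind: 'key-file'; path: string }
  | { kind: 'application-default' };

function describeCredentialSource(env: Env): CredentialSource {
  const inline = env.GCLOUD_SERVICE_ACCOUNT_CREDS;
  if (inline) {
    // This variable holds the JSON content itself, so it must parse.
    const parsed = JSON.parse(inline) as { client_email?: string };
    return { kind: 'inline-json', clientEmail: parsed.client_email ?? '(unknown)' };
  }
  const keyFile = env.GOOGLE_APPLICATION_CREDENTIALS;
  if (keyFile) {
    // This variable holds a file path, not the key content.
    return { kind: 'key-file', path: keyFile };
  }
  // Neither variable is set: rely on the ambient Google Cloud environment.
  return { kind: 'application-default' };
}

// Example usage with a stand-in environment object instead of process.env.
console.log(describeCredentialSource({ GOOGLE_APPLICATION_CREDENTIALS: '/path/to/key.json' }));
```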
Cloud Firestore vector search
You can use Cloud Firestore as a vector store for RAG indexing and retrieval.
This section contains information specific to the firebase plugin and Cloud Firestore’s vector search feature. See the Retrieval-augmented generation page for a more detailed discussion on implementing RAG using Genkit.
Using GCLOUD_SERVICE_ACCOUNT_CREDS and Firestore
If you pass service account credentials directly via `GCLOUD_SERVICE_ACCOUNT_CREDS` and also use Firestore as a vector store, you need to pass the credentials directly to the Firestore instance during initialization. Otherwise, depending on plugin initialization order, the singleton may be initialized with Application Default Credentials.
```typescript
import { initializeApp } from 'firebase-admin/app';
import { getFirestore } from 'firebase-admin/firestore';

const app = initializeApp();
const firestore = getFirestore(app);

// Only override the Firestore settings when inline credentials are provided.
if (process.env.GCLOUD_SERVICE_ACCOUNT_CREDS) {
  const serviceAccountCreds = JSON.parse(
    process.env.GCLOUD_SERVICE_ACCOUNT_CREDS,
  );
  const authOptions = { credentials: serviceAccountCreds };
  firestore.settings(authOptions);
}
```
Define a Firestore retriever
Use `defineFirestoreRetriever()` to create a retriever for Firestore vector-based queries.
```typescript
import { defineFirestoreRetriever } from '@genkit-ai/firebase';
import { initializeApp } from 'firebase-admin/app';
import { getFirestore } from 'firebase-admin/firestore';

const app = initializeApp();
const firestore = getFirestore(app);

const retriever = defineFirestoreRetriever(ai, {
  name: 'exampleRetriever',
  firestore,
  collection: 'documents',
  contentField: 'text', // Field containing document content
  vectorField: 'embedding', // Field containing vector embeddings
  embedder: yourEmbedderInstance, // Embedder to generate embeddings
  distanceMeasure: 'COSINE', // Default is 'COSINE'; other options: 'EUCLIDEAN', 'DOT_PRODUCT'
});
```
Retrieve documents
To retrieve documents using the defined retriever, pass the retriever instance and query options to `ai.retrieve`.
```typescript
const docs = await ai.retrieve({
  retriever,
  query: 'search query',
  options: {
    limit: 5, // Optional: Return up to 5 documents
    where: { category: 'example' }, // Optional: Filter by field-value pairs
    collection: 'alternativeCollection', // Optional: Override default collection
  },
});
```
Available Retrieval Options
The following options can be passed to the `options` field in `ai.retrieve`:
- `limit`: (number) Specify the maximum number of documents to retrieve. Default is `10`.
- `where`: (Record<string, any>) Add additional filters based on Firestore fields. Example: `where: { category: 'news', status: 'published' }`
- `collection`: (string) Override the default collection specified in the retriever configuration. This is useful for querying subcollections or dynamically switching between collections.
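To build intuition for how `where`, `limit`, and a cosine distance measure interact, the following toy model runs the same steps over an in-memory array. It is only an illustration under stated assumptions: `retrieveInMemory`, `cosineDistance`, and the `Doc` type are hypothetical names, the filter is treated as exact field-value equality, and none of this reflects how Firestore executes vector queries internally.

```typescript
// Toy in-memory model of a filtered nearest-neighbor lookup (not Firestore's implementation).
type Doc = { text: string; embedding: number[]; [field: string]: unknown };

interface RetrieveOptions {
  limit?: number;
  where?: Record<string, unknown>;
}

// Cosine distance: 0 for vectors pointing the same way, 1 for orthogonal vectors.
function cosineDistance(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

function retrieveInMemory(
  docs: Doc[],
  queryEmbedding: number[],
  options: RetrieveOptions = {},
): Doc[] {
  // Mirror the documented default of 10 results.
  const { limit = 10, where = {} } = options;
  return docs
    // Keep only documents whose fields equal every value in `where`.
    .filter((doc) => Object.entries(where).every(([field, value]) => doc[field] === value))
    // Rank the remaining documents by distance to the query, closest first.
    .map((doc) => ({ doc, distance: cosineDistance(doc.embedding, queryEmbedding) }))
    .sort((x, y) => x.distance - y.distance)
    .slice(0, limit)
    .map(({ doc }) => doc);
}

const sampleDocs: Doc[] = [
  { text: 'Budget report', embedding: [1, 0], category: 'news' },
  { text: 'Match recap', embedding: [0.9, 0.1], category: 'sports' },
  { text: 'Election results', embedding: [0.8, 0.2], category: 'news' },
  { text: 'Recipe roundup', embedding: [0, 1], category: 'news' },
];

const nearestNews = retrieveInMemory(sampleDocs, [1, 0], {
  limit: 2,
  where: { category: 'news' },
});
console.log(nearestNews.map((doc) => doc.text));
```

Note that the filter runs before ranking in this sketch, so a close match in another category (here, the sports document) never competes for one of the `limit` slots.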
Populate Firestore with Embeddings
To populate your Firestore collection, use an embedding generator along with the Admin SDK. For example, the menu ingestion script from the Retrieval-augmented generation page could be adapted for Firestore in the following way:
```typescript
import { genkit } from 'genkit';
import { vertexAI } from "@genkit-ai/vertexai";

import { applicationDefault, initializeApp } from "firebase-admin/app";
import { FieldValue, getFirestore } from "firebase-admin/firestore";

import { chunk } from "llm-chunk";
import pdf from "pdf-parse";

import { readFile } from "fs/promises";
import path from "path";

// Change these values to match your Firestore config/schema
const indexConfig = {
  collection: "menuInfo",
  contentField: "text",
  vectorField: "embedding",
  embedder: vertexAI.embedder('gemini-embedding-001', { outputDimensionality: 2048 }),
};

const ai = genkit({
  plugins: [vertexAI({ location: "us-central1" })],
});

const app = initializeApp({ credential: applicationDefault() });
const firestore = getFirestore(app);

export async function indexMenu(filePath: string) {
  filePath = path.resolve(filePath);

  // Read the PDF.
  const pdfTxt = await extractTextFromPdf(filePath);

  // Divide the PDF text into segments.
  const chunks = await chunk(pdfTxt);

  // Add chunks to the index.
  await indexToFirestore(chunks);
}

async function indexToFirestore(data: string[]) {
  for (const text of data) {
    const embedding = (await ai.embed({
      embedder: indexConfig.embedder,
      content: text,
    }))[0].embedding;
    await firestore.collection(indexConfig.collection).add({
      [indexConfig.vectorField]: FieldValue.vector(embedding),
      [indexConfig.contentField]: text,
    });
  }
}

async function extractTextFromPdf(filePath: string) {
  const pdfFile = path.resolve(filePath);
  const dataBuffer = await readFile(pdfFile);
  const data = await pdf(dataBuffer);
  return data.text;
}
```
Firestore depends on indexes to provide fast and efficient querying on collections. (Note that “index” here refers to database indexes, and not Genkit’s indexer and retriever abstractions.)
The prior example requires the embedding field to be indexed to work. To create the index:
- Run the `gcloud` command described in the Create a single-field vector index section of the Firestore docs.
  The command looks like the following:
  ```bash
  gcloud firestore indexes composite create --project=your-project-id \
    --collection-group=yourCollectionName --query-scope=COLLECTION \
    --field-config=vector-config='{"dimension":"2048","flat": "{}"}',field-path=yourEmbeddingField
  ```
  However, the correct indexing configuration depends on the queries you make and the embedding model you’re using.
- Alternatively, call `ai.retrieve()` and Firestore will throw an error with the correct command to create the index.
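Because the index's `dimension` value has to agree with the vectors your embedding model produces, a small guard at write time can surface a mismatch early. The sketch below assumes the 2048-dimension configuration used in the example above. `INDEX_DIMENSION` and `assertEmbeddingDimension` are hypothetical names, not part of Genkit or the Admin SDK.

```typescript
// Hypothetical constant: keep it in sync with the "dimension" value of your vector index.
const INDEX_DIMENSION = 2048;

// Returns the embedding unchanged if its length matches, otherwise throws.
function assertEmbeddingDimension(
  embedding: number[],
  expected: number = INDEX_DIMENSION,
): number[] {
  if (embedding.length !== expected) {
    throw new Error(
      `Embedding has ${embedding.length} dimensions, but the index expects ${expected}.`,
    );
  }
  return embedding;
}

// Example usage with a placeholder vector of the expected size.
const placeholderEmbedding = new Array<number>(INDEX_DIMENSION).fill(0);
console.log(assertEmbeddingDimension(placeholderEmbedding).length);
```

In an ingestion script like the one above, such a check could wrap the embedding just before it is passed to `FieldValue.vector()`.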
Deploy flows as Cloud Functions
To deploy a flow with Cloud Functions, use the Firebase Functions library’s built-in support for Genkit. The `onCallGenkit` method lets you create a callable function from a flow. It automatically supports streaming and JSON requests. You can use the Cloud Functions client SDKs to call these functions.
```typescript
import { onCallGenkit } from 'firebase-functions/https';
import { defineSecret } from 'firebase-functions/params';

const apiKey = defineSecret("apiKey");

export const exampleFlow = ai.defineFlow(
  {
    name: 'exampleFlow',
  },
  async (prompt) => {
    // Flow logic goes here.

    return response;
  },
);

// WARNING: This has no authentication or app check protections.
// See genkit.dev/js/auth for more information.
export const example = onCallGenkit({ secrets: [apiKey] }, exampleFlow);
```
Deploy your flow using the Firebase CLI:
```bash
firebase deploy --only functions
```
Learn more
- See the Retrieval-augmented generation page for a general discussion on indexers and retrievers in Genkit.
- See Search with vector embeddings in the Cloud Firestore docs for more on the vector search feature.