Skip to content

Cloud Firestore Vector Search

The Firebase plugin provides vector search integration with Cloud Firestore, enabling you to build intelligent RAG (Retrieval-Augmented Generation) applications with scalable document indexing and retrieval.

Install the Firebase plugin with npm:

Terminal window
npm install @genkit-ai/firebase
  1. All Firebase products require a Firebase project. You can create a new project or enable Firebase in an existing Google Cloud project using the Firebase console.

  2. If deploying flows with Cloud Functions, upgrade your Firebase project to the Blaze plan.

You must initialize the Firebase Admin SDK in your application. This is not handled automatically by the plugin.

import { initializeApp } from 'firebase-admin/app';
initializeApp({
projectId: 'your-project-id',
});

The plugin requires you to specify your Firebase project ID. You can specify your Firebase project ID in either of the following ways:

  • Set projectId in the initializeApp() configuration object as shown in the snippet above.

  • Set the GCLOUD_PROJECT environment variable. If you’re running your flow from a Google Cloud environment (Cloud Functions, Cloud Run, and so on), GCLOUD_PROJECT is automatically set to the project ID of the environment.

    If you set GCLOUD_PROJECT, you can omit the configuration parameter in initializeApp().

To provide Firebase credentials, you also need to set up Google Cloud Application Default Credentials. To specify your credentials:

  • If you’re running your flow from a Google Cloud environment (Cloud Functions, Cloud Run, and so on), this is set automatically.

  • For other environments:

    1. Generate service account credentials for your Firebase project and download the JSON key file. You can do so on the Service account page of the Firebase console.
    2. Set the environment variable GOOGLE_APPLICATION_CREDENTIALS to the file path of the JSON file that contains your service account key, or you can set the environment variable GCLOUD_SERVICE_ACCOUNT_CREDS to the content of the JSON file.

You can use Cloud Firestore as a vector store for RAG indexing and retrieval.

This section contains information specific to the firebase plugin and Cloud Firestore’s vector search feature. See the Retrieval-augmented generation page for a more detailed discussion on implementing RAG using Genkit.

Using GCLOUD_SERVICE_ACCOUNT_CREDS and Firestore

Section titled “Using GCLOUD_SERVICE_ACCOUNT_CREDS and Firestore”

If you are using service account credentials by passing credentials directly via GCLOUD_SERVICE_ACCOUNT_CREDS and are also using Firestore as a vector store, you need to pass credentials directly to the Firestore instance during initialization or the singleton may be initialized with application default credentials depending on plugin initialization order.

import { initializeApp } from 'firebase-admin/app';
import { getFirestore } from "firebase-admin/firestore";
const app = initializeApp();
let firestore = getFirestore(app);
if (process.env.GCLOUD_SERVICE_ACCOUNT_CREDS) {
const serviceAccountCreds = JSON.parse(
process.env.GCLOUD_SERVICE_ACCOUNT_CREDS,
);
const authOptions = { credentials: serviceAccountCreds };
firestore.settings(authOptions);
}

Use defineFirestoreRetriever() to create a retriever for Firestore vector-based queries.

import { defineFirestoreRetriever } from '@genkit-ai/firebase';
import { initializeApp } from 'firebase-admin/app';
import { getFirestore } from 'firebase-admin/firestore';
const app = initializeApp();
const firestore = getFirestore(app);
const retriever = defineFirestoreRetriever(ai, {
name: 'exampleRetriever',
firestore,
collection: 'documents',
contentField: 'text', // Field containing document content
vectorField: 'embedding', // Field containing vector embeddings
embedder: yourEmbedderInstance, // Embedder to generate embeddings
distanceMeasure: 'COSINE', // Default is 'COSINE'; other options: 'EUCLIDEAN', 'DOT_PRODUCT'
});

To retrieve documents using the defined retriever, pass the retriever instance and query options to ai.retrieve.

const docs = await ai.retrieve({
retriever,
query: 'search query',
options: {
limit: 5, // Options: Return up to 5 documents
where: { category: 'example' }, // Optional: Filter by field-value pairs
collection: 'alternativeCollection', // Optional: Override default collection
},
});

The following options can be passed to the options field in ai.retrieve:

  • limit: (number) Specify the maximum number of documents to retrieve. Default is 10.

  • where: (Record<string, any>) Add additional filters based on Firestore fields. Example:

where: { category: 'news', status: 'published' }
  • collection: (string) Override the default collection specified in the retriever configuration. This is useful for querying subcollections or dynamically switching between collections.

To populate your Firestore collection, use an embedding generator along with the Admin SDK. For example, the menu ingestion script from the Retrieval-augmented generation page could be adapted for Firestore in the following way:

import { genkit } from 'genkit';
import { vertexAI } from "@genkit-ai/vertexai";
import { applicationDefault, initializeApp } from "firebase-admin/app";
import { FieldValue, getFirestore } from "firebase-admin/firestore";
import { chunk } from "llm-chunk";
import pdf from "pdf-parse";
import { readFile } from "fs/promises";
import path from "path";
// Change these values to match your Firestore config/schema
const indexConfig = {
collection: "menuInfo",
contentField: "text",
vectorField: "embedding",
embedder: vertexAI.embedder('gemini-embedding-001', {
outputDimensionality: 2048
}),
};
const ai = genkit({
plugins: [vertexAI({ location: "us-central1" })],
});
const app = initializeApp({ credential: applicationDefault() });
const firestore = getFirestore(app);
export async function indexMenu(filePath: string) {
filePath = path.resolve(filePath);
// Read the PDF.
const pdfTxt = await extractTextFromPdf(filePath);
// Divide the PDF text into segments.
const chunks = await chunk(pdfTxt);
// Add chunks to the index.
await indexToFirestore(chunks);
}
async function indexToFirestore(data: string[]) {
for (const text of data) {
const embedding = (await ai.embed({
embedder: indexConfig.embedder,
content: text,
}))[0].embedding;
await firestore.collection(indexConfig.collection).add({
[indexConfig.vectorField]: FieldValue.vector(embedding),
[indexConfig.contentField]: text,
});
}
}
async function extractTextFromPdf(filePath: string) {
const pdfFile = path.resolve(filePath);
const dataBuffer = await readFile(pdfFile);
const data = await pdf(dataBuffer);
return data.text;
}

Firestore depends on indexes to provide fast and efficient querying on collections. (Note that “index” here refers to database indexes, and not Genkit’s indexer and retriever abstractions.)

The prior example requires the embedding field to be indexed to work. To create the index:

Terminal window
gcloud firestore indexes composite create --project=your-project-id \
--collection-group=yourCollectionName --query-scope=COLLECTION \
--field-config=vector-config='{"dimension":"2048","flat": "{}"}',field-path=yourEmbeddingField

However, the correct indexing configuration depends on the queries you make and the embedding model you’re using.

  • Alternatively, call ai.retrieve() and Firestore will throw an error with the correct command to create the index.

To deploy a flow with Cloud Functions, use the Firebase Functions library’s built-in support for Genkit. The onCallGenkit method lets you create a callable function from a flow. It automatically supports streaming and JSON requests. You can use the Cloud Functions client SDKs to call them.

import { onCallGenkit } from 'firebase-functions/https';
import { defineSecret } from 'firebase-functions/params';
const apiKey = defineSecret("apiKey");
export const exampleFlow = ai.defineFlow(
{
name: 'exampleFlow',
},
async (prompt) => {
// Flow logic goes here.
return response;
},
);
// WARNING: This has no authentication or app check protections.
// See genkit.dev/js/auth for more information.
export const example = onCallGenkit({ secrets: [apiKey] }, exampleFlow);

Deploy your flow using the Firebase CLI:

Terminal window
firebase deploy --only functions