Cloud Firestore Vector Search
The Firebase plugin provides vector search integration with Cloud Firestore, enabling you to build intelligent RAG (Retrieval-Augmented Generation) applications with scalable document indexing and retrieval.
Installation
Install the Firebase plugin with npm:
npm install @genkit-ai/firebase
Prerequisites
Firebase Project Setup
- All Firebase products require a Firebase project. You can create a new project or enable Firebase in an existing Google Cloud project using the Firebase console.
- If you deploy flows with Cloud Functions, upgrade your Firebase project to the Blaze plan.
Firebase Admin SDK Initialization
You must initialize the Firebase Admin SDK in your application. The plugin does not do this automatically.
import { initializeApp } from 'firebase-admin/app';

initializeApp({
  projectId: 'your-project-id',
});
The plugin requires your Firebase project ID. You can specify it in either of the following ways:
- Set projectId in the initializeApp() configuration object, as shown in the snippet above.
- Set the GCLOUD_PROJECT environment variable. If you’re running your flow from a Google Cloud environment (Cloud Functions, Cloud Run, and so on), GCLOUD_PROJECT is automatically set to the project ID of the environment. If you set GCLOUD_PROJECT, you can omit the configuration parameter in initializeApp().
Credentials
To provide Firebase credentials, you also need to set up Google Cloud Application Default Credentials. To specify your credentials:
- If you’re running your flow from a Google Cloud environment (Cloud Functions, Cloud Run, and so on), this is set automatically.
- For other environments:
  - Generate service account credentials for your Firebase project and download the JSON key file. You can do so on the Service account page of the Firebase console.
  - Set the environment variable GOOGLE_APPLICATION_CREDENTIALS to the file path of the JSON file that contains your service account key, or set the environment variable GCLOUD_SERVICE_ACCOUNT_CREDS to the contents of the JSON file.
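To check which of these credential sources a given environment actually provides, you can inspect the two variables yourself. The sketch below is written in Python and is purely illustrative: describe_credential_source is a hypothetical helper, it never contacts Google Cloud, and it checks GCLOUD_SERVICE_ACCOUNT_CREDS first only as an arbitrary choice for this sketch, not because the plugin is known to use that precedence.

```python
import json
import os
from typing import Mapping, Optional


def describe_credential_source(env: Optional[Mapping[str, str]] = None) -> str:
    """Report which credential mechanism from the list above an environment provides.

    Hypothetical debugging helper: it only inspects environment variables and
    does not call any Google Cloud or Firebase API.
    """
    env = os.environ if env is None else env
    inline_json = env.get("GCLOUD_SERVICE_ACCOUNT_CREDS")
    if inline_json:
        creds = json.loads(inline_json)  # raises ValueError if the JSON is malformed
        return "inline service account: " + creds.get("client_email", "<no client_email>")
    key_path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if key_path:
        return "key file: " + key_path
    return "none set: relying on the environment (for example, Cloud Functions or Cloud Run)"
```

Passing an explicit mapping instead of os.environ makes the helper easy to exercise without touching your real environment.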
Cloud Firestore vector search
You can use Cloud Firestore as a vector store for RAG indexing and retrieval.
This section contains information specific to the firebase plugin and Cloud Firestore’s vector search feature. See the Retrieval-augmented generation page for a more detailed discussion of implementing RAG using Genkit.
Using GCLOUD_SERVICE_ACCOUNT_CREDS and Firestore
If you pass service account credentials directly via GCLOUD_SERVICE_ACCOUNT_CREDS and also use Firestore as a vector store, pass the credentials directly to the Firestore instance during initialization. Otherwise, depending on plugin initialization order, the Firestore singleton may be initialized with Application Default Credentials instead.
import { initializeApp } from 'firebase-admin/app';
import { getFirestore } from 'firebase-admin/firestore';

const app = initializeApp();
const firestore = getFirestore(app);

if (process.env.GCLOUD_SERVICE_ACCOUNT_CREDS) {
  const serviceAccountCreds = JSON.parse(process.env.GCLOUD_SERVICE_ACCOUNT_CREDS);
  const authOptions = { credentials: serviceAccountCreds };
  firestore.settings(authOptions);
}
Define a Firestore retriever
Use defineFirestoreRetriever() to create a retriever for Firestore vector-based queries.
import { defineFirestoreRetriever } from '@genkit-ai/firebase';
import { initializeApp } from 'firebase-admin/app';
import { getFirestore } from 'firebase-admin/firestore';

const app = initializeApp();
const firestore = getFirestore(app);

// `ai` is your Genkit instance; `yourEmbedderInstance` is a placeholder for your embedder.
const retriever = defineFirestoreRetriever(ai, {
  name: 'exampleRetriever',
  firestore,
  collection: 'documents',
  contentField: 'text', // Field containing document content
  vectorField: 'embedding', // Field containing vector embeddings
  embedder: yourEmbedderInstance, // Embedder to generate embeddings
  distanceMeasure: 'COSINE', // Default is 'COSINE'; other options: 'EUCLIDEAN', 'DOT_PRODUCT'
});
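The distanceMeasure option decides what "nearest" means. As a rough, language-neutral illustration written in Python, these are the standard textbook definitions of the three measures; Firestore's internal scoring and result ordering may differ in detail, so treat this as intuition rather than a description of its implementation.

```python
import math
from typing import Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity: 0.0 for vectors pointing the same way, 1.0 for orthogonal ones."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line (L2) distance; smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dot_product(a: Sequence[float], b: Sequence[float]) -> float:
    """Raw dot product; larger means more similar, and vector length affects the score."""
    return sum(x * y for x, y in zip(a, b))


query = [1.0, 0.0]
docs = {"short": [1.0, 0.0], "long": [10.0, 10.0]}

# Closest document under each measure.
by_cosine = min(docs, key=lambda name: cosine_distance(query, docs[name]))        # "short"
by_euclidean = min(docs, key=lambda name: euclidean_distance(query, docs[name]))  # "short"
by_dot = max(docs, key=lambda name: dot_product(query, docs[name]))               # "long"
```

Cosine distance favors the vector that points exactly along the query, while the raw dot product favors the long vector because its magnitude inflates the score. For unit-length vectors, all three measures produce the same ranking.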
Retrieve documents
To retrieve documents using the defined retriever, pass the retriever instance and query options to ai.retrieve.
const docs = await ai.retrieve({
  retriever,
  query: 'search query',
  options: {
    limit: 5, // Optional: Return up to 5 documents
    where: { category: 'example' }, // Optional: Filter by field-value pairs
    collection: 'alternativeCollection', // Optional: Override default collection
  },
});
Available Retrieval Options
The following options can be passed to the options field in ai.retrieve:
- limit: (number) The maximum number of documents to retrieve. Default is 10.
- where: (Record<string, any>) Additional filters based on Firestore fields. Example: where: { category: 'news', status: 'published' }
- collection: (string) Overrides the default collection specified in the retriever configuration. This is useful for querying subcollections or dynamically switching between collections.
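To make the interplay of where and limit concrete, here is a small in-memory model written in Python. It is a hypothetical stand-in, not the plugin's code: retrieve_in_memory applies where as exact equality on every listed field before ranking (an assumption about the filter semantics), ranks by cosine distance, and keeps the documented default limit of 10. The collection option has no equivalent here because everything lives in one list.

```python
import math
from typing import Any, Dict, List, Mapping, Optional, Sequence

DEFAULT_LIMIT = 10  # the documented default for `limit`


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms


def retrieve_in_memory(
    docs: List[Dict[str, Any]],
    query_vector: Sequence[float],
    vector_field: str = "embedding",
    limit: int = DEFAULT_LIMIT,
    where: Optional[Mapping[str, Any]] = None,
) -> List[Dict[str, Any]]:
    """Hypothetical local model of the retriever options: filter, rank, truncate."""
    where = where or {}
    # `where`: keep documents whose fields equal every listed value.
    matches = [d for d in docs if all(d.get(k) == v for k, v in where.items())]
    # Rank by cosine distance to the query vector, closest first.
    ranked = sorted(matches, key=lambda d: cosine_distance(query_vector, d[vector_field]))
    # `limit`: return at most this many documents.
    return ranked[:limit]


docs = [
    {"text": "a", "category": "news", "embedding": [1.0, 0.0]},
    {"text": "b", "category": "news", "embedding": [0.0, 1.0]},
    {"text": "c", "category": "blog", "embedding": [1.0, 0.1]},
]
top_news = retrieve_in_memory(docs, [1.0, 0.0], limit=1, where={"category": "news"})
```

Because filtering happens before ranking in this sketch, a restrictive where can return fewer than limit documents even when the collection is large.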
Populate Firestore with Embeddings
To populate your Firestore collection, use an embedding generator along with the Admin SDK. For example, the menu ingestion script from the Retrieval-augmented generation page could be adapted for Firestore in the following way:
import { genkit } from 'genkit';
import { vertexAI } from "@genkit-ai/vertexai";

import { applicationDefault, initializeApp } from "firebase-admin/app";
import { FieldValue, getFirestore } from "firebase-admin/firestore";

import { chunk } from "llm-chunk";
import pdf from "pdf-parse";

import { readFile } from "fs/promises";
import path from "path";

// Change these values to match your Firestore config/schema
const indexConfig = {
  collection: "menuInfo",
  contentField: "text",
  vectorField: "embedding",
  embedder: vertexAI.embedder('gemini-embedding-001'),
};

const ai = genkit({
  plugins: [vertexAI({ location: "us-central1" })],
});

const app = initializeApp({ credential: applicationDefault() });
const firestore = getFirestore(app);

export async function indexMenu(filePath: string) {
  filePath = path.resolve(filePath);

  // Read the PDF.
  const pdfTxt = await extractTextFromPdf(filePath);

  // Divide the PDF text into segments.
  const chunks = await chunk(pdfTxt);

  // Add chunks to the index.
  await indexToFirestore(chunks);
}

async function indexToFirestore(data: string[]) {
  for (const text of data) {
    const embedding = (
      await ai.embed({
        embedder: indexConfig.embedder,
        content: text,
      })
    )[0].embedding;
    await firestore.collection(indexConfig.collection).add({
      [indexConfig.vectorField]: FieldValue.vector(embedding),
      [indexConfig.contentField]: text,
    });
  }
}

async function extractTextFromPdf(filePath: string) {
  const pdfFile = path.resolve(filePath);
  const dataBuffer = await readFile(pdfFile);
  const data = await pdf(dataBuffer);
  return data.text;
}
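The chunk() call is where the size of each indexed document is decided. To see what chunking does, or to have a dependency-free fallback, the Python sketch below splits text into fixed-size, overlapping character windows. It is not the llm-chunk algorithm, which has its own splitting rules and options; chunk_text, max_chars, and overlap are illustrative names and values.

```python
from typing import List


def chunk_text(text: str, max_chars: int = 1000, overlap: int = 100) -> List[str]:
    """Split text into windows of at most max_chars characters.

    Consecutive windows share `overlap` characters. This is a simple stand-in
    for a chunking library, not a reimplementation of llm-chunk.
    """
    if max_chars <= 0 or not 0 <= overlap < max_chars:
        raise ValueError("need max_chars > 0 and 0 <= overlap < max_chars")
    step = max_chars - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + max_chars])
        if start + max_chars >= len(text):
            break  # this window already reaches the end of the text
    return chunks
```

Overlap repeats the tail of each chunk at the start of the next, which reduces the chance that a sentence split across a boundary is lost to both chunks.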
Firestore depends on indexes to provide fast and efficient querying on collections. (Note that “index” here refers to database indexes, and not Genkit’s indexer and retriever abstractions.)
The prior example requires the embedding field to be indexed to work. To create the index:
- Run the gcloud command described in the Create a single-field vector index section of the Firestore docs. The command looks like the following:
gcloud alpha firestore indexes composite create --project=your-project-id \
  --collection-group=yourCollectionName --query-scope=COLLECTION \
  --field-config=vector-config='{"dimension":"768","flat": "{}"}',field-path=yourEmbeddingField
However, the correct indexing configuration depends on the queries you make and the embedding model you’re using. In particular, dimension must equal the length of the vectors your embedder produces.
- Alternatively, call ai.retrieve() and Firestore will throw an error with the correct command to create the index.
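Because the dimension value has to match your embedder, one option is to derive it from a real embedding instead of typing it by hand. The hypothetical Python helper below only assembles the gcloud alpha firestore indexes composite create invocation shown above as a string; it does not run gcloud, and vector_index_command and its parameters are illustrative names.

```python
import json
import shlex
from typing import Sequence


def vector_index_command(project: str, collection: str, field: str,
                         sample_embedding: Sequence[float]) -> str:
    """Assemble the single-field vector index command, sizing it from a sample embedding."""
    config = json.dumps(
        {"dimension": str(len(sample_embedding)), "flat": "{}"},  # same shape as the command above
        separators=(",", ":"),
    )
    return " ".join([
        "gcloud alpha firestore indexes composite create",
        f"--project={shlex.quote(project)}",
        f"--collection-group={shlex.quote(collection)}",
        "--query-scope=COLLECTION",
        f"--field-config=vector-config={shlex.quote(config)},field-path={shlex.quote(field)}",
    ])


# In practice the sample would come from one call to your embedder.
print(vector_index_command("your-project-id", "menuInfo", "embedding", [0.0] * 768))
```

shlex.quote wraps the JSON in single quotes, so the printed command can be pasted into a POSIX shell as-is.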
Deploy flows as Cloud Functions
To deploy a flow with Cloud Functions, use the Firebase Functions library’s built-in support for Genkit. The onCallGenkit method lets you create a callable function from a flow. It automatically supports streaming and JSON requests. You can use the Cloud Functions client SDKs to call these functions.
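import { onCallGenkit } from 'firebase-functions/https';
import { defineSecret } from 'firebase-functions/params';

// Assumes `ai` is your configured Genkit instance with a default model.
// The secret name is an example; use the secret your model plugin needs.
const apiKey = defineSecret('GOOGLE_GENAI_API_KEY');

export const exampleFlow = ai.defineFlow(
  {
    name: 'exampleFlow',
  },
  async (prompt) => {
    // Flow logic goes here. For example:
    const response = await ai.generate(prompt);
    return response.text;
  },
);

// WARNING: This has no authentication or app check protections.
// See genkit.dev/js/auth for more information.
export const example = onCallGenkit({ secrets: [apiKey] }, exampleFlow);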
Deploy your flow using the Firebase CLI:
firebase deploy --only functions
Learn more
- See the Retrieval-augmented generation page for a general discussion on indexers and retrievers in Genkit.
- See Search with vector embeddings in the Cloud Firestore docs for more on the vector search feature.
The Firebase plugin provides integration with Firebase services for Genkit applications. It enables you to use Firebase Firestore as a vector database for retrieval-augmented generation (RAG) applications by defining retrievers.
Prerequisites
This plugin requires:
- A Firebase project - Create one at the Firebase Console
- Firestore database enabled in your Firebase project
- Firebase credentials configured for your application
Firebase Setup
- Create a Firebase project at the Firebase Console
- Enable Firestore in your project:
  - Go to Firestore Database in the Firebase console
  - Click “Create database”
  - Choose your security rules and location
- Set up authentication using one of these methods:
  - For local development: firebase login and firebase use <project-id>
  - For production: Service account key or Application Default Credentials
Configuration
Basic Configuration
To use this plugin, import the firebase package and initialize it with your project:
import "github.com/firebase/genkit/go/plugins/firebase"

// Option 1: Using project ID (recommended)
firebasePlugin := &firebase.Firebase{
	ProjectId: "your-firebase-project-id",
}

g := genkit.Init(context.Background(), genkit.WithPlugins(firebasePlugin))
Environment Variable Configuration
You can also configure the project ID using an environment variable:
export FIREBASE_PROJECT_ID=your-firebase-project-id

// The plugin automatically uses the FIREBASE_PROJECT_ID environment variable.
firebasePlugin := &firebase.Firebase{}
g, err := genkit.Init(context.Background(), genkit.WithPlugins(firebasePlugin))
Advanced Configuration
For advanced use cases, you can provide a pre-configured Firebase app:
import firebasev4 "firebase.google.com/go/v4"

// Create a Firebase app with custom configuration
app, err := firebasev4.NewApp(ctx, &firebasev4.Config{
	ProjectID: "your-project-id",
	// Additional Firebase configuration options
})
if err != nil {
	log.Fatal(err)
}

firebasePlugin := &firebase.Firebase{
	App: app,
}
Defining Firestore Retrievers
The primary use case for the Firebase plugin is creating retrievers for RAG applications:
// Define a Firestore retriever
retrieverOptions := firebase.RetrieverOptions{
	Name:         "my-documents",
	Collection:   "documents",
	VectorField:  "embedding",
	EmbedderName: "text-embedding-3-small",
	TopK:         10,
}

retriever, err := firebase.DefineRetriever(ctx, g, retrieverOptions)
if err != nil {
	log.Fatal(err)
}
Using Retrievers in RAG Workflows
Once defined, you can use the retriever in your RAG workflows:
// Retrieve relevant documents
results, err := ai.Retrieve(ctx, retriever, ai.WithDocs("What is machine learning?"))
if err != nil {
	log.Fatal(err)
}

// Use the retrieved documents in generation
var contextDocs []string
for _, doc := range results.Documents {
	contextDocs = append(contextDocs, doc.Content[0].Text)
}

// Named contextText so it does not shadow the standard context package.
contextText := strings.Join(contextDocs, "\n\n")
resp, err := genkit.Generate(ctx, g,
	ai.WithPrompt(fmt.Sprintf("Context: %s\n\nQuestion: %s", contextText, "What is machine learning?")),
)
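The join-then-format pattern above is easy to factor into a helper. The Python sketch below mirrors it (passages separated by a blank line, then Context: and Question:) and adds a hypothetical max_context_chars budget, which is not part of the original snippet, so that a large retrieval cannot silently produce an oversized prompt. A Go equivalent could serve as the buildPromptWithContext function that the flow example later on this page leaves undefined.

```python
from typing import Iterable


def build_prompt(question: str, passages: Iterable[str], max_context_chars: int = 8000) -> str:
    """Join passages with blank lines (as the Go snippet does), then append the question.

    max_context_chars is a hypothetical budget: passages are kept in retrieval
    order until the next one would push the joined context past the limit.
    """
    kept = []
    used = 0
    for passage in passages:
        cost = len(passage) + (2 if kept else 0)  # count the "\n\n" separator
        if used + cost > max_context_chars:
            break
        kept.append(passage)
        used += cost
    context = "\n\n".join(kept)
    return f"Context: {context}\n\nQuestion: {question}"
```

Stopping at the first passage that does not fit keeps the highest-ranked passages, since retrievers return results closest first.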
Complete RAG Example
Here’s a complete example showing how to set up a RAG system with Firebase:
package main

import (
	"context"
	"fmt"
	"log"
	"strings"

	"github.com/firebase/genkit/go/ai"
	"github.com/firebase/genkit/go/genkit"
	"github.com/firebase/genkit/go/plugins/compat_oai/openai"
	"github.com/firebase/genkit/go/plugins/firebase"
)

func main() {
	ctx := context.Background()

	// Initialize plugins
	firebasePlugin := &firebase.Firebase{
		ProjectId: "my-firebase-project",
	}

	openaiPlugin := &openai.OpenAI{
		APIKey: "your-openai-api-key",
	}

	g, err := genkit.Init(ctx, genkit.WithPlugins(firebasePlugin, openaiPlugin))
	if err != nil {
		log.Fatal(err)
	}

	// Define a retriever for the knowledge base
	retriever, err := firebase.DefineRetriever(ctx, g, firebase.RetrieverOptions{
		Name:         "knowledge-base",
		Collection:   "documents",
		VectorField:  "embedding",
		EmbedderName: "text-embedding-3-small",
		TopK:         5,
	})
	if err != nil {
		log.Fatal(err)
	}

	// The question to answer
	query := "How does machine learning work?"

	// Step 1: Retrieve relevant documents
	retrievalResults, err := ai.Retrieve(ctx, retriever, ai.WithDocs(query))
	if err != nil {
		log.Fatal(err)
	}

	// Step 2: Prepare context from the retrieved documents
	var contextParts []string
	for _, doc := range retrievalResults.Documents {
		contextParts = append(contextParts, doc.Content[0].Text)
	}
	// Named contextText so it does not shadow the imported context package.
	contextText := strings.Join(contextParts, "\n\n")

	// Step 3: Generate an answer grounded in the context
	model := openaiPlugin.Model(g, "gpt-4o")
	response, err := genkit.Generate(ctx, g,
		ai.WithModel(model),
		ai.WithPrompt(fmt.Sprintf(`Based on the following context, answer the question:

Context:
%s

Question: %s

Answer:`, contextText, query)),
	)
	if err != nil {
		log.Fatal(err)
	}

	fmt.Printf("Answer: %s\n", response.Text())
}
Firestore Data Structure
Document Storage Format
Your Firestore documents should follow this structure for optimal retrieval. Store the embedding as a Firestore vector value rather than a plain array, because Firestore vector queries only match fields stored as vectors:
{
  "content": "Your document text content here...",
  "embedding": [0.1, -0.2, 0.3, ...],
  "metadata": {
    "title": "Document Title",
    "author": "Author Name",
    "category": "Technology",
    "timestamp": "2024-01-15T10:30:00Z"
  }
}
Indexing Documents
To add documents to your Firestore collection with embeddings:
// Example of adding documents with embeddings.
// Assumes ctx, g, openaiPlugin, and an initialized firebasePlugin.App from the examples above,
// plus imports of "cloud.google.com/go/firestore" and "log".
embedder := openaiPlugin.Embedder(g, "text-embedding-3-small")

documents := []struct {
	Content  string
	Metadata map[string]interface{}
}{
	{
		Content: "Machine learning is a subset of artificial intelligence...",
		Metadata: map[string]interface{}{
			"title":    "Introduction to ML",
			"category": "Technology",
		},
	},
	// More documents...
}

// Create the Firestore client once, outside the loop, and check the error.
firestoreClient, err := firebasePlugin.App.Firestore(ctx)
if err != nil {
	log.Fatal(err)
}
defer firestoreClient.Close()

for _, doc := range documents {
	// Generate the embedding
	embeddingResp, err := ai.Embed(ctx, embedder, ai.WithDocs(doc.Content))
	if err != nil {
		log.Fatal(err)
	}

	// Store in Firestore. firestore.Vector32 writes the embedding as a vector value,
	// which vector queries require; a plain []float32 would be stored as an array.
	_, err = firestoreClient.Collection("documents").Doc().Set(ctx, map[string]interface{}{
		"content":   doc.Content,
		"embedding": firestore.Vector32(embeddingResp.Embeddings[0].Embedding),
		"metadata":  doc.Metadata,
	})
	if err != nil {
		log.Fatal(err)
	}
}
Configuration Options
Firebase struct
type Firebase struct {
	// ProjectId is your Firebase project ID.
	// If empty, the FIREBASE_PROJECT_ID environment variable is used.
	ProjectId string

	// App is a pre-configured Firebase app instance.
	// Use either ProjectId or App, not both.
	App *firebasev4.App
}

RetrieverOptions
type RetrieverOptions struct {
	// Name is a unique identifier for the retriever
	Name string

	// Collection is the Firestore collection name containing documents
	Collection string

	// VectorField is the field name containing the embedding vectors
	VectorField string

	// EmbedderName is the name of the embedder to use for query vectorization
	EmbedderName string

	// TopK is the number of top similar documents to retrieve
	TopK int

	// Additional filtering and configuration options
}
Authentication
Local Development
For local development, use the Firebase CLI:
# Install the Firebase CLI
npm install -g firebase-tools

# Log in and set the project
firebase login
firebase use your-project-id
Production Deployment
For production, use one of these authentication methods:
Service Account Key
import "google.golang.org/api/option"

app, err := firebasev4.NewApp(ctx, &firebasev4.Config{
	ProjectID: "your-project-id",
}, option.WithCredentialsFile("path/to/serviceAccountKey.json"))

Application Default Credentials
Set the environment variable:
export GOOGLE_APPLICATION_CREDENTIALS="path/to/serviceAccountKey.json"
Or use the metadata server on Google Cloud Platform.
Error Handling
Handle Firebase-specific errors appropriately:
retriever, err := firebase.DefineRetriever(ctx, g, options)
if err != nil {
	if strings.Contains(err.Error(), "plugin not found") {
		log.Fatal("Firebase plugin not initialized. Make sure to include it in genkit.Init()")
	}
	log.Fatalf("Failed to create retriever: %v", err)
}

// Handle retrieval errors
results, err := ai.Retrieve(ctx, retriever, ai.WithDocs(query))
if err != nil {
	log.Printf("Retrieval failed: %v", err)
	// Implement fallback logic
}
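One way to fill in the "Implement fallback logic" placeholder is to retry transient failures with exponential backoff and then degrade gracefully, for example by answering without retrieved context. The Python sketch below shows that pattern in a language-neutral way; retrieve_with_fallback, the attempt count, and the delays are illustrative choices, and the broad except clause should be narrowed to the errors your client actually treats as transient.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retrieve_with_fallback(
    retrieve: Callable[[], T],
    fallback: T,
    attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `retrieve`, retrying with exponential backoff; return `fallback` if every attempt fails."""
    for attempt in range(attempts):
        try:
            return retrieve()
        except Exception as err:  # narrow this to errors you know are transient
            if attempt == attempts - 1:
                print(f"Retrieval failed after {attempts} attempts: {err}")
            else:
                sleep(base_delay * 2 ** attempt)  # 0.5 s, 1 s, 2 s, ...
    return fallback
```

Injecting sleep keeps the helper testable; in production you would leave the default time.sleep.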
Best Practices
Performance Optimization
- Batch Operations: Use Firestore batch writes when adding multiple documents
- Index Configuration: Set up appropriate Firestore indexes for your queries
- Caching: Implement caching for frequently accessed documents
- Pagination: Use pagination for large result sets
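Batch writes cut down on round trips when indexing many documents. The core of the pattern is splitting your records into fixed-size groups, which the Python sketch below does. The commented usage shows how the groups would map onto the Python Firestore client's batch(), set(), and commit() calls; the group size of 400 is an arbitrary example, so check Firestore's documented per-batch limits before choosing yours.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def batched(items: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive groups of at most `batch_size` items, preserving order."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


# Usage sketch with the google-cloud-firestore client (not executed here):
# for group in batched(records, 400):
#     batch = firestore_client.batch()
#     for record in group:
#         batch.set(firestore_client.collection("documents").document(), record)
#     batch.commit()
```

On Python 3.12 and later, itertools.batched provides the same grouping (yielding tuples instead of lists).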
Security
- Firestore Rules: Configure proper security rules for your collections
- API Keys: Never expose service account keys or model provider API keys (such as your OpenAI key) in client-side code
- Authentication: Implement proper user authentication for sensitive data
Cost Management
- Document Size: Keep documents reasonably sized to minimize read costs
- Query Optimization: Design efficient queries to reduce operation costs
- Storage Management: Regularly clean up unused documents and embeddings
Integration Examples
With Multiple Embedders
// Use different embedders for different types of content.
// Each collection needs its own vector index whose dimension matches its embedder's output.
technicalRetriever, err := firebase.DefineRetriever(ctx, g, firebase.RetrieverOptions{
	Name:         "technical-docs",
	Collection:   "technical_documents",
	VectorField:  "embedding",
	EmbedderName: "text-embedding-3-large", // Larger model, typically higher-quality embeddings
	TopK:         5,
})

generalRetriever, err := firebase.DefineRetriever(ctx, g, firebase.RetrieverOptions{
	Name:         "general-knowledge",
	Collection:   "general_documents",
	VectorField:  "embedding",
	EmbedderName: "text-embedding-3-small", // Smaller, cheaper, and faster model
	TopK:         10,
})
With Flows
ragFlow := genkit.DefineFlow(g, "rag-qa", func(ctx context.Context, query string) (string, error) {
	// Retrieve context (assumes `retriever` was defined as shown earlier)
	results, err := ai.Retrieve(ctx, retriever, ai.WithDocs(query))
	if err != nil {
		return "", err
	}

	// Generate a response. buildPromptWithContext is a helper you write yourself,
	// for example by joining the document texts as in the complete example above.
	response, err := genkit.Generate(ctx, g,
		ai.WithPrompt(buildPromptWithContext(query, results)),
	)
	if err != nil {
		return "", err
	}

	return response.Text(), nil
})
Firestore Vector Store
The Firestore plugin provides retriever implementations that use Google Cloud Firestore as a vector store.
Installation
pip3 install genkit-plugin-firebase
Prerequisites
- A Firebase project with Cloud Firestore enabled.
- The genkit package installed.
- The gcloud CLI for managing credentials and Firestore indexes.
Configuration
To use this plugin, specify it when you initialize Genkit:
from genkit.ai import Genkit
from genkit.plugins.firebase.firestore import FirestoreVectorStore
from genkit.plugins.google_genai import VertexAI  # Assuming VertexAI provides the embedder
from google.cloud import firestore

# Ensure you have authenticated with gcloud and set the project
firestore_client = firestore.Client()

ai = Genkit(
    plugins=[
        VertexAI(),  # Ensure the embedder's plugin is loaded
        FirestoreVectorStore(
            name='my_firestore_retriever',
            collection='my_collection',  # Replace with your collection name
            vector_field='embedding',
            content_field='text',
            embedder='vertexai/text-embedding-004',  # Example embedder
            firestore_client=firestore_client,
        ),
    ],
    # Define a default model if needed
    # model='vertexai/gemini-1.5-flash',
)
Configuration Options
- name (str): A unique name for this retriever instance.
- collection (str): The name of the Firestore collection to query.
- vector_field (str): The name of the field in the Firestore documents that contains the vector embedding.
- content_field (str): The name of the field in the Firestore documents that contains the text content.
- embedder (str): The name of the embedding model to use. Must match a configured embedder in your Genkit project.
- firestore_client: A google.cloud.firestore.Client object that will be used for all queries to the vector store.
- Create a Firestore client:
from google.cloud import firestore

# Ensure you have authenticated with gcloud and set the project
firestore_client = firestore.Client()

- Define a Firestore retriever: pass the client to FirestoreVectorStore in the plugins list when you initialize Genkit, as in the Configuration example above.
- Retrieve documents:
from genkit.ai import Document

# Assuming 'ai' is configured as above
async def retrieve_documents():
    # Note: ai.retrieve expects a Document object for the query
    query_doc = Document.from_text("What are the main topics?")
    return await ai.retrieve(
        query=query_doc,
        retriever='my_firestore_retriever',  # Matches the 'name' in the FirestoreVectorStore config
    )

# Example of calling the async function:
# import asyncio
# retrieved_docs = asyncio.run(retrieve_documents())
# print(retrieved_docs)
Populating the Index
Before you can retrieve documents, you need to populate your Firestore collection with data and their corresponding vector embeddings. Here’s how you can do it:
- Prepare your data: Organize your data into documents. Each document should have at least two fields: a text field containing the content you want to retrieve, and an embedding field that holds the vector embedding of the content. You can add any other metadata as well.
- Generate embeddings: Use the same embedding model configured in your FirestoreVectorStore to generate vector embeddings for your text content. You can use the ai.embed() method for this.
- Upload documents to Firestore: Use the Firestore client to upload the documents with their embeddings to the specified collection.
Here’s an example of how to index data:
from genkit.ai import Document
from genkit.types import TextPart
from google.cloud.firestore_v1.vector import Vector

# Assuming 'ai' is configured with the VertexAI and FirestoreVectorStore plugins
# Assuming 'firestore_client' is an initialized firestore.Client() instance


async def index_documents(documents: list[str], collection_name: str):
    """Indexes the documents in Firestore."""
    genkit_documents = [Document(content=[TextPart(text=doc)]) for doc in documents]
    # Ensure the embedder name matches the one configured in Genkit
    embed_response = await ai.embed(embedder='vertexai/text-embedding-004', content=genkit_documents)
    embeddings = [emb.embedding for emb in embed_response.embeddings]

    for i, document_text in enumerate(documents):
        doc_id = f'doc-{i + 1}'
        doc_ref = firestore_client.collection(collection_name).document(doc_id)
        doc_ref.set({
            'text': document_text,  # Must match 'content_field' in the config
            # Wrap the list in Vector so Firestore stores it as a vector value;
            # the field name must match 'vector_field' in the config.
            'embedding': Vector(embeddings[i]),
            'metadata': f'metadata for doc {i + 1}',
        })
        print(f"Indexed document {doc_id}")  # Optional: print progress


# Example usage:
# documents = [
#     "This is document one.",
#     "This is document two.",
#     "This is document three.",
# ]
# import asyncio
# asyncio.run(index_documents(documents, 'my_collection'))  # Replace 'my_collection' with your collection name
Creating a Firestore Index
To enable vector similarity search, you need to configure a vector index in your Firestore database. Use the following command:
gcloud firestore indexes composite create \
  --project=<YOUR_FIREBASE_PROJECT_ID> \
  --collection-group=<YOUR_COLLECTION_NAME> \
  --query-scope=COLLECTION \
  --field-config=vector-config='{"dimension":<YOUR_DIMENSION_COUNT>,"flat": {}}',field-path=<YOUR_VECTOR_FIELD>
- Replace <YOUR_FIREBASE_PROJECT_ID> with the ID of your Firebase project.
- Replace <YOUR_COLLECTION_NAME> with the name of your Firestore collection (e.g., my_collection).
- Replace <YOUR_DIMENSION_COUNT> with the correct dimension for your embedding model. Common values are:
  - 768 for text-embedding-004 (Vertex AI)
- Replace <YOUR_VECTOR_FIELD> with the name of the field containing vector embeddings (e.g., embedding).
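A mismatch between <YOUR_DIMENSION_COUNT> and the vectors you actually stored is an easy mistake, especially after switching embedders. Before creating the index, you could scan a sample of documents with a check like the hypothetical Python helper below. It reports every document whose vector length differs from the dimension you plan to pass to gcloud (768 here, matching the text-embedding-004 value above). The docs argument is assumed to hold plain dicts with an "id" key and a list-of-floats vector field.

```python
from typing import Any, Dict, Iterable, Mapping


def find_dimension_mismatches(
    docs: Iterable[Mapping[str, Any]], vector_field: str, expected_dimension: int
) -> Dict[str, int]:
    """Return {doc_id: actual_length} for documents whose vector has the wrong length."""
    mismatches = {}
    for doc in docs:
        length = len(doc[vector_field])
        if length != expected_dimension:
            mismatches[doc["id"]] = length
    return mismatches


sample = [
    {"id": "doc-1", "embedding": [0.0] * 768},
    {"id": "doc-2", "embedding": [0.0] * 1536},  # e.g. written by a different embedder
]
print(find_dimension_mismatches(sample, "embedding", 768))
```

An empty result means every sampled vector matches the dimension you intend to index.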