# Vertex AI plugin
The Vertex AI plugin provides access to Google Cloud’s enterprise-grade AI platform, offering advanced features beyond basic model access. Use this for enterprise applications that need grounding, Vector Search, Model Garden, or evaluation capabilities.
## Accessing Google GenAI Models via Vertex AI

All languages support accessing Google's generative AI models (Gemini, Imagen, etc.) through Vertex AI with enterprise authentication and features.
The unified Google GenAI plugin provides access to models via Vertex AI using the VertexAI initializer:
### Basic Model Access

#### Installation

```shell
uv add genkit-plugin-google-genai
```

#### Configuration

```python
from genkit import Genkit
from genkit.plugins.google_genai import VertexAI

ai = Genkit(
    plugins=[
        VertexAI(location='us-central1'),  # Regional endpoint
        # VertexAI(location='global'),     # Global endpoint
    ],
)
```

Authentication Methods:
- **Application Default Credentials (ADC)**: The standard method for most Vertex AI use cases, especially in production. It uses the credentials from the environment (e.g., a service account on GCP, or user credentials from `gcloud auth application-default login` locally). This method requires a Google Cloud project with billing enabled and the Vertex AI API enabled.
- **Vertex AI Express Mode**: A streamlined way to try out many Vertex AI features using just an API key, without needing to set up billing or full project configuration. This is ideal for quick experimentation and has generous free-tier quotas. Learn more about Express Mode.

```python
# Using Vertex AI Express Mode (easy to start, some limitations)
# Get an API key from the Vertex AI Studio Express Mode setup.
import os

VertexAI(api_key=os.environ.get('VERTEX_EXPRESS_API_KEY'))
```

Note: When using Express Mode, do not provide `project` or `location` in the plugin config.
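For comparison, an ADC-based setup can name the project explicitly instead of relying on the `GCLOUD_PROJECT` environment variable. A minimal sketch, assuming ADC is already configured in the environment (`my-gcp-project` is a placeholder):

```python
from genkit import Genkit
from genkit.plugins.google_genai import VertexAI

# Uses Application Default Credentials from the environment;
# 'project' may be omitted if GCLOUD_PROJECT is set.
ai = Genkit(
    plugins=[VertexAI(project='my-gcp-project', location='us-central1')],
)
```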
### Basic Usage

```python
from genkit import Genkit
from genkit.plugins.google_genai import VertexAI

ai = Genkit(
    plugins=[VertexAI(location='us-central1')],
)

response = await ai.generate(
    model='vertexai/gemini-2.5-pro',
    prompt='Explain Vertex AI in simple terms.',
)

print(response.text)
```

### Text Embedding
```python
embeddings = await ai.embed(
    embedder='vertexai/text-embedding-005',
    content='Embed this text.',
)
```

### Image Generation (Imagen)
```python
response = await ai.generate(
    model='vertexai/imagen-3.0-generate-002',
    prompt='A beautiful watercolor painting of a castle in the mountains.',
)

if response.message and response.message.content:
    media_part = response.message.content[0]
    generated_image = media_part.root.media.url
```

### Thinking Config
#### Thinking Level (Gemini 3.0)

```python
response = await ai.generate(
    model='vertexai/gemini-3-pro-preview',
    prompt='what is heavier, one kilo of steel or one kilo of feathers',
    config={
        'thinking_config': {
            'thinking_level': 'HIGH',  # Or 'LOW' or 'MEDIUM'
        },
    },
)
```

#### Thinking Budget (Gemini 2.5)

```python
message = (
    await ai.generate(
        model='vertexai/gemini-2.5-pro',
        prompt='what is heavier, one kilo of steel or one kilo of feathers',
        config={
            'thinking_config': {
                'thinking_budget': 1024,
                'include_thoughts': True,
            },
        },
    )
).message
```

## Enterprise Features (Python)
The following advanced features are available in Python. Note that some features require additional plugin packages:
### Installation for Advanced Features

Core Vertex AI features (included in `genkit-plugin-google-genai`):

```shell
uv add genkit-plugin-google-genai
```

Model Garden and Vector Search (requires a separate plugin):

```shell
uv add genkit-plugin-vertex-ai
```

Evaluation metrics (requires a separate plugin):

```shell
uv add genkit-plugin-evaluators
```

If you want to run flows locally that use these plugins, you also need the Google Cloud CLI installed.
### Configuration for Advanced Features

```python
from genkit import Genkit
from genkit.plugins.google_genai import VertexAI

ai = Genkit(
    plugins=[VertexAI(location='us-central1')],
)
```

The plugin requires you to specify your Google Cloud project ID, the region to which you want to make Vertex AI API requests, and your Google Cloud project credentials.

- You can specify your Google Cloud project ID either by setting `project` in the `VertexAI()` configuration or by setting the `GCLOUD_PROJECT` environment variable. If you're running your flow from a Google Cloud environment (Cloud Functions, Cloud Run, and so on), `GCLOUD_PROJECT` is automatically set to the project ID of the environment.
- You can specify the API location either by setting `location` in the `VertexAI()` configuration or by setting the `GCLOUD_LOCATION` environment variable.
- To provide API credentials, you need to set up Google Cloud Application Default Credentials:
  - If you're running your flow from a Google Cloud environment (Cloud Functions, Cloud Run, and so on), credentials are set automatically.
  - On your local dev environment, run:

    ```shell
    gcloud auth application-default login --project YOUR_PROJECT_ID
    ```

  - For other environments, see the Application Default Credentials docs.
- In addition, make sure the account is granted the Vertex AI User IAM role (`roles/aiplatform.user`). See the Vertex AI access control docs.
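For local development, the project and location can equivalently be supplied through the environment variables mentioned above. A sketch, with `my-gcp-project` as a placeholder project ID:

```shell
# Configure the project and region via environment variables
# (read by the plugin when not passed to VertexAI() directly).
export GCLOUD_PROJECT=my-gcp-project
export GCLOUD_LOCATION=us-central1

# Set up Application Default Credentials for local development.
gcloud auth application-default login --project "$GCLOUD_PROJECT"
```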
## Grounding

This plugin supports grounding Gemini text responses using Google Search.
Important: Vertex AI charges a fee for grounding requests in addition to the cost of making LLM requests. See the Vertex AI pricing page and be sure you understand grounding request pricing before you use this feature.
Example:
```python
ai = Genkit(
    plugins=[VertexAI(location='us-central1')],
)

await ai.generate(
    model='vertexai/gemini-2.5-flash',
    prompt='What are the latest developments in quantum computing?',
    config={
        'google_search_retrieval': {
            'disable_attribution': True,
        },
    },
)
```

## Context Caching
The Vertex AI Genkit plugin supports Context Caching, which allows models to reuse previously cached content to optimize token usage when dealing with large pieces of content. This feature is especially useful for conversational flows or scenarios where the model references a large piece of content consistently across multiple requests.
### How to Use Context Caching

To enable context caching, ensure your model supports it. For example, `gemini-2.5-flash` and `gemini-2.0-pro` support context caching, and you must specify version number `001`.
You can define a caching mechanism in your application like this:
```python
from genkit import Message, Part, Role, TextPart

ai = Genkit(
    plugins=[VertexAI(location='us-central1')],
)

llm_response = await ai.generate(
    messages=[
        Message(
            role=Role.USER,
            content=[Part(root=TextPart(text='Here is the relevant text from War and Peace.'))],
        ),
        Message(
            role=Role.MODEL,
            content=[
                Part(root=TextPart(text="Based on War and Peace, here is some analysis of Pierre Bezukhov's character.")),
            ],
            metadata={
                'cache': {
                    'ttl_seconds': 300,  # Cache this message for 5 minutes
                },
            },
        ),
    ],
    model='vertexai/gemini-2.5-flash',
    prompt="Describe Pierre's transformation throughout the novel.",
)
```

In this setup:

- `messages`: Allows you to pass conversation history.
- `metadata.cache.ttl_seconds`: Specifies the time-to-live (TTL) for caching a specific response.
### Example: Leveraging Large Texts with Context

For applications referencing long documents, such as *War and Peace* or *Lord of the Rings*, you can structure your queries to reuse cached contexts:

```python
from pathlib import Path

from genkit import Message, Part, Role, TextPart

text_content = Path('path/to/war_and_peace.txt').read_text()

llm_response = await ai.generate(
    messages=[
        Message(
            role=Role.USER,
            content=[Part(root=TextPart(text=text_content))],  # Include the large text as context
        ),
        Message(
            role=Role.MODEL,
            content=[
                Part(root=TextPart(text='This analysis is based on the provided text from War and Peace.')),
            ],
            metadata={
                'cache': {
                    'ttl_seconds': 300,  # Cache the response to avoid reloading the full text
                },
            },
        ),
    ],
    model='vertexai/gemini-2.5-flash',
    prompt='Analyze the relationship between Pierre and Natasha.',
)
```

Supported models: `gemini-2.5-flash-001`, `gemini-2.0-pro-001`
## Model Garden Integration

Access third-party models through Vertex AI Model Garden using the separate `vertex_ai` plugin:
Installation:
```shell
uv add genkit-plugin-vertex-ai
```

### Claude 3 Models

```python
from genkit import Genkit
from genkit.plugins.vertex_ai import ModelGardenPlugin

ai = Genkit(
    plugins=[
        ModelGardenPlugin(
            location='us-central1',
            models=['claude-3-haiku', 'claude-3-sonnet', 'claude-3-opus'],
        ),
    ],
)

response = await ai.generate(
    model='claude-3-sonnet',
    prompt='What should I do when I visit Melbourne?',
)
```

### Llama 3.1 405b
```python
ai = Genkit(
    plugins=[
        ModelGardenPlugin(
            location='us-central1',
            models=['llama3-405b-instruct-maas'],
        ),
    ],
)

response = await ai.generate(
    model='llama3-405b-instruct-maas',
    prompt='Write a function that adds two numbers together',
)
```

### Mistral Models
```python
ai = Genkit(
    plugins=[
        ModelGardenPlugin(
            location='us-central1',
            models=['mistral-large', 'mistral-small'],
        ),
    ],
)

response = await ai.generate(
    model='mistral-large',
    prompt='Explain quantum computing',
)
```

Vertex AI provides access to various third-party models through Model Garden. Consult the Vertex AI Model Garden documentation for the full list of supported models and their capabilities.
## Evaluation Metrics

Genkit provides evaluation metrics through the evaluators plugin:
Installation:
```shell
uv add genkit-plugin-evaluators
```

Usage:

```python
from genkit import Genkit
from genkit.blocks.model import ModelReference
from genkit.plugins.evaluators import GenkitMetricType, MetricConfig, define_genkit_evaluators
from genkit.plugins.google_genai import VertexAI

ai = Genkit(
    plugins=[VertexAI(location='us-central1')],
)

# Register evaluators using the evaluators plugin
define_genkit_evaluators(
    ai,
    [
        MetricConfig(
            metric_type=GenkitMetricType.FAITHFULNESS,
            judge=ModelReference(name='vertexai/gemini-2.5-flash'),
        ),
        MetricConfig(
            metric_type=GenkitMetricType.ANSWER_RELEVANCY,
            judge=ModelReference(name='vertexai/gemini-2.5-flash'),
        ),
        MetricConfig(
            metric_type=GenkitMetricType.MALICIOUSNESS,
            judge=ModelReference(name='vertexai/gemini-2.5-flash'),
        ),
    ],
)
```

Available built-in metrics from the evaluators plugin include:
- Answer Relevancy: Measures how relevant the answer is to the question
- Faithfulness: Evaluates if the answer is grounded in the provided context
- Maliciousness: Detects harmful or malicious content
- Regex Match: Pattern-based validation
- Deep Equal: Exact match comparison
See the evaluation documentation for more details on implementing comprehensive evaluation workflows.
## Vector Search

Vertex AI Vector Search enables efficient similarity search for RAG applications using the separate `vertex_ai` plugin:
Installation:
```shell
uv add genkit-plugin-vertex-ai
```

- Create a Vector Search index in your Google Cloud project
- Deploy the index to an endpoint
- Configure the retriever in your Genkit application
### Configuration

Using Firestore:

```python
from google.cloud import firestore

from genkit import Genkit
from genkit.plugins.google_genai import VertexAI
from genkit.plugins.vertex_ai import define_vertex_vector_search_firestore

ai = Genkit(
    plugins=[VertexAI(location='us-central1')],
)

firestore_client = firestore.AsyncClient()

# Define a Firestore-backed retriever
retriever = define_vertex_vector_search_firestore(
    ai,
    name='my_retriever',
    collection_name='documents',
    embedder='vertexai/text-embedding-004',
    firestore_client=firestore_client,
)
```

Using BigQuery:
```python
from google.cloud import bigquery

from genkit.plugins.vertex_ai import define_vertex_vector_search_big_query

bq_client = bigquery.Client()

# Define a BigQuery-backed retriever
retriever = define_vertex_vector_search_big_query(
    ai,
    name='bigquery_retriever',
    dataset_id='my_dataset',
    table_id='my_table',
    embedder='vertexai/text-embedding-004',
    bq_client=bq_client,
)

# Use the retriever in a RAG flow
docs = await ai.retrieve(retriever='my_retriever', query='your search query')

response = await ai.generate(
    prompt='How do I use the vector search feature?',
    docs=docs.documents,
)
```

See the RAG documentation for comprehensive examples of implementing retrieval-augmented generation with Vertex AI Vector Search.
## Next Steps

- Learn about generating content to understand how to use these models effectively
- Explore evaluation to leverage Vertex AI’s evaluation metrics
- See RAG to implement retrieval-augmented generation with Vector Search
- Check out creating flows to build structured AI workflows
- For simple API key access, see the Google AI plugin