
Vertex AI plugin

The Vertex AI plugin provides access to Google Cloud’s enterprise-grade AI platform, offering advanced features beyond basic model access. Use this for enterprise applications that need grounding, Vector Search, Model Garden, or evaluation capabilities.

Accessing Google GenAI Models via Vertex AI


All languages support accessing Google’s generative AI models (Gemini, Imagen, etc.) through Vertex AI with enterprise authentication and features.

The unified Google GenAI plugin provides access to models via Vertex AI using the VertexAI initializer:

uv add genkit-plugin-google-genai
from genkit import Genkit
from genkit.plugins.google_genai import VertexAI

ai = Genkit(
    plugins=[
        VertexAI(location='us-central1'),  # Regional endpoint
        # VertexAI(location='global'),     # Global endpoint
    ],
)

Authentication Methods:

  • Application Default Credentials (ADC): The standard method for most Vertex AI use cases, especially in production. It uses the credentials from the environment (e.g., service account on GCP, user credentials from gcloud auth application-default login locally). This method requires a Google Cloud Project with billing enabled and the Vertex AI API enabled.
  • Vertex AI Express Mode: A streamlined way to try out many Vertex AI features using just an API key, without needing to set up billing or full project configurations. This is ideal for quick experimentation and has generous free tier quotas. Learn More about Express Mode.
# Using Vertex AI Express Mode (easy to start; some limitations).
# Get an API key from the Vertex AI Studio Express Mode setup.
import os

from genkit import Genkit
from genkit.plugins.google_genai import VertexAI

ai = Genkit(
    plugins=[
        VertexAI(api_key=os.environ['VERTEX_EXPRESS_API_KEY']),
    ],
)

Note: When using Express Mode, you typically omit project and location on VertexAI (see the Express Mode docs).

from genkit import Genkit
from genkit.plugins.google_genai import VertexAI

ai = Genkit(
    plugins=[VertexAI(location='us-central1')],
)

# Text generation
response = await ai.generate(
    model='vertexai/gemini-2.5-pro',
    prompt='Explain Vertex AI in simple terms.',
)
print(response.text)

# Text embedding
embeddings = await ai.embed(
    embedder='vertexai/text-embedding-005',
    content='Embed this text.',
)

# Image generation with Imagen
response = await ai.generate(
    model='vertexai/imagen-3.0-generate-002',
    prompt='A beautiful watercolor painting of a castle in the mountains.',
)
if response.message and response.message.content:
    media_part = response.message.content[0]
    generated_image = media_part.root.media.url

# Reasoning control via thinking level (Gemini 3 preview models)
response = await ai.generate(
    model='vertexai/gemini-3-pro-preview',
    prompt='What is heavier, one kilo of steel or one kilo of feathers?',
    config={
        'thinking_config': {
            'thinking_level': 'HIGH',  # Or 'LOW' or 'MEDIUM'
        },
    },
)

# Reasoning control via thinking budget (Gemini 2.5 models)
message = (
    await ai.generate(
        model='vertexai/gemini-2.5-pro',
        prompt='What is heavier, one kilo of steel or one kilo of feathers?',
        config={
            'thinking_config': {
                'thinking_budget': 1024,
                'include_thoughts': True,
            },
        },
    )
).message

The following advanced features are available in Python. Note that some features require additional plugin packages:

Core Vertex AI features (included in genkit-plugin-google-genai):

uv add genkit-plugin-google-genai

Model Garden and Vector Search (requires separate plugin):

uv add genkit-plugin-vertex-ai

If you want to locally run flows that use these plugins, you also need the Google Cloud CLI tool installed.

from genkit import Genkit
from genkit.plugins.google_genai import VertexAI

ai = Genkit(
    plugins=[VertexAI(location='us-central1')],
)

The plugin requires you to specify your Google Cloud project ID, the region to which you want to make Vertex API requests, and your Google Cloud project credentials.

  • You can specify your Google Cloud project ID either by setting project in the VertexAI() configuration or by setting the GCLOUD_PROJECT environment variable. If you’re running your flow from a Google Cloud environment (Cloud Functions, Cloud Run, and so on), GCLOUD_PROJECT is automatically set to the project ID of the environment.
  • You can specify the API location either by setting location in the VertexAI() configuration or by setting the GCLOUD_LOCATION environment variable.
  • To provide API credentials, you need to set up Google Cloud Application Default Credentials.
    1. To specify your credentials:
      • If you’re running your flow from a Google Cloud environment (Cloud Functions, Cloud Run, and so on), this is set automatically.
      • On your local dev environment, do this by running:
gcloud auth application-default login --project YOUR_PROJECT_ID
    2. In addition, make sure the account is granted the Vertex AI User IAM role (roles/aiplatform.user). See the Vertex AI access control docs.
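For local development, the project and location from the bullets above can also be supplied through environment variables instead of the `VertexAI()` configuration (the values below are placeholders; substitute your own project and region):

```shell
# Placeholder values; substitute your own project ID and region.
export GCLOUD_PROJECT=my-gcp-project
export GCLOUD_LOCATION=us-central1
```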

This plugin supports grounding Gemini text responses using Google Search.

Important: Vertex AI charges a fee for grounding requests in addition to the cost of making LLM requests. See the Vertex AI pricing page and be sure you understand grounding request pricing before you use this feature.

Example:

ai = Genkit(
    plugins=[VertexAI(location='us-central1')],
)

await ai.generate(
    model='vertexai/gemini-2.5-flash',
    prompt='What are the latest developments in quantum computing?',
    config={
        'google_search_retrieval': {
            'disable_attribution': True,
        },
    },
)

The Vertex AI Genkit plugin supports Context Caching, which allows models to reuse previously cached content to optimize token usage when dealing with large pieces of content. This feature is especially useful for conversational flows or scenarios where the model references a large piece of content consistently across multiple requests.

To enable context caching, make sure your model supports it. For example, gemini-2.5-flash and gemini-2.0-pro support context caching, but you must pin the model to version 001 (for example, gemini-2.5-flash-001).

You can define a caching mechanism in your application like this:

from genkit import Message, Part, TextPart, Role

ai = Genkit(
    plugins=[VertexAI(location='us-central1')],
)

llm_response = await ai.generate(
    messages=[
        Message(
            role=Role.USER,
            content=[Part(root=TextPart(text='Here is the relevant text from War and Peace.'))],
        ),
        Message(
            role=Role.MODEL,
            content=[
                Part(root=TextPart(text="Based on War and Peace, here is some analysis of Pierre Bezukhov's character.")),
            ],
            metadata={
                'cache': {
                    'ttl_seconds': 300,  # Cache this message for 5 minutes
                },
            },
        ),
    ],
    model='vertexai/gemini-2.5-flash',
    prompt="Describe Pierre's transformation throughout the novel.",
)

In this setup:

  • messages: Allows you to pass conversation history.
  • metadata.cache.ttl_seconds: Specifies the time-to-live (TTL) for caching a specific response.

Example: Leveraging Large Texts with Context


For applications referencing long documents, such as War and Peace or Lord of the Rings, you can structure your queries to reuse cached contexts:

from pathlib import Path

from genkit import Message, Part, TextPart, Role

text_content = Path('path/to/war_and_peace.txt').read_text()

llm_response = await ai.generate(
    messages=[
        Message(
            role=Role.USER,
            content=[Part(root=TextPart(text=text_content))],  # Include the large text as context
        ),
        Message(
            role=Role.MODEL,
            content=[
                Part(root=TextPart(text='This analysis is based on the provided text from War and Peace.')),
            ],
            metadata={
                'cache': {
                    'ttl_seconds': 300,  # Cache the response to avoid reloading the full text
                },
            },
        ),
    ],
    model='vertexai/gemini-2.5-flash',
    prompt='Analyze the relationship between Pierre and Natasha.',
)

Supported models: gemini-2.5-flash-001, gemini-2.0-pro-001

Access third-party models through Vertex AI Model Garden using the genkit-plugin-vertex-ai package (ModelGardenPlugin). The plugin requires a Google Cloud project ID: pass project_id, or set GCLOUD_PROJECT / GOOGLE_CLOUD_PROJECT. Model IDs must use the publisher-qualified names shown in the Google Cloud console (for example meta/... for Llama, anthropic/... for Claude on Vertex). Pass them to model_garden_name() so Genkit resolves the action as modelgarden/<model-id>.
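The naming rule above can be illustrated with a minimal stand-in (this is not the real Genkit helper, which lives in genkit.plugins.vertex_ai.model_garden; the sketch only shows the string the convention produces, assuming simple prefixing):

```python
# Illustrative stand-in for the naming convention described above; use the
# real genkit.plugins.vertex_ai.model_garden.model_garden_name in practice.
def model_garden_name(model_id: str) -> str:
    # Genkit resolves Model Garden actions under the 'modelgarden/' namespace.
    return f'modelgarden/{model_id}'

print(model_garden_name('meta/llama-3.1-405b-instruct-maas'))
# modelgarden/meta/llama-3.1-405b-instruct-maas
```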

Installation:

uv add genkit-plugin-vertex-ai
from genkit import Genkit
from genkit.plugins.vertex_ai import ModelGardenPlugin
from genkit.plugins.vertex_ai.model_garden import model_garden_name

ai = Genkit(
    plugins=[
        ModelGardenPlugin(
            project_id='my-gcp-project',
            location='us-central1',
        ),
    ],
)

response = await ai.generate(
    model=model_garden_name('meta/llama-3.1-405b-instruct-maas'),
    prompt='Write a function that adds two numbers together',
)

Another identifier shipped in the Python SDK registry is meta/llama-3.2-90b-vision-instruct-maas. Always confirm the exact model resource name for your project in the Vertex AI Model Garden console.

Claude on Vertex uses anthropic/... model IDs. Version strings often include dates or @ — use the exact ID from the console:

from genkit import Genkit
from genkit.plugins.vertex_ai import ModelGardenPlugin
from genkit.plugins.vertex_ai.model_garden import model_garden_name

ai = Genkit(
    plugins=[
        ModelGardenPlugin(
            project_id='my-gcp-project',
            location='us-central1',
        ),
    ],
)

response = await ai.generate(
    model=model_garden_name('anthropic/claude-3-5-haiku-20241022'),
    prompt='What should I do when I visit Melbourne?',
)

Other OpenAI-compatible Model Garden endpoints


For additional publishers (for example Mistral), use the same model_garden_name() pattern with the full Model Garden model ID. Models not in the built-in registry still resolve via the generic OpenAI-compatible Model Garden path.

Vertex AI provides access to various third-party models through Model Garden. Consult the Vertex AI Model Garden documentation for the full list of supported models and their capabilities.

Genkit provides evaluation metrics through the Vertex AI plugin automatically when a project is configured:

from genkit import Genkit
from genkit.plugins.google_genai import VertexAI

# Evaluators are automatically registered when a project ID is provided
ai = Genkit(
    plugins=[VertexAI(project='your-project-id', location='us-central1')],
)

Available built-in metrics from the Vertex AI plugin include:

  • BLEU: Translation quality
  • ROUGE: Summarization quality
  • Fluency: Text fluency
  • Safety: Content safety
  • Groundedness: Factual accuracy
  • Summarization Quality/Helpfulness/Verbosity: Summary evaluation

See the evaluation documentation for more details on implementing comprehensive evaluation workflows.

Advanced Vertex AI features like Vector Search require custom implementation using the Google Cloud SDK directly in Python. See the Vertex AI documentation for implementation details.
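As a rough sketch of that custom path, the query side of Vector Search can be reached through the google-cloud-aiplatform SDK. Everything below is a placeholder-laden assumption, not Genkit API: the project, endpoint resource name, deployed index ID, and the toy embedding are hypothetical values you would replace with your own (the query embedding would normally come from an embedder such as vertexai/text-embedding-005):

```python
# Hedged sketch: querying a deployed Vertex AI Vector Search index with the
# Google Cloud SDK directly. All resource names below are placeholders.
from google.cloud import aiplatform

aiplatform.init(project='my-gcp-project', location='us-central1')

# Reference an index endpoint that already has a deployed index.
endpoint = aiplatform.MatchingEngineIndexEndpoint(
    index_endpoint_name='projects/my-gcp-project/locations/us-central1/indexEndpoints/1234567890',
)

# Look up the nearest neighbors of a query embedding in the deployed index.
neighbors = endpoint.find_neighbors(
    deployed_index_id='my_deployed_index',
    queries=[[0.1, 0.2, 0.3]],  # one query embedding (toy values)
    num_neighbors=5,
)
for match in neighbors[0]:
    print(match.id, match.distance)
```

A Genkit retriever for RAG could wrap a call like this: embed the user query with ai.embed, pass the vector to find_neighbors, then fetch the matched documents by ID from your document store.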

  • Learn about generating content to understand how to use these models effectively
  • Explore evaluation to leverage Vertex AI’s evaluation metrics
  • See RAG to implement retrieval-augmented generation with Vector Search
  • Check out creating flows to build structured AI workflows
  • For simple API key access, see the Google AI plugin