# Google Generative AI plugin
The `genkit-plugin-google-genai` package provides the `GoogleAI` plugin for accessing Google's generative AI models via the Google Gemini API using API key authentication.
The plugin supports a wide range of capabilities:
- Language Models: Gemini models for text generation, reasoning, and multimodal tasks
- Embedding Models: Text and multimodal embeddings
- Image Models: Imagen for generation and Gemini for image analysis
- Video Models: Veo for video generation
- Speech Models: Gemini TTS for text-to-speech generation
## Installation

```bash
uv add genkit-plugin-google-genai
```

## Configuration
```python
from genkit import Genkit
from genkit.plugins.google_genai import GoogleAI

ai = Genkit(
    plugins=[GoogleAI()],
    model='googleai/gemini-2.5-flash',
)
```

## Authentication
Requires a Gemini API key from Google AI Studio. Provide it in one of two ways:

- **Environment variable**: Set `GEMINI_API_KEY`
- **Plugin configuration**: Pass `api_key` when initializing the plugin:
```python
ai = Genkit(
    plugins=[GoogleAI(api_key='your-api-key')],
)
```

## Language Models
### Available Models

**Gemini 3 Series** - Latest experimental models:

- `gemini-3-pro-preview` - Most capable model for complex tasks
- `gemini-3-flash-preview` - Fast and intelligent for high-volume tasks
**Gemini 2.5 Series** - Stable models with advanced reasoning:

- `gemini-2.5-pro` - Most capable stable model
- `gemini-2.5-flash` - Fast and efficient for most use cases
- `gemini-2.5-flash-lite` - Lightweight version for simple tasks
### Basic Usage

```python
from genkit import Genkit
from genkit.plugins.google_genai import GoogleAI

ai = Genkit(
    plugins=[GoogleAI()],
    model='googleai/gemini-2.5-flash',
)

response = await ai.generate(
    prompt='Explain how neural networks learn in simple terms.',
)
print(response.text)
```

### Structured Output
Gemini models support structured output generation using Pydantic schemas:
```python
from pydantic import BaseModel, Field
from genkit import Output

class Character(BaseModel):
    """An RPG character."""

    name: str = Field(description='Character name')
    bio: str = Field(description='Character backstory')
    age: int = Field(description='Character age')

response = await ai.generate(
    prompt='Generate a profile for a fictional fantasy character',
    output=Output(schema=Character),
)
print(response.output)  # Character instance
```

### Thinking and Reasoning
Gemini 2.5+ models use an internal thinking process for complex reasoning:
```python
response = await ai.generate(
    prompt='Solve this logic puzzle: ...',
    config={
        'thinking_config': {
            'include_thoughts': True,
        }
    },
)
print(response.text)
```

### Google Search Grounding
Enable Google Search to ground answers in current information:
```python
response = await ai.generate(
    prompt='What are the top tech news stories this week?',
    config={'google_search_retrieval': True},
)
print(response.text)
```

### Safety Settings
Configure safety settings to control content filtering:
```python
response = await ai.generate(
    prompt='Your prompt here',
    config={
        'safety_settings': [
            {
                'category': 'HARM_CATEGORY_HATE_SPEECH',
                'threshold': 'BLOCK_MEDIUM_AND_ABOVE',
            },
            {
                'category': 'HARM_CATEGORY_DANGEROUS_CONTENT',
                'threshold': 'BLOCK_MEDIUM_AND_ABOVE',
            },
        ],
    },
)
```

## Multimodal Input
### Image Understanding

```python
import base64

from genkit import Part, TextPart, MediaPart, Media

# From file
with open('image.jpg', 'rb') as f:
    image_data = base64.b64encode(f.read()).decode()

response = await ai.generate(
    prompt=[
        Part(root=TextPart(text='Describe what is in this image')),
        Part(root=MediaPart(media=Media(
            url=f'data:image/jpeg;base64,{image_data}',
            content_type='image/jpeg',
        ))),
    ],
)
```

### Video Understanding
```python
from genkit import Part, TextPart, MediaPart, Media

response = await ai.generate(
    prompt=[
        Part(root=TextPart(text='What happens in this video?')),
        Part(root=MediaPart(media=Media(
            url='https://example.com/video.mp4',
            content_type='video/mp4',
        ))),
    ],
)
```

## Embedding Models
### Available Models

- `gemini-embedding-001` - Latest Gemini embedding model (3072 dimensions)
- `text-embedding-004` - Text embedding model (768 dimensions)
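Whichever model you choose, the resulting vectors are usually compared with cosine similarity. A minimal pure-Python helper (illustrative only, not part of the plugin API):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Parallel vectors score 1.0; orthogonal vectors score 0.0.
print(cosine_similarity([1.0, 2.0], [2.0, 4.0]))  # 1.0 (up to float rounding)
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0
```

For real workloads you would apply this to the vectors returned by `ai.embed`.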
```python
embeddings = await ai.embed(
    embedder='googleai/text-embedding-004',
    content='Machine learning models process data to make predictions.',
)
print(embeddings)
```

## Image Models
### Available Models

**Imagen 4 Series**:

- `imagen-4.0-generate-001` - Standard quality
- `imagen-4.0-ultra-generate-001` - Ultra-high quality
- `imagen-4.0-fast-generate-001` - Fast generation
**Imagen 3 Series**:

- `imagen-3.0-generate-002`
```python
response = await ai.generate(
    model='googleai/imagen-3.0-generate-002',
    prompt='A serene Japanese garden with cherry blossoms and a koi pond.',
    config={
        'number_of_images': 4,
        'aspect_ratio': '16:9',
    },
)

# Access the generated image
if response.message and response.message.content:
    media_part = response.message.content[0]
    print(f'Generated image: {media_part.root.media.url}')
```

## Video Models (Veo)
Veo models generate videos from text prompts using the background-model pattern: long-running operations that can take minutes to complete.
### Available Models

- `veo-2.0-generate-001` - Veo 2.0
- `veo-3.0-generate-001` - Veo 3.0
- `veo-3.1-generate-001` - Veo 3.1 with native audio
```python
import asyncio

from genkit.plugins.google_genai import VeoVersion

# Start video generation (returns an Operation)
response = await ai.generate(
    model=f'googleai/{VeoVersion.VEO_2_0}',
    prompt='A majestic dragon soaring over a mystical forest at dawn.',
    config={
        'aspect_ratio': '16:9',
    },
)

# Video generation returns an operation that must be polled
operation = response.operation
if operation:
    # Poll until complete
    while not operation.done:
        operation = await ai.check_operation(operation)
        await asyncio.sleep(5)  # Wait between polls

    if operation.error:
        print(f'Error: {operation.error.message}')
    else:
        # Extract the video from the result
        video = operation.output.message.content[0]
        print(f'Video URL: {video.root.media.url}')
```

**Configuration Options:**
- `aspect_ratio`: `"16:9"` or `"9:16"`
- `negative_prompt`: Text to discourage in generation
- `person_generation`: `"dont_allow"`, `"allow_adult"`, `"allow_all"`
- `duration_seconds`: Video length (Veo 2 only, 5-8 seconds)
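These options are combined in the single `config` dict passed to `ai.generate`. A hedged sketch, where the values are illustrative choices rather than defaults:

```python
# Illustrative Veo config combining the options listed above.
veo_config = {
    'aspect_ratio': '16:9',
    'negative_prompt': 'low quality, blurry footage',  # content to steer away from
    'person_generation': 'dont_allow',
    'duration_seconds': 8,  # Veo 2 only; must be 5-8 seconds
}
```

Pass it as `config=veo_config` in a `ai.generate(...)` call like the one shown above.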
## Speech Models (TTS)

Gemini TTS models convert text to natural-sounding speech.
### Available Models

- `gemini-2.5-flash-preview-tts` - Flash model with TTS
- `gemini-2.5-pro-preview-tts` - Pro model with TTS
```python
response = await ai.generate(
    model='googleai/gemini-2.5-flash-preview-tts',
    prompt='Say that Genkit is an amazing AI framework',
    config={
        'speech_config': {
            'voice_config': {
                'prebuilt_voice_config': {
                    'voice_name': 'Kore',
                }
            }
        }
    },
)

# Extract the audio
if response.message and response.message.content:
    audio_part = response.message.content[0]
    audio_data = audio_part.root.media.url
    print(f'Audio generated: {audio_data[:50]}...')
```

**Available Voices:** Puck, Charon, Kore, Fenrir, Aoede, Zephyr, Algenib, and more.
**Voice Configuration Options:**
- `voice_name`: Name of the prebuilt voice
- `speaking_rate`: Speed of speech (0.25 to 4.0)
- `pitch`: Voice pitch (-20.0 to 20.0)
- `volume_gain_db`: Volume (-96.0 to 16.0)
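As a sketch, these tuning options can be combined with the `speech_config` shape used in the TTS example. The exact placement of `speaking_rate`, `pitch`, and `volume_gain_db` within the config is an assumption here; verify it against the API reference before relying on it:

```python
# Illustrative speech config; the nesting of the tuning options is assumed.
speech_config = {
    'voice_config': {
        'prebuilt_voice_config': {
            'voice_name': 'Kore',
        }
    },
    'speaking_rate': 1.25,   # 0.25-4.0; 1.0 is normal speed (assumed placement)
    'pitch': -2.0,           # -20.0 to 20.0 (assumed placement)
    'volume_gain_db': 0.0,   # -96.0 to 16.0 (assumed placement)
}
```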
## Next Steps

- Learn about generating content to understand how to use these models effectively
- Explore creating flows to build structured AI workflows
- To use the Gemini API at enterprise scale, see the Vertex AI plugin