
Google Generative AI plugin

The genkit-plugin-google-genai package provides the GoogleAI plugin for accessing Google’s generative AI models via the Google Gemini API using API key authentication.

The plugin supports a wide range of capabilities:

  • Language Models: Gemini models for text generation, reasoning, and multimodal tasks
  • Embedding Models: Text and multimodal embeddings
  • Image Models: Imagen for generation and Gemini for image analysis
  • Video Models: Veo for video generation
  • Speech Models: Gemini TTS for text-to-speech generation
Install the plugin:

uv add genkit-plugin-google-genai

Then initialize Genkit with the plugin:

from genkit import Genkit
from genkit.plugins.google_genai import GoogleAI

ai = Genkit(
    plugins=[GoogleAI()],
    model='googleai/gemini-2.5-flash',
)

The plugin requires a Gemini API key from Google AI Studio. Provide it in one of two ways:

  1. Environment variable: Set GEMINI_API_KEY
  2. Plugin configuration: Pass api_key when initializing the plugin:
ai = Genkit(
    plugins=[GoogleAI(api_key='your-api-key')],
)
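For the environment-variable option, export the key before starting your app (shown for a POSIX shell; the key value is a placeholder):

```shell
export GEMINI_API_KEY="your-api-key"
```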

Gemini 3 Series - Latest experimental models:

  • gemini-3-pro-preview - Most capable model for complex tasks
  • gemini-3-flash-preview - Fast and intelligent for high-volume tasks

Gemini 2.5 Series - Stable models with advanced reasoning:

  • gemini-2.5-pro - Most capable stable model
  • gemini-2.5-flash - Fast and efficient for most use cases
  • gemini-2.5-flash-lite - Lightweight version for simple tasks
from genkit import Genkit
from genkit.plugins.google_genai import GoogleAI

ai = Genkit(
    plugins=[GoogleAI()],
    model='googleai/gemini-2.5-flash',
)

response = await ai.generate(
    prompt='Explain how neural networks learn in simple terms.',
)
print(response.text)

Gemini models support structured output generation using Pydantic schemas:

from pydantic import BaseModel, Field

from genkit import Output


class Character(BaseModel):
    """An RPG character."""

    name: str = Field(description='Character name')
    bio: str = Field(description='Character backstory')
    age: int = Field(description='Character age')


response = await ai.generate(
    prompt='Generate a profile for a fictional fantasy character',
    output=Output(schema=Character),
)
print(response.output)  # Character instance

Gemini 2.5+ models use an internal thinking process for complex reasoning:

response = await ai.generate(
    prompt='Solve this logic puzzle: ...',
    config={
        'thinking_config': {
            'include_thoughts': True,
        }
    },
)
print(response.text)

Enable Google Search to provide answers with current information:

response = await ai.generate(
    prompt='What are the top tech news stories this week?',
    config={'google_search_retrieval': True},
)
print(response.text)

Configure safety settings to control content filtering:

response = await ai.generate(
    prompt='Your prompt here',
    config={
        'safety_settings': [
            {
                'category': 'HARM_CATEGORY_HATE_SPEECH',
                'threshold': 'BLOCK_MEDIUM_AND_ABOVE',
            },
            {
                'category': 'HARM_CATEGORY_DANGEROUS_CONTENT',
                'threshold': 'BLOCK_MEDIUM_AND_ABOVE',
            },
        ],
    },
)
Gemini models also accept media input alongside text. To analyze an image, pass it as a base64 data URL:

import base64

from genkit import Media, MediaPart, Part, TextPart

# From file
with open('image.jpg', 'rb') as f:
    image_data = base64.b64encode(f.read()).decode()

response = await ai.generate(
    prompt=[
        Part(root=TextPart(text='Describe what is in this image')),
        Part(root=MediaPart(media=Media(url=f'data:image/jpeg;base64,{image_data}', content_type='image/jpeg'))),
    ],
)
Video can be referenced by URL:

from genkit import Media, MediaPart, Part, TextPart

response = await ai.generate(
    prompt=[
        Part(root=TextPart(text='What happens in this video?')),
        Part(root=MediaPart(media=Media(url='https://example.com/video.mp4', content_type='video/mp4'))),
    ],
)
  • gemini-embedding-001 - Latest Gemini embedding model (3072 dimensions)
  • text-embedding-004 - Text embedding model (768 dimensions)
embeddings = await ai.embed(
    embedder='googleai/text-embedding-004',
    content='Machine learning models process data to make predictions.',
)
print(embeddings)
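Embedding vectors are typically compared with cosine similarity, e.g. for semantic search over documents. As an illustration (this helper is ours, not part of the plugin), a minimal pure-Python sketch using toy 3-dimensional vectors in place of real 768-dimensional embeddings:

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


# Toy vectors standing in for real embeddings:
v1 = [0.1, 0.3, 0.5]
v2 = [0.2, 0.1, 0.4]
print(round(cosine_similarity(v1, v2), 4))
```

In practice you would embed a query and each candidate document, then rank documents by similarity to the query.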

Imagen 4 Series:

  • imagen-4.0-generate-001 - Standard quality
  • imagen-4.0-ultra-generate-001 - Ultra-high quality
  • imagen-4.0-fast-generate-001 - Fast generation

Imagen 3 Series:

  • imagen-3.0-generate-002
response = await ai.generate(
    model='googleai/imagen-3.0-generate-002',
    prompt='A serene Japanese garden with cherry blossoms and a koi pond.',
    config={
        'number_of_images': 4,
        'aspect_ratio': '16:9',
    },
)

# Access generated image
if response.message and response.message.content:
    media_part = response.message.content[0]
    print(f'Generated image: {media_part.root.media.url}')

Veo models generate videos from text prompts using the background model pattern (long-running operations that can take minutes).

  • veo-2.0-generate-001 - Veo 2.0
  • veo-3.0-generate-001 - Veo 3.0
  • veo-3.1-generate-001 - Veo 3.1 with native audio
import asyncio

from genkit.plugins.google_genai import VeoVersion

# Start video generation (returns an Operation)
response = await ai.generate(
    model=f'googleai/{VeoVersion.VEO_2_0}',
    prompt='A majestic dragon soaring over a mystical forest at dawn.',
    config={
        'aspect_ratio': '16:9',
    },
)

# Video generation returns an operation that needs to be polled
operation = response.operation
if operation:
    # Poll until complete
    while not operation.done:
        operation = await ai.check_operation(operation)
        await asyncio.sleep(5)  # Wait between polls

    if operation.error:
        print(f'Error: {operation.error.message}')
    else:
        # Extract video from result
        video = operation.output.message.content[0]
        print(f'Video URL: {video.root.media.url}')

Configuration Options:

  • aspect_ratio: "16:9" or "9:16"
  • negative_prompt: Text to discourage in generation
  • person_generation: "dont_allow", "allow_adult", "allow_all"
  • duration_seconds: Video length (Veo 2 only, 5-8 seconds)
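The options above can be combined in a single `config` dict; a sketch with illustrative values (the prompt text and choices here are ours):

```python
# Illustrative Veo config combining the documented options.
veo_config = {
    'aspect_ratio': '9:16',
    'negative_prompt': 'blurry, low-quality footage',
    'person_generation': 'dont_allow',
    'duration_seconds': 6,  # Veo 2 only; must be 5-8 seconds
}
print(sorted(veo_config))
```

This dict would be passed as the `config=` argument to `ai.generate` in the example above.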

Gemini TTS models convert text to natural-sounding speech.

  • gemini-2.5-flash-preview-tts - Flash model with TTS
  • gemini-2.5-pro-preview-tts - Pro model with TTS
response = await ai.generate(
    model='googleai/gemini-2.5-flash-preview-tts',
    prompt='Say that Genkit is an amazing AI framework',
    config={
        'speech_config': {
            'voice_config': {
                'prebuilt_voice_config': {
                    'voice_name': 'Kore',
                }
            }
        }
    },
)

# Extract audio
if response.message and response.message.content:
    audio_part = response.message.content[0]
    audio_data = audio_part.root.media.url
    print(f'Audio generated: {audio_data[:50]}...')

Available Voices: Puck, Charon, Kore, Fenrir, Aoede, Zephyr, Algenib, and more.

Voice Configuration Options:

  • voice_name: Name of the prebuilt voice
  • speaking_rate: Speed of speech (0.25 to 4.0)
  • pitch: Voice pitch (-20.0 to 20.0)
  • volume_gain_db: Volume (-96.0 to 16.0)
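The TTS example above reads the audio from `media.url`. Assuming that URL is a base64 data URL (as the `audio_data[:50]` preview suggests), a small helper can decode it and write the bytes to disk; the helper name is ours, not part of the plugin:

```python
import base64


def save_data_url(data_url: str, path: str) -> int:
    """Decode a 'data:<mime>;base64,<payload>' URL, write the bytes to a file,
    and return the number of bytes written."""
    _header, encoded = data_url.split(',', 1)
    data = base64.b64decode(encoded)
    with open(path, 'wb') as f:
        f.write(data)
    return len(data)


# Tiny illustrative payload (a real response would carry actual audio bytes):
url = 'data:audio/wav;base64,' + base64.b64encode(b'RIFF').decode()
print(save_data_url(url, 'speech.wav'))
```

With a real response you would call `save_data_url(audio_data, 'speech.wav')` after the generate call.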