# Google Generative AI plugin
The `genkit-plugin-google-genai` package provides the `GoogleAI` plugin for accessing Google's generative AI models via the Google Gemini API using API key authentication.
The plugin supports a wide range of capabilities:
- Language Models: Gemini models for text generation, reasoning, and multimodal tasks
- Embedding Models: Text and multimodal embeddings
- Image Models: Imagen for generation and Gemini for image analysis
- Video Models: Veo for video generation
- Speech Models: Gemini TTS for text-to-speech generation
## Installation

```bash
uv add genkit-plugin-google-genai
```

## Configuration
```python
from genkit import Genkit
from genkit.plugins.google_genai import GoogleAI

ai = Genkit(
    plugins=[GoogleAI()],
    model='googleai/gemini-2.5-flash',
)
```

## Authentication
Requires a Gemini API key from Google AI Studio. Provide it in one of two ways:

- **Environment variable**: Set `GEMINI_API_KEY`
- **Plugin configuration**: Pass `api_key` when initializing the plugin:
```python
ai = Genkit(
    plugins=[GoogleAI(api_key='your-api-key')],
)
```

## Language Models
### Available Models

**Gemini 3 Series** - Latest experimental models:

- `gemini-3-pro-preview` - Most capable model for complex tasks
- `gemini-3-flash-preview` - Fast and intelligent for high-volume tasks
**Gemini 2.5 Series** - Stable models with advanced reasoning:

- `gemini-2.5-pro` - Most capable stable model
- `gemini-2.5-flash` - Fast and efficient for most use cases
- `gemini-2.5-flash-lite` - Lightweight version for simple tasks
### Basic Usage

```python
from genkit import Genkit
from genkit.plugins.google_genai import GoogleAI

ai = Genkit(
    plugins=[GoogleAI()],
    model='googleai/gemini-2.5-flash',
)

response = await ai.generate(
    prompt='Explain how neural networks learn in simple terms.',
)
print(response.text)
```

### Structured Output
Gemini models support structured output generation using Pydantic schemas:
```python
from pydantic import BaseModel, Field
from genkit import Output

class Character(BaseModel):
    """An RPG character."""

    name: str = Field(description='Character name')
    bio: str = Field(description='Character backstory')
    age: int = Field(description='Character age')

response = await ai.generate(
    prompt='Generate a profile for a fictional fantasy character',
    output=Output(schema=Character),
)
print(response.output)  # Character instance
```

### Thinking and Reasoning
Gemini 2.5+ models use an internal thinking process for complex reasoning:
```python
response = await ai.generate(
    prompt='Solve this logic puzzle: ...',
    config={
        'thinking_config': {
            'include_thoughts': True,
        }
    },
)
print(response.text)
```

### Google Search Grounding
Enable Google Search to ground answers in current information:
```python
response = await ai.generate(
    prompt='What are the top tech news stories this week?',
    config={'google_search_retrieval': True},
)
print(response.text)
```

### Safety Settings
Configure safety settings to control content filtering:
```python
response = await ai.generate(
    prompt='Your prompt here',
    config={
        'safety_settings': [
            {
                'category': 'HARM_CATEGORY_HATE_SPEECH',
                'threshold': 'BLOCK_MEDIUM_AND_ABOVE',
            },
            {
                'category': 'HARM_CATEGORY_DANGEROUS_CONTENT',
                'threshold': 'BLOCK_MEDIUM_AND_ABOVE',
            },
        ],
    },
)
```

## Multimodal Input
### Image Understanding

```python
import base64

from genkit import Part, TextPart, MediaPart, Media

# From file
with open('image.jpg', 'rb') as f:
    image_data = base64.b64encode(f.read()).decode()

response = await ai.generate(
    prompt=[
        Part(root=TextPart(text='Describe what is in this image')),
        Part(root=MediaPart(media=Media(
            url=f'data:image/jpeg;base64,{image_data}',
            content_type='image/jpeg',
        ))),
    ],
)
```

### Video Understanding
```python
from genkit import Part, TextPart, MediaPart, Media

response = await ai.generate(
    prompt=[
        Part(root=TextPart(text='What happens in this video?')),
        Part(root=MediaPart(media=Media(
            url='https://example.com/video.mp4',
            content_type='video/mp4',
        ))),
    ],
)
```

## Embedding Models
### Available Models

- `gemini-embedding-001` - Latest Gemini embedding model (3072 dimensions)
- `text-embedding-004` - Text embedding model (768 dimensions)
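Whichever model you choose, the resulting vectors are usually compared with cosine similarity. A minimal pure-Python helper (illustrative only, not part of the plugin API):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Parallel vectors score 1.0; orthogonal vectors score 0.0.
print(cosine_similarity([1.0, 2.0], [2.0, 4.0]))  # 1.0 (up to float rounding)
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0
```

For real workloads you would apply this to the vectors returned by `ai.embed`.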
```python
embeddings = await ai.embed(
    embedder='googleai/text-embedding-004',
    content='Machine learning models process data to make predictions.',
)
print(embeddings)
```

## Image Models
### Available Models

**Imagen 4 Series**:

- `imagen-4.0-generate-001` - Standard quality
- `imagen-4.0-ultra-generate-001` - Ultra-high quality
- `imagen-4.0-fast-generate-001` - Fast generation
**Imagen 3 Series**:

- `imagen-3.0-generate-002`
```python
response = await ai.generate(
    model='googleai/imagen-3.0-generate-002',
    prompt='A serene Japanese garden with cherry blossoms and a koi pond.',
    config={
        'number_of_images': 4,
        'aspect_ratio': '16:9',
    },
)

# Access the generated image
if response.message and response.message.content:
    media_part = response.message.content[0]
    print(f'Generated image: {media_part.root.media.url}')
```

## Video Models (Veo)
Veo models generate videos from text prompts using the background-model pattern: long-running operations that can take minutes to complete.
### Available Models

- `veo-2.0-generate-001` - Veo 2.0
- `veo-3.0-generate-001` - Veo 3.0
- `veo-3.1-generate-001` - Veo 3.1 with native audio
```python
import asyncio

from genkit.plugins.google_genai import VeoVersion

# Start video generation (returns an Operation)
response = await ai.generate(
    model=f'googleai/{VeoVersion.VEO_2_0}',
    prompt='A majestic dragon soaring over a mystical forest at dawn.',
    config={
        'aspect_ratio': '16:9',
    },
)

# Video generation returns an operation that must be polled
operation = response.operation
if operation:
    # Poll until complete
    while not operation.done:
        operation = await ai.check_operation(operation)
        await asyncio.sleep(5)  # Wait between polls

    if operation.error:
        print(f'Error: {operation.error.message}')
    else:
        # Extract the video from the result
        video = operation.output.message.content[0]
        print(f'Video URL: {video.root.media.url}')
```

**Configuration Options:**
- `aspect_ratio`: `"16:9"` or `"9:16"`
- `negative_prompt`: Text to discourage in generation
- `person_generation`: `"dont_allow"`, `"allow_adult"`, `"allow_all"`
- `duration_seconds`: Video length (Veo 2 only, 5-8 seconds)
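These options are combined in the single `config` dict passed to `ai.generate`. A hedged sketch, where the values are illustrative choices rather than defaults:

```python
# Illustrative Veo config combining the options listed above.
veo_config = {
    'aspect_ratio': '16:9',
    'negative_prompt': 'low quality, blurry footage',  # content to steer away from
    'person_generation': 'dont_allow',
    'duration_seconds': 8,  # Veo 2 only; must be 5-8 seconds
}
```

Pass it as `config=veo_config` in a `ai.generate(...)` call like the one shown above.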
## Speech Models (TTS)

Gemini TTS models convert text to natural-sounding speech.
### Available Models

- `gemini-2.5-flash-preview-tts` - Flash model with TTS
- `gemini-2.5-pro-preview-tts` - Pro model with TTS
```python
response = await ai.generate(
    model='googleai/gemini-2.5-flash-preview-tts',
    prompt='Say that Genkit is an amazing AI framework',
    config={
        'speech_config': {
            'voice_config': {
                'prebuilt_voice_config': {
                    'voice_name': 'Kore',
                }
            }
        }
    },
)

# Extract the audio
if response.message and response.message.content:
    audio_part = response.message.content[0]
    audio_data = audio_part.root.media.url
    print(f'Audio generated: {audio_data[:50]}...')
```

**Available Voices:** Puck, Charon, Kore, Fenrir, Aoede, Zephyr, Algenib, and more.
**Voice Configuration Options:**
- `voice_name`: Name of the prebuilt voice
- `speaking_rate`: Speed of speech (0.25 to 4.0)
- `pitch`: Voice pitch (-20.0 to 20.0)
- `volume_gain_db`: Volume (-96.0 to 16.0)
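As a sketch, these tuning options can be combined with the `speech_config` shape used in the TTS example. The exact placement of `speaking_rate`, `pitch`, and `volume_gain_db` within the config is an assumption here; verify it against the API reference before relying on it:

```python
# Illustrative speech config; the nesting of the tuning options is assumed.
speech_config = {
    'voice_config': {
        'prebuilt_voice_config': {
            'voice_name': 'Kore',
        }
    },
    'speaking_rate': 1.25,   # 0.25-4.0; 1.0 is normal speed (assumed placement)
    'pitch': -2.0,           # -20.0 to 20.0 (assumed placement)
    'volume_gain_db': 0.0,   # -96.0 to 16.0 (assumed placement)
}
```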
## Next Steps

- Learn about generating content to understand how to use these models effectively
- Explore creating flows to build structured AI workflows
- To use the Gemini API at enterprise scale, see the Vertex AI plugin