Google Generative AI plugin
The Google AI plugin provides a unified interface to connect with Google’s generative AI models through the Gemini Developer API using API key authentication. The @genkit-ai/google-genai package is a drop-in replacement for the previous @genkit-ai/googleai package.
The plugin supports a wide range of capabilities:
- Language Models: Gemini models for text generation, reasoning, and multimodal tasks
- Embedding Models: Text and multimodal embeddings
- Image Models: Imagen for generation and Gemini for image analysis
- Video Models: Veo for video generation and Gemini for video understanding
- Speech Models: Polyglot text-to-speech generation
Installation
```sh
npm i --save @genkit-ai/google-genai
```
Configuration
```ts
import { genkit } from 'genkit';
import { googleAI } from '@genkit-ai/google-genai';

const ai = genkit({
  plugins: [
    googleAI(),
    // Or with an explicit API key:
    // googleAI({ apiKey: 'your-api-key' }),
  ],
});
```
Authentication
Requires a Gemini API key, which you can get from Google AI Studio. You can provide this key in several ways:
- Environment variable: Set `GEMINI_API_KEY`.
- Plugin configuration: Pass `apiKey` when initializing the plugin (shown above).
- Per-request: Override the API key for specific requests in the config:
```ts
const response = await ai.generate({
  model: googleAI.model('gemini-flash-latest'),
  prompt: 'Your prompt here',
  config: {
    apiKey: 'different-api-key', // Use a different API key for this request
  },
});
```
This per-request API key option is useful for routing specific requests to different API keys, such as for multi-tenant applications or cost tracking.
Language Models
You can create models that call the Google Generative AI API. The models support tool calls, and some have multimodal capabilities.
Available Models
Gemini 3+ Series - Latest experimental models with state-of-the-art reasoning:
- `gemini-3.1-pro-preview` - Preview of the most capable model for complex tasks
- `gemini-3.1-pro-preview-customtools` - Model tuned for best custom tool support
- `gemini-3.1-flash-image-preview` - Fast and efficient image generation
- `gemini-3-flash-preview` - Fast and intelligent model for high-volume tasks
- `gemini-3-pro-image-preview` - Supports image generation outputs
Gemini 2.5 Series - Latest stable models with advanced reasoning and multimodal capabilities:
- `gemini-2.5-pro` - Most capable stable model for complex tasks
- `gemini-2.5-flash` - Fast and efficient for most use cases
- `gemini-2.5-flash-lite` - Lightweight version for simple tasks
- `gemini-2.5-flash-image` - Supports image generation outputs
Gemma 3 Series - Open models for various use cases:
- `gemma-3-27b-it` - Large instruction-tuned model
- `gemma-3-12b-it` - Medium instruction-tuned model
- `gemma-3-4b-it` - Small instruction-tuned model
- `gemma-3-1b-it` - Tiny instruction-tuned model
- `gemma-3n-e4b-it` - Efficient 4-bit model
Basic Usage
```ts
import { genkit } from 'genkit';
import { googleAI } from '@genkit-ai/google-genai';

const ai = genkit({
  plugins: [googleAI()],
});

const response = await ai.generate({
  model: googleAI.model('gemini-flash-latest'),
  prompt: 'Explain how neural networks learn in simple terms.',
});

console.log(response.text);
```
Structured Output
```ts
import { z } from 'genkit';

const response = await ai.generate({
  model: googleAI.model('gemini-flash-latest'),
  output: {
    schema: z.object({
      name: z.string(),
      bio: z.string(),
      age: z.number(),
    }),
  },
  prompt: 'Generate a profile for a fictional character',
});

console.log(response.output);
```
Schema Limitations
The Gemini API relies on a specific subset of the OpenAPI 3.0 standard. When defining Zod schemas for structured output, keep the following limitations in mind:
Supported Features
- Objects & Arrays: Standard object properties and array items.
- Enums: Fully supported (`z.enum`).
- Nullable: Supported via `z.nullable()` (mapped to `nullable: true`).
Critical Limitations
- Unions (`z.union`): Complex unions are often problematic. The API has specific handling for `anyOf` but may reject ambiguous or complex `oneOf` structures. Prefer a single object with optional fields, or distinct tool definitions, over complex unions.
- Validation Keywords: Keywords like `pattern`, `minLength`, `maxLength`, `minItems`, and `maxItems` are not supported by the Gemini API's constrained decoding. Including them may result in `400 InvalidArgument` errors, or they may simply be ignored.
- Recursion: Recursive schemas are generally not supported.
- Complexity: Deeply nested schemas or schemas with hundreds of properties may trigger complexity limits.
Best Practices
- Keep schemas simple and flat where possible.
- Use property descriptions (`.describe()`) to guide the model instead of complex validation rules (e.g., "String must be an email" instead of a regex pattern).
- If you need strict validation (e.g., regex), perform it in your application code after receiving the structured response.
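For example, because `pattern` and similar keywords are stripped before constrained decoding, a strict email check can run in application code once the structured response is parsed. A minimal sketch (the field names and regex here are illustrative, not part of the plugin API):

```typescript
// Post-hoc validation: the schema requests a plain string, and the
// strict format check happens after the response is parsed.
const EMAIL_RE = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;

function validateProfile(profile: { name: string; email: string }): string[] {
  const errors: string[] = [];
  if (profile.name.trim().length === 0) {
    errors.push('name must not be empty');
  }
  if (!EMAIL_RE.test(profile.email)) {
    errors.push(`email "${profile.email}" is not a valid address`);
  }
  return errors;
}

// Typical usage, after `const profile = response.output;`
console.log(validateProfile({ name: 'Ada', email: 'ada@example.com' })); // []
console.log(validateProfile({ name: '', email: 'not-an-email' }).length); // 2
```

Returning a list of error strings (rather than throwing) makes it easy to feed the failures back to the model in a retry prompt.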
Thinking and Reasoning
Gemini 2.5 and newer models use an internal thinking process that improves reasoning for complex tasks.
Thinking Level (Gemini 3.0+):
```ts
const response = await ai.generate({
  model: googleAI.model('gemini-3.1-pro-preview'),
  prompt: 'what is heavier, one kilo of steel or one kilo of feathers',
  config: {
    thinkingConfig: {
      thinkingLevel: 'HIGH', // Or 'LOW', or 'MEDIUM'
      includeThoughts: true, // Include thought summaries
    },
  },
});
```
Thinking Budget (Gemini 2.5):
```ts
const response = await ai.generate({
  model: googleAI.model('gemini-2.5-pro'),
  prompt: 'what is heavier, one kilo of steel or one kilo of feathers',
  config: {
    thinkingConfig: {
      thinkingBudget: 8192, // Number of thinking tokens
      includeThoughts: true, // Include thought summaries
    },
  },
});

if (response.reasoning) {
  console.log('Reasoning:', response.reasoning);
}
```
Context Caching
Gemini 2.5 and newer models automatically cache common content prefixes (minimum 1024 tokens for Flash, 2048 for Pro), providing a 75% token discount on cached tokens.
```ts
// Structure prompts with consistent content at the beginning
const baseContext = `You are a helpful cook... (large context) ...`.repeat(50);

// First request - content will be cached
await ai.generate({
  model: googleAI.model('gemini-pro-latest'),
  prompt: `${baseContext}\n\nTask 1...`,
});

// Second request with same prefix - eligible for cache hit
await ai.generate({
  model: googleAI.model('gemini-pro-latest'),
  prompt: `${baseContext}\n\nTask 2...`,
});
```
Safety Settings
You can configure safety settings to control content filtering for different harm categories:
```ts
const response = await ai.generate({
  model: googleAI.model('gemini-flash-latest'),
  prompt: 'Your prompt here',
  config: {
    safetySettings: [
      {
        category: 'HARM_CATEGORY_HATE_SPEECH',
        threshold: 'BLOCK_MEDIUM_AND_ABOVE',
      },
      {
        category: 'HARM_CATEGORY_DANGEROUS_CONTENT',
        threshold: 'BLOCK_MEDIUM_AND_ABOVE',
      },
    ],
  },
});
```
Available harm categories:
- `HARM_CATEGORY_HATE_SPEECH`
- `HARM_CATEGORY_DANGEROUS_CONTENT`
- `HARM_CATEGORY_HARASSMENT`
- `HARM_CATEGORY_SEXUALLY_EXPLICIT`
Available thresholds:
- `HARM_BLOCK_THRESHOLD_UNSPECIFIED`
- `BLOCK_LOW_AND_ABOVE`
- `BLOCK_MEDIUM_AND_ABOVE`
- `BLOCK_ONLY_HIGH`
- `BLOCK_NONE`
Accessing Safety Ratings:
Safety ratings are typically only included when content is flagged. You can access them from the response custom metadata:
```ts
const geminiResponse = response.custom as any;
const candidateSafetyRatings = geminiResponse?.candidates?.[0]?.safetyRatings;
const promptSafetyRatings = geminiResponse?.promptFeedback?.safetyRatings;
```
Google Search Grounding
Enable Google Search to provide answers with current information and verifiable sources.
```ts
const response = await ai.generate({
  model: googleAI.model('gemini-flash-latest'),
  prompt: 'What are the top tech news stories this week?',
  config: {
    googleSearchRetrieval: true,
  },
});

// Access grounding metadata
const groundingMetadata = (response.custom as any)?.candidates?.[0]?.groundingMetadata;
if (groundingMetadata) {
  console.log('Sources:', groundingMetadata.groundingChunks);
}
```
The following configuration options are available for Google Search grounding:
- `googleSearchRetrieval` object | boolean
  Enables Google Search grounding. Can be a boolean (`true`) or a configuration object. Example: `{ dynamicRetrievalConfig: { mode: 'MODE_DYNAMIC', dynamicThreshold: 0.7 } }`
  - `dynamicRetrievalConfig` object
    - `mode` string: The retrieval mode (e.g., `'MODE_DYNAMIC'`).
    - `dynamicThreshold` number: The threshold for dynamic retrieval (e.g., `0.7`).
Response Metadata:
- `webSearchQueries` string[]
  Array of search queries used to retrieve information. Example: `["What's the weather in Chicago this weekend?"]`
- `searchEntryPoint` object
  Contains the main search result content formatted for display.
  - `renderedContent` string: The HTML content of the search result.
- `groundingSupports` object[]
  Links specific response segments to supporting search result chunks.
  - `segment` object
    - `text` string: The text of the segment.
  - `groundingChunkIndices` number[]: Indices of the chunks that support this segment.
  - `confidenceScores` number[]: Confidence scores for each supporting chunk.
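One common use of `groundingSupports` is to fold the chunk indices back into the answer as inline citation markers. A hedged sketch, assuming only the metadata shape described above (the sample data is illustrative, not real API output):

```typescript
// Minimal types mirroring the groundingSupports fields described above.
interface GroundingSupport {
  segment: { text: string };
  groundingChunkIndices: number[];
}

// Append [n] markers (1-based) to each supported segment of the answer.
function annotateWithCitations(supports: GroundingSupport[]): string[] {
  return supports.map((s) => {
    const marks = s.groundingChunkIndices.map((i) => `[${i + 1}]`).join('');
    return `${s.segment.text} ${marks}`;
  });
}

// Illustrative sample shaped like groundingMetadata.groundingSupports:
const sample: GroundingSupport[] = [
  { segment: { text: 'It will be sunny on Saturday.' }, groundingChunkIndices: [0] },
  { segment: { text: 'Rain is expected on Sunday.' }, groundingChunkIndices: [1, 2] },
];

console.log(annotateWithCitations(sample));
// [ 'It will be sunny on Saturday. [1]', 'Rain is expected on Sunday. [2][3]' ]
```

The marker indices line up with `groundingChunks`, so a UI can render each `[n]` as a link to the corresponding source.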
Google Maps Grounding
Enable Google Maps to provide location-aware responses.
```ts
const response = await ai.generate({
  model: googleAI.model('gemini-flash-latest'),
  prompt: 'Find coffee shops near Times Square',
  config: {
    tools: [{ googleMaps: {} }],
  },
});
```
You can also request a widget token to render an interactive map:
```ts
const response = await ai.generate({
  model: googleAI.model('gemini-flash-latest'),
  prompt: 'Show me a map of San Francisco',
  config: {
    tools: [{ googleMaps: { enableWidget: true } }],
  },
});
```
The following configuration options are available for Google Maps grounding:
- `googleMaps` object
  Enables Google Maps grounding. Example: `{ enableWidget: true }`
  - `enableWidget` boolean: Whether to include a widget token in the response.
- `toolConfig` object
  Additional configuration for provider tools. Can improve relevance by providing location context for Google Maps. Example: `{ retrievalConfig: { latLng: { latitude: 37.7749, longitude: -122.4194 } } }`
  - `retrievalConfig` object
    - `latLng` object
      - `latitude` number: The latitude in degrees.
      - `longitude` number: The longitude in degrees.
```ts
const response = await ai.generate({
  model: googleAI.model('gemini-flash-latest'),
  prompt: 'Find coffee shops near my current location.',
  config: {
    tools: [{ googleMaps: {} }],
    toolConfig: {
      retrievalConfig: {
        latLng: {
          latitude: 37.7749,
          longitude: -122.4194,
        },
      },
    },
  },
});
```
URL Context
Provide specific URLs for the model to analyze:
```ts
const response = await ai.generate({
  model: googleAI.model('gemini-flash-latest'),
  prompt: 'Summarize this page: https://example.com/article',
  config: {
    tools: [{ urlContext: {} }],
  },
});
```
When using `urlContext`, the model fetches content from URLs found in your prompt.
Code Execution
Enable the model to write and execute Python code for calculations and logic.
```ts
const response = await ai.generate({
  model: googleAI.model('gemini-pro-latest'),
  prompt: 'Calculate the 20th Fibonacci number',
  config: {
    codeExecution: true,
  },
});
```
The following configuration options are available for code execution:
- `codeExecution` boolean
  Enables code execution for reasoning and calculations. Example: `true`
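The code the model wrote and the output it produced come back as parts of the raw response, which this plugin exposes via `response.custom` (as with safety ratings earlier). A hedged sketch of extracting them; the `executableCode` and `codeExecutionResult` part names follow the Gemini API, and the sample object below is illustrative:

```typescript
// Subset of the raw candidate part shape we care about.
interface RawPart {
  executableCode?: { language: string; code: string };
  codeExecutionResult?: { outcome: string; output: string };
}

function extractCodeExecution(parts: RawPart[]): { code?: string; output?: string } {
  return {
    code: parts.find((p) => p.executableCode)?.executableCode?.code,
    output: parts.find((p) => p.codeExecutionResult)?.codeExecutionResult?.output,
  };
}

// Typical usage:
// const parts = (response.custom as any)?.candidates?.[0]?.content?.parts ?? [];
const sampleParts: RawPart[] = [
  { executableCode: { language: 'PYTHON', code: 'print(fib(20))' } },
  { codeExecutionResult: { outcome: 'OUTCOME_OK', output: '6765\n' } },
];

const { code, output } = extractCodeExecution(sampleParts);
console.log(code, output?.trim()); // print(fib(20)) 6765
```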
Generating Text and Images (Nano Banana)
Some Gemini models (like `gemini-3.1-flash-image-preview`, `gemini-3-pro-image-preview`, and `gemini-2.5-flash-image`) can output images natively alongside text:
```ts
const response = await ai.generate({
  model: googleAI.model('gemini-3.1-flash-image-preview'),
  prompt: 'Create a picture of a futuristic city and describe it',
  config: {
    responseModalities: ['IMAGE', 'TEXT'],
  },
});

// Extract image
if (response.image) {
  console.log('Image:', response.image);
}

// Extract text
if (response.text) {
  console.log('Text:', response.text);
}

// Extract all messages including text and images
if (response.messages) {
  console.log('Messages:', response.messages);
}
```
The following configuration options are available for Gemini image generation:
- `responseModalities` string[]
  Specifies the output modalities. Options: `['TEXT', 'IMAGE']`, `['IMAGE']`. Default: `['TEXT', 'IMAGE']`
- `imageConfig` object
  - `aspectRatio` string
    Aspect ratio of the generated images. Not all models support all aspect ratios. Options: `'1:1'`, `'1:4'`, `'1:8'`, `'2:3'`, `'3:2'`, `'3:4'`, `'4:1'`, `'4:3'`, `'4:5'`, `'5:4'`, `'8:1'`, `'9:16'`, `'16:9'`, `'21:9'`. Default: `'1:1'`
  - `imageSize` string
    Resolution of the generated image. Supported by Gemini 3+ image models only. Options: `'1K'`, `'2K'`, `'4K'`. Default: `'1K'`
Multimodal Input Capabilities
Video Understanding
Gemini models can process videos to describe content, answer questions, and refer to timestamps (in MM:SS format).
```ts
const response = await ai.generate({
  model: googleAI.model('gemini-flash-latest'),
  prompt: [
    { text: 'What happens at 00:05?' },
    { media: { contentType: 'video/mp4', url: 'https://youtube.com/watch?v=...' } },
  ],
});
```
Video Processing Details:
- Sampling: 1 frame per second (default)
- Context: 2M context models can handle up to 2 hours of video.
- Inputs: Up to 10 videos per request (Gemini 2.5+).
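At the default sampling rate of 1 frame per second, the frame count grows linearly with clip length, which is a quick way to sanity-check whether a long video fits your model's context. A rough sketch only; actual token usage also depends on resolution and audio:

```typescript
// Frames sampled from a clip at a given sampling rate (default 1 fps).
function sampledFrames(durationSeconds: number, fps = 1): number {
  return Math.floor(durationSeconds * fps);
}

console.log(sampledFrames(2 * 60 * 60)); // 7200 frames for a 2-hour video
console.log(sampledFrames(90)); // 90 frames for a 90-second clip
```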
Image Understanding
Gemini models can reason about images passed as inline data or URLs.
```ts
const response = await ai.generate({
  model: googleAI.model('gemini-flash-latest'),
  prompt: [
    { text: 'Describe what is in this image' },
    { media: { url: 'https://example.com/image.jpg' } },
  ],
});
```
Audio Understanding
Gemini models can process audio files to transcribe speech, answer questions about the audio content, or summarize recordings.
```ts
const response = await ai.generate({
  model: googleAI.model('gemini-flash-latest'),
  prompt: [
    { text: 'Transcribe this audio clip' },
    { media: { contentType: 'audio/mp3', url: 'https://example.com/audio.mp3' } },
  ],
});
```
PDF Support
Gemini models can process PDF documents to extract information, summarize content, or answer questions based on the visual layout and text.
```ts
const response = await ai.generate({
  model: googleAI.model('gemini-flash-latest'),
  prompt: [
    { text: 'Summarize this document' },
    { media: { contentType: 'application/pdf', url: 'https://example.com/doc.pdf' } },
  ],
});
```
File Inputs and Gemini Files API
Gemini models support various file types. For small files, you can use inline data. For larger files (up to 2 GB), use the Gemini Files API.
Using Files API:
To use large files, you must upload them using the Google GenAI SDK or other supported methods. Genkit does not provide file management helpers, but you can pass the file URI to Genkit for generation:
```ts
import { GoogleGenAI } from '@google/genai';
// ... init genaiClient ...

// Upload file
const uploadedFile = await genaiClient.files.upload({
  file: 'path/to/video.mp4',
  config: { mimeType: 'video/mp4' },
});

// Use in generation
const response = await ai.generate({
  model: googleAI.model('gemini-flash-latest'),
  prompt: [
    { text: 'Describe this video' },
    {
      media: {
        contentType: uploadedFile.mimeType,
        url: uploadedFile.uri,
      },
    },
  ],
});
```
Embedding Models
Available Models
- `gemini-embedding-001` - Latest Gemini embedding model (3072 dimensions, customizable)
```ts
const embeddings = await ai.embed({
  embedder: googleAI.embedder('gemini-embedding-001'),
  content: 'Machine learning models process data to make predictions.',
});

console.log(embeddings);
```
Image Models
Available Models
Imagen 4 Series - Latest generation with improved quality:
- `imagen-4.0-generate-001` - Standard quality
- `imagen-4.0-ultra-generate-001` - Ultra-high quality
- `imagen-4.0-fast-generate-001` - Fast generation
```ts
const response = await ai.generate({
  model: googleAI.model('imagen-4.0-generate-001'),
  prompt: 'A serene Japanese garden with cherry blossoms and a koi pond.',
  config: {
    numberOfImages: 4,
    aspectRatio: '16:9',
    personGeneration: 'allow_adult',
    addWatermark: true,
  },
});

const generatedImage = response.media;
```
Configuration Options:
- `numberOfImages` number
  Number of images to generate. Default: `4`
- `aspectRatio` string
  Aspect ratio of the generated images. Options: `'1:1'`, `'3:4'`, `'4:3'`, `'9:16'`, `'16:9'`. Default: `'1:1'`
- `personGeneration` string
  Policy for generating people. Options: `'dont_allow'`, `'allow_adult'`, `'allow_all'`
- `addWatermark` boolean
  Adds an invisible SynthID watermark. Default: `true`
- `enhancePrompt` boolean
  Enables LLM-based rewrite for better prompt adherence. Default: `true`
- `negativePrompt` string
  Text to exclude from the generated image.
Video Models
The Google AI plugin provides access to video generation capabilities through the Veo models. These models can generate videos from text prompts or manipulate existing images to create dynamic video content.
Available Models
Veo 3.1 Series - Latest generation with native audio and high fidelity:
- `veo-3.1-generate-preview` - High-quality video and audio generation
- `veo-3.1-fast-generate-preview` - Fast generation with high quality
Veo 3.0 Series:
- `veo-3.0-generate-001`
- `veo-3.0-fast-generate-001`
Veo 2.0 Series:
- `veo-2.0-generate-001`
Text-to-Video
To generate a video from a text prompt using a Veo model:
```ts
import { googleAI } from '@genkit-ai/google-genai';
import * as fs from 'fs';
import { Readable } from 'stream';
import { genkit, MediaPart } from 'genkit';

const ai = genkit({
  plugins: [googleAI()],
});

ai.defineFlow('text-to-video-veo', async () => {
  let { operation } = await ai.generate({
    model: googleAI.model('veo-3.0-fast-generate-001'),
    prompt: 'A majestic dragon soaring over a mystical forest at dawn.',
    config: {
      aspectRatio: '16:9',
    },
  });

  if (!operation) {
    throw new Error('Expected the model to return an operation');
  }

  // Wait until the operation completes.
  while (!operation.done) {
    operation = await ai.checkOperation(operation);
    // Sleep for 5 seconds before checking again.
    await new Promise((resolve) => setTimeout(resolve, 5000));
  }

  if (operation.error) {
    throw new Error('failed to generate video: ' + operation.error.message);
  }

  const video = operation.output?.message?.content.find((p) => !!p.media);
  if (!video) {
    throw new Error('Failed to find the generated video');
  }
  await downloadVideo(video, 'output.mp4');
});

async function downloadVideo(video: MediaPart, path: string) {
  const fetch = (await import('node-fetch')).default;
  // Add API key before fetching the video.
  const videoDownloadResponse = await fetch(`${video.media!.url}&key=${process.env.GEMINI_API_KEY}`);
  if (!videoDownloadResponse || videoDownloadResponse.status !== 200 || !videoDownloadResponse.body) {
    throw new Error('Failed to fetch video');
  }

  Readable.from(videoDownloadResponse.body).pipe(fs.createWriteStream(path));
}
```
Video Generation from Photo Reference
To use a photo as a reference for the video (e.g., to make a static photo move), provide an image as part of the prompt.
```ts
const startingImage = fs.readFileSync('photo.jpg', { encoding: 'base64' });

let { operation } = await ai.generate({
  model: googleAI.model('veo-2.0-generate-001'),
  prompt: [
    {
      text: 'make the subject in the photo move',
    },
    {
      media: {
        contentType: 'image/jpeg',
        url: `data:image/jpeg;base64,${startingImage}`,
      },
    },
  ],
  config: {
    durationSeconds: 5,
    aspectRatio: '9:16',
    personGeneration: 'allow_adult',
  },
});
```
The Veo models support various configuration options:
- `negativePrompt` string
  Text that describes anything you want to discourage the model from generating.
- `aspectRatio` string
  Changes the aspect ratio of the generated video. Options: `"16:9"`, `"9:16"`
- `personGeneration` string
  Allow the model to generate videos of people.
  - Text-to-video generation:
    - `"allow_all"`: Generate videos that include adults and children. Currently the only available value for Veo 3.
    - `"dont_allow"` (Veo 2 only): Don't allow people or faces.
    - `"allow_adult"` (Veo 2 only): Generate videos with adults, but not children.
  - Image-to-video generation (Veo 2 only):
    - `"dont_allow"`: Don't allow people or faces.
    - `"allow_adult"`: Generate videos with adults, but not children.
- `numberOfVideos` number
  Number of output videos requested.
  - `1`: Supported in Veo 3 and Veo 2.
  - `2`: Supported in Veo 2 only.
- `durationSeconds` number (Veo 2 only)
  Length of each output video in seconds (5 to 8). Not configurable for Veo 3.1/3.0 (defaults to 8 seconds).
- `resolution` string (Veo 3.1 only)
  Resolution of the generated video. Options: `"720p"` (default), `"1080p"` (available for 16:9 aspect ratio), `"4k"`
- `seed` number (Veo 3.1/3.0 only)
  Sets the random seed for generation. Doesn't guarantee determinism but improves consistency.
- `referenceImages` object[] (Veo 3.1 only)
  Provides up to 3 reference images to guide the video's content or style.
- `enhancePrompt` boolean (Veo 2 only)
  Enables or disables the prompt rewriter. Enabled by default. For Veo 3.1/3.0, the prompt enhancer is always on.
Speech Models
The Google GenAI plugin provides access to text-to-speech capabilities through Gemini TTS models. These models can convert text into natural-sounding speech for various applications.
Available Models
- `gemini-2.5-flash-preview-tts` - Flash model with TTS
- `gemini-2.5-pro-preview-tts` - Pro model with TTS
Basic Usage
To convert text to single-speaker audio, set the response modality to “AUDIO”, and pass a speechConfig object with voiceConfig set. You’ll need to choose a voice name from the prebuilt output voices.
The plugin returns raw PCM data, which can then be converted to a standard format like WAV.
```ts
import wav from 'wav';
import { Buffer } from 'node:buffer';

async function saveWavFile(filename: string, pcmData: Buffer, sampleRate = 24000) {
  return new Promise((resolve, reject) => {
    const writer = new wav.FileWriter(filename, {
      channels: 1,
      sampleRate,
      bitDepth: 16,
    });
    writer.on('finish', resolve);
    writer.on('error', reject);
    writer.write(pcmData);
    writer.end();
  });
}

const response = await ai.generate({
  model: googleAI.model('gemini-2.5-flash-preview-tts'),
  config: {
    responseModalities: ['AUDIO'],
    speechConfig: {
      voiceConfig: {
        prebuiltVoiceConfig: { voiceName: 'Algenib' },
      },
    },
  },
  prompt: 'Say that Genkit is an amazing AI framework',
});

if (response.media?.url) {
  const data = response.media.url.split(',')[1];
  if (data) {
    const pcmData = Buffer.from(data, 'base64');
    await saveWavFile('output.wav', pcmData);
  }
}
```
Multi-Speaker
You can generate audio with multiple speakers, each with their own voice. The model automatically detects speaker labels in the text (like “Speaker1:” and “Speaker2:”) and applies the corresponding voice to each speaker’s lines.
```ts
const { media } = await ai.generate({
  model: googleAI.model('gemini-2.5-flash-preview-tts'),
  prompt: `
    Speaker A: Hello, how are you today?
    Speaker B: I am doing great, thanks for asking!
  `,
  config: {
    responseModalities: ['AUDIO'],
    speechConfig: {
      multiSpeakerVoiceConfig: {
        speakerVoiceConfigs: [
          {
            speaker: 'Speaker A',
            voiceConfig: { prebuiltVoiceConfig: { voiceName: 'Puck' } },
          },
          {
            speaker: 'Speaker B',
            voiceConfig: { prebuiltVoiceConfig: { voiceName: 'Kore' } },
          },
        ],
      },
    },
  },
});
```
The following configuration options are available for speech generation:
- `speechConfig` object
  - `voiceConfig` object
    Defines the voice configuration for a single speaker.
    - `prebuiltVoiceConfig` object
      - `voiceName` string
        The name of the voice to use. Options: `Puck`, `Charon`, `Kore`, `Fenrir`, `Aoede` (and others).
      - `speakingRate` number
        Controls the speed of speech. Range: `0.25` to `4.0`, default is `1.0`.
      - `pitch` number
        Adjusts the pitch of the voice. Range: `-20.0` to `20.0`, default is `0.0`.
      - `volumeGainDb` number
        Controls the volume. Range: `-96.0` to `16.0`, default is `0.0`.
  - `multiSpeakerVoiceConfig` object
    Defines the voice configuration for multiple speakers.
    - `speakerVoiceConfigs` array
      A list of voice configurations for each speaker.
      - `speaker` string
        The name of the speaker (e.g., "Speaker A") as used in the prompt.
      - `voiceConfig` object
        The voice configuration for this speaker. See `voiceConfig` above.
Speech Emphasis
You can use markdown-style formatting in your prompt to add emphasis:
- Bold text (`**like this**`) for stronger emphasis.
- Italic text (`*like this*`) for moderate emphasis.
```ts
prompt: 'Genkit is an **amazing** Gen AI *library*!';
```
TTS models automatically detect the input language. Supported languages include en-US, fr-FR, de-DE, es-US, ja-JP, ko-KR, pt-BR, zh-CN, and more.
The Google Generative AI plugin provides interfaces to Google’s Gemini models through the Gemini API.
Configuration
To use this plugin, import the `googlegenai` package and pass `googlegenai.GoogleAI` to `WithPlugins()` in the Genkit initializer:
```go
import "github.com/firebase/genkit/go/plugins/googlegenai"

g := genkit.Init(context.Background(), genkit.WithPlugins(&googlegenai.GoogleAI{}))
```
The plugin requires an API key for the Gemini API, which you can get from Google AI Studio.
Configure the plugin to use your API key by doing one of the following:
- Set the `GEMINI_API_KEY` environment variable to your API key.
- Specify the API key when you initialize the plugin:

```go
genkit.WithPlugins(&googlegenai.GoogleAI{APIKey: "YOUR_API_KEY"})
```

However, don't embed your API key directly in code! Use this feature only in conjunction with a service like Cloud Secret Manager or similar.
Generative models
To get a reference to a supported model, specify its identifier to `googlegenai.GoogleAIModel`:
```go
model := googlegenai.GoogleAIModel(g, "gemini-2.5-flash")
```
Alternatively, you may create a `ModelRef`, which pairs the model name with its config:
```go
modelRef := googlegenai.GoogleAIModelRef("gemini-2.5-flash", &genai.GenerateContentConfig{
	Temperature:     genai.Ptr[float32](0.5),
	MaxOutputTokens: genai.Ptr[int32](500),
	// Other configuration...
})
```
Model references have a `Generate()` method that calls the Google API:
```go
resp, err := genkit.Generate(ctx, g, ai.WithModel(modelRef), ai.WithPrompt("Tell me a joke."))
if err != nil {
	return err
}

log.Println(resp.Text())
```
See Generating content with AI models for more information.
Embedding models
To get a reference to a supported embedding model, specify its identifier to `googlegenai.GoogleAIEmbedder`:
```go
embeddingModel := googlegenai.GoogleAIEmbedder(g, "text-embedding-004")
```
Embedder references have an `Embed()` method that calls the Google AI API:
```go
resp, err := genkit.Embed(ctx, g, ai.WithEmbedder(embeddingModel), ai.WithTextDocs(userInput))
if err != nil {
	return err
}
```
See Retrieval-augmented generation (RAG) for more information.
Next Steps
- Learn about generating content to understand how to use these models effectively
- Explore creating flows to build structured AI workflows
- To use the Gemini API at enterprise scale see the Vertex AI plugin
The genkit-plugin-google-genai package provides the GoogleAI plugin for accessing Google’s generative AI models via the Google Gemini API using API key authentication.
The plugin supports a wide range of capabilities:
- Language Models: Gemini models for text generation, reasoning, and multimodal tasks
- Embedding Models: Text and multimodal embeddings
- Image Models: Imagen for generation and Gemini for image analysis
- Video Models: Veo for video generation
- Speech Models: Gemini TTS for text-to-speech generation
Installation
```sh
uv add genkit-plugin-google-genai
```
Configuration
```python
from genkit import Genkit
from genkit.plugins.google_genai import GoogleAI

ai = Genkit(
    plugins=[GoogleAI()],
    model='googleai/gemini-2.5-flash',
)
```
Authentication
Requires a Gemini API key from Google AI Studio. Provide it via:
- Environment variable: Set `GEMINI_API_KEY`
- Plugin configuration: Pass `api_key` when initializing the plugin:
```python
ai = Genkit(
    plugins=[GoogleAI(api_key='your-api-key')],
)
```
Language Models
Available Models
Gemini 3 Series - Latest experimental models:
- `gemini-3-pro-preview` - Most capable model for complex tasks
- `gemini-3-flash-preview` - Fast and intelligent for high-volume tasks
Gemini 2.5 Series - Stable models with advanced reasoning:
- `gemini-2.5-pro` - Most capable stable model
- `gemini-2.5-flash` - Fast and efficient for most use cases
- `gemini-2.5-flash-lite` - Lightweight version for simple tasks
Basic Usage
```python
from genkit import Genkit
from genkit.plugins.google_genai import GoogleAI

ai = Genkit(
    plugins=[GoogleAI()],
    model='googleai/gemini-2.5-flash',
)

response = await ai.generate(
    prompt='Explain how neural networks learn in simple terms.',
)
print(response.text)
```
Structured Output
Gemini models support structured output generation using Pydantic schemas:
```python
from pydantic import BaseModel, Field
from genkit import Output

class Character(BaseModel):
    """An RPG character."""

    name: str = Field(description='Character name')
    bio: str = Field(description='Character backstory')
    age: int = Field(description='Character age')

response = await ai.generate(
    prompt='Generate a profile for a fictional fantasy character',
    output=Output(schema=Character),
)
print(response.output)  # Character instance
```
Thinking and Reasoning
Gemini 2.5+ models use an internal thinking process for complex reasoning:
```python
response = await ai.generate(
    prompt='Solve this logic puzzle: ...',
    config={
        'thinking_config': {
            'include_thoughts': True,
        }
    },
)
print(response.text)
```
Google Search Grounding
Enable Google Search to provide answers with current information:
```python
response = await ai.generate(
    prompt='What are the top tech news stories this week?',
    config={'google_search_retrieval': True},
)
print(response.text)
```
Safety Settings
Configure safety settings to control content filtering:
```python
response = await ai.generate(
    prompt='Your prompt here',
    config={
        'safety_settings': [
            {
                'category': 'HARM_CATEGORY_HATE_SPEECH',
                'threshold': 'BLOCK_MEDIUM_AND_ABOVE',
            },
            {
                'category': 'HARM_CATEGORY_DANGEROUS_CONTENT',
                'threshold': 'BLOCK_MEDIUM_AND_ABOVE',
            },
        ],
    },
)
```
Multimodal Input
Image Understanding
```python
import base64
from genkit import Part, TextPart, MediaPart, Media

# From file
with open('image.jpg', 'rb') as f:
    image_data = base64.b64encode(f.read()).decode()

response = await ai.generate(
    prompt=[
        Part(root=TextPart(text='Describe what is in this image')),
        Part(root=MediaPart(media=Media(url=f'data:image/jpeg;base64,{image_data}', content_type='image/jpeg'))),
    ],
)
```
Video Understanding
```python
from genkit import Part, TextPart, MediaPart, Media

response = await ai.generate(
    prompt=[
        Part(root=TextPart(text='What happens in this video?')),
        Part(root=MediaPart(media=Media(url='https://example.com/video.mp4', content_type='video/mp4'))),
    ],
)
```
Embedding Models
Available Models
- `gemini-embedding-001` - Latest Gemini embedding model (3072 dimensions)
- `text-embedding-004` - Text embedding model (768 dimensions)
```python
embeddings = await ai.embed(
    embedder='googleai/text-embedding-004',
    content='Machine learning models process data to make predictions.',
)
print(embeddings)
```
Image Models
Available Models
Imagen 4 Series:
- `imagen-4.0-generate-001` - Standard quality
- `imagen-4.0-ultra-generate-001` - Ultra-high quality
- `imagen-4.0-fast-generate-001` - Fast generation
Imagen 3 Series:
- imagen-3.0-generate-002
response = await ai.generate(
    model='googleai/imagen-3.0-generate-002',
    prompt='A serene Japanese garden with cherry blossoms and a koi pond.',
    config={
        'number_of_images': 4,
        'aspect_ratio': '16:9',
    },
)
# Access generated image
if response.message and response.message.content:
    media_part = response.message.content[0]
    print(f'Generated image: {media_part.root.media.url}')

Video Models (Veo)
Section titled “Video Models (Veo)”Veo models generate videos from text prompts using the background model pattern (long-running operations that can take minutes).
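The background model pattern described above boils down to a poll-until-done loop. The sketch below illustrates that loop with hypothetical stand-ins (`FakeOperation`, `check_operation`, `poll_until_done` are all invented for illustration; the real plugin returns `Operation` objects from `ai.generate` and refreshes them via `ai.check_operation`, as shown in the example further down):

```python
import asyncio
from dataclasses import dataclass

# Hypothetical stand-in for the plugin's Operation object.
@dataclass
class FakeOperation:
    remaining_polls: int
    done: bool = False

async def check_operation(op: FakeOperation) -> FakeOperation:
    """Pretend to refresh operation status; completes after a few polls."""
    op.remaining_polls -= 1
    op.done = op.remaining_polls <= 0
    return op

async def poll_until_done(op: FakeOperation, interval: float = 0.01) -> FakeOperation:
    # The same loop shape used with real Veo operations, just with
    # a short sleep interval so the sketch finishes quickly.
    while not op.done:
        op = await check_operation(op)
        await asyncio.sleep(interval)
    return op

result = asyncio.run(poll_until_done(FakeOperation(remaining_polls=3)))
print(result.done)  # -> True
```

Real Veo operations can take minutes, so a production poll interval of several seconds (as in the example below) is more appropriate.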
Available Models
Section titled “Available Models”
- veo-2.0-generate-001 - Veo 2.0
- veo-3.0-generate-001 - Veo 3.0
- veo-3.1-generate-001 - Veo 3.1 with native audio
import asyncio
from genkit.plugins.google_genai import VeoVersion
# Start video generation (returns an Operation)
response = await ai.generate(
    model=f'googleai/{VeoVersion.VEO_2_0}',
    prompt='A majestic dragon soaring over a mystical forest at dawn.',
    config={
        'aspect_ratio': '16:9',
    },
)
# Video generation returns an operation that needs to be polled
operation = response.operation
if operation:
    # Poll until complete
    while not operation.done:
        operation = await ai.check_operation(operation)
        await asyncio.sleep(5)  # Wait between polls
    if operation.error:
        print(f'Error: {operation.error.message}')
    else:
        # Extract video from result
        video = operation.output.message.content[0]
        print(f'Video URL: {video.root.media.url}')

Configuration Options:
- aspect_ratio: "16:9" or "9:16"
- negative_prompt: Text to discourage in generation
- person_generation: "dont_allow", "allow_adult", "allow_all"
- duration_seconds: Video length (Veo 2 only, 5-8 seconds)
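Putting the options above together, a full Veo config might look like the following. This is a plain-dict sketch using only the documented key names; the specific values (the negative prompt text, the 8-second duration) are illustrative choices, and the dict would be passed as `config=` to `ai.generate`:

```python
# Sketch of a complete Veo config built from the documented options.
veo_config = {
    'aspect_ratio': '16:9',
    'negative_prompt': 'blurry, low quality',  # traits to discourage
    'person_generation': 'dont_allow',
    'duration_seconds': 8,  # Veo 2 only; must be 5-8
}

# Basic sanity checks against the documented value ranges.
assert veo_config['aspect_ratio'] in ('16:9', '9:16')
assert veo_config['person_generation'] in ('dont_allow', 'allow_adult', 'allow_all')
assert 5 <= veo_config['duration_seconds'] <= 8
```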
Speech Models (TTS)
Section titled “Speech Models (TTS)”Gemini TTS models convert text to natural-sounding speech.
Available Models
Section titled “Available Models”
- gemini-2.5-flash-preview-tts - Flash model with TTS
- gemini-2.5-pro-preview-tts - Pro model with TTS
response = await ai.generate(
    model='googleai/gemini-2.5-flash-preview-tts',
    prompt='Say that Genkit is an amazing AI framework',
    config={
        'speech_config': {
            'voice_config': {
                'prebuilt_voice_config': {
                    'voice_name': 'Kore',
                }
            }
        }
    },
)
# Extract audio
if response.message and response.message.content:
    audio_part = response.message.content[0]
    audio_data = audio_part.root.media.url
    print(f'Audio generated: {audio_data[:50]}...')

Available Voices: Puck, Charon, Kore, Fenrir, Aoede, Zephyr, Algenib, and more.
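The `audio_data` value extracted above is typically a base64 `data:` URL wrapping raw audio bytes. A minimal sketch of decoding and saving it follows; the `save_data_url` helper and the synthetic example URL are illustrative, not part of the plugin API:

```python
import base64

def save_data_url(data_url: str, path: str) -> int:
    """Decode a base64 data: URL and write the raw bytes to path.

    Returns the number of bytes written. Assumes the standard
    'data:<mime>;base64,<payload>' shape.
    """
    header, payload = data_url.split(',', 1)
    audio_bytes = base64.b64decode(payload)
    with open(path, 'wb') as f:
        f.write(audio_bytes)
    return len(audio_bytes)

# Synthetic stand-in for model output; real audio payloads are much larger.
fake_url = 'data:audio/pcm;base64,' + base64.b64encode(b'\x00\x01\x02\x03').decode()
print(save_data_url(fake_url, 'output.pcm'))  # -> 4
```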
Voice Configuration Options:
- voice_name: Name of the prebuilt voice
- speaking_rate: Speed of speech (0.25 to 4.0)
- pitch: Voice pitch (-20.0 to 20.0)
- volume_gain_db: Volume (-96.0 to 16.0)
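The documented ranges above can be captured in a small validator. This is a sketch only: the exact nesting of these options inside `speech_config` is not shown on this page, so a flat options dict is assumed, and `validate_voice_options` is a hypothetical helper rather than plugin API:

```python
def validate_voice_options(opts: dict) -> None:
    """Check voice options against the documented ranges; raise on violation."""
    if not 0.25 <= opts.get('speaking_rate', 1.0) <= 4.0:
        raise ValueError('speaking_rate must be between 0.25 and 4.0')
    if not -20.0 <= opts.get('pitch', 0.0) <= 20.0:
        raise ValueError('pitch must be between -20.0 and 20.0')
    if not -96.0 <= opts.get('volume_gain_db', 0.0) <= 16.0:
        raise ValueError('volume_gain_db must be between -96.0 and 16.0')

opts = {
    'voice_name': 'Kore',
    'speaking_rate': 1.25,
    'pitch': 2.0,
    'volume_gain_db': -3.0,
}
validate_voice_options(opts)  # passes silently; out-of-range values raise
```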
Next Steps
Section titled “Next Steps”- Learn about generating content to understand how to use these models effectively
- Explore creating flows to build structured AI workflows
- To use the Gemini API at enterprise scale, see the Vertex AI plugin
The genkit_google_genai package provides the GoogleAI plugin for accessing Google’s generative AI models via the Google Gemini API.
Installation
Section titled “Installation”

dart pub add genkit_google_genai

Configuration
Section titled “Configuration”To use the Google Gemini API, you need an API key.
import 'package:genkit/genkit.dart';
import 'package:genkit_google_genai/genkit_google_genai.dart';
void main() {
  final ai = Genkit(
    plugins: [
      googleAI(apiKey: 'YOUR_API_KEY'), // Optional if GEMINI_API_KEY env var is set
    ],
  );
}

Authentication
Section titled “Authentication”Requires a Gemini API Key, which you can get from Google AI Studio.
- Environment variables: Set GEMINI_API_KEY
- Plugin configuration: Pass apiKey when initializing the plugin (shown above)
- Per-request: Override the API key for specific requests in the config:
final response = await ai.generate(
  model: googleAI.gemini('gemini-2.5-flash'),
  prompt: 'Your prompt here',
  config: GeminiOptions(
    apiKey: 'different-api-key', // Use a different API key for this request
  ),
);

Language Models
Section titled “Language Models”You can create models that call the Google Generative AI API. The models support all standard Genkit features including tool calls, streaming, and multimodal input.
Available Models
Section titled “Available Models”Gemini 3 Series - Latest experimental models with state-of-the-art reasoning:
- gemini-3-pro-preview
- gemini-3-flash-preview
- gemini-3-pro-image-preview
Gemini 2.5 Series - Latest stable models:
- gemini-2.5-pro
- gemini-2.5-flash
- gemini-2.5-flash-lite
Gemma 3 Series - Open models:
- gemma-3-27b-it
- gemma-3-12b-it
- gemma-3-4b-it
- gemma-3-1b-it
Basic Usage
Section titled “Basic Usage”

import 'package:genkit/genkit.dart';
import 'package:genkit_google_genai/genkit_google_genai.dart';
void main() async {
  final ai = Genkit(plugins: [googleAI()]);
  final response = await ai.generate(
    model: googleAI.gemini('gemini-2.5-flash'),
    prompt: 'Explain how neural networks learn in simple terms.',
  );
  print(response.text);
}

Structured Output
Section titled “Structured Output”Use the schemantic package to define strongly-typed schemas for structured output.
import 'package:genkit/genkit.dart';
import 'package:genkit_google_genai/genkit_google_genai.dart';
import 'package:schemantic/schemantic.dart';
// part 'character_profile.g.dart'; // Generated by build_runner
@Schema()
abstract class $CharacterProfile {
  String get name;
  String get bio;
  int get age;
}
final response = await ai.generate(
  model: googleAI.gemini('gemini-2.5-flash'),
  outputSchema: CharacterProfile.$schema,
  prompt: 'Generate a profile for a fictional character',
);
final profile = CharacterProfile.fromJson(response.output!);
print('${profile.name} (${profile.age}): ${profile.bio}');

Schema Limitations
Section titled “Schema Limitations”The Gemini API has specific limitations for JSON schemas:
- Unions: oneOf allows only object targets. Primitive unions (e.g., String | int) are not supported.
- Validation: Regex patterns, min/max length, and other validation keywords are often ignored or may cause errors.
Thinking and Reasoning
Section titled “Thinking and Reasoning”Gemini 2.5 and newer models support “Thinking” to improve reasoning for complex tasks.
Thinking Budget (Gemini 2.5):
final response = await ai.generate(
  model: googleAI.gemini('gemini-2.5-pro'),
  prompt: 'what is heavier, one kilo of steel or one kilo of feathers',
  config: GeminiOptions(
    thinkingConfig: ThinkingConfig(
      thinkingBudget: 2048,
      includeThoughts: true,
    ),
  ),
);

Multimodal Input
Section titled “Multimodal Input”Gemini models can accept various media types as input.
Video:
final response = await ai.generate(
  model: googleAI.gemini('gemini-2.5-flash'),
  prompt: 'What happens in this video?',
  messages: [
    Message(
      role: Role.user,
      content: [
        MediaPart(
          media: Media(
            url: 'https://download.samplelib.com/mp4/sample-5s.mp4',
            contentType: 'video/mp4',
          ),
        ),
      ],
    ),
  ],
);

Audio:
final response = await ai.generate(
  model: googleAI.gemini('gemini-2.5-flash'),
  prompt: 'Transcribe this audio',
  messages: [
    Message(
      role: Role.user,
      content: [
        MediaPart(
          media: Media(
            url: 'https://www2.cs.uic.edu/~i101/SoundFiles/BabyElephantWalk60.wav',
            contentType: 'audio/wav',
          ),
        ),
      ],
    ),
  ],
);

Safety Settings
Section titled “Safety Settings”Configure content filtering for different harm categories:
import 'package:genkit_google_genai/genkit_google_genai.dart';
final response = await ai.generate(
  model: googleAI.gemini('gemini-2.5-flash'),
  prompt: 'Your prompt here',
  config: GeminiOptions(
    safetySettings: [
      SafetySettings(
        category: 'HARM_CATEGORY_HATE_SPEECH',
        threshold: 'BLOCK_MEDIUM_AND_ABOVE',
      ),
    ],
  ),
);

Google Search Grounding
Section titled “Google Search Grounding”Enable Google Search to provide answers with current information.
final response = await ai.generate(
  model: googleAI.gemini('gemini-2.5-flash'),
  prompt: 'What are the top tech news stories this week?',
  config: GeminiOptions(
    googleSearch: GoogleSearch(),
  ),
);

Code Execution
Section titled “Code Execution”Enable the model to write and execute Python code for calculations.
final response = await ai.generate(
  model: googleAI.gemini('gemini-2.5-pro'),
  prompt: 'Calculate the 20th Fibonacci number',
  config: GeminiOptions(
    codeExecution: true,
  ),
);

Embedding Models
Section titled “Embedding Models”final embeddings = await ai.embedMany( embedder: googleAI.textEmbedding('text-embedding-004'), documents: [ DocumentData(content: [TextPart(text: 'Hello world')]), ],);
print(embeddings[0].embedding);

Image Models
Section titled “Image Models”final response = await ai.generate( model: googleAI.gemini('gemini-2.5-flash-image'), prompt: 'A banana riding a bike',);
print(response.media);

Speech Models
Section titled “Speech Models”The Google GenAI plugin supports Gemini text-to-speech models, including multi-speaker support.
import 'dart:convert';
import 'dart:io';
import 'package:genkit/genkit.dart';
import 'package:genkit_google_genai/genkit_google_genai.dart';
final response = await ai.generate(
  model: googleAI.gemini('gemini-2.5-flash-preview-tts'),
  prompt: 'Say that Genkit is an amazing AI framework',
  config: GeminiTtsOptions(
    responseModalities: ['AUDIO'],
    speechConfig: SpeechConfig(
      voiceConfig: VoiceConfig(
        prebuiltVoiceConfig: PrebuiltVoiceConfig(voiceName: 'Puck'),
      ),
    ),
  ),
);
if (response.media != null) {
  // Save the audio file
  final dataUrl = response.media!.url;
  final base64Data = dataUrl.split(',')[1];
  final bytes = base64Decode(base64Data);
  await File('output.pcm').writeAsBytes(bytes);
}

Unsupported Features
Section titled “Unsupported Features”The following features documented in other languages are not yet fully supported in the Dart SDK:
- Context Caching: Automatic context caching is not explicitly exposed/documented for Dart yet.
- Google Maps Grounding: Not yet exposed in options.
- Files API: No helper methods for uploading files (use direct HTTP or Google Cloud libs).