Google Generative AI plugin

The Google AI plugin provides a unified interface to connect with Google’s generative AI models through the Gemini Developer API using API key authentication. The @genkit-ai/google-genai package is a drop-in replacement for the previous @genkit-ai/googleai package.

The plugin supports a wide range of capabilities:

Language Models: Gemini models for text generation, reasoning, and multimodal tasks
Embedding Models: Text and multimodal embeddings
Image Models: Imagen for generation and Gemini for image analysis
Video Models: Veo for video generation and Gemini for video understanding
Speech Models: Multilingual text-to-speech generation

Setup

Installation

npm i --save @genkit-ai/google-genai

Configuration

import { genkit } from 'genkit';

import { googleAI } from '@genkit-ai/google-genai';

const ai = genkit({
  plugins: [
    googleAI(),
    // Or with an explicit API key:
    // googleAI({ apiKey: 'your-api-key' }),
  ],
});

Authentication

The plugin requires a Gemini API key, which you can get from Google AI Studio. You can provide this key in several ways:

Environment variables: Set GEMINI_API_KEY
Plugin configuration: Pass apiKey when initializing the plugin (shown above)
Per-request: Override the API key for specific requests in the config:

const response = await ai.generate({
  model: googleAI.model('gemini-flash-latest'),
  prompt: 'Your prompt here',
  config: {
    apiKey: 'different-api-key', // Use a different API key for this request
  },
});

This per-request API key option is useful for routing specific requests to different API keys, such as for multi-tenant applications or cost tracking.
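As a rough sketch of the multi-tenant case, the helper below picks a key per tenant and builds the config object for a request. The tenant map, the configForTenant name, and the key values are all hypothetical; in practice the keys would come from a secret store rather than source code.

```typescript
// Hypothetical tenant-to-key lookup; real keys belong in a secret manager.
const tenantApiKeys = new Map<string, string>([
  ['acme', 'acme-api-key'],
  ['globex', 'globex-api-key'],
]);

// Build the per-request config. Tenants without a dedicated key get an
// empty config, so no per-request override is sent for them.
function configForTenant(tenantId: string): { apiKey?: string } {
  const apiKey = tenantApiKeys.get(tenantId);
  return apiKey ? { apiKey } : {};
}

console.log(configForTenant('acme')); // { apiKey: 'acme-api-key' }
console.log(configForTenant('unknown')); // {}
```

The returned object can be passed as the config of an ai.generate call like the one shown above.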

Language Models

You can create models that call the Google Generative AI API. The models support tool calls, and some have multimodal capabilities.

Available Models

Gemini 3+ Series - Latest experimental models with state-of-the-art reasoning:

gemini-3.5-flash - Fast and efficient for most use cases
gemini-3.1-pro-preview - Preview of the most capable model for complex tasks
gemini-3.1-pro-preview-customtools - Model tuned for best custom tools support
gemini-3.1-flash-image-preview - Fast and efficient image generation
gemini-3.1-flash-lite - Lightweight version for simple tasks
gemini-3.1-flash-lite-preview - Preview of the lightweight model
gemini-3-flash-preview - Fast and intelligent model for high-volume tasks
gemini-3-pro-image-preview - Supports image generation outputs

Gemini 2.5 Series - Latest stable models with advanced reasoning and multimodal capabilities:

gemini-2.5-pro - Most capable stable model for complex tasks
gemini-2.5-flash - Fast and efficient for most use cases
gemini-2.5-flash-lite - Lightweight version for simple tasks
gemini-2.5-flash-image - Supports image generation outputs

Gemma 4 Series - Open models for various use cases:

gemma-4-31b-it - Large instruction-tuned model
gemma-4-26b-a4b-it - Efficient 4-bit instruction-tuned model

Latest Aliases - Auto-updating aliases that point to the most recent versions:

gemini-pro-latest - Points to the latest Gemini Pro model
gemini-flash-latest - Points to the latest Gemini Flash model
gemini-flash-lite-latest - Points to the latest Gemini Flash Lite model

Basic Usage

import { genkit } from 'genkit';
import { googleAI } from '@genkit-ai/google-genai';

const ai = genkit({
  plugins: [googleAI()],
});

const response = await ai.generate({
  model: googleAI.model('gemini-flash-latest'),
  prompt: 'Explain how neural networks learn in simple terms.',
});

console.log(response.text);

Model Configuration

You can provide configuration options to tailor the model’s behavior, such as specifying the serviceTier.

const response = await ai.generate({
  model: googleAI.model('gemini-flash-lite-latest'),
  prompt: 'Explain how neural networks learn in simple terms.',
  config: {
    serviceTier: 'flex', // Can be 'standard', 'flex', or 'priority'
  },
});

Structured Output

import { z } from 'genkit';

const response = await ai.generate({
  model: googleAI.model('gemini-flash-latest'),
  output: {
    schema: z.object({
      name: z.string(),
      bio: z.string(),
      age: z.number(),
    }),
  },
  prompt: 'Generate a profile for a fictional character',
});

console.log(response.output);

Schema Limitations

The Gemini API relies on a specific subset of the OpenAPI 3.0 standard. When defining Zod schemas for structured output, keep the following limitations in mind:

Supported Features

Objects & Arrays: Standard object properties and array items.
Enums: Fully supported (z.enum).
Nullable: Supported via z.nullable() (mapped to nullable: true).

Critical Limitations

Unions (z.union): Complex unions are often problematic. The API has specific handling for anyOf but may reject ambiguous or complex oneOf structures. Prefer using a single object with optional fields or distinct tool definitions over complex unions.
Validation Keywords: Keywords like pattern, minLength, maxLength, minItems, and maxItems are not supported by the Gemini API’s constrained decoding. Including them may cause 400 InvalidArgument errors, or they may be silently ignored.
Recursion: Recursive schemas are generally not supported.
Complexity: Deeply nested schemas or schemas with hundreds of properties may trigger complexity limits.

Best Practices

Keep schemas simple and flat where possible.
Use property descriptions (.describe()) to guide the model instead of complex validation rules (e.g., “String must be an email” instead of a regex pattern).
If you need strict validation (e.g., regex), perform it in your application code after receiving the structured response.
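A minimal sketch of that last point: keep the schema free of validation keywords and check the values yourself once the structured output arrives. The Profile shape, the validateProfile name, and the deliberately simple email regex are illustrative assumptions, not part of the plugin.

```typescript
// Shape we expect back from the model (no pattern/minLength in the schema).
interface Profile {
  name: string;
  email: string;
}

// Simple illustrative check; not a full RFC 5322 validator.
const EMAIL_PATTERN = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;

// Returns a list of problems; an empty list means the output passed.
function validateProfile(profile: Profile): string[] {
  const problems: string[] = [];
  if (profile.name.trim().length === 0) {
    problems.push('name must not be empty');
  }
  if (!EMAIL_PATTERN.test(profile.email)) {
    problems.push('email is not a valid address');
  }
  return problems;
}

console.log(validateProfile({ name: 'Ada', email: 'ada@example.com' })); // []
console.log(validateProfile({ name: 'Ada', email: 'not-an-email' }));
// [ 'email is not a valid address' ]
```

If the list is non-empty, you might retry the generation or surface the problems to the caller.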

Thinking and Reasoning

Gemini 2.5 and newer models (as well as Gemma 4) use an internal thinking process that improves reasoning for complex tasks.

Thinking Level (Gemini 3.0+ and Gemma 4):

const response = await ai.generate({
  model: googleAI.model('gemini-3.1-pro-preview'),
  prompt: 'what is heavier, one kilo of steel or one kilo of feathers',
  config: {
    thinkingConfig: {
      thinkingLevel: 'HIGH', // Or 'MINIMAL', 'LOW', or 'MEDIUM'
      includeThoughts: true, // Include thought summaries
    },
  },
});

Thinking Budget (Gemini 2.5):

const response = await ai.generate({
  model: googleAI.model('gemini-2.5-pro'),
  prompt: 'what is heavier, one kilo of steel or one kilo of feathers',
  config: {
    thinkingConfig: {
      thinkingBudget: 8192, // Number of thinking tokens
      includeThoughts: true, // Include thought summaries
    },
  },
});

if (response.reasoning) {
  console.log('Reasoning:', response.reasoning);
}
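Because the two model generations take different options, a small helper can keep call sites uniform. The sketch below picks thinkingLevel or thinkingBudget from the model name, following the split described above. The thinkingConfigFor name, the name patterns, and the default values are illustrative assumptions. Aliases such as gemini-pro-latest are deliberately left unresolved, since the name alone does not say which generation they point to.

```typescript
type ThinkingConfig =
  | { thinkingLevel: 'MINIMAL' | 'LOW' | 'MEDIUM' | 'HIGH' }
  | { thinkingBudget: number };

// Choose the thinking option that matches the model generation.
// Returns undefined when the generation cannot be inferred from the name.
function thinkingConfigFor(model: string): ThinkingConfig | undefined {
  if (/^gemini-2\.5-/.test(model)) {
    return { thinkingBudget: 8192 }; // token budget (Gemini 2.5)
  }
  if (/^gemini-3(\.\d+)?-/.test(model) || /^gemma-4-/.test(model)) {
    return { thinkingLevel: 'HIGH' }; // level (Gemini 3.0+ and Gemma 4)
  }
  return undefined;
}

console.log(thinkingConfigFor('gemini-2.5-pro')); // { thinkingBudget: 8192 }
console.log(thinkingConfigFor('gemini-3.1-pro-preview')); // { thinkingLevel: 'HIGH' }
console.log(thinkingConfigFor('gemini-pro-latest')); // undefined
```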

Context Caching

Gemini 2.5 and newer models automatically cache common content prefixes (minimum 1024 tokens for Flash models, 2048 for Pro models), providing a 75% discount on cached tokens.

// Structure prompts with consistent content at the beginning
const baseContext = `You are a helpful cook... (large context) ...`.repeat(50);

// First request - content will be cached
await ai.generate({
  model: googleAI.model('gemini-pro-latest'),
  prompt: `${baseContext}\n\nTask 1...`,
});

// Second request with same prefix - eligible for cache hit
await ai.generate({
  model: googleAI.model('gemini-pro-latest'),
  prompt: `${baseContext}\n\nTask 2...`,
});
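To get a feel for the discount, here is a back-of-the-envelope calculation. It assumes the 75% discount applies uniformly to every cached input token and ignores output tokens; the function name and the token counts are made up for illustration.

```typescript
const CACHED_TOKEN_DISCOUNT = 0.75; // the 75% figure quoted above

// Input tokens you effectively pay for, expressed in full-price tokens.
function effectiveInputTokens(totalTokens: number, cachedTokens: number): number {
  if (cachedTokens < 0 || cachedTokens > totalTokens) {
    throw new RangeError('cachedTokens must be between 0 and totalTokens');
  }
  const uncached = totalTokens - cachedTokens;
  return uncached + cachedTokens * (1 - CACHED_TOKEN_DISCOUNT);
}

// A 10000-token prompt where an 8000-token prefix is served from cache:
// 2000 full-price tokens + 8000 * 0.25 = 4000.
console.log(effectiveInputTokens(10000, 8000)); // 4000
```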

Safety Settings

You can configure safety settings to control content filtering for different harm categories:

const response = await ai.generate({
  model: googleAI.model('gemini-flash-latest'),
  prompt: 'Your prompt here',
  config: {
    safetySettings: [
      {
        category: 'HARM_CATEGORY_HATE_SPEECH',
        threshold: 'BLOCK_MEDIUM_AND_ABOVE',
      },
      {
        category: 'HARM_CATEGORY_DANGEROUS_CONTENT',
        threshold: 'BLOCK_MEDIUM_AND_ABOVE',
      },
    ],
  },
});

Available harm categories:

HARM_CATEGORY_HATE_SPEECH
HARM_CATEGORY_DANGEROUS_CONTENT
HARM_CATEGORY_HARASSMENT
HARM_CATEGORY_SEXUALLY_EXPLICIT

Available thresholds:

BLOCK_LOW_AND_ABOVE
BLOCK_MEDIUM_AND_ABOVE
BLOCK_ONLY_HIGH
BLOCK_NONE

Accessing Safety Ratings:

Safety ratings are typically only included when content is flagged. You can access them from the response custom metadata:

const geminiResponse = response.custom as any;
const candidateSafetyRatings = geminiResponse?.candidates?.[0]?.safetyRatings;
const promptSafetyRatings = geminiResponse?.promptFeedback?.safetyRatings;
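The snippet below sketches one way to summarize those ratings. The SafetyRating shape (category, probability, optional blocked) is an assumption about the raw payload, so check it against what you actually receive; flaggedCategories is a hypothetical helper name.

```typescript
// Assumed rating shape; verify against the payload you actually receive.
interface SafetyRating {
  category: string;
  probability: string; // e.g. 'NEGLIGIBLE', 'LOW', 'MEDIUM', 'HIGH'
  blocked?: boolean;
}

interface GeminiCustom {
  candidates?: { safetyRatings?: SafetyRating[] }[];
  promptFeedback?: { safetyRatings?: SafetyRating[] };
}

// Collect categories that were blocked or rated MEDIUM/HIGH, looking at
// both the prompt feedback and the first candidate.
function flaggedCategories(custom: GeminiCustom | undefined): string[] {
  const ratings = [
    ...(custom?.promptFeedback?.safetyRatings ?? []),
    ...(custom?.candidates?.[0]?.safetyRatings ?? []),
  ];
  const flagged = ratings
    .filter((r) => r.blocked || r.probability === 'MEDIUM' || r.probability === 'HIGH')
    .map((r) => r.category);
  return Array.from(new Set(flagged)); // de-duplicate
}

const sample: GeminiCustom = {
  candidates: [
    {
      safetyRatings: [
        { category: 'HARM_CATEGORY_HATE_SPEECH', probability: 'NEGLIGIBLE' },
        { category: 'HARM_CATEGORY_DANGEROUS_CONTENT', probability: 'HIGH', blocked: true },
      ],
    },
  ],
};
console.log(flaggedCategories(sample)); // [ 'HARM_CATEGORY_DANGEROUS_CONTENT' ]
```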

Deep Research

Deep Research models can perform extensive research tasks over multiple turns, using specialized workflows.

Available Models:

deep-research-pro-preview-12-2025
deep-research-preview-04-2026
deep-research-max-preview-04-2026

Usage:

let { operation } = await ai.generate({
  model: googleAI.model('deep-research-preview-04-2026'),
  prompt: 'Analyze global semiconductor market trends. Include graphics showing market share changes.',
  config: {
    visualization: 'AUTO',
  },
});

if (!operation) throw new Error('No operation returned');

// Deep research operations are long-running and need to be polled
while (!operation.done) {
  await new Promise((resolve) => setTimeout(resolve, 30000)); // Wait 30 seconds between checks
  operation = await ai.checkOperation(operation);
}

if (operation.error) {
  throw new Error('Deep research failed: ' + operation.error.message);
}

console.log(operation.output?.message?.content);

You can also use previousInteractionId for multi-turn research, set collaborativePlanning: true to get a research plan first, or use ai.cancelOperation(operation) to halt an ongoing research task.

Google Search Grounding

Enable Google Search to provide answers with current information and verifiable sources.

const response = await ai.generate({
  model: googleAI.model('gemini-flash-latest'),
  prompt: 'What are the top tech news stories this week?',
  config: {
    googleSearchRetrieval: true,
  },
});

// Access grounding metadata
const groundingMetadata = (response.custom as any)?.candidates?.[0]
  ?.groundingMetadata;
if (groundingMetadata) {
  console.log('Sources:', groundingMetadata.groundingChunks);
}

The following configuration options are available for Google Search grounding:

googleSearchRetrieval object | boolean

Enables Google Search grounding. Can be a boolean (true) or a configuration object. Example: { dynamicRetrievalConfig: { mode: 'MODE_DYNAMIC', dynamicThreshold: 0.7 } }
- dynamicRetrievalConfig object
  - mode string The retrieval mode (e.g., 'MODE_DYNAMIC').
  - dynamicThreshold number The threshold for dynamic retrieval (e.g., 0.7).

Response Metadata:

webSearchQueries string[]

Array of search queries used to retrieve information. Example: ["What's the weather in Chicago this weekend?"]

searchEntryPoint object

Contains the main search result content formatted for display.
- renderedContent string The HTML content of the search result.

groundingSupports object[]

Links specific response segments to supporting search result chunks.
- segment object
  - text string The text of the segment.
- groundingChunkIndices number[] Indices of the chunks that support this segment.
- confidenceScores number[] Confidence scores for each supporting chunk.
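As an illustration of how these fields fit together, the sketch below pairs each supported segment with the sources behind it. The field names follow the metadata listed above, but the web.uri entry on each grounding chunk is an assumption, and listCitations is a hypothetical helper; inspect your own groundingChunks before relying on it.

```typescript
// Assumed shapes: the fields listed above plus a `web` entry on each chunk.
interface GroundingChunk {
  web?: { uri?: string; title?: string };
}
interface GroundingSupport {
  segment?: { text?: string };
  groundingChunkIndices?: number[];
  confidenceScores?: number[];
}
interface GroundingMetadata {
  groundingChunks?: GroundingChunk[];
  groundingSupports?: GroundingSupport[];
}

// Pair each supported text segment with the URIs of the chunks backing it.
function listCitations(meta: GroundingMetadata): { text: string; sources: string[] }[] {
  const chunks = meta.groundingChunks ?? [];
  return (meta.groundingSupports ?? []).map((support) => ({
    text: support.segment?.text ?? '',
    sources: (support.groundingChunkIndices ?? [])
      .map((i) => chunks[i]?.web?.uri)
      .filter((uri): uri is string => typeof uri === 'string'),
  }));
}

const citations = listCitations({
  groundingChunks: [
    { web: { uri: 'https://example.com/a' } },
    { web: { uri: 'https://example.com/b' } },
  ],
  groundingSupports: [{ segment: { text: 'Claim one.' }, groundingChunkIndices: [1] }],
});
console.log(citations);
// [ { text: 'Claim one.', sources: [ 'https://example.com/b' ] } ]
```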

File Search Grounding

Ground the model’s responses using documents stored in Google’s File Search API.

const response = await ai.generate({
  model: googleAI.model('gemini-flash-latest'),
  prompt: "What is the character's name in the story?",
  config: {
    fileSearch: {
      fileSearchStoreNames: ['fileSearchStores/my-store-123'],
      metadataFilter: 'author=foo',
    },
  },
});

Google Maps Grounding

Enable Google Maps to provide location-aware responses.

const response = await ai.generate({
  model: googleAI.model('gemini-flash-latest'),
  prompt: 'Find coffee shops near the CN Tower',
  config: {
    tools: [{ googleMaps: {} }],
  },
});

You can also request a widget token to render an interactive map:

const response = await ai.generate({
  model: googleAI.model('gemini-flash-latest'),
  prompt: 'Show me a map of San Francisco',
  config: {
    tools: [{ googleMaps: { enableWidget: true } }],
  },
});

The following configuration options are available for Google Maps grounding:

googleMaps object

Enables Google Maps grounding. Example: { enableWidget: true }
- enableWidget boolean Whether to include a widget token in the response.

retrievalConfig object

Additional configuration for provider tools. Can improve relevance by providing location context for Google Maps. Example: { retrievalConfig: { latLng: { latitude: 37.7749, longitude: -122.4194 } } }
- retrievalConfig object
  - latLng object
    - latitude number The latitude in degrees.
    - longitude number The longitude in degrees.

const response = await ai.generate({
  model: googleAI.model('gemini-flash-latest'),
  prompt: 'Describe some sights near me',
  config: {
    tools: [{ googleMaps: {} }],
    retrievalConfig: {
      latLng: {
        latitude: 43.0896,
        longitude: -79.0849,
      },
    },
  },
});

URL Context

Provide specific URLs for the model to analyze:

const response = await ai.generate({
  model: googleAI.model('gemini-flash-latest'),
  prompt: 'Summarize this page: https://example.com/article',
  config: {
    tools: [{ urlContext: {} }],
  },
});

When using urlContext, the model will fetch content from URLs found in your prompt.

Combine built-in tools and Genkit tools

You can combine Gemini built-in tools and Genkit tools defined using ai.defineTool. Built-in tools are specified in the tools property of the config object, while Genkit tools are provided in the top-level tools property.

// const getWeather = ai.defineTool(...);

const response = await ai.generate({
  model: googleAI.model('gemini-3-flash-preview'),
  prompt:
    'What is the southernmost city in Canada? What is the weather like there today?',
  config: {
    tools: [{ googleSearch: {} }], // Built-in tools are defined in the config
    toolConfig: {
      includeServerSideToolInvocations: true,
    },
  },
  tools: [getWeather], // Genkit tools are defined in top-level tools
});

Code Execution

Enable the model to write and execute Python code for calculations and logic.

const response = await ai.generate({
  model: googleAI.model('gemini-pro-latest'),
  prompt: 'Calculate the 20th Fibonacci number',
  config: {
    codeExecution: true,
  },
});

The following configuration options are available for code execution:

codeExecution boolean

Enables code execution for reasoning and calculations. Example: true

Generating Text and Images (Nano Banana)

Some Gemini models (like gemini-3.1-flash-image-preview, gemini-3-pro-image-preview, gemini-2.5-flash-image) can output images natively alongside text:

const response = await ai.generate({
  model: googleAI.model('gemini-3.1-flash-image-preview'),
  prompt: 'Create a picture of a futuristic city and describe it',
  config: {
    responseModalities: ['IMAGE', 'TEXT'],
  },
});

// Extract image
if (response.media) {
  console.log('Image:', response.media);
}

// Extract text
if (response.text) {
  console.log('Text:', response.text);
}

// Extract all messages including text and images
if (response.messages) {
  console.log('Messages:', response.messages);
}

The following configuration options are available for Gemini image generation:

responseModalities string[]

Specifies the output modalities. Options: ['TEXT', 'IMAGE'], ['IMAGE']. Default: ['TEXT', 'IMAGE'].

imageConfig object
- aspectRatio string

  Aspect ratio of the generated images. Not all models support all aspect ratios. Options: '1:1', '1:4', '1:8', '2:3', '3:2', '3:4', '4:1', '4:3', '4:5', '5:4', '8:1', '9:16', '16:9', '21:9'. Default: '1:1'.
- imageSize string

  Resolution of the generated image. Supported by Gemini 3+ image models only. Options: '1K', '2K', '4K'. Default: '1K'.

Multimodal Input Capabilities

Video Understanding

Gemini models can process videos to describe content, answer questions, and refer to timestamps (in MM:SS format).

const response = await ai.generate({
  model: googleAI.model('gemini-flash-latest'),
  prompt: [
    { text: 'What happens at 00:05?' },
    {
      media: {
        contentType: 'video/mp4',
        url: 'https://youtube.com/watch?v=...',
      },
    },
  ],
});

Video Processing Details:

Sampling: 1 frame per second (default)
Context: Models with a 2M-token context window can handle up to 2 hours of video.
Inputs: Up to 10 videos per request (Gemini 2.5+).
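Because timestamps are exchanged as MM:SS strings, it can help to convert them to seconds and back (at the default rate of 1 frame per second, the number of seconds also approximates the sampled frame index). The two helpers below are hypothetical utilities, not plugin APIs.

```typescript
// Convert an 'MM:SS' timestamp into a number of seconds.
function parseTimestamp(timestamp: string): number {
  const match = /^(\d+):([0-5]\d)$/.exec(timestamp);
  if (!match) {
    throw new Error(`Expected MM:SS, got "${timestamp}"`);
  }
  return Number(match[1]) * 60 + Number(match[2]);
}

// Format a non-negative number of seconds as 'MM:SS' for use in a prompt.
function formatTimestamp(totalSeconds: number): string {
  const minutes = Math.floor(totalSeconds / 60);
  const seconds = Math.floor(totalSeconds % 60);
  return `${String(minutes).padStart(2, '0')}:${String(seconds).padStart(2, '0')}`;
}

console.log(parseTimestamp('01:30')); // 90
console.log(formatTimestamp(5)); // 00:05
```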

Image Understanding

Gemini models can reason about images passed as inline data or URLs.

const response = await ai.generate({
  model: googleAI.model('gemini-flash-latest'),
  prompt: [
    { text: 'Describe what is in this image' },
    { media: { url: 'https://example.com/image.jpg' } },
  ],
});

Audio Understanding

Gemini models can process audio files to transcribe speech, answer questions about the audio content, or summarize recordings.

const response = await ai.generate({
  model: googleAI.model('gemini-flash-latest'),
  prompt: [
    { text: 'Transcribe this audio clip' },
    {
      media: { contentType: 'audio/mp3', url: 'https://example.com/audio.mp3' },
    },
  ],
});

PDF Support

Gemini models can process PDF documents to extract information, summarize content, or answer questions based on the visual layout and text.

const response = await ai.generate({
  model: googleAI.model('gemini-flash-latest'),
  prompt: [
    { text: 'Summarize this document' },
    {
      media: {
        contentType: 'application/pdf',
        url: 'https://example.com/doc.pdf',
      },
    },
  ],
});

File Inputs and Gemini Files API

Gemini models support various file types. For small files, you can use inline data. For larger files (up to 2GB), use the Gemini Files API.
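For the inline route, one common approach is to embed the bytes as a base64 data URL in the url field of a media part, as the image-to-video example later in this document does. The sketch below assumes Node.js; toDataUrl and the 20 MB cutoff are illustrative assumptions, not documented plugin limits.

```typescript
import { Buffer } from 'node:buffer';

// Assumed cutoff for inlining; pick a value that fits your request limits.
const MAX_INLINE_BYTES = 20 * 1024 * 1024;

// Encode raw bytes as a data URL suitable for a media part's `url`.
function toDataUrl(bytes: Uint8Array, contentType: string): string {
  if (bytes.byteLength > MAX_INLINE_BYTES) {
    throw new Error('File is too large to inline; upload it via the Files API');
  }
  const base64 = Buffer.from(bytes).toString('base64');
  return `data:${contentType};base64,${base64}`;
}

console.log(toDataUrl(Buffer.from('hi'), 'text/plain'));
// data:text/plain;base64,aGk=
```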

Using Files API:

To use large files, you must upload them using the Google GenAI SDK or other supported methods. Genkit does not provide file management helpers, but you can pass the file URI to Genkit for generation:

import { GoogleGenAI } from '@google/genai';
const genaiClient = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

// Upload file
const uploadedFile = await genaiClient.files.upload({
  file: 'path/to/video.mp4',
  config: { mimeType: 'video/mp4' },
});

// Use in generation
const response = await ai.generate({
  model: googleAI.model('gemini-flash-latest'),
  prompt: [
    { text: 'Describe this video' },
    {
      media: {
        contentType: uploadedFile.mimeType,
        url: uploadedFile.uri,
      },
    },
  ],
});

Embedding Models

Available Models

gemini-embedding-2-preview - Latest embedding model with 3072 dimensions; supports multimodal input (text, images, video).
gemini-embedding-2 - Latest stable embedding model with 3072 dimensions; supports multimodal input.
gemini-embedding-001 - Default 3072 dimensions; set outputDimensionality in embed params (for example 768, 1536, or 3072) when you want a shorter vector.

Usage

const embeddings = await ai.embed({
  embedder: googleAI.embedder('gemini-embedding-001'),
  content: 'Machine learning models process data to make predictions.',
});

console.log(embeddings);

// Optional: request a shorter embedding (size your vector index to match the new dimensionality)
const compact = await ai.embed({
  embedder: googleAI.embedder('gemini-embedding-001'),
  content: 'Machine learning models process data to make predictions.',
  options: { outputDimensionality: 768 },
});
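Embeddings are usually compared with cosine similarity, which only makes sense when both vectors have the same dimensionality (one reason to keep outputDimensionality consistent across an index). The helper below is a generic sketch that operates on plain number arrays; it is not part of the plugin, and how you pull the vector out of the ai.embed result depends on your Genkit version.

```typescript
// Cosine similarity of two equal-length vectors, in the range [-1, 1].
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error('Vectors must have the same dimensionality');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // avoid dividing by zero
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

console.log(cosineSimilarity([1, 0], [0, 1])); // 0
console.log(cosineSimilarity([3, 4], [3, 4])); // 1
```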

Image Models

Available Models

Imagen 4 Series - Latest generation with improved quality:

imagen-4.0-generate-001 - Standard quality
imagen-4.0-ultra-generate-001 - Ultra-high quality
imagen-4.0-fast-generate-001 - Fast generation

Usage

const response = await ai.generate({
  model: googleAI.model('imagen-4.0-generate-001'),
  prompt: 'A serene Japanese garden with cherry blossoms and a koi pond.',
  config: {
    numberOfImages: 4,
    aspectRatio: '16:9',
    personGeneration: 'allow_adult',
  },
});

const generatedImage = response.media;

Configuration Options:

numberOfImages number

Number of images to generate (1 to 4). Default: 1

aspectRatio string

Aspect ratio of the generated images. Options: '1:1', '3:4', '4:3', '9:16', '16:9' Default: '1:1'

personGeneration string

Policy for generating people. Options: 'dont_allow', 'allow_adult', 'allow_all'

Video Models

The Google AI plugin provides access to video generation capabilities through the Veo models. These models can generate videos from text prompts or animate existing images to create dynamic video content.

Available Models

Veo 3.1 Series - Latest generation with native audio and high fidelity:

veo-3.1-generate-preview - High-quality video and audio generation
veo-3.1-fast-generate-preview - Fast generation with high quality
veo-3.1-lite-generate-preview - Lightweight, fast video generation

Veo 3.0 Series:

veo-3.0-generate-001
veo-3.0-fast-generate-001

Veo 2.0 Series:

veo-2.0-generate-001

Usage

Text-to-Video

To generate a video from a text prompt using the Veo model:

import { googleAI } from '@genkit-ai/google-genai';
import * as fs from 'fs';
import { Readable } from 'stream';
import { genkit, MediaPart } from 'genkit';

const ai = genkit({
  plugins: [googleAI()],
});

ai.defineFlow('text-to-video-veo', async () => {
  let { operation } = await ai.generate({
    model: googleAI.model('veo-3.0-fast-generate-001'),
    prompt: 'A majestic dragon soaring over a mystical forest at dawn.',
    config: {
      aspectRatio: '16:9',
    },
  });

  if (!operation) {
    throw new Error('Expected the model to return an operation');
  }

  // Wait until the operation completes.
  while (!operation.done) {
    operation = await ai.checkOperation(operation);
    // Sleep for 5 seconds before checking again.
    await new Promise((resolve) => setTimeout(resolve, 5000));
  }

  if (operation.error) {
    throw new Error('failed to generate video: ' + operation.error.message);
  }

  const video = operation.output?.message?.content.find((p) => !!p.media);
  if (!video) {
    throw new Error('Failed to find the generated video');
  }
  await downloadVideo(video, 'output.mp4');
});

async function downloadVideo(video: MediaPart, path: string) {
  const fetch = (await import('node-fetch')).default;
  // Add API key before fetching the video.
  const videoDownloadResponse = await fetch(
    `${video.media!.url}&key=${process.env.GEMINI_API_KEY}`,
  );
  if (
    !videoDownloadResponse ||
    videoDownloadResponse.status !== 200 ||
    !videoDownloadResponse.body
  ) {
    throw new Error('Failed to fetch video');
  }

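  // Wait for the file to be fully written before returning.
  const fileStream = Readable.from(videoDownloadResponse.body).pipe(
    fs.createWriteStream(path),
  );
  await new Promise<void>((resolve, reject) => {
    fileStream.on('finish', () => resolve());
    fileStream.on('error', reject);
  });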
}

Video Generation from Photo Reference

To use a photo as reference for the video using the Veo model (e.g. to make a static photo move), you can provide an image as part of the prompt.

const startingImage = fs.readFileSync('photo.jpg', { encoding: 'base64' });

let { operation } = await ai.generate({
  model: googleAI.model('veo-2.0-generate-001'),
  prompt: [
    {
      text: 'make the subject in the photo move',
    },
    {
      media: {
        contentType: 'image/jpeg',
        url: `data:image/jpeg;base64,${startingImage}`,
      },
    },
  ],
  config: {
    durationSeconds: 5,
    aspectRatio: '9:16',
    personGeneration: 'allow_adult',
  },
});

Video Extension

You can extend an existing Veo-generated video by providing it as input to another generation request:

let { operation } = await ai.generate({
  model: googleAI.model('veo-3.1-generate-preview'),
  prompt: [
    { text: 'Track the butterfly into the garden as it lands on a flower.' },
    {
      media: {
        contentType: 'video/mp4',
        url: previousVeoVideo.media.url,
      },
    },
  ],
  config: {
    aspectRatio: '16:9', // Must match the original video
  },
});

The Veo models support various configuration options:

negativePrompt string

Text that describes anything you want to discourage the model from generating.

aspectRatio string

Changes the aspect ratio of the generated video.
- "16:9"
- "9:16"

personGeneration string

Allow the model to generate videos of people.
- Text-to-video generation:
  - "allow_all": Generate videos that include adults and children. Currently the only available value for Veo 3.
  - "dont_allow" (Veo 2 only): Don’t allow people or faces.
  - "allow_adult" (Veo 2 only): Generate videos with adults, but not children.
- Image-to-video generation (Veo 2 only):
  - "dont_allow": Don’t allow people or faces.
  - "allow_adult": Generate videos with adults, but not children.

durationSeconds number

Length of each output video in seconds (5 to 8). Not configurable for Veo 3.1/3.0 (defaults to 8 seconds).

resolution string (Veo 3.1 only)

Resolution of the generated video.
- "720p" (default)
- "1080p" (Available for 16:9 aspect ratio)
- "4k" (Veo 3.1 only)

seed number (Veo 3.1/3.0 only)

Sets the random seed for generation. Doesn’t guarantee determinism but improves consistency.

referenceImages object[] (Veo 3.1 only)

Provides up to 3 reference images to guide the video’s content or style.

enhancePrompt boolean (Veo 2 only)

Enable or disable the prompt rewriter. Enabled by default. For Veo 3.1/3.0, the prompt enhancer is always on.
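Since several options apply only to particular Veo generations, a pre-flight check can catch conflicts before a long-running request is started. The sketch below encodes only the constraints listed above; checkVeoConfig and veoFamily are hypothetical helpers, and the actual API may enforce additional rules.

```typescript
type VeoFamily = '2.0' | '3.0' | '3.1';

interface VeoConfig {
  aspectRatio?: '16:9' | '9:16';
  durationSeconds?: number;
  resolution?: '720p' | '1080p' | '4k';
  seed?: number;
  enhancePrompt?: boolean;
}

// Derive the family from a model name such as 'veo-3.1-generate-preview'.
function veoFamily(model: string): VeoFamily {
  const match = /^veo-(2\.0|3\.0|3\.1)-/.exec(model);
  if (!match) throw new Error(`Not a recognized Veo model: ${model}`);
  return match[1] as VeoFamily;
}

// Return human-readable problems; an empty array means no conflicts found.
function checkVeoConfig(model: string, config: VeoConfig): string[] {
  const family = veoFamily(model);
  const problems: string[] = [];
  if (config.durationSeconds !== undefined) {
    if (family !== '2.0') {
      problems.push('durationSeconds is not configurable on Veo 3.1/3.0');
    } else if (config.durationSeconds < 5 || config.durationSeconds > 8) {
      problems.push('durationSeconds must be between 5 and 8');
    }
  }
  if (config.resolution !== undefined && family !== '3.1') {
    problems.push('resolution is only available on Veo 3.1');
  }
  if (config.resolution === '1080p' && config.aspectRatio === '9:16') {
    problems.push('1080p is listed for the 16:9 aspect ratio only');
  }
  if (config.seed !== undefined && family === '2.0') {
    problems.push('seed is only available on Veo 3.1/3.0');
  }
  if (config.enhancePrompt !== undefined && family !== '2.0') {
    problems.push('enhancePrompt is only configurable on Veo 2');
  }
  return problems;
}

console.log(checkVeoConfig('veo-3.1-generate-preview', { resolution: '1080p', aspectRatio: '16:9' })); // []
console.log(checkVeoConfig('veo-3.0-fast-generate-001', { durationSeconds: 5 }));
// [ 'durationSeconds is not configurable on Veo 3.1/3.0' ]
```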

Music Models (Lyria)

The Google AI plugin provides access to music and audio generation capabilities through the Lyria models.

Available Models

lyria-3-pro-preview - High-quality music generation
lyria-3-clip-preview - Fast generation for short music clips

Usage

const response = await ai.generate({
  model: googleAI.model('lyria-3-pro-preview'),
  prompt: 'A cheerful acoustic folk song with guitar and harmonica.',
});

// Access the generated audio media
const audioMedia = response.media;

Speech Models

The Google AI plugin provides access to text-to-speech capabilities through Gemini TTS models. These models can convert text into natural-sounding speech for various applications.

Available Models

gemini-3.1-flash-tts-preview - Gemini 3.1 Flash model with TTS
gemini-2.5-flash-preview-tts - Flash model with TTS
gemini-2.5-pro-preview-tts - Pro model with TTS

Usage

Basic Usage

To convert text to single-speaker audio, set the response modality to “AUDIO”, and pass a speechConfig object with voiceConfig set. You’ll need to choose a voice name from the prebuilt output voices.

The plugin returns raw PCM data, which can then be converted to a standard format like WAV.

import wav from 'wav';
import { Buffer } from 'node:buffer';

async function saveWavFile(
  filename: string,
  pcmData: Buffer,
  sampleRate = 24000,
) {
  return new Promise((resolve, reject) => {
    const writer = new wav.FileWriter(filename, {
      channels: 1,
      sampleRate,
      bitDepth: 16,
    });
    writer.on('finish', resolve);
    writer.on('error', reject);
    writer.write(pcmData);
    writer.end();
  });
}

const response = await ai.generate({
  model: googleAI.model('gemini-2.5-flash-preview-tts'),
  config: {
    responseModalities: ['AUDIO'],
    speechConfig: {
      voiceConfig: {
        prebuiltVoiceConfig: { voiceName: 'Algenib' },
      },
    },
  },
  prompt: 'Say that Genkit is an amazing AI framework',
});

if (response.media?.url) {
  const data = response.media.url.split(',')[1];
  if (data) {
    const pcmData = Buffer.from(data, 'base64');
    await saveWavFile('output.wav', pcmData);
  }
}
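Since the audio arrives as raw PCM, its duration follows directly from the byte count. The sketch below assumes the same format used by saveWavFile above (24 kHz, mono, 16-bit); pcmDurationSeconds is a hypothetical helper.

```typescript
// Duration of raw PCM audio: bytes / (sampleRate * channels * bytesPerSample).
function pcmDurationSeconds(
  byteLength: number,
  sampleRate = 24000,
  channels = 1,
  bitDepth = 16,
): number {
  const bytesPerSecond = sampleRate * channels * (bitDepth / 8);
  return byteLength / bytesPerSecond;
}

// 96000 bytes at 24 kHz, mono, 16-bit is 96000 / 48000 seconds.
console.log(pcmDurationSeconds(96000)); // 2
```

In the example above you could call it with pcmData.length to log how long the generated clip is.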

Multi-Speaker

You can generate audio with multiple speakers, each with their own voice. The model automatically detects speaker labels in the text (like “Speaker A:” and “Speaker B:”) and applies the voice configured for each speaker to that speaker’s lines.

const { media } = await ai.generate({
  model: googleAI.model('gemini-2.5-flash-preview-tts'),
  prompt: `
    Speaker A: Hello, how are you today?
    Speaker B: I am doing great, thanks for asking!
  `,
  config: {
    responseModalities: ['AUDIO'],
    speechConfig: {
      multiSpeakerVoiceConfig: {
        speakerVoiceConfigs: [
          {
            speaker: 'Speaker A',
            voiceConfig: { prebuiltVoiceConfig: { voiceName: 'Puck' } },
          },
          {
            speaker: 'Speaker B',
            voiceConfig: { prebuiltVoiceConfig: { voiceName: 'Kore' } },
          },
        ],
      },
    },
  },
});

The following configuration options are available for speech generation:

speechConfig object
- voiceConfig object
  
  Defines the voice configuration for a single speaker.
  - prebuiltVoiceConfig object
    - voiceName string
      
      The name of the voice to use. Options: Puck, Charon, Kore, Fenrir, Aoede (and others).
    - speakingRate number
      
      Controls the speed of speech. Range: 0.25 to 4.0, default is 1.0.
    - pitch number
      
      Adjusts the pitch of the voice. Range: -20.0 to 20.0, default is 0.0.
    - volumeGainDb number
      
      Controls the volume. Range: -96.0 to 16.0, default is 0.0.
- multiSpeakerVoiceConfig object
  
  Defines the voice configuration for multiple speakers.
  - speakerVoiceConfigs array
    
    A list of voice configurations for each speaker.
    - speaker string
      
      The name of the speaker (e.g., “Speaker A”) as used in the prompt.
    - voiceConfig object
      
      The voice configuration for this speaker. See voiceConfig above.
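When the cast of speakers is dynamic, it may be easier to build the speakerVoiceConfigs array and the labeled prompt from data than to write them by hand. The sketch below does that with two hypothetical helpers; the object shape follows the options listed above.

```typescript
interface SpeakerVoiceConfig {
  speaker: string;
  voiceConfig: { prebuiltVoiceConfig: { voiceName: string } };
}

// Turn a { speaker: voiceName } map into the speakerVoiceConfigs array.
function buildSpeakerVoiceConfigs(voices: Record<string, string>): SpeakerVoiceConfig[] {
  return Object.entries(voices).map(([speaker, voiceName]) => ({
    speaker,
    voiceConfig: { prebuiltVoiceConfig: { voiceName } },
  }));
}

// Render dialogue lines with "Speaker: text" labels that match the
// speaker names used in the voice configuration.
function buildDialoguePrompt(lines: { speaker: string; text: string }[]): string {
  return lines.map((line) => `${line.speaker}: ${line.text}`).join('\n');
}

const speakerVoiceConfigs = buildSpeakerVoiceConfigs({
  'Speaker A': 'Puck',
  'Speaker B': 'Kore',
});
const dialogue = buildDialoguePrompt([
  { speaker: 'Speaker A', text: 'Hello, how are you today?' },
  { speaker: 'Speaker B', text: 'I am doing great, thanks for asking!' },
]);

console.log(speakerVoiceConfigs.length); // 2
console.log(dialogue);
```

The two values slot into speechConfig.multiSpeakerVoiceConfig.speakerVoiceConfigs and prompt, respectively, in a request like the multi-speaker example above.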

Speech Emphasis

You can use markdown-style formatting in your prompt to add emphasis:

Bold text (**like this**) for stronger emphasis.
Italic text (*like this*) for moderate emphasis.

prompt: 'Genkit is an **amazing** Gen AI *library*!',

TTS models automatically detect the input language. Supported languages include en-US, fr-FR, de-DE, es-US, ja-JP, ko-KR, pt-BR, zh-CN, and more.