Google AI plugin
The Google GenAI plugin provides a unified interface to Google’s generative AI models through the Gemini Developer API, using API key authentication. It replaces the previous googleAI plugin.
Installation
```bash
npm i --save @genkit-ai/google-genai
```
Configuration
```ts
import { genkit } from 'genkit';
import { googleAI } from '@genkit-ai/google-genai';

const ai = genkit({
  plugins: [
    googleAI(),
    // Or with an explicit API key:
    // googleAI({ apiKey: 'your-api-key' }),
  ],
});
```
Authentication: Requires a Google AI API key, which you can get from Google AI Studio. You can provide this key by setting the GEMINI_API_KEY or GOOGLE_API_KEY environment variable, or by passing it in the plugin configuration.
Usage Examples
Text Generation (Gemini)
```ts
import { genkit } from 'genkit';
import { googleAI } from '@genkit-ai/google-genai';

const ai = genkit({
  plugins: [googleAI()],
});

const response = await ai.generate({
  model: googleAI.model('gemini-2.5-flash'),
  prompt: 'Tell me something interesting about Google AI.',
});

console.log(response.text);
```
Text Embedding
```ts
const embeddings = await ai.embed({
  embedder: googleAI.embedder('text-embedding-004'),
  content: 'Embed this text.',
});
```
Image Generation (Imagen)
```ts
const response = await ai.generate({
  model: googleAI.model('imagen-3.0-generate-002'),
  prompt: 'A beautiful watercolor painting of a castle in the mountains.',
});

const generatedImage = response.media;
```
Gemini API Features
The following features are available through the googleAI plugin.
Gemini Files API
You can use files uploaded to the Gemini Files API with Genkit:
```ts
import { GoogleAIFileManager } from '@google/generative-ai/server';
import { genkit } from 'genkit';
import { googleAI } from '@genkit-ai/google-genai';

const ai = genkit({
  plugins: [googleAI()],
});

const fileManager = new GoogleAIFileManager(process.env.GEMINI_API_KEY);
const uploadResult = await fileManager.uploadFile('path/to/file.jpg', {
  mimeType: 'image/jpeg',
  displayName: 'Your Image',
});

const response = await ai.generate({
  model: googleAI.model('gemini-2.5-flash'),
  prompt: [
    { text: 'Describe this image:' },
    {
      media: {
        contentType: uploadResult.file.mimeType,
        url: uploadResult.file.uri,
      },
    },
  ],
});
```
Video Generation (Veo) Models
The Google Generative AI plugin provides access to video generation capabilities through the Veo models. These models can generate videos from text prompts or manipulate existing images to create dynamic video content.
Basic Usage: Text-to-Video Generation
To generate a video from a text prompt using the Veo model:
```ts
import { googleAI } from '@genkit-ai/google-genai';
import * as fs from 'fs';
import { Readable } from 'stream';
import { MediaPart } from 'genkit';
import { genkit } from 'genkit';

const ai = genkit({
  plugins: [googleAI()],
});

ai.defineFlow('text-to-video-veo', async () => {
  let { operation } = await ai.generate({
    model: googleAI.model('veo-2.0-generate-001'),
    prompt: 'A majestic dragon soaring over a mystical forest at dawn.',
    config: {
      durationSeconds: 5,
      aspectRatio: '16:9',
    },
  });

  if (!operation) {
    throw new Error('Expected the model to return an operation');
  }

  // Wait until the operation completes.
  while (!operation.done) {
    operation = await ai.checkOperation(operation);
    // Sleep for 5 seconds before checking again.
    await new Promise((resolve) => setTimeout(resolve, 5000));
  }

  if (operation.error) {
    throw new Error('failed to generate video: ' + operation.error.message);
  }

  const video = operation.output?.message?.content.find((p) => !!p.media);
  if (!video) {
    throw new Error('Failed to find the generated video');
  }
  await downloadVideo(video, 'output.mp4');
});

async function downloadVideo(video: MediaPart, path: string) {
  const fetch = (await import('node-fetch')).default;
  // Add the API key before fetching the video.
  const videoDownloadResponse = await fetch(
    `${video.media!.url}&key=${process.env.GEMINI_API_KEY}`
  );
  if (
    !videoDownloadResponse ||
    videoDownloadResponse.status !== 200 ||
    !videoDownloadResponse.body
  ) {
    throw new Error('Failed to fetch video');
  }

  Readable.from(videoDownloadResponse.body).pipe(fs.createWriteStream(path));
}
```
Veo 3 uses the same API; just make sure you use only the supported config options (see below).

To use the Veo 3 model, reference veo-3.0-generate-preview:

```ts
let { operation } = await ai.generate({
  model: googleAI.model('veo-3.0-generate-preview'),
  prompt: 'A cinematic shot of an old car driving down a deserted road at sunset.',
});
```
Video Generation from Photo Reference
To use a photo as a reference for the video (for example, to make a static photo move), provide an image as part of the prompt:
```ts
const startingImage = fs.readFileSync('photo.jpg', { encoding: 'base64' });

let { operation } = await ai.generate({
  model: googleAI.model('veo-2.0-generate-001'),
  prompt: [
    {
      text: 'make the subject in the photo move',
    },
    {
      media: {
        contentType: 'image/jpeg',
        url: `data:image/jpeg;base64,${startingImage}`,
      },
    },
  ],
  config: {
    durationSeconds: 5,
    aspectRatio: '9:16',
    personGeneration: 'allow_adult',
  },
});
```
Configuration Options
The Veo models support various configuration options.
Veo Model Parameters
The full list of options can be found at https://ai.google.dev/gemini-api/docs/video#veo-model-parameters
- `negativePrompt`: Text string that describes anything you want to discourage the model from generating.
- `aspectRatio`: Changes the aspect ratio of the generated video.
  - `"16:9"`: Supported in Veo 3 and Veo 2.
  - `"9:16"`: Supported in Veo 2 only (defaults to "16:9").
- `personGeneration`: Allow the model to generate videos of people. The following values are supported:
  - Text-to-video generation:
    - `"allow_all"`: Generate videos that include adults and children. Currently the only available `personGeneration` value for Veo 3.
    - `"dont_allow"`: Veo 2 only. Don’t allow the inclusion of people or faces.
    - `"allow_adult"`: Veo 2 only. Generate videos that include adults, but not children.
  - Image-to-video generation (Veo 2 only):
    - `"dont_allow"`: Don’t allow the inclusion of people or faces.
    - `"allow_adult"`: Generate videos that include adults, but not children.
- `numberOfVideos`: Number of output videos requested.
  - `1`: Supported in Veo 3 and Veo 2.
  - `2`: Supported in Veo 2 only.
- `durationSeconds`: Veo 2 only. Length of each output video in seconds, between 5 and 8. Not configurable for Veo 3; the default is 8 seconds.
- `enhancePrompt`: Veo 2 only. Enable or disable the prompt rewriter. Enabled by default. Not configurable for Veo 3; the prompt enhancer is always on.
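Putting these options together, here is a sketch of a `config` block for a Veo 2 request. The prompt and negative-prompt values are illustrative, and the Veo 3 restrictions noted above still apply:

```typescript
config: {
  negativePrompt: 'blurry, low quality', // discourage these qualities
  aspectRatio: '9:16',                   // Veo 2 only; Veo 3 requires '16:9'
  personGeneration: 'allow_adult',       // a Veo 2 text-to-video value
  numberOfVideos: 1,                     // 2 is supported in Veo 2 only
  durationSeconds: 6,                    // Veo 2 only: 5 to 8 seconds
  enhancePrompt: true,                   // Veo 2 only: prompt rewriter
},
```

Pass this object as the `config` field of `ai.generate()`, as in the earlier Veo examples.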
Text-to-Speech (TTS) Models
The Google GenAI plugin provides access to text-to-speech capabilities through the Gemini TTS models. These models can convert text into natural-sounding speech for various applications.
Basic Usage
To generate audio using a TTS model:
```ts
import { genkit } from 'genkit';
import { googleAI } from '@genkit-ai/google-genai';
import { writeFile } from 'node:fs/promises';
import wav from 'wav'; // npm install wav && npm install -D @types/wav

const ai = genkit({
  plugins: [googleAI()],
});

const { media } = await ai.generate({
  model: googleAI.model('gemini-2.5-flash-preview-tts'),
  config: {
    responseModalities: ['AUDIO'],
    speechConfig: {
      voiceConfig: {
        prebuiltVoiceConfig: { voiceName: 'Algenib' },
      },
    },
  },
  prompt: 'Say that Genkit is an amazing Gen AI library',
});

if (!media) {
  throw new Error('no media returned');
}
const audioBuffer = Buffer.from(
  media.url.substring(media.url.indexOf(',') + 1),
  'base64'
);
// The googleAI plugin returns raw PCM data, which we convert to WAV format.
await writeFile('output.wav', await toWav(audioBuffer));

async function toWav(
  pcmData: Buffer,
  channels = 1,
  rate = 24000,
  sampleWidth = 2
): Promise<string> {
  return new Promise((resolve, reject) => {
    // This code depends on the `wav` npm library.
    const writer = new wav.Writer({
      channels,
      sampleRate: rate,
      bitDepth: sampleWidth * 8,
    });

    const bufs: Buffer[] = [];
    writer.on('error', reject);
    writer.on('data', (d) => {
      bufs.push(d);
    });
    writer.on('end', () => {
      resolve(Buffer.concat(bufs).toString('base64'));
    });

    writer.write(pcmData);
    writer.end();
  });
}
```
Multi-speaker Audio Generation
You can generate audio with multiple speakers, each with their own voice:
```ts
const response = await ai.generate({
  model: googleAI.model('gemini-2.5-flash-preview-tts'),
  config: {
    responseModalities: ['AUDIO'],
    speechConfig: {
      multiSpeakerVoiceConfig: {
        speakerVoiceConfigs: [
          {
            speaker: 'Speaker1',
            voiceConfig: {
              prebuiltVoiceConfig: { voiceName: 'Algenib' },
            },
          },
          {
            speaker: 'Speaker2',
            voiceConfig: {
              prebuiltVoiceConfig: { voiceName: 'Achernar' },
            },
          },
        ],
      },
    },
  },
  prompt: `Here's the dialog:
Speaker1: "Genkit is an amazing Gen AI library!"
Speaker2: "I thought it was a framework."`,
});
```
When using multi-speaker configuration, the model automatically detects speaker labels in the text (like “Speaker1:” and “Speaker2:”) and applies the corresponding voice to each speaker’s lines.
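To illustrate the label format the model matches against, here is a standalone sketch (a hypothetical helper, not part of the Genkit or Gemini API) that parses the same `Speaker:` prefixes out of a dialog string:

```typescript
// Parse "SpeakerName: line" dialog into (speaker, text) pairs.
// Illustration of the prompt's label format only; it does not
// reflect how the TTS model is implemented internally.
function parseDialog(dialog: string): Array<{ speaker: string; text: string }> {
  const turns: Array<{ speaker: string; text: string }> = [];
  for (const line of dialog.split('\n')) {
    const match = line.trim().match(/^(\w+):\s*(.+)$/);
    if (match) {
      turns.push({ speaker: match[1], text: match[2] });
    }
  }
  return turns;
}

const turns = parseDialog(
  'Speaker1: "Genkit is an amazing Gen AI library!"\n' +
    'Speaker2: "I thought it was a framework."'
);
```

Each speaker label found this way must match a `speaker` entry in `speakerVoiceConfigs` for its voice to be applied.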
Configuration Options
The Gemini TTS models support various configuration options:
Voice Selection
You can choose from different pre-built voices with unique characteristics:
```ts
speechConfig: {
  voiceConfig: {
    prebuiltVoiceConfig: {
      voiceName: 'Algenib', // Other options: 'Achernar', 'Ankaa', etc.
    },
  },
}
```
Speech Emphasis
You can use markdown-style formatting in your prompt to add emphasis:
- Bold text (`**like this**`) for stronger emphasis
- Italic text (`*like this*`) for moderate emphasis
Example:
```ts
prompt: 'Genkit is an **amazing** Gen AI *library*!';
```
Advanced Speech Parameters
For more control over the generated speech:
```ts
speechConfig: {
  voiceConfig: {
    prebuiltVoiceConfig: {
      voiceName: 'Algenib',
      speakingRate: 1.0, // Range: 0.25 to 4.0, default is 1.0
      pitch: 0.0, // Range: -20.0 to 20.0, default is 0.0
      volumeGainDb: 0.0, // Range: -96.0 to 16.0, default is 0.0
    },
  },
}
```
- `speakingRate`: Controls the speed of speech (higher values = faster speech)
- `pitch`: Adjusts the pitch of the voice (higher values = higher pitch)
- `volumeGainDb`: Controls the volume (higher values = louder)
For more detailed information about the Gemini TTS models and their configuration options, see the Google AI Speech Generation documentation.
Next Steps
- Learn about generating content to understand how to use these models effectively
- Explore creating flows to build structured AI workflows
- To use the Gemini API at enterprise scale or leverage Vertex vector search and Model Garden, see the Vertex AI plugin
The Google Generative AI plugin provides interfaces to Google’s Gemini models through the Gemini API.
Configuration
To use this plugin, import the googlegenai package and pass googlegenai.GoogleAI to WithPlugins() in the Genkit initializer:
```go
import "github.com/firebase/genkit/go/plugins/googlegenai"

g := genkit.Init(context.Background(), genkit.WithPlugins(&googlegenai.GoogleAI{}))
```
The plugin requires an API key for the Gemini API, which you can get from Google AI Studio.
Configure the plugin to use your API key by doing one of the following:
- Set the GEMINI_API_KEY environment variable to your API key.
- Specify the API key when you initialize the plugin:

```go
genkit.WithPlugins(&googlegenai.GoogleAI{APIKey: "YOUR_API_KEY"})
```

However, don’t embed your API key directly in code! Use this feature only in conjunction with a service like Cloud Secret Manager or similar.
Generative models
To get a reference to a supported model, specify its identifier to googlegenai.GoogleAIModel:
```go
model := googlegenai.GoogleAIModel(g, "gemini-2.5-flash")
```
Alternatively, you can create a ModelRef, which pairs the model name with its config:
```go
modelRef := googlegenai.GoogleAIModelRef("gemini-2.5-flash", &genai.GenerateContentConfig{
	Temperature:     genai.Ptr[float32](0.5),
	MaxOutputTokens: genai.Ptr[int32](500),
	// Other configuration...
})
```
Pass the model reference to genkit.Generate(), which calls the Google AI API:
```go
resp, err := genkit.Generate(ctx, g,
	ai.WithModel(modelRef),
	ai.WithPrompt("Tell me a joke."),
)
if err != nil {
	return err
}

log.Println(resp.Text())
```
See Generating content with AI models for more information.
Embedding models
To get a reference to a supported embedding model, specify its identifier to googlegenai.GoogleAIEmbedder:
```go
embeddingModel := googlegenai.GoogleAIEmbedder(g, "text-embedding-004")
```
Pass the embedder reference to genkit.Embed(), which calls the Google AI API:
```go
resp, err := genkit.Embed(ctx, g,
	ai.WithEmbedder(embeddingModel),
	ai.WithTextDocs(userInput),
)
if err != nil {
	return err
}
```
See Retrieval-augmented generation (RAG) for more information.
Next Steps
- Learn about generating content to understand how to use these models effectively
- Explore creating flows to build structured AI workflows
- To use the Gemini API at enterprise scale see the Vertex AI plugin
The genkit-plugin-google-genai package provides the GoogleAI plugin for accessing Google’s generative AI models via the Google Gemini API (requires an API key).
Installation
```bash
pip3 install genkit-plugin-google-genai
```
Configuration
To use the Google Gemini API, you need an API key.
```python
from genkit.ai import Genkit
from genkit.plugins.google_genai import GoogleAI

ai = Genkit(
    plugins=[GoogleAI()],
    model='googleai/gemini-2.5-flash',
)
```
You will need to set the GEMINI_API_KEY environment variable, or you can provide the API key directly:
```python
ai = Genkit(
    plugins=[GoogleAI(api_key='...')],
)
```
Text Generation
```python
response = await ai.generate('What should I do when I visit Melbourne?')
print(response.text)
```
Text Embedding
```python
embeddings = await ai.embed(
    embedder='googleai/text-embedding-004',
    content='How many widgets do you have in stock?',
)
```
Image Generation
```python
response = await ai.generate(
    model='googleai/imagen-3.0-generate-002',
    prompt='a banana riding a bicycle',
)
```
Next Steps
- Learn about generating content to understand how to use these models effectively
- Explore creating flows to build structured AI workflows
- To use the Gemini API at enterprise scale see the Vertex AI plugin