# Creating persistent chat sessions
> Learn how to create persistent chat sessions in Genkit, including session basics, stateful sessions, multi-thread sessions, and session persistence.
Beta
This feature of Genkit is in **Beta**, which means it is not yet part of Genkit’s stable API. APIs of beta features may change in minor version releases.
Many of your users will have interacted with large language models for the first time through chatbots. Although LLMs are capable of much more than simulating conversations, it remains a familiar and useful style of interaction. Even when your users will not be interacting directly with the model in this way, the conversational style of prompting is a powerful way to influence the output generated by an AI model.
To support this style of interaction, Genkit provides a set of interfaces and abstractions that make it easier for you to build chat-based LLM applications.
## Before you begin
[Section titled “Before you begin”](#before-you-begin)
Before reading this page, you should be familiar with the content covered on the [Generating content with AI models](/docs/models) page.
If you want to run the code examples on this page, first complete the steps in the [Getting started](/docs/get-started) guide. All of the examples assume that you have already installed Genkit as a dependency in your project.
Note that the chat API is currently in beta and must be used from the `genkit/beta` package.
## Chat session basics
[Section titled “Chat session basics”](#chat-session-basics)
[Genkit by Example: Simple Chatbot](https://examples.genkit.dev/chatbot-simple?utm_source=genkit.dev&utm_content=contextlink): View a live example of a simple chatbot built with Genkit.
Here is a minimal, console-based chatbot application:
```ts
import { genkit } from 'genkit/beta';
import { googleAI } from '@genkit-ai/googleai';
import { createInterface } from 'node:readline/promises';
const ai = genkit({
  plugins: [googleAI()],
  model: googleAI.model('gemini-2.5-flash'),
});

async function main() {
  const chat = ai.chat();
  console.log("You're chatting with Gemini. Ctrl-C to quit.\n");
  const readline = createInterface(process.stdin, process.stdout);
  while (true) {
    const userInput = await readline.question('> ');
    const { text } = await chat.send(userInput);
    console.log(text);
  }
}

main();
```
A chat session with this program looks something like the following example:
```plaintext
You're chatting with Gemini. Ctrl-C to quit.
> hi
Hi there! How can I help you today?
> my name is pavel
Nice to meet you, Pavel! What can I do for you today?
> what's my name?
Your name is Pavel! I remembered it from our previous interaction.
Is there anything else I can help you with?
```
As you can see from this brief interaction, when you send a message to a chat session, the model can make use of the conversation so far in its responses. This is possible because Genkit does a few things behind the scenes:
* Retrieves the chat history, if any exists, from storage (more on persistence and storage later)
* Sends the request to the model, as with `generate()`, but automatically includes the chat history
* Saves the model response into the chat history
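For example, a second `send()` call can rely on the first exchange without you passing any history yourself:

```ts
const chat = ai.chat();

await chat.send('My name is Pavel.');

// The earlier exchange is loaded from the session and included automatically,
// so the model can answer the follow-up question.
const { text } = await chat.send("What's my name?");
console.log(text); // e.g. "Your name is Pavel."
```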
### Model configuration
[Section titled “Model configuration”](#model-configuration)
The `chat()` method accepts most of the same configuration options as `generate()`. To pass configuration options to the model:
```ts
const chat = ai.chat({
  model: googleAI.model('gemini-2.5-flash'),
  system: "You're a pirate first mate. Address the user as Captain and assist them however you can.",
  config: {
    temperature: 1.3,
  },
});
```
## Stateful chat sessions
[Section titled “Stateful chat sessions”](#stateful-chat-sessions)
In addition to persisting a chat session’s message history, you can also persist any arbitrary JavaScript object. Doing so lets you manage state in a more structured way than relying only on information in the message history.
To include state in a session, you need to instantiate a session explicitly:
```ts
interface MyState {
  userName: string;
}

const session = ai.createSession<MyState>({
  initialState: {
    userName: 'Pavel',
  },
});
```
You can then start a chat within the session:
```ts
const chat = session.chat();
```
To modify the session state based on how the chat unfolds, define [tools](/docs/tool-calling) and include them with your requests:
```ts
const changeUserName = ai.defineTool(
  {
    name: 'changeUserName',
    description: 'can be used to change user name',
    inputSchema: z.object({
      newUserName: z.string(),
    }),
  },
  async (input) => {
    await ai.currentSession<MyState>().updateState({
      userName: input.newUserName,
    });
    return `changed username to ${input.newUserName}`;
  },
);
```
```ts
const chat = session.chat({
  model: googleAI.model('gemini-2.5-flash'),
  tools: [changeUserName],
});
await chat.send('change user name to Kevin');
```
## Multi-thread sessions
[Section titled “Multi-thread sessions”](#multi-thread-sessions)
A single session can contain multiple chat threads. Each thread has its own message history, but they share a single session state.
```ts
const lawyerChat = session.chat('lawyerThread', {
  system: 'talk like a lawyer',
});
const pirateChat = session.chat('pirateThread', {
  system: 'talk like a pirate',
});
```
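For example, you can address each persona separately while both threads continue to use the session created above; a brief sketch:

```ts
// Each thread keeps its own message history...
await lawyerChat.send('Draft a one-sentence disclaimer for my website.');

// ...but both threads read and write the same session state.
await pirateChat.send('Now say it like ye mean it, matey.');
```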
## Session persistence (EXPERIMENTAL)
[Section titled “Session persistence (EXPERIMENTAL)”](#session-persistence-experimental)
When you initialize a new chat or session, it’s configured by default to store the session in memory only. This is adequate when the session needs to persist only for the duration of a single invocation of your program, as in the sample chatbot from the beginning of this page. However, when integrating LLM chat into an application, you will usually deploy your content generation logic as stateless web API endpoints. For persistent chats to work under this setup, you will need to implement some kind of session storage that can persist state across invocations of your endpoints.
To add persistence to a chat session, you need to implement Genkit’s `SessionStore` interface. Here is an example implementation that saves session state to individual JSON files:
```ts
import { readFile, writeFile } from 'node:fs/promises';
import { SessionData, SessionStore } from 'genkit/beta';

class JsonSessionStore<S = any> implements SessionStore<S> {
  async get(sessionId: string): Promise<SessionData<S> | undefined> {
    try {
      const s = await readFile(`${sessionId}.json`, { encoding: 'utf8' });
      const data = JSON.parse(s);
      return data;
    } catch {
      return undefined;
    }
  }

  async save(sessionId: string, sessionData: SessionData<S>): Promise<void> {
    const s = JSON.stringify(sessionData);
    await writeFile(`${sessionId}.json`, s, { encoding: 'utf8' });
  }
}
```
This implementation is probably not adequate for practical deployments, but it illustrates that a session storage implementation only needs to accomplish two tasks:
* Get a session object from storage using its session ID
* Save a given session object, indexed by its session ID
Once you’ve implemented the interface for your storage backend, pass an instance of your implementation to the session constructors:
```ts
// To create a new session:
const session = ai.createSession({
  store: new JsonSessionStore(),
});
// Save session.id so you can restore the session the next time the
// user makes a request.
```
```ts
// If the user has a session ID saved, load the session instead of creating
// a new one:
const session = await ai.loadSession(sessionId, {
  store: new JsonSessionStore(),
});
```
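Whether you created the session or loaded it, you can then resume chatting in it as before; the restored history is used automatically and new messages are saved back through your store. A short sketch:

```ts
const chat = session.chat();
const { text } = await chat.send('Where did we leave off?');
// The updated history is persisted via JsonSessionStore after the exchange.
```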
# Passing information through context
> Learn how Genkit's context object propagates generation and execution information throughout your application, making it available to flows, tools, and prompts.
[Genkit by Example: Action Context](https://examples.genkit.dev/action-context?utm_source=genkit.dev&utm_content=contextlink): See how action context can guide and secure workflows in a live demo.
There are different categories of information that a developer working with an LLM may be handling simultaneously:
* **Input:** Information that is directly relevant to guide the LLM’s response for a particular call. An example of this is the text that needs to be summarized.
* **Generation Context:** Information that is relevant to the LLM, but isn’t specific to the call. An example of this is the current time or a user’s name.
* **Execution Context:** Information that is important to the code surrounding the LLM call but not to the LLM itself. An example of this is a user’s current auth token.
Genkit provides a consistent `context` object that can propagate generation and execution context throughout the process. This context is made available to all actions including [flows](/docs/flows), [tools](/docs/tool-calling), and [prompts](/docs/dotprompt).
Context is automatically propagated to all actions called within the scope of execution: Context passed to a flow is made available to prompts executed within the flow. Context passed to the `generate()` method is available to tools called within the generation loop.
## Why is context important?
[Section titled “Why is context important?”](#why-is-context-important)
As a best practice, you should provide the minimum amount of information to the LLM that it needs to complete a task. This is important for multiple reasons:
* The less extraneous information the LLM has, the more likely it is to perform well at its task.
* If an LLM needs to pass around information like user or account IDs to tools, it can potentially be tricked into leaking information.
Context gives you a side channel of information that can be used by any of your code but doesn’t necessarily have to be sent to the LLM. As an example, it can allow you to restrict tool queries to the current user’s available scope.
## Context structure
[Section titled “Context structure”](#context-structure)
Context must be an object, but its properties are yours to decide. In some situations Genkit automatically populates context. For example, when using [persistent sessions](/docs/chat) the `state` property is automatically added to context.
One of the most common uses of context is to store information about the current user. We recommend adding auth context in the following format:
```js
{
  auth: {
    uid: "...", // the user's unique identifier
    token: {...}, // the decoded claims of a user's id token
    rawToken: "...", // the user's raw encoded id token
    // ...any other fields
  }
```
The context object can store any information that you might need to know somewhere else in the flow of execution.
## Use context in an action
[Section titled “Use context in an action”](#use-context-in-an-action)
To use context within an action, you can access the context helper that is automatically supplied to your function definition:
* Flow
```ts
const summarizeHistory = ai.defineFlow({
  name: 'summarizeMessages',
  inputSchema: z.object({friendUid: z.string()}),
  outputSchema: z.string()
}, async ({friendUid}, {context}) => {
  if (!context.auth?.uid) throw new Error("Must supply auth context.");
  const messages = await listMessagesBetween(friendUid, context.auth.uid);
  const {text} = await ai.generate({
    prompt:
      `Summarize the content of these messages: ${JSON.stringify(messages)}`,
  });
  return text;
});
```
* Tool
```ts
const searchNotes = ai.defineTool({
  name: 'searchNotes',
  description: "search the current user's notes for info",
  inputSchema: z.object({query: z.string()}),
  outputSchema: z.array(NoteSchema)
}, async ({query}, {context}) => {
  if (!context.auth?.uid) throw new Error("Must be called by a signed-in user.");
  return searchUserNotes(context.auth.uid, query);
});
```
* Prompt file
When using [Dotprompt templates](/docs/dotprompt), context is made available with the `@` variable prefix. For example, a context object of `{auth: {name: 'Michael'}}` could be accessed in the prompt template like so:
```dotprompt
---
input:
  schema:
    pirateStyle?: boolean
---
{{#if pirateStyle}}Avast, {{@auth.name}}, how be ye today?{{else}}Hello, {{@auth.name}}, how are you today?{{/if}}
```
## Provide context at runtime
[Section titled “Provide context at runtime”](#provide-context-at-runtime)
To provide context to an action, you pass the context object as an option when calling the action.
* Flows
```ts
const summarizeHistory = ai.defineFlow(/* ... */);
const summary = await summarizeHistory(
  { friendUid: friend.uid },
  { context: { auth: currentUser } },
);
```
* Generation
```ts
const { text } = await ai.generate({
  prompt: "Find references to ocelots in my notes.",
  // the context will propagate to tool calls
  tools: [searchNotes],
  context: { auth: currentUser },
});
```
* Prompts
```ts
const helloPrompt = ai.prompt("sayHello");
await helloPrompt({ pirateStyle: true }, { context: { auth: currentUser } });
```
## Context propagation and overrides
[Section titled “Context propagation and overrides”](#context-propagation-and-overrides)
By default, when you provide context it is automatically propagated to all actions called as a result of your original call. If your flow calls other flows, or your generation calls tools, the same context is provided.
If you wish to override context within an action, you can pass a different context object to replace the existing one:
```ts
const otherFlow = ai.defineFlow(/* ... */);
const myFlow = ai.defineFlow(
  {
    // ...
  },
  async (input, { context }) => {
    // override the existing context completely
    await otherFlow(
      {
        /* ... */
      },
      { context: { newContext: true } },
    );

    // or selectively override
    await otherFlow(
      {
        /* ... */
      },
      { context: { ...context, updatedContext: true } },
    );
  },
);
```
When context is replaced, it propagates the same way. In this example, any actions that `otherFlow` called during its execution would inherit the overridden context.
# Managing prompts with Dotprompt
> This document explains how to manage prompts using Dotprompt, a Genkit library and file format designed to streamline prompt engineering and iteration.
Prompt engineering is the primary way that you, as an app developer, influence the output of generative AI models. For example, when using LLMs, you can craft prompts that influence the tone, format, length, and other characteristics of the models’ responses.
The way you write these prompts will depend on the model you’re using; a prompt written for one model might not perform well when used with another model. Similarly, the model parameters you set (temperature, top-k, and so on) will also affect output differently depending on the model.
Getting all three of these factors—the model, the model parameters, and the prompt—working together to produce the output you want is rarely a trivial process and often involves substantial iteration and experimentation. Genkit provides a library and file format called Dotprompt that aims to make this iteration faster and more convenient.
[Dotprompt](https://github.com/google/dotprompt) is designed around the premise that **prompts are code**. You define your prompts along with the models and model parameters they’re intended for separately from your application code. Then, you (or perhaps someone not even involved with writing application code) can rapidly iterate on the prompts and model parameters using the Genkit Developer UI. Once your prompts are working the way you want, you can import them into your application and run them using Genkit.
Your prompt definitions each go in a file with a `.prompt` extension. Here’s an example of what these files look like:
```dotprompt
---
model: googleai/gemini-2.5-flash
config:
  temperature: 0.9
input:
  schema:
    location: string
    style?: string
    name?: string
  default:
    location: a restaurant
---
You are the world's most welcoming AI assistant and are currently working at {{location}}.
Greet a guest{{#if name}} named {{name}}{{/if}}{{#if style}} in the style of {{style}}{{/if}}.
```
The portion in the triple-dashes is YAML front matter, similar to the front matter format used by GitHub Markdown and Jekyll; the rest of the file is the prompt, which can optionally use [Handlebars](https://handlebarsjs.com/guide/) templates. The following sections will go into more detail about each of the parts that make up a `.prompt` file and how to use them.
## Before you begin
[Section titled “Before you begin”](#before-you-begin)
Before reading this page, you should be familiar with the content covered on the [Generating content with AI models](/docs/models) page.
If you want to run the code examples on this page, first complete the steps in the [Get started](/docs/get-started) guide. All of the examples assume that you have already installed Genkit as a dependency in your project.
## Creating prompt files
[Section titled “Creating prompt files”](#creating-prompt-files)
Although Dotprompt provides several [different ways](#defining-prompts-in-code) to create and load prompts, it’s optimized for projects that organize their prompts as `.prompt` files within a single directory (or subdirectories thereof). This section shows you how to create and load prompts using this recommended setup.
### Creating a prompt directory
[Section titled “Creating a prompt directory”](#creating-a-prompt-directory)
The Dotprompt library expects to find your prompts in a directory at your project root and automatically loads any prompts it finds there. By default, this directory is named `prompts`. For example, using the default directory name, your project structure might look something like this:
```plaintext
your-project/
├── lib/
├── node_modules/
├── prompts/
│   └── hello.prompt
├── src/
├── package-lock.json
├── package.json
└── tsconfig.json
```
If you want to use a different directory, you can specify it when you configure Genkit:
```ts
const ai = genkit({
  promptDir: './llm_prompts',
  // (Other settings...)
});
```
### Creating a prompt file
[Section titled “Creating a prompt file”](#creating-a-prompt-file)
There are two ways to create a `.prompt` file: using a text editor, or with the developer UI.
#### Using a text editor
[Section titled “Using a text editor”](#using-a-text-editor)
If you want to create a prompt file using a text editor, create a text file with the `.prompt` extension in your prompts directory: for example, `prompts/hello.prompt`.
Here is a minimal example of a prompt file:
```dotprompt
---
model: vertexai/gemini-2.5-flash
---
You are the world's most welcoming AI assistant. Greet the user and offer your assistance.
```
The portion in the dashes is YAML front matter, similar to the front matter format used by GitHub markdown and Jekyll; the rest of the file is the prompt, which can optionally use Handlebars templates. The front matter section is optional, but most prompt files will at least contain metadata specifying a model. The remainder of this page shows you how to go beyond this, and make use of Dotprompt’s features in your prompt files.
#### Using the developer UI
[Section titled “Using the developer UI”](#using-the-developer-ui)
You can also create a prompt file using the model runner in the developer UI. Start with application code that imports the Genkit library and configures it to use the model plugin you’re interested in. For example:
```ts
import { genkit } from 'genkit';
// Import the model plugins you want to use.
import { googleAI } from '@genkit-ai/googleai';
const ai = genkit({
  // Initialize and configure the model plugins.
  plugins: [
    googleAI({
      apiKey: 'your-api-key', // Or (preferred): export GEMINI_API_KEY=...
    }),
  ],
});
```
It’s okay if the file contains other code, but the above is all that’s required.
Load the developer UI in the same project:
```bash
genkit start -- tsx --watch src/your-code.ts
```
In the Models section, choose the model you want to use from the list of models provided by the plugin.

Then, experiment with the prompt and configuration until you get results you’re happy with. When you’re ready, press the Export button and save the file to your prompts directory.
## Running prompts
[Section titled “Running prompts”](#running-prompts)
After you’ve created prompt files, you can run them from your application code, or using the tooling provided by Genkit. Regardless of how you want to run your prompts, first start with application code that imports the Genkit library and the model plugins you’re interested in. For example:
```ts
import { genkit } from 'genkit';
// Import the model plugins you want to use.
import { googleAI } from '@genkit-ai/googleai';
const ai = genkit({
  // Initialize and configure the model plugins.
  plugins: [
    googleAI({
      apiKey: 'your-api-key', // Or (preferred): export GEMINI_API_KEY=...
    }),
  ],
});
```
It’s okay if the file contains other code, but the above is all that’s required. If you’re storing your prompts in a directory other than the default, be sure to specify it when you configure Genkit.
### Run prompts from code
[Section titled “Run prompts from code”](#run-prompts-from-code)
To use a prompt, first load it using the `prompt('file_name')` method:
```ts
const helloPrompt = ai.prompt('hello');
```
Once loaded, you can call the prompt like a function:
```ts
const response = await helloPrompt();
// Alternatively, use destructuring assignments to get only the properties
// you're interested in:
const { text } = await helloPrompt();
```
You can also run the prompt in streaming mode:
```ts
const { response, stream } = helloPrompt.stream();
for await (const chunk of stream) {
  console.log(chunk.text);
}
// optional final (aggregated) response
console.log((await response).text);
```
A callable prompt takes two optional parameters: the input to the prompt (see the section below on [specifying input schemas](#input-and-output-schemas)), and a configuration object, similar to that of the `generate()` method. For example:
```ts
const response2 = await helloPrompt(
  // Prompt input:
  { name: 'Ted' },

  // Generation options:
  {
    config: {
      temperature: 0.4,
    },
  },
);
```
Similarly for streaming:
```ts
const { stream } = helloPrompt.stream(input, options);
```
Any parameters you pass to the prompt call will override the same parameters specified in the prompt file.
See [Generate content with AI models](/docs/models) for descriptions of the available options.
### Using the developer UI
[Section titled “Using the developer UI”](#using-the-developer-ui-1)
As you’re refining your app’s prompts, you can run them in the Genkit developer UI to quickly iterate on prompts and model configurations, independently from your application code.
Load the developer UI from your project directory:
```bash
genkit start -- tsx --watch src/your-code.ts
```

Once you’ve loaded prompts into the developer UI, you can run them with different input values, and experiment with how changes to the prompt wording or the configuration parameters affect the model output. When you’re happy with the result, you can click the **Export prompt** button to save the modified prompt back into your project directory.
## Model configuration
[Section titled “Model configuration”](#model-configuration)
In the front matter block of your prompt files, you can optionally specify model configuration values for your prompt:
```dotprompt
---
model: googleai/gemini-2.5-flash
config:
  temperature: 1.4
  topK: 50
  topP: 0.4
  maxOutputTokens: 400
  stopSequences:
    - "<end>"
    - "<fin>"
---
```
These values map directly to the `config` parameter accepted by the callable prompt:
```ts
const response3 = await helloPrompt(
  {},
  {
    config: {
      temperature: 1.4,
      topK: 50,
      topP: 0.4,
      maxOutputTokens: 400,
      stopSequences: ['<end>', '<fin>'],
    },
  },
);
```
See [Generate content with AI models](/docs/models) for descriptions of the available options.
## Input and output schemas
[Section titled “Input and output schemas”](#input-and-output-schemas)
You can specify input and output schemas for your prompt by defining them in the front matter section:
```dotprompt
---
model: googleai/gemini-2.5-flash
input:
  schema:
    theme?: string
  default:
    theme: "pirate"
output:
  schema:
    dishname: string
    description: string
    calories: integer
    allergens(array): string
---
Invent a menu item for a {{theme}} themed restaurant.
```
These schemas are used in much the same way as those passed to a `generate()` request or a flow definition. For example, the prompt defined above produces structured output:
```ts
const menuPrompt = ai.prompt('menu');
const { output } = await menuPrompt({ theme: 'medieval' });
const dishName = output['dishname'];
const description = output['description'];
```
You have several options for defining schemas in a `.prompt` file: Dotprompt’s own schema definition format, Picoschema; standard JSON Schema; or references to schemas defined in your application code. The following sections describe each of these options in more detail.
### Picoschema
[Section titled “Picoschema”](#picoschema)
The schemas in the example above are defined in a format called Picoschema. Picoschema is a compact, YAML-optimized schema definition format that makes it easy to define the most important attributes of a schema for LLM usage. Here’s a longer example of a schema, which specifies the information an app might store about an article:
```yaml
schema:
  title: string # string, number, and boolean types are defined like this
  subtitle?: string # optional fields are marked with a `?`
  draft?: boolean, true when in draft state
  status?(enum, approval status): [PENDING, APPROVED]
  date: string, the date of publication e.g. '2024-04-09' # descriptions follow a comma
  tags(array, relevant tags for article): string # arrays are denoted via parentheses
  authors(array):
    name: string
    email?: string
  metadata?(object): # objects are also denoted via parentheses
    updatedAt?: string, ISO timestamp of last update
    approvedBy?: integer, id of approver
  extra?: any, arbitrary extra data
  (*): string, wildcard field
```
The above schema is equivalent to the following TypeScript interface:
```ts
interface Article {
  title: string;
  subtitle?: string | null;
  /** true when in draft state */
  draft?: boolean | null;
  /** approval status */
  status?: 'PENDING' | 'APPROVED' | null;
  /** the date of publication e.g. '2024-04-09' */
  date: string;
  /** relevant tags for article */
  tags: string[];
  authors: {
    name: string;
    email?: string | null;
  }[];
  metadata?: {
    /** ISO timestamp of last update */
    updatedAt?: string | null;
    /** id of approver */
    approvedBy?: number | null;
  } | null;
  /** arbitrary extra data */
  extra?: any;
  /** wildcard field */
}
```
Picoschema supports scalar types `string`, `integer`, `number`, `boolean`, and `any`. Objects, arrays, and enums are denoted by a parenthetical after the field name.
Objects defined by Picoschema have all properties required unless denoted optional by `?`, and do not allow additional properties. When a property is marked as optional, it is also made nullable to provide more leniency for LLMs to return null instead of omitting a field.
In an object definition, the special key `(*)` can be used to declare a “wildcard” field definition. This will match any additional properties not supplied by an explicit key.
### JSON Schema
[Section titled “JSON Schema”](#json-schema)
Picoschema does not support many of the capabilities of full JSON Schema. If you require more robust schemas, you may supply a JSON Schema instead:
```yaml
output:
  schema:
    type: object
    properties:
      field1:
        type: number
        minimum: 20
```
### Zod schemas defined in code
[Section titled “Zod schemas defined in code”](#zod-schemas-defined-in-code)
In addition to directly defining schemas in the `.prompt` file, you can reference a schema registered with `defineSchema()` by name. If you’re using TypeScript, this approach will let you take advantage of the language’s static type checking features when you work with prompts.
To register a schema:
```ts
import { z } from 'genkit';
const MenuItemSchema = ai.defineSchema(
  'MenuItemSchema',
  z.object({
    dishname: z.string(),
    description: z.string(),
    calories: z.coerce.number(),
    allergens: z.array(z.string()),
  }),
);
```
Within your prompt, provide the name of the registered schema:
```dotprompt
---
model: googleai/gemini-2.5-flash-latest
output:
schema: MenuItemSchema
---
```
The Dotprompt library will automatically resolve the name to the underlying registered Zod schema. You can then utilize the schema to strongly type the output of a Dotprompt:
```ts
const menuPrompt = ai.prompt<
  z.ZodTypeAny, // Input schema
  typeof MenuItemSchema, // Output schema
  z.ZodTypeAny // Custom options schema
>('menu');
const { output } = await menuPrompt({ theme: 'medieval' });
// Now data is strongly typed as MenuItemSchema:
const dishName = output?.dishname;
const description = output?.description;
```
## Prompt templates
[Section titled “Prompt templates”](#prompt-templates)
The portion of a `.prompt` file that follows the front matter (if present) is the prompt itself, which will be passed to the model. While this prompt could be a simple text string, very often you will want to incorporate user input into the prompt. To do so, you can specify your prompt using the [Handlebars](https://handlebarsjs.com/guide/) templating language. Prompt templates can include placeholders that refer to the values defined by your prompt’s input schema.
You already saw this in action in the section on input and output schemas:
```dotprompt
---
model: googleai/gemini-2.5-flash
input:
  schema:
    theme?: string
  default:
    theme: "pirate"
output:
  schema:
    dishname: string
    description: string
    calories: integer
    allergens(array): string
---
Invent a menu item for a {{theme}} themed restaurant.
```
In this example, the Handlebars expression, `{{theme}}`, resolves to the value of the input’s `theme` property when you run the prompt. To pass input to the prompt, call the prompt as in the following example:
```ts
const menuPrompt = ai.prompt('menu');
const { output } = await menuPrompt({ theme: 'medieval' });
```
Note that because the input schema declared the `theme` property to be optional and provided a default, you could have omitted the property, and the prompt would have resolved using the default value.
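For instance, a call that omits `theme` renders the prompt with the default value from the front matter; a minimal sketch using the `menu` prompt above:

```ts
const menuPrompt = ai.prompt('menu');

// No theme supplied, so the default ("pirate") declared in the front matter is used.
const { output } = await menuPrompt({});
```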
Handlebars templates also support some limited logical constructs. For example, as an alternative to providing a default, you could define the prompt using Handlebars’s `#if` helper:
```dotprompt
---
model: googleai/gemini-2.5-flash
input:
  schema:
    theme?: string
---
Invent a menu item for a {{#if theme}}{{theme}} themed{{/if}} restaurant.
```
In this example, the prompt renders as “Invent a menu item for a restaurant” when the `theme` property is unspecified.
See the [Handlebars documentation](https://handlebarsjs.com/guide/builtin-helpers.html) for information on all of the built-in logical helpers.
In addition to properties defined by your input schema, your templates can also refer to values automatically defined by Genkit. The next few sections describe these automatically-defined values and how you can use them.
### Multi-message prompts
[Section titled “Multi-message prompts”](#multi-message-prompts)
By default, Dotprompt constructs a single message with a “user” role. However, some prompts are best expressed as a combination of multiple messages, such as a system prompt.
The `{{role}}` helper provides a simple way to construct multi-message prompts:
```dotprompt
---
model: vertexai/gemini-2.5-flash
input:
  schema:
    userQuestion: string
---
{{role "system"}}
You are a helpful AI assistant that really loves to talk about food. Try to work
food items into all of your conversations.
{{role "user"}}
{{userQuestion}}
```
Note that your final prompt must contain at least one `user` role.
### Multi-modal prompts
[Section titled “Multi-modal prompts”](#multi-modal-prompts)
For models that support multimodal input, such as images alongside text, you can use the `{{media}}` helper:
```dotprompt
---
model: vertexai/gemini-2.5-flash
input:
  schema:
    photoUrl: string
---
Describe this image in a detailed paragraph:
{{media url=photoUrl}}
```
The URL can be an `https:` URL or a base64-encoded `data:` URI for “inline” image usage. In code, this would be:
```ts
const multimodalPrompt = ai.prompt('multimodal');
const { text } = await multimodalPrompt({
  photoUrl: 'https://example.com/photo.jpg',
});
```
See also [Multimodal input](/docs/models#multimodal-input), on the Models page, for an example of constructing a `data:` URL.
### Partials
[Section titled “Partials”](#partials)
Partials are reusable templates that can be included inside any prompt. Partials can be especially helpful for related prompts that share common behavior.
When loading a prompt directory, any file prefixed with an underscore (`_`) is considered a partial. So a file `_personality.prompt` might contain:
```dotprompt
You should speak like a {{#if style}}{{style}}{{else}}helpful assistant{{/if}}.
```
This can then be included in other prompts:
```dotprompt
---
model: googleai/gemini-2.5-flash
input:
  schema:
    name: string
    style?: string
---
{{role "system"}}
{{>personality style=style}}
{{role "user"}}
Give the user a friendly greeting.
User's Name: {{name}}
```
Partials are inserted using the `{{>NAME_OF_PARTIAL args...}}` syntax. If no arguments are provided to the partial, it executes with the same context as the parent prompt.
Partials accept both named arguments as above or a single positional argument representing the context. This can be helpful for tasks such as rendering members of a list.
**\_destination.prompt**
```dotprompt
- {{name}} ({{country}})
```
**chooseDestination.prompt**
```dotprompt
---
model: googleai/gemini-2.5-flash
input:
  schema:
    destinations(array):
      name: string
      country: string
---
Help the user decide between these vacation destinations:
{{#each destinations}}
{{>destination this}}
{{/each}}
```
#### Defining partials in code
[Section titled “Defining partials in code”](#defining-partials-in-code)
You can also define partials in code using `definePartial`:
```ts
ai.definePartial('personality', 'Talk like a {{#if style}}{{style}}{{else}}helpful assistant{{/if}}.');
```
Code-defined partials are available in all prompts.
### Defining Custom Helpers
[Section titled “Defining Custom Helpers”](#defining-custom-helpers)
You can define custom helpers to process and manage data inside of a prompt. Helpers are registered globally using `defineHelper`:
```ts
ai.defineHelper('shout', (text: string) => text.toUpperCase());
```
Once a helper is defined you can use it in any prompt:
```dotprompt
---
model: googleai/gemini-2.5-flash
input:
  schema:
    name: string
---
HELLO, {{shout name}}!!!
```
## Prompt variants
[Section titled “Prompt variants”](#prompt-variants)
Because prompt files are just text, you can (and should!) commit them to your version control system, allowing you to compare changes over time easily. Often, tweaked versions of prompts can only be fully tested in a production environment side-by-side with existing versions. Dotprompt supports this through its variants feature.
To create a variant, create a `[name].[variant].prompt` file. For instance, if you were using Gemini 2.0 Flash in your prompt but wanted to see if Gemini 2.5 Pro would perform better, you might create two files:
* `my_prompt.prompt`: the “baseline” prompt
* `my_prompt.gemini25pro.prompt`: a variant named `gemini25pro`
To use a prompt variant, specify the variant option when loading:
```ts
const myPrompt = ai.prompt('my_prompt', { variant: 'gemini25pro' });
```
The name of the variant is included in the metadata of generation traces, so you can compare and contrast actual performance between variants in the Genkit trace inspector.
## Defining prompts in code
[Section titled “Defining prompts in code”](#defining-prompts-in-code)
All of the examples discussed so far have assumed that your prompts are defined in individual `.prompt` files in a single directory (or subdirectories thereof), accessible to your app at runtime. Dotprompt is designed around this setup, and its authors consider it to be the best developer experience overall.
However, if you have use cases that are not well supported by this setup, you can also define prompts in code using the `definePrompt()` function.
The options you pass to `definePrompt()` are analogous to the front matter block of a `.prompt` file; the template itself can be specified either as a Handlebars string (the `prompt` property), as in a prompt file, or as a function that returns the messages to send to the model (the `messages` property):
```ts
const myPrompt = ai.definePrompt({
  name: 'myPrompt',
  model: 'googleai/gemini-2.5-flash',
  input: {
    schema: z.object({
      name: z.string(),
    }),
  },
  prompt: 'Hello, {{name}}. How are you today?',
});
```
```ts
const myPrompt = ai.definePrompt({
  name: 'myPrompt',
  model: 'googleai/gemini-2.5-flash',
  input: {
    schema: z.object({
      name: z.string(),
    }),
  },
  messages: async (input) => {
    return [
      {
        role: 'user',
        content: [{ text: `Hello, ${input.name}. How are you today?` }],
      },
    ];
  },
});
```
# Error Types
> Learn about Genkit's specialized error types, GenkitError and UserFacingError, and how they are used to differentiate between internal and user-facing issues.
Genkit knows about two specialized types: `GenkitError` and `UserFacingError`. `GenkitError` is intended for use by Genkit itself or Genkit plugins. `UserFacingError` is intended for [`ContextProviders`](/docs/deploy-node) and your code. The separation between these two error types helps you better understand where your error is coming from.
Genkit plugins for web hosting (e.g. [`@genkit-ai/express`](https://js.api.genkit.dev/modules/_genkit-ai_express.html) or [`@genkit-ai/next`](https://js.api.genkit.dev/modules/_genkit-ai_next.html)) SHOULD capture all other Error types and instead report them as an internal error in the response. This adds a layer of security to your application by ensuring that internal details of your application do not leak to attackers.
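For example, your own flow code might throw a `UserFacingError` for problems that are safe to report to callers, while any other thrown `Error` is surfaced only as a generic internal error. This is a minimal sketch; it assumes `UserFacingError` is exported from the `genkit` package and takes a status name followed by a message:

```ts
import { genkit, z, UserFacingError } from 'genkit';

const ai = genkit({ plugins: [] });

export const getOrder = ai.defineFlow(
  {
    name: 'getOrder',
    inputSchema: z.object({ orderId: z.string() }),
    outputSchema: z.string(),
  },
  async ({ orderId }, { context }) => {
    if (!context.auth?.uid) {
      // Intended to be returned to the caller as-is by hosting plugins.
      throw new UserFacingError('UNAUTHENTICATED', 'You must be signed in to look up orders.');
    }
    // Any other Error thrown here should be reported as an internal error instead.
    return `Order ${orderId} is on its way.`;
  },
);
```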
# Evaluation
> Learn about Genkit's evaluation capabilities, including inference-based and raw evaluation, dataset creation, and how to use the Developer UI and CLI for testing and analysis.
Evaluation is a form of testing that helps you validate your LLM’s responses and ensure they meet your quality bar.
Genkit supports third-party evaluation tools through plugins, paired with powerful observability features that provide insight into the runtime state of your LLM-powered applications. Genkit tooling helps you automatically extract data including inputs, outputs, and information from intermediate steps to evaluate the end-to-end quality of LLM responses as well as understand the performance of your system’s building blocks.
### Types of evaluation
[Section titled “Types of evaluation”](#types-of-evaluation)
Genkit supports two types of evaluation:
* **Inference-based evaluation**: This type of evaluation runs against a collection of pre-determined inputs, assessing the corresponding outputs for quality.
This is the most common evaluation type, suitable for most use cases. This approach tests a system’s actual output for each evaluation run.
You can perform the quality assessment manually, by visually inspecting the results. Alternatively, you can automate the assessment by using an evaluation metric.
* **Raw evaluation**: This type of evaluation directly assesses the quality of inputs without any inference. This approach typically is used with automated evaluation using metrics. All required fields for evaluation (e.g., `input`, `context`, `output` and `reference`) must be present in the input dataset. This is useful when you have data coming from an external source (e.g., collected from your production traces) and you want to have an objective measurement of the quality of the collected data.
For more information, see the [Advanced use](#advanced-use) section of this page.
This section explains how to perform inference-based evaluation using Genkit.
## Quick start
[Section titled “Quick start”](#quick-start)
### Setup
[Section titled “Setup”](#setup)
1. Use an existing Genkit app or create a new one by following our [Get started](/docs/get-started) guide.
2. Add the following code to define a simple RAG application to evaluate. For this guide, we use a dummy retriever that always returns the same documents.
```js
import { genkit, z, Document } from 'genkit';
import { googleAI } from '@genkit-ai/googleai';
// Initialize Genkit
export const ai = genkit({ plugins: [googleAI()] });
// Dummy retriever that always returns the same docs
export const dummyRetriever = ai.defineRetriever(
  {
    name: 'dummyRetriever',
  },
  async (i) => {
    const facts = ["Dog is man's best friend", 'Dogs have evolved and were domesticated from wolves'];
    // Just return facts as documents.
    return { documents: facts.map((t) => Document.fromText(t)) };
  },
);

// A simple question-answering flow
export const qaFlow = ai.defineFlow(
  {
    name: 'qaFlow',
    inputSchema: z.object({ query: z.string() }),
    outputSchema: z.object({ answer: z.string() }),
  },
  async ({ query }) => {
    const factDocs = await ai.retrieve({
      retriever: dummyRetriever,
      query,
    });

    const { text } = await ai.generate({
      model: googleAI.model('gemini-2.5-flash'),
      prompt: `Answer this question with the given context ${query}`,
      docs: factDocs,
    });
    return { answer: text };
  },
);
```
3. (Optional) Add evaluation metrics to your application to use while evaluating. This guide uses the `MALICIOUSNESS` metric from the `genkitEval` plugin.
```js
import { genkitEval, GenkitMetric } from '@genkit-ai/evaluator';
import { googleAI } from '@genkit-ai/googleai';
export const ai = genkit({
  plugins: [
    ...
    // Add this plugin to your Genkit initialization block
    genkitEval({
      judge: googleAI.model('gemini-2.5-flash'),
      metrics: [GenkitMetric.MALICIOUSNESS],
    }),
  ],
});
```
**Note:** The configuration above requires installation of the [`@genkit-ai/evaluator`](https://www.npmjs.com/package/@genkit-ai/evaluator) package.
```bash
npm install @genkit-ai/evaluator
```
4. Start your Genkit application.
```bash
genkit start -- <command to run your app>
```
### Create a dataset
[Section titled “Create a dataset”](#create-a-dataset)
Create a dataset to define the examples we want to use for evaluating our flow.
1. Go to the Dev UI at `http://localhost:4000` and click the **Datasets** button to open the Datasets page.
2. Click on the **Create Dataset** button to open the create dataset dialog.
a. Provide a `datasetId` for your new dataset. This guide uses `myFactsQaDataset`.
b. Select `Flow` dataset type.
c. Leave the validation target field empty and click **Save**
3. Your new dataset page appears, showing an empty dataset. Add examples to it by following these steps:
a. Click the **Add example** button to open the example editor panel.
b. Only the `input` field is required. Enter `{"query": "Who is man's best friend?"}` in the `input` field, and click **Save** to add the example to your dataset.
c. Repeat steps (a) and (b) a couple more times to add more examples. This guide adds the following example inputs to the dataset:
```plaintext
{"query": "Can I give milk to my cats?"}
{"query": "From which animals did dogs evolve?"}
```
By the end of this step, your dataset should have 3 examples in it, with the values mentioned above.
### Run evaluation and view results
[Section titled “Run evaluation and view results”](#run-evaluation-and-view-results)
To start evaluating the flow, click the **Run new evaluation** button on your dataset page. You can also start a new evaluation from the *Evaluations* tab.
1. Select the `Flow` radio button to evaluate a flow.
2. Select `qaFlow` as the target flow to evaluate.
3. Select `myFactsQaDataset` as the target dataset to use for evaluation.
4. (Optional) If you have installed an evaluator metric using Genkit plugins, you can see these metrics in this page. Select the metrics that you want to use with this evaluation run. This is entirely optional: Omitting this step will still return the results in the evaluation run, but without any associated metrics.
5. Finally, click **Run evaluation** to start evaluation. Depending on the flow you’re testing, this may take a while. Once the evaluation is complete, a success message appears with a link to view the results. Click on the link to go to the *Evaluation details* page.
You can see the details of your evaluation on this page, including original input, extracted context and metrics (if any).
## Core concepts
[Section titled “Core concepts”](#core-concepts)
### Terminology
[Section titled “Terminology”](#terminology)
* **Evaluation**: An evaluation is a process that assesses system performance. In Genkit, such a system is usually a Genkit primitive, such as a flow or a model. An evaluation can be automated or manual (human evaluation).
* **Bulk inference**: Inference is the act of running an input on a flow or model to get the corresponding output. Bulk inference involves performing inference on multiple inputs simultaneously.
* **Metric**: An evaluation metric is a criterion on which an inference is scored. Examples include accuracy, faithfulness, maliciousness, whether the output is in English, etc.
* **Dataset**: A dataset is a collection of examples to use for inference-based evaluation. A dataset typically consists of `input` and optional `reference` fields. The `reference` field does not affect the inference step of evaluation, but it is passed verbatim to any evaluation metrics. In Genkit, you can create a dataset through the Dev UI. There are two types of datasets in Genkit: *Flow* datasets and *Model* datasets.
### Schema validation
[Section titled “Schema validation”](#schema-validation)
Depending on the type, datasets have schema validation support in the Dev UI:
* Flow datasets support validation of the `input` and `reference` fields of the dataset against a flow in the Genkit application. Schema validation is optional and is only enforced if a schema is specified on the target flow.
* Model datasets have implicit schema, supporting both `string` and `GenerateRequest` input types. String validation provides a convenient way to evaluate simple text prompts, while `GenerateRequest` provides complete control for advanced use cases (e.g. providing model parameters, message history, tools, etc). You can find the full schema for `GenerateRequest` in our [API reference docs](https://js.api.genkit.dev/interfaces/genkit._.GenerateRequest.html).
Note: Schema validation is a helper tool for editing examples, but it is possible to save an example with an invalid schema. These examples may fail when running an evaluation.
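For reference, a model dataset example that uses the `GenerateRequest` input type might look something like the following sketch (the exact shape is defined in the API reference linked above):
```json
{
  "input": {
    "messages": [
      {
        "role": "user",
        "content": [{ "text": "What is the French word for Cheese?" }]
      }
    ],
    "config": { "temperature": 0.2 }
  }
}
```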
## Supported evaluators
[Section titled “Supported evaluators”](#supported-evaluators)
### Genkit evaluators
[Section titled “Genkit evaluators”](#genkit-evaluators)
Genkit includes a small number of native evaluators, inspired by [RAGAS](https://docs.ragas.io/en/stable/), to help you get started:
* Faithfulness — Measures the factual consistency of the generated answer against the given context
* Answer Relevancy — Assesses how pertinent the generated answer is to the given prompt
* Maliciousness — Measures whether the generated output intends to deceive, harm, or exploit
### Evaluator plugins
[Section titled “Evaluator plugins”](#evaluator-plugins)
Genkit supports additional evaluators through plugins, like the Vertex Rapid Evaluators, which you can access via the [VertexAI Plugin](/docs/plugins/vertex-ai#evaluators).
## Advanced use
[Section titled “Advanced use”](#advanced-use)
### Evaluation comparison
[Section titled “Evaluation comparison”](#evaluation-comparison)
The Developer UI offers visual tools for side-by-side comparison of multiple evaluation runs. This feature allows you to analyze variations across different executions within a unified interface, making it easier to assess changes in output quality. Additionally, you can highlight outputs based on the performance of specific metrics, indicating improvements or regressions.
When comparing evaluations, one run is designated as the *Baseline*. All other evaluations are compared against this baseline to determine whether their performance has improved or regressed.
 
#### Prerequisites
[Section titled “Prerequisites”](#prerequisites)
To use the evaluation comparison feature, the following conditions must be met:
* Evaluations must originate from a dataset source. Evaluations from file sources are not comparable.
* All evaluations being compared must be from the same dataset.
* For metric highlighting, all evaluations must use at least one common metric that produces a `number` or `boolean` score.
#### Comparing evaluations
[Section titled “Comparing evaluations”](#comparing-evaluations)
1. Ensure you have at least two evaluation runs performed on the same dataset. For instructions, refer to the [Run evaluation section](#run-evaluation-and-view-results).
2. In the Developer UI, navigate to the **Datasets** page.
3. Select the relevant dataset and open its **Evaluations** tab. You should see all evaluation runs associated with that dataset.
4. Choose one evaluation to serve as the baseline for comparison.
5. On the evaluation results page, click the **+ Comparison** button. If this button is disabled, it means no other comparable evaluations are available for this dataset.
6. A new column will appear with a dropdown menu. Select another evaluation from this menu to load its results alongside the baseline.
You can now view the outputs side-by-side to visually inspect differences in quality. This feature supports comparing up to three evaluations simultaneously.
##### Metric highlighting (Optional)
[Section titled “Metric highlighting (Optional)”](#metric-highlighting-optional)
If your evaluations include metrics, you can enable metric highlighting to color-code the results. This feature helps you quickly identify changes in performance: improvements are colored green, while regressions are red.
Note that highlighting is only supported for numeric and boolean metrics, and the selected metric must be present in all evaluations being compared.
To enable metric highlighting:
1. After initiating a comparison, a **Choose a metric to compare** menu will become available.
2. Select a metric from the dropdown. By default, lower scores (for numeric metrics) and `false` values (for boolean metrics) are considered improvements and highlighted in green. You can reverse this logic by ticking the checkbox in the menu.
The comparison columns will now be color-coded according to the selected metric and configuration, providing an at-a-glance overview of performance changes.
### Evaluation using the CLI
[Section titled “Evaluation using the CLI”](#evaluation-using-the-cli)
The Genkit CLI provides a rich set of commands for performing evaluation. This is especially useful in environments where the Dev UI is not available (e.g., in a CI/CD workflow).
There are three main evaluation commands: `eval:flow`, `eval:extractData`, and `eval:run`.
#### `eval:flow` command
[Section titled “eval:flow command”](#evalflow-command)
The `eval:flow` command runs inference-based evaluation on an input dataset. This dataset may be provided either as a JSON file or by referencing an existing dataset in your Genkit runtime.
```bash
# Referencing an existing dataset
genkit eval:flow qaFlow --input myFactsQaDataset
# or, using a dataset from a file
genkit eval:flow qaFlow --input testInputs.json
```
Note: Make sure that you start your Genkit app before running these CLI commands.
```bash
genkit start -- <command to run your app>
```
Here, `testInputs.json` should be an array of objects containing an `input` field and an optional `reference` field, like below:
```json
[
  {
    "input": { "query": "What is the French word for Cheese?" }
  },
  {
    "input": { "query": "What green vegetable looks like cauliflower?" },
    "reference": "Broccoli"
  }
]
```
If your flow requires auth, you may specify it using the `--context` argument:
```bash
genkit eval:flow qaFlow --input testInputs.json --context '{"auth": {"email_verified": true}}'
```
By default, the `eval:flow` and `eval:run` commands use all available metrics for evaluation. To run on a subset of the configured evaluators, use the `--evaluators` flag and provide a comma-separated list of evaluators by name:
```bash
genkit eval:flow qaFlow --input testInputs.json --evaluators=genkitEval/maliciousness,genkitEval/answer_relevancy
```
You can view the results of your evaluation run in the Dev UI at `localhost:4000/evaluate`.
#### `eval:extractData` and `eval:run` commands
[Section titled “eval:extractData and eval:run commands”](#evalextractdata-and-evalrun-commands)
To support *raw evaluation*, Genkit provides tools to extract data from traces and run evaluation metrics on extracted data. This is useful, for example, if you are using a different framework for evaluation or if you are collecting inferences from a different environment to test locally for output quality.
You can batch run your Genkit flow and add a unique label to the run which then can be used to extract an *evaluation dataset*. A raw evaluation dataset is a collection of inputs for evaluation metrics, *without* running any prior inference.
Run your flow over your test inputs:
```bash
genkit flow:batchRun qaFlow testInputs.json --label firstRunSimple
```
Extract the evaluation data:
```bash
genkit eval:extractData qaFlow --label firstRunSimple --output factsEvalDataset.json
```
The exported data has a format different from the dataset format presented earlier. This is because this data is intended to be used with evaluation metrics directly, without any inference step. Here is the syntax of the extracted data.
```json
Array<{
"testCaseId": string,
"input": any,
"output": any,
"context": any[],
"traceIds": string[],
}>;
```
The data extractor automatically locates retrievers and adds the produced docs to the context array. You can run evaluation metrics on this extracted dataset using the `eval:run` command.
```bash
genkit eval:run factsEvalDataset.json
```
By default, `eval:run` runs against all configured evaluators, and as with `eval:flow`, results for `eval:run` appear in the evaluation page of Developer UI, located at `localhost:4000/evaluate`.
### Batching evaluations
[Section titled “Batching evaluations”](#batching-evaluations)
You can speed up evaluations by processing the inputs in batches using the CLI and Dev UI. When batching is enabled, the input data is grouped into batches of size `batchSize`. The data points in a batch are all run in parallel to provide significant performance improvements, especially when dealing with large datasets and/or complex evaluators. By default (when the flag is omitted), batching is disabled.
The `batchSize` option has been integrated into the `eval:flow` and `eval:run` CLI commands. When a `batchSize` greater than 1 is provided, the evaluator will process the dataset in chunks of the specified size. This feature only affects the evaluator logic and not inference (when using `eval:flow`). Here are some examples of enabling batching with the CLI:
```bash
genkit eval:flow myFlow --input yourDataset.json --evaluators=custom/myEval --batchSize 10
```
Or, with `eval:run`
```bash
genkit eval:run yourDataset.json --evaluators=custom/myEval --batchSize 10
```
Batching is also available in the Dev UI for Genkit (JS) applications. You can set batch size when running a new evaluation, to enable parallelization.
### Custom extractors
[Section titled “Custom extractors”](#custom-extractors)
Genkit provides reasonable default logic for extracting the necessary fields (`input`, `output` and `context`) while doing an evaluation. However, you may find that you need more control over the extraction logic for these fields. Genkit supports custom extractors to achieve this. You can provide custom extractors to be used in the `eval:extractData` and `eval:flow` commands.
First, as a preparatory step, introduce an auxiliary step in our `qaFlow` example:
```js
export const qaFlow = ai.defineFlow(
  {
    name: 'qaFlow',
    inputSchema: z.object({ query: z.string() }),
    outputSchema: z.object({ answer: z.string() }),
  },
  async ({ query }) => {
    const factDocs = await ai.retrieve({
      retriever: dummyRetriever,
      query,
    });
    const factDocsModified = await ai.run('factModified', async () => {
      // Let us use only facts that are considered silly. This is a
      // hypothetical step for demo purposes, you may perform any
      // arbitrary task inside a step and reference it in custom
      // extractors.
      //
      // Assume you have a method that checks if a fact is silly
      return factDocs.filter((d) => isSillyFact(d.text));
    });

    const { text } = await ai.generate({
      model: googleAI.model('gemini-2.5-flash'),
      prompt: `Answer this question with the given context ${query}`,
      docs: factDocsModified,
    });
    return { answer: text };
  },
);
```
Next, configure a custom extractor to use the output of the `factModified` step when evaluating this flow.
If you don’t already have a tools config file for configuring custom extractors, add one named `genkit-tools.conf.js` to your project root.
```bash
cd /path/to/your/genkit/app
touch genkit-tools.conf.js
```
In the tools config file, add the following code:
```js
module.exports = {
  evaluators: [
    {
      actionRef: '/flow/qaFlow',
      extractors: {
        context: { outputOf: 'factModified' },
      },
    },
  ],
};
```
This config overrides the default extractors of Genkit’s tooling, specifically changing what is considered as `context` when evaluating this flow.
Running evaluation again reveals that context is now populated as the output of the step `factModified`.
```bash
genkit eval:flow qaFlow --input testInputs.json
```
Evaluation extractors are specified as follows:
* The `evaluators` field accepts an array of `EvaluatorConfig` objects, which are scoped by `flowName`
* `extractors` is an object that specifies the extractor overrides. The current supported keys in `extractors` are `[input, output, context]`. The acceptable value types are:
* `string` - this should be a step name, specified as a string. The output of this step is extracted for this key.
* `{ inputOf: string }` or `{ outputOf: string }` - These objects represent specific channels (input or output) of a step. For example, `{ inputOf: 'foo-step' }` would extract the input of step `foo-step` for this key.
* `(trace) => any;` - For further flexibility, you can provide a function that accepts a Genkit trace and returns an `any`-type value, and specify the extraction logic inside this function. Refer to `genkit/genkit-tools/common/src/types/trace.ts` for the exact TraceData schema. (A sketch of this form appears after the note below.)
**Note:** The data extracted by these extractors has the type of the value it references. For example, if you use `context: { outputOf: 'foo-step' }` and `foo-step` returns an array of objects, the extracted context is also an array of objects.
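For illustration, here is a hedged sketch of a `genkit-tools.conf.js` that uses the function form to extract context by walking the trace spans directly. The span attribute names (`genkit:name`, `genkit:output`) and the shape of `trace.spans` are assumptions based on the TraceData schema referenced above; check `trace.ts` for the exact fields before relying on them.
```js
module.exports = {
  evaluators: [
    {
      actionRef: '/flow/qaFlow',
      extractors: {
        // A function extractor receives the full trace and can return any value.
        context: (trace) => {
          const spans = Object.values(trace.spans || {});
          // Find the span recorded for the 'factModified' step and parse its output.
          const step = spans.find((s) => s.attributes?.['genkit:name'] === 'factModified');
          return step ? JSON.parse(step.attributes['genkit:output']) : [];
        },
      },
    },
  ],
};
```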
### Synthesizing test data using an LLM
[Section titled “Synthesizing test data using an LLM”](#synthesizing-test-data-using-an-llm)
Here is an example flow that uses a PDF file to generate potential user questions.
```ts
import { genkit, z } from 'genkit';
import { googleAI } from '@genkit-ai/googleai';
import { chunk } from 'llm-chunk'; // npm install llm-chunk
import path from 'path';
import { readFile } from 'fs/promises';
import pdf from 'pdf-parse'; // npm install pdf-parse
const ai = genkit({ plugins: [googleAI()] });
const chunkingConfig = {
minLength: 1000, // minimum number of characters per chunk
maxLength: 2000, // maximum number of characters per chunk
splitter: 'sentence', // paragraph | sentence
overlap: 100, // number of overlapping characters
delimiters: '', // regex for base split method
} as any;
async function extractText(filePath: string) {
const pdfFile = path.resolve(filePath);
const dataBuffer = await readFile(pdfFile);
const data = await pdf(dataBuffer);
return data.text;
}
export const synthesizeQuestions = ai.defineFlow(
{
name: 'synthesizeQuestions',
inputSchema: z.object({ filePath: z.string().describe('PDF file path') }),
outputSchema: z.object({
questions: z.array(
z.object({
query: z.string(),
}),
),
}),
},
async ({ filePath }) => {
filePath = path.resolve(filePath);
// `extractText` loads the PDF and extracts its contents as text.
const pdfTxt = await ai.run('extract-text', () => extractText(filePath));
const chunks = await ai.run('chunk-it', async () => chunk(pdfTxt, chunkingConfig));
const questions = [];
for (let i = 0; i < chunks.length; i++) {
const { text } = await ai.generate({
model: googleAI.model('gemini-2.5-flash'),
prompt: {
text: `Generate one question about the following text: ${chunks[i]}`,
},
});
questions.push({ query: text });
}
return { questions };
},
);
```
You can then use this command to export the data into a file and use it for evaluation.
```bash
genkit flow:run synthesizeQuestions '{"filePath": "my_input.pdf"}' --output synthesizedQuestions.json
```
# Defining AI workflows
> Learn how to define and manage AI workflows in Genkit using flows, which provide type safety, integration with the developer UI, and simplified deployment.
The core of your app’s AI features are generative model requests, but it’s rare that you can simply take user input, pass it to the model, and display the model output back to the user. Usually, there are pre- and post-processing steps that must accompany the model call. For example:
* Retrieving contextual information to send with the model call
* Retrieving the history of the user’s current session, for example in a chat app
* Using one model to reformat the user input in a way that’s suitable to pass to another model
* Evaluating the “safety” of a model’s output before presenting it to the user
* Combining the output of several models
All of these steps must work together for an AI-related task to succeed.
In Genkit, you represent this tightly-linked logic using a construct called a flow. Flows are written just like functions, using ordinary TypeScript code, but they add capabilities intended to ease the development of AI features:
* **Type safety**: Input and output schemas defined using Zod, which provides both static and runtime type checking
* **Integration with developer UI**: Debug flows independently of your application code using the developer UI. In the developer UI, you can run flows and view traces for each step of the flow.
* **Simplified deployment**: Deploy flows directly as web API endpoints, using Cloud Functions for Firebase or any platform that can host a web app.
Unlike similar features in other frameworks, Genkit’s flows are lightweight and unobtrusive, and don’t force your app to conform to any specific abstraction. All of the flow’s logic is written in standard TypeScript, and code inside a flow doesn’t need to be flow-aware.
## Defining and calling flows
[Section titled “Defining and calling flows”](#defining-and-calling-flows)
In its simplest form, a flow just wraps a function. The following example wraps a function that calls `generate()`:
```typescript
export const menuSuggestionFlow = ai.defineFlow(
{
name: 'menuSuggestionFlow',
inputSchema: z.object({ theme: z.string() }),
outputSchema: z.object({ menuItem: z.string() }),
},
async ({ theme }) => {
const { text } = await ai.generate({
model: googleAI.model('gemini-2.5-flash'),
prompt: `Invent a menu item for a ${theme} themed restaurant.`,
});
return { menuItem: text };
},
);
```
Just by wrapping your `generate()` calls like this, you add some functionality: doing so lets you run the flow from the Genkit CLI and from the developer UI, and is a requirement for several of Genkit’s features, including deployment and observability (later sections discuss these topics).
### Input and output schemas
[Section titled “Input and output schemas”](#input-and-output-schemas)
One of the most important advantages Genkit flows have over directly calling a model API is type safety of both inputs and outputs. When defining flows, you can define schemas for them using Zod, in much the same way as you define the output schema of a `generate()` call; however, unlike with `generate()`, you can also specify an input schema.
While it’s not mandatory to wrap your input and output schemas in `z.object()`, it’s considered best practice for these reasons:
* **Better developer experience**: Wrapping schemas in objects provides a better experience in the Developer UI by giving you labeled input fields.
* **Future-proof API design**: Object-based schemas allow for easy extensibility in the future. You can add new fields to your input or output schemas without breaking existing clients, which is a core principle of robust API design.
All examples in this documentation use object-based schemas to follow these best practices.
Here’s a refinement of the last example, which defines a flow whose output is a structured object rather than a single string:
```typescript
import { z } from 'genkit';
const MenuItemSchema = z.object({
dishname: z.string(),
description: z.string(),
});
export const menuSuggestionFlowWithSchema = ai.defineFlow(
{
name: 'menuSuggestionFlow',
inputSchema: z.object({ theme: z.string() }),
outputSchema: MenuItemSchema,
},
async ({ theme }) => {
const { output } = await ai.generate({
model: googleAI.model('gemini-2.5-flash'),
prompt: `Invent a menu item for a ${theme} themed restaurant.`,
output: { schema: MenuItemSchema },
});
if (output == null) {
throw new Error("Response doesn't satisfy schema.");
}
return output;
},
);
```
Note that the schema of a flow does not necessarily have to line up with the schema of the `generate()` calls within the flow (in fact, a flow might not even contain `generate()` calls). Here’s a variation of the example that passes a schema to `generate()`, but uses the structured output to format a simple string, which the flow returns.
```typescript
export const menuSuggestionFlowMarkdown = ai.defineFlow(
{
name: 'menuSuggestionFlow',
inputSchema: z.object({ theme: z.string() }),
outputSchema: z.object({ formattedMenuItem: z.string() }),
},
async ({ theme }) => {
const { output } = await ai.generate({
model: googleAI.model('gemini-2.5-flash'),
prompt: `Invent a menu item for a ${theme} themed restaurant.`,
output: { schema: MenuItemSchema },
});
if (output == null) {
throw new Error("Response doesn't satisfy schema.");
}
return {
formattedMenuItem: `**${output.dishname}**: ${output.description}`
};
},
);
```
### Calling flows
[Section titled “Calling flows”](#calling-flows)
Once you’ve defined a flow, you can call it from your Node.js code:
```typescript
const { menuItem } = await menuSuggestionFlow({ theme: 'bistro' });
```
The argument to the flow must conform to the input schema, if you defined one.
If you defined an output schema, the flow response will conform to it. For example, if you set the output schema to `MenuItemSchema`, the flow output will contain its properties:
```typescript
const { dishname, description } = await menuSuggestionFlowWithSchema({ theme: 'bistro' });
```
## Streaming flows
[Section titled “Streaming flows”](#streaming-flows)
Flows support streaming using an interface similar to `generate()`’s streaming interface. Streaming is useful when your flow generates a large amount of output, because you can present the output to the user as it’s being generated, which improves the perceived responsiveness of your app. As a familiar example, chat-based LLM interfaces often stream their responses to the user as they are generated.
Here’s an example of a flow that supports streaming:
```typescript
export const menuSuggestionStreamingFlow = ai.defineFlow(
{
name: 'menuSuggestionFlow',
inputSchema: z.object({ theme: z.string() }),
streamSchema: z.string(),
outputSchema: z.object({ theme: z.string(), menuItem: z.string() }),
},
async ({ theme }, { sendChunk }) => {
const { stream, response } = ai.generateStream({
model: googleAI.model('gemini-2.5-flash'),
prompt: `Invent a menu item for a ${theme} themed restaurant.`,
});
for await (const chunk of stream) {
// Here, you could process the chunk in some way before sending it to
// the output stream via sendChunk(). In this example, we output
// the text of the chunk, unmodified.
sendChunk(chunk.text);
}
const { text: menuItem } = await response;
return {
theme,
menuItem,
};
},
);
```
* The `streamSchema` option specifies the type of values your flow streams. This does not necessarily need to be the same type as the `outputSchema`, which is the type of the flow’s complete output.
* The second parameter to your flow definition is called `sideChannel`. It provides features such as request context and the `sendChunk` callback. The `sendChunk` callback takes a single parameter, of the type specified by `streamSchema`. Whenever data becomes available within your flow, send the data to the output stream by calling this function.
In the above example, the values streamed by the flow are directly coupled to the values streamed by the `generate()` call inside the flow. Although this is often the case, it doesn’t have to be: you can output values to the stream using the callback as often as is useful for your flow.
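For example, a flow can stream values that are not model chunks at all, such as progress updates. Here is a minimal sketch of that idea; the flow name and status strings are illustrative:
```typescript
export const menuResearchFlow = ai.defineFlow(
  {
    name: 'menuResearchFlow',
    inputSchema: z.object({ theme: z.string() }),
    streamSchema: z.string(),
    outputSchema: z.object({ menuItem: z.string() }),
  },
  async ({ theme }, { sendChunk }) => {
    // Stream a status message before the model call starts.
    sendChunk('Looking up inspiration...');
    const { text } = await ai.generate({
      model: googleAI.model('gemini-2.5-flash'),
      prompt: `Invent a menu item for a ${theme} themed restaurant.`,
    });
    // Stream another status value once generation completes.
    sendChunk('Done!');
    return { menuItem: text };
  },
);
```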
### Calling streaming flows
[Section titled “Calling streaming flows”](#calling-streaming-flows)
Streaming flows are also callable, but instead of awaiting a promise, you call the flow’s `stream()` method, which immediately returns a response object:
```typescript
const response = menuSuggestionStreamingFlow.stream({ theme: 'Danube' });
```
The response object has a `stream` property, which you can use to iterate over the streaming output of the flow as it’s generated:
```typescript
for await (const chunk of response.stream) {
console.log('chunk', chunk);
}
```
You can also get the complete output of the flow, as you can with a non-streaming flow:
```typescript
const output = await response.output;
```
Note that the streaming output of a flow might not be the same type as the complete output; the streaming output conforms to `streamSchema`, whereas the complete output conforms to `outputSchema`.
## Running flows from the command line
[Section titled “Running flows from the command line”](#running-flows-from-the-command-line)
You can run flows from the command line using the Genkit CLI tool:
```bash
genkit flow:run menuSuggestionFlow '{"theme": "French"}'
```
For streaming flows, you can print the streaming output to the console by adding the `-s` flag:
```bash
genkit flow:run menuSuggestionFlow '{"theme": "French"}' -s
```
Running a flow from the command line is useful for testing a flow, or for running flows that perform tasks needed on an ad hoc basis—for example, to run a flow that ingests a document into your vector database.
## Debugging flows
[Section titled “Debugging flows”](#debugging-flows)
One of the advantages of encapsulating AI logic within a flow is that you can test and debug the flow independently from your app using the Genkit developer UI.
To start the developer UI, run the following commands from your project directory:
```bash
genkit start -- tsx --watch src/your-code.ts
```
From the **Run** tab of developer UI, you can run any of the flows defined in your project:

After you’ve run a flow, you can inspect a trace of the flow invocation by either clicking **View trace** or looking on the **Inspect** tab.
In the trace viewer, you can see details about the execution of the entire flow, as well as details for each of the individual steps within the flow. For example, consider the following flow, which contains several generation requests:
```typescript
const PrixFixeMenuSchema = z.object({
starter: z.string(),
soup: z.string(),
main: z.string(),
dessert: z.string(),
});
export const complexMenuSuggestionFlow = ai.defineFlow(
{
name: 'complexMenuSuggestionFlow',
inputSchema: z.object({ theme: z.string() }),
outputSchema: PrixFixeMenuSchema,
},
async ({ theme }): Promise<z.infer<typeof PrixFixeMenuSchema>> => {
const chat = ai.chat({ model: googleAI.model('gemini-2.5-flash') });
await chat.send('What makes a good prix fixe menu?');
await chat.send(
'What are some ingredients, seasonings, and cooking techniques that ' + `would work for a ${theme} themed menu?`,
);
const { output } = await chat.send({
prompt: `Based on our discussion, invent a prix fixe menu for a ${theme} ` + 'themed restaurant.',
output: {
schema: PrixFixeMenuSchema,
},
});
if (!output) {
throw new Error('No data generated.');
}
return output;
},
);
```
When you run this flow, the trace viewer shows you details about each generation request including its output:

### Flow steps
[Section titled “Flow steps”](#flow-steps)
In the last example, you saw that each `generate()` call showed up as a separate step in the trace viewer. Each of Genkit’s fundamental actions shows up as a separate step of a flow:
* `generate()`
* `Chat.send()`
* `embed()`
* `index()`
* `retrieve()`
If you want to include code other than the above in your traces, you can do so by wrapping the code in a `run()` call. You might do this for calls to third-party libraries that are not Genkit-aware, or for any critical section of code.
For example, here’s a flow with two steps: the first step retrieves a menu using some unspecified method, and the second step includes the menu as context for a `generate()` call.
```ts
export const menuQuestionFlow = ai.defineFlow(
{
name: 'menuQuestionFlow',
inputSchema: z.object({ question: z.string() }),
outputSchema: z.object({ answer: z.string() }),
},
async ({ question }): Promise<{ answer: string }> => {
const menu = await ai.run('retrieve-daily-menu', async (): Promise<string> => {
// Retrieve today's menu. (This could be a database access or simply
// fetching the menu from your website.)
// ...
return menu;
});
const { text } = await ai.generate({
model: googleAI.model('gemini-2.5-flash'),
system: "Help the user answer questions about today's menu.",
prompt: question,
docs: [{ content: [{ text: menu }] }],
});
return { answer: text };
},
);
```
Because the retrieval step is wrapped in a `run()` call, it’s included as a step in the trace viewer:

## Deploying flows
[Section titled “Deploying flows”](#deploying-flows)
You can deploy your flows directly as web API endpoints, ready for you to call from your app clients. Deployment is discussed in detail on several other pages, but this section gives brief overviews of your deployment options.
### Cloud Functions for Firebase
[Section titled “Cloud Functions for Firebase”](#cloud-functions-for-firebase)
To deploy flows with Cloud Functions for Firebase, use the `onCallGenkit` feature of `firebase-functions/https`. `onCallGenkit` wraps your flow in a callable function. You may set an auth policy and configure App Check.
```typescript
import { hasClaim, onCallGenkit } from 'firebase-functions/https';
import { defineSecret } from 'firebase-functions/params';
const apiKey = defineSecret('GOOGLE_AI_API_KEY');
const menuSuggestionFlow = ai.defineFlow(
{
name: 'menuSuggestionFlow',
inputSchema: z.object({ theme: z.string() }),
outputSchema: z.object({ menuItem: z.string() }),
},
async ({ theme }) => {
// ...
return { menuItem: "Generated menu item would go here" };
},
);
export const menuSuggestion = onCallGenkit(
{
secrets: [apiKey],
authPolicy: hasClaim('email_verified'),
},
menuSuggestionFlow,
);
```
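`onCallGenkit` accepts the same options as other callable functions, so you can also turn on App Check enforcement. A minimal sketch, assuming your client app is already configured for App Check:
```typescript
export const menuSuggestionWithAppCheck = onCallGenkit(
  {
    secrets: [apiKey],
    authPolicy: hasClaim('email_verified'),
    // Reject requests that don't carry a valid App Check token.
    enforceAppCheck: true,
  },
  menuSuggestionFlow,
);
```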
For more information, see the following pages:
* [Deploy with Firebase](/docs/firebase)
* [Authorization and integrity](/docs/auth#authorize-using-cloud-functions-for-firebase)
* [Firebase plugin](/docs/plugins/firebase)
### Express.js
[Section titled “Express.js”](#expressjs)
To deploy flows using any Node.js hosting platform, such as Cloud Run, define your flows using `defineFlow()` and then call `startFlowServer()`:
```typescript
import { startFlowServer } from '@genkit-ai/express';
export const menuSuggestionFlow = ai.defineFlow(
{
name: 'menuSuggestionFlow',
inputSchema: z.object({ theme: z.string() }),
outputSchema: z.object({ result: z.string() }),
},
async ({ theme }) => {
// ...
},
);
startFlowServer({
flows: [menuSuggestionFlow],
});
```
By default, `startFlowServer` will serve all the flows defined in your codebase as HTTP endpoints (for example, `http://localhost:3400/menuSuggestionFlow`). You can call a flow with a POST request as follows:
```bash
curl -X POST "http://localhost:3400/menuSuggestionFlow" \
-H "Content-Type: application/json" -d '{"data": {"theme": "banana"}}'
```
If needed, you can customize the flows server to serve a specific list of flows, as shown below. You can also specify a custom port (it will use the PORT environment variable if set) or specify CORS settings.
```typescript
export const flowA = ai.defineFlow(
{
name: 'flowA',
inputSchema: z.object({ subject: z.string() }),
outputSchema: z.object({ response: z.string() }),
},
async ({ subject }) => {
// ...
return { response: "Generated response would go here" };
}
);
export const flowB = ai.defineFlow(
{
name: 'flowB',
inputSchema: z.object({ subject: z.string() }),
outputSchema: z.object({ response: z.string() }),
},
async ({ subject }) => {
// ...
return { response: "Generated response would go here" };
}
);
startFlowServer({
flows: [flowB],
port: 4567,
cors: {
origin: '*',
},
});
```
For information on deploying to specific platforms, see [Deploy with Cloud Run](/docs/cloud-run) and [Deploy flows to any Node.js platform](/docs/deploy-node).
# Pause generation using interrupts
> Learn how to use interrupts in Genkit to pause and resume LLM generation, enabling human-in-the-loop interactions, asynchronous processing, and controlled task completion.
Beta
This feature of Genkit is in **Beta,** which means it is not yet part of Genkit’s stable API. APIs of beta features may change in minor version releases.
*Interrupts* are a special kind of [tool](/docs/tool-calling) that can pause the LLM generation-and-tool-calling loop to return control back to you. When you’re ready, you can then *resume* generation by sending *replies* that the LLM processes for further generation.
The most common uses for interrupts fall into a few categories:
* **Human-in-the-Loop:** Enabling the user of an interactive AI to clarify needed information or confirm the LLM’s action before it is completed, providing a measure of safety and confidence.
* **Async Processing:** Starting an asynchronous task that can only be completed out-of-band, such as sending an approval notification to a human reviewer or kicking off a long-running background process.
* **Exit from an Autonomous Task:** Providing the model a way to mark a task as complete, in a workflow that might iterate through a long series of tool calls.
[Genkit by Example: Human-in-the-Loop ](https://examples.genkit.dev/chatbot-hitl?utm_source=genkit.dev\&utm_content=contextlink)See a live demo of how interrupts can allow the LLM to ask structured questions of the user.
## Before you begin
[Section titled “Before you begin”](#before-you-begin)
All of the examples documented here assume that you have already set up a project with Genkit dependencies installed. If you want to run the code examples on this page, first complete the steps in the [Get started](/docs/get-started) guide.
Before diving too deeply, you should also be familiar with the following concepts:
* [Generating content](/docs/models) with AI models.
* Genkit’s system for [defining input and output schemas](/docs/flows).
* General methods of [tool-calling](/docs/tool-calling).
## Overview of interrupts
[Section titled “Overview of interrupts”](#overview-of-interrupts)
At a high level, this is what an interrupt looks like when interacting with an LLM:
1. The calling application prompts the LLM with a request. The prompt includes a list of tools, including at least one for an interrupt that the LLM can use to generate a response.
2. The LLM generates either a complete response or a tool call request in a specific format. To the LLM, an interrupt call looks like any other tool call.
3. If the LLM calls an interrupt tool, the Genkit library automatically pauses generation rather than immediately passing responses back to the model for additional processing.
4. The developer checks whether an interrupt call was made and performs whatever task is required to collect the information needed for the interrupt response.
5. The developer resumes generation by passing an interrupt response to the model. This action triggers a return to Step 2.
## Define manual-response interrupts
[Section titled “Define manual-response interrupts”](#define-manual-response-interrupts)
The most common kind of interrupt allows the LLM to request clarification from the user, for example by asking a multiple-choice question.
For this use case, use the Genkit instance’s `defineInterrupt()` method:
```ts
import { genkit, z } from 'genkit';
import { googleAI } from '@genkit-ai/googleai';
const ai = genkit({
plugins: [googleAI()],
model: googleAI.model('gemini-2.5-flash'),
});
const askQuestion = ai.defineInterrupt({
name: 'askQuestion',
description: 'use this to ask the user a clarifying question',
inputSchema: z.object({
choices: z.array(z.string()).describe('the choices to display to the user'),
allowOther: z.boolean().optional().describe('when true, allow write-ins'),
}),
outputSchema: z.string(),
});
```
Note that the `outputSchema` of an interrupt describes the response data you will provide when resuming, rather than a value automatically populated by a tool function.
### Use interrupts
[Section titled “Use interrupts”](#use-interrupts)
Interrupts are passed into the `tools` array when generating content, just like other types of tools. You can pass both normal tools and interrupts to the same `generate` call:
* Generate
```ts
const response = await ai.generate({
prompt: "Ask me a movie trivia question.",
tools: [askQuestion],
});
```
* definePrompt
```ts
const triviaPrompt = ai.definePrompt({
name: "triviaPrompt",
tools: [askQuestion],
input: {
schema: z.object({ subject: z.string() }),
},
prompt: "Ask me a trivia question about {{subject}}.",
});
const response = await triviaPrompt({ subject: "computer history" });
```
* Prompt file
```dotprompt
---
tools: [askQuestion]
input:
schema:
partyType: string
---
{{role "system"}}
Use the askQuestion tool if you need to clarify something.
{{role "user"}}
Help me plan a
{{partyType}}
party next week.
```
Then you can execute the prompt in your code as follows:
```ts
// assuming prompt file is named partyPlanner.prompt
const partyPlanner = ai.prompt("partyPlanner");
const response = await partyPlanner({ partyType: "birthday" });
```
* Chat
```ts
const chat = ai.chat({
system: "Use the askQuestion tool if you need to clarify something.",
tools: [askQuestion],
});
const response = await chat.send("make a plan for my birthday party");
```
Genkit immediately returns a response on receipt of an interrupt tool call.
### Respond to interrupts
[Section titled “Respond to interrupts”](#respond-to-interrupts)
If you’ve passed one or more interrupts to your generate call, you need to check the response for interrupts so that you can handle them:
```ts
// you can check the 'finishReason' of the response
response.finishReason === 'interrupted';
// or you can check to see if any interrupt requests are on the response
response.interrupts.length > 0;
```
Responding to an interrupt is done using the `resume` option on a subsequent `generate` call, making sure to pass in the existing history. Each tool has a `.respond()` method on it to help construct the response.
Once resumed, the model re-enters the generation loop, including tool execution, until either it completes or another interrupt is triggered:
```ts
let response = await ai.generate({
tools: [askQuestion],
system: 'ask clarifying questions until you have a complete solution',
prompt: 'help me plan a backyard BBQ',
});
while (response.interrupts.length) {
const answers = [];
// multiple interrupts can be called at once, so we handle them all
for (const question of response.interrupts) {
answers.push(
// use the `respond` method on our tool to populate answers
askQuestion.respond(
question,
// send the tool request input to the user to respond
await askUser(question.toolRequest.input),
),
);
}
response = await ai.generate({
tools: [askQuestion],
messages: response.messages,
resume: {
respond: answers,
},
});
}
// no more interrupts, we can see the final response
console.log(response.text);
```
## Tools with restartable interrupts
[Section titled “Tools with restartable interrupts”](#tools-with-restartable-interrupts)
Another common pattern for interrupts is the need to *confirm* an action that the LLM suggests before actually performing it. For example, a payments app might want the user to confirm certain kinds of transfers.
For this use case, you can use the standard `defineTool` method to add custom logic around when to trigger an interrupt, and what to do when an interrupt is *restarted* with additional metadata.
### Define a restartable tool
[Section titled “Define a restartable tool”](#define-a-restartable-tool)
Every tool has access to two special helpers in the second argument of its implementation definition:
* `interrupt`: when called, this method throws a special kind of exception that is caught to pause the generation loop. You can provide additional metadata as an object.
* `resumed`: when a request from an interrupted generation is restarted using the `{resume: {restart: ...}}` option (see below), this helper contains the metadata provided when restarting.
If you were building a payments app, for example, you might want to confirm with the user before making a transfer exceeding a certain amount:
```ts
const transferMoney = ai.defineTool({
name: 'transferMoney',
description: 'Transfers money between accounts.',
inputSchema: z.object({
toAccountId: z.string().describe('the account id of the transfer destination'),
amount: z.number().describe('the amount in integer cents (100 = $1.00)'),
}),
outputSchema: z.object({
status: z.string().describe('the outcome of the transfer'),
message: z.string().optional(),
})
}, async (input, { context, interrupt, resumed }) => {
// if the user rejected the transaction
if (resumed?.status === "REJECTED") {
return {status: 'REJECTED', message: 'The user rejected the transaction.'};
}
// trigger an interrupt to confirm if amount > $100
if (resumed?.status !== "APPROVED" && input.amount > 10000) {
interrupt({
message: "Please confirm sending an amount > $100.",
});
}
// complete the transaction if not interrupted
return doTransfer(input);
});
```
In this example, on first execution (when `resumed` is undefined), the tool checks to see if the amount exceeds $100, and triggers an interrupt if so. On second execution, it looks for a status in the new metadata provided and performs the transfer or returns a rejection response, depending on whether it is approved or rejected.
### Restart tools after interruption
[Section titled “Restart tools after interruption”](#restart-tools-after-interruption)
Interrupt tools give you full control over:
1. When an initial tool request should trigger an interrupt.
2. When and whether to resume the generation loop.
3. What additional information to provide to the tool when resuming.
In the example shown in the previous section, the application might ask the user to confirm the interrupted request to make sure the transfer amount is okay:
```ts
let response = await ai.generate({
tools: [transferMoney],
prompt: "Transfer $1000 to account ABC123",
});
while (response.interrupts.length) {
const confirmations = [];
// multiple interrupts can be called at once, so we handle them all
for (const interrupt of response.interrupts) {
confirmations.push(
// use the 'restart' method on our tool to provide `resumed` metadata
transferMoney.restart(
interrupt,
// send the tool request input to the user to respond. assume that this
// returns `{status: "APPROVED"}` or `{status: "REJECTED"}`
await requestConfirmation(interrupt.toolRequest.input),
)
);
}
response = await ai.generate({
tools: [transferMoney],
messages: response.messages,
resume: {
restart: confirmations,
}
});
}
// no more interrupts, we can see the final response
console.log(response.text);
```
# Observe local metrics
> Learn about Genkit's local observability features, including tracing, metrics collection, and logging, powered by OpenTelemetry and integrated with the Genkit Developer UI.
Genkit provides a robust set of built-in observability features, including tracing and metrics collection powered by [OpenTelemetry](https://opentelemetry.io/). For local observability, such as during the development phase, the Genkit Developer UI provides detailed trace viewing and debugging capabilities. For production observability, we provide Genkit Monitoring in the Firebase console via the Firebase plugin. Alternatively, you can export your OpenTelemetry data to the observability tooling of your choice.
## Tracing & Metrics
[Section titled “Tracing & Metrics”](#tracing--metrics)
Genkit automatically collects traces and metrics without requiring explicit configuration, allowing you to observe and debug your Genkit code’s behavior in the Developer UI. Genkit stores these traces, enabling you to analyze your Genkit flows step-by-step with detailed input/output logging and statistics. In production, Genkit can export traces and metrics to Firebase Genkit Monitoring for further analysis.
## Log and export events
[Section titled “Log and export events”](#log-and-export-events)
Genkit provides a centralized logging system that you can configure using the logging module. One advantage of using the Genkit-provided logger is that it automatically exports logs to Genkit Monitoring when the Firebase Telemetry plugin is enabled.
```typescript
import { logger } from 'genkit/logging';
// Set the desired log level
logger.setLogLevel('debug');
```
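For example, enabling the Firebase Telemetry plugin mentioned above is a one-line call. This is a minimal sketch, assuming the `@genkit-ai/firebase` package is installed and your Firebase project credentials are available to the runtime:
```typescript
import { enableFirebaseTelemetry } from '@genkit-ai/firebase';

// Export traces, metrics, and logs to Genkit Monitoring in the Firebase console.
enableFirebaseTelemetry();
```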
## Production Observability
[Section titled “Production Observability”](#production-observability)
The [Genkit Monitoring](https://console.firebase.google.com/project/_/genai_monitoring) dashboard helps you understand the overall health of your Genkit features. It is also useful for debugging stability and content issues that may indicate problems with your LLM prompts and/or Genkit Flows. See the [Getting Started](/docs/observability/getting-started) guide for more details.
# Generating content with AI models
> Learn how to generate content with AI models using Genkit's unified interface, covering basic usage, configuration, structured output, streaming, and multimodal input/output.
At the heart of generative AI are AI *models*. Currently, the two most prominent examples of generative models are large language models (LLMs) and image generation models. These models take input, called a *prompt* (most commonly text, an image, or a combination of both), and from it produce as output text, an image, or even audio or video.
The output of these models can be surprisingly convincing: LLMs generate text that appears as though it could have been written by a human being, and image generation models can produce images that are very close to real photographs or artwork created by humans.
In addition, LLMs have proven capable of tasks beyond simple text generation:
* Writing computer programs
* Planning subtasks that are required to complete a larger task
* Organizing unorganized data
* Understanding and extracting information from a corpus of text
* Following and performing automated activities based on a text description of the activity
There are many models available to you, from several different providers. Each model has its own strengths and weaknesses and one model might excel at one task but perform less well at others. Apps making use of generative AI can often benefit from using multiple different models depending on the task at hand.
As an app developer, you typically don’t interact with generative AI models directly, but rather through services available as web APIs. Although these services often have similar functionality, they all provide them through different and incompatible APIs. If you want to make use of multiple model services, you have to use each of their proprietary SDKs, potentially incompatible with each other. And if you want to upgrade from one model to the newest and most capable one, you might have to build that integration all over again.
Genkit addresses this challenge by providing a single interface that abstracts away the details of accessing potentially any generative AI model service, with several pre-built implementations already available. Building your AI-powered app around Genkit simplifies the process of making your first generative AI call and makes it equally easy to combine multiple models or swap one model for another as new models emerge.
### Before you begin
[Section titled “Before you begin”](#before-you-begin)
If you want to run the code examples on this page, first complete the steps in the [Getting started](/docs/get-started) guide. All of the examples assume that you have already installed Genkit as a dependency in your project.
### Models supported by Genkit
[Section titled “Models supported by Genkit”](#models-supported-by-genkit)
Genkit is designed to be flexible enough to use potentially any generative AI model service. Its core libraries define the common interface for working with models, and model plugins define the implementation details for working with a specific model and its API.
The Genkit team maintains plugins for working with models provided by Vertex AI, Google Generative AI, and Ollama:
* Gemini family of LLMs, through the [Google Cloud Vertex AI plugin](/docs/plugins/vertex-ai)
* Gemini family of LLMs, through the [Google AI plugin](/docs/plugins/google-genai)
* Imagen2 and Imagen3 image generation models, through Google Cloud Vertex AI
* Anthropic’s Claude 3 family of LLMs, through Google Cloud Vertex AI’s model garden
* Gemma 2, Llama 3, and many more open models, through the [Ollama plugin](/docs/plugins/ollama) (you must host the Ollama server yourself)
* GPT, Dall-E and Whisper family of models, through the [OpenAI plugin](/docs/plugins/openai)
* Grok family of models, through the [xAI plugin](/docs/plugins/xai)
* DeepSeek Chat and DeepSeek Reasoner models, through the [DeepSeek plugin](/docs/plugins/deepseek)
In addition, there are also several community-supported plugins that provide interfaces to these models:
* Claude 3 family of LLMs, through the [Anthropic plugin](https://thefireco.github.io/genkit-plugins/docs/plugins/genkitx-anthropic)
* GPT family of LLMs through the [Azure OpenAI plugin](https://thefireco.github.io/genkit-plugins/docs/plugins/genkitx-azure-openai)
* Command R family of LLMs through the [Cohere plugin](https://thefireco.github.io/genkit-plugins/docs/plugins/genkitx-cohere)
* Mistral family of LLMs through the [Mistral plugin](https://thefireco.github.io/genkit-plugins/docs/plugins/genkitx-mistral)
* Gemma 2, Llama 3, and many more open models hosted on Groq, through the [Groq plugin](https://thefireco.github.io/genkit-plugins/docs/plugins/genkitx-groq)
You can discover more by searching for [packages tagged with `genkit-model` on npmjs.org](https://www.npmjs.com/search?q=keywords%3Agenkit-model).
### Loading and configuring model plugins
[Section titled “Loading and configuring model plugins”](#loading-and-configuring-model-plugins)
Before you can use Genkit to start generating content, you need to load and configure a model plugin. If you’re coming from the Getting Started guide, you’ve already done this. Otherwise, see the [Getting Started](/docs/get-started) guide or the individual plugin’s documentation and follow the steps there before continuing.
### The generate() method
[Section titled “The generate() method”](#the-generate-method)
In Genkit, the primary interface through which you interact with generative AI models is the `generate()` method.
The simplest `generate()` call specifies the model you want to use and a text prompt:
```ts
import { googleAI } from '@genkit-ai/googleai';
import { genkit } from 'genkit';
const ai = genkit({
plugins: [googleAI()],
// Optional. Specify a default model.
model: googleAI.model('gemini-2.5-flash'),
});
async function run() {
const response = await ai.generate('Invent a menu item for a restaurant with a pirate theme.');
console.log(response.text);
}
run();
```
When you run this brief example, it will print out some debugging information followed by the output of the `generate()` call, which will usually be Markdown text as in the following example:
```md
## The Blackheart's Bounty
**A hearty stew of slow-cooked beef, spiced with rum and molasses, served in a
hollowed-out cannonball with a side of crusty bread and a dollop of tangy
pineapple salsa.**
**Description:** This dish is a tribute to the hearty meals enjoyed by pirates
on the high seas. The beef is tender and flavorful, infused with the warm spices
of rum and molasses. The pineapple salsa adds a touch of sweetness and acidity,
balancing the richness of the stew. The cannonball serving vessel adds a fun and
thematic touch, making this dish a perfect choice for any pirate-themed
adventure.
```
Run the script again and you’ll get a different output.
The preceding code sample sent the generation request to the default model, which you specified when you configured the Genkit instance.
You can also specify a model for a single `generate()` call:
```ts
import { googleAI } from '@genkit-ai/googleai';
const response = await ai.generate({
model: googleAI.model('gemini-2.5-flash'),
prompt: 'Invent a menu item for a restaurant with a pirate theme.',
});
```
This example uses a model reference function provided by the model plugin. Model references carry static type information about the model and its options, which can be useful for code completion in the IDE and at compile time. Many plugins use this pattern, but not all; in cases where they don’t, refer to the plugin documentation for the preferred way to create model references.
Sometimes you may see code samples where model references are imported as constants:
```ts
import { googleAI, gemini20Flash } from '@genkit-ai/googleai';
const ai = genkit({
plugins: [googleAI()],
model: gemini20Flash,
});
```
Some plugins still use this pattern. For plugins that have switched to the newer syntax, those constants remain available and continue to work, but constants for newly released models might not be added going forward.
Another option is to specify the model using a string identifier. This approach works for all plugins regardless of how they handle typed model references, but you won’t have the help of static type checking:
```ts
const response = await ai.generate({
model: 'googleai/gemini-2.5-flash-001',
prompt: 'Invent a menu item for a restaurant with a pirate theme.',
});
```
A model string identifier looks like `providerid/modelid`, where the provider ID (in this case, `googleai`) identifies the plugin, and the model ID is a plugin-specific string identifier for a specific version of a model.
Some model plugins, such as the Ollama plugin, provide access to potentially dozens of different models and therefore do not export individual model references. In these cases, you can only specify a model to `generate()` using its string identifier.
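For example, with the Ollama plugin you might write something like the following sketch. The `ollama/` prefix and the `gemma2` model name are assumptions for illustration; use whichever model you have pulled into your Ollama server:
```ts
const response = await ai.generate({
  // Ollama models are addressed only by string identifier.
  model: 'ollama/gemma2',
  prompt: 'Invent a menu item for a restaurant with a pirate theme.',
});
```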
These examples also illustrate an important point: when you use `generate()` to make generative AI model calls, changing the model you want to use is simply a matter of passing a different value to the model parameter. By using `generate()` instead of the native model SDKs, you give yourself the flexibility to more easily use several different models in your app and change models in the future.
So far you have only seen examples of the simplest `generate()` calls. However, `generate()` also provides an interface for more advanced interactions with generative models, which you will see in the sections that follow.
### System prompts
[Section titled “System prompts”](#system-prompts)
Some models support providing a *system prompt*, which gives the model instructions as to how you want it to respond to messages from the user. You can use the system prompt to specify a persona you want the model to adopt, the tone of its responses, the format of its responses, and so on.
If the model you’re using supports system prompts, you can provide one with the `system` parameter:
```ts
const response = await ai.generate({
prompt: 'What is your quest?',
system: "You are a knight from Monty Python's Flying Circus.",
});
```
### Multi-turn conversations with messages
[Section titled “Multi-turn conversations with messages”](#multi-turn-conversations-with-messages)
For multi-turn conversations, you can use the `messages` parameter instead of `prompt` to provide a conversation history. This is particularly useful when you need to maintain context across multiple interactions with the model.
The `messages` parameter accepts an array of message objects, where each message has a `role` (one of `'system'`, `'user'`, `'model'`, or `'tool'`) and `content`:
```ts
const response = await ai.generate({
messages: [
{ role: 'user', content: 'Hello, can you help me plan a trip?' },
{ role: 'model', content: 'Of course! I\'d be happy to help you plan a trip. Where are you thinking of going?' },
{ role: 'user', content: 'I want to visit Japan for two weeks in spring.' }
],
});
```
You can also combine `messages` with other parameters like `system` prompts:
```ts
const response = await ai.generate({
system: 'You are a helpful travel assistant.',
messages: [
{ role: 'user', content: 'What should I pack for Japan in spring?' }
],
});
```
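When you manage history manually, a common pattern is to reuse the `messages` array from the previous response and append the next user turn. Here is a minimal sketch of that pattern; the trip-planning prompts are just examples:
```ts
let response = await ai.generate({
  messages: [{ role: 'user', content: [{ text: 'Can you help me plan a trip?' }] }],
});

// `response.messages` holds the conversation so far (your messages plus the
// model's replies). Append the next user turn and call generate() again.
response = await ai.generate({
  messages: [
    ...response.messages,
    { role: 'user', content: [{ text: 'I want to visit Japan for two weeks in spring.' }] },
  ],
});
console.log(response.text);
```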
**When to use `messages` vs. Chat API:**
* Use the `messages` parameter for simple multi-turn conversations where you manually manage the conversation history
* For persistent chat sessions with automatic history management, use the [Chat API](/docs/chat) instead
### Model parameters
[Section titled “Model parameters”](#model-parameters)
The `generate()` function takes a `config` parameter, through which you can specify optional settings that control how the model generates content:
```ts
const response = await ai.generate({
prompt: 'Invent a menu item for a restaurant with a pirate theme.',
config: {
maxOutputTokens: 512,
stopSequences: ['\n'],
temperature: 1.0,
topP: 0.95,
topK: 40,
},
});
```
The exact parameters that are supported depend on the individual model and model API. However, the parameters in the previous example are common to almost every model. The following is an explanation of these parameters:
#### Parameters that control output length
[Section titled “Parameters that control output length”](#parameters-that-control-output-length)
**maxOutputTokens**
LLMs operate on units called *tokens*. A token usually, but does not necessarily, map to a specific sequence of characters. When you pass a prompt to a model, one of the first steps it takes is to *tokenize* your prompt string into a sequence of tokens. Then, the LLM generates a sequence of tokens from the tokenized input. Finally, the sequence of tokens gets converted back into text, which is your output.
The maximum output tokens parameter simply sets a limit on how many tokens to generate using the LLM. Every model potentially uses a different tokenizer, but a good rule of thumb is to consider a single English word to be made of 2 to 4 tokens.
As stated earlier, some tokens might not map to character sequences. One such example is that there is often a token that indicates the end of the sequence: when an LLM generates this token, it stops generating more. Therefore, it’s possible and often the case that an LLM generates fewer tokens than the maximum because it generated the “stop” token.
**stopSequences**
You can use this parameter to set the tokens or token sequences that, when generated, indicate the end of LLM output. The correct values to use here generally depend on how the model was trained, and are usually set by the model plugin. However, if you have prompted the model to generate another stop sequence, you might specify it here.
Note that you are specifying character sequences, and not tokens per se. In most cases, you will specify a character sequence that the model’s tokenizer maps to a single token.
#### Parameters that control “creativity”
[Section titled “Parameters that control “creativity””](#parameters-that-control-creativity)
The *temperature*, *top-p*, and *top-k* parameters together control how “creative” you want the model to be. Below are very brief explanations of what these parameters mean, but the more important point to take away is this: these parameters are used to adjust the character of an LLM’s output. The optimal values for them depend on your goals and preferences, and are likely to be found only through experimentation.
**temperature**
LLMs are fundamentally token-predicting machines. For a given sequence of tokens (such as the prompt) an LLM predicts, for each token in its vocabulary, the likelihood that the token comes next in the sequence. The temperature is a scaling factor by which these predictions are divided before being normalized to a probability between 0 and 1.
Low temperature values—between 0.0 and 1.0—amplify the difference in likelihoods between tokens, with the result that the model will be even less likely to produce a token it already evaluated to be unlikely. This is often perceived as output that is less creative. Although 0.0 is technically not a valid value, many models treat it as indicating that the model should behave deterministically, and to only consider the single most likely token.
High temperature values—those greater than 1.0—compress the differences in likelihoods between tokens, with the result that the model becomes more likely to produce tokens it had previously evaluated to be unlikely. This is often perceived as output that is more creative. Some model APIs impose a maximum temperature, often 2.0.
**topP**
*Top-p* is a value between 0.0 and 1.0 that controls the number of possible tokens you want the model to consider, by specifying the cumulative probability of the tokens. For example, a value of 1.0 means to consider every possible token (but still take into account the probability of each token). A value of 0.4 means to only consider the most likely tokens, whose probabilities add up to 0.4, and to exclude the remaining tokens from consideration.
**topK**
*Top-k* is an integer value that also controls the number of possible tokens you want the model to consider, but this time by explicitly specifying the maximum number of tokens. Specifying a value of 1 means that the model should behave deterministically.
#### Experiment with model parameters
[Section titled “Experiment with model parameters”](#experiment-with-model-parameters)
You can experiment with the effect of these parameters on the output generated by different model and prompt combinations by using the Developer UI. Start the developer UI with the `genkit start` command and it will automatically load all of the models defined by the plugins configured in your project. You can quickly try different prompts and configuration values without having to repeatedly make these changes in code.
### Structured output
[Section titled “Structured output”](#structured-output)
[Genkit by Example: Structured Output ](https://examples.genkit.dev/structured-output?utm_source=genkit.dev\&utm_content=contextlink)View a live example of using structured output to generate a D\&D character sheet.
When using generative AI as a component in your application, you often want output in a format other than plain text. Even if you’re just generating content to display to the user, you can benefit from structured output simply for the purpose of presenting it more attractively to the user. But for more advanced applications of generative AI, such as programmatic use of the model’s output, or feeding the output of one model into another, structured output is a must.
In Genkit, you can request structured output from a model by specifying a schema when you call `generate()`:
```ts
import { z } from 'genkit';
```
```ts
const MenuItemSchema = z.object({
name: z.string().describe('The name of the menu item.'),
description: z.string().describe('A description of the menu item.'),
calories: z.number().describe('The estimated number of calories.'),
allergens: z.array(z.string()).describe('Any known allergens in the menu item.'),
});
const response = await ai.generate({
prompt: 'Suggest a menu item for a pirate-themed restaurant.',
output: { schema: MenuItemSchema },
});
```
Model output schemas are specified using the [Zod](https://zod.dev/) library. In addition to a schema definition language, Zod also provides runtime type checking, which bridges the gap between static TypeScript types and the unpredictable output of generative AI models. Zod lets you write code that can rely on the fact that a successful generate call will always return output that conforms to your TypeScript types.
When you specify a schema in `generate()`, Genkit does several things behind the scenes:
* Augments the prompt with additional guidance about the desired output format. This also has the side effect of specifying to the model what content exactly you want to generate (for example, not only suggest a menu item but also generate a description, a list of allergens, and so on).
* Parses the model output into a JavaScript object.
* Verifies that the output conforms with the schema.
To get structured output from a successful generate call, use the response object’s `output` property:
```ts
const menuItem = response.output; // Typed as z.infer<typeof MenuItemSchema>
console.log(menuItem?.name);
```
#### Handling errors
[Section titled “Handling errors”](#handling-errors)
Note in the prior example that the `output` property can be `null`. This can happen when the model fails to generate output that conforms to the schema. The best strategy for dealing with such errors will depend on your exact use case, but here are some general hints:
* **Try a different model**. For structured output to succeed, the model must be capable of generating output in JSON. The most powerful LLMs, like Gemini and Claude, are versatile enough to do this; however, smaller models, such as some of the local models you would use with Ollama, might not be able to generate structured output reliably unless they have been specifically trained to do so.
* **Make use of Zod’s coercion abilities**: You can specify in your schemas that Zod should try to coerce non-conforming types into the type specified by the schema. If your schema includes primitive types other than strings, using Zod coercion can reduce the number of `generate()` failures you experience. The following version of `MenuItemSchema` uses type coercion to automatically correct situations where the model generates calorie information as a string instead of a number:
```ts
const MenuItemSchema = z.object({
name: z.string().describe('The name of the menu item.'),
description: z.string().describe('A description of the menu item.'),
calories: z.coerce.number().describe('The estimated number of calories.'),
allergens: z.array(z.string()).describe('Any known allergens in the menu item.'),
});
```
* **Retry the generate() call**. If the model you’ve chosen only rarely fails to generate conformant output, you can treat the error as you would treat a network error, and simply retry the request using some kind of incremental back-off strategy.
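Here is a minimal sketch of that kind of retry loop; the attempt count and delays are arbitrary, and `generateMenuItemWithRetry` is a hypothetical helper name:
```ts
async function generateMenuItemWithRetry(prompt: string, maxAttempts = 3) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const response = await ai.generate({
      prompt,
      output: { schema: MenuItemSchema },
    });
    // A null output means the response didn't conform to the schema; retry.
    if (response.output !== null) {
      return response.output;
    }
    // Incremental back-off: wait 1s, 2s, 4s, ... between attempts.
    await new Promise((resolve) => setTimeout(resolve, 1000 * 2 ** (attempt - 1)));
  }
  throw new Error('Model failed to produce schema-conformant output.');
}
```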
### Streaming
[Section titled “Streaming”](#streaming)
When generating large amounts of text, you can improve the experience for your users by presenting the output as it’s generated—streaming the output. A familiar example of streaming in action can be seen in most LLM chat apps: users can read the model’s response to their message as it’s being generated, which improves the perceived responsiveness of the application and enhances the illusion of chatting with an intelligent counterpart.
In Genkit, you can stream output using the `generateStream()` method. Its syntax is similar to the `generate()` method:
```ts
const { stream, response } = ai.generateStream({
prompt: 'Tell me a story about a boy and his dog.',
});
```
The response object has a `stream` property, which you can use to iterate over the streaming output of the request as it’s generated:
```ts
for await (const chunk of stream) {
console.log(chunk.text);
}
```
You can also get the complete output of the request, as you can with a non-streaming request:
```ts
const finalResponse = await response;
console.log(finalResponse.text);
```
Streaming also works with structured output:
```ts
const { stream, response } = ai.generateStream({
prompt: 'Suggest three pirate-themed menu items.',
output: { schema: z.array(MenuItemSchema) },
});
for await (const chunk of stream) {
console.log(chunk.output);
}
const finalResponse = await response;
console.log(finalResponse.output);
```
Streaming structured output works a little differently from streaming text: the `output` property of a response chunk is an object constructed from the accumulation of the chunks that have been produced so far, rather than an object representing a single chunk (which might not be valid on its own). **Every chunk of structured output in a sense supersedes the chunk that came before it**.
For example, here’s what the first five outputs from the prior example might look like:
```js
null;
{
starters: [{}];
}
{
starters: [{ name: "Captain's Treasure Chest", description: 'A' }];
}
{
starters: [
{
name: "Captain's Treasure Chest",
description: 'A mix of spiced nuts, olives, and marinated cheese served in a treasure chest.',
calories: 350,
},
];
}
{
starters: [
{
name: "Captain's Treasure Chest",
description: 'A mix of spiced nuts, olives, and marinated cheese served in a treasure chest.',
calories: 350,
allergens: [Array],
},
{ name: 'Shipwreck Salad', description: 'Fresh' },
];
}
```
### Multimodal input
[Section titled “Multimodal input”](#multimodal-input)
[Genkit by Example: Image Analysis ](https://examples.genkit.dev/image-analysis?utm_source=genkit.dev\&utm_content=contextlink)See a live demo of how Genkit can enable image analysis using multimodal input.
The examples you’ve seen so far have used text strings as model prompts. While this remains the most common way to prompt generative AI models, many models can also accept other media as prompts. Media prompts are most often used in conjunction with text prompts that instruct the model to perform some operation on the media, such as to caption an image or transcribe an audio recording.
The ability to accept media input and the types of media you can use are completely dependent on the model and its API. For example, the Gemini 1.5 series of models can accept images, video, and audio as prompts.
To provide a media prompt to a model that supports it, instead of passing a simple text prompt to `generate`, pass an array consisting of a media part and a text part:
```ts
const response = await ai.generate({
prompt: [{ media: { url: 'https://.../image.jpg' } }, { text: 'What is in this image?' }],
});
```
In the above example, you specified an image using a publicly-accessible HTTPS URL. You can also pass media data directly by encoding it as a data URL. For example:
```ts
import { readFile } from 'node:fs/promises';

const data = await readFile('image.jpg');
const response = await ai.generate({
  prompt: [{ media: { url: `data:image/jpeg;base64,${data.toString('base64')}` } }, { text: 'What is in this image?' }],
});
```
All models that support media input support both data URLs and HTTPS URLs. Some model plugins add support for other media sources. For example, the Vertex AI plugin also lets you use Cloud Storage (`gs://`) URLs.
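For example, with the Vertex AI plugin, referencing a Cloud Storage object might look like the following sketch (the bucket and path are illustrative; `contentType` tells the model how to interpret the file, since it can’t be inferred from a `gs://` URL):
```ts
const response = await ai.generate({
  prompt: [
    { media: { url: 'gs://my-bucket/photos/image.jpg', contentType: 'image/jpeg' } },
    { text: 'What is in this image?' },
  ],
});
```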
### Generating Media
[Section titled “Generating Media”](#generating-media)
While most examples in this guide focus on generating text with LLMs, Genkit also supports generating other types of media, including **images** and **audio**. Thanks to its unified `generate()` interface, working with media models is just as straightforward as generating text.
#### Image Generation
[Section titled “Image Generation”](#image-generation)
To generate an image using a model like Imagen from Vertex AI, follow these steps:
1. **Install a data URL parser.** Genkit outputs media as data URLs, so you’ll need to decode them before saving to disk. This example uses [`data-urls`](https://www.npmjs.com/package/data-urls):
```bash
npm install data-urls
npm install --save-dev @types/data-urls
```
2. **Generate the image and save it to a file:**
```ts
import { vertexAI } from '@genkit-ai/vertexai';
import { parseDataUrl } from 'data-urls';
import { writeFile } from 'node:fs/promises';
const response = await ai.generate({
model: vertexAI.model('imagen-3.0-fast-generate-001'),
prompt: 'An illustration of a dog wearing a space suit, photorealistic',
output: { format: 'media' },
});
const imagePart = response.output;
if (imagePart?.media?.url) {
const parsed = parseDataUrl(imagePart.media.url);
if (parsed) {
await writeFile('dog.png', parsed.body);
}
}
```
This will generate an image and save it as a PNG file named `dog.png`.
#### Audio Generation
[Section titled “Audio Generation”](#audio-generation)
You can also use Genkit to generate audio with text-to-speech (TTS) models. This is especially useful for voice features, narration, or accessibility support.
Here’s how to convert text into speech and save it as an audio file:
```ts
import { googleAI } from '@genkit-ai/googleai';
import { writeFile } from 'node:fs/promises';
import { Buffer } from 'node:buffer';
const response = await ai.generate({
model: googleAI.model('gemini-2.5-flash-preview-tts'),
// Gemini-specific configuration for audio generation
// Available configuration options will depend on model and provider
config: {
responseModalities: ['AUDIO'],
speechConfig: {
voiceConfig: {
prebuiltVoiceConfig: { voiceName: 'Algenib' },
},
},
},
prompt: 'Say that Genkit is an amazing AI framework',
});
// Handle the audio data (returned as a data URL)
if (response.media?.url) {
// Extract base64 data from the data URL
const audioBuffer = Buffer.from(
response.media.url.substring(response.media.url.indexOf(',') + 1),
'base64'
);
// Save to a file
await writeFile('output.wav', audioBuffer);
}
```
This code generates speech using the Gemini TTS model and saves the result to a file named `output.wav`.
### Next steps
[Section titled “Next steps”](#next-steps)
#### Learn more about Genkit
[Section titled “Learn more about Genkit”](#learn-more-about-genkit)
* As an app developer, the primary way you influence the output of generative AI models is through prompting. Read [Prompt management](/docs/dotprompt) to learn how Genkit helps you develop effective prompts and manage them in your codebase.
* Although `generate()` is the nucleus of every generative AI powered application, real-world applications usually require additional work before and after invoking a generative AI model. To reflect this, Genkit introduces the concept of *flows*, which are defined like functions but add additional features such as observability and simplified deployment. To learn more, see [Defining workflows](/docs/flows).
#### Advanced LLM use
[Section titled “Advanced LLM use”](#advanced-llm-use)
* Many of your users will have interacted with large language models for the first time through chatbots. Although LLMs are capable of much more than simulating conversations, it remains a familiar and useful style of interaction. Even when your users will not be interacting directly with the model in this way, the conversational style of prompting is a powerful way to influence the output generated by an AI model. Read [Multi-turn chats](/docs/chat) to learn how to use Genkit as part of an LLM chat implementation.
* One way to enhance the capabilities of LLMs is to prompt them with a list of ways they can request more information from you, or request you to perform some action. This is known as *tool calling* or *function calling*. Models that are trained to support this capability can respond to a prompt with a specially-formatted response, which indicates to the calling application that it should perform some action and send the result back to the LLM along with the original prompt. Genkit has library functions that automate both the prompt generation and the call-response loop elements of a tool calling implementation. See [Tool calling](/docs/tool-calling) to learn more.
* Retrieval-augmented generation (RAG) is a technique used to introduce domain-specific information into a model’s output. This is accomplished by inserting relevant information into a prompt before passing it on to the language model. A complete RAG implementation requires you to bring several technologies together: text embedding generation models, vector databases, and large language models. See [Retrieval-augmented generation (RAG)](/docs/rag) to learn how Genkit simplifies the process of coordinating these various elements.
#### Testing model output
[Section titled “Testing model output”](#testing-model-output)
As a software engineer, you’re used to deterministic systems where the same input always produces the same output. However, with AI models being probabilistic, the output can vary based on subtle nuances in the input, the model’s training data, and even randomness deliberately introduced by parameters like temperature.
Genkit’s evaluators are structured ways to assess the quality of your LLM’s responses, using a variety of strategies. Read more on the [Evaluation](/docs/evaluation) page.
# Building multi-agent systems
> Learn how to build multi-agent systems in Genkit by delegating tasks to specialized agents, addressing challenges of complex agentic workflows.
Beta
This feature of Genkit is in **Beta,** which means it is not yet part of Genkit’s stable API. APIs of beta features may change in minor version releases.
One powerful application of large language models is the LLM-powered agent. An agent is a system that can carry out complex tasks by planning how to break them into smaller tasks and (with the help of [tool calling](/docs/tool-calling)) executing tasks that interact with external resources such as databases or even physical devices.
Here are some excerpts from a very simple customer service agent built using a single prompt and several tools:
```typescript
const menuLookupTool = ai.defineTool(
{
name: 'menuLookupTool',
description: 'use this tool to look up the menu for a given date',
inputSchema: z.object({
date: z.string().describe('the date to look up the menu for'),
}),
outputSchema: z.string().describe('the menu for a given date'),
},
async (input) => {
// Retrieve the menu from a database, website, etc.
// ...
},
);
const reservationTool = ai.defineTool(
{
name: 'reservationTool',
description: 'use this tool to try to book a reservation',
inputSchema: z.object({
partySize: z.coerce.number().describe('the number of guests'),
date: z.string().describe('the date to book for'),
}),
outputSchema: z
.string()
.describe(
"true if the reservation was successfully booked and false if there's" +
' no table available for the requested time',
),
},
async (input) => {
// Access your database to try to make the reservation.
// ...
},
);
```
```typescript
const chat = ai.chat({
model: googleAI.model('gemini-2.5-flash'),
system:
"You are an AI customer service agent for Pavel's Cafe. Use the tools " +
'available to you to help the customer. If you cannot help the ' +
'customer with the available tools, politely explain so.',
tools: [menuLookupTool, reservationTool],
});
```
A simple architecture like the one shown above can be sufficient when your agent only has a few capabilities. However, even for the limited example above, you can see that there are some capabilities that customers would likely expect: for example, listing the customer’s current reservations, canceling a reservation, and so on. As you build more and more tools to implement these additional capabilities, you start to run into some problems:
* The more tools you add, the more you stretch the model’s ability to consistently and correctly employ the right tool for the job.
* Some tasks might best be served through a more focused back and forth between the user and the agent, rather than by a single tool call.
* Some tasks might benefit from a specialized prompt. For example, if your agent is responding to an unhappy customer, you might want its tone to be more business-like, whereas the agent that greets the customer initially can have a more friendly and lighthearted tone.
One approach to dealing with the issues that arise when building complex agents is to create many specialized agents and use a general-purpose agent to delegate tasks to them. Genkit supports this architecture by allowing you to specify prompts as tools. Each prompt represents a single specialized agent, with its own set of tools available to it, and those agents are in turn available as tools to your single orchestration agent, which is the primary interface with the user.
Here’s what an expanded version of the previous example might look like as a multi-agent system:
```typescript
// Define a prompt that represents a specialist agent
const reservationAgent = ai.definePrompt({
name: 'reservationAgent',
description: 'Reservation Agent can help manage guest reservations',
tools: [reservationTool, reservationCancelationTool, reservationListTool],
system: 'Help guests make and manage reservations',
});
// Or load agents from .prompt files
const menuInfoAgent = ai.prompt('menuInfoAgent');
const complaintAgent = ai.prompt('complaintAgent');
// The triage agent is the agent that users interact with initially
const triageAgent = ai.definePrompt({
name: 'triageAgent',
description: 'Triage Agent',
tools: [reservationAgent, menuInfoAgent, complaintAgent],
system: `You are an AI customer service agent for Pavel's Cafe.
Greet the user and ask them how you can help. If appropriate, transfer to an
agent that can better handle the request. If you cannot help the customer with
the available tools, politely explain so.`,
});
```
```typescript
// Start a chat session, initially with the triage agent
const chat = ai.chat(triageAgent);
```
# Retrieval-augmented generation (RAG)
> Learn how Genkit simplifies retrieval-augmented generation (RAG) by providing abstractions and plugins for indexers, embedders, and retrievers to incorporate external data into LLM responses.
Genkit provides abstractions that help you build retrieval-augmented generation (RAG) flows, as well as plugins that provide integrations with related tools.
## What is RAG?
[Section titled “What is RAG?”](#what-is-rag)
Retrieval-augmented generation is a technique used to incorporate external sources of information into an LLM’s responses. It’s important to be able to do so because, while LLMs are typically trained on a broad body of material, practical use of LLMs often requires specific domain knowledge (for example, you might want to use an LLM to answer customers’ questions about your company’s products).
One solution is to fine-tune the model using more specific data. However, this can be expensive both in terms of compute cost and in terms of the effort needed to prepare adequate training data.
In contrast, RAG works by incorporating external data sources into a prompt at the time it’s passed to the model. For example, you could imagine the prompt, “What is Bart’s relationship to Lisa?” might be expanded (“augmented”) by prepending some relevant information, resulting in the prompt, “Homer and Marge’s children are named Bart, Lisa, and Maggie. What is Bart’s relationship to Lisa?”
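Conceptually, the augmentation step can be as simple as prepending the retrieved text to the prompt. Here is a minimal sketch, assuming a configured `ai` instance like the ones used later on this page, with `retrievedFacts` standing in for whatever your retrieval step returns:
```ts
const retrievedFacts = "Homer and Marge's children are named Bart, Lisa, and Maggie.";
const question = "What is Bart's relationship to Lisa?";

const { text } = await ai.generate({
  // The retrieved context "augments" the prompt before it reaches the model.
  prompt: `${retrievedFacts}\n\n${question}`,
});
```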
This approach has several advantages:
* It can be more cost-effective because you don’t have to retrain the model.
* You can continuously update your data source and the LLM can immediately make use of the updated information.
* You now have the potential to cite references in your LLM’s responses.
On the other hand, using RAG naturally means longer prompts, and some LLM API services charge for each input token you send. Ultimately, you must evaluate the cost tradeoffs for your applications.
RAG is a very broad area and there are many different techniques used to achieve the best quality RAG. The core Genkit framework offers three main abstractions to help you do RAG:
* Indexers: add documents to an “index”.
* Embedders: transform documents into a vector representation.
* Retrievers: retrieve documents from an “index”, given a query.
These definitions are intentionally broad because Genkit is unopinionated about what an “index” is or how exactly documents are retrieved from it. Genkit only provides a `Document` format; everything else is defined by the retriever or indexer implementation provider.
### Indexers
[Section titled “Indexers”](#indexers)
The index is responsible for keeping track of your documents in such a way that you can quickly retrieve relevant documents given a specific query. This is most often accomplished using a vector database, which indexes your documents using multidimensional vectors called embeddings. A text embedding (opaquely) represents the concepts expressed by a passage of text; these are generated using special-purpose ML models. By indexing text using its embedding, a vector database is able to cluster conceptually related text and retrieve documents related to a novel string of text (the query).
Before you can retrieve documents for the purpose of generation, you need to ingest them into your document index. A typical ingestion flow does the following:
1. Split up large documents into smaller documents so that only relevant portions are used to augment your prompts – “chunking”. This is necessary because many LLMs have a limited context window, making it impractical to include entire documents with a prompt.
Genkit doesn’t provide built-in chunking libraries; however, there are open source libraries available that are compatible with Genkit.
2. Generate embeddings for each chunk. Depending on the database you’re using, you might explicitly do this with an embedding generation model, or you might use the embedding generator provided by the database.
3. Add the text chunk and its index to the database.
You might run your ingestion flow infrequently or only once if you are working with a stable source of data. On the other hand, if you are working with data that frequently changes, you might continuously run the ingestion flow (for example, in a Cloud Firestore trigger, whenever a document is updated).
### Embedders
[Section titled “Embedders”](#embedders)
An embedder is a function that takes content (text, images, audio, and so on) and creates a numeric vector that encodes the semantic meaning of the original content. As mentioned above, embedders are used as part of the indexing process; however, they can also be used on their own to create embeddings without an index.
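For example, you can call an embedder directly through the Genkit instance. Here is a minimal sketch using the same `gemini-embedding-001` embedder configured later on this page (the exact shape of the returned embeddings may vary between Genkit versions):
```ts
import { googleAI } from '@genkit-ai/googleai';

const embeddings = await ai.embed({
  embedder: googleAI.embedder('gemini-embedding-001'),
  content: 'The Shipwreck Salad is made with fresh greens and grilled shrimp.',
});
console.log(embeddings);
```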
### Retrievers
[Section titled “Retrievers”](#retrievers)
A retriever is a concept that encapsulates logic related to any kind of document retrieval. The most popular retrieval cases typically include retrieval from vector stores, however, in Genkit a retriever can be any function that returns data.
To create a retriever, you can use one of the provided implementations or create your own.
## Supported indexers, retrievers, and embedders
[Section titled “Supported indexers, retrievers, and embedders”](#supported-indexers-retrievers-and-embedders)
Genkit provides indexer and retriever support through its plugin system. The following plugins are officially supported:
* [Astra DB](/docs/plugins/astra-db) - DataStax Astra DB vector database
* [Chroma DB](/docs/plugins/chroma) vector database
* [Cloud Firestore vector store](/docs/plugins/firebase)
* [Cloud SQL for PostgreSQL](/docs/plugins/cloud-sql-pg) with pgvector extension
* [LanceDB](/docs/plugins/lancedb) open-source vector database
* [Neo4j](/docs/plugins/neo4j) graph database with vector search
* [Pinecone](/docs/plugins/pinecone) cloud vector database
* [Vertex AI Vector Search](/docs/plugins/vertex-ai)
In addition, Genkit supports the following vector stores through predefined code templates, which you can customize for your database configuration and schema:
* PostgreSQL with [`pgvector`](/docs/templates/pgvector)
## Defining a RAG Flow
[Section titled “Defining a RAG Flow”](#defining-a-rag-flow)
The following examples show how you could ingest a collection of restaurant menu PDF documents into a vector database and retrieve them for use in a flow that determines what food items are available.
### Install dependencies for processing PDFs
[Section titled “Install dependencies for processing PDFs”](#install-dependencies-for-processing-pdfs)
```bash
npm install llm-chunk pdf-parse @genkit-ai/dev-local-vectorstore
npm install --save-dev @types/pdf-parse
```
### Add a local vector store to your configuration
[Section titled “Add a local vector store to your configuration”](#add-a-local-vector-store-to-your-configuration)
```ts
import { devLocalIndexerRef, devLocalVectorstore } from '@genkit-ai/dev-local-vectorstore';
import { googleAI } from '@genkit-ai/googleai';
import { z, genkit } from 'genkit';
const ai = genkit({
plugins: [
// googleAI provides the gemini-embedding-001 embedder
googleAI(),
// the local vector store requires an embedder to translate from text to vector
devLocalVectorstore([
{
indexName: 'menuQA',
embedder: googleAI.embedder('gemini-embedding-001'),
},
]),
],
});
```
### Define an Indexer
[Section titled “Define an Indexer”](#define-an-indexer)
The following example shows how to create an indexer to ingest a collection of PDF documents and store them in a local vector database.
It uses the local, file-based vector similarity retriever that Genkit provides out of the box for simple testing and prototyping (*do not use it in production*).
#### Create the indexer
[Section titled “Create the indexer”](#create-the-indexer)
```ts
export const menuPdfIndexer = devLocalIndexerRef('menuQA');
```
#### Create chunking config
[Section titled “Create chunking config”](#create-chunking-config)
This example uses the `llm-chunk` library, which provides a simple text splitter for breaking documents into segments that can be vectorized.
The following definition configures the chunking function to produce document segments of between 1000 and 2000 characters, broken at the end of a sentence, with an overlap of 100 characters between chunks.
```ts
const chunkingConfig = {
minLength: 1000,
maxLength: 2000,
splitter: 'sentence',
overlap: 100,
delimiters: '',
} as any;
```
More chunking options for this library can be found in the [llm-chunk documentation](https://www.npmjs.com/package/llm-chunk).
#### Define your indexer flow
[Section titled “Define your indexer flow”](#define-your-indexer-flow)
```ts
import { Document } from 'genkit/retriever';
import { chunk } from 'llm-chunk';
import { readFile } from 'fs/promises';
import path from 'path';
import pdf from 'pdf-parse';
async function extractTextFromPdf(filePath: string) {
const pdfFile = path.resolve(filePath);
const dataBuffer = await readFile(pdfFile);
const data = await pdf(dataBuffer);
return data.text;
}
export const indexMenu = ai.defineFlow(
{
name: 'indexMenu',
inputSchema: z.object({ filePath: z.string().describe('PDF file path') }),
outputSchema: z.object({
success: z.boolean(),
documentsIndexed: z.number(),
error: z.string().optional(),
}),
},
async ({ filePath }) => {
try {
filePath = path.resolve(filePath);
// Read the pdf
const pdfTxt = await ai.run('extract-text', () => extractTextFromPdf(filePath));
// Divide the pdf text into segments
const chunks = await ai.run('chunk-it', async () => chunk(pdfTxt, chunkingConfig));
// Convert chunks of text into documents to store in the index.
const documents = chunks.map((text) => {
return Document.fromText(text, { filePath });
});
// Add documents to the index
await ai.index({
indexer: menuPdfIndexer,
documents,
});
return {
success: true,
documentsIndexed: documents.length,
};
} catch (err) {
// For unexpected errors that throw exceptions
return {
success: false,
documentsIndexed: 0,
error: err instanceof Error ? err.message : String(err)
};
}
},
);
```
#### Run the indexer flow
[Section titled “Run the indexer flow”](#run-the-indexer-flow)
```bash
genkit flow:run indexMenu '{"filePath": "menu.pdf"}'
```
After running the `indexMenu` flow, the vector database will be seeded with documents and ready to be used in Genkit flows with retrieval steps.
### Define a flow with retrieval
[Section titled “Define a flow with retrieval”](#define-a-flow-with-retrieval)
The following example shows how you might use a retriever in a RAG flow. Like the indexer example, this example uses Genkit’s file-based vector retriever, which you should not use in production.
```ts
import { devLocalRetrieverRef } from '@genkit-ai/dev-local-vectorstore';
import { googleAI } from '@genkit-ai/googleai';
// Define the retriever reference
export const menuRetriever = devLocalRetrieverRef('menuQA');
export const menuQAFlow = ai.defineFlow(
{
name: 'menuQA',
inputSchema: z.object({ query: z.string() }),
outputSchema: z.object({ answer: z.string() })
},
async ({ query }) => {
// retrieve relevant documents
const docs = await ai.retrieve({
retriever: menuRetriever,
query,
options: { k: 3 },
});
// generate a response
const { text } = await ai.generate({
model: googleAI.model('gemini-2.5-flash'),
prompt: `
You are acting as a helpful AI assistant that can answer
questions about the food available on the menu at Genkit Grub Pub.
Use only the context provided to answer the question.
If you don't know, do not make up an answer.
Do not add or change items on the menu.
Question: ${query}`,
docs,
});
return { answer: text };
},
);
```
#### Run the retriever flow
[Section titled “Run the retriever flow”](#run-the-retriever-flow)
```bash
genkit flow:run menuQA '{"query": "Recommend a dessert from the menu while avoiding dairy and nuts"}'
```
The output for this command should contain a response from the model, grounded in the indexed `menu.pdf` file.
## Write your own indexers and retrievers
[Section titled “Write your own indexers and retrievers”](#write-your-own-indexers-and-retrievers)
It’s also possible to create your own retriever. This is useful if your documents are managed in a document store that is not supported in Genkit (e.g., MySQL, Google Drive, and so on). The Genkit SDK provides flexible methods that let you provide custom code for fetching documents. You can also define custom retrievers that build on top of existing retrievers in Genkit and apply advanced RAG techniques (such as reranking or prompt extensions) on top.
### Simple Retrievers
[Section titled “Simple Retrievers”](#simple-retrievers)
Simple retrievers let you easily convert existing code into retrievers:
```ts
import { z } from 'genkit';
import { searchEmails } from './db';
ai.defineSimpleRetriever(
{
name: 'myDatabase',
configSchema: z
.object({
limit: z.number().optional(),
})
.optional(),
// we'll extract "message" from the returned email item
content: 'message',
// and several keys to use as metadata
metadata: ['from', 'to', 'subject'],
},
async (query, config) => {
const result = await searchEmails(query.text, { limit: config?.limit });
return result.data.emails;
},
);
```
### Custom Retrievers
[Section titled “Custom Retrievers”](#custom-retrievers)
```ts
import { CommonRetrieverOptionsSchema } from 'genkit/retriever';
import { z } from 'genkit';
export const menuRetriever = devLocalRetrieverRef('menuQA');
const advancedMenuRetrieverOptionsSchema = CommonRetrieverOptionsSchema.extend({
preRerankK: z.number().max(1000),
});
const advancedMenuRetriever = ai.defineRetriever(
{
name: `custom/advancedMenuRetriever`,
configSchema: advancedMenuRetrieverOptionsSchema,
},
async (input, options) => {
const extendedPrompt = await extendPrompt(input);
const docs = await ai.retrieve({
retriever: menuRetriever,
query: extendedPrompt,
options: { k: options.preRerankK || 10 },
});
const rerankedDocs = await rerank(docs);
return rerankedDocs.slice(0, options.k || 3);
},
);
```
(`extendPrompt` and `rerank` are functions you would need to implement yourself; they are not provided by the framework.)
You can then swap out your retriever:
```ts
const docs = await ai.retrieve({
retriever: advancedMenuRetriever,
query: input,
options: { preRerankK: 7, k: 3 },
});
```
### Rerankers and Two-Stage Retrieval
[Section titled “Rerankers and Two-Stage Retrieval”](#rerankers-and-two-stage-retrieval)
A reranking model — also known as a cross-encoder — is a type of model that, given a query and a document, will output a similarity score. We use this score to reorder the documents by relevance to our query. Reranker APIs take a list of documents (for example, the output of a retriever) and reorder them based on their relevance to the query. This step can be useful for fine-tuning the results and ensuring that the most pertinent information is used in the prompt provided to a generative model.
#### Reranker Example
[Section titled “Reranker Example”](#reranker-example)
A reranker in Genkit is defined with a syntax similar to retrievers and indexers. Here is an example of using a reranker in Genkit. This flow reranks a set of documents based on their relevance to the provided query using a predefined Vertex AI reranker.
```ts
const FAKE_DOCUMENT_CONTENT = [
'pythagorean theorem',
'e=mc^2',
'pi',
'dinosaurs',
'quantum mechanics',
'pizza',
'harry potter',
];
export const rerankFlow = ai.defineFlow(
{
name: 'rerankFlow',
inputSchema: z.object({ query: z.string() }),
outputSchema: z.array(
z.object({
text: z.string(),
score: z.number(),
}),
),
},
async ({ query }) => {
const documents = FAKE_DOCUMENT_CONTENT.map((text) => ({ content: text }));
const rerankedDocuments = await ai.rerank({
reranker: 'vertexai/semantic-ranker-512',
query: { content: query },
documents,
});
return rerankedDocuments.map((doc) => ({
text: doc.content,
score: doc.metadata.score,
}));
},
);
```
This example uses the Vertex AI Genkit plugin with the `semantic-ranker-512` model to score and rank documents. The higher the score, the more relevant the document is to the query.
#### Custom Rerankers
[Section titled “Custom Rerankers”](#custom-rerankers)
You can also define custom rerankers to suit your specific use case. This is helpful when you need to rerank documents using your own custom logic or a custom model. Here’s a simple example of defining a custom reranker:
```ts
export const customReranker = ai.defineReranker(
{
name: 'custom/reranker',
configSchema: z.object({
k: z.number().optional(),
}),
},
async (query, documents, options) => {
// Your custom reranking logic here
const rerankedDocs = documents.map((doc) => {
const score = Math.random(); // Assign random scores for demonstration
return {
...doc,
metadata: { ...doc.metadata, score },
};
});
return rerankedDocs.sort((a, b) => b.metadata.score - a.metadata.score).slice(0, options.k || 3);
},
);
```
Once defined, this custom reranker can be used just like any other reranker in your RAG flows, giving you flexibility to implement advanced reranking strategies.
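For instance, you could swap the custom reranker into the earlier `rerankFlow` by passing it by reference. Here is a minimal sketch mirroring that flow, where `query` and `documents` are the same values used there:
```ts
const rerankedDocuments = await ai.rerank({
  reranker: customReranker,
  query: { content: query },
  documents,
  options: { k: 3 },
});
```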
# Tool calling
> Learn how to enable LLMs to interact with external applications and data using Genkit's tool calling feature, covering tool definition, usage, and advanced scenarios.
*Tool calling*, also known as *function calling*, is a structured way to give LLMs the ability to make requests back to the calling application. You define the tools you want to make available to the model, and the model makes tool requests to your app as necessary to fulfill the prompts you give it.
The use cases of tool calling generally fall into a few themes:
**Giving an LLM access to information it wasn’t trained with**
* Frequently changing information, such as a stock price or the current weather.
* Information specific to your app domain, such as product information or user profiles.
Note the overlap with [retrieval augmented generation](/docs/rag) (RAG), which is also a way to let an LLM integrate factual information into its generations. RAG is a heavier solution that is most suited when you have a large amount of information or the information that’s most relevant to a prompt is ambiguous. On the other hand, if retrieving the information the LLM needs is a simple function call or database lookup, tool calling is more appropriate.
**Introducing a degree of determinism into an LLM workflow**
* Performing calculations that the LLM cannot reliably complete itself.
* Forcing an LLM to generate verbatim text under certain circumstances, such as when responding to a question about an app’s terms of service.
**Performing an action when initiated by an LLM**
* Turning on and off lights in an LLM-powered home assistant
* Making table reservations in an LLM-powered restaurant agent
## Before you begin
[Section titled “Before you begin”](#before-you-begin)
If you want to run the code examples on this page, first complete the steps in the [Getting started](/docs/get-started) guide. All of the examples assume that you have already set up a project with Genkit dependencies installed.
This page discusses one of the advanced features of Genkit model abstraction, so before you dive too deeply, you should be familiar with the content on the [Generating content with AI models](/docs/models) page. You should also be familiar with Genkit’s system for defining input and output schemas, which is discussed on the [Flows](/docs/flows) page.
## Overview of tool calling
[Section titled “Overview of tool calling”](#overview-of-tool-calling)
[Genkit by Example: Tool Calling ](https://examples.genkit.dev/tool-calling?utm_source=genkit.dev\&utm_content=contextlink)See how Genkit can enable rich UI for tool calling in a live demo.
At a high level, this is what a typical tool-calling interaction with an LLM looks like:
1. The calling application prompts the LLM with a request and also includes in the prompt a list of tools the LLM can use to generate a response.
2. The LLM either generates a complete response or generates a tool call request in a specific format.
3. If the caller receives a complete response, the request is fulfilled and the interaction ends; but if the caller receives a tool call, it performs whatever logic is appropriate and sends a new request to the LLM containing the original prompt or some variation of it as well as the result of the tool call.
4. The LLM handles the new prompt as in Step 2.
For this to work, several requirements must be met:
* The model must be trained to make tool requests when a tool is needed to complete a prompt. Most of the larger models provided through web APIs, such as Gemini and Claude, can do this, but smaller and more specialized models often cannot. Genkit will throw an error if you try to provide tools to a model that doesn’t support it.
* The calling application must provide tool definitions to the model in the format it expects.
* The calling application must prompt the model to generate tool calling requests in the format the application expects.
## Tool calling with Genkit
[Section titled “Tool calling with Genkit”](#tool-calling-with-genkit)
Genkit provides a single interface for tool calling with models that support it. Each model plugin ensures that the last two of the above criteria are met, and the Genkit instance’s `generate()` function automatically carries out the tool calling loop described earlier.
### Model support
[Section titled “Model support”](#model-support)
Tool calling support depends on the model, the model API, and the Genkit plugin. Consult the relevant documentation to determine if tool calling is likely to be supported. In addition:
* Genkit will throw an error if you try to provide tools to a model that doesn’t support it.
* If the plugin exports model references, the `info.supports.tools` property indicates whether the model supports tool calling, as in the sketch below.
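For example, you might inspect a model reference before deciding whether to pass it tools. Here is a minimal sketch, assuming the plugin populates `info` on the reference:
```ts
import { googleAI } from '@genkit-ai/googleai';

const modelRef = googleAI.model('gemini-2.5-flash');
if (modelRef.info?.supports?.tools) {
  // Safe to include tools in generate() calls that use this model.
  console.log(`${modelRef.name} supports tool calling.`);
}
```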
### Defining tools
[Section titled “Defining tools”](#defining-tools)
Use the Genkit instance’s `defineTool()` function to write tool definitions:
```ts
import { genkit, z } from 'genkit';
import { googleAI } from '@genkit-ai/googleai';
const ai = genkit({
plugins: [googleAI()],
model: googleAI.model('gemini-2.5-flash'),
});
const getWeather = ai.defineTool(
{
name: 'getWeather',
description: 'Gets the current weather in a given location',
inputSchema: z.object({
location: z.string().describe('The location to get the current weather for'),
}),
outputSchema: z.string(),
},
async (input) => {
// Here, we would typically make an API call or database query. For this
// example, we just return a fixed value.
return `The current weather in ${input.location} is 63°F and sunny.`;
},
);
```
The syntax here looks just like the `defineFlow()` syntax; however, the `name`, `description`, and `inputSchema` parameters are required. When writing a tool definition, take special care with the wording and descriptiveness of these parameters. They are vital for the LLM to make effective use of the available tools.
### Using tools
[Section titled “Using tools”](#using-tools)
Include defined tools in your prompts to generate content.
* Generate
```ts
const response = await ai.generate({
prompt: "What is the weather in Baltimore?",
tools: [getWeather],
});
```
* definePrompt
```ts
const weatherPrompt = ai.definePrompt(
{
name: "weatherPrompt",
tools: [getWeather],
},
"What is the weather in {{location}}?"
);
const response = await weatherPrompt({ location: "Baltimore" });
```
* Prompt file
```dotprompt
---
tools: [getWeather]
input:
schema:
location: string
---
What is the weather in {{location}}?
```
Then you can execute the prompt in your code as follows:
```ts
// assuming prompt file is named weatherPrompt.prompt
const weatherPrompt = ai.prompt("weatherPrompt");
const response = await weatherPrompt({ location: "Baltimore" });
```
* Chat
```ts
const chat = ai.chat({
system: "Answer questions using the tools you have.",
tools: [getWeather],
});
const response = await chat.send("What is the weather in Baltimore?");
// Or, specify tools that are message-specific
const responseWithTools = await chat.send({
prompt: "What is the weather in Baltimore?",
tools: [getWeather],
});
```
### Streaming and Tool Calling
[Section titled “Streaming and Tool Calling”](#streaming-and-tool-calling)
When combining tool calling with streaming responses, you will receive `toolRequest` and `toolResponse` content parts in the chunks of the stream. For example, the following code:
```ts
const { stream } = ai.generateStream({
prompt: "What is the weather in Baltimore?",
tools: [getWeather],
});
for await (const chunk of stream) {
console.log(chunk);
}
```
Might produce a sequence of chunks similar to:
```ts
{index: 0, role: "model", content: [{text: "Okay, I'll check the weather"}]}
{index: 0, role: "model", content: [{text: "for Baltimore."}]}
// toolRequests will be emitted as a single chunk by most models
{index: 0, role: "model", content: [{toolRequest: {name: "getWeather", input: {location: "Baltimore"}}}]}
// when streaming multiple messages, Genkit increments the index and indicates the new role
{index: 1, role: "tool", content: [{toolResponse: {name: "getWeather", output: "Temperature: 68 degrees\nStatus: Cloudy."}}]}
{index: 2, role: "model", content: [{text: "The weather in Baltimore is 68 degrees and cloudy."}]}
```
You can use these chunks to dynamically construct the full generated message sequence.
### Limiting Tool Call Iterations with `maxTurns`
[Section titled “Limiting Tool Call Iterations with maxTurns”](#limiting-tool-call-iterations-with-maxturns)
When working with tools that might trigger multiple sequential calls, you can control resource usage and prevent runaway execution using the `maxTurns` parameter. This sets a hard limit on how many back-and-forth interactions the model can have with your tools in a single generation cycle.
**Why use maxTurns?**
* **Cost Control**: Prevents unexpected API usage charges from excessive tool calls
* **Performance**: Ensures responses complete within reasonable timeframes
* **Safety**: Guards against infinite loops in complex tool interactions
* **Predictability**: Makes your application behavior more deterministic
The default value is 5 turns, which works well for most scenarios. Each “turn” represents one complete cycle where the model can make tool calls and receive responses.
**Example: Web Research Agent**
Consider a research agent that might need to search multiple times to find comprehensive information:
```ts
const webSearch = ai.defineTool(
{
name: 'webSearch',
description: 'Search the web for current information',
inputSchema: z.object({
query: z.string().describe('Search query'),
}),
outputSchema: z.string(),
},
async (input) => {
// Simulate web search API call
return `Search results for "${input.query}": [relevant information here]`;
},
);
const response = await ai.generate({
prompt: 'Research the latest developments in quantum computing, including recent breakthroughs, key companies, and future applications.',
tools: [webSearch],
maxTurns: 8, // Allow up to 8 research iterations
});
```
**Example: Financial Calculator**
Here’s a more complex scenario where an agent might need multiple calculation steps:
```ts
const calculator = ai.defineTool(
{
name: 'calculator',
description: 'Perform mathematical calculations',
inputSchema: z.object({
expression: z.string().describe('Mathematical expression to evaluate'),
}),
outputSchema: z.number(),
},
async (input) => {
// Safe evaluation of mathematical expressions
return eval(input.expression); // In production, use a safe math parser
},
);
const stockAnalyzer = ai.defineTool(
{
name: 'stockAnalyzer',
description: 'Get current stock price and basic metrics',
inputSchema: z.object({
symbol: z.string().describe('Stock symbol (e.g., AAPL)'),
}),
outputSchema: z.object({
price: z.number(),
change: z.number(),
volume: z.number(),
}),
},
async (input) => {
// Simulate stock API call
return {
price: 150.25,
change: 2.50,
volume: 45000000
};
},
);
```
* Generate
```typescript
const response = await ai.generate({
prompt: 'Calculate the total value of my portfolio: 100 shares of AAPL, 50 shares of GOOGL, and 200 shares of MSFT. Also calculate what percentage each holding represents.',
tools: [calculator, stockAnalyzer],
maxTurns: 12, // Multiple stock lookups + calculations needed
});
```
* definePrompt
```typescript
const portfolioAnalysisPrompt = ai.definePrompt(
{
name: "portfolioAnalysis",
tools: [calculator, stockAnalyzer],
maxTurns: 12,
},
"Calculate the total value of my portfolio: {{holdings}}. Also calculate what percentage each holding represents."
);
const response = await portfolioAnalysisPrompt({
holdings: "100 shares of AAPL, 50 shares of GOOGL, and 200 shares of MSFT"
});
```
* Prompt file
```dotprompt
---
tools: [calculator, stockAnalyzer]
maxTurns: 12
input:
schema:
holdings: string
---
Calculate the total value of my portfolio: {{holdings}}. Also calculate what percentage each holding represents.
```
Then execute the prompt:
```typescript
const portfolioAnalysisPrompt = ai.prompt("portfolioAnalysis");
const response = await portfolioAnalysisPrompt({
holdings: "100 shares of AAPL, 50 shares of GOOGL, and 200 shares of MSFT"
});
```
* Chat
```typescript
const chat = ai.chat({
system: "You are a financial analysis assistant. Use the available tools to provide accurate calculations and current market data.",
tools: [calculator, stockAnalyzer],
maxTurns: 12,
});
const response = await chat.send("Calculate the total value of my portfolio: 100 shares of AAPL, 50 shares of GOOGL, and 200 shares of MSFT. Also calculate what percentage each holding represents.");
```
**What happens when maxTurns is reached?**
When the limit is hit, Genkit stops the tool-calling loop and returns the model’s current response, even if it was in the middle of using tools. The model will typically provide a partial answer or explain that it couldn’t complete all the requested operations.
### Dynamically defining tools at runtime
[Section titled “Dynamically defining tools at runtime”](#dynamically-defining-tools-at-runtime)
Like most things in Genkit, tools are normally predefined during your app’s initialization, so that you can interact with them from the Genkit Dev UI; this is the recommended approach. However, there are scenarios where a tool must be defined dynamically, per user request.
You can define tools dynamically using the `ai.dynamicTool` function. It is very similar to the `ai.defineTool` method; however, dynamic tools are not tracked by the Genkit runtime, so they cannot be interacted with from the Genkit Dev UI and must be passed to the `ai.generate()` call by reference (for regular tools, you can also use a string tool name).
```ts
import { genkit, z } from 'genkit';
import { googleAI } from '@genkit-ai/googleai';
const ai = genkit({
plugins: [googleAI()],
model: googleAI.model('gemini-2.5-flash'),
});
ai.defineFlow('weatherFlow', async () => {
const getWeather = ai.dynamicTool(
{
name: 'getWeather',
description: 'Gets the current weather in a given location',
inputSchema: z.object({
location: z.string().describe('The location to get the current weather for'),
}),
outputSchema: z.string(),
},
async (input) => {
return `The current weather in ${input.location} is 63°F and sunny.`;
},
);
const { text } = await ai.generate({
prompt: 'What is the weather in Baltimore?',
tools: [getWeather],
});
return text;
});
```
When defining dynamic tools, you can specify input and output schemas either with Zod, as shown in the previous example, or by passing in a manually constructed JSON Schema.
```ts
const getWeather = ai.dynamicTool(
{
name: 'getWeather',
description: 'Gets the current weather in a given location',
inputJsonSchema: myInputJsonSchema,
outputJsonSchema: myOutputJsonSchema,
},
async (input) => {
/* ... */
},
);
```
Dynamic tools don’t require an implementation function. If you don’t pass one in, the tool behaves like an [interrupt](/docs/interrupts), and you can handle tool calls manually:
```ts
const getWeather = ai.dynamicTool({
name: 'getWeather',
description: 'Gets the current weather in a given location',
inputJsonSchema: myInputJsonSchema,
outputJsonSchema: myOutputJsonSchema,
});
```
### Pause the tool loop by using interrupts
[Section titled “Pause the tool loop by using interrupts”](#pause-the-tool-loop-by-using-interrupts)
By default, Genkit repeatedly calls the LLM until every tool call has been resolved. You can conditionally pause execution in situations where you want to, for example:
* Ask the user a question or display UI.
* Confirm a potentially risky action with the user.
* Request out-of-band approval for an action.
**Interrupts** are special tools that can halt the loop and return control to your code so that you can handle more advanced scenarios. Visit the [interrupts guide](/docs/interrupts) to learn how to use them.
### Explicitly handling tool calls
[Section titled “Explicitly handling tool calls”](#explicitly-handling-tool-calls)
If you want full control over this tool-calling loop, for example to apply more complicated logic, set the `returnToolRequests` parameter to `true`. It then becomes your responsibility to make sure all of the tool requests are fulfilled:
```ts
const getWeather = ai.defineTool(
{
// ... tool definition ...
},
async ({ location }) => {
// ... tool implementation ...
},
);
const generateOptions: GenerateOptions = {
prompt: "What's the weather like in Baltimore?",
tools: [getWeather],
returnToolRequests: true,
};
let llmResponse;
while (true) {
llmResponse = await ai.generate(generateOptions);
const toolRequests = llmResponse.toolRequests;
if (toolRequests.length < 1) {
break;
}
const toolResponses: ToolResponsePart[] = await Promise.all(
toolRequests.map(async (part) => {
switch (part.toolRequest.name) {
case 'getWeather':
return {
toolResponse: {
name: part.toolRequest.name,
ref: part.toolRequest.ref,
output: await getWeather(part.toolRequest.input),
},
};
default:
throw Error('Tool not found');
}
}),
);
generateOptions.messages = llmResponse.messages;
generateOptions.prompt = toolResponses;
}
```