Announcing Genkit Middleware: Intercept, extend, and harden your agentic apps

May 14, 2026 - 4 min read

Chris Gill

Product Manager at Google

Genkit is an open-source framework for building full-stack, AI-powered, agentic applications on any platform, with support for TypeScript, Go, Dart, and Python. Building production-ready agentic applications and AI features requires more than powerful models and careful prompting. You might need retries and fallbacks for reliability, human approval before destructive tool calls, and observability across every layer.

Genkit solves this with middleware: composable hooks that intercept generation calls, including the tool execution loop, and inject custom behaviors. The middleware system is available today in TypeScript, Go, and Dart, with Python support coming soon.

How Genkit middleware works

Every generate() call in Genkit runs a tool loop: the model produces output, any requested tools execute, the results feed back into a new model call, and the cycle repeats until the model is done. Middleware hooks attach at three layers of this loop:

Hook	When it runs	Typical use
Generate	Once per tool-loop iteration	Context injection, message rewriting, conversation-level logic
Model	Once per model API call	Retry, fallback, caching, latency logging
Tool	Once per tool execution	Human-in-the-loop, sandboxing, per-tool logging
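To make the three layers concrete, here is a minimal sketch of a tool loop with one hook per layer. It is a toy model, not Genkit's implementation: every name in it (runToolLoop, Hooks, fakeModel, getTime) is hypothetical, and it is synchronous for brevity, whereas real hooks are asynchronous.

```typescript
// Toy model of the tool loop. All names are hypothetical, not Genkit's API.
type ToolCall = { name: string; input: string };
type ModelOutput = { text: string; toolCalls: ToolCall[] };
type Hooks = {
  generate?: (iteration: number) => void;
  model?: (call: () => ModelOutput) => ModelOutput;
  tool?: (request: ToolCall, run: (r: ToolCall) => string) => string;
};

function runToolLoop(
  prompt: string,
  model: (history: string[]) => ModelOutput,
  tools: Record<string, (input: string) => string>,
  hooks: Hooks,
  maxTurns: number = 5,
): string {
  const history: string[] = [prompt];
  for (let turn = 0; turn < maxTurns; turn++) {
    hooks.generate?.(turn); // Generate layer: once per loop iteration
    const callModel = () => model(history);
    // Model layer: wraps each model call
    const output = hooks.model ? hooks.model(callModel) : callModel();
    if (output.toolCalls.length === 0) return output.text; // model is done
    for (const request of output.toolCalls) {
      const runTool = (r: ToolCall): string => {
        const tool = tools[r.name];
        if (!tool) throw new Error(`unknown tool: ${r.name}`);
        return tool(r.input);
      };
      // Tool layer: wraps each tool execution
      const result = hooks.tool ? hooks.tool(request, runTool) : runTool(request);
      history.push(`${request.name} -> ${result}`); // feed result back to the model
    }
  }
  throw new Error("tool loop did not finish");
}

// A fake model that requests one tool, then answers.
const hookLog: string[] = [];
const fakeModel = (history: string[]): ModelOutput =>
  history.length === 1
    ? { text: "", toolCalls: [{ name: "getTime", input: "UTC" }] }
    : { text: `Done after: ${history[1]}`, toolCalls: [] };

const loopAnswer = runToolLoop(
  "What time is it?",
  fakeModel,
  { getTime: (zone) => `12:00 ${zone}` },
  {
    generate: (turn) => { hookLog.push(`generate#${turn}`); },
    model: (call) => { hookLog.push("model"); return call(); },
    tool: (request, run) => { hookLog.push(`tool:${request.name}`); return run(request); },
  },
);
```

In this run the Generate and Model hooks each fire twice (one iteration that requests a tool, one that produces the final answer), and the Tool hook fires once.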

Pre-built middleware

Genkit ships several pre-built middleware for common use cases. Here’s what’s available today:

1. Retry

Automatically retries failed model API calls on transient errors (RESOURCE_EXHAUSTED, UNAVAILABLE, etc.) using exponential backoff with jitter. Only the model call is retried; the surrounding tool loop is not replayed.

const response = await ai.generate({
  model: googleAI.model('gemini-flash-latest'),
  prompt: 'Summarize the quarterly earnings report.',
  use: [
    retry({ maxRetries: 3, initialDelayMs: 1000, backoffFactor: 2 }),
  ],
});

resp, err := genkit.Generate(ctx, g,
    ai.WithModelName("googleai/gemini-flash-latest"),
    ai.WithPrompt("Summarize the quarterly earnings report."),
    ai.WithUse(&middleware.Retry{
        MaxRetries:     3,
        InitialDelayMs: 1000,
        BackoffFactor:  2,
    }),
)

final response = await ai.generate(
  model: googleAI.gemini('gemini-flash-latest'),
  prompt: 'Summarize the quarterly earnings report.',
  use: [
    retry(maxRetries: 3, initialDelayMs: 1000, backoffFactor: 2),
  ],
);
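For intuition about what these settings mean, the sketch below computes the wait schedule implied by maxRetries: 3, initialDelayMs: 1000, and backoffFactor: 2. The helper names are hypothetical, and the jitter strategy shown (a uniform pick below the base delay) is an assumption; Genkit's actual jitter formula may differ.

```typescript
// Hypothetical helpers for illustration; not part of the Genkit API.

// Un-jittered wait before each retry: initialDelayMs * backoffFactor^attempt.
function baseDelaysMs(maxRetries: number, initialDelayMs: number, backoffFactor: number): number[] {
  const delays: number[] = [];
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    delays.push(initialDelayMs * Math.pow(backoffFactor, attempt));
  }
  return delays;
}

// Assumed "full jitter": a uniform pick in [0, baseMs). The random source is
// injectable so the function can be exercised deterministically.
function withJitter(baseMs: number, random: () => number = Math.random): number {
  return Math.floor(random() * baseMs);
}

const retrySchedule = baseDelaysMs(3, 1000, 2); // [1000, 2000, 4000]
const jitteredWaits = retrySchedule.map((baseMs) => withJitter(baseMs));
```

Jitter spreads retries from many clients over time, so they do not all hit a recovering service at the same instant.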

2. Fallback

Switches to an alternative model when the primary model fails on a specified set of error codes. Useful for falling back to a smaller, faster model, or a different provider entirely, when your primary model exceeds its quota.

const response = await ai.generate({
  model: googleAI.model('gemini-pro-latest'),
  prompt: 'Analyze this complex document...',
  use: [
    fallback({
      models: [googleAI.model('gemini-flash-latest')],
      statuses: ['RESOURCE_EXHAUSTED'],
    }),
  ],
});

resp, err := genkit.Generate(ctx, g,
    ai.WithModelName("googleai/gemini-pro-latest"),
    ai.WithPrompt("Analyze this complex document..."),
    ai.WithUse(&middleware.Fallback{
        Models: []ai.ModelRef{
            googlegenai.ModelRef("googleai/gemini-flash-latest", nil),
        },
        Statuses: []core.StatusName{core.RESOURCE_EXHAUSTED},
    }),
)

Fallback isn’t available in the Dart SDK yet.
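The selection logic can be sketched as follows: try the primary model, and move to the next candidate only when the failure status is on the configured list. This is a simplified, synchronous model with hypothetical names (ModelResult, generateWithFallback); it represents failures as values rather than thrown errors and is not Genkit's implementation.

```typescript
// Hypothetical types and helper for illustration; not part of the Genkit API.
type ModelResult = { ok: true; text: string } | { ok: false; status: string };
type FakeModel = (prompt: string) => ModelResult;

function generateWithFallback(
  prompt: string,
  primary: FakeModel,
  fallbacks: FakeModel[],
  statuses: string[],
): ModelResult {
  let outcome: ModelResult = primary(prompt);
  for (const candidate of fallbacks) {
    // Stop on success, or on a failure that is not configured for fallback.
    if (outcome.ok || statuses.indexOf(outcome.status) === -1) break;
    outcome = candidate(prompt);
  }
  return outcome;
}

const overQuota: FakeModel = () => ({ ok: false, status: "RESOURCE_EXHAUSTED" });
const badRequest: FakeModel = () => ({ ok: false, status: "INVALID_ARGUMENT" });
const flash: FakeModel = (prompt) => ({ ok: true, text: `flash: ${prompt}` });

// Quota errors fall through to the second model.
const recovered = generateWithFallback("hi", overQuota, [flash], ["RESOURCE_EXHAUSTED"]);
// Other errors are returned as-is, since retrying elsewhere would not help.
const notRecovered = generateWithFallback("hi", badRequest, [flash], ["RESOURCE_EXHAUSTED"]);
```

Limiting fallback to specific statuses keeps genuine request errors visible instead of masking them behind a second model.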

3. Tool approval

Restricts tool execution to an allow-list. Any tool not on the list triggers an interrupt, enabling human-in-the-loop confirmation before the action proceeds.

// 1. Initial attempt — an empty allow-list interrupts every tool call.
const response = await ai.generate({
  prompt: 'Delete the temp files',
  tools: [deleteFilesTool],
  use: [toolApproval({ approved: [] })],
});

if (response.finishReason === 'interrupted') {
  const interrupt = response.interrupts[0];

  // 2. Prompt the user for approval, then recreate the tool request.
  const approvedPart = restartTool(interrupt, { toolApproved: true });

  // 3. Resume execution with the approval.
  const resumed = await ai.generate({
    messages: response.messages,
    resume: { restart: [approvedPart] },
    use: [toolApproval({ approved: [] })],
  });
}

resp, _ := genkit.Generate(ctx, g,
    ai.WithPrompt("Delete the temp files"),
    ai.WithTools(deleteFilesTool),
    ai.WithUse(&middleware.ToolApproval{
        AllowedTools: []string{}, // empty = every tool call interrupts
    }),
)

if len(resp.Interrupts()) > 0 {
    interrupt := resp.Interrupts()[0]

    // Prompt the user for approval, then resume with the approval flag.
    approved, _ := deleteFilesTool.RestartWith(interrupt,
        ai.WithResumedMetadata[DeleteInput](map[string]any{"toolApproved": true}),
    )

    resumed, err := genkit.Generate(ctx, g,
        ai.WithMessages(resp.History()...),
        ai.WithTools(deleteFilesTool),
        ai.WithToolRestarts(approved),
        ai.WithUse(&middleware.ToolApproval{}),
    )
    if err != nil {
        log.Fatal(err)
    }
    fmt.Println(resumed.Text())
}

// 1. Initial attempt — an empty allow-list interrupts every tool call.
final response = await ai.generate(
  prompt: 'Delete the temp files',
  tools: [deleteFilesTool],
  use: [toolApproval(approved: [])],
);

if (response.finishReason == FinishReason.interrupted) {
  final part = response.interrupts.first;

  // 2. Ask the user for approval, then resume execution.
  final resumed = await ai.generate(
    messages: response.messages,
    resume: [InterruptResponse(part, true)],
    use: [toolApproval(approved: [])],
  );
}
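The examples above resume with the same empty allow-list, which raises the question of why the second attempt is not interrupted again. One plausible reading, sketched below, is that the check also honors the approval flag carried on the restarted tool request. The decideToolCall function and ToolRequestInfo type are hypothetical, and the exact rule Genkit applies is an assumption here.

```typescript
// Hypothetical decision function for illustration; not part of the Genkit API.
type ToolRequestInfo = { name: string; metadata?: { toolApproved?: boolean } };

function decideToolCall(request: ToolRequestInfo, approved: string[]): "run" | "interrupt" {
  // Tools on the allow-list always run.
  if (approved.indexOf(request.name) !== -1) return "run";
  // Assumed: a request restarted with an explicit approval flag also runs.
  if (request.metadata?.toolApproved === true) return "run";
  // Everything else pauses for human confirmation.
  return "interrupt";
}

const firstAttempt = decideToolCall({ name: "deleteFiles" }, []);
const afterApproval = decideToolCall({ name: "deleteFiles", metadata: { toolApproved: true } }, []);
const allowListed = decideToolCall({ name: "listFiles" }, ["listFiles"]);
```

Under this reading, approval is scoped to the single restarted request, so later calls to the same tool would still interrupt.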

4. Skills

Scans a directory for SKILL.md files and injects their content into the system prompt. Also exposes a use_skill tool so the model can load specific skills on demand.

const response = await ai.generate({
  prompt: 'How do I deploy this service?',
  use: [skills({ skillPaths: ['./skills'] })],
});

resp, err := genkit.Generate(ctx, g,
    ai.WithPrompt("How do I deploy this service?"),
    ai.WithUse(&middleware.Skills{SkillPaths: []string{"./skills"}}),
)

final response = await ai.generate(
  prompt: 'How do I deploy this service?',
  use: [skills(skillPaths: ['./skills'])],
);

5. Filesystem

Gives the model scoped access to the local filesystem through injected tools (list_files, read_file, plus write_file and edit_file when writes are enabled). Path safety is enforced so the model can never escape the root directory.

const response = await ai.generate({
  prompt: 'Create a hello world program in the workspace',
  use: [filesystem({ rootDirectory: './workspace', allowWriteAccess: true })],
});

resp, err := genkit.Generate(ctx, g,
    ai.WithPrompt("Create a hello world program in the workspace"),
    ai.WithUse(&middleware.Filesystem{
        RootDir:          "./workspace",
        AllowWriteAccess: true,
    }),
)

final response = await ai.generate(
  prompt: 'Create a hello world program in the workspace',
  use: [filesystem(rootDirectory: './workspace')],
);
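As an illustration of what path safety involves, the sketch below resolves a model-supplied path against a root and rejects anything that would land outside it. It is a conservative, string-only check under stated assumptions: the function name resolveInsideRoot is hypothetical, paths are POSIX-style, the root is absolute, and symlinks are not considered. It is not Genkit's actual implementation.

```typescript
// Hypothetical helper for illustration; not part of the Genkit API.
// Returns the normalized absolute path, or null if the request is unsafe.
function resolveInsideRoot(root: string, requested: string): string | null {
  // Absolute paths are rejected outright rather than re-rooted.
  if (requested.charAt(0) === "/") return null;

  const rootParts = root.split("/").filter((part) => part !== "" && part !== ".");
  const parts = rootParts.slice();
  for (const segment of requested.split("/")) {
    if (segment === "" || segment === ".") continue;
    if (segment === "..") {
      // Popping at root depth would climb above the root directory.
      if (parts.length === rootParts.length) return null;
      parts.pop();
    } else {
      parts.push(segment);
    }
  }
  return "/" + parts.join("/");
}

const insidePath = resolveInsideRoot("/srv/workspace", "src/../hello.ts");
const escapedPath = resolveInsideRoot("/srv/workspace", "../secrets.txt");
```

A production check would also need to resolve symlinks on disk, since a link inside the root can point anywhere.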

Building custom middleware

The pre-built middleware covers common scenarios, but the real power of the system is in writing your own. Imagine you’re building an agentic customer-support app and need to ensure the model never mentions competitor products or internal pricing data. Rather than encoding these rules in every prompt, you can enforce them deterministically with middleware.

Custom middleware follows a simple contract across all languages: provide a name and a factory that returns the hooks you want. The factory runs once per generate() invocation, and you implement only the hooks you need.

Here’s a complete, custom content filter that rejects any response containing a forbidden term:

import { generateMiddleware, z } from 'genkit';

export const contentFilter = generateMiddleware(
  {
    name: 'contentFilter',
    description: 'Rejects model responses containing forbidden terms',
    configSchema: z.object({ forbiddenTerms: z.array(z.string()) }),
  },
  ({ config }) => ({
    model: async (req, ctx, next) => {
      const resp = await next(req, ctx);
      const text = resp.text.toLowerCase();
      for (const term of config?.forbiddenTerms ?? []) {
        if (text.includes(term.toLowerCase())) {
          throw new Error(`content filter: response contains "${term}"`);
        }
      }
      return resp;
    },
  }),
);

// ContentFilter rejects model responses containing any forbidden term.
type ContentFilter struct {
    ForbiddenTerms []string `json:"forbiddenTerms"`
}

func (ContentFilter) Name() string { return "app/contentFilter" }

func (f ContentFilter) New(ctx context.Context) (*ai.Hooks, error) {
    return &ai.Hooks{
        WrapModel: func(ctx context.Context, p *ai.ModelParams, next ai.ModelNext) (*ai.ModelResponse, error) {
            resp, err := next(ctx, p)
            if err != nil {
                return nil, err
            }
            text := strings.ToLower(resp.Text())
            for _, term := range f.ForbiddenTerms {
                if strings.Contains(text, strings.ToLower(term)) {
                    return nil, fmt.Errorf("content filter: response contains %q", term)
                }
            }
            return resp, nil
        },
    }, nil
}

// Reject model responses containing any forbidden term.
class ContentFilter extends GenerateMiddleware {
  final List<String> forbiddenTerms;

  ContentFilter({this.forbiddenTerms = const []});

  @override
  Future model(ModelRequest request, dynamic ctx, dynamic next) async {
    final response = await next(request, ctx);
    final text = response.text.toLowerCase();
    for (final term in forbiddenTerms) {
      if (text.contains(term.toLowerCase())) {
        throw Exception('content filter: response contains "$term"');
      }
    }
    return response;
  }
}

Register it with defineMiddleware to expose a contentFilter() helper you can drop into a use array. See the middleware docs for the full registration pattern.

You can also stack multiple middleware in a single call. They apply left to right, with the first entry as the outermost wrapper:

const response = await ai.generate({
  model: googleAI.model('gemini-flash-latest'),
  prompt: 'What CRM should our customer use?',
  use: [
    retry({ maxRetries: 3 }), // outer: retries the inner stack
    contentFilter({           // inner: validates model output
      forbiddenTerms: ['CompetitorCRM', 'RivalCo', 'internal price'],
    }),
  ],
});

resp, err := genkit.Generate(ctx, g,
    ai.WithModelName("googleai/gemini-flash-latest"),
    ai.WithPrompt("What CRM should our customer use?"),
    ai.WithUse(
        &middleware.Retry{MaxRetries: 3},           // outer: retries the inner stack
        &ContentFilter{                              // inner: validates model output
            ForbiddenTerms: []string{"CompetitorCRM", "RivalCo", "internal price"},
        },
    ),
)

final response = await ai.generate(
  model: googleAI.gemini('gemini-flash-latest'),
  prompt: 'What CRM should our customer use?',
  use: [
    retry(maxRetries: 3),                 // outer: retries the inner stack
    contentFilter(forbiddenTerms: const [ // inner: validates model output
      'CompetitorCRM', 'RivalCo', 'internal price',
    ]),
  ],
);

Here Retry wraps ContentFilter, which wraps the model call. Order matters, and Genkit makes it explicit.
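The wrapping order can be sketched with a small, generic composition helper. This is an illustration of the general pattern with hypothetical names (compose, tag, Handler), written synchronously for brevity, and not Genkit's internal code.

```typescript
// Hypothetical composition helper for illustration; not part of the Genkit API.
type Handler = (request: string) => string;
type Middleware = (request: string, next: Handler) => string;

function compose(middleware: Middleware[], core: Handler): Handler {
  // reduceRight wraps from the last entry inward, so middleware[0] ends up outermost.
  return middleware.reduceRight<Handler>(
    (next, current) => (request) => current(request, next),
    core,
  );
}

// A middleware that records when it is entered and exited.
const callTrace: string[] = [];
const tag = (label: string): Middleware => (request, next) => {
  callTrace.push(`${label}:before`);
  const response = next(request);
  callTrace.push(`${label}:after`);
  return response;
};

const stacked = compose([tag("retry"), tag("contentFilter")], (request) => {
  callTrace.push("model");
  return `echo ${request}`;
});
const stackedResponse = stacked("hi");
// callTrace: retry:before, contentFilter:before, model, contentFilter:after, retry:after
```

Because the outer entry sees everything the inner ones return or throw, swapping the order changes which middleware observes the other's failures.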

If you’ve built middleware that other developers would find useful, you can publish it as a package.

The Developer UI experience

You can use the Genkit Developer UI to inspect, test, and debug your application, including middleware execution. When you register middleware, it becomes visible in the Dev UI: you can inspect its configuration, trace execution through each hook layer, and test different combinations.

Get started

We’re excited about the capabilities that Genkit middleware unlocks for your apps, and we look forward to seeing the custom middleware you build for your own use cases. Check out the middleware documentation to dive deeper, or get started with Genkit if you’re new to the framework.

Have an idea for a new pre-built middleware? File an issue. We’d love to hear what would improve your development experience!

Happy coding! 🚀