Middleware

Genkit lets you use middleware to modify the behavior of genkit.Generate() calls. Common uses include retrying failed requests, falling back to different models, and injecting tools or context.

You can use pre-packaged middleware or build your own custom middleware.

The middleware framework is part of the core ai package, and the pre-packaged middleware ships in plugins/middleware. Both come with the core Genkit module:

go get github.com/firebase/genkit/go@latest

Register the Middleware plugin during genkit.Init to expose the built-ins to the Dev UI and to cross-runtime callers:

import (
	"context"

	"github.com/genkit-ai/genkit/go/ai"
	"github.com/genkit-ai/genkit/go/core"
	"github.com/genkit-ai/genkit/go/genkit"
	"github.com/genkit-ai/genkit/go/plugins/googlegenai"
	"github.com/genkit-ai/genkit/go/plugins/middleware"
)

ctx := context.Background()
g := genkit.Init(ctx, genkit.WithPlugins(
	&googlegenai.GoogleAI{},
	&middleware.Middleware{},
))

For pure Go programs that just attach middleware to a genkit.Generate() call, plugin registration is optional. Passing a middleware value directly to ai.WithUse invokes its New method on the local fast path without consulting the registry.

The plugins/middleware package provides several useful middleware options out of the box. This list represents the middleware built and maintained by the Genkit team, but there may also be community-built middleware available.

1. Retry middleware (Retry)

Automatically retries failed model API calls on transient error codes (such as RESOURCE_EXHAUSTED and UNAVAILABLE) using exponential backoff with jitter. Only the model API call is retried; the surrounding tool loop is not replayed.

resp, err := genkit.Generate(ctx, g,
	ai.WithModelName("googleai/gemini-flash-latest"),
	ai.WithPrompt("Heavy reasoning task..."),
	ai.WithUse(&middleware.Retry{
		MaxRetries:     3,
		InitialDelayMs: 1000,
		BackoffFactor:  2,
	}),
)

Configuration options:

  • MaxRetries (optional): The maximum number of times to retry a failed request (default: 3).
  • Statuses (optional): A list of core.StatusName values that should trigger a retry (default: UNAVAILABLE, DEADLINE_EXCEEDED, RESOURCE_EXHAUSTED, ABORTED, INTERNAL). Non-GenkitError errors such as network failures are always retried regardless of this list.
  • InitialDelayMs (optional): The initial delay between retries in milliseconds (default: 1000).
  • MaxDelayMs (optional): The upper bound on retry delay in milliseconds (default: 60000).
  • BackoffFactor (optional): The factor by which the delay increases after each retry (default: 2).
  • NoJitter (optional): If true, disables random jitter on the delay (default: false).
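
To make the interaction of InitialDelayMs, BackoffFactor, and MaxDelayMs concrete, here is a minimal standalone sketch of the un-jittered delay schedule those options suggest. It is not Genkit's implementation: the RetryConfig type and baseDelayMs function are hypothetical names for illustration, and the capped geometric formula is an assumption based on the option descriptions above. With jitter enabled, the real delays would vary around these values.

```go
package main

import (
	"fmt"
	"math"
)

// RetryConfig borrows the documented option names. The type itself is
// hypothetical and exists only for this sketch.
type RetryConfig struct {
	InitialDelayMs int
	MaxDelayMs     int
	BackoffFactor  float64
}

// baseDelayMs returns the un-jittered wait, in milliseconds, before retry
// number `attempt` (0-based). ASSUMPTION: the delay grows geometrically as
// InitialDelayMs * BackoffFactor^attempt and is capped at MaxDelayMs.
func baseDelayMs(c RetryConfig, attempt int) int {
	ms := float64(c.InitialDelayMs) * math.Pow(c.BackoffFactor, float64(attempt))
	if ms > float64(c.MaxDelayMs) {
		ms = float64(c.MaxDelayMs)
	}
	return int(ms)
}

func main() {
	// The documented defaults.
	c := RetryConfig{InitialDelayMs: 1000, MaxDelayMs: 60000, BackoffFactor: 2}
	for attempt := 0; attempt < 3; attempt++ {
		fmt.Printf("retry %d: wait about %d ms\n", attempt+1, baseDelayMs(c, attempt))
	}
}
```

Under these assumptions the three default retries wait roughly 1000, 2000, and 4000 ms, and MaxDelayMs only starts to matter from the seventh retry onward (64000 ms capped to 60000 ms).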

2. Fallback middleware (Fallback)

Automatically switches to a different model if the primary model fails on a fallback-eligible status. Useful for falling back to a smaller or faster model when a large model exceeds quota limits.

resp, err := genkit.Generate(ctx, g,
	ai.WithModelName("googleai/gemini-pro-latest"),
	ai.WithPrompt("Try the pro model first..."),
	ai.WithUse(&middleware.Fallback{
		Models: []ai.ModelRef{
			googlegenai.ModelRef("googleai/gemini-flash-latest", nil),
		},
		Statuses: []core.StatusName{core.RESOURCE_EXHAUSTED},
	}),
)

Configuration options:

  • Models (required): An ordered list of ai.ModelRef values to try after the primary fails. Each ref’s Config is used verbatim for that model; the original request’s config is not inherited. Use googlegenai.ModelRef (or the equivalent helper for your provider) to attach configuration.
  • Statuses (optional): A list of core.StatusName values that should trigger a fallback (default: UNAVAILABLE, DEADLINE_EXCEEDED, RESOURCE_EXHAUSTED, ABORTED, INTERNAL, NOT_FOUND, UNIMPLEMENTED).
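
The cascade described by Models and Statuses can be sketched with the standard library alone. This is an illustration of the idea, not Genkit's code: statusError, callFn, and generateWithFallback are hypothetical names, and the rule that a non-eligible error is surfaced immediately is an assumption inferred from the phrase "fallback-eligible status".

```go
package main

import (
	"errors"
	"fmt"
)

// statusError is a hypothetical error type that carries a status name.
type statusError struct{ Status string }

func (e *statusError) Error() string { return "model failed: " + e.Status }

// callFn stands in for a single model API call.
type callFn func(model string) (string, error)

// generateWithFallback tries primary, then each fallback in order. It moves
// to the next model only when the error carries a status listed in eligible;
// any other error is returned immediately (an assumption of this sketch).
func generateWithFallback(call callFn, primary string, fallbacks []string, eligible map[string]bool) (string, error) {
	models := append([]string{primary}, fallbacks...)
	var lastErr error
	for _, m := range models {
		out, err := call(m)
		if err == nil {
			return out, nil
		}
		lastErr = err
		var se *statusError
		if !errors.As(err, &se) || !eligible[se.Status] {
			return "", err
		}
	}
	return "", lastErr // every model failed with an eligible status
}

func main() {
	// A fake model call: "pro" is over quota, everything else succeeds.
	call := func(m string) (string, error) {
		if m == "pro" {
			return "", &statusError{Status: "RESOURCE_EXHAUSTED"}
		}
		return "answer from " + m, nil
	}
	out, err := generateWithFallback(call, "pro", []string{"flash"},
		map[string]bool{"RESOURCE_EXHAUSTED": true})
	fmt.Println(out, err)
}
```

Narrowing the eligible set, as the Statuses option does, means other failures stop the cascade instead of silently moving on to a different model.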

3. Tool approval middleware (ToolApproval)

Restricts tool execution to an allow list. Tools not in the list trigger a tool interrupt that you can resolve by prompting the user and then resuming with an explicit approval flag.

// 1. Initial attempt: any tool not in AllowedTools interrupts the call.
resp, err := genkit.Generate(ctx, g,
	ai.WithPrompt("write a file"),
	ai.WithTools(writeFileTool),
	ai.WithUse(&middleware.ToolApproval{
		AllowedTools: []string{}, // Empty list interrupts every tool call.
	}),
)
if err != nil {
	log.Fatal(err)
}
if resp.FinishReason == ai.FinishReasonInterrupted {
	interrupt := resp.Interrupts()[0]
	// 2. Ask the user for approval, then re-create the tool request with the approval flag.
	approved, err := writeFileTool.RestartWith(interrupt,
		ai.WithResumedMetadata[WriteFileInput](map[string]any{"toolApproved": true}),
	)
	if err != nil {
		log.Fatal(err)
	}
	// 3. Resume execution.
	resumed, err := genkit.Generate(ctx, g,
		ai.WithMessages(resp.History()...),
		ai.WithTools(writeFileTool),
		ai.WithToolRestarts(approved),
		ai.WithUse(&middleware.ToolApproval{}),
	)
	if err != nil {
		log.Fatal(err)
	}
	_ = resumed
}

A bare resume without the toolApproved flag is not treated as approval, so unrelated resume flows can’t bypass approval gating.

Configuration options:

  • AllowedTools (optional): The list of tool names pre-approved to run without interruption. Tools not in this list trigger an interrupt. An empty list interrupts every tool.

4. Skills middleware (Skills)

Scans a directory for SKILL.md files (and their YAML frontmatter) and injects them into the system prompt. It also provides a use_skill tool the model can call to load a specific skill’s full body on demand.

resp, err := genkit.Generate(ctx, g,
	ai.WithPrompt("How do I run tests in this repo?"),
	ai.WithUse(&middleware.Skills{SkillPaths: []string{"./skills"}}),
)

Configuration options:

  • SkillPaths (optional): A list of directories to scan for skills. Each direct subdirectory containing a SKILL.md file is exposed as a skill (default: ["skills"]).

5. Filesystem middleware (Filesystem)

Grants the model access to a single root directory by injecting standard file manipulation tools (list_files, read_file, plus write_file and edit_file when writes are enabled). Path safety is enforced by os.Root (Go 1.24+), which rejects any path that resolves outside the root, including via .., absolute paths, or symbolic links.

resp, err := genkit.Generate(ctx, g,
	ai.WithPrompt("Create a hello world program in the workspace"),
	ai.WithUse(&middleware.Filesystem{
		RootDir:          "./workspace",
		AllowWriteAccess: true,
	}),
)

Configuration options:

  • RootDir (required): The root directory all filesystem operations are confined to.
  • AllowWriteAccess (optional): If true, additionally registers write_file and edit_file (default: false).
  • ToolNamePrefix (optional): A prefix prepended to each tool name. Use distinct prefixes when attaching multiple Filesystem middlewares to one call so their tool names don’t collide.
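
The confinement that os.Root provides can be tried on its own, without Genkit. The sketch below requires Go 1.24 or later and uses only the standard library. The helper names readInside and demo are hypothetical, and how the Filesystem middleware itself wires os.Root into its tools is not shown here.

```go
package main

import (
	"fmt"
	"io"
	"log"
	"os"
	"path/filepath"
)

// readInside opens name relative to rootDir through os.Root, so a path that
// would resolve outside rootDir fails with an error instead of being read.
func readInside(rootDir, name string) ([]byte, error) {
	root, err := os.OpenRoot(rootDir)
	if err != nil {
		return nil, err
	}
	defer root.Close()

	f, err := root.Open(name)
	if err != nil {
		return nil, err
	}
	defer f.Close()
	return io.ReadAll(f)
}

// demo builds a temporary workspace with one file inside it and one file
// next to it, then reports what was read and whether the escape was rejected.
func demo() (string, bool, error) {
	parent, err := os.MkdirTemp("", "fs-demo")
	if err != nil {
		return "", false, err
	}
	defer os.RemoveAll(parent)

	workspace := filepath.Join(parent, "workspace")
	if err := os.Mkdir(workspace, 0o755); err != nil {
		return "", false, err
	}
	if err := os.WriteFile(filepath.Join(workspace, "hello.txt"), []byte("hi"), 0o644); err != nil {
		return "", false, err
	}
	if err := os.WriteFile(filepath.Join(parent, "secret.txt"), []byte("secret"), 0o644); err != nil {
		return "", false, err
	}

	data, err := readInside(workspace, "hello.txt")
	if err != nil {
		return "", false, err
	}
	_, escapeErr := readInside(workspace, "../secret.txt")
	return string(data), escapeErr != nil, nil
}

func main() {
	content, rejected, err := demo()
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("inside root: %q, escape rejected: %v\n", content, rejected)
}
```

Because the check happens when the path is opened rather than by string inspection, the same rejection applies to symbolic links that point outside the root.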

A middleware in Go is any value that satisfies the ai.Middleware interface:

type Middleware interface {
	Name() string                               // stable, registered identifier
	New(ctx context.Context) (*ai.Hooks, error) // builds a per-call hook bundle
}

New is invoked once per genkit.Generate() call. The returned *ai.Hooks bundle is reused across every iteration of the tool loop within that call:

type Hooks struct {
	// Tools are extra tools to register for this Generate call alongside any user-supplied tools.
	Tools []ai.Tool
	// WrapGenerate wraps each iteration of the tool loop.
	WrapGenerate func(ctx context.Context, params *ai.GenerateParams, next ai.GenerateNext) (*ai.ModelResponse, error)
	// WrapModel wraps each model API call.
	WrapModel func(ctx context.Context, params *ai.ModelParams, next ai.ModelNext) (*ai.ModelResponse, error)
	// WrapTool wraps each tool execution. May run concurrently for parallel tool calls.
	WrapTool func(ctx context.Context, params *ai.ToolParams, next ai.ToolNext) (*ai.MultipartToolResponse, error)
}

Implement only the hooks your middleware needs. A nil hook field is treated as a pass-through.

A Generate call runs a tool loop: the model produces output, any tool calls execute, results feed back into a new model call, and so on until the model stops. The hooks attach at three different layers of this loop:

| Hook | Fires | Use for |
| --- | --- | --- |
| WrapGenerate | Once per tool-loop iteration. N tool turns means N+1 invocations. | Logic that needs to see the whole conversation: rewrites, system-prompt injection, message accumulation. |
| WrapModel | Once per model API call, inside an iteration. | Logic about the model call itself: retry, fallback, caching. |
| WrapTool | Once per tool execution. May run concurrently for parallel tool calls in the same iteration. | Logic about a single tool execution: approval, sandboxing, logging. |

WrapGenerate and WrapModel are not called concurrently within a single Generate call. WrapTool may be, since multiple tools can execute in parallel.
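
The firing counts can be checked with a small simulation of the loop. This is a deliberately simplified model, not Genkit's loop: runLoop and the counts type are hypothetical, tools run sequentially here, and each iteration is assumed to make exactly one model call (no retries or fallbacks).

```go
package main

import "fmt"

// counts tallies how often each hook layer would fire in the simulated loop.
type counts struct{ generate, model, tool int }

// runLoop simulates a tool loop. toolsPerTurn[i] is the number of tool calls
// the model requests on turn i. After those turns the model answers without
// requesting tools, which ends the loop.
func runLoop(toolsPerTurn []int) counts {
	var c counts
	for turn := 0; ; turn++ {
		c.generate++ // WrapGenerate layer: once per iteration
		c.model++    // WrapModel layer: one model call inside the iteration
		if turn >= len(toolsPerTurn) {
			return c // final answer, no more tool calls
		}
		c.tool += toolsPerTurn[turn] // WrapTool layer: once per tool execution
	}
}

func main() {
	// Two tool turns: the first requests two tools, the second requests one.
	c := runLoop([]int{2, 1})
	fmt.Printf("generate=%d model=%d tool=%d\n", c.generate, c.model, c.tool)
}
```

With two tool turns, the generate layer fires three times (N+1), matching the table, while the tool layer fires once per requested tool.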

Here is a custom middleware that logs how long each model call takes:

type Logger struct {
	Prefix string `json:"prefix,omitempty"`
}

func (Logger) Name() string { return "mine/logger" }

func (l Logger) New(ctx context.Context) (*ai.Hooks, error) {
	return &ai.Hooks{
		WrapModel: func(ctx context.Context, p *ai.ModelParams, next ai.ModelNext) (*ai.ModelResponse, error) {
			start := time.Now()
			resp, err := next(ctx, p)
			log.Printf("%s model call took %s", l.Prefix, time.Since(start))
			return resp, err
		},
	}, nil
}

To use it:

resp, err := genkit.Generate(ctx, g,
	ai.WithPrompt("Hello"),
	ai.WithUse(Logger{Prefix: "[trace]"}),
)

State that should be shared across the hooks of a single Generate call lives in closures captured by New. Each call gets a fresh Hooks bundle, so nothing leaks between calls:

type Counter struct{}

func (Counter) Name() string { return "mine/counter" }

func (Counter) New(ctx context.Context) (*ai.Hooks, error) {
	var modelCalls int
	return &ai.Hooks{
		WrapModel: func(ctx context.Context, p *ai.ModelParams, next ai.ModelNext) (*ai.ModelResponse, error) {
			modelCalls++
			return next(ctx, p)
		},
		WrapGenerate: func(ctx context.Context, p *ai.GenerateParams, next ai.GenerateNext) (*ai.ModelResponse, error) {
			// The same `modelCalls` is visible here: both closures capture it from `New`.
			resp, err := next(ctx, p)
			log.Printf("iteration %d: %d model calls so far", p.Iteration, modelCalls)
			return resp, err
		},
	}, nil
}

WrapTool may run concurrently for parallel tool calls in the same iteration, so any state it touches must be guarded with sync primitives:

func (Counter) New(ctx context.Context) (*ai.Hooks, error) {
	var (
		mu        sync.Mutex
		toolCalls int
	)
	return &ai.Hooks{
		WrapTool: func(ctx context.Context, p *ai.ToolParams, next ai.ToolNext) (*ai.MultipartToolResponse, error) {
			mu.Lock()
			toolCalls++
			mu.Unlock()
			return next(ctx, p)
		},
	}, nil
}

The built-in Filesystem middleware uses this pattern: New allocates a per-call file-state cache and a path-lock map, then the read, write, and edit tool implementations close over both.

Plugin-provided middleware and plugin-level state

Middleware shipped as part of a plugin needs two things the simple cases above don’t:

  1. A way to be registered automatically when the plugin is added to genkit.Init, so the Dev UI and cross-runtime callers can address it by name.
  2. A way to keep plugin-level state (an HTTP client, a logger, a database handle) that isn’t part of the JSON-serializable config.

Both are handled by implementing ai.MiddlewarePlugin on the plugin struct and putting plugin-level state on unexported fields of the config struct. The plugin’s Middlewares method passes a prototype with those fields populated to ai.NewMiddleware, which captures the prototype in a build closure. JSON-dispatched calls (Dev UI or cross-runtime) recreate the config by value-copying that prototype, which preserves the unexported fields and overlays only the JSON config:

import (
	"context"
	"fmt"
	"io"
	"time"

	"github.com/genkit-ai/genkit/go/ai"
	"github.com/genkit-ai/genkit/go/core/api"
)

type Logger struct {
	Prefix string `json:"prefix,omitempty"`
	out    io.Writer // unexported; preserved across JSON dispatch by value-copy
}

func (Logger) Name() string { return "mine/logger" }

func (l Logger) New(ctx context.Context) (*ai.Hooks, error) {
	return &ai.Hooks{
		WrapModel: func(ctx context.Context, p *ai.ModelParams, next ai.ModelNext) (*ai.ModelResponse, error) {
			start := time.Now()
			resp, err := next(ctx, p)
			fmt.Fprintf(l.out, "%s model call took %s\n", l.Prefix, time.Since(start))
			return resp, err
		},
	}, nil
}

type LoggerPlugin struct{ Out io.Writer }

func (p *LoggerPlugin) Name() string { return "mine/logger" }

func (p *LoggerPlugin) Init(ctx context.Context) []api.Action { return nil }

func (p *LoggerPlugin) Middlewares(ctx context.Context) ([]*ai.MiddlewareDesc, error) {
	return []*ai.MiddlewareDesc{
		ai.NewMiddleware("logs model call latency", Logger{out: p.Out}),
	}, nil
}

Application code then registers the plugin once during Init, which makes the middleware available everywhere by name:

g := genkit.Init(ctx, genkit.WithPlugins(
	&googlegenai.GoogleAI{},
	&LoggerPlugin{Out: os.Stderr},
))

resp, err := genkit.Generate(ctx, g,
	ai.WithPrompt("Hello"),
	ai.WithUse(Logger{Prefix: "[trace]"}),
)

When the Dev UI dispatches the same middleware with JSON like {"prefix": "[debug]"}, Genkit value-copies the prototype to recreate the config: out (which isn’t in JSON) is preserved from the plugin’s prototype, while the unmarshaled JSON overrides Prefix.
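
The value-copy-then-overlay behavior rests on two plain Go facts: assigning a struct copies its unexported fields, and encoding/json only writes to exported fields. The standalone sketch below demonstrates both. The rebuild and newProto helpers are hypothetical stand-ins for what happens on JSON dispatch, not Genkit functions.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"log"
	"os"
)

// Logger has the same shape as the config struct above: one JSON-visible
// field and one unexported field holding plugin-level state.
type Logger struct {
	Prefix string `json:"prefix,omitempty"`
	out    io.Writer
}

// newProto builds the prototype a plugin would hand over at registration.
func newProto() Logger {
	return Logger{Prefix: "[default]", out: os.Stderr}
}

// rebuild is a hypothetical helper: it value-copies the prototype and then
// overlays the JSON config on the copy. json.Unmarshal ignores unexported
// fields, so out survives, and the prototype itself is left unmodified.
func rebuild(proto Logger, raw []byte) (Logger, error) {
	cfg := proto // value copy keeps the unexported out field
	if err := json.Unmarshal(raw, &cfg); err != nil {
		return Logger{}, err
	}
	return cfg, nil
}

func main() {
	proto := newProto()
	cfg, err := rebuild(proto, []byte(`{"prefix": "[debug]"}`))
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("prefix:", cfg.Prefix)
	fmt.Println("out preserved:", cfg.out == proto.out)
	fmt.Println("prototype prefix:", proto.Prefix)
}
```

Because the copy is by value, each dispatched call gets its own config and the prototype's Prefix is never changed by an incoming request.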

The built-in plugins/middleware package follows exactly this pattern. See plugin.go for a minimal real-world example.

When your application code defines a middleware directly rather than wrapping it in a plugin, use genkit.DefineMiddleware to register it with the Genkit instance:

genkit.DefineMiddleware(g, "logs model call latency", Logger{out: os.Stderr})

Registration surfaces the middleware in the Dev UI and lets cross-runtime callers reference it by name. For pure Go use, registration is not required: passing a middleware value directly to ai.WithUse invokes its New method on the local fast path.

For ad-hoc middleware that doesn’t need a named type or Dev UI visibility, use ai.MiddlewareFunc:

ai.WithUse(ai.MiddlewareFunc(func(ctx context.Context) (*ai.Hooks, error) {
	return &ai.Hooks{
		WrapModel: func(ctx context.Context, p *ai.ModelParams, next ai.ModelNext) (*ai.ModelResponse, error) {
			log.Printf("model call: %d messages", len(p.Request.Messages))
			return next(ctx, p)
		},
	}, nil
}))

The adapter satisfies Middleware with a placeholder name. Inline middleware is resolved on the local fast path and never touches the registry, so the placeholder is fine.

ai.WithUse(A, B, C) composes left to right with the first listed middleware as the outermost wrapper, like HTTP middleware: at call time the chain expands to A { B { C { actual } } }. Each layer’s next continuation runs the next inner layer:

ai.WithUse(
	&middleware.Retry{MaxRetries: 3},              // outer: retries the whole inner stack
	&middleware.Fallback{Models: fallbackModels}, // inner: tries fallback models on failure
)
// effective chain: Retry { Fallback { model } }

Order matters. Retry outside Fallback retries the entire fallback cascade as a unit. Swap them and you’d retry the primary first and fall back only after exhausting retries.
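
The left-to-right, first-is-outermost rule can be reproduced with plain function wrapping. The sketch below is a generic illustration of that composition pattern, not Genkit's implementation. Handler, Layer, named, compose, and run are hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
)

// Handler stands in for the innermost operation (the actual model call).
type Handler func(trace *[]string)

// Layer wraps a Handler, the way one middleware hook wraps its next.
type Layer func(next Handler) Handler

// named returns a Layer that records when it is entered and exited.
func named(name string) Layer {
	return func(next Handler) Handler {
		return func(trace *[]string) {
			*trace = append(*trace, name+" in")
			next(trace)
			*trace = append(*trace, name+" out")
		}
	}
}

// compose wraps h so that the first layer listed ends up outermost:
// compose(h, A, B, C) behaves like A { B { C { h } } }.
func compose(h Handler, layers ...Layer) Handler {
	// Wrap from the last layer inward so layers[0] is applied last.
	for i := len(layers) - 1; i >= 0; i-- {
		h = layers[i](h)
	}
	return h
}

// run composes the named layers around a trivial handler, executes the
// chain once, and returns the recorded trace joined by " > ".
func run(names ...string) string {
	layers := make([]Layer, len(names))
	for i, n := range names {
		layers[i] = named(n)
	}
	actual := func(trace *[]string) { *trace = append(*trace, "actual") }

	var trace []string
	compose(actual, layers...)(&trace)
	return strings.Join(trace, " > ")
}

func main() {
	fmt.Println(run("A", "B", "C"))
	// prints: A in > B in > C in > actual > C out > B out > A out
}
```

The outermost layer sees both the first entry and the last exit, which is why an outer retry layer replays everything nested inside it.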

For more complex examples of building custom middleware, you can refer to the source code of the built-in middleware in the Genkit GitHub repository.