Pause generation using interrupts

Interrupts are a special kind of tool that pauses the LLM generation-and-tool-calling loop and returns control to you. When you’re ready, you resume generation by sending replies that the LLM processes to continue generating.

The most common uses for interrupts fall into a few categories:

Human-in-the-Loop: Enabling the user of an interactive AI to clarify needed information or confirm the LLM’s action before it is completed, providing a measure of safety and confidence.
Async Processing: Starting an asynchronous task that can only be completed out-of-band, such as sending an approval notification to a human reviewer or kicking off a long-running background process.
Exit from an Autonomous Task: Providing the model a way to mark a task as complete, in a workflow that might iterate through a long series of tool calls.

Before you begin

All of the examples documented here assume that you have already set up a project with Genkit dependencies installed. If you want to run the code examples on this page, first complete the steps in the Get started guide.

Before diving too deeply, you should also be familiar with the following concepts:

Generating content with AI models.
Genkit’s system for defining input and output schemas.
General methods of tool-calling.

Overview of interrupts

At a high level, this is what an interrupt looks like when interacting with an LLM:

1. The calling application prompts the LLM with a request. The prompt includes a list of tools, including at least one for an interrupt that the LLM can use to generate a response.
2. The LLM generates either a complete response or a tool call request in a specific format. To the LLM, an interrupt call looks like any other tool call.
3. If the LLM calls an interrupting tool, the Genkit library automatically pauses generation rather than immediately passing responses back to the model for additional processing.
4. The developer checks whether an interrupt call was made, and performs whatever task is needed to collect the information for the interrupt response.
5. The developer resumes generation by passing an interrupt response to the model. This action triggers a return to step 2.
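The loop described above can be sketched without any library at all. The following is a minimal, library-free illustration, not Genkit's implementation: fake_model, ToolCall, Result, and this generate function are all hypothetical stand-ins invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class ToolCall:
    """A tool request emitted by the (fake) model."""
    name: str
    input: dict


@dataclass
class Result:
    """What one pass of the loop hands back to the caller."""
    text: str = ''
    interrupts: list = field(default_factory=list)


def fake_model(messages):
    """Stand-in for an LLM: asks one question, then answers."""
    replies = [m for m in messages if m['role'] == 'tool']
    if not replies:
        return ToolCall('ask_question', {'question': 'Gas or charcoal?'})
    return f"Plan: a {replies[-1]['content']} grill."


def generate(messages, interrupt_tools):
    """Run the model once and pause if it calls an interrupt tool."""
    output = fake_model(messages)
    if isinstance(output, ToolCall):
        if output.name in interrupt_tools:
            # Paused: control returns to the caller instead of the model.
            return Result(interrupts=[output])
        raise NotImplementedError('ordinary tools are out of scope for this sketch')
    return Result(text=output)


messages = [{'role': 'user', 'content': 'Help me plan a BBQ.'}]
result = generate(messages, {'ask_question'})
while result.interrupts:
    # Collect an answer out-of-band, then resume by sending it back.
    messages.append({'role': 'tool', 'content': 'charcoal'})
    result = generate(messages, {'ask_question'})
print(result.text)  # prints "Plan: a charcoal grill."
```

The point of the sketch is the control flow: the interrupt surfaces as data on the result, and the caller decides when and how to resume.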

Define manual-response interrupts

The most common kind of interrupt allows the LLM to request clarification from the user, for example by asking a multiple-choice question.

For this use case, use the Genkit instance’s tool() decorator with ctx.interrupt():

from genkit import FinishReason, Genkit, tool_response, ToolRunContext
from genkit.plugins.google_genai import GoogleAI
from pydantic import BaseModel, Field

ai = Genkit(
    plugins=[GoogleAI()],
    model='googleai/gemini-flash-latest',
)

class QuestionInput(BaseModel):
    """Input schema for the question tool."""
    question: str = Field(description='the question to ask')
    choices: list[str] = Field(description='the choices to display to the user')
    allow_other: bool = Field(default=False, description='when true, allow write-ins')

@ai.tool()
async def ask_question(input: QuestionInput, ctx: ToolRunContext) -> str:
    """Use this to ask the user a clarifying question."""
    # Interrupt with metadata that the caller can use.
    ctx.interrupt({
        'question': input.question,
        'choices': input.choices,
        'allow_other': input.allow_other,
    })

Note that the output type of an interrupt tool describes the response data you supply when resuming, not a value that the tool function produces itself.

To resume, build each entry in tool_responses with tool_response from genkit, wrapping the interrupted ToolRequestPart in Part(root=...) (see below).

Use interrupts

Interrupts are passed into the tools list when generating content, just like other types of tools. You can pass both normal tools and interrupts to the same generate call:

response = await ai.generate(
    prompt='Ask me a movie trivia question.',
    tools=['ask_question'],
)

Genkit immediately returns a response on receipt of an interrupt tool call.

Respond to interrupts

If you’ve passed one or more interrupts to your generate call, you need to check the response for interrupts so that you can handle them:

# You can check the finish_reason (use the enum for comparisons)
if response.finish_reason == FinishReason.INTERRUPTED:
    print("Generation interrupted.")

# Or you can check if any interrupt requests are on the response
if response.interrupts:
    print(f"Interrupts found: {len(response.interrupts)}")
    for interrupt in response.interrupts:
        # Access the interrupt metadata
        tool_input = interrupt.tool_request.input
        print(f"Question: {tool_input.get('question')}")
        print(f"Choices: {tool_input.get('choices')}")

To respond to an interrupt, pass the tool_responses option to a subsequent generate call, along with the existing message history. Build each entry with tool_response, passing the interrupted request and the user’s answer:

from genkit import tool_response

# Get the user's answer (e.g., from user input)
user_answer = 'b'  # User selected option b

tool_request = response.interrupts[0]

# Resume generation with the tool response
response = await ai.generate(
    messages=response.messages,
    tool_responses=[tool_response(tool_request, user_answer)],
    tools=['ask_question'],
)

Handle multiple interrupts in a loop

For interactive applications, you’ll often need to handle multiple interrupts in a loop until the model completes its task:

from genkit import tool_response

async def interactive_session():
    response = await ai.generate(
        prompt='Help me plan a backyard BBQ.',
        system='Ask clarifying questions until you have a complete solution.',
        tools=['ask_question'],
    )

    # Continue until no more interrupts
    while response.interrupts:
        answers = []

        # Handle all interrupts (multiple can occur at once)
        for interrupt in response.interrupts:
            tool_input = interrupt.tool_request.input
            question = tool_input.get('question', 'Unknown question')
            choices = tool_input.get('choices', [])

            # Display to user and get their answer
            print(f"\nQuestion: {question}")
            for i, choice in enumerate(choices):
                print(f"  {i + 1}. {choice}")

            user_input = input("Your answer: ")
            answers.append(tool_response(interrupt, user_input))

        # Resume generation with all answers
        response = await ai.generate(
            messages=response.messages,
            tool_responses=answers,
            tools=['ask_question'],
        )

    # No more interrupts - print final response
    print(f"\nFinal response: {response.text}")
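The loop above sends the raw console input back to the model, so a reply of "2" reaches it as the string "2" rather than the second choice. One way to handle this, shown here as a sketch, is to normalize the reply first. The helper name resolve_choice is hypothetical and is not part of Genkit; it only uses the choices and allow_other fields carried in the interrupt metadata.

```python
from typing import Optional


def resolve_choice(raw: str, choices: list[str], allow_other: bool = False) -> Optional[str]:
    """Map raw console input to an answer string, or None if it is invalid."""
    text = raw.strip()
    if text.isdecimal():
        index = int(text) - 1  # the menu is printed 1-based
        if 0 <= index < len(choices):
            return choices[index]
        return None
    # Accept the choice text itself, ignoring case.
    for choice in choices:
        if choice.lower() == text.lower():
            return choice
    # Free-form text is only acceptable when write-ins are allowed.
    return text if allow_other and text else None


options = ['Gas', 'Charcoal']
print(resolve_choice('2', options))  # prints "Charcoal"
```

In the loop, you could call the helper repeatedly until it returns a value other than None, and pass that value to tool_response instead of the raw input.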

Using interrupts with flows

You can also use interrupts within flows for more structured applications:

from genkit import Genkit, ToolRunContext
from genkit.plugins.google_genai import GoogleAI
from pydantic import BaseModel, Field

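ai = Genkit(
    plugins=[GoogleAI()],
    # Default model, so that ai.generate() below needs no explicit model argument
    model='googleai/gemini-flash-latest',
)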

class TriviaQuestion(BaseModel):
    """A trivia question with multiple choice answers."""
    question: str = Field(description='the trivia question')
    answers: list[str] = Field(description='multiple choice answers')

@ai.tool()
async def present_question(input: TriviaQuestion, ctx: ToolRunContext) -> None:
    """Presents a trivia question to the user."""
    ctx.interrupt(input.model_dump())

@ai.flow()
async def play_trivia(theme: str) -> str:
    """Plays a trivia game on the given theme."""
    response = await ai.generate(
        prompt=f'Ask me a trivia question about {theme}.',
        tools=['present_question'],
    )

    if response.interrupts:
        interrupt = response.interrupts[0]
        question_data = interrupt.tool_request.input

        # In a real app, you'd present this to the user and resume with their answer
        return f"Question: {question_data.get('question')}\nAnswers: {question_data.get('answers')}"

    return response.text

Tools with restartable interrupts

Another common pattern is the need to confirm an action that the LLM suggests before actually performing it. For example, a payments app might want the user to confirm certain kinds of transfers before proceeding.

Define a restartable tool

When defining a tool, you can check your application state or use ctx.is_resumed() to determine whether the action has already been approved. If it’s the first execution, raise an Interrupt exception to pause the loop:

from genkit import Interrupt, ToolRunContext
from pydantic import BaseModel, Field

class TransferInput(BaseModel):
    to_account: str
    amount: float

@ai.tool()
async def transfer_money(input: TransferInput, ctx: ToolRunContext) -> dict:
    # Require confirmation for large transfers (only on first execution)
    if not ctx.is_resumed() and input.amount > 100:
        raise Interrupt({
            'reason': 'confirm_large',
            'to_account': input.to_account,
            'amount': input.amount,
        })

    # Execute the transfer (runs when resumed after approval)
    return {
        'status': 'confirmed',
        'message': f'Transferred ${input.amount} to {input.to_account}',
    }
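The pause-then-rerun behavior can be illustrated in plain Python. This is a simplified model under stated assumptions, not how Genkit runs tools: PauseForApproval, transfer, and run_with_approval are hypothetical names, and the resumed flag plays the role that ctx.is_resumed() plays in the tool above.

```python
class PauseForApproval(Exception):
    """Hypothetical stand-in for an interrupt raised by a tool."""

    def __init__(self, metadata: dict):
        super().__init__('approval required')
        self.metadata = metadata


def transfer(amount: float, to_account: str, resumed: bool = False) -> dict:
    """Pause on the first call for large amounts; execute when resumed."""
    if not resumed and amount > 100:
        raise PauseForApproval({'reason': 'confirm_large', 'amount': amount})
    return {'status': 'confirmed', 'amount': amount, 'to_account': to_account}


def run_with_approval(amount: float, to_account: str, approve) -> dict:
    """Call the tool, and re-run it only if the approver says yes."""
    try:
        return transfer(amount, to_account)
    except PauseForApproval as pause:
        if approve(pause.metadata):
            return transfer(amount, to_account, resumed=True)
        return {'status': 'cancelled'}


print(run_with_approval(250, 'ABC123', lambda meta: True)['status'])  # prints "confirmed"
```

Small transfers never pause, approved large transfers run exactly once after approval, and declined ones return without executing the transfer at all.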

Restart tools after interruption

To restart the interrupted tool, use the restart_tool() function to construct a restarted tool request part, and pass it to the resume_restart parameter of ai.generate().

You can customize the restart behavior by providing optional arguments to restart_tool():

resumed_metadata: Pass arbitrary state (e.g. {'approved_by': 'user'}) to the tool context. The tool function can retrieve this via ctx.resumed_metadata.
replace_input: Provide a new input payload (e.g. a modified Pydantic model or dictionary) to re-run the tool with modified arguments.

from genkit import restart_tool, respond_to_interrupt

response = await ai.generate(
    prompt='Transfer $250 to account ABC123',
    tools=[transfer_money],
)

messages = response.messages

if response.interrupts:
    interrupt = response.interrupts[0]

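    # Ask the user to confirm the transfer (a console prompt, for illustration)
    user_approved = input('Approve this transfer? [y/N] ').strip().lower() == 'y'
    if user_approved: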
        # Rerun the tool by passing a restart part to resume_restart
        restart = restart_tool(
            interrupt,
            resumed_metadata={'approved_by': 'user'}
        )
        response = await ai.generate(
            messages=messages,
            resume_restart=restart,
            tools=[transfer_money],
        )
    else:
        # Decline by providing a direct response without re-running the tool
        decline = respond_to_interrupt(
            {'status': 'cancelled'},
            interrupt=interrupt,
        )
        response = await ai.generate(
            messages=messages,
            resume_respond=decline,
            tools=[transfer_money],
        )

print(response.text)

Replacing input and accessing original input on restart

If you decide to adjust the tool arguments upon restart (for example, asking the user to lower a transfer amount that exceeded limits), pass the adjusted input payload to replace_input:

# Inside your interrupt-handling loop:
meta = interrupt.tool_request.input
adjusted_input = TransferInput(to_account=meta.get('to_account'), amount=100.0)

restart = restart_tool(
    interrupt,
    resumed_metadata={'approved_by': 'user'},
    replace_input=adjusted_input,
)

When a tool is restarted with a replaced input, the original input arguments are automatically stashed. Inside the tool function, you can retrieve the original arguments by checking ctx.original_input (which will be a dictionary):

@ai.tool()
async def transfer_money(input: TransferInput, ctx: ToolRunContext) -> dict:
    # ... interrupt logic ...

    # Check if the input was replaced upon restart
    if ctx.original_input:
        original = ctx.original_input
        print(f"Adjusted transfer amount from {original.get('amount')} to {input.amount}")

    # Execute the transfer with the current (possibly adjusted) input
    return {
        'status': 'confirmed',
        'message': f"Transferred ${input.amount} to {input.to_account}",
    }
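The stashing behavior described above can be pictured with a small toy model. Everything in this sketch is hypothetical and written only to illustrate the idea: ToyContext and toy_restart are not Genkit classes or functions, and the real library may store the original input differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyContext:
    """Hypothetical stand-in for the tool context; not a Genkit class."""
    resumed: bool = False
    original_input: Optional[dict] = None


def toy_restart(current_input: dict, replace_input: Optional[dict] = None):
    """Return the input to re-run with, plus the context the tool would see."""
    if replace_input is None:
        # Nothing replaced: re-run with the same arguments, nothing stashed.
        return current_input, ToyContext(resumed=True)
    # Replaced: run with the new arguments and keep a copy of the old ones.
    return replace_input, ToyContext(resumed=True, original_input=dict(current_input))


requested = {'to_account': 'ABC123', 'amount': 250.0}
run_input, ctx = toy_restart(requested, replace_input={**requested, 'amount': 100.0})
print(run_input['amount'], ctx.original_input['amount'])  # prints "100.0 250.0"
```

In this model the original input is present only when a replacement was supplied, which is why the tool above guards the log line with a check on ctx.original_input.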