Is it thinking or is it broken? Building transparent AI chat UIs with Genkit

Streaming thoughts in Genkit

Few user experiences are worse than users believing your app is broken when it is actually working correctly. The problem is especially visible in AI-powered apps, because models take time to process a request. A user sends a chat message and then waits for a response with no indication that the app is doing anything. In this post, we break down how to give users insight into your agent's thought process so they aren't left hanging.

Genkit can stream data to clients connected to your agent using server-sent events. The stream can carry the regular text response, and since most models now support thinking to improve their reasoning, it can carry those thoughts as well. The following is an example of the experience you can create with Genkit by visualizing your model's thinking process.

[Interactive demo: Agent Streaming Session. A simulated Genkit chat that streams its step-by-step thinking process alongside the final response.]

Thinking not only gives you insight into what the agent is doing but can also improve the quality of the response, so we get better answers and a better user experience at the same time. The Genkit snippet below shows how to extract thoughts from an agent and stream them down to the user over server-sent events.

// 1. Generate a stream with thinking config enabled
const { stream, response } = await ai.generateStream({
  model: googleAI.model('gemini-flash-latest'),
  prompt: 'Why is Genkit so cool?',
  config: {
    thinkingConfig: {
      includeThoughts: true,
      thinkingLevel: 'MEDIUM',
    },
  },
});

// 2. Consume the stream and separate reasoning from the final response
for await (const chunk of stream) {
  // Extract reasoning (thinking process)
  const thoughtText = chunk.reasoning || '';
  if (thoughtText) {
    sendChunk({ type: 'thought', content: thoughtText });
  }

  // Extract standard text response
  const text = chunk.text;
  if (text) {
    sendChunk({ type: 'text', content: text });
  }
}
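
To make the server-sent events part more concrete, here is a minimal sketch of how a chunk could be framed as an SSE event and parsed back on the client. Genkit handles the transport for you, so this is not Genkit's actual wire format; `WireChunk`, `encodeSseEvent`, and `decodeSseBuffer` are hypothetical names used only for illustration.

```typescript
// Hypothetical chunk shape for illustration; mirrors the two kinds sent above.
type WireChunk = { type: 'thought' | 'text'; content: string };

// Serialize one chunk as a single SSE event: a "data:" line plus a blank line.
// JSON.stringify escapes newlines, so the payload always fits on one line.
function encodeSseEvent(chunk: WireChunk): string {
  return `data: ${JSON.stringify(chunk)}\n\n`;
}

// Split a receive buffer into complete events. Returns the parsed chunks and
// the trailing partial event, which must wait for more bytes to arrive.
function decodeSseBuffer(buffer: string): { chunks: WireChunk[]; rest: string } {
  const events = buffer.split('\n\n');
  const rest = events.pop() ?? ''; // last piece is incomplete (or empty)
  const chunks: WireChunk[] = [];
  for (const event of events) {
    for (const line of event.split('\n')) {
      if (line.startsWith('data: ')) {
        chunks.push(JSON.parse(line.slice('data: '.length)) as WireChunk);
      }
    }
  }
  return { chunks, rest };
}
```

A client would append each network read to `rest`, call `decodeSseBuffer` again, and hand the resulting chunks to the UI.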

After looking at this sample, you may be wondering what sendChunk is sending. Each chunk follows the schema we defined for the data our Genkit flow streams. In this example, it looks something like this:

const StreamChunkSchema = z.object({
  messageId: z.string(),
  type: z.enum(['thought', 'text']),
  content: z.string(),
  currentStep: z.string().optional(),
});
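
On the client you may want to check incoming data against the same shape before rendering it. The sketch below does that with a plain TypeScript type guard rather than zod; the `StreamChunk` interface and `isStreamChunk` function are hypothetical names that mirror the schema fields above, not part of Genkit.

```typescript
// Hand-written mirror of StreamChunkSchema (an assumption for illustration).
interface StreamChunk {
  messageId: string;
  type: 'thought' | 'text';
  content: string;
  currentStep?: string;
}

// Runtime check that an unknown value has the StreamChunk shape.
function isStreamChunk(value: unknown): value is StreamChunk {
  if (typeof value !== 'object' || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.messageId === 'string' &&
    (v.type === 'thought' || v.type === 'text') &&
    typeof v.content === 'string' &&
    (v.currentStep === undefined || typeof v.currentStep === 'string')
  );
}
```

If zod is available on the client as well, sharing the schema and deriving the type with `z.infer` avoids keeping two definitions in sync.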

With thoughts streaming from the server, the client can assemble them into something that gives users clarity. We designed a React thought box component that uses CSS to animate as new thoughts stream in. It also stores the full thoughts in a smaller, collapsible text box so users can inspect them as they arrive. This gives the user a choice: see exactly what the model is doing, or continue to receive high-level updates.

import React, { useState } from 'react';

interface ThoughtBoxProps {
  content: string; // The full accumulated thought content
  stepName?: string; // The current step description (e.g. "Analyzing input")
}

export const ThoughtBox: React.FC<ThoughtBoxProps> = ({ content, stepName }) => {
  const [isOpen, setIsOpen] = useState(false);
  const activeStep = stepName || 'Thinking';

  return (
    <div className="message thought-message">
      <div className="thought-box">
        {/* Animated header showing the current step */}
        <div className="thought-header">
          <div key={`indicator-${activeStep}`} className="thought-indicator indicator-animate" />
          <div key={`label-${activeStep}`} className="thought-step-label step-animate">
            Thinking: {activeStep}
          </div>
        </div>

        {/* Collapsible details wrapper for the full thought content */}
        <details
          className="thought-details"
          onToggle={(e) => setIsOpen(e.currentTarget.open)}
        >
          <summary className="thought-summary">
            <span className="summary-text">
              {isOpen ? 'Hide Full Reasoning' : 'Show Full Reasoning'}
            </span>
          </summary>
          <div className="thought-body">{content}</div>
        </details>
      </div>
    </div>
  );
};
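
The component expects the full accumulated thought content, so the client has to fold the streamed chunks together before rendering. One possible way to do that is sketched below as a pure function that could back a `useReducer` or `useState` update. `MessageState` and `applyChunk` are hypothetical names, and the `StreamChunk` interface is a hand-written mirror of the schema fields shown earlier.

```typescript
// Hand-written mirror of the streamed chunk shape (an assumption for illustration).
interface StreamChunk {
  messageId: string;
  type: 'thought' | 'text';
  content: string;
  currentStep?: string;
}

// Accumulated view of one message, ready to pass to the UI.
interface MessageState {
  thought: string;
  text: string;
  currentStep?: string;
}

// Fold one chunk into the per-message state without mutating the input map.
function applyChunk(
  state: Map<string, MessageState>,
  chunk: StreamChunk,
): Map<string, MessageState> {
  const prev = state.get(chunk.messageId) ?? { thought: '', text: '' };
  const next: MessageState = {
    thought: chunk.type === 'thought' ? prev.thought + chunk.content : prev.thought,
    text: chunk.type === 'text' ? prev.text + chunk.content : prev.text,
    // Keep the last known step when a chunk does not carry one.
    currentStep: chunk.currentStep ?? prev.currentStep,
  };
  return new Map(state).set(chunk.messageId, next);
}
```

The `thought` and `currentStep` fields of a message would then map onto the `content` and `stepName` props of the thought box.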

Because the component is self-contained, it is easy to add to apps that are not strictly chat-based but still want to show a thought box while the model is thinking.

Pairing a small React component with our Genkit flow keeps users informed while the model is thinking. They can see that the model is working and processing information, which reduces the impression that the app has stalled or is broken.

If you want to try the React component and Genkit streaming over server-sent events yourself, check out the streaming thoughts sample in our samples repository.