Writing Plugins: Guidance about how to author plugins for Genkit.
# Writing Genkit plugins
> Learn how to extend Genkit's capabilities by writing custom plugins, covering plugin creation, options, building models, and publishing to NPM.
Genkit’s capabilities are designed to be extended by plugins. Genkit plugins are configurable modules that can provide models, retrievers, indexers, trace stores, and more. You’ve already seen plugins in action just by using Genkit:
```ts
import { genkit } from 'genkit';
import { vertexAI } from '@genkit-ai/vertexai';
const ai = genkit({
  plugins: [vertexAI({ projectId: 'my-project' })],
});
```
The Vertex AI plugin takes configuration (such as the user’s Google Cloud project ID) and registers a variety of new models, embedders, and more with the Genkit registry. The registry powers Genkit’s local UI for running and inspecting models, prompts, and more, and it also serves as a lookup service for named actions at runtime.
## Creating a Plugin
To create a plugin you’ll generally want to create a new NPM package:
```bash
mkdir genkitx-my-plugin
cd genkitx-my-plugin
npm init -y
npm install genkit
npm install --save-dev typescript
npx tsc --init
```
Then, define and export your plugin from your main entry point using the `genkitPlugin` helper:
```ts
import { Genkit, z, modelActionMetadata } from 'genkit';
import { GenkitPlugin, genkitPlugin } from 'genkit/plugin';
import { ActionMetadata, ActionType } from 'genkit/registry';

interface MyPluginOptions {
  // add any plugin configuration here
}

export function myPlugin(options?: MyPluginOptions): GenkitPlugin {
  return genkitPlugin(
    'myPlugin',
    // Initializer function (required): Registers actions defined upfront.
    async (ai: Genkit) => {
      // Example: Define a model that's always available
      ai.defineModel({ name: 'myPlugin/always-available-model', ... });
      ai.defineEmbedder(/* ... */);
      // ... other upfront definitions
    },
    // Dynamic Action Resolver (optional): Defines actions on-demand.
    async (ai: Genkit, actionType: ActionType, actionName: string) => {
      // Called when an action (e.g., 'myPlugin/some-dynamic-model') is
      // requested but not found in the registry.
      if (actionType === 'model' && actionName === 'some-dynamic-model') {
        ai.defineModel({ name: `myPlugin/${actionName}`, ... });
      }
      // ... handle other dynamic actions
    },
    // List Actions function (optional): Lists all potential actions.
    async (): Promise<ActionMetadata[]> => {
      // Returns metadata for all actions the plugin *could* provide,
      // even if not yet defined dynamically. Used by Dev UI, etc.
      // Example: Fetch available models from an API
      // (fetchMyModelsFromApi is a placeholder for your own lookup code).
      const availableModels = await fetchMyModelsFromApi();
      return availableModels.map(model => modelActionMetadata({
        type: 'model',
        name: `myPlugin/${model.id}`,
        // ... other metadata
      }));
    }
  );
}
```
The `genkitPlugin` function accepts up to four arguments:
1. **Plugin Name (string, required):** A unique identifier for your plugin (e.g., `'myPlugin'`).
2. **Initializer Function (`async (ai: Genkit) => void`, required):** This function runs when Genkit starts. Use it to register actions (models, embedders, etc.) that should always be available using `ai.defineModel()`, `ai.defineEmbedder()`, etc.
3. **Dynamic Action Resolver (`async (ai: Genkit, actionType: ActionType, actionName: string) => void`, optional):** This function is called when Genkit tries to access an action (by type and name) that hasn’t been registered yet. It lets you define actions dynamically, just-in-time. For example, if a user requests `model: 'myPlugin/some-model'`, and it wasn’t defined in the initializer, this function runs, giving you a chance to define it using `ai.defineModel()`. This is useful when a plugin supports many possible actions (like numerous models) and you don’t want to register them all at startup.
4. **List Actions Function (`async () => Promise<ActionMetadata[]>`, optional):** This function should return metadata for *all* actions your plugin can potentially provide, including those that would be dynamically defined. This is primarily used by development tools like the Genkit Developer UI to populate lists of available models, embedders, etc., allowing users to discover and select them even if they haven’t been explicitly defined yet. This function is generally *not* called during normal flow execution.
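The division of labor between the initializer and the dynamic resolver can be pictured with a toy registry. The sketch below is not Genkit's registry implementation: `ToyRegistry`, `Resolver`, and the `/type/name` key format are hypothetical, chosen only to show that upfront definitions are found directly while unregistered names fall through to the resolver.

```ts
// Toy model of the lookup flow. NOT Genkit's real registry:
// ToyRegistry, Resolver, and the key format are hypothetical.
type Action = (input: string) => string;
type Resolver = (registry: ToyRegistry, actionType: string, actionName: string) => void;

class ToyRegistry {
  private actions = new Map<string, Action>();
  private pluginName: string;
  private resolver: Resolver | undefined;

  constructor(pluginName: string, resolver?: Resolver) {
    this.pluginName = pluginName;
    this.resolver = resolver;
  }

  define(actionType: string, name: string, action: Action): void {
    this.actions.set(`/${actionType}/${name}`, action);
  }

  lookup(actionType: string, name: string): Action | undefined {
    const key = `/${actionType}/${name}`;
    const prefix = `${this.pluginName}/`;
    // Fall back to the resolver only when the action is not registered yet.
    if (!this.actions.has(key) && this.resolver && name.startsWith(prefix)) {
      this.resolver(this, actionType, name.slice(prefix.length));
    }
    return this.actions.get(key);
  }
}

const resolverCalls: string[] = [];
const registry = new ToyRegistry('myPlugin', (reg, actionType, actionName) => {
  resolverCalls.push(actionName);
  if (actionType === 'model' && actionName === 'some-dynamic-model') {
    reg.define('model', `myPlugin/${actionName}`, (input) => `dynamic: ${input}`);
  }
});

// Registered upfront, as the initializer function would do.
registry.define('model', 'myPlugin/always-available-model', (input) => `static: ${input}`);

const staticModel = registry.lookup('model', 'myPlugin/always-available-model');
const dynamicModel = registry.lookup('model', 'myPlugin/some-dynamic-model');
const unknownModel = registry.lookup('model', 'myPlugin/unknown');
```

In this toy, the static lookup never reaches the resolver, the dynamic lookup triggers one resolver call that registers the model, and the unknown name resolves to `undefined`.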
### Plugin options guidance
In general, your plugin should take a single `options` argument that includes any plugin-wide configuration necessary to function. For any plugin option that requires a secret value, such as API keys, you should offer both an option and a default environment variable to configure it:
```ts
import { GenkitError, Genkit, z } from 'genkit';
import { GenkitPlugin, genkitPlugin } from 'genkit/plugin';

interface MyPluginOptions {
  apiKey?: string;
}

export function myPlugin(options?: MyPluginOptions): GenkitPlugin {
  return genkitPlugin('myPlugin', async (ai: Genkit) => {
    // Prefer the explicit option; fall back to the default environment variable.
    const apiKey = options?.apiKey || process.env.MY_PLUGIN_API_KEY;
    if (!apiKey) {
      throw new GenkitError({
        source: 'my-plugin',
        status: 'INVALID_ARGUMENT',
        message:
          'Must supply either `options.apiKey` or set `MY_PLUGIN_API_KEY` environment variable.',
      });
    }

    ai.defineModel(...);
    ai.defineEmbedder(...);
    // ....
  });
}
```
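One way to keep the option-then-environment precedence easy to unit test is to factor it into a pure function that receives the environment as an argument. This is a minimal sketch: `resolveApiKey` is a hypothetical helper, not part of Genkit, and it throws a plain `Error` where a real plugin would likely throw `GenkitError`.

```ts
interface MyPluginOptions {
  apiKey?: string;
}

// Hypothetical helper: the explicit option wins, then the environment variable.
function resolveApiKey(
  options: MyPluginOptions | undefined,
  env: Record<string, string | undefined>,
): string {
  const apiKey = options?.apiKey || env['MY_PLUGIN_API_KEY'];
  if (!apiKey) {
    throw new Error(
      'Must supply either `options.apiKey` or set `MY_PLUGIN_API_KEY` environment variable.',
    );
  }
  return apiKey;
}

const fromOption = resolveApiKey({ apiKey: 'from-option' }, { MY_PLUGIN_API_KEY: 'from-env' });
const fromEnv = resolveApiKey(undefined, { MY_PLUGIN_API_KEY: 'from-env' });
// fromOption is 'from-option'; fromEnv is 'from-env'
```

Inside a plugin initializer, the second argument would typically be `process.env`.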
## Building your plugin
A single plugin can activate many new things within Genkit. For example, the Vertex AI plugin activates several new models as well as an embedder.
### Model plugins
Genkit model plugins add one or more generative AI models to the Genkit registry. A model represents any generative model that is capable of receiving a prompt as input and generating text, media, or data as output. Generally, a model plugin will make one or more `defineModel` calls in its initialization function.
A custom model generally consists of three components:
1. Metadata defining the model’s capabilities.
2. A configuration schema with any specific parameters supported by the model.
3. A function that implements the model, accepting `GenerateRequest` and returning `GenerateResponse`.
To build a model plugin, you’ll need to use the `genkit/model` package.
At a high level, a model plugin might look something like this:
```ts
import { genkitPlugin, GenkitPlugin } from 'genkit/plugin';
import { GenerationCommonConfigSchema } from 'genkit/model';
import { simulateSystemPrompt } from 'genkit/model/middleware';
import { Genkit, GenkitError, z } from 'genkit';

export interface MyPluginOptions {
  // ...
}

export function myPlugin(options?: MyPluginOptions): GenkitPlugin {
  return genkitPlugin('my-plugin', async (ai: Genkit) => {
    ai.defineModel(
      {
        // be sure to include your plugin as a provider prefix
        name: 'my-plugin/my-model',
        // label for your model as shown in Genkit Developer UI
        label: 'My Awesome Model',
        // optional list of supported versions of your model
        versions: ['my-model-001', 'my-model-002'],
        // model support attributes
        supports: {
          multiturn: true, // true if your model supports conversations
          media: true, // true if your model supports multimodal input
          tools: true, // true if your model supports tool/function calling
          systemRole: true, // true if your model supports the system role
          output: ['text', 'media', 'json'], // types of output your model supports
        },
        // Zod schema for your model's custom configuration
        configSchema: GenerationCommonConfigSchema.extend({
          safetySettings: z.object({...}),
        }),
        // list of middleware for your model to use
        use: [simulateSystemPrompt()],
      },
      async (request) => {
        // toMyModelRequest, myModelApi, and toGenerateResponse are placeholders
        // for the conversion and API-call code you write for your model.
        const myModelRequest = toMyModelRequest(request);
        const myModelResponse = await myModelApi(myModelRequest);
        return toGenerateResponse(myModelResponse);
      },
    );
  });
}
```
#### Transforming Requests and Responses
The primary work of a Genkit model plugin is transforming the `GenerateRequest` from Genkit’s common format into a format that is recognized and supported by your model’s API, and then transforming the response from your model into the `GenerateResponseData` format used by Genkit.
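The two transformations can be sketched with simplified types. Everything below is an assumption made for illustration: the `Simple*` interfaces are text-only stand-ins for Genkit's request and response shapes, and `MyModelRequest` / `MyModelResponse` describe an imaginary model API.

```ts
// Simplified stand-ins for Genkit's request/response shapes (assumption: text-only).
interface Part { text: string; }
interface Message { role: 'system' | 'user' | 'model'; content: Part[]; }
interface SimpleGenerateRequest { messages: Message[]; config?: { temperature?: number }; }
interface SimpleGenerateResponse { message: Message; finishReason: 'stop' | 'length' | 'other'; }

// Hypothetical wire format of an imaginary model API.
interface MyModelTurn { speaker: 'system' | 'human' | 'assistant'; text: string; }
interface MyModelRequest { turns: MyModelTurn[]; temperature: number | undefined; }
interface MyModelResponse { reply: string; stopped: string; }

const SPEAKERS = { system: 'system', user: 'human', model: 'assistant' } as const;

function toMyModelRequest(request: SimpleGenerateRequest): MyModelRequest {
  return {
    turns: request.messages.map((m) => ({
      speaker: SPEAKERS[m.role],
      // Flatten the message's parts into the single string this API expects.
      text: m.content.map((p) => p.text).join(''),
    })),
    temperature: request.config?.temperature,
  };
}

function toGenerateResponse(response: MyModelResponse): SimpleGenerateResponse {
  // Map the API's stop signal onto the common finish reasons.
  const finishReason =
    response.stopped === 'eos' ? 'stop' : response.stopped === 'max_tokens' ? 'length' : 'other';
  return {
    message: { role: 'model', content: [{ text: response.reply }] },
    finishReason,
  };
}

const wire = toMyModelRequest({
  messages: [{ role: 'user', content: [{ text: 'Hello, ' }, { text: 'world' }] }],
  config: { temperature: 0.2 },
});
const result = toGenerateResponse({ reply: 'Hi!', stopped: 'eos' });
```

A real plugin would also need to handle media parts, tool calls, and usage metadata, which this sketch leaves out.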
Sometimes, this may require massaging or manipulating data to work around model limitations. For example, if your model does not natively support a `system` message, you may need to transform a prompt’s system message into a user/model message pair.
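As a hand-rolled illustration of the system-message workaround, the sketch below rewrites each system message as a user turn followed by a short model acknowledgement. The `Message` shape is a simplified assumption and `replaceSystemMessages` is a hypothetical helper; it is not the implementation of Genkit's `simulateSystemPrompt` middleware.

```ts
// Simplified message shape (assumption: text-only parts, three roles).
interface Part { text: string; }
interface Message { role: 'system' | 'user' | 'model'; content: Part[]; }

// Hypothetical helper: rewrites each system message as a user turn followed
// by a short model acknowledgement, so the API never sees the system role.
function replaceSystemMessages(messages: Message[], acknowledgement = 'Understood.'): Message[] {
  const result: Message[] = [];
  for (const message of messages) {
    if (message.role === 'system') {
      result.push({ role: 'user', content: message.content });
      result.push({ role: 'model', content: [{ text: acknowledgement }] });
    } else {
      result.push(message);
    }
  }
  return result;
}

const rewritten = replaceSystemMessages([
  { role: 'system', content: [{ text: 'Answer in French.' }] },
  { role: 'user', content: [{ text: 'Hello!' }] },
]);
// Roles are now: user, model, user
```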
#### Action References (Models, Embedders, etc.)
While actions like models and embedders can always be referenced by their string name (e.g., `'myPlugin/my-model'`) after being defined (either upfront or dynamically), providing strongly-typed references offers better developer experience through improved type checking and IDE autocompletion.
The recommended pattern is to attach helper methods directly to your exported plugin function. These methods use reference builders like `modelRef` and `embedderRef` from Genkit core.
First, define the type for your plugin function including the helper methods:
```ts
import { GenkitPlugin } from 'genkit/plugin';
import { ModelReference, EmbedderReference, modelRef, embedderRef, z } from 'genkit';

// Define your model's specific config schema if it has one
const MyModelConfigSchema = z.object({
  customParam: z.string().optional(),
});

// Define the type for your plugin function
export type MyPlugin = {
  // The main plugin function signature
  (options?: MyPluginOptions): GenkitPlugin;

  // Helper method for creating model references
  model(
    name: string, // e.g., 'some-model-name'
    config?: z.infer<typeof MyModelConfigSchema>,
  ): ModelReference<typeof MyModelConfigSchema>;

  // Helper method for creating embedder references
  embedder(
    name: string, // e.g., 'my-embedder'
    config?: Record<string, any>, // Or a specific config schema
  ): EmbedderReference<z.ZodTypeAny>;

  // ... add helpers for other action types if needed
};
```
Then, implement the plugin function and attach the helper methods before exporting:
```ts
// (Previous imports and MyPluginOptions interface definition)
import { modelRef, embedderRef } from 'genkit'; // Ensure modelRef/embedderRef are imported

function myPluginFn(options?: MyPluginOptions): GenkitPlugin {
  return genkitPlugin(
    'myPlugin',
    async (ai: Genkit) => {
      // Initializer...
    },
    async (ai, actionType, actionName) => {
      // Dynamic resolver...
      // Example: Define model if requested dynamically
      if (actionType === 'model') {
        ai.defineModel(
          {
            name: `myPlugin/${actionName}`,
            // ... other model definition properties
            configSchema: MyModelConfigSchema, // Use the defined schema
          },
          async (request) => {
            /* ... model implementation ... */
          },
        );
      }
      // Handle other dynamic actions...
    },
    async () => {
      // List actions...
    },
  );
}

// Create the final export conforming to the MyPlugin type
export const myPlugin = myPluginFn as MyPlugin;

// Implement the helper methods
myPlugin.model = (
  name: string,
  config?: z.infer<typeof MyModelConfigSchema>,
): ModelReference<typeof MyModelConfigSchema> => {
  return modelRef({
    name: `myPlugin/${name}`, // Automatically prefixes the name
    configSchema: MyModelConfigSchema,
    config,
  });
};

myPlugin.embedder = (
  name: string,
  config?: Record<string, any>,
): EmbedderReference<z.ZodTypeAny> => {
  return embedderRef({
    name: `myPlugin/${name}`,
    config,
  });
};
```
Now, users can import your plugin and use the helper methods for type-safe action references:
```ts
import { genkit } from 'genkit';
import { myPlugin } from 'genkitx-my-plugin'; // Assuming your package name

const ai = genkit({
  plugins: [
    myPlugin({
      /* options */
    }),
  ],
});

async function run() {
  const { text } = await ai.generate({
    // Use the helper for a type-safe model reference
    model: myPlugin.model('some-model-name', { customParam: 'value' }),
    prompt: 'Tell me a story.',
  });
  console.log(text);

  const embeddings = await ai.embed({
    // Use the helper for a type-safe embedder reference
    embedder: myPlugin.embedder('my-embedder'),
    content: 'Embed this text.',
  });
  console.log(embeddings);
}

run();
```
This approach keeps the plugin definition clean while providing a convenient and type-safe way for users to reference the actions provided by your plugin. It works seamlessly with both statically and dynamically defined actions, as the references only contain metadata, not the implementation itself.
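The same "callable function with attached helpers" shape can also be built with `Object.assign`, which lets TypeScript infer the combined type instead of relying on a cast. The sketch below uses toy stand-ins (`ToyPlugin`, `ToyModelRef`, `toyPluginFn` are all hypothetical) so that it runs without Genkit; it only illustrates the pattern and the fact that a reference is plain metadata.

```ts
// Toy stand-ins (assumptions): a real plugin would use GenkitPlugin and
// ModelReference from Genkit instead of these local types.
interface ToyPlugin { name: string; }
interface ToyModelConfig { customParam?: string; }
interface ToyModelRef { name: string; config?: ToyModelConfig; }

function toyPluginFn(): ToyPlugin {
  return { name: 'myPlugin' };
}

// Object.assign returns the function intersected with the helper object,
// so the result is callable and also exposes `.model(...)`.
const toyPlugin = Object.assign(toyPluginFn, {
  model(name: string, config?: ToyModelConfig): ToyModelRef {
    // The reference is plain metadata: a prefixed name plus optional config.
    return { name: `myPlugin/${name}`, config };
  },
});

const ref = toyPlugin.model('some-model-name', { customParam: 'value' });
// ref.name is 'myPlugin/some-model-name'
```

Whether to use a cast with a declared type or `Object.assign` is a style choice; the declared-type approach documents the public surface explicitly, while `Object.assign` avoids a window in which the helpers are typed as present but not yet assigned.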
## Publishing a plugin
Genkit plugins can be published as normal NPM packages. To increase discoverability and maximize consistency, your package should be named `genkitx-{name}` to indicate it is a Genkit plugin and you should include as many of the following `keywords` in your `package.json` as are relevant to your plugin:
* `genkit-plugin`: always include this keyword in your package to indicate it is a Genkit plugin.
* `genkit-model`: include this keyword if your package defines any models.
* `genkit-retriever`: include this keyword if your package defines any retrievers.
* `genkit-indexer`: include this keyword if your package defines any indexers.
* `genkit-embedder`: include this keyword if your package defines any embedders.
* `genkit-telemetry`: include this keyword if your package defines a telemetry provider.
* `genkit-deploy`: include this keyword if your package includes helpers to deploy Genkit apps to cloud providers.
* `genkit-flow`: include this keyword if your package enhances Genkit flows.
A plugin that provides a retriever, embedder, and model might have a `package.json` that looks like:
```js
{
"name": "genkitx-my-plugin",
"keywords": ["genkit-plugin", "genkit-retriever", "genkit-embedder", "genkit-model"],
// ... dependencies etc.
}
```
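A small pre-publish check can catch a forgotten keyword. The sketch below is purely illustrative: `Capability`, `expectedKeywords`, and `missingKeywords` are hypothetical names, and the mapping simply mirrors the keyword list above.

```ts
// Hypothetical helper that maps what a plugin provides to the keywords listed above.
type Capability = 'model' | 'retriever' | 'indexer' | 'embedder' | 'telemetry' | 'deploy' | 'flow';

function expectedKeywords(capabilities: Capability[]): string[] {
  // `genkit-plugin` is always included.
  return ['genkit-plugin', ...capabilities.map((c) => `genkit-${c}`)];
}

function missingKeywords(pkg: { keywords?: string[] }, capabilities: Capability[]): string[] {
  const present = new Set(pkg.keywords ?? []);
  return expectedKeywords(capabilities).filter((k) => !present.has(k));
}

const missing = missingKeywords(
  { keywords: ['genkit-plugin', 'genkit-model'] },
  ['retriever', 'embedder', 'model'],
);
// missing is ['genkit-retriever', 'genkit-embedder']
```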
# Writing a Genkit Evaluator
> Learn how to write custom Genkit evaluators for heuristic and LLM-based assessments, including defining prompts, scoring functions, and evaluator actions.
You can extend Genkit to support custom evaluation, using either an LLM as a judge or programmatic (heuristic) evaluation.
## Evaluator definition
Evaluators are functions that assess an LLM’s response. There are two main approaches to automated evaluation: heuristic evaluation and LLM-based evaluation. In the heuristic approach, you define a deterministic function. By contrast, in an LLM-based assessment, the content is fed back to an LLM, and the LLM is asked to score the output according to criteria set in a prompt.
The `ai.defineEvaluator` method, which you use to define an evaluator action in Genkit, supports either approach. This document explores a couple of examples of how to use this method for heuristic and LLM-based evaluations.
### LLM-based Evaluators
An LLM-based evaluator leverages an LLM to evaluate the `input`, `context`, and `output` of your generative AI feature.
LLM-based evaluators in Genkit are made up of 3 components:
* A prompt
* A scoring function
* An evaluator action
#### Define the prompt
For this example, the evaluator leverages an LLM to determine whether a food (the `output`) is delicious or not. First, provide context to the LLM, then describe what you want it to do, and finally, give it a few examples to base its response on.
Genkit’s `definePrompt` utility provides an easy way to define prompts with input and output validation. The following code is an example of setting up an evaluation prompt with `definePrompt`.
```ts
import { Genkit, z } from 'genkit';

const DELICIOUSNESS_VALUES = ['yes', 'no', 'maybe'] as const;

const DeliciousnessDetectionResponseSchema = z.object({
  reason: z.string(),
  verdict: z.enum(DELICIOUSNESS_VALUES),
});

function getDeliciousnessPrompt(ai: Genkit) {
  return ai.definePrompt({
    name: 'deliciousnessPrompt',
    input: {
      schema: z.object({
        responseToTest: z.string(),
      }),
    },
    output: {
      schema: DeliciousnessDetectionResponseSchema,
    },
    prompt: `You are a food critic. Assess whether the provided output sounds delicious, giving only "yes" (delicious), "no" (not delicious), or "maybe" (undecided) as the verdict.
Examples:
Output: Chicken parm sandwich
Response: { "reason": "A classic and beloved dish.", "verdict": "yes" }
Output: Boston Logan Airport tarmac
Response: { "reason": "Not edible.", "verdict": "no" }
Output: A juicy piece of gossip
Response: { "reason": "Metaphorically 'tasty' but not food.", "verdict": "maybe" }
New Output: {{ responseToTest }}
Response:
`,
  });
}
```
#### Define the scoring function
Define a function that takes an example that includes `output`, as required by the prompt, and scores the result. Genkit test cases include `input` as a required field, with `output` and `context` as optional fields. It is the responsibility of the evaluator to validate that all fields required for evaluation are present.
```ts
import { Genkit, ModelArgument, z } from 'genkit';
import { BaseEvalDataPoint, Score } from 'genkit/evaluator';

/**
 * Score an individual test case for deliciousness.
 */
export async function deliciousnessScore<CustomModelOptions extends z.ZodTypeAny>(
  ai: Genkit,
  judgeLlm: ModelArgument<CustomModelOptions>,
  dataPoint: BaseEvalDataPoint,
  judgeConfig?: z.infer<CustomModelOptions>,
): Promise<Score> {
  const d = dataPoint;
  // Validate the input has required fields
  if (!d.output) {
    throw new Error('Output is required for Deliciousness detection');
  }

  // Hydrate the prompt and generate an evaluation result
  const deliciousnessPrompt = getDeliciousnessPrompt(ai);
  const response = await deliciousnessPrompt(
    {
      responseToTest: d.output as string,
    },
    {
      model: judgeLlm,
      config: judgeConfig,
    },
  );

  // Parse the output
  const parsedResponse = response.output;
  if (!parsedResponse) {
    throw new Error(`Unable to parse evaluator response: ${response.text}`);
  }

  // Return a scored response
  return {
    score: parsedResponse.verdict,
    details: { reasoning: parsedResponse.reason },
  };
}
```
#### Define the evaluator action
The final step is to write a function that defines the `EvaluatorAction`.
```ts
import { Genkit, ModelArgument, z } from 'genkit';
import { BaseEvalDataPoint, EvaluatorAction } from 'genkit/evaluator';

/**
 * Create the Deliciousness evaluator action.
 */
export function createDeliciousnessEvaluator<ModelCustomOptions extends z.ZodTypeAny>(
  ai: Genkit,
  judge: ModelArgument<ModelCustomOptions>,
  judgeConfig?: z.infer<ModelCustomOptions>,
): EvaluatorAction {
  return ai.defineEvaluator(
    {
      name: `myCustomEvals/deliciousnessEvaluator`,
      displayName: 'Deliciousness',
      definition: 'Determines if output is considered delicious.',
      isBilled: true,
    },
    async (datapoint: BaseEvalDataPoint) => {
      const score = await deliciousnessScore(ai, judge, datapoint, judgeConfig);
      return {
        testCaseId: datapoint.testCaseId,
        evaluation: score,
      };
    },
  );
}
```
The `defineEvaluator` method is similar to other Genkit constructors like `defineFlow` and `defineRetriever`. This method requires an `EvaluatorFn` to be provided as a callback. The `EvaluatorFn` method accepts a `BaseEvalDataPoint` object, which corresponds to a single entry in a dataset under evaluation, along with an optional custom-options parameter if specified. The function processes the datapoint and returns an `EvalResponse` object.
The Zod Schemas for `BaseEvalDataPoint` and `EvalResponse` are as follows.
##### `BaseEvalDataPoint`
```ts
export const BaseEvalDataPoint = z.object({
  testCaseId: z.string(),
  input: z.unknown(),
  output: z.unknown().optional(),
  context: z.array(z.unknown()).optional(),
  reference: z.unknown().optional(),
  traceIds: z.array(z.string()).optional(),
});

export const EvalResponse = z.object({
  sampleIndex: z.number().optional(),
  testCaseId: z.string(),
  traceId: z.string().optional(),
  spanId: z.string().optional(),
  evaluation: z.union([ScoreSchema, z.array(ScoreSchema)]),
});
```
##### `ScoreSchema`
```ts
const ScoreSchema = z.object({
  id: z.string().describe('Optional ID to differentiate multiple scores').optional(),
  score: z.union([z.number(), z.string(), z.boolean()]).optional(),
  error: z.string().optional(),
  details: z
    .object({
      reasoning: z.string().optional(),
    })
    .passthrough()
    .optional(),
});
```
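To make these shapes concrete, here are a few values that would fit them. The interfaces below are hand-written TypeScript mirrors of the Zod schemas above (an assumption for illustration, not types imported from Genkit), and `scoresOf` is a hypothetical helper for normalizing the single-or-array `evaluation` field.

```ts
// Hand-written mirrors of the schemas above (assumption: not imported from Genkit).
interface Score {
  id?: string;
  score?: number | string | boolean;
  error?: string;
  // `passthrough()` allows extra keys next to `reasoning`.
  details?: { reasoning?: string; [key: string]: unknown };
}

interface EvalResponse {
  sampleIndex?: number;
  testCaseId: string;
  traceId?: string;
  spanId?: string;
  evaluation: Score | Score[];
}

// A single score, as returned by the evaluators in this guide.
const single: EvalResponse = {
  testCaseId: 'case-1',
  evaluation: { score: 'yes', details: { reasoning: 'A classic and beloved dish.' } },
};

// Several scores for one test case, told apart by `id`.
const multi: EvalResponse = {
  testCaseId: 'case-2',
  evaluation: [
    { id: 'hasPhone', score: true },
    { id: 'length', score: 42, details: { reasoning: 'Character count.', unit: 'chars' } },
  ],
};

// A failed evaluation reports `error` instead of `score`.
const failed: EvalResponse = {
  testCaseId: 'case-3',
  evaluation: { error: 'Output is required for Deliciousness detection' },
};

// Hypothetical helper: always work with an array of scores.
function scoresOf(response: EvalResponse): Score[] {
  return Array.isArray(response.evaluation) ? response.evaluation : [response.evaluation];
}
```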
The `defineEvaluator` method lets you provide a name, a user-readable display name, and a definition for the evaluator. The display name and definition are displayed along with evaluation results in the Dev UI. It also has an optional `isBilled` field that marks whether this evaluator can result in billing (e.g., it uses a billed LLM or API). If an evaluator is billed, Genkit prompts the user for confirmation in the CLI before allowing them to run an evaluation. This step helps guard against unintended expenses.
### Heuristic Evaluators
A heuristic evaluator can be any function used to evaluate the `input`, `context`, or `output` of your generative AI feature.
Heuristic evaluators in Genkit are made up of 2 components:
* A scoring function
* An evaluator action
#### Define the scoring function
As with the LLM-based evaluator, define the scoring function. In this case, the scoring function does not need a judge LLM.
```ts
import { BaseEvalDataPoint, Score } from 'genkit/evaluator';

const US_PHONE_REGEX = /[\+]?[(]?[0-9]{3}[)]?[-\s\.]?[0-9]{3}[-\s\.]?[0-9]{4}/i;

/**
 * Scores whether a datapoint output contains a US Phone number.
 */
export async function usPhoneRegexScore(dataPoint: BaseEvalDataPoint): Promise<Score> {
  const d = dataPoint;
  if (!d.output || typeof d.output !== 'string') {
    throw new Error('String output is required for regex matching');
  }
  const matches = US_PHONE_REGEX.test(d.output as string);
  const reasoning = matches ? `Output matched US_PHONE_REGEX` : `Output did not match US_PHONE_REGEX`;
  return {
    score: matches,
    details: { reasoning },
  };
}
```
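It can help to try the pattern on a few strings before wiring it into an evaluator. The sample strings below are made up for illustration; the regex is the one from the scoring function above.

```ts
const US_PHONE_REGEX = /[\+]?[(]?[0-9]{3}[)]?[-\s\.]?[0-9]{3}[-\s\.]?[0-9]{4}/i;

// Illustrative inputs (not from any real dataset).
const samples = [
  'Call (555) 123-4567 today',
  '555.123.4567',
  'My number is 555-1234',
  'Order #12345678901234',
];

const results = samples.map((s) => US_PHONE_REGEX.test(s));
// results is [true, true, false, true]
```

The last sample matches because the pattern is unanchored: any run of ten consecutive digits satisfies it, so a heuristic like this may report false positives on order numbers or other long digit strings.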
#### Define the evaluator action
```ts
import { Genkit } from 'genkit';
import { BaseEvalDataPoint, EvaluatorAction } from 'genkit/evaluator';

/**
 * Configures a regex evaluator to match a US phone number.
 */
export function createUSPhoneRegexEvaluator(ai: Genkit): EvaluatorAction {
  return ai.defineEvaluator(
    {
      name: `myCustomEvals/usPhoneRegexEvaluator`,
      displayName: 'Regex Match for US PHONE NUMBER',
      definition: 'Uses Regex to check if output matches a US phone number',
      isBilled: false,
    },
    async (datapoint: BaseEvalDataPoint) => {
      const score = await usPhoneRegexScore(datapoint);
      return {
        testCaseId: datapoint.testCaseId,
        evaluation: score,
      };
    },
  );
}
```
## Putting it together
### Plugin definition
Plugins are registered with the framework by installing them at the time of initializing Genkit. To define a new plugin, use the `genkitPlugin` helper method to instantiate all Genkit actions within the plugin context.
This code sample shows two evaluators: the LLM-based deliciousness evaluator, and the regex-based US phone number evaluator. Instantiating these evaluators within the plugin context registers them with the plugin.
```ts
import { Genkit, ModelArgument, z } from 'genkit';
import { GenkitPlugin, genkitPlugin } from 'genkit/plugin';

export function myCustomEvals<ModelCustomOptions extends z.ZodTypeAny>(options: {
  judge: ModelArgument<ModelCustomOptions>;
  judgeConfig?: z.infer<ModelCustomOptions>;
}): GenkitPlugin {
  // Define the new plugin
  return genkitPlugin('myCustomEvals', async (ai: Genkit) => {
    const { judge, judgeConfig } = options;

    // The plugin instantiates our custom evaluators within the context
    // of the `ai` object, making them available
    // throughout our Genkit application.
    createDeliciousnessEvaluator(ai, judge, judgeConfig);
    createUSPhoneRegexEvaluator(ai);
  });
}

export default myCustomEvals;
```
### Configure Genkit
Add the `myCustomEvals` plugin to your Genkit configuration.
For evaluation with Gemini, disable safety settings so that the evaluator can accept, detect, and score potentially harmful content.
```ts
import { genkit } from 'genkit';
import { googleAI } from '@genkit-ai/googleai';

const ai = genkit({
  plugins: [
    googleAI(),
    // ...
    myCustomEvals({
      judge: googleAI.model('gemini-2.5-flash'),
    }),
  ],
  // ...
});
```
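The note about disabling safety settings for a Gemini judge could translate into a `judgeConfig` along the following lines. This is a sketch under an assumption: the category and threshold strings follow the Gemini API's safety settings, so check the config schema of the model plugin you use before relying on them.

```ts
// Assumption: these category and threshold strings follow the Gemini API's
// safety settings; verify them against your model plugin's config schema.
const HARM_CATEGORIES = [
  'HARM_CATEGORY_HATE_SPEECH',
  'HARM_CATEGORY_DANGEROUS_CONTENT',
  'HARM_CATEGORY_HARASSMENT',
  'HARM_CATEGORY_SEXUALLY_EXPLICIT',
] as const;

// One entry per category, each with blocking turned off.
const judgeConfig = {
  safetySettings: HARM_CATEGORIES.map((category) => ({
    category,
    threshold: 'BLOCK_NONE' as const,
  })),
};
```

An object like this would be passed as `judgeConfig` next to `judge` in the `myCustomEvals` options.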
## Using your custom evaluators
Once you instantiate your custom evaluators within the Genkit app context (either through a plugin or directly), they are ready to be used. The following example illustrates how to try out the deliciousness evaluator with a few sample inputs and outputs.
1. Create a JSON file `deliciousness_dataset.json` with the following content:
```json
[
  {
    "testCaseId": "delicious_mango",
    "input": "What is a super delicious fruit",
    "output": "A perfectly ripe mango – sweet, juicy, and with a hint of tropical sunshine."
  },
  {
    "testCaseId": "disgusting_soggy_cereal",
    "input": "What is something that is tasty when fresh but less tasty after some time?",
    "output": "Stale, flavorless cereal that's been sitting in the box too long."
  }
]
```
2. Use the Genkit CLI to run the evaluator against these test cases.
```bash
# Start your genkit runtime
genkit start -- <command to run your code>
# Then run the evaluator against the dataset
genkit eval:run deliciousness_dataset.json --evaluators=myCustomEvals/deliciousnessEvaluator
```
3. Navigate to `localhost:4000/evaluate` to view your results in the Genkit UI.
Confidence in custom evaluators increases as you benchmark them against standard datasets or approaches. Iterate on the results of such benchmarks to improve your evaluators’ performance until it reaches the targeted level of quality.
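One simple benchmark is the agreement rate between an evaluator's verdicts and human labels on the same test cases. The sketch below assumes you have collected both; the types, the `agreementRate` helper, and the sample labels are all hypothetical.

```ts
type Verdict = 'yes' | 'no' | 'maybe';
interface LabeledCase { testCaseId: string; expected: Verdict; }
interface EvaluatedCase { testCaseId: string; verdict: Verdict; }

// Fraction of labeled cases where the evaluator's verdict matches the human label.
// Cases the evaluator did not score count as disagreements.
function agreementRate(labels: LabeledCase[], results: EvaluatedCase[]): number {
  if (labels.length === 0) return 0;
  const verdicts = new Map<string, Verdict>();
  for (const r of results) verdicts.set(r.testCaseId, r.verdict);
  let matches = 0;
  for (const label of labels) {
    if (verdicts.get(label.testCaseId) === label.expected) matches++;
  }
  return matches / labels.length;
}

// Made-up labels and verdicts for illustration.
const rate = agreementRate(
  [
    { testCaseId: 'delicious_mango', expected: 'yes' },
    { testCaseId: 'disgusting_soggy_cereal', expected: 'no' },
    { testCaseId: 'juicy_gossip', expected: 'maybe' },
  ],
  [
    { testCaseId: 'delicious_mango', verdict: 'yes' },
    { testCaseId: 'disgusting_soggy_cereal', verdict: 'no' },
    { testCaseId: 'juicy_gossip', verdict: 'yes' },
  ],
);
// rate is 2 / 3
```

Tracking a number like this across prompt or regex revisions gives a concrete signal for whether a change improved the evaluator.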