# Ollama Plugin
The Ollama plugin provides interfaces to any of the local LLMs supported by Ollama.
## Prerequisites

This plugin requires that you first install and run the Ollama server. You can follow the instructions on the Download Ollama page.
Use the Ollama CLI to download the models you are interested in. For example:
```sh
ollama pull llama3.2
ollama pull gemma2
ollama pull mistral
```

For development, you can run Ollama on your development machine. Deployed apps usually run Ollama on a GPU-accelerated machine that is different from the one hosting the app backend running Genkit.
## Installation

```sh
uv add genkit-plugin-ollama
```
## Configuration

To use this plugin, import Ollama and specify it when you initialize Genkit:
```python
from genkit import Genkit
from genkit.plugins.ollama import Ollama, ollama_name
from genkit.plugins.ollama.models import ModelDefinition

ai = Genkit(
    plugins=[
        Ollama(
            models=[
                ModelDefinition(name='llama3.2'),
                ModelDefinition(name='gemma2'),
            ],
            server_address='http://127.0.0.1:11434',  # default local address
        )
    ],
    model=ollama_name('llama3.2'),  # optional default model
)
```
## Authentication

If you would like to access remote deployments of Ollama that require custom headers (such as API keys), you can specify those in the Ollama plugin configuration:
```python
ai = Genkit(
    plugins=[
        Ollama(
            models=[ModelDefinition(name='gemma2')],
            server_address='https://my-deployment',
            request_headers={'api-key': 'API Key goes here'},
        )
    ],
)
```

This plugin doesn't statically export model references. Specify one of the models you configured using the `ollama_name()` helper or a string identifier:
```python
from genkit import Genkit
from genkit.plugins.ollama import Ollama, ollama_name
from genkit.plugins.ollama.models import ModelDefinition

ai = Genkit(
    plugins=[
        Ollama(
            models=[ModelDefinition(name='llama3.2')],
        )
    ],
)

@ai.flow()
async def llama_flow(prompt: str) -> str:
    """Generate text using Llama.

    Args:
        prompt: The prompt to generate from.

    Returns:
        The generated text.
    """
    response = await ai.generate(
        model=ollama_name('llama3.2'),
        prompt=prompt,
    )
    return response.text
```

Or reference the model directly by string:
```python
response = await ai.generate(
    model='ollama/llama3.2',
    prompt='Tell me a joke.',
)
```
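Both forms name the same model: the string identifier is just the plugin namespace and the model name joined by a slash. As a standalone sketch of that convention (assuming `ollama_name()` does no more than prefixing; in real code, always use the helper from `genkit.plugins.ollama`):

```python
def ollama_model_id(name: str) -> str:
    """Illustrate the 'ollama/<model>' identifier convention.

    Standalone stand-in for ollama_name(); the real helper may
    perform additional validation.
    """
    return name if name.startswith('ollama/') else f'ollama/{name}'
```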
## Advanced usage
### Embeddings

The Ollama plugin supports embeddings, which can be used for similarity searches and other NLP tasks:
```python
from genkit import Genkit
from genkit.plugins.ollama import Ollama
from genkit.plugins.ollama.embedders import EmbeddingDefinition

ai = Genkit(
    plugins=[
        Ollama(
            server_address='http://localhost:11434',
            embedders=[
                EmbeddingDefinition(
                    name='nomic-embed-text',
                    dimensions=768,
                )
            ],
        )
    ],
)

@ai.flow()
async def get_embeddings(text: str) -> list[float]:
    """Generate embeddings for text.

    Args:
        text: The text to embed.

    Returns:
        The embedding vector.
    """
    result = await ai.embed(
        embedder='ollama/nomic-embed-text',
        content=text,
    )
    return result
```
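Vectors returned by a flow like `get_embeddings` can be compared with cosine similarity to rank candidate texts against a query. A minimal stdlib sketch (real applications typically use NumPy or a vector database):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def rank_by_similarity(query: list[float], candidates: dict[str, list[float]]) -> list[str]:
    """Return candidate keys ordered from most to least similar to the query."""
    return sorted(candidates, key=lambda k: cosine_similarity(query, candidates[k]), reverse=True)
```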
### Streaming

Ollama models support streaming responses for real-time output:
```python
from genkit import ActionRunContext

@ai.flow()
async def streaming_story(topic: str, ctx: ActionRunContext | None = None) -> str:
    """Generate a story with streaming output.

    Args:
        topic: Story topic.
        ctx: Action context for streaming chunks.

    Returns:
        The complete generated story.
    """
    response = await ai.generate(
        model=ollama_name('llama3.2'),
        prompt=f'Write a short story about {topic}',
        on_chunk=ctx.send_chunk if ctx else None,
    )
    return response.text
```
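The `on_chunk` callback fires once per partial response as generation proceeds. To illustrate the callback shape, here is a minimal collector that simply buffers chunks; it assumes chunks arrive as plain strings for simplicity, whereas Genkit's actual chunk objects carry more structure:

```python
class ChunkCollector:
    """Minimal stand-in for a streaming consumer.

    A real handler might push each chunk to a websocket or terminal;
    this one just buffers them so the full text can be reassembled.
    """

    def __init__(self) -> None:
        self._chunks: list[str] = []

    def send_chunk(self, chunk: str) -> None:
        """Receive one partial response."""
        self._chunks.append(chunk)

    @property
    def text(self) -> str:
        """All chunks received so far, joined in arrival order."""
        return ''.join(self._chunks)
```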
### Tool Calling

Some Ollama models support tool calling (e.g., Mistral, Llama 3.1+):
```python
from pydantic import BaseModel, Field

class WeatherInput(BaseModel):
    """Input for weather tool."""

    location: str = Field(description='City name')

@ai.tool(description='Get current weather for a location')
def get_weather(input: WeatherInput) -> str:
    """Get the current weather for a location."""
    # In a real implementation, call a weather API
    return f'The weather in {input.location} is 72°F and sunny.'

@ai.flow()
async def weather_flow(location: str) -> str:
    """Get weather information using Ollama with tool calling.

    Note: Requires a model that supports tools, such as
    mistral-nemo or llama3.1 and newer.

    Args:
        location: The location to get weather for.

    Returns:
        Weather information for the location.
    """
    response = await ai.generate(
        model=ollama_name('mistral-nemo'),
        prompt=f"What's the weather like in {location}?",
        tools=['get_weather'],
    )
    return response.text
```
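Conceptually, `ai.generate()` runs a round trip here: the model emits a tool request, the framework invokes the matching Python function, and the result is fed back so the model can produce its final answer. A framework-free sketch of that dispatch step (this is illustrative, not Genkit's actual implementation):

```python
from typing import Any, Callable

def run_tool_request(
    tools: dict[str, Callable[..., str]],
    request: dict[str, Any],
) -> str:
    """Look up the requested tool by name and invoke it with the
    model-supplied arguments, returning the result to send back."""
    fn = tools[request['name']]
    return fn(**request['arguments'])

def get_weather(location: str) -> str:
    """Toy tool implementation mirroring the example above."""
    return f'The weather in {location} is 72°F and sunny.'
```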
### Structured Output

Generate structured data using Pydantic models:
```python
from pydantic import BaseModel, Field
from genkit import Output

class Recipe(BaseModel):
    """A cooking recipe."""

    name: str = Field(description='Recipe name')
    ingredients: list[str] = Field(description='List of ingredients')
    steps: list[str] = Field(description='Cooking steps')
    prep_time_minutes: int = Field(description='Preparation time in minutes')

@ai.flow()
async def create_recipe(dish: str) -> Recipe:
    """Generate a recipe with structured output.

    Args:
        dish: The dish to create a recipe for.

    Returns:
        A structured recipe.
    """
    response = await ai.generate(
        model=ollama_name('llama3.2'),
        prompt=f'Create a recipe for {dish}',
        output=Output(schema=Recipe),
    )
    return response.output
```
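Genkit validates the model's JSON output against the `Recipe` schema before returning `response.output`. To show roughly what that validation amounts to, here is a stdlib-only sketch using dataclasses in place of Pydantic (illustrative only; in practice Pydantic does this for you):

```python
import json
from dataclasses import dataclass, fields

@dataclass
class RecipeRecord:
    """Stdlib stand-in for the Pydantic Recipe model above."""

    name: str
    ingredients: list
    steps: list
    prep_time_minutes: int

def parse_recipe(payload: str) -> RecipeRecord:
    """Parse a JSON payload and check that every schema field is present."""
    data = json.loads(payload)
    expected = {f.name for f in fields(RecipeRecord)}
    missing = expected - data.keys()
    if missing:
        raise ValueError(f'missing fields: {sorted(missing)}')
    return RecipeRecord(**{k: data[k] for k in expected})
```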
### Custom Server Configuration

For production deployments or custom Ollama server locations:
```python
ai = Genkit(
    plugins=[
        Ollama(
            models=[ModelDefinition(name='llama3.2')],
            server_address='http://ollama-server.internal:11434',
            request_headers={
                'X-Custom-Header': 'value',
            },
        )
    ],
)
```
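Rather than hardcoding addresses and header values in source, production deployments often read them from the environment. A sketch of that pattern (`OLLAMA_SERVER_ADDRESS` and `OLLAMA_API_KEY` are illustrative names chosen for this example, not variables Genkit or Ollama read themselves):

```python
def remote_ollama_config(env: dict[str, str]) -> dict:
    """Assemble Ollama plugin settings from environment-style variables.

    Falls back to the default local address when no override is set,
    and only attaches auth headers when a key is actually present.
    """
    config = {
        'server_address': env.get('OLLAMA_SERVER_ADDRESS', 'http://127.0.0.1:11434'),
    }
    api_key = env.get('OLLAMA_API_KEY')
    if api_key:
        config['request_headers'] = {'api-key': api_key}
    return config
```

In practice you would call this with `os.environ` and spread the result into the `Ollama(...)` constructor arguments.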