Deploy with Cloud Run
You can deploy Genkit flows as HTTPS endpoints using Cloud Run. This page walks you through deploying a FastAPI-based Genkit application to Cloud Run with automatic scaling and containerization.
Before you begin
- Install the Google Cloud CLI.
- You should be familiar with Genkit’s concept of flows and how to write them.
- Prior experience with Google Cloud and Cloud Run is helpful but not required.
1. Set up a Google Cloud project
If you don’t already have a Google Cloud project set up, follow these steps:
- Create a new Google Cloud project using the Cloud console or choose an existing one.
- Link the project to a billing account, which is required for Cloud Run.
- Configure the Google Cloud CLI to use your project:
  gcloud init
2. Prepare your Python project for deployment
Initialize your project with uv
Create a new project or navigate to your existing project:
# Create project directory
mkdir genkit-cloudrun
cd genkit-cloudrun

# Initialize with uv
uv init

# Add dependencies
uv add genkit genkit-plugin-google-genai fastapi uvicorn slowapi
Create your FastAPI application with Genkit
Genkit flows are ordinary async functions, so FastAPI route handlers can await them directly.
Create a main.py file:
import os
from contextlib import asynccontextmanager

from fastapi import FastAPI, HTTPException
from pydantic import BaseModel, Field

from genkit import Genkit
from genkit.plugins.google_genai import GoogleAI

# Initialize Genkit
ai = Genkit(
    plugins=[GoogleAI()],
    model='googleai/gemini-2.5-flash',
)

# Define input/output schemas
class JokeRequest(BaseModel):
    """Request schema for joke generation."""
    topic: str = Field(description="Topic for the joke", min_length=1)

class JokeResponse(BaseModel):
    """Response schema for joke generation."""
    joke: str
    topic: str

class SummaryRequest(BaseModel):
    """Request schema for text summarization."""
    text: str = Field(description="Text to summarize", min_length=10)

# Application lifespan for startup/shutdown
@asynccontextmanager
async def lifespan(app: FastAPI):
    """Manage application lifespan."""
    print("🚀 Starting Genkit Cloud Run service")
    yield
    print("👋 Shutting down Genkit Cloud Run service")

# Create FastAPI app
app = FastAPI(
    title="Genkit Cloud Run Service",
    description="AI-powered API with Genkit and FastAPI",
    version="1.0.0",
    lifespan=lifespan,
)

# Root endpoint with basic service info
@app.get("/")
async def root():
    """Root endpoint with service info."""
    return {
        "service": "Genkit Cloud Run",
        "status": "running",
        "docs": "/docs"
    }

@app.get("/health")
async def health_check():
    """Health check endpoint for Cloud Run."""
    return {"status": "healthy"}

# Define Genkit flow
@ai.flow()
async def joke_flow(topic: str) -> str:
    """Generate a joke about the given topic.

    Args:
        topic: The topic for the joke.

    Returns:
        A funny joke about the topic.
    """
    response = await ai.generate(
        prompt=f'Tell a short, funny joke about {topic}. Be creative!',
    )
    return response.text

# FastAPI endpoint that uses the flow
@app.post("/joke", response_model=JokeResponse)
async def generate_joke(request: JokeRequest):
    """Generate a joke via REST API.

    Args:
        request: The joke request with topic.

    Returns:
        The generated joke.
    """
    try:
        joke = await joke_flow(request.topic)
        return JokeResponse(joke=joke, topic=request.topic)
    except Exception as e:
        raise HTTPException(status_code=500, detail=f"Failed to generate joke: {str(e)}")

@ai.flow()
async def summarize_flow(text: str) -> str:
    """Summarize the provided text.

    Args:
        text: The text to summarize.

    Returns:
        A concise summary.
    """
    response = await ai.generate(
        prompt=f'Summarize the following text in 2-3 sentences:\n\n{text}',
    )
    return response.text

@app.post("/summarize")
async def summarize_text(request: SummaryRequest):
    """Summarize text via REST API.

    Args:
        request: The text to summarize.

    Returns:
        The summary.
    """
    try:
        summary = await summarize_flow(request.text)
        return {"summary": summary}
    except Exception as e:
        raise HTTPException(status_code=500, detail=f"Failed to summarize: {str(e)}")

if __name__ == "__main__":
    import uvicorn
    port = int(os.environ.get("PORT", 8080))
    uvicorn.run(app, host="0.0.0.0", port=port)
Optional: Add authorization
All deployed flows should require some form of authorization. You have two options:
Cloud IAM-based authorization: Use Google Cloud’s native access management to gate access to your endpoints. See Authentication in the Cloud Run docs.
Custom authorization with FastAPI: Use FastAPI’s dependency injection to verify a JWT on each request. The example imports jwt, which is provided by the PyJWT package (uv add pyjwt):
from fastapi import Depends, HTTPException, status
from fastapi.security import HTTPBearer, HTTPAuthorizationCredentials
import jwt  # provided by the PyJWT package

security = HTTPBearer()

async def verify_token(
    credentials: HTTPAuthorizationCredentials = Depends(security)
) -> dict:
    """Verify JWT token and return user info.

    Args:
        credentials: HTTP authorization credentials.

    Returns:
        User information from token.

    Raises:
        HTTPException: If token is invalid.
    """
    try:
        token = credentials.credentials
        # Replace with your actual token verification.
        # Do not hard-code the signing key in production code.
        payload = jwt.decode(token, "your-secret-key", algorithms=["HS256"])
        return {
            "user_id": payload.get("user_id"),
            "email": payload.get("email"),
        }
    except jwt.InvalidTokenError:
        raise HTTPException(
            status_code=status.HTTP_401_UNAUTHORIZED,
            detail="Invalid authentication credentials",
            headers={"WWW-Authenticate": "Bearer"},
        )

@app.post("/protected-joke", response_model=JokeResponse)
async def protected_generate_joke(
    request: JokeRequest,
    user: dict = Depends(verify_token)
):
    """Generate a joke with authentication required.

    Args:
        request: The joke request.
        user: Authenticated user information.

    Returns:
        The generated joke.
    """
    joke = await joke_flow(request.topic)
    return JokeResponse(joke=joke, topic=request.topic)
Create a Dockerfile for Cloud Run
Create a Dockerfile for containerized deployment:
FROM python:3.11-slim

WORKDIR /app

# Install uv
COPY --from=ghcr.io/astral-sh/uv:latest /uv /usr/local/bin/uv

# Copy dependency files
COPY pyproject.toml uv.lock* ./

# Install dependencies
RUN uv sync --frozen --no-dev

# Copy application code
COPY . .

# Expose port
EXPOSE 8080

# Run with uvicorn
CMD ["uv", "run", "uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8080"]
Create .dockerignore
Create a .dockerignore file to exclude unnecessary files:
__pycache__
*.pyc
*.pyo
*.pyd
.Python
.venv
.uv
.git
.gitignore
*.md
.DS_Store
Make API credentials available to deployed flows
Gemini (Google AI)
- Generate an API key for the Gemini API using Google AI Studio.
- Store the API key in Secret Manager:
  - Enable the Secret Manager API.
  - Create a new secret containing your API key on the Secret Manager page.
  - Grant your default compute service account the Secret Manager Secret Accessor role.
Gemini (Vertex AI)
- Enable the Vertex AI API for your project.
- On the IAM page, ensure the default compute service account has the Vertex AI User role.
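Either setup reaches the container as environment variables: the deploy commands in the next step map the secret to GEMINI_API_KEY, or set GOOGLE_CLOUD_PROJECT and GOOGLE_CLOUD_LOCATION for Vertex AI. The following is a minimal sketch of a startup check that fails fast when neither is present. The helper name detect_credentials is hypothetical and not part of Genkit; it only inspects the variable names used on this page.

```python
import os


def detect_credentials(env=None):
    """Return which credential setup is visible in the environment.

    Hypothetical helper (not part of Genkit): it only checks the
    variable names used by the deploy commands on this page.
    """
    env = os.environ if env is None else env
    if env.get("GEMINI_API_KEY"):
        return "google-ai"
    if env.get("GOOGLE_CLOUD_PROJECT") and env.get("GOOGLE_CLOUD_LOCATION"):
        return "vertex-ai"
    raise RuntimeError(
        "No credentials found: set GEMINI_API_KEY, or both "
        "GOOGLE_CLOUD_PROJECT and GOOGLE_CLOUD_LOCATION."
    )
```

Calling such a check at the top of main.py, or inside the lifespan handler, would surface a missing secret in the startup logs instead of on the first request.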
3. Deploy to Cloud Run
Deploy your application using the gcloud tool. With --source ., Cloud Run builds the container image from your Dockerfile and then deploys it.
Gemini (Google AI)
gcloud run deploy genkit-service \
  --source . \
  --update-secrets=GEMINI_API_KEY=<your-secret-name>:latest \
  --allow-unauthenticated
Gemini (Vertex AI)
gcloud run deploy genkit-service \
  --source . \
  --set-env-vars GOOGLE_CLOUD_PROJECT=<your-project-id> \
  --set-env-vars GOOGLE_CLOUD_LOCATION=us-central1 \
  --allow-unauthenticated
The --allow-unauthenticated flag makes the service publicly reachable. If you omit the flag, gcloud asks whether to allow unauthenticated invocations:
- Answer Y if you’re using custom authorization in code.
- Answer N to require IAM credentials.
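If you require IAM credentials, each caller has to send an identity token as a bearer token in the Authorization header. As a rough sketch using only the standard library, a Python client could build such a request as follows. The helper build_joke_request is hypothetical, and the token is assumed to come from gcloud auth print-identity-token or an equivalent source.

```python
import json
import urllib.request


def build_joke_request(service_url, topic, id_token=None):
    """Build a POST request for the /joke endpoint defined in main.py.

    Hypothetical client helper. When id_token is given, it is sent as a
    bearer token, as an IAM-protected Cloud Run service expects.
    """
    headers = {"Content-Type": "application/json"}
    if id_token:
        headers["Authorization"] = f"Bearer {id_token}"
    body = json.dumps({"topic": topic}).encode("utf-8")
    return urllib.request.Request(
        f"{service_url.rstrip('/')}/joke",
        data=body,
        headers=headers,
        method="POST",
    )


# Sending the request needs a deployed service, so it is left commented out:
# req = build_joke_request("https://<service-url>", "programming", id_token="<token>")
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```

Keeping the token optional lets the same helper work against a service deployed with --allow-unauthenticated.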
Alternative: Deploy with existing container
If you prefer to build and push the container separately:
# Build and push to Artifact Registry
gcloud builds submit --tag gcr.io/<your-project-id>/genkit-service

# Deploy the container
gcloud run deploy genkit-service \
  --image gcr.io/<your-project-id>/genkit-service \
  --update-secrets=GEMINI_API_KEY=<your-secret-name>:latest
4. Test the deployed flow
After deployment, gcloud prints the service URL. Test your endpoints:
# Save the service URL
SERVICE_URL="https://<service-url>"

# Test health endpoint
curl "$SERVICE_URL/health"

# Test joke generation
curl -X POST "$SERVICE_URL/joke" \
  -H "Content-Type: application/json" \
  -d '{"topic": "programming"}'

# With IAM authentication (if required)
curl -X POST "$SERVICE_URL/joke" \
  -H "Authorization: Bearer $(gcloud auth print-identity-token)" \
  -H "Content-Type: application/json" \
  -d '{"topic": "artificial intelligence"}'

# Test summarization
curl -X POST "$SERVICE_URL/summarize" \
  -H "Authorization: Bearer $(gcloud auth print-identity-token)" \
  -H "Content-Type: application/json" \
  -d '{"text": "Cloud Run is a fully managed compute platform that automatically scales your stateless containers. It abstracts away infrastructure management so you can focus on building applications."}'
5. View automatic API documentation
FastAPI automatically generates API documentation. After deployment, visit:
- Swagger UI: https://<service-url>/docs
- ReDoc: https://<service-url>/redoc
Swagger UI lets you send test requests to your endpoints directly from the browser. ReDoc is a read-only reference view of the same API.
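Both pages are rendered from the OpenAPI schema that FastAPI serves at /openapi.json by default. A small sketch, assuming you have already downloaded that schema, of listing the documented operations. The helper list_operations is hypothetical, and the sample schema is a hand-written stand-in rather than real output from the service.

```python
import json

HTTP_METHODS = {"get", "post", "put", "patch", "delete", "head", "options"}


def list_operations(schema):
    """Return sorted (METHOD, path) pairs from an OpenAPI schema dict."""
    operations = []
    for path, item in schema.get("paths", {}).items():
        for method in item:
            # Path items can also hold non-method keys such as "parameters".
            if method.lower() in HTTP_METHODS:
                operations.append((method.upper(), path))
    return sorted(operations)


# Hand-written stand-in for the JSON served at https://<service-url>/openapi.json
sample_schema = json.loads("""
{
  "openapi": "3.1.0",
  "paths": {
    "/health": {"get": {"summary": "Health Check"}},
    "/joke": {"post": {"summary": "Generate Joke"}},
    "/summarize": {"post": {"summary": "Summarize Text"}}
  }
}
""")

for method, path in list_operations(sample_schema):
    print(method, path)
```

A listing like this can serve as a quick smoke test that a new revision still exposes the routes you expect.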
Advanced Configuration
Environment Variables
Set additional environment variables for your deployment:
gcloud run deploy genkit-service \
  --source . \
  --set-env-vars LOG_LEVEL=info \
  --set-env-vars MAX_WORKERS=4 \
  --update-secrets=GEMINI_API_KEY=<your-secret-name>:latest
Resource Limits
Configure CPU, memory, and the maximum number of instances:
gcloud run deploy genkit-service \
  --source . \
  --cpu 2 \
  --memory 2Gi \
  --max-instances 10 \
  --update-secrets=GEMINI_API_KEY=<your-secret-name>:latest
Custom Domains
Add a custom domain to your Cloud Run service:
# Map your domain (domain mappings live in the gcloud beta command group)
gcloud beta run domain-mappings create \
  --service genkit-service \
  --domain api.yourdomain.com
Monitoring and Logging
View logs in the Cloud console or using gcloud:
# Stream logs
gcloud beta run services logs tail genkit-service

# View recent logs
gcloud run services logs read genkit-service --limit 50
Production Best Practices
1. Use Structured Logging
Cloud Run forwards anything written to stdout or stderr to Cloud Logging, and single-line JSON is parsed into structured fields, so log one JSON object per event:
import logging
import json

logging.basicConfig(
    level=logging.INFO,
    format='%(message)s'
)
logger = logging.getLogger(__name__)

@app.post("/joke")
async def generate_joke(request: JokeRequest):
    logger.info(json.dumps({
        "event": "joke_request",
        "topic": request.topic
    }))
    joke = await joke_flow(request.topic)
    logger.info(json.dumps({
        "event": "joke_generated",
        "topic": request.topic,
        "length": len(joke)
    }))
    return JokeResponse(joke=joke, topic=request.topic)
2. Add Request Validation
FastAPI already validates request bodies against the Pydantic models in main.py (for example, the min_length constraints). The middleware below additionally reports how long each request took in an X-Process-Time response header:
from fastapi import Request
import time

@app.middleware("http")
async def add_process_time_header(request: Request, call_next):
    """Add processing time to response headers."""
    start_time = time.time()
    response = await call_next(request)
    process_time = time.time() - start_time
    response.headers["X-Process-Time"] = str(process_time)
    return response
3. Implement Rate Limiting
Use Cloud Armor or implement rate limiting in your application. Note that slowapi keeps its counters in memory by default, so each Cloud Run instance enforces its own limit:
from fastapi import Request
from slowapi import Limiter, _rate_limit_exceeded_handler
from slowapi.util import get_remote_address
from slowapi.errors import RateLimitExceeded

limiter = Limiter(key_func=get_remote_address)
app.state.limiter = limiter
app.add_exception_handler(RateLimitExceeded, _rate_limit_exceeded_handler)

# slowapi requires the endpoint to accept a `request: Request` argument
@app.post("/joke")
@limiter.limit("10/minute")
async def generate_joke(request: Request, joke_request: JokeRequest):
    joke = await joke_flow(joke_request.topic)
    return JokeResponse(joke=joke, topic=joke_request.topic)
4. Enable CORS for Web Applications
from fastapi.middleware.cors import CORSMiddleware

app.add_middleware(
    CORSMiddleware,
    allow_origins=["https://yourdomain.com"],
    allow_credentials=True,
    allow_methods=["POST", "GET"],
    allow_headers=["*"],
)
Next Steps
- Learn about FastAPI integration for more advanced patterns
- Explore authorization options for securing your endpoints
- Set up observability to monitor your deployed flows