Open Source · Apache 2.0
Mellea
Build predictable AI without guesswork
Inside every AI-powered pipeline, workflow, or script, the unreliable part is the same: the LLM call itself. Silent failures, untestable outputs. Mellea lets you test and reason about every LLM call using type-annotated outputs, verifiable requirements, and automatic retries.
Overview
How it works
Structured, testable Python for every LLM call — no more flaky agents or brittle prompts. Mellea lets you instruct LLMs, validate outputs against your requirements, and recover from failures automatically. Works across OpenAI, Ollama, vLLM, HuggingFace, Watsonx, LiteLLM, and Bedrock.
Python not Prose
The @generative decorator turns typed function signatures into LLM specifications. Docstrings are prompts, type hints are schemas — no templates, no parsers.
Constrained Decoding
Grammar-constrained generation for Ollama, vLLM, and HuggingFace. Unlike Instructor and PydanticAI, valid output is enforced at the token level — not retried into existence.
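The constraint comes from the output type itself: a Pydantic model compiles to a JSON Schema, which a grammar-constrained backend enforces token by token. A minimal sketch of the schema side (the grammar compilation happens inside the backend; this shows only what the model declares, using standard Pydantic):

```python
from typing import Literal
from pydantic import BaseModel

class ReviewAnalysis(BaseModel):
    sentiment: Literal["positive", "negative", "neutral"]
    score: int

# The JSON Schema derived from the model is what a grammar-constrained
# decoder enforces: tokens that would violate it can never be emitted,
# so there is nothing to parse, validate, or retry afterwards.
schema = ReviewAnalysis.model_json_schema()
print(schema["properties"]["sentiment"])
```

Because the schema constrains decoding itself, "score": "five" is not a failure mode that has to be caught downstream; it simply cannot be generated.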
Requirements Driven
Declare rules — tone, length, content, custom logic — and Mellea validates every output against them before it's returned. Automatic retries mean bad output never reaches your users.
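The pattern behind this is a sample-validate-retry loop. A library-agnostic sketch in plain Python (this is the general idea, not Mellea's actual API; the requirement predicates and the stub "LLM" are illustrative):

```python
def generate_with_requirements(generate, requirements, max_retries=3):
    """Call `generate` until every requirement passes, or raise.

    `generate`: zero-arg function returning one candidate output.
    `requirements`: list of (description, predicate) pairs.
    """
    failures = []
    for _ in range(max_retries):
        candidate = generate()
        failures = [desc for desc, passes in requirements if not passes(candidate)]
        if not failures:
            return candidate  # every requirement passed; safe to return
    raise RuntimeError(f"no valid output after {max_retries} attempts: {failures}")

# Deterministic stand-in for an LLM call: first draft fails, second passes.
drafts = iter([
    "WHY IS MY ORDER LATE " * 10,
    "Thanks for your patience; your order ships today.",
])
reply = generate_with_requirements(
    lambda: next(drafts),
    requirements=[
        ("under 100 characters", lambda s: len(s) < 100),
        ("polite tone", lambda s: "thanks" in s.lower()),
    ],
)
print(reply)  # only the draft that satisfied both requirements gets out
```

The caller never sees the failed first draft: validation sits between the model and your users, and retries happen inside the loop.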
Predictable and Resilient
Need higher confidence? Switch from single-shot to majority voting or best-of-n with one parameter. No code rewrites, no new infrastructure.
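Majority voting itself is a simple idea: draw several candidate answers and return the most frequent one. A self-contained sketch of the mechanism (not Mellea's API; the sampler here is a stub standing in for repeated LLM calls):

```python
from collections import Counter

def majority_vote(sample, n=5):
    """Draw n candidates from `sample` and return the most common one."""
    votes = Counter(sample() for _ in range(n))
    winner, _count = votes.most_common(1)[0]
    return winner

# Stub sampler: an unreliable "model" that answers correctly 3 times out of 5.
answers = iter(["4", "5", "4", "4", "3"])
winner = majority_vote(lambda: next(answers), n=5)
print(winner)  # "4"
```

Raising n trades tokens for confidence without touching the rest of the program, which is why switching strategies can be a one-parameter change.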
MCP Compatible
Expose any Mellea program as an MCP tool. The calling agent gets validated output — requirements checked, retries run — not raw LLM responses.
Safety & Guardrails
Built-in Granite Guardian integration detects harmful outputs, hallucinations, and jailbreak attempts before they reach your users — no external service required.
Examples
See it in action
Generative Functions
Write a typed Python function, get structured LLM output. Docstrings are prompts, type hints are schemas — no parsers, no chains.
from typing import Literal
from pydantic import BaseModel
from mellea import generative, start_session
class ReviewAnalysis(BaseModel):
    sentiment: Literal["positive", "negative", "neutral"]
    score: int  # 1-5
    summary: str  # one sentence

@generative
def analyze_review(text: str) -> ReviewAnalysis:
    """Extract sentiment, a 1-5 score, and a one-sentence summary."""
    ...

m = start_session()
result = analyze_review(m, text="Battery life is great but the screen is dim")
print(result.sentiment)  # "positive", "negative", or "neutral" — always
print(result.score)      # an int, 1-5 — always
print(result.summary)    # a str — always

The next era of software requires moving past "agent soup" and opaque prompting. Mellea brings the rigor of traditional software engineering to generative AI — decomposed, verifiable, composable tasks that you can test, debug, and trust.

