Open Source · Apache 2.0

Mellea
build predictable AI without guesswork

Inside every AI-powered pipeline, workflow, or script, the unreliable part is the same: the LLM call itself. Silent failures, untestable outputs. Mellea lets you test and reason about every LLM call using type-annotated outputs, verifiable requirements, and automatic retries.

uv pip install mellea
Get Started → · Read the blog
Unit testable
100% Open source
Typed, constrained output
Any LLM provider

How it works

Structured, testable Python for every LLM call — no more flaky agents or brittle prompts. Mellea lets you instruct LLMs, validate outputs against your requirements, and recover from failures automatically. Works across OpenAI, Ollama, vLLM, HuggingFace, Watsonx, LiteLLM, and Bedrock.

Without Mellea — complex unstructured system prompt
With Mellea — structured, type-annotated Python

Python not Prose

The @generative decorator turns typed function signatures into LLM specifications. Docstrings are prompts, type hints are schemas — no templates, no parsers.

Learn more →

Constrained Decoding

Grammar-constrained generation for Ollama, vLLM, and HuggingFace. Unlike Instructor and PydanticAI, valid output is enforced at the token level — not retried into existence.

Learn more →
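To see why token-level enforcement differs from retrying, here is a toy sketch of the idea behind grammar-constrained decoding — not Mellea's internals, just an illustration with made-up logits: at each step the grammar dictates which tokens are legal, and everything else is masked out before a token is picked, so invalid output can never be generated in the first place.

```python
def constrained_decode(step_logits, legal_tokens_per_step):
    """Pick the highest-scoring *legal* token at every step.

    Tokens the grammar forbids are masked (score -inf), so they can
    never be emitted -- no retries needed.
    """
    output = []
    for logits, legal in zip(step_logits, legal_tokens_per_step):
        best = max(legal, key=lambda tok: logits.get(tok, float("-inf")))
        output.append(best)
    return output

# Suppose the grammar requires: '{', a key, ':', a value, '}'.
logits = [
    {"{": 0.2, "hello": 0.9},          # model prefers "hello"; grammar forbids it
    {'"sentiment"': 0.8, "{": 0.1},
    {":": 0.7},
    {'"positive"': 0.6, "oops": 0.9},  # "oops" is masked out entirely
    {"}": 0.5},
]
legal = [{"{"}, {'"sentiment"'}, {":"}, {'"positive"', '"negative"'}, {"}"}]

print("".join(constrained_decode(logits, legal)))
# {"sentiment":"positive"}
```

A retry-based approach would have sampled "hello" or "oops", failed JSON parsing, and called the model again; masking makes the invalid continuation impossible at sampling time.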

Requirements Driven

Declare rules — tone, length, content, custom logic — and Mellea validates every output against them. Automatic retries mean bad output never reaches your users.

Learn more →
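The validate-and-retry pattern can be sketched in a few lines of plain Python. This is a conceptual illustration of the loop described above, not Mellea's actual implementation: each requirement is a named predicate over the output, and generation repeats until every requirement passes or a retry budget runs out.

```python
def generate_validated(generate, requirements, loop_budget=3):
    """Retry generation until all requirements pass or the budget is spent."""
    for attempt in range(loop_budget):
        output = generate(attempt)
        failures = [name for name, check in requirements if not check(output)]
        if not failures:
            return output  # every requirement passed
    raise RuntimeError(f"still failing after {loop_budget} tries: {failures}")

# A fake "model" that improves on retry, plus two requirements.
drafts = ["WAY TOO LONG AND SHOUTY SUMMARY OF THE REVIEW", "Great battery, dim screen."]
requirements = [
    ("short", lambda s: len(s) <= 40),
    ("not shouting", lambda s: not s.isupper()),
]
print(generate_validated(lambda i: drafts[min(i, 1)], requirements))
# Great battery, dim screen.
```

The first draft fails both checks, so the loop retries; the second draft passes and is the only output that ever leaves the function.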

Predictable and Resilient

Need higher confidence? Switch from single-shot to majority voting or best-of-n with one parameter. No code rewrites, no new infrastructure.

Learn more →
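Majority voting is simple enough to sketch directly. The snippet below is a generic illustration of the technique (often called self-consistency), not Mellea's API: run the same call several times and return the most common answer along with its vote share.

```python
from collections import Counter

def majority_vote(sample, n=5):
    """Run `sample` n times; return the most common answer and its vote share."""
    votes = Counter(sample(i) for i in range(n))
    answer, count = votes.most_common(1)[0]
    return answer, count / n

# Simulated samples: one flaky call out of five disagrees.
samples = ["positive", "positive", "negative", "positive", "positive"]
answer, confidence = majority_vote(lambda i: samples[i], n=5)
print(answer, confidence)  # positive 0.8
```

A single flaky sample is outvoted; raising n trades latency for confidence without touching the rest of the pipeline.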

MCP Compatible

Expose any Mellea program as an MCP tool. The calling agent gets validated output — requirements checked, retries run — not raw LLM responses.

Learn more →

Safety & Guardrails

Built-in Granite Guardian integration detects harmful outputs, hallucinations, and jailbreak attempts before they reach your users — no external service required.

Learn more →

See it in action

python
from typing import Literal
from pydantic import BaseModel
from mellea import generative, start_session

class ReviewAnalysis(BaseModel):
    sentiment: Literal["positive", "negative", "neutral"]
    score: int    # 1-5
    summary: str  # one sentence

@generative
def analyze_review(text: str) -> ReviewAnalysis:
    """Extract sentiment, a 1-5 score, and a one-sentence summary."""
    ...

m = start_session()
result = analyze_review(m, text="Battery life is great but the screen is dim")

print(result.sentiment)  # "positive", "negative", or "neutral" — always
print(result.score)      # an int, 1-5 — always
print(result.summary)    # a str — always

The next era of software requires moving past “agent soup” and opaque prompting. Mellea brings the rigor of traditional software engineering to generative AI — decomposed, verifiable, composable tasks that you can test, debug, and trust.