Open Source · Apache 2.0

Mellea
build predictable AI without guesswork

Inside every AI-powered pipeline, workflow, or script, the unreliable part is the same: the LLM call itself. Silent failures, untestable outputs. Mellea lets you test and reason about every LLM call using type-annotated outputs, verifiable requirements, and automatic retries.

uv pip install mellea
Get Started → · Read the blog
Unit testable
100% Open source
Typed, constrained output
Any LLM provider

How it works

Structured, testable Python for every LLM call — no more flaky agents or brittle prompts. Mellea lets you instruct LLMs, validate outputs against your requirements, and recover from failures automatically. Works across OpenAI, Ollama, vLLM, HuggingFace, Watsonx, LiteLLM, and Bedrock.

Without Mellea — complex unstructured system prompt
With Mellea — structured, type-annotated Python

Python not Prose

The @generative decorator turns typed function signatures into LLM specifications. Docstrings are prompts, type hints are schemas — no templates, no parsers.

Learn more →

Constrained Decoding

Grammar-constrained generation for Ollama, vLLM, and HuggingFace. Unlike Instructor and PydanticAI, valid output is enforced at the token level — not retried into existence.

Learn more →
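To see why token-level enforcement differs from retrying, here is a toy sketch of the idea behind grammar-constrained decoding — not Mellea's internals, just an illustration with made-up logits: at each step the grammar dictates which tokens are legal, and everything else is masked out before a token is picked, so invalid output can never be generated in the first place.

```python
def constrained_decode(step_logits, legal_tokens_per_step):
    """Pick the highest-scoring *legal* token at every step.

    Tokens the grammar forbids are masked (score -inf), so they can
    never be emitted -- no retries needed.
    """
    output = []
    for logits, legal in zip(step_logits, legal_tokens_per_step):
        best = max(legal, key=lambda tok: logits.get(tok, float("-inf")))
        output.append(best)
    return output

# Suppose the grammar requires: '{', a key, ':', a value, '}'.
logits = [
    {"{": 0.2, "hello": 0.9},          # model prefers "hello"; grammar forbids it
    {'"sentiment"': 0.8, "{": 0.1},
    {":": 0.7},
    {'"positive"': 0.6, "oops": 0.9},  # "oops" is masked out entirely
    {"}": 0.5},
]
legal = [{"{"}, {'"sentiment"'}, {":"}, {'"positive"', '"negative"'}, {"}"}]

print("".join(constrained_decode(logits, legal)))
# {"sentiment":"positive"}
```

A retry-based approach would have sampled "hello" or "oops", failed JSON parsing, and called the model again; masking makes the invalid continuation impossible at sampling time.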

Requirements Driven

Declare rules — tone, length, content, custom logic — and Mellea validates every output against them. Automatic retries mean bad output never reaches your users.

Learn more →
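The validate-and-retry pattern can be sketched in a few lines of plain Python. This is a conceptual illustration of the loop described above, not Mellea's actual implementation: each requirement is a named predicate over the output, and generation repeats until every requirement passes or a retry budget runs out.

```python
def generate_validated(generate, requirements, loop_budget=3):
    """Retry generation until all requirements pass or the budget is spent."""
    for attempt in range(loop_budget):
        output = generate(attempt)
        failures = [name for name, check in requirements if not check(output)]
        if not failures:
            return output  # every requirement passed
    raise RuntimeError(f"still failing after {loop_budget} tries: {failures}")

# A fake "model" that improves on retry, plus two requirements.
drafts = ["WAY TOO LONG AND SHOUTY SUMMARY OF THE REVIEW", "Great battery, dim screen."]
requirements = [
    ("short", lambda s: len(s) <= 40),
    ("not shouting", lambda s: not s.isupper()),
]
print(generate_validated(lambda i: drafts[min(i, 1)], requirements))
# Great battery, dim screen.
```

The first draft fails both checks, so the loop retries; the second draft passes and is the only output that ever leaves the function.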

Predictable and Resilient

Need higher confidence? Switch from single-shot to majority voting or best-of-n with one parameter. No code rewrites, no new infrastructure.

Learn more →
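Majority voting is simple enough to sketch directly. The snippet below is a generic illustration of the technique (often called self-consistency), not Mellea's API: run the same call several times and return the most common answer along with its vote share.

```python
from collections import Counter

def majority_vote(sample, n=5):
    """Run `sample` n times; return the most common answer and its vote share."""
    votes = Counter(sample(i) for i in range(n))
    answer, count = votes.most_common(1)[0]
    return answer, count / n

# Simulated samples: one flaky call out of five disagrees.
samples = ["positive", "positive", "negative", "positive", "positive"]
answer, confidence = majority_vote(lambda i: samples[i], n=5)
print(answer, confidence)  # positive 0.8
```

A single flaky sample is outvoted; raising n trades latency for confidence without touching the rest of the pipeline.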

MCP Compatible

Expose any Mellea program as an MCP tool. The calling agent gets validated output — requirements checked, retries run — not raw LLM responses.

Learn more →

Safety & Guardrails

Built-in Granite Guardian integration detects harmful outputs, hallucinations, and jailbreak attempts before they reach your users — no external service required.

Learn more →

See it in action

python
from typing import Literal
from pydantic import BaseModel
from mellea import generative, start_session

class ReviewAnalysis(BaseModel):
    sentiment: Literal["positive", "negative", "neutral"]
    score: int    # 1-5
    summary: str  # one sentence

@generative
def analyze_review(text: str) -> ReviewAnalysis:
    """Extract sentiment, a 1-5 score, and a one-sentence summary."""
    ...

m = start_session()
result = analyze_review(m, text="Battery life is great but the screen is dim")

print(result.sentiment)  # "positive", "negative", or "neutral" — always
print(result.score)      # an int, 1-5 — always
print(result.summary)    # a str — always

The next era of software requires moving past “agent soup” and opaque prompting. Mellea brings the rigor of traditional software engineering to generative AI — decomposed, verifiable, composable tasks that you can test, debug, and trust.