
Mellea
build predictable AI without guesswork

Inside every AI-powered pipeline, workflow, or script, the unreliable part is the same: the LLM call itself. Silent failures, untestable outputs, no guarantees. Mellea wraps those calls in Python you can read, test, and reason about: type-annotated outputs, verifiable requirements, automatic retries.

GitHub

How it works

Replace flaky agents and brittle prompts with structured, testable Python. Mellea lets you instruct LLMs, validate outputs against your requirements, and recover from failures automatically. Works across OpenAI, Ollama, vLLM, HuggingFace, Watsonx, LiteLLM, and Bedrock.

Get started
  • Python, not Prose The @generative decorator turns typed function signatures into LLM specifications. Docstrings are prompts, type hints are schemas — no templates, no parsers.
  • Requirements Driven Attach verifiable requirements to every LLM call. Mellea checks outputs before your users see them.
  • Predictable and Resilient Pluggable sampling strategies — rejection sampling, majority voting, inference-time scaling. One parameter change. No rewrites.
  • MCP Compatible Expose any Mellea program as an MCP tool. The calling agent gets validated output — requirements checked, retries run — not raw LLM responses.
  • Constrained Decoding Grammar-constrained generation for Ollama, vLLM, and HuggingFace. Unlike Instructor and PydanticAI, valid output is enforced at the token level — not retried into existence.
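The "Predictable and Resilient" idea above — generate, check requirements, retry within a budget — is what Mellea's rejection sampling strategy automates. As a rough illustration of the concept (a plain-Python sketch with a toy generator, not Mellea's implementation):

```python
from typing import Callable

def rejection_sample(
    generate: Callable[[], str],
    requirements: list[Callable[[str], bool]],
    loop_budget: int = 3,
) -> tuple[bool, str]:
    """Draw up to loop_budget candidates; accept the first that meets every requirement."""
    last = ""
    for _ in range(loop_budget):
        last = generate()
        if all(req(last) for req in requirements):
            return True, last
    return False, last  # budget exhausted; caller decides how to degrade

# Toy "model": a short reply first, then a better one on retry.
outputs = iter(["hi", "Dear customer, thank you for visiting us."])
ok, text = rejection_sample(
    generate=lambda: next(outputs),
    requirements=[lambda s: len(s) > 10, lambda s: s.endswith(".")],
)
```

The first candidate fails the length requirement, so the loop retries and accepts the second. Swapping strategies in Mellea (rejection sampling, majority voting) changes only this outer loop, not your instruction code.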

See it in action

Write a typed Python function, get structured LLM output. Docstrings are prompts, type hints are schemas — no parsers, no chains.

Learn more
import mellea
from typing import Literal

m = mellea.start_session()

@mellea.generative
def classify_sentiment(text: str) -> Literal["positive", "negative"]:
    """Classify the sentiment of the input text as 'positive' or 'negative'."""

sentiment = classify_sentiment(m, text=customer_review)

if sentiment == "positive":
    msg = m.instruct("Thank the customer for their post")
else:
    msg = m.instruct(
       description="Apologize for the customer's negative experience and offer a 5% discount for their next visit",
       grounding_context={"review": customer_review}
    )

post_response(msg)

Add requirements to any LLM call. Mellea validates outputs and retries automatically — swap between rejection sampling, majority voting, and more with one parameter.

Learn more
import mellea
from mellea.stdlib.sampling import RejectionSamplingStrategy


def write_email_with_strategy(m: mellea.MelleaSession, name: str, notes: str) -> str:
    email_candidate = m.instruct(
        f"Write an email to {name} using the following notes: {notes}.",
        requirements=[
            "The email should have a salutation.",
            "Use a formal tone.",
        ],
        strategy=RejectionSamplingStrategy(loop_budget=3),
        return_sampling_results=True,
    )

    if email_candidate.success:
        return str(email_candidate.result)

    # If sampling fails, use the first generation
    print("Expect sub-par result.")
    return email_candidate.sample_generations[0].value

Add LLM query capabilities to any existing Python class with a single decorator. No rewrites, no wrappers.

Learn more
import mellea
from mellea.stdlib.mify import mify, MifiedProtocol
import pandas
from io import StringIO


@mify(fields_include={"table"}, template="{{ table }}")
class MyCompanyDatabase:
  table: str = """| Store      | Sales   |
                    | ---------- | ------- |
                    | Northeast  | $250    |
                    | Southeast  | $80     |
                    | Midwest    | $420    |"""

  def transpose(self):
    # Parse the markdown table and return it transposed.
    return pandas.read_csv(
      StringIO(self.table),
      sep='|',
      skipinitialspace=True,
      header=0,
      index_col=False
    ).transpose()


m = mellea.start_session()
db = MyCompanyDatabase()
assert isinstance(db, MifiedProtocol)
answer = m.query(db, "What were sales for the Northeast branch this month?")
print(str(answer))
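The template="{{ table }}" argument above controls how the object is serialized into the prompt: included fields are substituted into the template. A rough, dependency-free sketch of that substitution step (a hypothetical stand-in renderer, not Mellea's actual Jinja-based templating):

```python
import re

def render(template: str, obj: object, fields: set[str]) -> str:
    """Replace {{ field }} placeholders with the object's attribute values."""
    def sub(match: re.Match) -> str:
        name = match.group(1)
        if name not in fields:
            raise KeyError(f"field {name!r} is not included")
        return str(getattr(obj, name))
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", sub, template)

class Db:
    table = "| Store | Sales |\n| Northeast | $250 |"

# The rendered string is what the LLM sees as the object's representation.
prompt = render("{{ table }}", Db(), fields={"table"})
```

Restricting fields_include keeps sensitive or irrelevant attributes out of the prompt by construction.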