Mellea is a library for writing generative programs.
A generative program is any computer program that contains calls to an LLM. As we will see throughout the documentation, LLMs can be incorporated into software in a wide variety of ways. Some ways of incorporating LLMs into programs tend to result in robust and performant systems, while others result in software that is brittle and error-prone.

Generative programs are distinguished from classical programs by their use of functions that invoke generative models. These generative calls can produce many different data types: strings, booleans, structured data, code, images, video, and so on. The models and software underlying generative calls can, in certain situations and in certain ways, be combined and composed (LoRA adapters are a special case). In addition to making generative calls, generative programs can invoke ordinary, non-generative functions, so that we can, for example, pass the output of a generative function into a database retrieval system and feed the result into another generator. Writing generative programs is difficult because they interleave deterministic and stochastic operations.

Requirement verification plays an important role in circumscribing periods of nondeterminism in a generative program. We can implement validators that produce boolean (or richer) outputs, and loop until a validator passes, or until the iteration count grows too high and we trigger some exception-handling process. In this way we can determine the degree of certainty in the output of a generative function and then act based upon that certainty.
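The validate-and-retry loop described above can be sketched in plain Python. This is a minimal, library-agnostic illustration: `generate_summary` and `validate` are hypothetical stand-ins (a deterministic function in place of a real LLM call), not Mellea APIs.

```python
def generate_summary(text: str) -> str:
    """Hypothetical stand-in for an LLM call (deterministic here for illustration)."""
    return text.strip().upper()


def validate(candidate: str) -> bool:
    """A programmatic requirement check: output must be non-empty and short."""
    return 0 < len(candidate) <= 40


def generate_with_requirements(text: str, max_attempts: int = 3) -> str:
    """Repeat the generative call until the validator passes, or give up."""
    for _ in range(max_attempts):
        candidate = generate_summary(text)
        if validate(candidate):
            return candidate
    # The iteration count got too high: trigger exception handling.
    raise RuntimeError("no candidate satisfied the requirements")
```

Because the loop either returns a validated value or raises, code downstream of `generate_with_requirements` can treat its result with higher confidence than a bare generative call.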
Verification can happen in a variety of ways, from querying a generative function to precise programmatic checks, and many combinations besides. In programs that contain long computation paths (including most that contain iteration or recursion) uncertainty accrues multiplicatively, and therefore must itself be circumscribed by incremental requirement verification throughout the generative program's execution. These incremental checks can be used to establish patterns of variation, or properties which are invariant, both of which can help ensure that the execution converges to a desired state and does not "go wrong". Constructing these incremental checks is one of the important tasks in generative programming, and can itself be treated as a task amenable to generative programming. Like other requirement checks, these variants and invariants may be explicit and programmatic, or may be checked by a generative function.

In any case, each generative program produces a trace of computations, some successful and others failed. Deciding what to do about failure paths is yet another crux faced by authors of generative programs. Successful traces can be collected, leading to a final high-confidence result; alternatively, traces with some failures or low-confidence answers can accumulate, and the generative program then tries to repair these failed validations. The repair process can be manual, automated, or a combination of user interaction and automated repair mechanisms.

As a generative program executes in this way, context accrues, and the accrual of ever-larger contexts becomes a challenge unto itself. Memory management therefore plays an important role in context engineering.
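To see why long chains need incremental checks, suppose each generative step independently meets its requirement with probability p; an unchecked chain of n steps is then correct end-to-end only with probability p^n. The sketch below makes both halves of the argument concrete; the numbers and the `run_with_invariant` helper are illustrative, not part of any library.

```python
def chain_confidence(per_step: float, steps: int) -> float:
    """Probability that an unchecked chain of independent generative steps
    succeeds end-to-end: confidence multiplies, so it decays geometrically."""
    return per_step ** steps


def run_with_invariant(state, steps, invariant):
    """Run a chain of steps, checking an invariant after each one so that
    errors are caught where they occur instead of compounding silently."""
    for step in steps:
        state = step(state)
        if not invariant(state):
            raise ValueError("invariant violated; trigger repair")
    return state
```

With 95%-reliable steps, a 20-step unchecked chain succeeds only about 36% of the time, which is why verification must be interleaved throughout execution rather than deferred to the end.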
Mellea therefore provides a mechanism for mapping components of the KV cache onto developer- and user-facing abstractions, and for automating the construction of context and the handling of cached keys and values.

As we built Mellea, we found some useful principles that you will see recur throughout this tutorial:
Generative programs should circumscribe LLM calls with requirement verifiers. We will see variations on this principle throughout the tutorial.
Generative programs should use simple and composable prompting styles. Mellea takes a middle-ground between the “framework chooses the prompt” and “client code chooses the prompt” paradigms. By keeping prompts small and self-contained, then chaining together many such prompts, we can usually get away with one of a few prompt styles. When a new prompt style is needed, that prompt should be co-designed with the software that will use the prompt. In Mellea, we encourage this by decomposing generative programs into Components; more on this in Architecture.
Generative models and inference-time programs should be co-designed. Ideally, the style and domain of prompting used at inference time should match the style and domain of prompting used in pretraining, mid-training, and/or post-training. And, similarly, models should be built with runtime components and usage patterns in mind. We will see some early examples of this in Adapters.
Generative programs should carefully manage context. Each Component manages context of a single call, as we see in Quickstart, Architecture, Generative Slots, and MObject. Additionally, Mellea provides some useful mechanisms for re-using context across multiple calls (Context Management).
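The context-management principle can be illustrated with a toy, library-agnostic sketch. `BoundedContext` is hypothetical and far simpler than what Mellea actually provides; it only shows the basic idea of keeping accrued context bounded as a program runs.

```python
class BoundedContext:
    """Keep a fixed system prompt plus only the most recent turns, so that
    accrued context stays bounded as the program runs (illustrative only)."""

    def __init__(self, system: str, max_turns: int = 4):
        self.system = system
        self.max_turns = max_turns
        self.turns: list[tuple[str, str]] = []

    def add(self, role: str, content: str) -> None:
        self.turns.append((role, content))
        # Evict the oldest turns once the budget is exceeded.
        self.turns = self.turns[-self.max_turns:]

    def render(self) -> str:
        """Render the context that would be passed to the next generative call."""
        lines = [f"system: {self.system}"]
        lines += [f"{role}: {content}" for role, content in self.turns]
        return "\n".join(lines)
```

Real context management also has to decide *what* to evict or summarize, not just how much; the fixed-window policy here is only the simplest possible choice.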
Although good generative programs can be written in any language and framework, getting it right is not trivial. Mellea is just one point in the design space of LLM libraries, but we think it is a good one. Our hope is that Mellea will help you write generative programs that are robust, performant, and fit-for-purpose.