mellea.backends.huggingface
class mellea.backends.huggingface.HFAloraCacheInfo()
class mellea.backends.huggingface.LocalHFBackend(model_id: str | ModelIdentifier, formatter: Formatter | None = None, use_caches: bool = True, cache: Cache | None = None, custom_config: TransformersTorchConfig | None = None, default_to_constraint_checking_alora: bool = True, model_options: dict | None = None)
This backend uses a Formatter to convert Components into prompts. It also supports [Activated LoRAs (ALoras)](https://arxiv.org/pdf/2504.12397).
This backend is designed for running an HF model for small-scale inference locally on your machine.
This backend is NOT designed for inference scaling on CUDA-enabled hardware.
The model, tokenizer, and device are auto-loaded from model_id, or taken from custom_config if provided.
WARNING: initializing a LocalHFBackend will download and load the model on your local machine.
model_id
: str | ModelIdentifier
: Used to load the model and tokenizer via transformers Auto* classes, and then moves the model to the best available device (cuda > mps > cpu). If loading the model and/or tokenizer from a string will not work, or if you want to use a different device string, then you can use custom_config.
formatter
: Formatter
: A mechanism for turning stdlib stuff into strings. Experimental Span-based models should use mellea.backends.span.* backends.
use_caches
: bool
: If set to False, then caching will not be used even if a Cache is provided.
cache
: Optional[Cache]
: The caching strategy to use. If None, LRUCache(3) will be used.
custom_config
: Optional[TransformersTorchConfig]
: Overrides loading from the model_id. If set, then the specified tokenizer/model/device will be used instead of auto-loading from the model_id.
default_to_constraint_checking_alora
: bool
: If set to False, then ALoras will be deactivated. This is primarily for performance benchmarking and debugging.
model_options
: Optional[dict]
: Default model options.
mellea.backends.huggingface.LocalHFBackend.alora_model()
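A minimal sketch of constructing the backend. The model id and the generation option name below are illustrative assumptions, not values documented on this page.

```python
from mellea.backends.huggingface import LocalHFBackend

# Downloads and loads the model locally (see the WARNING above).
backend = LocalHFBackend(
    model_id="ibm-granite/granite-3.3-8b-instruct",  # illustrative Huggingface model id
    use_caches=True,                                  # default; set False to disable caching
    model_options={"max_new_tokens": 256},            # assumed option name, passed as default model options
)
```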
mellea.backends.huggingface.LocalHFBackend.alora_model(model: aLoRAPeftModelForCausalLM | None)
mellea.backends.huggingface.LocalHFBackend.generate_from_context(action: Component | CBlock, ctx: Context, format: type[BaseModelSubclass] | None = None, model_options: dict | None = None, generate_logs: list[GenerateLog] | None = None, tool_calls: bool = False)
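A hedged sketch of calling the public generation entry point. The import paths for CBlock and a simple Context implementation are assumptions about mellea's stdlib layout, not confirmed by this page, and the exact type of the returned value is likewise not documented here.

```python
from mellea.stdlib.base import CBlock, SimpleContext  # assumed module path

ctx = SimpleContext()  # assumed Context implementation
output = backend.generate_from_context(
    action=CBlock("Summarize the following release notes in one sentence."),
    ctx=ctx,
    tool_calls=False,
)
# `output` holds the model's generation; its exact shape is not documented on
# this page, so inspect it (e.g. repr(output)) rather than assuming an attribute.
```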
mellea.backends.huggingface.LocalHFBackend._generate_from_context_alora(action: Component | CBlock, ctx: Context, format: type[BaseModelSubclass] | None = None, model_options: dict[str, Any], generate_logs: list[GenerateLog] | None = None)
mellea.backends.huggingface.LocalHFBackend._generate_from_context_standard(action: Component | CBlock, ctx: Context, format: type[BaseModelSubclass] | None = None, model_options: dict[str, Any], generate_logs: list[GenerateLog] | None = None, tool_calls: bool = False)
mellea.backends.huggingface.LocalHFBackend._generate_from_raw(actions: list[Component | CBlock], format: type[BaseModelSubclass] | None = None, model_options: dict | None = None, generate_logs: list[GenerateLog] | None = None)
mellea.backends.huggingface.LocalHFBackend.cache_get(id: str)
mellea.backends.huggingface.LocalHFBackend.cache_put(id: str, v: HFAloraCacheInfo)
mellea.backends.huggingface.LocalHFBackend._simplify_and_merge(model_options: dict[str, Any] | None)
model_options
: the model_options for this call
mellea.backends.huggingface.LocalHFBackend._make_backend_specific_and_remove(model_options: dict[str, Any])
model_options
: the model_options for this call
mellea.backends.huggingface.LocalHFBackend._extract_model_tool_requests(tools: dict[str, Callable], decoded_result: str)
mellea.backends.huggingface.LocalHFBackend.add_alora(alora: HFAlora)
alora
: HFAlora
: the ALora adapter to add to this backend
mellea.backends.huggingface.LocalHFBackend.get_alora(alora_name: str)
mellea.backends.huggingface.LocalHFBackend.get_aloras()
class mellea.backends.huggingface.HFAlora(name: str, path_or_model_id: str, generation_prompt: str, backend: LocalHFBackend)
name
: str
: An arbitrary name/label to assign to an ALora. This is independent of the ALora’s (huggingface) model id.
path_or_model_id
: str
: A local path to the ALora’s weights or a Huggingface model_id for an ALora.
generation_prompt
: str
: A prompt used to “activate” the ALora. This string goes between the pre-activation context and the ALora generate call. It needs to be provided by the entity that trained the ALora.
backend
: LocalHFBackend
: Maintained as a pointer to the backend to which this ALora is attached.
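A hedged sketch of registering and retrieving an ALora on the backend. The adapter id, adapter name, and generation prompt below are illustrative placeholders; the real values must come from whoever trained the ALora.

```python
from mellea.backends.huggingface import HFAlora, LocalHFBackend

backend = LocalHFBackend(model_id="ibm-granite/granite-3.3-8b-instruct")  # illustrative model id

alora = HFAlora(
    name="constraint_checker",                                 # arbitrary label, used for lookup
    path_or_model_id="my-org/my-alora-adapter",                # placeholder adapter id
    generation_prompt="<|start_of_role|>check<|end_of_role|>", # placeholder; supplied by the ALora's trainer
    backend=backend,
)
backend.add_alora(alora)

# Lookup is by the assigned name; get_aloras() is assumed to return the registered adapters.
checker = backend.get_alora("constraint_checker")
print([a.name for a in backend.get_aloras()])
```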