Glossary
PEFT
A family of fine-tuning methods that update only a small fraction of a base model's parameters, making fine-tuning feasible on consumer hardware and storage-efficient at deployment.
A bucket of fine-tuning techniques that leave the base model frozen and
train a small set of new parameters. LoRA is the dominant one; others
include adapter layers (Houlsby 2019), prefix tuning, prompt tuning, and
IA3. All trade some expressiveness for orders-of-magnitude reductions in
trainable parameter count and adapter storage.
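
To make the scale of that reduction concrete, here is a rough back-of-the-envelope sketch for LoRA on a single weight matrix. The 4096 x 4096 layer size, rank 8, and 2-byte (fp16) storage are illustrative assumptions, not figures from any particular model, and `lora_param_counts` is a hypothetical helper.

```python
def lora_param_counts(d_out: int, d_in: int, rank: int) -> tuple[int, int]:
    """Return (full, lora) trainable parameter counts for one weight matrix.

    Full fine-tuning updates every entry of the d_out x d_in matrix W.
    LoRA freezes W and trains B (d_out x rank) and A (rank x d_in) instead.
    """
    full = d_out * d_in
    lora = rank * (d_out + d_in)
    return full, lora


# Assumed sizes, for illustration only.
full, lora = lora_param_counts(d_out=4096, d_in=4096, rank=8)
reduction = full // lora        # how many times fewer trainable parameters
adapter_kib = lora * 2 // 1024  # adapter size in KiB at 2 bytes per parameter

print(full, lora, reduction, adapter_kib)  # 16777216 65536 256 128
```

Under these assumptions the adapter for this one matrix trains 256 times fewer parameters and occupies about 128 KiB. Summed over the adapted matrices of a multi-layer model, totals in the megabyte range are plausible.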
The practical benefits compound at deployment. A single base model can serve many fine-tuned personalities or domains by swapping adapters at inference time, and adapters themselves are megabytes rather than gigabytes. vLLM, SGLang, and other open runtimes support per-request adapter selection for multi-tenant LoRA serving.
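
The idea behind per-request adapter selection can be sketched with a toy example: one frozen base weight matrix shared by every request, plus a registry of small low-rank adapters looked up by name. This is a conceptual illustration only, not how vLLM or SGLang implement it; the names `BASE_W`, `ADAPTERS`, and `forward`, the adapter labels, and the 2x2 sizes are all hypothetical.

```python
def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


# Frozen base weight, shared by all tenants (identity, to keep the math visible).
BASE_W = [[1.0, 0.0],
          [0.0, 1.0]]

# Each adapter is a rank-1 pair (B, A): B is 2x1, A is 1x2.
ADAPTERS = {
    "legal": ([[1.0], [0.0]], [[0.0, 1.0]]),
    "medical": ([[0.0], [1.0]], [[1.0, 0.0]]),
}


def forward(x, adapter_name=None):
    """Compute W x, adding the low-rank update B (A x) if an adapter is chosen."""
    y = matvec(BASE_W, x)
    if adapter_name is not None:
        B, A = ADAPTERS[adapter_name]
        delta = matvec(B, matvec(A, x))
        y = [yi + di for yi, di in zip(y, delta)]
    return y


x = [1.0, 2.0]
print(forward(x))             # [1.0, 2.0]  base model only
print(forward(x, "legal"))    # [3.0, 2.0]  same base, "legal" adapter
print(forward(x, "medical"))  # [1.0, 3.0]  same base, "medical" adapter
```

The base weights are never copied or modified. Only the small `(B, A)` pair changes between requests, which is what makes it cheap to host many adapters against a single base model.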
The Hugging Face PEFT library is the canonical implementation. It maps
the same PeftModel interface over LoRA, AdaLoRA, prefix tuning, prompt
tuning, IA3, and the related variants, so training code can change
methods with a one-line config edit.
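
As a minimal sketch of what switching methods can look like, the helper below picks a PEFT config by name and wraps a base model with `get_peft_model`. It assumes the `peft` and `transformers` packages are installed. The helper names `build_peft_config` and `wrap_model` and the hyperparameter values (`r=8`, `lora_alpha=16`, `num_virtual_tokens=20`) are illustrative choices, not recommendations.

```python
SUPPORTED_METHODS = ("lora", "prefix")


def build_peft_config(method: str):
    """Return a PEFT config object for the given method name."""
    if method not in SUPPORTED_METHODS:
        raise ValueError(f"unknown PEFT method: {method!r}")

    # Imported lazily so this module loads even where peft is not installed.
    from peft import LoraConfig, PrefixTuningConfig, TaskType

    if method == "lora":
        # Assumed hyperparameters, for illustration only.
        return LoraConfig(task_type=TaskType.CAUSAL_LM, r=8, lora_alpha=16)
    return PrefixTuningConfig(task_type=TaskType.CAUSAL_LM, num_virtual_tokens=20)


def wrap_model(base_model_name: str, method: str):
    """Load a causal LM and wrap it so only the PEFT parameters are trainable."""
    from peft import get_peft_model
    from transformers import AutoModelForCausalLM

    base = AutoModelForCausalLM.from_pretrained(base_model_name)
    model = get_peft_model(base, build_peft_config(method))
    model.print_trainable_parameters()  # reports trainable vs. total parameters
    return model
```

With a small model ID such as `facebook/opt-125m` (chosen here only as an example), `wrap_model("facebook/opt-125m", "lora")` and `wrap_model("facebook/opt-125m", "prefix")` differ in a single argument, while the surrounding training loop can stay the same.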