Glossary

PEFT

A family of fine-tuning methods that update only a small fraction of a base model's parameters, making fine-tuning feasible on consumer hardware and storage-efficient at deployment.

Training · also: Weights · aka: parameter-efficient fine-tuning

A bucket of fine-tuning techniques that leave the base model frozen and train a small set of new parameters. LoRA is the dominant one; others include adapter layers (Houlsby et al., 2019), prefix tuning, prompt tuning, and IA3. All trade some expressiveness for orders-of-magnitude reductions in trainable parameter count and adapter storage.
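To make the size of that reduction concrete, here is a minimal arithmetic sketch for a single weight matrix under LoRA. The layer width (4096), the rank (8), and the 2-bytes-per-parameter storage figure are illustrative assumptions, not values from any particular model.

```python
def full_params(d_in: int, d_out: int) -> int:
    """Parameters updated when the whole d_out x d_in matrix is fine-tuned."""
    return d_in * d_out


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters in a LoRA pair: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


# Assumed, illustrative sizes: one square projection matrix and a small rank.
d = 4096
rank = 8

full = full_params(d, d)
lora = lora_params(d, d, rank)
reduction = full / lora

# Assumed 16-bit storage, i.e. 2 bytes per parameter.
adapter_kib = lora * 2 / 1024

print(f"full fine-tuning: {full:,} parameters")
print(f"LoRA (rank {rank}): {lora:,} parameters")
print(f"reduction: {reduction:.0f}x, adapter size: {adapter_kib:.0f} KiB for this matrix")
```

Under these assumptions the adapter for this one matrix holds 65,536 parameters against 16,777,216 in the full matrix, a 256x reduction. Real adapters cover several matrices per layer, so the total scales with how many modules are targeted.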

The practical benefits compound at deployment. A single base model can serve many fine-tuned personalities or domains by swapping adapters at inference time, and adapters themselves are megabytes rather than gigabytes. vLLM, SGLang, and other open runtimes support per-request adapter selection for multi-tenant LoRA serving.
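The idea of one shared base with per-request adapters can be sketched as a toy in plain Python. This is not how vLLM or SGLang implement it; the class name `ToyAdapterServer`, the adapter names, and the tiny matrices are all hypothetical, chosen only to show that the base weight is stored once while each request picks its own low-rank update.

```python
from typing import Dict, List, Optional, Tuple

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class ToyAdapterServer:
    """Hypothetical toy: one frozen base weight, many named low-rank adapters."""

    def __init__(self, base_weight: Matrix) -> None:
        self.base_weight = base_weight  # shared by every request, never modified
        self.adapters: Dict[str, Tuple[Matrix, Matrix, float]] = {}

    def register(self, name: str, a: Matrix, b: Matrix, scale: float = 1.0) -> None:
        # a is (rank x d_in), b is (d_out x rank); only these are stored per adapter.
        self.adapters[name] = (a, b, scale)

    def forward(self, x: Vector, adapter: Optional[str] = None) -> Vector:
        y = matvec(self.base_weight, x)
        if adapter is None:
            return y  # plain base-model output
        a, b, scale = self.adapters[adapter]
        delta = matvec(b, matvec(a, x))  # low-rank update B(Ax)
        return [yi + scale * di for yi, di in zip(y, delta)]


# Illustrative 2x2 base weight and two rank-1 adapters with made-up names.
server = ToyAdapterServer(base_weight=[[1.0, 0.0], [0.0, 1.0]])
server.register("legal", a=[[1.0, 1.0]], b=[[1.0], [0.0]])
server.register("medical", a=[[1.0, -1.0]], b=[[0.0], [2.0]], scale=0.5)

x = [2.0, 3.0]
print(server.forward(x))                     # base only
print(server.forward(x, adapter="legal"))    # base + "legal" update
print(server.forward(x, adapter="medical"))  # base + "medical" update
```

Each adapter here stores 4 numbers next to the base matrix's 4, which understates the saving; at realistic widths the gap is the difference between megabytes and gigabytes. Production runtimes also batch requests that use different adapters, which this sketch does not attempt.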

The Hugging Face PEFT library is the canonical implementation. It exposes the same PeftModel interface across LoRA, AdaLoRA, prefix tuning, prompt tuning, IA3, and related variants, so training code can switch methods largely by changing the config object.
