Glossary
LoRA
A parameter-efficient fine-tuning method that injects small low-rank adapter matrices into a frozen base model, training a tiny fraction of weights instead of the full model.
A fine-tuning technique that freezes the original model weights and learns a pair of low-rank matrices A and B alongside each target layer, where the effective weight update is B times A. The rank r is small (typically 4 to 64), so the number of trainable parameters is a fraction of a percent of the base model.
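The low-rank update can be sketched for a single linear layer as follows. This is a minimal NumPy illustration, not any library's implementation: the layer size (4096 by 4096), the rank (8), and the initialization (small random A, zero B, so the update starts at zero) are assumptions chosen for the example, and the scaling factor that many implementations apply to the update is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes only; not taken from any particular model.
d_out, d_in, r = 4096, 4096, 8

W = rng.standard_normal((d_out, d_in), dtype=np.float32)        # frozen base weight
A = rng.standard_normal((r, d_in), dtype=np.float32) * 0.01     # trainable, small random init
B = np.zeros((d_out, r), dtype=np.float32)                      # trainable, zero init

def lora_forward(x):
    # Base path plus low-rank path. The full d_out x d_in product B @ A
    # is never materialized; x is projected down to rank r, then back up.
    return x @ W.T + (x @ A.T) @ B.T

x = rng.standard_normal((4, d_in), dtype=np.float32)
y = lora_forward(x)

base_params = W.size
lora_params = A.size + B.size
fraction = lora_params / base_params
print(f"trainable fraction for this layer: {fraction:.4%}")
```

With these sizes the adapter holds about 0.39% as many parameters as the layer it wraps, and because B starts at zero the adapted layer initially reproduces the base layer's output exactly.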
This decouples adapter storage from base storage: many LoRA adapters can share one base model on disk and swap in at inference time, and an adapter can also be merged into the base weights so that serving adds no extra computation. Open-source training stacks like Axolotl and Unsloth ship LoRA as the default fine-tuning path; the Hugging Face PEFT library is the canonical reference implementation.
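A toy sketch of adapters sharing one base weight is shown below. The adapter names `task_a` and `task_b`, the sizes, and the random values are hypothetical. It also demonstrates that folding B times A into the base weight gives the same output as applying the adapter alongside it, which is a general property of the low-rank update rather than anything specific to one library.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 64, 4  # illustrative sizes

W = rng.standard_normal((d, d))  # one shared, frozen base weight

# Two hypothetical adapters, each stored separately as a (B, A) pair.
adapters = {
    "task_a": (rng.standard_normal((d, r)), rng.standard_normal((r, d))),
    "task_b": (rng.standard_normal((d, r)), rng.standard_normal((r, d))),
}

def forward_unmerged(x, name):
    # Select an adapter at call time; the base weight is untouched.
    B, A = adapters[name]
    return x @ W.T + (x @ A.T) @ B.T

def merge(name):
    # Fold the low-rank update into a copy of the base weight.
    B, A = adapters[name]
    return W + B @ A

x = rng.standard_normal((2, d))
y_a = forward_unmerged(x, "task_a")
y_b = forward_unmerged(x, "task_b")
y_a_merged = x @ merge("task_a").T

adapter_params = sum(B.size + A.size for B, A in adapters.values())
print("unmerged and merged outputs match:", np.allclose(y_a, y_a_merged))
print("parameters stored per adapter vs. base:", adapter_params // 2, "vs.", W.size)
```

Keeping adapters unmerged makes swapping cheap, while merging trades that flexibility for a single dense weight at serving time.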
Compared to full fine-tuning: cheaper to train, smaller artifacts to ship,
slightly less expressive on tasks far from the base model’s distribution.
Compared to QLoRA: LoRA keeps the base in original precision; QLoRA adds
a 4-bit quantized base to fit training on a single consumer GPU.
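A back-of-the-envelope sketch of why the 4-bit base matters is shown below. The 7-billion-parameter model size and the choice of 16-bit as the "original precision" are assumptions for illustration. The figures cover base weights only and ignore quantization metadata, activations, adapter weights, and optimizer state.

```python
def weight_memory_gb(n_params, bits_per_weight):
    # Bytes needed to hold the weights alone, expressed in gigabytes (1e9 bytes).
    return n_params * bits_per_weight / 8 / 1e9

n_params = 7e9  # hypothetical 7B-parameter base model

lora_base_gb = weight_memory_gb(n_params, 16)   # base kept in 16-bit precision
qlora_base_gb = weight_memory_gb(n_params, 4)   # base quantized to 4 bits

print(f"16-bit base: {lora_base_gb:.1f} GB")
print(f" 4-bit base: {qlora_base_gb:.1f} GB")
```

Under these assumptions the frozen base shrinks from 14 GB to 3.5 GB, which is the difference between exceeding and fitting within the memory of many consumer GPUs.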