Glossary

fine-tuning

Continued training of a pretrained base model on a smaller, task-specific dataset to specialize its behavior without retraining from scratch.

Training (also: Weights). Aka fine-tune, finetune, finetuning.

Continuing gradient descent on a pretrained base model with a smaller, curated dataset to adapt the model to a domain, style, or task. The parameter updates are smaller and the run is shorter than pretraining, typically hours to days on a handful of GPUs instead of weeks on thousands.
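As a toy illustration of "continuing gradient descent", the sketch below fits a one-parameter model y = w * x on a larger dataset, then resumes from the learned weight on a small dataset with a lower learning rate and far fewer steps. The model, the data, the learning rates, and the `sgd` helper are all hypothetical and chosen for readability; real fine-tuning applies the same idea to billions of parameters.

```python
def sgd(w, data, lr, steps):
    """Run `steps` full-batch gradient-descent updates on mean squared error for y = w * x."""
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


# "Pretraining": many steps on a larger, general dataset (true slope 2.0).
pretrain_data = [(x / 10, 2.0 * x / 10) for x in range(1, 101)]
w_base = sgd(0.0, pretrain_data, lr=0.01, steps=500)

# "Fine-tuning": start from w_base, use a small task-specific dataset
# (true slope 2.5), a smaller learning rate, and far fewer steps.
finetune_data = [(1.0, 2.5), (2.0, 5.0), (3.0, 7.5)]
w_tuned = sgd(w_base, finetune_data, lr=0.005, steps=20)

print(f"base weight:  {w_base:.4f}")
print(f"tuned weight: {w_tuned:.4f}")
```

The tuned weight moves from the pretrained value toward the new task's slope without restarting from zero, which is the point of the technique: the short run only has to cover the gap between the base model and the target behavior.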

Three common shapes. Full fine-tuning updates every parameter. Parameter-efficient fine-tuning (PEFT) methods like LoRA and QLoRA update a small set of adapters with the base frozen. Instruction tuning is fine-tuning on instruction-response pairs to make a base model follow prompts; preference tuning via DPO is fine-tuning on ranked pairs to shape style and refusal behavior.
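To make the gap between full fine-tuning and a LoRA-style adapter concrete, the sketch below counts trainable parameters for a single weight matrix. It assumes the standard LoRA formulation, in which a frozen `d_out x d_in` matrix is augmented by two trainable low-rank factors of shapes `d_out x r` and `r x d_in`. The dimensions (4096) and the rank (8) are illustrative assumptions, not figures from any particular model.

```python
def full_finetune_params(d_in: int, d_out: int) -> int:
    """Trainable parameters when the whole d_out x d_in weight matrix is updated."""
    return d_in * d_out


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters for a rank-`rank` adapter: B (d_out x r) plus A (r x d_in)."""
    return rank * (d_in + d_out)


# Illustrative sizes only (assumed, not taken from a specific model).
d_in = d_out = 4096
rank = 8

full = full_finetune_params(d_in, d_out)
lora = lora_params(d_in, d_out, rank)
fraction = lora / full

print(f"full fine-tuning: {full:,} trainable parameters")
print(f"LoRA (r={rank}):     {lora:,} trainable parameters")
print(f"fraction trained: {fraction:.4%}")
```

Under these assumed sizes the adapter trains well under 1% of the matrix's parameters, which is why PEFT methods need far less memory for gradients and optimizer state and why the resulting adapters are small to store.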

Open-source training stacks (Axolotl, Unsloth, TRL, Llama-Factory) all target fine-tuning as the primary use case. The trade-off versus a hosted API: more control over what the model knows and refuses, at the cost of GPU time and the data work to assemble a good fine-tuning set.

Sources

Hugging Face: Fine-tuning a pretrained model
