Glossary
BF16
A 16-bit floating-point format with FP32's exponent range and only 7 mantissa bits. Designed for neural-network training; standard across 2026 accelerators alongside FP16.
A 16-bit floating-point format with 1 sign bit, 8 exponent bits, and 7 mantissa bits, designed by Google Brain (the name is short for brain floating point). The crucial design choice is matching FP32’s exponent range while sacrificing mantissa precision. Numbers that would overflow or underflow in FP16 stay in range under BF16, which removes the loss-scaling gymnastics that FP16 training requires.
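Because BF16 keeps FP32's sign and exponent fields and only shortens the mantissa, a BF16 value can be pictured as the upper 16 bits of an FP32 value. The Python sketch below illustrates that layout under one simplifying assumption: it truncates the low bits, whereas real hardware and libraries usually round to nearest. The helper names are hypothetical and exist only for this illustration.

```python
import struct


def float_to_bf16_bits(x: float) -> int:
    """Encode x as FP32, then keep only the upper 16 bits (truncation)."""
    (fp32_bits,) = struct.unpack("<I", struct.pack("<f", x))
    return fp32_bits >> 16  # sign, 8 exponent bits, top 7 mantissa bits


def bf16_bits_to_float(bits: int) -> float:
    """Widen a BF16 pattern back to FP32 by zero-filling the low 16 bits."""
    (value,) = struct.unpack("<f", struct.pack("<I", (bits & 0xFFFF) << 16))
    return value


def split_fields(bits: int) -> tuple[int, int, int]:
    """Return (sign, biased exponent, mantissa) from a 16-bit BF16 pattern."""
    return bits >> 15, (bits >> 7) & 0xFF, bits & 0x7F


def fits_in_fp16(x: float) -> bool:
    """True if x can be packed as IEEE half precision without overflowing."""
    try:
        struct.pack("<e", x)  # "e" is the struct code for binary16
        return True
    except OverflowError:
        return False


if __name__ == "__main__":
    bits = float_to_bf16_bits(100000.0)
    print(hex(bits), split_fields(bits))  # 0x47c3 (0, 143, 67)
    print(bf16_bits_to_float(bits))       # 99840.0 (coarse, but in range)
    print(fits_in_fp16(100000.0))         # False (FP16 tops out at 65504)
```

The value 100000 survives the round trip through BF16 with a visible precision loss, while the same value cannot be represented in FP16 at all. That is the range-for-precision trade the entry describes.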
NVIDIA added native BF16 support in the Ampere generation (A100, 2020), and AMD, Intel, Apple, and most newer accelerators followed. The format is the default for transformer training on every major 2026 platform. Llama, Mistral, Qwen, Gemma, OLMo, and most other open-weights releases ship BF16 base weights with FP16 or FP8 quantizations layered on for inference.
BF16 and FP16 share the same 16-bit width but split it differently between precision and range. FP16 has more mantissa precision (10 bits vs 7), so individual arithmetic operations are slightly more accurate; BF16 has the full FP32 exponent range, so training is much more numerically stable. For inference of pretrained weights the two formats are roughly interchangeable on quality; the choice is typically a hardware-support question.
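The trade-off can be put in numbers by deriving each format's limits from its bit widths. The sketch below assumes the usual IEEE-style conventions (exponent bias of 2^(e-1) - 1, all-ones exponent reserved for infinities and NaN) and takes FP16 to have 5 exponent bits, which follows from its 16-bit width, 1 sign bit, and 10 mantissa bits. The `FloatFormat` class is a hypothetical helper written for this example, not part of any library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FloatFormat:
    name: str
    exponent_bits: int
    mantissa_bits: int

    @property
    def bias(self) -> int:
        # IEEE-style bias; also equals the largest usable unbiased exponent.
        return 2 ** (self.exponent_bits - 1) - 1

    @property
    def max_finite(self) -> float:
        # Largest mantissa (just under 2) times the largest power of two.
        return (2 - 2.0 ** -self.mantissa_bits) * 2.0 ** self.bias

    @property
    def min_normal(self) -> float:
        # Smallest positive normal number.
        return 2.0 ** (1 - self.bias)

    @property
    def epsilon(self) -> float:
        # Gap between 1.0 and the next representable value.
        return 2.0 ** -self.mantissa_bits


FP16 = FloatFormat("FP16", exponent_bits=5, mantissa_bits=10)
BF16 = FloatFormat("BF16", exponent_bits=8, mantissa_bits=7)

if __name__ == "__main__":
    for fmt in (FP16, BF16):
        print(f"{fmt.name}: max={fmt.max_finite:.4g} "
              f"min_normal={fmt.min_normal:.4g} eps={fmt.epsilon:.4g}")
```

Under these assumptions FP16 tops out at 65504 with a step of 2^-10 near 1.0, while BF16 reaches roughly 3.4e38 with a coarser step of 2^-7. This is why gradients that overflow or underflow in FP16 remain representable in BF16.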