Why people cared
Llama 3.3 70B is what happens when a lab spends six months on post-training the same base checkpoint. Meta did not release a new pretrain in December 2024; they released a re-trained 70B that reached benchmark scores within range of the 405B sibling on MMLU and HumanEval, at a fraction of the deployment cost. The recipe (published only in the model card, not a paper) emphasized synthetic data from the 405B teacher, additional preference data, and refined rejection sampling. The release lands at an interesting moment for the open-weights story: 70B-class checkpoints from Meta, Qwen, DeepSeek, and Mistral are now close enough on most benchmarks that license, vendor support, and inference economics matter more than capability deltas. Llama 3.3 70B became the default 70B-class checkpoint through 2025 for most production workloads where Apache-2.0 was not a hard requirement. It is also the final dense Llama: the Llama 4 family that followed in April 2025 went MoE, following DeepSeek V3 and Qwen 3.
Architecture
data/models.yaml. Every label is auditable
against the model's sources.
Specs
- Architecture
- dense
- Total params
- 70B
- Active params
- 70.6B
- Context window
- 131K tokens
- Attention
- gqa
- Position encoding
- rope-llama3
- Pretraining tokens
- 15.0T
- Training hardware
- H100
- Post-training
- sft, dpo, rejection-sampling
- OSI-approved
- no
- Data released
- no
- Training code
- not released
Benchmarks
Each score carries the date it was published; we never infer or interpolate missing scores.
Code
| HumanEval | 88.4 | as of 2024-12-06 | source ↗ |
Recommended use cases
- mid-tier production deployment
- code assistance
- general chat at 70B cost
Available quantizations
Verified via the Hugging Face model tree ↗. Community quantizations change over time; the families shown are those with published weights at audit time.
Notable innovations
- · Approached 405B-class quality at 70B compute cost
Known limitations
- · Still under Llama Community License, not OSI-approved. source ↗
Lineage
Same base; post-training refresh closed much of the 70B-vs-405B gap.
Derived from
Llama 3.1 70B Instruct 2024-07-23