The Open-Source AI Stack

Llama 3.3 70B Instruct vs OpenAI o1

Rows highlighted in warm gray are where the models differ. Numbers carry their as-of date and primary source.

Specs

Field | A: Llama 3.3 70B Instruct | B: OpenAI o1
Released 2024-12-06
Developer | Meta | OpenAI
Openness | Source-available | Proprietary
License | Llama 3.3 Community License | Proprietary
OSI-approved | no | no
Data released | no | no
Training code | no | no
Architecture | dense | unknown
Total params 70B
Active params
Experts
Context window 131K
Attention | gqa | unknown
Position enc. | rope-llama3 | unknown
Pretraining tokens 15.0T
Post-training | sft, dpo, rejection-sampling | rlhf
Training hardware H100
$/M input $15.00
$/M output $60.00
Output tok/sec 75.8
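
As a rough illustration of how the per-million-token prices in the table translate into a per-request cost, here is a minimal sketch. It assumes billing is strictly linear in token count at the listed rates. The function name `request_cost` and the example token counts are hypothetical, not taken from this page.

```python
# Prices copied from the Specs table (USD per million tokens).
PRICE_PER_M_INPUT = 15.00
PRICE_PER_M_OUTPUT = 60.00


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request, assuming linear per-token billing."""
    total = input_tokens * PRICE_PER_M_INPUT + output_tokens * PRICE_PER_M_OUTPUT
    return total / 1_000_000


# Hypothetical workload: 2,000 prompt tokens and 1,000 generated tokens.
cost = request_cost(2_000, 1_000)
print(f"${cost:.4f}")  # prints "$0.0900"
```

At these rates an output token costs four times as much as an input token, so long generations tend to dominate the bill.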

Benchmarks

Missing scores render as not reported; never inferred. Bold highlights the leader per benchmark.
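
The rendering rule above could be sketched roughly as follows. This is an illustration only, not the site's actual code: the names `render_score` and `leader` are hypothetical, and it assumes that a leader is named only when both models have a reported score, that higher is better, and that ties name no leader.

```python
from typing import Optional


def render_score(score: Optional[float]) -> str:
    """Show a reported score as-is; never substitute an estimate for a missing one."""
    return "not reported" if score is None else f"{score:.1f}"


def leader(score_a: Optional[float], score_b: Optional[float]) -> Optional[str]:
    """Return "A" or "B" for the higher score, or None if no leader can be named.

    Assumption: a leader requires both scores to be reported, and ties yield None.
    """
    if score_a is None or score_b is None:
        return None
    if score_a == score_b:
        return None
    return "A" if score_a > score_b else "B"


# GPQA-Diamond values from this page: 50.5 for A, 77.3 for B.
print(leader(50.5, 77.3))   # prints "B"
# Hypothetical row where only one model has a reported score.
print(render_score(None))   # prints "not reported"
print(leader(86.0, None))   # prints "None"
```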

General reasoning

MMLU 86.0 2024-12-06
MMLU-Pro 84.1 2026-05-21
GPQA-Diamond | 50.5 2024-12-06 | 77.3 2024-12-05

Code

HumanEval 88.4 2024-12-06
LiveCodeBench 67.9 2026-05-21

Math

MATH 97.0 2026-05-21
AIME 2024 83.3 2024-12-05

Context · A

An incremental post-training refresh of the 70B class that approached 405B-class quality on several benchmarks without the deployment cost. The last dense Llama before Llama 4 went MoE.

Context · B

The first publicly available frontier reasoning model. Trained to spend extra inference compute on a "private chain of thought" before answering, it set the template the open community would chase with DeepSeek-R1.
