OpenAI o1 vs Llama 3.3 70B Instruct

Rows highlighted in warm gray mark fields where the two models differ. Each number carries its as-of date and primary source.

Specs

Field	A: OpenAI o1	B: Llama 3.3 70B Instruct
Released	—	2024-12-06
Developer	OpenAI	Meta
Openness	Proprietary	Source-available
License	Proprietary	Llama 3.3 Community License
OSI-approved	no	no
Data released	no	no
Training code	no	no
Architecture	unknown	dense
Total params	—	70B
Active params	—	—
Experts	—	—
Context window (tokens)	—	131K
Attention	unknown	GQA
Position enc.	unknown	rope-llama3
Pretraining tokens	—	15.0T
Post-training	RLHF	SFT, DPO, rejection sampling
Training hardware	—	H100
$/M input	$15.00	—
$/M output	$60.00	—
Output tok/sec	75.8	—
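
As a rough illustration of how the pricing and throughput rows translate into per-request figures, the sketch below estimates cost and generation time for OpenAI o1 from the values in the table above. The function names and the example token counts are hypothetical. It assumes billing is linear in token count and that output throughput is constant, which real deployments may not match.

```python
# Values copied from the Specs table (column A: OpenAI o1).
INPUT_USD_PER_M = 15.00    # $/M input tokens
OUTPUT_USD_PER_M = 60.00   # $/M output tokens
OUTPUT_TOK_PER_SEC = 75.8  # output tokens per second


def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Linear estimate: (tokens / 1e6) * price per million tokens."""
    input_cost = input_tokens / 1_000_000 * INPUT_USD_PER_M
    output_cost = output_tokens / 1_000_000 * OUTPUT_USD_PER_M
    return input_cost + output_cost


def generation_seconds(output_tokens: int) -> float:
    """Time to stream the output, assuming constant throughput (an assumption)."""
    return output_tokens / OUTPUT_TOK_PER_SEC


if __name__ == "__main__":
    # Hypothetical request: 2,000 input tokens, 1,000 output tokens.
    cost = request_cost_usd(2_000, 1_000)
    secs = generation_seconds(1_000)
    print(f"cost: ${cost:.4f}, generation time: {secs:.1f}s")
```

The same estimate cannot be produced for Llama 3.3 70B Instruct from this table, because its price and throughput fields are not reported.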

Benchmarks

Missing scores are shown as not reported (—) and are never inferred. Bold marks the leader for each benchmark.

General reasoning

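Benchmark	A: OpenAI o1	B: Llama 3.3 70B Instruct
MMLU	—	86.0 (as of 2024-12-06)
MMLU-Pro	84.1 (as of 2026-05-21)	—
GPQA-Diamond	77.3 (as of 2024-12-05)	50.5 (as of 2024-12-06)

Code

Benchmark	A: OpenAI o1	B: Llama 3.3 70B Instruct
HumanEval	—	88.4 (as of 2024-12-06)
LiveCodeBench	67.9 (as of 2026-05-21)	—

Math

Benchmark	A: OpenAI o1	B: Llama 3.3 70B Instruct
MATH	97.0 (as of 2026-05-21)	—
AIME 2024	83.3 (as of 2024-12-05)	—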
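
The rule stated at the top of this section (missing scores are not reported and never inferred, and a leader is marked per benchmark) can be sketched as follows. This is an illustrative sketch, not the page's actual implementation: the `leader` helper and the dictionary layout are hypothetical, higher is assumed to be better for every benchmark, and a leader is assumed to be declared only when both models have a reported score.

```python
from typing import Optional

# Scores copied from the Benchmarks tables as (A: OpenAI o1, B: Llama 3.3 70B Instruct).
# None stands for "not reported"; it is never replaced by a guess.
SCORES: dict[str, tuple[Optional[float], Optional[float]]] = {
    "MMLU": (None, 86.0),
    "MMLU-Pro": (84.1, None),
    "GPQA-Diamond": (77.3, 50.5),
    "HumanEval": (None, 88.4),
    "LiveCodeBench": (67.9, None),
    "MATH": (97.0, None),
    "AIME 2024": (83.3, None),
}


def leader(a: Optional[float], b: Optional[float]) -> Optional[str]:
    """Return "A", "B", or "tie"; return None if either score is missing."""
    if a is None or b is None:
        return None  # assumption: no leader without a head-to-head score
    if a == b:
        return "tie"
    return "A" if a > b else "B"


if __name__ == "__main__":
    for name, (a, b) in SCORES.items():
        result = leader(a, b)
        print(f"{name}: {result if result is not None else 'not comparable'}")
```

Under these assumptions, GPQA-Diamond is the only benchmark in the tables where both models have a reported score, so it is the only row that yields a head-to-head leader.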

Context · A

The first publicly available frontier reasoning model. It was trained to spend extra inference compute on a "private chain of thought" before answering, setting the template that the open community would later pursue with DeepSeek-R1.

Context · B

An incremental post-training refresh of the 70B class that approached 405B-class quality on several benchmarks without the deployment cost of a 405B model. It was the last dense Llama release before Llama 4 moved to a mixture-of-experts (MoE) architecture.
