Claude Opus 4.5 vs OLMo 3 32B Instruct

Rows where the models differ are highlighted in warm gray. Each number carries its as-of date and primary source.

Specs

Field	A: Claude Opus 4.5	B: OLMo 3 32B Instruct
Released	2025-11-24	2025-11-20
Developer	Anthropic	AI2
Openness	Proprietary	Open
License	Proprietary	Apache-2.0
OSI-approved	no	yes
Data released	no	yes
Training code	no	yes
Architecture	unknown	dense
Total params	—	32B
Active params	—	—
Experts	—	—
Context window	—	66K
Attention	unknown	gqa
Position enc.	unknown	rope
Pretraining tokens	—	6.0T
Post-training	rlhf, constitutional	sft, dpo, rlvr
Training hardware	—	—
$/M input	$5.00	—
$/M output	$25.00	—
Output tok/sec	51.8	—

Benchmarks

Missing scores render as not reported and are never inferred. Bold marks the leader on each benchmark.

General reasoning

MMLU	—	85.4 2025-11-20
MMLU-Pro	88.9 2026-05-21	—
GPQA-Diamond	81.0 2026-05-21	—

Code

LiveCodeBench	73.8 2026-05-21	—

Math

MATH	—	96.1 2025-11-20
AIME 2024	—	76.8 2025-11-20
AIME 2025	62.7 2026-05-21	—

Held-out / arena

IFEval	—	89.0 2025-11-20

Context · A

Anthropic's first Opus model to cross 80 percent on SWE-Bench Verified, per the lab's own numbers. Released November 24, 2025 at $5 / $25 per Mtok (input / output), a two-thirds price cut versus Opus 4.1. It added an effort parameter for adjustable reasoning intensity, with medium-effort runs matching Sonnet 4.5 while using 76 percent fewer output tokens.
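To make the listed prices concrete, here is a minimal sketch that estimates the cost and generation time of a single request from the Specs figures ($5.00 / $25.00 per million input / output tokens, 51.8 output tokens per second). The function names and the request sizes are hypothetical, chosen only for illustration. The sketch assumes flat per-token pricing and steady throughput, so it ignores any discounts, caching, or latency before the first token.

```python
# Figures copied from the Specs table for Claude Opus 4.5.
INPUT_USD_PER_MTOK = 5.00    # USD per million input tokens
OUTPUT_USD_PER_MTOK = 25.00  # USD per million output tokens
OUTPUT_TOK_PER_SEC = 51.8    # measured output speed; real throughput varies


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate request cost in USD, assuming flat per-token pricing."""
    total = input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    return total / 1_000_000


def estimate_generation_seconds(output_tokens: int) -> float:
    """Estimate time to stream the output, assuming steady throughput."""
    return output_tokens / OUTPUT_TOK_PER_SEC


if __name__ == "__main__":
    # Hypothetical request: 20,000 input tokens and 2,000 output tokens.
    cost = estimate_cost_usd(20_000, 2_000)
    seconds = estimate_generation_seconds(2_000)
    print(f"cost: ${cost:.2f}, generation time: {seconds:.1f} s")
```

Because output tokens cost five times as much as input tokens at these prices, the output length tends to dominate the bill for generation-heavy workloads.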

Context · B

AI2's first fully open 32B frontier model, released November 20, 2025 alongside a 7B model and Base, Think, Instruct, and RL-Zero variants. Built on the Dolma 3 corpus (9.3T tokens, of which 5.9T were used for pretraining) and the Dolci post-training suite. All weights, data descriptions, intermediate checkpoints, and code are released.
