Claude Opus 4.5 vs OLMo 3 32B Instruct
Rows highlighted in warm gray mark fields where the models differ. Each number carries its as-of date and primary source.
Specs
| Field | A: Claude Opus 4.5 | B: OLMo 3 32B Instruct |
|---|---|---|
| Released | 2025-11-24 | 2025-11-20 |
| Developer | Anthropic | AI2 |
| Openness | Proprietary | Open |
| License | Proprietary | Apache-2.0 |
| OSI-approved | no | yes |
| Data released | no | yes |
| Training code | no | yes |
| Architecture | unknown | dense |
| Total params | — | 32B |
| Active params | — | — |
| Experts | — | — |
| Context window | — | 66K |
| Attention | unknown | gqa |
| Position enc. | unknown | rope |
| Pretraining tokens | — | 6.0T |
| Post-training | rlhf, constitutional | sft, dpo, rlvr |
| Training hardware | — | — |
| $/M input | $5.00 | — |
| $/M output | $25.00 | — |
| Output tok/sec | 51.8 | — |
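The pricing and throughput rows for model A can be combined into a rough per-request estimate. The following is a minimal sketch, not an official calculator: it assumes the listed prices scale linearly per token and that 51.8 tok/sec is a steady output rate. The function name `estimate_request` and the request sizes are hypothetical.

```python
# Values copied from the Specs table for model A (Claude Opus 4.5).
PRICE_PER_M_INPUT_USD = 5.00
PRICE_PER_M_OUTPUT_USD = 25.00
OUTPUT_TOKENS_PER_SEC = 51.8


def estimate_request(input_tokens: int, output_tokens: int) -> tuple[float, float]:
    """Return (cost in USD, generation time in seconds) for one request.

    Assumption: pricing is linear per token. Time-to-first-token, caching
    discounts, and batching are ignored, so this is only a rough estimate.
    """
    cost = (
        input_tokens * PRICE_PER_M_INPUT_USD
        + output_tokens * PRICE_PER_M_OUTPUT_USD
    ) / 1_000_000
    seconds = output_tokens / OUTPUT_TOKENS_PER_SEC
    return cost, seconds


# Hypothetical request: 20,000 input tokens and 2,000 output tokens.
cost, seconds = estimate_request(20_000, 2_000)
print(f"cost: ${cost:.2f}, generation time: about {seconds:.0f} s")
```

No equivalent estimate is possible for model B from this table, since its price and throughput fields are not reported.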
Benchmarks
Missing scores render as not reported; never inferred. Bold highlights the leader per benchmark.
General reasoning
| Benchmark | A: Claude Opus 4.5 | B: OLMo 3 32B Instruct |
|---|---|---|
| MMLU | — | 85.4 (as of 2025-11-20) |
| MMLU-Pro | 88.9 (as of 2026-05-21) | — |
| GPQA-Diamond | 81.0 (as of 2026-05-21) | — |
Code
| Benchmark | A: Claude Opus 4.5 | B: OLMo 3 32B Instruct |
|---|---|---|
| LiveCodeBench | 73.8 (as of 2026-05-21) | — |
Math
| Benchmark | A: Claude Opus 4.5 | B: OLMo 3 32B Instruct |
|---|---|---|
| MATH | — | 96.1 (as of 2025-11-20) |
| AIME 2024 | — | 76.8 (as of 2025-11-20) |
| AIME 2025 | 62.7 (as of 2026-05-21) | — |
Held-out / arena
| Benchmark | A: Claude Opus 4.5 | B: OLMo 3 32B Instruct |
|---|---|---|
| IFEval | — | 89.0 (as of 2025-11-20) |
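The rule stated at the top of this section (missing scores are never inferred, and a leader is marked per benchmark) can be sketched as follows. This is a minimal sketch under one assumption: a leader is declared only when both models report a score on the same benchmark. The `SCORES` structure and the `leader` function are hypothetical, and the scores are copied from the tables above.

```python
from typing import Optional

# (benchmark, score A: Claude Opus 4.5, score B: OLMo 3 32B Instruct)
# None stands for "not reported" in the tables.
SCORES: list[tuple[str, Optional[float], Optional[float]]] = [
    ("MMLU", None, 85.4),
    ("MMLU-Pro", 88.9, None),
    ("GPQA-Diamond", 81.0, None),
    ("LiveCodeBench", 73.8, None),
    ("MATH", None, 96.1),
    ("AIME 2024", None, 76.8),
    ("AIME 2025", 62.7, None),
    ("IFEval", None, 89.0),
]


def leader(a: Optional[float], b: Optional[float]) -> Optional[str]:
    """Return "A", "B", or "tie"; return None if either score is missing.

    Assumption: a missing score is never treated as zero or estimated,
    so a row with only one reported score has no leader.
    """
    if a is None or b is None:
        return None
    if a == b:
        return "tie"
    return "A" if a > b else "B"


# Benchmarks where a head-to-head comparison is possible.
comparable = [name for name, a, b in SCORES if leader(a, b) is not None]
print(comparable)
```

With the scores listed here, no benchmark is reported for both models, so the sketch names no leader on any row. Related benchmarks such as MMLU and MMLU-Pro, or AIME 2024 and AIME 2025, are distinct and are not compared against each other.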
Context · A
Anthropic's first Opus model to cross 80 percent on SWE-Bench Verified, per the lab's own numbers. Released November 24, 2025, at a two-thirds price cut versus Opus 4.1 ($5 input / $25 output per million tokens). Added an effort parameter for adjustable reasoning intensity; per Anthropic, medium-effort runs match Sonnet 4.5's best SWE-Bench Verified score while using 76 percent fewer output tokens.
Context · B
AI2's first fully open 32B frontier model, released November 20, 2025, alongside a 7B model and Base, Think, Instruct, and RL-Zero variants. Built on the 9.3T-token Dolma 3 corpus (5.9T tokens used for pretraining) and the Dolci post-training suite. All weights, data descriptions, intermediate checkpoints, and code were released.