Grok 4.1 vs OLMo 3 32B Instruct
Rows highlighted in warm gray are where the models differ. Numbers carry their as-of date and primary source.
Specs
| Field | A: Grok 4.1 | B: OLMo 3 32B Instruct |
|---|---|---|
| Released | 2025-11-17 | 2025-11-20 |
| Developer | xAI | AI2 |
| Openness | Proprietary | Open |
| License | Proprietary | Apache-2.0 |
| OSI-approved | no | yes |
| Data released | no | yes |
| Training code | no | yes |
| Architecture | unknown | dense |
| Total params | — | 32B |
| Active params | — | — |
| Experts | — | — |
| Context window | — | 66K |
| Attention | unknown | GQA |
| Position enc. | unknown | RoPE |
| Pretraining tokens | — | 6.0T |
| Post-training | RLHF | SFT, DPO, RLVR |
| Training hardware | — | — |
| $/M input | — | — |
| $/M output | — | — |
| Output tok/sec | — | — |
Benchmarks
Missing scores render as not reported; never inferred. Bold highlights the leader per benchmark.
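The rendering rule above can be sketched as a small helper. This is a minimal illustration, not the page's actual implementation: the function name `render_row` is hypothetical, and it assumes that higher scores are better and that a leader is bolded only when both models report a score.

```python
from typing import Optional


def render_row(name: str, score_a: Optional[float], score_b: Optional[float]) -> str:
    """Render one benchmark row as a markdown table line (hypothetical helper)."""

    def cell(score: Optional[float], is_leader: bool) -> str:
        # A missing score is shown as-is; it is never estimated or inferred.
        if score is None:
            return "not reported"
        text = f"{score:.1f}"
        return f"**{text}**" if is_leader else text

    # Assumption: a leader exists only when both scores are present and differ,
    # and a higher score is better.
    both = score_a is not None and score_b is not None
    a_leads = both and score_a > score_b
    b_leads = both and score_b > score_a
    return f"| {name} | {cell(score_a, a_leads)} | {cell(score_b, b_leads)} |"


print(render_row("MMLU", None, 85.4))
```

With only one model reporting, as in every row on this page, no cell is bolded under these assumptions; a different policy (for example, bolding the sole reported score) would be an equally plausible reading of the rule.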
General reasoning
| Benchmark | A: Grok 4.1 | B: OLMo 3 32B Instruct |
|---|---|---|
| MMLU | — | 85.4 (as of 2025-11-20) |
Math
| Benchmark | A: Grok 4.1 | B: OLMo 3 32B Instruct |
|---|---|---|
| MATH | — | 96.1 (as of 2025-11-20) |
| AIME 2024 | — | 76.8 (as of 2025-11-20) |
Held-out / arena
| Benchmark | A: Grok 4.1 | B: OLMo 3 32B Instruct |
|---|---|---|
| IFEval | — | 89.0 (as of 2025-11-20) |
Context · A
An incremental update to Grok 4, released November 17, 2025, after a two-week silent rollout during which xAI ran live blind evaluations. xAI reported that users preferred Grok 4.1 responses 64.78% of the time. Headline gains were in emotional intelligence and reduced hallucinations. It shipped alongside Grok 4.1 Fast, which has a 2M-token context window.
Context · B
AI2's first fully open 32B frontier model, released November 20, 2025, alongside a 7B model, with Base, Think, Instruct, and RL-Zero variants. It is built on Dolma 3, a 9.3T-token corpus of which 5.9T tokens were used for pretraining, and on the Dolci post-training suite. All weights, data descriptions, intermediate checkpoints, and code are released.