OpenAI o4-mini vs Llama-3.1-Nemotron Ultra 253B v1
Rows highlighted in warm gray are where the models differ. Numbers carry their as-of date and primary source.
Specs
| Field | A: OpenAI o4-mini | B: Llama-3.1-Nemotron Ultra 253B v1 |
|---|---|---|
| Released | 2025-04-16 | 2025-04-11 |
| Developer | OpenAI | NVIDIA |
| Openness | Proprietary | Open weights |
| License | Proprietary | NVIDIA Open Model License |
| OSI-approved | no | no |
| Data released | no | yes |
| Training code | no | no |
| Architecture | unknown | dense |
| Total params | — | 253B |
| Active params | — | — |
| Experts | — | — |
| Context window | 200K | 131K |
| Attention | unknown | skip-attention |
| Position enc. | unknown | RoPE |
| Pretraining tokens | — | 65B |
| Post-training | RLHF | SFT, GRPO |
| Training hardware | — | H100 |
| $/M input | $1.10 | — |
| $/M output | $4.40 | — |
| Output tok/sec | 151.2 | — |
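The per-million-token prices and the output speed listed for o4-mini can be turned into a rough per-request estimate. The sketch below is illustrative only: the function names and the example token counts are hypothetical, and it assumes a flat price with no caching discounts or tiering, and a constant generation rate with no time-to-first-token.

```python
# Values copied from the Specs table for model A (o4-mini).
INPUT_USD_PER_M = 1.10    # $/M input tokens
OUTPUT_USD_PER_M = 4.40   # $/M output tokens
OUTPUT_TOK_PER_SEC = 151.2


def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request at the listed flat prices."""
    total = input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M
    return total / 1_000_000


def generation_seconds(output_tokens: int) -> float:
    """Estimate generation time, assuming the listed rate holds throughout."""
    return output_tokens / OUTPUT_TOK_PER_SEC


# Hypothetical workload: 20,000 prompt tokens, 5,000 completion tokens.
cost = request_cost_usd(20_000, 5_000)
seconds = generation_seconds(5_000)
print(f"cost: ${cost:.4f}")        # cost: $0.0440
print(f"time: {seconds:.1f} s")
```

No equivalent estimate is possible for model B from this table, because its price and speed fields are not reported.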
Benchmarks
Missing scores render as not reported; never inferred. Bold highlights the leader per benchmark.
General reasoning
| Benchmark | A: OpenAI o4-mini | B: Llama-3.1-Nemotron Ultra 253B v1 |
|---|---|---|
| MMLU-Pro | 83.2 (2026-05-21) | — |
| GPQA-Diamond | 78.4 (2026-05-21) | 76.0 (2025-04-11) |
Code
| Benchmark | A: OpenAI o4-mini | B: Llama-3.1-Nemotron Ultra 253B v1 |
|---|---|---|
| LiveCodeBench | 85.9 (2026-05-21) | 66.3 (2025-04-11) |
Math
| Benchmark | A: OpenAI o4-mini | B: Llama-3.1-Nemotron Ultra 253B v1 |
|---|---|---|
| MATH | 98.9 (2026-05-21) | 97.0 (2025-04-11) |
| AIME 2024 | 94.0 (2026-05-21) | — |
| AIME 2025 | 90.7 (2026-05-21) | 72.5 (2025-04-11) |
Held-out / arena
| Benchmark | A: OpenAI o4-mini | B: Llama-3.1-Nemotron Ultra 253B v1 |
|---|---|---|
| IFEval | — | 88.8 (2025-04-11) |
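The rendering rule stated for the benchmarks (missing scores shown as not reported, never inferred, with the leader highlighted) could be sketched roughly as follows. This is not the page's actual implementation. The names `ROWS`, `render_score`, and `leader` are hypothetical, and the sketch assumes that a row with a missing score or a tie gets no leader.

```python
from typing import Optional

# (benchmark, score for A, score for B), copied from the tables above.
# None stands for a score the page shows as "—".
ROWS = [
    ("MMLU-Pro", 83.2, None),
    ("GPQA-Diamond", 78.4, 76.0),
    ("LiveCodeBench", 85.9, 66.3),
    ("MATH", 98.9, 97.0),
    ("AIME 2024", 94.0, None),
    ("AIME 2025", 90.7, 72.5),
    ("IFEval", None, 88.8),
]


def render_score(score: Optional[float]) -> str:
    """Render a score; a missing value is reported as such, never filled in."""
    return "not reported" if score is None else f"{score:.1f}"


def leader(a: Optional[float], b: Optional[float]) -> Optional[str]:
    """Return "A" or "B" for the higher score.

    Assumption: no leader is declared when either score is missing or tied.
    """
    if a is None or b is None or a == b:
        return None
    return "A" if a > b else "B"


for name, a, b in ROWS:
    print(f"{name}: A={render_score(a)}, B={render_score(b)}, leader={leader(a, b)}")
```

Under that assumption, only the four rows where both models report a score (GPQA-Diamond, LiveCodeBench, MATH, AIME 2025) get a leader.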
Context · A
Released April 16, 2025, alongside o3 as a smaller multimodal reasoning model with a 200K-token context window. Initially launched to paid subscribers and expanded to all ChatGPT users on April 24, 2025. OpenAI positioned it as the cost-efficient successor to o3-mini, adding image-input support.
Context · B
Top of NVIDIA's Llama-Nemotron family, distilled from Llama 3.1 405B via Neural Architecture Search (NAS) using skip attention, variable FFN widths, and FFN fusion. Released April 11, 2025. BF16 inference fits on a single 8x H100 node; FP8 inference fits on 4x H100. Post-trained through SFT and GRPO RL stages.
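A back-of-the-envelope check shows why those GPU counts are plausible. The sketch counts weight memory only and assumes the 80 GB H100 variant. Both the per-GPU capacity and the helper name `weight_memory_gb` are assumptions rather than details from the source, and real deployments also need room for the KV cache and activations.

```python
PARAMS = 253e9          # total parameters, from the Specs table
H100_MEMORY_GB = 80     # assumption: 80 GB H100 variant


def weight_memory_gb(params: float, bytes_per_param: int) -> float:
    """Memory needed for the weights alone, in decimal gigabytes."""
    return params * bytes_per_param / 1e9


bf16_gb = weight_memory_gb(PARAMS, 2)   # BF16 stores 2 bytes per parameter
fp8_gb = weight_memory_gb(PARAMS, 1)    # FP8 stores 1 byte per parameter

print(f"BF16 weights: {bf16_gb:.0f} GB vs {8 * H100_MEMORY_GB} GB on 8x H100")
# BF16 weights: 506 GB vs 640 GB on 8x H100
print(f"FP8 weights:  {fp8_gb:.0f} GB vs {4 * H100_MEMORY_GB} GB on 4x H100")
# FP8 weights:  253 GB vs 320 GB on 4x H100
```

Under these assumptions the weights leave roughly 134 GB (BF16) and 67 GB (FP8) of aggregate headroom for everything else.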