OpenAI o3 vs Llama-3.1-Nemotron Ultra 253B v1
Each number carries its as-of date and primary source.
Specs
| Field | A: OpenAI o3 | B: Llama-3.1-Nemotron Ultra 253B v1 |
|---|---|---|
| Released | 2025-04-16 | 2025-04-11 |
| Developer | OpenAI | NVIDIA |
| Openness | Proprietary | Open weights |
| License | Proprietary | NVIDIA Open Model License |
| OSI-approved | no | no |
| Data released | no | yes |
| Training code | no | no |
| Architecture | unknown | dense |
| Total params | — | 253B |
| Active params | — | — |
| Experts | — | — |
| Context window (tokens) | 200K | 131K |
| Attention | unknown | skip attention |
| Positional encoding | unknown | RoPE |
| Pretraining tokens | — | 65B |
| Post-training | RLHF | SFT, GRPO |
| Training hardware | — | H100 |
| Input price ($/M tokens) | $2.00 | — |
| Output price ($/M tokens) | $8.00 | — |
| Output speed (tokens/s) | 88 | — |
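The o3 prices and output speed above combine into a quick per-request estimate. A minimal sketch, assuming a hypothetical request of 10,000 input and 2,000 output tokens; the helper names are illustrative, and the timing ignores time to first token and network overhead. For a reasoning model like o3, hidden reasoning tokens are billed as output tokens, so the real output count can exceed the length of the visible answer.

```python
# Figures for OpenAI o3 taken from the Specs table above.
INPUT_PRICE_PER_M = 2.00    # USD per million input tokens
OUTPUT_PRICE_PER_M = 8.00   # USD per million output tokens
OUTPUT_TOK_PER_SEC = 88     # listed output speed


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """USD cost of one request at the listed per-million-token prices."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000


def generation_seconds(output_tokens: int) -> float:
    """Rough decode time only; excludes time to first token and network latency."""
    return output_tokens / OUTPUT_TOK_PER_SEC


if __name__ == "__main__":
    # Hypothetical request size, chosen only for illustration.
    print(f"cost: ${request_cost(10_000, 2_000):.3f}")       # cost: $0.036
    print(f"decode: {generation_seconds(2_000):.1f} s")       # decode: 22.7 s
```

Output tokens cost four times as much as input tokens here, so long reasoning traces dominate the bill even for prompt-heavy requests.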
Benchmarks
Missing scores appear as a dash (not reported) and are never inferred.
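General reasoning
| Benchmark | A: OpenAI o3 | B: Llama-3.1-Nemotron Ultra 253B v1 |
|---|---|---|
| MMLU-Pro | 85.3 (2026-05-21) | — |
| GPQA-Diamond | 87.7 (2025-04-16) | 76.0 (2025-04-11) |
Code
| Benchmark | A: OpenAI o3 | B: Llama-3.1-Nemotron Ultra 253B v1 |
|---|---|---|
| SWE-Bench Verified | 71.7 (2025-04-16) | — |
| LiveCodeBench | 80.8 (2026-05-21) | 66.3 (2025-04-11) |
Math
| Benchmark | A: OpenAI o3 | B: Llama-3.1-Nemotron Ultra 253B v1 |
|---|---|---|
| MATH | 99.2 (2026-05-21) | 97.0 (2025-04-11) |
| AIME 2024 | 90.3 (2026-05-21) | — |
| AIME 2025 | 88.3 (2026-05-21) | 72.5 (2025-04-11) |
Held-out / arena
| Benchmark | A: OpenAI o3 | B: Llama-3.1-Nemotron Ultra 253B v1 |
|---|---|---|
| IFEval | — | 88.8 (2025-04-11) |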
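The "never inferred" rule means a head-to-head comparison is only meaningful on benchmarks both models report. A minimal sketch, assuming the scores are transcribed by hand from the tables above (the `SCORES` mapping and `head_to_head` helper are illustrative, not part of any published tool). The as-of dates and evaluation settings differ between the two columns, so the deltas are indicative only.

```python
from typing import Iterator, Optional

# Scores from the benchmark tables above: (A: OpenAI o3, B: Nemotron Ultra).
# None means "not reported"; it is never filled in or estimated.
SCORES: dict[str, tuple[Optional[float], Optional[float]]] = {
    "MMLU-Pro": (85.3, None),
    "GPQA-Diamond": (87.7, 76.0),
    "SWE-Bench Verified": (71.7, None),
    "LiveCodeBench": (80.8, 66.3),
    "MATH": (99.2, 97.0),
    "AIME 2024": (90.3, None),
    "AIME 2025": (88.3, 72.5),
    "IFEval": (None, 88.8),
}


def head_to_head(scores: dict[str, tuple[Optional[float], Optional[float]]]
                 ) -> Iterator[tuple[str, float]]:
    """Yield (benchmark, A minus B) only where both models report a score."""
    for name, (a, b) in scores.items():
        if a is None or b is None:
            continue  # skip rather than infer the missing score
        yield name, round(a - b, 1)


if __name__ == "__main__":
    for name, delta in head_to_head(SCORES):
        print(f"{name}: {delta:+.1f}")
    # GPQA-Diamond: +11.7
    # LiveCodeBench: +14.5
    # MATH: +2.2
    # AIME 2025: +15.8
```

Only four of the eight benchmarks survive the filter, which is the point: a single averaged "score" across this page would silently mix reported and missing entries.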
Context · A
Released April 16, 2025 as the full-size successor to o1 in the o-series reasoning lineage, with multimodal (text plus image) input and a 200K-token context. OpenAI reported large jumps over o1 on SWE-bench Verified and Codeforces and roughly 3x the accuracy of o1 on ARC-AGI.
Context · B
The largest model in NVIDIA's Llama-Nemotron family, distilled from Llama 3.1 405B, with Neural Architecture Search (NAS) introducing skip attention, variable FFN widths, and FFN fusion. Released April 11, 2025; it fits on a single 8x H100 node for BF16 inference, or on 4x H100 in FP8. Post-trained through SFT followed by GRPO-based RL stages.
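The single-node hardware figures follow from weight-memory arithmetic. A rough sketch, assuming 80 GB H100s and counting only the 253B dense weights at 2 bytes per parameter in BF16 and 1 byte in FP8; the headroom must still hold the KV cache (which grows with the 131K context), activations, and runtime overhead, so it is an upper bound rather than a guarantee.

```python
PARAMS = 253e9          # total (dense) parameters, from the Specs table
H100_MEMORY_GB = 80     # assumption: 80 GB H100 variant, decimal GB


def weight_gb(params: float, bytes_per_param: float) -> float:
    """Memory for model weights alone, in decimal gigabytes."""
    return params * bytes_per_param / 1e9


if __name__ == "__main__":
    # (format, bytes per parameter, GPU count) as stated in the context note.
    for fmt, bpp, gpus in [("BF16", 2, 8), ("FP8", 1, 4)]:
        need = weight_gb(PARAMS, bpp)
        have = gpus * H100_MEMORY_GB
        print(f"{fmt}: {need:.0f} GB of weights on {gpus}x H100 "
              f"({have} GB), ~{have - need:.0f} GB headroom")
    # BF16: 506 GB of weights on 8x H100 (640 GB), ~134 GB headroom
    # FP8: 253 GB of weights on 4x H100 (320 GB), ~67 GB headroom
```

Halving bytes per parameter halves the weight footprint, which is why the FP8 configuration needs half the GPUs while keeping a similar headroom ratio.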