The Open-Source AI Stack
RSS
All models

Models · Compare

Llama-3.1-Nemotron Ultra 253B v1 vs OpenAI o3

Rows highlighted in warm gray are where the models differ. Numbers carry their as-of date and primary source.

Specs

Field A: Llama-3.1-Nemotron Ultra 253B v1 B: OpenAI o3
Released 2025-04-112025-04-16
Developer NVIDIAOpenAI
Openness Open weightsProprietary
License NVIDIA Open Model LicenseProprietary
OSI-approved nono
Data released yesno
Training code nono
Architecture denseunknown
Total params 253B
Active params
Experts
Context window 131K200K
Attention skip-attentionunknown
Position enc. ropeunknown
Pretraining tokens 65B
Post-training sft, grporlhf
Training hardware H100
$/M input $2.00
$/M output $8.00
Output tok/sec 88

Benchmarks

Missing scores render as not reported; never inferred. Bold highlights the leader per benchmark.

General reasoning

MMLU-Pro 85.3 2026-05-21
GPQA-Diamond 76.0 2025-04-11 87.7 2025-04-16

Code

SWE-Bench Verified 71.7 2025-04-16
LiveCodeBench 66.3 2025-04-11 80.8 2026-05-21

Math

MATH 97.0 2025-04-11 99.2 2026-05-21
AIME 2024 90.3 2026-05-21
AIME 2025 72.5 2025-04-11 88.3 2026-05-21

Held-out / arena

IFEval 88.8 2025-04-11

Context · A

Top of NVIDIA's Llama-Nemotron family, distilled from Llama 3.1 405B via Neural Architecture Search with skip attention, variable FFN, and FFN fusion. Released April 11 2025; single-node 8x H100 BF16 inference, 4x H100 FP8. Post-trained through SFT and GRPO RL stages.

Context · B

Released April 16, 2025 as the full-size successor to o1 in the o-series reasoning lineage, with multimodal (text plus image) input and a 200K-token context. OpenAI reported large jumps over o1 on SWE-bench Verified and Codeforces and roughly 3x the accuracy of o1 on ARC-AGI.

Llama-3.1-Nemotron Ultra 253B v1 detail → · OpenAI o3 detail →