The Open-Source AI Stack
RSS
All models

Models · Compare

Gemini 2.5 Flash vs Llama-3.1-Nemotron Ultra 253B v1

Rows highlighted in warm gray are where the models differ. Numbers carry their as-of date and primary source.

Specs

Field A: Gemini 2.5 Flash B: Llama-3.1-Nemotron Ultra 253B v1
Released 2025-04-11
Developer Google DeepMindNVIDIA
Openness ProprietaryOpen weights
License ProprietaryNVIDIA Open Model License
OSI-approved nono
Data released noyes
Training code nono
Architecture unknowndense
Total params 253B
Active params
Experts
Context window 1.0M131K
Attention unknownskip-attention
Position enc. unknownrope
Pretraining tokens 65B
Post-training rlhfsft, grpo
Training hardware H100
$/M input $0.30
$/M output $2.50
Output tok/sec 196.8

Benchmarks

Missing scores render as not reported; never inferred. Bold highlights the leader per benchmark.

General reasoning

MMLU-Pro 80.9 2026-05-21
GPQA-Diamond 68.3 2026-05-21 76.0 2025-04-11

Code

LiveCodeBench 49.5 2026-05-21 66.3 2025-04-11

Math

MATH 93.2 2026-05-21 97.0 2025-04-11
AIME 2024 50.0 2026-05-21
AIME 2025 60.3 2026-05-21 72.5 2025-04-11

Held-out / arena

IFEval 88.8 2025-04-11

Context · A

Google's cost-efficient thinking model in the 2.5 family, sharing the reasoning-first design that Google introduced with 2.5 Pro on March 25 2025. Google described the 2.5 family as 'thinking models, capable of reasoning through their thoughts before responding,' with a 1M token context window at launch and 2M planned.

Context · B

Top of NVIDIA's Llama-Nemotron family, distilled from Llama 3.1 405B via Neural Architecture Search with skip attention, variable FFN, and FFN fusion. Released April 11 2025; single-node 8x H100 BF16 inference, 4x H100 FP8. Post-trained through SFT and GRPO RL stages.

Gemini 2.5 Flash detail → · Llama-3.1-Nemotron Ultra 253B v1 detail →