Gemini 2.5 Pro · Models · The Open-Source AI Stack

Cost

$1.25 / Mtok input

$10.00 / Mtok output

Google API · as of 2026-05-21

via Artificial Analysis ↗

Speed

127.3 tok/sec output

22,248 ms TTFT (time to first token)

Google API · as of 2026-05-21

via Artificial Analysis ↗
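
The two speed figures above can be combined into a rough end-to-end latency estimate. The sketch below is a back-of-the-envelope model, not a measurement: it assumes the response time is the time to first token plus the output length divided by a constant output rate. The function name `estimate_response_seconds` and its parameters are hypothetical, and the defaults are simply the numbers listed in this section.

```python
def estimate_response_seconds(
    output_tokens: int,
    ttft_ms: float = 22248.0,          # time to first token, from the figure above
    tokens_per_second: float = 127.3,  # output speed, from the figure above
) -> float:
    """Rough wall-clock estimate: TTFT plus streaming time at a constant rate.

    Assumption: throughput stays constant for the whole response, which real
    traffic will not guarantee.
    """
    if output_tokens < 0:
        raise ValueError("output_tokens must be non-negative")
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return ttft_ms / 1000.0 + output_tokens / tokens_per_second


if __name__ == "__main__":
    for n in (100, 1000, 5000):
        print(f"{n:>5} output tokens: ~{estimate_response_seconds(n):.1f} s")
```

Under these assumptions the time to first token dominates short responses: a 100-token answer and a 1,000-token answer differ by only about 7 seconds of streaming time.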

Why people cared

Gemini 2.5 Pro was the first Gemini release to clearly lead on LMArena Elo, GPQA-Diamond, and AIME 2024 at the same time. Released in March 2025, it combined Google's investment in long-context modeling (the 1M-token native window had been a Gemini differentiator since 1.5 Pro in February 2024) with the reasoning-model wave that o1 and R1 had established. The pricing structure (input at $1.25/M for prompts up to 200K tokens and $2.50/M above that; output at $10/M with extended thinking included) made it cheaper than Claude 3.7 Sonnet at the headline input rate but more expensive at high output volumes, which split the market between the two for production workloads. Google also released the Gemini 2.5 family across Pro, Flash, and Flash-Lite, with the smaller variants positioned aggressively against open-weights MoE models on cost. Gemini's lasting differentiator through 2025 remained its multimodal handling (vision, video, and audio integrated more deeply than in competing closed frontier models) and its native long-context performance, where needle-in-a-haystack retrieval benchmarks at 1M tokens consistently favored Gemini 2.5 Pro over its peers.
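
To make the tiered pricing concrete, here is a minimal cost sketch using only the rates quoted above. Two points are assumptions rather than facts from this page: that the whole prompt is billed at the tier its length falls into (rather than only the tokens past 200K), and that `output_tokens` already includes any thinking tokens. The function and constant names are hypothetical; check the provider's current price sheet before relying on the numbers.

```python
# Rates quoted in this section, in USD per million tokens.
INPUT_RATE_UP_TO_200K = 1.25
INPUT_RATE_ABOVE_200K = 2.50
OUTPUT_RATE = 10.00
TIER_THRESHOLD_TOKENS = 200_000


def estimate_request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request under the tiered input pricing.

    Assumption: the entire prompt is billed at the rate of the tier its
    length falls into. `output_tokens` is assumed to include thinking tokens.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    if input_tokens <= TIER_THRESHOLD_TOKENS:
        input_rate = INPUT_RATE_UP_TO_200K
    else:
        input_rate = INPUT_RATE_ABOVE_200K
    return (input_tokens * input_rate + output_tokens * OUTPUT_RATE) / 1_000_000


if __name__ == "__main__":
    short_prompt = estimate_request_cost_usd(150_000, 2_000)
    long_prompt = estimate_request_cost_usd(800_000, 2_000)
    print(f"150K-token prompt: ${short_prompt:.4f}")
    print(f"800K-token prompt: ${long_prompt:.4f}")
```

With a fixed 2,000-token output, the input side dominates long-context requests, so crossing the 200K threshold roughly doubles the bill for the same prompt length.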

Architecture

Schema-generated from data/models.yaml. Every label is auditable against the model's sources.

Specs

Architecture: unknown
Total params: not disclosed
Active params: not disclosed
Context window: 1.0M tokens
Attention: unknown
Position encoding: unknown
Post-training: RLHF
OSI-approved: no
Data released: no
Training code: not released

Benchmarks

Each score carries the date it was published; we never infer or interpolate missing scores.

General reasoning

MMLU-Pro	86.2	as of 2026-05-21	source ↗

Code

LiveCodeBench	80.1	as of 2026-05-21	source ↗

Math

MATH	96.7	as of 2026-05-21	source ↗
AIME 2025	87.7	as of 2026-05-21	source ↗

Recommended use cases

long-context tasks
multimodal reasoning
general chat

Available quantizations

None. The weights are not distributed, so there are no public quantizations.

Notable innovations

· 1M-token native context
· LMArena Elo leadership at release

Known limitations

· The input price doubles for prompts above 200K tokens ($2.50/M instead of the headline $1.25/M), so long-context workloads can cost 2x the headline input rate. source ↗

Lineage

Continues the Gemini Pro line with native 1M-token context.

Sources

Gemini 2.5 Pro announcement (Google DeepMind, Mar 25 2025) ↗