The Open-Source AI Stack

Gemini 2.5 Flash

Google DeepMind · Proprietary

Google's cost-efficient thinking model in the 2.5 family. It shares the reasoning-first design that Google introduced with 2.5 Pro on March 25, 2025. Google described the 2.5 family as 'thinking models, capable of reasoning through their thoughts before responding.' The announcement listed a 1M-token context window at launch, with 2M planned.

Cost

$0.30 / Mtok input

$2.50 / Mtok output

Google API · as of 2026-05-21
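To make the per-token prices concrete, here is a minimal sketch that turns them into a per-request estimate. The function name and the example token counts are hypothetical, and the sketch assumes every billed output token (including any thinking tokens) is already counted in `output_tokens`; check the provider's current pricing page before relying on it.

```python
from decimal import Decimal

# Prices copied from the Cost section (USD per million tokens, as of 2026-05-21).
INPUT_USD_PER_MTOK = Decimal("0.30")
OUTPUT_USD_PER_MTOK = Decimal("2.50")
TOKENS_PER_MTOK = Decimal(1_000_000)


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the cost of one request in USD at the listed rates.

    Assumption: output_tokens already includes any billed thinking tokens.
    """
    input_cost = Decimal(input_tokens) / TOKENS_PER_MTOK * INPUT_USD_PER_MTOK
    output_cost = Decimal(output_tokens) / TOKENS_PER_MTOK * OUTPUT_USD_PER_MTOK
    return input_cost + output_cost


if __name__ == "__main__":
    # Hypothetical request: 10,000 prompt tokens and 2,000 generated tokens.
    print(f"${estimate_cost_usd(10_000, 2_000):.4f}")
```

`Decimal` is used instead of floats so that small per-request amounts add up without rounding drift when summed over many requests.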

Speed

196.8 tok/sec output

560 ms TTFT

as of 2026-05-21
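The two speed figures can be combined into a rough end-to-end latency estimate: time to first token plus the time to stream the remaining output. The sketch below is a back-of-the-envelope model under the assumption that throughput stays constant for the whole response; the function name and the example length are hypothetical, and real latency varies with load, prompt size, and thinking budget.

```python
# Figures copied from the Speed section (as of 2026-05-21).
TTFT_SECONDS = 0.560         # 560 ms time to first token
OUTPUT_TOK_PER_SEC = 196.8   # sustained output throughput


def estimate_latency_seconds(output_tokens: int) -> float:
    """Approximate wall-clock time to receive a full response.

    Assumption: throughput is constant after the first token arrives.
    """
    if output_tokens < 0:
        raise ValueError("output_tokens must be non-negative")
    return TTFT_SECONDS + output_tokens / OUTPUT_TOK_PER_SEC


if __name__ == "__main__":
    # Hypothetical 1,000-token response.
    print(f"{estimate_latency_seconds(1_000):.2f} s")
```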

Architecture

Schema-generated from data/models.yaml. Every label is auditable against the model's sources.

Specs

Architecture: unknown
Total params: not disclosed
Active params: not disclosed
Context window: 1.0M tokens
Attention: unknown
Position encoding: unknown
Post-training: RLHF
OSI-approved: no
Data released: no
Training code: not released

Benchmarks

Each score carries the date it was published; we never infer or interpolate missing scores.

General reasoning

MMLU-Pro	80.9	as of 2026-05-21	source ↗
GPQA-Diamond	68.3	as of 2026-05-21	source ↗

Code

LiveCodeBench	49.5	as of 2026-05-21

Math

MATH	93.2	as of 2026-05-21	source ↗
AIME 2024	50.0	as of 2026-05-21	source ↗
AIME 2025	60.3	as of 2026-05-21	source ↗

Available quantizations

None. The weights are not distributed, so there are no public quantizations.

Notable innovations

· Thinking-enabled reasoning at Flash tier
· 1M token context with thinking
· Configurable thinking budget

Sources

Gemini 2.5: Our most intelligent AI model (Google, Mar 25 2025) ↗