The Open-Source AI Stack

Gemini 2.5 Flash

Google DeepMind · Proprietary

Gemini 2.5 Flash is Google's cost-efficient thinking model in the 2.5 family. It shares the reasoning-first design that Google introduced with 2.5 Pro on March 25, 2025. Google described the 2.5 family as 'thinking models, capable of reasoning through their thoughts before responding,' with a 1M-token context window at launch and 2M planned.

Cost

$0.30 / Mtok input
$2.50 / Mtok output

Google API · as of 2026-05-21

source ↗
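As a rough illustration of how the per-token prices above translate into a per-request figure, here is a minimal sketch. The function name and the example token counts are hypothetical; only the two prices come from this card. Whether reasoning tokens are billed as output is not stated here, so the sketch treats `output_tokens` as whatever the provider bills at the output rate.

```python
# Prices copied from this card (USD per million tokens, Google API, as of 2026-05-21).
INPUT_USD_PER_MTOK = 0.30
OUTPUT_USD_PER_MTOK = 2.50


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of a single request at the listed rates."""
    total = input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    return total / 1_000_000


# Hypothetical request: a 200k-token prompt and a 4k-token reply.
cost = estimate_cost_usd(200_000, 4_000)
print(f"${cost:.4f}")  # prints "$0.0700"
```

At these rates a billed output token costs roughly eight times as much as an input token, so long replies can dominate the total even when the prompt is large.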

Speed

196.8 tok/sec output
560 ms TTFT

as of 2026-05-21

source ↗
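The two speed figures above can be combined into a back-of-the-envelope latency estimate. This is a sketch under a simplifying assumption: output streams at a constant rate once the first token arrives. Real throughput varies with load, prompt length, and thinking settings. The function name and the 1,000-token example are hypothetical.

```python
# Figures copied from this card (as of 2026-05-21).
TTFT_SECONDS = 0.560        # time to first token
OUTPUT_TOK_PER_SEC = 196.8  # output throughput


def estimate_latency_seconds(output_tokens: int) -> float:
    """Approximate wall-clock time to receive a full reply.

    Assumes a constant output rate after the first token arrives.
    """
    return TTFT_SECONDS + output_tokens / OUTPUT_TOK_PER_SEC


# Hypothetical 1,000-token reply.
latency = estimate_latency_seconds(1_000)
print(f"{latency:.2f} s")  # prints "5.64 s"
```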

Architecture

[Architecture diagram: tokens in → Embedding (vocab not disclosed) → × N layers (architecture not disclosed; proprietary or undocumented) → Output projection → tokens out]
Schema-generated from data/models.yaml. Every label is auditable against the model's sources.

Specs

Architecture: unknown
Total params: not disclosed
Active params: not disclosed
Context window: 1.0M tokens
Attention: unknown
Position encoding: unknown
Post-training: RLHF
OSI-approved: no
Data released: no
Training code: not released

Benchmarks

Each score carries the date it was published; we never infer or interpolate missing scores.

General reasoning

MMLU-Pro 80.9 as of 2026-05-21 source ↗
GPQA-Diamond 68.3 as of 2026-05-21 source ↗

Code

LiveCodeBench 49.5 as of 2026-05-21 source ↗

Math

MATH 93.2 as of 2026-05-21 source ↗
AIME 2024 50.0 as of 2026-05-21 source ↗
AIME 2025 60.3 as of 2026-05-21 source ↗

Available quantizations

None. The weights are not distributed, so there are no public quantizations.

Notable innovations

  • Thinking-enabled reasoning at Flash tier
  • 1M token context with thinking
  • Configurable thinking budget
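To make the "configurable thinking budget" item concrete, here is a sketch that builds a request body with a thinking budget set. It assumes the JSON shape of the public Gemini `generateContent` REST request, where the budget sits under `generationConfig.thinkingConfig.thinkingBudget`. That shape is not stated on this card, so verify the field names and the accepted budget range against Google's current API documentation before relying on it. `build_request` is a hypothetical helper, and nothing is sent over the network.

```python
import json


def build_request(prompt: str, thinking_budget: int) -> dict:
    """Build a generateContent-style request body with a thinking budget.

    Field names are an assumption based on the public Gemini REST API;
    they are not taken from this card.
    """
    return {
        "contents": [{"parts": [{"text": prompt}]}],
        "generationConfig": {
            # Upper bound on tokens the model may spend reasoning
            # before it starts its visible answer.
            "thinkingConfig": {"thinkingBudget": thinking_budget},
        },
    }


# Hypothetical prompt and budget value.
payload = build_request("Summarize the attached report.", thinking_budget=1024)
print(json.dumps(payload, indent=2))
```

A smaller budget trades reasoning depth for lower latency and cost, which is the point of exposing the setting at the Flash tier.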

Sources