The Open-Source AI Stack
RSS
All models

Models · gemini-3

Gemini 3 Flash

Proprietary Google DeepMind · 2025-12-17 · Proprietary

Speed tier of the Gemini 3 family, released December 17 2025. Google reported 3x the speed of Gemini 2.5 Pro while outperforming it on coding and reasoning, with 30 percent fewer tokens on average for everyday tasks. Became the default in the Gemini app and rolled into Search AI Mode at launch.

Cost

$0.50 / Mtok input
$3.00 / Mtok output

Google Gemini API · as of 2026-05-21

source ↗

Speed

185.9 tok/sec output
1022 ms TTFT

· as of 2026-05-21

source ↗

Architecture

tokens in Embedding vocab not disclosed × N layers Architecture not disclosed (proprietary or undocumented) Output projection tokens out
Schema-generated from data/models.yaml. Every label is auditable against the model's sources.

Specs

Architecture
unknown
Total params
not disclosed
Active params
not disclosed
Context window
1.0M tokens
Attention
unknown
Position encoding
unknown
Post-training
rlhf
OSI-approved
no
Data released
no
Training code
not released

Benchmarks

Each score carries the date it was published; we never infer or interpolate missing scores.

General reasoning

MMLU-Pro 88.2 as of 2026-05-21 source ↗
GPQA-Diamond 90.4 as of 2025-12-17 source ↗

Code

SWE-Bench Verified 78.0 as of 2025-12-17 source ↗
LiveCodeBench 79.7 as of 2026-05-21 source ↗

Math

AIME 2025 55.7 as of 2026-05-21 source ↗

Available quantizations

None. The weights are not distributed, so there are no public quantizations.

Notable innovations

  • · Pro-grade reasoning at Flash latency
  • · 30 percent token reduction on average tasks
  • · Day-one default in Gemini app and Search AI Mode

Lineage

Speed-optimized Gemini 3 sibling at one-tenth Pro pricing.

Sources