
Gemini 1.5 Pro vs StarCoder 2 15B

Rows highlighted in warm gray are where the models differ. Numbers carry their as-of date and primary source.

Specs

Field	A: Gemini 1.5 Pro	B: StarCoder 2 15B
Released	2024-02-15	2024-02-28
Developer	Google DeepMind	BigCode
Openness	Proprietary	Source-available
License	Proprietary	BigCode OpenRAIL-M v1
OSI-approved	no	no
Data released	no	yes
Training code	no	yes
Architecture	moe	dense
Total params	—	15B
Active params	—	—
Experts	—	—
Context window	2.1M	16K
Attention	unknown	hybrid-gqa-sliding
Position enc.	unknown	rope
Pretraining tokens	—	4.0T
Post-training	rlhf	—
Training hardware	—	H100
$/M input	—	—
$/M output	—	—
Output tok/sec	—	—

Benchmarks

Missing scores are shown as not reported (—) and are never inferred. Bold highlights the leader per benchmark.

General reasoning

MMLU-Pro	75.0 (as of 2026-05-21)	—
GPQA-Diamond	58.9 (as of 2026-05-21)	—

Code

HumanEval	—	46.3 (as of 2024-02-29)
LiveCodeBench	31.6 (as of 2026-05-21)	—

Math

MATH	87.6 (as of 2026-05-21)	—
AIME 2024	23.0 (as of 2026-05-21)	—
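
The display rule stated for these benchmark tables (missing scores shown as not reported, one leader highlighted per benchmark) can be sketched roughly as follows. This is only an illustration: the names `render_cell` and `leader` are hypothetical, and it assumes that higher scores are better and that a leader is named only when both models report a score, which the page does not spell out.

```python
from typing import Optional

NOT_REPORTED = "—"  # marker the tables use for a missing score


def render_cell(score: Optional[float]) -> str:
    """Format one benchmark cell; a missing score is never filled in."""
    return NOT_REPORTED if score is None else f"{score:.1f}"


def leader(score_a: Optional[float], score_b: Optional[float]) -> Optional[str]:
    """Return "A" or "B" for the higher score, or None when no leader applies.

    Assumption: with only one reported score (or a tie) nothing is highlighted.
    """
    if score_a is None or score_b is None:
        return None
    if score_a == score_b:
        return None
    return "A" if score_a > score_b else "B"


if __name__ == "__main__":
    # HumanEval row from the table: A not reported, B = 46.3
    print(render_cell(None), render_cell(46.3), leader(None, 46.3))
```

Under this assumption none of the rows above would name a leader, since no benchmark has a reported score for both models.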

Context · A

Gemini 1.5 Pro was Google's first long-context Gemini checkpoint, introduced with a 128K standard window and a 1M-token preview tier. Google described the design as a mixture-of-experts model that activates only a subset of expert networks per input, and at launch reported 99% recall on needle-in-a-haystack retrieval across 1M tokens. The context window was later extended to 2M tokens in a private preview announced on May 14, 2024.
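
To make "activates a subset of expert networks per input" concrete, here is a minimal sketch of generic top-k expert routing. It is not Gemini's routing scheme, which Google has not published in detail. The function names, the scalar toy experts, and the choice of k = 2 are all assumptions made for illustration.

```python
import math
from typing import Callable, List, Sequence, Tuple


def top_k_route(gate_logits: Sequence[float], k: int) -> List[Tuple[int, float]]:
    """Pick the k highest-scoring experts and softmax-normalize their weights."""
    top = sorted(range(len(gate_logits)), key=lambda i: gate_logits[i], reverse=True)[:k]
    peak = max(gate_logits[i] for i in top)  # subtract the max for numerical stability
    exps = [math.exp(gate_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def moe_forward(
    x: float,
    experts: Sequence[Callable[[float], float]],
    gate_logits: Sequence[float],
    k: int = 2,
) -> float:
    """Run only the selected experts and mix their outputs by routing weight."""
    return sum(weight * experts[i](x) for i, weight in top_k_route(gate_logits, k))


if __name__ == "__main__":
    # Four toy experts; only two of them are evaluated for this input.
    toy_experts = [lambda v: v, lambda v: 2 * v, lambda v: 3 * v, lambda v: 4 * v]
    print(top_k_route([0.0, 2.0, 1.0, -1.0], k=2))
    print(moe_forward(1.0, toy_experts, [0.0, 2.0, 1.0, -1.0], k=2))
```

The point of the design is that compute per input scales with k rather than with the total number of experts, which is why the table can list total and active parameters separately.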

Context · B

BigCode's StarCoder 2 15B was trained on 4T+ tokens from The Stack v2, a publicly released code dataset that spans 600+ languages and is restricted to permissive licenses. Combining sliding-window attention with grouped-query attention gave it a 16K context window at the 15B scale. The data, the training code, and a search index for attribution were all released alongside the weights.
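
The two attention techniques named above can be sketched in a few lines. This is an illustration of the general ideas only: the window size and head counts are made-up values, not StarCoder 2's actual configuration, and the function names are hypothetical.

```python
from typing import List


def sliding_window_mask(seq_len: int, window: int) -> List[List[bool]]:
    """mask[q][k] is True when query position q may attend to key position k.

    Attention is causal (k <= q) and limited to the most recent `window` positions.
    """
    return [
        [(k <= q) and (q - k < window) for k in range(seq_len)]
        for q in range(seq_len)
    ]


def kv_head_for(query_head: int, n_query_heads: int, n_kv_heads: int) -> int:
    """Grouped-query attention: consecutive query heads share one key/value head."""
    group_size = n_query_heads // n_kv_heads
    return query_head // group_size


if __name__ == "__main__":
    for row in sliding_window_mask(seq_len=5, window=2):
        print("".join("x" if allowed else "." for allowed in row))
    # With 8 query heads and 2 KV heads, heads 0-3 share KV head 0 and 4-7 share KV head 1.
    print([kv_head_for(h, 8, 2) for h in range(8)])
```

The window bounds how many keys each query attends to, and sharing key/value heads shrinks the key/value cache, which is how the two techniques reduce the cost of longer contexts.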
