StarCoder 2 15B vs Claude 3 Opus

Numbers carry their as-of date and primary source.

Specs

Field	A: StarCoder 2 15B	B: Claude 3 Opus
Released	2024-02-28	2024-03-04
Developer	BigCode	Anthropic
Openness	Source-available	Proprietary
License	BigCode OpenRAIL-M v1	Proprietary
OSI-approved	no	no
Data released	yes	no
Training code	yes	no
Architecture	dense	unknown
Total params	15B	—
Active params	—	—
Experts	—	—
Context window	16K	200K
Attention	hybrid-gqa-sliding	unknown
Position enc.	rope	unknown
Pretraining tokens	4.0T	—
Post-training	—	rlhf, constitutional
Training hardware	H100	—
$/M input	—	$15.00
$/M output	—	$75.00
Output tok/sec	—	—

Benchmarks

A dash marks a score that was not reported; missing scores are never inferred. Each reported score is followed by its as-of date.

General reasoning

Benchmark	A: StarCoder 2 15B	B: Claude 3 Opus
MMLU-Pro	—	69.6 (as of 2026-05-21)
GPQA-Diamond	—	48.9 (as of 2026-05-21)

Code

Benchmark	A: StarCoder 2 15B	B: Claude 3 Opus
HumanEval	46.3 (as of 2024-02-29)	—
LiveCodeBench	—	27.9 (as of 2026-05-21)

Math

Benchmark	A: StarCoder 2 15B	B: Claude 3 Opus
MATH	—	64.1 (as of 2026-05-21)
AIME 2024	—	3.3 (as of 2026-05-21)

Context · A

BigCode's StarCoder 2 15B was trained on 4T+ tokens drawn from The Stack v2, a publicly released code dataset that spans 600+ programming languages and is limited to permissively licensed sources. Combining sliding-window attention with grouped-query attention gave it a 16K context window at the 15B scale. The data, the training code, and a search index for attribution were all released alongside the weights.
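
To make the two attention mechanisms named above concrete, here is a minimal Python sketch. It is an illustration only, not BigCode's implementation: the function names, the window size, and the head counts are hypothetical values chosen for the example and do not come from this page.

```python
def sliding_window_causal_mask(seq_len: int, window: int) -> list[list[bool]]:
    """Return mask[q][k], True when query position q may attend to key position k.

    A key is visible only if it is not in the future (k <= q) and lies within
    the last `window` positions (q - k < window).
    """
    if seq_len < 0 or window < 1:
        raise ValueError("seq_len must be >= 0 and window must be >= 1")
    return [
        [0 <= q - k < window for k in range(seq_len)]
        for q in range(seq_len)
    ]


def kv_head_for_query_head(query_head: int, n_query_heads: int, n_kv_heads: int) -> int:
    """Grouped-query attention: map a query head to the key/value head its group shares."""
    if n_kv_heads < 1 or n_query_heads % n_kv_heads != 0:
        raise ValueError("n_query_heads must be a positive multiple of n_kv_heads")
    if not 0 <= query_head < n_query_heads:
        raise ValueError("query_head out of range")
    group_size = n_query_heads // n_kv_heads
    return query_head // group_size


if __name__ == "__main__":
    # Hypothetical toy sizes: 5 tokens, a window of 2, 8 query heads sharing 2 KV heads.
    for row in sliding_window_causal_mask(seq_len=5, window=2):
        print("".join("x" if visible else "." for visible in row))
    print([kv_head_for_query_head(h, 8, 2) for h in range(8)])
```

The mask bounds how many keys each query attends to, and sharing key/value heads across query heads shrinks the key/value cache. Both reduce the memory cost of a longer context, which is the trade-off the paragraph above refers to.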

Context · B

Claude 3 Opus was the flagship of the Claude 3 family at launch, marketed as Anthropic's most capable model, with vision input and a 200K context window. It set the price ceiling for closed frontier chat models at $15 input / $75 output per million tokens, above the $10 input / $30 output that GPT-4 Turbo charged at the time.
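
As a rough illustration of what the listed prices mean per request, the sketch below computes cost from token counts. The rates are the $/M figures from the Specs table; the function name and the example token counts are assumptions made for this example, and real billing may differ (for instance, through discounts or rounding).

```python
from decimal import Decimal

# Rates from the Specs table above, in USD per million tokens.
INPUT_USD_PER_M = Decimal("15.00")
OUTPUT_USD_PER_M = Decimal("75.00")


def request_cost_usd(
    input_tokens: int,
    output_tokens: int,
    input_rate: Decimal = INPUT_USD_PER_M,
    output_rate: Decimal = OUTPUT_USD_PER_M,
) -> Decimal:
    """Return the list-price cost in USD of one request, given per-million-token rates."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    million = Decimal(1_000_000)
    return (input_tokens * input_rate + output_tokens * output_rate) / million


if __name__ == "__main__":
    # Hypothetical request: a full 200K-token prompt and a 1K-token reply.
    print(request_cost_usd(200_000, 1_000))
```

Under these assumptions, filling the whole 200K context costs $3.00 in input alone, so the prompt dominates the bill unless the reply is very long.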
