The Open-Source AI Stack

StarCoder 2 15B vs Claude 3 Opus

Rows highlighted in warm gray are where the models differ. Numbers carry their as-of date and primary source.

Specs

Field | A: StarCoder 2 15B | B: Claude 3 Opus
Released | 2024-02-28 | 2024-03-04
Developer | BigCode | Anthropic
Openness | Source-available | Proprietary
License | BigCode OpenRAIL-M v1 | Proprietary
OSI-approved | no | no
Data released | yes | no
Training code | yes | no
Architecture | dense | unknown
Total params | 15B |
Active params | |
Experts | |
Context window | 16K | 200K
Attention | hybrid-gqa-sliding | unknown
Position enc. | rope | unknown
Pretraining tokens | 4.0T |
Post-training | | rlhf, constitutional
Training hardware | H100 |
$/M input | | $15.00
$/M output | | $75.00
Output tok/sec 0

Benchmarks

Missing scores render as not reported; never inferred. Bold highlights the leader per benchmark.

General reasoning

MMLU-Pro 69.6 2026-05-21
GPQA-Diamond 48.9 2026-05-21

Code

HumanEval 46.3 2024-02-29
LiveCodeBench 27.9 2026-05-21

Math

MATH 64.1 2026-05-21
AIME 2024 3.3 2026-05-21

Context · A

BigCode's StarCoder 2 15B was trained on 4T+ tokens from The Stack v2, a publicly released code dataset spanning 600+ programming languages and limited to permissively licensed code. Combining sliding-window attention with grouped-query attention gave it a 16K context window at the 15B scale. The data, the training code, and a search index for attribution were all released alongside the weights.
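To make the two attention ideas named above concrete, here is a minimal sketch in plain Python. It is not BigCode's implementation: the function names, the toy sequence length, the window size, and the head counts are all assumptions chosen for readability, not StarCoder 2's actual configuration. The first function builds a causal sliding-window mask, and the second shows how grouped-query attention maps several query heads onto one shared key/value head.

```python
def sliding_window_causal_mask(seq_len: int, window: int) -> list[list[bool]]:
    """mask[i][j] is True when query position i may attend to key position j.

    A position sees itself and at most the (window - 1) positions before it,
    and never any later position.
    """
    return [
        [0 <= i - j < window for j in range(seq_len)]
        for i in range(seq_len)
    ]


def kv_head_for_query_head(q_head: int, n_q_heads: int, n_kv_heads: int) -> int:
    """Grouped-query attention: consecutive query heads share one key/value head."""
    if n_q_heads % n_kv_heads != 0:
        raise ValueError("n_q_heads must be a multiple of n_kv_heads")
    group_size = n_q_heads // n_kv_heads
    return q_head // group_size


# Toy values (assumptions for illustration only).
mask = sliding_window_causal_mask(seq_len=6, window=3)
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))

# Which key/value head each of 8 query heads reads from when there are 2 KV heads.
print([kv_head_for_query_head(h, n_q_heads=8, n_kv_heads=2) for h in range(8)])
```

The mask keeps per-token attention cost bounded by the window rather than the full sequence, while sharing key/value heads shrinks the key/value cache; both reduce the cost of serving longer contexts.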

Context · B

The flagship of the Claude 3 family at launch, marketed as Anthropic's most capable model, with vision input and a 200K context window. It set the price ceiling for closed frontier chat at $15 input / $75 output per million tokens, above the $10 input / $30 output that GPT-4 Turbo charged at the time.
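As a rough illustration of what the per-million-token prices above mean for a single request, the sketch below computes a cost from token counts. The function name and the example token counts are hypothetical; only the $15 and $75 rates come from this page, and real billing may differ (for example through caching or batch discounts).

```python
from decimal import Decimal

# List prices quoted on this page, in USD per million tokens.
INPUT_USD_PER_MTOK = Decimal("15.00")
OUTPUT_USD_PER_MTOK = Decimal("75.00")


def request_cost_usd(input_tokens: int, output_tokens: int) -> Decimal:
    """Cost of one request at the list prices above (hypothetical helper)."""
    million = Decimal(1_000_000)
    return (
        INPUT_USD_PER_MTOK * input_tokens + OUTPUT_USD_PER_MTOK * output_tokens
    ) / million


# Example: a prompt that fills the 200K context window, with a 1,000-token reply.
cost = request_cost_usd(input_tokens=200_000, output_tokens=1_000)
print(f"${cost}")
```

Here the input side contributes 200,000 × $15 / 1,000,000 = $3.00 and the output side 1,000 × $75 / 1,000,000 = $0.075, so a single full-context request costs a little over three dollars before any reply of substantial length.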
