StarCoder 2 15B vs Claude 3 Opus
Rows highlighted in warm gray are where the models differ. Numbers carry their as-of date and primary source.
Specs
| Field | A: StarCoder 2 15B | B: Claude 3 Opus |
|---|---|---|
| Released | 2024-02-28 | 2024-03-04 |
| Developer | BigCode | Anthropic |
| Openness | Source-available | Proprietary |
| License | BigCode OpenRAIL-M v1 | Proprietary |
| OSI-approved | no | no |
| Data released | yes | no |
| Training code | yes | no |
| Architecture | dense | unknown |
| Total params | 15B | — |
| Active params | — | — |
| Experts | — | — |
| Context window | 16K | 200K |
| Attention | hybrid-gqa-sliding | unknown |
| Position enc. | rope | unknown |
| Pretraining tokens | 4.0T | — |
| Post-training | — | rlhf, constitutional |
| Training hardware | H100 | — |
| $/M input | — | $15.00 |
| $/M output | — | $75.00 |
| Output tok/sec | — | — |
Benchmarks
Missing scores are shown as "—" (not reported) and are never inferred. Each reported score is followed by its as-of date. Bold highlights the leader per benchmark.
General reasoning
| Benchmark | A: StarCoder 2 15B | B: Claude 3 Opus |
|---|---|---|
| MMLU-Pro | — | 69.6 (2026-05-21) |
| GPQA-Diamond | — | 48.9 (2026-05-21) |
Code
| Benchmark | A: StarCoder 2 15B | B: Claude 3 Opus |
|---|---|---|
| HumanEval | 46.3 (2024-02-29) | — |
| LiveCodeBench | — | 27.9 (2026-05-21) |
Math
| Benchmark | A: StarCoder 2 15B | B: Claude 3 Opus |
|---|---|---|
| MATH | — | 64.1 (2026-05-21) |
| AIME 2024 | — | 3.3 (2026-05-21) |
Context · A
BigCode's StarCoder 2 15B was trained on 4T+ tokens of The Stack v2, a publicly released code dataset covering 600+ programming languages and limited to permissively licensed sources. Combining sliding-window attention with grouped-query attention gave it a 16K context window at the 15B scale. The training data, training code, and a search index for attribution were all released alongside the weights.
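To make the two attention techniques named above concrete, here is a minimal sketch in plain Python. It is not StarCoder 2's implementation: the function names, the toy sequence length, the window size, and the head counts are all illustrative assumptions. The first function builds a causal sliding-window mask, where each position attends only to itself and a fixed number of preceding positions. The second shows how grouped-query attention lets several query heads share one key/value head.

```python
def sliding_window_causal_mask(seq_len: int, window: int) -> list[list[bool]]:
    """Return mask[i][j] == True when query position i may attend to key position j.

    Causal: j must not be in the future (j <= i).
    Sliding window: j must be within the last `window` positions (i - j < window).
    """
    if seq_len < 0 or window < 1:
        raise ValueError("seq_len must be >= 0 and window must be >= 1")
    return [
        [(j <= i) and (i - j < window) for j in range(seq_len)]
        for i in range(seq_len)
    ]


def query_to_kv_head(num_query_heads: int, num_kv_heads: int) -> list[int]:
    """Map each query head to the key/value head it shares under grouped-query attention."""
    if num_kv_heads < 1 or num_query_heads % num_kv_heads != 0:
        raise ValueError("num_query_heads must be a positive multiple of num_kv_heads")
    group_size = num_query_heads // num_kv_heads
    return [head // group_size for head in range(num_query_heads)]


if __name__ == "__main__":
    # Toy sizes chosen for readability (assumptions, not the model's real configuration).
    for row in sliding_window_causal_mask(seq_len=6, window=3):
        print("".join("x" if allowed else "." for allowed in row))
    print(query_to_kv_head(num_query_heads=8, num_kv_heads=2))
```

With a window, attention work per token is bounded by the window size rather than the full sequence length. Sharing key/value heads shrinks the key/value cache by the group size. Both effects make longer contexts cheaper to serve.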
Context · B
Claude 3 Opus was the flagship of the Claude 3 family at launch, marketed as Anthropic's most capable model, with vision input and a 200K context window. At $15 input / $75 output per million tokens it sat at the top end of closed frontier chat pricing, above GPT-4 Turbo's list price of $10 input / $30 output per million tokens at the time.
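The per-million-token rates quoted above translate into a per-request cost with simple arithmetic. The sketch below assumes only the $15 / $75 list prices from this page. The function name and the example token counts are hypothetical, and real bills may differ (for example through batching or caching discounts, which are not modeled here).

```python
from decimal import Decimal

# List prices from the spec table above, in USD per one million tokens.
INPUT_USD_PER_M = Decimal("15.00")
OUTPUT_USD_PER_M = Decimal("75.00")
TOKENS_PER_MILLION = Decimal(1_000_000)


def request_cost_usd(
    input_tokens: int,
    output_tokens: int,
    input_rate: Decimal = INPUT_USD_PER_M,
    output_rate: Decimal = OUTPUT_USD_PER_M,
) -> Decimal:
    """Cost of one request: each token count times its rate, scaled from per-million pricing."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = Decimal(input_tokens) * input_rate + Decimal(output_tokens) * output_rate
    return total / TOKENS_PER_MILLION


if __name__ == "__main__":
    # Hypothetical request: a 150,000-token prompt with a 2,000-token reply.
    print(request_cost_usd(150_000, 2_000))
```

For that hypothetical request the input side costs $2.25 and the output side $0.15, so a long prompt dominates the bill even though output tokens are five times more expensive each.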