Claude 3 Opus vs StarCoder 2 15B
Rows highlighted in warm gray are where the models differ. Numbers carry their as-of date and primary source.
Specs
| Field | A: Claude 3 Opus | B: StarCoder 2 15B |
|---|---|---|
| Released | 2024-03-04 | 2024-02-28 |
| Developer | Anthropic | BigCode |
| Openness | Proprietary | Source-available |
| License | Proprietary | BigCode OpenRAIL-M v1 |
| OSI-approved | no | no |
| Data released | no | yes |
| Training code | no | yes |
| Architecture | unknown | dense |
| Total params | — | 15B |
| Active params | — | — |
| Experts | — | — |
| Context window | 200K | 16K |
| Attention | unknown | GQA + sliding window |
| Position enc. | unknown | RoPE |
| Pretraining tokens | — | 4.0T |
| Post-training | RLHF, Constitutional AI | — |
| Training hardware | — | H100 |
| $/M input | $15.00 | — |
| $/M output | $75.00 | — |
| Output tok/sec | — | — |
Benchmarks
Missing scores render as not reported; never inferred. Bold highlights the leader per benchmark.
General reasoning
| Benchmark | A: Claude 3 Opus | B: StarCoder 2 15B |
|---|---|---|
| MMLU-Pro | 69.6 (as of 2026-05-21) | — |
| GPQA-Diamond | 48.9 (as of 2026-05-21) | — |
Code
| Benchmark | A: Claude 3 Opus | B: StarCoder 2 15B |
|---|---|---|
| HumanEval | — | 46.3 (as of 2024-02-29) |
| LiveCodeBench | 27.9 (as of 2026-05-21) | — |
Math
| Benchmark | A: Claude 3 Opus | B: StarCoder 2 15B |
|---|---|---|
| MATH | 64.1 (as of 2026-05-21) | — |
| AIME 2024 | 3.3 (as of 2026-05-21) | — |
Context · A
Claude 3 Opus was the flagship of the Claude 3 family at launch, marketed as Anthropic's most capable model, with vision input and a 200K context window. It set the price ceiling for closed frontier chat models at $15 input / $75 output per million tokens, well above GPT-4 Turbo's $10 / $30 at the time.
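To make the per-million-token pricing concrete, here is a minimal sketch of the arithmetic for one request. The prices come from the Specs table; the function name `request_cost_usd` and the token counts in the usage line are hypothetical, chosen only for illustration.

```python
from decimal import Decimal

# Prices from the Specs table, in USD per million tokens.
INPUT_USD_PER_M = Decimal("15.00")
OUTPUT_USD_PER_M = Decimal("75.00")
TOKENS_PER_M = Decimal(1_000_000)


def request_cost_usd(input_tokens: int, output_tokens: int) -> Decimal:
    """Return the cost in USD of one request at the listed prices.

    Hypothetical helper: it ignores any discounts, caching, or batch
    pricing, none of which this page reports.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (
        Decimal(input_tokens) * INPUT_USD_PER_M
        + Decimal(output_tokens) * OUTPUT_USD_PER_M
    )
    return total / TOKENS_PER_M


# Illustrative request: 10,000 input tokens and 1,000 output tokens.
print(request_cost_usd(10_000, 1_000))
```

At these prices an output token costs five times as much as an input token, so for this illustrative request the 1,000 output tokens account for a third of the $0.225 total.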
Context · B
BigCode's StarCoder 2 15B was trained on 4T+ tokens drawn from The Stack v2, a publicly released code dataset that spans 600+ programming languages and is restricted to permissively licensed sources. Combining sliding-window attention with grouped-query attention gave it a 16K context window at the 15B scale. The data, training code, and a search index for attribution were all released alongside the weights.
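The two attention techniques named above can be sketched in a few lines. This is a generic illustration, not StarCoder 2's implementation: the function names, the window size, and the head counts are hypothetical, and real implementations differ on details such as whether the window includes the current token.

```python
def sliding_window_causal_mask(seq_len: int, window: int) -> list[list[bool]]:
    """Build a causal sliding-window mask.

    mask[q][k] is True when query position q may attend to key position k.
    Assumed convention: each query sees itself plus the previous
    window - 1 positions, and never any future position.
    """
    if seq_len < 0 or window < 1:
        raise ValueError("seq_len must be >= 0 and window must be >= 1")
    return [
        [0 <= q - k < window for k in range(seq_len)]
        for q in range(seq_len)
    ]


def kv_head_for_query_head(q_head: int, n_q_heads: int, n_kv_heads: int) -> int:
    """Map a query head to the key/value head it shares under GQA.

    Consecutive query heads are grouped, and each group reads from one
    key/value head, which shrinks the key/value cache.
    """
    if n_q_heads % n_kv_heads != 0:
        raise ValueError("n_q_heads must be a multiple of n_kv_heads")
    if not 0 <= q_head < n_q_heads:
        raise ValueError("q_head out of range")
    group_size = n_q_heads // n_kv_heads
    return q_head // group_size


# Toy sizes for illustration only.
mask = sliding_window_causal_mask(seq_len=5, window=3)
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))

print([kv_head_for_query_head(h, n_q_heads=8, n_kv_heads=2) for h in range(8)])
```

Both techniques cut the memory cost of long sequences: the window bounds how many past positions each token attends to, and head sharing reduces how many key/value tensors must be cached.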