Models · Compare
Claude 3 Haiku vs StarCoder 2 15B
Rows highlighted in warm gray are where the models differ. Numbers carry their as-of date and primary source.
Specs
| Field | A: Claude 3 Haiku | B: StarCoder 2 15B |
|---|---|---|
| Released | 2024-03-13 | 2024-02-28 |
| Developer | Anthropic | BigCode |
| Openness | Proprietary | Source-available |
| License | Proprietary | BigCode OpenRAIL-M v1 |
| OSI-approved | no | no |
| Data released | no | yes |
| Training code | no | yes |
| Architecture | unknown | dense |
| Total params | — | 15B |
| Active params | — | — |
| Experts | — | — |
| Context window | 200K | 16K |
| Attention | unknown | hybrid-gqa-sliding |
| Position enc. | unknown | rope |
| Pretraining tokens | — | 4.0T |
| Post-training | rlhf, constitutional | — |
| Training hardware | — | H100 |
| $/M input | $0.25 | — |
| $/M output | $1.25 | — |
| Output tok/sec | 0 | — |
Benchmarks
Missing scores render as not reported; never inferred. Bold highlights the leader per benchmark.
General reasoning
| GPQA-Diamond | 37.4 2026-05-21 | — |
Code
| HumanEval | — | 46.3 2024-02-29 |
| LiveCodeBench | 15.4 2026-05-21 | — |
Math
| MATH | 39.4 2026-05-21 | — |
| AIME 2024 | 1.0 2026-05-21 | — |
Context · A
The fast, low-cost tier of the Claude 3 family. Anthropic claimed throughput of roughly 21K tokens per second on prompts under 32K tokens at launch, and the $0.25 input / $1.25 output price point made it competitive with GPT-3.5-class small models while carrying the 200K context and vision capabilities of its larger siblings.
Context · B
BigCode's StarCoder 2 15B trained on 4T+ tokens of The Stack v2, a publicly released code dataset spanning 600+ languages and permissive licenses only. Sliding-window attention plus grouped-query attention gave it 16K context at the 15B scale. The accompanying data, training code, and search index for attribution were all released alongside the weights.