The Open-Source AI Stack
RSS
All models

Models · Compare

Claude 3 Haiku vs StarCoder 2 15B

Rows highlighted in warm gray are where the models differ. Numbers carry their as-of date and primary source.

Specs

Field A: Claude 3 Haiku B: StarCoder 2 15B
Released 2024-03-132024-02-28
Developer AnthropicBigCode
Openness ProprietarySource-available
License ProprietaryBigCode OpenRAIL-M v1
OSI-approved nono
Data released noyes
Training code noyes
Architecture unknowndense
Total params 15B
Active params
Experts
Context window 200K16K
Attention unknownhybrid-gqa-sliding
Position enc. unknownrope
Pretraining tokens 4.0T
Post-training rlhf, constitutional
Training hardware H100
$/M input $0.25
$/M output $1.25
Output tok/sec 0

Benchmarks

Missing scores render as not reported; never inferred. Bold highlights the leader per benchmark.

General reasoning

GPQA-Diamond 37.4 2026-05-21

Code

HumanEval 46.3 2024-02-29
LiveCodeBench 15.4 2026-05-21

Math

MATH 39.4 2026-05-21
AIME 2024 1.0 2026-05-21

Context · A

The fast, low-cost tier of the Claude 3 family. Anthropic claimed throughput of roughly 21K tokens per second on prompts under 32K tokens at launch, and the $0.25 input / $1.25 output price point made it competitive with GPT-3.5-class small models while carrying the 200K context and vision capabilities of its larger siblings.

Context · B

BigCode's StarCoder 2 15B trained on 4T+ tokens of The Stack v2, a publicly released code dataset spanning 600+ languages and permissive licenses only. Sliding-window attention plus grouped-query attention gave it 16K context at the 15B scale. The accompanying data, training code, and search index for attribution were all released alongside the weights.

Claude 3 Haiku detail → · StarCoder 2 15B detail →