The Open-Source AI Stack

StarCoder 2 15B

Source-available · BigCode · 2024-02-28 · BigCode OpenRAIL-M v1

BigCode's StarCoder 2 15B was trained on 4T+ tokens from The Stack v2, a publicly released code dataset that spans 600+ programming languages and includes only permissively licensed code. It pairs sliding-window attention with grouped-query attention, which keeps a 16K-token context window affordable at the 15B scale. The training data, the training code, and a search index for attributing generated code back to the pretraining data were all released alongside the weights.
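The paragraph above pairs two attention mechanisms. As a rough illustration of what each one does, here is a pure-Python sketch (not StarCoder 2's implementation): a causal mask limited to a sliding window, and the mapping from query heads to shared key/value heads in grouped-query attention. The window size and head counts are hypothetical illustration values; this page does not list the model's actual ones.

```python
# Illustrative sketch only: window size and head counts are hypothetical,
# not StarCoder 2's real configuration.


def sliding_window_causal_mask(seq_len: int, window: int) -> list[list[bool]]:
    """mask[i][j] is True when query position i may attend to key position j.

    Causal: j <= i. Sliding window: only the most recent `window` positions
    (including i itself) are visible, i.e. i - window < j <= i.
    """
    return [[(i - window < j <= i) for j in range(seq_len)] for i in range(seq_len)]


def gqa_kv_head(query_head: int, n_query_heads: int, n_kv_heads: int) -> int:
    """Return the key/value head that `query_head` reads under grouped-query attention.

    Query heads are split into n_kv_heads contiguous groups; every head in a group
    shares one K/V projection, so the KV cache shrinks by n_query_heads / n_kv_heads.
    """
    if n_query_heads % n_kv_heads != 0:
        raise ValueError("n_query_heads must be a multiple of n_kv_heads")
    group_size = n_query_heads // n_kv_heads
    return query_head // group_size


if __name__ == "__main__":
    # Visualize a 6-token sequence with a 3-token window ("x" = visible).
    for row in sliding_window_causal_mask(seq_len=6, window=3):
        print("".join("x" if visible else "." for visible in row))
    # 8 query heads sharing 2 KV heads (hypothetical sizes).
    print([gqa_kv_head(h, n_query_heads=8, n_kv_heads=2) for h in range(8)])
```

Each mask row shows which earlier positions a token sees directly within one layer; stacking layers still lets information travel further than a single window, because each layer's outputs already summarize their own window.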

Architecture

Architecture diagram (summary): tokens in → Embedding (vocab not disclosed · starcoder2 tokenizer) → N × decoder layer [GQA + sliding-window attention (interleaved) · RoPE · context 16,384 tokens · dense MLP with SwiGLU activation (standard)] → Output projection → tokens out. 15B active params.
Schema-generated from data/models.yaml. Every label is auditable against the model's sources.
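The diagram lists RoPE as the position encoding. A minimal sketch of the idea, assuming the interleaved-pair convention and the common base of 10000 (this page does not state StarCoder 2's RoPE base): each pair of coordinates in a query or key vector is rotated by an angle proportional to the token's position, so the query/key dot product depends only on their relative offset.

```python
import math


def apply_rope(vec: list[float], position: int, base: float = 10000.0) -> list[float]:
    """Rotate each pair (vec[2k], vec[2k+1]) by position * base**(-2k/d).

    Interleaved-pair convention; some implementations rotate the two halves of
    the vector instead, which has the same relative-position property.
    `base` is an assumed default, not a value taken from this page.
    """
    d = len(vec)
    if d % 2 != 0:
        raise ValueError("RoPE needs an even head dimension")
    out: list[float] = []
    for k in range(d // 2):
        theta = position * base ** (-2 * k / d)
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        x, y = vec[2 * k], vec[2 * k + 1]
        out.extend([x * cos_t - y * sin_t, x * sin_t + y * cos_t])
    return out


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


if __name__ == "__main__":
    q = [0.3, -1.2, 0.8, 0.5]  # toy query vector (hypothetical values)
    k = [1.0, 0.4, -0.7, 0.2]  # toy key vector (hypothetical values)
    # Same offset (query 2 positions after key) at two absolute positions:
    print(dot(apply_rope(q, 5), apply_rope(k, 3)))
    print(dot(apply_rope(q, 105), apply_rope(k, 103)))
```

The two printed scores agree up to floating-point rounding, which is the property that lets attention scores reflect relative rather than absolute position.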

Specs

Architecture: dense
Total params: 15B
Active params: 15B
Context window: 16K tokens
Attention: hybrid (GQA + sliding window)
Position encoding: RoPE
Pretraining tokens: 4.0T
Training hardware: H100
OSI-approved: no
Data released: yes
Training code: released
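To see why GQA and a sliding window matter for a 16K context, a back-of-the-envelope KV-cache estimate helps. The layer count, head counts, head dimension and window size below are hypothetical illustration values (the spec list does not give them); check the model's own configuration for real numbers. The sketch assumes a bf16 cache and one sequence, and that windowed layers keep only the last `window` tokens.

```python
from typing import Optional


def kv_cache_bytes(
    n_layers: int,
    n_kv_heads: int,
    head_dim: int,
    context: int,
    window: Optional[int] = None,
    n_window_layers: int = 0,
    bytes_per_value: int = 2,  # bf16 / fp16
) -> int:
    """Per-sequence KV-cache size in bytes.

    The factor 2 covers keys and values. Layers using a sliding window only need
    the last `window` tokens; full-attention layers keep the whole context.
    """
    per_token_per_layer = 2 * n_kv_heads * head_dim * bytes_per_value
    windowed_tokens = context if window is None else min(context, window)
    full_layers = n_layers - n_window_layers
    return per_token_per_layer * (full_layers * context + n_window_layers * windowed_tokens)


GIB = 2**30
# Hypothetical configuration, for illustration only.
LAYERS, Q_HEADS, KV_HEADS, HEAD_DIM = 40, 48, 4, 128
CONTEXT, WINDOW = 16_384, 4_096

mha = kv_cache_bytes(LAYERS, Q_HEADS, HEAD_DIM, CONTEXT)  # one KV head per query head
gqa = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, CONTEXT)
gqa_swa = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, CONTEXT,
                         window=WINDOW, n_window_layers=LAYERS)

if __name__ == "__main__":
    for name, size in [("MHA", mha), ("GQA", gqa), ("GQA + window", gqa_swa)]:
        print(f"{name}: {size / GIB:.4g} GiB")  # → 15, 1.25, 0.3125 GiB
```

If windowed and full-attention layers are interleaved, set `n_window_layers` to the number of windowed layers; only those layers get the smaller cache.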

Benchmarks

Each score carries the date it was published; we never infer or interpolate missing scores.

Code

HumanEval 46.3 as of 2024-02-29 source ↗
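HumanEval scores are conventionally reported as pass@k, computed with the unbiased estimator introduced alongside the benchmark: generate n samples per problem, count the c that pass the unit tests, and estimate the chance that at least one of k samples passes. The page does not say how many samples or which decoding settings produced the 46.3, so this sketch shows the standard metric rather than the exact procedure behind that number.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: 1 - C(n - c, k) / C(n, k).

    n = samples generated, c = samples that passed, k = sample budget.
    """
    if n - c < k:
        return 1.0  # every size-k subset contains at least one passing sample
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_score(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems, as a percentage. results = [(n, c), ...]."""
    return 100.0 * sum(pass_at_k(n, c, k) for n, c in results) / len(results)


if __name__ == "__main__":
    # Hypothetical per-problem outcomes: (samples generated, samples passed).
    toy = [(10, 5), (10, 0), (10, 10)]
    print(benchmark_score(toy, k=1))
```

With a single greedy sample per problem (n = 1), pass@1 reduces to the fraction of problems solved.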

Recommended use cases

  • code completion and fill-in-the-middle (FIM)
  • research baseline with full data release
  • fine-tune starting point for code chat
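For the FIM use case, the base model is prompted with the code before and after a gap and asked to produce what goes in between. A minimal sketch of assembling such a prompt in prefix-suffix-middle (PSM) order; the sentinel strings are assumed to match the StarCoder-family tokenizer and should be confirmed against the tokenizer's vocabulary before use.

```python
# Sentinel strings assumed to match the StarCoder-family tokenizer;
# confirm they exist in the model's tokenizer vocabulary.
FIM_PREFIX = "<fim_prefix>"
FIM_SUFFIX = "<fim_suffix>"
FIM_MIDDLE = "<fim_middle>"


def build_fim_prompt(prefix: str, suffix: str) -> str:
    """PSM layout: the model continues after FIM_MIDDLE, and its output is the
    code that belongs between `prefix` and `suffix`."""
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"


def splice(prefix: str, middle: str, suffix: str) -> str:
    """Reassemble the file once the model has produced the middle span."""
    return prefix + middle + suffix


if __name__ == "__main__":
    before = "def mean(xs):\n    "
    after = "\n\nprint(mean([1, 2, 3]))\n"
    print(build_fim_prompt(before, after))
```

The prompt string is what gets tokenized and sent to the base model; whatever it generates is spliced between the prefix and suffix.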

Available quantizations

GGUF: llama.cpp's container format and the most common local format, with k-quants from Q2 to Q8. Runs on llama.cpp, Ollama.
AWQ: activation-aware 4-bit weight quantization for GPU serving. Runs on vLLM, SGLang.
MLX: Apple MLX 4/8-bit layout for Apple silicon. Runs on Apple MLX.
bitsandbytes: on-the-fly NF4 / INT8 weight quantization inside Transformers. Runs on Transformers.

Verified via the Hugging Face model tree ↗. Community quantizations change over time; the families shown are those with published weights at audit time.
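A quick way to compare these formats is the lower bound on weight storage: parameters × bits per weight / 8. The sketch uses the rounded 15B figure from the spec list and ignores overhead; real quantized files come out somewhat larger because block-wise formats store per-block scales and some tensors are often kept at higher precision, and runtime memory also needs room for the KV cache and activations.

```python
def weight_gb(n_params: float, bits_per_weight: float) -> float:
    """Lower-bound weight storage in GB (10**9 bytes): params x bits / 8."""
    return n_params * bits_per_weight / 8 / 1e9


N_PARAMS = 15e9  # rounded "15B" from the spec list

if __name__ == "__main__":
    for label, bits in [("16-bit (bf16/fp16)", 16), ("8-bit", 8), ("4-bit", 4), ("2-bit", 2)]:
        print(f"{label}: ~{weight_gb(N_PARAMS, bits):.1f} GB")
```

At 16-bit this gives about 30 GB, and at 4-bit about 7.5 GB, which is why 4-bit formats are the usual choice for running a 15B model on a single consumer GPU or laptop.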

Notable innovations

  • Full data release via The Stack v2
  • Pretraining attribution search index
  • Fill-in-the-middle objective applied at 4T-token scale
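The FIM objective listed above is typically implemented as a data transform: a training document is cut into prefix, middle and suffix, then reordered so the middle comes last, letting an ordinary left-to-right model learn infilling. A minimal sketch, assuming character-level cut points, PSM ordering, and StarCoder-style sentinel strings; the `fim_rate` default is a hypothetical value, since this page does not state the rate StarCoder 2 used.

```python
import random

# Sentinel strings assumed to match the StarCoder-family tokenizer.
FIM_PREFIX = "<fim_prefix>"
FIM_SUFFIX = "<fim_suffix>"
FIM_MIDDLE = "<fim_middle>"


def fim_transform(doc: str, rng: random.Random) -> str:
    """Cut `doc` at two random character offsets and emit it in PSM order,
    so the middle span becomes the last thing the model learns to predict."""
    i, j = sorted(rng.randint(0, len(doc)) for _ in range(2))
    prefix, middle, suffix = doc[:i], doc[i:j], doc[j:]
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}{middle}"


def maybe_fim(doc: str, rng: random.Random, fim_rate: float = 0.5) -> str:
    """Transform roughly `fim_rate` of documents; leave the rest as ordinary
    left-to-right text. fim_rate=0.5 is a hypothetical default."""
    return fim_transform(doc, rng) if rng.random() < fim_rate else doc


def undo_fim(sample: str) -> str:
    """Invert fim_transform (assumes `doc` never contains the sentinel strings)."""
    body = sample[len(FIM_PREFIX):]
    prefix, rest = body.split(FIM_SUFFIX, 1)
    suffix, middle = rest.split(FIM_MIDDLE, 1)
    return prefix + middle + suffix


if __name__ == "__main__":
    rng = random.Random(0)
    print(maybe_fim("def add(a, b):\n    return a + b\n", rng, fim_rate=1.0))
```

Because the transform only reorders text, the original document is always recoverable, which `undo_fim` makes explicit.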

Known limitations

  • The base model is not instruction-tuned; it is designed for code completion rather than chat. source ↗
  • BigCode OpenRAIL-M v1 carries use restrictions and is not OSI-approved. source ↗

Lineage

Successor to StarCoder and StarCoder Plus. Its siblings in the StarCoder 2 family are the 3B and 7B models. The Stack v2 dataset supersedes The Stack v1.

Sources