The Open-Source AI Stack


Kimi K2.6

Open · Moonshot AI · 2026-04-21 · Modified MIT

The April 21, 2026 refresh of K2. The headline feature is Agent Swarm, which scales to 300 sub-agents over 4,000 coordinated steps; the release also adds native video input and a 262K-token context window. Moonshot reports a SWE-Bench Pro score of 58.6, narrowly ahead of the 57.7 that the lab cites for GPT-5.4 at release.

Cost

$0.95 / Mtok input
$4.00 / Mtok output

· as of 2026-05-21

source ↗
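As a rough illustration of how the two list prices combine, the sketch below estimates the cost of a single request. The function name and the example token counts are hypothetical; only the two per-Mtok prices come from this page, and the sketch assumes no caching discounts or provider-specific surcharges.

```python
# Prices quoted above (USD per million tokens, as of 2026-05-21).
INPUT_USD_PER_MTOK = 0.95
OUTPUT_USD_PER_MTOK = 4.00


def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request from its token counts."""
    total = input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    return total / 1_000_000


# Hypothetical request: a 200,000-token prompt with an 8,000-token reply.
cost = request_cost_usd(200_000, 8_000)
print(f"${cost:.4f}")
```

At these prices output tokens cost roughly four times as much as input tokens, so long generations can dominate the bill even when the prompt is large.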

Speed

63.9 tok/sec output
1229 ms TTFT

· as of 2026-05-21

source ↗
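The two figures above can be combined into a back-of-the-envelope latency estimate. This is a simplified model, not a measurement: it assumes a single stream decoding at a steady rate with no queueing, and the function name is hypothetical.

```python
# Figures quoted above (as of 2026-05-21).
TTFT_S = 1.229            # time to first token, in seconds
OUTPUT_TOK_PER_S = 63.9   # output throughput, tokens per second


def estimated_latency_s(output_tokens: int) -> float:
    """Approximate wall-clock time to stream a full response."""
    return TTFT_S + output_tokens / OUTPUT_TOK_PER_S


# A hypothetical 1,000-token reply takes roughly 17 seconds under this model.
print(f"{estimated_latency_s(1000):.1f} s")
```

For short replies the TTFT term dominates; for long ones the throughput term does.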

Architecture

tokens in
→ Embedding (vocab 160,000 · kimi tokenizer)
→ × 61 layers: Multi-head Latent Attention (RoPE, context 262,144 tokens) → MoE Router (384 experts total · 8 active per token; diagram shows 32 of 384)
→ Output projection
→ tokens out
Schema-generated from data/models.yaml. Every label is auditable against the model's sources.
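To make the "8 active of 384" label concrete, here is a minimal sketch of generic top-k expert routing. It is an illustration of the general technique only: Moonshot's actual gating function, normalization, and load-balancing scheme are not described on this page, and the softmax-over-selected-logits choice and the random logits are assumptions.

```python
import heapq
import math
import random

NUM_EXPERTS = 384  # experts total, from the diagram above
TOP_K = 8          # experts active per token, from the diagram above


def route_token(router_logits, k=TOP_K):
    """Return [(expert_index, weight)] for the k highest-scoring experts.

    Weights are a softmax over the selected logits only (an assumed,
    commonly used choice, not necessarily what this model does).
    """
    top = heapq.nlargest(k, range(len(router_logits)), key=router_logits.__getitem__)
    peak = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


# Stand-in router scores for one token; a real router computes these
# from the token's hidden state.
rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
selected = route_token(logits)
print(len(selected), "experts selected")
```

Only the selected experts' feed-forward blocks run for that token, which is why the active parameter count is far below the total.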

Specs

Architecture: MoE
Total params: 1T
Active params: 32B
Experts: 384 total · 8 active
Context window: 262K tokens
Attention: MLA
Position encoding: RoPE
Post-training: SFT, RLHF
OSI-approved: no
Data released: no
Training code: not released
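The sparsity implied by these specs can be checked with simple arithmetic. The sketch below treats "1T" and "32B" as exactly 1e12 and 32e9, which is an assumption since the published figures are rounded.

```python
# Values from the specs above; "1T" and "32B" are taken as round numbers.
TOTAL_PARAMS = 1e12
ACTIVE_PARAMS = 32e9
EXPERTS_TOTAL = 384
EXPERTS_ACTIVE = 8

active_param_fraction = ACTIVE_PARAMS / TOTAL_PARAMS      # share of weights used per token
active_expert_fraction = EXPERTS_ACTIVE / EXPERTS_TOTAL   # share of experts used per token

print(f"active params:  {active_param_fraction:.1%}")
print(f"active experts: {active_expert_fraction:.1%}")
```

The parameter fraction (about 3.2%) is higher than the expert fraction (about 2.1%). A plausible explanation is that attention, embeddings, and any non-routed components run for every token, although the page does not break the parameter count down.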

Benchmarks

Each score carries the date it was published; we never infer or interpolate missing scores.

General reasoning

GPQA-Diamond 90.5 as of 2026-04-21 source ↗

Code

SWE-Bench Verified 80.2 as of 2026-04-21 source ↗
LiveCodeBench 89.6 as of 2026-04-21 source ↗

Available quantizations

GGUF: llama.cpp's container format and the common local format, with k-quants from Q2 to Q8. Runs on llama.cpp and Ollama.
MLX: Apple MLX 4-bit and 8-bit layout for Apple silicon. Runs on Apple MLX.
FP8: 8-bit float, frequently a native release on Hopper / Blackwell GPUs. Runs on vLLM, SGLang, and TensorRT-LLM.

Verified via the Hugging Face model tree ↗. Community quantizations change over time; the families shown are those with published weights at audit time.
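For a sense of scale, the sketch below estimates the weight-only memory footprint at a few bit widths. It is a lower bound under stated assumptions: "1T" is taken as exactly 1e12 parameters, every weight is stored at the same width, and quantization scales, KV cache, and activations are ignored. Real k-quant files use fractional average bits per weight, so actual sizes will differ.

```python
TOTAL_PARAMS = 1_000_000_000_000  # "1T" from the specs, taken as a round number


def weight_gib(bits_per_weight: float, params: int = TOTAL_PARAMS) -> float:
    """Weight-only storage in GiB at a uniform bit width."""
    return params * bits_per_weight / 8 / 2**30


for label, bits in [("16-bit", 16), ("8-bit (e.g. FP8)", 8), ("4-bit", 4)]:
    print(f"{label:>18}: {weight_gib(bits):,.0f} GiB")
```

Even at 4 bits the weights alone are on the order of hundreds of GiB, so local formats such as GGUF or MLX still imply very large unified-memory or multi-device setups for this model.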

Notable innovations

  • Agent Swarm: up to 300 sub-agents / 4,000 steps
  • Native video input
  • Preserve Thinking Mode across multi-turn conversations

Lineage

Agent Swarm scale-out is the headline change in this release; per Moonshot, its SWE-Bench Pro score topped GPT-5.4's.

Sources