The Open-Source AI Stack
RSS
All models

Models · kimi-k2

Kimi K2 Thinking

Open Moonshot AI · 2025-11-06 · Modified MIT

Moonshot's first reasoning model with native thinking interleaved with tool calls, released November 6 2025. Sustains coherence across 200-300 tool invocations per the lab. Ships with native INT4 quantization-aware training and a Modified MIT license.

Cost

$0.60 / Mtok input
$2.50 / Mtok output

· as of 2026-05-21

source ↗

Speed

102.4 tok/sec output
992 ms TTFT

· as of 2026-05-21

source ↗

Architecture

tokens in Embedding vocab 160,000 · kimi tokenizer × 61 layers Multi-head Latent Attention RoPE context 256,000 tokens MoE Router 384 experts total · 8 active per token shown: 32 of 384 Output projection tokens out
Schema-generated from data/models.yaml. Every label is auditable against the model's sources.

Specs

Architecture
moe
Total params
1T
Active params
32B
Experts
384 total · 8 active
Context window
256K tokens
Attention
mla
Position encoding
rope
Post-training
sft, rlhf
OSI-approved
no
Data released
no
Training code
not released

Benchmarks

Each score carries the date it was published; we never infer or interpolate missing scores.

General reasoning

MMLU-Pro 84.6 as of 2025-11-06 source ↗
GPQA-Diamond 84.5 as of 2025-11-06 source ↗

Code

SWE-Bench Verified 71.3 as of 2025-11-06 source ↗
LiveCodeBench 85.3 as of 2026-05-21 source ↗

Math

AIME 2025 99.1 as of 2025-11-06 source ↗

Available quantizations

GGUF llama.cpp's container; the common local format, k-quants from Q2 to Q8. runs on llama.cpp, Ollama
MLX Apple MLX 4/8-bit layout for Apple silicon. runs on Apple MLX
FP8 8-bit float, frequently a native release on Hopper / Blackwell GPUs. runs on vLLM, SGLang, TensorRT-LLM
bitsandbytes On-the-fly NF4 / INT8 weight quantization inside Transformers. runs on Transformers

Verified via the Hugging Face model tree ↗. Community quantizations change over time; the families shown are those with published weights at audit time.

Notable innovations

  • · Native interleaved thinking and tool use
  • · Native INT4 quantization-aware training
  • · 200-300 tool-call agentic horizon

Lineage

First Moonshot reasoning model in the Kimi K2 family; native thinking interleaved with tools.

Sources