The Open-Source AI Stack


GPT-5.1 vs Kimi K2 Thinking

Rows highlighted in warm gray are where the models differ. Numbers carry their as-of date and primary source.

Specs

Field               A: GPT-5.1    B: Kimi K2 Thinking
Released            2025-11-12    2025-11-06
Developer           OpenAI        Moonshot AI
Openness            Proprietary   Open
License             Proprietary   Modified MIT
OSI-approved        no            no
Data released       no            no
Training code       no            no
Architecture        unknown       MoE
Total params        -             1T
Active params       -             32B
Experts             -             384 (8 active)
Context window      400K          256K
Attention           unknown       MLA
Position enc.       unknown       RoPE
Pretraining tokens  -             -
Post-training       RLHF          SFT, RLHF
Training hardware   -             -
$/M input           $1.25         $0.60
$/M output          $10.00        $2.50
Output tok/sec      114.7         102.4
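
To make the pricing and throughput rows concrete, the sketch below turns them into a per-request cost and a rough generation time. The prices and tok/sec figures are copied from the table above; the function names and the 100,000-input / 10,000-output workload are hypothetical, chosen only for illustration. It assumes flat per-token pricing with no caching discounts or tiering, and a steady output rate.

```python
# Prices in USD per million tokens and output speeds in tokens/second,
# copied from the Specs table. Everything else here is an illustrative assumption.
SPECS = {
    "GPT-5.1": {"input": 1.25, "output": 10.00, "tok_per_sec": 114.7},
    "Kimi K2 Thinking": {"input": 0.60, "output": 2.50, "tok_per_sec": 102.4},
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one request, assuming flat per-token pricing."""
    spec = SPECS[model]
    total = input_tokens * spec["input"] + output_tokens * spec["output"]
    return total / 1_000_000


def generation_seconds(model: str, output_tokens: int) -> float:
    """Rough time to stream the output, assuming a constant output rate."""
    return output_tokens / SPECS[model]["tok_per_sec"]


if __name__ == "__main__":
    # Hypothetical workload: 100,000 input tokens and 10,000 output tokens.
    for name in SPECS:
        cost = request_cost(name, 100_000, 10_000)
        secs = generation_seconds(name, 10_000)
        print(f"{name}: ${cost:.3f}, about {secs:.0f} s of generation")
```

For this hypothetical workload the listed prices work out to about $0.225 for GPT-5.1 and about $0.085 for Kimi K2 Thinking. The gap comes mostly from the output price, so the ratio shifts with the input/output mix.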

Benchmarks

Missing scores render as not reported; never inferred. Bold highlights the leader per benchmark.

General reasoning

Benchmark      A: GPT-5.1 (as of)    B: Kimi K2 Thinking (as of)
MMLU-Pro       87.0 (2026-05-21)     84.6 (2025-11-06)
GPQA-Diamond   87.3 (2026-05-21)     84.5 (2025-11-06)

Code

Benchmark            A: GPT-5.1 (as of)    B: Kimi K2 Thinking (as of)
SWE-Bench Verified   not reported          71.3 (2025-11-06)
LiveCodeBench        86.8 (2026-05-21)     85.3 (2026-05-21)

Math

Benchmark   A: GPT-5.1 (as of)    B: Kimi K2 Thinking (as of)
AIME 2025   94.0 (2026-05-21)     99.1 (2025-11-06)

Context · A

GPT-5.1 was released on November 12, 2025 with three initial models; two more (Codex-Mini, Codex-Max) followed on November 19. The headline change is a warmer default tone plus eight selectable personalities. GPT-5.1 Instant gained adaptive reasoning: it decides on each turn whether to think before responding.

Context · B

Kimi K2 Thinking, released on November 6, 2025, is Moonshot's first reasoning model with native thinking interleaved with tool calls. According to the lab, it sustains coherence across 200-300 tool invocations. It ships with native INT4 quantization-aware training and a Modified MIT license.
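
As a back-of-the-envelope illustration of why INT4 matters at this scale, the sketch below estimates weight storage for the 1T total parameters listed in the Specs table. It assumes every parameter is stored at the same bit width and ignores quantization scales, the KV cache, and activations, so it is an order-of-magnitude estimate and not a description of how the released checkpoint is actually laid out. The function name is hypothetical.

```python
def weight_bytes(num_params: int, bits_per_param: int) -> float:
    """Bytes needed to store the weights alone at a uniform bit width."""
    return num_params * bits_per_param / 8


# Total parameter count from the Specs table (1T).
TOTAL_PARAMS = 1_000_000_000_000

if __name__ == "__main__":
    # Assumption: uniform precision across all weights, no per-group scales.
    for label, bits in [("16-bit", 16), ("INT4", 4)]:
        gigabytes = weight_bytes(TOTAL_PARAMS, bits) / 1e9
        print(f"{label}: {gigabytes:.0f} GB")
```

Under these assumptions the weights take roughly 2,000 GB at 16 bits and 500 GB at 4 bits, a fourfold reduction.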
