The Open-Source AI Stack

Kimi K2 Thinking vs GPT-5.1

Rows highlighted in warm gray are where the models differ. Numbers carry their as-of date and primary source.

Specs

Field              | A: Kimi K2 Thinking | B: GPT-5.1
Released           | 2025-11-06          | 2025-11-12
Developer          | Moonshot AI         | OpenAI
Openness           | Open                | Proprietary
License            | Modified MIT        | Proprietary
OSI-approved       | no                  | no
Data released      | no                  | no
Training code      | no                  | no
Architecture       | MoE                 | unknown
Total params       | 1T                  |
Active params      | 32B                 |
Experts            | 384 (8 active)      |
Context window     | 256K                | 400K
Attention          | MLA                 | unknown
Position enc.      | RoPE                | unknown
Pretraining tokens |                     |
Post-training      | SFT, RLHF           | RLHF
Training hardware  |                     |
$/M input          | $0.60               | $1.25
$/M output         | $2.50               | $10.00
Output tok/sec     | 102.4               | 114.7
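
The price and throughput rows can be turned into a per-request estimate. The following is a minimal sketch, not a billing tool: the `Pricing` class and both helper functions are hypothetical names, and the 20,000-token prompt and 2,000-token response are an assumed workload chosen only for illustration. It also assumes the listed output rate holds for the whole response and ignores time to first token.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    name: str
    usd_per_m_input: float     # "$/M input" row
    usd_per_m_output: float    # "$/M output" row
    output_tok_per_sec: float  # "Output tok/sec" row


# Values copied from the Specs table.
KIMI_K2_THINKING = Pricing("Kimi K2 Thinking", 0.60, 2.50, 102.4)
GPT_5_1 = Pricing("GPT-5.1", 1.25, 10.00, 114.7)


def request_cost_usd(p: Pricing, input_tokens: int, output_tokens: int) -> float:
    """Cost of one request at list price (prices are quoted per million tokens)."""
    total = input_tokens * p.usd_per_m_input + output_tokens * p.usd_per_m_output
    return total / 1_000_000


def generation_seconds(p: Pricing, output_tokens: int) -> float:
    """Rough decode time, assuming a constant output rate."""
    return output_tokens / p.output_tok_per_sec


if __name__ == "__main__":
    # Assumed workload (hypothetical): 20,000 input tokens, 2,000 output tokens.
    for model in (KIMI_K2_THINKING, GPT_5_1):
        cost = request_cost_usd(model, 20_000, 2_000)
        secs = generation_seconds(model, 2_000)
        print(f"{model.name}: ${cost:.4f} per request, ~{secs:.1f} s to generate")
```

Under this assumed workload the output price dominates the gap between the two models, so the ratio shifts if a workload is more input-heavy or output-heavy than the one sketched here.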

Benchmarks

Missing scores render as not reported; never inferred. Bold highlights the leader per benchmark.

General reasoning

Benchmark    | A: Kimi K2 Thinking (as of) | B: GPT-5.1 (as of)
MMLU-Pro     | 84.6 (2025-11-06)           | 87.0 (2026-05-21)
GPQA-Diamond | 84.5 (2025-11-06)           | 87.3 (2026-05-21)

Code

Benchmark          | A: Kimi K2 Thinking (as of) | B: GPT-5.1 (as of)
SWE-Bench Verified | 71.3 (2025-11-06)           | not reported
LiveCodeBench      | 85.3 (2026-05-21)           | 86.8 (2026-05-21)

Math

Benchmark | A: Kimi K2 Thinking (as of) | B: GPT-5.1 (as of)
AIME 2025 | 99.1 (2025-11-06)           | 94.0 (2026-05-21)
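
The rule stated above (missing scores are shown as not reported, never inferred, and the leader is marked per benchmark) can be sketched as follows. This is an illustration under stated assumptions, not the site's actual rendering code: `SCORES`, `leader`, and `render` are hypothetical names, the single SWE-Bench Verified score is assumed to belong to model A, and declaring no leader when either score is missing is an assumption about how the page treats that case.

```python
from typing import Optional

# (A: Kimi K2 Thinking, B: GPT-5.1); None means the score is not reported.
SCORES: dict[str, tuple[Optional[float], Optional[float]]] = {
    "MMLU-Pro": (84.6, 87.0),
    "GPQA-Diamond": (84.5, 87.3),
    "SWE-Bench Verified": (71.3, None),  # assumed to be A's score
    "LiveCodeBench": (85.3, 86.8),
    "AIME 2025": (99.1, 94.0),
}


def leader(a: Optional[float], b: Optional[float]) -> Optional[str]:
    """Return "A", "B", or "tie"; return None when a score is missing.

    A missing score is never treated as zero or estimated, so no leader
    is declared unless both models report the benchmark.
    """
    if a is None or b is None:
        return None
    if a == b:
        return "tie"
    return "A" if a > b else "B"


def render(score: Optional[float]) -> str:
    """Format a score for display, spelling out missing values."""
    return "not reported" if score is None else f"{score:.1f}"


if __name__ == "__main__":
    for name, (a, b) in SCORES.items():
        print(f"{name}: A={render(a)}, B={render(b)}, leader={leader(a, b)}")
```

Keeping the missing value as `None` rather than a placeholder number means an accidental comparison fails the explicit check instead of silently ranking a model last.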

Context · A

Kimi K2 Thinking is Moonshot's first reasoning model with native thinking interleaved with tool calls, released November 6, 2025. According to the lab, it sustains coherence across 200-300 tool invocations. It ships with native INT4 quantization-aware training and a Modified MIT license.
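
To put the parameter counts and the INT4 note in perspective, the arithmetic below estimates how sparse the model is per token and how much storage the weights would need at different precisions. It is a back-of-the-envelope sketch under loud assumptions: "1T" and "32B" are read as exactly 1e12 and 32e9, every parameter is assumed to be stored at the same bit width (the source does not say which layers are quantized), and quantization scales, KV cache, and activations are ignored. The names are hypothetical.

```python
# Figures from the Specs table, read as round numbers (assumption).
TOTAL_PARAMS = 1_000_000_000_000  # "1T"
ACTIVE_PARAMS = 32_000_000_000    # "32B"
EXPERTS_TOTAL = 384
EXPERTS_ACTIVE = 8


def weight_gigabytes(n_params: int, bits_per_param: int) -> float:
    """Weights-only storage in GB (1 GB = 1e9 bytes), uniform bit width assumed."""
    return n_params * bits_per_param / 8 / 1e9


# Share of parameters and of experts used for each token.
active_param_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
active_expert_fraction = EXPERTS_ACTIVE / EXPERTS_TOTAL

int4_gb = weight_gigabytes(TOTAL_PARAMS, 4)
bf16_gb = weight_gigabytes(TOTAL_PARAMS, 16)

if __name__ == "__main__":
    print(f"active parameters per token: {active_param_fraction:.1%}")
    print(f"active experts per token:    {active_expert_fraction:.1%}")
    print(f"weights at 4 bits:  ~{int4_gb:.0f} GB")
    print(f"weights at 16 bits: ~{bf16_gb:.0f} GB")
```

Real checkpoints will differ from these figures, since some tensors are typically kept at higher precision and quantized formats carry extra metadata.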

Context · B

GPT-5.1 was released November 12, 2025, with three initial models; two more (Codex-Mini, Codex-Max) followed on November 19. The headline change is a warmer default tone plus eight selectable personalities. GPT-5.1 Instant gained adaptive reasoning, deciding per turn whether to think before responding.
