GPT-4o vs DeepSeek-V2 Chat
Rows highlighted in warm gray are where the models differ. Numbers carry their as-of date and primary source.
Specs
| Field | A: GPT-4o | B: DeepSeek-V2 Chat |
|---|---|---|
| Released | 2024-05-13 | 2024-05-07 |
| Developer | OpenAI | DeepSeek |
| Openness | Proprietary | Open |
| License | Proprietary | DeepSeek License |
| OSI-approved | no | no |
| Data released | no | no |
| Training code | no | no |
| Architecture | unknown | MoE |
| Total params | — | 236B |
| Active params | — | 21B |
| Experts | — | — |
| Context window | — | 128K |
| Attention | unknown | MLA |
| Position enc. | unknown | RoPE + YaRN |
| Pretraining tokens | — | 8.1T |
| Post-training | RLHF | SFT, RLHF |
| Training hardware | — | — |
| $/M input | $2.50 | — |
| $/M output | $10.00 | — |
| Output tok/sec | 131.6 | — |
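The pricing and throughput rows translate directly into per-request cost and generation time. The sketch below uses the GPT-4o figures from the table; the workload (8,000 prompt tokens, 1,000 completion tokens) and the function names are hypothetical, and the timing ignores time to first token.

```python
# Rates copied from the Specs table for model A (GPT-4o).
INPUT_USD_PER_M = 2.50      # $ per million input tokens
OUTPUT_USD_PER_M = 10.00    # $ per million output tokens
OUTPUT_TOK_PER_SEC = 131.6  # listed output throughput


def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Cost of one request at per-million-token rates."""
    total = input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M
    return total / 1_000_000


def generation_seconds(output_tokens: int) -> float:
    """Time to stream the output at the listed throughput.

    Assumption: ignores time to first token and network overhead.
    """
    return output_tokens / OUTPUT_TOK_PER_SEC


# Hypothetical workload, not taken from the page.
cost = request_cost_usd(8_000, 1_000)
seconds = generation_seconds(1_000)
print(f"cost: ${cost:.4f}, generation time: {seconds:.1f}s")
# prints "cost: $0.0300, generation time: 7.6s"
```

The same calculation cannot be run for DeepSeek-V2 Chat from this table, since its pricing and throughput are not reported.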
Benchmarks
Missing scores render as not reported; never inferred. Bold highlights the leader per benchmark.
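One way to read this rule is sketched below. It is an illustration only, not the page's actual rendering code: `render_row` is a hypothetical helper, and it assumes that higher scores are better, that ties are all bolded, and that a lone reported score still counts as the leader.

```python
from typing import Optional


def render_row(name: str, scores: dict[str, Optional[float]]) -> list[str]:
    """Render one benchmark row; None means the score was not reported."""
    reported = [s for s in scores.values() if s is not None]
    best = max(reported) if reported else None  # assumes higher is better
    cells = [name]
    for score in scores.values():
        if score is None:
            cells.append("not reported")  # never inferred, never filled with 0
        elif score == best:
            cells.append(f"**{score}**")  # leader is bolded (ties included)
        else:
            cells.append(str(score))
    return cells


# MMLU-Pro row from this page: only model A has a reported score.
row = render_row("MMLU-Pro", {"A": 74.8, "B": None})
print(row)  # ['MMLU-Pro', '**74.8**', 'not reported']
```

Keeping a missing score as `None` rather than `0` means it can never win or lose a comparison by accident.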
General reasoning
| Benchmark | A: GPT-4o | B: DeepSeek-V2 Chat |
|---|---|---|
| MMLU-Pro | 74.8 (as of 2026-05-21) | — |
| GPQA-Diamond | 54.3 (as of 2026-05-21) | — |
Code
| Benchmark | A: GPT-4o | B: DeepSeek-V2 Chat |
|---|---|---|
| LiveCodeBench | 30.9 (as of 2026-05-21) | — |
Math
| Benchmark | A: GPT-4o | B: DeepSeek-V2 Chat |
|---|---|---|
| MATH | 75.9 (as of 2026-05-21) | — |
| AIME 2024 | 15.0 (as of 2026-05-21) | — |
| AIME 2025 | 6.0 (as of 2026-05-21) | — |
Context · A
Natively multimodal model that handles audio, vision, and text in a single pretrained backbone. It pushed real-time voice latency below 400 ms and served as the multimodal benchmark anchor through 2024.
Context · B
The model that introduced Multi-head Latent Attention (MLA). MLA cut KV-cache memory by ~93% relative to DeepSeek's earlier dense 67B model, making 128K-context MoE inference economically viable.
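A rough calculation shows why caching one compressed latent per token, instead of full per-head keys and values, shrinks the KV cache so sharply. Every shape below (layer count, head count, head size, latent width, RoPE key width, 2-byte elements) is an illustrative assumption rather than a figure from this page. The baseline here is plain multi-head attention, so the resulting percentage is not expected to reproduce the ~93% figure, which depends on the baseline model.

```python
def kv_cache_bytes(layers: int, elems_per_token_per_layer: int,
                   seq_len: int, bytes_per_elem: int = 2) -> int:
    """Total KV-cache size for one sequence."""
    return layers * elems_per_token_per_layer * seq_len * bytes_per_elem


# Illustrative shapes (assumptions, not taken from the page).
LAYERS = 60
HEADS = 128
HEAD_DIM = 128
LATENT_DIM = 512      # width of the compressed KV latent cached by MLA
ROPE_DIM = 64         # extra positional key cached alongside the latent
SEQ_LEN = 128 * 1024  # 128K-token context

# Standard multi-head attention caches a key and a value for every head.
mha_elems = 2 * HEADS * HEAD_DIM
# MLA caches one shared latent plus a small positional key per token.
mla_elems = LATENT_DIM + ROPE_DIM

mha_bytes = kv_cache_bytes(LAYERS, mha_elems, SEQ_LEN)
mla_bytes = kv_cache_bytes(LAYERS, mla_elems, SEQ_LEN)
reduction = 1 - mla_bytes / mha_bytes

GIB = 1024 ** 3
print(f"MHA cache: {mha_bytes / GIB:.1f} GiB")
print(f"MLA cache: {mla_bytes / GIB:.2f} GiB")
print(f"reduction: {reduction:.1%}")
```

Because the layer count, sequence length, and element size cancel in the ratio, the saving is driven entirely by how many elements each scheme stores per token per layer.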