GPT-4o vs DeepSeek-V2 Chat

Rows where the two models differ are highlighted in warm gray. Each number carries its as-of date and primary source.

Specs

Field	A: GPT-4o	B: DeepSeek-V2 Chat
Released	2024-05-13	2024-05-07
Developer	OpenAI	DeepSeek
Openness	Proprietary	Open
License	Proprietary	DeepSeek License
OSI-approved	no	no
Data released	no	no
Training code	no	no
Architecture	unknown	MoE
Total params	—	236B
Active params	—	21B
Experts	—	—
Context window	—	128K
Attention	unknown	MLA
Position enc.	unknown	RoPE + YaRN
Pretraining tokens	—	8.1T
Post-training	RLHF	SFT, RLHF
Training hardware	—	—
$/M input	$2.50	—
$/M output	$10.00	—
Output tok/sec	131.6	—
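
The listed figures can be turned into a few derived quantities. The following is a minimal sketch, not part of the source data: the function names and the example workload (10,000 input tokens, 1,000 output tokens) are hypothetical, and the generation-time estimate assumes the listed throughput holds for the whole response and ignores time to first token.

```python
# Figures copied from the Specs table above; everything else is illustrative.
GPT4O_USD_PER_M_INPUT = 2.50
GPT4O_USD_PER_M_OUTPUT = 10.00
GPT4O_OUTPUT_TOK_PER_SEC = 131.6
DEEPSEEK_V2_TOTAL_PARAMS_B = 236
DEEPSEEK_V2_ACTIVE_PARAMS_B = 21


def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Cost of one GPT-4o request at the listed per-million-token prices."""
    return (input_tokens * GPT4O_USD_PER_M_INPUT
            + output_tokens * GPT4O_USD_PER_M_OUTPUT) / 1_000_000


def generation_seconds(output_tokens: int) -> float:
    """Streaming time at the listed throughput (ignores time to first token)."""
    return output_tokens / GPT4O_OUTPUT_TOK_PER_SEC


def active_fraction() -> float:
    """Share of DeepSeek-V2 Chat parameters active per token (MoE routing)."""
    return DEEPSEEK_V2_ACTIVE_PARAMS_B / DEEPSEEK_V2_TOTAL_PARAMS_B


# Hypothetical workload: 10,000 input tokens, 1,000 output tokens.
print(f"cost: ${request_cost_usd(10_000, 1_000):.3f}")
print(f"generation time: {generation_seconds(1_000):.1f} s")
print(f"active parameter share: {active_fraction():.1%}")
```

No cost or speed figure is computed for DeepSeek-V2 Chat because the table reports none.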

Benchmarks

Missing scores are shown as not reported and are never inferred. Bold marks the leader on each benchmark.

General reasoning

MMLU-Pro	74.8 2026-05-21	—
GPQA-Diamond	54.3 2026-05-21	—

Code

LiveCodeBench	30.9 2026-05-21	—

Math

MATH	75.9 2026-05-21	—
AIME 2024	15.0 2026-05-21	—
AIME 2025	6.0 2026-05-21	—

Context · A

A natively multimodal model that handles audio, vision, and text in a single pretrained backbone. It pushed real-time voice latency below 400 ms and served as the multimodal benchmark anchor through 2024.

Context · B

Debut of Multi-head Latent Attention (MLA), introduced in the DeepSeek-V2 paper. MLA cut KV-cache memory by ~93% relative to the earlier dense DeepSeek 67B model, making 128K-context MoE inference economically viable.
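
To show where a saving of this size comes from, here is a rough per-token, per-layer count of cached elements. It is a sketch under stated assumptions, not the authors' accounting: the head count, head dimension, latent dimension, and RoPE key dimension below are believed to match the published DeepSeek-V2 configuration but should be treated as assumptions, and the function names are hypothetical.

```python
def mha_kv_elems_per_token(n_heads: int, head_dim: int) -> int:
    """Standard multi-head attention caches one key and one value vector per head."""
    return 2 * n_heads * head_dim


def mla_kv_elems_per_token(latent_dim: int, rope_dim: int) -> int:
    """MLA caches one compressed KV latent plus a small shared RoPE key."""
    return latent_dim + rope_dim


# Assumed dimensions (not stated on this page).
N_HEADS, HEAD_DIM = 128, 128
LATENT_DIM, ROPE_DIM = 512, 64

mha = mha_kv_elems_per_token(N_HEADS, HEAD_DIM)
mla = mla_kv_elems_per_token(LATENT_DIM, ROPE_DIM)
reduction = 1 - mla / mha

print(f"MHA: {mha} elements per token per layer")
print(f"MLA: {mla} elements per token per layer")
print(f"reduction vs full MHA: {reduction:.1%}")
```

Under these assumptions the saving against full multi-head attention is larger than ~93%. The headline percentage depends on the baseline it is measured against. A baseline using grouped-query attention, for example, already caches far fewer elements than full multi-head attention.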
