Qwen3 Max vs Qwen3-VL 235B-A22B Instruct

Rows highlighted in warm gray are where the models differ. Numbers carry their as-of date and primary source.

Specs

Field	A: Qwen3 Max	B: Qwen3-VL 235B-A22B Instruct
Released	—	2025-09-23
Developer	Alibaba	Alibaba
Openness	Proprietary	Open
License	Proprietary	Apache-2.0
OSI-approved	no	yes
Data released	no	no
Training code	no	no
Architecture	moe	moe
Total params	—	235B
Active params	—	22B
Experts	—	—
Context window	—	262K
Attention	—	mrope-interleaved
Position enc.	—	rope-interleaved
Pretraining tokens	—	—
Post-training	sft, rlhf	sft, rlhf
Training hardware	—	—
$/M input	$1.66	$0.30
$/M output	$7.22	$1.90
Output tok/sec	32.4	50.9
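
The price and throughput rows can be turned into a per-request estimate. A minimal sketch, assuming list prices apply linearly per token and that output speed is constant; the token counts in the usage loop are hypothetical and the function names are illustrative, not part of any Alibaba API.

```python
# Figures copied from the Specs table: USD per million tokens, output tokens/sec.
SPECS = {
    "Qwen3 Max": {
        "usd_per_m_input": 1.66,
        "usd_per_m_output": 7.22,
        "output_tok_per_sec": 32.4,
    },
    "Qwen3-VL 235B-A22B Instruct": {
        "usd_per_m_input": 0.30,
        "usd_per_m_output": 1.90,
        "output_tok_per_sec": 50.9,
    },
}


def request_cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost of one request, assuming linear per-token pricing (an assumption)."""
    s = SPECS[model]
    total = input_tokens * s["usd_per_m_input"] + output_tokens * s["usd_per_m_output"]
    return total / 1_000_000


def generation_seconds(model: str, output_tokens: int) -> float:
    """Time to stream the output, ignoring latency before the first token."""
    return output_tokens / SPECS[model]["output_tok_per_sec"]


if __name__ == "__main__":
    # Hypothetical workload: 10,000 input tokens, 1,000 output tokens.
    for name in SPECS:
        cost = request_cost_usd(name, 10_000, 1_000)
        secs = generation_seconds(name, 1_000)
        print(f"{name}: ${cost:.5f}, about {secs:.1f} s of generation")
```

Real bills may differ: tiered pricing, caching discounts, and provider-specific rates are not modeled here.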

Benchmarks

Missing scores are marked as not reported (shown as a dash) and are never inferred. Bold highlights the leader per benchmark.

General reasoning

MMLU-Pro	84.1 2026-05-21	82.3 2026-05-21
GPQA-Diamond	76.4 2026-05-21	71.2 2026-05-21

Code

LiveCodeBench	—	59.4 2026-05-21

Math

AIME 2025	—	70.7 2026-05-21

Context · A

Trillion-parameter MoE, API-only via Qwen Chat and Alibaba Cloud at release. Pretrained on roughly 36T tokens. Alibaba's first proprietary Qwen flagship at this scale, breaking the open-weights pattern. Supports more than 100 languages.

Context · B

Vision-language flagship of the Qwen3 line, with 235B total parameters (about 471 GB of weights). Adds Interleaved-MRoPE for video reasoning, DeepStack multi-level ViT feature fusion, and Text-Timestamp Alignment for grounded event localization. Shipped in both Instruct and Thinking variants under Apache 2.0; native context of 256K tokens, extensible to 1M.
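
Two figures above can be sanity-checked with simple arithmetic. A rough sketch, assuming the weights are stored at 16 bits (2 bytes) per parameter and that "K" means 1,024 tokens; neither assumption is stated on this page.

```python
def weight_size_gb(params: float, bytes_per_param: int) -> float:
    """Approximate checkpoint size in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


def k_to_tokens(k: int) -> int:
    """Convert a context length quoted in binary K (1,024) to tokens."""
    return k * 1024


if __name__ == "__main__":
    # 235B parameters at an assumed 2 bytes each.
    print(weight_size_gb(235e9, 2), "GB")
    # 256K native context expressed in tokens.
    print(k_to_tokens(256), "tokens")
```

The first result lands at 470 GB, close to the quoted 471 GB; the small gap is consistent with a parameter count slightly above the rounded 235B. The second gives 262,144 tokens, which matches the 262K context window listed in the Specs table.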
