Qwen3-VL 235B-A22B Instruct vs Qwen3 Max
Numbers carry their as-of date and primary source.
Specs
| Field | A: Qwen3-VL 235B-A22B Instruct | B: Qwen3 Max |
|---|---|---|
| Released | 2025-09-23 | — |
| Developer | Alibaba | Alibaba |
| Openness | Open | Proprietary |
| License | Apache-2.0 | Proprietary |
| OSI-approved | yes | no |
| Data released | no | no |
| Training code | no | no |
| Architecture | moe | moe |
| Total params | 235B | — |
| Active params | 22B | — |
| Experts | — | — |
| Context window | 262K | — |
| Attention | mrope-interleaved | — |
| Position enc. | mrope-interleaved | — |
| Pretraining tokens | — | — |
| Post-training | sft, rlhf | sft, rlhf |
| Training hardware | — | — |
| $/M input | $0.30 | $1.66 |
| $/M output | $1.90 | $7.22 |
| Output tok/sec | 50.9 | 32.4 |
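The price and throughput rows can be turned into a per-request estimate. The sketch below is illustrative only: the function names and the 10,000-input / 1,000-output token workload are hypothetical, and it assumes flat per-token pricing with no caching discounts, tiered rates, or time-to-first-token latency.

```python
# Prices (USD per million tokens) and throughput (output tokens/sec)
# copied from the Specs table. Everything else here is a hypothetical sketch.
SPECS = {
    "Qwen3-VL 235B-A22B Instruct": {"in": 0.30, "out": 1.90, "tps": 50.9},
    "Qwen3 Max": {"in": 1.66, "out": 7.22, "tps": 32.4},
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one request at the listed per-million prices."""
    s = SPECS[model]
    return (input_tokens * s["in"] + output_tokens * s["out"]) / 1_000_000


def generation_seconds(model: str, output_tokens: int) -> float:
    """Estimated decode time, ignoring time to first token."""
    return output_tokens / SPECS[model]["tps"]


if __name__ == "__main__":
    # Hypothetical workload: 10,000 input tokens, 1,000 output tokens.
    for name in SPECS:
        cost = request_cost(name, 10_000, 1_000)
        secs = generation_seconds(name, 1_000)
        print(f"{name}: ${cost:.4f}, {secs:.1f} s")
    # Qwen3-VL 235B-A22B Instruct: $0.0049, 19.6 s
    # Qwen3 Max: $0.0238, 30.9 s
```

On this assumed workload, B costs a little under five times as much as A per request and takes roughly 1.6 times as long to generate the output.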
Benchmarks
Missing scores render as not reported; never inferred. Bold highlights the leader per benchmark.
General reasoning
| Benchmark | A: Qwen3-VL 235B-A22B Instruct | B: Qwen3 Max |
|---|---|---|
| MMLU-Pro | 82.3 (2026-05-21) | **84.1** (2026-05-21) |
| GPQA-Diamond | 71.2 (2026-05-21) | **76.4** (2026-05-21) |
Code
| Benchmark | A: Qwen3-VL 235B-A22B Instruct | B: Qwen3 Max |
|---|---|---|
| LiveCodeBench | 59.4 (2026-05-21) | — |
Math
| Benchmark | A: Qwen3-VL 235B-A22B Instruct | B: Qwen3 Max |
|---|---|---|
| AIME 2025 | 70.7 (2026-05-21) | — |
Context · A
Vision-language flagship of the Qwen3 line, 235B total parameters (about 471 GB of weights). Adds Interleaved-MRoPE for video reasoning, DeepStack multi-level ViT feature fusion, and Text-Timestamp Alignment for grounded event localization. Shipped in both Instruct and Thinking variants under Apache 2.0; native context of 256K tokens, extensible to 1M.
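The size and context figures above follow from simple arithmetic. This is a rough sketch, assuming 2 bytes per parameter (16-bit weights such as BF16), which the page does not state; the helper name is hypothetical. A nominal 235B parameters gives 470 GB, close to the roughly 471 GB quoted; the small gap presumably reflects an exact parameter count slightly above 235B.

```python
def weight_footprint_gb(params: float, bytes_per_param: float) -> float:
    """Approximate weight size in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


TOTAL_PARAMS = 235e9   # nominal total parameters, from the Specs table
BYTES_PER_PARAM = 2    # assumption: 16-bit weights; not stated in the source

print(weight_footprint_gb(TOTAL_PARAMS, BYTES_PER_PARAM))  # 470.0

# "256K" read as a binary multiple is the same window that the Specs table
# rounds to "262K" in decimal thousands.
NATIVE_CONTEXT_TOKENS = 256 * 1024
print(NATIVE_CONTEXT_TOKENS)  # 262144
```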
Context · B
Trillion-parameter MoE, API-only via Qwen Chat and Alibaba Cloud at release. Pretrained on roughly 36T tokens. Alibaba's first proprietary Qwen flagship at this scale, breaking the open-weights pattern. Supports more than 100 languages.