Qwen 3 235B A22B Instruct vs Mistral Medium 3

Rows highlighted in warm gray are where the models differ. Numbers carry their as-of date and primary source.

Specs

Field	A: Qwen 3 235B A22B Instruct	B: Mistral Medium 3
Released	2025-04-28	2025-05-07
Developer	Alibaba	Mistral AI
Openness	Open	Proprietary
License	Apache-2.0	Proprietary
OSI-approved	yes	no
Data released	no	no
Training code	no	no
Architecture	MoE	unknown
Total params	235B	—
Active params	22B	—
Experts	128 (8 active)	—
Context window	131K	128K
Attention	GQA	unknown
Position enc.	RoPE	unknown
Pretraining tokens	36.0T	—
Post-training	SFT, DPO, GRPO	SFT, RLHF
Training hardware	—	—
$/M input	$0.45	$0.40
$/M output	$1.80	$2.00
Output tok/sec	66.6	29
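
The pricing and throughput rows can be turned into a per-request estimate. The sketch below is illustrative only: the prices and tok/sec figures are copied from the table above, while the function names and the example workload (10,000 input tokens, 1,000 output tokens) are assumptions made for this example. The time estimate covers steady-state generation only and ignores time to first token and provider variance.

```python
# Figures copied from the Specs table: USD per million tokens and output tokens/sec.
SPECS = {
    "A": {"name": "Qwen 3 235B A22B Instruct", "usd_in": 0.45, "usd_out": 1.80, "tok_per_sec": 66.6},
    "B": {"name": "Mistral Medium 3", "usd_in": 0.40, "usd_out": 2.00, "tok_per_sec": 29.0},
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """USD cost of one request at the listed per-million-token prices."""
    s = SPECS[model]
    return (input_tokens * s["usd_in"] + output_tokens * s["usd_out"]) / 1_000_000


def generation_seconds(model: str, output_tokens: int) -> float:
    """Rough generation time from the listed output throughput."""
    return output_tokens / SPECS[model]["tok_per_sec"]


def breakeven_output_per_input() -> float:
    """Output/input token ratio at which A and B cost the same.

    Solves a_in*I + a_out*O == b_in*I + b_out*O for O/I.
    """
    a, b = SPECS["A"], SPECS["B"]
    return (a["usd_in"] - b["usd_in"]) / (b["usd_out"] - a["usd_out"])


if __name__ == "__main__":
    # Hypothetical workload, chosen only for illustration.
    input_tokens, output_tokens = 10_000, 1_000
    for key, spec in SPECS.items():
        cost = request_cost(key, input_tokens, output_tokens)
        secs = generation_seconds(key, output_tokens)
        print(f"{spec['name']}: ${cost:.4f}, about {secs:.0f} s to generate")
    print(f"break-even output/input ratio: {breakeven_output_per_input():.2f}")
```

At the listed prices the two models cost the same when output tokens are about a quarter of input tokens. Below that ratio B is cheaper, and above it A is cheaper.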

Benchmarks

Missing scores render as not reported; never inferred. Bold highlights the leader per benchmark.

General reasoning

MMLU-Pro	83.0 2025-04-28	76.0 2026-05-21
GPQA-Diamond	—	57.8 2026-05-21

Code

LiveCodeBench	34.3 2026-05-21	40.0 2026-05-21

Math

MATH	90.2 2026-05-21	90.7 2026-05-21
AIME 2024	85.7 2025-04-28	44.0 2026-05-21
AIME 2025	23.7 2026-05-21	30.3 2026-05-21
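
The "leader per benchmark" rule can be sketched in a few lines. The scores and as-of dates below are copied from the tables above, with the first value in each row taken as model A. The function names are hypothetical, and treating a missing score as "no leader" is an assumption based on the stated policy that missing scores are never inferred. The sketch also flags rows whose two scores carry different as-of dates, since those are not like-for-like comparisons.

```python
from typing import Optional, Tuple

Entry = Optional[Tuple[float, str]]  # (score, as-of date), or None if not reported

# Column order is (A, B), copied from the Benchmarks tables.
SCORES = {
    "MMLU-Pro": ((83.0, "2025-04-28"), (76.0, "2026-05-21")),
    "GPQA-Diamond": (None, (57.8, "2026-05-21")),
    "LiveCodeBench": ((34.3, "2026-05-21"), (40.0, "2026-05-21")),
    "MATH": ((90.2, "2026-05-21"), (90.7, "2026-05-21")),
    "AIME 2024": ((85.7, "2025-04-28"), (44.0, "2026-05-21")),
    "AIME 2025": ((23.7, "2026-05-21"), (30.3, "2026-05-21")),
}


def leader(a: Entry, b: Entry) -> Optional[str]:
    """Return "A", "B", or "tie"; None when either score is not reported."""
    if a is None or b is None:
        return None  # assumption: never infer a leader from a missing score
    if a[0] == b[0]:
        return "tie"
    return "A" if a[0] > b[0] else "B"


def dates_match(a: Entry, b: Entry) -> bool:
    """True only when both scores exist and share the same as-of date."""
    return a is not None and b is not None and a[1] == b[1]


def summarize() -> dict:
    """Map each benchmark to (leader, same_as_of_date)."""
    return {name: (leader(a, b), dates_match(a, b)) for name, (a, b) in SCORES.items()}


if __name__ == "__main__":
    for name, (who, same_date) in summarize().items():
        note = "" if same_date else " (as-of dates differ or a score is missing)"
        print(f"{name}: leader={who}{note}")
```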

Context · A

Qwen's first major MoE release, with a hybrid thinking/non-thinking inference mode that can be switched per request. Apache 2.0 licensing across the whole size ladder reset the openness baseline among Chinese labs.
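
As a rough illustration of what the MoE design means in practice, the sparsity implied by the Specs table can be worked out directly. The four constants are copied from the table, and the variable names are chosen for this example. The parameter fraction comes out higher than the expert fraction, presumably because the active-parameter count also includes components that run for every token (such as attention and embeddings). That explanation is an assumption, not something the table states.

```python
# Values copied from the Specs table for model A.
TOTAL_PARAMS_B = 235    # total parameters, in billions
ACTIVE_PARAMS_B = 22    # parameters active per token, in billions
NUM_EXPERTS = 128       # experts available
ACTIVE_EXPERTS = 8      # experts active per token

# Share of the model's weights used for any single token.
active_param_fraction = ACTIVE_PARAMS_B / TOTAL_PARAMS_B

# Share of experts selected for any single token.
active_expert_fraction = ACTIVE_EXPERTS / NUM_EXPERTS

if __name__ == "__main__":
    print(f"active parameters: {active_param_fraction:.1%} of total")
    print(f"active experts:    {active_expert_fraction:.1%} of experts")
```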

Context · B

Mid-tier flagship released May 7, 2025, priced at $0.40 input / $2.00 output per million tokens, with a 128K context window. Mistral positioned it as delivering roughly 90 percent of Claude Sonnet 3.7's performance at a fraction of the cost, with self-hosted deployment supported starting at four GPUs.
