Mistral Large 2 vs Claude 3.5 Sonnet

Rows highlighted in warm gray are where the models differ. Numbers carry their as-of date and primary source.

Specs

Field	A: Mistral Large 2	B: Claude 3.5 Sonnet
Released	2024-07-24	2024-06-20
Developer	Mistral AI	Anthropic
Openness	Source-available	Proprietary
License	Mistral Research License	Proprietary
OSI-approved	no	no
Data released	no	no
Training code	no	no
Architecture	dense	unknown
Total params	123B	—
Active params	—	—
Experts	—	—
Context window	131K	200K
Attention	GQA	unknown
Position enc.	RoPE	unknown
Pretraining tokens	—	—
Post-training	SFT, DPO	RLHF, Constitutional AI
Training hardware	—	—
$/M input	$2.00	—
$/M output	$6.00	—
Output tok/sec	31.7	—
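
As a rough illustration of how the pricing and throughput rows combine, the sketch below estimates the cost and output-generation time of a single Mistral Large 2 request. The three constants are copied from the table above. The function name and the example token counts are hypothetical, and the time estimate assumes the listed tokens-per-second figure holds steady and ignores prompt processing and network latency. No equivalent estimate is possible for Claude 3.5 Sonnet from this page, since its price and throughput are not listed.

```python
# Figures copied from the Specs table (column A: Mistral Large 2).
INPUT_USD_PER_M = 2.00      # $/M input tokens
OUTPUT_USD_PER_M = 6.00     # $/M output tokens
OUTPUT_TOK_PER_SEC = 31.7   # output tokens per second


def estimate_request(input_tokens: int, output_tokens: int) -> tuple[float, float]:
    """Return (cost in USD, seconds spent generating the output).

    Hypothetical helper: assumes linear per-token pricing and a constant
    output rate; prompt processing and network latency are ignored.
    """
    cost = (input_tokens * INPUT_USD_PER_M
            + output_tokens * OUTPUT_USD_PER_M) / 1_000_000
    seconds = output_tokens / OUTPUT_TOK_PER_SEC
    return cost, seconds


# Example workload (made-up sizes): a 10,000-token prompt, 1,000-token reply.
cost, seconds = estimate_request(input_tokens=10_000, output_tokens=1_000)
print(f"${cost:.4f}, {seconds:.1f} s")  # → $0.0260, 31.5 s
```

At these rates the output side dominates both numbers: the 1,000 output tokens account for $0.006 of the $0.026 and for all of the estimated generation time.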

Benchmarks

Missing scores render as not reported; never inferred. Bold highlights the leader per benchmark.

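Benchmark	A: Mistral Large 2 (score, as-of date)	B: Claude 3.5 Sonnet

General reasoning

MMLU	84.0 (2024-07-24)	not reported
MMLU-Pro	69.7 (2026-05-21)	not reported
GPQA-Diamond	48.6 (2026-05-21)	not reported

Code

HumanEval	92.0 (2024-07-24)	not reported
LiveCodeBench	29.3 (2026-05-21)	not reported

Math

MATH	71.5 (2024-07-24)	not reported
AIME 2024	11.0 (2026-05-21)	not reported
AIME 2025	14.0 (2026-05-21)	not reported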
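
The display rule stated at the top of this section (missing scores are shown as not reported and never inferred, and the leader is bolded) can be sketched as a small rendering function. This is an illustrative guess, not the page's actual implementation. The function name is hypothetical, `**…**` stands in for bold, and the choice to mark a leader only when both scores are present and differ is an assumption.

```python
from typing import Optional


def render_row(name: str, a: Optional[float], b: Optional[float]) -> str:
    """Render one benchmark row as tab-separated text.

    Hypothetical helper. A missing score (None) is printed as
    'not reported' and is never replaced by an estimate.
    """
    # Assumption: a leader exists only when both scores are reported
    # and they differ; a lone score is not treated as "leading".
    both = a is not None and b is not None
    a_leads = both and a > b
    b_leads = both and b > a

    def cell(score: Optional[float], is_leader: bool) -> str:
        if score is None:
            return "not reported"
        text = f"{score:.1f}"
        return f"**{text}**" if is_leader else text

    return "\t".join([name, cell(a, a_leads), cell(b, b_leads)])


print(render_row("MMLU", 84.0, None))
```

Under this assumption none of the rows above would carry a bolded leader, because every row has a score for only one model.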

Context · A

Mistral's frontier dense model from July 2024, sized for single-node inference at 123B parameters with a 128K-token context window (131,072 tokens, shown as 131K in Specs). Weights are downloadable under the Mistral Research License for non-commercial use; production deployment requires a separate paid Mistral Commercial License. Trained with explicit emphasis on reducing hallucinations and on supporting parallel and sequential function calling, with coverage of dozens of natural and coding languages.

Context · B

The first Claude release to beat its own larger sibling (Claude 3 Opus) on most benchmarks. It introduced Artifacts, which set off a wave of code-and-canvas product copies.
