The Open-Source AI Stack

Mistral Large 2

Source-available · Mistral AI · 2024-07-24 · Mistral Research License

Mistral's frontier dense model from July 2024: 123B parameters with a 128K context, sized for single-node inference. Weights are downloadable under the Mistral Research License for non-commercial use; production deployment requires a separate, paid Mistral Commercial License. The model was trained with explicit emphasis on reducing hallucinations and on supporting parallel and sequential function calling, and it covers dozens of natural and coding languages.

Cost

$2.00 / Mtok input
$6.00 / Mtok output

Mistral La Plateforme · as of 2026-05-21

via Artificial Analysis ↗
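
As a rough illustration of how the listed rates turn into a bill, the sketch below multiplies token counts by the per-Mtok prices shown above. The function name and the example token counts are made up for illustration; only the two rates come from this page, and they are a snapshot as of 2026-05-21.

```python
# Listed La Plateforme rates (USD per million tokens), as of 2026-05-21.
INPUT_USD_PER_MTOK = 2.00
OUTPUT_USD_PER_MTOK = 6.00


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request from its token counts."""
    total = input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    return total / 1_000_000


# Hypothetical request: a 100K-token prompt with a 20K-token answer.
example = estimate_cost_usd(input_tokens=100_000, output_tokens=20_000)
print(f"${example:.2f}")  # $0.32
```

Output tokens cost three times as much as input tokens at these rates, so long generations dominate the bill faster than long prompts do.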

Speed

31.7 tok/sec output
548 ms TTFT

as of 2026-05-21

source ↗
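
To put the two speed figures together, a simple model of end-to-end response time is TTFT plus the number of output tokens divided by output throughput. The sketch below assumes throughput stays constant for the whole generation and that TTFT does not grow with prompt length; neither assumption is claimed by the source, and the function name is hypothetical.

```python
# Measured figures listed above, as of 2026-05-21.
OUTPUT_TOK_PER_SEC = 31.7
TTFT_SECONDS = 0.548


def estimate_latency_seconds(output_tokens: int) -> float:
    """Approximate wall-clock time to stream a full response."""
    return TTFT_SECONDS + output_tokens / OUTPUT_TOK_PER_SEC


# Hypothetical 500-token answer: roughly 16.3 seconds.
print(f"{estimate_latency_seconds(500):.1f} s")
```

For answers longer than a few dozen tokens, generation time dwarfs TTFT at this throughput.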

Architecture

tokens in
→ Embedding (vocab not disclosed · mistral tokenizer)
→ × N layers:
    Grouped-Query Attention (RoPE, context 131,072 tokens)
    Dense MLP (SwiGLU activation, standard; 123B active params)
→ Output projection
→ tokens out
Schema-generated from data/models.yaml. Every label is auditable against the model's sources.
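
Grouped-Query Attention matters most at long context, because the KV cache scales with the number of key/value heads rather than the number of query heads. The sketch below shows the standard sizing formula. The layer count, head counts, and head dimension are placeholder values chosen for illustration, not figures from this page (which lists the layer count only as N).

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_value: int = 2) -> int:
    """Size of the KV cache for one sequence.

    The leading 2 counts one key tensor plus one value tensor per layer;
    bytes_per_value=2 corresponds to 16-bit storage.
    """
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_value


CONTEXT = 131_072  # context length listed above

# HYPOTHETICAL dimensions, for illustration only.
SHAPE = dict(n_layers=80, head_dim=128)

gqa = kv_cache_bytes(n_kv_heads=8, seq_len=CONTEXT, **SHAPE)   # grouped KV heads
mha = kv_cache_bytes(n_kv_heads=64, seq_len=CONTEXT, **SHAPE)  # one KV head per query head

print(f"GQA: {gqa / 2**30:.0f} GiB, MHA: {mha / 2**30:.0f} GiB")  # GQA: 40 GiB, MHA: 320 GiB
```

With these placeholder numbers, sharing each KV head across eight query heads cuts the full-context cache by a factor of eight.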

Specs

Architecture: dense
Total params: 123B
Active params: 123B
Context window: 131K tokens
Attention: gqa
Position encoding: rope
Post-training: sft, dpo
OSI-approved: no
Data released: no
Training code: not released

Benchmarks

Each score carries the date it was published; we never infer or interpolate missing scores.

General reasoning

MMLU 84.0 as of 2024-07-24 source ↗
MMLU-Pro 69.7 as of 2026-05-21 source ↗
GPQA-Diamond 48.6 as of 2026-05-21 source ↗

Code

HumanEval 92.0 as of 2024-07-24 source ↗
LiveCodeBench 29.3 as of 2026-05-21 source ↗

Math

MATH 71.5 as of 2024-07-24 source ↗
AIME 2024 11.0 as of 2026-05-21 source ↗
AIME 2025 14.0 as of 2026-05-21 source ↗

Available quantizations

GGUF: llama.cpp's container format; the common local format, with k-quants from Q2 to Q8. Runs on llama.cpp, Ollama.
GPTQ: post-training 4-bit weight quantization for GPU serving. Runs on vLLM, SGLang, Transformers.
EXL2: ExLlamaV2's variable-bitrate format for consumer GPUs. Runs on ExLlamaV2.

Verified via the Hugging Face model tree ↗. Community quantizations change over time; the families shown are those with published weights at audit time.
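
For a sense of what quantization buys at this size, the sketch below estimates weight storage for 123B parameters at a few nominal bit widths. It is a back-of-the-envelope figure only: it ignores per-group scales and other quantization metadata, the KV cache, and activations, and real k-quant or EXL2 files use fractional average bit widths rather than the round numbers assumed here.

```python
TOTAL_PARAMS = 123_000_000_000  # 123B, from the specs above


def weight_gb(bits_per_weight: float) -> float:
    """Approximate weight storage in GB (10^9 bytes), weights only."""
    return TOTAL_PARAMS * bits_per_weight / 8 / 1e9


# Nominal bit widths, assumed for illustration.
for bits in (16, 8, 4):
    print(f"{bits:>2}-bit: {weight_gb(bits):.1f} GB")
# 16-bit: 246.0 GB
#  8-bit: 123.0 GB
#  4-bit: 61.5 GB
```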

Notable innovations

  • Single-node-deployable 123B dense model at 128K context
  • Explicit anti-hallucination fine-tuning target
  • Parallel and sequential function calling
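
To make the distinction between parallel and sequential function calling concrete, here is a minimal client-side sketch. Everything in it is hypothetical: the tool functions are stubs, and the `{"name": ..., "arguments": ...}` call shape is a generic illustration, not Mistral's API format. In real use the model emits the calls; here they are written by hand.

```python
from concurrent.futures import ThreadPoolExecutor


# Stub tools standing in for real functions (hypothetical).
def get_weather(city: str) -> dict:
    return {"city": city, "temp_c": 21}


def to_fahrenheit(temp_c: float) -> dict:
    return {"temp_f": temp_c * 9 / 5 + 32}


TOOLS = {"get_weather": get_weather, "to_fahrenheit": to_fahrenheit}


def run_call(call: dict) -> dict:
    """Dispatch one tool call to the matching Python function."""
    return TOOLS[call["name"]](**call["arguments"])


def run_parallel(calls: list) -> list:
    """Parallel: independent calls from one turn run concurrently; order is preserved."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(run_call, calls))


# Parallel: two independent lookups requested in a single turn.
parallel_results = run_parallel([
    {"name": "get_weather", "arguments": {"city": "Paris"}},
    {"name": "get_weather", "arguments": {"city": "Lyon"}},
])

# Sequential: the second call depends on the first call's result.
first = run_call({"name": "get_weather", "arguments": {"city": "Paris"}})
second = run_call({"name": "to_fahrenheit", "arguments": {"temp_c": first["temp_c"]}})

print(parallel_results)
print(second)
```

The practical difference is the number of model round trips: parallel calls resolve in one turn, while each sequential step needs the previous result returned to the model before the next call can be formed.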

Sources