Command A+

Cost

$0.00 / Mtok input

$0.00 / Mtok output

· as of 2026-05-21

source ↗

Speed

212.2 tok/sec output

157 ms TTFT

· as of 2026-05-21

source ↗
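The two speed figures can be combined into a rough end-to-end latency estimate. This is a minimal sketch that assumes output throughput stays constant for the whole response and ignores network and queueing overhead. The function name `estimated_latency_s` is hypothetical, for illustration only.

```python
# Rough latency estimate from the listed speed figures.
# Assumptions: constant decode throughput; no network/queueing overhead;
# the first token is counted in the throughput term (a slight overestimate).
TTFT_S = 0.157            # 157 ms time to first token
OUTPUT_TOK_PER_S = 212.2  # output throughput, tokens per second

def estimated_latency_s(output_tokens: int) -> float:
    """Time to first token plus steady-state generation time."""
    return TTFT_S + output_tokens / OUTPUT_TOK_PER_S

for n in (100, 500, 2000):
    print(f"{n:>5} output tokens: ~{estimated_latency_s(n):.2f} s")
```

Under these assumptions, a 500-token answer lands at roughly 2.5 seconds, most of which is generation time rather than TTFT.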

Architecture

Schema-generated from data/models.yaml. Every label is auditable against the model's sources.

Specs

Architecture: moe
Total params: 218B
Active params: 25B
Context window: 128K tokens
Attention: unknown
Position encoding: unknown
Post-training: sft, rlhf
OSI-approved: yes
Data released: no
Training code: not released
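The gap between total and active parameters is what makes the MoE design cheaper to run than a dense model of the same size. The sketch below is a back-of-envelope calculation that assumes the common rule of thumb of about 2 FLOPs per active parameter per token for a forward pass, ignoring attention cost over the context. It is an approximation, not a measured figure.

```python
# Back-of-envelope view of the MoE split in the specs.
# Assumption: ~2 FLOPs per active parameter per generated token
# (forward pass only; attention over long contexts is ignored).
TOTAL_PARAMS = 218e9
ACTIVE_PARAMS = 25e9

active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
flops_per_token = 2 * ACTIVE_PARAMS        # MoE: only routed experts run
dense_flops_per_token = 2 * TOTAL_PARAMS   # hypothetical dense model of equal size

print(f"active fraction: {active_fraction:.1%}")
print(f"~{flops_per_token:.1e} FLOPs/token vs ~{dense_flops_per_token:.1e} if dense")
```

Roughly one parameter in nine participates in each token, so per-token compute is closer to that of a ~25B dense model.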

Benchmarks

Each score carries the date it was published; we never infer or interpolate missing scores.

General reasoning

GPQA-Diamond

76.1

as of 2026-05-21

source ↗

Available quantizations

GGUF: llama.cpp's container format and the most common format for local inference, with k-quants from Q2 to Q8. Runs on llama.cpp and Ollama.

FP8: 8-bit floating point, often shipped as a native release for Hopper / Blackwell GPUs. Runs on vLLM, SGLang, and TensorRT-LLM.

Verified via the Hugging Face model tree ↗. Community quantizations change over time; the families shown are those with published weights at audit time.
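Choosing a quantization mostly comes down to whether the weights fit in memory. The sketch below estimates weight storage from nominal bits per weight. It assumes all 218B parameters must be resident (MoE routing reduces compute, not storage) and ignores quantization scales, KV cache, and runtime overhead, so real files and memory use will be somewhat larger. The helper `weight_gb` is hypothetical, for illustration only.

```python
# Approximate weight storage for the full 218B-parameter checkpoint.
# Assumptions: every expert is kept in memory; nominal bits per weight;
# no allowance for quantization scales, KV cache, or runtime overhead.
TOTAL_PARAMS = 218e9

def weight_gb(bits_per_weight: float, params: float = TOTAL_PARAMS) -> float:
    """Weight storage in decimal gigabytes."""
    return params * bits_per_weight / 8 / 1e9

for label, bits in [("16-bit (reference)", 16), ("8-bit (FP8 / Q8)", 8), ("4-bit", 4)]:
    print(f"{label:>18}: ~{weight_gb(bits):.0f} GB")
```

Even at 4 bits the weights alone need on the order of 109 GB, which is why multi-GPU or large unified-memory setups are the realistic targets for local deployment.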

Notable innovations

· First Apache 2.0 Cohere model
· W4A4 4-bit native distribution
· Native citations via lossless quantization claim
· Sovereign-deployment focus (air-gapped, on-prem)

Lineage

First Apache 2.0 Cohere model; targets sovereign / air-gapped enterprise deployments.

Derived from

Command A 2025-03-13
