Llama 3.2 90B Vision Instruct

Source-available Meta · 2024-09-25 · Llama 3.2 Community License

The frontier-tier Llama 3.2 vision-language checkpoint, built by attaching a cross-attention vision adapter to the Llama 3.1 70B text backbone. Meta positioned it as competitive with Claude 3 Haiku and GPT-4o-mini on image reasoning at release.

Cost

— / Mtok input

— / Mtok output

Together AI · as of 2026-05-19

via Artificial Analysis ↗

Architecture

Schema-generated from data/models.yaml. Every label is auditable against the model's sources.

Specs

Architecture: dense
Total params: not disclosed
Active params: 88.8B
Context window: 131K tokens
Attention: gqa
Position encoding: rope-llama3
Training hardware: H100
Post-training: sft, rlhf
OSI-approved: no
Data released: no
Training code: not released

Benchmarks

Each score carries the date it was published; we never infer or interpolate missing scores.

Available quantizations

FP8 8-bit float, frequently a native release on Hopper / Blackwell GPUs. runs on vLLM, SGLang, TensorRT-LLM

bitsandbytes On-the-fly NF4 / INT8 weight quantization inside Transformers. runs on Transformers

Verified via the Hugging Face model tree ↗. Community quantizations change over time; the families shown are those with published weights at audit time.

Notable innovations