The Open-Source AI Stack

GPT-4o

OpenAI · 2024-05-13 · Proprietary

Native-multimodal model with audio + vision + text in a single pretrained backbone. Pushed real-time voice latency to under 400ms; the multimodal benchmark anchor through 2024.

Cost

$2.50 / Mtok input
$10.00 / Mtok output

OpenAI API · as of 2026-05-21

via Artificial Analysis ↗
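
As a rough illustration of how the two listed rates combine, the sketch below estimates the cost of a single request. The function name and the example token counts are hypothetical; only the two per-million-token prices come from this page, and real bills may differ (cached-input discounts, batch pricing, and similar are not modeled).

```python
# Listed OpenAI API prices for GPT-4o, in USD per million tokens (from this page).
INPUT_USD_PER_MTOK = 2.50
OUTPUT_USD_PER_MTOK = 10.00


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request at the listed per-token rates.

    Hypothetical helper for illustration; not part of any OpenAI SDK.
    """
    input_cost = input_tokens / 1_000_000 * INPUT_USD_PER_MTOK
    output_cost = output_tokens / 1_000_000 * OUTPUT_USD_PER_MTOK
    return input_cost + output_cost


if __name__ == "__main__":
    # Assumed example request: 2,000 prompt tokens and 500 completion tokens.
    print(f"${estimate_cost_usd(2_000, 500):.4f}")  # prints $0.0100
```

At these rates an output token costs four times as much as an input token, so long completions tend to dominate the bill.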

Speed

131.6 tok/sec output
583 ms TTFT

OpenAI API · as of 2026-05-21

via Artificial Analysis ↗
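
The two speed figures can be combined into a back-of-the-envelope estimate of how long a full response takes: time to first token plus generation time at the listed throughput. This is a simplified sketch that assumes throughput stays constant for the whole response and ignores network variance; the function name and the 500-token example are hypothetical.

```python
# Measured figures listed on this page (OpenAI API, via Artificial Analysis).
TTFT_SECONDS = 0.583               # time to first token
OUTPUT_TOKENS_PER_SECOND = 131.6   # steady-state output throughput


def estimate_response_seconds(output_tokens: int) -> float:
    """Approximate wall-clock time to receive a full response.

    Simplifying assumption: constant throughput after the first token.
    """
    return TTFT_SECONDS + output_tokens / OUTPUT_TOKENS_PER_SECOND


if __name__ == "__main__":
    # Assumed example: a 500-token completion takes roughly 4.4 seconds.
    print(f"{estimate_response_seconds(500):.2f} s")
```

For short replies the TTFT term dominates; for long replies the throughput term does.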

Why people cared

GPT-4o was the May 2024 release that established native multimodal as the closed-frontier standard. "o" stood for "omni": a single pretrained backbone handling text, audio, and vision tokens, rather than separate models stitched together with adapters. The real-time voice mode pushed conversational latency below 400ms, a threshold past which interactions feel synchronous rather than turn-based, and the demo that accompanied the launch became the reference point for what AI voice products were expected to feel like. Pricing dropped to $2.50 per million input tokens and $10 per million output tokens, roughly a 12x reduction on input and a 6x reduction on output from the original GPT-4's $30 and $60. The open-weights world spent the second half of 2024 chasing this combination: Llama 3.1, Qwen 2.5, and DeepSeek V3 all positioned themselves as cheaper or comparable on text, but native multimodal at GPT-4o's latency remained out of reach until well into 2025. The model also became the default backend for ChatGPT free users, which made its capabilities and quirks visible to a global user base in a way no prior frontier model had been.

Architecture

[Architecture diagram: tokens in → Embedding (vocab not disclosed) → × N layers (architecture not disclosed; proprietary or undocumented) → Output projection → tokens out]
Schema-generated from data/models.yaml. Every label is auditable against the model's sources.

Specs

Architecture: unknown
Total params: not disclosed
Active params: not disclosed
Context window: not verified
Attention: unknown
Position encoding: unknown
Post-training: rlhf
OSI-approved: no
Data released: no
Training code: not released

Benchmarks

Each score carries the date it was published; we never infer or interpolate missing scores.

General reasoning

MMLU-Pro 74.8 as of 2026-05-21 source ↗
GPQA-Diamond 54.3 as of 2026-05-21 source ↗

Code

LiveCodeBench 30.9 as of 2026-05-21 source ↗

Math

MATH 75.9 as of 2026-05-21 source ↗
AIME 2024 15.0 as of 2026-05-21 source ↗
AIME 2025 6.0 as of 2026-05-21 source ↗
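
One way to read the "never infer or interpolate" policy is as a lookup that returns nothing for a missing benchmark instead of a guessed value. The sketch below is a hypothetical representation, not the site's actual schema: the class, field, and function names are assumptions, and only the scores and dates are copied from the list above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BenchmarkScore:
    name: str
    score: float
    as_of: str  # ISO date attached to the score on this page


# Scores copied from this page; the container layout itself is hypothetical.
GPT4O_SCORES = {
    s.name: s
    for s in [
        BenchmarkScore("MMLU-Pro", 74.8, "2026-05-21"),
        BenchmarkScore("GPQA-Diamond", 54.3, "2026-05-21"),
        BenchmarkScore("LiveCodeBench", 30.9, "2026-05-21"),
        BenchmarkScore("MATH", 75.9, "2026-05-21"),
        BenchmarkScore("AIME 2024", 15.0, "2026-05-21"),
        BenchmarkScore("AIME 2025", 6.0, "2026-05-21"),
    ]
}


def lookup(name: str) -> Optional[BenchmarkScore]:
    """Return the recorded score, or None when no score is listed.

    A missing entry is never estimated from neighboring benchmarks.
    """
    return GPT4O_SCORES.get(name)


if __name__ == "__main__":
    print(lookup("MATH"))       # the recorded entry
    print(lookup("HumanEval"))  # None: not listed on this page
```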

Recommended use cases

  • multimodal chat
  • voice interfaces
  • general-purpose API workloads

Available quantizations

None. The weights are not distributed, so there are no public quantizations.

Notable innovations

  • Native audio+vision+text training
  • Sub-400ms voice latency

Known limitations

  • Architecture and training data not disclosed; benchmarks reflect what OpenAI chose to publish. source ↗

Lineage

Successor to GPT-4 in OpenAI's product line; trained as a new native-multimodal pretrain, not as a fine-tune of GPT-4.

Sources