GPT-4o · Models · The Open-Source AI Stack

Cost

$2.50 / Mtok input

$10.00 / Mtok output

OpenAI API · as of 2026-05-21

via Artificial Analysis ↗

Speed

131.6 tok/sec output

583 ms TTFT

OpenAI API · as of 2026-05-21

via Artificial Analysis ↗

Why people cared

GPT-4o was the May 2024 release that established native multimodal as the closed-frontier standard. "o" stood for "omni": a single pretrained backbone handling text, audio, and vision tokens, rather than separate models stitched together with adapters. The real-time voice mode pushed conversational latency below 400ms, a threshold past which interactions feel synchronous rather than turn-based, and the demo that accompanied the launch became the reference point for what AI voice products were expected to feel like. Pricing dropped to $2.50 per million input tokens and $10 per million output, a roughly 12x reduction from the original GPT-4. The open-weights world spent the second half of 2024 chasing this combination: Llama 3.1, Qwen 2.5, and DeepSeek V3 all positioned themselves as cheaper or comparable on text, but native multimodal at GPT-4o's latency remained out of reach until well into 2025. The model also became the default backend for ChatGPT free users, which made its capabilities and quirks visible to a global user base in a way no prior frontier model had been.

Architecture

Schema-generated from data/models.yaml. Every label is auditable against the model's sources.

Specs

Architecture: unknown
Total params: not disclosed
Active params: not disclosed
Context window: not verified
Attention: unknown
Position encoding: unknown
Post-training: rlhf
OSI-approved: no
Data released: no
Training code: not released

Benchmarks

Each score carries the date it was published; we never infer or interpolate missing scores.

General reasoning

MMLU-Pro	74.8	as of 2026-05-21	source ↗
GPQA-Diamond	54.3	as of 2026-05-21	source ↗

Code

LiveCodeBench

30.9

as of 2026-05-21

source ↗

Math

MATH	75.9	as of 2026-05-21	source ↗
AIME 2024	15.0	as of 2026-05-21	source ↗
AIME 2025	6.0	as of 2026-05-21	source ↗

Recommended use cases

multimodal chat
voice interfaces
general-purpose API workloads

Available quantizations

None. The weights are not distributed, so there are no public quantizations.

Notable innovations

· Native audio+vision+text training
· Sub-400ms voice latency

Known limitations

· Architecture and training data not disclosed; benchmarks reflect what OpenAI chose to publish. source ↗

Lineage

Successor to GPT-4 in OpenAI's product line; trained as a new native-multimodal pretrain not as a fine-tune of GPT-4.

Sources

GPT-4o announcement (OpenAI, May 13 2024) ↗