Cost
OpenAI API · as of 2026-05-21
Why people cared
OpenAI o1 was the first publicly available frontier reasoning model and the existence proof for spending extra inference-time compute on a private chain of thought before answering. The September 2024 preview release and December 2024 GA established the template: the model produces a hidden reasoning trace (billed at output rates) before the user-visible answer, with benchmark scores on GPQA-Diamond and AIME that materially exceeded those of GPT-4o, a model in the same architecture-and-data class. The pricing structure was new to the market: at $60 per million output tokens, with reasoning traces consuming most of the output budget, a single hard problem could cost dollars rather than fractions of a cent.
That created two follow-on stories. First, the open community responded with DeepSeek R1 four months later under an MIT license, demonstrating that the reasoning recipe was within reach of organizations not at OpenAI's scale. Second, the reasoning-vs-cost framing made "thinking budget" a first-class deployment knob: subsequent OpenAI releases (o1-mini, o3, o3-mini) and competitor responses (Claude 3.7 extended thinking, Gemini 2.5 thinking) all let the developer dial how much inference compute to spend per request.
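The pricing arithmetic can be made concrete with a minimal sketch. The $60 per million output tokens is the figure quoted above; the token counts are hypothetical values chosen for illustration, and input-token charges are ignored.

```python
# Output price quoted in the text above: $60 per million output tokens.
OUTPUT_PRICE_PER_MILLION_USD = 60.0


def output_cost_usd(reasoning_tokens: int, answer_tokens: int,
                    price_per_million: float = OUTPUT_PRICE_PER_MILLION_USD) -> float:
    """Estimate the output-side cost of one request.

    Hidden reasoning tokens and visible answer tokens are both billed
    at the output rate, so they are simply summed.
    """
    billed_tokens = reasoning_tokens + answer_tokens
    return billed_tokens * price_per_million / 1_000_000


# Hypothetical token counts (assumptions, not measurements).
easy = output_cost_usd(reasoning_tokens=500, answer_tokens=200)
hard = output_cost_usd(reasoning_tokens=30_000, answer_tokens=1_000)

print(f"easy: ${easy:.3f}  hard: ${hard:.2f}")
```

Under these assumed counts the short request costs a few cents and the long one approaches two dollars, almost all of it for tokens the user never sees.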
Architecture
data/models.yaml. Every label is auditable against the model's sources.
Specs
- Architecture: unknown
- Total params: not disclosed
- Active params: not disclosed
- Context window: not verified
- Attention: unknown
- Position encoding: unknown
- Post-training: RLHF
- OSI-approved: no
- Data released: no
- Training code: not released
Benchmarks
Each score carries the date it was published; we never infer or interpolate missing scores.
General reasoning
| Benchmark | Score | Date | Source |
| --- | --- | --- | --- |
| MMLU-Pro | 84.1 | as of 2026-05-21 | source ↗ |
| GPQA-Diamond | 77.3 | as of 2024-12-05 | source ↗ |
Code
| Benchmark | Score | Date | Source |
| --- | --- | --- | --- |
| LiveCodeBench | 67.9 | as of 2026-05-21 | source ↗ |
Recommended use cases
- math reasoning
- code reasoning
- complex multi-step problems
Available quantizations
None. The weights are not distributed, so there are no public quantizations.
Notable innovations
- Inference-time reasoning compute
- Chain-of-thought as a training target
Known limitations
- Reasoning traces are billed as output tokens but are not visible to the user, so cost per problem can be hard to predict. source ↗
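One way to reason about this limitation is to bound the cost up front and measure the hidden share afterwards. The sketch below assumes the $60 per million output price quoted earlier on this page. The token cap and the usage numbers are hypothetical, and the function names are illustrative rather than part of any API.

```python
def worst_case_output_cost_usd(max_output_tokens: int,
                               price_per_million: float = 60.0) -> float:
    """Upper bound on output cost if a request is capped at max_output_tokens
    (hidden reasoning plus visible answer)."""
    return max_output_tokens * price_per_million / 1_000_000


def hidden_share(billed_output_tokens: int, reasoning_tokens: int) -> float:
    """Fraction of billed output tokens that were hidden reasoning."""
    if reasoning_tokens > billed_output_tokens:
        raise ValueError("reasoning tokens cannot exceed billed output tokens")
    if billed_output_tokens == 0:
        return 0.0
    return reasoning_tokens / billed_output_tokens


# Hypothetical cap and usage record (assumptions for illustration only).
bound = worst_case_output_cost_usd(max_output_tokens=25_000)
share = hidden_share(billed_output_tokens=12_000, reasoning_tokens=10_500)

print(f"worst case: ${bound:.2f}, hidden share: {share:.1%}")
```

A cap makes the worst case predictable, though the actual spend below that cap still depends on how long the model chooses to reason.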
Lineage
First public reasoning model from OpenAI.