Epoch AI is a research group that tracks frontier-AI compute, capability, and economic trends, and develops evaluation benchmarks. Its public datasets include the Notable AI Models database (training compute, dataset size, parameter counts across hundreds of models) and the AI accelerator database. Its benchmark line, hosted at epoch.ai/benchmarks, includes FrontierMath alongside derived metrics.
FrontierMath is the centerpiece: a set of approximately 350 original mathematics problems written by working research mathematicians, spanning computational number theory through abstract algebraic geometry. Problems are grouped into difficulty tiers. Tier 4 comprises 50 research-level problems, developed at a symposium of leading mathematicians, of which 2 are public and 20 form a private holdout. Under the evaluation protocol, the model submits a Python `answer()` function that returns the solution (typically an integer or SymPy object), which is checked programmatically. The benchmark was developed with OpenAI funding; OpenAI has access to much of the problem set, while Epoch AI retains a holdout set that OpenAI cannot access. Per the FrontierMath paper (arXiv:2411.04872), problems typically take an expert mathematician hours to days to solve.
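The answer-checking step described above can be illustrated with a minimal sketch. This is not Epoch AI's actual harness: the helper name `grade_submission`, the use of exact equality, and the toy problem are all assumptions made for illustration, and a real grader would run submissions in a sandbox with a time limit.

```python
from typing import Any


def grade_submission(source: str, expected: Any) -> bool:
    """Run submitted code and check that answer() returns the expected value.

    Hypothetical helper, not Epoch AI's harness. A real grader would
    sandbox the code and enforce a time limit; exec() here does neither.
    """
    namespace: dict[str, Any] = {}
    try:
        exec(source, namespace)         # defines the submitted answer()
        result = namespace["answer"]()  # call it with no arguments
    except Exception:
        return False                    # a crash or a missing answer() fails
    # Exact equality; for SymPy objects `==` tests structural equality only.
    return bool(result == expected)


# Toy problem (not from FrontierMath): the sum of the primes below 100.
submission = '''
def answer():
    return sum(n for n in range(2, 100) if all(n % d for d in range(2, n)))
'''

print(grade_submission(submission, 1060))  # prints True
```

Because each answer is a single exact value, grading needs no human judgment, which is what makes this style of programmatic checking possible.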
A $200,000 Manifund regrant in September 2024, made by regrantor Leopold Aschenbrenner, supported this frontier benchmark pilot. The grant predated FrontierMath's November 2024 public launch and its later extension to Tier 4. Epoch AI's 2025 impact report also describes the Epoch Capabilities Index, which combines scores across many benchmarks into a single capability metric to address the rapid saturation of individual evaluations.
Within the open-source AI stack, FrontierMath and the Epoch Capabilities Index sit at the evaluation layer. They function as a public yardstick for measuring frontier reasoning capability beyond the saturated MATH and GSM8K benchmarks, with Epoch's compute-trends work providing the denominator (training compute) against which capability progress is plotted.
Recipient
Epoch AI
Funder
Manifund · foundation · US
Operates an AI safety regranting program that gives expert regrantors $100K+ budgets to make fast, low-friction grants to early-stage technical and policy projects.
More from Manifund
- China and AI deep coverage 2024-06-01 · Undisclosed
Support for ChinaTalk's coverage of Chinese AI policy and the DeepSeek lab, ahead of the curve on what became major US policy debates.
- Scoping Developmental Interpretability (Timaeus first funding) 2024-03-01 · Undisclosed
First funding for the Timaeus team's developmental interpretability research program. The Manifund regrant accelerated their research by months and seeded what is now an established alignment lab.