The Open-Source AI Stack

Grants · Project grant · Global

Frontier AI benchmark pilot

Regrant from Leopold Aschenbrenner supporting Epoch AI's pilot of a new frontier AI benchmark, building on their earlier compute-trends and capability-tracking work.

Epoch AI is a research group that tracks frontier-AI compute, capability, and economic trends, and develops evaluation benchmarks. Its public datasets include the Notable AI Models database (training compute, dataset size, parameter counts across hundreds of models) and the AI accelerator database. Its benchmark line, hosted at epoch.ai/benchmarks, includes FrontierMath alongside derived metrics.

FrontierMath is the centerpiece: a set of approximately 350 original mathematics problems written by working research mathematicians, spanning computational number theory through abstract algebraic geometry. Problems are graded into tiers. Tier 4 comprises 50 research-level problems, including 2 public problems and a 20-question private holdout, and was designed at a symposium of leading mathematicians. Under the evaluation protocol, the model submits a Python `answer()` function that returns the solution (typically an integer or sympy object), which is checked programmatically. The benchmark was developed with OpenAI funding, and OpenAI retains exclusive access to part of the holdout set. Per the FrontierMath paper (arXiv:2411.04872), problems typically require hours to days for an expert mathematician to solve.
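
To make the `answer()` protocol concrete, here is a minimal sketch of how a programmatic check of this kind could work. It is not Epoch AI's actual harness: the `grade_submission` helper, the toy problem, and the use of plain equality are all assumptions for illustration, and a real grader would run submissions in a sandbox with time limits.

```python
def grade_submission(source: str, expected) -> bool:
    """Run submitted code that defines answer() and compare its result to expected.

    Hypothetical helper: the real FrontierMath grader is not reproduced here.
    """
    namespace = {}
    try:
        # Assumption: the submission is trusted enough to exec directly.
        # A production harness would isolate this in a sandboxed process.
        exec(source, namespace)
        result = namespace["answer"]()
    except Exception:
        # Missing answer(), syntax errors, and runtime errors all count as wrong.
        return False
    # Exact equality works for integers; sympy objects also support ==.
    return result == expected


# Toy stand-in for a benchmark problem: the sum of the primes below 100.
submission = '''
def answer():
    return sum(
        p for p in range(2, 100)
        if all(p % d for d in range(2, int(p ** 0.5) + 1))
    )
'''

print(grade_submission(submission, 1060))  # prints True
```

Because the answer is a single exactly checkable object, grading needs no human judgment of the model's reasoning, which is what allows the holdout problems to be scored automatically.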

The Manifund regrant of $200,000 in September 2024 from regrantor Leopold Aschenbrenner supported this frontier benchmark pilot. The grant predated FrontierMath's November 2024 public launch and the subsequent extension into Tier 4. Epoch AI's 2025 impact report also describes the Epoch Capabilities Index, which combines scores across many benchmarks into a single capability metric to address the rapid saturation of individual evaluations.

Within the open-source AI stack, FrontierMath and the Epoch Capabilities Index sit at the evaluation layer. They serve as a public yardstick for frontier reasoning capability beyond the saturated MATH and GSM8K benchmarks, while Epoch's compute-trends work supplies the training-compute figures against which capability progress is plotted.

Recipient

Epoch AI

Funder

Manifund · foundation · US

Operates an AI safety regranting program that gives expert regrantors $100K+ budgets to make fast, low-friction grants to early-stage technical and policy projects.

Primary source

https://manifund.substack.com/p/manifund-2025-regrants
