The Open-Source AI Stack

Grants · Project grant · UK

AISI Inspect framework

Open-sourced evaluation harness from UK AISI; available for community use.

Inspect is a Python framework for large language model evaluations, developed by the UK AI Security Institute and Meridian Labs and open-sourced in April 2024 under the MIT license. The codebase (UKGovernmentBEIS/inspect_ai) is roughly 99.8% Python, with a TypeScript and React web UI for viewing eval results. Built-in facilities cover prompt engineering, tool use, multi-turn dialog, and model-graded evaluation, and the framework can be extended through third-party Python packages.

The framework abstracts over model providers, so a single eval can run against Anthropic, OpenAI, Google, or any Hugging Face-hosted model with only a configuration change. Inspect Evals, the companion repository, ships more than 200 prebuilt evaluations, including agent benchmarks such as GAIA, SWE-Bench, GDM CTF, and Cybench. The package can be installed with either pip or uv, and the contributor stack uses Quarto for documentation alongside standard Python tooling (Ruff, MyPy, pytest).
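To make the provider abstraction concrete, here is a minimal, standard-library-only sketch of the idea. It is not Inspect's actual API: the names `Sample`, `PROVIDERS`, `run_eval`, and the two mock providers are hypothetical, and the "provider/model" string format is assumed here purely for illustration. The point is that the eval definition stays fixed while the model string is the only thing that changes.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# A "provider" in this toy sketch is any function mapping
# (model_name, prompt) to a completion string. Real providers
# would call a remote API; these are hypothetical stand-ins.
Provider = Callable[[str, str], str]


@dataclass(frozen=True)
class Sample:
    prompt: str
    target: str


def constant_provider(model_name: str, prompt: str) -> str:
    """Mock provider that answers "4" regardless of the prompt."""
    return "4"


def lookup_provider(model_name: str, prompt: str) -> str:
    """Mock provider that answers from a small hard-coded table."""
    answers = {"What is 2 + 2?": "4", "What is 2 + 3?": "5"}
    return answers.get(prompt, "")


# Hypothetical registry keyed by provider prefix.
PROVIDERS: Dict[str, Provider] = {
    "mock_a": constant_provider,
    "mock_b": lookup_provider,
}


def run_eval(model: str, samples: List[Sample]) -> float:
    """Run the same samples against whichever provider `model` names.

    `model` is assumed to look like "provider/model-name".
    Returns exact-match accuracy in [0, 1].
    """
    provider_key, _, model_name = model.partition("/")
    if provider_key not in PROVIDERS:
        raise ValueError(f"unknown provider: {provider_key!r}")
    provider = PROVIDERS[provider_key]
    correct = sum(
        provider(model_name, s.prompt).strip() == s.target for s in samples
    )
    return correct / len(samples)


# The eval itself is defined once, independent of any provider.
SAMPLES = [Sample("What is 2 + 2?", "4"), Sample("What is 2 + 3?", "5")]

if __name__ == "__main__":
    # Switching models is only a change to this string.
    for model in ("mock_a/small", "mock_b/large"):
        print(model, run_eval(model, SAMPLES))
```

In this sketch, moving the eval from one backend to another means editing one string, which is the property the paragraph above attributes to Inspect. Anyone using the real framework should consult its documentation for the actual task, solver, and model interfaces.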

Adoption has been wide for an evaluation harness with under two years of public history. AISI uses Inspect for nearly all of its automated frontier-model evaluations. External adopters include Anthropic, DeepMind, and xAI (Grok), along with other safety institutes and organizations such as Apollo Research and METR. Over 50 external contributors have committed to the framework, and the prebuilt eval catalog continues to receive community submissions; the most recent is Airside Labs' Pre-Flight Benchmark, added to the community evaluations package.

The structural argument for an open evaluation framework maintained by a government lab is the absence of a commercial incentive to favor any model family. Most existing eval frameworks (HELM, LM Eval Harness, OpenAI Evals) are academic, vendor-affiliated, or community-maintained without institutional backing. AISI's funding model and its mandate to evaluate frontier systems for the UK government give Inspect a different sustainability profile. Publishing the harness itself, rather than only the eval results, also lets external teams reproduce AISI's methodology instead of simply trusting its conclusions.

Recipient

Open release

Funder

UK AI Security Institute (Alignment Project) · government · UK

Frontier AI alignment research at the UK national-lab level. Released the open-source Inspect evaluation framework.

Primary source

https://www.aisi.gov.uk/work
