Variable-binding interpretability infrastructure

Raphaël Millière works on the cognitive science of large neural networks. At the time of the December 17, 2024 Cosmos award he was an Assistant Professor in Philosophy of Artificial Intelligence at Macquarie University; he has since moved to the University of Oxford's Institute for Ethics in AI. His research program treats variable binding, the ability to temporarily associate symbols with values while preserving their identity, as the question that decides whether transformer models can perform genuinely symbolic computation.

The technical work, published as "How Do Transformers Learn Variable Binding in Symbolic Programs?" on arXiv (2505.20896), trains transformers on synthetic programs of variable assignments and dereferences, then uses mechanistic interpretability methods, including causal interventions, to reverse-engineer how the model implements the task. The paper identifies three training phases: random constant prediction, a shallow heuristic that prefers early assignments, and finally the emergence of a systematic dereferencing mechanism. The trained model uses the residual stream as an addressable memory space, with specific attention heads routing variable values across token positions and layers.
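To make the task concrete, here is a minimal Python sketch of the kind of program described above. The line format, the function names, and the "first constant" baseline are illustrative assumptions, not the paper's actual data format, code, or exact heuristic. The sketch only contrasts a shortcut that looks at an early assignment with systematic dereferencing along a chain of bindings.

```python
import random


def make_program(num_lines=6, seed=0):
    """Build a toy program as (name, rhs) pairs, e.g. ('a', '3') or ('b', 'a').

    Hypothetical format: each variable is assigned exactly once, either to a
    single-digit constant or to a previously defined variable.
    """
    rng = random.Random(seed)
    names = list("abcdefgh")
    rng.shuffle(names)
    lines, defined = [], []
    for name in names[:num_lines]:
        if defined and rng.random() < 0.5:
            rhs = rng.choice(defined)       # copy an existing variable
        else:
            rhs = str(rng.randint(0, 9))    # bind a numeric constant
        lines.append((name, rhs))
        defined.append(name)
    return lines


def render(lines):
    """Render the program as text, one assignment per line."""
    return "\n".join(f"{name} = {rhs}" for name, rhs in lines)


def dereference(lines, query):
    """Systematic solution: follow bindings until a constant is reached."""
    env = {}
    for name, rhs in lines:
        env[name] = env[rhs] if rhs in env else int(rhs)
    return env[query]


def first_constant_heuristic(lines):
    """Assumed stand-in for a shallow shortcut: answer with the earliest
    constant in the program, ignoring which variable was queried."""
    for _, rhs in lines:
        if rhs.isdigit():
            return int(rhs)
    return None


program = [("a", "3"), ("b", "7"), ("c", "b"), ("d", "c")]
print(render(program))
print("true value of d:", dereference(program, "d"))
print("shortcut guess:", first_constant_heuristic(program))
```

On the hand-written program, the shortcut answers with the constant bound to a, while following the chain d, c, b gives the value actually bound to d. A model that has only learned the shortcut fails on exactly these cases.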

Variable binding has long been a dividing line in cognitive science between connectionist and classical-symbolic accounts. Showing that transformers learn to implement it without dedicated architecture, and that the implementation is legible to interpretability tools, places the work at the evaluation and weights cross-layers of the open stack. The grant falls in the Cosmos fast-grant range of $1K to $10K. The deliverable is open interpretability infrastructure rather than a closed product, consistent with the institute's emphasis on artifacts other researchers can build on.

Recipient

Raphaël Millière

Funder

Cosmos Institute · foundation · US

Backs philosopher-builders making prototypes, essays, and projects at the intersection of AI and human flourishing, with emphasis on reason, decentralization, and individual autonomy.

Primary source

https://blog.cosmos-institute.org/p/announcing-the-second-cohort-of-cosmos
