Perplex

Cosmos Institute awarded Steven Molotnikov a grant on August 15, 2025 for Perplex, with funding in the $1K to $10K range plus compute credits. The grant came through the inaugural AI x Truth-Seeking cohort of 27 winners, co-funded by Cosmos Institute and FIRE under a $1M program.

Molotnikov is an AI alignment researcher based in Cambridge with prior work as an interdisciplinary engineer across robotics, autonomous systems, and energy. He studied at UC San Diego and maintains an engineering portfolio at prints.blue. His current research focuses on detecting hidden objectives in language models, an area where peer work includes Anthropic's "Auditing language models for hidden objectives" (arXiv 2503.10965) and the broader scheming and goal-misgeneralization literature.

Perplex surfaces hidden goals in closed AI systems by probing them with open-weight reference models. The method gives independent reviewers a path to audit proprietary deployments without insider access to weights or training data, which is one of the few avenues available to researchers outside the labs that train frontier models. The approach treats the disparity between what closed model APIs reveal and what an open reference model would reveal as a measurable signal.
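The idea of treating the closed-versus-open disparity as a measurable signal can be illustrated with a minimal sketch. This is not Perplex's actual method, which the source does not specify. It assumes, hypothetically, that a closed API returns per-token log-probabilities for a response and that an open-weight reference model has scored the same text over aligned spans. The names `ProbeResult`, `disparity`, and `flag_outliers`, and the threshold value, are invented for illustration.

```python
import math
from dataclasses import dataclass
from statistics import mean
from typing import List, Sequence


@dataclass
class ProbeResult:
    """One probe: the same text scored by both models (hypothetical structure)."""
    prompt: str
    closed_logprobs: Sequence[float]     # assumed: per-token log-probs from the closed API
    reference_logprobs: Sequence[float]  # assumed: same spans scored by the open reference


def perplexity(logprobs: Sequence[float]) -> float:
    """Standard perplexity: exp of the negative mean log-probability."""
    return math.exp(-mean(logprobs))


def disparity(result: ProbeResult) -> float:
    """Mean per-token log-prob gap (closed minus reference)."""
    if len(result.closed_logprobs) != len(result.reference_logprobs):
        raise ValueError("scores must be aligned to the same spans")
    gaps = [c - r for c, r in zip(result.closed_logprobs, result.reference_logprobs)]
    return mean(gaps)


def flag_outliers(results: Sequence[ProbeResult], threshold: float = 1.0) -> List[str]:
    """Return prompts whose absolute disparity exceeds an illustrative threshold."""
    return [r.prompt for r in results if abs(disparity(r)) > threshold]


# Toy numbers, made up purely to exercise the functions.
probes = [
    ProbeResult("benign question", [-1.0, -1.0], [-1.0, -1.0]),
    ProbeResult("probing question", [-0.5, -0.5], [-3.0, -2.0]),
]
flagged = flag_outliers(probes)
```

In this sketch a large gap only marks a prompt for closer review. Real closed and open models use different tokenizers, so aligning scores to the same text spans is itself an assumption that a working tool would have to handle.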

Within the cohort, Perplex sits in the safety-guardrails and evaluation layers, paired thematically with Kunvar Thaman's Reward Hacking Benchmark, another tool aimed at probing closed systems from outside. The lock-in vector it addresses is the audit asymmetry between the labs that train models and the third parties who use or regulate them.

Recipient

Steven Molotnikov

Funder

Cosmos Institute · foundation · US

Backs philosopher-builders making prototypes, essays, and projects at the intersection of AI and human flourishing, with emphasis on reason, decentralization, and individual autonomy.

Primary source

https://blog.cosmos-institute.org/p/introducing-the-first-cohort-of-ai
