Inception Labs (Batch 4)

Inception Labs builds diffusion-based large language models, an approach distinct from the autoregressive transformer-decoder architecture that dominates the frontier. The company was founded by Stanford professor Stefano Ermon, whose academic research focuses on diffusion models. It was selected for AI Grant Batch 4 (announced October 2024), receiving a $250,000 SAFE plus Microsoft Azure and partner credits.

The technical approach replaces token-by-token, left-to-right generation with a diffusion process: the model starts from a noisy initial guess and iteratively denoises it toward the answer. The underlying network is still parameterized as a transformer, but it is trained to predict multiple tokens in parallel rather than one token at a time. The Mercury paper (arXiv:2506.17298) reports Mercury Coder Mini and Mercury Coder Small achieving 1109 tokens/sec and 737 tokens/sec respectively on a single NVIDIA H100, measured against speed-optimized autoregressive frontier baselines, while remaining within a comparable quality band on coding benchmarks. Mercury 2, announced in February 2026, extends the line with reasoning-class diffusion models.
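The contrast between the two decoding styles can be illustrated with a toy sketch. This is not Mercury's algorithm: it assumes a masked (absorbing-state) form of text diffusion, and the names `toy_denoiser`, `diffusion_decode`, `autoregressive_decode`, and `tokens_per_step` are hypothetical. The "network" here is an oracle that already knows the answer, so the only thing the sketch demonstrates is how the number of sequential forward passes differs.

```python
import random

MASK = "<mask>"

def toy_denoiser(seq, target, rng):
    """Hypothetical stand-in for a trained network.

    In one parallel pass it proposes a token and a confidence score for
    every masked position. A real model would predict these; this toy
    oracle simply reads them from `target` and draws a random confidence.
    """
    return {
        i: (target[i], rng.random())
        for i, tok in enumerate(seq)
        if tok == MASK
    }

def diffusion_decode(target, tokens_per_step=4, seed=0):
    """Iteratively unmask a fully masked sequence, several tokens per pass."""
    rng = random.Random(seed)
    seq = [MASK] * len(target)
    passes = 0
    while MASK in seq:
        proposals = toy_denoiser(seq, target, rng)  # one parallel forward pass
        passes += 1
        # Commit only the most confident proposals; the rest stay masked
        # and are revisited on the next pass.
        ranked = sorted(proposals, key=lambda i: proposals[i][1], reverse=True)
        for i in ranked[:tokens_per_step]:
            seq[i] = proposals[i][0]
    return seq, passes

def autoregressive_decode(target):
    """Baseline: one forward pass per token, strictly left to right."""
    seq, passes = [], 0
    for tok in target:
        seq.append(tok)
        passes += 1
    return seq, passes

if __name__ == "__main__":
    target = "def add ( a , b ) : return a + b".split()
    print(diffusion_decode(target, tokens_per_step=4))
    print(autoregressive_decode(target))
```

With the 12-token example and `tokens_per_step=4`, the toy diffusion loop finishes in 3 passes, while the autoregressive baseline needs 12. Real throughput depends on per-pass cost and output quality, which this sketch does not model.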

In November 2025, Inception closed a $50 million seed round led by Menlo Ventures, with participation from Mayfield, Innovation Endeavors, Microsoft's M12, Snowflake Ventures, Databricks Investment, and Nvidia's NVentures, plus angel checks from Andrew Ng and Andrej Karpathy. The capital is directed toward scaling the diffusion-LLM training stack and shipping Mercury into latency-sensitive production deployments such as real-time code completion and interactive agents.

Within the open-source AI stack, Inception sits at the training and weights layers. The diffusion-LLM track is a small but active branch of frontier LLM architecture research (alongside work at Google DeepMind on text diffusion); Mercury is the most widely deployed commercial example. The architectural distinction matters at the stack level because diffusion-LLM inference maps to a different runtime engine and scheduling profile than autoregressive decoding, which has downstream consequences for the inference-stack layer.

Recipient

Inception Labs

Funder

AI Grant (Friedman / Gross) · corporate · Global

Distributed AI research lab and accelerator backing early-stage AI startups and open-source projects with cash, compute, and Microsoft Azure credits.

Primary source

https://aigrant.org/
