API
For developers and teamsThe full-context API for developers and enterprise teams. Process full repositories and pipeline states in a single API call at linear cost.
- → 12M token context window
- → Streaming + tool use
- → OpenAI-compatible endpoints
SubQ is a sub-quadratic LLM built for multi-million token reasoning, allowing agents to work across full repositories, long histories, and persistent state without quality loss.
Use Cases
Reason across millions of tokens in one prompt: entire repos, whole artifacts, and long-running agent state, with room to spare at a fraction of the cost.
~ Approximate token counts.
Architecture
SubQ is the first model built on a fully sub-quadratic sparse-attention architecture. LLMs today waste compute by processing every possible relationship between words, but only a small fraction of these relationships matter.
SubQ finds and focuses only on those, ensuring compute is used where it matters most. At 12M tokens, this reduces attention compute almost 1,000×, changing the way LLMs scale.
Benchmarks
SubQ has near-perfect performance on single-fact retrieval and multi-task retrieval, both at scale.
SubQ balances long-context retrieval without compromising on reasoning and knowledge.
| Benchmark | SubQ 1.1 Small | ||||||
|---|---|---|---|---|---|---|---|
Graduate-level science GPQA Diamond · pass@1 | 85.4 | 93.2 | 92 | 87.5 | 87.5 | 81.7 | 67.2 |
Agentic finance AutomationBench | 13% | 18% | 16% | 8% | 0% | n/r | 3% |
Competitive programming LiveCodeBench v6 · pass@4 | 89.7 | 92 | 92.2 | 88.9 | 78.6 | 78.2 | 69.7 |
SubQ uses 64.5x less compute than dense attention, and is 56× faster than FlashAttention-2 at 1M-token context.
Products
The full-context API for developers and enterprise teams. Process full repositories and pipeline states in a single API call at linear cost.
The long-context layer for coding agents. Plug into Claude Code, Codex, and Cursor to map codebases, gather context, and answer token-heavy questions faster.
About
Subquadratic is a frontier AI research and infrastructure company building a new class of LLMs. While other major labs focus on incremental improvements to Transformer models, we're pushing foundational change at the model architecture level — enabling large-context, multi-modal inference that scales efficiently where transformers can't.
Built by researchers from
Early Access
Join the private preview.