Subquadratic — Efficiency is Intelligence

All your context. Always available.

Reason across millions of tokens in one prompt: entire repos, whole artifacts, and long-running agent state, with room to spare at a fraction of the cost.

012M

Python source code

The entire 3.13 standard library

~5.1M

Six months of React PRs

~1,050 pull requests against the React codebase

~7.5M

~ Approximate token counts.

Not just another model.An architectural breakthrough.

SubQ is the first model built on a fully sub-quadratic sparse-attention architecture. LLMs today waste compute by processing every possible relationship between words, but only a small fraction of these relationships matter.

SubQ finds and focuses only on those, ensuring compute is used where it matters most. At 12M tokens, this reduces attention compute almost 1,000×, changing the way LLMs scale.

A leader in long-context retrieval and reasoning tasks

Long context retrieval

SubQ has near-perfect performance on single-fact retrieval and multi-task retrieval, both at scale.

Multi-task retrievalRULER (128K)

99.12%

128K

Single-fact retrievalNeedle-in-a-haystack (1M–12M)

100%

98%

12M

Reasoning & knowledge

SubQ balances long-context retrieval without compromising on reasoning and knowledge.

Benchmark	SubQ 1.1 Small	GPT-5.5	Opus 4.8	Sonnet 4.6	GPT-5.4-mini	GPT-5.4-nano	Haiku 4.5
Graduate-level science GPQA Diamond · pass@1	85.4	93.2	92	87.5	87.5	81.7	67.2
Agentic finance AutomationBench	13%	18%	16%	8%	0%	n/r	3%
Competitive programming LiveCodeBench v6 · pass@4	89.7	92	92.2	88.9	78.6	78.2	69.7

n/r = result not reported by the model provider

Unrivaled efficiency

SubQ uses 64.5x less compute than dense attention, and is 56× faster than FlashAttention-2 at 1M-token context.

Compute comparison: dense O(n²) attention reaches 252 PFLOP per layer at 1M tokens, while SubQ's SSA O(n) attention stays near-flat — up to 64× less compute.

Third-party validated results →Technical report →

Two ways to use SubQ.

API

For developers and teams

The full-context API for developers and enterprise teams. Process full repositories and pipeline states in a single API call at linear cost.

→ 12M token context window
→ Streaming + tool use
→ OpenAI-compatible endpoints

Request API access →

Code

For coding agents

The long-context layer for coding agents. Plug into Claude Code, Codex, and Cursor to map codebases, gather context, and answer token-heavy questions faster.

→ Auto-redirects expensive model turns
→ One-line install

Request SubQ Code access →

We built the architecture the industry said wasn't possible.

Subquadratic is a frontier AI research and infrastructure company building a new class of LLMs. While other major labs focus on incremental improvements to Transformer models, we're pushing foundational change at the model architecture level — enabling large-context, multi-modal inference that scales efficiently where transformers can't.

Built by researchers from

Meta
Google
Oxford
Cambridge
BYU

The first modelbuilt for long‑context tasks

All your context. Always available.

Not just another model.An architectural breakthrough.