Subquadratic

The first model built for long-context tasks

SubQ is a sub-quadratic LLM built for 12M-token reasoning, allowing agents to work across full repositories, long histories, and persistent state without quality loss.

Request early access →
12M token reasoning
150 tokens per second
1/5 of other leading LLMs

All your context. Always available.

Reason across 12M tokens in one prompt: entire repos, months of PRs, and long-running agent state, with room to spare at one-fifth the cost.

0
12M

Python source code

The entire 3.13 standard library

~5.1M

Six months of React PRs

~1,050 pull requests against the React codebase

~7.5M

~ Approximate token counts.

Not just another model. An architectural breakthrough.

SubQ is the first model built on a fully sub-quadratic sparse-attention architecture. LLMs today waste compute by processing every possible relationship between words, but only a small fraction of these relationships matter.

SubQ finds and focuses only on those, ensuring compute is used where it matters most. At 12M tokens, this reduces attention compute almost 1,000×, changing the way LLMs scale.

Transformer

O(n²)

Quadratic attention — every token attends to every other token, wasting compute on irrelevant connections.

SubQ

O(n)

Linear sparse attention — only the relationships that matter are computed, making long context practical.

Technical report (coming soon)

A leader in long-context retrieval and coding tasks.

Benchmarks Gemini 3.1 Pro Opus 4.6 Opus 4.7 GPT-5.4 GPT-5.5 SubQ 1M-Preview
SWE-Bench Verified
Real-world software engineering ability
80.6% 80.8% 87.6% n/r n/r 81.8%
RULER @ 128K
Long-context accuracy across 13 tests
n/r 94.8%* n/r n/r n/r 95.6%
MRCR v2 (8-needle, 1M)
Multi-round coreference resolution in long contexts
26.3% 78.3% 32.2% 36.6% 74.0% 86.2%

n/r = result was not reported by the model provider · * = internally evaluated

SubQ results are third-party validated

Two ways to use SubQ.

API

For developers and teams

The full-context API for developers and enterprise teams. Process full repositories and pipeline states in a single API call at linear cost.

  • 12M token context window
  • Streaming + tool use
  • OpenAI-compatible endpoints
Request API access →

Code

For coding agents

The long-context layer for coding agents. Plug into Claude Code, Codex, and Cursor to map codebases, gather context, and answer token-heavy questions faster.

  • ~25% lower bill, 10× faster exploration
  • Auto-redirects expensive model turns
  • One-line install
Request SubQ Code access →

From the lab.

Partnerships

May 14, 2026

We're Partnering with LayerLens to Evaluate SubQ

Read more →

Product

May 5, 2026

Introducing SubQ: The First Fully Subquadratic LLM

Read more →

Technical

Updated May 15, 2026

How SSA Makes Long Context Practical

Read more →

We built the architecture the industry said wasn't possible.

Subquadratic is a frontier AI research and infrastructure company building a new class of LLMs. While other major labs focus on incremental improvements to Transformer models, we're pushing foundational change at the model architecture level — enabling large-context, multi-modal inference that scales efficiently where transformers can't.

Built by researchers from
Meta Google Oxford Cambridge BYU

Is your business ready? Build with us.

Join the private preview.

Request early access →