QED: A Multi-Agent System for Mathematical Discovery

I’m excited to open-source QED, a multi-agent pipeline I built with Qihao Ye that takes a mathematical problem statement in LaTeX and produces a rigorous natural-language proof. QED has already solved a research-level open problem in partial differential equations, with the proof verified by domain experts from three institutions.

What is QED?

QED orchestrates multiple AI models (Claude, Codex, Gemini) through their command-line interfaces to conduct iterative proof search with rigorous verification. Rather than relying on single-shot model responses or conventional agent frameworks, it spawns model CLI subprocesses and coordinates them through a structured pipeline.
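
The post doesn't document QED's actual invocation flags, so the CLI names and calling convention below are placeholders; this is a minimal sketch of the subprocess pattern, passing the prompt on stdin to avoid shell-quoting issues with long LaTeX statements:

```python
import subprocess

def run_agent(cli: str, prompt: str, timeout: int = 600) -> str:
    """Spawn a model CLI as a subprocess and return its stdout.

    `cli` is the command name (e.g. "claude"); the prompt goes in on
    stdin rather than as a shell argument.
    """
    result = subprocess.run(
        [cli], input=prompt, capture_output=True, text=True, timeout=timeout
    )
    if result.returncode != 0:
        raise RuntimeError(f"{cli} failed: {result.stderr.strip()}")
    return result.stdout
```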

The system operates through three stages:

Stage 0 — Literature Survey: A survey agent evaluates problem difficulty (Easy/Medium/Hard) and investigates the mathematical landscape, cataloguing relevant theorems and related results without proposing proof strategies.

Stage 1 — Proof Search Loop: An iterative process (up to 9 rounds by default) where:

  • Proof search agents attack the problem simultaneously (when multi-model is enabled)
  • Verification agents independently check each proof across multiple verification phases
  • A selector agent identifies the strongest proof when multiple providers are active
  • A verdict agent determines whether to continue iterating or conclude

Stage 2 — Proof Effort Summary: A summary agent compiles comprehensive documentation of the entire proof development process.
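
The Stage 1 loop above can be sketched as follows. All names here are illustrative, not QED's API: provers and verifiers are stand-ins for the model agents, `select` stands in for the selector agent, and the objection summary stands in for the verdict agent's decision.

```python
from dataclasses import dataclass

@dataclass
class Report:
    approved: bool
    objection: str = ""

def proof_search(problem, provers, verifiers, select, max_rounds=9):
    """Sketch of the Stage 1 loop: search, verify, select, decide.

    provers:   callables (problem, feedback) -> proof text
    verifiers: callables (problem, proof) -> Report
    select:    callable picking the strongest (proof, reports) pair
    """
    feedback = ""
    for _ in range(max_rounds):
        proofs = [p(problem, feedback) for p in provers]          # parallel in QED
        reports = [[v(problem, pf) for v in verifiers] for pf in proofs]
        best, best_reports = select(list(zip(proofs, reports)))   # selector agent
        if all(r.approved for r in best_reports):                 # unanimous approval
            return best
        feedback = "; ".join(r.objection for r in best_reports if not r.approved)
    return None  # verdict: no valid proof within the round budget
```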

Verification Architecture

What distinguishes QED from other AI math systems is its rigorous verification pipeline. The system implements verification in two tiers:

Structural Verification (Phases 1-3):

  • Problem-statement integrity checking (word-by-word comparison)
  • Citation verification with independent URL validation
  • Subgoal decomposition tree validation

Detailed Verification (Phase 4):

  • Fine-grained logical step analysis with computational checks
  • Subgoal resolution confirmation
  • Analysis of tagged original steps
  • Coverage completeness checks

Every external result must be enclosed in <cite> tags containing its type, label, title, authors, source URL, verifier locator, exact statement, and usage. Nontrivial original contributions must be wrapped in <key-original-step> tags and spelled out in maximal detail. A proof is considered complete only with unanimous verifier approval.
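
A structural check of this kind can be mechanized with a few lines of parsing. The field names below are paraphrased from the list above; QED's exact tag schema may differ:

```python
import re

# Field names paraphrased from the required <cite> metadata.
REQUIRED = ["type", "label", "title", "authors", "url", "locator", "statement", "usage"]

def check_cites(proof: str) -> list[str]:
    """Return a list of problems found in the <cite> blocks of a proof."""
    problems = []
    for i, block in enumerate(re.findall(r"<cite>(.*?)</cite>", proof, re.S)):
        fields = dict(re.findall(r"<(\w+)>(.*?)</\1>", block, re.S))
        missing = [f for f in REQUIRED if not fields.get(f, "").strip()]
        if missing:
            problems.append(f"cite #{i}: missing {', '.join(missing)}")
    return problems
```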

What QED Achieved

QED successfully solved a research-level problem involving Carleman estimates for wave equations. The specific challenge: finding a smooth space-time weight function (with auxiliary parameters) for the wave operator on a half-infinite domain satisfying global pseudoconvexity, growth, and positivity constraints.

This is not a textbook exercise. Constructing such weight functions is traditionally a difficult, highly problem-dependent, and time-consuming process that mathematicians perform by hand. The problem required searching through a high-dimensional space of possible weight function forms while satisfying multiple interacting constraints.

Domain experts from University of Missouri-Kansas City, UC San Diego, and Worcester Polytechnic Institute verified the AI-generated solution. Their observation: “The use of LLM-based search aims to replace the traditional hand-crafted process.”

The iterative workflow proved valuable during this collaboration. When the experts strengthened constraints mid-process (requiring additional properties from the weight function), QED resumed from its previous candidate rather than restarting from scratch. This is possible because of QED’s resume support—the system automatically detects prior progress and continues from the exact interruption point.

Technical Features

Multi-Model Parallelization: When enabled, multiple providers conduct simultaneous proof search. Each round generates N proofs × M verifiers = N×M verification reports, with cross-verification between models.
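
The N×M cross-verification grid is a natural fit for a thread pool, since each cell is an independent, I/O-bound model call. A minimal sketch (the dict-based interface is my own, not QED's):

```python
from concurrent.futures import ThreadPoolExecutor

def cross_verify(proofs: dict, verifiers: dict) -> dict:
    """Produce the N proofs x M verifiers grid of reports in parallel.

    proofs:    {provider_name: proof_text}
    verifiers: {provider_name: callable(proof_text) -> report}
    Returns {(proof_provider, verifier_provider): report}, so each
    model's proof is also checked by the other models.
    """
    with ThreadPoolExecutor() as pool:
        futures = {
            (pn, vn): pool.submit(verify, proof)
            for pn, proof in proofs.items()
            for vn, verify in verifiers.items()
        }
        return {key: fut.result() for key, fut in futures.items()}
```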

Adaptive Difficulty Handling: Easy problems receive lightweight verification; hard problems undergo comprehensive multi-phase analysis. Difficulty affects survey depth and verification thoroughness.
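
The post names the Easy/Medium/Hard tiers but not the exact knobs each one turns, so the profile values below are assumptions; the point is just that the survey's difficulty rating selects a configuration:

```python
# Illustrative profiles only; QED's actual settings may differ.
DIFFICULTY_PROFILES = {
    "Easy":   {"survey_depth": "brief",    "detailed_verification": False},
    "Medium": {"survey_depth": "standard", "detailed_verification": True},
    "Hard":   {"survey_depth": "deep",     "detailed_verification": True},
}

def profile_for(difficulty: str) -> dict:
    """Look up the run configuration for a survey-assigned difficulty."""
    return DIFFICULTY_PROFILES[difficulty]
```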

Human Guidance Integration: Two-level feedback system allows global persistent hints across all rounds plus per-round targeted guidance. This enables mathematicians to steer the search without taking over the proof process.
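
The two feedback levels compose straightforwardly; a sketch of the merge, with my own naming (global hints apply everywhere, a per-round hint only to its round):

```python
def compose_guidance(global_hints: list[str],
                     round_hints: dict[int, str],
                     round_no: int) -> str:
    """Merge persistent global hints with the hint targeted at this round."""
    parts = list(global_hints)
    if round_no in round_hints:
        parts.append(round_hints[round_no])
    return "\n".join(parts)
```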

Resume Support: Interrupted runs automatically detect prior progress, skip completed surveys, identify incomplete steps, and resume precisely where stopped.
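
Resume logic of this kind usually amounts to scanning the run directory for completed artifacts. The file layout below is illustrative only; QED's actual on-disk artifact names are not documented in this post:

```python
from pathlib import Path

def find_resume_point(run_dir: Path, max_rounds: int = 9):
    """Return ("survey", None), ("round", k), or ("done", None) from artifacts."""
    if not (run_dir / "survey.md").exists():
        return ("survey", None)          # Stage 0 never finished
    if (run_dir / "final_proof.md").exists():
        return ("done", None)
    for k in range(1, max_rounds + 1):
        if not (run_dir / f"round_{k:02d}" / "verdict.json").exists():
            return ("round", k)          # first incomplete round
    return ("done", None)
```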

Token Tracking: Comprehensive usage monitoring across all providers with per-call breakdown and machine-readable JSON output.
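
A per-call record rolled up into per-provider totals is one simple shape such a JSON report can take; the record fields here are assumptions, not QED's actual schema:

```python
import json
from collections import defaultdict

def summarize_usage(calls: list[dict]) -> str:
    """Roll per-call token records up into per-provider totals as JSON.

    Each record is assumed to look like
    {"provider": "claude", "input_tokens": 120, "output_tokens": 340}.
    """
    totals = defaultdict(lambda: {"input_tokens": 0, "output_tokens": 0, "calls": 0})
    for c in calls:
        t = totals[c["provider"]]
        t["input_tokens"] += c["input_tokens"]
        t["output_tokens"] += c["output_tokens"]
        t["calls"] += 1
    return json.dumps({"providers": totals, "calls": calls}, indent=2)
```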

Why This Matters

In my earlier post, I analyzed how Gemini contributed nontrivial mathematical insight to an algebraic geometry paper. I predicted that in 2026, AI systems would be able to independently complete highly nontrivial pure math work.

QED represents a step toward that prediction. The system doesn’t just assist mathematicians—it autonomously searches for solutions, verifies correctness, and iterates until it finds a valid proof. The PDE result demonstrates this capability on a real research problem.

More importantly, QED is designed to be a tool mathematicians can actually use. The human guidance integration, resume support, and structured verification make it practical for real mathematical workflows rather than just a research demonstration.

QED is currently being applied to other open research questions in mathematics. More to be announced.

Check out the code: github.com/proofQED/QED