Proofread: A 5-Phase Agent Pipeline for LaTeX
I built proofread, an automated proofreading pipeline for LaTeX papers and textbooks.
It is not just a single prompt. It is a multi-phase agent workflow that splits documents into sections, proofreads each section, verifies report quality, and then produces an aggregate summary.
What It Is
proofread is a Claude Code-based proofreading system for LaTeX source.
The pipeline is designed for long technical writing where errors are distributed across many sections: typos, grammar issues, LaTeX issues, notation inconsistencies, and mathematical mistakes.
What It Does
The system runs in 5 phases:
- Split (Phase 0): split each ch<N>.txt chapter into section-level files using \section boundaries.
- Context Builder (Phase 1): build a chapter context index (definitions, notation, conventions).
- Proofreader (Phase 2): review each section for language, LaTeX, and math issues.
- Verification (Phase 3): audit the proofreading report for false positives, missed issues, and format quality.
- Verdict + Summary (Phases 4-5): either accept the section report (DONE) or retry (CONTINUE, up to 3 iterations), then generate a global summary report.
This verification loop is the key design choice: the system does not trust the first draft of its own report.
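The accept-or-retry logic can be sketched in a few lines of Python. This is only an illustration, not the project's actual code: the function name review_with_retries and the proofread and verify callables are hypothetical stand-ins for the agent calls, and returning the last report flagged as unaccepted once the retry budget runs out is an assumption, since the post does not say what happens after the third iteration.

```python
from typing import Callable, Tuple

MAX_ITERATIONS = 3  # the post states "up to 3 iterations"


def review_with_retries(
    section_text: str,
    proofread: Callable[[str, str], str],
    verify: Callable[[str, str], Tuple[str, str]],
    max_iterations: int = MAX_ITERATIONS,
) -> Tuple[str, int, bool]:
    """Run proofread -> verify until the verdict is DONE or the budget is spent.

    proofread(section_text, feedback) returns a report (hypothetical signature).
    verify(section_text, report) returns (verdict, feedback), where verdict is
    "DONE" or "CONTINUE" (hypothetical signature).
    """
    feedback = ""  # no verifier feedback exists before the first attempt
    report = ""
    for iteration in range(1, max_iterations + 1):
        report = proofread(section_text, feedback)
        verdict, feedback = verify(section_text, report)
        if verdict == "DONE":
            return report, iteration, True
    # Assumption: keep the last report but mark it as not accepted.
    return report, max_iterations, False
```

Passing the verifier's feedback into the next proofreading attempt is one plausible way to make each retry more than a blind rerun.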
Inputs and Outputs
Input format is simple but strict:
- You provide files named ch1.txt, ch2.txt, …
- Each chapter file must contain \section{...} or \section*{...} markers.
- Content before the first \section is ignored.
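To make the input rules concrete, here is a minimal Python sketch of a section splitter that follows them. It is an assumption about how such a split could work, not the pipeline's own implementation: the name split_sections and the regular expression are hypothetical, and the sketch does not handle edge cases such as commented-out \section lines.

```python
import re
from typing import List

# Matches \section{...}, \section*{...}, and \section[short]{...}.
# It does not match \subsection, because the backslash must come
# directly before "section".
SECTION_RE = re.compile(r"\\section\*?\s*[\[{]")


def split_sections(chapter_text: str) -> List[str]:
    """Split one chapter into chunks, each starting at a \\section marker.

    Text before the first marker is dropped, mirroring the rule that
    content before the first \\section is ignored.
    """
    starts = [m.start() for m in SECTION_RE.finditer(chapter_text)]
    if not starts:
        raise ValueError("chapter contains no \\section markers")
    ends = starts[1:] + [len(chapter_text)]
    return [chapter_text[s:e] for s, e in zip(starts, ends)]
```

For example, a chapter with a short preamble followed by \section{A} and \section*{B} yields two chunks, and the preamble is discarded.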
Main outputs:
- Per-section reports in result/proofread/ch<N>/...
- Verification files in result/verification/ch<N>/...
- Final aggregate summary in result/summary/proofread_summary.md
Why I Built It
I built proofread because I believe current frontier models are already strong enough to save a large chunk of proofreading labor.
The goal of this pipeline is to automate as much of that repetitive work as possible while keeping final decisions in human hands.
Scope and Limits
This tool is a proofreading and review assistant, not a replacement for author judgment.
I still treat theorem validity, final mathematical correctness, and editorial decisions as human-owned.
Because the pipeline runs Claude Code with --dangerously-skip-permissions, which lets the agent act without permission prompts, it should be run carefully, ideally in a sandboxed environment. API cost should also be monitored for large books.
Repository
If you are writing in LaTeX and want agentic quality control, this project is a practical starting point.