PaperDebugger: Extensible In-Editor Academic Support
- PaperDebugger is a plugin-based, multi-agent system that integrates LLM-driven academic writing support directly into Overleaf.
- The system employs a modular architecture featuring a Chrome extension, Kubernetes orchestration, and an extensible Model Context Protocol for efficient, parallel agent workflows.
- It offers precise diff-based patching, context-aware editing, and robust performance metrics to ensure secure, reliable in-editor operations.
PaperDebugger is a plugin-based, multi-agent academic writing, review, and editing system designed to operate natively within editor environments such as Overleaf. It brings LLM-driven assistance directly into the LaTeX editing workflow, providing context-aware, parallelizable, and extensible support for writing and reviewing academic manuscripts. The system is realized through a Chrome extension, a Kubernetes-orchestrated backend, and an extensible Model Context Protocol (MCP) that allows fine-grained document manipulation, multi-agent scheduling, and secure in-editor operations (Hou et al., 2 Dec 2025).
1. System Architecture
PaperDebugger’s architecture is vertically decomposed across five layers:
1. Presentation Layer (Chrome Extension):
A Chrome-approved extension is injected into the Overleaf DOM, augmenting the UI with inline action buttons and a persistent side panel. The extension streams document state, including selection context and project metadata, to backend services over a secure streaming channel (gRPC/SSE).
2. Protocol Layer (Model Context Protocol, MCP / XtraMCP):
Agent and tool calls conform to an RPC schema defined by MCP. Each protocol message consists of a method (agent/tool name), structured parameters (typed via Pydantic schemas), and a session version number. Streaming of LLM outputs, tool events, and document patches uses server-sent events (SSE).
3. Backend & Orchestration:
Go-based stateless frontend pods act as authenticated entry points, enforcing schema validation and request routing. A central orchestrator (also written in Go) dynamically launches short-lived Kubernetes pods to host agents or tools. Pods are sandboxed at the container level and use in-memory or object storage for state management.
4. Agent Layer:
Both prompt-template (single-shot) and workflow-based (multi-step) agents are supported. Workflows can invoke LLM calls, tool endpoints (literature search, document scoring, citation lookup), and schema-based validation heuristics.
5. Infrastructure Layer:
Distributed Redis clusters handle active session metadata. S3-compatible object storage persists immutable Overleaf project snapshots for agent use. Telemetry is collected with Prometheus/Grafana for monitoring latency, error rates, and user events.
This architecture enables low-latency, parallel execution of diverse agent workflows, bidirectional synchronization with document state, and fine-grained patch application within the editor (Hou et al., 2 Dec 2025).
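To make the protocol-layer description concrete, the following is a minimal sketch of a message that carries a method, structured parameters, and a session version, plus a server-sent event frame. The source names Pydantic for typing; this sketch uses standard-library dataclasses as a stand-in. The field names (`method`, `params`, `session_version`) and the helpers `parse_message` and `to_sse_event` are illustrative assumptions, not the actual MCP/XtraMCP schema.

```python
import json
from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class AgentCall:
    """Hypothetical message shape: method, typed params, session version."""
    method: str
    session_version: int
    params: dict[str, Any] = field(default_factory=dict)


def parse_message(raw: str) -> AgentCall:
    """Validate a raw JSON payload before it would be routed to an agent."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("message must be a JSON object")
    method = data.get("method")
    version = data.get("session_version")
    params = data.get("params", {})
    if not isinstance(method, str) or not method:
        raise ValueError("method must be a non-empty string")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(version, bool) or not isinstance(version, int) or version < 0:
        raise ValueError("session_version must be a non-negative integer")
    if not isinstance(params, dict):
        raise ValueError("params must be an object")
    return AgentCall(method=method, session_version=version, params=params)


def to_sse_event(event: str, payload: dict[str, Any]) -> str:
    """Serialize one SSE frame: event name, data line, blank-line terminator."""
    return f"event: {event}\ndata: {json.dumps(payload)}\n\n"
```

Rejecting malformed messages at the entry point, before any pod is launched, matches the role the text assigns to the stateless frontend pods.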
2. Multi-Agent Plugin Framework
PaperDebugger is structured as a fully pluggable, multi-agent execution environment. Agent functionality is declared via YAML manifests, specifying agent name, type (workflow or prompt-template), entrypoints, resource requirements, and timeouts. Agents can be scheduled on-demand or in batch/event-driven modes.
Upon invocation, the orchestrator performs schema validation, launches a dedicated pod with the agent’s container image, streams input parameters, collects output via SSE, and tears down the pod on completion or timeout. Agents can access read-only copies of the project file tree and interface with predefined tool endpoints.
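The invocation lifecycle described above (validate, launch, collect output, tear down on completion or timeout) can be sketched as follows. This is an assumption-laden illustration: `AgentManifest` mirrors the manifest fields the text lists but with invented field names, and `FakePod` is a stand-in that makes no Kubernetes calls.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class AgentManifest:
    """Fields mirror those listed in the text; the names are hypothetical."""
    name: str
    type: str          # "workflow" or "prompt-template"
    entrypoint: str
    cpu: str
    memory: str
    timeout_s: float

    def validate(self) -> None:
        if self.type not in {"workflow", "prompt-template"}:
            raise ValueError(f"unknown agent type: {self.type}")
        if self.timeout_s <= 0:
            raise ValueError("timeout_s must be positive")


class FakePod:
    """Stand-in for a short-lived pod; it only sleeps and echoes its input."""

    def __init__(self, work_s: float) -> None:
        self.work_s = work_s
        self.torn_down = False

    async def run(self, params: dict) -> dict:
        await asyncio.sleep(self.work_s)
        return {"status": "ok", "echo": params}

    async def teardown(self) -> None:
        self.torn_down = True


async def invoke(manifest: AgentManifest, pod: FakePod, params: dict) -> dict:
    """Validate the manifest, run under a deadline, and always tear down."""
    manifest.validate()
    try:
        return await asyncio.wait_for(pod.run(params), timeout=manifest.timeout_s)
    except asyncio.TimeoutError:
        return {"status": "timeout"}
    finally:
        await pod.teardown()
```

Placing the teardown in a `finally` clause means the pod is released on success, on timeout, and on unexpected errors alike.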
Synchronization between the extension and Overleaf is realized via two mechanisms:
- File-level Snapshots: At session start, the extension retrieves a zipped snapshot of the Overleaf project, which is persisted in object storage for subsequent agent use.
- Diff-based Patching: All agent-proposed edits are returned as JSON patches derived from unified diffs between the pre-edit and proposed buffers, each carrying absolute line and span information. A patch is applied via Overleaf’s internal model API only if its session version matches the current one, ensuring strong consistency and collision avoidance.
Session progression is version-controlled, and each applied patch increments a monotonically increasing version. This guarantees reproducibility and facilitates auditability through Git commit logs in the background repository (Hou et al., 2 Dec 2025).
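A minimal sketch of version-gated, line-based patching is shown below. It is not the system's patch format: it derives hunks from `difflib.SequenceMatcher` opcodes rather than parsing unified-diff text, and the patch keys (`base_version`, `hunks`, `start`, `end`, `lines`) and the `Session` class are hypothetical.

```python
import difflib
from dataclasses import dataclass


def make_patch(before: str, after: str, base_version: int) -> dict:
    """Build a JSON-serializable patch from a line diff of two buffers."""
    old, new = before.splitlines(), after.splitlines()
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    hunks = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # 1-based start line in the pre-edit buffer; `end` is exclusive.
            hunks.append({"start": i1 + 1, "end": i2 + 1, "lines": new[j1:j2]})
    return {"base_version": base_version, "hunks": hunks}


@dataclass
class Session:
    text: str
    version: int = 0

    def apply(self, patch: dict) -> bool:
        """Apply only if the patch was computed against the current version."""
        if patch["base_version"] != self.version:
            return False  # stale: the buffer changed after the patch was made
        lines = self.text.splitlines()
        # Apply bottom-up so earlier line numbers remain valid.
        for h in sorted(patch["hunks"], key=lambda h: h["start"], reverse=True):
            lines[h["start"] - 1 : h["end"] - 1] = h["lines"]
        self.text = "\n".join(lines)  # note: a trailing newline is not preserved
        self.version += 1
        return True
```

Because each successful application increments the version, a second patch generated against the same base is rejected instead of being applied to shifted line numbers.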
3. Document State Management and Security
Security and state isolation are implemented via per-agent container sandboxing, PodSecurityPolicies, and ephemeral, auto-expiring Redis-backed session memory. Cross-cutting security measures include:
- OAuth2-mediated, read-only Overleaf API access: User tokens are encrypted at rest and never disclosed to agent pods.
- Container-level restrictions: File system and network writes are tightly scoped.
- Strict input/output schemas (Pydantic): Agents and tools must produce outputs that conform to a JSON schema; on a schema violation, the orchestrator either rejects the output or retries the call at a lower temperature.
- No persistent user data: No user-identifiable data is stored long-term; all telemetry is anonymized and coarse-grained.
These policies ensure that agent tasks are stateless, isolated per-request, and cannot affect either host systems or cross-user state (Hou et al., 2 Dec 2025).
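The reject-or-retry behaviour for schema violations can be illustrated with a short sketch. Everything here is assumed for illustration: the `REQUIRED` schema, the halving of the temperature on each retry, and the attempt limit are not specified in the source, and a plain type check stands in for Pydantic validation.

```python
import json
from typing import Callable

# Hypothetical output schema: required keys and their expected Python types.
REQUIRED = {"summary": str, "score": float}


def conforms(raw: str) -> bool:
    """Return True if raw is a JSON object with all required, well-typed keys."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return False
    return isinstance(data, dict) and all(
        isinstance(data.get(key), typ) for key, typ in REQUIRED.items()
    )


def call_with_retries(
    generate: Callable[[float], str],
    temperature: float = 0.8,
    max_attempts: int = 3,
    decay: float = 0.5,
) -> dict:
    """Call generate(temperature); on a schema violation retry at a lower one."""
    for _ in range(max_attempts):
        raw = generate(temperature)
        if conforms(raw):
            return json.loads(raw)
        temperature *= decay  # assumed policy: halve the temperature per retry
    raise ValueError("output rejected: schema violation after retries")
```

Here `generate` is any callable that takes a temperature and returns raw model output, so the retry policy can be exercised without a model.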
4. Tool Integration and Workflow Composition
PaperDebugger’s XtraMCP layer exposes four primary tool endpoints invoked by agent workflows:
- literature_search: Embedding-based kNN retrieval over a pre-indexed arXiv vector store, followed by LLM-based re-ranking and snippet extraction.
- citation_lookup: Aggregation of RIS/BibTeX entries from public APIs (e.g., Crossref, Semantic Scholar) with confidence-based result selection.
- document_scoring: Normalized clarity, coherence, and novelty metrics, provided via XtraGPT scoring templates.
- diff_patch_generator: Application of UNIX diff --unified with custom serialization for JSON patch construction.
Agents orchestrate these calls natively, e.g., parallel execution of segment-based reviews, asynchronous citation lookups, or multi-stage research and summarization workflows. Workflow-based agents may coordinate multiple tool calls, enforce schema-compliance at each step, and stream progress updates and partial results via SSE (Hou et al., 2 Dec 2025).
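As a small illustration of the retrieval step behind literature_search, the sketch below performs brute-force k-nearest-neighbour search by cosine similarity over toy vectors. It assumes an in-memory index keyed by document id; the actual vector store, the embedding model, and the LLM re-ranking stage are not shown and are not specified here.

```python
import heapq
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def knn(query: list[float], index: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the ids of the k vectors most similar to the query, best first."""
    scored = ((cosine(query, vec), doc_id) for doc_id, vec in index.items())
    return [doc_id for _, doc_id in heapq.nlargest(k, scored)]


# Toy index with made-up ids and two-dimensional "embeddings".
toy_index = {"paper_a": [1.0, 0.0], "paper_b": [0.0, 1.0], "paper_c": [1.0, 1.0]}
top = knn([1.0, 0.1], toy_index, k=2)
```

A production system would typically replace the linear scan with an approximate nearest-neighbour index; the scoring logic is the same.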
5. Example Use Cases and User Interactions
The user experience is oriented around localized and document-scale agent actions within the Overleaf editor:
- Localized Critique & Patch: Users highlight a LaTeX span to trigger agent workflows (Reviewer, Enhancer, PatchGenerator); partial results and final JSON patches are streamed to the extension, which renders inline diffs for user approval.
- Structured Review (Full Document): The document is split into segments (e.g., ≈2,000 tokens); Reviewer agents process segments in parallel; results are merged and presented as a composite diff for user application.
- Citation and Literature Search: Agents retrieve, summarize, and insert references inline, leveraging the literature_search and citation_lookup endpoints.
All interactions are documented with full provenance via versioned patches and git-committed changes, aligning agent activity with the editor’s native history model (Hou et al., 2 Dec 2025).
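The full-document review pattern (split, review segments in parallel, merge in order) can be sketched with `asyncio`. This is illustrative only: tokens are approximated by whitespace-separated words, and `review_segment` is a placeholder for an LLM or tool call rather than the Reviewer agent itself.

```python
import asyncio


def split_segments(text: str, max_tokens: int = 2000) -> list[str]:
    """Split text into chunks of at most max_tokens whitespace-separated words."""
    words = text.split()
    return [
        " ".join(words[i : i + max_tokens])
        for i in range(0, len(words), max_tokens)
    ]


async def review_segment(index: int, segment: str) -> dict:
    """Placeholder reviewer: yields control once, then reports the segment size."""
    await asyncio.sleep(0)  # stands in for a network-bound LLM/tool call
    return {"segment": index, "comments": [f"reviewed {len(segment.split())} tokens"]}


async def review_document(text: str, max_tokens: int = 2000) -> list[dict]:
    """Review all segments concurrently; results come back in segment order."""
    segments = split_segments(text, max_tokens)
    tasks = (review_segment(i, s) for i, s in enumerate(segments))
    # asyncio.gather returns results in the order the awaitables were passed.
    return await asyncio.gather(*tasks)
```

Because `asyncio.gather` preserves input order, the merge step can concatenate per-segment results without re-sorting them.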
6. Performance Analytics and Reliability
Usage analytics for May to November 2025 indicate:
- Chrome extension installs: 112
- Registered users: 78
- Monthly active users: 23 (~29.5% retention)
- Projects created: 158, Threads: 797
Interaction events:
- Diffs viewed: 1073
- Copy suggestion: 375
- Insert patch: 359
Latency (90th percentile):
- Prompt-template tasks: 2.1 s
- Single-segment review: 4.7 s
- Full-document (10 segments + merge): 18.3 s
Reliability:
- Agent pod launch success: 99.2%
- Patch apply success: 98.7%
The quantitative metrics suggest high operational reliability and end-user responsiveness. Measurement is event-based, with back-end times derived from pod start/end logs, and reliability ratios computed over all user sessions (Hou et al., 2 Dec 2025).
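The retention figure can be reproduced from the reported counts, and a 90th-percentile latency can be computed from per-task durations as sketched below. The nearest-rank percentile definition and the sample durations are assumptions for illustration; the source does not state which percentile method was used or publish raw timings.

```python
def retention_pct(monthly_active: int, registered: int) -> float:
    """Monthly active users as a percentage of registered users, one decimal."""
    return round(100 * monthly_active / registered, 1)


def percentile_nearest_rank(samples: list[float], p: int) -> float:
    """Nearest-rank percentile for an integer p in 1..100."""
    if not samples or not 1 <= p <= 100:
        raise ValueError("need a non-empty sample and 1 <= p <= 100")
    ordered = sorted(samples)
    rank = (p * len(ordered) + 99) // 100  # integer ceiling of p * n / 100
    return ordered[rank - 1]


# Counts taken from the figures reported above.
reported_retention = retention_pct(23, 78)

# Hypothetical durations in seconds, e.g. pod end time minus pod start time.
example_durations = [1.2, 2.1, 0.9, 1.7, 3.0]
example_p90 = percentile_nearest_rank(example_durations, 90)
```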
7. Design Considerations, Limitations, and Future Directions
Design choices include a Chrome-extension + Overleaf focus (enabling deep DOM-level integration but excluding other environments), per-agent pod isolation (adding ~1 s startup latency), and diff-based patching (requiring precise line tracking under concurrent edits).
Limitations:
- Domain style drift, with some outputs noted as “too CS-centric” due to tuning data.
- Ingestion of documents longer than 200 pages can be slow.
- Chrome-only support; supporting Firefox or offline editors requires recasting the UI and synchronization layers.
Potential extensions include on-premise LLM agent pods, editor-agnostic plugin wrappers, improved merge conflict handling, graph-based citation network analysis tools, and real-time collaborative agent sessions (Hou et al., 2 Dec 2025).
PaperDebugger establishes that agentic, in-editor academic writing support is technically viable and performant, and that it aligns closely with the requirements of arXiv-centric research workflows. The system’s design combines secure session separation, diff-based granular patching, and a composable, multi-agent orchestration model to enable direct LLM-mediated enhancement, structured review, and research-augmented editing, all within the native context of authoring environments like Overleaf.