Paper2Agent Interactive Framework
- The paper presents Paper2Agent, an automated framework that transforms research papers into interactive, reproducible AI agents using a Model Context Protocol server.
- The framework employs multi-phase agentic extraction and rigorous testing to modularize scientific code, resources, and workflows for seamless reproducibility.
- Paper2Agent enables natural language queries to trigger validated analytical processes, reducing manual setup and enhancing research dissemination.
Paper2Agent is an automated, multi-agent framework designed to transform research papers from static artifacts into interactive, reproducible AI agents. It systematizes the extraction, packaging, and interfacing of codebases and scientific workflows defined in academic papers, making them accessible through a standardized protocol and natural language queries. The process centers on a Model Context Protocol (MCP) server and involves multi-phase agentic extraction, testing, and robustification. The end product is a reliable, interactive agent that can be accessed by users or other AI systems for advanced scientific queries, greatly reducing friction in research reproducibility and method dissemination (Miao et al., 8 Sep 2025).
1. Framework Overview
Paper2Agent automates the "agentification" of research papers by analyzing manuscripts and associated source code repositories to extract algorithms, data resources, and usage patterns. The central abstraction is the Model Context Protocol (MCP) server, which encapsulates the methods, tools, assets, and standardized workflows described in a given paper. The major operational stages are:
- Download and parse both manuscript and codebase.
- Establish a reproducible environment with correct dependencies.
- Deploy specialized sub-agents to extract and modularize key analytical features as MCP Tools.
- Organize supporting resources (e.g., manuscripts, tables, figures) as MCP Resources.
- Generate step-wise MCP Prompts to guide workflow execution and linking between tools.
- Iteratively test and refine the MCP tools using benchmark cases until their outputs are reliable.
- Deploy the resulting MCP server on a remote host.
- Integrate the MCP server with a chat-centric agent (e.g., Claude Code) to allow natural-language interaction, in which user queries trigger workflow executions that embody the paper’s scientific contributions.
The resulting system enables users to engage with a research paper’s methods as an operational agent, bypassing conventional manual adaptation and code inspection.
2. Model Context Protocol (MCP)
The Model Context Protocol is an open standard for connecting LLM-based applications to external tools and data. Paper2Agent uses it to expose a paper’s code and methods as a suite of callable “tools” to LLMs or other agents. Key elements are:
- MCP Tools: Modular, atomic functions distilled from the paper’s codebase and tutorials, capturing individual analytical capabilities.
- MCP Resources: Static assets needed for operation and transparency, such as dataset links, supplementary code fragments, or specific workflows.
- MCP Prompts: Structured instructions encoding how to chain tools together for standard or custom analyses, including both exact reproductions of paper workflows and new user-driven pipelines.
By centralizing these elements in the MCP server, Paper2Agent enables any compatible LLM-driven agent to query, orchestrate, and validate scientific procedures while staying structurally faithful to the original work. The MCP server also exposes reference links to the source code for traceability, and it reduces the risk of code hallucination by exposing only verified tools.
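The split into tools, resources, and prompts can be illustrated with a small in-memory model. This is a minimal sketch, not the MCP SDK and not Paper2Agent's code: `PaperServer`, `ToolSpec`, `normalize_counts`, the `paper://` URI, and the example.org URLs are all hypothetical, and a real deployment would serve these objects over the MCP protocol rather than through direct method calls. It shows the two properties described above: every tool carries a link to the code it wraps, and only tools marked as verified are listed or callable.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class ToolSpec:
    """One analytical step plus a traceability link to the code it wraps."""
    func: Callable[..., Any]
    source_url: str
    verified: bool = False  # flipped to True only after the tool passes its tests


@dataclass
class PaperServer:
    """Hypothetical in-memory stand-in for an MCP server built from one paper."""
    name: str
    tools: Dict[str, ToolSpec] = field(default_factory=dict)
    resources: Dict[str, str] = field(default_factory=dict)     # URI -> static content
    prompts: Dict[str, List[str]] = field(default_factory=dict)  # name -> ordered tool names

    def add_tool(self, name: str, func: Callable[..., Any], source_url: str,
                 verified: bool = False) -> None:
        self.tools[name] = ToolSpec(func, source_url, verified)

    def list_tools(self) -> List[str]:
        # Only verified tools are advertised to the calling agent.
        return sorted(n for n, t in self.tools.items() if t.verified)

    def call_tool(self, name: str, **kwargs: Any) -> Any:
        spec = self.tools.get(name)
        if spec is None or not spec.verified:
            raise KeyError(f"tool {name!r} is not exposed")
        return spec.func(**kwargs)

    def source_of(self, name: str) -> str:
        # Traceability: where the wrapped code came from.
        return self.tools[name].source_url


def normalize_counts(counts: List[float], target: float = 1.0) -> List[float]:
    """Hypothetical tool: rescale counts so they sum to `target`."""
    total = sum(counts)
    return [c * target / total for c in counts]


server = PaperServer("example-paper")
server.add_tool("normalize_counts", normalize_counts,
                "https://example.org/repo/normalize.py", verified=True)
server.add_tool("draft_plot", lambda: None, "https://example.org/repo/plot.py")  # untested
server.resources["paper://manuscript"] = "Full text of the manuscript"
server.prompts["normalize_only"] = ["normalize_counts"]

print(server.list_tools())  # ['normalize_counts']
print(server.call_tool("normalize_counts", counts=[1, 3], target=4.0))  # [1.0, 3.0]
```

Calling `server.call_tool("draft_plot")` raises `KeyError`, which mirrors the rule that untested code is never reachable by the agent.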
3. Case Studies
Paper2Agent demonstrates its methodology with several domain-specific cases:
- AlphaGenome (Genomic Variant Interpretation): Extraction of 22 MCP tools supporting tasks such as variant scoring and visualization. The resulting agent accepts natural-language requests for variant analysis and reproduces the expected results for all tutorial and novel benchmark queries.
- TISSUE (Uncertainty-Aware Spatial Transcriptomics): Generation of ~6 MCP tools covering gene prediction, interval construction, and hypothesis testing. The agent responds to both standard analyses and meta-queries about required inputs and available features.
- Scanpy (Single-Cell Data Preprocessing): Extraction and wiring of seven MCP tools for workflows such as quality control, dimensionality reduction, and clustering. The agent executes canonical pipelines as well as data-specific novel analyses and provides results matching manual expert execution.
Benchmarks show that these agents both replicate the original paper findings and handle unseen queries, indicating robust reproducibility and adaptability.
4. Validation and Testing
Reliability is achieved via a multi-stage validation strategy:
- After tool extraction, a dedicated test-verifier-improver agent iteratively runs both canonical tutorial queries and additional, curated test cases.
- Outputs are checked against known references; only those passing all test cases are exposed as MCP tools.
- Agents embed links to original code for transparency and operate within pre-configured environments to enforce exact dependency resolution.
- This methodology is designed to keep the delivered agent from diverging from the original research implementation, reducing the risk of incorrect or hallucinated results.
In the case studies, every deployed tool reproduced its reference outputs exactly; tools that could not be brought to an exact match were excluded from the deployable MCP toolkit.
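The test-verifier-improver step amounts to a bounded test-and-repair loop. The sketch below is an illustration under several assumptions: each test case is a pair of keyword arguments and a reference output, a crashing tool counts as a failed case, floats are compared with a tiny relative tolerance to absorb floating-point noise (pass `rel_tol=0.0` for strict equality), and a hypothetical `improve` callback stands in for the LLM agent that rewrites a failing wrapper. A tool that still fails after `max_rounds` repair attempts comes back as `None`, i.e. it is excluded from the toolkit.

```python
import math
from typing import Any, Callable, List, Optional, Tuple

Tool = Callable[..., Any]
Case = Tuple[dict, Any]  # (keyword arguments, reference output)


def matches(output: Any, expected: Any, rel_tol: float = 1e-9) -> bool:
    """Numbers match within rel_tol; lists/tuples match element-wise; else ==."""
    if isinstance(expected, (int, float)) and isinstance(output, (int, float)):
        return math.isclose(output, expected, rel_tol=rel_tol)
    if isinstance(expected, (list, tuple)):
        return (isinstance(output, (list, tuple)) and len(output) == len(expected)
                and all(matches(o, e, rel_tol) for o, e in zip(output, expected)))
    return output == expected


def passes(tool: Tool, case: Case) -> bool:
    kwargs, expected = case
    try:
        return matches(tool(**kwargs), expected)
    except Exception:  # a tool that crashes fails the case
        return False


def verify_and_improve(tool: Tool, cases: List[Case],
                       improve: Callable[[Tool, List[Case]], Tool],
                       max_rounds: int = 3) -> Optional[Tool]:
    """Return a tool passing every case, allowing up to max_rounds repairs."""
    for attempt in range(max_rounds + 1):
        failures = [c for c in cases if not passes(tool, c)]
        if not failures:
            return tool
        if attempt < max_rounds:
            tool = improve(tool, failures)  # hypothetical improver agent
    return None


def mean_v1(values):  # buggy first draft: wrong denominator
    return sum(values) / (len(values) - 1)


def mean_v2(values):  # repaired wrapper
    return sum(values) / len(values)


cases = [({"values": [1.0, 2.0, 3.0]}, 2.0), ({"values": [4.0]}, 4.0)]
verified = verify_and_improve(mean_v1, cases, improve=lambda tool, failures: mean_v2)
print(verified is mean_v2)  # True
```

Returning `None` rather than raising keeps the aggregation step simple: the caller just drops tools that never verified.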
5. Technical Details
The transformation pipeline is managed by multiple specialized sub-agents:
- Environment-manager agent: Prepares dependencies and environment images for reproducibility.
- Tutorial-scanner agent: Identifies and collects representative tutorial code and educational artifacts.
- Tutorial-tool-extractor-implementor agent: Parameterizes and modularizes tutorial code into discrete tools.
- Test-verifier-improver agent: Iteratively tests and improves tool wrappers, ensuring functionality until reference results are matched.
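One check an environment-manager agent might run before extraction is confirming that pinned dependencies resolved exactly. The function below is a hypothetical illustration built on Python's standard `importlib.metadata`; the pin format (package name to version string) and the exact string comparison are assumptions, and the source does not describe how Paper2Agent actually builds or validates its environment images.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict


def check_pins(pins: Dict[str, str]) -> Dict[str, str]:
    """Return {package: problem} for each pin that is missing or mismatched.

    Versions are compared as exact strings (an assumption), so "1.0"
    and "1.0.0" are treated as different.
    """
    problems: Dict[str, str] = {}
    for pkg, wanted in pins.items():
        try:
            found = version(pkg)
        except PackageNotFoundError:
            problems[pkg] = "not installed"
            continue
        if found != wanted:
            problems[pkg] = f"found {found}, expected {wanted}"
    return problems


print(check_pins({"definitely-not-installed-pkg": "1.0"}))
# {'definitely-not-installed-pkg': 'not installed'}
```

An empty result means every pin matched; anything else would block tool extraction until the environment is rebuilt.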
The final deployment connects the MCP server to a chat-like interface. Users submit queries, which are interpreted and mapped to sequences of MCP tool calls and workflows specified in MCP Prompts. The design supports both step-by-step and end-to-end execution, as well as transparent documentation for each step through structured resources in MCP.
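The query-to-workflow mapping can be pictured with a toy example. In this sketch, which illustrates the idea and is not Paper2Agent's implementation, an MCP Prompt is reduced to an ordered list of tool names, every tool reads and returns a shared state dictionary, and a keyword lookup stands in for the LLM that interprets the user's query. All tool, prompt, and keyword names are hypothetical. `run_steps` yields after each tool call (step-by-step execution), while `run_end_to_end` simply runs the chain to completion.

```python
from typing import Any, Callable, Dict, Iterator, List, Tuple

State = Dict[str, Any]


def filter_low(state: State) -> State:
    """Hypothetical tool: drop values below the threshold."""
    return {**state, "values": [v for v in state["values"] if v >= state["threshold"]]}


def summarize(state: State) -> State:
    """Hypothetical tool: record the mean of the remaining values."""
    return {**state, "mean": sum(state["values"]) / len(state["values"])}


TOOLS: Dict[str, Callable[[State], State]] = {"filter_low": filter_low, "summarize": summarize}

# Each "prompt" is reduced to an ordered chain of tool names.
PROMPTS: Dict[str, List[str]] = {
    "filter_and_summarize": ["filter_low", "summarize"],
    "summarize_only": ["summarize"],
}

# Keyword lookup standing in for the LLM that interprets the query.
KEYWORDS: List[Tuple[str, str]] = [("filter", "filter_and_summarize"), ("mean", "summarize_only")]


def select_prompt(query: str) -> str:
    q = query.lower()
    for keyword, prompt in KEYWORDS:
        if keyword in q:
            return prompt
    raise ValueError(f"no workflow matches {query!r}")


def run_steps(prompt: str, state: State) -> Iterator[Tuple[str, State]]:
    """Step-by-step execution: yield the tool name and state after every call."""
    for tool_name in PROMPTS[prompt]:
        state = TOOLS[tool_name](state)
        yield tool_name, state


def run_end_to_end(prompt: str, state: State) -> State:
    """End-to-end execution: run the whole chain and return the final state."""
    for _, state in run_steps(prompt, state):
        pass
    return state


query = "Filter out low values, then give me the mean"
result = run_end_to_end(select_prompt(query), {"values": [1, 5, 9], "threshold": 4})
print(result["mean"])  # 7.0
```

Using a generator for the step-by-step mode lets a chat interface show each intermediate state to the user before continuing, without duplicating the chaining logic.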
The pipeline can be summarized in high-level pseudocode:
For each research paper:
  1. Download manuscript and codebase.
  2. Prepare environment with Environment-manager agent.
  3. Identify tutorials with Tutorial-scanner agent.
  4. For each tutorial:
     a. Extract tool with Tutorial-tool-extractor-implementor agent.
     b. Run test cases with Test-verifier-improver agent.
  5. Aggregate all validated tools and workflows into MCP.
  6. Deploy MCP server and link to a conversational agent.
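The same control flow can be sketched in Python. This is a sketch under stated assumptions, not Paper2Agent's orchestrator: each sub-agent is passed in as a plain callable (in Paper2Agent these are LLM agents), `verify_tool` is assumed to return the repaired tool or `None`, the manuscript is kept as a resource, and deployment (step 6) is omitted. `AgentBuild`, `build_paper_agent`, and the stub lambdas in the usage example are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Iterable, List, Optional, Tuple

Tool = Callable[..., Any]


@dataclass
class AgentBuild:
    """What one pass of the pipeline produces for a single paper."""
    paper_id: str
    resources: Dict[str, Any] = field(default_factory=dict)
    tools: Dict[str, Tool] = field(default_factory=dict)
    rejected: List[str] = field(default_factory=list)


def build_paper_agent(
    paper_id: str,
    download: Callable[[str], Tuple[Any, Any]],
    prepare_env: Callable[[Any], Any],
    scan_tutorials: Callable[[Any, Any], Iterable[Any]],
    extract_tool: Callable[[Any, Any], Tuple[str, Tool]],
    verify_tool: Callable[[Tool, Any], Optional[Tool]],
) -> AgentBuild:
    manuscript, repo = download(paper_id)           # 1. manuscript and codebase
    env = prepare_env(repo)                         # 2. environment-manager
    build = AgentBuild(paper_id, resources={"manuscript": manuscript})
    for tutorial in scan_tutorials(repo, env):      # 3. tutorial-scanner
        name, tool = extract_tool(tutorial, env)    # 4a. extractor-implementor
        verified = verify_tool(tool, tutorial)      # 4b. test-verifier-improver
        if verified is None:
            build.rejected.append(name)             # failing tools are not exposed
        else:
            build.tools[name] = verified
    return build                                    # 5. aggregate (6. deploy omitted)


build = build_paper_agent(
    "example-paper",
    download=lambda pid: (f"{pid} manuscript", f"{pid}-repo"),
    prepare_env=lambda repo: {"python": "3.11"},
    scan_tutorials=lambda repo, env: ["tutorial_a", "tutorial_b"],
    extract_tool=lambda tut, env: (f"{tut}_tool", lambda: tut),
    verify_tool=lambda tool, tut: tool if tut == "tutorial_a" else None,
)
print(sorted(build.tools), build.rejected)  # ['tutorial_a_tool'] ['tutorial_b_tool']
```

Injecting the sub-agents as callables keeps the orchestration testable on its own, independent of any particular LLM backend.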
6. Impact and Future Directions
Paper2Agent introduces a new paradigm for scientific knowledge dissemination:
- By converting traditional papers into active, programmable agents, it reduces barriers to method adoption and reproducibility.
- Scientists interact directly with the underlying algorithms without manual environment setup or ambiguous code adaptation.
- Broader impact includes facilitating AI-driven scientific exploration, collaborative workflows among “AI co-scientists,” and reducing the likelihood of irreproducible research claims.
Planned future directions include extending agentification to data- and discovery-focused papers, synthesizing multiple paper agents into unified interfaces, and automating even more rigorous benchmark-driven validation (e.g., with LLM-as-judge frameworks). The possibility of an “agent availability” statement for published works signals an evolution in the standard of research dissemination, potentially making dynamic agent delivery a norm alongside data and code availability.
In summary, Paper2Agent constitutes a comprehensive, multi-agent system that converts scholarly research into standardized, robust, and interactive AI agents by packaging workflows, code, resources, and validation artifacts into an MCP server. Its demonstrated capability to produce reliable, test-verified agents in genomics and single-cell bioinformatics, not only for reproducing original results but also for answering novel user queries, highlights a substantive advance toward interactive, reproducible, and easily adoptable scientific research (Miao et al., 8 Sep 2025).