- The paper introduces a multi-agent system that automates iterative hypothesis generation, experiment suggestion, and data analysis for therapeutic discovery.
- It employs specialized agents for literature review and experimental data interpretation using reproducible Docker environments and LLM-based pairwise ranking.
- The study validates Robin by identifying ripasudil as a promising candidate for treating dry age-related macular degeneration through iterative testing.
Robin is a multi-agent system designed to automate key intellectual steps of the scientific discovery process, specifically focusing on therapeutic discovery. It integrates LLM agents for literature search and data analysis into an iterative workflow. The system aims to accelerate the pace of scientific research by generating hypotheses, suggesting experiments, analyzing resulting data, and refining hypotheses based on experimental feedback.
The core components of Robin include:
- Robin (Orchestrator): Manages the overall workflow, coordinating interactions between specialized agents. It uses one LLM (OpenAI o4-mini) for synthesis and hypothesis generation and a second (Anthropic Claude 3.7 Sonnet) as a judge for ranking.
- Crow and Falcon: Literature search agents based on PaperQA2 (2409.13740). Crow produces concise literature summaries for initial exploration, while Falcon conducts deeper investigations for detailed evaluation of candidates. Both agents access scientific literature, clinical trial reports, and the Open Targets Platform [10.1093/nar/gkae1128].
- Finch: A scientific data analysis agent (2503.00096) capable of analyzing various experimental data modalities, such as RNA-seq and flow cytometry. Finch operates within a Jupyter notebook environment using a pre-built Docker container (BixBench-env:v1.0) for reproducibility and utilizes a ReAct-based agentic prompting strategy (2210.03629) with tools like `edit_cell` and `submit_answer`.
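To make the ReAct pattern behind Finch more concrete, the following is a minimal sketch of a thought, action, observation loop over a toy notebook. Only the tool names `edit_cell` and `submit_answer` come from the description above; their signatures, the `Notebook` class, `react_loop`, and `scripted_policy` (a hard-coded stand-in for the LLM) are hypothetical and are not Finch's actual interface.

```python
import contextlib
import io


class Notebook:
    """Toy stand-in for a Jupyter notebook: all cells share one namespace."""

    def __init__(self):
        self.cells = []
        self.namespace = {}

    def edit_cell(self, index, source):
        """Hypothetical tool: write `source` to a cell, run it, return its stdout."""
        if index >= len(self.cells):
            self.cells.append(source)
        else:
            self.cells[index] = source
        buffer = io.StringIO()
        try:
            with contextlib.redirect_stdout(buffer):
                exec(source, self.namespace)
        except Exception as exc:
            # Errors are returned as observations so the policy can react to them.
            return f"error: {exc!r}"
        return buffer.getvalue().strip()


def react_loop(policy, notebook, max_steps=10):
    """Alternate policy decisions (thought + action) with tool observations."""
    history = []
    for _ in range(max_steps):
        thought, action, args = policy(history)
        if action == "submit_answer":
            return args["answer"], history
        if action == "edit_cell":
            observation = notebook.edit_cell(args["index"], args["source"])
        else:
            observation = f"error: unknown tool {action!r}"
        history.append((thought, action, observation))
    return None, history  # step budget exhausted without an answer


def scripted_policy(history):
    """Hypothetical stand-in for the LLM: compute a mean, then submit it."""
    if not history:
        source = "values = [2.0, 4.0, 6.0]\nprint(sum(values) / len(values))"
        return ("Compute the mean of the measurements.", "edit_cell",
                {"index": 0, "source": source})
    last_observation = history[-1][2]
    return ("The cell printed the mean; report it.", "submit_answer",
            {"answer": last_observation})


answer, trace = react_loop(scripted_policy, Notebook())
print(answer, len(trace))
```

In a real agent the policy would be an LLM call that receives the history as context. The loop structure, in which every observation is fed back before the next action is chosen, is the part this sketch is meant to illustrate.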
The typical workflow for therapeutic discovery with Robin proceeds as follows:
- Given a disease name, Robin queries Crow to research the disease pathology and propose potential causal mechanisms.
- For each mechanism, Crow provides reports on in vitro models and relevant assays. Robin uses an LLM judge (trained to align with human expert preferences) to rank these proposals via pairwise comparisons, selecting the top-ranked assay for the experimental strategy. The ranking is obtained by fitting the Bradley-Terry-Luce (BTL) model [bradley_rank_1952] to the LLM-adjudicated comparisons.
- Robin then conducts further literature review using Crow to identify potential therapeutic candidates based on the selected assay and disease mechanism.
- Falcon generates detailed evaluation reports for these candidates, which are then ranked by the LLM judge based on scientific rationale, pharmacological profile, and supporting literature.
- Human scientists review the ranked list, select top candidates, and perform experiments in the lab using a protocol based on Robin's suggested assay.
- Raw or semi-processed experimental data (e.g., .fcs files for flow cytometry, read counts for RNA-seq) is provided back to Robin.
- Robin deploys Finch to analyze the data. To account for stochasticity in LLM agent analysis, Robin can launch multiple independent Finch trajectories (e.g., 10) and synthesize their outputs for a consensus-driven conclusion.
- Robin interprets Finch's analysis results, extracts scientific insights, and refines hypotheses, which informs the next round of candidate generation, closing the iterative discovery loop. Robin can also propose follow-up experiments.
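The ranking steps above rely on the BTL model, in which each proposal i has a positive strength p_i and the probability that i is preferred over j is p_i / (p_i + p_j). The sketch below shows one common way to fit such strengths from a list of pairwise outcomes, using the standard minorization-maximization (MM) update. The function name `fit_btl`, the `prior` smoothing term, and the toy comparison data are assumptions made for illustration; the source does not specify how Robin fits the model.

```python
from collections import defaultdict
from itertools import combinations


def fit_btl(items, comparisons, iters=200, prior=0.1):
    """Estimate Bradley-Terry-Luce strengths with the MM update.

    comparisons: list of (winner, loser) pairs, e.g. from an LLM judge.
    prior: pseudo-wins added to every ordered pair (an assumption of this
           sketch) so strengths stay finite if an item never wins or loses.
    """
    wins = defaultdict(float)  # wins[(a, b)] = number of times a beat b
    for winner, loser in comparisons:
        wins[(winner, loser)] += 1.0
    for a, b in combinations(items, 2):
        wins[(a, b)] += prior
        wins[(b, a)] += prior

    strength = {i: 1.0 for i in items}
    for _ in range(iters):
        updated = {}
        for i in items:
            total_wins = sum(wins[(i, j)] for j in items if j != i)
            denom = sum(
                (wins[(i, j)] + wins[(j, i)]) / (strength[i] + strength[j])
                for j in items
                if j != i
            )
            updated[i] = total_wins / denom
        norm = sum(updated.values())
        strength = {i: s / norm for i, s in updated.items()}  # sum to 1
    return strength


# Toy example with made-up judge outcomes for three hypothetical proposals.
items = ["A", "B", "C"]
comparisons = (
    [("A", "B")] * 3 + [("B", "A")]
    + [("A", "C")] * 4
    + [("B", "C")] * 3 + [("C", "B")]
)
strengths = fit_btl(items, comparisons)
ranking = sorted(items, key=strengths.get, reverse=True)
print(ranking)
```

Fitting a model rather than counting wins lets the ranking remain meaningful when not every pair has been compared the same number of times, which is typical when an LLM judge evaluates only a sample of all possible pairs.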
A key demonstration of Robin's capability involved identifying novel therapeutic candidates for dry age-related macular degeneration (dAMD).
- Robin initially identified enhancing RPE cell phagocytosis as a therapeutic strategy and proposed a flow cytometry assay using ARPE-19 cells and pHrodo beads.
- After a literature review, Robin proposed 30 existing drugs. The top candidates were experimentally tested.
- Finch analyzed the flow cytometry data from the first round, quantifying phagocytosis enhancement by gating cells and performing statistical tests (e.g., Dunnett test). This confirmed that ROCK inhibitors like Y-27632 enhanced phagocytosis, consistent with prior findings [mao_analysis_2012, mao_acute_2021, Halasz_Townes-Anderson_2016].
- Robin recommended a follow-up RNA-seq experiment on Y-27632-treated cells to investigate the mechanism. Finch analyzed the RNA-seq data, performing differential gene expression (DGE) analysis (using tools like DESeq2 [love_moderated_2014], biomaRt [durinck_mapping_2009], EnhancedVolcano [noauthor_enhancedvolcano_nodate]) and GO enrichment analysis (Figure 3 B-D). This revealed unexpected transcriptional changes, including significant upregulation of ABCA1, a lipid efflux pump, suggesting a novel mechanistic link to dAMD pathology beyond known cytoskeletal effects.
- Based on these insights, Robin generated a new round of candidates. Experimental testing and Finch's analysis showed that ripasudil, another ROCK inhibitor clinically used for glaucoma, significantly outperformed Y-27632 in enhancing RPE phagocytosis (Figure 4).
The identification of ripasudil as an enhancer of RPE phagocytosis highlights Robin's ability to synthesize literature-based hypotheses and validate them through iterative experimentation and data analysis. Because ripasudil is already used clinically in ocular applications, it is a promising drug repurposing candidate for dAMD.
Implementation Considerations:
- The system is implemented as a streamlined Jupyter notebook using the Aviary framework.
- Specific LLMs are chosen for different tasks (OpenAI o4-mini for generation, Anthropic Claude 3.7 Sonnet for judging).
- The LLM judge ranks proposals using pairwise comparisons and the BTL model, and its rankings show high concordance and consistency with those of human experts.
- Finch's data analysis is performed within a controlled Docker environment to ensure reproducibility. The use of multiple Finch trajectories helps mitigate issues arising from the stochastic nature of LLMs in complex analysis tasks, generating more robust consensus results.
- The connection between Robin and the wet lab currently requires human intervention for translating experimental plans into precise protocols and executing the experiments.
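The idea of running several independent Finch trajectories and keeping only a consensus conclusion can be illustrated with a simple majority vote. This is a deliberately simplified stand-in: Robin is described as synthesizing the trajectory outputs, whereas the sketch below reduces each trajectory to a single categorical label. The function name `consensus`, the `min_agreement` threshold, and the example labels are assumptions for illustration.

```python
from collections import Counter


def consensus(conclusions, min_agreement=0.6):
    """Majority vote over per-trajectory conclusions.

    Returns (label, fraction): the most common conclusion and the share of
    trajectories that reached it. The label is None when the share falls
    below `min_agreement`, signalling that no robust consensus exists.
    """
    if not conclusions:
        raise ValueError("need at least one trajectory conclusion")
    label, count = Counter(conclusions).most_common(1)[0]
    fraction = count / len(conclusions)
    return (label if fraction >= min_agreement else None), fraction


# Made-up outcomes from ten hypothetical trajectories analyzing one dataset.
runs = ["enhanced"] * 8 + ["no_effect"] * 2
print(consensus(runs))
```

Returning the agreement fraction alongside the label makes it possible to flag analyses where trajectories disagree, rather than silently reporting the most frequent answer.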
Limitations and Future Development:
- Robin currently generates experimental outlines rather than precise, executable laboratory protocols, requiring human translation.
- Finch's data analysis capabilities are heavily reliant on prompt engineering by domain experts; future work aims for greater autonomy in prompt generation and adaptation across data types.
- Refining the LLM judge and hypothesis generation process to better align with nuanced human scientific judgment is an ongoing area for improvement.
Robin represents a significant step towards fully automating the scientific discovery loop, demonstrated by identifying and validating a novel therapeutic candidate and mechanism for dAMD. Its architecture, integrating specialized agents for literature and data analysis within an iterative framework, provides a practical paradigm for AI-driven research.