IRIS: Interactive Research Ideation System for Accelerating Scientific Discovery (2504.16728v2)

Published 23 Apr 2025 in cs.AI, cs.CL, and cs.HC

Abstract: The rapid advancement in capabilities of LLMs raises a pivotal question: How can LLMs accelerate scientific discovery? This work tackles the crucial first stage of research, generating novel hypotheses. While recent work on automated hypothesis generation focuses on multi-agent frameworks and extending test-time compute, none of the approaches effectively incorporate transparency and steerability through a synergistic Human-in-the-loop (HITL) approach. To address this gap, we introduce IRIS: Interactive Research Ideation System, an open-source platform designed for researchers to leverage LLM-assisted scientific ideation. IRIS incorporates innovative features to enhance ideation, including adaptive test-time compute expansion via Monte Carlo Tree Search (MCTS), fine-grained feedback mechanism, and query-based literature synthesis. Designed to empower researchers with greater control and insight throughout the ideation process. We additionally conduct a user study with researchers across diverse disciplines, validating the effectiveness of our system in enhancing ideation. We open-source our code at https://github.com/Anikethh/IRIS-Interactive-Research-Ideation-System

Summary

  • The paper introduces IRIS's human-in-the-loop framework that synergizes user control with MCTS-driven exploration to enhance hypothesis generation.
  • It employs fine-grained review and query-based retrieval to refine research briefs and ground ideas with relevant, cited literature.
  • User studies show that interactive use of IRIS improved hypothesis scores by 0.5 points and ELO ratings by 12, boosting overall researcher satisfaction.

This paper introduces IRIS (Interactive Research Ideation System), a human-in-the-loop (HITL) platform designed to accelerate scientific discovery by assisting researchers in generating novel hypotheses and refining them into research briefs. The system addresses limitations of existing AI-driven scientific ideation tools, which often rely on fully automated processes, lack transparency and steerability, use coarse-grained feedback, employ simplistic retrieval methods, and have unstructured approaches to exploring the idea space.

IRIS tackles the crucial first stage of research: generating novel hypotheses. It aims to provide a synergistic collaboration between researchers and LLMs, empowering users with greater control and insight throughout the ideation process.

The core contributions of IRIS include:

  1. A Human-in-the-Loop Framework: IRIS is user-centered, balancing automated generation with human oversight at every stage.
  2. Monte Carlo Tree Search (MCTS): Adapts MCTS to systematically explore the idea space, alternating exploration and exploitation phases, and extending test-time computation.
  3. Fine-grained Review based Refinement: Employs a detailed hierarchical taxonomy to provide actionable feedback on specific components of a research brief, mitigating LLM "reward hacking".
  4. Query-based Retrieval: Generates targeted queries to retrieve relevant literature, synthesizing comprehensive, cited responses through re-ranking, clustering, and summarization using tools like the Semantic Scholar API and ScholarQA.
  5. Open Source: The system is publicly available to encourage broader adoption and further development.

System Architecture and Implementation:

IRIS employs a three-agent architecture:

  • Ideation Agent: Generates and iteratively improves the research brief (Title, Proposed Methodology, Experiment Plan). It can operate in semi-automatic mode (guided by user feedback) or fully autonomous mode (driven by MCTS actions).
  • Review Agent: Provides both a quantitative "reward" (an average score based on a hierarchical taxonomy of evaluation aspects like originality, clarity, feasibility, effectiveness, and impact) and qualitative fine-grained feedback on distinct segments of the research brief. This granular feedback mechanism, validated by the researcher, is crucial for steering the LLM effectively.
  • Retrieval Agent: Synthesizes targeted search queries based on the research goal. It uses a two-stage retrieval and three-stage generation pipeline leveraging the Semantic Scholar API and ScholarQA to retrieve relevant passages, re-rank them, aggregate them at the paper level, extract quotes, generate a structured report plan, cluster passages, and generate cited, summarized sections. It also supports uploading user-provided PDF documents for context using tools like Grobid.
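
The re-rank-then-aggregate step of the Retrieval Agent's pipeline can be illustrated with a toy sketch. The overlap-based scorer and the passage dict layout below are stand-in assumptions for illustration only; IRIS itself retrieves via the Semantic Scholar API and ScholarQA with a learned re-ranker.

```python
from collections import defaultdict

def rerank(query, passages):
    """Toy re-ranker: score each passage by token overlap with the query.
    (Stand-in for the learned re-ranking stage in the actual pipeline.)"""
    q = set(query.lower().split())
    scored = [(len(q & set(p["text"].lower().split())), p) for p in passages]
    return [p for score, p in sorted(scored, key=lambda sp: -sp[0])]

def aggregate_by_paper(ranked):
    """Group re-ranked passages under their source paper, preserving rank order."""
    papers = defaultdict(list)
    for p in ranked:
        papers[p["paper_id"]].append(p["text"])
    return dict(papers)
```

From here, a real pipeline would extract quotes, cluster passages, and generate cited sections, as described above.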

The system's user interface is built with HTML, CSS, and JavaScript. LLM functionalities are powered by Gemini-2.0-Flash, accessible via LiteLLM, which allows users to swap in other LLMs.

Monte Carlo Tree Search Adaptation:

IRIS adapts MCTS to navigate the subjective landscape of scientific ideation.

  • State: A state s in the search tree is defined by the current research brief, a reward estimate (from the Review Agent), the latest review feedback, and retrieved knowledge.
  • Actions: The action space includes four actions: generate (create a new brief), refine w/ retrieval (refine based on retrieved literature), refine w/ review (refine based on Review Agent feedback), and refine w/ user feedback (refine based on direct user input).
  • Evaluation: The LLM-based Review Agent acts as a proxy judge to estimate the quality (reward) of a generated hypothesis state.
  • Selection: Uses the Upper Confidence Bound for Trees (UCT) algorithm:

    \text{UCT}(n) = \frac{Q(n)}{N(n)} + c \sqrt{\frac{\ln N(n_p)}{N(n)}}

    where Q(n) is the node's total accumulated reward, N(n) is its visit count, N(n_p) is the parent's visit count, and c is the exploration constant.

  • Process: Iteratively performs Selection, Evaluation, Expansion (creating child nodes for applicable actions if below the maximum depth), and Backpropagation (updating Q and N up the tree).
  • Memory: Agents maintain trajectory-level memory to guide generation and avoid redundancy.
  • Cost Control: Allows users to set budgets for computational resources, adjusting the exploration constant cc to prioritize exploitation for tighter budgets.
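
The UCT selection rule can be sketched in a few lines of Python. The node representation here (a dict holding total reward Q and visit count N) is an illustrative assumption, not IRIS's actual data structure; note how a smaller exploration constant c favors exploitation, matching the cost-control behavior described above.

```python
import math

def uct(total_reward, visits, parent_visits, c=1.4):
    """Upper Confidence Bound for Trees: exploitation term plus exploration bonus."""
    if visits == 0:
        return float("inf")  # unvisited children are always tried first
    return total_reward / visits + c * math.sqrt(math.log(parent_visits) / visits)

def select_child(children, parent_visits, c=1.4):
    """Selection step: pick the child node with the highest UCT score."""
    return max(children, key=lambda ch: uct(ch["Q"], ch["N"], parent_visits, c))
```

For example, with children {Q: 2.0, N: 4} and {Q: 1.0, N: 1} under a parent visited 5 times, the less-visited child wins selection because its exploration bonus outweighs the other's higher average reward.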

Evaluation and Results:

The paper evaluates IRIS through automated LLM-as-a-judge metrics and a user study with 8 researchers across disciplines.

  • Metric Validation: Found a moderate correlation between human baseline rankings and LLM-based ELO scores (Pearson's r=0.60), but weaker with absolute scores (r=0.45), suggesting ELO might be a better proxy for human preference in this context.
  • Automated Evaluation: LLM-as-a-judge results showed that user interaction within IRIS consistently improved hypothesis quality, increasing average absolute scores by 0.5 points and ELO ratings by 12 points over interaction depth up to 3.
  • User Study Feedback: Quantitative ratings (on a 1-5 Likert scale) indicated high user satisfaction with key features:
    • Usefulness of Fine-grained Feedback: 4.3
    • MCTS Tree Interface (Steerability): 4.2
    • Usability and Control: 4.5
    • Quality of Lit. Summaries: 3.7
    • Overall Satisfaction (Final Research Brief): 3.9
  • Qualitative feedback highlighted the value of the MCTS tree for transparency and control, the usefulness of the fine-grained feedback (often reflecting user concerns or sparking new insights), and the role of retrieval in grounding ideas (though quality varied by domain).

Overall, the paper showed that 75% of users found the final hypothesis generated with IRIS marginally or substantially better than initial attempts, and all users reported an enhanced understanding of the proposed methodology.
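
The ELO ratings used above come from aggregating pairwise LLM-judge comparisons. As a minimal sketch, assuming the standard Elo update rule (the k-factor and function names here are illustrative, not taken from the paper):

```python
def elo_update(r_a, r_b, a_wins, k=32):
    """One Elo update after a pairwise comparison of two hypotheses.
    a_wins: 1.0 if hypothesis A is judged better, 0.0 if B, 0.5 for a tie."""
    expected_a = 1 / (1 + 10 ** ((r_b - r_a) / 400))
    delta = k * (a_wins - expected_a)
    return r_a + delta, r_b - delta
```

Starting two hypotheses at equal ratings, a single win shifts each rating by k/2, so repeated judged comparisons gradually separate stronger briefs from weaker ones.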

Practical Implications and Limitations:

IRIS demonstrates a practical implementation of a HITL system for a complex creative task like scientific ideation. It provides researchers with concrete tools (structured review, guided search, iterative refinement) to leverage LLMs effectively while retaining control and transparency, addressing concerns about automated systems generating plausible but flawed or misaligned content. The MCTS adaptation provides a structured way to explore the vast idea space while managing computational resources.

The paper notes limitations, including the current reliance on the researcher's domain expertise to verify LLM outputs and the potential for higher-quality hypothesis generation with access to more advanced LLMs. Future work aims towards a more reciprocal Human-AI co-creation model where the AI can also challenge the researcher's decisions.

In summary, IRIS offers a practical, open-source platform that integrates LLMs into the scientific ideation process through a carefully designed HITL framework, novel MCTS adaptation, fine-grained feedback, and sophisticated literature retrieval, showcasing the potential of human-AI collaboration for accelerating scientific discovery.
