Enhancing Research Idea Generation through Combinatorial Innovation and Multi-Agent Iterative Search Strategies

Published 22 Apr 2026 in cs.CL, cs.AI, cs.DL, and cs.IR | (2604.20548v1)

Abstract: Scientific progress depends on the continual generation of innovative research ideas. However, the rapid growth of scientific literature has greatly increased the cost of knowledge filtering, making it harder for researchers to identify novel directions. Although existing LLM-based methods show promise in research idea generation, the ideas they produce are often repetitive and lack depth. To address this issue, this study proposes a multi-agent iterative planning search strategy inspired by combinatorial innovation theory. The framework combines iterative knowledge search with an LLM-based multi-agent system to generate, evaluate, and refine research ideas through repeated interaction, with the goal of improving idea diversity and novelty. Experiments in the natural language processing domain show that the proposed method outperforms state-of-the-art baselines in both diversity and novelty. Further comparison with ideas derived from top-tier machine learning conference papers indicates that the quality of the generated ideas falls between that of accepted and rejected papers. These results suggest that the proposed framework is a promising approach for supporting high-quality research idea generation. The source code and dataset used in this paper are publicly available on GitHub: https://github.com/ChenShuai00/MAGenIdeas. The demo is available at https://huggingface.co/spaces/cshuai20/MAGenIdeas.

Authors (2)

Summary

The paper demonstrates that a multi-agent iterative framework significantly improves idea diversity and novelty through collaborative refinement.
It employs role-differentiated agents and a deterministically filtered literature dataset, and evaluates ideas iteratively with Swiss-system tournaments.
Empirical results show enhanced high-score ratios and novelty metrics compared to traditional single-agent and keyword-based methods.

Multi-Agent Iterative Search: Framework for Research Idea Generation

Motivation and Theoretical Foundations

The exponential increase in scientific literature has elevated the challenge of identifying genuinely novel research directions due to heightened cognitive and temporal burdens on researchers. Traditional LLM-based systems demonstrate limited efficacy, producing repetitive, low-depth research ideas. The paper formulates research idea generation as a combinatorial innovation problem, drawing upon Schumpeterian theory that positions creative outputs as atypical recombinations of existing knowledge elements. Prior work has predominantly relied on single-agent systems or keyword-based retrieval, leading to perspective bias and path dependency. These systems fail to simulate the collaborative, iterative knowledge refinement central to actual scientific progress.

The proposed framework leverages combinatorial innovation by implementing role-differentiated, multi-agent LLMs, each agent instantiated from real author background metadata. The collaborative, iterative nature of the approach enables knowledge recombination across heterogeneous perspectives, with agents independently evaluating, critiquing, and refining emerging ideas while integrating newly retrieved domain-specific literature.

Methodological Design

The framework follows a four-stage pipeline:

Dataset Construction: The experiment focuses on NLP, utilizing 144 ACL 2024 long papers, 6,153 references, 953 anonymized author profiles, and 25,906 author publications, integrating ACL Anthology, OpenAlex, and Semantic Scholar via deterministic filtering for citation and metadata completeness.
Initial Idea Generation: For each target paper, the system generates 15 initial research ideas via LLM prompting, guided by ten scientific discovery theory methodologies (e.g., hypothetico-deductive, paradigm theory). This ensures methodologically diverse idea incubation.
Iterative Multi-Agent Refinement: Team sizes (2–8 agents) are determined by author counts. Iteratively, agents perform planned literature searches, propose new ideas, and engage in competitive evaluation (Swiss-system tournament + zero-shot LLM ranking) using rubrics derived from top conference review protocols. Each iteration incorporates self-critique, cross-agent feedback, and refinement.
Abstract Generation: Final ideas are summarized in structured form for subsequent evaluation and comparison against standardized conference paper abstracts.
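
The competitive evaluation step can be illustrated with a minimal Swiss-style pairing loop. This is a sketch under stated assumptions, not the paper's implementation: `swiss_tournament` and its parameters are illustrative names, the `judge` callable is a hypothetical stand-in for the LLM-based pairwise comparison, ideas are assumed to be unique strings, rematches are not prevented, and an idea left unpaired in a round simply receives no point.

```python
import random
from typing import Callable, Dict, List


def swiss_tournament(
    ideas: List[str],
    judge: Callable[[str, str], str],
    rounds: int = 3,
    seed: int = 0,
) -> Dict[str, int]:
    """Return win counts after `rounds` of Swiss-style pairing.

    `judge(a, b)` must return either `a` or `b` (the preferred idea).
    """
    rng = random.Random(seed)
    scores = {idea: 0 for idea in ideas}
    for _ in range(rounds):
        # Shuffle first so that ties in score are broken randomly,
        # then sort by current score (Python's sort is stable).
        order = ideas[:]
        rng.shuffle(order)
        order.sort(key=lambda idea: scores[idea], reverse=True)
        # Pair neighbours in the ranking: 1st vs 2nd, 3rd vs 4th, ...
        for a, b in zip(order[0::2], order[1::2]):
            scores[judge(a, b)] += 1
    return scores


if __name__ == "__main__":
    # Toy judge for demonstration only: prefers the longer text.
    def toy_judge(a: str, b: str) -> str:
        return a if len(a) >= len(b) else b

    print(swiss_tournament(["a", "bb", "ccc", "dddd"], toy_judge, rounds=2))
```

Pairing ideas with similar running scores concentrates comparisons where the ranking is still uncertain, which needs far fewer judge calls than a full round-robin.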

The framework architecture ensures that idea evolution incorporates broad exploratory recombination (early rounds) followed by focused refinement and increased path dependence (later rounds).

Evaluation and Experimental Results

Metrics and Baselines

Evaluation is conducted over multiple metrics: semantic diversity (breadth of unique concepts), novelty (semantic dissimilarity from extant literature), and quality scores (proportion of high-scoring ideas per Swiss-system tournament). Baselines include AI-Researcher (literature-grounded LLM prompting) and NOVA (iterative single-agent search). The framework is tested across three backbone LLMs (DeepSeek-3.1, GPT-4o, qwen3-8b).
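
To make the three metric families concrete, the sketch below shows one common way such quantities are computed from text embeddings. The exact definitions used in the paper are not given in this summary, so everything here is an assumption: diversity is taken as the mean pairwise cosine distance among generated ideas, novelty as the cosine distance to the nearest literature embedding, and the high-score ratio as the share of ideas at or above a hypothetical `threshold`.

```python
import math
from itertools import combinations
from typing import List, Sequence

Vector = Sequence[float]


def cosine_similarity(u: Vector, v: Vector) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def diversity(idea_vecs: List[Vector]) -> float:
    """Mean pairwise cosine distance among ideas (assumed definition)."""
    pairs = list(combinations(idea_vecs, 2))
    if not pairs:
        return 0.0
    return sum(1.0 - cosine_similarity(u, v) for u, v in pairs) / len(pairs)


def novelty(idea_vec: Vector, literature_vecs: List[Vector]) -> float:
    """Cosine distance to the closest literature item (assumed definition)."""
    if not literature_vecs:
        return 1.0
    return 1.0 - max(cosine_similarity(idea_vec, p) for p in literature_vecs)


def high_score_ratio(scores: List[float], threshold: float) -> float:
    """Share of ideas scoring at or above `threshold` (hypothetical cutoff)."""
    if not scores:
        return 0.0
    return sum(s >= threshold for s in scores) / len(scores)
```

Under these assumed definitions, all three values lie in a bounded range for non-negative embeddings, which makes scores from different backbone models directly comparable.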

Numerical Results

Diversity: Achieved 0.898, outperforming NOVA (0.867) and AI-Researcher (0.680).
Novelty: 0.133, higher than NOVA (0.107) and AI-Researcher (0.067).
HighScore Ratio: 0.184, exceeding NOVA (0.026) and AI-Researcher (0.013).
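
The size of these gaps is easier to judge as relative gains. The short script below only re-expresses the figures listed above; the dictionary layout and the `relative_gain` helper are illustrative names, not part of the paper.

```python
# Reported values copied from the list above.
reported = {
    "Diversity": {"proposed": 0.898, "NOVA": 0.867, "AI-Researcher": 0.680},
    "Novelty": {"proposed": 0.133, "NOVA": 0.107, "AI-Researcher": 0.067},
    "HighScore Ratio": {"proposed": 0.184, "NOVA": 0.026, "AI-Researcher": 0.013},
}


def relative_gain(proposed: float, baseline: float) -> float:
    """Fractional improvement of `proposed` over `baseline`."""
    return (proposed - baseline) / baseline


if __name__ == "__main__":
    for metric, values in reported.items():
        for name in ("NOVA", "AI-Researcher"):
            gain = relative_gain(values["proposed"], values[name])
            print(f"{metric} vs {name}: {gain:+.1%}")
```

Relative to NOVA, the diversity gain is small (roughly 4%), the novelty gain is moderate (roughly 24%), and the HighScore Ratio is about seven times larger, so most of the reported advantage lies in the share of high-scoring ideas.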

Cross-model evaluation demonstrates framework generalizability; all three backbones produced competitive outputs, with DeepSeek-3.1 excelling in diversity and quality ratio while GPT-4o and qwen3-8b scored higher in semantic novelty.

Benchmarking Against Conference Papers

The 402 generated ideas were compared with NLP-domain submissions to ICLR 2025, split into accepted and rejected papers. Generated ideas had a mean score of 2.224 (SD = 0.777), against 2.776 (SD = 1.646) for accepted papers and 2.311 (SD = 1.620) for rejected papers. On these means, generated ideas score close to, though slightly below, rejected submissions and clearly below accepted papers, with a much narrower spread. The paper's abstract characterizes the quality of the generated ideas as falling between that of accepted and rejected papers.
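
The reported means and standard deviations can be converted into a rough standardized difference. The sketch below uses Cohen's d with an equal-weight pooled standard deviation. That weighting is an assumption, because the sizes of the accepted and rejected groups are not given in this summary, and `cohens_d` is an illustrative helper rather than the paper's analysis.

```python
import math


def cohens_d(mean_a: float, sd_a: float, mean_b: float, sd_b: float) -> float:
    """Cohen's d using an equal-weight pooled SD (group sizes assumed equal)."""
    pooled_sd = math.sqrt((sd_a ** 2 + sd_b ** 2) / 2)
    return (mean_a - mean_b) / pooled_sd


# Means and SDs as reported in the comparison above.
generated = (2.224, 0.777)
accepted = (2.776, 1.646)
rejected = (2.311, 1.620)

d_vs_accepted = cohens_d(*generated, *accepted)
d_vs_rejected = cohens_d(*generated, *rejected)

if __name__ == "__main__":
    print(f"generated vs accepted: d = {d_vs_accepted:.2f}")
    print(f"generated vs rejected: d = {d_vs_rejected:.2f}")
```

A negative value means the generated ideas' mean is lower than that of the comparison group. Under the equal-weight assumption, the gap to accepted papers is moderate and the gap to rejected papers is close to zero.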

Ablation, Team Size, and Iteration Effects

Ablation studies comparing single-agent and multi-agent settings showed that knowledge planning and search improved diversity and novelty, but single-agent systems plateaued after several iterations. Multi-agent systems showed higher quality and consistently rising metrics across iterations. Medium-sized teams (4–7 agents) gave the best balance of quality and uniqueness, consistent with prior literature on how team size relates to disruption and novelty.

Iteration number positively correlated with improved quality and novelty, with diminishing returns and increasing path dependence over successive rounds.

Mechanisms of Knowledge Recombination

Fine-grained entity analysis—extracting methods, tasks, metrics, and datasets—across iterative outputs revealed a shift from broad exploratory recombination in early rounds to concentrated, inherited entity structures in later rounds. Assimilation of external knowledge declined with increased path dependence, suggesting future work should enhance external information integration and mechanism design against path locking.
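
The shift toward inherited entity structures can be quantified with a simple overlap measure. The sketch below assumes that each round's ideas have already been reduced to a set of extracted entity strings. The extraction step is not shown, and `inherited_share` is an illustrative name, not the paper's metric.

```python
from typing import List, Set


def inherited_share(rounds: List[Set[str]]) -> List[float]:
    """For each round after the first, the fraction of its entities
    that already appeared in the immediately preceding round."""
    shares = []
    for previous, current in zip(rounds, rounds[1:]):
        if not current:
            shares.append(0.0)
            continue
        shares.append(len(current & previous) / len(current))
    return shares


if __name__ == "__main__":
    # Toy entity sets, invented purely for illustration.
    toy_rounds = [
        {"BERT", "question answering"},
        {"BERT", "question answering", "F1", "SQuAD"},
        {"BERT", "question answering", "F1", "SQuAD"},
    ]
    print(inherited_share(toy_rounds))
```

A share that rises toward 1.0 across rounds would indicate growing path dependence, while a low share indicates that a round is still drawing in entities not present in the previous one.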

Error Analysis and Limitations

Qualitative analysis revealed four failure modes: internal technical inconsistency, unsupported extrapolation, underspecified integration of components, and nonviable novelty (excessive scope, weak focus). While diversity and novelty metrics are sufficient for automated evaluation, they inadequately capture feasibility and methodological soundness. The framework incurs notable computational overhead, especially during iterative refinement and competitive evaluation stages.

Practical and Theoretical Implications

The work demonstrates that multi-agent, theory-guided frameworks substantially expand the research idea search space for LLMs, mitigate perspective bias, and promote high-quality ideation. Integrating combinatorial innovation theory with LLM-driven ideation offers both empirical improvements and theoretical interpretability. Collaborative multi-agent architectures provide effective support for early-stage ideation and align with actual scientific processes. However, domain-adaptive strategies are necessary for transferability beyond NLP, and richer evaluation protocols—potentially expert review—are needed to quantify feasibility.

Conclusion

The study establishes a principled, multi-agent iterative framework for automated research idea generation, grounded in combinatorial innovation theory. Empirical results show consistent gains in diversity, novelty, and quality relative to strong baselines. While generated ideas do not, on average, match accepted top-tier conference papers, they exhibit clear academic value and suggest that theory-informed AI systems can meaningfully improve scientific creativity and knowledge recombination. Future research should address domain generalizability, reproducibility, efficiency, and evaluation rigor.
