
LatteReview: A Multi-Agent Framework for Systematic Review Automation Using Large Language Models (2501.05468v1)

Published 5 Jan 2025 in cs.CL

Abstract: Systematic literature reviews and meta-analyses are essential for synthesizing research insights, but they remain time-intensive and labor-intensive due to the iterative processes of screening, evaluation, and data extraction. This paper introduces and evaluates LatteReview, a Python-based framework that leverages LLMs and multi-agent systems to automate key elements of the systematic review process. Designed to streamline workflows while maintaining rigor, LatteReview utilizes modular agents for tasks such as title and abstract screening, relevance scoring, and structured data extraction. These agents operate within orchestrated workflows, supporting sequential and parallel review rounds, dynamic decision-making, and iterative refinement based on user feedback. LatteReview's architecture integrates LLM providers, enabling compatibility with both cloud-based and locally hosted models. The framework supports features such as Retrieval-Augmented Generation (RAG) for incorporating external context, multimodal reviews, Pydantic-based validation for structured inputs and outputs, and asynchronous programming for handling large-scale datasets. The framework is available on the GitHub repository, with detailed documentation and an installable package.

Summary

  • The paper presents a multi-agent framework that automates systematic review tasks using large language models.
  • The evaluation demonstrates AUC values up to 0.95, highlighting effective relevance scoring and data extraction from diverse datasets.
  • The framework offers modular workflows and customizable reviewer agents, paving the way for scalable and rigorous literature reviews.

Evaluating LatteReview: A Systematic Framework for Automation of Literature Reviews with LLMs

LatteReview emerges as a sophisticated tool designed to streamline the traditionally labor-intensive process of systematic literature reviews and meta-analyses by harnessing the capabilities of LLMs. The framework is noteworthy for its use of multi-agent systems that automate critical review tasks, such as title and abstract screening, relevance scoring, and structured data extraction. The authors detail how LatteReview leverages these technologies while maintaining the rigorous standards essential to academic research, promising substantial savings in both time and human effort.

Architecture and Components

The LatteReview framework is founded on a multi-agent architecture that is both modular and extensible, permitting customization to fulfill the unique requirements of diverse research contexts. The architecture incorporates three principal components: Providers, Reviewer Agents, and Workflows. The Providers facilitate interactions with various LLM APIs—such as those from OpenAI, Ollama, and LiteLLM—thereby abstracting the complexities involved and ensuring uniformity across different model architectures.
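
The paper does not reproduce the provider layer's code, but the abstraction it describes can be sketched as follows. All class and method names here are illustrative stand-ins, not LatteReview's actual API; the stub backend exists only so the sketch runs without API keys:

```python
from abc import ABC, abstractmethod

class BaseProvider(ABC):
    """Uniform interface over heterogeneous LLM backends (illustrative)."""

    @abstractmethod
    def generate(self, prompt: str) -> str:
        ...

class EchoProvider(BaseProvider):
    """Stub backend standing in for a cloud-based or locally hosted model."""

    def generate(self, prompt: str) -> str:
        return f"response to: {prompt}"

def review(provider: BaseProvider, abstract: str) -> str:
    # Review logic depends only on BaseProvider, so OpenAI-, Ollama-, or
    # LiteLLM-backed models can be swapped in without changing this code.
    return provider.generate(f"Screen this abstract: {abstract}")

print(review(EchoProvider(), "LLMs for systematic reviews"))
```

The design benefit is that the complexity of each vendor's API stays behind one interface, which is what enables the framework's uniformity across model architectures.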

Reviewer Agents are central to the framework's implementation, performing tasks such as relevance scoring, data abstraction, and custom reviews according to user-defined schemas. The flexibility of these agents is further enhanced by Workflows, which coordinate multi-round reviews: they can manage sequential and parallel processing, support dynamic decision-making, and yield structured outputs even across intricate review processes.
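
One way to picture the combination of parallel reviewers within sequential rounds is the following asyncio sketch. The agent logic is a placeholder (a real agent would call an LLM through a provider), and the function names are assumptions for illustration, not LatteReview's implementation:

```python
import asyncio

async def reviewer(name: str, item: str) -> tuple[str, bool]:
    # Placeholder decision rule; a real agent would await an LLM call here.
    await asyncio.sleep(0)  # yield control, as a network request would
    return name, "LLM" in item

async def review_round(reviewers: list[str], items: list[str]) -> list[str]:
    """Run all reviewers on each item concurrently (a parallel round);
    keep any item that at least one reviewer accepts."""
    kept = []
    for item in items:
        votes = await asyncio.gather(*(reviewer(r, item) for r in reviewers))
        if any(ok for _, ok in votes):
            kept.append(item)
    return kept

async def main() -> list[str]:
    items = ["LLM screening study", "unrelated chemistry paper"]
    # Round 1: two reviewers vote in parallel; Round 2: a senior pass
    # re-screens only the survivors -- a sequential chain of rounds.
    round1 = await review_round(["junior_a", "junior_b"], items)
    return await review_round(["senior"], round1)

print(asyncio.run(main()))  # → ['LLM screening study']
```

This also illustrates why the framework's asynchronous design matters for large-scale datasets: within a round, agent calls can overlap rather than run one by one.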

Core Functionalities

LatteReview's suite of functionalities is designed to maximize flexibility, accuracy, and scalability in processing diverse input types, including text and images. These functionalities include:

  • Scoring Reviews: This fundamental ability allows AI agents to assign scores based on predefined criteria, integrating reasoning transparency and certainty scores to aid in decision-making.
  • Title and Abstract Reviews: Specialized for screening academic content against inclusion and exclusion criteria, ensuring only pertinent items proceed.
  • Abstraction Reviews: Focuses on structured data extraction from unstructured inputs, which is particularly beneficial for summarizing and analyzing research trends.
  • Multi-Reviewer Workflows: Supports the orchestration of complex review processes involving multiple AI reviewers capable of sequential and parallel evaluations.
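
The scoring and certainty outputs above hinge on the Pydantic-based validation mentioned in the abstract. A minimal sketch of what such a validated schema could look like (the field names and ranges are assumptions, not LatteReview's actual schema):

```python
from pydantic import BaseModel, Field, ValidationError

class ScoringResponse(BaseModel):
    """Illustrative schema for a scoring reviewer's structured output."""
    score: int = Field(ge=1, le=5)            # relevance on an assumed 1-5 scale
    certainty: float = Field(ge=0.0, le=1.0)  # agent's stated confidence
    reasoning: str                            # rationale kept for human oversight

# A well-formed model reply parses cleanly...
ok = ScoringResponse(score=4, certainty=0.9,
                     reasoning="Matches the inclusion criteria.")
print(ok.score)  # 4

# ...while a malformed one is rejected before it can enter the results.
try:
    ScoringResponse(score=9, certainty=0.9, reasoning="out of range")
except ValidationError:
    print("rejected")  # prints "rejected"
```

Validating every agent response against a schema is what lets downstream rounds and human reviewers trust the structured outputs without re-parsing free text.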

LatteReview also supports Retrieval-Augmented Generation (RAG) processes and provides custom agent capabilities, which enable users to tailor functionalities to specific review scenarios.
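
The core idea of the RAG support can be conveyed with a toy pipeline: retrieve context relevant to the item under review and prepend it to the agent's prompt. The word-overlap retriever below is a deliberately naive stand-in for the embedding-based search a real pipeline would use, and none of these function names come from LatteReview:

```python
def retrieve(query: str, corpus: list[str], k: int = 1) -> list[str]:
    """Rank documents by word overlap with the query -- a stand-in for
    embedding-based retrieval."""
    q = set(query.lower().split())
    ranked = sorted(corpus, key=lambda d: len(q & set(d.lower().split())),
                    reverse=True)
    return ranked[:k]

def build_prompt(query: str, corpus: list[str]) -> str:
    # Prepend the retrieved context so the reviewer grounds its judgment
    # in external material rather than parametric knowledge alone.
    context = "\n".join(retrieve(query, corpus))
    return f"Context:\n{context}\n\nTask: {query}"

criteria = [
    "Inclusion: studies using llms for screening titles and abstracts.",
    "Exclusion: animal studies and case reports.",
]
prompt = build_prompt("do studies using llms for screening qualify?", criteria)
print(prompt.splitlines()[1])  # the inclusion-criteria line is retrieved
```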

Evaluation and Practical Insights

The paper evaluates LatteReview using datasets from the SYNERGY collection and a custom dataset stemming from the authors’ previous publications. The evaluation highlights the framework’s discriminative capability in identifying relevant articles, with Area Under the Curve (AUC) values ranging from 0.77 to 0.95 across various review tasks. This performance, however, depends significantly on the heterogeneity of datasets and the clarity of inclusion/exclusion criteria.
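
For context, an AUC of 0.95 means a randomly chosen relevant article receives a higher relevance score than a randomly chosen irrelevant one 95% of the time. The standard rank-based computation (a generic Mann-Whitney formulation, not code from the paper) is:

```python
def auc(labels: list[int], scores: list[float]) -> float:
    """AUC as the probability that a relevant item outranks an
    irrelevant one; tied scores count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy screening run: agent relevance scores vs. gold include/exclude labels.
labels = [1, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.2]
print(auc(labels, scores))  # 5 of 6 relevant/irrelevant pairs correctly ordered
```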

The authors offer practical insights into optimizing the use of LatteReview, emphasizing the importance of clear review criteria, appropriate selection of reviewer models, and effective integration of human oversight. The paper suggests that even with modest LLM configurations, the framework demonstrates substantial effectiveness, particularly when review prompts are well-defined.

Implications and Future Directions

LatteReview introduces substantial advancements in the field of systematic review automation. By integrating state-of-the-art LLMs and supporting multimodal data, the framework sets a precedent for more efficient and exhaustive academic reviews. The potential implications are far-reaching, particularly in domains requiring rapid synthesis of extensive literature such as healthcare and policy formulation.

Future iterations of LatteReview are expected to enhance its functionality and user-friendliness. Developments such as the incorporation of a broader array of LLMs, improved context management, and the introduction of no-code interfaces are anticipated. These enhancements aim to broaden the framework's applicability and render it more accessible to a wider research audience.

In conclusion, LatteReview presents a robust, scalable solution for academic literature review automation, effectively combining innovation with user-oriented flexibility. As systematic reviews continue to underpin evidence-based practices, frameworks like LatteReview have the potential to redefine the approach to synthesizing research insights across disciplines.