
Retrieve-Plan-Generation: An Iterative Planning and Answering Framework for Knowledge-Intensive LLM Generation (2406.14979v2)

Published 21 Jun 2024 in cs.CL

Abstract: Despite the significant progress of LLMs in various tasks, they often produce factual errors due to their limited internal knowledge. Retrieval-Augmented Generation (RAG), which enhances LLMs with external knowledge sources, offers a promising solution. However, these methods can be misled by irrelevant paragraphs in retrieved documents. Due to the inherent uncertainty in LLM generation, inputting the entire document may introduce off-topic information, causing the model to deviate from the central topic and affecting the relevance of the generated content. To address these issues, we propose the Retrieve-Plan-Generation (RPG) framework. RPG generates plan tokens to guide subsequent generation in the plan stage. In the answer stage, the model selects relevant fine-grained paragraphs based on the plan and uses them for further answer generation. This plan-answer process is repeated iteratively until completion, enhancing generation relevance by focusing on specific topics. To implement this framework efficiently, we utilize a simple but effective multi-task prompt-tuning method, enabling the existing LLMs to handle both planning and answering. We comprehensively compare RPG with baselines across 5 knowledge-intensive generation tasks, demonstrating the effectiveness of our approach.

Retrieve-Plan-Generation: An Iterative Planning and Answering Framework for Knowledge-Intensive LLM Generation

The paper "Retrieve-Plan-Generation: An Iterative Planning and Answering Framework for Knowledge-Intensive LLM Generation" proposes RPG (Retrieve-Plan-Generation), a framework designed to improve the performance of LLMs on knowledge-intensive tasks. LLMs have made significant strides across many applications, yet they are often hampered by factual inaccuracies because their internal knowledge is limited. Retrieval-Augmented Generation (RAG) addresses some of these concerns by incorporating external knowledge sources. However, feeding entire retrieved documents to the model can introduce off-topic information, which may dilute the relevance and coherence of the response.

Proposed Solution

The RPG framework follows an iterative two-stage process consisting of a Plan Stage and an Answer Stage. In each iteration:

  1. Plan Tokens are generated to guide the structure and content of the subsequent output.
  2. Fine-Grained Paragraphs are selected based on the current plan token to generate targeted answers.

By iterating between planning and answering, the RPG framework maintains the relevance and topical coherence of the generated content. This approach attempts to remedy the issues of focus shift and off-topic influences inherent in the traditional RAG methods.
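
The control flow can be summarized in a short Python sketch. The retriever and model calls below are placeholder stubs (the function names and the "<eos>" completion signal are illustrative assumptions, not the authors' actual interface); the point is only the alternation between planning and grounded answering.

```python
from typing import List

# Placeholder stand-ins for the retriever and the prompt-tuned LLM calls;
# in a real system these would wrap a dense retriever and the RPG model.
def retrieve(question: str) -> List[str]:
    return ["paragraph about topic A", "paragraph about topic B"]

def generate_plan(question: str, answer_so_far: str) -> str:
    # The real model emits plan tokens; "<eos>" signals that the answer is complete.
    return "<eos>" if answer_so_far else "topic A"

def select_paragraphs(documents: List[str], plan: str) -> List[str]:
    # Keep only the fine-grained paragraphs relevant to the current plan.
    return [p for p in documents if plan.split()[-1] in p]

def generate_answer(question: str, plan: str, paragraphs: List[str], answer_so_far: str) -> str:
    return f"[segment on {plan} grounded in {len(paragraphs)} paragraph(s)] "

def rpg_generate(question: str, max_rounds: int = 5) -> str:
    """Iterate plan -> answer until the model signals completion."""
    documents = retrieve(question)               # coarse retrieval, as in standard RAG
    answer = ""
    for _ in range(max_rounds):
        plan = generate_plan(question, answer)   # plan stage: pick the next sub-topic
        if plan == "<eos>":
            break
        paragraphs = select_paragraphs(documents, plan)
        answer += generate_answer(question, plan, paragraphs, answer)  # answer stage
    return answer

print(rpg_generate("What is topic A?"))
```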

Implementation Method

The RPG framework leverages an efficient multi-task prompt tuning method, which allows existing LLMs to be fine-tuned for both planning and answering tasks without needing extensive modifications to the model architecture. The training involves:

  • Multi-Task Prompt Tuning: This technique ensures the model can handle both planning and answering by learning task-specific prompts while keeping the LLM largely unchanged.
  • Incremental Planning: During inference, the model alternates generating plan tokens and answers, iteratively refining the output until completion.
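
As an illustration of the multi-task prompt-tuning idea, the PyTorch sketch below keeps the backbone LLM frozen and learns one soft prompt per task ("plan" vs. "answer"), prepending the task-specific prompt to the token embeddings. The class name, dimensions, and two-task split are assumptions made for illustration and do not reproduce the authors' exact configuration.

```python
import torch
import torch.nn as nn

class MultiTaskSoftPrompt(nn.Module):
    """Learnable soft prompts for the planning and answering tasks (illustrative)."""
    def __init__(self, hidden_size: int = 4096, prompt_len: int = 20):
        super().__init__()
        # One learnable prompt per task; the backbone LLM parameters stay frozen.
        self.prompts = nn.ParameterDict({
            "plan": nn.Parameter(torch.randn(prompt_len, hidden_size) * 0.02),
            "answer": nn.Parameter(torch.randn(prompt_len, hidden_size) * 0.02),
        })

    def forward(self, token_embeds: torch.Tensor, task: str) -> torch.Tensor:
        # token_embeds: (batch, seq_len, hidden_size) from the frozen embedding layer.
        batch = token_embeds.size(0)
        prompt = self.prompts[task].unsqueeze(0).expand(batch, -1, -1)
        # Prepend the task-specific soft prompt; the result is fed to the frozen LLM.
        return torch.cat([prompt, token_embeds], dim=1)

# Usage: prepend the "plan" prompt when training plan-token generation and the
# "answer" prompt when training grounded answer generation.
soft_prompt = MultiTaskSoftPrompt()
dummy_embeds = torch.randn(2, 16, 4096)   # stand-in for frozen-LLM token embeddings
plan_inputs = soft_prompt(dummy_embeds, task="plan")
print(plan_inputs.shape)                  # torch.Size([2, 36, 4096])
```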

Evaluation and Results

The authors evaluated RPG across five tasks covering long-form generation (ASQA, ELI5), multi-hop reasoning (2WikiMultiHopQA), and short-form question answering (PopQA, PubHealth). Metrics such as ROUGE, MAUVE, and F1 were used to measure performance.
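
For reference, short-form QA answers are commonly scored with token-level F1, as in the sketch below; the paper's exact normalization (casing, punctuation, article removal) may differ from this simplified version.

```python
from collections import Counter

def token_f1(prediction: str, reference: str) -> float:
    """Standard token-overlap F1 between a predicted and a reference answer."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    common = Counter(pred_tokens) & Counter(ref_tokens)   # per-token overlap counts
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

print(token_f1("the eiffel tower in paris", "eiffel tower paris"))  # 0.75
```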

Long-Form Generation:

  • The RPG model demonstrated significant improvements over traditional RAG baselines and instruction-tuned models such as Alpaca and Llama2 on the ASQA and ELI5 datasets, particularly on ROUGE and MAUVE.

Multi-Hop Generation:

  • On the 2WikiMultiHopQA dataset, RPG outperformed other RAG-based models and approached the performance of top-tier models such as ChatGPT equipped with dynamic retrieval strategies.

Short-Form Generation:

  • While not the primary focus, RPG still excelled in short-form QA tasks, often outperforming other retrieval-augmented models by maintaining focused and relevant responses.

Implications and Future Directions

The RPG framework sets a precedent for incorporating explicit pre-planning in LLMs, effectively bolstering relevance and adherence to the topic in lengthy generation tasks. Practical implications include enhanced capabilities for applications demanding detailed and contextually appropriate responses, such as academic writing, report generation, and customer support.

Future directions could involve:

  1. Scaling RPG to Larger LLMs: Given resource constraints, the current implementation focuses on models up to Llama2-7B. Extending this to 13B or 70B models could unlock further performance enhancements.
  2. Extending Training Data: The training set used in this paper was reconstructed to roughly 50k instances; increasing the size and variety of training data could further strengthen RPG’s capabilities.

By isolating and iterating distinct planning and answering phases, RPG offers a robust solution to the prevalent issue of off-topic content, especially in knowledge-intensive tasks. This approach not only enhances response accuracy but also supports the generation of comprehensive and contextually grounded answers.

Authors (7)
  1. Yuanjie Lyu
  2. Zihan Niu
  3. Zheyong Xie
  4. Chao Zhang
  5. Tong Xu
  6. Yang Wang
  7. Enhong Chen