Retrieve-Plan-Generation: An Iterative Planning and Answering Framework for Knowledge-Intensive LLM Generation
The paper "Retrieve-Plan-Generation: An Iterative Planning and Answering Framework for Knowledge-Intensive LLM Generation" proposes RPG (Retrieve-Plan-Generation), a framework designed to improve the performance of LLMs on knowledge-intensive tasks. LLMs have made significant strides across many applications, yet their limited internal knowledge often leads to factual inaccuracies in generated responses. Retrieval-Augmented Generation (RAG) addresses some of these concerns by incorporating external knowledge sources; however, feeding entire retrieved documents into the model can introduce off-topic information that dilutes response relevance and coherence.
Proposed Solution
The RPG framework alternates between two stages, a Plan Stage and an Answer Stage. In each iteration:
- Plan Tokens are generated to guide the structure and content of the subsequent output.
- Fine-Grained Paragraphs are selected based on the current plan token to generate targeted answers.
By iterating between planning and answering, RPG maintains the relevance and topical coherence of the generated content, remedying the focus shift and off-topic drift inherent in traditional RAG methods.
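The alternating plan-answer loop described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: `generate_plan_token`, `select_paragraphs`, and `generate_answer` are hypothetical stand-ins for the LLM calls RPG makes at each step.

```python
# Hypothetical sketch of RPG's iterative plan-then-answer loop.
# All three helpers below are illustrative placeholders for LLM calls.

def generate_plan_token(question, answer_so_far, docs):
    # Placeholder: in RPG the LLM emits a short plan token naming the
    # next sub-topic; here we simply pick the next uncovered doc title.
    remaining = [d["title"] for d in docs if d["title"] not in answer_so_far]
    return remaining[0] if remaining else None  # None signals completion

def select_paragraphs(plan_token, docs, k=1):
    # Placeholder fine-grained selection: keep paragraphs whose source
    # document matches the current plan token.
    return [p for d in docs if d["title"] == plan_token
            for p in d["paragraphs"]][:k]

def generate_answer(plan_token, paragraphs):
    # Placeholder answer segment grounded in the selected paragraphs.
    return f"{plan_token}: " + " ".join(paragraphs)

def rpg_generate(question, docs, max_steps=5):
    answer = ""
    for _ in range(max_steps):
        plan = generate_plan_token(question, answer, docs)
        if plan is None:  # the model decides the answer is complete
            break
        paragraphs = select_paragraphs(plan, docs)
        answer += generate_answer(plan, paragraphs) + " "
    return answer.strip()
```

The key structural point is that paragraph selection is re-done per plan token, so each answer segment is grounded only in the paragraphs relevant to the current sub-topic.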
Implementation Method
The RPG framework leverages an efficient multi-task prompt tuning method, which allows existing LLMs to be fine-tuned for both planning and answering tasks without needing extensive modifications to the model architecture. The training involves:
- Multi-Task Prompt Tuning: This technique ensures the model can handle both planning and answering by learning task-specific prompts while keeping the LLM largely unchanged.
- Incremental Planning: During inference, the model alternates generating plan tokens and answers, iteratively refining the output until completion.
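A minimal sketch of the multi-task prompt tuning idea: one learnable soft-prompt matrix per task is prepended to the frozen token embeddings, so only the prompts are updated during training. Dimensions, names, and the NumPy representation are illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np

EMBED_DIM = 8    # illustrative; real LLM embedding dims are much larger
PROMPT_LEN = 4   # number of soft-prompt vectors per task (assumed)

rng = np.random.default_rng(0)
soft_prompts = {
    "plan":   rng.normal(size=(PROMPT_LEN, EMBED_DIM)),  # trainable
    "answer": rng.normal(size=(PROMPT_LEN, EMBED_DIM)),  # trainable
}

def build_model_input(task, token_embeddings):
    """Prepend the task-specific soft prompt to the (frozen) input
    embeddings; the base LLM weights are never modified."""
    return np.concatenate([soft_prompts[task], token_embeddings], axis=0)

tokens = rng.normal(size=(10, EMBED_DIM))  # frozen embeddings of the input
plan_input = build_model_input("plan", tokens)
answer_input = build_model_input("answer", tokens)
```

Switching between planning and answering at inference time then amounts to swapping which soft prompt is prepended, which is why the base model needs no architectural changes.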
Evaluation and Results
The authors evaluated RPG on five datasets spanning three task types: long-form generation (ASQA, ELI5), multi-hop reasoning (2WikiMultiHopQA), and short-form question answering (PopQA, PubHealth). Metrics such as ROUGE, MAUVE, and F1 were used to measure performance.
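For reference, the F1 metric used in short-form QA is typically token-overlap F1 between prediction and gold answer. The standard formulation is shown below; the paper's exact scoring script may differ in normalization details.

```python
from collections import Counter

def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 as commonly used in short-form QA evaluation."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    # Multiset intersection counts each shared token at most min-count times.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```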
Long-Form Generation:
- The RPG model demonstrated significant improvements over traditional RAG and even instruction-tuned models like Alpaca and Llama2 in the ASQA and ELI5 datasets, particularly in metrics like ROUGE and MAUVE.
Multi-Hop Reasoning:
- On the 2WikiMultiHopQA dataset, RPG outperformed other RAG-based models and approached the performance of top-tier models such as ChatGPT equipped with dynamic retrieval strategies.
Short-Form Question Answering:
- While not the primary focus, RPG also excelled at short-form QA, often outperforming other retrieval-augmented models by maintaining focused, relevant responses.
Implications and Future Directions
The RPG framework sets a precedent for incorporating explicit pre-planning in LLMs, effectively bolstering relevance and adherence to the topic in lengthy generation tasks. Practical implications include enhanced capabilities for applications demanding detailed and contextually appropriate responses, such as academic writing, report generation, and customer support.
Future directions could involve:
- Scaling RPG to Larger LLMs: Given resource constraints, the current implementation uses models up to Llama2-7B. Extending it to 13B or 70B models could unlock further performance gains.
- Extending Training Data: The dataset size used in this paper, reconstructed to about 50k instances, suggests that increasing the size and variety of training data could further strengthen RPG’s capabilities.
By isolating and iterating distinct planning and answering phases, RPG offers a robust solution to the prevalent issue of off-topic content, especially in knowledge-intensive tasks. This approach not only enhances response accuracy but also supports the generation of comprehensive and contextually grounded answers.