Learning to Plan and Generate Text with Citations (2404.03381v3)

Published 4 Apr 2024 in cs.CL

Abstract: The increasing demand for the deployment of LLMs in information-seeking scenarios has spurred efforts in creating verifiable systems, which generate responses to queries along with supporting evidence. In this paper, we explore the attribution capabilities of plan-based models which have been recently shown to improve the faithfulness, grounding, and controllability of generated text. We conceptualize plans as a sequence of questions which serve as blueprints of the generated content and its organization. We propose two attribution models that utilize different variants of blueprints, an abstractive model where questions are generated from scratch, and an extractive model where questions are copied from the input. Experiments on long-form question-answering show that planning consistently improves attribution quality. Moreover, the citations generated by blueprint models are more accurate compared to those obtained from LLM-based pipelines lacking a planning component.

Exploring Attribution in Plan-Based Models for Text Generation with Citations

Introduction to Attribution in Text Generation

Recent advances in generative AI present new challenges and opportunities for developing verifiable systems that produce text alongside supporting evidence. This research focuses on improving long-form responses to queries by integrating attribution mechanisms into plan-based text generation models.

The Core Challenges

Two primary challenges are addressed:

  1. Attribution Quality: How can models produce responses with citations that are accurate and genuinely supported by the cited evidence?
  2. Plan-Based Text Generation: How can blueprint plans, conceptualized as sequences of questions, improve the structure, faithfulness, and citation accuracy of the generated content? (A minimal illustration of such a blueprint appears right after this list.)
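
To make the blueprint idea concrete, here is a minimal sketch of what a plan-then-generate output could look like. The overall structure (a sequence of plan questions followed by a response with citation markers) follows the paper's description, but the field names and example content are hypothetical.

```python
# Hypothetical illustration of a question blueprint guiding a cited
# response; the structure mirrors the paper's description, the content
# and field names are invented for illustration.
example_output = {
    "query": "Why is the sky blue?",
    "blueprint": [  # the plan: a sequence of questions to answer, in order
        "What happens to sunlight in the atmosphere?",
        "Why are shorter wavelengths scattered more strongly?",
    ],
    "response": (
        "Sunlight is scattered by molecules in the atmosphere [1]. "
        "Shorter (blue) wavelengths scatter more strongly than longer "
        "ones [2], so the sky appears blue."
    ),
    # each citation marker points to a retrieved evidence passage
    "citations": {"[1]": "passage_3", "[2]": "passage_7"},
}
```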

Methodology and Models

The paper introduces models based on two blueprint strategies:

  • Abstractive Blueprint Models, where questions are generated from scratch to form a structured plan that guides content generation.
  • Extractive Blueprint Models, which construct blueprints by copying relevant questions directly from the input (the two strategies are contrasted in the sketch after this list).
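
As a rough contrast between the two strategies, the sketch below generates plan questions with a placeholder seq2seq model (abstractive) versus copying the best-matching candidate questions from the input (extractive). Both helpers are illustrative assumptions, not the paper's implementation; simple word overlap stands in for the model's learned question selection.

```python
def seq2seq_generate(prompt: str) -> list[str]:
    # Stand-in for a fine-tuned encoder-decoder (e.g. a LongT5-style
    # model); returns canned questions so this sketch runs end to end.
    return ["What causes the phenomenon?", "What evidence supports it?"]

def abstractive_blueprint(query: str, passages: list[str]) -> list[str]:
    # Abstractive: generate plan questions from scratch, conditioned
    # on the query and the retrieved passages.
    prompt = query + "\n\n" + "\n".join(passages)
    return seq2seq_generate(prompt)

def extractive_blueprint(query: str, candidates: list[str], k: int = 3) -> list[str]:
    # Extractive: copy the k candidate questions (harvested from the
    # input) that best match the query; word overlap is a crude proxy
    # for the model's learned scoring.
    query_words = set(query.lower().split())
    def overlap(q: str) -> int:
        return len(set(q.lower().split()) & query_words)
    return sorted(candidates, key=overlap, reverse=True)[:k]
```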

Both models were compared against baseline systems without a planning component, and their effectiveness was evaluated in terms of output quality and attribution accuracy.

Key Findings and Results

The research shows that blueprint models consistently improve both the quality of generated content and the accuracy of citations. Notably, the extractive blueprint model yields particularly strong gains in summary quality, suggesting a robust way to combine planning and attribution mechanisms.

Quantitative Analysis shows:

  • An improvement in ROUGE-L scores, indicating better content relevance and structure.
  • Higher ANLI scores, reflecting enhanced factual consistency and faithfulness.
  • Superior attribution quality, as evidenced by improved AutoAIS scores (an AutoAIS-style scoring sketch follows this list).
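
AutoAIS approximates human attribution judgments by checking, with a natural language inference (NLI) model, whether the cited evidence entails each generated sentence. A minimal sketch of that idea follows; the checkpoint, threshold, and helper names are illustrative choices, not the paper's exact setup.

```python
from transformers import pipeline

# Any public NLI checkpoint works for illustration; this is a common one.
nli = pipeline("text-classification", model="roberta-large-mnli")

def entailment_score(evidence: str, claim: str) -> float:
    # NLI convention: premise = cited evidence, hypothesis = generated claim.
    scores = nli({"text": evidence, "text_pair": claim}, top_k=None)
    return next(s["score"] for s in scores if s["label"] == "ENTAILMENT")

def autoais_like(cited_sentences: list[tuple[str, str]],
                 passages: dict[str, str],
                 threshold: float = 0.5) -> float:
    # Fraction of generated sentences entailed by their cited passage:
    # a rough proxy for AutoAIS-style attribution scoring.
    hits = sum(
        entailment_score(passages[cite_id], sentence) >= threshold
        for sentence, cite_id in cited_sentences
    )
    return hits / max(len(cited_sentences), 1)
```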

Implications and Future Directions

This paper underscores the potential of blueprint models in fostering more faithful and verifiable text generation systems. The findings suggest that planning mechanisms not only aid in structuring generated content but also play a crucial role in enhancing citation accuracy.

Practical Implications include:

  • The utilization of blueprint models in information retrieval and summarization tasks, especially those requiring verifiable sources.
  • Improvement in user trust towards AI-generated content through transparent attribution.

Theoretical Implications involve:

  • Validation of the hypothesis that explicit content planning can lead to improved generation fidelity and source attribution.
  • Demonstration of the transferability of attribution skills across different information-seeking tasks and domains.

Looking ahead, further research could explore the integration of blueprint models with larger and more complex datasets, expanding their applicability and understanding of their limitations. Additionally, future work might delve into the dynamics between different blueprint strategies and their impact on the diversity and comprehensiveness of generated content.

Conclusion

This research marks a significant step towards developing text generation models that not only produce coherent and relevant responses but also attribute their sources accurately. By leveraging blueprint plans, it opens new avenues for improving the reliability and trustworthiness of AI-generated content, addressing critical challenges in the field of generative AI and information verification.

Authors (7)

Constanza Fierro, Reinald Kim Amplayo, Fantine Huot, Nicola De Cao, Joshua Maynez, Shashi Narayan, and Mirella Lapata