Unveiling the Magic: Investigating Attention Distillation in Retrieval-augmented Generation (2402.11794v1)
Abstract: The retrieval-augmented generation framework can address the limitations of LLMs by enabling real-time knowledge updates for more accurate answers. An efficient approach during the training phase of retrieval-augmented models is attention distillation, which uses attention scores as a supervision signal in place of manually annotated query-document pairs. Despite its growing popularity, the detailed mechanisms behind the success of attention distillation remain unexplored, particularly the specific patterns it leverages to benefit training. In this paper, we address this gap by conducting a comprehensive review of the attention distillation workflow and identifying key factors that influence the learning quality of retrieval-augmented LLMs. We further propose indicators for optimizing models' training methods and for avoiding ineffective training.
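Concretely, attention distillation trains the retriever to match the distribution over retrieved passages implied by the reader's cross-attention, so that passages the reader attends to most are scored highest by the retriever. The sketch below is a minimal, hypothetical illustration of such a distillation loss in PyTorch; the tensor names, aggregation choices, and KL direction are assumptions made for illustration, not the paper's implementation.

```python
# Minimal sketch of an attention-distillation loss (illustrative, not the
# authors' code). Assumptions:
#   - `retriever_scores`: query-passage similarity logits, shape (n_passages,)
#   - `reader_attention`: cross-attention mass the reader assigns to each
#     retrieved passage, aggregated over layers/heads/tokens, shape (n_passages,)
# The retriever learns to match the reader's attention distribution, so no
# manually annotated query-document pairs are required.

import torch
import torch.nn.functional as F

def attention_distillation_loss(retriever_scores: torch.Tensor,
                                reader_attention: torch.Tensor) -> torch.Tensor:
    """KL divergence between the retriever's passage distribution and the
    (detached) distribution implied by the reader's attention scores."""
    # Target distribution from the reader; detached so gradients flow
    # only into the retriever.
    target = F.softmax(reader_attention.detach(), dim=-1)
    # Retriever's predicted log-distribution over the same passages.
    log_pred = F.log_softmax(retriever_scores, dim=-1)
    # KL(target || pred); F.kl_div expects log-probabilities as input.
    return F.kl_div(log_pred, target, reduction="batchmean")

# Example usage with dummy values for 10 candidate passages:
retriever_scores = torch.randn(10, requires_grad=True)
reader_attention = torch.rand(10)
loss = attention_distillation_loss(retriever_scores, reader_attention)
loss.backward()
```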
Authors: Zizhong Li, Haopeng Zhang, Jiawei Zhang