Harnessing Multi-Role Capabilities of Large Language Models for Open-Domain Question Answering (2403.05217v1)
Abstract: Open-domain question answering (ODQA) has become a pivotal research topic in information systems. Existing methods follow two main paradigms to collect evidence: (1) the *retrieve-then-read* paradigm retrieves pertinent documents from an external corpus, and (2) the *generate-then-read* paradigm employs large language models (LLMs) to generate relevant documents. However, neither paradigm alone fully satisfies the multifaceted requirements for evidence. To this end, we propose LLMQA, a generalized framework that formulates the ODQA process as three basic steps: query expansion, document selection, and answer generation, combining the strengths of both retrieval-based and generation-based evidence. Since LLMs excel at a wide range of tasks, we instruct them to play multiple roles (generator, reranker, and evaluator) within our framework and integrate them to collaborate throughout the ODQA process. Furthermore, we introduce a novel prompt optimization algorithm that refines the role-playing prompts and steers the LLMs toward higher-quality evidence and answers. Extensive experiments on widely used benchmarks (NQ, WebQ, and TriviaQA) demonstrate that LLMQA achieves the best performance in both answer accuracy and evidence quality, showcasing its potential for advancing ODQA research and applications.
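To make the three-step, multi-role pipeline concrete, here is a minimal sketch of how one LLM could be wired into the generator, reranker, and evaluator roles. Everything in it is a hypothetical stand-in: the `llm` callable represents any chat-completion client, the role prompts are paraphrased rather than the paper's optimized prompts, and the prompt-optimization algorithm itself is omitted.

```python
# Minimal sketch of an LLMQA-style pipeline: query expansion ->
# document selection -> answer generation, with one LLM playing the
# generator, reranker, and evaluator roles. Illustrative only; the
# `llm` callable and all prompts are hypothetical stand-ins.

from typing import Callable, List

LLM = Callable[[str], str]  # prompt in, completion out


def expand_query(question: str, llm: LLM, n: int = 3) -> List[str]:
    """Generator role: rewrite the question into n expanded queries."""
    out = llm(
        f"Rewrite the question into {n} search queries, one per line.\n"
        f"Question: {question}"
    )
    return [q.strip() for q in out.splitlines() if q.strip()][:n]


def select_documents(question: str, docs: List[str],
                     llm: LLM, k: int = 2) -> List[str]:
    """Reranker role: keep the k documents most useful as evidence.

    `docs` may pool retrieved passages and LLM-generated ones, which is
    how the framework combines both kinds of evidence.
    """
    numbered = "\n".join(f"[{i}] {d}" for i, d in enumerate(docs))
    out = llm(
        f"Question: {question}\nDocuments:\n{numbered}\n"
        f"Return the indices of the {k} most relevant documents, "
        f"comma-separated."
    )
    idx = [int(t) for t in out.replace(",", " ").split() if t.isdigit()]
    return [docs[i] for i in idx if i < len(docs)][:k] or docs[:k]


def answer(question: str, evidence: List[str], llm: LLM) -> str:
    """Generator role again: read the selected evidence and answer."""
    ctx = "\n".join(evidence)
    return llm(f"Evidence:\n{ctx}\n\nAnswer concisely: {question}")


def evaluate(question: str, candidate: str, llm: LLM) -> bool:
    """Evaluator role: accept or reject the candidate answer."""
    verdict = llm(
        f"Question: {question}\nCandidate answer: {candidate}\n"
        f"Is this a plausible, well-supported answer? Reply yes or no."
    )
    return verdict.strip().lower().startswith("yes")
```

A driver loop would expand the query, pool retrieved and generated documents into `docs`, and repeat select/answer/evaluate until the evaluator accepts; in the paper, feedback of this kind also drives the refinement of the role-playing prompts.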
Authors: Hongda Sun, Yuxuan Liu, Chengwei Wu, Haiyu Yan, Cheng Tai, Xin Gao, Shuo Shang, Rui Yan