
Reliable, Adaptable, and Attributable Language Models with Retrieval (2403.03187v1)

Published 5 Mar 2024 in cs.CL, cs.AI, and cs.LG

Abstract: Parametric language models (LMs), which are trained on vast amounts of web data, exhibit remarkable flexibility and capability. However, they still face practical challenges such as hallucinations, difficulty in adapting to new data distributions, and a lack of verifiability. In this position paper, we advocate for retrieval-augmented LMs to replace parametric LMs as the next generation of LMs. By incorporating large-scale datastores during inference, retrieval-augmented LMs can be more reliable, adaptable, and attributable. Despite their potential, retrieval-augmented LMs have yet to be widely adopted due to several obstacles: specifically, current retrieval-augmented LMs struggle to leverage helpful text beyond knowledge-intensive tasks such as question answering, have limited interaction between retrieval and LM components, and lack the infrastructure for scaling. To address these, we propose a roadmap for developing general-purpose retrieval-augmented LMs. This involves a reconsideration of datastores and retrievers, the exploration of pipelines with improved retriever-LM interaction, and significant investment in infrastructure for efficient training and inference.

This paper (Asai et al., 5 Mar 2024) advocates for retrieval-augmented language models (RALMs) as the successor to purely parametric language models (LMs), arguing that RALMs offer significant advantages in reliability, adaptability, and attributability, which are key challenges for modern LMs such as GPT-4.

The authors begin by outlining the fundamental weaknesses of parametric LMs:

  • W1: Factual inaccuracies (Hallucinations): Parametric LMs struggle to store all knowledge, especially long-tail facts, leading to errors.
  • W2: Difficulty of verification: Outputs lack clear attributions, making fact-checking difficult.
  • W3: Difficulty of opting out of data: Removing specific data from the training corpus (e.g., for privacy or copyright reasons) is hard, and tracing output provenance is complex.
  • W4: Computationally expensive adaptation: Updating parametric LMs to new data or domains requires costly re-training or fine-tuning.
  • W5: Prohibitively large model size: While scaling improves performance, it leads to immense computational costs and resource requirements.

Retrieval-augmented LMs, in contrast, maintain an external datastore and retrieve relevant information during inference to condition their output. This approach is presented as a way to mitigate the weaknesses of parametric LMs:

  • W1: Reduced factual errors: By accessing external knowledge, RALMs can explicitly retrieve and incorporate long-tail facts, reducing hallucinations.
  • W2: Better attributions: The retrieved documents can be provided as sources for the generated text, improving verifiability.
  • W3: Flexible opt-in/out: Managing the datastore is simpler than managing a massive training corpus, allowing easier control over included information.
  • W4: Adaptability & customizability: Changing or updating the datastore allows easy adaptation to new domains or real-time knowledge without costly LM retraining.
  • W5: Parameter efficiency: By offloading memorization to the external datastore, RALMs can potentially match or exceed the performance of much larger parametric LMs with fewer parameters.
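
To make this retrieve-then-condition pattern concrete, here is a minimal sketch of the common input-augmentation setup. The `embed` stub, the in-memory datastore, and the prompt template are illustrative assumptions rather than the paper's implementation; a real system would use a trained dense retriever and pass the augmented prompt to an LM.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Stand-in embedder (hash-seeded random vector); a real system would use a trained retriever encoder."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(384)
    return v / np.linalg.norm(v)

def retrieve(query: str, datastore: list[str], k: int = 3) -> list[str]:
    """Return the k datastore passages whose embeddings are most similar to the query embedding."""
    q = embed(query)
    scores = np.array([embed(p) @ q for p in datastore])
    top = np.argsort(-scores)[:k]
    return [datastore[i] for i in top]

def build_augmented_prompt(query: str, datastore: list[str], k: int = 3) -> str:
    """Prepend retrieved passages to the input: the 'input augmentation' pattern."""
    passages = retrieve(query, datastore, k)
    context = "\n\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        "Answer the question using the numbered passages below and cite them.\n\n"
        f"{context}\n\nQuestion: {query}\nAnswer:"
    )

# The augmented prompt is then fed to any LM; the retrieved passages double as attributions (W2).
```

Because retrieval and generation interact only through prompt concatenation here, this is exactly the shallow coupling the paper criticizes under C2 below.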

Despite their potential, the paper notes that RALMs haven't achieved widespread adoption beyond specific knowledge-intensive tasks like Question Answering. The authors identify three main obstacles:

  • C1: Limitations of retrievers and datastores: Current retrieval typically relies on semantic or lexical similarity, which may not surface helpful text for diverse tasks (e.g., reasoning) where other kinds of relationships matter; a minimal scoring sketch follows this list. Over-reliance on general datastores such as Wikipedia also limits effectiveness in specialized domains.
  • C2: Limited interactions between retrievers and LMs: Simple approaches like prepending retrieved text to the input (common in RAG) lead to shallow interactions. This can result in unsupported generations, susceptibility to irrelevant context, and difficulties in integrating information from multiple documents. Input augmentation also increases context length, raising inference costs.
  • C3: Lack of infrastructure for scaling: Unlike parametric LMs, training and inference for RALMs, especially with massive datastores, lack standardized, efficient infrastructure. Updating large indices during training is computationally expensive, and large-scale nearest neighbor search for inference requires significant resources.
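
As a concrete illustration of C1, the sketch below scores passages in the two standard ways: lexical token overlap (a crude stand-in for BM25-style matching) and cosine similarity over embeddings. Both rank passages by surface or semantic closeness to the query, which is exactly the notion of relevance the paper argues is too narrow. The function names and the even weighting are illustrative assumptions, not the paper's method.

```python
import numpy as np

def lexical_score(query: str, passage: str) -> float:
    """Jaccard overlap of token sets: a crude stand-in for lexical retrievers such as BM25."""
    q, p = set(query.lower().split()), set(passage.lower().split())
    return len(q & p) / max(len(q | p), 1)

def dense_score(query_vec: np.ndarray, passage_vec: np.ndarray) -> float:
    """Cosine similarity between precomputed query and passage embeddings."""
    denom = np.linalg.norm(query_vec) * np.linalg.norm(passage_vec)
    return float(query_vec @ passage_vec / denom)

def rank_passages(query: str, query_vec: np.ndarray,
                  passages: list[str], passage_vecs: np.ndarray) -> list[int]:
    """Rank passages by an even mix of lexical and dense similarity (illustrative weighting)."""
    scores = [
        0.5 * lexical_score(query, passages[i]) + 0.5 * dense_score(query_vec, passage_vecs[i])
        for i in range(len(passages))
    ]
    return list(np.argsort(scores)[::-1])
```

Neither score reflects whether a passage actually helps the downstream task, which motivates the task-aware retrievers discussed in the roadmap below.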

To address these challenges and promote the broader adoption of RALMs, the paper proposes a roadmap:

  1. Rethinking Retrieval and the Datastore (Addressing C1):
    • Beyond semantic and lexical similarity: Develop new definitions of "relevance" and retrieval methods that can find helpful text for diverse tasks beyond just factual knowledge, potentially by incorporating contextual information or task-specific signals (e.g., instruction-tuned retrievers).
    • Reconsidering and improving the datastore: Explore strategies for curating, filtering, and composing datastores beyond general corpora like Wikipedia. This includes balancing multiple domains and ensuring data quality.
  2. Enhancing Retriever-LM Interactions (Addressing C2):
    • New architectures beyond input augmentation: Investigate more integrated architectures such as intermediate fusion (e.g., RETRO) or output interpolation (e.g., kNN-LM, NPM) that allow deeper interaction between the retriever and LM components throughout generation; a minimal interpolation sketch follows this roadmap. This may require significant pre-training effort.
    • Incorporating retrieval during LM pre-training: Train LMs with retrieval from the outset to make them better at leveraging retrieved context, potentially with minimal architectural changes.
    • Further adaptation after pre-training: Explore post-pre-training adaptation methods like instruction tuning or RLHF specifically designed for RALMs to improve their ability to utilize retrieved information and avoid unsupported generations. Techniques to filter or adaptively use retrieved context are also promising.
    • Efficient end-to-end training: Develop training strategies that jointly optimize both the retriever and the LM, potentially without direct supervision on retrieved documents, to improve overall pipeline performance and reduce retrieval errors.
  3. Building Better Systems and Infrastructures for Scaling and Adaptation (Addressing C3):
    • Scalable search for massive-scale datastores: Invest in efficient nearest neighbor search algorithms, data compression (e.g., binary embeddings), quantization, and data-loading techniques for datastores scaled to trillions of tokens; see the approximate-nearest-neighbor sketch after this roadmap. This requires interdisciplinary efforts spanning systems and hardware design.
    • Standardization and open-source developments: Develop standardized libraries, frameworks (beyond basic RAG interfaces like LangChain or LlamaIndex), and evaluation benchmarks specifically for RALMs to facilitate research, implementation, and adoption across different architectures and training configurations.
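
The output-interpolation family mentioned under item 2 (e.g., kNN-LM) mixes the parametric LM's next-token distribution with a distribution induced by nearest neighbors retrieved from the datastore, roughly p(y | x) = λ · p_kNN(y | x) + (1 − λ) · p_LM(y | x). The sketch below is a minimal illustration of that interpolation under assumed array layouts and a simple negative-distance softmax; it is not the published implementation.

```python
import numpy as np

def knn_interpolated_probs(
    p_lm: np.ndarray,       # parametric LM's next-token distribution, shape (vocab_size,)
    keys: np.ndarray,       # datastore keys: stored context representations, shape (n, d)
    values: np.ndarray,     # datastore values: token id that followed each key, shape (n,), ints
    query: np.ndarray,      # hidden state for the current context, shape (d,)
    k: int = 8,
    temperature: float = 1.0,
    lam: float = 0.25,
) -> np.ndarray:
    """Return lam * p_kNN + (1 - lam) * p_LM over the vocabulary."""
    dists = np.linalg.norm(keys - query, axis=1)   # L2 distance to every stored key
    nn = np.argsort(dists)[:k]                     # indices of the k nearest neighbors
    weights = np.exp(-dists[nn] / temperature)     # closer neighbors get larger weight
    weights /= weights.sum()
    p_knn = np.zeros_like(p_lm)
    np.add.at(p_knn, values[nn], weights)          # scatter neighbor weights onto their tokens
    return lam * p_knn + (1.0 - lam) * p_lm
```

Tuning λ and the number of neighbors trades reliance on the datastore against reliance on the parametric model.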

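For the infrastructure point on scalable search, the sketch below builds an inverted-file index with product quantization using the FAISS library (assuming faiss and numpy are installed). The dimensions, cluster counts, and random vectors are toy assumptions; a production datastore would index billions of passage or token embeddings.

```python
import numpy as np
import faiss  # pip install faiss-cpu

d = 128                      # embedding dimension (toy value)
n_vectors = 50_000           # toy datastore; real datastores hold billions of entries
rng = np.random.default_rng(0)
xb = rng.standard_normal((n_vectors, d)).astype("float32")

# Inverted-file index with product quantization: vectors are bucketed into nlist coarse
# clusters and compressed into m sub-vector codes of nbits bits each.
nlist, m, nbits = 256, 16, 8
quantizer = faiss.IndexFlatL2(d)
index = faiss.IndexIVFPQ(quantizer, d, nlist, m, nbits)
index.train(xb)              # learn coarse centroids and PQ codebooks
index.add(xb)

index.nprobe = 16            # search only 16 of the 256 clusters per query
xq = rng.standard_normal((1, d)).astype("float32")
distances, ids = index.search(xq, 5)   # approximate 5 nearest neighbors
print(ids)
```

Quantization of this kind trades a small loss in recall for large savings in memory and query latency, which is the core scaling concern raised in C3.
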
In summary, the paper posits that RALMs represent a promising path toward more robust and flexible language models. However, realizing this potential requires fundamental advances in defining and finding helpful information, designing architectures that enable deep interaction between retrieval and generation, and building scalable infrastructure for training and inference with massive datastores. These challenges necessitate collaborative, interdisciplinary research efforts.

Authors (7)
  1. Akari Asai
  2. Zexuan Zhong
  3. Danqi Chen
  4. Pang Wei Koh
  5. Luke Zettlemoyer
  6. Hannaneh Hajishirzi
  7. Wen-tau Yih