Adaptation Approaches for Nearest Neighbor Language Models (2211.07828v2)

Published 15 Nov 2022 in cs.CL

Abstract: Semi-parametric Nearest Neighbor Language Models ($k$NN-LMs) have produced impressive gains over purely parametric LMs by leveraging large-scale neighborhood retrieval over external memory datastores. However, there has been little investigation into adapting such models for new domains. This work attempts to fill that gap and suggests the following approaches for adapting $k$NN-LMs: 1) adapting the underlying LM (using Adapters), 2) expanding neighborhood retrieval over an additional adaptation datastore, and 3) adapting the weights (scores) of retrieved neighbors using a learned Rescorer module. We study each adaptation strategy separately, as well as the combined performance improvement, through ablation experiments and an extensive set of evaluations run over seven adaptation domains. Our combined adaptation approach consistently outperforms purely parametric adaptation and zero-shot ($k$NN-LM) baselines that construct datastores from the adaptation data. On average, we see perplexity improvements of 17.1% and 16% over these respective baselines, across domains.
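To make the combined approach concrete, below is a minimal NumPy sketch of how the three pieces could fit together at inference time: a kNN distribution retrieved over the union of a base and an adaptation datastore, an optional learned rescorer that adjusts neighbor scores, and interpolation with the (adapter-tuned) LM distribution. The function names, the additive rescorer interface, and the fixed interpolation weight `lam` are illustrative assumptions for this sketch, not the paper's implementation.

```python
# Sketch of kNN-LM interpolation with an expanded (base + adaptation) datastore
# and a learned Rescorer, assuming L2 distances and a fixed interpolation weight.
import numpy as np

def knn_distribution(query, keys, values, vocab_size, rescorer=None, k=8, temperature=1.0):
    """Retrieve the k nearest keys, optionally rescore them, and aggregate the
    neighbors into a distribution over the vocabulary (softmax of negative distances)."""
    dists = np.linalg.norm(keys - query, axis=1)   # distance from the query to every stored key
    idx = np.argsort(dists)[:k]                    # indices of the k nearest neighbors
    scores = -dists[idx] / temperature             # base neighbor scores
    if rescorer is not None:                       # assumed interface: returns one offset per neighbor
        scores = scores + rescorer(query, keys[idx])
    weights = np.exp(scores - scores.max())        # numerically stable softmax over neighbor scores
    weights /= weights.sum()
    p_knn = np.zeros(vocab_size)
    for w, v in zip(weights, values[idx]):         # values are the next-token ids stored with each key
        p_knn[v] += w
    return p_knn

def adapted_knnlm_prob(p_lm, query, base_store, adapt_store, vocab_size,
                       rescorer=None, lam=0.25):
    """Interpolate the LM distribution with a kNN distribution retrieved over
    both the base datastore and the additional adaptation datastore."""
    keys = np.vstack([base_store[0], adapt_store[0]])        # expanded retrieval index
    values = np.concatenate([base_store[1], adapt_store[1]])
    p_knn = knn_distribution(query, keys, values, vocab_size, rescorer)
    return lam * p_knn + (1.0 - lam) * p_lm

# Toy usage with random data (shapes only, not meaningful probabilities).
rng = np.random.default_rng(0)
V, d = 100, 16
base = (rng.normal(size=(500, d)), rng.integers(0, V, size=500))
adapt = (rng.normal(size=(200, d)), rng.integers(0, V, size=200))
p_lm = np.full(V, 1.0 / V)
p = adapted_knnlm_prob(p_lm, rng.normal(size=d), base, adapt, V)
```

In practice the retrieval step would use an approximate nearest neighbor index (e.g. FAISS) rather than the exhaustive search shown here; the sketch only illustrates how the adaptation datastore and rescorer plug into the interpolation.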

Authors (3)
  1. George Polovets (5 papers)
  2. Monica Sunkara (20 papers)
  3. Rishabh Bhardwaj (30 papers)
Citations (6)