Set the Clock: Temporal Alignment of Pretrained Language Models (2402.16797v2)

Published 26 Feb 2024 in cs.CL

Abstract: Language models (LMs) are trained on web text originating from many points in time and, in general, without any explicit temporal grounding. This work investigates the temporal chaos of pretrained LMs and explores various methods to align their internal knowledge to a target time, which we call "temporal alignment." To do this, we first automatically construct a dataset containing 20K time-sensitive questions and their answers for each year from 2000 to 2023. Based on this dataset, we empirically show that pretrained LMs (e.g., LLaMa2), despite having a recent pretraining cutoff (e.g., 2022), mostly answer questions using earlier knowledge (e.g., from 2019). We then develop several methods, from prompting to finetuning, to align LMs to use their most recent knowledge when answering questions, and investigate various factors in this alignment. Our experiments demonstrate that aligning LLaMa2 to the year 2022 can enhance its performance by up to 62% according to that year's answers. This improvement occurs even without explicitly mentioning time information, indicating the possibility of aligning models' internal sense of time after pretraining. Finally, we find that alignment to a historical time is also possible, with up to 2.8$\times$ the performance of the unaligned LM in 2010 when models are finetuned to that year. These findings hint at the sophistication of LMs' internal knowledge organization and the necessity of tuning them properly.
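
The abstract describes two families of alignment methods: explicit time-aware prompting and finetuning on a single target year's answers drawn from a year-indexed QA dataset. The sketch below is a minimal illustration of how such inputs might be constructed; it is not the authors' released code, and the example question, answers, and helper names are hypothetical.

```python
# Illustrative sketch of temporal alignment setup (hypothetical data and helpers;
# not the authors' released code). It shows (1) time-aware prompting by prefixing
# a target year, and (2) building finetuning pairs from a year-indexed answer table.

from typing import Dict, List

# Hypothetical time-sensitive question with per-year gold answers
# (the paper's dataset covers each year from 2000 to 2023).
QA_BY_YEAR: Dict[str, Dict[int, str]] = {
    "Who is the CEO of Twitter?": {
        2019: "Jack Dorsey",
        2022: "Parag Agrawal",
        2023: "Linda Yaccarino",
    },
}

def time_aware_prompt(question: str, target_year: int) -> str:
    """Prepend an explicit temporal anchor so the LM answers as of `target_year`."""
    return f"As of the year {target_year}: {question}"

def build_finetuning_pairs(target_year: int) -> List[dict]:
    """Create (input, output) pairs using only the target year's answers.

    Note: the paper reports that alignment can work even without mentioning
    the year in the prompt, so the plain question is used as the input here.
    """
    pairs = []
    for question, answers_by_year in QA_BY_YEAR.items():
        if target_year in answers_by_year:
            pairs.append({"input": question, "output": answers_by_year[target_year]})
    return pairs

if __name__ == "__main__":
    print(time_aware_prompt("Who is the CEO of Twitter?", 2022))
    print(build_finetuning_pairs(2022))
```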

Authors (5)
  1. Bowen Zhao
  2. Zander Brumbaugh
  3. Yizhong Wang
  4. Hannaneh Hajishirzi
  5. Noah A. Smith
Citations (6)
