Time is Encoded in the Weights of Finetuned Language Models
Abstract: We present time vectors, a simple tool to customize LLMs to new time periods. Time vectors are created by finetuning an LLM on data from a single time period (e.g., a year or month), and then subtracting the weights of the original pretrained model. This vector specifies a direction in weight space that, as our experiments show, improves performance on text from that time period. Time vectors specialized to adjacent time periods appear to be positioned closer together in a manifold. Using this structure, we interpolate between time vectors to induce new models that perform better on intervening and future time periods, without any additional training. We demonstrate the consistency of our findings across different tasks, domains, model sizes, and time scales. Our results suggest that time is encoded in the weight space of finetuned models.
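To make the weight arithmetic described in the abstract concrete, here is a minimal sketch (not the authors' released code) assuming PyTorch-style models whose weights are exposed via `state_dict()`; the function names and the scaling/interpolation parameters are illustrative.

```python
def time_vector(finetuned_model, pretrained_model):
    """Time vector = finetuned weights minus the original pretrained weights.

    Assumes both models share the same architecture and that all state_dict
    entries are floating-point tensors.
    """
    pre = pretrained_model.state_dict()
    fin = finetuned_model.state_dict()
    return {name: fin[name] - pre[name] for name in pre}


def apply_time_vector(pretrained_model, tau, scale=1.0):
    """Add a (optionally scaled) time vector back onto the pretrained weights."""
    pre = pretrained_model.state_dict()
    pretrained_model.load_state_dict(
        {name: pre[name] + scale * tau[name] for name in pre}
    )
    return pretrained_model


def interpolate(tau_a, tau_b, alpha):
    """Linearly interpolate between two time vectors (e.g., adjacent years)
    to target an intervening time period without additional training."""
    return {name: (1 - alpha) * tau_a[name] + alpha * tau_b[name] for name in tau_a}
```

For example, interpolating between time vectors finetuned on two adjacent years with `alpha = 0.5` yields a model aimed at the intervening period, with no further training.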