Time is Encoded in the Weights of Finetuned Language Models

Published 20 Dec 2023 in cs.CL (arXiv:2312.13401v2)

Abstract: We present time vectors, a simple tool to customize LLMs to new time periods. Time vectors are created by finetuning an LLM on data from a single time (e.g., a year or month), and then subtracting the weights of the original pretrained model. This vector specifies a direction in weight space that, as our experiments show, improves performance on text from that time period. Time vectors specialized to adjacent time periods appear to be positioned closer together in a manifold. Using this structure, we interpolate between time vectors to induce new models that perform better on intervening and future time periods, without any additional training. We demonstrate the consistency of our findings across different tasks, domains, model sizes, and time scales. Our results suggest that time is encoded in the weight space of finetuned models.


Summary

  • The paper demonstrates that finetuning language models on data from specific time periods encodes temporal linguistic trends in the weight space.
  • Time vectors are obtained by subtracting the pretrained model's weights from those of period-specific finetuned models, revealing organized temporal directions that help mitigate misalignment.
  • Interpolation and task-analogy techniques over these vectors improve performance on intervening and future time periods without additional training.

Introduction to Time Vectors

Language evolves over time, which poses a challenge for LLMs: they must cope with shifts in word usage and context that occur over months and years. When a model's training data becomes outdated relative to the text it is evaluated on, performance degrades, a phenomenon known as temporal misalignment. To address this issue, the paper introduces time vectors, which allow LLMs to be adapted to specific time periods, improving performance on text from those periods without continual retraining.

Understanding Temporal Misalignment

Temporal misalignment arises when the time periods of a model's training and test data differ, and it degrades performance. The paper analyzes this effect at both yearly and monthly scales, finding roughly linear degradation as the gap between training and test years grows, and seasonal patterns at the monthly scale. These findings underscore the need for LLMs that can adapt to the temporal variation inherent in language.
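
As a rough illustration of how such degradation can be measured (a sketch, not the paper's exact evaluation setup), the snippet below scores a period-specific model on held-out text from another period using perplexity; the checkpoint path and the `load_year_texts` helper are hypothetical.

```python
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def perplexity(model, tokenizer, texts, device="cpu"):
    """Approximate token-weighted perplexity of a causal LM over a list of strings."""
    model.to(device).eval()
    total_nll, total_tokens = 0.0, 0
    with torch.no_grad():
        for text in texts:
            enc = tokenizer(text, return_tensors="pt", truncation=True).to(device)
            out = model(**enc, labels=enc["input_ids"])
            n_tokens = enc["input_ids"].numel()
            total_nll += out.loss.item() * n_tokens  # out.loss is mean NLL per token
            total_tokens += n_tokens
    return math.exp(total_nll / total_tokens)

# Hypothetical usage: a model finetuned on 2015 news evaluated on 2019 text.
# model = AutoModelForCausalLM.from_pretrained("path/to/finetuned-2015")
# tok = AutoTokenizer.from_pretrained("gpt2")
# print(perplexity(model, tok, load_year_texts(2019)))
```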

Time Vector Mechanics

A time vector is created by finetuning a pretrained LLM on text from a single period (e.g., a specific year or month) and then subtracting the weights of the original pretrained model. This yields a direction in weight space that captures the linguistic characteristics of that period. The authors find that these vectors lie on an organized manifold, with vectors for adjacent time periods positioned closer together, suggesting that time is indeed encoded in the weight space of finetuned models.
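
A minimal sketch of constructing and applying a time vector, assuming HuggingFace-style checkpoints; the model name and finetuned path below are placeholders rather than the paper's released artifacts.

```python
import torch
from transformers import AutoModelForSeq2SeqLM

def time_vector(pretrained_name, finetuned_path):
    """tau_t = theta_finetuned_t - theta_pretrained, as a dict of parameter deltas."""
    base = AutoModelForSeq2SeqLM.from_pretrained(pretrained_name).state_dict()
    tuned = AutoModelForSeq2SeqLM.from_pretrained(finetuned_path).state_dict()
    return {k: tuned[k] - base[k] for k in base if torch.is_floating_point(base[k])}

def apply_time_vector(pretrained_name, tau, alpha=1.0):
    """Add a scaled time vector back onto the pretrained weights."""
    model = AutoModelForSeq2SeqLM.from_pretrained(pretrained_name)
    state = model.state_dict()
    for k, delta in tau.items():
        state[k] = state[k] + alpha * delta
    model.load_state_dict(state)
    return model

# Hypothetical usage:
# tau_2017 = time_vector("t5-small", "path/to/finetuned-news-2017")
# model_2017 = apply_time_vector("t5-small", tau_2017, alpha=1.0)
```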

Application of Time Vectors

Leveraging the organized structure of time vectors, the paper develops several applications:

  1. Interpolating Time Vectors: Interpolating between time vectors from two different periods yields models that perform better on the intervening months or years (see the sketch after this list).
  2. Generalizing to Future Time Periods: A task-analogy technique improves performance on future data using only unlabeled text from that future period, with no additional labeled training data (also sketched below).
  3. Multi-Time Period Generalization: Generalizing across many periods at once remains challenging; combining all of a task's time vectors into a 'time soup' has not yet matched the performance of a model trained on data from all periods simultaneously.
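
The sketch below shows how the interpolation and task-analogy operations might look in code, assuming time vectors stored as dicts of parameter deltas (as in the previous snippet). The linear-combination form and the scaling coefficients here follow the general task-arithmetic formulation and are assumptions to be tuned on validation data, not the paper's verbatim implementation.

```python
def interpolate(tau_a, tau_b, alpha):
    """Convex combination of two time vectors, e.g. to target a month or
    year lying between the two periods they were finetuned on."""
    return {k: alpha * tau_a[k] + (1.0 - alpha) * tau_b[k] for k in tau_a}

def task_analogy(tau_task_src, tau_lm_src, tau_lm_tgt, alpha=1.0, beta=1.0):
    """Approximate a task vector for a target period using only unlabeled
    target-period text:
    tau_task_tgt ~= alpha * tau_task_src + beta * (tau_lm_tgt - tau_lm_src)."""
    return {k: alpha * tau_task_src[k] + beta * (tau_lm_tgt[k] - tau_lm_src[k])
            for k in tau_task_src}
```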

Conclusions and Implications

This research illustrates the potential of time vectors for adapting LLMs to new time periods. Through weight-space interpolation and task analogies, models can be updated to reflect intervening and future linguistic trends without additional training data. However, building a single model that generalizes well across many time periods remains difficult, suggesting that more sophisticated methods are needed. The release of both the code and the finetuned models is a step toward more temporally aware and adaptable language systems.
