
Learning Dynamic Contextualised Word Embeddings via Template-based Temporal Adaptation

Published 23 Aug 2022 in cs.CL | (2208.10734v3)

Abstract: Dynamic contextualised word embeddings (DCWEs) represent the temporal semantic variations of words. We propose a method for learning DCWEs by time-adapting a pretrained Masked Language Model (MLM) using time-sensitive templates. Given two snapshots $C_1$ and $C_2$ of a corpus taken respectively at two distinct timestamps $T_1$ and $T_2$, we first propose an unsupervised method to select (a) pivot terms related to both $C_1$ and $C_2$, and (b) anchor terms that are associated with a specific pivot term in each individual snapshot. We then generate prompts by filling manually compiled templates using the extracted pivot and anchor terms. Moreover, we propose an automatic method to learn time-sensitive templates from $C_1$ and $C_2$, without requiring any human supervision. Next, we use the generated prompts to adapt a pretrained MLM to $T_2$ by fine-tuning using those prompts. Multiple experiments show that our proposed method reduces the perplexity of test sentences in $C_2$, outperforming the current state-of-the-art.
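
The prompt-generation step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the scoring rules (minimum frequency for pivots, co-occurrence difference for anchors), the template wording, the function names, and the toy corpora are all assumptions chosen for illustration. The fine-tuning of the MLM on the resulting prompts is not shown.

```python
from collections import Counter
from itertools import combinations

# Hypothetical template; the wording is an assumption for illustration.
TEMPLATE = ("{pivot} is associated with {a1} in {t1}, "
            "whereas it is associated with {a2} in {t2}.")


def word_freq(corpus):
    """Count word occurrences over all sentences of a snapshot."""
    return Counter(w for s in corpus for w in s.lower().split())


def cooccurrence(corpus):
    """Count how often two distinct words appear in the same sentence."""
    counts = Counter()
    for sentence in corpus:
        words = sorted(set(sentence.lower().split()))
        for a, b in combinations(words, 2):
            counts[(a, b)] += 1
            counts[(b, a)] += 1
    return counts


def select_pivots(c1, c2, k=1):
    """Assumed pivot score: words frequent in BOTH snapshots."""
    f1, f2 = word_freq(c1), word_freq(c2)
    shared = set(f1) & set(f2)
    return sorted(shared, key=lambda w: (-min(f1[w], f2[w]), w))[:k]


def select_anchor(pivot, own, other):
    """Assumed anchor score: co-occurs with the pivot in `own` more than in `other`."""
    own_co, other_co = cooccurrence(own), cooccurrence(other)
    candidates = sorted({b for (a, b) in own_co if a == pivot})
    if not candidates:
        return None
    return max(candidates,
               key=lambda w: own_co[(pivot, w)] - other_co[(pivot, w)])


def build_prompts(c1, c2, t1, t2, k=1):
    """Fill the template with each pivot and its snapshot-specific anchors."""
    prompts = []
    for pivot in select_pivots(c1, c2, k):
        a1 = select_anchor(pivot, c1, c2)
        a2 = select_anchor(pivot, c2, c1)
        if a1 is not None and a2 is not None:
            prompts.append(TEMPLATE.format(pivot=pivot, a1=a1, t1=t1,
                                           a2=a2, t2=t2))
    return prompts


# Toy snapshots (made up for this sketch).
snapshot_1 = ["mask worn at halloween", "mask for halloween costume"]
snapshot_2 = ["mask required in pandemic", "mask protects against pandemic"]
print(build_prompts(snapshot_1, snapshot_2, "2019", "2020"))
```

In this toy setting, "mask" is the only word shared by both snapshots, so it becomes the pivot, and the anchors are the words that co-occur with it most distinctively in each snapshot. Prompts of this kind would then serve as fine-tuning data for adapting the MLM to $T_2$.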
