
TempoFormer: A Transformer for Temporally-aware Representations in Change Detection (2408.15689v2)

Published 28 Aug 2024 in cs.CL

Abstract: Dynamic representation learning plays a pivotal role in understanding the evolution of linguistic content over time. For this task, both context and temporal dynamics, as well as their interplay, are of prime importance. Current approaches model context via pre-trained representations, which are typically temporally agnostic; previous work on modelling context and temporal dynamics jointly has used recurrent methods, which are slow and prone to overfitting. Here we introduce TempoFormer, the first task-agnostic, transformer-based, temporally-aware model for dynamic representation learning. Our approach is jointly trained on inter- and intra-context dynamics and introduces a novel temporal variation of rotary positional embeddings. The architecture is flexible and can be used as the temporal representation foundation of other models or applied to different transformer-based architectures. We show new SOTA performance on three different real-time change detection tasks.
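The abstract names a temporal variation of rotary positional embeddings but does not spell out its form here. As a minimal sketch of the general idea, one could drive the RoPE rotation angles with element timestamps instead of integer token positions, so that irregular gaps between posts are reflected in the rotation. The function name `temporal_rotary_embedding`, the timestamp units, and the frequency schedule below are illustrative assumptions, not the paper's actual formulation.

```python
import torch

def temporal_rotary_embedding(x, timestamps, base=10000.0):
    """Hypothetical sketch: rotate feature pairs by angles derived from
    timestamps rather than integer token positions (as in standard RoPE).

    x:          (batch, seq_len, dim) hidden states; dim must be even.
    timestamps: (batch, seq_len) elapsed time of each element, e.g. in hours.
    """
    _, _, dim = x.shape
    # Standard RoPE inverse-frequency schedule over feature-pair indices.
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2, dtype=x.dtype) / dim))
    # Angles are the outer product of (possibly irregular) timestamps
    # and the frequencies: shape (batch, seq_len, dim // 2).
    angles = timestamps.unsqueeze(-1) * inv_freq
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., 0::2], x[..., 1::2]  # interleaved feature pairs
    # Apply a 2-D rotation to each pair, then re-interleave.
    out = torch.stack((x1 * cos - x2 * sin, x1 * sin + x2 * cos), dim=-1)
    return out.flatten(-2)

# Usage sketch: two posts written 0 and 36 hours into a user's timeline.
h = torch.randn(1, 2, 8)
t = torch.tensor([[0.0, 36.0]])
print(temporal_rotary_embedding(h, t).shape)  # torch.Size([1, 2, 8])
```

Under this (assumed) formulation, two posts close in time receive similar rotations regardless of how many posts separate them, which is one plausible way to make attention sensitive to elapsed time rather than sequence order alone.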

