Topic Aware Probing: From Sentence Length Prediction to Idiom Identification how reliant are Neural Language Models on Topic? (2403.02009v1)

Published 4 Mar 2024 in cs.CL

Abstract: Transformer-based Neural Language Models achieve state-of-the-art performance on various natural language processing tasks. However, an open question is the extent to which these models rely on word-order/syntactic or word co-occurrence/topic-based information when processing natural language. This work contributes to this debate by addressing the question of whether these models primarily use topic as a signal, by exploring the relationship between Transformer-based models' (BERT and RoBERTa's) performance on a range of probing tasks in English, from simple lexical tasks such as sentence length prediction to complex semantic tasks such as idiom token identification, and the sensitivity of these tasks to topic information. To this end, we propose a novel probing method which we call topic-aware probing. Our initial results indicate that Transformer-based models encode both topic and non-topic information in their intermediate layers, but also that the facility of these models to distinguish idiomatic usage rests primarily on their ability to identify and encode topic. Furthermore, our analysis of these models' performance on other standard probing tasks suggests that tasks that are relatively insensitive to topic information are also tasks that are relatively difficult for these models.
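
The abstract describes a layer-wise probing setup: frozen representations are taken from intermediate layers of BERT or RoBERTa, and a lightweight classifier is trained on top for tasks ranging from sentence length prediction to idiom token identification. The snippet below is a minimal, illustrative sketch of such a probe using Hugging Face transformers and scikit-learn; the model name, probed layer, toy sentences, and length bins are placeholders, and it does not reproduce the paper's specific topic-aware controls.

```python
# Minimal layer-wise probing sketch (illustrative only; not the paper's exact
# topic-aware protocol). Frozen BERT hidden states from one intermediate layer
# are mean-pooled into sentence embeddings, and a logistic-regression probe is
# trained on a surface task such as binned sentence length.
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.linear_model import LogisticRegression

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_hidden_states=True)
model.eval()

def sentence_embedding(sentence: str, layer: int) -> torch.Tensor:
    """Mean-pool the hidden states of one intermediate layer over real tokens."""
    enc = tokenizer(sentence, return_tensors="pt", truncation=True)
    with torch.no_grad():
        out = model(**enc)
    hidden = out.hidden_states[layer]           # (1, seq_len, hidden_dim)
    mask = enc["attention_mask"].unsqueeze(-1)  # ignore padding positions
    return (hidden * mask).sum(1).squeeze(0) / mask.sum()

# Toy probing data: predict a coarse sentence-length bin from the embedding.
# Sentences and bin labels are made up for illustration.
sentences = [
    "Short sentence.",
    "This sentence is a little bit longer than the first one.",
    "He kicked the bucket.",
    "A considerably longer sentence that keeps going so that it lands in the top length bin.",
]
labels = [0, 1, 0, 1]  # hypothetical length bins (0 = short, 1 = long)

layer = 6  # probe one intermediate layer; sweep 1..12 for a layer-wise profile
X = torch.stack([sentence_embedding(s, layer) for s in sentences]).numpy()
probe = LogisticRegression(max_iter=1000).fit(X, labels)
print("train accuracy at layer", layer, "=", probe.score(X, labels))
```

Repeating this for every layer and task, and comparing probes trained with and without access to topic signal, gives the kind of layer-by-layer sensitivity profile the paper reports.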
