Connecting degree and polarity: An artificial language learning study (2109.06333v2)

Published 13 Sep 2021 in cs.CL

Abstract: We investigate a new linguistic generalization in pre-trained language models (taking BERT (Devlin et al., 2019) as a case study). We focus on degree modifiers (expressions like slightly, very, rather, extremely) and test the hypothesis that the degree expressed by a modifier (low, medium or high degree) is related to the modifier's sensitivity to sentence polarity (whether it shows a preference for affirmative or negative sentences, or for neither). To probe this connection, we apply the Artificial Language Learning experimental paradigm from psycholinguistics to a neural language model. Our experimental results suggest that BERT generalizes in line with existing linguistic observations that relate degree semantics to polarity sensitivity, including the main one: low degree semantics is associated with a preference for positive polarity.
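The polarity-preference measurement at the core of the abstract can be illustrated with BERT's masked-language-modeling head: score a degree modifier in a matched affirmative and negative sentence frame and compare the two probabilities. The sketch below is only a minimal illustration under assumed inputs, not the paper's artificial-language-learning protocol; the sentence frames, the modifier list, and the single-wordpiece assumption are illustrative choices, not the authors' stimuli.

```python
# Minimal sketch: compare how strongly BERT "expects" a degree modifier
# in an affirmative vs. a negative sentence frame. The frames and
# modifiers below are illustrative assumptions, not the paper's stimuli.
import torch
from transformers import BertForMaskedLM, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

def modifier_logprob(frame: str, modifier: str) -> float:
    """Log-probability of `modifier` at the [MASK] slot of `frame`.

    Assumes `modifier` is a single wordpiece in BERT's vocabulary.
    """
    inputs = tokenizer(frame, return_tensors="pt")
    mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero()[0].item()
    with torch.no_grad():
        logits = model(**inputs).logits
    log_probs = torch.log_softmax(logits[0, mask_pos], dim=-1)
    return log_probs[tokenizer.convert_tokens_to_ids(modifier)].item()

affirmative = "The movie was [MASK] interesting."
negative = "The movie was not [MASK] interesting."

for mod in ["slightly", "rather", "very", "extremely"]:
    # Positive value: the modifier fits the affirmative frame better;
    # negative value: it fits the negative frame better.
    delta = modifier_logprob(affirmative, mod) - modifier_logprob(negative, mod)
    print(f"{mod:>10}  affirmative-minus-negative log-prob: {delta:+.2f}")
```

In the artificial-language-learning setup described in the abstract, a comparison of this kind would be run on novel modifiers that the model has first been exposed to, rather than on familiar English modifiers as in this sketch.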

References (41)
  1. Erin D Bennett and Noah D Goodman. 2018. Extremely costly intensifiers are stronger than quite costly ones. Cognition, 178:147–161.
  2. Tom B. Brown et al. 2020. Language models are few-shot learners. arXiv preprint arXiv:2005.14165.
  3. Lisa Bylinina and Alexey Tikhonov. 2022a. The driving forces of polarity-sensitivity: Experiments with multilingual pre-trained neural language models. In Proceedings of the Annual Meeting of the Cognitive Science Society, volume 44.
  4. Lisa Bylinina and Alexey Tikhonov. 2022b. Transformers in the loop: Polarity in neural models of language. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 6601–6610.
  5. The emergence of monotone quantifiers via iterated learning. In Proceedings of the 41st Annual Meeting of the Cognitive Science Society.
  6. Learning biases predict a word order universal. Cognition, 122(3):306–329.
  7. Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In NAACL-HLT.
  8. Variability in the learning of complex morphophonology. Applied Psycholinguistics, 35(4):807–831.
  9. Delia Fara. 2000. Shifting sands: An interest-relative theory of vagueness. Philosophical topics, 28(1):45–81.
  10. Gilles Fauconnier. 1975. Polarity and the scale principle. In Proceedings of Chicago Linguistic Society 11, pages 188–199.
  11. Sara Finley and William Badecker. 2009. Artificial language learning and feature-based generalization. Journal of Memory and Language, 61(3):423–437.
  12. Brain signatures of artificial language processing: Evidence challenging the critical period hypothesis. Proceedings of the National Academy of Sciences, 99(1):529–534.
  13. Neural language models as psycholinguistic subjects: Representations of syntactic state. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 32–42, Minneapolis, Minnesota. Association for Computational Linguistics.
  14. Scalar diversity, negative strengthening, and adjectival semantics. Frontiers in Psychology, 9:1659.
  15. Joseph H Greenberg. 1963. Some universals of grammar with particular reference to the order of meaningful elements. Universals of language, 2:73–113.
  16. Michael Israel. 1996. Polarity sensitivity as lexical semantics. Linguistics and Philosophy, 19(6):619–666.
  17. Michael Israel. 2011. The grammar of polarity: Pragmatics, sensitivity, and the logic of scales, volume 127. Cambridge University Press.
  18. Satoshi Ito. 2015. Bias and Prosody in Japanese Negative Polar Questions. Ph.D. thesis, Cornell University.
  19. Language models use monotonicity to assess NPI licensing. CoRR, abs/2105.13818.
  20. Jaap Jumelet and Dieuwke Hupkes. 2018. Do language models understand anything? On the ability of LSTMs to understand negative polarity items. In BlackboxNLP@EMNLP.
  21. Zipf’s law of abbreviation and the principle of least effort: Language users optimise a miniature lexicon for efficient communication. Cognition, 165:45–52.
  22. Learning the difference that makes a difference with counterfactually-augmented data. In International Conference on Learning Representations.
  23. Explaining the efficacy of counterfactually augmented data.
  24. Christopher Kennedy. 2007. Vagueness and grammar: The semantics of relative and absolute gradable adjectives. Linguistics and philosophy, 30(1):1–45.
  25. Christopher Kennedy and Louise McNally. 2005. Scale structure, degree modification, and the semantics of gradable predicates. Language, pages 345–381.
  26. Diederik P Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
  27. William A Ladusaw. 1979. Polarity sensitivity as inherent scope relations. Ph.D. thesis, Austin, TX: University of Texas at Austin.
  28. Ilya Loshchilov and Frank Hutter. 2017. Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101.
  29. Diana Mazzarella and Nicole Gotzner. 2021. The polarity asymmetry of negative strengthening: dissociating adjectival polarity from face-threatening potential. Glossa: a journal of general linguistics, 6(1).
  30. Evolving artificial sign languages in the lab: From improvised gesture to systematic sign. Cognition, 192:103964.
  31. Rick Nouwen. 2013. Best nogal aardige middenmoters: de semantiek van graadadverbia van het midden-bereik. Nederlandse Taalkunde, 18(2):204–214.
  32. Carita Paradis. 1997. Degree modifiers of adjectives in spoken British English. Lund University Press.
  33. Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and Luke Zettlemoyer. 2018. Deep contextualized word representations. In Proc. of NAACL.
  34. Bootstrapping in a language of thought: A formal model of numerical concept learning. Cognition, 123(2):199–217.
  35. M-modifiers, attenuation and polarity sensitivity. In Proceedings of Sinn und Bedeutung, volume 25.
  36. Stephanie Solt. 2018. Not much: On the variable polarity sensitivity of ‘much’ words cross-linguistically. In Proceedings of Sinn und Bedeutung, volume 23.
  37. Investigating novel verb learning in BERT: Selectional preference classes and alternation-based syntactic generalization. In Proceedings of the Third BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP, pages 265–275.
  38. Quantifiers satisfying semantic universals are simpler. In Proceedings of the 43rd Annual Meeting of the Cognitive Science Society.
  39. Charles van Os. 1989. Aspekte der Intensivierung im Deutschen, volume 37. Narr Tübingen.
  40. Investigating BERT's knowledge of language: Five analysis methods with NPIs. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 2877–2887.
  41. Structural supervision improves few-shot learning and syntactic generalization in neural language models. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 4640–4652, Online. Association for Computational Linguistics.
Authors (3)
  1. Lisa Bylinina (7 papers)
  2. Alexey Tikhonov (35 papers)
  3. Ekaterina Garmash (3 papers)
