
Frequency effects in Linear Discriminative Learning (2306.11044v2)

Published 19 Jun 2023 in cs.CL

Abstract: Word frequency is a strong predictor in most lexical processing tasks. Thus, any model of word recognition needs to account for how word frequency effects arise. The Discriminative Lexicon Model (DLM; Baayen et al., 2018a, 2019) models lexical processing with linear mappings between words' forms and their meanings. So far, the mappings can either be obtained incrementally via error-driven learning, a computationally expensive process able to capture frequency effects, or in an efficient, but frequency-agnostic solution modelling the theoretical endstate of learning (EL) where all words are learned optimally. In this study we show how an efficient, yet frequency-informed mapping between form and meaning can be obtained (Frequency-informed learning; FIL). We find that FIL well approximates an incremental solution while being computationally much cheaper. FIL shows a relatively low type- and high token-accuracy, demonstrating that the model is able to process most word tokens encountered by speakers in daily life correctly. We use FIL to model reaction times in the Dutch Lexicon Project (Keuleers et al., 2010) and find that FIL predicts well the S-shaped relationship between frequency and the mean of reaction times but underestimates the variance of reaction times for low frequency words. FIL is also better able to account for priming effects in an auditory lexical decision task in Mandarin Chinese (Lee, 2007), compared to EL. Finally, we used ordered data from CHILDES (Brown, 1973; Demuth et al., 2006) to compare mappings obtained with FIL and incremental learning. The mappings are highly correlated, but with FIL some nuances based on word ordering effects are lost. Our results show how frequency effects in a learning model can be simulated efficiently, and raise questions about how to best account for low-frequency words in cognitive models.
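The contrast the abstract draws between the three ways of obtaining a mapping can be illustrated with a minimal sketch. The abstract does not give FIL's formula, so the frequency-weighted least-squares formulation below is an assumption made for illustration, as are all function names, the toy data, and the learning rate. Here `C` holds one form vector per word type and `S` the corresponding meaning vectors.

```python
import numpy as np

def endstate_mapping(C, S):
    """Frequency-agnostic solution: least-squares F minimizing ||C F - S||^2."""
    return np.linalg.pinv(C) @ S

def frequency_weighted_mapping(C, S, freqs):
    """Assumed frequency-informed variant: weighted least squares in which
    each word's squared error is weighted by its relative frequency."""
    w = np.sqrt(np.asarray(freqs, dtype=float) / np.sum(freqs))
    # Scaling each row by sqrt(weight) turns the weighted problem into an ordinary one.
    return np.linalg.pinv(C * w[:, None]) @ (S * w[:, None])

def incremental_mapping(C, S, order, lr=0.01):
    """Error-driven (delta rule) learning: one update per word token in `order`."""
    F = np.zeros((C.shape[1], S.shape[1]))
    for i in order:
        c = C[i:i + 1]             # form vector of the current token (1 x cues)
        err = S[i:i + 1] - c @ F   # prediction error for this token
        F += lr * c.T @ err
    return F

# Toy data (hypothetical): 8 word types, 3 form dimensions, 2 meaning dimensions.
rng = np.random.default_rng(0)
C = rng.normal(size=(8, 3))
S = rng.normal(size=(8, 2))
freqs = np.array([100, 50, 10, 5, 2, 1, 1, 1])

F_endstate = endstate_mapping(C, S)
F_weighted = frequency_weighted_mapping(C, S, freqs)

# Token sequence in which each type occurs as often as its frequency, in random order.
tokens = rng.permutation(np.repeat(np.arange(len(freqs)), freqs))
F_incremental = incremental_mapping(C, S, tokens)
```

The incremental version needs one update per token, whereas the weighted version solves a single linear system per mapping, which is the kind of cost difference the abstract refers to. Under this assumed formulation, uniform frequencies reduce the weighted solution to the endstate solution.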

References (85)
  1. Spalex: A spanish lexical decision database from a massive online data collection. Frontiers in psychology, 9:2156.
  2. Baayen, R. H. (2001). Word Frequency Distributions. Kluwer Academic Publishers, Dordrecht.
  3. Baayen, R. H. (2005). Data mining at the intersection of psychology and linguistics. In Cutler, A., editor, Twenty-first century psycholinguistics: Four cornerstones, pages 69–83. Erlbaum, Hillsdale, New Jersey.
  4. Baayen, R. H. (2010). Demythologizing the word frequency effect: A discriminative learning perspective. The Mental Lexicon, 5(3):436–461.
  5. Inflectional morphology with linear mappings. The Mental Lexicon, 13(2):230–268.
  6. WpmWithLdl: Implementation of word and paradigm morphology with linear discriminative learning. R package Version 1.2.20.
  7. The discriminative lexicon: A unified computational model for the lexicon and lexical processing in comprehension and production grounded not in (de) composition but in linear discriminative learning. Complexity, 2019.
  8. Singulars and plurals in Dutch: Evidence for a parallel dual route model. Journal of Memory and Language, 36:94–117.
  9. Frequency in lexical processing. Aphasiology, 30(11):1174–1220.
  10. An amorphous model for morphological processing in visual comprehension based on naive discriminative learning. Psychological review, 118(3):438.
  11. The CELEX lexical database [cd rom]. Philadelphia: Linguistic Data Consortium, University of Pennsylvania.
  12. Modeling morphological priming in German with naive discriminative learning. Frontiers in Communication, 5:17.
  13. Visual word recognition of single-syllable words. Journal of experimental psychology: General, 133(2):283.
  14. The English Lexicon Project. Behavior research methods, 39(3):445–459.
  15. Julia: A fresh approach to numerical computing. SIAM review, 59(1):65–98.
  16. Breiman, L. et al. (2001). Statistical modeling: The two cultures (with comments and a rejoinder by the author). Statistical science, 16(3):199–231.
  17. Brown, R. (1973). A first language: The early stages. Cambridge, MA: Harvard University.
  18. The word frequency effect. Experimental psychology.
  19. The word frequency effect in word processing: An updated review. Current Directions in Psychological Science, 27(1):45–50.
  20. Moving beyond Kučera and Francis: A critical evaluation of current word frequency norms and the introduction of a new and improved word frequency measure for American English. Behavior research methods, 41(4):977–990.
  21. The impact of word prevalence on lexical decision times: Evidence from the Dutch Lexicon Project 2. Journal of Experimental Psychology: Human Perception and Performance, 42(3):441.
  22. Bybee, J. (2010). Language, usage and cognition. Cambridge University Press, Cambridge.
  23. Frequency and the Emergence of Linguistic Structure, volume 45. John Benjamins Publishing.
  24. Bilingual and multilingual mental lexicon: a modeling study with linear discriminative learning. Language Learning, 71(S1):219–292.
  25. Vector space morphology with linear discriminative learning. In Crepaldi, D., editor, Linguistic morphology in the mind and brain.
  26. Estonian case inflection made simple: A case study in word and paradigm morphology with linear discriminative learning. In Körtvélyessy, L. and Štekauer, P., editors, Complex Words: Advances in Morphology, chapter 7, pages 119–14. Cambridge University Press.
  27. Word-minimality, epenthesis and coda licensing in the early acquisition of English. Language and speech, 49(2):137–173.
  28. The morphology of Indonesian: Data and quantitative modeling. The Routledge Handbook of Asian Linguistics. Routledge.
  29. How noisy is lexical decision? Frontiers in psychology, 3:348.
  30. Predicting the unpredictable: Interpreting neutralized segments in Dutch. Language, 79(1):5–38.
  31. The French Lexicon Project: Lexical decision data for 38,840 French words and 38,840 pseudowords. Behavior research methods, 42:488–496.
  32. Positional and phonotactic effects on the realization of dipping tones in Taiwan Mandarin. In Gussenhoven, C. and Riad, T., editors, Phonology and Phonetics, Tones and Tunes: Vol. 2. Experimental Studies in Word and Sentence Prosody, pages 239–269. Mouton de Gruyter, Berlin.
  33. Forster, K. (1976). Accessing the mental lexicon. In Wales, F. and Walker, E., editors, New approaches to language mechanisms, pages 257–286. Amsterdam: North-Holland.
  34. Forster, K. I. (1979). Levels of processing and the structure of the language processor. Sentence processing, pages 27–85.
  35. Forster, K. I. (1994). Computational modeling and elementary process analysis in visual word recognition. Journal of Experimental Psychology: Human Perception and Performance, 20(6):1292.
  36. Time and thyme again: Connecting spoken word duration to models of the mental lexicon. Under revision for Language.
  37. Learning word vectors for 157 languages. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018).
  38. Computing the meanings of words in reading: cooperative division of labor between visual and phonological processes. Psychological review, 111(3):662.
  39. Simulating phonological and semantic impairment of English tense inflection with linear discriminative learning. The Mental Lexicon, 15(3):385–421.
  40. How trial-to-trial learning shapes mappings in the mental lexicon: Modelling lexical decision with linear discriminative learning. arXiv preprint arXiv:2207.00430.
  41. Modeling morphology with linear discriminative learning: considerations and design choices. Frontiers in Psychology.
  42. Linear Discriminative Learning: Theory and implementation in the Julia package JudiLing. Manuscript, University of Tübingen.
  43. Ho, A. T. (1976). The acoustic variation of Mandarin tones. Phonetica, 33(5):353–367.
  44. Howie, J. M. (1974). On the domain of tone in Mandarin. Phonetica, 30(3):129–148.
  45. Models of visual word recognition: sampling the state of the art. Journal of Experimental Psychology: Human perception and performance, 20(6):1311.
  46. Kapatsinski, V. (2022). The logistic perceptron accounts for rank frequency effects in lexical processing. In Nixon, J., Tomaschek, F., and Baayen, R. H., editors, Proceedings of the Second International Conference on Error-Driven Learning in Language (EDLL 2022), pages 16–17.
  47. Practice effects in large-scale visual word recognition studies: A lexical decision study on 14,000 Dutch mono- and disyllabic words and nonwords. Frontiers in psychology, 1:174.
  48. The British Lexicon Project: Lexical decision data for 28,730 monosyllabic and disyllabic English words. Behavior research methods, 44(1):287–304.
  49. Reassessing word frequency as a determinant of word recognition for skilled and unskilled readers. Journal of Experimental Psychology: Human Perception and Performance, 39(3):802.
  50. Introduction to latent semantic analysis. Discourse Processes, 25:259–284.
  51. Lee, C.-Y. (2007). Does horse activate mother? processing lexical tone in form priming. Language and Speech, 50(1):101–123.
  52. Luo, X. (2021). JudiLing: An implementation for Linear Discriminative Learning in JudiLing (unpublished Master's thesis).
  53. MacWhinney, B. (2000). The CHILDES project. Tools for Analyzing Talk. Part, 1.
  54. An interactive activation model of context effects in letter perception: I. an account of basic findings. Psychological review, 88(5):375.
  55. Explorations in parallel distributed processing: A handbook of models, programs, and exercises. MIT press.
  56. Keeping it simple: Implementation and performance of the proto-principle of adaptation and learning in the language sciences. arXiv preprint arXiv:2003.03813.
  57. Nonlinearities in bilingual visual word recognition: An introduction to generalized additive modeling. Bilingualism: Language and Cognition, 24(5):825–832.
  58. Morton, J. (1969). Interaction of information in word recognition. Psychological review, 76(2):165.
  59. Morton, J. (1979a). Facilitation in word recognition: Experiments causing change in the logogen model. Processing of visible language, pages 259–268.
  60. Morton, J. (1979b). Word recognition. Psycholinguistics: Series 2. Structures and processes, pages 107–156.
  61. Serial mechanisms in lexical access: the rank hypothesis. Psychological Review, 111(3):721.
  62. Comprehension, production and processing of Maltese plurals in the discriminative lexicon. PsyArXiv 10.31234/osf.io/rkath.
  63. Norris, D. (2006). The Bayesian reader: explaining word recognition as an optimal Bayesian decision process. Psychological review, 113(2):327.
  64. Norris, D. (2013). Models of visual word recognition. Trends in cognitive sciences, 17(10):517–524.
  65. Nusbaum, H. C. (1985). A stochastic account of the relationship between lexical density and word frequency. Technical report, Indiana University. Research on Speech Perception, Progress Report #11.
  66. R Core Team (2020). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.
  67. Reicher, G. M. (1969). Perceptual recognition as a function of meaningfulness of stimulus material. Journal of experimental psychology, 81(2):275.
  68. Rescorla, R. A. (1967). Pavlovian conditioning and its proper control procedures. Psychological review, 74(1):71.
  69. Classical conditioning II: current research and theory, chapter A theory of Pavlovian conditioning: variations in the effectiveness of reinforcement and nonreinforcement, pages 64–99. Appleton-Century-Crofts, New York.
  70. Homographic entries in the internal lexicon. Journal of verbal learning and verbal behavior, 9(5):487–494.
  71. Homographic entries in the internal lexicon: Effects of systematicity and relative frequency of meanings. Journal of verbal learning and verbal behavior, 10(1):57–62.
  72. Learning by error backpropagation. In Parallel Distributed Processing, volume 1. MIT press.
  73. An interactive activation model of context effects in letter perception: II. the contextual enhancement effect and some tests and extensions of the model. Psychological review, 89(1):60.
  74. childes-db: A flexible and reproducible interface to the child language data exchange system. Behavior research methods, 51(4):1928–1941.
  75. Durational differences of word-final/s/emerge from the lexicon: Modelling morpho-phonetic effects in pseudowords with linear discriminative learning. Frontiers in Psychology, 12:2983.
  76. A distributed, developmental model of word recognition and naming. Psychological review, 96(4):523.
  77. Ldl-auris: a computational model, grounded in error-driven learning, for the comprehension of single spoken words. Language, Cognition and Neuroscience, pages 1–28.
  78. Shmueli, G. (2010). To explain or to predict? Statistical Science, pages 289–310.
  79. Morpho-phonetic effects in speech production: Modeling the acoustic duration of English derived words with linear discriminative learning. Frontiers in Psychology, 12.
  80. Chinese lexical database (CLD): A large-scale lexical database for simplified Mandarin Chinese. Behavior Research Methods, 50:2606–2629.
  81. Practice makes perfect: The consequences of lexical proficiency for articulation. Linguistic Vanguard, 4:1–13.
  82. Adaptive switching circuits. 1960 WESCON Convention Record Part IV.
  83. Wood, S. N. (2011). Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models. Journal of the Royal Statistical Society (B), 73(1):3–36.
  84. Pitch targets and their realization: Evidence from Mandarin Chinese. Speech communication, 33(4):319–337.
  85. Wikipedia2Vec: An efficient toolkit for learning and visualizing the embeddings of words and entities from Wikipedia. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pages 23–30. Association for Computational Linguistics.