Modeling Cognitive Processes of Natural Reading with Transformer-based LLMs
The paper under review investigates whether modern transformer-based large language models (LLMs) can explain cognitive phenomena associated with natural reading. The authors ask whether the text-generative capabilities of state-of-the-art models such as GPT2, LLaMA-7B, and LLaMA2-7B can adequately account for human reading processes as captured by eye movement metrics, focusing on Gaze Duration (GD).
Research Context
NLP models have steadily grown in complexity and now produce remarkably fluent text. In tandem, cognitive neuroscience has leveraged these models for insight into the neural mechanisms underlying language comprehension. However, current models still fall short of explaining the full variance in human predictability effects observed in reading behavior, most notably in measures such as GD. Past efforts using traditional models such as N-grams and LSTMs have shown only partial success in these applications.
Methodology
The authors used GD data from native Rioplatense Spanish readers for eight narrative texts. These metrics were paired with two measures of word predictability: cloze predictability (cloze-Pred), gathered from a large participant pool through cloze tasks and serving as the human baseline, and computational predictability (comp-Pred), derived from the outputs of several LMs. On the computational side, models such as GPT2 were fine-tuned on corpora of narrative stories and regionally relevant (Rioplatense) Spanish. The predictability measures were then related to the established eye movement metrics using Linear Mixed Models (LMMs).
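The core quantity here, comp-Pred, is the probability a language model assigns to each observed word given its preceding context. The paper's exact extraction pipeline is not described in this review, but the standard recipe can be sketched as follows (the toy logits and vocabulary below are illustrative, not the paper's data; in practice the logits would come from GPT2 or a LLaMA model via a library such as Hugging Face transformers):

```python
import numpy as np

def softmax(logits):
    """Convert raw LM logits into a probability distribution."""
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def comp_pred(logits_per_word, target_ids):
    """Predictability of each observed word: the probability the LM
    assigns to that word, given the logits computed from its context."""
    return np.array([softmax(l)[t] for l, t in zip(logits_per_word, target_ids)])

# Toy example: a 5-word vocabulary, logits for three word positions
rng = np.random.default_rng(0)
logits = rng.normal(size=(3, 5))   # one logit vector per position
targets = [2, 0, 4]                # indices of the words actually read
p = comp_pred(logits, targets)
# A log (or logit) transform of p is the usual predictor entered into the LMMs
log_pred = np.log(p)
```

Multi-token words add a wrinkle the sketch ignores: their predictability is typically the product of the model's probabilities for each subword token.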
Results
The results show that transformer-based architectures generally outperform earlier models in capturing GD variance. Specifically, GPT2 and the LLaMA variants improved on traditional LSTM models, although differences among the fine-tuned GPT2 models were negligible, plausibly because the fine-tuning datasets were small. Notably, LLaMA2-7B showed particularly strong explanatory power: its predictions approximate human cloze-Pred better than those of its predecessor, LLaMA-7B, indicating a reduced reliance on lexical frequency.
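The model comparison above amounts to asking which model's predictability measure explains more variance in GD. The paper does this with LMMs that include random effects for subjects and items; as a simplified fixed-effects stand-in, the logic can be sketched with ordinary least squares on hypothetical data (all values below are simulated for illustration, not the paper's results):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
# Hypothetical log-predictability values from two competing models
pred_a = rng.normal(size=n)                       # e.g. an LSTM baseline
pred_b = pred_a + rng.normal(scale=0.5, size=n)   # e.g. a transformer
# Simulated Gaze Durations (ms): shorter when predictability is higher
gd = 220 - 15 * pred_b + rng.normal(scale=20, size=n)

def r_squared(x, y):
    """Share of variance in y explained by a least-squares fit on x."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - resid.var() / y.var()

r2_a = r_squared(pred_a, gd)
r2_b = r_squared(pred_b, gd)
# The model whose predictability explains more GD variance is preferred
```

In the actual analyses, comparisons of this kind would additionally control for word frequency and length and use model-fit criteria appropriate to LMMs (e.g. AIC or likelihood-ratio tests) rather than raw R².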
Discussion and Implications
This paper documents advances in LMs' ability to simulate human-like language predictability during reading, showcasing their potential utility in cognitive neuroscience research. However, even the most advanced models leave substantial variance unexplained relative to cloze-Pred. These nuanced results call for weighing both the strengths and the limitations of NLP models as accounts of cognitive phenomena.
These findings suggest practical applications in both fields, such as models that mimic human reading processes for AI-driven educational tools or more accurate brain-computer interfaces. Theoretically, they also invite further exploration of architectures that mirror human-like linguistic processing, potentially informing AI model design and contributing to neuroscience's understanding of language cognition.
Speculation on Future Developments
Future efforts could broaden the linguistic diversity of both participants and modeling inputs. Extending analyses to underrepresented languages and dialects may reveal deeper insights, enriching technology culturally and making AI solutions more equitable across linguistic communities. Moreover, larger fine-tuning datasets could yield substantial gains, better exploiting the latent capabilities of these models.
Additionally, as transformer models evolve, their richer contextual representations and larger parameter counts will likely yield increasingly accurate cognitive approximations. Continued exchange between AI model development and cognitive neuroscience could propel both fields in new directions, producing tools better aligned with human cognitive processes.
This research contributes to a nuanced understanding of computational text predictability and its interaction with human linguistic cognition, forwarding dialogue between neuroscience and artificial intelligence.