Transformer-Based Language Model Surprisal Predicts Human Reading Times Best with About Two Billion Training Tokens (2304.11389v2)

Published 22 Apr 2023 in cs.CL

Abstract: Recent psycholinguistic studies have drawn conflicting conclusions about the relationship between the quality of an LLM and the ability of its surprisal estimates to predict human reading times, which has been speculated to be due to the large gap in both the amount of training data and model capacity across studies. The current work aims to consolidate these findings by evaluating surprisal estimates from Transformer-based LLM variants that vary systematically in the amount of training data and model capacity on their ability to predict human reading times. The results show that surprisal estimates from most variants with contemporary model capacities provide the best fit after seeing about two billion training tokens, after which they begin to diverge from humanlike expectations. Additionally, newly trained smaller model variants reveal a 'tipping point' at convergence, after which the decrease in LLM perplexity begins to result in poorer fits to human reading times. These results suggest that the massive amount of training data is mainly responsible for the poorer fit achieved by surprisal from larger pre-trained LLMs, and that a certain degree of model capacity is necessary for Transformer-based LLMs to capture humanlike expectations.
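As a rough illustration of what "surprisal estimates" means in practice, the sketch below computes per-token surprisal from an off-the-shelf GPT-2 model using the Hugging Face transformers library. This is a minimal sketch, not the paper's pipeline: the model name ("gpt2"), the example sentence, and the helper function are illustrative assumptions, whereas the paper evaluates model variants trained with systematically varied amounts of data and capacity.

```python
# Minimal sketch (assumptions noted above): per-token surprisal, in bits,
# from a pre-trained GPT-2 model via the Hugging Face transformers library.
import math

import torch
import torch.nn.functional as F
from transformers import GPT2LMHeadModel, GPT2TokenizerFast


def token_surprisals(text, model, tokenizer):
    """Return (token, surprisal-in-bits) pairs for every token after the first."""
    input_ids = tokenizer(text, return_tensors="pt")["input_ids"]
    with torch.no_grad():
        logits = model(input_ids).logits            # (1, seq_len, vocab_size)
    # Log-probability assigned to each token given its preceding context.
    log_probs = F.log_softmax(logits[:, :-1, :], dim=-1)
    targets = input_ids[:, 1:]
    token_log_probs = log_probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    surprisals = (-token_log_probs / math.log(2)).squeeze(0)   # nats -> bits
    tokens = tokenizer.convert_ids_to_tokens(targets.squeeze(0).tolist())
    return list(zip(tokens, surprisals.tolist()))


model = GPT2LMHeadModel.from_pretrained("gpt2").eval()
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")

for tok, s in token_surprisals("The old man the boat.", model, tokenizer):
    print(f"{tok:>12s}  {s:6.2f} bits")
```

In reading-time studies of this kind, subword surprisals are typically summed to the word level and entered as a predictor in a regression model (e.g., linear mixed-effects) of per-word reading times; goodness of fit is then assessed by the improvement in regression log-likelihood over a baseline model without the surprisal predictor.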

Authors (2)
  1. Byung-Doh Oh
  2. William Schuler
Citations (18)