
Alignment between LLM perplexity and human neurophysiological responses to confusing code

Determine whether large language models assign higher perplexity to code snippets that contain atoms of confusion compared to clean, functionally equivalent snippets, and ascertain whether token-level perplexity from large language models aligns with human neurophysiological responses during program comprehension.


Background

Atoms of confusion are short, syntactically valid code patterns known to impair human comprehension across languages. Prior behavioral and neurophysiological studies show that such patterns increase error rates and visual effort, and that they elicit distinct EEG fixation-related potentials indicative of confusion.

Perplexity is a probabilistic measure of a model's uncertainty in predicting tokens. In both natural language and source code, higher perplexity has been linked to increased processing difficulty for humans. However, whether this metric captures confusion specifically for atoms of confusion, and whether it aligns with human neurophysiological responses during program comprehension, had not been established when this question was posed.
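As a minimal sketch of the metric itself: perplexity is the exponentiated average negative log-likelihood of a token sequence under the model. The per-token log-probabilities below are hypothetical values, not output from any particular LLM; the point is only that a snippet whose tokens the model finds surprising receives higher perplexity.

```python
import math

def perplexity(token_logprobs):
    """Exponentiated average negative log-likelihood of a token sequence."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

# Hypothetical per-token log-probabilities a model might assign
clean_snippet = [-0.5, -0.3, -0.4, -0.2]       # predictable tokens
confusing_snippet = [-1.8, -2.5, -1.2, -3.0]   # surprising tokens

print(perplexity(clean_snippet))
print(perplexity(confusing_snippet))  # larger value: the model is more "confused"
```

In practice, the log-probabilities would come from scoring each token of a code snippet with an LLM; the research question is whether this quantity is systematically higher for snippets containing atoms of confusion than for clean, functionally equivalent versions.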

References

While atoms of confusion are known to induce confusion in humans, and perplexity has been linked to comprehension difficulty in isolated studies, it remains unclear whether LLMs assign higher perplexity to these known confusing constructs, or whether LLM perplexity aligns with human neurophysiological responses during program comprehension.

How do Humans and LLMs Process Confusing Code? (2508.18547 - Abdelsalam et al., 25 Aug 2025) in Section 2.4, Research Gap