This paper investigates the cognitive impact of using an LLM such as ChatGPT, compared with a traditional web search engine or no external tools ("Brain-only"), for essay writing. The study involved 54 participants across three groups (LLM, Search Engine, Brain-only) completing essay writing tasks over four sessions. Data were collected using electroencephalography (EEG) to measure brain activity, natural language processing (NLP) to analyze the essays, and post-task interviews. The research aimed to understand how different tools affect essay quality, cognitive load, brain activity patterns, memory, and perceived ownership of the written work.
Experimental Design
The paper assigned participants to one of three groups:
- LLM Group: Used only OpenAI's GPT-4o.
- Search Engine Group: Could use any website except LLM-based tools (participants primarily used Google).
- Brain-only Group: Used no external tools, relying solely on their knowledge.
Participants completed three essay writing sessions using their assigned tool(s) on different SAT prompts. A subset of 18 participants completed a fourth session where the LLM and Brain-only groups switched conditions (LLM-to-Brain and Brain-to-LLM) and wrote on topics they had previously addressed. Each essay writing task was limited to 20 minutes.
Data collection involved:
- EEG: Recording brain activity using a 32-electrode headset during the essay writing task.
- NLP Analysis: Analyzing the written essays for linguistic features such as Named Entity Recognition (NER), n-grams, and topic ontology, and computing similarities/distances between texts (a toy pipeline sketch follows this list).
- Interviews: Conducting post-session interviews to gather subjective feedback on tool usage, strategy, quoting ability, and essay ownership.
- Scoring: Essays were scored by human teachers and an AI judge based on metrics like uniqueness, content, language, structure, and organization.
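The following is a minimal sketch of the kind of text-feature extraction described above, assuming spaCy for NER and scikit-learn for n-gram counts and similarity; the essay snippets, model choice, and feature settings are placeholders, not the paper's actual pipeline.

```python
# Minimal sketch of essay feature extraction (an assumed pipeline, not the paper's exact code).
import numpy as np
import spacy
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

nlp = spacy.load("en_core_web_sm")  # small English model; any NER-capable model would do

essays = [  # placeholder texts standing in for participants' essays on one topic
    "True happiness comes from helping others, not from wealth.",
    "According to a 2020 survey, volunteering in Boston increased reported wellbeing.",
]

# Named Entity Recognition: count entities (people, places, dates, ...) per essay
ner_counts = [len(nlp(text).ents) for text in essays]

# N-grams: most frequent unigrams/bigrams across the set of essays
ngram_vec = CountVectorizer(ngram_range=(1, 2), stop_words="english")
counts = ngram_vec.fit_transform(essays)
totals = np.asarray(counts.sum(axis=0)).ravel()
top_ngrams = sorted(zip(ngram_vec.get_feature_names_out(), totals), key=lambda kv: -kv[1])[:10]

# Similarity/distance between essays on the same topic (TF-IDF cosine similarity)
pairwise_sim = cosine_similarity(TfidfVectorizer().fit_transform(essays))

print(ner_counts, top_ngrams, pairwise_sim, sep="\n")
```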
Key Findings
The paper revealed significant differences across groups in neural activity, essay characteristics, and participant perceptions:
1. Neural Connectivity Patterns (EEG Analysis)
- Overall Connectivity: The "Brain-only" group consistently showed the strongest and most widespread neural network connectivity across all measured frequency bands (Alpha, Beta, Theta, Delta). The "Search Engine" group exhibited intermediate connectivity, while the "LLM" group showed the weakest overall coupling. This suggests that lower reliance on external tools demands greater internal cognitive coordination.
- Band-Specific Differences (a band-filtering sketch appears at the end of this subsection):
- Alpha (8-12 Hz): Higher in Brain-only, associated with internal attention and semantic processing. Lower in LLM, suggesting less reliance on internally generated ideas. Search Engine showed engagement related to visual attention.
- Beta (12-30 Hz): Higher overall in Brain-only, reflecting sustained cognitive and motor engagement. Search Engine showed beta linked to visuo-spatial processing (e.g., scrolling). LLM showed some beta, possibly for procedural fluency (typing).
- Theta (4-8 Hz): Significantly higher in Brain-only, strongly associated with working memory load and executive control. Lower in LLM, consistent with reduced working memory burden due to AI scaffolding. Search Engine showed less extensive theta networking than Brain-only.
- Delta (0.1-4 Hz): Most pronounced difference, far higher in Brain-only, suggesting recruitment of broad, low-frequency networks for integrative processes, potentially including memory and emotional content. Much weaker in Search Engine and LLM, possibly reflecting a more externally oriented or shallow processing mode.
- Information Flow: Brain-only showed greater "bottom-up" flow (posterior to frontal), potentially representing internal idea generation. LLM users showed more "top-down" flow (frontal to posterior), suggesting integration and filtering of external (AI) input.
- Session 4 Insights:
- LLM-to-Brain: When participants who had previously used the LLM wrote without tools (Session 4), their neural connectivity was lower than that of Brain-only participants in earlier sessions (Sessions 2 and 3), especially in the Alpha and Beta bands. This indicates reduced engagement in self-driven elaboration and critical scrutiny after prior LLM use, potentially suggesting "skill atrophy."
- Brain-to-LLM: When participants who had previously written unassisted were introduced to the LLM (Session 4), they showed a significant increase in connectivity across all bands compared with their earlier Brain-only sessions (especially Session 1), suggesting high cognitive load associated with integrating the new tool's output.
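To make the band-specific analysis concrete, here is a simplified sketch that band-pass filters multichannel EEG and computes a per-band coupling matrix. The paper itself used dynamic Directed Transfer Function (dDTF), a directed spectral measure that this correlation-based stand-in does not reproduce; the sampling rate and band edges below are assumptions.

```python
# Simplified sketch: band-pass filter 32-channel EEG and compute a crude per-band coupling
# matrix. The paper used dDTF, a *directed* spectral measure; plain correlation is only an
# illustrative stand-in. Sampling rate and band edges are assumptions.
import numpy as np
from scipy.signal import butter, filtfilt

FS = 256            # assumed sampling rate (Hz)
N_CHANNELS = 32     # 32-electrode headset, as in the study
BANDS = {"delta": (0.5, 4), "theta": (4, 8), "alpha": (8, 12), "beta": (12, 30)}

def bandpass(data, low, high, fs, order=4):
    b, a = butter(order, [low, high], btype="band", fs=fs)
    return filtfilt(b, a, data, axis=-1)

def band_connectivity(eeg, fs=FS):
    """eeg: array of shape (n_channels, n_samples) -> one coupling matrix per band."""
    return {
        band: np.corrcoef(bandpass(eeg, low, high, fs))   # (n_channels, n_channels)
        for band, (low, high) in BANDS.items()
    }

# Synthetic data standing in for one participant's session recording
rng = np.random.default_rng(0)
eeg = rng.standard_normal((N_CHANNELS, 60 * FS))          # one minute of fake EEG
conn = band_connectivity(eeg)
print({band: mat.shape for band, mat in conn.items()})
```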
2. Linguistic Analysis (NLP Analysis)
- Essay Homogeneity: Essays from the LLM group were the most homogeneous within topics, suggesting a convergence towards typical LLM-generated phrasing and structures. Brain-only essays were the most variable.
- Named Entities (NER): The LLM group used significantly more named entities (people, places, dates), followed by Search Engine, then Brain-only.
- N-grams: Analysis revealed distinct n-gram patterns per group and topic. For example, the Brain-only group frequently used conceptual or introspective phrases (stemmed n-grams such as "true happi", "benefit other"), while the Search Engine group sometimes showed bias toward popular search terms ("homeless person"). LLM-generated text leaned toward third-person address. Session 4 analysis indicated that participants sometimes reused vocabulary from their previous tool condition.
- Ontology: Ontological analysis of essay concepts showed that LLM and Search Engine groups had overlapping conceptual structures, distinct from the Brain-only group.
- AI Judge vs. Human Teachers: An AI judge tended to give higher scores for uniqueness and content than human teachers, who were more skeptical of AI-generated uniformity and recognized patterns associated with LLM use (e.g., standard structures, lack of personal nuance).
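As an illustration of the AI-judge setup, a hypothetical scoring call might look like the sketch below; the rubric wording, prompt, and use of the OpenAI chat API are assumptions rather than the paper's documented configuration.

```python
# Hypothetical AI-judge scoring call (the prompt, rubric, and model choice are assumptions,
# not the paper's documented configuration). Requires OPENAI_API_KEY in the environment.
from openai import OpenAI

RUBRIC = ("Score the essay from 1 to 5 on each of: uniqueness, content, language, "
          "structure, and organization. Briefly justify each score.")

def ai_judge(essay_text: str, model: str = "gpt-4o") -> str:
    client = OpenAI()
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": "You are an English teacher grading SAT-style essays. " + RUBRIC},
            {"role": "user", "content": essay_text},
        ],
    )
    return response.choices[0].message.content

# Example: print(ai_judge("Happiness is not something ready made..."))
```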
3. Behavioral Insights (Interviews)
- Quoting Ability: The most striking behavioral difference was in the ability to recall quotes from their essays. LLM users performed significantly worse, especially in early sessions, with many unable to provide any correct quotes. This impairment persisted somewhat in later sessions. Brain-only and Search Engine groups had much better quoting ability and accuracy. This correlates with the neural findings suggesting shallower encoding in the LLM group.
- Essay Ownership: Brain-only participants reported the highest sense of ownership over their essays. LLM users often reported fragmented or low ownership, feeling dissociated from the tool-generated content. Search Engine users had moderate ownership. This aligns with reduced self-monitoring and evaluation networks in the LLM group.
- Reflections: LLM users sometimes found the output robotic and felt compelled to edit for personalization. Some questioned the need for AI for certain prompts or felt "analysis-paralysis." Search Engine users appreciated having diverse opinions but felt excluded from AI innovation. Brain-only users valued autonomy and focusing on their own thoughts/experiences. Ethical discomfort regarding AI use was also reported.
Synthesis and Practical Implications
The paper concludes that using LLMs for tasks like essay writing, while potentially increasing efficiency and content generation speed (as suggested by homogeneity and NER usage), may come at a significant cognitive cost, leading to "cognitive debt."
- Cognitive Offloading: LLMs appear to facilitate cognitive offloading, reducing the immediate cognitive load (working memory, executive control) required for deep internal processing, planning, and idea generation, as evidenced by lower neural connectivity in LLM users.
- Impact on Learning: This offloading may negatively impact key learning processes:
- Memory: Reduced engagement of memory encoding networks may lead to poorer retention and recall (demonstrated by quoting difficulties).
- Critical Thinking & Creativity: Lower connectivity in networks associated with self-driven ideation and critical evaluation might result in less unique or critically analyzed content. N-gram patterns and AI/human scoring discrepancies support this.
- Ownership & Agency: The sense of psychological ownership and cognitive agency over the work appears diminished when relying heavily on external generation.
- Tool Differences: Search engines promote a different cognitive mode, involving visual scanning and integration of diverse external sources, producing cognitive engagement intermediate between LLM-assisted and Brain-only work.
- Session 4 Implications: The findings from Session 4 suggest that prior LLM use may hinder subsequent performance on the same task without the tool, as participants show reduced neural engagement compared to those with prior unassisted practice. Conversely, introducing LLMs after initial unassisted practice may induce high cognitive integration, potentially a more beneficial sequence for learning.
- Energy Cost: The paper also briefly highlights the significantly higher energy consumption of LLM queries compared to search queries, an important environmental and economic consideration.
Limitations and Future Work
The paper's limitations include a relatively small sample size from a specific academic demographic, the use of a single LLM (ChatGPT), and a focus solely on the essay writing task in an educational context.
Future work should involve:
- Larger, more diverse participant samples.
- Comparison across multiple LLMs and multimodal AI tools.
- Breaking down tasks into sub-components (e.g., idea generation, drafting, revising) for more granular analysis.
- Including fMRI to capture deeper brain regions involved in memory and cognition.
- Longitudinal studies to assess long-term impacts on skill development.
- Exploring hybrid strategies that balance AI assistance with required self-driven cognitive effort.
- Developing methods to identify AI-generated text based on stylistic "fingerprinting" of human writing.
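One plausible (assumed) realization of the stylistic-fingerprinting idea is a simple stylometric classifier over character n-grams; the texts, labels, and feature choices below are purely illustrative and not drawn from the study.

```python
# Assumed illustration of stylistic "fingerprinting": character n-gram features feeding a
# simple classifier. Texts and labels are placeholders, not data from the study.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = [
    "In my own experience, giving time to friends mattered more than money.",        # human-written (placeholder)
    "In conclusion, happiness is a multifaceted concept shaped by numerous factors.", # LLM-style (placeholder)
]
labels = [0, 1]  # 0 = human-written, 1 = LLM-assisted

clf = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),  # character n-grams capture style
    LogisticRegression(max_iter=1000),
)
clf.fit(texts, labels)
print(clf.predict(["Ultimately, true fulfillment stems from a variety of interconnected factors."]))
```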
Conclusion
The paper concludes that while LLMs offer efficiency benefits, their use in learning tasks like essay writing may lead to the accumulation of cognitive debt. This debt manifests as reduced engagement of neural networks crucial for deep processing, memory formation, and critical thinking, potentially impacting long-term skill development and a sense of ownership over one's work. A careful, balanced approach to integrating AI in education is necessary to leverage its benefits without compromising fundamental cognitive skills and intellectual autonomy.