This paper investigates the cognitive impact of using an LLM such as ChatGPT, compared with a traditional web search engine or no external tools ("Brain-only"), for essay writing. The study involved 54 participants across three groups (LLM, Search Engine, Brain-only) completing essay writing tasks over four sessions. Data were collected using electroencephalography (EEG) to measure brain activity, natural language processing (NLP) to analyze the essays, and post-task interviews. The research aimed to understand how different tools affect essay quality, cognitive load, brain activity patterns, memory, and perceived ownership of the written work.
Experimental Design
The paper assigned participants to one of three groups:
- LLM Group: Used only OpenAI's GPT-4o.
- Search Engine Group: Could use any website except LLM-based tools (participants primarily used Google).
- Brain-only Group: Used no external tools, relying solely on their knowledge.
Participants completed three essay writing sessions using their assigned tool(s) on different SAT prompts. A subset of 18 participants completed a fourth session where the LLM and Brain-only groups switched conditions (LLM-to-Brain and Brain-to-LLM) and wrote on topics they had previously addressed. Each essay writing task was limited to 20 minutes.
Data collection involved:
- EEG: Recording brain activity using a 32-electrode headset during the essay writing task.
- NLP Analysis: Analyzing the written essays for linguistic features such as Named Entity Recognition (NER), n-grams, and topic ontology, and computing similarities/distances between texts (a toy pipeline sketch follows this list).
- Interviews: Conducting post-session interviews to gather subjective feedback on tool usage, strategy, quoting ability, and essay ownership.
- Scoring: Essays were scored by human teachers and an AI judge based on metrics like uniqueness, content, language, structure, and organization.
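The following is a minimal sketch of the kind of text-feature extraction described above, assuming spaCy for NER and scikit-learn for n-gram counts and similarity; the essay snippets, model choice, and feature settings are placeholders, not the paper's actual pipeline.

```python
# Minimal sketch of essay feature extraction (an assumed pipeline, not the paper's exact code).
import numpy as np
import spacy
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

nlp = spacy.load("en_core_web_sm")  # small English model; any NER-capable model would do

essays = [  # placeholder texts standing in for participants' essays on one topic
    "True happiness comes from helping others, not from wealth.",
    "According to a 2020 survey, volunteering in Boston increased reported wellbeing.",
]

# Named Entity Recognition: count entities (people, places, dates, ...) per essay
ner_counts = [len(nlp(text).ents) for text in essays]

# N-grams: most frequent unigrams/bigrams across the set of essays
ngram_vec = CountVectorizer(ngram_range=(1, 2), stop_words="english")
counts = ngram_vec.fit_transform(essays)
totals = np.asarray(counts.sum(axis=0)).ravel()
top_ngrams = sorted(zip(ngram_vec.get_feature_names_out(), totals), key=lambda kv: -kv[1])[:10]

# Similarity/distance between essays on the same topic (TF-IDF cosine similarity)
pairwise_sim = cosine_similarity(TfidfVectorizer().fit_transform(essays))

print(ner_counts, top_ngrams, pairwise_sim, sep="\n")
```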
Key Findings
The paper revealed significant differences across groups in neural activity, essay characteristics, and participant perceptions:
1. Neural Connectivity Patterns (EEG Analysis)
- Overall Connectivity: The "Brain-only" group consistently showed the strongest and most widespread neural network connectivity across all measured frequency bands (Alpha, Beta, Theta, Delta). The "Search Engine" group exhibited intermediate connectivity, while the "LLM" group showed the weakest overall coupling. This suggests that lower reliance on external tools demands greater internal cognitive coordination.
- Band-Specific Differences (a band-filtering sketch appears at the end of this subsection):
- Alpha (8-12 Hz): Higher in Brain-only, associated with internal attention and semantic processing. Lower in LLM, suggesting less reliance on internally generated ideas. Search Engine showed engagement related to visual attention.
- Beta (12-30 Hz): Higher overall in Brain-only, reflecting sustained cognitive and motor engagement. Search Engine showed beta linked to visuo-spatial processing (e.g., scrolling). LLM showed some beta, possibly for procedural fluency (typing).
- Theta (4-8 Hz): Significantly higher in Brain-only, strongly associated with working memory load and executive control. Lower in LLM, consistent with reduced working memory burden due to AI scaffolding. Search Engine showed less extensive theta networking than Brain-only.
- Delta (0.1-4 Hz): Most pronounced difference, far higher in Brain-only, suggesting recruitment of broad, low-frequency networks for integrative processes, potentially including memory and emotional content. Much weaker in Search Engine and LLM, possibly reflecting a more externally oriented or shallow processing mode.
- Information Flow: Brain-only showed greater "bottom-up" flow (posterior to frontal), potentially representing internal idea generation. LLM users showed more "top-down" flow (frontal to posterior), suggesting integration and filtering of external (AI) input.
- Session 4 Insights:
- LLM-to-Brain: When participants who had previously used the LLM wrote without tools (Session 4), their neural connectivity was lower than that of Brain-only participants in earlier sessions (Sessions 2 and 3), especially in the Alpha and Beta bands. This indicates reduced engagement in self-driven elaboration and critical scrutiny after prior LLM use, potentially suggesting "skill atrophy."
- Brain-to-LLM: When participants who had previously written unassisted were introduced to the LLM (Session 4), they showed a significant increase in connectivity across all bands compared with their earlier Brain-only sessions (especially Session 1), suggesting high cognitive load associated with integrating the new tool's output.
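To make the band-specific analysis concrete, here is a simplified sketch that band-pass filters multichannel EEG and computes a per-band coupling matrix. The paper itself used dynamic Directed Transfer Function (dDTF), a directed spectral measure that this correlation-based stand-in does not reproduce; the sampling rate and band edges below are assumptions.

```python
# Simplified sketch: band-pass filter 32-channel EEG and compute a crude per-band coupling
# matrix. The paper used dDTF, a *directed* spectral measure; plain correlation is only an
# illustrative stand-in. Sampling rate and band edges are assumptions.
import numpy as np
from scipy.signal import butter, filtfilt

FS = 256            # assumed sampling rate (Hz)
N_CHANNELS = 32     # 32-electrode headset, as in the study
BANDS = {"delta": (0.5, 4), "theta": (4, 8), "alpha": (8, 12), "beta": (12, 30)}

def bandpass(data, low, high, fs, order=4):
    b, a = butter(order, [low, high], btype="band", fs=fs)
    return filtfilt(b, a, data, axis=-1)

def band_connectivity(eeg, fs=FS):
    """eeg: array of shape (n_channels, n_samples) -> one coupling matrix per band."""
    return {
        band: np.corrcoef(bandpass(eeg, low, high, fs))   # (n_channels, n_channels)
        for band, (low, high) in BANDS.items()
    }

# Synthetic data standing in for one participant's session recording
rng = np.random.default_rng(0)
eeg = rng.standard_normal((N_CHANNELS, 60 * FS))          # one minute of fake EEG
conn = band_connectivity(eeg)
print({band: mat.shape for band, mat in conn.items()})
```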
2. Linguistic Analysis (NLP Analysis)
- Essay Homogeneity: Essays from the LLM group were the most homogeneous within topics, suggesting a convergence towards typical LLM-generated phrasing and structures. Brain-only essays were the most variable.
- Named Entities (NER): The LLM group used significantly more named entities (people, places, dates), followed by Search Engine, then Brain-only.
- N-grams: Analysis revealed distinct n-gram patterns per group and topic. For example, the Brain-only group frequently used conceptual or introspective phrases (stemmed n-grams such as "true happi", "benefit other"), while the Search Engine group sometimes showed bias toward popular search terms ("homeless person"). LLM-generated text leaned toward third-person address. Session 4 analysis indicated that participants sometimes reused vocabulary from their previous tool condition.
- Ontology: Ontological analysis of essay concepts showed that LLM and Search Engine groups had overlapping conceptual structures, distinct from the Brain-only group.
- AI Judge vs. Human Teachers: An AI judge tended to give higher scores for uniqueness and content than human teachers, who were more skeptical of AI-generated uniformity and recognized patterns associated with LLM use (e.g., standard structures, lack of personal nuance).
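As an illustration of the AI-judge setup, a hypothetical scoring call might look like the sketch below; the rubric wording, prompt, and use of the OpenAI chat API are assumptions rather than the paper's documented configuration.

```python
# Hypothetical AI-judge scoring call (the prompt, rubric, and model choice are assumptions,
# not the paper's documented configuration). Requires OPENAI_API_KEY in the environment.
from openai import OpenAI

RUBRIC = ("Score the essay from 1 to 5 on each of: uniqueness, content, language, "
          "structure, and organization. Briefly justify each score.")

def ai_judge(essay_text: str, model: str = "gpt-4o") -> str:
    client = OpenAI()
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": "You are an English teacher grading SAT-style essays. " + RUBRIC},
            {"role": "user", "content": essay_text},
        ],
    )
    return response.choices[0].message.content

# Example: print(ai_judge("Happiness is not something ready made..."))
```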
3. Behavioral Insights (Interviews)
- Quoting Ability: The most striking behavioral difference was in the ability to recall quotes from their essays. LLM users performed significantly worse, especially in early sessions, with many unable to provide any correct quotes. This impairment persisted somewhat in later sessions. Brain-only and Search Engine groups had much better quoting ability and accuracy. This correlates with the neural findings suggesting shallower encoding in the LLM group.
- Essay Ownership: Brain-only participants reported the highest sense of ownership over their essays. LLM users often reported fragmented or low ownership, feeling dissociated from the tool-generated content. Search Engine users had moderate ownership. This aligns with reduced self-monitoring and evaluation networks in the LLM group.
- Reflections: LLM users sometimes found the output robotic and felt compelled to edit for personalization. Some questioned the need for AI for certain prompts or felt "analysis-paralysis." Search Engine users appreciated having diverse opinions but felt excluded from AI innovation. Brain-only users valued autonomy and focusing on their own thoughts/experiences. Ethical discomfort regarding AI use was also reported.
Synthesis and Practical Implications
The paper concludes that using LLMs for tasks like essay writing, while potentially increasing efficiency and content generation speed (as suggested by homogeneity and NER usage), may come at a significant cognitive cost, leading to "cognitive debt."
- Cognitive Offloading: LLMs appear to facilitate cognitive offloading, reducing the immediate cognitive load (working memory, executive control) required for deep internal processing, planning, and idea generation, as evidenced by lower neural connectivity in LLM users.
- Impact on Learning: This offloading may negatively impact key learning processes:
- Memory: Reduced engagement of memory encoding networks may lead to poorer retention and recall (demonstrated by quoting difficulties).
- Critical Thinking & Creativity: Lower connectivity in networks associated with self-driven ideation and critical evaluation might result in less unique or critically analyzed content. N-gram patterns and AI/human scoring discrepancies support this.
- Ownership & Agency: The sense of psychological ownership and cognitive agency over the work appears diminished when relying heavily on external generation.
- Tool Differences: Search engines promote a different cognitive mode, involving visual scanning and integration of diverse external sources, producing cognitive engagement intermediate between LLM-assisted and Brain-only work.
- Session 4 Implications: The findings from Session 4 suggest that prior LLM use may hinder subsequent performance on the same task without the tool, as participants show reduced neural engagement compared to those with prior unassisted practice. Conversely, introducing LLMs after initial unassisted practice may induce high cognitive integration, potentially a more beneficial sequence for learning.
- Energy Cost: The paper also briefly highlights the significantly higher energy consumption of LLM queries compared to search queries, an important environmental and economic consideration.
Limitations and Future Work
The paper's limitations include a relatively small sample size from a specific academic demographic, the use of a single LLM (ChatGPT), and a focus solely on the essay writing task in an educational context.
Future work should involve:
- Larger, more diverse participant samples.
- Comparison across multiple LLMs and multimodal AI tools.
- Breaking down tasks into sub-components (e.g., idea generation, drafting, revising) for more granular analysis.
- Including fMRI to capture deeper brain regions involved in memory and cognition.
- Longitudinal studies to assess long-term impacts on skill development.
- Exploring hybrid strategies that balance AI assistance with required self-driven cognitive effort.
- Developing methods to identify AI-generated text based on stylistic "fingerprinting" of human writing.
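One plausible (assumed) realization of the stylistic-fingerprinting idea is a simple stylometric classifier over character n-grams; the texts, labels, and feature choices below are purely illustrative and not drawn from the study.

```python
# Assumed illustration of stylistic "fingerprinting": character n-gram features feeding a
# simple classifier. Texts and labels are placeholders, not data from the study.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = [
    "In my own experience, giving time to friends mattered more than money.",        # human-written (placeholder)
    "In conclusion, happiness is a multifaceted concept shaped by numerous factors.", # LLM-style (placeholder)
]
labels = [0, 1]  # 0 = human-written, 1 = LLM-assisted

clf = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),  # character n-grams capture style
    LogisticRegression(max_iter=1000),
)
clf.fit(texts, labels)
print(clf.predict(["Ultimately, true fulfillment stems from a variety of interconnected factors."]))
```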
Conclusion
The paper concludes that while LLMs offer efficiency benefits, their use in learning tasks like essay writing may lead to the accumulation of cognitive debt. This debt manifests as reduced engagement of neural networks crucial for deep processing, memory formation, and critical thinking, potentially impacting long-term skill development and a sense of ownership over one's work. A careful, balanced approach to integrating AI in education is necessary to leverage its benefits without compromising fundamental cognitive skills and intellectual autonomy.