- The paper reports that AI-assisted programming using Copilot yields significantly higher immediate performance and reduced workload compared to human pairing.
- The study finds that human pair programming supports better learning retention (a smaller performance drop at retest) and more positive emotional engagement, despite slower task completion.
- Methodologically, a within-subjects design with 22 participants and cluster-robust regression analyses was used to test effects on cognitive load and affect in both paradigms.
Controlled Comparison of AI-Assisted and Human Pair Programming Paradigms
Experimental Design and Methodology
This paper presents a methodologically rigorous, within-subjects study of 22 novice and intermediate Python programmers that compares human-human pair programming with AI-assisted programming via GitHub Copilot. Participants completed pools of HumanEval tasks under time pressure in both paradigms, then returned after a one-week retention interval to retest individually on the same tasks. Objective measures were programming performance (task completion and speed) and retest performance (a proxy for learning retention); subjective measures captured workload (NASA-TLX dimensions) and affect (valence and arousal ratings, interpreted through Control-Value Theory). Regression analyses used cluster-robust inference to account for the small sample size and the nesting of participants within pairs.
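The summary does not state which cluster-robust estimator the authors used. As a minimal sketch, assuming a single binary predictor (1 = Copilot, 0 = human pair), clustering on a pair identifier, and a CR1-style small-sample factor, the code below computes an OLS slope with a sandwich standard error. The function name and toy data are hypothetical, and the paper may well have used a different correction (for example CR2).

```python
from collections import defaultdict
from math import sqrt


def ols_cluster_robust(x, y, clusters):
    """Fit y = b0 + b1 * x by OLS and return (b0, b1, se_b1).

    se_b1 is a cluster-robust (sandwich) standard error with a CR1-style
    small-sample factor; this is an illustrative choice, not the paper's method.
    """
    n = len(y)
    if not (len(x) == n == len(clusters)):
        raise ValueError("x, y and clusters must have the same length")
    sx, sy = sum(x), sum(y)
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    det = n * sxx - sx * sx
    if det == 0:
        raise ValueError("x must vary")
    # (X'X)^-1 for the design matrix with columns [1, x]
    inv = [[sxx / det, -sx / det], [-sx / det, n / det]]
    b0 = inv[0][0] * sy + inv[0][1] * sxy
    b1 = inv[1][0] * sy + inv[1][1] * sxy
    # Score vector X_g' e_g, summed within each cluster (here: each pair)
    scores = defaultdict(lambda: [0.0, 0.0])
    for xi, yi, g in zip(x, y, clusters):
        e = yi - (b0 + b1 * xi)
        scores[g][0] += e
        scores[g][1] += xi * e
    n_clusters, k = len(scores), 2
    if n_clusters < 2 or n <= k:
        raise ValueError("need at least two clusters and more observations than parameters")
    # "Meat" of the sandwich: sum of outer products of the cluster scores
    meat = [[sum(s[i] * s[j] for s in scores.values()) for j in range(2)] for i in range(2)]
    # Bread-meat-bread, keeping only the variance entry for b1
    v11 = sum(inv[1][i] * meat[i][j] * inv[j][1] for i in range(2) for j in range(2))
    factor = (n_clusters / (n_clusters - 1)) * ((n - 1) / (n - k))
    return b0, b1, sqrt(factor * v11)


# Hypothetical toy data: three pairs, each scored once per paradigm (1 = Copilot).
paradigm = [0, 1, 0, 1, 0, 1]
score = [50, 64, 55, 70, 48, 60]
pair_id = ["p1", "p1", "p2", "p2", "p3", "p3"]
b0, b1, se = ols_cluster_robust(paradigm, score, pair_id)
```

With a binary predictor, b1 is simply the difference in paradigm means (here 41/3, about 13.7 points); clustering matters only for the standard error, which treats each pair, rather than each observation, as an independent unit.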
Quantitative Results
Participants achieved significantly higher programming scores with Copilot (mean +14 points out of 100, p_adj < .001, Hedges' g = 0.99), completing more tasks and finishing faster than human teams. However, self-grades of individual performance did not differ significantly between paradigms. Qualitative responses confirmed Copilot's advantages in speed, knowledge retrieval, and solution optimality, but also emphasized its tendency to dominate the interaction, reducing participants' perceived contribution and sense of collaborative engagement.
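Hedges' g has several within-subjects variants, and the summary does not say which one produced g = 0.99. As a sketch of one common choice, g_z, the code below standardizes the mean paired difference by the standard deviation of the differences and applies Hedges' approximate small-sample correction J = 1 − 3/(4·df − 1). The function name and scores are hypothetical.

```python
from statistics import mean, stdev


def hedges_g_paired(a, b):
    """Hedges' g for paired scores a[i], b[i], standardized by the SD of the differences (g_z)."""
    if len(a) != len(b) or len(a) < 3:
        raise ValueError("need at least three aligned pairs of scores")
    diffs = [x - y for x, y in zip(a, b)]
    d_z = mean(diffs) / stdev(diffs)  # Cohen's d_z; stdev uses the n - 1 denominator
    df = len(diffs) - 1
    correction = 1 - 3 / (4 * df - 1)  # Hedges' approximate small-sample correction J
    return d_z * correction


# Hypothetical scores for four participants (not data from the paper).
copilot = [70, 80, 90, 85]
human = [60, 65, 80, 70]
g = hedges_g_paired(copilot, human)
```

With n = 22, df = 21 and J = 1 − 3/83 ≈ 0.964, so the correction shrinks d_z by under 4%. Other variants, such as g_av (which standardizes by the average of the two condition SDs), give different values, so effect sizes are only comparable when the variant is known.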
Learning Retention
Retest performance (a proxy for learning) showed a larger decrement for tasks initially completed with Copilot than for those completed in human pairs (mean −19 points, p_adj = .054, Hedges' g = 1.13), although absolute retest scores were statistically comparable between paradigms. Notably, stronger human teammates showed a marginally significant reduction in retest scores for Copilot tasks (p_adj = .054, g = 0.28), suggesting possible impairment of deeper engagement. Human-human pairing promoted more active discussion and articulation of strategies, supporting the metacognitive processing linked to retention.
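A larger decrement alongside comparable absolute retest scores can look contradictory. The sketch below uses made-up numbers, and assumes the decrement is simply retest minus initial score per participant (the paper's exact operationalization may differ), to show how both can hold at once: Copilot's higher initial scores leave more room to fall.

```python
from statistics import mean


def retest_change(initial, retest):
    """Per-participant change from the initial session to the retest (negative = loss)."""
    if len(initial) != len(retest):
        raise ValueError("initial and retest scores must align by participant")
    return [r - i for i, r in zip(initial, retest)]


# Hypothetical 0-100 scores for three participants, chosen only to illustrate the pattern.
copilot_initial, copilot_retest = [90, 85, 95], [60, 55, 70]
human_initial, human_retest = [70, 65, 75], [62, 58, 66]

copilot_drop = mean(retest_change(copilot_initial, copilot_retest))  # about -28.3
human_drop = mean(retest_change(human_initial, human_retest))  # -8
retest_gap = mean(copilot_retest) - mean(human_retest)  # about -0.3
```

Here the Copilot tasks lose about 20 points more than the human-pair tasks, yet both end near 62 at retest, which is the shape of result the paragraph above describes.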
Workload
Mental demand, temporal demand, and effort were substantially lower in the Copilot condition (p_adj < .01, g > 1.2), consistent with Copilot's capacity to offload cognitive burden, accelerate task completion, and lower perceived exhaustion. Frustration was low in both conditions and did not differ significantly. Participants cited Copilot's ease of use and absence of judgment, though they occasionally noted its inability to grasp implicit context.
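The summary reports adjusted p-values without naming the correction. Assuming, purely for illustration, a Holm-Bonferroni step-down adjustment across the six NASA-TLX dimensions (mental demand, physical demand, temporal demand, performance, effort, frustration), the adjustment can be sketched as follows. The raw p-values are invented and are not the paper's.

```python
def holm_adjust(pvalues):
    """Holm-Bonferroni step-down adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices from smallest p upward
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # Multiply the rank-th smallest p by (m - rank), cap at 1, and keep the sequence monotone
        running_max = max(running_max, min(1.0, (m - rank) * pvalues[i]))
        adjusted[i] = running_max
    return adjusted


# Hypothetical raw p-values for the six NASA-TLX dimensions (not the paper's values).
dimensions = ["mental", "physical", "temporal", "performance", "effort", "frustration"]
raw_p = [0.001, 0.30, 0.002, 0.04, 0.003, 0.50]
adjusted = dict(zip(dimensions, holm_adjust(raw_p)))
```

Holm controls the family-wise error rate like Bonferroni but is uniformly less conservative, which matters with a small sample; a false-discovery-rate method such as Benjamini-Hochberg would be another plausible choice.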
Affect: Emotional Valence and Arousal
Human pair programming produced significantly greater increases in positive valence and arousal than AI-assisted programming (p_adj < .005, g ≥ 1.6). Responses highlighted heightened fulfillment, excitement, and social energy during human collaboration, whereas Copilot interactions were affectively flat. This pattern fits a Control-Value Theory interpretation: human teams raise the activity and social value of the task, producing high-arousal positive emotions, whereas Copilot inflates outcome value but attenuates control and intrinsic engagement.
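The contrast between high-arousal positive emotions and affectively flat interactions maps onto Control-Value Theory's two-dimensional taxonomy of achievement emotions (valence × activation). A minimal sketch, assuming both ratings share a 1-9 scale with midpoint 5 (the paper's actual scale is not stated here), places a rating in one of the four quadrants; the function and constant names are hypothetical.

```python
# Example emotions per quadrant, following Pekrun's Control-Value Theory taxonomy.
QUADRANT_EXAMPLES = {
    "positive activating": ["enjoyment", "hope", "pride"],
    "positive deactivating": ["relief", "relaxation"],
    "negative activating": ["anger", "anxiety", "shame"],
    "negative deactivating": ["hopelessness", "boredom"],
}


def cvt_quadrant(valence, arousal, midpoint=5.0):
    """Place a (valence, arousal) rating in a Control-Value Theory quadrant.

    Assumes both ratings use a bipolar scale centred on `midpoint` (e.g. 1-9 with
    midpoint 5); ratings exactly on the midpoint are left unclassified.
    """
    if valence == midpoint or arousal == midpoint:
        return "unclassified"
    tone = "positive" if valence > midpoint else "negative"
    activation = "activating" if arousal > midpoint else "deactivating"
    return f"{tone} {activation}"
```

For instance, `cvt_quadrant(8, 7)` returns `"positive activating"` (enjoyment, pride), the region the human-pair responses describe, while `cvt_quadrant(6, 4)` returns `"positive deactivating"` (relief, relaxation).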
Theoretical Implications and Context
This work synthesizes Control-Value Theory and Cognitive Load Theory to explain how the programming paradigm shapes affect, workload, and learning. Copilot's outcome-focused, sycophantic interaction style offers mechanical advantages (speed and reliability) at the cost of diminished social reward, epistemic conflict, and metacognitive activation. The evidence suggests a risk of "AI obscurity": learners attribute successful outcomes less to themselves and more to the AI, blunting the development of pride and self-efficacy [(2604.18538), pekrun_control-value_2024, kallia_be_2025]. Learning theorists argue that active engagement, discussion, and conflict are necessary for durable knowledge, a view supported here by the stronger retention and affect observed in the human paradigm.
Prior literature corroborates these findings: commodity AI tools increase novice performance [gardella_performance_2024, kazemitabaar_studying_2023], but how learners adapt to them is critical, namely using them as a learning resource rather than as a means of avoidance [kazemitabaar_how_2024, prather_its_2024]. Meta-analyses confirm pair programming's efficacy for novice enjoyment and learning, suggesting unique benefits from collaborative synergy and two-way communication [umapathy_meta-analysis_2017, hawlitschek_empirical_2023]. The emotional and motivational circuits activated through human teamwork are largely absent from current AI tools.
Practical Recommendations and AI Integration Strategies
The paper recommends that educators reconsider the educational value of human pair programming, especially given its superior affective and lasting learning outcomes despite lower immediate productivity. Copilot remains an excellent low-stakes resource for speed and knowledge retrieval, but excessive reliance risks eroding metacognition, collaboration, and autonomy. The authors speculate that hybrid paradigms, in which human-human teams are augmented by strategic, rate-limited AI support, may offer the best of both worlds, provided social dynamics and collaborative interaction are not displaced [lyu_will_2025]. They advise careful guardrails for AI usage, incentives for delayed help-seeking, and moderation of Copilot's proactive suggestions to preserve constructive engagement.
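The paper leaves the mechanics of rate-limited AI support and delayed help-seeking open. One possible shape, sketched here purely as an assumption and not as an existing Copilot feature, is a gate that withholds suggestions until the learner has worked unaided for a while, then caps suggestions within a sliding time window. The class name, parameters, and thresholds are all hypothetical.

```python
from collections import deque


class HelpGate:
    """Hypothetical gate for AI suggestions (not an existing Copilot feature).

    A suggestion is allowed only after `min_attempt_s` seconds of unaided work on the
    task, and at most `max_requests` suggestions are granted in any sliding window
    of `window_s` seconds.
    """

    def __init__(self, min_attempt_s, max_requests, window_s):
        self.min_attempt_s = min_attempt_s
        self.max_requests = max_requests
        self.window_s = window_s
        self._granted = deque()  # timestamps (seconds) of granted suggestions

    def request(self, task_start, now):
        """Return True if a suggestion may be shown at time `now`."""
        if now - task_start < self.min_attempt_s:
            return False  # delayed help-seeking: the learner tries first
        while self._granted and now - self._granted[0] >= self.window_s:
            self._granted.popleft()  # forget grants that have left the window
        if len(self._granted) >= self.max_requests:
            return False  # rate limit reached for this window
        self._granted.append(now)
        return True


# Hypothetical policy: 2 minutes of unaided work, then at most 2 suggestions per 5 minutes.
gate = HelpGate(min_attempt_s=120, max_requests=2, window_s=300)
decisions = [gate.request(0, t) for t in (60, 130, 140, 150, 431)]
# → [False, True, True, False, True]
```

Passing the clock in explicitly keeps the policy deterministic and testable; a real tool would read a monotonic clock and would likely also need per-pair rather than per-learner state if both teammates share one AI assistant.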
Implications for future AI tools include the development of multi-modal, dialogically rich, controllable AI partners capable of supporting rather than supplanting authentic collaborative dynamics. Context specification, social scaffolding, rate-limiting, and Socratic mechanisms may be required to realize the educational affordances of AI without undermining the core human value of teamwork.
Limitations and Directions for Future Research
A small, selective sample limits generalizability; nevertheless, thorough counterbalancing and robust statistical procedures strengthen internal validity. The fixed time structure and nature of HumanEval tasks may have advantaged Copilot; future studies should examine less well-specified, messier tasks and naturalistic time allocations. Further, longitudinal studies with diverse populations could clarify how Copilot-induced learning behaviors translate into long-term skill development and autonomous problem-solving.
Conclusion
This controlled laboratory study demonstrates that novice programmers achieve higher immediate performance and experience lower workload with AI-assisted programming, whereas human pair programming yields greater positive emotion and comparable or superior learning retention. The findings underscore the need to maintain autonomy, metacognitive engagement, and social interaction in educational contexts, even as AI tools proliferate. Optimal learning may be achieved by judiciously integrating AI into human teams, preserving the affective and cognitive benefits of authentic collaboration while leveraging AI's strengths in knowledge delivery and syntactic support.
Citation: "Fast and Forgettable: A Controlled Study of Novices' Performance, Learning, Workload, and Emotion in AI-Assisted and Human Pair Programming Paradigms" (2604.18538)