- The paper demonstrates that a pedagogy-informed custom chatbot significantly increased interaction intensity and cognitive diversity compared to a general-purpose chatbot.
- Heterogeneous Interaction Network Analysis revealed distinct multi-turn, exploratory inquiry patterns versus direct answer requests in student interactions.
- Despite similar solution quality, the enriched process-level engagement indicates the benefit of using process-oriented AI design in education.
Introduction
This paper systematically investigates how pedagogy-informed custom generative AI (GAI) chatbots that use Socratic prompting affect students’ science problem-solving interactions and outcomes, compared with general-purpose GAI chatbots, in a secondary education context. General-purpose LLM-based assistants are designed to answer directly, which frequently fosters cognitive offloading. The authors posit that custom chatbots engineered for science education and Socratic inquiry could mitigate this pitfall by scaffolding and diversifying students’ cognitive engagement.
Experimental Design and Methodological Approach
A within-subject, counterbalanced study with 48 secondary students was conducted. Each participant completed parallel science problem-solving tasks with both a custom, pedagogy-driven chatbot (Socratic, RAG-equipped Gemini 2.5 Flash) and a general-purpose chatbot (baseline Gemini 2.5 Flash). Both systems were deployed through an identical frontend to rule out usability-related confounds. All dialogues were automatically logged, yielding a substantial dataset (n=3297).
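As a minimal sketch of what counterbalancing in a within-subject design can look like, the snippet below rotates students through every combination of chatbot order and task order. The task labels (`task_A`, `task_B`), the student IDs, and the rotation scheme itself are assumptions for illustration; the source does not describe the exact assignment procedure the authors used.

```python
from itertools import product

# Hypothetical labels: the source names neither the tasks nor the assignment scheme.
CHATBOT_ORDERS = [("custom", "general"), ("general", "custom")]
TASK_ORDERS = [("task_A", "task_B"), ("task_B", "task_A")]


def assign_conditions(student_ids):
    """Cycle students through all chatbot-order x task-order cells."""
    cells = list(product(CHATBOT_ORDERS, TASK_ORDERS))
    schedule = {}
    for i, sid in enumerate(student_ids):
        bots, tasks = cells[i % len(cells)]
        # Each session pairs one chatbot with one task, in presentation order.
        schedule[sid] = list(zip(bots, tasks))
    return schedule


# 48 placeholder IDs, matching the sample size reported above.
students = [f"s{i:02d}" for i in range(1, 49)]
schedule = assign_conditions(students)
```

With 48 students and four cells, each cell receives 12 students, so chatbot order and task pairing are balanced across the sample.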
Problem-solving processes were analyzed with Heterogeneous Interaction Network Analysis (HINA), which supports multi-level, quantitative characterization of interaction intensity, cognitive diversity, and dialogical patterns at both the node and dyad levels. Students’ written solutions were independently scored for correctness and completeness, and linear mixed-effects models were used to assess performance differences attributable to chatbot type, controlling for topic and task order.
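To make the node-level and dyad-level measures more concrete, here is a minimal sketch of a weighted bipartite (student, code) network. It is not the HINA implementation: the code labels, the sample data, and the use of normalized Shannon entropy as a diversity index are all assumptions chosen for illustration, and the paper's actual metrics may be defined differently.

```python
from collections import Counter
from math import log

# Hypothetical coded log: one (student_id, cognitive_code) pair per student message.
coded_turns = [
    ("s01", "ask_answer"),
    ("s01", "ask_answer"),
    ("s01", "clarify"),
    ("s02", "clarify"),
    ("s02", "reflect"),
    ("s02", "refine_solution"),
    ("s02", "ask_answer"),
]


def build_edges(turns):
    """Dyad level: weighted edges (student, code) -> number of occurrences."""
    return Counter(turns)


def intensity(edges, student):
    """Node level: total edge weight attached to one student."""
    return sum(w for (s, _), w in edges.items() if s == student)


def diversity(edges, student, n_codes):
    """Assumed index: Shannon entropy of the student's code use, scaled to [0, 1]."""
    weights = [w for (s, _), w in edges.items() if s == student]
    total = sum(weights)
    if total == 0 or n_codes < 2:
        return 0.0
    entropy = -sum((w / total) * log(w / total) for w in weights)
    return entropy / log(n_codes)


edges = build_edges(coded_turns)
```

In this toy data, `s01` concentrates on direct answer requests and gets a lower diversity score than `s02`, who spreads four messages evenly across four codes. That contrast mirrors the pattern the study reports between the two chatbot conditions.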
Key Findings
Interaction Intensity and Cognitive Diversity
The authors report statistically significant increases in both interaction intensity and cognitive interaction diversity when students used the Socratic custom chatbot. Specifically, the average interaction count per student was higher (M=21.17 for custom vs. M=12.21 for general-purpose; Wilcoxon Z=4.087, p < 0.001, r=0.590), and the spread of cognitive strategies was broader (M=0.420 vs. M=0.299; paired t-test t=3.301, p=0.004, d=0.44). These results indicate finer-grained engagement, with students not merely soliciting answers but cycling through distinct forms of inquiry, reflection, and metacognitive activity when supported by a Socratic conversational structure.
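The reported effect size for the Wilcoxon test can be sanity-checked with the common convention r = Z / sqrt(N). The sketch below assumes that the authors used this convention and that N is the 48 paired students; neither is stated explicitly in the source.

```python
from math import sqrt


def wilcoxon_effect_size(z, n):
    """Effect size r = Z / sqrt(N), a common convention for Wilcoxon tests."""
    return z / sqrt(n)


# Z comes from the reported result; N = 48 is assumed to be the number of paired students.
r = wilcoxon_effect_size(4.087, 48)
print(round(r, 3))  # prints 0.59
```

Under these assumptions the computed value agrees with the reported r = 0.590 to three decimal places.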
Dialogical and Problem-Solving Patterns
HINA visualization and dyadic analysis revealed that students interacting with the custom chatbot predominantly engaged in multi-turn, exploratory inquiries, persistently following heuristic steps, refining solutions iteratively, and engaging in clarification. In contrast, interactions with the general-purpose chatbot were dominated by direct requests (answer copying, translation requests, and formatting adjustments), with minimal recursive reasoning or revision. Socio-emotional and self-disclosure behaviors were also more salient in custom chatbot dialogues, reflecting a richer human-AI engagement environment.
Despite marked differences in interaction process measures, no statistically significant difference in solution quality was detected between chatbot conditions (linear mixed model main effect: F=1.521, p=0.224). The estimated marginal mean was numerically higher for the custom chatbot (EMM=3.317) than for the general-purpose chatbot (EMM=2.937), a gap of 0.380 that did not reach significance. Task context and order also had no discernible effect, so the null result for performance does not appear to depend on which topic or sequence a student received.
Theoretical and Practical Implications
The dissociation between enriched cognitive engagement and parity in problem-solving outcomes highlights a key risk of relying solely on product metrics in AI-assisted learning research: similar performance on the surface can conceal substantial differences in underlying cognitive effort, strategy repertoire, and agency. The Socratic chatbot’s ability to foster a coupled cognitive system, in which students retain ownership of their reasoning, is consistent with contemporary extended cognition frameworks and may mitigate the metacognitive laziness associated with AI-driven cognitive substitution.
The findings support critiques that general-purpose GAI chatbots, though strong at solution generation, undermine active learning by serving as cognitive shortcuts rather than scaffolds. This underscores the need for pedagogically aligned LLM customization through prompt engineering and RAG, aimed not only at accuracy but also at epistemic and metacognitive alignment with educational objectives.
Limitations and Future Directions
This study is limited by a single-gender sample and a focus on immediate product outcomes. Future work should use longitudinal designs that examine retention, transfer, and knowledge gains, and should account for socially situated variables such as gender. The HINA framework could also be extended to large-scale datasets and multimodal AI support systems, to evaluate the scalability of pedagogy-informed chatbot architectures and their impact on diverse cognitive and affective outcomes.
Conclusion
Pedagogy-informed custom GAI chatbots, designed with iterative Socratic prompting and contextual retrieval, elicit significantly more intensive and more diverse cognitive interactions in science problem solving than general-purpose chatbots. Although no significant difference in outcome-level performance was detected, the process-level gains point to the advantage of custom chatbots for cognitive engagement and to the necessity of process-level analytics in AI-supported educational research. Future work should emphasize the synergy between instructional design and AI capabilities, moving GAI systems toward authentic cognitive augmentation in STEM education.