Comparing the Impact of Pedagogy-Informed Custom and General-Purpose GAI Chatbots on Students' Science Problem-Solving Processes and Performance Using Heterogeneous Interaction Network Analysis

Published 3 Apr 2026 in cs.SI, cs.AI, and cs.HC | (2604.03022v1)

Abstract: Problem solving plays an essential role in science education, and generative AI (GAI) chatbots have emerged as a promising tool for supporting students' science problem solving. However, general-purpose chatbots (e.g., ChatGPT), which often provide direct, ready-made answers, may lead to students' cognitive offloading. Prior research has rarely focused on custom chatbots for facilitating students' science problem solving, nor has it examined how they influence problem-solving processes and performance differently from general-purpose chatbots. To address this gap, we developed a pedagogy-informed custom GAI chatbot grounded in the Socratic questioning method, which supports students by prompting them with guiding questions. This study employed a within-subjects counterbalanced design in which 48 secondary school students used both a custom and a general-purpose chatbot to complete two science problem-solving tasks. A total of 3,297 student-chatbot dialogues were collected and analyzed using Heterogeneous Interaction Network Analysis (HINA). The results showed that: (1) students demonstrated significantly higher interaction intensity and cognitive interaction diversity when using the custom chatbot than when using the general-purpose chatbot; (2) students were more likely to follow the custom chatbot's guidance to think and reflect, whereas they tended to ask the general-purpose chatbot to execute specific commands; and (3) no statistically significant difference was observed in students' problem-solving performance, as evaluated by solution quality, between the two chatbot conditions. This study provides novel theoretical insights and empirical evidence that custom chatbots are less likely to induce cognitive offloading and instead foster greater cognitive engagement compared to general-purpose chatbots. This study also offers insights into the design and integration of GAI chatbots in science education.


Authors (3)

Summary

The paper demonstrates that a pedagogy-informed custom chatbot significantly increased interaction intensity and cognitive diversity compared to a general-purpose chatbot.
Heterogeneous Interaction Network Analysis revealed multi-turn, exploratory inquiry patterns with the custom chatbot, in contrast to direct answer requests with the general-purpose chatbot.
Despite similar solution quality, the enriched process-level engagement indicates the benefit of using process-oriented AI design in education.

Comparative Analysis of Pedagogically Informed Custom Versus General-Purpose GAI Chatbots in Science Problem Solving

Introduction

This paper delivers a systematic investigation of how pedagogy-informed custom generative AI (GAI) chatbots, leveraging Socratic prompting, differentially influence students’ science problem-solving interactions and outcomes compared to general-purpose GAI chatbots within a secondary education context. Recognizing the limitations of generalized LLM-based assistants—which frequently foster cognitive offloading due to their direct-answering design—the authors posit that custom chatbots, engineered for science education and Socratic inquiry, could mitigate this pitfall by scaffolding and diversifying student cognitive engagement.

Experimental Design and Methodological Approach

A within-subject, counterbalanced study involving 48 secondary students was implemented. Each participant completed parallel science problem-solving tasks with both a custom, pedagogy-driven chatbot (Socratic, RAG-equipped Gemini 2.5 Flash) and a general-purpose chatbot (baseline Gemini 2.5 Flash). Both systems were deployed through the same front end so that usability differences would not confound the comparison. All dialogues were automatically logged, yielding a dataset of 3,297 student-chatbot dialogues.
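To make the counterbalancing idea concrete, here is a minimal sketch of one way 48 students could be rotated through the four combinations of chatbot order and task order. The summary does not describe the actual assignment procedure, so the rotation scheme, the function name `counterbalance`, the task labels `task_A` and `task_B`, and the student IDs are all illustrative assumptions.

```python
from itertools import product


def counterbalance(student_ids):
    """Rotate students through the four chatbot-order x task-order cells.

    Hypothetical scheme: the paper's real assignment procedure is not
    given in this summary.
    """
    chatbot_orders = [("custom", "general"), ("general", "custom")]
    task_orders = [("task_A", "task_B"), ("task_B", "task_A")]
    cells = list(product(chatbot_orders, task_orders))  # 4 cells

    schedule = {}
    for i, sid in enumerate(student_ids):
        bots, tasks = cells[i % len(cells)]
        # Session k pairs the k-th chatbot with the k-th task.
        schedule[sid] = list(zip(bots, tasks))
    return schedule


# 48 placeholder student IDs: S01 ... S48
schedule = counterbalance([f"S{n:02d}" for n in range(1, 49)])
```

With 48 students and four cells, each cell receives 12 students, so half of the sample meets the custom chatbot first and each task appears equally often in each session.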

Problem-solving processes were analyzed with Heterogeneous Interaction Network Analysis (HINA), which gives a multi-level, quantitative characterization of interaction intensity, cognitive diversity, and dialogue patterns at both the node and dyad levels. Students’ written solutions were independently scored for correctness and completeness, and linear mixed-effects models were used to assess performance differences attributable to chatbot type, controlling for topic and task order.
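The node-level measures can be illustrated with a small sketch. It treats coded dialogue turns as a weighted two-mode network linking students to behavior codes, takes intensity as the total edge weight of a student, and takes diversity as the normalized Shannon entropy of that student's edge weights. These definitions are assumptions chosen for illustration; the summary does not state the exact formulas HINA uses, and the code labels and data below are invented.

```python
import math
from collections import Counter, defaultdict

# Hypothetical coded turns: (student_id, behavior_code). Not real study data.
turns = [
    ("S01", "ask_hint"), ("S01", "reflect"),
    ("S01", "ask_hint"), ("S01", "clarify"),
    ("S02", "request_answer"), ("S02", "request_answer"),
]


def build_network(turns):
    """Weighted student -> code edges; weight = number of coded turns."""
    edges = defaultdict(Counter)
    for student, code in turns:
        edges[student][code] += 1
    return edges


def intensity(edges, student):
    """Total edge weight attached to a student node."""
    return sum(edges[student].values())


def diversity(edges, student, n_codes):
    """Normalized Shannon entropy of a student's edge weights (0 to 1).

    Assumed definition, used here only to illustrate 'diversity'.
    """
    counts = list(edges[student].values())
    total = sum(counts)
    if total == 0 or n_codes < 2:
        return 0.0
    entropy = sum(-(c / total) * math.log(c / total) for c in counts)
    return entropy / math.log(n_codes)


edges = build_network(turns)
n_codes = len({code for _, code in turns})  # 4 distinct codes in the toy data
s01_intensity = intensity(edges, "S01")
s01_diversity = diversity(edges, "S01", n_codes)
s02_diversity = diversity(edges, "S02", n_codes)
```

In this toy data, the student who spreads turns over several codes scores higher on diversity than the student who only repeats one kind of request, which is the contrast the study reports between the two chatbot conditions.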

Key Findings

Interaction Intensity and Cognitive Diversity

The authors report statistically significant increases in both interaction intensity and cognitive interaction diversity when students used the Socratic custom chatbot. Specifically, the average number of interactions per student was higher (M = 21.17 for custom vs. M = 12.21 for general-purpose; Wilcoxon Z = 4.087, p < 0.001, r = 0.590), and the spread of cognitive strategies was broader (M = 0.420 vs. M = 0.299; paired t-test t = 3.301, p = 0.004, d = 0.44). These results indicate finer-grained engagement: with a Socratic conversational structure, students did not merely solicit answers but cycled through distinct forms of inquiry, reflection, and metacognitive activity.
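The reported Wilcoxon effect size can be checked with a short calculation. The sketch below assumes the common convention r = Z / sqrt(N), with N taken as the 48 students. The summary does not say which convention the authors used, but this one reproduces the reported value.

```python
import math


def wilcoxon_r(z, n):
    """Effect size r = Z / sqrt(N) for a Wilcoxon signed-rank test.

    Assumes N is the number of paired participants.
    """
    return z / math.sqrt(n)


# Z and N as reported in the text above
r = wilcoxon_r(4.087, 48)
r_rounded = round(r, 3)  # agrees with the reported r = 0.590
```

Under the usual rule of thumb, an r above 0.5 counts as a large effect.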

Dialogical and Problem-Solving Patterns

HINA visualization and dyadic analysis revealed that students interacting with the custom chatbot predominantly engaged in multi-turn, exploratory inquiries, persistently following heuristic steps, refining solutions iteratively, and engaging in clarification. In contrast, interactions with the general-purpose chatbot were dominated by direct requests (answer copying, translation requests, and formatting adjustments), with minimal recursive reasoning or revision. Socio-emotional and self-disclosure behaviors were also more salient in custom chatbot dialogues, reflecting a richer human-AI engagement environment.

Problem-Solving Performance

Despite marked differences in interaction process measures, no statistically significant difference was detected in solution quality between chatbot conditions (linear mixed model main effect: F = 1.521, p = 0.224). The mean solution score was higher for the custom chatbot (EMM = 3.317) than for the general-purpose chatbot (EMM = 2.937), but the difference was not statistically significant. Task context and order also had no discernible effect, so the null result for performance does not appear to depend on topic or sequence.
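One plausible form of the model behind this test is written out below. The fixed effects for chatbot type, topic, and task order follow the description in the methodology section. The random intercept per student is an assumption that suits the within-subject design; the summary does not give the authors' exact specification.

```latex
\[
y_{ij} = \beta_0
       + \beta_1\,\mathrm{chatbot}_{ij}
       + \beta_2\,\mathrm{topic}_{ij}
       + \beta_3\,\mathrm{order}_{ij}
       + u_i + \varepsilon_{ij},
\qquad
u_i \sim \mathcal{N}(0, \sigma_u^2),\quad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
\]
```

Here $y_{ij}$ is the solution score of student $i$ in session $j$. The reported F test corresponds to the null hypothesis $\beta_1 = 0$, and the estimated marginal means are the model-predicted scores for each chatbot averaged over topic and order.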

Theoretical and Practical Implications

The dissociation between enriched cognitive engagement and parity in problem-solving outcomes highlights a key risk in relying solely on product metrics in AI-assisted learning research: superficial performance similarity conceals deep distinctions in underlying cognitive effort, strategy repertoire, and agency. The Socratic chatbot’s ability to foster a coupled cognitive system, compelling students to retain ownership of reasoning processes, is consistent with contemporary extended cognition frameworks and mitigates metacognitive laziness associated with AI-driven cognitive substitution.

The findings support critiques that general-purpose GAI chatbots, though strong at solution generation, can undermine active learning by serving as cognitive shortcuts rather than scaffolds. This underscores the need for pedagogically aligned LLM customizations using prompt engineering and RAG, not only for accuracy but also for epistemic and metacognitive alignment with educational objectives.

Limitations and Future Directions

This study is limited by a single-gender sample and a focus on immediate product outcomes. Future work should use longitudinal designs that examine retention, transfer, and knowledge gains, and should consider socially situated variables such as gender. The HINA framework could also be extended to large-scale datasets and multimodal AI support systems, to evaluate how well pedagogy-informed chatbot architectures scale and how they affect diverse cognitive and affective outcomes.

Conclusion

Pedagogy-informed custom GAI chatbots, designed with iterative Socratic prompting and contextual retrieval, induce significantly stronger and more diverse cognitive interactions in science problem solving than general-purpose chatbots. Although no significant difference in outcome-level performance was observed, the process-level gains point to the advantage of custom chatbots for cognitive engagement and to the need for process-level analytics in AI-supported educational research. Future work should emphasize the synergy between instructional design and AI capabilities, moving GAI systems toward authentic cognitive augmentation in STEM education.