Introduction
The application of LLMs as cognitive models in education sparks valuable discussion about their capacity to replicate features of human cognition. An interesting dimension of this discussion is whether LLMs exhibit biases in problem-solving tasks similar to those observed in human learners, particularly children. A recent investigation focused on this question, dissecting distinct stages of problem-solving, including text comprehension, solution planning, and solution execution.
Cognitive Modeling of LLMs
The paper's approach to cognitive modeling splits the problem-solving process into distinct steps. Using a neuro-symbolic method, the researchers constructed a series of tests corresponding to each step, aiming to determine whether state-of-the-art LLMs display human-like biases at each one. The results were revealing: the models mirrored human biases in both text comprehension and solution planning. Interestingly, however, they did not exhibit the same biases in the solution execution phase, particularly in computations involving carries.
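To illustrate what such a decomposition can look like in practice, the sketch below isolates each stage with its own prompt. The `query_llm` helper, the stage names, and the prompt wording are assumptions for illustration, not the paper's actual test battery.

```python
# Minimal sketch of stage-wise probing (illustrative, not the paper's code).
# Each prompt isolates one step so errors can be attributed to comprehension,
# planning, or execution.

def query_llm(prompt: str) -> str:
    """Hypothetical helper: send a prompt to the model under study."""
    raise NotImplementedError("wire this to your model API")

STAGE_PROMPTS = {
    "comprehension": "Restate this word problem in your own words:\n{problem}",
    "planning": "Name the operations needed, but do not compute them:\n{problem}",
    "execution": "Compute the value of: {expression}",
}

def probe(stage: str, **fields) -> str:
    """Run a single isolated stage and return the model's raw answer."""
    return query_llm(STAGE_PROMPTS[stage].format(**fields))
```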
The investigation proposes that biases at the text comprehension level may originate with the humans who wrote the training data: biases embedded in that text are absorbed by the models trained on it. At the solution planning phase, the models found problems involving dynamic changes, such as transfers between states, easier than static comparisons. This echoes the behavior of child learners, who likewise find dynamic-change problems less challenging.
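To make the contrast concrete, here is a matched pair of word problems of the kind this distinction implies; the examples are illustrative, not taken from the paper's materials.

```python
# Matched word-problem pair over the same quantities (illustrative examples,
# not drawn from the paper's test set).
DYNAMIC_TRANSFER = (
    "Anna had 8 marbles and gave 3 of them to Ben. "
    "How many marbles does Anna have now?"
)
STATIC_COMPARISON = (
    "Anna has 8 marbles. Ben has 3 marbles fewer than Anna. "
    "How many marbles does Ben have?"
)
# Both reduce to 8 - 3, so any accuracy gap between the phrasings reflects
# comprehension or planning difficulty rather than arithmetic difficulty.
```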
Numerical Reasoning in LLMs
Arguably the most striking finding was the absence of bias in the execution of arithmetic expressions, specifically the lack of a "carry effect." In human cognition, carry operations are known to place a higher demand on working memory, leading to increased difficulty. The LLMs studied, however, showed no degradation in performance when carrying was required, a significant departure from human cognitive patterns in numerical reasoning.
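A straightforward way to probe for a carry effect is to compare accuracy on matched sets of additions that do and do not require carrying. The sampler below is a minimal sketch of that setup, an assumed methodology rather than the paper's code.

```python
import random

def has_carry(a: int, b: int) -> bool:
    """True if adding a and b column by column produces at least one carry.
    A carry chain can only start in a column whose raw digit sum is >= 10,
    so scanning raw column sums is sufficient."""
    while a > 0 and b > 0:
        if a % 10 + b % 10 >= 10:
            return True
        a //= 10
        b //= 10
    return False

def sample_additions(n: int, digits: int = 2, carry: bool = True, seed: int = 0):
    """Sample n pairs of `digits`-digit addends that do (or do not) carry."""
    rng = random.Random(seed)
    lo, hi = 10 ** (digits - 1), 10 ** digits - 1
    pairs = []
    while len(pairs) < n:
        a, b = rng.randint(lo, hi), rng.randint(lo, hi)
        if has_carry(a, b) == carry:
            pairs.append((a, b))
    return pairs

# e.g. prompt "Compute 47 + 38" for each pair and score against a + b
carry_set = sample_additions(20, carry=True)
no_carry_set = sample_additions(20, carry=False)
```

Holding the number of digits constant across both sets is the key design choice: otherwise operand length, rather than carrying itself, could explain any accuracy gap.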
This contrast may highlight fundamental differences between LLM memory mechanisms and the limitations of human working memory. It also prompts questions about the composition of LLM training datasets, and whether the arithmetic expressions they contain are varied enough that no such numerical bias could take hold.
Implications and Future Directions
These findings have practical implications for the design and deployment of educational technology using LLMs. The biases observed in text comprehension and solution planning underscore the models' potential to mimic human-like reasoning in earlier problem-solving stages. Consequently, educators and technologists should consider these cognitive biases when leveraging LLMs for educational purposes.
However, the absence of the carry effect bias suggests that caution is warranted when relying on LLMs to replicate human-like numerical reasoning. Therefore, careful validation against human cognitive processes is vital, especially if these models are to be used as accurate representations of student problem-solving behaviors.
As an extension of this work, future research might examine other cognitive biases that are absent in adults but potentially present in children. Exploring different instructional prompting strategies could also reveal how models replicate nuanced human thought processes. Finally, analyzing the behavior of LLMs across various languages might uncover additional layers of complexity in cognitive modeling.
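On the prompting direction, one concrete starting point is to hold the problem fixed and vary only the instructional framing. The variants below are hypothetical illustrations, not prompts from any published study.

```python
# Hypothetical instructional framings for one fixed problem; the wording is
# illustrative only.
PROMPT_VARIANTS = {
    "direct": "Solve: {problem}\nGive only the final number.",
    "step_by_step": "Solve: {problem}\nExplain each step, then state the answer.",
    "child_persona": "Answer the way a primary-school student would: {problem}",
}
```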