Can Large Language Models generalize analogy solving like people can? (2411.02348v1)

Published 4 Nov 2024 in cs.AI, cs.CL, and cs.HC

Abstract: When we solve an analogy we transfer information from a known context to a new one through abstract rules and relational similarity. In people, the ability to solve analogies such as "body : feet :: table : ?" emerges in childhood, and appears to transfer easily to other domains, such as the visual domain "( : ) :: < : ?". Recent research shows that LLMs can solve various forms of analogies. However, can LLMs generalize analogy solving to new domains like people can? To investigate this, we had children, adults, and LLMs solve a series of letter-string analogies (e.g., a b : a c :: j k : ?) in the Latin alphabet, in a near transfer domain (Greek alphabet), and a far transfer domain (list of symbols). As expected, children and adults easily generalized their knowledge to unfamiliar domains, whereas LLMs did not. This key difference between human and AI performance is evidence that these LLMs still struggle with robust human-like analogical transfer.

Evaluation of LLMs in Generalizing Analogy Solving Across Domains

The paper "Can LLMs generalize analogy solving like children can?" by Stevenson et al. investigates the comparative abilities of humans and LLMs in solving analogies and transferring this skill to unfamiliar domains. The central question posed by the authors is whether LLMs can emulate the human child's capacity to generalize analogy-solving methods across different contexts, such as the transition from familiar alphabets to unfamiliar symbol sets.

The paper assesses mechanisms of analogical reasoning, a cornerstone of cognitive development and learning in humans. By transferring abstract relational rules from known contexts, humans can adeptly handle analogies in novel domains, as in mapping "body : feet" to "table : legs." Numerous studies support the idea that humans can make such contextual shifts even at a young age. The paper therefore asks whether LLMs, which have shown proficiency on analogy problems in certain contexts, possess the same capacity for adaptive generalization.

The researchers devised a task engaging children, adults, and various LLMs, including models from Anthropic, Google (Gemma), OpenAI (GPT), and Meta (Llama). The participants were tasked with solving letter-string analogies across familiar (Latin alphabet), near-transfer (Greek alphabet), and far-transfer (symbol set) domains. The findings confirm that human participants, including children, readily generalized across domains. In contrast, LLMs displayed robust performance only in the well-known Latin alphabet domain. Their ability to generalize to the Greek alphabet was weaker, and performance declined sharply on symbol-based analogies, marking an apparent limitation in these models' analogical reasoning.
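To make the task format concrete, the following is a minimal sketch of how letter-string analogy items of the kind described above could be generated across the three domains. The symbol set, the specific alphabets, and the single "replace the last element with its successor" rule are illustrative assumptions for this sketch, not the authors' actual stimulus-generation code.

```python
# Minimal sketch of the letter-string analogy format used in the study
# ("a b : a c :: j k : ?"), instantiated over a familiar (Latin), a
# near-transfer (Greek), and a far-transfer (symbol) alphabet. The symbol
# set and the single rule shown are illustrative assumptions only.

LATIN   = list("abcdefghijklmnopqrstuvwxyz")
GREEK   = list("αβγδεζηθικλμνξοπρστυφχψω")
SYMBOLS = list("!@#$%&*+=<>?/^~|")  # arbitrary ordered "alphabet" of symbols

def increment_last(seq, alphabet):
    """Replace the final element of seq with its successor in alphabet."""
    *head, last = seq
    return head + [alphabet[(alphabet.index(last) + 1) % len(alphabet)]]

def make_item(alphabet, source_start=0, target_start=9):
    """Build one analogy item A : B :: C : D governed by the same abstract rule."""
    a = [alphabet[source_start], alphabet[source_start + 1]]
    c = [alphabet[target_start], alphabet[target_start + 1]]
    return a, increment_last(a, alphabet), c, increment_last(c, alphabet)

for name, alpha in [("Latin", LATIN), ("Greek", GREEK), ("Symbols", SYMBOLS)]:
    a, b, c, d = make_item(alpha)
    print(f"{name}: {' '.join(a)} : {' '.join(b)} :: {' '.join(c)} : ?   (answer: {' '.join(d)})")
```

The Latin row reproduces the example from the abstract (a b : a c :: j k : ?, answer j l); the same abstract rule carries over unchanged to the other two alphabets, which is precisely the transfer that human participants managed and the LLMs largely did not.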

The LLMs' reduced performance in unfamiliar domains points to a lack of flexible abstraction, a quality innate to human intelligence and critical for robust generalization. This inflexibility raises concerns about the current models' assumed capability to understand and apply conceptual rules beyond familiar contexts. The authors further analyzed the errors made by LLMs, revealing a predisposition toward literal rule application and difficulty with predecessor and two-place successor operations, which likely impairs their analogy solving in novel domains.
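For readers unfamiliar with these operation types, the sketch below shows one plausible reading of "predecessor" and "two-place successor" rules as shifts within an ordered alphabet. The function names and the choice to apply the shift to the final element are assumptions made for illustration, not the paper's exact operationalization.

```python
# Hypothetical illustration of the two operation types named in the error
# analysis; the paper's exact operationalization may differ.

LATIN = list("abcdefghijklmnopqrstuvwxyz")

def shift_last(seq, alphabet, offset):
    """Shift the final element of seq by `offset` positions in alphabet."""
    *head, last = seq
    return head + [alphabet[(alphabet.index(last) + offset) % len(alphabet)]]

def predecessor(seq, alphabet):          # e.g. ['j', 'k'] -> ['j', 'j']
    return shift_last(seq, alphabet, -1)

def two_place_successor(seq, alphabet):  # e.g. ['j', 'k'] -> ['j', 'm']
    return shift_last(seq, alphabet, +2)

print(predecessor(["j", "k"], LATIN), two_place_successor(["j", "k"], LATIN))
```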

The paper's major implication lies in its challenge to the notion of LLMs approximating human-like reasoning, particularly in the domain of transfer learning. This paper contributes to the ongoing discourse on the cognitive capabilities of LLMs and suggests revisiting the underlying architectures and training paradigms if broader generalization is desired. The findings emphasize the need for models capable of abstracting and transferring relational concepts across domains with differing surface-level similarities, a hallmark of genuine analogical reasoning.

Future research may explore how training regimes might be modified to foster such robust abstraction in LLMs. Incorporating multisensory or symbolic interaction data could enhance the models' adaptability and generalization. Additionally, examining whether alternative model designs that focus on explicit reasoning mechanisms can overcome the observed limitations remains an essential avenue of inquiry. The paper thus provides valuable insights and prompts further investigation into machine learning and artificial general intelligence, encouraging innovations that bring LLMs closer to human-like intelligence.
