Analogical Reasoning Inside Large Language Models: Concept Vectors and the Limits of Abstraction (2503.03666v1)

Published 5 Mar 2025 in cs.CL and cs.LG

Abstract: Analogical reasoning relies on conceptual abstractions, but it is unclear whether LLMs harbor such internal representations. We explore distilled representations from LLM activations and find that function vectors (FVs; Todd et al., 2024) - compact representations for in-context learning (ICL) tasks - are not invariant to simple input changes (e.g., open-ended vs. multiple-choice), suggesting they capture more than pure concepts. Using representational similarity analysis (RSA), we localize a small set of attention heads that encode invariant concept vectors (CVs) for verbal concepts like "antonym". These CVs function as feature detectors that operate independently of the final output - meaning that a model may form a correct internal representation yet still produce an incorrect output. Furthermore, CVs can be used to causally guide model behaviour. However, for more abstract concepts like "previous" and "next", we do not observe invariant linear representations, a finding we link to generalizability issues LLMs display within these domains.

Summary

  • The paper investigates how large language models represent abstract concepts, finding that representations of verbal concepts achieve invariance while more abstract ones like 'previous' or 'next' do not, which hinders generalization.
  • Function Vectors (FVs) were found to be task-specific and not invariant, while Representational Similarity Analysis identified invariant Concept Vectors (CVs) for verbal concepts in early-to-mid layer attention heads.
  • The study highlights limitations in LLM abstraction capabilities, suggests that abstract concepts are not captured by invariant linear representations, and advocates exploring a broader range of representation geometries.

The paper "Analogical Reasoning Inside LLMs: Concept Vectors and the Limits of Abstraction" investigates the internal representations of abstract concepts by LLMs. The authors focus on understanding whether LLMs develop internal abstractions that support analogical reasoning, particularly through the lens of function vectors (FVs) and concept vectors (CVs).

Key Findings and Methods

  1. Function Vectors (FVs):
    • The paper examines FVs, compact representations of in-context learning (ICL) tasks, and finds that they are not invariant to variations in input format (e.g., open-ended versus multiple-choice questions). This indicates that FVs encode denser, task-specific information rather than pure concepts (see the FV sketch after this list).
  2. Concept Vectors (CVs):
    • Using representational similarity analysis (RSA), the authors identify a subset of attention heads that encode invariant CVs for verbal concepts such as 'antonym.' These vectors act as conceptual feature detectors that operate independently of the final model output (an RSA sketch follows this list).
  3. Invariance and Generalization:
    • CVs demonstrate invariance for verbal concepts across prompt formats and languages, but not for more abstract concepts such as 'previous' and 'next.' The absence of invariant CVs in these domains highlights challenges in the generalizability of LLMs' conceptual understanding.
  4. Representational strategies:
    • The paper shows that representations of abstract concepts like 'previous' and 'next' are neither linear nor invariant, which hinders the models' ability to generalize across tasks. Instead, LLMs appear to rely on memorized sequences (e.g., the alphabet, days of the week) when performing these tasks.
  5. Portability and Causality:
    • CVs can guide model behavior more effectively than FVs in certain contexts, albeit with limitations. While CVs show promise for capturing abstract representations that influence behavior, FV interventions retain an advantage in specific zero-shot scenarios.
  6. Analytical Techniques:
    • Activation patching is applied to assess the causal effects of specific neural activations, but the paper finds it insufficient on its own for comprehensively capturing latent conceptual information (a minimal patching sketch follows this list).
  7. Architectural Insights:
    • Attention heads encoding verbal concepts are found predominantly in early-to-mid layers of the transformer architecture, and model performance improves as the number of ICL examples increases, suggesting that the formation of invariant representations is driven by in-context task exposure.
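
The FV construction of Todd et al. (2024) can be summarized in a few lines of tensor arithmetic. The sketch below is a minimal illustration using randomly generated tensors in place of real model activations: per-head outputs are averaged over ICL prompts for a task, the averages of a chosen set of heads are summed into a single function vector, and that vector is added to the residual stream of a zero-shot prompt. The dimensions and the choice of heads are illustrative assumptions, not values from the paper.

```python
import torch

# Illustrative dimensions (assumptions, not values from the paper).
n_prompts, n_layers, n_heads, d_model = 32, 24, 16, 1024

# head_outputs[p, l, h] = output of attention head (l, h) at the final token
# of ICL prompt p for the task (here: random stand-ins for real activations).
head_outputs = torch.randn(n_prompts, n_layers, n_heads, d_model)

# 1. Task-conditioned mean activation of every head, averaged over prompts.
mean_head_act = head_outputs.mean(dim=0)          # (n_layers, n_heads, d_model)

# 2. A small set of heads with high causal effect on the task (hypothetical picks).
causal_heads = [(9, 3), (11, 7), (14, 0)]

# 3. The function vector is the sum of those heads' mean activations.
fv = torch.stack([mean_head_act[l, h] for l, h in causal_heads]).sum(dim=0)

# 4. Intervention: add the FV to the residual stream of a zero-shot prompt
#    at an intermediate layer to steer the model toward the task behaviour.
resid_zero_shot = torch.randn(d_model)            # stand-in residual stream state
steered_resid = resid_zero_shot + fv
print(steered_resid.shape)                        # torch.Size([1024])
```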
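
RSA compares how a candidate attention head organizes a set of inputs against an idealized "concept" model: both are turned into representational dissimilarity matrices (RDMs) and then correlated. The sketch below, using NumPy/SciPy and random stand-in activations, shows the basic recipe; a head whose RDM correlates highly with the concept RDM across prompt formats would be a CV candidate. The prompt counts and toy concept model are illustrative assumptions.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

rng = np.random.default_rng(0)

# Stand-in activations of one attention head for 20 prompts:
# 10 instantiate the "antonym" concept, 10 do not, presented in
# mixed surface formats (e.g., open-ended vs. multiple-choice).
n_prompts, d_head = 20, 64
head_acts = rng.normal(size=(n_prompts, d_head))
concept_labels = np.array([1] * 10 + [0] * 10)    # 1 = antonym prompt

# Empirical RDM: pairwise correlation distances between head activations.
rdm_head = pdist(head_acts, metric="correlation")

# Model RDM for a pure concept detector: distance 0 within a concept,
# distance 1 across concepts, regardless of surface format.
rdm_concept = pdist(concept_labels[:, None], metric="hamming")

# RSA score: rank correlation between the two (vectorized) RDMs.
rho, _ = spearmanr(rdm_head, rdm_concept)
print(f"RSA score for this head: {rho:.3f}")
```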
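
Activation patching tests whether an activation carries causally relevant information by transplanting it from a "clean" run into a "corrupted" run and measuring the change in output. The sketch below uses PyTorch forward hooks on GPT-2 from Hugging Face transformers and, for simplicity, patches an entire attention block's output at the final token rather than a single head; the prompts and layer index are arbitrary choices for illustration, not the paper's setup.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

clean = tok("The opposite of hot is", return_tensors="pt")
corrupt = tok("The opposite of big is", return_tensors="pt")
layer = 6                                       # arbitrary layer to patch
attn = model.transformer.h[layer].attn
cache = {}

def save_hook(module, inputs, output):
    # The attention module may return a tuple; the first element is its output.
    out = output[0] if isinstance(output, tuple) else output
    cache["clean"] = out.detach().clone()

def patch_hook(module, inputs, output):
    out = output[0] if isinstance(output, tuple) else output
    patched = out.clone()
    patched[:, -1, :] = cache["clean"][:, -1, :]  # patch the final-token activation
    if isinstance(output, tuple):
        return (patched,) + output[1:]
    return patched

with torch.no_grad():
    h = attn.register_forward_hook(save_hook)
    model(**clean)                              # clean run: cache the activation
    h.remove()

    h = attn.register_forward_hook(patch_hook)
    logits = model(**corrupt).logits            # corrupted run with patched activation
    h.remove()

print(tok.decode(logits[0, -1].argmax()))       # inspect the patched prediction
```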

The authors conclude that while verbal concepts achieve a degree of conceptual invariance in LLMs, abstract concepts do not exhibit similarly robust representations. This discrepancy points to future research on better modelling human-like abstraction and analogical reasoning in LLMs. They caution against relying solely on the linear representation hypothesis, advocating a broader investigation of the geometry and organization of learned representations in neural models.