Zero-Shot Analogical Reasoning
- Zero-Shot Analogical Reasoning is the ability to infer and apply abstract relational mappings without in-task training, relying solely on pre-trained knowledge.
- It employs diverse methodologies—symbolic models, neural networks, and compositional modules—to generalize analogies across domains such as symbolic, visual, and narrative contexts.
- Empirical benchmarks show robust performance in controlled settings, while also revealing limitations like surface bias, brittleness under counterfactuals, and challenges in handling far analogies.
Zero-shot analogical reasoning is the capacity of a computational system to solve analogy problems in novel domains or with novel inputs, relying exclusively on pre-trained knowledge and a single prompt, without any task-specific adaptation or in-context exemplars. In formal terms, this entails inferring an abstract transformation from the relationship between a source pair (A, B) and applying it to a target C to yield the analogous target D, that is, solving problems of the form A : B :: C : ? (Lewis et al., 2024). This faculty underpins the flexible generalization characteristic of human cognition and has become a central focus in both cognitive modeling and contemporary AI research.
1. Formal Definitions and Theoretical Frameworks
In zero-shot analogical reasoning, systems solve analogy problems of the form A : B :: C : ?, where the mapping from A to B must be abstracted and applied to infer D given C. This setup spans domains including symbolic analogies (e.g., word pairs, letter-strings), visual matrices (e.g., Raven’s Progressive Matrices), and complex structures such as narratives or event graphs. Zero-shot status is guaranteed when no supervision or in-task exemplars are present at inference time; all generalization must arise from knowledge and abstractions acquired in pre-training or architectural biases (Webb et al., 2022, Sourati et al., 2023).
Human-like analogical reasoning, as instantiated in models such as LISA and DORA, emerges from learning explicit predicate–argument structures, dynamic role–filler bindings (e.g., via temporal synchrony), and structural alignment algorithms that enforce one-to-one mappings while preserving higher-order relations (Doumas et al., 2019). Neural network frameworks achieve zero-shot transfer by ensuring the representations (of predicates, relations, and bindings) are sufficiently abstract and compositional to generalize across task domains without re-training (Hill et al., 2019, Wu et al., 2020).
2. Benchmarks and Task Protocols
Canonical zero-shot analogical reasoning tasks include:
- Letter-string analogies: Given a transformation between a pair in one symbolic space (e.g., abc → abd), apply the same mapping to a target (e.g., ijk → ?) (Webb et al., 2022, Lewis et al., 2024).
- Digit matrix reasoning: Raven-inspired matrices of integers or symbols, with the task of inferring a missing entry based on an underlying rule (constant, progression, logic, permutation, etc.) (Webb et al., 2022).
- Semantic and story analogies: Narrative or event-graph analogies requiring alignment at the level of system or relational mappings beyond surface similarities (Sourati et al., 2023).
Benchmark datasets are carefully controlled to avoid overlap between training and test domains, including holding out full domains, attribute–relation pairs (e.g., progression-of-color never seen during training), or transferring between disparate domains (e.g., visual to symbolic) (Wu et al., 2020, Hill et al., 2019). In cognitive-inspired settings, performance is typically measured by accuracy (fraction of correctly completed analogies), with human baselines established for comparison.
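The letter-string protocol above can be sketched in a few lines: infer a per-position alphabet shift from the source pair and apply it to the target. This is a minimal illustration of the task format, not a model from any cited work; the function names are ours.

```python
# Minimal zero-shot letter-string analogy sketch: infer a per-position
# alphabet shift from a source pair and apply it to an analogous target.
# Illustrative only; real systems must discover the rule, not assume it.

def infer_shift(source: str, transformed: str) -> list[int]:
    """Infer the per-character alphabet shift mapping source -> transformed."""
    return [(ord(t) - ord(s)) % 26 for s, t in zip(source, transformed)]

def apply_shift(target: str, shifts: list[int]) -> str:
    """Apply the inferred shifts to an analogous target string."""
    return "".join(chr((ord(c) - ord("a") + d) % 26 + ord("a"))
                   for c, d in zip(target, shifts))

shifts = infer_shift("abc", "abd")   # only the last letter advances by one
print(apply_shift("ijk", shifts))    # -> "ijl"
```

Note that this sketch hard-codes a single transformation family (alphabet shifts); the benchmarks probe whether a system can abstract such rules without being told the family in advance.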
3. Model Architectures and Methodologies
Zero-shot analogical reasoning models span symbolic, connectionist, and LLM architectures:
- Symbolic/Structured Models: These instantiate explicit predicate discovery, dynamic binding (e.g., via oscillatory gating mechanisms), and mapping via one-to-one constraint-satisfaction using similarity metrics over learned predicate and argument weights (Doumas et al., 2019). Zero-shot transfer is achieved when relational representations can be mapped to novel domains without parameter updating.
- End-to-End Neural Networks: Robust capacities are induced not merely through architectural complexity but by training objectives that enforce abstraction—particularly via Learning by Contrasting Abstract relational structures (LABC), where negative examples instantiate semantically different relations on the same data, forcing the network to encode and generalize relational structure rather than surface cues (Hill et al., 2019, Wu et al., 2020).
- Compositional Neural Modules: In compositional learners such as SCL, object, attribute, and relation extractors are explicitly decoupled and composed, with shared parameters and "scattering" mechanisms. Each module must generalize in composition to unseen (attribute, relation) pairs, affording robust zero-shot performance on tasks such as Raven’s Matrices (Wu et al., 2020).
- Probabilistic Analogical Mapping: visiPAM exemplifies the synthesis of learned perceptual representations (from self-supervised transformers for 2D or point-cloud segmenters for 3D) with Bayesian graph-matching algorithms inspired by cognitive theory (soft isomorphism, joint node/edge similarity). Zero-shot mapping arises because no part of the mapping mechanism is trained on analogy tasks, relying instead on cognitive priors and expressive visual encoders (Webb et al., 2022).
- LLMs: Transformer-based LLMs (e.g., GPT-3/4, PaLM) can exhibit strong zero-shot analogical reasoning in both text and language-encoded visual problems. Analogical capacity emerges spontaneously when prompts encode the analogy structure, with further improvement available via analogical or chain-of-thought prompting that scaffolds multi-step relational mapping (Webb et al., 2022, Sourati et al., 2023, Yasunaga et al., 2023).
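As a toy illustration of the graph-matching idea behind probabilistic analogical mapping (not visiPAM's actual implementation, which uses graduated assignment over learned embeddings), the sketch below brute-forces the one-to-one node mapping that maximizes a joint node- and edge-similarity score:

```python
# Toy analogical graph mapping: find the one-to-one node mapping that
# maximizes combined node similarity and edge (relation) overlap,
# echoing the joint objective of cognitive mapping models. Illustrative only.
from itertools import permutations

def best_mapping(nodes_a, nodes_b, edges_a, edges_b, node_sim, w=0.5):
    """Return the source->target node mapping with the highest score.

    nodes_*: lists of node labels; edges_*: sets of (u, v) relation pairs;
    node_sim: similarity function over node labels, in [0, 1];
    w: weight trading off node similarity against relational overlap.
    """
    best, best_score = None, float("-inf")
    for perm in permutations(nodes_b, len(nodes_a)):
        m = dict(zip(nodes_a, perm))
        node_score = sum(node_sim(a, b) for a, b in m.items())
        edge_score = sum((m[u], m[v]) in edges_b for (u, v) in edges_a)
        score = w * node_score + (1 - w) * edge_score
        if score > best_score:
            best, best_score = m, score
    return best

# Classic solar-system / atom analogy: with zero surface similarity,
# relational structure alone determines the mapping.
mapping = best_mapping(["sun", "planet"], ["nucleus", "electron"],
                       {("sun", "planet")}, {("nucleus", "electron")},
                       node_sim=lambda a, b: 0.0)
print(mapping)  # -> {'sun': 'nucleus', 'planet': 'electron'}
```

Brute-force search is exponential in graph size; practical systems replace it with soft (probabilistic) assignment, but the objective has the same two-part structure.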
4. Empirical Findings and Quantitative Results
Multiple studies have demonstrated that, under certain configurations, AI systems can attain or even surpass human-level accuracy in zero-shot analogical reasoning tasks:
- Text analogies (GPT-3/4): Accuracy ranges from 61% (generative letter-string problems) to above 90% (multiple-choice matrix and verbal analogies), often exceeding human controls. Nevertheless, analogy difficulty and generalization level impact performance; challenging cases (e.g., 3-fold generalizations or far analogies) degrade LLM performance to near or below human baselines (Webb et al., 2022, Sourati et al., 2023).
- Visual analogies (SCL, LABC-trained nets): Zero-shot accuracy on held-out (attribute, relation) pairs can reach 90–98%, a 2–3-fold improvement over less compositional baselines (e.g., CoPINet at 35%) (Wu et al., 2020). In symbolic settings, explicit contrastive negative generation (LABC) raises zero-shot accuracy from as low as 25% (random negatives) to 89–95% (Hill et al., 2019).
- Cross-modal mappings (visiPAM): Without any analogy-specific training, visiPAM outperforms supervised part-matching models on real-image datasets (error reductions of ≈30%), and produces human-correlated behavior in 3D part mapping to a level statistically indistinguishable from inter-human agreement (Webb et al., 2022).
- Chain-of-thought and analogical prompting (LLMs): Self-generated analogical exemplars consistently outperform zero-shot and few-shot CoT baselines on GSM8K, MATH, and BIG-Bench, with improvements up to 7% absolute. Performance saturates for K=3–5 diverse self-generated exemplars (Yasunaga et al., 2023).
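The structure of an analogical prompt can be sketched as a simple template: rather than supplying exemplars, the prompt asks the model to recall relevant exemplars itself before solving. The wording below is illustrative, not the exact prompt from Yasunaga et al. (2023).

```python
def analogical_prompt(problem: str, k: int = 3) -> str:
    """Build a zero-shot analogical prompt: the model is asked to
    self-generate k relevant, diverse exemplars before solving.
    Wording is illustrative, not a published prompt."""
    return (
        f"Problem: {problem}\n\n"
        f"First, recall {k} relevant and diverse example problems, "
        "and solve each of them step by step.\n"
        "Then, using the insights from those examples, "
        "solve the original problem step by step."
    )

print(analogical_prompt("What is the units digit of 7^2024?"))
```

The reported saturation at K=3–5 suggests diminishing returns from additional self-generated exemplars beyond that range.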
5. Failure Modes, Robustness, and Limitations
Despite strong performance on standard benchmarks, contemporary systems—especially LLMs—exhibit distinct limitations in robustness and depth of zero-shot analogical reasoning:
- Surface-level bias: Models tend to privilege surface similarities ("surface mapping") over deeper relational or system analogies, often failing in "far analogy" conditions where only high-level structure is shared (Sourati et al., 2023). Even with explicit instructions or solved demonstrations, LLMs require step-wise, chain-of-thought prompting to approach human accuracy on deep analogies.
- Counterfactual and adversarial variants: When letter-string analogy tasks are constructed over permuted or entirely novel symbol alphabets, LLM performance degrades sharply (e.g., accuracy drops from 45%–48% to as low as 13%–19%), whereas human accuracy remains invariant. Similar brittleness is observed in digit matrix problems when the blank is moved or symbols replace digits, and in story analogy tasks where answer order or paraphrasing is manipulated (Lewis et al., 2024).
- Order and lexicality effects: LLMs are susceptible to answer-order biases and lexical overlap shortcuts (e.g., choosing the first-listed story or over-relying on surface paraphrase), whereas humans utilize event-graph structure and are invariant to such manipulations (Lewis et al., 2024, Sourati et al., 2023).
- Auxiliary demands: Certain transformations (e.g., counting-based letter shifts) elicit high performance only if the LLM can deploy external code execution. This suggests a lack of internal symbolic counting or composition, in contrast to human subitizing or abstract position coding (Webb et al., 2024).
- Gaps in abstraction: Although compositional, modular networks attain strong generalization across held-out (attribute, relation) pairs or new domain combinations, extension to richer, more open-ended scenes, or to multi-step, hierarchical analogies, remains non-trivial (Wu et al., 2020).
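The counterfactual letter-string variants described above can be generated by permuting the alphabet: surface symbols change while the successor relation, defined over the permuted ordering, is preserved. A minimal sketch of such task construction (our own illustration, not the generator used by Lewis et al., 2024):

```python
# Construct a counterfactual letter-string analogy over a permuted
# alphabet: surface form changes, relational structure does not.
import random

def permuted_alphabet_task(seed: int = 0):
    """Return ((source, source_transformed), target, answer) for an
    'abc -> abd'-style analogy in a shuffled alphabet ordering."""
    rng = random.Random(seed)
    alphabet = list("abcdefghijklmnopqrstuvwxyz")
    rng.shuffle(alphabet)                # novel symbol ordering

    def succ(c: str) -> str:             # successor in the permuted ordering
        return alphabet[(alphabet.index(c) + 1) % 26]

    a, b, c = alphabet[0], alphabet[1], alphabet[2]
    source = (a + b + c, a + b + succ(c))          # analogue of abc -> abd
    target = alphabet[8] + alphabet[9] + alphabet[10]
    answer = target[:2] + succ(target[2])
    return source, target, answer

source, target, answer = permuted_alphabet_task(seed=0)
print(source, target, "->", answer)
```

Because the abstract rule (advance the last symbol) is unchanged, human accuracy is invariant under this manipulation, which is what makes the LLM degradation diagnostic of surface-level processing.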
6. Strategies for Improving Zero-Shot Analogical Reasoning
Empirical and theoretical analyses suggest several directions to enhance zero-shot analogical reasoning:
- Contrastive and compositional training: Training objectives that force models to distinguish between plausible but structurally distinct relational patterns (as in LABC) yield more abstract and transferable internal representations (Hill et al., 2019).
- Explicit relational modules: Architectures incorporating object, attribute, and relation extraction modules, parameter sharing, and "split–share–merge" mechanisms (as in SCL) promote robustness and compositionality (Wu et al., 2020).
- Structured analogical mapping: Integrating graph-based Bayesian mapping algorithms with high-capacity perceptual encoders enables generalization to unseen scenes and object categories (as shown by visiPAM) (Webb et al., 2022).
- Prompt engineering and chain-of-thought scaffolding: In LLMs, analogical prompting—which requests self-generated, diverse exemplars tailored to each test query—and in-context chain-of-thought reasoning both shift the model toward abstraction and away from surface bias, especially on far analogies (Sourati et al., 2023, Yasunaga et al., 2023).
- Benchmark design: Including counterfactual, paraphrased, and symbol permutation variants—where surface form is altered but relational structure is conserved—provides a more robust diagnostic of genuine abstract reasoning (Lewis et al., 2024).
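The contrastive idea behind LABC-style training can be illustrated with a margin loss that pushes the score of the correct relational completion above every negative that instantiates a different relation on the same data. This is a schematic sketch of the objective's shape, not the published implementation:

```python
# Schematic LABC-style contrastive objective: the correct completion
# must outscore each structurally different negative by a margin.
def margin_contrastive_loss(score_pos: float, scores_neg: list[float],
                            margin: float = 1.0) -> float:
    """Hinge loss over (positive, negatives) scores; zero only when the
    positive beats every negative by at least `margin`."""
    return sum(max(0.0, margin - (score_pos - s)) for s in scores_neg)

# Negatives are plausible items realizing *different* relations on the
# same panels, so surface cues alone cannot drive the loss to zero.
loss = margin_contrastive_loss(score_pos=2.0, scores_neg=[0.5, 1.5, 2.2])
print(round(loss, 2))  # -> 1.7
```

The key design choice is in the negatives: sampling them to differ in relational structure, rather than at random, is what forces the learned representations to encode the relation itself.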
7. Current Perspectives and Open Challenges
Current research converges on the view that zero-shot analogical reasoning, while demonstrably emergent in large-scale foundation models and modular neural architectures, remains limited by surface bias and brittleness to minimal perturbations. The highest-performing models leverage compositional inductive biases, explicit relational abstraction, and—when applicable—prompt engineering or contrastive training to approach human-level generalization (Sourati et al., 2023, Wu et al., 2020, Webb et al., 2022, Yasunaga et al., 2023, Lewis et al., 2024).
Major unresolved topics include:
- Mechanistic interpretability: Elucidating the internal representations in LLMs or modular networks that underpin analogical mapping, variable binding, and structural alignment.
- Extension to multimodal and hierarchical analogies: Building systems that can map structures across modalities (e.g., text-to-vision, 2D-to-3D) and compose multi-step or higher-order analogies (Webb et al., 2022, Wu et al., 2020).
- Training procedures that induce invariance: Developing objectives and data regimes that yield models robust to counterfactual surface manipulations and order effects.
- Cognitive fidelity: Closing the gap between artificial and human analogizers by integrating explicit event- or object-graph structure, symbolic reasoning, and flexible attention mechanisms at scale.
Overall, zero-shot analogical reasoning is an active area at the interface of cognitive science, machine learning, and natural intelligence, motivating the design of both new benchmarks and mechanisms that transcend pattern matching, enabling true abstraction and transfer.