KAAR: Augmenting Abstract Reasoning in AI
- KAAR designates a family of methods that integrate explicit knowledge priors—from ontologies to causal models—into AI systems, boosting abstract reasoning and generalization.
- It employs stage-wise augmentation and symbolic graph abstraction to reduce search space, improve interpretability, and enhance performance on benchmarks like ARC.
- Empirical evaluations reveal significant gains in accuracy and systematic generalization, underscoring KAAR's robust approach to complex reasoning tasks.
Knowledge Augmentation for Abstract Reasoning (KAAR) is a family of methodologies aimed at enhancing the abstract reasoning capabilities of artificial intelligence systems, particularly large language models (LLMs) and program synthesis engines. These approaches introduce explicit, structured, and often human-aligned "core knowledge" priors—as ontologies, symbolic abstractions, staged augmentations, or causal models—into the reasoning process. The objective is to enable stronger generalization, compositionality, and human-comparable abstraction in tasks such as the Abstraction and Reasoning Corpus (ARC), symbolic program induction, commonsense inference, and broader cognitive benchmarks. KAAR techniques have been instantiated across various modalities, including ontology-guided LLM prompting, symbolic knowledge graph construction, algebraic reasoning backends, and causal data augmentation, achieving consistent gains in accuracy and generalization over non-augmented baselines.
1. Ontological Structuring and Staged Knowledge Augmentation
A central insight of KAAR is the decomposition of cognitive priors into a dependency-aware ontology. In the ARC benchmark, core priors are organized into three hierarchical levels: (1) objectness, (2) geometry/topology/numbers, and (3) goal-directedness. This stratification enables stage-wise augmentation, where only the minimal set of relevant priors is provided to the solver at each reasoning step, thereby limiting interference and supporting controlled abstraction (Lei et al., 23 May 2025). Specifically:
- Level 1 (Objectness): Groups pixels into components via raw matrices, whole-image abstractions, spatial splits, and connected-component clustering (4- and 8-connected); a minimal sketch follows this list.
- Level 2 (Geometry, Topology, Counting): Computes attributes (size, shape, bounding box, symmetry), relations (touching/non-touching, containment, spatial alignment), and statistics (counts, frequency).
- Level 3 (Goal-Directedness): Encodes parameterized transformations (color changes, movement, extension, completion, resizing, selection, copy, flip, rotation, cropping).
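As a concrete illustration of the Level-1 objectness prior, below is a minimal sketch of connected-component clustering over an ARC-style grid. The grid encoding (nested lists of color integers, 0 as background) and the function name are illustrative assumptions, not details from the cited paper.

```python
from collections import deque

def connected_components(grid, connectivity=4):
    """Group same-colored, non-background cells into objects (Level-1 objectness).

    grid: list of lists of ints, with 0 assumed to denote background.
    connectivity: 4 or 8, matching the two clustering variants listed above.
    Returns a list of components, each a list of (row, col) coordinates.
    """
    offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    if connectivity == 8:
        offsets += [(-1, -1), (-1, 1), (1, -1), (1, 1)]
    h, w = len(grid), len(grid[0])
    seen, components = set(), []
    for r in range(h):
        for c in range(w):
            if grid[r][c] == 0 or (r, c) in seen:
                continue
            comp, queue = [], deque([(r, c)])
            seen.add((r, c))
            while queue:  # breadth-first flood fill over same-colored neighbors
                cr, cc = queue.popleft()
                comp.append((cr, cc))
                for dr, dc in offsets:
                    nr, nc = cr + dr, cc + dc
                    if (0 <= nr < h and 0 <= nc < w
                            and (nr, nc) not in seen
                            and grid[nr][nc] == grid[r][c]):
                        seen.add((nr, nc))
                        queue.append((nr, nc))
            components.append(comp)
    return components
```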
Pseudo-code in the source formalizes a loop that progressively samples candidate solutions at each augmentation stage, re-invoking a program-synthesis backbone after every layer; a schematic reconstruction follows. This staged process both reduces the search space and improves robustness to irrelevant or noisy priors.
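The sketch below is a hedged reconstruction of that staged loop, assuming a black-box `synthesize` backbone and one prior extractor per ontology level; all names and the task encoding are illustrative.

```python
def staged_solve(task, synthesize, prior_levels, budget_per_stage=10):
    """Stage-wise knowledge augmentation: reveal priors one ontology level at a
    time (objectness -> geometry/topology/counting -> goal-directedness) and
    re-invoke the synthesis backbone after each layer.

    task: ARC-style dict with 'train' (input, output) pairs and a 'test' input.
    synthesize: backbone mapping (task, priors, n) -> candidate programs.
    prior_levels: ordered prior extractors, one per ontology level.
    """
    priors = []
    for level in prior_levels:
        priors.append(level(task))  # augment with only the next level's priors
        for program in synthesize(task, priors, n=budget_per_stage):
            if all(program(inp) == out for inp, out in task["train"]):
                return program  # early exit: the minimal prior set suffices
    return None  # unsolved within the staged budget
```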
2. Symbolic and Graph-Based Abstraction
KAAR methodologies extend to fully symbolic regimes, as exemplified by knowledge graph-based solvers (Lim et al., 27 Nov 2024). Here, input–output pairs are converted into layered knowledge graphs (ARCKGs), with nodes representing pixels, objects, grids, and paired grids, and edges abstracting domain-specific relationships via a Domain-Specific Language (DSL). The reasoning process involves:
- Extraction of core features by intersecting attributes across training examples (specifier),
- Synthesis of candidate transformation programs constrained by these core features,
- Abductive filtering to select solutions that match all provided training pairs.
This symbolic KAAR reduces the combinatorial search space, increases interpretability, and provides quantitative gains, especially for grid-level properties such as height, width, and color set prediction (e.g., 66.5% height/width/color (H/W/C) accuracy with knowledge graphs versus 32.3% without, for synthesizers using ten DSL operations at search depth two).
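The following sketch illustrates the specify-then-abduce control flow under stated assumptions: `attribute_fns` stands in for knowledge-graph-derived features and `dsl_ops` for the DSL primitives; neither reflects the actual ARCKG implementation.

```python
from itertools import product

def specify(train_pairs, attribute_fns):
    """Specifier: intersect attribute values across all training examples to
    obtain 'core features' that any candidate program must preserve."""
    feature_sets = [
        {name: fn(inp, out) for name, fn in attribute_fns.items()}
        for inp, out in train_pairs
    ]
    first = feature_sets[0]
    return {k: v for k, v in first.items()
            if all(fs[k] == v for fs in feature_sets[1:])}

def abduce(train_pairs, dsl_ops, attribute_fns, depth=2):
    """Enumerate DSL compositions up to `depth`, prune candidates that violate
    the intersected core features, then keep only programs that reproduce
    every training output (abductive filtering)."""
    core = specify(train_pairs, attribute_fns)
    solutions = []
    for ops in product(dsl_ops, repeat=depth):
        def program(grid, ops=ops):
            for op in ops:
                grid = op(grid)
            return grid
        inp0, _ = train_pairs[0]
        pred0 = program(inp0)
        if any(attribute_fns[name](inp0, pred0) != value
               for name, value in core.items()):
            continue  # violates a core feature -> prune before the full check
        if all(program(inp) == out for inp, out in train_pairs):
            solutions.append(ops)
    return solutions
```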
3. Causal and Data Augmentation Perspectives
KAAR can take the form of explicit causal modeling and principled data augmentation. The CausalARC framework expresses reasoning tasks as structural causal models (SCMs), allowing the generation of rich observational, interventional, and counterfactual data streams for both neural and symbolic learners (Maasch et al., 3 Sep 2025). In this paradigm:
- Tasks are sampled from SCMs defined by exogenous variables, endogenous grid features, and structural equations.
- Data generated includes L1 (observational), L2 (interventional, via do-interventions on the SCM), and L3 (counterfactual) examples.
- In-context learning, program synthesis, and causal discovery are facilitated by templated prompts that encode causal structure.
Empirically, causal data augmentation improves systematic generalization and supports advanced forms of reasoning, such as counterfactual inference and program induction, under severe data-limited and out-of-distribution settings.
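A toy sketch of how a single SCM yields all three data streams follows; the two-variable model and its structural equations are invented for illustration and are far simpler than CausalARC's actual task SCMs.

```python
import random

def sample_scm(do=None, noise=None):
    """Toy structural causal model over two grid features, illustrating the
    L1/L2/L3 data streams described above (variables are illustrative).

    do: optional dict of hard interventions, e.g. {'color': 3}.
    noise: optional exogenous values, reused to compute counterfactuals.
    """
    do = do or {}
    u = noise or {"u_color": random.randint(0, 9), "u_size": random.random()}
    color = do.get("color", u["u_color"])                # X := U_X unless do(X=x)
    size = do.get("size", int(color * u["u_size"]) + 1)  # Y := f(X, U_Y) unless do(Y=y)
    return {"color": color, "size": size, "_noise": u}

# L1: observational sample
obs = sample_scm()
# L2: interventional sample under do(color=3)
intv = sample_scm(do={"color": 3})
# L3: counterfactual -- reuse the factual exogenous noise under the intervention
cf = sample_scm(do={"color": 3}, noise=obs["_noise"])
```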
4. Algebraic and Hybrid Neuro-Symbolic Models
Hybrid approaches such as ALANS bridge object-centric perception and algebraic abstraction for tasks like Raven's Progressive Matrices. These architectures employ a neural frontend for object detection and an algebraic backend operating over induced group-like structures, enabling on-the-fly operator induction, execution, and isomorphic decoding (Zhang et al., 2021). This design allows for:
- End-to-end trainability via object-CNNs and matrix representation theory,
- Closed-form operator induction for symbolic reasoning,
- Systematic generalization across seen and unseen task decompositions, outperforming pure connectionist baselines (e.g., ALANS >78% systematicity, compared to <60% for prior models).
Such hybrid designs merge the flexibility of neural representations with the systematic power of explicit algebraic reasoning.
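One way to realize closed-form operator induction is a least-squares solve over panel representations, as sketched below; this is an interpretation of the algebraic backend, not ALANS's exact formulation, and all names are illustrative.

```python
import numpy as np

def induce_operator(row_a, row_b):
    """Closed-form operator induction in the spirit of ALANS's algebraic
    backend: given matrix representations of the first two panels across
    rows, solve min_T ||A T - B||_F via the pseudo-inverse."""
    A = np.stack([r.flatten() for r in row_a])  # (n, d) panel representations
    B = np.stack([r.flatten() for r in row_b])
    return np.linalg.pinv(A) @ B                # (d, d) induced operator

def execute_and_score(operator, panel, candidates):
    """Apply the induced operator to the third panel's representation and
    rank answer candidates by distance to the prediction (decoding)."""
    pred = panel.flatten() @ operator
    dists = [np.linalg.norm(pred - c.flatten()) for c in candidates]
    return int(np.argmin(dists))                # index of the best candidate
```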
5. Empirical Evaluation, Metrics, and Comparative Benchmarks
KAAR methods have been extensively evaluated across several reasoning domains:
| KAAR Variant | Benchmark/Task | Gains over Baseline | Generalization Properties |
|---|---|---|---|
| Hierarchical Ontology+RSPC | ARC (400 unique problems) | +5% absolute; up to 64.5% relative (Lei et al., 23 May 2025) | Maintains or reduces generalization gap |
| Abductive Symbolic | ARC (H/W/C) | HWC: 66.5% (KG) vs. 32.3% (no-KG) (Lim et al., 27 Nov 2024) | Strong color abstraction, interpretable |
| Causal Data Augmentation | CausalARC/ARC | Test-time training accuracy ≈ 46–47.1%; counterfactual accuracy up to 100% on certain tasks (Maasch et al., 3 Sep 2025) | Handles OOD and low-data regimes via SCMs |
| Neuro-Symbolic | RPM Systematicity Splits | ALANS 78–80% (Systematicity/Productivity/Localism) (Zhang et al., 2021) | Generalizes systematically across splits |
Category-wise analysis shows that movement-related tasks and medium-sized problems benefit most; performance degrades for high-dimensional grids, but KAAR outperforms baselines robustly in standard settings. Category overlaps between strong and weak LLMs increase under KAAR, indicating convergence in solvability across model families.
6. Comparative Methodologies and Extensions
KAAR principles are instantiated in diverse forms beyond ARC, including:
- Visual Imagery DSLs: Direct manipulation of grid representations via cognitively inspired primitive operations and program synthesis, facilitating interpretable reasoning chains (Ainooson et al., 2023).
- Chain-of-Thought Knowledge Augmentation: Augmenting task inputs with CoT traces extracted from LLMs, enabling smaller models to leverage large-scale pretraining knowledge for improved accuracy across commonsense, arithmetic, and symbolic reasoning (e.g., up to +15–20% accuracy on major benchmarks) (Wu et al., 2023); a minimal sketch of this pattern follows the list.
- Risk-Adaptive Search: Integrating dynamic retrieval-augmented generation with risk assessment and Monte Carlo Tree Search for open-ended, multi-hop reasoning, yielding up to 23% exact-match (EM) gains over prior knowledge-augmented reasoning methods (Zhang et al., 15 Apr 2025).
- Abstract Reasoning Induction in Temporal QA: Separating a knowledge-agnostic phase (method selection) from a knowledge-based phase (fact execution), with constructivist meta-learning from both successful and failed traces, yielding relative accuracy gains of 29.7% and 9.27% on temporal QA datasets (Chen et al., 2023).
- Commonsense Conceptualization: Inducing abstract commonsense knowledge graphs from event-centric corpora via contextualized conceptualization, with 2.95M valid abstract triples, improving zero-shot QA accuracy over baselines (He et al., 2022).
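As referenced in the chain-of-thought item above, the following is a minimal sketch of the augmentation pattern, assuming generic text-to-text callables for the teacher and student models; the prompt wording is an illustrative assumption.

```python
def cot_augment(question, teacher_generate, student_answer):
    """Chain-of-thought knowledge augmentation: elicit a rationale from a
    large 'teacher' LLM, then condition a smaller 'student' model on it.

    teacher_generate, student_answer: assumed str -> str callables wrapping
    whatever models are available (illustrative, not a fixed API).
    """
    rationale = teacher_generate(
        f"Q: {question}\nA: Let's think step by step."
    )
    # Prepend the extracted reasoning trace to the student's input
    augmented = f"Q: {question}\nReasoning: {rationale}\nFinal answer:"
    return student_answer(augmented)
```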
7. Future Directions and Open Challenges
Challenges facing KAAR include scalability of symbolic search and DSL design, integration of abstract knowledge with end-to-end neural systems, and extending the representational ontology to real-world domains (vision, physics, social reasoning). Proposed directions encompass:
- Automated or meta-learned DSL expansion (e.g., via DreamCoder-style abstraction learning),
- Hybrid architectures combining LM-generated rules with symbolic planners,
- More expressive conceptualization to cover modalities, exceptions, and non-monotonic logic,
- Parameter-efficient LLM adaptation for large-scale conceptualization and abstraction tasks.
Despite substantial progress, the ARC benchmark remains challenging for both pure neural and augmented systems; the KAAR paradigm continues to illuminate gaps in existing AI and guide principled routes toward more genuine abstract reasoning and generalization (Lei et al., 23 May 2025, Lim et al., 27 Nov 2024, Maasch et al., 3 Sep 2025, He et al., 2022).