Raven’s Progressive Matrices (RPM)

Updated 30 March 2026
  • Raven’s Progressive Matrices (RPM) is a set of abstract visual puzzles designed to measure fluid intelligence through analogical and inductive reasoning.
  • RPM has driven advancements in computational reasoning, inspiring diverse approaches from hand-engineered symbolic systems to deep neural models achieving high accuracy on benchmark datasets.
  • RPM is widely applied in cognitive assessment, neuro-symbolic machine reasoning, and visual question answering, serving as a rigorous test for systematic generalization and interpretability.

Raven’s Progressive Matrices (RPM) are a family of abstract visual reasoning tasks originally developed as human intelligence tests to assess fluid intelligence, particularly analogical and inductive reasoning. Each RPM item presents a 3×3 grid of panels composed of abstract visual patterns, with the lower-right (or an arbitrary) cell missing. Participants must select or generate the panel that best completes the matrix according to latent relational rules that govern the transformation and combination of visual attributes. The RPM task has become a central benchmark in computational models of abstract reasoning, inspiring a diverse spectrum of architectures, from hand-engineered symbolic systems to state-of-the-art deep neural networks, which seek to achieve, and explain, high-level human-like pattern discovery and analogical generalization.

1. RPM Problem Structure and Rule Taxonomy

An RPM instance consists of a 3×3 grid with one missing panel, together with a choice set of eight candidate panels (selection task) or, in generative settings, the requirement to reconstruct the correct missing image. Rules governing correct completions act on panel attributes: number, position, type (shape), size, and color. Attribute-wise rules are conventionally classified into types such as "Constant" ($x_{r+1,j} - x_{r,j} = 0$), "Progression" ($x_{r+1,j} = x_{r,j} + c$ for $c \neq 0$), "Arithmetic" ($x_{r,3} = x_{r,1} \oplus x_{r,2}$), and "Distribute-Three" (the set $\{x_{r,1}, x_{r,2}, x_{r,3}\}$ takes all three distinct values of each attribute per row/column) (Li, 3 Oct 2025). Modern RPM datasets such as RAVEN and I-RAVEN systematically vary these attributes and rules, supporting rigorous generalization tests by controlling answer-set structure and eliminating candidate bias (Hu et al., 2020, Li, 3 Oct 2025). The I-RAVEN dataset, in particular, uses an "Attribute Bisection Tree" (ABT) to generate candidate sets balanced in their attribute violations, directly addressing the shortcut vulnerability found in the original RAVEN (Hu et al., 2020).
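The four rule types above can be made concrete with a toy checker over integer-coded attribute matrices. The encoding below is an illustrative sketch (with $\oplus$ taken as ordinary addition), not the representation actually used by RAVEN or I-RAVEN:

```python
import numpy as np

def classify_rule(M):
    """Classify the rule governing one attribute of a 3x3 RPM grid.

    M is a 3x3 integer array holding one attribute value (e.g. a size
    index) per panel. Returns the first matching rule name, or None.
    Checks are ordered: a matrix may satisfy several rules at once.
    """
    rows = [M[r, :] for r in range(3)]
    diffs = M[1:] - M[:-1]
    # Constant: x_{r+1,j} - x_{r,j} = 0 for every column j
    if np.all(diffs == 0):
        return "Constant"
    # Progression: x_{r+1,j} = x_{r,j} + c for one fixed nonzero c
    if np.all(diffs == diffs[0, 0]) and diffs[0, 0] != 0:
        return "Progression"
    # Arithmetic: x_{r,3} = x_{r,1} + x_{r,2} (addition standing in for ⊕)
    if all(row[2] == row[0] + row[1] for row in rows):
        return "Arithmetic"
    # Distribute-Three: each row holds the same three distinct values
    value_sets = [frozenset(row) for row in rows]
    if all(len(s) == 3 for s in value_sets) and len(set(value_sets)) == 1:
        return "Distribute-Three"
    return None
```

The same predicates apply column-wise after transposing `M`, which is how row/column rule variants are usually handled.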

2. Computational Paradigms: Historical Progression and Core Algorithms

Symbolic and Rule-Based Models. Early approaches encoded panels as symbolic attribute vectors, applied hand-crafted rules (AND, OR, XOR, progression), and performed combinatorial search for compatible rule-attribute compositions. Systems such as Hunt's Gestalt algorithm, FAIRMAN/BETTERMAN, and CSP generators formalized RPM solutions as constraints in logic or algebraic formalisms (Yang et al., 2023, Xu et al., 2023). Algebraic machine reasoning, for example, encodes each panel as a monomial ideal in a polynomial ring, uses Gröbner bases to extract set-theoretic invariances (intra-, inter-, compositional), and selects candidate answers based on set-intersection over extracted invariances, achieving 93.2% accuracy on I-RAVEN (Xu et al., 2023).
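In the same spirit as these early symbolic systems, the rule-and-search loop can be sketched for a single attribute of a selection task. The rule predicates here are illustrative stand-ins, not any cited system's actual rule base:

```python
# Toy symbolic solver for one attribute of a selection-task RPM.
# A candidate value is accepted if a single rule predicate holds for
# all three completed rows; this mirrors the combinatorial search over
# rule-attribute compositions in early hand-engineered solvers.
RULES = {
    "constant": lambda r: r[0] == r[1] == r[2],
    "progression": lambda r: r[1] - r[0] == r[2] - r[1] != 0,
    "arithmetic": lambda r: r[2] == r[0] + r[1],
}

def pick_answer(context, candidates):
    """context: three rows of attribute values; context[2] has only two
    entries (the missing cell). Returns (chosen value, rule name)."""
    for value in candidates:
        rows = context[:2] + [context[2] + [value]]
        for name, rule in RULES.items():
            if all(rule(r) for r in rows):
                return value, name
    return None, None
```

Real systems search jointly over many attributes and richer rule sets; the structure of the loop, however, is the same.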

Connectionist and Hybrid Models. Cognitive architectures such as ACT-R and systems combining qualitative spatial representations with analogy engines (e.g., CogSketch+SME) integrate symbolic abstraction with neural or heuristic perception modules (Yang et al., 2023).

End-to-End Deep Networks. Recent deep models span several families:

  • Panel-wise Siamese CNN+MLP/LSTM: encodes each panel and then aggregates features; achieves only modest RPM performance (Małkiński et al., 2022).
  • Relation Networks (RN, WReN, MRNet, LEN, MXGNet, CoPINet, DCNet, SRAN, SCL): learn to compose representations of panel tuples and explicitly model relations via MLPs or higher-order pooling (Jahrens et al., 2020, Hu et al., 2020, Wu et al., 2020, Zhuo et al., 2022).
  • Scattering Compositional Learner (SCL): formalizes the pipeline into object, attribute, and relation detectors, forcing strict compositionality and delivering strong zero-shot generalization on previously unseen attribute-rule combinations (Wu et al., 2020).
  • Transformers and Slot Attention Architectures (e.g., STSN, SAViR-T, task-decomposition transformers): process matrix panels or decomposed object slots with self-attention mechanisms, often achieving top performance and strong interpretability via explicit symbolic description heads (Mondal et al., 2023, Sahu et al., 2022, Kwiatkowski et al., 2023).
  • Two-stage Rule Induction (TRIVR): separates perception and reasoning via analytic rule discovery (e.g., least-squares estimation of linear changes in attribute vectors per row), yielding full transparency and facilitating few-shot adaptation (He et al., 2021).
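A minimal numpy sketch of the shared-encoder, relation-scoring pattern underlying WReN-style models follows. Weights are random and panels are toy 20×20 arrays; real systems use trained CNN encoders and MLP relation modules:

```python
import numpy as np

rng = np.random.default_rng(0)

D = 32  # panel embedding size (illustrative)
W_enc = rng.normal(size=(D, 20 * 20)) / 20.0   # stand-in for a CNN encoder
W_rel = rng.normal(size=(1, 2 * D)) / D        # pairwise relation scorer

def encode(panel):
    """Linear stand-in for the shared (Siamese) panel encoder."""
    return W_enc @ panel.ravel()

def score_candidate(context_panels, candidate):
    """Sum pairwise relation scores between the candidate and each of
    the eight context panels -- a simplified relation-network readout."""
    c = encode(candidate)
    return sum((W_rel @ np.concatenate([encode(p), c])).item()
               for p in context_panels)

def choose(context_panels, candidates):
    """Select the candidate with the highest total relation score."""
    scores = [score_candidate(context_panels, a) for a in candidates]
    return int(np.argmax(scores))
```

The design point this illustrates is weight sharing: every panel passes through the same encoder, so relational structure must be captured by the pairwise scoring stage rather than by per-position features.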

3. Learning Paradigms: Supervised, Unsupervised, and Generative Reasoning

Supervised Learning. The majority of neural approaches are trained by cross-entropy over the correct answer index, sometimes augmented with auxiliary rule-label or meta-target objectives that encode explicit ground-truth relational structure (Małkiński et al., 2022).

Unsupervised Abstract Reasoning. To remove dependence on ground-truth labels, recent advances introduce unsupervised paradigms featuring:

  • Pseudo-Target Labeling (MCPT, NCD): Convert the unsupervised problem into a supervised binary classification by leveraging RPM priors; for example, labeling both observed rows as positive and all candidate completions as negative (with known noise) (Zhuo et al., 2021).
  • Contrastive Reasoning (Pairwise Relations Discriminator, PRD): Frame RPM as a relation comparison problem between the true context rows and each candidate-filled third row, training models to discriminate relational similarity without answer keys. PRD uses a frozen ResNet-18 backbone for row-wise feature extraction, dropout-regularized MLP discrimination, and achieves state-of-the-art unsupervised accuracy (55.9% on I-RAVEN, surpassing prior unsupervised MCPT at 28.5%) (Kiat et al., 2020).
  • Negative Answer Augmentation and Feature Decentralization: Introduce random negatives and normalize feature representations per-problem, greatly enhancing robustness and generalization (Zhuo et al., 2021).
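The pseudo-target labeling idea (MCPT-style) reduces to constructing a noisily labeled binary dataset per problem; the helper below is a schematic reading of that construction, not the published pipeline:

```python
def build_pseudo_dataset(context_rows, candidate_rows):
    """MCPT-style pseudo-labeling for one unsupervised RPM instance.

    The two complete context rows are labeled positive (they obey the
    hidden rule); every candidate-completed third row is labeled
    negative, accepting that exactly one of those negative labels is
    noise (the true answer). Returns (row, label) pairs for training
    a binary classifier without any answer keys.
    """
    data = [(row, 1) for row in context_rows]      # 2 clean positives
    data += [(row, 0) for row in candidate_rows]   # 8 noisy negatives
    return data
```

At test time the candidate whose completed row receives the highest positive score is selected, so the known label noise never has to be resolved explicitly.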

Generative and Disentangled Models. Latent variable approaches such as CRAB, RAISE, and models using latent Gaussian process priors shift the paradigm from answer selection to answer "painting" or generation, evaluating whether the model grasps the underlying rule abstractions rather than merely discriminating among candidates:

  • CRAB learns disentangled concept vectors per panel, infers rule variables via amortized nonlinear inference, and reconstructs missing panels via knowledge-guided conditional generation. Iterative EM-style learning of Gaussian mixture priors over rule variables facilitates the global abstraction of concept-changing rules, yielding state-of-the-art arbitrary-position answer generation and concept-level interpretability (Shi et al., 2023).
  • RAISE represents each image via independent latent concepts and abstracts a small set of atomic rules (convolutional functions) across all problems. The model composes candidate solutions by attributing rule selection to latent concepts and executes these rules via conditional neural modules; both bottom-right and arbitrary-position answer generation exceed baselines, including strong out-of-distribution generalization to held-out rule-attribute combinations (Shi et al., 2024).
  • Latent GP Priors further enforce smooth, interpretable rule trajectories across a matrix, allowing principled extrapolation and highly data-efficient answer painting; latent dimensions align with visual primitives (e.g., color, size) (Shi et al., 2021).

4. Inductive Biases and Architectural Innovations

Stratified and Hierarchical Embeddings. The Stratified Rule-Aware Network (SRAN) captures cellwise, row-wise, and ecological (multi-row) embedding granularities, fusing them with gated MLPs to jointly encode order sensitivity and permutation invariance—critical for robust rule induction (Hu et al., 2020). SCL's hard modular decomposition into object, attribute, and relation primitives produces linearly interpretable internal representations mapped directly to human-understandable features (Wu et al., 2020).

Object-Centric Processing. Object-centric encoders such as slot-attention modules paired with transformers (STSN) have demonstrated that extracting individual object representations prior to reasoning is a powerful inductive bias, enabling state-of-the-art performance not just on canonical RPM datasets (PGM, I-RAVEN), but on more visually complex benchmarks (CLEVR-Matrices) without any problem-specific structure (Mondal et al., 2023).

Contrastive Learning and Data Augmentation. Dual-contrast architectures explicitly model both rule-induced similarity and candidate separation, regularizing internal representations for better transfer. Data augmentation via pixel-level morphological mixup (CAM-Mix) synthesizes hard negatives near the true decision boundary, markedly reducing overfitting and boosting generalization, particularly on debiased datasets (He et al., 2021, Zhuo et al., 2022).
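The hard-negative idea behind CAM-Mix can be sketched as pixel-level blending of the true answer with a distractor; the plain convex mixup below is a simplification of the published class-activation-guided morphological mixing:

```python
import numpy as np

def mixup_negative(answer, distractor, alpha=0.5, seed=None):
    """Synthesize a hard negative panel near the decision boundary by
    convexly blending the true answer with a distractor. A Beta-mixing
    coefficient (as in standard mixup) replaces CAM-Mix's morphological,
    activation-guided masking -- a deliberate simplification."""
    rng = np.random.default_rng(seed)
    lam = rng.beta(alpha, alpha)           # in [0, 1]
    return lam * answer + (1 - lam) * distractor
```

Negatives generated this way share most low-level statistics with the answer, so a discriminator cannot dismiss them on texture alone and is pushed toward rule-level features.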

Task Decomposition and Transparent Reasoning. Direct decomposition of the RPM into subgoals (panel property prediction, answer selection via property distance) within transformer networks, as in (Kwiatkowski et al., 2023), achieves accuracy surpassing prior methods while rendering the inference process interpretable and immune to answer-set bias.
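The answer-selection subgoal of such task-decomposition models reduces, at readout time, to a nearest-neighbor match in property space; a minimal sketch, assuming integer-coded attribute vectors:

```python
import numpy as np

def select_by_property_distance(predicted_props, candidate_props):
    """Given the predicted attribute vector for the missing panel and
    the attribute vectors measured on each candidate, choose the
    candidate at minimal L1 property distance. The integer attribute
    coding is illustrative, not a specific model's output format."""
    predicted = np.asarray(predicted_props)
    candidates = np.asarray(candidate_props)
    distances = np.abs(candidates - predicted).sum(axis=1)
    return int(np.argmin(distances))
```

Because the decision is a distance in an explicit property space, the choice is interpretable and cannot exploit statistical regularities of the answer set itself.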

5. Generalization, Compositionality, and Limitations

Compositional Generalization. Systematic exclusion of specific rule-attribute pairs or entire rule types at training time (e.g., Progression or Arithmetic) exposes sharp drops in model accuracy on novel rule combinations, revealing that most existing architectures—transformers, contrastive networks, and even strong SCL-like approaches—still largely memorize statistical patterns rather than discovering generalizable abstract operators (Li, 3 Oct 2025). Explicitly compositional architectures (e.g., algebraic methods, modular scattering learners, generative models with decoupled concept/rule spaces) are relatively robust, but substantial gaps remain.

Rule Omission and Statistical Shortcuts. Studies with impartial datasets (I-RAVEN) and diagnostic splits demonstrate that "context-blind" performance—i.e., accuracy when only the answer set is visible—drops to chance in bias-free settings but remains anomalously high on biased datasets. Thus, unbiased candidate sampling and evaluation protocols are required for meaningful abstraction assessment (Li, 3 Oct 2025, Hu et al., 2020).
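A context-blind diagnostic of this kind is straightforward to express: score a model that receives only the candidate set and compare against the 1/8 chance level. The harness below assumes a simple dict-based dataset layout for illustration:

```python
def context_blind_accuracy(dataset, blind_model):
    """Diagnostic from the bias studies: evaluate a model that sees
    ONLY the eight candidates, never the 3x3 context. On a debiased
    benchmark this should sit near chance (12.5%); accuracy well above
    chance signals that the answer set itself leaks the solution.

    dataset: iterable of {"candidates": [...], "answer": int} dicts
    blind_model: callable mapping a candidate list to a chosen index
    """
    correct = sum(blind_model(item["candidates"]) == item["answer"]
                  for item in dataset)
    return correct / len(dataset)
```

Running this on both the original and the debiased candidate sets of a benchmark cleanly separates genuine relational reasoning from answer-set exploitation.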

Data Efficiency and Transfer. Two-stage models such as TRIVR, which separate perception and analytic rule induction, achieve high accuracy even with dramatically reduced training data (He et al., 2021). Nevertheless, out-of-distribution generalization, especially extrapolation beyond seen attribute values or to completely novel visual domains, remains a frontier (Mondal et al., 2023, Li, 3 Oct 2025).

6. Interdisciplinary and Practical Significance

RPM computational models have influenced a broad array of domains:

  • Visual Question Answering (VQA) and Scene Understanding: Relation Networks and transformer-based reasoners are routinely embedded within VQA pipelines to capture inter-object and inter-attribute relations (Małkiński et al., 2022).
  • Automatic Item Generation in Cognitive Assessment: Algorithmic RPM generators (PGM, RAVEN, I-RAVEN) enable fully automated, psychometrically-controlled test construction for human assessment and AI benchmarking (Yang et al., 2023).
  • Neuro-symbolic Machine Reasoning: Hybrid architectures characterized by learned perceptual encoders and symbolic or modular relational engines (e.g., abduction-and-execution models, algebraic ideal reasoners, latent generative models) are at the core of neuro-symbolic AI research, facilitating transparent and compositional reasoning (Xu et al., 2023, Shi et al., 2023).
  • General Reasoning and Meta-Learning: Advances in meta-contrastive learning, modular composition, and explicit abstraction are being transferred to broader analogical reasoning and systematic generalization benchmarks, marking RPM as a canonical challenge for domain-general intelligence (Wu et al., 2020, Li, 3 Oct 2025).

In summary, Raven’s Progressive Matrices serve as a rigorous stress test for artificial abstract reasoning due to their rule-governed, visually grounded structure and the necessity for models to bridge low-level pattern extraction with high-level combinatorial generalization. Despite remarkable progress—state-of-the-art supervised, unsupervised, and generative solvers routinely match or outperform human accuracy in i.i.d. settings—robust out-of-distribution reasoning, systematic compositionality, and interpretability remain open challenges requiring continued innovation in modularity, symbolic-neural integration, and data- and curriculum-efficient meta-learning (Małkiński et al., 2022, Li, 3 Oct 2025, Yang et al., 2023).
