
Relational Convolutional Networks

Updated 15 November 2025
  • Relational Convolutional Networks are neural architectures that extend standard convolution to model pairwise and higher-order relations among objects using learnable graphlet filters.
  • They employ group selection and attention mechanisms to aggregate relation tensors and capture complex relational patterns in visual reasoning and set formation tasks.
  • Experimental validations show that RCNs achieve high accuracy and robust generalization on out-of-distribution tasks compared to traditional ConvNets and graph-based models.

Relational Convolutional Networks (RCNs) constitute a class of neural architectures designed to generalize the convolution operation for structured, object-centric input scenarios by convolving learnable relational filters—termed "graphlets"—over groups of objects. Unlike traditional convolutional layers, which capture local patterns via spatial kernels, RCNs encode pairs, triplets, or higher-order relations between object features, enabling hierarchical modeling of compositional relational structure. Their core innovation is the relational convolution operation, which computes inner products between groupwise subsets of pairwise relation tensors and learnable graphlet templates, and aggregates these across all possible (or attentively chosen) groupings. This approach is motivated by the need for architectures with both compositional and explicit relational inductive biases, essential for tasks such as visual reasoning, set formation, and relational games.

1. Formal Definition of Relational Convolution

The principal RCN operation begins with an object sequence $X=(x_1,\ldots,x_n)$, with each $x_i\in\mathbb{R}^d$. A relation tensor

$$R[i,j] = r(x_i, x_j) \in \mathbb{R}^{d_r}$$

is built using a learnable multi-dimensional inner product relation (MD-IPR) function $r(\cdot,\cdot)$.
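
A minimal sketch of an MD-IPR-style relation, assuming one learned (query, key) projection pair per relation channel; the class and attribute names (MDIPR, query_maps, key_maps) are illustrative rather than taken from the source:

    import torch
    import torch.nn as nn

    class MDIPR(nn.Module):
        """Multi-dimensional inner product relation (illustrative sketch)."""
        def __init__(self, d_in: int, d_proj: int, d_r: int):
            super().__init__()
            # One (query, key) projection pair per relation channel.
            self.query_maps = nn.ModuleList([nn.Linear(d_in, d_proj) for _ in range(d_r)])
            self.key_maps = nn.ModuleList([nn.Linear(d_in, d_proj) for _ in range(d_r)])

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # x: (n, d_in) object features -> R: (n, n, d_r) relation tensor,
            # with R[i, j, k] = <W_k x_i, V_k x_j>.
            channels = [q(x) @ k(x).T for q, k in zip(self.query_maps, self.key_maps)]
            return torch.stack(channels, dim=-1)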

A graphlet filter bank $\mathbf{f}\in\mathbb{R}^{s\times s\times d_r\times n_f}$ is convolved against all (or attentively selected) groups $g$, each of cardinality $s$, with the relational convolution output defined as

$$R*\mathbf{f} \equiv \bigl(\mathrm{rel}(R[g],\mathbf{f})\bigr)_{g\in\mathcal{G}} \in \mathbb{R}^{|\mathcal{G}|\times n_f}, \qquad \mathrm{rel}(R[g],\mathbf{f}) = \bigl(\langle R[g],f_1\rangle, \ldots, \langle R[g],f_{n_f}\rangle\bigr),$$

where each inner product $\langle R[g], f_k\rangle$ sums over all pairwise relations and relation channels within the group of objects.

Permutation invariance within groups can be enforced by pooling the filter responses over all re-orderings of each group.
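
A minimal sketch of the relational convolution itself, assuming the relation tensor comes from an MD-IPR as above; the helper name, tensor layouts, and the exhaustive group enumeration are illustrative choices:

    import itertools
    import torch

    def relational_convolution(R: torch.Tensor, filters: torch.Tensor, groups) -> torch.Tensor:
        """R: (n, n, d_r) relation tensor; filters: (s, s, d_r, n_f) graphlet filter
        bank; groups: iterable of index tuples, each of size s.
        Returns a (|groups|, n_f) tensor of groupwise filter responses."""
        outputs = []
        for g in groups:
            idx = torch.tensor(g)
            R_g = R[idx][:, idx]  # (s, s, d_r) relation sub-tensor for this group
            # Inner product with every graphlet: sum over pairs and relation channels.
            outputs.append(torch.einsum('ijk,ijkf->f', R_g, filters))
        return torch.stack(outputs)

    # Toy usage: all size-2 groups over 4 objects, 3 relation channels, 8 graphlets.
    R = torch.randn(4, 4, 3)
    filters = torch.randn(2, 2, 3, 8)
    groups = list(itertools.combinations(range(4), 2))
    out = relational_convolution(R, filters, groups)  # shape (6, 8)

Averaging each group's response over permutations of its indices would give the permutation-invariant variant mentioned above.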

2. Graphlet Filters: Structure and Learning

Graphlet filters are learnable templates defined in $\mathbb{R}^{s\times s\times d_r}$, designed to capture complex patterns of relations among groups of objects. Each filter encodes a relational motif (e.g., “all objects share color and differ in shape”), and the convolutional inner product measures how closely a group matches those motifs.

Filters are learned by end-to-end backpropagation, jointly with the parameters of the MD-IPR relation function and any grouping mechanisms. Stacking multiple filters allows the network to represent a diverse set of relational structures, and learning is driven by supervisory signals relevant to the target relational task.
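
As a sketch of how the filter bank enters end-to-end training, the graphlet templates can simply be registered as learnable parameters alongside the MD-IPR maps; the module name and initialization below are illustrative:

    import torch
    import torch.nn as nn

    class GraphletFilterBank(nn.Module):
        def __init__(self, s: int, d_r: int, n_f: int):
            super().__init__()
            # One learnable (s, s, d_r) relational template per output channel.
            self.filters = nn.Parameter(0.02 * torch.randn(s, s, d_r, n_f))

        def forward(self, R_g: torch.Tensor) -> torch.Tensor:
            # R_g: (..., s, s, d_r) group relation sub-tensors -> (..., n_f) responses.
            return torch.einsum('...ijk,ijkf->...f', R_g, self.filters)

Because the filters, the MD-IPR projections, and any grouping parameters live in ordinary module containers, a single optimizer over the model's parameters updates them all from the task loss.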

3. RCN Architectural Composition and Group Attention

An RCN is composed of a sequence of "Relational Convolution Blocks" (RCBs), each performing four principal steps:

  1. Pairwise relation construction: For incoming object vectors at layer $\ell$, pairwise relations are computed using MD-IPR.
  2. Group selection: Groups of objects are formed, either exhaustively or via a learned group-attention mechanism. Group attention assigns a probability (via softmax) to each object belonging to a group, enabling dynamic, task-dependent selection of object subsets over which to convolve (see the sketch after this list).
  3. Relational convolution: Each group’s relation subtensor is convolved with all graphlet filters, yielding a new feature vector for that group.
  4. Stacking and normalization: Outputs from all groups are stacked and optionally passed through normalization, MLP, or residual connections for aggregation.
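
A minimal sketch of the group-attention step, assuming one learned query vector per group scored against the objects with a softmax; this soft-membership reading, and all names below, are illustrative rather than a verbatim implementation:

    import torch
    import torch.nn as nn

    class GroupAttention(nn.Module):
        def __init__(self, d_in: int, n_groups: int):
            super().__init__()
            # One learned query per group; each query scores objects for membership.
            self.group_queries = nn.Parameter(0.02 * torch.randn(n_groups, d_in))

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # x: (n, d_in) objects -> (n_groups, n) soft membership weights,
            # each row a softmax distribution over the n objects.
            scores = self.group_queries @ x.T / (x.shape[-1] ** 0.5)
            return torch.softmax(scores, dim=-1)

The membership weights can then be used to gather or pool each group's relation sub-tensor before it is convolved with the graphlet filters.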

This sequential composition induces higher-order relational representational capacity, with deeper layers able to model not just pairwise, but also triplet, quadruplet, and more complex hierarchical relations among object groups.
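
To make the hierarchy concrete, the following sketch (reusing the illustrative MDIPR and relational_convolution helpers above) stacks two relational-convolution layers: the first turns object relations into groupwise feature vectors, and the second relates those group vectors to one another, which is where triplet- and quadruplet-level structure appears. With exhaustive grouping the number of groups grows combinatorially, which is what the attentive grouping mechanism is meant to keep in check.

    import itertools
    import torch

    def two_layer_rcn(x, rel1, filters1, rel2, filters2, s=2):
        # x: (n, d) object features; rel1/rel2 are MDIPR modules; filters1/filters2
        # are graphlet banks of shape (s, s, d_r1, n_f1) and (s, s, d_r2, n_f2).
        groups1 = list(itertools.combinations(range(x.shape[0]), s))
        R1 = rel1(x)                                         # (n, n, d_r1)
        z = relational_convolution(R1, filters1, groups1)    # (|G1|, n_f1): one vector per group
        # Treat each group's feature vector as a new "object" and relate the groups.
        groups2 = list(itertools.combinations(range(z.shape[0]), s))
        R2 = rel2(z)                                          # (|G1|, |G1|, d_r2)
        return relational_convolution(R2, filters2, groups2)  # (|G2|, n_f2)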

4. Theoretical Foundations and Inductive Bias

The MD-IPR function is proven to be a universal approximator for continuous bivariate functions and positive-definite kernels, establishing that RCNs can represent any pairwise relation within the architecture’s parameterization. Hierarchical relational compositionality arises by stacking relational convolutions: the first layer encodes pairwise relations, subsequent layers operate on groupwise patterns (triplets, quadruplets), and deeper layers are in principle capable of encoding arbitrarily complex relational hierarchies.
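
One way to see the kernel part of this claim, assuming each relation channel takes the inner-product form $r_k(x,y)=\langle\phi_k(x),\psi_k(y)\rangle$ with learned feature maps (as in the sketch in Section 1), is via Mercer's theorem: a continuous symmetric positive-definite kernel admits an expansion

$$k(x,y)=\sum_{m\ge 1}\lambda_m\,\varphi_m(x)\,\varphi_m(y)\;\approx\;\langle\Phi(x),\Phi(y)\rangle,\qquad \Phi(x)=\bigl(\sqrt{\lambda_1}\,\varphi_1(x),\ldots,\sqrt{\lambda_M}\,\varphi_M(x)\bigr),$$

so a sufficiently expressive learned map inside a symmetric channel can approximate the kernel to arbitrary accuracy, and the asymmetric case $\langle\phi(x),\psi(y)\rangle$ handles general continuous bivariate functions analogously.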

This dual inductive bias, combining compositionality (from deep stacking) with explicit relational structure (from MD-IPR and graphlet convolution), differentiates RCNs from conventional graph message-passing networks and attention-based architectures, which typically lack explicit mechanisms for modeling higher-order relations.

5. Experimental Validation and Comparative Performance

RCNs have been rigorously validated on out-of-distribution (OOD) generalization tasks such as relational games on grid-arranged visual inputs and combinatorial set-games (e.g., the 5-card set-existence task). Key findings include:

  • Relational games (Hexominoes, Stripes): On binary tasks like "same," "between," "row match pattern," RCNs consistently match or outperform baselines (CoRelNet, PrediNet, Transformer, GCN/GAT/GIN, CNN), especially on the most difficult tasks requiring compositional relational reasoning. For instance, RCN achieves 87% accuracy on the hardest OOD split ("row match pattern" on stripes), versus ≤65% for alternatives.
  • Set-Game: RCN attains 97.9% test accuracy, substantially exceeding attention, recurrent, and standard graph baselines (typically in the 50–70% range). Ablations confirm that multi-dimensional relations and symmetric inner products are critical for generalization.
  • Representation analysis: PCA of relational convolution outputs reveals linear separability of relational structures (e.g., true sets vs. non-sets).

Group attention mechanisms in RCNs enable automatic discovery of relevant object subsets for each relational reasoning task, improving both interpretability and efficiency.

6. Insights, Limitations, and Prospective Directions

Analysis of RCNs supports several notable conclusions:

  • The conjunction of compositionality (deep, stacked blocks) and explicit relational inductive bias is essential for generalizing beyond seen patterns and reasoning over relational structures that cannot be represented with message-passing alone.
  • Attentive grouping greatly improves scalability and interpretability, as the network learns which object subsets to focus on for relational feature extraction.
  • Multi-dimensional relations and symmetric forms promote robust, generalizable relational embedding spaces.

Identified limitations include the fact that benchmarks to date primarily involve only second-order relations, whereas real-world scenarios may require deeper hierarchical modeling, and that RCNs assume segmented, object-centric input, leaving end-to-end integration with unsupervised object discovery as an open problem.

Future research avenues include: devising challenging benchmarks demanding true higher-order relations; unifying RCNs with slot-attention or MONet-style object-centric representation learners; embedding RCN modules in hybrid architectures with Transformer or graph-based RL agents; and thorough theoretical analysis of model expressivity in terms of group size, depth, and relation channel dimensionality.

RCNs thus offer a rigorous, extensible framework for learning hierarchical relational representations via convolution-like operations on groups of objects, advancing the intersection of compositional and relational deep learning.
