Sequential Group Composition Task
- Sequential group composition is a structured learning challenge where models predict cumulative group products from sequential inputs, enabling analysis of compositional and context-sensitive reasoning.
- It leverages algebraic properties and harmonic analysis to assess how architectures like RNNs, tree-structured MLPs, and Transformers generalize on unseen sequential compositions.
- The framework extends to emergent communication, meta-RL, and function learning, offering practical insights into architectural trade-offs and probabilistic task inference.
The sequential group composition task is a canonical structured learning challenge in which a model receives as input a sequence of elements (typically sampled from a finite group and encoded in a fixed vector space) and must predict their cumulative product or composed outcome. This domain-independent formulation provides a test-bed to analyze how neural and cognitive systems implement compositional computation, sequence-sensitive function application, and context-dependent reasoning. The sequential group composition paradigm has been rigorously analyzed in domains spanning abstract algebraic sequences, cognitive function learning, emergent communication in neural agents, and structured task inference, revealing sharp connections between model architecture, representation, and compositional generalization (Marchetti et al., 3 Feb 2026, Zhou et al., 2024, Carmeli et al., 15 Jan 2026, Ren et al., 2020).
1. Formal Definitions and Paradigms
At its core, the sequential group composition task operates over a finite group $G$ (with binary operation $\cdot$, identity $1$, and inverses), a length-$L$ sequence $(g_1, \ldots, g_L) \in G^L$, and a fixed injective encoding $\phi : G \to \mathbb{R}^d$ (often utilizing the group's left-regular representation $\rho$ acting on a "template" vector $v$, so that $\phi(g) = \rho(g)v$). The objective is to design a model $f$ that, given the encoded sequence
$$\big(\phi(g_1), \ldots, \phi(g_L)\big),$$
outputs the encoding of the total product
$$\phi(g_1 g_2 \cdots g_L).$$
Learning is typically cast as regression, minimizing the expected squared error
$$\mathcal{L}(f) = \mathbb{E}_{(g_1,\ldots,g_L)}\,\big\| f\big(\phi(g_1), \ldots, \phi(g_L)\big) - \phi(g_1 g_2 \cdots g_L) \big\|^2.$$
No linear model suffices unless the encoding is trivial; at least one non-linear layer is required (Marchetti et al., 3 Feb 2026).
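For concreteness, this setup can be instantiated for the cyclic group $\mathbb{Z}_n$ encoded via its regular representation (a minimal sketch; the template vector and sampling details are illustrative, not taken from the cited paper):

```python
import numpy as np

def regular_rep(n):
    """Left-regular representation of the cyclic group Z_n:
    element g maps to the permutation matrix shifting coordinates by g."""
    reps = np.zeros((n, n, n))
    for g in range(n):
        for x in range(n):
            reps[g, (x + g) % n, x] = 1.0
    return reps

def make_example(n, L, rng):
    """Sample a length-L sequence from Z_n; the input is the encoded
    sequence and the target is the encoding of the cumulative product."""
    reps = regular_rep(n)
    template = np.zeros(n)
    template[0] = 1.0                                     # "template" vector v
    seq = rng.integers(0, n, size=L)
    inputs = np.stack([reps[g] @ template for g in seq])  # shape (L, n)
    total = int(seq.sum() % n)                            # product in Z_n = sum mod n
    target = reps[total] @ template
    return inputs, target

rng = np.random.default_rng(0)
X, y = make_example(n=5, L=4, rng=rng)
```

The regression target `y` is the one-hot encoding of the cumulative product; a model trained on such pairs is then tested on unseen sequences.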
Beyond algebraic structure, this composition scheme extends to sequential application of functions with preconditions and effects, as in the visual function learning paradigm, in which objects (e.g., car-part trees) undergo a user- or agent-selected sequence of atomic edits or transformations, and models must predict the final object configuration, even as the contextual preconditions for composing functions may interact (Zhou et al., 2024).
2. Structural Properties and Theoretical Analysis
The learnability and sample complexity of sequential group composition are deeply influenced by the algebraic and statistical properties of the group and the encoding. Harmonic analysis plays a central role: representing group elements in terms of their irreducible representations (irreps) $\pi$, with the group-Fourier transform
$$\hat{f}(\pi) = \sum_{g \in G} f(g)\,\pi(g),$$
the learning dynamics of a shallow two-layer model can be interpreted as successive identification ("plateaus") of irreps, with the order dictated by a score combining each irrep's spectral power $\|\hat{f}(\pi)\|^2$ and its dimension $d_\pi$ (Marchetti et al., 3 Feb 2026). For Abelian groups (where all irreps are one-dimensional), features are learned in order of Fourier amplitude. For non-Abelian groups, a dimension bias favors early acquisition of low-dimensional irreps.
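For a cyclic (hence Abelian) group this spectral ordering is easy to compute: the group-Fourier transform reduces to the ordinary DFT, and the predicted acquisition order sorts irreps by spectral power. A toy sketch (the signal on $\mathbb{Z}_8$ is illustrative):

```python
import numpy as np

# On Z_n all irreps are the 1-D characters chi_k(g) = exp(2*pi*i*k*g/n),
# so the group-Fourier transform is the ordinary DFT and the spectral
# power of irrep k is |f_hat(k)|^2.
n = 8
g = np.arange(n)
f = np.cos(2 * np.pi * g / n) + 0.3 * np.cos(2 * np.pi * 3 * g / n)  # toy signal
f_hat = np.fft.fft(f)
power = np.abs(f_hat) ** 2 / n
# Irreps sorted by descending power: the predicted order of plateau escapes.
order = np.argsort(power)[::-1]
```

Here the dominant frequency pair $k \in \{1, 7\}$ is predicted to be learned before the weaker pair $k \in \{3, 5\}$.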
Critically, for a two-layer model with polynomial activation, the minimal hidden-layer width required to solve the task exactly scales exponentially with the sequence length $L$. This exponential dependence makes shallow solutions infeasible beyond modest sequence lengths.
3. Architectural Trade-offs: Depth Versus Width
The associativity of group composition enables efficient architectures. RNNs with quadratic activation and hidden width polynomial in $|G|$ can compose $L$ elements in $L$ sequential steps, recursively maintaining the composed product in the hidden state (Marchetti et al., 3 Feb 2026). Parallel, tree-structured MLPs (depth $O(\log L)$, polynomial width) support simultaneous pairwise merges, further reducing the required depth. Thus depth compensates for the width explosion inherent in shallow representations: depth is necessary and sufficient for tractable scaling.
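The two associativity-exploiting strategies can be illustrated with exact matrix products (a sketch abstracting away the learned activations; both routines compute the same composed product, but the tree version needs only $\lceil \log_2 L \rceil$ layers):

```python
import numpy as np

def sequential_compose(mats):
    """RNN-style fold: maintain the running product in a hidden state,
    one multiplication per step (L steps of depth)."""
    h = np.eye(mats[0].shape[0])
    for m in mats:
        h = m @ h
    return h

def tree_compose(mats):
    """Tree-structured merge: pairwise products in parallel layers,
    ceil(log2 L) layers of depth; associativity guarantees the same result."""
    mats = list(mats)
    while len(mats) > 1:
        nxt = []
        for i in range(0, len(mats) - 1, 2):
            nxt.append(mats[i + 1] @ mats[i])  # order-preserving pairwise merge
        if len(mats) % 2:
            nxt.append(mats[-1])
        mats = nxt
    return mats[0]
```

Because matrix multiplication is associative but not commutative, the pairwise merge must preserve left/right order, as above.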
Empirical results confirm these theoretical predictions: only deep or recurrent architectures can generalize accurately to long sequences, and shallow architectures exhibit distinct “feature-by-feature” learning curves marked by harmonic plateaus.
4. Contextual and Functional Extensions
In cognitive and AI settings, the sequential group composition framework generalizes to function composition tasks wherein functions act on structured objects (e.g., trees), each transformation depending on explicit preconditions and possibly creating or removing contexts required for subsequent functions. Canonical interaction types (“feeding,” “counter-feeding,” “bleeding,” “counter-bleeding”) probe the learner’s sensitivity to ordering and context interactions (Zhou et al., 2024).
Experimental paradigms specifically structure training to ensure that, during composition, models and humans encounter function pairings never seen together in sequence during training—enforcing genuine zero-shot compositionality rather than memorization (Zhou et al., 2024).
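A feeding/counter-feeding pair can be sketched with toy precondition-guarded edits (the car-part features and the no-op-on-unmet-precondition convention are illustrative assumptions, not details from Zhou et al., 2024):

```python
def add_rack(car):
    """Effect: mounts a roof rack; no precondition."""
    return {**car, "rack": True}

def add_box(car):
    """Precondition: a rack must be present to mount the box."""
    if not car.get("rack"):
        return car  # precondition unmet: the edit is a no-op
    return {**car, "box": True}

def compose(car, edits):
    """Apply a sequence of atomic edits left to right."""
    for f in edits:
        car = f(car)
    return car

base = {"rack": False, "box": False}
feeding = compose(base, [add_rack, add_box])          # rack "feeds" box
counter_feeding = compose(base, [add_box, add_rack])  # box applied too early
```

Only the feeding order yields a mounted box, so a learner must track how earlier edits create or destroy the context later edits require.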
Meta-learning for compositionality (MLC) equips generic sequence-to-sequence networks (e.g., Transformers) to generalize over entire “mini-universes” of function systems, using support episodes to anchor function meaning and compositional queries to assess chaining and contextual dynamics. Fine-tuning on real human-generated data and synthetic episodes matching human-like error distributions aligns model behavior closely with human performance.
5. Emergent Communication and Conceptual Decomposition
In multi-agent coordination games, sequential group composition arises as agents must construct linguistic messages that describe conjunctive concepts—each “phrase” corresponding to a Boolean function of object features (Carmeli et al., 15 Jan 2026). The Composition through Decomposition (CtD) protocol implements this in two explicit stages:
- Decompose: Agents learn to associate each atomic concept (single feature–value pair) with a discrete codeword in a codebook, using a straight-through estimator for gradient flow in non-differentiable quantization.
- Compose: Without further training, agents interpret unseen feature conjunctions by selecting the corresponding set of atomic codes, constructing messages as unordered bags of code-vectors.
Zero-shot generalization is evaluated by freezing the codebook learned in Decompose and testing on composite-phrase data; CtD achieves near-perfect compositional generalization across several structured datasets, with metrics including accuracy (ACC), adjusted mutual information (AMI), disentanglement (POS, BOS), context independence (CI), and best matching (CBM). For instance, on the “Thing” dataset, CtD achieves ACC=1.00, AMI=1.00, CI=1.00, CBM=1.00 (Carmeli et al., 15 Jan 2026). This strong zero-shot performance demonstrates the effectiveness of disentangled codebook learning for compositional communication.
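The Compose stage can be sketched as a codebook lookup over atomic feature-value pairs (a hedged illustration; the codebook here is hard-coded, whereas in CtD it is learned in the Decompose stage):

```python
# Illustrative codebook: atomic feature-value pair -> discrete code.
codebook = {("color", "red"): 3, ("color", "blue"): 7,
            ("shape", "cube"): 1, ("shape", "ball"): 5}
inverse = {code: pair for pair, code in codebook.items()}

def encode(concept):
    """Sender: conjunctive concept -> unordered bag of atomic codes."""
    return frozenset(codebook[pair] for pair in concept)

def decode(message):
    """Receiver: bag of codes -> conjunctive concept, working zero-shot
    for feature conjunctions never seen together during training."""
    return {inverse[code] for code in message}

msg = encode({("color", "red"), ("shape", "ball")})
```

Because messages are unordered bags of reusable atomic codes, any unseen conjunction decodes correctly as long as each atomic concept was grounded in the Decompose stage.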
6. Sequential Group Composition in Compositional RL and Task Inference
Real-world tasks often decompose into a sequence of sub-tasks or groupings, rendering the sequential group composition formalism central to meta-RL and hierarchical reinforcement learning. The OCEAN (Online ContExt AdaptatioN) framework addresses online inference for compositional MDPs by maintaining:
- Global latent variable: encodes the mixture or sequence of group modules, with priors (categorical, Dirichlet, Gaussian) adapted to the expected structural constraints.
- Local latent variables: Markovian or RNN-structured, indicating sub-task identity (the current group index or sub-step).
- Variational inference: product-of-experts inference for the global context and VRNN-style inference for the local variables, trained to maximize an ELBO regularized with maximum-entropy RL objectives.
OCEAN enables inference of both task structure (which groups are present or required, and in what order) and real-time position within the group sequence, without hand-labeled sub-task boundaries (Ren et al., 2020). When applied to multi-stage and sequentially structured control benchmarks (e.g., MuJoCo tasks with goal/direction switches), OCEAN yields $2\times$ or more faster adaptation than prior meta-RL baselines. Extension to generic sequential group composition involves setting the cardinality of the global latent and the transition structure of the local latents according to domain knowledge about group sequences and possible state transitions.
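The global product-of-experts step can be illustrated for Gaussian experts, where multiplying densities means precisions add (a generic sketch, not OCEAN's exact parameterization):

```python
import numpy as np

def product_of_experts(mus, sigmas):
    """Fuse Gaussian expert posteriors N(mu_i, sigma_i^2) by multiplying
    densities: precisions add, and the mean is precision-weighted."""
    precisions = 1.0 / np.square(sigmas)
    var = 1.0 / precisions.sum(axis=0)
    mu = var * (precisions * mus).sum(axis=0)
    return mu, np.sqrt(var)

# Two equally confident experts: the fused mean is their average and
# the fused variance shrinks below either expert's.
mus = np.array([[0.0], [2.0]])
sigmas = np.array([[1.0], [1.0]])
mu, sigma = product_of_experts(mus, sigmas)
```

This fusion rule is what lets each new transition sharpen the global context estimate online as evidence accumulates.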
7. Empirical Findings and Implications
Extensive experimentation across algebraic group composition (Marchetti et al., 3 Feb 2026), function composition in humans and machines (Zhou et al., 2024), emergent communication (Carmeli et al., 15 Jan 2026), and online RL (Ren et al., 2020) establishes fundamental properties:
- Zero-shot compositionality: Both humans and suitably meta-learned models achieve high accuracy composing unseen function pairs, under all interaction types, indicating flexible, context-sensitive compositional generalization (Zhou et al., 2024).
- Depth-structured solutions: RNNs and tree-structured MLPs leverage associativity, achieving perfect task performance with only polynomial width, while two-layer models scale exponentially in width with sequence length (Marchetti et al., 3 Feb 2026).
- Task structure impact: Human and model performance is insensitive to interaction type (feeding, bleeding, etc.), with error patterns reflecting function order errors, input copying, and misuse of irrelevant features (Zhou et al., 2024).
- Probabilistic task inference: OCEAN supports flexible adaptation to unknown, sequentially composed sub-task structure, inferring group order and boundaries online without labeled sub-tasks (Ren et al., 2020).
- Conceptual disentanglement: Codebook-based emergent communication protocols generalize compositionally by reusing atomic concept codes, even in untrained conjunctions (Carmeli et al., 15 Jan 2026).
These findings collectively suggest that sequential group composition tasks operate as a tractable microcosm for understanding the interplay of symmetry, architecture, gradient-based learning, and context-sensitive reasoning in deep learning and human cognition.