REMUL: Relaxed Equivariance via Multitask Learning
- The paper proposes REMUL, where relaxed equivariance is achieved by penalizing symmetry deviations through multitask learning objectives.
- It employs tensor priors, regularization, and attention mechanisms to adapt representations when symmetries are partly broken.
- Empirical results across vision, molecular dynamics, and reinforcement learning demonstrate improved generalization, interpretability, and efficiency.
Relaxed Equivariance via Multitask Learning (REMUL) is a modern paradigm in representation learning where equivariance—traditionally enforced as a hard architectural constraint—is instead approached as a flexible, data-driven property, tuned via multitask or regularized training objectives. The REMUL methodology can be instantiated through diverse model classes, including deep networks, convolutional architectures, graph neural networks, and Riemannian manifold optimizations. This approach is motivated by practical scenarios where exact symmetries (e.g., rigid roto-translational equivariance) are absent or partially broken, as in vision, molecular graphs, and dynamic systems. REMUL enables efficient generalization, interpretability, and negative transfer mitigation, often with substantial computational benefits over strict equivariant baselines (Elhag et al., 23 Oct 2024, Wu et al., 22 Aug 2024, Hsu et al., 22 Aug 2025, Chen et al., 5 May 2025).
1. Mathematical Foundations of Relaxed Equivariance
Traditionally, equivariant neural models satisfy $f(\rho_{\mathrm{in}}(g)\,x) = \rho_{\mathrm{out}}(g)\,f(x)$ for all inputs $x$ and all $g$ in a group $G$, with $\rho_{\mathrm{in}}$ and $\rho_{\mathrm{out}}$ representing group actions on input and output spaces, respectively. In REMUL, this constraint is replaced by a multitask learning objective that penalizes deviations from equivariance:

$$\mathcal{L}(f) = \mathcal{L}_{\mathrm{task}}(f) + \lambda\,\mathcal{L}_{\mathrm{equiv}}(f).$$

Here, $\mathcal{L}_{\mathrm{task}}$ is the conventional supervised or regression loss, while $\mathcal{L}_{\mathrm{equiv}}$ quantifies violation of equivariance:

$$\mathcal{L}_{\mathrm{equiv}}(f) = \mathbb{E}_{x}\,\mathbb{E}_{g \sim G}\,\big\| f(\rho_{\mathrm{in}}(g)\,x) - \rho_{\mathrm{out}}(g)\,f(x) \big\|^{2}.$$

The hyperparameter $\lambda$ modulates the strength of equivariant regularization, allowing practitioners to interpolate between strict equivariance ($\lambda \to \infty$) and unconstrained learning ($\lambda = 0$) (Elhag et al., 23 Oct 2024). In practice, this is implemented in architectures such as Transformers or GNNs, where standard layers are augmented with an equivariance penalty but otherwise remain unconstrained.
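A minimal sketch of this objective, assuming a PyTorch-style model and sampled group actions supplied as callables (names such as `remul_loss` and `group_actions` are illustrative, not from the reference implementation):

```python
import torch

def remul_loss(model, x, y, group_actions, lam=1.0):
    """REMUL-style objective: supervised loss plus a soft equivariance penalty.

    group_actions : list of (rho_in, rho_out) callables acting on inputs and
                    outputs for sampled group elements (illustrative API).
    """
    pred = model(x)
    task_loss = torch.nn.functional.mse_loss(pred, y)

    equiv_loss = 0.0
    for rho_in, rho_out in group_actions:
        # Penalize || f(rho_in(g) x) - rho_out(g) f(x) ||^2 for each sampled g.
        equiv_loss = equiv_loss + torch.nn.functional.mse_loss(
            model(rho_in(x)), rho_out(pred)
        )
    equiv_loss = equiv_loss / max(len(group_actions), 1)

    # lam -> infinity approaches strict equivariance; lam = 0 is unconstrained.
    return task_loss + lam * equiv_loss
```

For example, with point-cloud inputs and a rotation group, `rho_in` and `rho_out` could both be `lambda x: x @ R.T` for a sampled rotation matrix `R`.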
2. Multilinear and Covariance-Based Relationship Modeling
REMUL's connection to earlier multitask frameworks is evident in models such as Multilinear Relationship Networks (MRN) (Long et al., 2015) and matrix-variate multitask formulations (Zhao et al., 2017). Specifically, MRN regularizes deep task-specific layers by placing a tensor-normal prior over parameter tensors:
- Parameters for the $T$ tasks in layer $\ell$ are organized into a tensor $\mathcal{W}^{\ell} \in \mathbb{R}^{D_1 \times D_2 \times T}$ (features $\times$ classes $\times$ tasks).
- A tensor-normal prior with Kronecker-decomposable covariance is placed over this tensor: $\mathrm{vec}(\mathcal{W}^{\ell}) \sim \mathcal{N}\!\left(0,\ \Sigma_1 \otimes \Sigma_2 \otimes \Sigma_3\right)$.
$\Sigma_1$, $\Sigma_2$, and $\Sigma_3$ capture correlations among features, classes, and tasks, respectively. By learning these covariance matrices, MRN and similar REMUL approaches enable adaptive, fine-grained sharing and "relaxed equivariance" across tasks, preventing negative transfer when tasks are dissimilar and alleviating under-transfer when additional sharing aids generalization (a sketch of the induced penalty follows below).
The block-coordinate optimization and graph-theoretic reductions utilized in efficient multitask learning (Zhao et al., 2017) facilitate scalable estimation of these covariance structures, even when sample sizes are small relative to feature/task dimensionality.
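The quadratic penalty such a Kronecker-structured prior induces on the stacked task parameters can be sketched as follows; the factor order, vec ordering, and dense solve are one consistent convention chosen for clarity, and the alternating closed-form covariance updates used in practice are omitted:

```python
import torch

def tensor_normal_penalty(W, sigma_feat, sigma_class, sigma_task):
    """Quadratic penalty from a tensor-normal prior with Kronecker covariance.

    W          : (D_feat, D_class, T) stacked task-specific parameters
    sigma_*    : mode covariance matrices for features, classes, and tasks
    """
    # vec(W)^T (Sigma_task kron Sigma_class kron Sigma_feat)^{-1} vec(W);
    # the dense Kronecker product is for illustration only, practical
    # implementations exploit the factorized structure.
    cov = torch.kron(sigma_task, torch.kron(sigma_class, sigma_feat))
    # Reorder so the feature index varies fastest, matching the factor order above.
    w_vec = W.permute(2, 1, 0).reshape(-1, 1)
    return 0.5 * (w_vec.T @ torch.linalg.solve(cov, w_vec)).squeeze()
```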
3. Attention, Modulation, and Multiplicative Mechanisms
REMUL often modulates shared network features via learnable attention or multiplicative mechanisms that adapt representations to each task (Lekkala et al., 2020, Levi et al., 2020, Wang et al., 2016). In one paradigm, each task's parameter vector $w_t$ is factorized as $w_t = c \odot v_t$:
- $c$ is global, shared across tasks, indicating global feature utility.
- $v_t$ is task-specific, enabling local suppression or activation.
Optimization proceeds with a multitask loss plus regularizers on both $c$ and $\{v_t\}$, where the choice of $\ell_p$ and $\ell_q$ norms (sparsity vs. smoothness) enables control over the "relaxation" of equivariance. Analytical closed-form relations tie the shrinkage of shared and task-specific components to their respective norms and penalty strengths, permitting highly adaptive sharing (Wang et al., 2016).
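A sketch of the corresponding regularizer under these assumptions (the norm orders and penalty weights are illustrative hyperparameters, not values from the cited work):

```python
import torch

def multiplicative_mtl_penalty(c, V, p=1, q=2, lam_c=1e-3, lam_v=1e-3):
    """Regularizer for the multiplicative factorization w_t = c * v_t.

    c : (D,)   shared component indicating global feature utility
    V : (T, D) task-specific components, one row per task
    The norm orders p and q trade off sparsity vs. smoothness per factor.
    """
    return lam_c * torch.linalg.vector_norm(c, ord=p) \
         + lam_v * torch.linalg.vector_norm(V, ord=q, dim=1).sum()

# Effective per-task weights used by the model are elementwise products:
# W = c.unsqueeze(0) * V   # (T, D)
```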
Attention modules in REMUL typically operate over feature maps, computing per-task channel or spatial weighting—thus enabling the same backbone to specialize per task (Lekkala et al., 2020). This mechanism generalizes the concept of relaxed equivariance: invariant features can be reweighted according to the demands of each task or domain, allowing soft adaptation without architectural rigidity.
4. Explicit Regularization and Gradient-Based Relaxation
A recent development is the introduction of explicit regularization terms that operate directly on saliency maps or gradient structures (Bai et al., 2022, Shin et al., 25 Sep 2024). For instance, Saliency-Regularized Deep Multi-Task Learning (SRDML) aligns the input gradients between task heads $f_t$, so that related tasks, defined by similar gradient maps, become aligned in function space:

$$\Omega(f_1,\dots,f_T) = \sum_{i<j} w_{ij}\,\big\| \nabla_x f_i(x) - \nabla_x f_j(x) \big\|^{2},$$

where $w_{ij}$ encodes the learned relatedness between tasks $i$ and $j$.
This equivalence between functional and gradient similarity (see Theorem 1 in (Bai et al., 2022)) permits adaptive, interpretable control over sharing: tasks that are dissimilar can reduce regularization weights, avoiding negative transfer, while similar tasks remain strongly coupled.
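A sketch of such a saliency-alignment penalty, assuming a shared encoder with per-task heads and a pairwise coupling matrix (the weighting scheme and function names are illustrative, not the SRDML reference code):

```python
import torch

def saliency_alignment_penalty(shared_encoder, task_heads, x, task_weights):
    """Pull together the input gradients (saliency maps) of related task heads.

    task_heads   : list of per-task modules applied to the shared encoding
    task_weights : (T, T) nonnegative matrix of pairwise coupling strengths
    """
    x = x.clone().requires_grad_(True)
    z = shared_encoder(x)

    grads = []
    for head in task_heads:
        out = head(z).sum()
        # create_graph=True so the penalty itself can be backpropagated.
        g, = torch.autograd.grad(out, x, create_graph=True, retain_graph=True)
        grads.append(g.flatten(1))

    penalty = 0.0
    for i in range(len(grads)):
        for j in range(i + 1, len(grads)):
            penalty = penalty + task_weights[i, j] * (grads[i] - grads[j]).pow(2).sum()
    return penalty
```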
Dummy Gradient norm Regularization (DGR) (Shin et al., 25 Sep 2024) further promotes universality of the shared encoder by penalizing the gradient norm with respect to dummy (arbitrary) task predictors, ensuring that representations are flexible and relaxed enough to generalize well across diverse tasks and classifiers.
5. Adaptivity in Symmetry and Equivariance Constraints
The core of REMUL's advantage is in decoupling strict, hard-coded symmetry from learned, data-driven constraints (Ouderaa et al., 2022, Wu et al., 22 Aug 2024, Park et al., 6 Nov 2024). Relaxed rotational equivariance is implemented via learnable $G$-Biases added to group convolution filters, allowing small but crucial deviations per rotation element:
- Strict GConv (SRE): filters strictly shared across group elements.
- RREConv: the filter for group element $g$ is $\Psi_g = \mathcal{T}_g(\Psi) + b_g$, where $\mathcal{T}_g(\Psi)$ is the shared filter transformed by $g$ and $b_g$ is a learnable bias (see the sketch after this list).
- Empirical results indicate significant gains in classification and object detection accuracy on natural image datasets, even with minimal parameter overhead (Wu et al., 22 Aug 2024).
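A minimal sketch of this construction for the discrete rotation group $C_4$, using `torch.rot90` to transform the base filter and one learnable bias tensor per group element (class and attribute names are illustrative, not the reference implementation):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RelaxedGroupConv2d(nn.Module):
    """Relaxed rotation-equivariant convolution over C4: one base filter shared
    across rotations, plus a small learnable bias per group element that
    breaks strict weight sharing."""

    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        self.base = nn.Parameter(torch.randn(out_ch, in_ch, k, k) * 0.1)
        # One learnable bias tensor per element of C4 (|G| = 4 rotations).
        self.g_bias = nn.Parameter(torch.zeros(4, out_ch, in_ch, k, k))

    def forward(self, x):
        outs = []
        for i in range(4):
            # Strict sharing would use the rotated base filter alone;
            # the per-element bias adds a small learnable deviation.
            w = torch.rot90(self.base, i, dims=(-2, -1)) + self.g_bias[i]
            outs.append(F.conv2d(x, w, padding=w.shape[-1] // 2))
        return torch.stack(outs, dim=1)   # (B, |G|, out_ch, H, W)
```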
Non-stationary kernels and learnable frequency parameters in convolutional architectures provide continuous control over equivariance, allowing gradient-based adjustment of symmetry strength during training, subject to regularization priors that bias toward useful symmetry (Ouderaa et al., 2022).
In reinforcement learning, approximately equivariant architectures (using relaxed group and steerable convolutions) outperform strictly equivariant networks in domains where symmetries are only approximate, and show increased robustness to noise (Park et al., 6 Nov 2024). Theoretical results quantify the error bound on $Q$-function deviation under relaxed symmetry (see Theorem 1 in (Park et al., 6 Nov 2024)).
6. Geometry-Aware and Manifold-Based Relaxed Equivariance
Embedding shared representations on Riemannian manifolds enables further control and robustness for REMUL under heterogeneous task distributions (Chen et al., 5 May 2025). In GeoERM, the shared representation matrix $A \in \mathbb{R}^{p \times r}$ used across tasks is constrained to the Stiefel manifold $\mathrm{St}(p, r) = \{A : A^{\top}A = I_r\}$:
- Riemannian gradient step: projects the Euclidean gradient onto the tangent space.
- Polar retraction: ensures manifold fidelity after the update.
These operations guarantee the alignment of representations with intrinsic geometry, reducing negative transfer and improving stability under adversarial or outlier tasks, which is crucial for relaxed equivariance across tasks that vary widely in structure or label noise.
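A sketch of one such update on the Stiefel manifold, combining tangent-space projection with a polar retraction (the step-size handling and function name are illustrative assumptions):

```python
import torch

def stiefel_step(A, euclid_grad, lr=1e-2):
    """One update of A in {A : A^T A = I}: project the Euclidean gradient onto
    the tangent space at A, take a gradient step, then retract back onto the
    manifold with a polar (SVD-based) retraction."""
    # Tangent-space projection: G - A * sym(A^T G)
    AtG = A.T @ euclid_grad
    riem_grad = euclid_grad - A @ (AtG + AtG.T) / 2

    # Polar retraction: the closest orthonormal matrix to the updated point.
    U, _, Vh = torch.linalg.svd(A - lr * riem_grad, full_matrices=False)
    return U @ Vh
```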
7. Applications and Empirical Performance
REMUL methods have been validated in a variety of domains. Specific findings include:
- Dynamical system modeling (e.g., N-body problems): REMUL-trained Transformers achieve low equivariance error and perform competitively or better than hard equivariant architectures, at much faster training and inference rates (Elhag et al., 23 Oct 2024).
- Human motion capture: Intermediate levels of learned equivariance optimize prediction accuracy when real-world symmetries are broken (Elhag et al., 23 Oct 2024).
- Molecular dynamics: REMUL-based GNNs outperform EGNN baselines and data-augmented models, with optimal equivariance regularization varying per molecular structure (Elhag et al., 23 Oct 2024).
- Surface property prediction in materials science: FIRE-GNN leverages surface-normal symmetry breaking and multitask learning to halve the test MAE for work function prediction compared to prior RF models, enabling rapid and accurate screening across chemical families (Hsu et al., 22 Aug 2025).
- Vision: RREConv-augmented architectures show improvements in classification and object detection across discrete rotation groups and common image datasets (Wu et al., 22 Aug 2024).
Empirical studies consistently report gains in accuracy, reductions in generalization error, improved interpretability (via covariance or saliency maps), and pronounced computational speedups compared to strict equivariant baselines.
Relaxed Equivariance via Multitask Learning (REMUL) reconceptualizes equivariance as a learnable entity within a multitask or regularized framework. This enables deep learning systems to adapt symmetry constraints to the data, domain, or task, achieving flexible sharing, robustness to symmetry breaking, and computational efficiency. REMUL is readily compatible with a broad spectrum of model classes and optimization strategies, providing a principled avenue for learning structured, generalizable, and interpretable representations in modern AI systems.