Equivariant non-linear maps for neural networks on homogeneous spaces (2504.20974v1)

Published 29 Apr 2025 in cs.LG, math.RT, and stat.ML

Abstract: This paper presents a novel framework for non-linear equivariant neural network layers on homogeneous spaces. The seminal work of Cohen et al. on equivariant $G$-CNNs on homogeneous spaces characterized the representation theory of such layers in the linear setting, finding that they are given by convolutions with kernels satisfying so-called steerability constraints. Motivated by the empirical success of non-linear layers, such as self-attention or input dependent kernels, we set out to generalize these insights to the non-linear setting. We derive generalized steerability constraints that any such layer needs to satisfy and prove the universality of our construction. The insights gained into the symmetry-constrained functional dependence of equivariant operators on feature maps and group elements informs the design of future equivariant neural network layers. We demonstrate how several common equivariant network architectures - $G$-CNNs, implicit steerable kernel networks, conventional and relative position embedded attention based transformers, and LieTransformers - may be derived from our framework.

Summary

  • The paper introduces a framework that extends linear G-CNNs to non-linear equivariant neural network layers via integral transforms with feature-dependent kernels.
  • It proves that any G-equivariant operator can be expressed as an integral transform satisfying a generalized steerability constraint.
  • This unifying framework connects diverse architectures, including self-attention and LieTransformer, offering insights for designing expressive equivariant networks.

This paper introduces a novel mathematical framework for constructing and understanding non-linear equivariant neural network layers on homogeneous spaces. Building upon the established theory of linear equivariant convolutions (G-CNNs) on these spaces, the authors generalize the concept to arbitrary non-linear operators using integral transforms with input-dependent kernels.

The core idea is to represent an operator $\Phi$ mapping input feature maps $f$ (sections of a vector bundle over a homogeneous space $G/H$, equivalent to functions in an induced representation $\mathcal{I}_\rho$) to output feature maps (in $\mathcal{I}_\sigma$) as an integral:

$$[\Phi f](g) = \int_{G'} \omega(f, g, g') \, dg'$$

where $g \in G$ and $g' \in G'$ (which may be a different group from $G$), and $\omega: \mathcal{I}_\rho \times G \times G' \to V_\sigma$ is a map that takes the entire input feature field $f$ as an argument, in addition to the points $g$ and $g'$. The space of such maps $\omega$ must satisfy a compatibility constraint $\omega(f, gh, g') = \sigma(h^{-1})\,\omega(f, g, g')$ for $h \in H$ to ensure the output lies in the target induced representation $\mathcal{I}_\sigma$. The authors prove that any map $\Phi: \mathcal{I}_\rho \to \mathcal{I}_\sigma$ can be expressed in this integral form by choosing $\omega(f, g, g') = \delta(g')\,\lambda[f](g)$ with $\lambda = \Phi$, where $\delta$ is the Dirac delta distribution, interpreted in a distributional sense.
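To make the integral form concrete, here is a minimal numerical sketch (not from the paper) that discretizes it on the cyclic group $\mathbb{Z}_n$ acting on itself, with trivial $H$ and scalar features so that feature maps are just real-valued functions on the group; the particular `omega` below is a hypothetical, deliberately feature-dependent choice, and the Haar integral becomes a finite sum.

```python
# Minimal sketch (not from the paper): the integral transform
#   [Phi f](g) = \int_{G'} omega(f, g, g') dg'
# discretized on the cyclic group G = Z_n acting on itself (trivial H,
# so feature maps are just real-valued functions on Z_n). The integrand
# `omega` may depend on the whole feature map f, which is what makes Phi
# non-linear; this particular omega is a hypothetical choice.
import numpy as np

n = 8                       # order of the cyclic group Z_n
G = np.arange(n)            # group elements 0, ..., n-1

def omega(f, g, gp):
    """Feature-dependent, non-linear integrand (illustration only)."""
    return np.tanh(f[(g + gp) % n]) * f[gp] + 0.1 * f.mean()

def Phi(f):
    """[Phi f](g) = sum_{g'} omega(f, g, g')  (Haar measure = counting measure)."""
    return np.array([sum(omega(f, g, gp) for gp in G) for g in G])

f = np.random.randn(n)      # a random scalar feature map on Z_n
print(Phi(f).shape)         # (8,)
```

Because `omega` sees the whole feature map `f`, the operator `Phi` is non-linear in `f`; nothing in this generic form enforces equivariance yet, which is exactly what the constraint derived next adds.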

The paper then derives the crucial constraint on $\omega$ for the operator $\Phi$ to be $G$-equivariant, meaning $\Phi(kf) = k(\Phi f)$ for all group elements $k \in G$. This condition is $\omega(f, g, g') = \omega(kf, kg, g')$, which means $\omega$ must be constant on the $G$-orbits of the combined space $\mathcal{I}_\rho \times G \times G'$. This equivariance allows the map $\omega$ to be reduced to a two-argument map $\hat{\omega}(g^{-1}f, g')$, where $g^{-1}f$ represents the input feature field centered at $g$. The resulting equivariant non-linear operator takes the form:

$$[\Phi f](g) = \int_{G'} \hat{\omega}(g^{-1}f, g') \, dg'$$

The map $\hat{\omega}: \mathcal{I}_\rho \times G' \to V_\sigma$ must satisfy a generalized steerability constraint: $\hat{\omega}(hf, g'h') = \sigma(h)\,\hat{\omega}(f, g')$ for $h \in H$, $h' \in H'$, where the action on $f \in \mathcal{I}_\rho$ is defined by the induced representation structure. The authors show that any $G$-equivariant operator $\lambda: \mathcal{I}_\rho \to \mathcal{I}_\sigma$ can be represented in this form with $\hat{\omega}(g^{-1}f, g') = \delta(g')\,\lambda[g^{-1}f](e)$.
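The following minimal sketch (again on $\mathbb{Z}_n$ with trivial $H$, not from the paper) illustrates why the reduced form is automatically equivariant: any choice of `omega_hat` applied to the shifted feature map $g^{-1}f$ yields an operator that commutes with translations, which the script checks numerically.

```python
# Minimal sketch (not from the paper): the reduced form
#   [Phi f](g) = \int_{G'} omega_hat(g^{-1} f, g') dg'
# on G = Z_n with trivial H, where the shifted feature map is
# (g^{-1} f)(x) = f(x + g). Any choice of omega_hat then yields a
# translation-equivariant (generally non-linear) operator.
import numpy as np

n = 8
G = np.arange(n)

def translate(f, g):
    """(g^{-1} f)(x) = f(x + g) on the cyclic group Z_n."""
    return np.roll(f, -g)

def omega_hat(fc, gp):
    """Arbitrary non-linear integrand (hypothetical choice)."""
    return np.sin(fc[gp]) * fc[0] + fc[gp] ** 2

def Phi(f):
    return np.array([sum(omega_hat(translate(f, g), gp) for gp in G) for g in G])

# Numerical equivariance check: Phi(k f) == k (Phi f) for a group element k.
f = np.random.randn(n)
k = 3
lhs = Phi(translate(f, k))       # act on the input, then apply Phi
rhs = translate(Phi(f), k)       # apply Phi, then act on the output
print(np.allclose(lhs, rhs))     # True
```

The steerability constraint is vacuous in this toy setting because $H$ and $H'$ are trivial; for non-trivial stabilizers, `omega_hat` would additionally have to intertwine the $H$- and $H'$-actions as stated above.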

The paper highlights the distinction between this non-linear framework and linear G-CNNs. In the linear case, the kernel $\hat{\kappa}(g')$ is independent of the feature map $f$, and the integrand is the simple product $\hat{\kappa}(g^{-1}g')\,f(g')$. The domain of the kernel can then be further reduced from $G$ to the double coset space $H \setminus G / H'$. In the non-linear setting, the integrand $\hat{\omega}(g^{-1}f, g')$ depends arbitrarily on the feature map $f$, making the operation non-linear. The domain of $\hat{\omega}$ is $\mathcal{I}_\rho \times G'$, and the dependence on $f$ prevents the same kind of domain reduction seen in the linear case.
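As a sanity check on the linear/non-linear distinction, the following sketch (not from the paper) specializes the reduced construction to a factorized integrand $\hat{\kappa}(g')\,[g^{-1}f](g')$ with a feature-independent kernel and verifies numerically that the resulting operator is linear in $f$; on $\mathbb{Z}_n$ it is just a circular group correlation.

```python
# Minimal sketch (not from the paper): when the integrand factors as
#   omega_hat(g^{-1} f, g') = kappa_hat(g') * [g^{-1} f](g'),
# with kappa_hat independent of f, the general transform collapses to the
# linear group correlation (an ordinary G-CNN layer) on Z_n.
import numpy as np

n = 8
G = np.arange(n)
kappa_hat = np.random.randn(n)          # feature-independent kernel on G'

def translate(f, g):
    """(g^{-1} f)(x) = f(x + g) on the cyclic group Z_n."""
    return np.roll(f, -g)

def Phi_linear(f):
    return np.array([np.dot(kappa_hat, translate(f, g)) for g in G])

f1, f2 = np.random.randn(n), np.random.randn(n)
a, b = 2.0, -0.5
# Linearity holds precisely because kappa_hat does not depend on the feature map.
print(np.allclose(Phi_linear(a * f1 + b * f2),
                  a * Phi_linear(f1) + b * Phi_linear(f2)))   # True
```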

A significant contribution is demonstrating how various established equivariant neural network architectures are special cases of this framework, characterized by specific forms of the $\hat{\omega}$ map:

  • G-CNNs (linear convolutions): $\hat{\omega}(g^{-1}f, g') = \hat{\kappa}(g')\,[g^{-1}f](g')$, where $\hat{\kappa}$ is a steerable kernel independent of $f$. [cohen-theory-equivariant-hom]
  • Implicit Steerable CNNs [implicit-kernels]: For the Euclidean group on $\mathbb{R}^d$, this corresponds to a kernel $\kappa$ dependent on position and features: $\hat{\omega}(g^{-1} f, g') = \kappa\big( g', [g^{-1}f](e), [g^{-1}f](g') \big)\, [g^{-1} f](g')$. This generalizes the steerability condition to include feature dependence.
  • Standard Self-Attention [vaswaniAttentionAllYou2017]: For a finite set and the permutation group $S_n$, with trivial $H$ representation, the softmax attention mechanism takes the form of the general integral operator with $\hat{\omega}$ dependent on query, key, and value projections of the features, including the softmax normalization factor (see the sketch after this list).
  • Relative Position Bias Self-Attention [shaw-etal-2018-self]: For the translation group on the integers, this corresponds to a similar softmax form, but with the attention score incorporating a positional bias term $\psi(g')$ that depends only on the relative position $g'$.
  • LieTransformer [hutchinsonLieTransformerEquivariantSelfattention2021]: This continuous attention mechanism on homogeneous spaces, originally formulated for trivial feature representations, is shown to fit the framework with $\hat{\omega}(g^{-1}f, g') = \frac{1}{\mathcal{Z}_\alpha(g^{-1}f)}\, \alpha\big( [g^{-1}f](e), [g^{-1}f](g'), g' \big)\, W_V\,[g^{-1}f](g')$, where $\alpha$ is the attention kernel function and $\mathcal{Z}_\alpha$ is the normalization.
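To illustrate the standard self-attention case in the list above, here is a minimal sketch (not from the paper) of softmax self-attention on a finite set, which is equivariant under the permutation group $S_n$: permuting the input tokens permutes the output rows in the same way. The weight matrices `WQ`, `WK`, `WV` are hypothetical stand-ins for learned parameters.

```python
# Minimal sketch (not from the paper): softmax self-attention on a finite set
# is permutation-equivariant, i.e. attention(f[perm]) == attention(f)[perm].
import numpy as np

rng = np.random.default_rng(0)
n, d = 6, 4
WQ, WK, WV = (rng.standard_normal((d, d)) for _ in range(3))

def attention(f):
    """f: (n, d) array of token features; standard scaled dot-product attention."""
    q, k, v = f @ WQ, f @ WK, f @ WV
    scores = q @ k.T / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)      # softmax over the set
    return weights @ v

f = rng.standard_normal((n, d))
perm = rng.permutation(n)
print(np.allclose(attention(f[perm]), attention(f)[perm]))   # True
```

The row-wise softmax denominator plays a role analogous to the normalization factor $\mathcal{Z}_\alpha$ in the LieTransformer bullet above.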

An appendix treats the mathematical foundations in the language of distributions, providing more rigorous backing for the universality claims.

In conclusion, the paper provides a unifying theoretical framework for non-linear equivariant operators on homogeneous spaces, demonstrating that both generalized convolutions and attention mechanisms can be seen as special cases of an integral transform with a feature-dependent kernel $\hat{\omega}$. This framework offers insights into the structural requirements for designing novel equivariant layers and connects disparate architectures within a common mathematical language based on group theory and functional analysis. The practical implications include guiding the design of more expressive and efficient equivariant neural networks, potentially leveraging techniques like implicit kernels for implementation.
