- The paper introduces a framework that extends linear G-CNNs to non-linear equivariant neural network layers via integral transforms with feature-dependent kernels.
- It proves that any G-equivariant operator can be expressed as an integral transform satisfying a generalized steerability constraint.
- This unifying framework connects diverse architectures, including self-attention and LieTransformer, offering insights for designing expressive equivariant networks.
This paper introduces a novel mathematical framework for constructing and understanding non-linear equivariant neural network layers on homogeneous spaces. Building upon the established theory of linear equivariant convolutions (G-CNNs) on these spaces, the authors generalize the concept to arbitrary non-linear operators using integral transforms with input-dependent kernels.
The core idea is to represent an operator $\Phi$ mapping input feature maps $f$ (sections of a vector bundle over a homogeneous space $G/H$, equivalently functions in an induced representation $I_\rho$) to output feature maps (in $I_\sigma$) as an integral:

$$[\Phi f](g) = \int_{G'} \omega(f, g, g')\, dg'$$

where $g \in G$ and $g' \in G'$ (the two groups $G$ and $G'$ may differ), and $\omega : I_\rho \times G \times G' \to V_\sigma$ is a map that takes the entire input feature field $f$ as an argument, in addition to the points $g$ and $g'$. The space of such maps $\omega$ must satisfy the compatibility constraint $\omega(f, gh, g') = \sigma(h^{-1})\,\omega(f, g, g')$ for $h \in H$, so that the output lies in the target induced representation $I_\sigma$. The authors prove that any map $\Phi : I_\rho \to I_\sigma$ can be expressed in this integral form by choosing $\omega(f, g, g') = \delta(g')\,\lambda[f](g)$ with $\lambda = \Phi$, where $\delta$ is the Dirac delta distribution (so the expression is interpreted distributionally).
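To make the construction concrete, the following is a minimal sketch of the transform in a finite, discretized setting; the function names `integral_transform` and `omega_from_operator`, and the choice of a uniform measure, are illustrative assumptions rather than notation from the paper:

```python
def integral_transform(omega, f, G, G_prime):
    """Discretization of [Phi f](g) = integral over G' of omega(f, g, g') dg'.

    G and G_prime are finite lists of group elements standing in for the
    (possibly different) groups, and the Haar measure dg' is taken to be
    uniform, dg' = 1 / |G'|.  Because omega may use the *entire* feature
    field f, the resulting operator Phi is non-linear in general.
    """
    dg = 1.0 / len(G_prime)
    return {g: sum(omega(f, g, gp) for gp in G_prime) * dg for g in G}


def omega_from_operator(Phi, G_prime, e):
    """Universality construction omega(f, g, g') = delta(g') * Phi[f](g).

    In the finite setting the 'Dirac delta' at the identity e gets weight
    |G'| (so it integrates to one under the uniform measure), and the sum
    above collapses back to Phi[f](g).  Phi(f) is assumed to return a
    mapping indexable by g.
    """
    def omega(f, g, gp):
        delta = len(G_prime) if gp == e else 0.0
        return delta * Phi(f)[g]
    return omega
```

Feeding `omega_from_operator(Phi, G_prime, e)` back into `integral_transform` reproduces `Phi(f)` exactly, which is the discrete analogue of the paper's universality argument.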
The paper then derives the crucial constraint on $\omega$ for the operator $\Phi$ to be $G$-equivariant, meaning $\Phi(kf) = k(\Phi f)$ for all group elements $k \in G$. The condition is $\omega(f, g, g') = \omega(kf, kg, g')$, i.e. $\omega$ must be constant on the $G$-orbits of $I_\rho \times G \times G'$ (with $G$ acting jointly on $f$ and $g$ and trivially on $g'$). This invariance allows $\omega$ to be reduced to a two-argument map $\hat\omega(g^{-1}f, g')$, where $g^{-1}f$ denotes the input feature field centered at $g$. The resulting equivariant non-linear operator takes the form:

$$[\Phi f](g) = \int_{G'} \hat\omega(g^{-1}f, g')\, dg'$$

The map $\hat\omega : I_\rho \times G' \to V_\sigma$ must satisfy a generalized steerability constraint, $\hat\omega(hf, g'h') = \sigma(h)\,\hat\omega(f, g')$ for $h \in H$ and $h' \in H'$, where the action on $f \in I_\rho$ is the one given by the induced representation structure. The authors show that any $G$-equivariant operator $\lambda : I_\rho \to I_\sigma$ can be represented in this form with $\hat\omega(g^{-1}f, g') = \delta(g')\,\lambda[g^{-1}f](e)$.
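For intuition, here is a small numerical sketch of the reduced form for the cyclic group $\mathbb{Z}_n$ acting on itself (so $H$ is trivial and feature fields are plain real-valued signals); the particular non-linear `omega_hat` is an arbitrary made-up choice, the point being that equivariance holds for any such choice:

```python
import numpy as np

n = 8                                  # G = Z_n acting on itself; H = {e}
rng = np.random.default_rng(0)

def omega_hat(centered_f, gp):
    """An arbitrary non-linear integrand: it may depend on the whole centered field."""
    return np.tanh(centered_f[gp] * centered_f[0]) + 0.1 * centered_f.sum() ** 2

def Phi(f):
    """[Phi f](g) = sum_{g'} omega_hat(g^{-1} f, g'), with (g^{-1} f)(x) = f(g + x)."""
    return np.array([sum(omega_hat(np.roll(f, -g), gp) for gp in range(n))
                     for g in range(n)])

f = rng.normal(size=n)
k = 3
lhs = Phi(np.roll(f, k))               # Phi(k . f), where (k . f)(x) = f(x - k)
rhs = np.roll(Phi(f), k)               # k . (Phi f)
assert np.allclose(lhs, rhs)           # equivariance holds by construction
```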
The paper highlights the distinction between this non-linear framework and linear G-CNNs. In the linear case, the kernel $\hat\kappa(g')$ is independent of the feature map $f$, and the integrand is a simple product $\hat\kappa(g^{-1}g')\,f(g')$. The domain of the kernel can then be further reduced from $G$ to the double coset space $H \backslash G / H'$. In the non-linear setting, the integrand $\hat\omega(g^{-1}f, g')$ depends arbitrarily on the feature map $f$, which makes the operation non-linear; the domain of $\hat\omega$ is $I_\rho \times G'$, and the dependence on $f$ prevents the same kind of domain reduction as in the linear case.
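Continuing the same toy $\mathbb{Z}_n$ setting, a sketch of the linear special case: when $\hat\omega$ is just a fixed kernel multiplying a single feature value, the reduced form collapses to an ordinary group cross-correlation, i.e. a linear G-CNN layer (variable names are again illustrative):

```python
import numpy as np

n = 8
rng = np.random.default_rng(1)
f = rng.normal(size=n)
kappa = rng.normal(size=n)       # kernel on Z_n; H is trivial, so no steerability constraint

def omega_hat_linear(centered_f, gp):
    # Linear special case: a plain product, using only the single value (g^{-1} f)(g').
    return kappa[gp] * centered_f[gp]

Phi_f = np.array([sum(omega_hat_linear(np.roll(f, -g), gp) for gp in range(n))
                  for g in range(n)])

# The same layer written directly as a group cross-correlation on Z_n:
corr = np.array([sum(kappa[gp] * f[(g + gp) % n] for gp in range(n)) for g in range(n)])
assert np.allclose(Phi_f, corr)
```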
A significant contribution is demonstrating how various established equivariant neural network architectures are special cases of this framework, each characterized by a specific form of the $\hat\omega$ map:
- G-CNNs (linear convolutions): $\hat\omega(g^{-1}f, g') = \hat\kappa(g')\,[g^{-1}f](g')$, where $\hat\kappa$ is a steerable kernel independent of $f$ [cohen-theory-equivariant-hom].
- Implicit Steerable CNNs [implicit-kernels]: For the Euclidean group acting on $\mathbb{R}^d$, this corresponds to a kernel $\kappa$ that depends on position and features, $\hat\omega(g^{-1}f, g') = \kappa\big(g', [g^{-1}f](e), [g^{-1}f](g')\big)\,[g^{-1}f](g')$, which generalizes the steerability condition to include feature dependence.
- Standard self-attention [vaswaniAttentionAllYou2017]: For a finite set acted on by the permutation group $S_n$, with trivial $H$-representation, softmax attention takes the form of the general integral operator, with $\hat\omega$ depending on the query, key, and value projections of the features, including the softmax normalization factor (see the sketch after this list).
- Relative position bias self-attention [shaw-etal-2018-self]: For the translation group on the integers, this corresponds to a similar softmax form, but with the attention score incorporating a positional bias term $\psi(g')$ that depends only on the relative position $g'$.
- LieTransformer [hutchinsonLieTransformerEquivariantSelfattention2021]: This continuous attention mechanism on homogeneous spaces, originally formulated for trivial feature representations, fits the framework with $\hat\omega(g^{-1}f, g') = \frac{1}{Z_\alpha(g^{-1}f)}\,\alpha\big([g^{-1}f](e), [g^{-1}f](g'), g'\big)\,W_V\,[g^{-1}f](g')$, where $\alpha$ is the attention kernel and $Z_\alpha(g^{-1}f)$ is the corresponding normalization constant.
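To illustrate the attention case referenced above, the following sketch writes standard softmax self-attention over a set of $n$ tokens in the integrand form and checks $S_n$-equivariance numerically; the projection matrices `WQ`, `WK`, `WV` and the indexing by set elements $i, j$ (rather than group elements) are simplifying assumptions for this sketch:

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 5, 4
WQ, WK, WV = (rng.normal(size=(d, d)) for _ in range(3))

def omega_hat_attention(f, i, j):
    """Integrand form of softmax self-attention.

    The softmax normalizer makes the integrand depend on the *whole* feature
    set f, not only on f[i] and f[j], which is exactly the feature dependence
    allowed by the framework.  (Recomputing the softmax per (i, j) pair is
    redundant but keeps the integrand form explicit.)
    """
    scores = (f[i] @ WQ) @ (f @ WK).T / np.sqrt(d)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights[j] * (f[j] @ WV)

def attention(f):
    return np.array([sum(omega_hat_attention(f, i, j) for j in range(len(f)))
                     for i in range(len(f))])

f = rng.normal(size=(n, d))
perm = rng.permutation(n)
assert np.allclose(attention(f[perm]), attention(f)[perm])   # S_n-equivariance
```

The relative-position-bias and LieTransformer variants differ only in the score function inside `omega_hat_attention` (adding a bias term $\psi(g')$ or a kernel $\alpha$ on the homogeneous space), not in the overall integrand structure.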
The authors also develop the distributional foundations of these delta-based constructions in an appendix, providing more rigorous backing for the universality claims.
In conclusion, the paper provides a unifying theoretical framework for non-linear equivariant operators on homogeneous spaces, demonstrating that both generalized convolutions and attention mechanisms can be seen as special cases of an integral transform with a feature-dependent kernel $\hat\omega$. The framework offers insights into the structural requirements for designing novel equivariant layers and connects disparate architectures within a common mathematical language based on group theory and functional analysis. The practical implications include guiding the design of more expressive and efficient equivariant neural networks, potentially leveraging techniques like implicit kernels for implementation.