
Learning from Frustration: Torsor CNNs on Graphs

Published 27 Oct 2025 in cs.LG and math.AT (arXiv:2510.23288v1)

Abstract: Most equivariant neural networks rely on a single global symmetry, limiting their use in domains where symmetries are instead local. We introduce Torsor CNNs, a framework for learning on graphs with local symmetries encoded as edge potentials -- group-valued transformations between neighboring coordinate frames. We establish that this geometric construction is fundamentally equivalent to the classical group synchronization problem, yielding: (1) a Torsor Convolutional Layer that is provably equivariant to local changes in coordinate frames, and (2) the frustration loss -- a standalone geometric regularizer that encourages locally equivariant representations when added to any NN's training objective. The Torsor CNN framework unifies and generalizes several architectures -- including classical CNNs and Gauge CNNs on manifolds -- by operating on arbitrary graphs without requiring a global coordinate system or smooth manifold structure. We establish the mathematical foundations of this framework and demonstrate its applicability to multi-view 3D recognition, where relative camera poses naturally define the required edge potentials.

Summary

  • The paper introduces Torsor CNNs, a novel framework unifying gauge-equivariant learning and group synchronization for arbitrary graphs with local symmetries.
  • It demonstrates practical benefits in multi-view 3D recognition by reducing intra-class variance and improving retrieval performance through explicit feature synchronization and frustration regularization.
  • The approach offers a rigorous mathematical foundation using discrete analogues of principal bundles and sheaf theory, inspiring robust geometric regularization in deep learning architectures.

Torsor CNNs: Gauge-Equivariant Learning on Graphs with Local Symmetries

Motivation and Framework

The paper introduces Torsor CNNs, a discrete geometric deep learning framework for graphs endowed with local symmetries, formalized via edge potentials—group-valued transformations encoding the relationship between neighboring coordinate frames. Unlike classical CNNs and G-CNNs, which rely on global symmetry groups acting transitively on the domain, or Gauge CNNs, which require smooth manifold structure and parallel transport, Torsor CNNs operate on arbitrary graphs without a global coordinate system or smoothness assumptions. This generality is achieved by leveraging the mathematical equivalence between gauge-equivariant learning and the group synchronization problem, a well-studied topic in robotics, vision, and distributed sensing.

The central construction is as follows: for each edge $(u,v)$ in a graph $X=(V,E)$, an edge potential $\psi_{uv} \in G$ (e.g., a rotation matrix in $SO(3)$) specifies how to transform features from $v$'s local frame to $u$'s. Feature fields are modeled as global sections of an associated vector sheaf, and synchronization is enforced by requiring $f_u = \rho(\psi_{uv}) f_v$ for all edges, where $\rho$ is a representation of $G$ on the feature space $F$. The deviation from perfect synchronization is quantified by the frustration loss, which is gauge-invariant and can be used as a regularizer in arbitrary neural architectures.
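As a minimal sketch of this construction (assuming, for readability, $SO(2)$-valued potentials and the identity representation $\rho(g)=g$ acting on $\mathbb{R}^2$; the graph and values are hypothetical), one can build a synchronized feature field on a small path graph by propagating features through the edge potentials and then check the section condition $f_u = \rho(\psi_{uv}) f_v$ edge by edge:

```python
import numpy as np

def rot(theta):
    """2x2 rotation matrix: a simple SO(2)-valued edge potential."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

# Hypothetical 3-vertex path graph; psi[(u, v)] maps v's frame to u's frame.
psi = {(0, 1): rot(0.3), (1, 2): rot(-0.5)}

# Build a perfectly synchronized feature field (a global section): pick f[0]
# freely, then propagate f[v] = rho(psi_uv)^{-1} f[u] along edges
# (for rotations, the inverse is the transpose).
f = {0: np.array([1.0, 0.0])}
f[1] = psi[(0, 1)].T @ f[0]
f[2] = psi[(1, 2)].T @ f[1]

# The synchronization condition f_u = rho(psi_uv) f_v holds on every edge.
for (u, v), g in psi.items():
    assert np.allclose(f[u], g @ f[v])
print("synchronized")
```

Propagating through the potentials in this way always yields a section on a tree; on graphs with cycles a nonzero section exists only when the potentials are cycle-consistent, which is exactly the group synchronization problem.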

Mathematical Formalism

The framework is grounded in discrete analogues of principal bundles and sheaf theory. A network $G$-torsor is constructed from edge potentials, with stalks at vertices and edges given by $G$-torsors and restriction maps encoding the group action. The associated vector sheaf $\mathcal{E} = P^\psi \times_\rho F$ models feature fields as equivalence classes $[p, w]$, where $p$ is a local frame and $w$ a feature vector, with the relation $(p \cdot g, w) \sim (p, \rho(g) w)$ for $g \in G$.

Gauge transformations $\gamma: V \to G$ act by changing local frames, transforming edge potentials as $\psi'_{uv} = \gamma_u^{-1} \psi_{uv} \gamma_v$ and features as $f'_v = \rho(\gamma_v)^{-1} f_v$. The frustration loss

$$\eta_F(f; X, \psi) = \frac{1}{\mathrm{vol}(X)} \sum_{(u,v) \in E} \|f_u - \rho(\psi_{uv}) f_v\|^2$$

is zero if and only if $f$ is a global section (i.e., perfectly synchronized), and is invariant under gauge transformations.
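The gauge invariance can be checked numerically. The sketch below (an assumption-laden toy: $SO(2)$ potentials, identity representation, $\mathrm{vol}(X)$ taken as the edge count, which may differ from the paper's normalization) applies a random gauge transformation to both the potentials and the features and confirms the loss is unchanged, since each residual $f'_u - \rho(\psi'_{uv}) f'_v = \rho(\gamma_u)^{-1}(f_u - \rho(\psi_{uv}) f_v)$ has its norm preserved:

```python
import numpy as np

def rot(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def frustration(f, psi, vol):
    """Frustration loss with rho the identity representation."""
    return sum(np.sum((f[u] - g @ f[v]) ** 2) for (u, v), g in psi.items()) / vol

rng = np.random.default_rng(0)
psi = {(0, 1): rot(0.4), (1, 2): rot(1.1), (2, 0): rot(-0.2)}
f = {v: rng.standard_normal(2) for v in range(3)}
vol = len(psi)  # simple choice of vol(X)

loss = frustration(f, psi, vol)

# Random gauge transformation gamma: V -> SO(2):
# psi'_uv = gamma_u^{-1} psi_uv gamma_v,  f'_v = gamma_v^{-1} f_v.
gamma = {v: rot(rng.uniform(-np.pi, np.pi)) for v in range(3)}
psi_g = {(u, v): gamma[u].T @ g @ gamma[v] for (u, v), g in psi.items()}
f_g = {v: gamma[v].T @ f[v] for v in f}

# Same loss value in the transformed frames.
assert np.isclose(loss, frustration(f_g, psi_g, vol))
print("gauge invariant")
```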

Torsor Convolutional Layers

The Torsor Convolutional Layer is a gauge-equivariant linear operator on feature assignments $F^V$, parameterized by a $G$-equivariant intertwiner $K: F_{\text{in}} \to F_{\text{out}}$ satisfying $K(\rho_{\text{in}}(g) w) = \rho_{\text{out}}(g) K(w)$ for all $g \in G$. For vertex $v$, the output is

$$f_{\text{out}}(v) = \frac{1}{c_v} \sum_{u \sim v} w_{uv}\, K\!\left(\rho_{\text{in}}(\psi_{uv})^{-1} f_{\text{in}}(u)\right)$$

where $w_{uv}$ are edge weights and $c_v$ is a normalization factor. This construction ensures that global sections are preserved and that the layer is equivariant under gauge transformations. Nonlinearities must also be equivariant, which restricts admissible activation functions to those compatible with the group representation (e.g., norm-based, tensor-product, or gated nonlinearities).
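A toy instantiation of this layer can make the equivariance concrete. The sketch below assumes $G = SO(2)$ with the identity representation on both input and output, unit edge weights with $c_v$ the vertex degree, and a hypothetical triangle graph; the intertwiner is taken as $K = aI + bJ$ (with $J$ the 90-degree rotation), which commutes with every rotation and hence satisfies the intertwining condition:

```python
import numpy as np

def rot(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

# SO(2)-equivariant intertwiner on R^2: K = a*I + b*J commutes with all
# rotations, so K(rho(g) w) = rho(g) K(w).
K = 0.7 * np.eye(2) + 0.3 * np.array([[0.0, -1.0], [1.0, 0.0]])

def torsor_conv(f, psi, nbrs):
    """Transport each neighbor's feature into v's frame via psi_uv^{-1}
    (transpose, for rotations), apply K, and average over neighbors."""
    return {v: sum(K @ psi[(u, v)].T @ f[u] for u in us) / len(us)
            for v, us in nbrs.items()}

# Hypothetical triangle graph; psi[(u, v)] maps v's frame to u's frame.
psi = {(0, 1): rot(0.4), (1, 2): rot(1.1), (2, 0): rot(-0.2)}
psi.update({(v, u): g.T for (u, v), g in list(psi.items())})  # both directions
nbrs = {0: [1, 2], 1: [0, 2], 2: [0, 1]}

rng = np.random.default_rng(1)
f = {v: rng.standard_normal(2) for v in range(3)}
out = torsor_conv(f, psi, nbrs)

# Gauge equivariance: transforming inputs and potentials by gamma transforms
# the output the same way, out'_v = rho(gamma_v)^{-1} out_v.
gamma = {v: rot(rng.uniform(-np.pi, np.pi)) for v in range(3)}
psi_g = {(u, v): gamma[u].T @ g @ gamma[v] for (u, v), g in psi.items()}
f_g = {v: gamma[v].T @ f[v] for v in f}
out_g = torsor_conv(f_g, psi_g, nbrs)
for v in out:
    assert np.allclose(out_g[v], gamma[v].T @ out[v])
print("equivariant")
```

The key step is that the gauge factors $\gamma_u$ cancel inside the transport term, leaving only $\rho(\gamma_v)^{-1}$ outside the sum, which then commutes past $K$ by the intertwining property.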

The framework subsumes classical CNNs (translation equivariance on grids), G-CNNs (equivariance on homogeneous spaces), and Gauge CNNs (local symmetries on manifolds) as special cases, with appropriate choices of graph structure, group $G$, and edge potentials.
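One degenerate case is easy to verify directly: with the trivial group, every edge potential acts as the identity and the torsor convolution collapses to an ordinary mean-aggregation graph convolution. A small sanity sketch (hypothetical graph and kernel, identity representation) confirms the two coincide:

```python
import numpy as np

# With trivial potentials, rho(psi_uv)^{-1} = I, so transport disappears.
K = np.array([[0.5, -0.2], [0.1, 0.9]])  # arbitrary linear kernel

def torsor_conv_trivial(f, nbrs):
    # Torsor layer with identity potentials: apply K to each neighbor, average.
    return {v: sum(K @ f[u] for u in us) / len(us) for v, us in nbrs.items()}

def graph_conv(f, nbrs):
    # Plain graph convolution: average neighbors, then apply K (K is linear).
    return {v: K @ (sum(f[u] for u in us) / len(us)) for v, us in nbrs.items()}

nbrs = {0: [1, 2], 1: [0], 2: [0, 1]}
f = {v: np.array([float(v), 1.0]) for v in range(3)}
a, b = torsor_conv_trivial(f, nbrs), graph_conv(f, nbrs)
assert all(np.allclose(a[v], b[v]) for v in nbrs)
print("matches plain graph convolution")
```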

Empirical Evaluation: Multi-View 3D Recognition

The practical utility of Torsor CNNs is demonstrated on multi-view 3D object recognition (ModelNet40), where each object is observed from multiple camera viewpoints with known relative poses. The view graph is constructed with vertices as camera views and edge potentials $\psi_{ij} \in SO(3)$ computed from camera rotations. Two approaches are evaluated:

  1. Direct Torsor CNN Implementation: Features are explicitly transported between views using the edge potentials, and synchronized to a reference view for pooling. This yields global descriptors with reduced intra-class variance, facilitating classification and retrieval tasks. In metric learning (e.g., triplet loss), alignment via known transformations collapses intra-class distances, simplifying optimization and improving mean average precision (mAP).
  2. Frustration Regularization: The frustration loss is added to the objective of standard multi-view networks (MVCNN, EMVN), encouraging feature consistency across views without architectural changes. This regularization improves convergence, robustness to noisy or missing views, and retrieval performance.
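The regularization route can be sketched in a few lines. The toy below (my own illustration, not the paper's experimental setup: a 3-view cycle with $SO(2)$ stand-ins for the $SO(3)$ relative-pose potentials, and gradient descent on the frustration term alone) shows the loss driving per-view features toward synchronization; in a real model this gradient is simply added, with weight $\lambda$, to the task-loss gradient, and the task term prevents collapse to the zero section:

```python
import numpy as np

def rot(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

# Hypothetical 3-view cycle; the potentials are cycle-consistent
# (0.4 + 1.1 - 1.5 = 0), so a nonzero synchronized field exists.
psi = {(0, 1): rot(0.4), (1, 2): rot(1.1), (2, 0): rot(-1.5)}

def frustration(F, psi):
    return sum(np.sum((F[u] - g @ F[v]) ** 2)
               for (u, v), g in psi.items()) / len(psi)

rng = np.random.default_rng(2)
F = rng.standard_normal((3, 2))   # one 2-d feature per view
lr, lam = 0.1, 1.0                # lam weights the regularizer
for _ in range(200):
    grad = np.zeros_like(F)
    for (u, v), g in psi.items():
        r = F[u] - g @ F[v]       # edge residual
        grad[u] += 2.0 * r / len(psi)
        grad[v] -= 2.0 * g.T @ r / len(psi)
    F -= lr * lam * grad          # in practice: += task gradient as well
print(frustration(F, psi) < 1e-9)  # features are now numerically synchronized
```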

Numerical results indicate that both approaches reduce intra-class variance and improve retrieval mAP, with the frustration loss providing a flexible geometric regularizer for arbitrary architectures.

Theoretical and Practical Implications

The equivalence between gauge-equivariant learning and group synchronization provides a unified perspective on geometric deep learning, connecting neural architectures to classical problems in robotics, vision, and distributed sensing. The discrete torsor formalism enables learning on domains with heterogeneous local symmetries, such as molecular graphs with mixed symmetry groups, and is compatible with sheaf-theoretic generalizations.

Practically, the framework facilitates the incorporation of domain knowledge (e.g., sensor orientations, camera poses) into neural models, improving data efficiency and generalization. The frustration loss offers an immediate path to geometric regularization, and the torsor convolutional layer can be implemented as a reusable module in graph neural network libraries.

Future Directions

Potential extensions include support for heterogeneous structure groups at different nodes, richer edge-dependent kernels for expressive local interactions, and standardized implementations of torsor-aware layers and regularizers. The framework is well-suited for distributed learning scenarios, sensor networks, and scientific domains where global coordinate systems are unavailable or ill-defined.

Conclusion

Torsor CNNs provide a principled, mathematically rigorous approach to learning on graphs with local symmetries, unifying and generalizing existing equivariant architectures. The frustration loss serves as a gauge-invariant regularizer, and the torsor convolutional layer enables explicit geometric consistency. These tools are broadly applicable to domains with distributed, locally structured data, and offer a foundation for future developments in geometric and topological deep learning.
