- The paper introduces Torsor CNNs, a novel framework unifying gauge-equivariant learning and group synchronization for arbitrary graphs with local symmetries.
- It demonstrates practical benefits in multi-view 3D recognition by reducing intra-class variance and improving retrieval performance through explicit feature synchronization and frustration regularization.
- The approach offers a rigorous mathematical foundation using discrete analogues of principal bundles and sheaf theory, inspiring robust geometric regularization in deep learning architectures.
Torsor CNNs: Gauge-Equivariant Learning on Graphs with Local Symmetries
Motivation and Framework
The paper introduces Torsor CNNs, a discrete geometric deep learning framework for graphs endowed with local symmetries, formalized via edge potentials—group-valued transformations encoding the relationship between neighboring coordinate frames. Unlike classical CNNs and G-CNNs, which rely on global symmetry groups acting transitively on the domain, or Gauge CNNs, which require smooth manifold structure and parallel transport, Torsor CNNs operate on arbitrary graphs without a global coordinate system or smoothness assumptions. This generality is achieved by leveraging the mathematical equivalence between gauge-equivariant learning and the group synchronization problem, a well-studied topic in robotics, vision, and distributed sensing.
The central construction is as follows: for each edge $(u,v)$ in a graph $X=(V,E)$, an edge potential $\psi_{uv} \in G$ (e.g., a rotation matrix in $SO(3)$) specifies how to transform features from $v$'s local frame to $u$'s. Feature fields are modeled as global sections of an associated vector sheaf, and synchronization is enforced by requiring $f_u = \rho(\psi_{uv}) f_v$ for all edges, where $\rho$ is a representation of $G$ on the feature space $F$. The deviation from perfect synchronization is quantified by the frustration loss, which is gauge-invariant and can be used as a regularizer in arbitrary neural architectures.
The framework is grounded in discrete analogues of principal bundles and sheaf theory. A network $G$-torsor is constructed from edge potentials, with stalks at vertices and edges given by $G$-torsors and restriction maps encoding the group action. The associated vector sheaf $\mathcal{E} = P_\psi \times_\rho F$ models feature fields as equivalence classes $[p, w]$, where $p$ is a local frame and $w$ a feature vector, with the relation $(p \cdot g, w) \sim (p, \rho(g) w)$ for $g \in G$.
Gauge transformations $\gamma: V \to G$ act by changing local frames, transforming edge potentials as $\psi'_{uv} = \gamma_u^{-1} \psi_{uv} \gamma_v$ and features as $f'_v = \rho(\gamma_v)^{-1} f_v$. The frustration loss

$$\eta_F(f; X, \psi) = \frac{1}{\mathrm{vol}(X)} \sum_{(u,v) \in E} \left\| f_u - \rho(\psi_{uv}) f_v \right\|^2$$

vanishes if and only if $f$ is a global section (i.e., perfectly synchronized), and is invariant under gauge transformations.
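As a sanity check, here is a minimal NumPy sketch (not from the paper) of the frustration loss for $G = SO(2)$ acting by its defining representation, verifying gauge invariance numerically on a toy 3-cycle. All names and the toy graph are illustrative assumptions.

```python
import numpy as np

def rot(theta):
    """2x2 rotation matrix in SO(2), standing in for the structure group G."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def frustration(f, edges, psi, vol):
    """eta_F(f; X, psi) = (1/vol(X)) * sum_{(u,v) in E} ||f_u - rho(psi_uv) f_v||^2,
    with rho the defining representation, so rho(psi_uv) is psi_uv itself."""
    return sum(np.sum((f[u] - psi[(u, v)] @ f[v]) ** 2) for (u, v) in edges) / vol

# Toy graph: 3 vertices on a cycle, with SO(2) edge potentials.
edges = [(0, 1), (1, 2), (2, 0)]
psi = {(0, 1): rot(0.3), (1, 2): rot(-0.1), (2, 0): rot(0.5)}
f = {v: np.random.randn(2) for v in range(3)}
vol = len(edges)

eta = frustration(f, edges, psi, vol)

# Gauge transformation gamma: V -> G acts by
#   psi'_uv = gamma_u^{-1} psi_uv gamma_v,   f'_v = gamma_v^{-1} f_v.
# For rotations, the inverse is the transpose.
gamma = {0: rot(0.7), 1: rot(-0.2), 2: rot(1.1)}
psi_t = {(u, v): gamma[u].T @ psi[(u, v)] @ gamma[v] for (u, v) in edges}
f_t = {v: gamma[v].T @ f[v] for v in f}

eta_t = frustration(f_t, edges, psi_t, vol)
assert np.isclose(eta, eta_t)  # frustration loss is gauge-invariant
```

Note that on this cycle the composed potential $\psi_{01}\psi_{12}\psi_{20}$ is a nontrivial rotation (nonzero holonomy), so no global section exists and the loss is strictly positive for any $f$.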
Torsor Convolutional Layers
The Torsor Convolutional Layer is a gauge-equivariant linear operator on feature assignments $F^V$, parameterized by a $G$-equivariant intertwiner $K: F_{\mathrm{in}} \to F_{\mathrm{out}}$ satisfying $K(\rho_{\mathrm{in}}(g) w) = \rho_{\mathrm{out}}(g) K(w)$ for all $g \in G$. For vertex $v$, the output is

$$f_{\mathrm{out}}(v) = \frac{1}{c_v} \sum_{u \sim v} w_{uv}\, K\!\left(\rho_{\mathrm{in}}(\psi_{uv})^{-1} f_{\mathrm{in}}(u)\right)$$

where $w_{uv}$ are edge weights and $c_v$ is a normalization factor. This construction ensures that global sections are preserved and that the layer is equivariant under gauge transformations. Nonlinearities must also be equivariant, which restricts admissible activation functions to those compatible with the group representation (e.g., norm-based, tensor-product, or gated nonlinearities).
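The layer above can be sketched in a few lines of NumPy for $G = SO(2)$, where any scaled rotation commutes with the group action and is therefore a valid intertwiner. This is an illustrative sketch under assumed simplifications (unit edge weights, $c_v = \deg(v)$, a toy path graph), not the paper's implementation; the gauge-equivariance property is checked numerically at the end.

```python
import numpy as np

def rot(theta):
    """2x2 rotation matrix in SO(2)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def torsor_conv(f, neighbors, psi, K):
    """f_out(v) = (1/c_v) * sum_{u ~ v} K(rho(psi_uv)^{-1} f(u)),
    with unit edge weights and c_v = deg(v). For rotations, psi^{-1} = psi^T."""
    return {v: np.mean([K @ psi[(u, v)].T @ f[u] for u in nbrs], axis=0)
            for v, nbrs in neighbors.items()}

# Toy path graph 0 - 1 - 2 with SO(2) edge potentials (both orientations stored).
psi = {}
for (u, v), th in [((0, 1), 0.3), ((1, 2), -0.2)]:
    psi[(u, v)] = rot(th)
    psi[(v, u)] = rot(th).T
neighbors = {0: [1], 1: [0, 2], 2: [1]}
f = {v: np.random.randn(2) for v in range(3)}

# K: a G-equivariant intertwiner. For SO(2), scaled rotations commute with
# the group action, so K(rho(g) w) = rho(g) K(w) holds by construction.
K = 2.0 * rot(0.4)

out = torsor_conv(f, neighbors, psi, K)

# Equivariance check: gauge-transform the inputs and compare with the
# gauge-transformed outputs, f'_out(v) = rho(gamma_v)^{-1} f_out(v).
gamma = {0: rot(0.9), 1: rot(-0.5), 2: rot(0.2)}
psi_t = {(u, v): gamma[u].T @ psi[(u, v)] @ gamma[v] for (u, v) in psi}
f_t = {v: gamma[v].T @ f[v] for v in f}
out_t = torsor_conv(f_t, neighbors, psi_t, K)
assert all(np.allclose(out_t[v], gamma[v].T @ out[v]) for v in f)
```

For non-abelian groups such as $SO(3)$, $K$ must be a genuine intertwiner of the chosen representations (e.g., built from Schur-allowed blocks) rather than an arbitrary matrix.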
The framework subsumes classical CNNs (translation equivariance on grids), G-CNNs (equivariance on homogeneous spaces), and Gauge CNNs (local symmetries on manifolds) as special cases, with appropriate choices of graph structure, group G, and edge potentials.
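To make the subsumption concrete, one can check that with trivial edge potentials ($\psi_{uv} = \mathrm{id}$) the transport step is a no-op and the torsor layer collapses to ordinary neighborhood aggregation, the building block of standard graph/grid convolutions. A minimal illustrative sketch (names and the toy graph are assumptions):

```python
import numpy as np

# With psi_uv = identity, K(rho(psi_uv)^{-1} f(u)) = K(f(u)), so the layer
# reduces to plain neighborhood averaging followed by the linear map K.
def torsor_conv_trivial(f, neighbors, K):
    return {v: K @ np.mean([f[u] for u in nbrs], axis=0)
            for v, nbrs in neighbors.items()}

neighbors = {0: [1], 1: [0, 2], 2: [1]}
f = {0: np.array([1.0, 0.0]), 1: np.array([0.0, 1.0]), 2: np.array([1.0, 1.0])}
K = np.eye(2)

out = torsor_conv_trivial(f, neighbors, K)
assert np.allclose(out[1], np.array([1.0, 0.5]))  # mean of f[0] and f[2]
```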
Empirical Evaluation: Multi-View 3D Recognition
The practical utility of Torsor CNNs is demonstrated on multi-view 3D object recognition (ModelNet40), where each object is observed from multiple camera viewpoints with known relative poses. The view graph is constructed with vertices as camera views and edge potentials $\psi_{ij} \in SO(3)$ computed from the relative camera rotations. Two approaches are evaluated:
- Direct Torsor CNN Implementation: Features are explicitly transported between views using the edge potentials, and synchronized to a reference view for pooling. This yields global descriptors with reduced intra-class variance, facilitating classification and retrieval tasks. In metric learning (e.g., triplet loss), alignment via known transformations collapses intra-class distances, simplifying optimization and improving mean average precision (mAP).
- Frustration Regularization: The frustration loss is added to the objective of standard multi-view networks (MVCNN, EMVN), encouraging feature consistency across views without architectural changes. This regularization improves convergence, robustness to noisy or missing views, and retrieval performance.
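The regularization recipe can be sketched in a few lines: compute the gauge-invariant frustration penalty over per-view features and add it to whatever objective the base network already optimizes. This is an illustrative NumPy sketch, not the paper's code; `task_loss`, `lam`, and the toy $SO(2)$ setup are assumptions.

```python
import numpy as np

def frustration(feats, edges, psi):
    """Gauge-invariant frustration penalty over per-view features feats[i]."""
    losses = [np.sum((feats[u] - psi[(u, v)] @ feats[v]) ** 2) for (u, v) in edges]
    return sum(losses) / len(edges)

def total_objective(task_loss, feats, edges, psi, lam=0.1):
    """Base objective (e.g., a classification loss) plus the geometric
    regularizer, weighted by a hypothetical coefficient lam."""
    return task_loss + lam * frustration(feats, edges, psi)

# With perfectly synchronized features (f_u = psi_uv f_v), the penalty vanishes.
R = np.array([[0.0, -1.0], [1.0, 0.0]])  # 90-degree rotation in SO(2)
fv = np.array([1.0, 2.0])
feats = {0: R @ fv, 1: fv}
edges = [(0, 1)]
psi = {(0, 1): R}
assert np.isclose(frustration(feats, edges, psi), 0.0)
assert np.isclose(total_objective(3.5, feats, edges, psi), 3.5)
```

Because the penalty only touches the features and known potentials, it drops into existing multi-view pipelines without architectural changes, as the bullet above describes.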
Numerical results indicate that both approaches reduce intra-class variance and improve retrieval mAP, with the frustration loss providing a flexible geometric regularizer for arbitrary architectures.
Theoretical and Practical Implications
The equivalence between gauge-equivariant learning and group synchronization provides a unified perspective on geometric deep learning, connecting neural architectures to classical problems in robotics, vision, and distributed sensing. The discrete torsor formalism enables learning on domains with heterogeneous local symmetries, such as molecular graphs with mixed symmetry groups, and is compatible with sheaf-theoretic generalizations.
Practically, the framework facilitates the incorporation of domain knowledge (e.g., sensor orientations, camera poses) into neural models, improving data efficiency and generalization. The frustration loss offers an immediate path to geometric regularization, and the torsor convolutional layer can be implemented as a reusable module in graph neural network libraries.
Future Directions
Potential extensions include support for heterogeneous structure groups at different nodes, richer edge-dependent kernels for expressive local interactions, and standardized implementations of torsor-aware layers and regularizers. The framework is well-suited for distributed learning scenarios, sensor networks, and scientific domains where global coordinate systems are unavailable or ill-defined.
Conclusion
Torsor CNNs provide a principled, mathematically rigorous approach to learning on graphs with local symmetries, unifying and generalizing existing equivariant architectures. The frustration loss serves as a gauge-invariant regularizer, and the torsor convolutional layer enables explicit geometric consistency. These tools are broadly applicable to domains with distributed, locally structured data, and offer a foundation for future developments in geometric and topological deep learning.