- The paper introduces Torsor CNNs, a novel framework unifying gauge-equivariant learning and group synchronization for arbitrary graphs with local symmetries.
- It demonstrates practical benefits in multi-view 3D recognition by reducing intra-class variance and improving retrieval performance through explicit feature synchronization and frustration regularization.
- The approach offers a rigorous mathematical foundation using discrete analogues of principal bundles and sheaf theory, inspiring robust geometric regularization in deep learning architectures.
Torsor CNNs: Gauge-Equivariant Learning on Graphs with Local Symmetries
Motivation and Framework
The paper introduces Torsor CNNs, a discrete geometric deep learning framework for graphs endowed with local symmetries, formalized via edge potentials—group-valued transformations encoding the relationship between neighboring coordinate frames. Unlike classical CNNs and G-CNNs, which rely on global symmetry groups acting transitively on the domain, or Gauge CNNs, which require smooth manifold structure and parallel transport, Torsor CNNs operate on arbitrary graphs without a global coordinate system or smoothness assumptions. This generality is achieved by leveraging the mathematical equivalence between gauge-equivariant learning and the group synchronization problem, a well-studied topic in robotics, vision, and distributed sensing.
The central construction is as follows: for each edge $(u,v)$ in a graph $X=(V,E)$, an edge potential $\psi_{uv} \in G$ (e.g., a rotation matrix in $SO(3)$) specifies how to transform features from $v$'s local frame to $u$'s. Feature fields are modeled as global sections of an associated vector sheaf, and synchronization is enforced by requiring $f_u = \rho(\psi_{uv}) f_v$ for all edges, where $\rho$ is a representation of $G$ on the feature space $F$. The deviation from perfect synchronization is quantified by the frustration loss, which is gauge-invariant and can be used as a regularizer in arbitrary neural architectures.
The framework is grounded in discrete analogues of principal bundles and sheaf theory. A network $G$-torsor is constructed from edge potentials, with stalks at vertices and edges given by $G$-torsors and restriction maps encoding the group action. The associated vector sheaf $\mathcal{E} = P_\psi \times_\rho F$ models feature fields as equivalence classes $[p, w]$, where $p$ is a local frame and $w$ a feature vector, with the relation $(p \cdot g, w) \sim (p, \rho(g) w)$ for $g \in G$.
Gauge transformations $\gamma: V \to G$ act by changing local frames, transforming edge potentials as $\psi'_{uv} = \gamma_u^{-1} \psi_{uv} \gamma_v$ and features as $f'_v = \rho(\gamma_v)^{-1} f_v$. The frustration loss

$$\eta_F(f; X, \psi) = \frac{1}{\mathrm{vol}(X)} \sum_{(u,v) \in E} \left\| f_u - \rho(\psi_{uv}) f_v \right\|^2$$

vanishes if and only if $f$ is a global section (i.e., perfectly synchronized), and is invariant under gauge transformations.
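As a sanity check, here is a minimal NumPy sketch (not from the paper) of the frustration loss for $G = SO(2)$ acting by its defining representation, verifying gauge invariance numerically on a toy 3-cycle. All names and the toy graph are illustrative assumptions.

```python
import numpy as np

def rot(theta):
    """2x2 rotation matrix in SO(2), standing in for the structure group G."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def frustration(f, edges, psi, vol):
    """eta_F(f; X, psi) = (1/vol(X)) * sum_{(u,v) in E} ||f_u - rho(psi_uv) f_v||^2,
    with rho the defining representation, so rho(psi_uv) is psi_uv itself."""
    return sum(np.sum((f[u] - psi[(u, v)] @ f[v]) ** 2) for (u, v) in edges) / vol

# Toy graph: 3 vertices on a cycle, with SO(2) edge potentials.
edges = [(0, 1), (1, 2), (2, 0)]
psi = {(0, 1): rot(0.3), (1, 2): rot(-0.1), (2, 0): rot(0.5)}
f = {v: np.random.randn(2) for v in range(3)}
vol = len(edges)

eta = frustration(f, edges, psi, vol)

# Gauge transformation gamma: V -> G acts by
#   psi'_uv = gamma_u^{-1} psi_uv gamma_v,   f'_v = gamma_v^{-1} f_v.
# For rotations, the inverse is the transpose.
gamma = {0: rot(0.7), 1: rot(-0.2), 2: rot(1.1)}
psi_t = {(u, v): gamma[u].T @ psi[(u, v)] @ gamma[v] for (u, v) in edges}
f_t = {v: gamma[v].T @ f[v] for v in f}

eta_t = frustration(f_t, edges, psi_t, vol)
assert np.isclose(eta, eta_t)  # frustration loss is gauge-invariant
```

Note that on this cycle the composed potential $\psi_{01}\psi_{12}\psi_{20}$ is a nontrivial rotation (nonzero holonomy), so no global section exists and the loss is strictly positive for any $f$.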
Torsor Convolutional Layers
The Torsor Convolutional Layer is a gauge-equivariant linear operator on feature assignments $F^V$, parameterized by a $G$-equivariant intertwiner $K: F_{\mathrm{in}} \to F_{\mathrm{out}}$ satisfying $K(\rho_{\mathrm{in}}(g) w) = \rho_{\mathrm{out}}(g) K(w)$ for all $g \in G$. For vertex $v$, the output is

$$f_{\mathrm{out}}(v) = \frac{1}{c_v} \sum_{u \sim v} w_{uv}\, K\!\left(\rho_{\mathrm{in}}(\psi_{uv})^{-1} f_{\mathrm{in}}(u)\right)$$

where $w_{uv}$ are edge weights and $c_v$ is a normalization factor. This construction ensures that global sections are preserved and that the layer is equivariant under gauge transformations. Nonlinearities must also be equivariant, which restricts admissible activation functions to those compatible with the group representation (e.g., norm-based, tensor-product, or gated nonlinearities).
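The layer above can be sketched in a few lines of NumPy for $G = SO(2)$, where any scaled rotation commutes with the group action and is therefore a valid intertwiner. This is an illustrative sketch under assumed simplifications (unit edge weights, $c_v = \deg(v)$, a toy path graph), not the paper's implementation; the gauge-equivariance property is checked numerically at the end.

```python
import numpy as np

def rot(theta):
    """2x2 rotation matrix in SO(2)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def torsor_conv(f, neighbors, psi, K):
    """f_out(v) = (1/c_v) * sum_{u ~ v} K(rho(psi_uv)^{-1} f(u)),
    with unit edge weights and c_v = deg(v). For rotations, psi^{-1} = psi^T."""
    return {v: np.mean([K @ psi[(u, v)].T @ f[u] for u in nbrs], axis=0)
            for v, nbrs in neighbors.items()}

# Toy path graph 0 - 1 - 2 with SO(2) edge potentials (both orientations stored).
psi = {}
for (u, v), th in [((0, 1), 0.3), ((1, 2), -0.2)]:
    psi[(u, v)] = rot(th)
    psi[(v, u)] = rot(th).T
neighbors = {0: [1], 1: [0, 2], 2: [1]}
f = {v: np.random.randn(2) for v in range(3)}

# K: a G-equivariant intertwiner. For SO(2), scaled rotations commute with
# the group action, so K(rho(g) w) = rho(g) K(w) holds by construction.
K = 2.0 * rot(0.4)

out = torsor_conv(f, neighbors, psi, K)

# Equivariance check: gauge-transform the inputs and compare with the
# gauge-transformed outputs, f'_out(v) = rho(gamma_v)^{-1} f_out(v).
gamma = {0: rot(0.9), 1: rot(-0.5), 2: rot(0.2)}
psi_t = {(u, v): gamma[u].T @ psi[(u, v)] @ gamma[v] for (u, v) in psi}
f_t = {v: gamma[v].T @ f[v] for v in f}
out_t = torsor_conv(f_t, neighbors, psi_t, K)
assert all(np.allclose(out_t[v], gamma[v].T @ out[v]) for v in f)
```

For non-abelian groups such as $SO(3)$, $K$ must be a genuine intertwiner of the chosen representations (e.g., built from Schur-allowed blocks) rather than an arbitrary matrix.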
The framework subsumes classical CNNs (translation equivariance on grids), G-CNNs (equivariance on homogeneous spaces), and Gauge CNNs (local symmetries on manifolds) as special cases, with appropriate choices of graph structure, group G, and edge potentials.
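To make the subsumption concrete, one can check that with trivial edge potentials ($\psi_{uv} = \mathrm{id}$) the transport step is a no-op and the torsor layer collapses to ordinary neighborhood aggregation, the building block of standard graph/grid convolutions. A minimal illustrative sketch (names and the toy graph are assumptions):

```python
import numpy as np

# With psi_uv = identity, K(rho(psi_uv)^{-1} f(u)) = K(f(u)), so the layer
# reduces to plain neighborhood averaging followed by the linear map K.
def torsor_conv_trivial(f, neighbors, K):
    return {v: K @ np.mean([f[u] for u in nbrs], axis=0)
            for v, nbrs in neighbors.items()}

neighbors = {0: [1], 1: [0, 2], 2: [1]}
f = {0: np.array([1.0, 0.0]), 1: np.array([0.0, 1.0]), 2: np.array([1.0, 1.0])}
K = np.eye(2)

out = torsor_conv_trivial(f, neighbors, K)
assert np.allclose(out[1], np.array([1.0, 0.5]))  # mean of f[0] and f[2]
```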
Empirical Evaluation: Multi-View 3D Recognition
The practical utility of Torsor CNNs is demonstrated on multi-view 3D object recognition (ModelNet40), where each object is observed from multiple camera viewpoints with known relative poses. The view graph is constructed with vertices as camera views and edge potentials $\psi_{ij} \in SO(3)$ computed from the relative camera rotations. Two approaches are evaluated:
- Direct Torsor CNN Implementation: Features are explicitly transported between views using the edge potentials, and synchronized to a reference view for pooling. This yields global descriptors with reduced intra-class variance, facilitating classification and retrieval tasks. In metric learning (e.g., triplet loss), alignment via known transformations collapses intra-class distances, simplifying optimization and improving mean average precision (mAP).
- Frustration Regularization: The frustration loss is added to the objective of standard multi-view networks (MVCNN, EMVN), encouraging feature consistency across views without architectural changes. This regularization improves convergence, robustness to noisy or missing views, and retrieval performance.
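The regularization recipe can be sketched in a few lines: compute the gauge-invariant frustration penalty over per-view features and add it to whatever objective the base network already optimizes. This is an illustrative NumPy sketch, not the paper's code; `task_loss`, `lam`, and the toy $SO(2)$ setup are assumptions.

```python
import numpy as np

def frustration(feats, edges, psi):
    """Gauge-invariant frustration penalty over per-view features feats[i]."""
    losses = [np.sum((feats[u] - psi[(u, v)] @ feats[v]) ** 2) for (u, v) in edges]
    return sum(losses) / len(edges)

def total_objective(task_loss, feats, edges, psi, lam=0.1):
    """Base objective (e.g., a classification loss) plus the geometric
    regularizer, weighted by a hypothetical coefficient lam."""
    return task_loss + lam * frustration(feats, edges, psi)

# With perfectly synchronized features (f_u = psi_uv f_v), the penalty vanishes.
R = np.array([[0.0, -1.0], [1.0, 0.0]])  # 90-degree rotation in SO(2)
fv = np.array([1.0, 2.0])
feats = {0: R @ fv, 1: fv}
edges = [(0, 1)]
psi = {(0, 1): R}
assert np.isclose(frustration(feats, edges, psi), 0.0)
assert np.isclose(total_objective(3.5, feats, edges, psi), 3.5)
```

Because the penalty only touches the features and known potentials, it drops into existing multi-view pipelines without architectural changes, as the bullet above describes.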
Numerical results indicate that both approaches reduce intra-class variance and improve retrieval mAP, with the frustration loss providing a flexible geometric regularizer for arbitrary architectures.
Theoretical and Practical Implications
The equivalence between gauge-equivariant learning and group synchronization provides a unified perspective on geometric deep learning, connecting neural architectures to classical problems in robotics, vision, and distributed sensing. The discrete torsor formalism enables learning on domains with heterogeneous local symmetries, such as molecular graphs with mixed symmetry groups, and is compatible with sheaf-theoretic generalizations.
Practically, the framework facilitates the incorporation of domain knowledge (e.g., sensor orientations, camera poses) into neural models, improving data efficiency and generalization. The frustration loss offers an immediate path to geometric regularization, and the torsor convolutional layer can be implemented as a reusable module in graph neural network libraries.
Future Directions
Potential extensions include support for heterogeneous structure groups at different nodes, richer edge-dependent kernels for expressive local interactions, and standardized implementations of torsor-aware layers and regularizers. The framework is well-suited for distributed learning scenarios, sensor networks, and scientific domains where global coordinate systems are unavailable or ill-defined.
Conclusion
Torsor CNNs provide a principled, mathematically rigorous approach to learning on graphs with local symmetries, unifying and generalizing existing equivariant architectures. The frustration loss serves as a gauge-invariant regularizer, and the torsor convolutional layer enables explicit geometric consistency. These tools are broadly applicable to domains with distributed, locally structured data, and offer a foundation for future developments in geometric and topological deep learning.