Grassmann-Valued Neural Networks
- Grassmann-valued neural networks are architectures where parameters and activations reside on Grassmann manifolds or in Grassmann algebras, encoding subspace invariance and anticommutativity.
- They employ Riemannian operations—such as QR retraction, weighted Fréchet means, and gyro-addition—to enable backpropagation on curved spaces, enhancing performance in vision, graph, and physics tasks.
- These models extend into supersymmetric formulations with graded variables and Berezin integration, facilitating the simulation of fermionic field theories and quantum chemistry phenomena.
Grassmann-valued neural networks are a class of machine learning architectures in which parameters, activations, or weights take values in a Grassmann manifold or, in more algebraic extensions, in a Grassmann (exterior) algebra. These models natively respect geometric or algebraic structures characterized by anticommutativity or subspace invariance, enabling representations and transformations that are inaccessible to classical Euclidean or vector-space-based networks. The topic subsumes several directions: (1) manifold-based deep networks operating intrinsically on the Grassmannian $\mathrm{Gr}(p,n)$ of $p$-planes in $\mathbb{R}^n$; (2) physics-inspired neural models with Grassmann-valued (anticommuting) variables for fermionic field theory and supersymmetry; and (3) algebraic approaches extending classical neural architectures to graded spaces or exterior algebras.
1. Fundamentals of the Grassmann Manifold and Grassmann Algebra
The Grassmann manifold $\mathrm{Gr}(p,n)$ is the space of all $p$-dimensional linear subspaces of $\mathbb{R}^n$. Each point corresponds to an equivalence class of $n \times p$ matrices with orthonormal columns under right action by the orthogonal group $O(p)$, i.e., $X \in \mathbb{R}^{n \times p}$ with $X^\top X = I_p$ and identification $X \sim XQ$, $Q \in O(p)$. The ambient Riemannian geometry is governed by principal angles between subspaces, with canonical metric $\langle \Delta_1, \Delta_2 \rangle_X = \operatorname{tr}(\Delta_1^\top \Delta_2)$ for tangent vectors $\Delta_1, \Delta_2$ at $[X]$.
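As a concrete illustration of this geometry, the following minimal NumPy sketch computes principal angles and the resulting geodesic (arc-length) distance between two subspaces represented by orthonormal-basis matrices; the function names and the random example are illustrative, not taken from the cited papers.

```python
import numpy as np

def principal_angles(X, Y):
    """Principal angles between span(X) and span(Y); X, Y are n x p with orthonormal columns."""
    # The singular values of X^T Y are the cosines of the principal angles.
    sigma = np.linalg.svd(X.T @ Y, compute_uv=False)
    return np.arccos(np.clip(sigma, -1.0, 1.0))

def geodesic_distance(X, Y):
    """Canonical distance on Gr(p, n): the 2-norm of the principal-angle vector."""
    return np.linalg.norm(principal_angles(X, Y))

# Two random p-dimensional subspaces of R^n, represented by QR factors.
rng = np.random.default_rng(0)
n, p = 10, 3
X, _ = np.linalg.qr(rng.standard_normal((n, p)))
Y, _ = np.linalg.qr(rng.standard_normal((n, p)))
print(geodesic_distance(X, Y))
```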
Separately, the Grassmann (exterior) algebra $\Lambda(V)$ on a vector space $V$ is generated by elements $\theta_1, \dots, \theta_N$ such that $\theta_i \theta_j = -\theta_j \theta_i$ and $\theta_i^2 = 0$. Arbitrary elements are linear combinations of products of generators, carrying even and odd grading.
These structures admit two types of neural architectures:
- Manifold-valued neural networks: Layers, parameters, or activations remain on $\mathrm{Gr}(p,n)$, employing Riemannian operations for geometric fidelity (Huang et al., 2016, Chakraborty et al., 2018, Nguyen et al., 2023, Bendokat et al., 2020, Nguyen et al., 29 May 2024).
- Grassmann algebra-valued/supersymmetric neural networks: Weights and activations take values in $\Lambda(V)$, encoding anticommuting data and facilitating field-theoretic or supersymmetric formulations (Shaska, 26 Jul 2024, Frank et al., 20 Nov 2025).
2. Deep Networks on Grassmann Manifolds
Several architectures generalize standard deep networks to operate intrinsically on $\mathrm{Gr}(p,n)$. The canonical representation employs orthonormal-basis matrices $X \in \mathbb{R}^{n \times p}$, $X^\top X = I_p$, or orthogonal projectors $P = XX^\top$.
Key building blocks:
- Full-Rank Mapping (FRMap): Linear transformation $X \mapsto WX$ with $W \in \mathbb{R}^{m \times n}$, often required to be row-full-rank to preserve the subspace dimension.
- Re-orthonormalization (ReOrth): Post-processing via QR decomposition, $WX = QR$; setting $X \leftarrow Q$ restores the orthonormal-column constraint and returns the representative to the Stiefel manifold.
- Projection mapping and pooling: Embedding subspaces as symmetric projectors, $P = XX^\top$, and performing pooling via weighted means or coordinatewise aggregation in manifold coordinates.
- Riemannian backpropagation: Gradients are computed by first taking Euclidean derivatives and then projecting onto tangent spaces via $G \mapsto (I - XX^\top)G$ (Stiefel view) or analogous projector-space formulas (Bendokat et al., 2020, Nguyen et al., 29 May 2024, Huang et al., 2016); a minimal sketch of these operations follows this list.
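The NumPy sketch below assembles these building blocks into one toy forward block; the layer sizes, sign convention in the QR step, and the forward pass itself are illustrative assumptions rather than the exact GrNet implementation.

```python
import numpy as np

def frmap(X, W):
    """Full-Rank Mapping: left-multiply the basis matrix X (n x p) by W (m x n)."""
    return W @ X

def reorth(X):
    """Re-orthonormalization: QR factorization restores orthonormal columns."""
    Q, R = np.linalg.qr(X)
    return Q * np.sign(np.diag(R))   # optional sign fix to make the map deterministic

def projmap(X):
    """Projection mapping: embed the subspace as the symmetric projector P = X X^T."""
    return X @ X.T

def tangent_project(X, G):
    """Project a Euclidean gradient G onto the horizontal tangent space at X: (I - X X^T) G."""
    return G - X @ (X.T @ G)

# Toy forward pass: one FRMap -> ReOrth -> ProjMap block, plus a gradient projection.
rng = np.random.default_rng(1)
n, p, m = 16, 4, 8
X = reorth(rng.standard_normal((n, p)))   # input point on Gr(p, n)
W = rng.standard_normal((m, n))           # row-full-rank with probability one
P = projmap(reorth(frmap(X, W)))          # symmetric projector output, m x m
G_riem = tangent_project(X, rng.standard_normal((n, p)))  # Riemannian gradient surrogate
```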
ManifoldNet employs weighted Fréchet mean (wFM) operations as intrinsic analogues of convolution and pooling; in this setting, explicit nonlinearities (e.g., ReLU) can be omitted because the wFM layers are strict contractions under the geodesic metric (Chakraborty et al., 2018).
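ManifoldNet computes the wFM with a recursive, provably convergent intrinsic estimator; the sketch below uses a simpler closed-form surrogate (the weighted projector, or chordal, mean) purely for illustration, so it should not be read as the paper's exact algorithm.

```python
import numpy as np

def weighted_chordal_mean(Xs, w):
    """Approximate weighted Frechet mean of subspaces under the projection (chordal) metric.

    Xs: list of n x p orthonormal-basis matrices; w: nonnegative weights summing to one.
    Returns the top-p eigenvectors of the weighted average projector.
    """
    n, p = Xs[0].shape
    P_bar = sum(wi * (Xi @ Xi.T) for wi, Xi in zip(w, Xs))
    eigvals, eigvecs = np.linalg.eigh(P_bar)   # eigenvalues in ascending order
    return eigvecs[:, -p:]                     # dominant p-dimensional subspace

# Example: blend three subspace "features" with convolution-like weights.
rng = np.random.default_rng(2)
n, p = 12, 3
Xs = [np.linalg.qr(rng.standard_normal((n, p)))[0] for _ in range(3)]
M = weighted_chordal_mean(Xs, np.array([0.5, 0.3, 0.2]))
```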
Training: Manifold optimization is performed via Riemannian gradient descent, exponential mapping for retractions, and, when required, re-normalization via QR or SVD.
Empirical results: Models such as GrNet and ManifoldNet demonstrate superior or competitive accuracy to Euclidean and SPD-manifold methods in vision, action recognition, and autoencoding tasks, frequently converging faster and with fewer parameters (Huang et al., 2016, Chakraborty et al., 2018).
3. Grassmann Layers and Shallow Network Integration
For shallow architectures or dimensionality reduction, a prevalent pattern is to impose a Grassmann constraint on the input linear projection, $x \mapsto A^\top x$ with $A \in \mathbb{R}^{n \times k}$, $A^\top A = I_k$, and to learn this subspace simultaneously with the downstream neural parameters.
Formulation:
- Three-layer architecture: Input mapped by $A^\top$ (Grassmann projection), followed by a Euclidean two-layer (ReLU) surrogate network (Bollinger et al., 2020).
- Alternating minimization: The empirical loss is minimized jointly over $[A] \in \mathrm{Gr}(k,n)$ and all NN weights, with $A$ updated by Riemannian gradient and retraction, and the weights by classical SGD or Adam.
Theory: Approximation accuracy is linked to active-subspace theory, with provable error bounds in terms of the trailing eigenvalues of the input-output Jacobian covariance (Bollinger et al., 2020).
Numerics: QR- or SVD-based retraction of $A$ is used after each update (a schematic of one alternating step is sketched below). The approach yields strong generalization, particularly in data-scarce regimes in scientific domains such as CFD and aerospace engineering.
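The following NumPy sketch shows the structure of one Riemannian update of $A$ under these assumptions: a two-layer ReLU surrogate, a mean-squared-error loss, and a finite-difference gradient standing in for autodiff. All names, sizes, and the learning rate are illustrative.

```python
import numpy as np

def loss(A, params, X, y):
    """MSE of a two-layer ReLU surrogate applied to the reduced inputs X A."""
    W1, b1, W2, b2 = params
    H = np.maximum(X @ A @ W1 + b1, 0.0)
    return np.mean((H @ W2 + b2 - y) ** 2)

def riemannian_step(A, params, X, y, lr=1e-2, eps=1e-6):
    """One update of A: Euclidean gradient -> tangent projection -> QR retraction."""
    G = np.zeros_like(A)
    for idx in np.ndindex(*A.shape):          # finite-difference gradient, for illustration only
        E = np.zeros_like(A); E[idx] = eps
        G[idx] = (loss(A + E, params, X, y) - loss(A - E, params, X, y)) / (2 * eps)
    G = G - A @ (A.T @ G)                     # project onto the tangent space at A
    Q, _ = np.linalg.qr(A - lr * G)           # retract back to orthonormal columns
    return Q

# Toy usage: n=20 inputs reduced to k=3 coordinates, hidden width 8.
rng = np.random.default_rng(3)
N, n, k, h = 64, 20, 3, 8
X = rng.standard_normal((N, n)); y = rng.standard_normal((N, 1))
A = np.linalg.qr(rng.standard_normal((n, k)))[0]
params = (rng.standard_normal((k, h)), np.zeros(h), rng.standard_normal((h, 1)), np.zeros(1))
A = riemannian_step(A, params, X, y)
# Alternate with any Euclidean optimizer (SGD/Adam) on `params`, holding A fixed, and repeat.
```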
4. Gyrovector-Space and Group-Theoretic Grassmann Networks
Group-theoretic generalizations, particularly via gyrovector-space constructions, imbue Grassmannian neural layers with operations analogous to translation, scalar multiplication, and non-Euclidean addition.
Fundamental operations:
- Gyro-addition ($\oplus$): A noncommutative binary operation between Grassmann points, constructed from matrix exponentials of commutator brackets with a fixed base point, which plays the role of translation (Nguyen et al., 2023).
- Grassmann batch normalization: Centering of a batch of Grassmann points about their Fréchet mean on the manifold, in analogy with Euclidean batch normalization.
- Graph and convolutional Grassmann layers: Node features are aggregated using weighted gyro-addition, and message passing aggregates in the tangent space before exponential mapping back to the manifold (Nguyen et al., 29 May 2024).
Gradient computation and optimization are carried out using ambient gradients projected to tangent spaces, retraction via exponential mapping or QR, and backpropagation through matrix logarithm and exponential.
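To make tangent-space aggregation concrete, the sketch below implements the standard Grassmann exponential and logarithm maps in Stiefel coordinates and uses them for a weighted aggregation of neighbor features at a base point; the aggregation scheme is a simplified illustration, not the exact Gr-GCN++ layer.

```python
import numpy as np

def gr_exp(X, D):
    """Grassmann exponential map at X (n x p, orthonormal columns) applied to horizontal tangent D."""
    U, s, Vt = np.linalg.svd(D, full_matrices=False)
    return (X @ Vt.T) @ np.diag(np.cos(s)) @ Vt + U @ np.diag(np.sin(s)) @ Vt

def gr_log(X, Y):
    """Grassmann logarithm map: horizontal tangent vector at X pointing toward span(Y)."""
    n = X.shape[0]
    A = (np.eye(n) - X @ X.T) @ Y @ np.linalg.inv(X.T @ Y)
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U @ np.diag(np.arctan(s)) @ Vt

def aggregate(X_base, neighbors, weights):
    """Map neighbors to the tangent space at X_base, take a weighted sum, map back."""
    D = sum(w * gr_log(X_base, Y) for w, Y in zip(weights, neighbors))
    return gr_exp(X_base, D)

# Example: aggregate two neighbor subspace features around a base node feature.
rng = np.random.default_rng(4)
n, p = 10, 2
X0, Y1, Y2 = (np.linalg.qr(rng.standard_normal((n, p)))[0] for _ in range(3))
X_new = aggregate(X0, [Y1, Y2], [0.6, 0.4])
```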
Empirical advantage: Models built on these principles (e.g., GrNet, GyroGr, Gr-GCN++) achieve improved performance in human action recognition (HDM05, NTU60, FPHA), node classification (Cora, Pubmed), and knowledge graph embedding tasks (Nguyen et al., 2023, Nguyen et al., 29 May 2024, Huang et al., 2016).
5. Grassmann Algebra, Supersymmetry, and Fermionic Neural Networks
Grassmann-valued neural networks, as defined in the context of supersymmetry and quantum field theory, incorporate variables and weights valued in a Grassmann algebra $\Lambda(V)$, supporting anticommutativity and nilpotency (Shaska, 26 Jul 2024, Frank et al., 20 Nov 2025).
Construction principles:
- Super-neuron architecture: Inputs and outputs are $\mathbb{Z}_2$-graded (even/odd), and layers are built from block matrices respecting parity. Block-wise multiplication and sign bookkeeping follow graded-algebra rules; biases and activations can be defined at the level of superfunctions (e.g., super-ReLU) (Shaska, 26 Jul 2024).
- Backpropagation: Gradients and error propagation require a graded chain rule with sign tracking, and the loss is extracted from the even (scalar) part of the Grassmann-valued functionals; a minimal sketch of this sign bookkeeping follows this list.
- Supersymmetric and fermionic field theory models: The large-width limit of a Grassmann-valued network exhibits Gaussianity via the Grassmann central limit theorem, reproducing free Dirac field propagators; finite width induces higher-order (e.g., four-fermion) interactions, and correlated weight distributions enable Yukawa couplings (Frank et al., 20 Nov 2025).
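The self-contained Python sketch below illustrates the graded sign bookkeeping referenced above: elements of a finite Grassmann algebra are stored as coefficients over ordered products of generators, multiplication tracks permutation signs and enforces nilpotency, and the even part used for real-valued losses is read off from terms of even degree. This is an illustrative toy, not the construction of either cited paper.

```python
from itertools import product

class GrassmannElement:
    """Grassmann-algebra element: coeffs maps a sorted tuple of generator indices to a float."""
    def __init__(self, coeffs=None):
        self.coeffs = dict(coeffs or {})

    def __mul__(self, other):
        out = {}
        for (a, ca), (b, cb) in product(self.coeffs.items(), other.coeffs.items()):
            if set(a) & set(b):                 # repeated generator => nilpotent, term vanishes
                continue
            merged, sign = list(a) + list(b), 1
            # Bubble-sort the indices, flipping the sign per transposition (anticommutativity).
            for i in range(len(merged)):
                for j in range(len(merged) - 1 - i):
                    if merged[j] > merged[j + 1]:
                        merged[j], merged[j + 1] = merged[j + 1], merged[j]
                        sign = -sign
            key = tuple(merged)
            out[key] = out.get(key, 0.0) + sign * ca * cb
        return GrassmannElement(out)

    def __add__(self, other):
        out = dict(self.coeffs)
        for k, v in other.coeffs.items():
            out[k] = out.get(k, 0.0) + v
        return GrassmannElement(out)

    def even_part(self):
        """Terms of even degree: the commuting part from which a real loss can be extracted."""
        return GrassmannElement({k: v for k, v in self.coeffs.items() if len(k) % 2 == 0})

# theta_1 * theta_2 = -theta_2 * theta_1 and theta_1^2 = 0:
t1 = GrassmannElement({(1,): 1.0})
t2 = GrassmannElement({(2,): 1.0})
print((t1 * t2).coeffs, (t2 * t1).coeffs, (t1 * t1).coeffs)
# -> {(1, 2): 1.0} {(1, 2): -1.0} {}
```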
Examples and applications: These constructions admit use in quantum chemistry (modeling antisymmetry of fermions), cohomological/topological data analysis, and machine-learned QFT, providing a natural framework for Pauli-exclusion or fermionic statistics not achievable with classical real-valued models (Shaska, 26 Jul 2024, Frank et al., 20 Nov 2025).
6. Backpropagation, Numerical Techniques, and Implementation
Both manifold and algebraic Grassmann-valued networks require specialized numerical techniques.
Manifold networks:
- Gradient projection: Compute the Euclidean gradient $G$, then project to the horizontal tangent space, $G \mapsto (I - XX^\top)G$ in Stiefel coordinates (Bendokat et al., 2020, Nguyen et al., 29 May 2024).
- Exponential/log maps: Forward and backward passes require Riemannian exponential (for retraction/parameter updates) and logarithm (for mapping-between-point computations), implemented analytically or via differentiable SVD-based routines.
- QR/SVD retractions: Approximations via QR facilitate computational efficiency in large-scale networks, with complexity $O(np^2)$ for an $n \times p$ representative.
Backpropagation through multistep operations (e.g., wFM, gyro-additions, exp/log) is supported in deep learning frameworks via differentiation of matrix exponentials, logarithms, and SVD operations (Bendokat et al., 2020, Chakraborty et al., 2018, Nguyen et al., 2023).
Algebraic networks:
- Sign management, parity bookkeeping, and extraction of the even loss part are integral.
- In field-theoretic constructions, explicit Berezin integration and cumulant expansion manage the transition from Grassmann-valued outputs to real-number observables (Frank et al., 20 Nov 2025).
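As a toy illustration of the Berezin rules $\int d\theta\, 1 = 0$ and $\int d\theta\, \theta = 1$, the helper below operates on the tuple-keyed coefficient representation sketched in Section 5; it uses a left-acting integration convention and is an illustrative assumption, not code from the cited work.

```python
def berezin_integrate(coeffs, k):
    """Berezin integration of a Grassmann element over generator k (left-acting convention).

    coeffs maps sorted tuples of generator indices to floats. Integration keeps only terms
    containing theta_k, removes theta_k, and picks up the sign of anticommuting theta_k
    past the generators standing to its left.
    """
    out = {}
    for key, c in coeffs.items():
        if k not in key:
            continue                      # int dtheta_k 1 = 0
        pos = key.index(k)
        sign = (-1) ** pos                # one sign flip per generator theta_k moves past
        new_key = tuple(i for i in key if i != k)
        out[new_key] = out.get(new_key, 0.0) + sign * c
    return out

# int dtheta_2 (3 * theta_1 * theta_2) = -3 * theta_1 in this convention:
print(berezin_integrate({(1, 2): 3.0}, 2))   # -> {(1,): -3.0}
```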
7. Applications, Empirical Evidence, and Research Directions
Grassmann-valued networks demonstrate empirical superiority or strong regularization in multiple domains:
- Vision and graphics: Video-based recognition, subspace tracking, image-set recognition—Grassmannian deep networks outperform kernel and tangent-space methods by margins of roughly 6 percentage points or more in accuracy on manifold-adapted tasks (Huang et al., 2016, Chakraborty et al., 2018).
- Reduced-order modeling: Networks with Grassmann layers outperform polynomial ridge, Gaussian-process, and LASSO regression in data-scarce settings typical of engineering and physics simulations (Bollinger et al., 2020).
- Graph-based learning: Message-passing and GCN analogues on Grassmann manifolds (Gr-GCN++) exhibit improved accuracy on standard citation and transportation benchmarks versus Euclidean, hyperbolic, and SPD-manifold architectures (Nguyen et al., 29 May 2024, Nguyen et al., 2023).
- Physics and quantum chemistry: Grassmann/SUSY-valued neural networks model fermionic statistics, encode antisymmetric patterns, and serve as laboratory QFTs for studying renormalization, non-Gaussian corrections, and SUSY breaking (Frank et al., 20 Nov 2025, Shaska, 26 Jul 2024).
Open directions include computational scaling for large $n$ or $p$, extension to other matrix or group manifolds, rigorous treatment of loss landscapes on curved parameter spaces, and further exploitation of algebraic and graded structures for learning with symmetry or topological constraints (Shaska, 26 Jul 2024, Frank et al., 20 Nov 2025, Bendokat et al., 2020, Nguyen et al., 29 May 2024).
References:
- (Huang et al., 2016) Building Deep Networks on Grassmann Manifolds
- (Chakraborty et al., 2018) ManifoldNet: A Deep Network Framework for Manifold-valued Data
- (Bendokat et al., 2020) A Grassmann Manifold Handbook
- (Bollinger et al., 2020) Reduced Order Modeling using Shallow ReLU Networks with Grassmann Layers
- (Nguyen et al., 2023) Building Neural Networks on Matrix Manifolds: A Gyrovector Space Approach
- (Nguyen et al., 29 May 2024) Matrix Manifold Neural Networks++
- (Shaska, 26 Jul 2024) Artificial Neural Networks on Graded Vector Spaces
- (Frank et al., 20 Nov 2025) Fermions and Supersymmetry in Neural Network Field Theories