Grassmann-Valued Neural Networks

Updated 24 November 2025
  • Grassmann-valued neural networks are architectures where parameters and activations reside on Grassmann manifolds or in Grassmann algebras, encoding subspace invariance and anticommutativity.
  • They employ Riemannian operations—such as QR retraction, weighted Fréchet means, and gyro-addition—to enable backpropagation on curved spaces, enhancing performance in vision, graph, and physics tasks.
  • These models extend into supersymmetric formulations with graded variables and Berezin integration, facilitating the simulation of fermionic field theories and quantum chemistry phenomena.

Grassmann-valued neural networks are a class of machine learning architectures in which parameters, activations, or weights take values in a Grassmann manifold or, in more algebraic extensions, in a Grassmann (exterior) algebra. These models natively respect geometric or algebraic structures characterized by anticommutativity or subspace invariance, enabling representations and transformations that are inaccessible to classical Euclidean or vector-space-based networks. The topic subsumes several directions: (1) manifold-based deep networks operating intrinsically on the Grassmannian of $p$-planes in $\mathbb{R}^n$; (2) physics-inspired neural models with Grassmann-valued (anticommuting) variables for fermionic field theory and supersymmetry; and (3) algebraic approaches extending classical neural architectures to graded spaces or exterior algebras.

1. Fundamentals of the Grassmann Manifold and Grassmann Algebra

The Grassmann manifold $\mathrm{Gr}(p,n)$ is the space of all $p$-dimensional linear subspaces of $\mathbb{R}^n$. Each point corresponds to an equivalence class of $n \times p$ matrices with orthonormal columns under the right action of $O(p)$, i.e., $X\in\mathbb{R}^{n\times p}$ with $X^T X = I_p$ and the identification $X \sim XQ$, $Q\in O(p)$. The Riemannian geometry of $\mathrm{Gr}(p,n)$ is governed by principal angles between subspaces, with canonical metric $g_X(H,K)=\operatorname{tr}(H^T K)$ for tangent vectors $H,K$ at $X$.
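
The principal-angle description is directly computable from orthonormal representatives: the singular values of $U^T V$ are the cosines of the principal angles between $\mathrm{span}(U)$ and $\mathrm{span}(V)$. A minimal NumPy sketch (function and variable names are illustrative, not taken from any cited implementation):

```python
# A minimal sketch (NumPy): principal angles between two p-dimensional
# subspaces of R^n, represented by orthonormal-basis matrices U, V.
import numpy as np

def principal_angles(U: np.ndarray, V: np.ndarray) -> np.ndarray:
    """Principal angles (radians) between span(U) and span(V).

    U, V are n x p matrices with orthonormal columns (points on Gr(p, n)
    in the Stiefel representation).
    """
    # Singular values of U^T V are the cosines of the principal angles.
    sigma = np.linalg.svd(U.T @ V, compute_uv=False)
    return np.arccos(np.clip(sigma, -1.0, 1.0))

# Example: two random subspaces of R^5 with p = 2.
rng = np.random.default_rng(0)
U, _ = np.linalg.qr(rng.standard_normal((5, 2)))
V, _ = np.linalg.qr(rng.standard_normal((5, 2)))
print(principal_angles(U, V))  # geodesic distance is the 2-norm of these angles
```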

Separately, the Grassmann (exterior) algebra $\Lambda(V)$ on a vector space $V$ is generated by elements $\theta_1,\ldots,\theta_m$ such that $\theta_i\theta_j = -\theta_j\theta_i$ and $\theta_i^2=0$. A general element is a linear combination $X = x^{(0)} + \sum_i x_i^{(1)}\theta_i + \sum_{i<j}x_{ij}^{(2)}\theta_i\theta_j + \dots$, carrying an even/odd grading.
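
The anticommutation rules can be made concrete by storing an element as a map from monomials (sets of generator indices) to coefficients; the product then reduces to sign bookkeeping. A minimal pure-Python sketch under that representation (names are illustrative):

```python
# A minimal sketch (pure Python): elements of a Grassmann algebra stored as
# {frozenset of generator indices: real coefficient}, with an anticommuting
# product. The representation and function names are illustrative assumptions.
from itertools import chain

def gr_mul(a: dict, b: dict) -> dict:
    """Product in the Grassmann algebra generated by theta_1, ..., theta_m."""
    out = {}
    for mono_a, ca in a.items():
        for mono_b, cb in b.items():
            if mono_a & mono_b:          # repeated generator => theta_i^2 = 0
                continue
            # Sign from anticommuting the generators of b past those of a
            # to reach the sorted (normal-ordered) monomial.
            merged = sorted(chain(mono_a, mono_b))
            swaps = sum(1 for i in mono_a for j in mono_b if i > j)
            sign = -1.0 if swaps % 2 else 1.0
            key = frozenset(merged)
            out[key] = out.get(key, 0.0) + sign * ca * cb
    return out

# Example: (1 + 2*theta_1) * (theta_1 + theta_2)
a = {frozenset(): 1.0, frozenset({1}): 2.0}
b = {frozenset({1}): 1.0, frozenset({2}): 1.0}
print(gr_mul(a, b))  # theta_1 + theta_2 + 2*theta_1*theta_2  (theta_1^2 term vanishes)
```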

These two structures give rise to the distinct families of neural architectures described in the following sections.

2. Deep Networks on Grassmann Manifolds

Several architectures generalize standard deep networks to operate intrinsically on $\mathrm{Gr}(p,n)$. The canonical representation employs orthonormal-basis matrices $U\in\mathbb{R}^{n\times p}$ or orthogonal projectors $P=UU^T$.

Key building blocks:

  • Full-Rank Mapping (FRMap): Linear transformation $Y = W X_{in}$ with $W\in\mathbb{R}^{d_{out}\times d_{in}}$, often required to be row-full-rank to preserve subspace dimension.
  • Re-orthonormalization (ReOrth): Post-processing via QR decomposition to restore the orthonormal-column constraint: $Y = QR$, set $X_{out}=Q \in \mathrm{St}(p,n)$, the Stiefel manifold (a combined FRMap/ReOrth sketch follows this list).
  • Projection mapping and pooling: Embedding subspaces as symmetric projectors $M=XX^T$ and performing pooling via weighted means or coordinate-wise aggregation in manifold coordinates.
  • Riemannian backpropagation: Gradients are computed by first taking Euclidean derivatives and then projecting onto the tangent space via $(I-UU^T)\,\nabla_U \ell$ (Stiefel view) or analogous projector-space formulas (Bendokat et al., 2020, Nguyen et al., 29 May 2024, Huang et al., 2016).
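
As referenced in the list above, the FRMap/ReOrth pair and the tangent-space projection of gradients can be sketched in a few lines of NumPy (shapes and names are illustrative, not a reference implementation of GrNet):

```python
# A minimal NumPy sketch of the FRMap -> ReOrth pattern and of projecting a
# Euclidean gradient onto the horizontal tangent space.
import numpy as np

def frmap(X_in: np.ndarray, W: np.ndarray) -> np.ndarray:
    """FRMap: Y = W @ X_in, with W (d_out x d_in) assumed row-full-rank."""
    return W @ X_in

def reorth(Y: np.ndarray) -> np.ndarray:
    """ReOrth: restore orthonormal columns via thin QR, X_out = Q in St(p, d_out)."""
    Q, R = np.linalg.qr(Y)
    Q = Q * np.sign(np.diag(R))    # fix column signs (optional convention)
    return Q

def tangent_project(U: np.ndarray, G: np.ndarray) -> np.ndarray:
    """Project a Euclidean gradient G onto the horizontal space at U: (I - U U^T) G."""
    return G - U @ (U.T @ G)

# Example forward pass: a 10-dim, p = 3 subspace mapped into a 6-dim ambient space.
rng = np.random.default_rng(1)
X, _ = np.linalg.qr(rng.standard_normal((10, 3)))
W = rng.standard_normal((6, 10))
X_out = reorth(frmap(X, W))
print(X_out.shape, np.allclose(X_out.T @ X_out, np.eye(3)))
```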

ManifoldNet employs weighted Fréchet mean (wFM) operations as intrinsic analogues of convolution and pooling; in this setting, explicit nonlinearities (e.g., ReLU) can be omitted because the wFM layers are strict contractions under the geodesic metric (Chakraborty et al., 2018).

Training: Manifold optimization is performed via Riemannian gradient descent, exponential mapping for retractions, and, when required, re-normalization via QR or SVD.
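
A single Riemannian gradient-descent step with a QR retraction can be sketched as follows (NumPy; the quadratic loss and step size are placeholders chosen purely for illustration):

```python
# A minimal sketch of one Riemannian gradient-descent step on Gr(p, n) with a
# QR-based retraction.
import numpy as np

def riemannian_step(U, egrad, lr=1e-2):
    """One step: project the Euclidean gradient, move, retract back via QR."""
    rgrad = egrad - U @ (U.T @ egrad)      # horizontal-space projection
    Q, _ = np.linalg.qr(U - lr * rgrad)    # QR retraction to orthonormal columns
    return Q

# Example: minimize the subspace loss 0.5 * ||A U||_F^2 over Gr(3, 8).
rng = np.random.default_rng(2)
A = rng.standard_normal((8, 8))
U, _ = np.linalg.qr(rng.standard_normal((8, 3)))
for _ in range(200):
    egrad = A.T @ (A @ U)                  # Euclidean gradient of the loss
    U = riemannian_step(U, egrad, lr=1e-2)
print(0.5 * np.linalg.norm(A @ U) ** 2)    # decreases toward the minimal value
```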

Empirical results: Models such as GrNet and ManifoldNet demonstrate superior or competitive accuracy to Euclidean and SPD-manifold methods in vision, action recognition, and autoencoding tasks, frequently converging faster and with fewer parameters (Huang et al., 2016, Chakraborty et al., 2018).

3. Grassmann Layers and Shallow Network Integration

For shallow architectures or dimensionality reduction, a prevalent pattern is to impose a Grassmann constraint on the input linear projection, $U\in \mathrm{St}(m,k)$ with $U^T U=I_k$, and to learn this subspace jointly with the downstream network parameters.

Formulation:

  • Three-layer architecture: The input $x\in\mathbb{R}^m$ is mapped by $y=U^T x$ (Grassmann projection), followed by a Euclidean two-layer (ReLU) surrogate network $g_\theta$ (Bollinger et al., 2020).
  • Alternating minimization: The empirical loss $L(U,\theta)=\frac{1}{M} \sum_\ell \|f(x_\ell) - g_\theta(U^T x_\ell)\|^2 + \lambda \|\theta\|^2$ is minimized jointly over $U$ and the network weights $\theta$, with $U$ updated by Riemannian gradient and retraction, and $\theta$ by classical SGD or Adam (a training-loop sketch follows this list).
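
The alternating scheme referenced above can be sketched in PyTorch, with $U$ kept on the Stiefel manifold by manual gradient projection and QR retraction and $\theta$ trained by Adam; the target function, layer sizes, and hyperparameters below are illustrative assumptions, not values from the cited work:

```python
# A minimal PyTorch sketch of alternating minimization: U stays on St(m, k) by
# projecting its gradient and retracting via QR; theta is trained with Adam.
import torch

m, k, M = 20, 3, 256
f = lambda x: torch.sin(x @ torch.ones(m, 1))          # placeholder target function
X = torch.randn(M, m)

U = torch.linalg.qr(torch.randn(m, k)).Q               # point on St(m, k)
g = torch.nn.Sequential(torch.nn.Linear(k, 32), torch.nn.ReLU(), torch.nn.Linear(32, 1))
opt = torch.optim.Adam(g.parameters(), lr=1e-3)
lam, lr_U = 1e-4, 1e-2

for step in range(500):
    U.requires_grad_(True)
    pred = g(X @ U)                                     # g_theta(U^T x)
    loss = ((f(X) - pred) ** 2).mean() + lam * sum((p ** 2).sum() for p in g.parameters())
    opt.zero_grad()
    loss.backward()
    opt.step()                                          # theta update (Adam)
    with torch.no_grad():                               # Riemannian update of U
        rgrad = U.grad - U @ (U.T @ U.grad)             # tangent-space projection
        U = torch.linalg.qr(U - lr_U * rgrad).Q         # QR retraction
```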

Theory: Approximation accuracy is linked to active-subspace theory, with provable error bounds in terms of the trailing eigenvalues of the input-output Jacobian covariance (Bollinger et al., 2020).
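
To illustrate the active-subspace connection, the leading eigenvectors of the sampled gradient covariance $C \approx \mathbb{E}[\nabla f\,\nabla f^T]$ provide a natural estimate of (or initialization for) $U$, while the trailing eigenvalue mass controls the approximation error. A minimal sketch, assuming gradient samples of $f$ are available (names are illustrative):

```python
# A minimal sketch: estimate the active subspace of f from sampled gradients.
# The trailing eigenvalues of C indicate how much of f's variation is lost by
# restricting to the leading k-dimensional subspace.
import numpy as np

def active_subspace(grads: np.ndarray, k: int):
    """grads: (num_samples, m) array of gradient samples of f."""
    C = grads.T @ grads / grads.shape[0]          # Monte Carlo estimate of E[grad grad^T]
    eigvals, eigvecs = np.linalg.eigh(C)          # ascending eigenvalues
    idx = np.argsort(eigvals)[::-1]               # reorder to descending
    U = eigvecs[:, idx[:k]]                       # leading k eigenvectors: active subspace
    residual = eigvals[idx[k:]].sum()             # trailing mass -> error bound
    return U, residual

rng = np.random.default_rng(3)
grads = rng.standard_normal((1000, 20)) @ np.diag(np.linspace(2.0, 0.01, 20))
U, res = active_subspace(grads, k=3)
print(U.shape, res)
```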

Numerics: QR- or SVD-based retraction of $U$ is applied after each update. The approach yields strong generalization, particularly in data-scarce regimes in scientific domains such as CFD and aerospace engineering.

4. Gyrovector-Space and Group-Theoretic Grassmann Networks

Group-theoretic generalizations, particularly via gyrovector-space constructions, imbue Grassmannian neural layers with operations analogous to translation, scalar multiplication, and non-Euclidean addition.

Fundamental operations:

  • Gyro-addition ($\oplus_{gr}$): $U \oplus_{gr} V \equiv \exp([P,\, I_{n,p}])\,V$, with $P=\mathrm{Log}_{I_{n,p}}(UU^T)$ and commutator $[A,B]=AB-BA$ (Nguyen et al., 2023).
  • Grassmann batch normalization: Centering with respect to the manifold Fréchet mean $\mu$, $U \mapsto \mu \oplus_{gr} U$.
  • Graph and convolutional Grassmann layers: Node features $U_i\in \mathrm{Gr}(p,n)$ are aggregated using weighted gyro-addition, and message passing aggregates in the tangent space before the exponential map returns the result to the manifold (Nguyen et al., 29 May 2024); a tangent-space aggregation sketch follows this list.
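
The tangent-space aggregation referenced in the last item can be sketched with standard closed-form SVD expressions for the Grassmann exponential and logarithm in the Stiefel representation; this is a hedched NumPy sketch, not code from the cited papers:

```python
# A minimal NumPy sketch of tangent-space aggregation on Gr(p, n): map neighbor
# subspaces to the tangent space at a base point with the Grassmann logarithm,
# average with weights, and map back with the exponential.
import numpy as np

def gr_exp(U, H):
    """Grassmann exponential: follow the geodesic from span(U) with velocity H."""
    W, s, Zt = np.linalg.svd(H, full_matrices=False)
    Y = U @ Zt.T @ np.diag(np.cos(s)) @ Zt + W @ np.diag(np.sin(s)) @ Zt
    Q, _ = np.linalg.qr(Y)                     # re-orthonormalize for stability
    return Q

def gr_log(U, V):
    """Grassmann logarithm: tangent vector at span(U) pointing toward span(V)."""
    X = (V - U @ (U.T @ V)) @ np.linalg.inv(U.T @ V)
    W, s, Zt = np.linalg.svd(X, full_matrices=False)
    return W @ np.diag(np.arctan(s)) @ Zt

def aggregate(U_base, neighbors, weights):
    """Weighted tangent-space mean of neighbor subspaces, mapped back to Gr(p, n)."""
    H = sum(w * gr_log(U_base, V) for w, V in zip(weights, neighbors))
    return gr_exp(U_base, H)

rng = np.random.default_rng(4)
U0, *Vs = (np.linalg.qr(rng.standard_normal((8, 2)))[0] for _ in range(4))
print(aggregate(U0, Vs, weights=[0.5, 0.3, 0.2]).shape)   # (8, 2)
```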

Gradient computation and optimization are carried out using ambient gradients projected to tangent spaces, retraction via exponential mapping or QR, and backpropagation through matrix logarithm and exponential.

Empirical advantage: Models built on these principles (e.g., GrNet, GyroGr, Gr-GCN++) achieve improved performance in human action recognition (HDM05, NTU60, FPHA), node classification (Cora, Pubmed), and knowledge graph embedding tasks (Nguyen et al., 2023, Nguyen et al., 29 May 2024, Huang et al., 2016).

5. Grassmann Algebra, Supersymmetry, and Fermionic Neural Networks

Grassmann-valued neural networks, as defined in the context of supersymmetry and quantum field theory, incorporate variables and weights in $\Lambda(V)$, supporting anticommutativity and nilpotency (Shaska, 26 Jul 2024, Frank et al., 20 Nov 2025).

Construction principles:

  • Super-neuron architecture: Inputs and outputs are $\mathbb{Z}_2$-graded (even/odd), and layers are built from block matrices respecting parity. Block-wise multiplication and sign bookkeeping follow graded algebra rules; biases and activations can be defined at the level of superfunctions (e.g., super-ReLU) (Shaska, 26 Jul 2024); a graded block-layer sketch follows this list.
  • Backpropagation: Gradients and error propagation require graded chain rule with sign tracking, and the loss is extracted from the even (scalar) part of the Grassmann-valued functionals.
  • Supersymmetric and fermionic field theory models: The large-width limit of a Grassmann-valued network exhibits Gaussianity via the Grassmann central limit theorem, reproducing free Dirac field propagators; finite width induces higher-order (e.g., four-fermion) interactions, and correlated weight distributions enable Yukawa couplings (Frank et al., 20 Nov 2025).
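
A minimal sketch of the parity-respecting block structure of a "super" linear layer, keeping the even and odd coefficient vectors separate; in the full graded construction the off-diagonal blocks are themselves Grassmann-odd, whereas here all coefficients are sketched as plain real arrays (names and sizes are illustrative):

```python
# A minimal sketch of a Z2-graded ("super") linear layer: inputs carry an even
# and an odd part, and the weight is a block matrix [[A, B], [C, D]] acting on
# the graded vector. Sign bookkeeping only enters once odd variables are
# multiplied together, which a purely linear layer does not do.
import numpy as np

class SuperLinear:
    def __init__(self, d_even_in, d_odd_in, d_even_out, d_odd_out, rng):
        self.A = rng.standard_normal((d_even_out, d_even_in))   # even -> even
        self.B = rng.standard_normal((d_even_out, d_odd_in))    # odd  -> even
        self.C = rng.standard_normal((d_odd_out, d_even_in))    # even -> odd
        self.D = rng.standard_normal((d_odd_out, d_odd_in))     # odd  -> odd

    def __call__(self, x_even, x_odd):
        y_even = self.A @ x_even + self.B @ x_odd
        y_odd = self.C @ x_even + self.D @ x_odd
        return y_even, y_odd

rng = np.random.default_rng(5)
layer = SuperLinear(4, 3, 2, 2, rng)
y_e, y_o = layer(rng.standard_normal(4), rng.standard_normal(3))
print(y_e.shape, y_o.shape)   # (2,) (2,)
```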

Examples and applications: These constructions admit use in quantum chemistry (modeling antisymmetry of fermions), cohomological/topological data analysis, and machine-learned QFT, providing a natural framework for Pauli-exclusion or fermionic statistics not achievable with classical real-valued models (Shaska, 26 Jul 2024, Frank et al., 20 Nov 2025).

6. Backpropagation, Numerical Techniques, and Implementation

Both manifold and algebraic Grassmann-valued networks require specialized numerical techniques.

Manifold networks:

  • Gradient projection: Compute the Euclidean gradient, then project it onto the horizontal tangent space: $\nabla_R \ell(U) = (I-UU^T)\,\nabla_U \ell$ in Stiefel coordinates (Bendokat et al., 2020, Nguyen et al., 29 May 2024).
  • Exponential/log maps: Forward and backward passes require Riemannian exponential (for retraction/parameter updates) and logarithm (for mapping-between-point computations), implemented analytically or via differentiable SVD-based routines.
  • QR/SVD retractions: QR-based approximations improve computational efficiency in large-scale networks, with complexity $O(np^2)$ for $n \gg p$.

Backpropagation through multistep operations (e.g., wFM, gyro-additions, exp/log) is supported in deep learning frameworks via differentiation of matrix exponentials, logarithms, and SVD operations (Bendokat et al., 2020, Chakraborty et al., 2018, Nguyen et al., 2023).
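
For instance, backpropagating through a QR-based re-orthonormalization needs no custom gradients in modern frameworks, since thin QR is differentiable; a minimal PyTorch sketch (the layer and the alignment loss are illustrative):

```python
# A minimal PyTorch sketch: autograd differentiates through a QR retraction
# (torch.linalg.qr is differentiable), so a layer that re-orthonormalizes its
# output can be trained end to end.
import torch

class ReOrthLayer(torch.nn.Module):
    def __init__(self, d_in, d_out):
        super().__init__()
        self.W = torch.nn.Parameter(torch.randn(d_out, d_in) * 0.1)

    def forward(self, X):
        Y = self.W @ X                      # FRMap
        Q, _ = torch.linalg.qr(Y)           # ReOrth: differentiable thin QR
        return Q

layer = ReOrthLayer(10, 6)
X, _ = torch.linalg.qr(torch.randn(10, 3))  # input subspace representative
target = torch.linalg.qr(torch.randn(6, 3)).Q
Q = layer(X)
loss = 1.0 - (Q * target).sum().abs() / 3   # crude subspace-alignment surrogate
loss.backward()                             # gradients flow through the QR step
print(layer.W.grad.shape)                   # torch.Size([6, 10])
```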

Algebraic networks:

  • Sign management, parity bookkeeping, and extraction of the even loss part are integral.
  • In field-theoretic constructions, explicit Berezin integration and cumulant expansion manage the transition from Grassmann-valued outputs to real-number observables (Frank et al., 20 Nov 2025).
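
Berezin integration is likewise mechanical in the coefficient-dictionary representation sketched in Section 1: integrating over $\theta_i$ extracts, with a sign, the coefficients of monomials containing $\theta_i$. A minimal sketch using the convention $\int d\theta_i\,\theta_i = 1$ with the measure acting from the left (other conventions differ only by signs):

```python
# A minimal sketch of Berezin integration on the dictionary representation of
# Grassmann elements ({frozenset of generator indices: coefficient}).
def berezin(a: dict, i: int) -> dict:
    out = {}
    for mono, c in a.items():
        if i not in mono:
            continue                               # int d(theta_i) 1 = 0
        pos = sorted(mono).index(i)                # anticommute theta_i to the front
        sign = -1.0 if pos % 2 else 1.0
        key = frozenset(mono - {i})
        out[key] = out.get(key, 0.0) + sign * c
    return out

# Example: int d(theta_2) (3 + 5*theta_2 + 7*theta_1*theta_2) = 5 - 7*theta_1
a = {frozenset(): 3.0, frozenset({2}): 5.0, frozenset({1, 2}): 7.0}
print(berezin(a, 2))   # {frozenset(): 5.0, frozenset({1}): -7.0}
```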

7. Applications, Empirical Evidence, and Research Directions

Grassmann-valued networks demonstrate empirical superiority or strong regularization in multiple domains:

  • Vision and graphics: Video-based recognition, subspace tracking, and image-set recognition, where Grassmannian deep networks outperform kernel or tangent-space methods by 6–12% accuracy on manifold-adapted tasks (Huang et al., 2016, Chakraborty et al., 2018).
  • Reduced-order modeling: Networks with Grassmann layers outperform polynomial ridge, Gaussian-process, and LASSO regression in data-scarce settings typical of engineering and physics simulations (Bollinger et al., 2020).
  • Graph-based learning: Message-passing and GCN analogues on Grassmann manifolds (Gr-GCN++) exhibit improved accuracy on standard citation and transportation benchmarks versus Euclidean, hyperbolic, and SPD-manifold architectures (Nguyen et al., 29 May 2024, Nguyen et al., 2023).
  • Physics and quantum chemistry: Grassmann/SUSY-valued neural networks model fermionic statistics, encode antisymmetric patterns, and serve as laboratory QFTs for studying renormalization, non-Gaussian corrections, and SUSY breaking (Frank et al., 20 Nov 2025, Shaska, 26 Jul 2024).

Open directions include computational scaling for large nn or pp, extension to other matrix or group manifolds, rigorous treatment of loss landscapes on curved parameter spaces, and further exploitation of algebraic and graded structures for learning with symmetry or topological constraints (Shaska, 26 Jul 2024, Frank et al., 20 Nov 2025, Bendokat et al., 2020, Nguyen et al., 29 May 2024).

