Siegel Neural Networks
- Siegel neural networks are discriminative architectures defined on Siegel spaces, which generalize SPD matrices and complex-hyperbolic geometry.
- They employ novel formulations for multiclass logistic regression and fully-connected layers with Riemannian optimization, achieving state-of-the-art performance on radar clutter and node classification tasks.
- The design leverages closed-form layer constructions and group symmetries, but faces challenges in parameter efficiency and computational overhead.
Siegel neural networks are a class of discriminative architectures defined over Siegel spaces: Riemannian symmetric spaces (RSS) generalizing both symmetric positive definite (SPD) matrices and complex-hyperbolic geometry. By leveraging the quotient structure and symmetries of Siegel upper half-spaces, these networks enable learning and classification with data that naturally reside on disconnected or highly curved geometric domains. Siegel neural networks introduce new formulations for multiclass logistic regression (MLR) and fully-connected (FC) layers, allowing end-to-end training with Riemannian optimization tools. The approach yields state-of-the-art performance on radar clutter classification and node classification tasks.
1. Geometric Foundation: The Siegel Upper Half-Space
The Siegel upper half-space of complex dimension $m$ is defined as
$\mathbb{SH}_m = \left\{x = u + iv \,\,\Big|\,\, u \in \Sym_m, \, v \in \Sym_m^+ \right\},$
where $\Sym_m$ denotes the real symmetric matrices and $\Sym_m^+$ denotes symmetric positive definite matrices of the same size.
Siegel spaces possess a transitive isometric action by the real symplectic group
$\Sp_{2m} = \left\{\begin{pmatrix} a & b \\ c & d \end{pmatrix}: ab^T = ba^T,\, cd^T = dc^T,\, ad^T - bc^T = I_m \right\},$
through generalized Möbius transformations $\begin{pmatrix} a & b \\ c & d \end{pmatrix} \cdot x = (ax + b)(cx + d)^{-1}$. The stabilizer of the basepoint $iI_m$ is $\SpO_{2m} = \Sp_{2m} \cap O_{2m}$, making the symmetric space realization explicit: $\mathbb{SH}_m \cong \Sp_{2m} / \SpO_{2m},$ with rank $m$ and nonpositive sectional curvature.
A canonical $\Sp_{2m}$-invariant metric on $\mathbb{SH}_m$ is the Siegel metric: for $x = u + iv \in \mathbb{SH}_m$, $ds^2 = \operatorname{tr}\!\left(v^{-1}\, dx \; v^{-1}\, d\bar{x}\right)$.
On any noncompact RSS $G/K$, one defines a vector-valued (Weyl-chamber-valued) distance as the $G$-congruence-invariant translation vector in a fixed maximal flat. This metric structure underpins the network constructions.
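To make these definitions concrete, here is a minimal NumPy sketch (with illustrative helper names, not from any released implementation) that samples a point of $\mathbb{SH}_m$, checks the symplectic condition, and applies the generalized Möbius action:

```python
import numpy as np

def random_siegel_point(m, rng):
    """Sample x = u + i v with u symmetric and v symmetric positive definite."""
    a = rng.standard_normal((m, m))
    u = (a + a.T) / 2                      # symmetric real part
    b = rng.standard_normal((m, m))
    v = b @ b.T + m * np.eye(m)            # SPD imaginary part
    return u + 1j * v

def is_symplectic(g, tol=1e-8):
    """Check membership in Sp_2m via g^T J g = J for the standard symplectic form J."""
    m = g.shape[0] // 2
    J = np.block([[np.zeros((m, m)), np.eye(m)],
                  [-np.eye(m), np.zeros((m, m))]])
    return np.allclose(g.T @ J @ g, J, atol=tol)

def mobius(g, x):
    """Generalized Moebius action (a x + b)(c x + d)^{-1} of g on SH_m."""
    m = x.shape[0]
    a, b, c, d = g[:m, :m], g[:m, m:], g[m:, :m], g[m:, m:]
    return (a @ x + b) @ np.linalg.inv(c @ x + d)

rng = np.random.default_rng(0)
x = random_siegel_point(3, rng)
u_shift = np.real(random_siegel_point(3, rng))            # any symmetric matrix
g = np.block([[np.eye(3), u_shift], [np.zeros((3, 3)), np.eye(3)]])
assert is_symplectic(g)
assert np.allclose(mobius(g, x), x + u_shift)             # translation of the real part
```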
2. Layer Construction on $\mathbb{SH}_m$
2.1 Multiclass Logistic Regression (MLR)
In Euclidean settings, MLR relies on linear scoring of the form $p(y = k \mid x) \propto \exp\!\left(\langle a_k, x\rangle + b_k\right)$; this score can be interpreted as proportional to the exponential of the signed distance from $x$ to a class hyperplane.
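Concretely, with the hyperplane $H_{a_k,p} = \{x : \langle a_k, x - p\rangle = 0\}$ and bias $b_k = -\langle a_k, p\rangle$, the Euclidean logit factors as a signed distance (a standard identity, restated here for context):
$\langle a_k, x\rangle + b_k = \langle a_k, x - p\rangle = \operatorname{sign}\!\left(\langle a_k, x - p\rangle\right)\, \lVert a_k\rVert \, d\!\left(x, H_{a_k,p}\right),$
since $d(x, H_{a_k,p}) = |\langle a_k, x - p\rangle| / \lVert a_k\rVert$; it is this distance-based form that is generalized to $\mathbb{SH}_m$.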
For $\mathbb{SH}_m$, two MLR constructions are defined:
(i) Quotient-Structure MLR (QMLR)
A class hyperplane is parameterized by two points of $\mathbb{SH}_m$. The signed distance from an input to this hyperplane (Thm 2.1) admits a closed form expressed through the map
$\phi: u + iv \mapsto \begin{bmatrix} v^{1/2} & u v^{-1/2} \\ 0 & v^{-1/2} \end{bmatrix} \in \Sp_{2m},$
which sends a point of $\mathbb{SH}_m$ to a symplectic representative (so that $\phi(x)$ maps the basepoint $iI_m$ to $x$). Class scores are these signed distances, and class probabilities are obtained from them by a softmax.
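Continuing the NumPy sketch above (reusing `mobius`, `is_symplectic`, and the sampled point `x`), the map $\phi$ can be checked to produce a symplectic matrix that moves the basepoint $iI_m$ to $x$; again, this is an illustration rather than the authors' code:

```python
from scipy.linalg import sqrtm

def phi(x):
    """phi(u + i v) = [[v^{1/2}, u v^{-1/2}], [0, v^{-1/2}]], a symplectic representative of x."""
    u, v = np.real(x), np.imag(x)
    v_half = np.real(sqrtm(v))             # principal square root of the SPD part
    v_inv_half = np.linalg.inv(v_half)
    m = u.shape[0]
    return np.block([[v_half, u @ v_inv_half],
                     [np.zeros((m, m)), v_inv_half]])

g = phi(x)
assert is_symplectic(g)
assert np.allclose(mobius(g, 1j * np.eye(3)), x)   # phi(x) maps the basepoint i*I_m to x
```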
(ii) Vector-Valued-Distance MLR (VMLR)
Fix a direction in the Weyl chamber and a basepoint in $\mathbb{SH}_m$. The class score is built from the vector-valued distance between the input and the basepoint, paired with the chosen direction, and is controlled via the distance upper bound of Prop 2.7.
In both cases, training minimizes the standard cross-entropy loss $\mathcal{L} = -\sum_i \log p\!\left(y_i \mid x_i\right)$ over the resulting class probabilities.
2.2 Fully-Connected (FC) Layers
Two FC designs are given for $\mathbb{SH}_m$:
(i) Affine via Group Action (AFC)
Let the weights be $(a, b)$ with $a\in\Sym_m$, $b\in\Sym_m^+$; the layer acts on the input through the symplectic representative $\phi(a + ib)$, i.e. $x \mapsto \phi(a + ib)\cdot x = b^{1/2}\, x\, b^{1/2} + a$.
(ii) Dimensionality-Reducing FC (DFC)
Let $w$ lie on the Stiefel manifold of $m_1 \times m_2$ column-orthonormal matrices and $a\in \Sym_{m_2}$; the layer maps $\mathbb{SH}_{m_1}$ to the lower-dimensional $\mathbb{SH}_{m_2}$.
Pointwise nonlinearities, such as an SPD-valued ReLU acting on the imaginary part, are applied after these mappings.
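A sketch of how these layers can be realized, assuming the group-action form for AFC ($x \mapsto b^{1/2} x b^{1/2} + a$), a congruence-type map for DFC, and an eigenvalue-clamping ReLU on the SPD part; the precise formulas and names are assumptions for illustration, not the paper's reference implementation (`random_siegel_point` is reused from the earlier sketch):

```python
import numpy as np
from scipy.linalg import sqrtm

def afc(x, a, b):
    """Affine FC via the group action of phi(a + i b): x -> b^{1/2} x b^{1/2} + a."""
    b_half = np.real(sqrtm(b))
    return b_half @ x @ b_half + a

def dfc(x, w, a):
    """Dimension-reducing FC (assumed congruence form): x in SH_{m1} -> w^T x w + a in SH_{m2}."""
    return w.T @ x @ w + a

def spd_relu(x, eps=1e-4):
    """Pointwise nonlinearity: clamp the eigenvalues of the imaginary (SPD) part from below."""
    u, v = np.real(x), np.imag(x)
    eigval, eigvec = np.linalg.eigh(v)
    v_rect = eigvec @ np.diag(np.maximum(eigval, eps)) @ eigvec.T
    return u + 1j * v_rect

# Example: reduce SH_4 to SH_2, then apply the nonlinearity.
rng = np.random.default_rng(1)
w, _ = np.linalg.qr(rng.standard_normal((4, 2)))   # column-orthonormal (Stiefel) weight
x4 = random_siegel_point(4, rng)                   # from the earlier sketch
y = spd_relu(dfc(x4, w, np.zeros((2, 2))))
```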
3. Training Procedures and Riemannian Optimization
3.1 Riemannian Backpropagation
Parameters may reside in vector spaces or on manifolds:
- For $b \in \Sym_m^+$ (SPD): Project gradients onto $\Sym_m$ and update via exponential retraction:
$b \leftarrow \Exp_b\left(-\eta \nabla_b \mathcal{L}\right),\quad \Exp_b(H) = b^{1/2} \exp(b^{-1/2} H b^{-1/2}) b^{1/2}.$
- For Stiefel parameters $w$: take a gradient step in the ambient Euclidean space, then re-orthonormalize via QR (both this and the SPD retraction are sketched after this list).
- For parameters that are points of $\mathbb{SH}_m$: compute a tangent gradient (via Jacobians of the distance) and retract back to the manifold (via the group action or a geodesic).
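A minimal NumPy sketch of the SPD and Stiefel updates just described, following the $\Exp_b$ formula above literally; the helper names are illustrative:

```python
import numpy as np
from scipy.linalg import expm, sqrtm

def spd_step(b, grad, lr):
    """b <- Exp_b(-lr * sym(grad)) with Exp_b(H) = b^{1/2} expm(b^{-1/2} H b^{-1/2}) b^{1/2}."""
    H = -lr * (grad + grad.T) / 2                  # project the gradient onto Sym_m
    b_half = np.real(sqrtm(b))
    b_inv_half = np.linalg.inv(b_half)
    return b_half @ expm(b_inv_half @ H @ b_inv_half) @ b_half

def stiefel_step(w, grad, lr):
    """Euclidean gradient step followed by QR re-orthonormalization (a first-order retraction)."""
    q, r = np.linalg.qr(w - lr * grad)
    return q * np.sign(np.diag(r))                 # fix the sign ambiguity of QR
```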
Standard Riemannian optimizers, such as Riemannian SGD or Riemannian Adam (e.g., from the Geoopt library), can be used directly with conventional hyperparameters.
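For reference, a minimal Geoopt setup along these lines might look as follows, assuming a Geoopt version that provides `SymmetricPositiveDefinite`, `Stiefel`, `ManifoldParameter`, and `RiemannianAdam`; the parameter shapes are placeholders:

```python
import torch
import geoopt

m = 4
# Manifold-valued parameters are registered together with their manifolds.
b = geoopt.ManifoldParameter(torch.eye(m), manifold=geoopt.SymmetricPositiveDefinite())
w = geoopt.ManifoldParameter(torch.linalg.qr(torch.randn(m, 2))[0], manifold=geoopt.Stiefel())
a = torch.nn.Parameter(torch.zeros(m, m))   # unconstrained; symmetrize as (a + a.T) / 2 in the forward pass

opt = geoopt.optim.RiemannianAdam([b, w, a], lr=1e-3)
# In the training loop: loss.backward(); opt.step(); opt.zero_grad()
# The optimizer applies the appropriate retraction to each manifold parameter automatically.
```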
3.2 Regularization and Projection
No additional regularization is required beyond maintaining parameter feasibility via manifold-valued retractions. Optional penalties on tangent-space parameters, such as spectral- or Frobenius-norm penalties, can control model complexity.
4. Empirical Performance and Evaluation
4.1 Applications and Experimental Setup
Radar clutter classification: Uses simulated autoregressive (AR) Gaussian time series of order $q-1$, summarized as $(\tilde p_0, z_1, \ldots, z_{q-1}) \in \Sym_m^+ \times \mathbb{SH}_m^{q-1}$. Four datasets with parameters (3,2), (4,2), (5,2), (6,2) and varying sample sizes. Network: one FC layer (AFC or DFC) followed by QMLR. Training: Riemannian Adam, learning rate 1e-3, batch size 32, 80 epochs.
Node classification: Datasets (Glass, Iris, Zoo from UCI) are treated as small graphs. All-pairs “ground-truth” cosine distances are embedded into $\mathbb{SH}_m$ by minimizing a distance-distortion objective.
Network: AFC followed by QMLR or VMLR. Training: Riemannian Adam, learning rate 1e-3, 100 epochs.
4.2 Key Quantitative Results
Table 1. Radar clutter classification accuracy (%), mean over 10 runs
| Method | Dataset 1 | Dataset 2 | Dataset 3 | Dataset 4 |
|---|---|---|---|---|
| kNN (Kähler dist.) | 76.22 | 93.00 | 76.75 | 73.20 |
| SPDNet [17] | 63.44 | 41.50 | 45.88 | 66.80 |
| SiegelNet–AFC–QMLR (Ours) | 80.94 | 96.50 | 91.00 | 85.60 |
Table 2. Node classification accuracy (%)
| Method | Glass | Iris | Zoo |
|---|---|---|---|
| kNN | 29.65 | 31.66 | 33.33 |
| LogEig [21] | 41.54 | 34.33 | 51.04 |
| SiegelNet–BFC–BMLR [25] | 41.12 | 37.26 | 48.12 |
| SiegelNet–AFC–QMLR (Ours) | 45.79 | 38.20 | 53.37 |
Siegel neural networks demonstrate superior performance across all datasets compared to SPD-based and kNN baselines.
5. Analysis, Limitations, and Prospects
5.1 Advantages
- Expressivity: Siegel spaces naturally generalize SPD and complex-hyperbolic settings, enabling the representation of intricate correlations and dependencies.
- Closed-form FC layers: The symplectic group action allows explicit formulae for affine mappings within the space.
- Empirical results: State-of-the-art accuracy on radar signal and node classification benchmarks.
5.2 Limitations
- Parameter efficiency: QMLR structure requires two points per class, effectively doubling the parameter count relative to Euclidean and SPD analogues.
- Computational overhead: Riemannian distance calculations involve eigen-decompositions and matrix logarithms. Retractions and Cayley transforms further increase computation.
- Curvature restriction: Only nonpositive curvature is supported; thus, structures with intrinsic positive curvature are not accommodated.
- Architectural scope: Convolutional, batch-normalization, pooling, and attention layers on $\mathbb{SH}_m$ have not been developed.
5.3 Potential Extensions
- Compact MLR: Design of more parameter-efficient Siegel hyperplane representations.
- Convolutional layers: Definition of local Siegel-valued filters via horospheres or $\Sp_{2m}$-equivariant constructions.
- Horospherical nonlinearities: Proposals to mimic ReLU via projections onto convex Weyl chambers.
- Generative models: Development of Riemannian normalizing flows on $\mathbb{SH}_m$ for generative modeling.
- Hybrid manifolds: Integration of Siegel spaces with other curvature components in product manifold networks.
Siegel neural networks formalize geometric deep learning within a rich family of symmetric spaces, providing theoretical and practical advances for data with complex intrinsic geometry.