Siegel Neural Networks

Updated 14 November 2025
  • Siegel neural networks are discriminative architectures defined on Siegel spaces, which generalize SPD matrices and complex-hyperbolic geometry.
  • They employ novel formulations for multiclass logistic regression and fully-connected layers with Riemannian optimization, achieving state-of-the-art performance on radar clutter and node classification tasks.
  • The design leverages closed-form layer constructions and group symmetries, but faces challenges in parameter efficiency and computational overhead.

Siegel neural networks are a class of discriminative architectures defined over Siegel spaces: Riemannian symmetric spaces (RSS) generalizing both symmetric positive definite (SPD) matrices and complex-hyperbolic geometry. By leveraging the quotient structure and symmetries of the Siegel upper half-space $\mathbb{SH}_m$, these networks enable learning and classification with data that naturally reside on curved, non-Euclidean geometric domains. Siegel neural networks introduce new formulations for multiclass logistic regression (MLR) and fully-connected (FC) layers, allowing end-to-end training with Riemannian optimization tools. The approach yields state-of-the-art performance on radar clutter classification and node classification tasks.

1. Geometric Foundation: The Siegel Upper Half-Space

The Siegel upper half-space of degree $m$ is defined as

$\mathbb{SH}_m = \left\{x = u + iv \,\,\Big|\,\, u \in \Sym_m, \, v \in \Sym_m^+ \right\},$

where $\Sym_m$ denotes the $m \times m$ real symmetric matrices and $\Sym_m^+$ denotes symmetric positive definite matrices of the same size.
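As a concrete illustration, these membership conditions translate directly into code. The following is a minimal NumPy sketch (the function name is illustrative, not from the paper) that checks whether a complex matrix $x = u + iv$ lies in $\mathbb{SH}_m$:

```python
import numpy as np

def in_siegel_upper_half_space(x: np.ndarray, tol: float = 1e-10) -> bool:
    """Check that x = u + iv has symmetric u and symmetric positive definite v."""
    u, v = x.real, x.imag
    if not np.allclose(u, u.T, atol=tol) or not np.allclose(v, v.T, atol=tol):
        return False
    return bool(np.linalg.eigvalsh(v).min() > tol)  # v must be positive definite

# Example: build a point of SH_3 from its defining parts.
rng = np.random.default_rng(0)
g = rng.standard_normal((3, 3))
u = (g + g.T) / 2          # symmetric real part
v = g @ g.T + np.eye(3)    # SPD imaginary part
assert in_siegel_upper_half_space(u + 1j * v)
```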

Siegel spaces possess a transitive isometric action by the real symplectic group

$\Sp_{2m} = \left\{\begin{pmatrix} a & b \\ c & d \end{pmatrix}: ab^T = ba^T,\, cd^T = dc^T,\, ad^T - bc^T = I_m \right\},$

through generalized Möbius transformations: $s = \begin{pmatrix} a & b \\ c & d \end{pmatrix}: \; x \mapsto (a x + b)(c x + d)^{-1}.$ The stabilizer of $iI_m$ is $\SpO_{2m} = \Sp_{2m} \cap O_{2m}$, making the symmetric space realization explicit: $\mathbb{SH}_m \cong \Sp_{2m} / \SpO_{2m}$, with rank $m$ and nonpositive sectional curvature.
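A minimal sketch of this action (assuming the blocks $a, b, c, d$ already satisfy the symplectic constraints above; names are illustrative):

```python
import numpy as np
from scipy.linalg import sqrtm

def mobius_action(a, b, c, d, x):
    """Apply s = [[a, b], [c, d]] in Sp_{2m} to x in SH_m: x -> (a x + b)(c x + d)^{-1}."""
    return (a @ x + b) @ np.linalg.inv(c @ x + d)

# Sanity check: the upper-triangular element [[b0^{1/2}, a0 b0^{-1/2}], [0, b0^{-1/2}]]
# acts as the affine map x -> b0^{1/2} x b0^{1/2} + a0 (cf. the AFC layer in Section 2.2).
m = 3
rng = np.random.default_rng(1)
g = rng.standard_normal((m, m))
a0 = (g + g.T) / 2                    # symmetric "translation"
b0 = g @ g.T + np.eye(m)              # SPD "dilation"
b0_h = np.real(sqrtm(b0))
x = 0.1 * a0 + 1j * np.eye(m)         # a point of SH_m
y = mobius_action(b0_h, a0 @ np.linalg.inv(b0_h), np.zeros((m, m)), np.linalg.inv(b0_h), x)
assert np.allclose(y, b0_h @ x @ b0_h + a0)
```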

A canonical $G$-invariant metric on $\mathbb{SH}_m$ is $ds^2|_{x = u + iv} = 2\,\mathrm{Tr}\left( v^{-1}\, dx\, v^{-1}\, d\overline{x} \right)$. For $x(t) = u(t) + i v(t)$,

$\| \dot{x} \|_x^2 = 2\,\mathrm{Tr}\left( v^{-1} \dot{u}\, v^{-1} \dot{u} + v^{-1} \dot{v}\, v^{-1} \dot{v} \right).$
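As a small worked illustration (a sketch, not code from the paper), the squared norm of a tangent vector $\dot u + i \dot v$ at $x = u + iv$ can be evaluated directly from this formula:

```python
import numpy as np

def siegel_tangent_sq_norm(v, du, dv):
    """|| du + i dv ||_x^2 = 2 Tr(v^{-1} du v^{-1} du + v^{-1} dv v^{-1} dv) at x = u + iv."""
    v_inv = np.linalg.inv(v)
    return 2.0 * float(np.trace(v_inv @ du @ v_inv @ du + v_inv @ dv @ v_inv @ dv))
```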

On any noncompact RSS $X = G/K$, one defines a vector-valued (Weyl-chamber-valued) distance $d_\Delta(x, y) \in \Delta$ as the $G$-congruence-invariant translation in a fixed maximal flat. This metric structure underpins the network constructions.

2. Layer Construction on $\mathbb{SH}_m$

2.1 Multiclass Logistic Regression (MLR)

In Euclidean settings, MLR relies on linear scoring: $p(y=j|x) = \frac{\exp(\langle a_j, x \rangle - b_j)}{\sum_{k=1}^M \exp(\langle a_k, x \rangle - b_k)}.$ This is interpreted as proportional to the exponential of the signed distance from $x$ to a class hyperplane.

For $\mathbb{SH}_m$, two MLR constructions are defined:

(i) Quotient-Structure MLR (QMLR)

A class hyperplane $\mathcal{H}_{a_j, p_j}$ is parameterized by points $a_j, p_j \in \mathbb{SH}_m$. The distance to it (Thm 2.1) is

$\bar d\left( x, \mathcal{H}_{a_j, p_j} \right) = \frac{ \big| \big\langle \log\!\big(\phi(p_j)^{-1} \phi(x) \phi(x)^T \phi(p_j)^{-T}\big),\, \log\!\big(\phi(a_j)\phi(a_j)^T\big) \big\rangle \big| }{ \big\| \log\!\big(\phi(a_j) \phi(a_j)^T\big) \big\| },$

where

$\phi(\cdot): u + iv \mapsto \begin{bmatrix} v^{1/2} & u v^{-1/2} \\ 0 & v^{-1/2} \end{bmatrix} \in \Sp_{2m}.$

Class scores and probabilities are then: $s_j(x) = \mathrm{sign}\,\langle \ominus p_j \oplus x,\, a_j \rangle \cdot \big\| \log\big(\phi(a_j)\phi(a_j)^T\big) \big\| \cdot \bar d(x, \mathcal H_{a_j, p_j}),$

$p(y=j|x) \propto \exp(s_j(x)).$
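The unsigned distance above is fully specified by the formula, and the sketch below evaluates it with SciPy matrix functions (helper names are illustrative; the sign factor, which involves the gyro-translation $\ominus p_j \oplus x$ not spelled out in this excerpt, is omitted here):

```python
import numpy as np
from scipy.linalg import sqrtm, logm

def phi(x):
    """phi(u + iv) = [[v^{1/2}, u v^{-1/2}], [0, v^{-1/2}]], an element of Sp_{2m}."""
    u, v = x.real, x.imag
    v_h = np.real(sqrtm(v))
    v_ih = np.linalg.inv(v_h)
    m = u.shape[0]
    return np.block([[v_h, u @ v_ih], [np.zeros((m, m)), v_ih]])

def qmlr_distance(x, a_j, p_j):
    """Unsigned distance d-bar(x, H_{a_j, p_j}) from Thm 2.1 (illustrative sketch)."""
    P, A, X = phi(p_j), phi(a_j), phi(x)
    P_inv = np.linalg.inv(P)
    G = np.real(logm(P_inv @ X @ X.T @ P_inv.T))   # log of an SPD congruence term
    H = np.real(logm(A @ A.T))                     # hyperplane "direction" term
    return abs(np.sum(G * H)) / np.linalg.norm(H)  # Frobenius inner product and norm
```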

(ii) Vector-Valued-Distance MLR (VMLR)

Fix a direction $a_j \in \Delta$ (Weyl chamber) and a basepoint $p_j \in \mathbb{SH}_m$. Define

$\mathcal H_{\xi_j, p_j} = \{ x : \langle d_\Delta(x, p_j), a_j \rangle = 0 \}.$

The distance upper bound (Prop 2.7) is

$\bar d(x, \mathcal{H}_{\xi_j, p_j}) \leq \langle d_\Delta(x, p_j), a_j \rangle.$

Set

$s_j(x) = \pm\,\langle d_\Delta(x, p_j), a_j \rangle; \quad p(y=j|x) \propto \exp(s_j(x)).$

In both cases, the cross-entropy loss is

$\mathcal{L} = -\frac{1}{N} \sum_{i=1}^N \log p(y_i | x_i).$
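In practice the scores from either head feed a standard softmax cross-entropy; a minimal PyTorch sketch (the `scores` tensor here is a stand-in for the output of one of the MLR heads above):

```python
import torch
import torch.nn.functional as F

scores = torch.randn(8, 4, requires_grad=True)   # (N, M) stand-in for s_j(x_i)
labels = torch.randint(0, 4, (8,))               # (N,) integer class labels
loss = F.cross_entropy(scores, labels)           # = -(1/N) * sum_i log softmax(s(x_i))[y_i]
loss.backward()
```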

2.2 Fully-Connected (FC) Layers

Two FC designs are given for $\mathbb{SH}_m$:

(i) Affine via Group Action (AFC)

Let weights be $(a, b)$ with $a \in \Sym_m$, $b \in \Sym_m^+$: $\phi(a+ib) = \begin{pmatrix} b^{1/2} & a b^{-1/2} \\ 0 & b^{-1/2} \end{pmatrix},$

$x = u + iv \;\mapsto\; t = (b^{1/2} u\, b^{1/2} + a) + i\,(b^{1/2} v\, b^{1/2}).$
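A minimal PyTorch sketch of this map (illustrative names; here $b$ is passed as a raw SPD tensor, whereas in training it would be a manifold-constrained parameter as in Section 3):

```python
import torch

def spd_sqrt(b):
    """Matrix square root of an SPD matrix via eigendecomposition."""
    w, q = torch.linalg.eigh(b)
    return q @ torch.diag(w.clamp_min(1e-12).sqrt()) @ q.T

def afc_layer(u, v, a, b):
    """AFC: (u, v) -> (b^{1/2} u b^{1/2} + a, b^{1/2} v b^{1/2}), with a symmetric and b SPD."""
    bh = spd_sqrt(b)
    return bh @ u @ bh + a, bh @ v @ bh
```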

(ii) Dimensionality-Reducing FC (DFC)

Let $b \in \mathrm{St}_{m, m_2}$ (Stiefel), $a \in \Sym_{m_2}$: $t = (b^T u\, b + a) + i\,(b^T v\, b) \in \mathbb{SH}_{m_2}.$

Pointwise nonlinearities, such as an SPD-valued ReLU on the imaginary part, follow these mappings.
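A sketch of the dimension-reducing map together with one possible SPD-valued nonlinearity (eigenvalue thresholding in the spirit of SPDNet's ReEig; the exact nonlinearity used by the paper is not specified in this excerpt):

```python
import torch

def dfc_layer(u, v, a, b):
    """DFC: (u, v) in SH_m -> (b^T u b + a, b^T v b) in SH_{m2}; b in St_{m, m2}, a symmetric."""
    return b.T @ u @ b + a, b.T @ v @ b

def spd_relu(v, eps: float = 1e-4):
    """One possible SPD-valued 'ReLU' on the imaginary part: clamp eigenvalues from below."""
    w, q = torch.linalg.eigh(v)
    return q @ torch.diag(w.clamp_min(eps)) @ q.T
```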

3. Training Procedures and Riemannian Optimization

3.1 Riemannian Backpropagation

Parameters may reside in vector spaces or on manifolds:

  • For $b \in \Sym_m^+$ (SPD): Project gradients onto $\Sym_m$ and update via exponential retraction (a code sketch follows this list):

$b \leftarrow \Exp_b\left(-\eta \nabla_b \mathcal{L}\right),\quad \Exp_b(H) = b^{1/2} \exp(b^{-1/2} H b^{-1/2}) b^{1/2}.$

  • For Stiefel $b \in \mathrm{St}_{m, m_2}$: gradient step in $\mathbb{R}^{m \times m_2}$, then QR re-orthonormalization.
  • For $\mathbb{SH}_m$ points: compute a tangent gradient (via Jacobians of the distance) and retract to the manifold (via the group action or a geodesic).
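A minimal NumPy sketch of the SPD update above (illustrative names; the Euclidean gradient is assumed to come from autodiff, and depending on the chosen metric the Riemannian gradient may additionally involve conjugation by $b$):

```python
import numpy as np
from scipy.linalg import sqrtm, expm

def spd_exp_retraction(b, h):
    """Exp_b(H) = b^{1/2} exp(b^{-1/2} H b^{-1/2}) b^{1/2}, with H symmetric."""
    b_h = np.real(sqrtm(b))
    b_ih = np.linalg.inv(b_h)
    return b_h @ expm(b_ih @ h @ b_ih) @ b_h

def spd_step(b, euclidean_grad, lr):
    """Project the gradient onto symmetric matrices, then retract back to the SPD cone."""
    sym_grad = (euclidean_grad + euclidean_grad.T) / 2
    return spd_exp_retraction(b, -lr * sym_grad)
```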

Standard Riemannian optimizers, such as Riemannian SGD or Riemannian Adam (e.g., Geoopt), can be directly utilized with conventional hyperparameters.
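A minimal Geoopt sketch of this setup (assumes the `geoopt` package; the objective is a placeholder, not the paper's loss), showing manifold-constrained parameters driven by Riemannian Adam:

```python
import torch
import geoopt

m = 4
spd, stiefel = geoopt.SymmetricPositiveDefinite(), geoopt.Stiefel()
b = geoopt.ManifoldParameter(torch.eye(m), manifold=spd)                   # SPD weight
b_red = geoopt.ManifoldParameter(stiefel.random(m, 2), manifold=stiefel)   # Stiefel weight
a = torch.nn.Parameter(torch.zeros(m, m))                                  # Euclidean weight

opt = geoopt.optim.RiemannianAdam([b, b_red, a], lr=1e-3)

for _ in range(10):
    opt.zero_grad()
    loss = (b - 2 * torch.eye(m)).pow(2).sum() + b_red.sum() + a.pow(2).sum()  # placeholder
    loss.backward()
    opt.step()   # Riemannian step: convert the gradient and retract onto each manifold
```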

3.2 Regularization and Projection

No additional regularization is required beyond maintaining parameter feasibility via manifold-valued retractions. Optional penalties on tangent-space parameters, such as the spectral or Frobenius norm ($\| a \|^2_F$, $\| a_j \|^2$), can control model complexity.

4. Empirical Performance and Evaluation

4.1 Applications and Experimental Setup

Radar clutter classification: Uses simulated autoregressive (AR) Gaussian time series in $\mathbb{C}^m$ (order $q$), summarized as $(\tilde p_0, z_1, \ldots, z_{q-1}) \in \Sym_m^+ \times \mathbb{SH}_m^{q-1}$. Four datasets with $(m, q) = (3,2), (4,2), (5,2), (6,2)$ and varying sample sizes. Network: one FC (AFC or DFC) layer mapping to $\mathbb{SH}_m$, followed by QMLR. Training: Riemannian Adam, learning rate 1e-3, batch size 32, 80 epochs.

Node classification: The datasets (Glass, Iris, Zoo from UCI) are treated as small graphs. All-pairs “ground-truth” cosine distances are embedded into $\mathbb{SH}_6$ by minimizing

$\mathcal{L}_{\text{embed}} = \sum_{ij} \left| \left(\frac{d_{\mathbb{SH}}(x_i, x_j)}{d_G(i, j)}\right)^2 - 1 \right|.$

Network: AFC $\to$ QMLR or VMLR. Training: Riemannian Adam, learning rate 1e-3, 100 epochs.
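A sketch of this stress-style embedding objective (the Siegel distance `d_siegel` is a hypothetical placeholder, since its closed form is not reproduced in this excerpt; `d_graph` holds the precomputed target distances):

```python
import torch

def embedding_loss(points, d_graph, d_siegel):
    """L_embed = sum over pairs of | (d_SH(x_i, x_j) / d_G(i, j))^2 - 1 |."""
    loss = torch.zeros(())
    n = len(points)
    for i in range(n):
        for j in range(i + 1, n):  # unordered pairs, i != j
            ratio = d_siegel(points[i], points[j]) / d_graph[i, j]
            loss = loss + (ratio ** 2 - 1).abs()
    return loss
```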

4.2 Key Quantitative Results

Table 1. Radar clutter classification accuracy (%), mean over 10 runs

Method                       Dataset 1   Dataset 2   Dataset 3   Dataset 4
kNN (Kähler dist.)           76.22       93.00       76.75       73.20
SPDNet [17]                  63.44       41.50       45.88       66.80
SiegelNet–AFC–QMLR (Ours)    80.94       96.50       91.00       85.60

Table 2. Node classification accuracy (%)

Method                       Glass   Iris    Zoo
kNN                          29.65   31.66   33.33
LogEig [21]                  41.54   34.33   51.04
SiegelNet–BFC–BMLR [25]      41.12   37.26   48.12
SiegelNet–AFC–QMLR (Ours)    45.79   38.20   53.37

Siegel neural networks demonstrate superior performance across all datasets compared to SPD-based and kNN baselines.

5. Analysis, Limitations, and Prospects

5.1 Advantages

  • Expressivity: Siegel spaces naturally generalize SPD and complex-hyperbolic settings, enabling the representation of intricate correlations and dependencies.
  • Closed-form FC layers: The symplectic group action allows explicit formulae for affine mappings within the space.
  • Empirical results: State-of-the-art accuracy on radar signal and node classification benchmarks.

5.2 Limitations

  • Parameter efficiency: QMLR structure requires two points per class, effectively doubling the parameter count relative to Euclidean and SPD analogues.
  • Computational overhead: Riemannian distance calculations involve eigen-decompositions and matrix logarithms. Retractions and Cayley transforms further increase computation.
  • Curvature restriction: Only nonpositive curvature is supported; thus, structures with intrinsic positive curvature are not accommodated.
  • Architectural scope: Convolutional, batch-normalization, pooling, and attention layers on $\mathbb{SH}_m$ have not been developed.

5.3 Potential Extensions

  • Compact MLR: Design of more parameter-efficient Siegel hyperplane representations.
  • Convolutional layers: Definition of local Siegel-valued filters via horospheres or $K$-equivariant constructions.
  • Horospherical nonlinearities: Proposals to mimic ReLU via projections onto convex Weyl chambers.
  • Generative models: Development of Riemannian normalizing flows on $\mathbb{SH}_m$ for generative modeling.
  • Hybrid manifolds: Integration of Siegel spaces with other curvature components in product manifold networks.

Siegel neural networks formalize geometric deep learning within a rich family of symmetric spaces, providing theoretical and practical advances for data with complex intrinsic geometry.
