ResNet18-SPD: Integrating SPD Geometry
- The paper introduces an end-to-end ResNet18-SPD pipeline that replaces standard pooling with SPD matrix transformations to enhance structured data representations.
- Key methodology leverages Gaussian RBF kernel aggregation and vectorization to project convolutional features onto the SPD manifold with robust normalization.
- SPD regularization via selective projection decay preserves the pre-trained inductive bias, thereby improving out-of-distribution accuracy and stability.
ResNet18-SPD refers to the class of architectures and methods that combine the ResNet-18 backbone with specialized mechanisms for learning with Symmetric Positive Definite (SPD) matrices or for leveraging Selective Projection Decay (SPD) regularization. Two primary research threads anchor this designation: (1) geometric deep learning frameworks that treat SPD matrices as fundamental representation objects, and (2) optimization strategies that employ selective projection decay to constrain fine-tuning drift in foundation models, including in ResNet-18.
1. Architectural Foundations: SPD Matrix Networks and ResNet-18
Networks operating on SPD matrices exploit the structure of the manifold $\mathcal{S}^n_{++}$, the set of $n \times n$ real symmetric positive definite matrices. Such architectures replace traditional neural operations with layers that respect and propagate Riemannian geometry, resulting in improved representations, especially for structured data modalities (e.g., covariance features).
The ResNet-18 architecture is a canonical residual network characterized by four convolutional stages (conv2_x through conv5_x), designed for deep feature learning via skip connections. The integration of SPD layers into ResNet-18 results in a “ResNet18-SPD” model, in which convolutional features are aggregated and projected into the SPD manifold for subsequent transformation and vectorization (Gao et al., 2017). This paradigm replaces the standard global average pooling and fully connected head with an SPD-centric pipeline.
2. SPD Manifold Layers: Generation, Transformation, and Vectorization
The construction of ResNet18-SPD relies on a sequence of three key layers after the ResNet-18 feature extractor:
- Nonlinear Kernel Aggregation Layer: Following the last convolutional block, the feature tensor $X \in \mathbb{R}^{h \times w \times c}$ is reshaped to $F \in \mathbb{R}^{c \times n}$, with $n = hw$ (one row $f_i$ per channel). The SPD matrix is formed as a Gaussian RBF kernel, $K_{ij} = \exp\!\left(-\|f_i - f_j\|_2^2 / (2\sigma^2)\right)$, guaranteeing $K \in \mathcal{S}^c_{++}$ by Mercer's theorem.
- SPD Matrix Transformation Layer: The aggregated kernel matrix is projected onto a lower-dimensional SPD manifold via the bilinear map $Y = W^\top K W$, with $W \in \mathbb{R}^{c \times m}$ and $m < c$. Orthogonality ($W^\top W = I_m$) is typically enforced for numerical stability and Riemannian compatibility. This produces a compact SPD representation in $\mathcal{S}^m_{++}$.
- Vectorization and Normalization Layer: The resulting SPD matrix is vectorized by stacking its unique entries (the upper triangle, with off-diagonal entries scaled by $\sqrt{2}$) and subjected to power normalization ($v \leftarrow \operatorname{sign}(v)\,|v|^{\alpha}$) and $\ell_2$ normalization ($v \leftarrow v / \|v\|_2$) (Gao et al., 2017).
The overall forward flow thus replaces the canonical ResNet-18 terminal layers with an end-to-end differentiable SPD pipeline.
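As a concrete illustration, a minimal PyTorch sketch of such a head is given below, assuming channel-wise kernel aggregation as described above; the class name SPDHead, the bandwidth sigma, the power exponent alpha, and the soft treatment of orthogonality are illustrative assumptions rather than the reference implementation of Gao et al. (2017).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SPDHead(nn.Module):
    """Illustrative SPD head replacing global average pooling + FC.

    Pipeline: Gaussian-RBF kernel aggregation over channels ->
    bilinear projection onto a lower-dimensional SPD manifold ->
    vectorization with power and L2 normalization -> linear classifier.
    """

    def __init__(self, in_channels: int, spd_dim: int, num_classes: int,
                 sigma: float = 1.0, alpha: float = 0.5):
        super().__init__()
        self.sigma = sigma      # RBF bandwidth (hyperparameter)
        self.alpha = alpha      # power-normalization exponent
        # W in R^{c x m}; initialized orthonormal. A Stiefel-manifold
        # optimizer would be needed to enforce W^T W = I exactly.
        self.W = nn.Parameter(torch.linalg.qr(torch.randn(in_channels, spd_dim))[0])
        self.fc = nn.Linear(spd_dim * (spd_dim + 1) // 2, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        f = x.reshape(b, c, h * w)                      # one row per channel
        d2 = torch.cdist(f, f) ** 2                     # pairwise squared distances
        k = torch.exp(-d2 / (2 * self.sigma ** 2))      # Gaussian RBF kernel (SPD)
        k = k + 1e-6 * torch.eye(c, device=x.device)    # numerical jitter
        y = self.W.t() @ k @ self.W                     # (b, m, m), still SPD
        m = y.shape[-1]
        iu = torch.triu_indices(m, m, device=x.device)  # unique (upper-triangle) entries
        scale = torch.full((iu.shape[1],), 2.0 ** 0.5, device=x.device)
        scale[iu[0] == iu[1]] = 1.0                     # keep diagonal unscaled
        v = y[:, iu[0], iu[1]] * scale
        v = torch.sign(v) * torch.abs(v).pow(self.alpha)  # power normalization
        v = F.normalize(v, p=2, dim=-1)                   # l2 normalization
        return self.fc(v)

# Usage: attach after the conv5_x stage of a torchvision ResNet-18, e.g.
# backbone = nn.Sequential(*list(torchvision.models.resnet18().children())[:-2])
# head = SPDHead(in_channels=512, spd_dim=64, num_classes=1000)
```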
3. Adaptive Log-Euclidean Metrics (ALEMs) and SPD Learning Dynamics
The integration of Adaptive Log-Euclidean Metrics (ALEMs) further enhances SPDNet-style blocks. The ALEM generalizes the fixed Log-Euclidean Metric (LEM) by introducing a learnable base vector $a = (a_1, \dots, a_n)$, yielding an adapted eigen-logarithm $\widetilde{\log}_a(S) = U \operatorname{diag}\!\big(\log_{a_1}\lambda_1, \dots, \log_{a_n}\lambda_n\big)\, U^\top$ for $S = U \operatorname{diag}(\lambda_1, \dots, \lambda_n)\, U^\top \in \mathcal{S}^n_{++}$ (Chen et al., 2023).
This metric is parametrized via three options (RELU, DIV, MUL), with the "MUL" approach showing the most robust empirical behavior. The parameterization and its variants can be fully integrated into backpropagation, admitting closed-form forward, geodesic, and gradient computations via the Daleckii–Krein formula. The ALEM geodesic distance is the Frobenius norm of the difference of "generalized logs": $d_a(S_1, S_2) = \big\| \widetilde{\log}_a(S_1) - \widetilde{\log}_a(S_2) \big\|_F$.
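For reference, backpropagation through any eigenvalue-wise matrix function $f$ (the adapted eigen-logarithm included) uses the standard Daleckii–Krein differential, stated here in generic form with notation of our own:

$$
Df(S)[H] \;=\; U\,\Big( L \odot \big(U^{\top} H\, U\big) \Big)\, U^{\top},
\qquad
L_{ij} =
\begin{cases}
\dfrac{f(\lambda_i)-f(\lambda_j)}{\lambda_i-\lambda_j}, & \lambda_i \neq \lambda_j,\\[4pt]
f'(\lambda_i), & \lambda_i = \lambda_j,
\end{cases}
$$

where $\odot$ denotes the Hadamard (element-wise) product.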
ALEMs are universally applicable to SPDNet architectures, including those adopting a ResNet-18-style backbone, replacing any LogEig layer with an ALog layer at minimal additional compute.
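A minimal ALog-style layer can be sketched as follows, assuming the "MUL" parameterization reduces to an eigenvalue-wise rescaling of the ordinary logarithm by learnable weights; the module name ALogEig, its initialization, and the eigenvalue clamping are illustrative assumptions rather than the reference code of Chen et al. (2023).

```python
import torch
import torch.nn as nn

class ALogEig(nn.Module):
    """Adaptive eigen-logarithm (ALog) layer sketch.

    Replaces the fixed LogEig map U diag(log lambda) U^T with an adapted
    version whose per-eigenvalue scaling is learnable ("MUL"-style:
    log(lambda_i) * w_i, where w_i stands in for 1 / ln(a_i)).
    """

    def __init__(self, dim: int):
        super().__init__()
        # Initialized at 1 so the layer starts as the standard LEM logarithm.
        self.w = nn.Parameter(torch.ones(dim))

    def forward(self, spd: torch.Tensor) -> torch.Tensor:
        # spd: (..., n, n) batch of SPD matrices.
        lam, u = torch.linalg.eigh(spd)              # eigendecomposition
        lam = lam.clamp_min(1e-8)                    # numerical safety
        adapted = torch.log(lam) * self.w            # adapted eigen-logarithm
        return u @ torch.diag_embed(adapted) @ u.transpose(-1, -2)

def alem_distance(s1: torch.Tensor, s2: torch.Tensor, alog: ALogEig) -> torch.Tensor:
    """ALEM geodesic: Frobenius norm of the difference of generalized logs."""
    return torch.linalg.norm(alog(s1) - alog(s2), dim=(-2, -1))
```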
4. SPD Regularization: Selective Projection Decay in ResNet-18
The Selective Projection Decay (SPD) method provides an optimization framework for fine-tuning, which selectively penalizes layers whose gradients are anti-aligned with their drift from the pre-trained initialization (Tian et al., 2024). Applied to ResNet-18, the procedure operates layer-wise as follows:
- Let $\theta_t^{(l)}$ denote the parameters of layer $l$ at step $t$, and $\theta_0^{(l)}$ their pre-trained initialization.
- Compute the AdamW provisional update $\hat{\theta}_{t+1}^{(l)}$.
- Evaluate the selection condition $c_t^{(l)} = \big\langle g_t^{(l)},\, \theta_t^{(l)} - \theta_0^{(l)} \big\rangle$, where $g_t^{(l)}$ is the layer gradient.
- If $c_t^{(l)} < 0$, project back towards initialization proportional to the local "deviation ratio" (a minimal sketch follows this list): $\theta_{t+1}^{(l)} = \hat{\theta}_{t+1}^{(l)} - \lambda\, r_t^{(l)}\,\big(\hat{\theta}_{t+1}^{(l)} - \theta_0^{(l)}\big)$, with $r_t^{(l)} = \big\|\hat{\theta}_{t+1}^{(l)} - \theta_t^{(l)}\big\| \big/ \big\|\hat{\theta}_{t+1}^{(l)} - \theta_0^{(l)}\big\|$ and $\lambda$ the SPD regularization strength.
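The per-group step below implements the selection test and projection exactly as written above; because those expressions are reconstructions consistent with the prose description rather than verbatim equations from Tian et al. (2024), the function spd_projection_step and its deviation ratio should be read as an assumption-laden sketch.

```python
import torch

@torch.no_grad()
def spd_projection_step(params, init_params, grads, provisional, spd_lambda):
    """Selective projection decay for one layer/parameter group (sketch).

    params      : theta_t, the group's parameters before the AdamW step
    init_params : theta_0, the pre-trained initialization
    grads       : g_t, the gradients at step t
    provisional : theta_hat_{t+1}, the parameters after a plain AdamW step
    spd_lambda  : SPD regularization strength lambda
    """
    g = torch.cat([t.flatten() for t in grads])
    drift = torch.cat([(p - p0).flatten() for p, p0 in zip(params, init_params)])
    if torch.dot(g, drift) < 0:                    # gradient anti-aligned with drift
        new = torch.cat([t.flatten() for t in provisional])
        old = torch.cat([t.flatten() for t in params])
        ref = torch.cat([t.flatten() for t in init_params])
        # Local deviation ratio: step length relative to total deviation.
        ratio = (new - old).norm() / (new - ref).norm().clamp_min(1e-12)
        for p_new, p0 in zip(provisional, init_params):
            p_new.sub_(spd_lambda * ratio * (p_new - p0))  # project toward theta_0
    return provisional
```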
This targeted penalization maintains strong inductive bias by controlling the net drift from pre-training and has been found to enhance out-of-distribution generalization and preserve or improve in-distribution accuracy on typical benchmarks.
| Optimizer | ID Acc (%) | OOD Avg Acc (%) | Deviation |
|---|---|---|---|
| AdamW | 95.1 | 71.2 | 0.82 |
| AdamW + L2-SP | 94.3 | 70.0 | 0.55 |
| AdamW + SPD | 95.3 | 75.4 | 0.48 |
Key empirical findings indicate that SPD regularization produces a ≈4 point rise in OOD accuracy without harming ID performance (Tian et al., 2024).
5. Implementation and Training Protocols
For geometric SPD networks based on ResNet-18 (Gao et al., 2017), the typical protocol is as follows: standard ResNet-18 (through the final convolutional stage, conv5_x), an optional convolution-with-ReLU channel reduction, SPD kernel aggregation, SPD matrix transformation to dimension $m \times m$, normalized vectorization, and a final fully connected classification head. Closed-form gradients are available for each block, enabling end-to-end training in most deep learning frameworks.
When leveraging Selective Projection Decay, parameters are grouped by stage (e.g., stem, stages 2–5, fc), with the SPD selection condition computed per group at each step. Typical hyperparameter choices for SPD fine-tuning are AdamW with a standard base learning rate and weight decay, an SPD strength $\lambda$ tuned per task, batch size = 128 per GPU, aggressive data augmentation (MixUp, CutMix), label smoothing, and training for 200 epochs with cosine annealing.
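One possible realization of this stage-wise grouping with a torchvision ResNet-18 is sketched below; the group names, the placeholder learning rate and weight decay, and the hook point for the projection are assumptions, not settings taken from the cited papers.

```python
import torch
import torchvision

model = torchvision.models.resnet18(weights="IMAGENET1K_V1")

# Stage-wise parameter groups (stem, four residual stages, classifier head),
# mirroring the per-group SPD selection described above.
groups = {
    "stem":   list(model.conv1.parameters()) + list(model.bn1.parameters()),
    "stage2": list(model.layer1.parameters()),
    "stage3": list(model.layer2.parameters()),
    "stage4": list(model.layer3.parameters()),
    "stage5": list(model.layer4.parameters()),
    "fc":     list(model.fc.parameters()),
}

# Frozen copy of the pre-trained weights for the drift / projection terms.
init_state = {name: [p.detach().clone() for p in ps] for name, ps in groups.items()}

optimizer = torch.optim.AdamW(
    [{"params": ps, "name": name} for name, ps in groups.items()],
    lr=1e-3,            # placeholder; the source does not pin exact values here
    weight_decay=1e-4,  # placeholder
)

# Training loop (schematic): after each optimizer.step(), apply the per-group
# SPD projection (e.g., spd_projection_step from the earlier sketch) to each
# group whose selection condition fires.
```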
6. Empirical Impact and Open Directions
ALEM-augmented SPD networks have demonstrated gains of 1–3 percentage points in absolute accuracy over fixed LEM baselines on action, gesture, and emotion datasets when plugged into SPDNet-style architectures (Chen et al., 2023). The integration of SPD pipelines into ResNet-18 permits direct exploitation of SPD geometry for vision tasks.
Selective Projection Decay, while initially motivated by foundation model robustness, provides a general mechanism for selectively regularizing adaptation in ResNet architectures. The evidence suggests a strong linkage between net parameter deviation control and OOD generalization—motivating future research on more expressive selection schemes or hybrid geometric-regularization models.
A plausible implication is that advances in SPD geometry for network architectures and selective regularization for optimization are synergistic: both preserve structure, stabilize learning, and enhance robustness, and together define the contemporary landscape of “ResNet18-SPD” methodologies.