SCAP: Spectral Complex Autoencoder Pruning
- SCAP is a reconstruction-based structured pruning method that quantifies channel redundancy through spectral reconstruction fidelity of complex interaction fields.
- It constructs complex-valued descriptors by coupling input activations with resized output channels and applies FFT along with a low-capacity autoencoder to expose redundancy.
- SCAP enables aggressive CNN compression, achieving over 90% reduction in FLOPs and parameters while maintaining controlled accuracy loss post fine-tuning.
Spectral Complex Autoencoder Pruning (SCAP) is a reconstruction-based structured pruning methodology designed for channel-level compression of convolutional neural networks (CNNs). SCAP uniquely quantifies channel redundancy via spectral reconstruction fidelity of layer-local complex-valued descriptors (termed complex interaction fields) and supports aggressive network compression—often exceeding 90% reduction in both FLOPs and parameters—while maintaining controlled accuracy loss after fine-tuning (Liu et al., 14 Jan 2026).
1. Definition and Purpose
SCAP operates as a channel importance criterion aimed at identifying, ranking, and pruning output channels whose input–output interaction (quantified as a spatially organized complex-valued field) lies close to a low-dimensional spectral manifold captured by a deliberately low-capacity autoencoder. Its principal motivation is to provide a functionally grounded, data-driven metric of channel compressibility. Rather than relying solely on activation strength or heuristic statistics, SCAP seeks to measure whether the information encoded by a given output channel is redundant given the corresponding full layer input and CNN architecture. Under this criterion, only channels projecting far from the learned manifold are preserved, as they ostensibly encode unique, non-degenerate representations.
2. Construction of Complex Interaction Fields
For each convolutional layer with input activation $X \in \mathbb{R}^{C_{\mathrm{in}} \times H \times W}$ and output activation $Y \in \mathbb{R}^{C_{\mathrm{out}} \times H' \times W'}$, SCAP constructs a complex-valued descriptor for every output channel $c$:

$$Z_c = X + i \cdot \mathrm{broadcast}(\mathrm{resize}(Y_c)),$$

where $\mathrm{resize}(\cdot)$ uses bilinear interpolation to match the spatial resolution of $X$, followed by channel-wise broadcasting. The real part of $Z_c$ is the full input activation, and the imaginary part is the single output channel replicated across all input channels. This forms a layer-local input–output coupling that preserves fine-grained spatial and channel correlations.
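The construction above can be sketched in NumPy. This is a minimal, dependency-free illustration; `resize_bilinear` and `interaction_field` are names chosen here, not the paper's implementation:

```python
import numpy as np

def resize_bilinear(y, h, w):
    """Minimal bilinear resize of a 2D map y to shape (h, w)."""
    H, W = y.shape
    rows = np.linspace(0, H - 1, h)
    cols = np.linspace(0, W - 1, w)
    r0 = np.floor(rows).astype(int); r1 = np.minimum(r0 + 1, H - 1)
    c0 = np.floor(cols).astype(int); c1 = np.minimum(c0 + 1, W - 1)
    fr = (rows - r0)[:, None]; fc = (cols - c0)[None, :]
    top = y[np.ix_(r0, c0)] * (1 - fc) + y[np.ix_(r0, c1)] * fc
    bot = y[np.ix_(r1, c0)] * (1 - fc) + y[np.ix_(r1, c1)] * fc
    return top * (1 - fr) + bot * fr

def interaction_field(x, y_c):
    """Complex interaction field for one output channel:
    real part = full input activation x of shape (C_in, H, W);
    imaginary part = output channel y_c resized to (H, W) and
    broadcast across all C_in input channels."""
    _, H, W = x.shape
    y_up = resize_bilinear(y_c, H, W)            # match spatial resolution
    return x + 1j * np.broadcast_to(y_up, x.shape)
```

The broadcast makes every input channel share the same imaginary plane, which is what couples each output channel to the *entire* layer input rather than to a single input slice.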
3. Spectral Domain Encoding and Autoencoder Training
To expose structural dependencies in the interaction field, SCAP applies a 2D discrete Fourier transform to each field $Z_c$:

$$\hat{Z}_c = \mathrm{FFT}_{2\mathrm{D}}(Z_c).$$

The real and imaginary parts of $\hat{Z}_c$ are batched, standardized (zero-mean, unit-variance per batch), and flattened for processing:
- $\Re(\hat{Z}_c)$ and $\Im(\hat{Z}_c)$ are each reshaped into vectors of dimension $d = C_{\mathrm{in}} H W$, giving a concatenated spectral descriptor of dimension $2d$. A low-capacity MLP-based autoencoder $g_\theta$, with bottleneck dimension $b \ll 2d$, is then trained to reconstruct the normalized spectra—using mean-squared error loss in spectral space—so as to represent redundancy among channel interaction fields without overfitting.
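A sketch of the spectral encoding and the bottleneck reconstruction follows. Two simplifications to note: standardization here is over the whole flattened vector rather than per batch, and a closed-form linear autoencoder (PCA via SVD) stands in for the paper's trained low-capacity MLP — both are assumptions of this sketch, not the method's exact form:

```python
import numpy as np

def spectral_feature(z):
    """2D FFT of a complex interaction field (C_in, H, W), followed by
    standardization and flattening of real/imag parts into one vector.
    (Simplification: standardizes the whole vector, not per batch.)"""
    zf = np.fft.fft2(z, axes=(-2, -1))
    v = np.concatenate([zf.real.ravel(), zf.imag.ravel()])
    return (v - v.mean()) / (v.std() + 1e-8)

def fit_linear_autoencoder(features, bottleneck):
    """Linear autoencoder via SVD/PCA: a stand-in for the paper's
    low-capacity MLP trained with spectral MSE. Returns reconstructions
    and the (bottleneck, d) projection basis."""
    mu = features.mean(axis=0)
    _, _, vt = np.linalg.svd(features - mu, full_matrices=False)
    basis = vt[:bottleneck]
    recon = (features - mu) @ basis.T @ basis + mu
    return recon, basis
```

The key design point survives the simplification: a small bottleneck can only reproduce spectra that lie near a low-dimensional manifold, so reconstruction quality becomes a probe of redundancy.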
4. Channel Importance Scoring and ℓ₁-Norm Fusion
During scoring, SCAP reconstructs channel interaction fields from their spectral encodings, inverts the FFT, and computes the cosine similarity (averaged over mini-batches) between flattened original and reconstructed fields:

$$s_c = \frac{\langle z_c, \tilde{z}_c \rangle}{\lVert z_c \rVert \, \lVert \tilde{z}_c \rVert},$$

where $z_c$ and $\tilde{z}_c$ are the flattened real+imaginary representations of the original and reconstructed fields. Scores lie in $[-1, 1]$; higher fidelity ($s_c$ near 1) implies that a channel's contribution is redundant (i.e., located close to the manifold captured by the autoencoder), while low fidelity denotes a unique or "uncompressible" channel.
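The fidelity computation itself reduces to a cosine similarity over flattened real+imaginary vectors, as in this minimal sketch (the function name is illustrative):

```python
import numpy as np

def fidelity(z, z_rec):
    """Cosine similarity between flattened real+imag representations of an
    original and a reconstructed interaction field. Values near 1 indicate
    the field lies close to the autoencoder's manifold (redundant channel);
    low values indicate a unique, 'uncompressible' channel."""
    a = np.concatenate([z.real.ravel(), z.imag.ravel()])
    b = np.concatenate([z_rec.real.ravel(), z_rec.imag.ravel()])
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
```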
To address the scale-invariance of pure fidelity, SCAP optionally fuses this score with a normalized filter $\ell_1$-magnitude:
- Final importance combines the fidelity score with the normalized $\ell_1$-magnitude via an additive (default), multiplicative, or power-multiplicative fusion rule. Additive fusion provides favorable accuracy–compression trade-offs and robust per-layer pruning.
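One plausible form of the fusion is sketched below. Note the assumptions: high fidelity marks a channel as redundant, so this sketch enters $(1 - \text{fidelity})$ into the score, and the exact combination rule here is illustrative rather than the paper's definition:

```python
import numpy as np

def importance(fidelity_scores, filters, mode="additive"):
    """Fuse reconstruction fidelity with normalized per-filter l1 magnitude.
    ASSUMPTION: uniqueness is taken as (1 - fidelity), since high fidelity
    denotes a redundant channel; the paper defines the exact form."""
    l1 = np.array([np.abs(w).sum() for w in filters])
    l1 = l1 / (l1.max() + 1e-12)                 # normalize magnitudes to [0, 1]
    uniq = 1.0 - np.asarray(fidelity_scores)     # low fidelity => unique channel
    if mode == "additive":                       # default fusion in the paper
        return uniq + l1
    if mode == "multiplicative":
        return uniq * l1
    raise ValueError(f"unknown fusion mode: {mode}")
```

Additive fusion keeps a channel alive if *either* signal (uniqueness or magnitude) is strong, which is one intuition for why it degrades more gracefully than the multiplicative variants.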
5. Pruning Regime, Network Restructuring, and Fine-Tuning
Channel importance scores within each layer are normalized to $[0, 1]$. A fixed global threshold $\tau$ (e.g., $\tau = 0.5$ or $0.6$) is applied to all layers. Channels whose normalized importance falls below $\tau$ are pruned simultaneously, and dependencies are resolved by deleting the corresponding input channels in the subsequent layer—thus preserving network structural consistency. A minimal per-layer safeguard is included but not triggered in practice. Rapid fine-tuning (typically 100 epochs) is sufficient to recover accuracy.
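The thresholding and structural-consistency step can be sketched for a pair of adjacent convolutions. Assumptions: min–max normalization per layer, pruning channels below the threshold, and a keep-at-least-one safeguard; the function name and exact safeguard are illustrative:

```python
import numpy as np

def prune_layer_pair(scores, w_cur, w_next, tau=0.5):
    """Normalize per-layer scores to [0, 1], keep channels at or above tau,
    and delete the matching input channels of the following layer so the
    network stays structurally consistent.
    w_cur: (C_out, C_in, k, k); w_next: (C_out2, C_out, k, k)."""
    s = np.asarray(scores, dtype=float)
    s = (s - s.min()) / (s.max() - s.min() + 1e-12)   # min-max normalize
    keep = np.where(s >= tau)[0]
    if keep.size == 0:                                # minimal safeguard (sketch)
        keep = np.array([int(s.argmax())])
    return w_cur[keep], w_next[:, keep]
```

Deleting the same indices from `w_next`'s input dimension is what resolves the inter-layer dependency; for residual or dense connectivity (ResNet, DenseNet) the index bookkeeping spans every consumer of the pruned tensor.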
| Threshold | Dataset/Network | FLOP Reduction | Parameter Reduction | Accuracy Drop (absolute) |
|---|---|---|---|---|
| 0.5 | VGG16/CIFAR-10 | 81.85% | 92.37% | 0.77% |
| 0.6 | VGG16/CIFAR-10 | 90.11% | 96.30% | 1.67% |
| 0.5 | VGG16/CIFAR-100 | 79.62% | 84.65% | 3.51% |
| 0.6 | VGG16/CIFAR-100 | 90.70% | 92.37% | 8.98% |
Consistent results are observed for ResNet-56, ResNet-110, and DenseNet-40, with SCAP matching or exceeding state-of-the-art compression–accuracy trade-offs for extreme (around $90\%$) pruning.
6. Ablation Studies and Methodological Insights
Ablation experiments demonstrate that:
- Fidelity-only scoring ("l1-none") already yields high-quality pruning; however, additive fusion with the $\ell_1$-norm improves solution stability during recovery, particularly for architectures such as VGG16 and ResNet-56.
- The choice of fusion rule is consequential: additive outperforms multiplicative and power-multiplicative rules in accuracy–compression Pareto efficiency.
- Autoencoder capacity is a critical parameter—excessive expressiveness permits memorization, collapsing fidelity scores and eliminating discriminative power.
- There is a theoretical bound relating per-channel output distortion to the fidelity score, confirming fidelity as a normalized, sign-invariant measure of channel reconstruction error.
7. Significance, Scope, and Limitations
SCAP advances structured pruning by directly measuring channel-level compressibility using complex-valued spectral descriptors and manifold-based reconstruction fidelity. Its design enables:
- Application to a broad class of convolutional architectures,
- Uniform thresholding across all layers for simplicity,
- Effective operation under extreme compression regimes (typically around $90\%$ FLOP/parameter reduction) with minimal retraining.
A notable insight is that SCAP's spectral reconstruction fidelity offers a robust, architecture-agnostic proxy for functional redundancy—even in settings where amplitude-based (e.g., $\ell_1$-norm) criteria may fail. However, the method's sensitivity to autoencoder capacity and fusion rule selection constitutes an important axis for further investigation. The approach's layer-wise locality and spectral focus distinguish it from tensor decomposition, filter norm, and magnitude-based pruning schemes, offering a new framework for compression research (Liu et al., 14 Jan 2026).