SCAP: Spectral Complex Autoencoder Pruning
- SCAP is a reconstruction-based structured pruning method that quantifies channel redundancy through spectral reconstruction fidelity of complex interaction fields.
- It constructs complex-valued descriptors by coupling input activations with resized output channels and applies FFT along with a low-capacity autoencoder to expose redundancy.
- SCAP enables aggressive CNN compression, achieving over 90% reduction in FLOPs and parameters while maintaining controlled accuracy loss post fine-tuning.
Spectral Complex Autoencoder Pruning (SCAP) is a reconstruction-based structured pruning methodology designed for channel-level compression of convolutional neural networks (CNNs). SCAP uniquely quantifies channel redundancy via spectral reconstruction fidelity of layer-local complex-valued descriptors (termed complex interaction fields) and supports aggressive network compression—often exceeding 90% reduction in both FLOPs and parameters—while maintaining controlled accuracy loss after fine-tuning (Liu et al., 14 Jan 2026).
1. Definition and Purpose
SCAP operates as a channel importance criterion aimed at identifying, ranking, and pruning output channels whose input–output interaction (quantified as a spatially organized complex-valued field) lies close to a low-dimensional spectral manifold captured by a deliberately low-capacity autoencoder. Its principal motivation is to provide a functionally grounded, data-driven metric of channel compressibility. Rather than relying solely on activation strength or heuristic statistics, SCAP seeks to measure whether the information encoded by a given output channel is redundant given the corresponding full layer input and CNN architecture. Under this criterion, only channels projecting far from the learned manifold are preserved, as they ostensibly encode unique, non-degenerate representations.
2. Construction of Complex Interaction Fields
For each convolutional layer with input activation $X \in \mathbb{R}^{C_{\mathrm{in}} \times H \times W}$ and output activation $Y \in \mathbb{R}^{C_{\mathrm{out}} \times H' \times W'}$, SCAP constructs a complex-valued descriptor for every output channel $c$:

$$Z_c = X + i \cdot \mathrm{broadcast}(\mathrm{resize}(Y_c)),$$

where $\mathrm{resize}(\cdot)$ uses bilinear interpolation to match the spatial resolution of $X$, followed by channel-wise broadcasting. The real part of $Z_c$ is the full input activation, and the imaginary part is the single output channel replicated across all input channels. This forms a layer-local input–output coupling that preserves fine-grained spatial and channel correlations.
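The construction above can be sketched in NumPy. This is a minimal, dependency-free illustration; `resize_bilinear` and `interaction_field` are names chosen here, not the paper's implementation:

```python
import numpy as np

def resize_bilinear(y, h, w):
    """Minimal bilinear resize of a 2D map y to shape (h, w)."""
    H, W = y.shape
    rows = np.linspace(0, H - 1, h)
    cols = np.linspace(0, W - 1, w)
    r0 = np.floor(rows).astype(int); r1 = np.minimum(r0 + 1, H - 1)
    c0 = np.floor(cols).astype(int); c1 = np.minimum(c0 + 1, W - 1)
    fr = (rows - r0)[:, None]; fc = (cols - c0)[None, :]
    top = y[np.ix_(r0, c0)] * (1 - fc) + y[np.ix_(r0, c1)] * fc
    bot = y[np.ix_(r1, c0)] * (1 - fc) + y[np.ix_(r1, c1)] * fc
    return top * (1 - fr) + bot * fr

def interaction_field(x, y_c):
    """Complex interaction field for one output channel:
    real part = full input activation x of shape (C_in, H, W);
    imaginary part = output channel y_c resized to (H, W) and
    broadcast across all C_in input channels."""
    _, H, W = x.shape
    y_up = resize_bilinear(y_c, H, W)            # match spatial resolution
    return x + 1j * np.broadcast_to(y_up, x.shape)
```

The broadcast makes every input channel share the same imaginary plane, which is what couples each output channel to the *entire* layer input rather than to a single input slice.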
3. Spectral Domain Encoding and Autoencoder Training
To expose structural dependencies in the interaction field, SCAP applies a 2D discrete Fourier transform to each field $Z_c$:

$$\hat{Z}_c = \mathrm{FFT}_{2\mathrm{D}}(Z_c).$$

The real and imaginary parts of $\hat{Z}_c$ are batched, standardized (zero-mean, unit-variance per batch), and flattened for processing:
- $\Re(\hat{Z}_c)$ and $\Im(\hat{Z}_c)$ are each reshaped into vectors of dimension $d = C_{\mathrm{in}} H W$, giving a concatenated spectral descriptor of dimension $2d$. A low-capacity MLP-based autoencoder $g_\theta$, with bottleneck dimension $b \ll 2d$, is then trained to reconstruct the normalized spectra—using mean-squared error loss in spectral space—so as to represent redundancy among channel interaction fields without overfitting.
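A sketch of the spectral encoding and the bottleneck reconstruction follows. Two simplifications to note: standardization here is over the whole flattened vector rather than per batch, and a closed-form linear autoencoder (PCA via SVD) stands in for the paper's trained low-capacity MLP — both are assumptions of this sketch, not the method's exact form:

```python
import numpy as np

def spectral_feature(z):
    """2D FFT of a complex interaction field (C_in, H, W), followed by
    standardization and flattening of real/imag parts into one vector.
    (Simplification: standardizes the whole vector, not per batch.)"""
    zf = np.fft.fft2(z, axes=(-2, -1))
    v = np.concatenate([zf.real.ravel(), zf.imag.ravel()])
    return (v - v.mean()) / (v.std() + 1e-8)

def fit_linear_autoencoder(features, bottleneck):
    """Linear autoencoder via SVD/PCA: a stand-in for the paper's
    low-capacity MLP trained with spectral MSE. Returns reconstructions
    and the (bottleneck, d) projection basis."""
    mu = features.mean(axis=0)
    _, _, vt = np.linalg.svd(features - mu, full_matrices=False)
    basis = vt[:bottleneck]
    recon = (features - mu) @ basis.T @ basis + mu
    return recon, basis
```

The key design point survives the simplification: a small bottleneck can only reproduce spectra that lie near a low-dimensional manifold, so reconstruction quality becomes a probe of redundancy.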
4. Channel Importance Scoring and ℓ₁-Norm Fusion
During scoring, SCAP reconstructs channel interaction fields from their spectral encodings, inverts the FFT, and computes the cosine similarity (averaged over mini-batches) between flattened original and reconstructed fields:

$$s_c = \frac{\langle z_c, \tilde{z}_c \rangle}{\lVert z_c \rVert \, \lVert \tilde{z}_c \rVert},$$

where $z_c$ and $\tilde{z}_c$ are the flattened real+imaginary representations of the original and reconstructed fields. Scores lie in $[-1, 1]$; higher fidelity ($s_c$ near 1) implies that a channel's contribution is redundant (i.e., located close to the manifold captured by the autoencoder), while low fidelity denotes a unique or "uncompressible" channel.
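The fidelity computation itself reduces to a cosine similarity over flattened real+imaginary vectors, as in this minimal sketch (the function name is illustrative):

```python
import numpy as np

def fidelity(z, z_rec):
    """Cosine similarity between flattened real+imag representations of an
    original and a reconstructed interaction field. Values near 1 indicate
    the field lies close to the autoencoder's manifold (redundant channel);
    low values indicate a unique, 'uncompressible' channel."""
    a = np.concatenate([z.real.ravel(), z.imag.ravel()])
    b = np.concatenate([z_rec.real.ravel(), z_rec.imag.ravel()])
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
```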
To address the scale-invariance of pure fidelity, SCAP optionally fuses this score with a normalized filter $\ell_1$-magnitude:
- Final importance combines the fidelity score with the normalized $\ell_1$-magnitude via an additive (default), multiplicative, or power-multiplicative fusion rule. Additive fusion provides favorable accuracy–compression trade-offs and robust per-layer pruning.
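One plausible form of the fusion is sketched below. Note the assumptions: high fidelity marks a channel as redundant, so this sketch enters $(1 - \text{fidelity})$ into the score, and the exact combination rule here is illustrative rather than the paper's definition:

```python
import numpy as np

def importance(fidelity_scores, filters, mode="additive"):
    """Fuse reconstruction fidelity with normalized per-filter l1 magnitude.
    ASSUMPTION: uniqueness is taken as (1 - fidelity), since high fidelity
    denotes a redundant channel; the paper defines the exact form."""
    l1 = np.array([np.abs(w).sum() for w in filters])
    l1 = l1 / (l1.max() + 1e-12)                 # normalize magnitudes to [0, 1]
    uniq = 1.0 - np.asarray(fidelity_scores)     # low fidelity => unique channel
    if mode == "additive":                       # default fusion in the paper
        return uniq + l1
    if mode == "multiplicative":
        return uniq * l1
    raise ValueError(f"unknown fusion mode: {mode}")
```

Additive fusion keeps a channel alive if *either* signal (uniqueness or magnitude) is strong, which is one intuition for why it degrades more gracefully than the multiplicative variants.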
5. Pruning Regime, Network Restructuring, and Fine-Tuning
Channel importance scores within each layer are normalized to $[0, 1]$. A fixed global threshold $\tau$ (e.g., $\tau = 0.5$ or $0.6$) is applied to all layers. Channels whose normalized importance falls below $\tau$ are pruned simultaneously, and dependencies are resolved by deleting the corresponding input channels in the subsequent layer—thus preserving network structural consistency. A minimal per-layer safeguard is included but not triggered in practice. Rapid fine-tuning (typically 100 epochs) is sufficient to recover accuracy.
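The thresholding and structural-consistency step can be sketched for a pair of adjacent convolutions. Assumptions: min–max normalization per layer, pruning channels below the threshold, and a keep-at-least-one safeguard; the function name and exact safeguard are illustrative:

```python
import numpy as np

def prune_layer_pair(scores, w_cur, w_next, tau=0.5):
    """Normalize per-layer scores to [0, 1], keep channels at or above tau,
    and delete the matching input channels of the following layer so the
    network stays structurally consistent.
    w_cur: (C_out, C_in, k, k); w_next: (C_out2, C_out, k, k)."""
    s = np.asarray(scores, dtype=float)
    s = (s - s.min()) / (s.max() - s.min() + 1e-12)   # min-max normalize
    keep = np.where(s >= tau)[0]
    if keep.size == 0:                                # minimal safeguard (sketch)
        keep = np.array([int(s.argmax())])
    return w_cur[keep], w_next[:, keep]
```

Deleting the same indices from `w_next`'s input dimension is what resolves the inter-layer dependency; for residual or dense connectivity (ResNet, DenseNet) the index bookkeeping spans every consumer of the pruned tensor.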
| Threshold | Dataset/Network | FLOP Reduction | Parameter Reduction | Accuracy Drop (absolute) |
|---|---|---|---|---|
| 0.5 | VGG16/CIFAR-10 | 81.85% | 92.37% | 0.77% |
| 0.6 | VGG16/CIFAR-10 | 90.11% | 96.30% | 1.67% |
| 0.5 | VGG16/CIFAR-100 | 79.62% | 84.65% | 3.51% |
| 0.6 | VGG16/CIFAR-100 | 90.70% | 92.37% | 8.98% |
Consistent results are observed for ResNet-56, ResNet-110, and DenseNet-40, with SCAP matching or exceeding state-of-the-art compression–accuracy trade-offs for extreme (around $90\%$) pruning.
6. Ablation Studies and Methodological Insights
Ablation experiments demonstrate that:
- Fidelity-only scoring ("l1-none") already yields high-quality pruning; however, additive fusion with the $\ell_1$-norm improves solution stability during recovery, particularly for architectures such as VGG16 and ResNet-56.
- The choice of fusion rule is consequential: additive outperforms multiplicative and power-multiplicative rules in accuracy–compression Pareto efficiency.
- Autoencoder capacity is a critical parameter—excessive expressiveness permits memorization, collapsing fidelity scores and eliminating discriminative power.
- There is a theoretical bound relating per-channel output distortion to the fidelity score, confirming fidelity as a normalized, sign-invariant measure of channel reconstruction error.
7. Significance, Scope, and Limitations
SCAP advances structured pruning by directly measuring channel-level compressibility using complex-valued spectral descriptors and manifold-based reconstruction fidelity. Its design enables:
- Application to a broad class of convolutional architectures,
- Uniform thresholding across all layers for simplicity,
- Effective operation under extreme compression regimes (typically around $90\%$ FLOP/parameter reduction) with minimal retraining.
A notable insight is that SCAP's spectral reconstruction fidelity offers a robust, architecture-agnostic proxy for functional redundancy—even in settings where amplitude-based (e.g., $\ell_1$-norm) criteria may fail. However, the method's sensitivity to autoencoder capacity and fusion rule selection constitutes an important axis for further investigation. The approach's layer-wise locality and spectral focus distinguish it from tensor decomposition, filter norm, and magnitude-based pruning schemes, offering a new framework for compression research (Liu et al., 14 Jan 2026).