Stack-Based Pruning Algorithm
- A stack-based pruning algorithm trims candidate decoding paths in polar/PAC codes and prunes redundant layers in deep models.
- It employs statistically motivated thresholds (e.g., Chernoff bounds) and redundancy measures (e.g., cosine similarity) to retain only high-quality candidates.
- This approach achieves significant complexity gains, with over 90% reduction in decoding paths and up to 45% pruning in transformer layers, while maintaining performance.
A stack-based pruning algorithm is a class of complexity-reduction techniques in sequential path-search decoders or deep learning architectures, where a stack or ordered heap of candidate hypotheses is dynamically pruned using per-branch metrics or redundancy measures. In coding theory, these methods are deployed in the decoding of polar codes and PAC (Polarization-Adjusted Convolutional) codes to discard statistically unlikely or demonstrably redundant decoding paths, while in neural networks the principle extends to pruning redundant layers or computational sub-blocks. Central to modern stack-based pruning is the definition of rigorous pruning metrics, statistically motivated thresholds, and guarantees that the optimal solution is almost never pruned.
1. Theoretical Foundations of Stack-Based Pruning
In sequential decoding of polar or PAC codes, a stack decoder explores a tree of possible codewords by expanding candidate paths ranked by a path metric. Each partial path $u^i$ at depth $i$ is scored by a path metric
$$\Gamma(u^i) = \sum_{j=1}^{i} \gamma_j,$$
where $\gamma_j$ is a local bit-metric, derived (in Fano-metric form) as
$$\gamma_j = \log_2 \frac{P(y \mid u^j)}{P(y \mid u^{j-1})} - b_j,$$
with $y$ the channel output and $b_j$ a per-bit bias.
This metric distinguishes between paths aligned with the true codeword (the “correct” path) and incorrect paths by comparing the metric value to the channel’s cutoff rate or the symmetric capacity (Moradi et al., 2022). For deep transformer models, a stack of layers is assessed for redundancy by similarity metrics and pruned by removal or replacement as entire computational blocks (Dorszewski et al., 2024).
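The Fano-type accumulation above can be sketched in a few lines. This is a minimal illustration, not the exact metric of the cited papers (which use polarized bit-channel likelihoods); the function names and argument conventions are assumptions for the example.

```python
import math

def bit_metric(p_new, p_old, bias):
    """Fano-type bit-metric: log-likelihood gain from extending the path
    by one bit (ratio of path likelihoods), minus a per-bit bias."""
    return math.log2(p_new / p_old) - bias

def path_metric(path_probs, prefix_probs, biases):
    """Path metric: sum of per-bit metrics along a partial path."""
    return sum(bit_metric(p, q, b)
               for p, q, b in zip(path_probs, prefix_probs, biases))
```

On the correct path each likelihood ratio tends to exceed $2^{b_j}$, so the metric drifts upward; on wrong paths it drifts downward, which is what makes threshold pruning effective.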
2. Pruning Metrics and Threshold Criteria
Stack-based pruning algorithms rely on tightly controlled thresholding of path metrics to safely reduce the search space. In polar/PAC decoding, the unnormalized bit-metric for branch $j$ is
$$\gamma_j = \log_2 \frac{P(y \mid u^j)}{P(y \mid u^{j-1})} - b_j,$$
with positive expected value $\mu_j$ on the correct path and strictly negative expected value on wrong branches. The algorithm prunes any partial path whose bit-metric $\gamma_j$ falls below a threshold $t_j$, capitalizing on the exponential gap between correct-path and incorrect-path metrics. Statistical bounds such as one-sided Chernoff or Chebyshev inequalities provide explicit exponential decay for the probability of mistakenly pruning the correct path:
$$\Pr\big[\gamma_j < t_j \mid \text{correct path}\big] \;\le\; \exp\!\left(-\frac{(\mu_j - t_j)^2}{2\sigma_j^2}\right),$$
where $\mu_j$ and $\sigma_j^2$ denote the mean and variance of the correct-path bit-metric (Moradi et al., 2022, Moradi et al., 8 Sep 2025).
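Given per-bit means and variances, such tail bounds can be inverted to set thresholds that meet a target pruning-error probability. A sketch under the stated Chebyshev and sub-Gaussian (Chernoff-style) tails; the function names are illustrative:

```python
import math

def chebyshev_threshold(mu, sigma, eps):
    """Pr[gamma < mu - delta] <= sigma^2 / delta^2 = eps
    =>  delta = sigma / sqrt(eps)."""
    return mu - sigma / math.sqrt(eps)

def chernoff_threshold(mu, sigma, eps):
    """Sub-Gaussian tail: Pr[gamma < mu - delta] <= exp(-delta^2 / (2 sigma^2)) = eps
    =>  delta = sigma * sqrt(2 ln(1/eps))."""
    return mu - sigma * math.sqrt(2.0 * math.log(1.0 / eps))
```

For small `eps` the Chernoff-style threshold sits much closer to the mean than the Chebyshev one, i.e., it prunes more aggressively for the same guaranteed error probability.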
For deep learning stacks, redundancy across layers is measured by cosine similarity, linear centered kernel alignment (CKA), or mutual nearest-neighbor (kNN) alignment, identifying blocks of highly similar or functionally redundant layers. Layers within such blocks are prime targets for pruning (Dorszewski et al., 2024).
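Of the redundancy measures above, linear CKA is easily computed from two layers' activation matrices. A minimal sketch (the function name and shapes are assumptions, not from the cited paper):

```python
import numpy as np

def linear_cka(X, Y):
    """Linear centered kernel alignment between two activation matrices
    of shape (n_samples, n_features). Values near 1 indicate that the
    two layers produce highly redundant representations."""
    X = X - X.mean(axis=0)                    # center features
    Y = Y - Y.mean(axis=0)
    num = np.linalg.norm(X.T @ Y, "fro") ** 2
    den = np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro")
    return num / den
```

CKA is invariant to orthogonal transformations and isotropic scaling of the activations, which is why it is preferred over raw cosine similarity when comparing layers with different bases.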
3. Algorithmic Structures and Pseudocode
In polar/PAC decoding, the stack-based pruning algorithm proceeds as follows:
- Initialize the stack with the empty path.
- Repeatedly pop the top-ranked candidate.
- For each successor:
- Compute the local bit-metric.
- Prune (discard) the candidate unless the metric exceeds the adaptive per-bit threshold.
- Update metric and push surviving candidates into the stack.
- Re-sort the stack and truncate to the permitted maximum size.
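The loop above can be sketched with a binary heap in place of an explicitly re-sorted stack. All names (`expand`, `threshold`) are illustrative, not from the cited papers:

```python
import heapq

def stack_decode(expand, threshold, depth_max, stack_max):
    """Sketch of a stack decoder with per-bit threshold pruning.
    `expand(path)` yields (bit, bit_metric) successors of a partial path;
    `threshold(depth)` is the adaptive per-bit pruning threshold."""
    # heapq is a min-heap, so store negated metrics to pop the best path first.
    stack = [(0.0, ())]
    while stack:
        neg_metric, path = heapq.heappop(stack)
        if len(path) == depth_max:
            return path, -neg_metric          # best complete path found
        for bit, gamma in expand(path):
            if gamma < threshold(len(path)):  # prune statistically unlikely branch
                continue
            heapq.heappush(stack, (neg_metric - gamma, path + (bit,)))
        if len(stack) > stack_max:            # truncate to permitted maximum size
            stack = heapq.nsmallest(stack_max, stack)
            heapq.heapify(stack)
    return None, float("-inf")                # every path was pruned
```

Because surviving candidates are pushed with their cumulative metric, popping the heap minimum (most negative negated metric) always expands the current best-ranked path, matching the pop/expand/prune/push cycle described above.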
A specialization for fast-decodable node types (e.g., rate-0, REP, type-IV, and rate-1 nodes) further reduces complexity by admitting into the stack only those candidate paths whose aggregate bit-metrics exceed a threshold derived from mean/variance approximations and the desired pruning-error probability (Moradi et al., 8 Sep 2025).
For transformer models, pruning proceeds by computing similarity matrices across all layers, identifying block structures, and iteratively deleting the lowest-influence layers using either block-influence heuristics or explicit kNN-similarity reduction until a minimum target accuracy is reached. All steps may be performed post hoc, without retraining (Dorszewski et al., 2024).
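A greedy variant of this layer-pruning loop can be sketched as follows. This is an illustrative simplification, not the exact procedure of Dorszewski et al.; `similarity` and `evaluate` are assumed callables supplied by the user:

```python
def prune_layers(layers, similarity, evaluate, min_accuracy):
    """Greedy post-hoc layer pruning: repeatedly drop the layer most
    similar to its predecessor, stopping before accuracy would fall
    below the target. No retraining is involved."""
    layers = list(layers)
    while len(layers) > 1:
        # index of the layer most redundant with the layer before it
        idx = max(range(1, len(layers)),
                  key=lambda i: similarity(layers[i - 1], layers[i]))
        candidate = layers[:idx] + layers[idx + 1:]
        if evaluate(candidate) < min_accuracy:
            break                 # next removal would cross the accuracy floor
        layers = candidate
    return layers
```

In practice `similarity` would be a measure such as linear CKA over held-out activations and `evaluate` a downstream-task accuracy check.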
4. Statistical Guarantees and Safe Pruning Regimes
Chernoff bound analysis in the coding context provides a quantifiable guarantee that the probability of pruning the correct path falls exponentially with the gap $\mu_j - t_j$ between the correct-path mean bit-metric and the threshold. Total pruning failure probability over all $N$ stages is then bounded by the union bound
$$P_{\text{prune}} \;\le\; \sum_{j=1}^{N} \Pr\big[\gamma_j < t_j \mid \text{correct path}\big],$$
which remains negligible provided thresholds are set below the cutoff rates (Moradi et al., 2022). In the variance-guided pruning regime introduced for PAC codes, per-bit thresholds are established such that the probability of pruning the correct path is bounded by either a Chebyshev or a Chernoff tail, so the failure probability can be made arbitrarily small while retaining only candidates whose bit-metrics remain statistically close to the expected mean (Moradi et al., 8 Sep 2025).
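The union bound suggests a simple recipe for per-bit thresholds: split the total failure budget evenly across stages and invert the tail at each stage. A sketch of this variance-guided style of threshold setting, using an assumed sub-Gaussian tail; the function name is illustrative:

```python
import math

def per_bit_thresholds(means, variances, total_eps):
    """Set per-bit thresholds so the union bound over all N stages keeps
    the total correct-path pruning probability below total_eps."""
    n = len(means)
    eps = total_eps / n                       # equal budget per stage
    # invert exp(-delta^2 / (2 var)) = eps  =>  delta = sqrt(2 var ln(1/eps))
    return [mu - math.sqrt(2.0 * var * math.log(1.0 / eps))
            for mu, var in zip(means, variances)]
```

Tightening `total_eps` pushes every threshold further below its mean, i.e., safer but less aggressive pruning.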
In transformer model pruning, catastrophic degradation is only observed when an entire block is removed, affirming that at least some representation from each detected functional block must be preserved to retain performance. Pruning within blocks yields virtually no drop in predictive capacity due to within-block redundancy (Dorszewski et al., 2024).
5. Practical Complexity Gains
Substantial reductions in practical computational complexity have been documented. For a PAC(128,64) code, threshold-based pruning reduced the average stack occupancy by over 90% relative to conventional stack decoding, with no compromise in frame-error rate (FER) across a range of SNRs (Moradi et al., 2022). The variance-guided stack decoder achieved a further substantial reduction in the average number of surviving paths with zero performance loss, while also reducing heap operations and LLR evaluations (Moradi et al., 8 Sep 2025).
In transformer-based speech models, up to 45% of layers can be pruned post-training with the original accuracy maintained. Further replacement of the entire stack with mimicking layers yielded substantial parameter reduction and inference speedup while preserving nearly all predictive capability on downstream tasks (Dorszewski et al., 2024).
6. Design Assumptions, Algorithmic Limitations, and Parameter Selection
Stack-based pruning assumes an infinite or sufficiently large stack to avoid losing the correct path through overflow. Metric thresholding presumes offline computation and storage of channel parameters such as the cutoff rate and the per-bit metric means and variances. In short block-length or very low-SNR regimes, thresholds must be set conservatively to avoid performance loss, as the gap between the cutoff rate and capacity may be substantial (Moradi et al., 2022). For variance-guided pruning, Gaussian approximations of polarized bit-channel means and variances enable fast, per-frame threshold computation, ensuring per-path adaptation (Moradi et al., 8 Sep 2025).
For deep model pruning, no retraining is required for layer removal unless entire functionally distinct blocks are removed. However, for best results, especially in stack replacement by mimicking networks, a two-stage knowledge distillation or matching loss optimization is advisable (Dorszewski et al., 2024).
7. Broader Implications and Interconnections
Stack-based pruning strategies highlight the universality of metric-concentration phenomena in both information-theoretic coding and neural network architectures. In sequential decoding, metric-driven pruning enables orders-of-magnitude complexity savings while guaranteeing exponentially small error. In deep learning, stack redundancy analysis and judicious removal or replacement of layers can yield radically leaner architectures. Across domains, the underlying principle is the exploitation of statistical predictability and redundancy to preserve optimal solutions while discarding vast numbers of provably suboptimal candidates (Moradi et al., 2022, Moradi et al., 8 Sep 2025, Dorszewski et al., 2024).