
Probabilistic Magnitude Pruning

Updated 15 December 2025
  • Probabilistic Magnitude Pruning (PMP) is a neural network compression method that prunes low-magnitude weights using statistical calibration and probabilistic modeling.
  • It integrates magnitude-based pruning with rigorous uncertainty quantification to guarantee expressive power and controlled predictive risk.
  • PMP has practical applications across fully connected, convolutional, and graph neural networks, enabling efficient deployment in computer vision and recognition tasks.

Probabilistic Magnitude Pruning (PMP) encompasses a family of methodologies for neural network compression that systematically prune low-magnitude weights with guarantees on network expressive power, generalization performance, and, in recent advances, calibrated uncertainty on performance loss under finite data. PMP approaches unify magnitude-based pruning (removal of the connections with smallest absolute values) with probabilistic modeling, variational optimization, and statistical calibration to deliver well-controlled tradeoffs between sparsity and predictive risk. PMP therefore stands as a cornerstone of uncertainty-aware neural network deployment, with applications spanning fully connected, convolutional, and graph neural network architectures in domains from computer vision to skeleton-based recognition.

1. Formal Definitions and Problem Setting

Let $f: \mathcal{X} \to [0,1]^{M \times N}$ denote a pre-trained neural network with $K$ real-valued weights $W = \{w_i\}_{i=1}^K$. PMP seeks to produce a sparse variant $f_\lambda$ by zeroing out the fraction $\lambda \in [0,1)$ of weights with smallest absolute value, defined by the quantile threshold $q_\lambda := \mathrm{quantile}(|w_i|; \lambda)$ and the rule

$$w_{i,\lambda} = \begin{cases} w_i, & \text{if } |w_i| > q_\lambda \\ 0, & \text{otherwise.} \end{cases}$$

A loss function $\ell$ measuring degradation, such as $\ell(Y, \hat{Y}) = \mathbb{I}\{Y \neq \hat{Y}\}$ for classification, induces the risk $R(\lambda) = \mathbb{E}_{(X,Y) \sim P}[\ell(Y, f_\lambda(X))]$ and the empirical calibration risk $\hat{R}(\lambda) = \frac{1}{n}\sum_{i=1}^{n} \ell(Y_i, f_\lambda(X_i))$ over $n$ i.i.d. calibration samples.
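The pruning rule and the empirical 0/1 risk above can be sketched in a few lines of NumPy (a minimal illustration; the function names are ours, not from the cited papers):

```python
import numpy as np

def prune_by_quantile(weights, lam):
    """Zero out the fraction `lam` of weights with smallest magnitude,
    using the quantile threshold q_lambda = quantile(|w_i|; lam)."""
    q = np.quantile(np.abs(weights), lam)
    return np.where(np.abs(weights) > q, weights, 0.0)

def empirical_risk(y_true, y_pred):
    """Empirical calibration risk under 0/1 loss over n i.i.d. samples."""
    return float(np.mean(y_true != y_pred))

w = np.array([0.05, -1.2, 0.3, -0.01, 0.8, -0.4])
w_half = prune_by_quantile(w, lam=0.5)  # keeps only |w| above the median magnitude
```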

The fundamental objective is: for a given tolerance $\alpha$ and error budget $\delta$, select the largest $\lambda$ such that

$$\mathbb{P}\big(R(\lambda) \le \alpha\big) \ge 1 - \delta,$$

which ensures with high confidence that the pruned network remains within tolerable risk (Alvarez, 2024).

2. Statistical Calibration and Distribution-Free Guarantees

PMP methods in (Alvarez, 2024) leverage distribution-free uncertainty quantification via the Learn–then–Test (LTT) framework. The range $[0,1)$ is discretized into $m$ candidate sparsity levels $\lambda_1 < \cdots < \lambda_m$. For each $\lambda_j$, the null hypothesis $H_j : R(\lambda_j) > \alpha$ is tested with super-uniform p-values:

  • Binomial-tail: $p_j^{\mathrm{BT}} = \mathbb{P}\big(\mathrm{Bin}(n, \alpha) \le n \hat{R}(\lambda_j)\big)$
  • Hoeffding–Bentkus: $p_j^{\mathrm{HB}} = \min\big(e^{-n\, h_1(\hat{R}(\lambda_j) \wedge \alpha,\, \alpha)},\ e\,\mathbb{P}(\mathrm{Bin}(n, \alpha) \le \lceil n \hat{R}(\lambda_j) \rceil)\big)$
  • PRW p-values for general bounded losses

A family-wise error rate (FWER) controlling procedure rejects nulls on a subset $\Lambda_{\mathrm{rej}} \subseteq \{\lambda_1, \dots, \lambda_m\}$. Selecting $\lambda^* = \max \Lambda_{\mathrm{rej}}$, one achieves

$$\mathbb{P}\big(R(\lambda^*) \le \alpha\big) \ge 1 - \delta$$

with no distributional assumptions beyond i.i.d. calibration draws (Theorem 3.1 of (Alvarez, 2024)). Monotonicity of risk in magnitude pruning supports fixed-sequence testing, terminating upon the first non-rejection.
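The binomial-tail p-value can be computed exactly with the standard library; the sketch below is illustrative (function names are ours), following the form of the test above:

```python
from math import comb, floor

def binom_cdf(k, n, p):
    """Exact P(Bin(n, p) <= k)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(floor(k) + 1))

def binomial_tail_pvalue(emp_risk, n, alpha):
    """Super-uniform p-value for H: R(lambda) > alpha,
    i.e. P(Bin(n, alpha) <= n * emp_risk)."""
    return binom_cdf(n * emp_risk, n, alpha)

# With n = 100 calibration points and tolerance alpha = 0.05, an observed
# risk of 0.01 gives a small p-value, so H: R > alpha can be rejected.
p = binomial_tail_pvalue(0.01, 100, 0.05)
```

A small p-value here is evidence that the true risk at this sparsity level is within tolerance, which is what permits rejecting the null and accepting the candidate $\lambda_j$.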

3. Algorithmic Procedures and Variational PMP

PMP is instantiated by several algorithmic paradigms:

a. Calibrated Magnitude Pruning (Fixed-sequence Testing) (Alvarez, 2024):

  1. Sort $|w_i|$ to determine the quantile thresholds $q_{\lambda_j}$.
  2. Sequentially prune to build $f_{\lambda_j}$, compute the empirical calibration risk $\hat{R}(\lambda_j)$, and the corresponding p-value $p_j$.
  3. Accumulate $\lambda_j$ in $\Lambda_{\mathrm{rej}}$ until $p_j > \delta$.
  4. Output $f_{\lambda^*}$, where $\lambda^* = \max \Lambda_{\mathrm{rej}}$.

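Steps 1–4 can be combined into a single loop (an illustrative sketch; `risk_of` stands in for evaluating the pruned network on the calibration set, and all names are ours):

```python
import numpy as np
from math import comb, floor

def p_value(emp_risk, n, alpha):
    # Binomial-tail p-value: P(Bin(n, alpha) <= n * emp_risk)
    return sum(comb(n, i) * alpha**i * (1 - alpha)**(n - i)
               for i in range(floor(n * emp_risk) + 1))

def calibrated_magnitude_pruning(weights, risk_of, lambdas, n, alpha, delta):
    """Fixed-sequence testing: scan increasing sparsities, keep rejecting
    H_j: R(lambda_j) > alpha while p_j <= delta, stop at first non-rejection."""
    abs_w = np.abs(weights)
    lam_star, w_star = 0.0, weights
    for lam in lambdas:
        q = np.quantile(abs_w, lam)                     # step 1: quantile threshold
        w_lam = np.where(abs_w > q, weights, 0.0)       # step 2: prune
        if p_value(risk_of(w_lam), n, alpha) <= delta:  # step 3: test
            lam_star, w_star = lam, w_lam
        else:
            break                                       # step 4: first non-rejection
    return lam_star, w_star

# Toy stand-in: risk grows linearly with the pruned fraction.
toy_risk = lambda w: 0.1 * float(np.mean(w == 0))
weights = np.arange(1.0, 101.0)
lam_star, w_star = calibrated_magnitude_pruning(
    weights, toy_risk, lambdas=[0.25, 0.5, 0.75], n=200, alpha=0.05, delta=0.1)
```

In this toy run the risk at 25% sparsity passes the test, while at 50% the p-value exceeds the budget and the scan stops, so the last rejected level is returned.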

b. Variational PMP in GCNs (Sahbi, 2023):

  • Introduces a continuous “band-stop” reparameterization that smoothly suppresses low-magnitude weights; a KL-divergence term aligns the empirical latent weight distribution with a target prior (e.g., Gaussian or Laplace), and an exact pruning budget is attained via a quantile mapping of the band threshold.
  • Loss function: $\mathcal{L} = \mathcal{L}_e + \beta\, \mathrm{KL}\big(\hat{P}_W \,\|\, P_{\mathrm{prior}}\big)$, where $\mathcal{L}_e$ is the prediction loss and $\beta$ weights the budget-enforcing term.
  • End-to-end joint optimization of masks and network weights obviates explicit hard-masking or retraining steps.
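The band-stop idea can be sketched as a smooth gate on weight magnitude. The sigmoid parameterization below is our illustrative stand-in, not the exact function of (Sahbi, 2023), but it shows how a quantile mapping of the threshold attains an exact budget:

```python
import numpy as np

def band_stop_mask(w, t, sigma=50.0):
    """Smooth surrogate for the hard rule |w| > q_lambda: near 0 inside
    the band |w| <= t, near 1 outside; sigma controls the sharpness."""
    return 1.0 / (1.0 + np.exp(-sigma * (np.abs(w) - t)))

def threshold_for_budget(w, lam):
    """Quantile mapping: choose t so a fraction lam of weights falls in the band."""
    return np.quantile(np.abs(w), lam)

rng = np.random.default_rng(0)
w = rng.normal(size=1000)
t = threshold_for_budget(w, lam=0.7)
mask = band_stop_mask(w, t)   # ~70% of entries gated below 0.5
```

Because the gate is differentiable in both the weights and the threshold, masks and weights can be optimized jointly by gradient descent, which is the point of the variational formulation.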

4. Theoretical Guarantees and Generalization Bounds

PMP delivers quantifiable theoretical assurances:

  • In fully connected and convolutional networks, magnitude-based pruning of a fixed fraction of weights per layer preserves uniform approximation error $\epsilon$ with probability at least $1 - \delta$, provided the layers are sufficiently wide, as established in (Qian et al., 2021). Layer widths must satisfy polynomial lower bounds in $1/\epsilon$, $1/\delta$, and the network depth to guarantee

$$\sup_{x \in \mathcal{X}} \big| f(x) - f_\lambda(x) \big| \le \epsilon.$$

  • In stochastic mask optimization (Hayou et al., 2021), minimization of the empirical Gibbs risk induces data-adaptive regularization and preferential retention of the weights best aligned with label features. Extensions to PAC-Bayes pruning provide explicit self-bounded generalization error via data-dependent priors and joint optimization of weights and stochastic mask parameters.

5. Computational Complexity and Implementation Considerations

Key operations for PMP include:

  • Sorting the $K$ weights: $O(K \log K)$
  • Forward passes for empirical risk: $O(m \cdot n)$ sample evaluations over $m$ candidates and $n$ calibration points, often reduced by incremental masking
  • Per-candidate p-value calculation: $O(1)$

Best practices highlighted:

  • Use sparse tensor representations post-pruning for accelerated inference
  • Precompute and cache layer-wise masks for repeated risk evaluation
  • Parallelize batch risk computation
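The “precompute and cache” advice exploits the fact that a single sort yields every candidate mask, and that the masks are nested as sparsity grows — the monotonicity that justifies fixed-sequence testing. An illustrative sketch:

```python
import numpy as np

def candidate_masks(weights, lambdas):
    """Build boolean keep-masks for increasing sparsity levels from a single
    O(K log K) sort; each step only prunes the incremental batch of weights."""
    order = np.argsort(np.abs(weights))   # indices from smallest to largest |w|
    K = len(weights)
    keep = np.ones(K, dtype=bool)
    masks, pruned = [], 0
    for lam in lambdas:                   # lambdas must be increasing
        target = int(lam * K)
        keep[order[pruned:target]] = False  # prune only the delta
        pruned = target
        masks.append(keep.copy())
    return masks

masks = candidate_masks(np.array([5.0, 1.0, 4.0, 2.0, 3.0]), [0.2, 0.4])
# masks are nested: everything kept at 40% sparsity is also kept at 20%
```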

Variational PMP settings (e.g., for GCNs) select smoothing parameters for the band-stop functions and histogram bin counts to suit the desired exact budget attainment. Training runs are efficiently supported on commodity GPUs (Sahbi, 2023).

6. Experimental Evaluations and Practical Tradeoffs

Experiments outline the efficacy of PMP:

| Dataset & Architecture | Calibration/Test Split | Baseline(s) | α-tolerance, δ-budget | Achieved Sparsity | Notes |
|---|---|---|---|---|---|
| MNIST / FCN (118k params) | 9k calibration / 1k test | Naive MP, calibrated MP | α=0.03, δ=0.1 | λ*=0.68–0.78 | PMP respects α; the naive method violates it |
| PolypGen / U-Net (13.8M) | 465 calibration / 50 test images | Global MP | α=0.05, δ=0.05 | λ*=0.06 | Significantly lower compression required |
| FPHA / GCN | 575 test sequences | Classical MP, PMP+Gaussian | Fixed r | r=55%–99% | PMP+Gaussian outperforms MP at high sparsity |

PMP achieves coverage of at least $1 - \delta$ empirically, with selective calibration strategies enabling further control over confidence thresholds and abstention rates in prediction. Results indicate higher robustness of PMP with Laplace priors at extreme sparsities and improved generalization under Gaussian priors even without pruning (Sahbi, 2023).

7. Limitations and Extensions

PMP in its strongest form assumes monotonic risk increase with sparsity under one-shot magnitude pruning. Iterative or structured schemes may require more general multiple testing corrections (e.g., Holm–Bonferroni). PMP requires a held-out calibration set, rendering the method sensitive to calibration error in data-scarce regimes; bootstrap or cross-conformal variants offer mitigation at increased computation. Proposed extensions include:

  • Joint calibration of the pruning quantile $q_\lambda$ and a prediction confidence threshold
  • Adapting PMP to iterative or structured pruning settings
  • Incorporating second-order relevance metrics (e.g., Hessian-based scores)
  • Layer-wise, architecture-aware tuning of sparsity ($\lambda$) and tolerance ($\alpha$) budgets

Practical selection of $\alpha$ should reflect the domain-specific tolerable performance drop, typically a few percent for vision classification. Empirical evidence confirms that the selected sparsity $\lambda^*$ is more sensitive to $\alpha$ than to $\delta$ (Alvarez, 2024).


Probabilistic Magnitude Pruning thus merges rigorous statistical calibration, variational and probabilistic modeling of sparsity, and practical algorithmic strategies to reliably compress deep neural networks under explicit risk control and budget constraints (Alvarez, 2024, Sahbi, 2023, Qian et al., 2021, Hayou et al., 2021).
