
Protective Self-Adaptive Pruning (PSAP)

Updated 4 January 2026
  • PSAP is an adaptive structured pruning method that leverages statistics like weight sparsity ratios and gradient magnitudes to dynamically determine pruning schedules.
  • It employs pulse gradient supervision for protective reconstruction, swiftly restoring pruned filters that show significant utility to maintain performance.
  • In large language models, PSAP integrates alignment-aware dynamic pruning to safeguard safety-critical circuits against adversarial prompts.

Protective Self-Adaptive Pruning (PSAP) is an iterative structured pruning paradigm designed for both deep neural networks (DNNs) and LLMs, with the dual aims of maximizing computational efficiency and safeguarding critical functional or alignment properties. PSAP dispenses with external controllers and leverages inherent model statistics—specifically weight sparsity ratios and gradient magnitudes—to adaptively calibrate pruning schedules and prevent loss of indispensable units. In LLMs, PSAP further builds on dynamic pruning frameworks to specifically ensure the preservation of alignment-relevant substructures under adversarial or safety-critical prompts, offering superior refusal rates and safety compared to both static and conventional dynamic pruning approaches (Li et al., 2023, Patel et al., 9 Nov 2025).

1. Motivations and Distinctions from Prior Adaptive Pruning

Existing adaptive pruning techniques, including methods such as AMC, NetAdapt, SCP, DST, and RL-based controllers, typically allocate pruning budgets or assign filter importance via auxiliary scoring mechanisms—frequently requiring additional networks, meta-learners, or reinforcement learning agents. These external monitors add substantial computational overhead, reduce interpretability, and create scalability bottlenecks for large-scale deployment (Li et al., 2023). PSAP was introduced to address these challenges by relying exclusively on information derived from the base network: the per-layer weight sparsity ratio (WSR) and post-pruning gradient norms, removing the need for auxiliary agents and maintaining both simplicity and transparency.

In the context of LLMs, conventional pruning can compromise alignment by deleting circuits responsible for refusal or safety behaviors—a vulnerability exacerbated in dynamic pruning regimes, which select subnetworks on a per-input basis without explicit regard for alignment preservation. Alignment-unaware pruning can result in catastrophic safety degradation when confronted with adversarial prompts or jailbreak attempts (Patel et al., 9 Nov 2025). PSAP corrects this by integrating alignment-relevance scoring and prompt-adaptive gating logic.

2. Methodology: Adaptive Pruning and Protective Reconstruction

The PSAP methodology consists of two tightly coupled mechanisms:

Adaptive Pruning Ratio Scheduling

At each pruning iteration, PSAP computes the WSR for each layer $l$:

s^{(l)} = \frac{|W^{(l)}|_0}{F(W^{(l)})}

where $|W^{(l)}|_0$ is the count of zero weights and $F(W^{(l)})$ is the total number of weights in the layer. The target per-layer pruning ratio $k^{(l)}$ is updated as:

k_t^{(l)} = \begin{cases} \max\left(k_{t-1}^{(l)},\, s_{t-1}^{(l)} + \delta\right) & \text{if } s_{t-1}^{(l)} \leq k_{t-1}^{(l)} \\ k_{t-1}^{(l)} & \text{otherwise} \end{cases}

where $\delta$ is a small increment, ensuring that pruning ratios increase monotonically and never fall below the observed sparsity. Layers with higher zero-counts (i.e., less-utilized layers) are pruned more aggressively.
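
As a concrete illustration, the following is a minimal PyTorch-style sketch of the WSR computation and the ratio update. The helper names and the default value of $\delta$ are ours, and the update simply mirrors the formula above; this is not a reference implementation.

```python
import torch

def weight_sparsity_ratio(weight: torch.Tensor) -> float:
    """s^(l): fraction of exactly-zero entries in a layer's weight tensor."""
    num_zero = (weight == 0).sum().item()
    return num_zero / weight.numel()

def update_pruning_ratio(k_prev: float, s_prev: float, delta: float = 0.02) -> float:
    """Adaptive schedule: the target ratio k^(l) is monotone non-decreasing."""
    if s_prev <= k_prev:                       # condition as stated in the update rule above
        return max(k_prev, s_prev + delta)
    return k_prev
```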

Protective Reconstruction via Pulse Gradient Supervision

After each pruning step, a mini-batch backward pass is performed to collect the gradient $\nabla W$. For each filter $j$ pruned in the current step, its gradient norm $\|\nabla W_j\|_2$ is computed. If $\|\nabla W_j\|_2$ exceeds the layer-wise average, the filter's weights are restored from backup:

  • This mechanism exploits the “pulse-gradient” phenomenon, where pruned filters with non-negligible downstream utility (revealed by large gradients, particularly with Batch Normalization) are safeguarded against irreversible removal.
  • No additional regularization is applied; the only constraint is the immediate reloading of high-gradient filters to preserve capacity.
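
A hedged sketch of the protective-reconstruction step is shown below, assuming a convolutional layer whose filters were zeroed in the current iteration. The function and argument names (`protective_restore`, `pruned_idx`, `backup_weight`) are illustrative, not taken from the papers.

```python
import torch

def protective_restore(layer: torch.nn.Conv2d,
                       pruned_idx: list[int],
                       backup_weight: torch.Tensor,
                       grad: torch.Tensor) -> None:
    """Restore pruned filters whose pulse-gradient norm exceeds the layer-wise mean."""
    # Per-filter L2 gradient norms (one norm per output channel).
    filter_grad_norms = grad.flatten(start_dim=1).norm(dim=1)
    threshold = filter_grad_norms.mean()
    with torch.no_grad():
        for j in pruned_idx:
            if filter_grad_norms[j] > threshold:
                # A large gradient after pruning signals remaining utility: reload from backup.
                layer.weight[j] = backup_weight[j]
```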

3. PSAP for LLMs: Alignment-Aware Dynamic Pruning

In LLMs, PSAP advances beyond Alignment-Aware Probe Pruning (AAPP) by combining alignment-critical circuit identification with prompt-dependent structured pruning masks:

  • Pruning operates over substructures (“circuits”), e.g., MLP input channels or attention projection columns.
  • Alignment relevance is estimated by a small probe network $f_\theta$, trained to map hidden-layer token representations to a per-circuit score $s_i(x)$. Historical log-energies for benign ($e^+$) and harmful ($e^-$) prompts are precomputed.
  • The probe loss encourages $\theta$ to assign high $s_i$ to circuits essential for safe/refusal behavior in harmful contexts and penalizes scores on circuits utilized only under benign prompts:

L_{\text{probe}}(\theta) = \mathbb{E}_{x\in\text{Harmful}}\left[ -\sum_i s_i(x)\cdot e^-_i \right] + \mathbb{E}_{x\in\text{Benign}}\left[ \sum_i s_i(x)\cdot e^+_i \right] + \lambda\|\theta\|_2^2

  • At inference, the probe scores $s_{\text{live}}$ blend real-time importance with historical benign activation, and a softmax over circuits yields $p_i$. KL divergences to the historical safe and harmful distributions ($q_{\text{safe}}$, $q_{\text{harm}}$) drive an alignment gate: when an adversarial signal is detected (KL threshold $\tau$ exceeded), PSAP hard-preserves the top $\alpha\cdot C$ circuits with the highest $e^-_i$ for robustness and fills the remaining budget by live importance; otherwise, pruning proceeds solely by online scoring. A sketch of the probe loss and gate follows this list.
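
Below is a minimal sketch of the probe loss and the alignment gate, assuming precomputed per-circuit log-energies and a historical safe-score distribution. The gate rule shown (thresholding the divergence of the live distribution from the safe profile) is one plausible instantiation; all names, shapes, and default thresholds are assumptions rather than the exact procedure of Patel et al.

```python
import torch
import torch.nn.functional as F

def probe_loss(s_harm, s_benign, e_minus, e_plus, probe_params, lam=1e-4):
    """L_probe: reward high scores on harmful-context circuits, penalize benign-only ones."""
    # s_harm, s_benign: (batch, C) probe scores; e_minus, e_plus: (C,) historical log-energies.
    data_term = -(s_harm * e_minus).sum(dim=1).mean() + (s_benign * e_plus).sum(dim=1).mean()
    l2 = sum((p ** 2).sum() for p in probe_params)   # lambda * ||theta||_2^2 regularizer
    return data_term + lam * l2

def alignment_gate_mask(s_live, e_minus, q_safe, prune_ratio=0.3, alpha=0.1, tau=0.05):
    """Return a boolean keep-mask over C circuits, hard-preserving alignment-critical
    circuits when the live score distribution drifts too far from the safe profile."""
    C = s_live.numel()
    keep = int(round((1.0 - prune_ratio) * C))
    p = F.softmax(s_live, dim=0)
    # KL(p || q_safe); a large value indicates an adversarial / out-of-profile prompt.
    kl_safe = F.kl_div(q_safe.log(), p, reduction="sum")

    mask = torch.zeros(C, dtype=torch.bool)
    if kl_safe > tau:
        n_protect = min(int(round(alpha * C)), keep)
        protected = torch.topk(e_minus, n_protect).indices   # most alignment-critical circuits
        mask[protected] = True
        # Fill the remaining keep budget by live importance.
        rest = torch.topk(s_live.masked_fill(mask, float("-inf")), keep - n_protect).indices
        mask[rest] = True
    else:
        mask[torch.topk(s_live, keep).indices] = True
    return mask
```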

4. Algorithmic Workflow and Implementation Details

For DNNs, the PSAP workflow consists of two main phases:

  1. Adaptive Search
    • For each epoch, update $k^{(l)}$ according to the WSR.
    • Prune the lowest-norm filters in each layer per $k^{(l)}$.
    • Execute one mini-batch step to collect $\nabla W$; restore pruned filters whose gradient norm surpasses the layer mean.
    • Continue the standard SGD update.
    • Stop when the total pruning ratio exceeds the user target $T_{\text{total}}$.
  2. Fine-Tuning
    • Freeze the masks and fine-tune the masked model for a fixed number of epochs.
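
Putting the pieces together, the adaptive-search phase might be orchestrated as in the sketch below, reusing the helpers defined earlier. `conv_layers`, `prune_lowest_norm_filters`, `total_pruning_ratio`, and `train_one_epoch` are hypothetical helpers standing in for the usual structured-pruning bookkeeping; this is an outline of the loop, not a reference implementation.

```python
def adaptive_search(model, loader, optimizer, criterion, T_total, delta=0.02):
    """Sketch of one PSAP adaptive-search loop (Section 4); helper names are hypothetical."""
    ratios = {name: 0.0 for name, _ in conv_layers(model)}        # hypothetical helper
    while total_pruning_ratio(model) < T_total:                   # hypothetical helper
        backups, pruned = {}, {}
        # (a) Update per-layer targets from observed sparsity and prune low-norm filters.
        for name, layer in conv_layers(model):
            s = weight_sparsity_ratio(layer.weight)
            ratios[name] = update_pruning_ratio(ratios[name], s, delta)
            backups[name] = layer.weight.detach().clone()
            pruned[name] = prune_lowest_norm_filters(layer, ratios[name])  # hypothetical
        # (b) One mini-batch backward pass to collect pulse gradients.
        x, y = next(iter(loader))
        loss = criterion(model(x), y)
        optimizer.zero_grad()
        loss.backward()
        # (c) Protective reconstruction of high-gradient pruned filters.
        for name, layer in conv_layers(model):
            protective_restore(layer, pruned[name], backups[name], layer.weight.grad)
        # (d) Continue standard SGD training before the next pruning iteration.
        train_one_epoch(model, loader, optimizer, criterion)      # hypothetical helper
```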

For LLMs, the PSAP dynamic inference loop applies masking at the circuit level, switches modes adaptively via the alignment-aware gate (re-evaluated every $N$ tokens), and prunes only the input channels of attention and MLP projections (excluding the initial and final layers to preserve fluency). Hyperparameters include the pruning ratio $r$, align fraction $\alpha$, KL margin $\tau$, and a probe learning rate of $1\times 10^{-4}$ (Patel et al., 9 Nov 2025, Li et al., 2023).
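
A rough sketch of the dynamic decoding loop is given below, re-evaluating the gate every $N$ tokens. `model.hidden_states`, `probe_scores`, and `forward_with_circuit_mask` are hypothetical stand-ins for model-specific plumbing, `gate_kwargs` carries the historical statistics ($e^-$, $q_{\text{safe}}$) and thresholds, and greedy decoding is used only to keep the example short.

```python
import torch

@torch.no_grad()
def generate_with_psap(model, probe, input_ids, max_new_tokens=256,
                       refresh_every=20, **gate_kwargs):
    """Sketch of PSAP dynamic decoding: refresh the circuit mask every `refresh_every` tokens."""
    mask = None
    for step in range(max_new_tokens):
        if step % refresh_every == 0:
            hidden = model.hidden_states(input_ids)                  # hypothetical: probe-layer activations
            s_live = probe_scores(probe, hidden)                     # hypothetical: per-circuit live scores
            mask = alignment_gate_mask(s_live, **gate_kwargs)        # gate from the sketch in Section 3
        logits = forward_with_circuit_mask(model, input_ids, mask)   # hypothetical masked forward pass
        next_token = logits[:, -1].argmax(dim=-1, keepdim=True)      # greedy decoding for simplicity
        input_ids = torch.cat([input_ids, next_token], dim=-1)
    return input_ids
```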

5. Empirical Results and Comparative Performance

PSAP has demonstrated state-of-the-art compression/accuracy trade-offs across image and language domains:

  • CIFAR-10, ResNet-56: PSAP attains 93.74% accuracy at 48.6% of the original FLOPs (vs. SFP, FPGM, AMC, PFS, SCP, and DCP, all with larger accuracy drops at similar compression). When pruning from scratch and in high-compression regimes, PSAP maintains higher performance (Li et al., 2023).
  • ImageNet, ResNet-50: PSAP achieves 76.83% accuracy at 36.6% of the original FLOPs, outperforming SFP (−1.54%), ASFP, and others.
  • ImageNet, MobileNetV2: PSAP reaches a similar or smaller performance drop than MetaPruning, AMC, PFS, and AutoPruner.
  • LLMs (LLaMA-2-7B-chat, Qwen2.5-14B-Instruct, Gemma-3-12B-IT): at prune ratio $r=0.3$, PSAP achieves a refusal rate of 0.62 at 10.1 GFLOPs/token (vs. 0.32 for static pruning at 10.8 GFLOPs/token). Alignment F1/accuracy/FAR remain higher for PSAP than for probe pruning or AAPP, especially at higher prune ratios. Toxicity stays within 5% of the unpruned baseline even at $r=0.5$, while legacy approaches exhibit pronounced spikes (Patel et al., 9 Nov 2025).
Benchmark            | FLOPs Retained | Baseline Acc. | PSAP Acc. / Δ   | Best Baseline Δ
CIFAR-10, ResNet-56  | 48.6%          | 93.59%        | 93.74% (+0.15)  | SFP (−0.83)
ImageNet, ResNet-50  | 36.6%          | 76.15%        | 76.83% (+0.18)  | SFP (−1.54)
LLaMA-2-7B-chat      | 70%            | —             | Refusal: 0.62   | Static: 0.32

6. Integration and Deployment

To deploy PSAP, the following pipeline is recommended (Patel et al., 9 Nov 2025):

  1. Alignment Probe Training for LLMs
    • Collect balanced safe and adversarial prompt sets (e.g., 5K each).
    • Record per-channel log-energies $e^+$ and $e^-$.
    • Train the probe $f_\theta$ on these distributions.
  2. Pruning Schedule Selection
    • Select $r$ and $\alpha$ via a grid sweep on held-out prompts to tune the throughput/safety trade-off.
    • Set the refresh window $N$ for re-applying the alignment-aware gate at inference (e.g., $N=20$ tokens).
  3. Validation
    • Test on large-scale adversarial suites (e.g., 10K WildJailbreak prompts).
    • Assess refusal rates, false acceptance, and toxicity to confirm performance relative to unpruned baselines.
  4. Inference-Time Pruning
    • Implement the dynamic masking and alignment-gating logic described in Sections 3 and 4 to ensure just-in-time pruning and preservation of alignment-critical circuits.
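
The hyperparameters listed above could be collected in a small configuration object for the deployment pipeline. In the sketch below, the refresh window and probe learning rate are the values mentioned in this article, while the remaining defaults are placeholders to be set by the grid sweep.

```python
from dataclasses import dataclass

@dataclass
class PSAPConfig:
    prune_ratio: float = 0.3      # r: fraction of circuits pruned per layer (set via grid sweep)
    align_fraction: float = 0.1   # alpha: share of circuits hard-preserved under the gate (placeholder)
    kl_margin: float = 0.05       # tau: gate threshold on the KL divergence (placeholder)
    refresh_window: int = 20      # N: tokens between gate re-evaluations (e.g., N = 20)
    probe_lr: float = 1e-4        # probe learning rate reported in the article
```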

7. Limitations and Future Work

PSAP, as formulated for DNNs, currently targets Batch Normalization-enabled architectures, leveraging pulse gradients for effective filter protection. Application to non-BN networks, including certain Transformer variants, may require adaptation. The global pruning increment $\delta$ is manually set and is neither layer- nor data-adaptive; meta-learned or layer-wise schedules represent a plausible extension.

Further research may explore integration with hardware-specific latency objectives, combined pruning and quantization regimes, and expansion to other forms of model overparameterization. For LLMs, generalizing alignment probe training and pruning logic beyond current substructure definitions may enhance modularity and robustness (Patel et al., 9 Nov 2025, Li et al., 2023).
