Mitigation strategies in pruning calibration pipelines
Develop effective mitigation strategies within the calibration pipelines of unstructured pruning algorithms for large language models (specifically magnitude pruning, SparseGPT, and Wanda) that reliably prevent pruning-triggered attacks while minimizing degradation on standard utility benchmarks. Concretely, design security-aware calibration procedures (e.g., calibration dataset selection and weight-scoring routines) that suppress attack activation across models, sparsity levels, and pruning configurations without incurring substantial performance loss.
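To make the problem concrete, the minimal sketch below illustrates one possible security-aware calibration routine, assuming a Wanda-style importance score (the weight magnitude multiplied by the per-input-channel activation norm). The calibration mixer, the blending weight alpha, the probe_fraction parameter, and all function names are hypothetical illustrations of the kind of dataset-selection and scoring changes the problem statement asks for, not a method evaluated here.

```python
import torch


def build_security_aware_calibration(clean_texts, probe_texts,
                                     n_samples=128, probe_fraction=0.25):
    """Illustrative dataset selection: mix standard calibration text with
    trigger-probing prompts so that suspicious inputs contribute to the
    activation statistics used for pruning (hypothetical helper)."""
    n_probe = int(n_samples * probe_fraction)
    n_clean = n_samples - n_probe
    return clean_texts[:n_clean] + probe_texts[:n_probe]


def security_aware_scores(weight, act_norm_clean, act_norm_probe, alpha=0.5):
    """Wanda-style importance |W| * ||X||_2 per input channel, with the
    activation norm blended between clean and probe calibration batches.
    alpha (an assumed knob) controls how much probe statistics influence
    the score; alpha=0 recovers the standard clean-data score."""
    blended = (1.0 - alpha) * act_norm_clean + alpha * act_norm_probe
    return weight.abs() * blended.unsqueeze(0)  # shape [out_features, in_features]


def prune_to_sparsity(weight, scores, sparsity=0.5):
    """Unstructured pruning: zero the lowest-scoring weights in each output row."""
    k = int(weight.shape[1] * sparsity)
    _, idx = torch.topk(scores, k, dim=1, largest=False)  # k smallest per row
    mask = torch.ones_like(weight)
    mask.scatter_(1, idx, 0.0)
    return weight * mask


if __name__ == "__main__":
    torch.manual_seed(0)

    # Toy calibration set; a real pipeline would tokenize these texts and
    # run the model over them to collect per-channel activation norms.
    clean = [f"clean calibration sample {i}" for i in range(100)]
    probes = [f"suspected trigger prompt {i}" for i in range(100)]
    calib = build_security_aware_calibration(clean, probes, n_samples=8)
    print("calibration set:", calib)

    # Toy linear layer and activation statistics standing in for one
    # transformer weight matrix and its collected input norms.
    W = torch.randn(16, 64)
    norm_clean = torch.rand(64)
    norm_probe = torch.rand(64)

    scores = security_aware_scores(W, norm_clean, norm_probe, alpha=0.5)
    W_pruned = prune_to_sparsity(W, scores, sparsity=0.5)
    print("achieved sparsity:", (W_pruned == 0).float().mean().item())
```

Whether such a blended score actually suppresses pruning-triggered behavior across models, sparsity levels, and pruning configurations, and at what utility cost, is precisely the open question posed above.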
Overall, security-aware calibration by itself is insufficient to reliably prevent pruning-triggered attacks in our setting. We leave the design of more effective mitigation strategies within the calibration pipeline as an interesting and important open question for future work.