
Dynamic Prototype Pruning (DPP)

Updated 21 December 2025
  • Dynamic Prototype Pruning is a paradigm that uses trainable prototypes to guide structured, instance-specific pruning of neural networks or training datasets, yielding models and training sets that are both efficient and adaptive.
  • It dynamically generates pruning masks based on personalized or class-based prototypes, reducing computational costs without major retraining.
  • Empirical results from methods like PPP and GDeR demonstrate significant reductions in model size or data volume while maintaining or improving accuracy.

Dynamic Prototype Pruning (DPP) is a general paradigm in which trainable prototypes are leveraged to guide the structured pruning of neural networks or datasets during or after training, with the dual aims of efficiency and adaptability. DPP methods dynamically discover and utilize prototypical representations—either for users (in personalized model pruning) or for data classes (in data pruning)—to generate instance- or client-specific pruned models or training sets with minimal retraining, while maintaining or improving performance. Notable instantiations of DPP include Prototype-based Personalized Pruning (PPP) for channel pruning of deep neural networks and GDeR (Graph Training Debugging via prototypical dynamic soft-pruning) for balanced, robust data pruning in graph neural network (GNN) training (Kim et al., 2021, Zhang et al., 17 Oct 2024).

1. Conceptual Foundation

Dynamic Prototype Pruning formalizes pruning decisions as a function of learned prototypes that summarize relevant structure in user data (for model/channel pruning) or in class-conditional data distributions (for data/batch pruning). In both paradigms, a prototype condenses either the channel usage pattern or the embedding structure, typically by averaging gating or projection outputs associated with instances belonging to a user or class. During pruning, these prototypes control which network channels or training samples are retained, yielding user- or data-dependent compressed models or training batches. DPP is designed to avoid costly retraining or fine-tuning and to enable rapid adaptation, especially in compute- or memory-constrained settings.

2. Prototype-based Personalized Pruning (PPP)

PPP executes two primary stages: a joint learning stage that induces per-user prototypes through augmented gating modules and associated regularizers; and a dynamic, retraining-free pruning stage at deployment.

During training, lightweight gate modules are introduced at each convolutional layer $l$. For each user $p$ and data sample $i$, these modules produce binary channel-usage vectors $z^{p}_{i,l}$ (via straight-through Gumbel-Max). The continuous-valued per-user prototype $\bar z^{p}_l$ for layer $l$ is computed as:

$$\bar z^{p}_l = \frac{1}{n_p} \sum_{i=1}^{n_p} z^{p}_{i,l}$$

where $n_p$ is the number of samples belonging to user $p$.

A binary mask $m^{p}_l$ is obtained by thresholding $\bar z^{p}_l$ at a value $\tau$, and the total loss function incorporates the standard task loss, a prototype regularization term $\mathcal{L}_{\text{proto}}$, and a target utilization regularizer. At deployment, a small enrollment set is used to compute per-user prototypes and extract the final pruning masks, with no retraining or fine-tuning. The resulting static pruned models retain only those filters corresponding to active bits in $m^{p}_l$. This process is designed for edge-device deployability, as only the pruned weights and masks are stored per user (Kim et al., 2021).
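
A minimal sketch of this stage in PyTorch, under simplifying assumptions: a lightweight gate conditioned on pooled features emits per-sample binary channel-usage vectors via straight-through Gumbel-Max, the enrollment-time prototype is their mean, and the mask is obtained by thresholding at $\tau$. The module and function names (ChannelGate, user_prototype, prototype_to_mask) and the exact gate parameterization are illustrative, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ChannelGate(nn.Module):
    """Lightweight gate emitting a (near-)binary channel-usage vector per sample."""

    def __init__(self, num_channels: int):
        super().__init__()
        # Two logits (keep / drop) per channel, conditioned on pooled features.
        self.fc = nn.Linear(num_channels, 2 * num_channels)

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        # feat: (B, C, H, W) -> pooled descriptor of shape (B, C)
        pooled = feat.mean(dim=(2, 3))
        logits = self.fc(pooled).view(-1, feat.size(1), 2)
        # Straight-through Gumbel-Max: hard one-hot forward pass, soft gradients.
        # (tau here is the Gumbel temperature, not the mask threshold.)
        z = F.gumbel_softmax(logits, tau=1.0, hard=True)[..., 0]  # (B, C) in {0, 1}
        return z


@torch.no_grad()
def user_prototype(gate: ChannelGate, enrollment_feats) -> torch.Tensor:
    """Average the binary gate outputs z^p_{i,l} over a user's enrollment samples."""
    zs = [gate(f) for f in enrollment_feats]        # each (B_i, C)
    return torch.cat(zs, dim=0).mean(dim=0)         # \bar z^p_l, shape (C,)


def prototype_to_mask(proto: torch.Tensor, tau: float = 0.5) -> torch.Tensor:
    """Threshold the continuous prototype at tau to obtain the binary mask m^p_l."""
    return (proto >= tau).float()


if __name__ == "__main__":
    gate = ChannelGate(num_channels=64)
    feats = [torch.randn(8, 64, 16, 16) for _ in range(4)]  # toy enrollment batches
    proto = user_prototype(gate, feats)
    mask = prototype_to_mask(proto, tau=0.5)
    print(f"kept {int(mask.sum())} / {mask.numel()} channels")
```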

3. Dynamic Prototypical Data Pruning: GDeR

The GDeR framework introduces DPP into dynamic training-batch selection for GNNs. Each epoch maintains a "training basket" $\mathcal{X}_t$ (a dynamically sized set of data samples). GNN encoders produce $D$-dimensional embeddings which are projected onto the unit hypersphere. For each class, $K$ trainable prototypes $\{\mathbf{p}_k^c\}$ represent principal embedding modes. Prototype learning objectives add intra-class compactness and inter-class separation to the standard task loss, with prototype distributions modeled by a von Mises-Fisher (vMF) mixture and trained via cross-entropy objectives.
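
The prototype objective can be sketched as below, assuming class-conditional vMF mixtures over unit-norm embeddings and a single cross-entropy over logsumexp-pooled prototype similarities that provides both the compactness (pull toward own-class prototypes) and separation (push from other classes) effects; the function name prototype_loss and the concentration parameter kappa are illustrative, not taken from the paper's code.

```python
import torch
import torch.nn.functional as F


def prototype_loss(z: torch.Tensor, prototypes: torch.Tensor,
                   labels: torch.Tensor, kappa: float = 10.0) -> torch.Tensor:
    """
    z:          (B, D) embeddings (L2-normalized onto the hypersphere below)
    prototypes: (C, K, D) trainable prototypes, C classes with K prototypes each
    labels:     (B,) integer class labels
    """
    z = F.normalize(z, dim=-1)
    p = F.normalize(prototypes, dim=-1)
    # Cosine similarity of every sample to every prototype: (B, C, K)
    sim = torch.einsum("bd,ckd->bck", z, p)
    # vMF-mixture class score: log sum_k exp(kappa * <z, p_k^c>)
    class_logits = torch.logsumexp(kappa * sim, dim=-1)  # (B, C)
    # Cross-entropy pulls samples toward their own class prototypes (compactness)
    # and pushes them away from other classes' prototypes (separation).
    return F.cross_entropy(class_logits, labels)
```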

Candidate samples are scored by:

  • Outlier risk ($\omega_r$): prototype-Mahalanobis distance from all class clusters;
  • Familiarity ($\omega_e$): relative angular proximity to own-class vs. other-class prototypes;
  • Balance ($\omega_b$): cluster membership proportion.

Sampling weights are computed by composing these scores (after sigmoid) and used to stochastically select the next epoch's training basket, ensuring representative, balanced, and robust subsets. The process iteratively shapes embedding space and trains both GNN weights and prototypes (Zhang et al., 17 Oct 2024).
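
A hedged sketch of this scoring and sampling step follows, with deliberate simplifications: cosine distance to the nearest prototype stands in for the Mahalanobis-style outlier score, and the three scores are composed multiplicatively after a sigmoid. The helper names sampling_weights and sample_basket are hypothetical.

```python
import torch
import torch.nn.functional as F


def sampling_weights(z: torch.Tensor, prototypes: torch.Tensor,
                     labels: torch.Tensor) -> torch.Tensor:
    """z: (N, D) embeddings, prototypes: (C, K, D), labels: (N,). Returns (N,) weights."""
    z = F.normalize(z, dim=-1)
    p = F.normalize(prototypes, dim=-1)
    sim = torch.einsum("nd,ckd->nck", z, p)                # (N, C, K)
    best_per_class, best_k = sim.max(dim=-1)               # (N, C) each

    own = best_per_class.gather(1, labels[:, None]).squeeze(1)                    # own-class proximity
    other = best_per_class.scatter(1, labels[:, None], float("-inf")).max(1).values  # best other class

    omega_r = 1.0 - best_per_class.max(dim=1).values       # outlier risk: far from every cluster
    omega_e = own - other                                  # familiarity: own vs. other classes
    # Balance: favor samples assigned to small (underrepresented) clusters.
    assign = labels * prototypes.size(1) + best_k.gather(1, labels[:, None]).squeeze(1)
    counts = torch.bincount(assign, minlength=prototypes.size(0) * prototypes.size(1)).float()
    omega_b = 1.0 - counts[assign] / counts.sum()

    # Compose the scores after squashing each through a sigmoid, as described above.
    return torch.sigmoid(-omega_r) * torch.sigmoid(omega_e) * torch.sigmoid(omega_b)


def sample_basket(weights: torch.Tensor, keep_ratio: float = 0.5) -> torch.Tensor:
    """Stochastically select the next epoch's training basket at a fixed pruning ratio."""
    k = max(1, int(keep_ratio * weights.numel()))
    return torch.multinomial(weights, k, replacement=False)
```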

4. Mathematical Formalism and Algorithmic Steps

PPP’s mask generation operates as follows:

  1. For each user $p$ and layer $l$, compute the mean prototype embedding $\bar z^{p}_l$ from $K$ enrollment samples.
  2. Threshold $\bar z^{p}_l$ to derive the binary mask $m^{p}_l$.
  3. Construct the pruned model $M^p$ by retaining only filters where $m^{p}_{l,j}=1$.
  4. Deploy $M^p$ for static inference.
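
Step 3 can be illustrated by slicing a convolution's filters with the binary mask. The sketch below handles only an isolated nn.Conv2d (bias included; batch-norm, dilation/groups, and wiring to the downstream layer's input channels omitted) and is an assumption-laden illustration, not the authors' pruning code.

```python
from typing import Optional

import torch
import torch.nn as nn


def prune_conv(conv: nn.Conv2d, out_mask: torch.Tensor,
               in_mask: Optional[torch.Tensor] = None) -> nn.Conv2d:
    """Return a smaller Conv2d keeping only the filters where the mask is 1."""
    out_idx = out_mask.nonzero(as_tuple=True)[0]
    in_idx = (in_mask.nonzero(as_tuple=True)[0]
              if in_mask is not None else torch.arange(conv.in_channels))
    new = nn.Conv2d(len(in_idx), len(out_idx), conv.kernel_size,
                    stride=conv.stride, padding=conv.padding,
                    bias=conv.bias is not None)
    # Copy the surviving filters (rows = output channels, cols = input channels).
    new.weight.data = conv.weight.data[out_idx][:, in_idx].clone()
    if conv.bias is not None:
        new.bias.data = conv.bias.data[out_idx].clone()
    return new


# Usage: keep roughly 37.6% of 64 channels, echoing the utilization reported for ResNet-56.
conv = nn.Conv2d(64, 64, 3, padding=1)
mask = (torch.rand(64) < 0.376).float()
print(prune_conv(conv, mask))
```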

GDeR’s algorithm, per epoch:

  1. For each sample $i$ in $\mathcal{X}_t$, compute the hyperspherical embedding $\mathbf{z}_i$.
  2. Calculate prototype-based outlier, familiarity, and balance scores.
  3. Compose the sampling weights $\omega(\mathbf{z}_i)$.
  4. Sample the new basket $\mathcal{X}_{t+1}$ while maintaining the pruning ratio, using stochastic weighted selection.
  5. Minimize the sum of task loss plus prototype compactness/separation losses, updating all parameters.
  6. Repeat over epochs.
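
Putting these steps together, one epoch might look like the sketch below, which reuses the prototype_loss, sampling_weights, and sample_basket helpers sketched earlier. The encoder/head modules, the plain cross-entropy task loss, and the full-pool rescoring for soft pruning are assumptions, not the GDeR reference implementation.

```python
import torch
import torch.nn.functional as F


def train_epoch(encoder, head, prototypes, basket, data, labels, optimizer,
                keep_ratio=0.5, lam=0.1):
    # Steps 1 and 5: embed the current basket and minimize task + prototype losses.
    # `prototypes` is assumed to be an nn.Parameter registered with the optimizer.
    z = F.normalize(encoder(data[basket]), dim=-1)
    loss = F.cross_entropy(head(z), labels[basket]) \
        + lam * prototype_loss(z, prototypes, labels[basket])
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Steps 2-4: re-score the whole pool (soft pruning lets pruned samples re-enter)
    # and stochastically draw the next basket at the fixed pruning ratio.
    with torch.no_grad():
        z_all = F.normalize(encoder(data), dim=-1)
        weights = sampling_weights(z_all, prototypes, labels)
    next_basket = sample_basket(weights, keep_ratio)
    return next_basket, loss.item()
```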

5. Empirical Results and Practical Outcomes

PPP, when evaluated on CIFAR-10/100 and keyword-spotting tasks, achieves a 2–3$\times$ model size reduction with negligible accuracy loss, or even improved accuracy, relative to the full model: e.g., for CIFAR-10 on ResNet-56, accuracy of 94.4% with only 37.6% channel utilization (a 2.7$\times$ reduction), outperforming dynamic gating and vanilla models. An ablation ("PPP NoReg") reveals that the prototype regularization is crucial; omitting it degrades performance catastrophically (e.g., down to $\sim$54% on keyword spotting). The storage and computation savings, coupled with the retraining-free procedure, make PPP particularly suited for resource-limited on-device deployment (Kim et al., 2021).

GDeR, applied to graph datasets and GNN backbone architectures, enables up to 50% dataset pruning with no loss in performance, and sometimes a gain (e.g., +1.5% ROC-AUC on OGB-MolHIV when retaining 30% of pre-training data), and attains up to a 2.81$\times$ training speedup. Under severe label imbalance or synthetic feature noise, GDeR outperforms state-of-the-art pruning and defense methods, boosting F1-macro by up to +4.3% and accuracy by up to +7.8% at 30–50% pruning ratios (Zhang et al., 17 Oct 2024).

6. Robustness, Balancing, and Hyperparameterization

DPP explicitly addresses concerns of data and model bias through its balancing mechanisms. In GDeR, the balance score $\omega_b$ favors underrepresented clusters, while outlier scoring $\omega_r$ down-weights samples likely to be noisy or adversarial. Hyperparameters relevant to DPP include the number of prototypes per class, the concentration and temperature parameters of the vMF mixture, loss weighting coefficients, pruning ratios, and sample-scheduler parameters. Adjusting these allows for trade-offs between pruning aggressiveness, task performance, and prototype quality (Zhang et al., 17 Oct 2024).

PPP’s hyperparameterization is focused on the threshold $\tau$, the utilization target $\alpha$, and the regularization weights $\lambda_1, \lambda_2$, governing the tightness of user prototype clustering and the sparsity of pruned models (Kim et al., 2021).
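
For concreteness, an illustrative configuration grouping the hyperparameters named in this section might look as follows; every name and value is a placeholder assumption, not a reported setting from either paper.

```python
# Hypothetical hyperparameter groupings for the two DPP instantiations.
PPP_CONFIG = {
    "mask_threshold_tau": 0.5,        # threshold applied to \bar z^p_l
    "target_utilization_alpha": 0.4,  # desired fraction of active channels
    "lambda_proto": 1.0,              # weight of the prototype regularizer
    "lambda_util": 1.0,               # weight of the utilization regularizer
}

GDER_CONFIG = {
    "prototypes_per_class": 4,        # K
    "vmf_concentration": 10.0,        # kappa / temperature of the vMF mixture
    "proto_loss_weight": 0.1,         # weight of compactness/separation terms
    "keep_ratio": 0.5,                # 1 - pruning ratio per epoch
}
```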

7. Applicability and Cross-Domain Generalization

The DPP paradigm generalizes across tasks and domains. PPP demonstrates efficacy both in vision (CIFAR-10/100) and audio (keyword spotting), indicating its applicability to any scenario where model personalization and on-device efficiency are required. GDeR shows DPP’s viability in GNN contexts, including imbalanced and noisy graph classification tasks. In both cases, the dynamic prototype-based formulation achieves efficient, robust, and balanced pruning without sacrificing accuracy, suggesting broad utility wherever compact, adaptive representations are desired for deployment or training efficiency (Kim et al., 2021, Zhang et al., 17 Oct 2024).
