PWMPR: Sparse-to-Dense Growth in Neural Nets

Updated 12 November 2025

PWMPR is a sparse-to-dense training paradigm that grows subnetworks from a sparse seed using NTK-inspired, path-based edge scoring.
It employs an L1-norm surrogate score and randomized sampling to add promising edges while avoiding bottlenecks and preserving connectivity.
The method achieves competitive vision benchmark accuracies at lower cumulative training costs compared to traditional pruning approaches.

Path Weight Magnitude Product-biased Random growth (PWMPR) is a constructive sparse-to-dense training paradigm for neural networks that systematically grows subnetworks from an initial sparse seed using an NTK-inspired, path-based edge selection criterion. PWMPR is designed to automatically discover optimal operating densities for sparse neural networks, in contrast to prevailing pruning-based approaches that assume a fixed target density and often incur higher computational costs. By leveraging a path weight magnitude product score and randomized sampling, PWMPR achieves efficient density discovery and competitive accuracy on standard vision benchmarks at a lower cumulative training cost.

1. Mathematical Formulation and Edge Selection Criterion

PWMPR operates on a feed-forward neural network represented by scalar weights $\theta_{ij}$ assigned to edges $(i, j)$. A path $p$ is defined as a sequence of edges connecting an input node $s$ to an output node $k$. The path’s weight-product is expressed as

$\pi_p(\theta) = \prod_{(u, v) \in p} \theta_{uv}.$

Gebhart et al. demonstrated that the Neural Tangent Kernel (NTK) of such a network can be decomposed into a “path kernel,” with its trace given by

$\Tr(\Pi_\theta) = \sum_p \sum_{(i, j)\in p} \left(\frac{\pi_p(\theta)}{\theta_{ij}}\right)^2.$

The incremental contribution to the trace by adding a new (zero-initialized) edge $(i, j)$ is

$\Delta\Tr(\Pi_\theta)_{(i, j)} = \sum_{p \ni (i, j)} \left(\prod_{(u, v) \in p \setminus \{(i, j)\}} \theta_{uv}\right)^2,$

but exact computation is intractable.
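To make the quantity concrete, the following is a minimal brute-force sketch of $\Delta\Tr$ for one missing edge. The 2-3-2 layered network `W`, the edge indexing `(layer, i, j)`, and the helper names are hypothetical and chosen only for illustration. The code enumerates every input-to-output path explicitly, which is exactly what becomes infeasible at scale, since the number of paths is the product of the layer widths.

```python
from itertools import product

# Hypothetical 2-3-2 layered network. W[l][i][j] is the weight on the edge from
# unit i of layer l to unit j of layer l+1; a zero entry stands for "no edge".
W = [
    [[0.5, -1.0, 0.0],
     [0.0, 2.0, 1.0]],
    [[1.0, 0.0],
     [-0.5, 1.5],
     [0.0, 2.0]],
]

def paths(W):
    """Yield every input-to-output path as a list of (layer, i, j) edge slots."""
    sizes = [len(W[0])] + [len(layer[0]) for layer in W]
    for nodes in product(*(range(s) for s in sizes)):
        yield [(l, nodes[l], nodes[l + 1]) for l in range(len(W))]

def delta_trace(W, edge):
    """Sum, over paths through `edge`, of the squared product of the other weights."""
    total = 0.0
    for p in paths(W):
        if edge not in p:
            continue
        prod = 1.0
        for (l, i, j) in p:
            if (l, i, j) != edge:
                prod *= W[l][i][j]
        total += prod ** 2
    return total

# Missing edge from input 0 to hidden unit 2: only the path continuing to
# output 1 (weight 2.0) contributes, giving 2.0 ** 2.
print(delta_trace(W, (0, 0, 2)))  # 4.0
```

Even in this toy case the loop visits all 12 path slots to score a single edge, which motivates the cheaper surrogate introduced next.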

PWMPR therefore introduces an $L_1$-norm surrogate, the Path Weight Magnitude Product (PWMP) score:

$S(i, j) = \sum_{p \ni (i, j)} \prod_{(u, v) \in p \setminus \{(i, j)\}} |\theta_{uv}|.$

Operationally, two node-level scores are computed using the absolute-valued weights $\tilde\theta_{uv} = |\theta_{uv}|$ and an all-ones input: the “complexity” $\phi(v)$ in a single forward pass and the “generality” $\psi(v)$ in a single backward pass. Because every path through $(i, j)$ consists of a path into $i$ followed by a path out of $j$, the score factorizes as $S(i, j) = \phi(i)\psi(j)$. Candidate edges not in the existing mask $E$ are assigned sampling probabilities

$P(i, j) = \frac{S(i, j)}{\sum_{(u,v)\notin E} S(u,v)},$

and grown by random sampling proportional to $P(i, j)$, balancing the focus on promising paths with architectural diversity.
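The two-pass computation of the score and the resulting sampling probabilities can be sketched as follows. This is an illustrative sketch rather than the authors' implementation: it assumes a bias-free layered network stored as nested lists, treats a zero weight as a missing edge (a real implementation would keep an explicit mask, since grown edges are also zero-initialized), and uses the hypothetical names `pwmp_scores` and `growth_probabilities`.

```python
# Hypothetical 2-3-2 layered network; W[l][i][j] == 0.0 marks a missing edge.
W = [
    [[0.5, -1.0, 0.0],
     [0.0, 2.0, 1.0]],
    [[1.0, 0.0],
     [-0.5, 1.5],
     [0.0, 2.0]],
]

def pwmp_scores(W):
    """Return S[l][i][j] = phi(i) * psi(j) for every edge slot."""
    A = [[[abs(w) for w in row] for row in layer] for layer in W]
    # Forward pass with an all-ones input: phi[l][v] sums the magnitude
    # products of all paths from the inputs to unit v of layer l.
    phi = [[1.0] * len(A[0])]
    for layer in A:
        prev = phi[-1]
        phi.append([sum(prev[i] * layer[i][j] for i in range(len(layer)))
                    for j in range(len(layer[0]))])
    # Backward pass with an all-ones output signal: psi[l][v] sums the
    # magnitude products of all paths from unit v of layer l to the outputs.
    psi = [[1.0] * len(A[-1][0])]
    for layer in reversed(A):
        nxt = psi[0]
        psi.insert(0, [sum(layer[i][j] * nxt[j] for j in range(len(nxt)))
                       for i in range(len(layer))])
    return [[[phi[l][i] * psi[l + 1][j] for j in range(len(A[l][0]))]
             for i in range(len(A[l]))] for l in range(len(A))]

def growth_probabilities(W):
    """Normalize the scores of the missing edges into sampling probabilities."""
    S = pwmp_scores(W)
    cand = {(l, i, j): S[l][i][j]
            for l, layer in enumerate(W)
            for i, row in enumerate(layer)
            for j, w in enumerate(row) if w == 0.0}
    total = sum(cand.values())
    return {e: s / total for e, s in cand.items()}

P = growth_probabilities(W)
for edge, prob in sorted(P.items()):
    print(edge, round(prob, 3))
```

In this toy network the four missing edges receive scores 2, 1, 0.5 and 1, so the edge from input 0 to hidden unit 2 is the most likely to be grown but is not guaranteed to be.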

2. Constructive Sparse-to-Dense Growth: Operating Principles

The PWMPR workflow follows a strictly constructive paradigm:

A. Initialization: Begin from a sparse seed mask produced by PHEW (Paths with Higher Edge-Weights, a pruning-at-initialization method), with initial density $\rho_{\rm init}$ chosen to ensure connectivity (i.e., avoiding isolated nodes).

B. Iterative Growth Schedule: At each iteration $k$, starting from mask $m_k$ with density $\rho(m_k)$, train for a rough phase of length $\omega = 10\%$ of a full training budget, then add a fraction $\gamma = 25\%$ of the currently active edges, initializing new weights to zero. Density therefore grows geometrically, $\rho(m_{k+1}) \approx (1 + \gamma)\,\rho(m_k)$ (an exponential density schedule).
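As a worked example of the growth schedule, the sketch below counts how many growth iterations are needed to move from a seed density to a target density when each iteration adds a fraction $\gamma$ of the active edges. The seed density of 5% and the function name `steps_to_reach` are hypothetical; the source does not state a specific $\rho_{\rm init}$.

```python
def steps_to_reach(rho_init, rho_target, gamma=0.25):
    """Smallest k with rho_init * (1 + gamma)**k >= rho_target."""
    rho, k = rho_init, 0
    while rho < rho_target:
        rho *= 1.0 + gamma   # add gamma times the currently active edges
        k += 1
    return k

# Hypothetical seed density of 5%, target density of 40%.
k = steps_to_reach(0.05, 0.40)
print(k)         # 10 growth iterations
print(k * 0.10)  # rough-phase training, in units of the full budget (omega = 10%)
```

Under these assumed numbers, reaching 40% density takes ten rough phases, which together amount to one full training budget's worth of steps, all run on sparse networks.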

C. Stopping Criterion: Maintain tuples $(\rho_k, P(G_k))$ tracking density and validation accuracy. Accuracies are regressed on density with the saturating curve below (the “logistic fit” referred to throughout)

$\hat P(\rho) = \hat P_0 + A\left(1 - e^{-\beta \rho}\right),$

and growth is halted at the smallest $\rho_k$ for which $\hat P(\rho_k) \ge \hat P_0 + 0.95 A$. This criterion signals the effective saturation of accuracy with respect to network density.
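For $A > 0$, the stopping condition can be solved in closed form: $\hat P(\rho) \ge \hat P_0 + 0.95 A$ holds exactly when $1 - e^{-\beta\rho} \ge 0.95$, i.e. when $\rho \ge \ln(20)/\beta \approx 3.0/\beta$. The sketch below fits the curve and applies that test. It is only one possible way to perform the regression: the grid search over $\beta$, the grid range, and the function names are assumptions made for illustration, and the source does not specify the fitting procedure.

```python
import math

def fit_saturation_curve(rhos, accs, betas=None):
    """Least-squares fit of acc ~ P0 + A * (1 - exp(-beta * rho)).

    For a fixed beta the model is linear in (P0, A), so those two parameters
    are solved in closed form; beta itself is chosen by a plain grid search.
    Needs at least two distinct densities.
    """
    if betas is None:
        betas = [k / 10 for k in range(1, 1001)]  # hypothetical grid: 0.1 .. 100
    n = len(rhos)
    best = None
    for beta in betas:
        x = [1.0 - math.exp(-beta * r) for r in rhos]
        mx, my = sum(x) / n, sum(accs) / n
        sxx = sum((xi - mx) ** 2 for xi in x)
        if sxx == 0.0:
            continue
        A = sum((xi - mx) * (yi - my) for xi, yi in zip(x, accs)) / sxx
        P0 = my - A * mx
        sse = sum((P0 + A * xi - yi) ** 2 for xi, yi in zip(x, accs))
        if best is None or sse < best[0]:
            best = (sse, P0, A, beta)
    return best[1:]

def should_stop(rhos, accs, frac=0.95):
    """True once the latest density reaches `frac` of the fitted accuracy gain."""
    P0, A, beta = fit_saturation_curve(rhos, accs)
    # Closed form of P_hat(rho) >= P0 + frac * A, valid for A > 0.
    return A > 0 and rhos[-1] >= -math.log(1.0 - frac) / beta
```

Calling `should_stop` after each growth iteration, with all densities and validation accuracies recorded so far, halts growth at the first density that crosses the threshold.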

At termination, a final “extensive” training phase is conducted either from scratch or continuing from the stopped mask.

3. Theoretical Motivation: NTK and Bottleneck Avoidance

PWMPR is theoretically motivated by the functional form of the NTK in overparameterized networks. The trace $\Tr(\Pi_\theta)$ is the sum of squared path derivatives over all paths and edges, and it equals the sum of the path kernel’s eigenvalues. Adding edges with high $\Delta\Tr$ therefore raises the total eigenvalue mass of the kernel, which in NTK analyses is associated with faster convergence during training.

Since explicit computation of $\Delta\Tr$ is cubic in path count, PWMPR leverages the tractable $L_1$-norm surrogate. Crucially, naïvely optimizing $S(i, j)$ (e.g., greedily adding top-scoring edges) leads to bottleneck structures concentrating edges on few nodes, which can degrade generalization. Random sampling proportional to $S(i, j)$ (rather than deterministic selection) empirically mitigates bottlenecks, maintaining higher average weighted $\tau$-core connectivity during growth and improving path coverage and network robustness.
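The mechanism behind the bottleneck effect can be illustrated with a small sketch. The node scores below are hypothetical and only meant to show why a product-form score favors a single dominant node under greedy selection; this is not a reproduction of the paper's experiments.

```python
# Hypothetical node scores in which one source node (index 0) dominates.
phi = [9.0, 2.0, 1.0, 1.0]   # "complexity" of candidate source nodes
psi = [4.0, 3.0, 2.0, 1.0]   # "generality" of candidate target nodes
S = {(i, j): phi[i] * psi[j] for i in range(4) for j in range(4)}

def greedy_top(S, m):
    """Deterministic selection: the m highest-scoring candidate edges."""
    return sorted(S, key=S.get, reverse=True)[:m]

def source_mass(S, node):
    """Total sampling probability of the candidate edges leaving `node`."""
    return sum(s for (i, _), s in S.items() if i == node) / sum(S.values())

top4 = greedy_top(S, 4)
print({i for i, _ in top4})          # {0}: every greedy pick leaves node 0
print(round(source_mass(S, 0), 3))   # 0.692
```

Greedy selection places all four new edges on node 0, whereas score-proportional sampling leaves roughly 31% of the probability on edges from other nodes at each draw, so growth is spread more widely.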

4. Algorithmic Description

The PWMPR algorithm proceeds as follows:

Initialize with sparse mask $(V, E_0, \theta)$, ensuring that $\rho_0$ avoids isolated nodes.
Repeat: For $k = 0, 1, 2, \dots$
- Train for $t_{\rm per\ iter} = \omega T$ steps, where $T$ is the full training budget.
- Compute $\phi$ and $\psi$ via two sparse passes (one forward, one backward).
- For all missing edges, assign $S(i, j) = \phi(i)\psi(j)$.
- Randomly sample $M = \lfloor \gamma \rho_k n \rfloor$ new edges (without replacement) according to $\Pr\{(i, j)\} \propto S(i, j)$, where $n$ denotes the edge count of the dense network, so that $\rho_k n$ is the number of currently active edges.
- Add the sampled edges to $E_k$ and initialize $\theta_{ij}=0$.
- Record $(\rho_{k+1}, P(G_{k+1}))$, refit the logistic model, and check whether the stopping threshold is met.
Final Phase: Retrain comprehensively on the stopped mask.
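The edge-addition step of the loop above can be sketched as follows. The data layout (a set of active edges, a dictionary of weights, a dictionary of candidate scores) and the function names are hypothetical. Sampling without replacement is implemented here as successive draws proportional to the remaining scores, which is one common interpretation; the source does not say which scheme is used.

```python
import random

def sample_without_replacement(scores, m, rng):
    """Draw up to m distinct edges, each draw proportional to the remaining scores."""
    pool = {e: s for e, s in scores.items() if s > 0.0}
    chosen = []
    while pool and len(chosen) < m:
        edges = list(pool)
        pick = rng.choices(edges, weights=[pool[e] for e in edges], k=1)[0]
        chosen.append(pick)
        del pool[pick]   # an edge can be grown only once
    return chosen

def grow_step(mask, weights, scores, gamma, rng):
    """Add floor(gamma * |mask|) new zero-initialized edges to the mask."""
    m = int(gamma * len(mask))
    candidates = {e: s for e, s in scores.items() if e not in mask}
    new_edges = sample_without_replacement(candidates, m, rng)
    for e in new_edges:
        mask.add(e)
        weights[e] = 0.0   # new weights start at zero
    return new_edges

# Usage with hypothetical data: 8 active edges, gamma = 25% -> 2 new edges.
rng = random.Random(0)
mask = {(0, j) for j in range(8)}
weights = {e: 1.0 for e in mask}
scores = {(1, j): float(j + 1) for j in range(6)}
print(grow_step(mask, weights, scores, 0.25, rng))
```

Passing an explicit `random.Random` instance keeps the growth step reproducible across runs with the same seed.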

Key hyperparameters are growth ratio $\gamma = 25\%$ , rough-phase fraction $\omega = 10\%$ , and a 95% logistic-fit threshold.

5. Computational Cost and Empirical Performance

Cumulative training cost is measured as total sparse-FLOPs across all growth iterations, normalized to the cost of full dense training. Empirical evaluation demonstrates:

Dataset/Model	PWMPR Cost (× dense)	IMP-C Cost (× dense)	Density at Stop	Accuracy
CIFAR-10/ResNet-32	≈1.5	≈3.2	≈40%	≈93.5%
CIFAR-100/ResNet-56	≈1.5	≈3.5	≈30%	≈70.2%
TinyImageNet/ResNet-18	≈2.0	≈4.5	≈40%	≈66.0%
TinyImageNet/ViT	≈1.8	≈3.2	≈50%	≈62.1%

On ImageNet/ResNet-50, PWMPR achieves 71.0 ± 0.04% at 10% density and 73.2 ± 0.13% at 20% density, trailing RigL and SparseMomentum by 1–2 percentage points. On the benchmarks in the table above, the method locates “lottery ticket” subnetworks at roughly half the cumulative cost of standard iterative magnitude pruning (the IMP-C column), with PWMPR-to-IMP-C cost ratios between about 0.43 and 0.56.
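The relative cost of the two methods can be checked directly from the approximate values in the table. The short sketch below only restates those table entries and divides them; the variable names are illustrative.

```python
# (PWMPR cost, IMP-C cost), both in multiples of dense training, from the table.
costs = {
    "CIFAR-10/ResNet-32": (1.5, 3.2),
    "CIFAR-100/ResNet-56": (1.5, 3.5),
    "TinyImageNet/ResNet-18": (2.0, 4.5),
    "TinyImageNet/ViT": (1.8, 3.2),
}

# Ratio below 1 means PWMPR is cheaper; 0.5 means half the cost of IMP-C.
ratios = {name: pwmpr / imp for name, (pwmpr, imp) in costs.items()}
for name, r in ratios.items():
    print(f"{name}: {r:.2f}")
```

The ratios come out at about 0.47, 0.43, 0.44 and 0.56, so three of the four settings fall below half the IMP-C cost and the ViT setting sits slightly above it.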

PWMPR’s randomized growth outperforms deterministic Path Weight Magnitude Product (PWMP), as well as simple random growth (RG) and global greedy (GG) heuristic baselines across all evaluated settings.

6. Advantages, Limitations, and Scope for Extension

Advantages

Density Discovery: PWMPR eliminates the need for a pre-specified target density, autonomously identifying the lowest sufficient density via its logistic-fit stopping rule.
Efficiency: Cumulative compute is roughly 1.5–2× dense training, compared to roughly 3.2–4.5× for IMP-C on the same benchmarks.
Generalization: Stochastic sampling mitigates bottleneck formation, preserving connectivity and broad path coverage.
Implementation Simplicity: Requires two sparse passes per growth phase for path-based scoring.

Limitations and Open Directions

Immutability of Existing Edges: Inability to prune already-added edges drives final densities higher than those produced by optimal pruning (IMP-C).
Attention Mechanisms: The path-kernel surrogate underlying PWMPR does not apply to query/key projections in attention, as their magnitudes are decoupled by the softmax; attention-specific extension is an open problem.
Stopping Heuristic: The logistic-fit rule is a simple heuristic; more principled stopping criteria could further optimize density discovery.
Domain Coverage: Experiments are limited to vision tasks; extension to language modeling and sequential domains is unexplored.
Potential Hybridization: Hybrid grow-prune regimes may combine constructive and destructive sparsification for even better “winning ticket” discovery at sparser densities.

7. Significance and Paradigm Shift

PWMPR reframes the network sparsification problem from destructive pruning to constructive growth, grounded in NTK-theoretic justification and path-based combinatorics. Its balance of principled score-driven edge addition, randomized exploration, and automatic density selection establishes growth-based density discovery as a complementary and competitive alternative to iterative pruning and dynamic sparsification. PWMPR’s observed cost advantage across standard visual benchmarks supports its adoption in scenarios where training efficiency and subnet structure discovery are both critical, signaling a renewed focus on constructive paradigms in sparse neural network research (Yao et al., 30 Sep 2025).

References (1)

Growing Winning Subnetworks, Not Pruning Them: A Paradigm for Density Discovery in Sparse Neural Networks (2025)
