Synaptic Flow: Data-Agnostic Pruning
- Synaptic Flow is a data-agnostic pruning algorithm that leverages conservation laws to maintain synaptic strength across layers.
- It computes per-parameter saliency scores using gradient and weight products, then iteratively prunes the lowest scoring parameters based on an exponential schedule.
- Empirical results demonstrate that SynFlow achieves extreme sparsity without catastrophic layer collapse, ensuring network trainability across diverse architectures.
Synaptic Flow (SynFlow) is a theoretically grounded, data-agnostic, iterative network pruning algorithm that preserves the total flow of synaptic strengths in neural networks at initialization. SynFlow addresses a central challenge in data-free pruning: how to identify highly sparse, trainable subnetworks at initialization while rigorously avoiding the phenomenon of layer-collapse—whereby an entire layer is pruned, rendering the network untrainable. The algorithm leverages conservation laws for synaptic saliency scores derived from homogeneous network activations and operates by iteratively ranking and removing parameters based on their Synaptic Flow (S_SF) scores. SynFlow achieves state-of-the-art performance among initialization pruning algorithms across diverse architectures and datasets, sustaining accuracy at extreme sparsities and providing formal guarantees against layer-collapse (Tanaka et al., 2020).
1. Conservation Laws for Synaptic Saliency
The cornerstone of SynFlow is its reliance on score functions of the form $S(\theta) = \frac{\partial R}{\partial \theta} \odot \theta$ for a scalar function $R$ defined on the network's outputs. When the activation function $\phi$ is homogeneous (i.e., $\phi(x) = \phi'(x)\,x$, as with ReLU), the following exact conservation laws apply:
- Neuron-wise Conservation (Theorem 1):
For each hidden neuron, the sum of the synaptic saliency scores over its incoming weights equals the sum over its outgoing weights. Formally, denoting incoming weights as $w^{\text{in}}$ and outgoing weights as $w^{\text{out}}$:
$\sum_i S(w^{\text{in}}_i) = \sum_j S(w^{\text{out}}_j)$, with $S(w) = \frac{\partial R}{\partial w}\, w$.
- Network-wise (Cut) Conservation (Theorem 2):
For any cut $C$ of parameters separating the network's inputs from its outputs,
$\sum_{w \in C} S(w) = \left\langle \frac{\partial R}{\partial x},\, x \right\rangle$, i.e., the total saliency crossing every such cut is the same and equals the saliency of the input.
Because each layer of the network constitutes such a cut, the total synaptic saliency score assigned to each layer is identical across layers. The implication is that single-shot, global-threshold pruning strategies, such as those used in SNIP and GraSP, remove a disproportionate number of parameters from wide layers (whose per-parameter scores are smaller on average), increasing the risk of total layer collapse.
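The layer-wise conservation law can be checked numerically. The sketch below assumes a bias-free fully connected network and uses the closed-form gradient of the path objective $R = \mathbf{1}^T \prod_l |W^{[l]}| \mathbf{1}$; the layer widths are arbitrary illustrative choices, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
# Bias-free MLP: deliberately varied widths, so per-parameter scores differ by layer.
dims = [8, 32, 4, 16, 2]
Ws = [rng.standard_normal((dims[i + 1], dims[i])) for i in range(len(dims) - 1)]

def layer_scores(Ws):
    """Closed-form SynFlow scores: S^[l]_{ij} = left_i * |w_ij| * right_j,
    where left/right are the all-ones path products above and below layer l."""
    A = [np.abs(W) for W in Ws]
    L = len(A)
    rights = [np.ones(dims[0])]           # prod_{k<l} |W^[k]| @ 1
    for l in range(L - 1):
        rights.append(A[l] @ rights[-1])
    lefts = [np.ones(dims[-1])]           # 1^T @ prod_{k>l} |W^[k]|
    for l in range(L - 1, 0, -1):
        lefts.append(lefts[-1] @ A[l])
    lefts = lefts[::-1]
    return [np.outer(lefts[l], rights[l]) * A[l] for l in range(L)]

totals = [S.sum() for S in layer_scores(Ws)]
# Cut conservation: every layer is a cut, so all layer totals coincide (and equal R).
print(np.allclose(totals, totals[0]))  # True
```

Note that the wide 32-unit layer holds the same total score as the narrow 2-unit layer, so its individual parameters score lower on average, which is exactly why a single global threshold tends to gut wide layers first.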
2. The Iterative Synaptic Flow Pruning Algorithm
SynFlow is instantiated by specifying $R$ as a "path" objective, denoted $R_{SF} = \mathbf{1}^T \left( \prod_{l=1}^{L} \left| \theta^{[l]} \right| \right) \mathbf{1}$, which sums the products of the absolute values of weights along every input–output path. The per-parameter Synaptic Flow score is $S_{SF}(\theta) = \frac{\partial R_{SF}}{\partial \theta} \odot \theta$.
These scores are non-negative and satisfy both neuron-wise and cut-wise conservation. Pruning proceeds iteratively according to the following process:
Input: untrained network f(x; θ₀), compression ratio ρ, number of iterations n
0. Initialize mask μ ← all-ones (retain all parameters)
For k = 1…n:
1. θ ← μ ∘ θ₀ # Apply current mask
2. R ← 1^T (∏_{l=1}^L |θ^{[l]}|) 1 # SynFlow objective (path-sum)
3. S ← (∂R/∂θ) ∘ θ # Synaptic Flow score
4. τ ← percentile at (1 − ρ^{-k/n}) of S
5. μ ← (S > τ) # Prune the lowest scoring parameters
Return f(x; μ ∘ θ₀)
The key algorithmic feature is the exponential pruning schedule $\rho^{-k/n}$: because scores are re-evaluated after every pruning step and only a small fraction of parameters is removed per iteration, no single step removes more total score than exists in any one layer, thereby guaranteeing that no layer collapses prematurely.
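The loop above can be sketched end to end in NumPy for a bias-free fully connected network, where the gradient of the path objective has a closed form. This is an illustrative sketch under that assumption, not the authors' code; the paper's implementation instead applies automatic differentiation to the network evaluated on an all-ones input:

```python
import numpy as np

def synflow_prune(Ws, rho, n_iters=100):
    """Iterative SynFlow pruning for a bias-free fully connected network.

    At step k the fraction rho**(-k/n_iters) of parameters is retained,
    following the exponential schedule in the pseudocode above.
    """
    masks = [np.ones_like(W) for W in Ws]
    for k in range(1, n_iters + 1):
        A = [np.abs(W) * m for W, m in zip(Ws, masks)]  # masked |weights|
        # "Forward pass": path products below each layer (input side) ...
        rights = [np.ones(A[0].shape[1])]
        for a in A[:-1]:
            rights.append(a @ rights[-1])
        # ... and above each layer (output side).
        lefts = [np.ones(A[-1].shape[0])]
        for a in A[:0:-1]:
            lefts.append(lefts[-1] @ a)
        lefts = lefts[::-1]
        # S = (dR/dtheta) * theta; sign factors cancel, leaving |.| products.
        scores = [np.outer(l, r) * a for l, r, a in zip(lefts, rights, A)]
        flat = np.concatenate([s.ravel() for s in scores])
        tau = np.quantile(flat, 1.0 - rho ** (-k / n_iters))  # global threshold
        masks = [(s > tau).astype(float) for s in scores]     # drop lowest scores
    return masks

# Example: prune a random 3-layer network to ~1% density.
rng = np.random.default_rng(0)
dims = [10, 20, 20, 10]
Ws = [rng.standard_normal((dims[i + 1], dims[i])) for i in range(len(dims) - 1)]
masks = synflow_prune(Ws, rho=100, n_iters=50)
print([int(m.sum()) for m in masks])  # every layer retains at least one parameter
```

Already-pruned parameters have score zero, so they stay below the (positive) global threshold and remain pruned on subsequent iterations.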
3. Mathematical Formulation of the Synaptic Flow Score
In a fully connected network with layers $l = 1, \dots, L$, the score for weight $w^{[l]}_{ij}$ is $S_{SF}\left(w^{[l]}_{ij}\right) = \left[\mathbf{1}^T \prod_{k=l+1}^{L} \left|\theta^{[k]}\right|\right]_i \left|w^{[l]}_{ij}\right| \left[\prod_{k=1}^{l-1} \left|\theta^{[k]}\right| \mathbf{1}\right]_j$. This expresses the parameter's centrality to the propagation of absolute weight "flow" through all possible input–output paths that traverse $w^{[l]}_{ij}$. All scores are guaranteed to be non-negative, strictly positive whenever every factor above is nonzero (in particular $w^{[l]}_{ij} \neq 0$), and conserved across layers.
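As a sanity check, the closed-form score for a single weight can be compared against a numerical derivative of the path objective; the small network below is an arbitrary illustrative choice:

```python
import numpy as np

rng = np.random.default_rng(1)
dims = [3, 5, 4, 2]
Ws = [rng.standard_normal((dims[i + 1], dims[i])) for i in range(len(dims) - 1)]

def R(Ws):
    """Path objective: 1^T (prod of |W|) 1, accumulated as a vector pass."""
    v = np.ones(Ws[0].shape[1])
    for W in Ws:
        v = np.abs(W) @ v
    return v.sum()

# Closed-form score for one weight in the middle layer (index l=1).
l, i, j = 1, 2, 3
left = np.ones(dims[-1]) @ np.abs(Ws[2])   # all-ones product above layer l
right = np.abs(Ws[0]) @ np.ones(dims[0])   # all-ones product below layer l
score = left[i] * abs(Ws[l][i, j]) * right[j]

# Numerical check: S = theta * dR/dtheta via central differences.
eps = 1e-6
Wp = [W.copy() for W in Ws]; Wp[l][i, j] += eps
Wm = [W.copy() for W in Ws]; Wm[l][i, j] -= eps
num = Ws[l][i, j] * (R(Wp) - R(Wm)) / (2 * eps)
print(np.isclose(score, num, rtol=1e-4))  # True
```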
4. Critical Compression and Layer-Collapse Guarantees
Theoretical analysis (Theorem 3) established the sufficiency of three properties for avoiding layer-collapse: (i) strictly positive scores, (ii) layer-wise conservation, and (iii) removing, per iteration, less total score than is contained in any single layer (cut). SynFlow meets all three criteria and is thus provably layer-collapse-free. This implies the algorithm can compress a network down to as little as one parameter per layer, achieving the maximal critical compression, without catastrophic loss of trainability.
A plausible implication is that the architecture- and data-independent nature of the algorithm allows for rapid screening of model sparsity at initialization, circumventing the need for repeated cycles of training or any access to training data.
5. Objective, Constraint, and Ranking Proxy
The objective in SynFlow pruning is to maximize the total retained synaptic flow, $\max_{\mu} R_{SF}(\mu \odot \theta)$,
subject to the constraint that exactly $m/\rho$ parameters are retained ($\rho$ = compression ratio; $m$ = total number of parameters). While finding the global mask $\mu$ that maximizes this objective would be combinatorially expensive, the algorithm employs the score $S_{SF}$ as an efficient proxy for this selection.
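The selection step behind this proxy can be sketched in isolation: given per-layer score arrays, a global mask retaining the top $m/\rho$ parameters is a single partition over the flattened scores. The toy score values below are hypothetical, purely for illustration:

```python
import numpy as np

def global_mask(scores, rho):
    """Greedy proxy: keep the top m/rho parameters by score across all layers."""
    flat = np.concatenate([s.ravel() for s in scores])
    keep = max(1, int(flat.size / rho))
    tau = np.partition(flat, -keep)[-keep]   # keep-th largest score
    return [(s >= tau).astype(float) for s in scores]

scores = [np.array([[3.0, 0.1], [0.5, 2.0]]), np.array([[4.0, 0.2]])]
masks = global_mask(scores, rho=2.0)  # keep 3 of 6 parameters
print([m.tolist() for m in masks])    # [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0]]]
```

Applied once, this is exactly the single-shot global thresholding that risks layer-collapse; SynFlow's iterative schedule repeatedly re-scores and re-applies it with a gradually shrinking keep fraction.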
6. Computational Aspects and Implementation
SynFlow requires, per iteration, one forward pass to compute the path objective $R_{SF}$ and one backward pass for the Synaptic Flow scores. With $n = 100$ iterations, the total computational cost is $100$ forward+backward passes, which is both lightweight and independent of dataset size. Competing methods such as SNIP and GraSP instead compute their saliency gradients over data batches whose size scales with the number of classes, so their cost grows with the dataset and task.
Batch normalization (BN) breaks activation homogeneity; thus, SynFlow must be run with BN in evaluation mode, freezing its statistics to preserve the correctness of the gradients and the conservation laws. Numerical underflow or overflow of the path objective $R_{SF}$ is possible for deep architectures; single-precision arithmetic sufficed for VGG/ResNet-scale models, but deeper models may require per-layer rescaling or double precision.
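The overflow concern can be made concrete with a toy experiment (the width and depth here are illustrative assumptions, not the paper's settings): the path objective grows multiplicatively with depth, exceeding float32 range well before float64 does.

```python
import numpy as np

def path_objective(depth, width, dtype):
    """1^T (prod of |W|) 1 for a random square stack; entries grow roughly
    by a factor of ~0.8 * width per layer, so the value explodes with depth."""
    rng = np.random.default_rng(0)
    v = np.ones(width, dtype=dtype)
    for _ in range(depth):
        W = rng.standard_normal((width, width)).astype(dtype)
        v = np.abs(W) @ v
    return v.sum()

print(np.isinf(path_objective(40, 256, np.float32)))  # True: overflows to inf
print(np.isinf(path_objective(40, 256, np.float64)))  # False: still finite
```

Only the relative ordering of scores matters for pruning, which is why rescaling weights layer by layer (or casting to float64) is a safe fix.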
7. Empirical Evaluation and Comparative Performance
SynFlow was evaluated on twelve network–dataset pairs: VGG-11, VGG-16, ResNet-18, and WideResNet-18, each crossed with CIFAR-10, CIFAR-100, and Tiny ImageNet. Compression ratios spanning several orders of magnitude, up to roughly 99.99% sparsity, were tested. Baselines included random pruning, magnitude-based pruning, SNIP, and GraSP.
Key results include:
- In the high-compression regime (sparsities beyond roughly 99%), SynFlow consistently outperformed or matched all other initialization pruning methods, with no sudden collapse in test accuracy.
- In the moderate-compression regime, SynFlow was competitive; it was sometimes narrowly edged out at low sparsity by SNIP or GraSP, but those methods suffered catastrophic collapse at higher sparsities.
- Iterated SNIP, a contemporary multi-shot variant, also suffered early collapse and required many more backward passes, each of which depends on training data.
The evidence indicates SynFlow is the first provably layer-collapse-free, data-agnostic, globally masking pruning method at initialization capable of achieving state-of-the-art results at extreme sparsities (Tanaka et al., 2020).