
Synaptic Flow: Data-Agnostic Pruning

Updated 23 January 2026
  • Synaptic Flow is a data-agnostic pruning algorithm that leverages conservation laws to maintain synaptic strength across layers.
  • It computes per-parameter saliency scores using gradient and weight products, then iteratively prunes the lowest scoring parameters based on an exponential schedule.
  • Empirical results demonstrate that SynFlow achieves extreme sparsity without catastrophic layer collapse, ensuring network trainability across diverse architectures.

Synaptic Flow (SynFlow) is a theoretically grounded, data-agnostic, iterative network pruning algorithm that preserves the total flow of synaptic strengths in neural networks at initialization. SynFlow addresses a central challenge in data-free pruning: how to identify highly sparse, trainable subnetworks at initialization while rigorously avoiding the phenomenon of layer-collapse—whereby an entire layer is pruned, rendering the network untrainable. The algorithm leverages conservation laws for synaptic saliency scores derived from homogeneous network activations and operates by iteratively ranking and removing parameters based on their Synaptic Flow (S_SF) scores. SynFlow achieves state-of-the-art performance among initialization pruning algorithms across diverse architectures and datasets, sustaining accuracy at extreme sparsities and providing formal guarantees against layer-collapse (Tanaka et al., 2020).

1. Conservation Laws for Synaptic Saliency

The cornerstone of SynFlow is its reliance on score functions of the form S(\theta) = (\partial R/\partial \theta) \circ \theta for a scalar function R defined on the network’s outputs. When the activation function \phi is homogeneous (i.e., \phi(x) = \phi'(x)\,x, as with ReLU), the following exact conservation laws apply:

  • Neuron-wise Conservation (Theorem 1):

For each neuron j, the sum of the synaptic saliency scores over its incoming weights equals the sum over its outgoing weights. Formally, denoting the incoming weights as \theta^{in}_{j\cdot} and the outgoing weights as \theta^{out}_{\cdot j}:

S^{in}_j = \sum_k \left(\frac{\partial R}{\partial \theta^{in}_{jk}}\right) \theta^{in}_{jk},\quad S^{out}_j = \sum_i \left(\frac{\partial R}{\partial \theta^{out}_{ij}}\right) \theta^{out}_{ij}

with S^{in}_j = S^{out}_j.

  • Network-wise (Cut) Conservation (Theorem 2):

For any cut C of parameters separating the network’s inputs from its outputs,

\sum_{\theta \in C} \left(\frac{\partial R}{\partial \theta}\right)\theta = \left\langle \frac{\partial R}{\partial y}, y \right\rangle = \left\langle \frac{\partial R}{\partial x}, x \right\rangle

Applied to the cut formed by a single layer, this means every layer receives the same total synaptic saliency score. Wider layers therefore have lower average per-parameter scores, so single-shot global threshold pruning strategies, such as those used in SNIP and GraSP, remove a disproportionate share of parameters from the widest layers, increasing the risk of total layer collapse.
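These conservation laws can be checked numerically. The sketch below (shapes and seed are illustrative) computes the closed-form SynFlow saliencies for a small random three-layer ReLU MLP and verifies both the layer-wise and neuron-wise identities:

```python
import numpy as np

rng = np.random.default_rng(0)
# Three weight matrices of an MLP, each of shape (fan_out, fan_in).
W1, W2, W3 = (rng.standard_normal(s) for s in [(5, 4), (6, 5), (3, 6)])
A1, A2, A3 = np.abs(W1), np.abs(W2), np.abs(W3)

# SynFlow objective R_SF = 1^T |W3||W2||W1| 1 (sum over all input-output paths).
R = np.ones(3) @ A3 @ A2 @ A1 @ np.ones(4)

# Closed-form saliency S = (dR/dW) o W: upstream row factor x |W| x downstream column factor.
S1 = (np.ones(3) @ A3 @ A2)[:, None] * A1                         # input layer: downstream factor = 1
S2 = (np.ones(3) @ A3)[:, None] * A2 * (A1 @ np.ones(4))[None, :]
S3 = A3 * (A2 @ A1 @ np.ones(4))[None, :]                         # output layer: upstream factor = 1

# Layer-wise (cut) conservation: each layer's total saliency equals R.
assert np.allclose([S1.sum(), S2.sum(), S3.sum()], R)
# Neuron-wise conservation at the first hidden layer: in-score == out-score.
assert np.allclose(S1.sum(axis=1), S2.sum(axis=0))
print("conservation laws hold; R =", round(R, 3))
```

The same identities hold for any homogeneous network, which is exactly why a global threshold interacts badly with layer width: each layer’s scores sum to the same R regardless of how many parameters share it.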

2. The Iterative Synaptic Flow Pruning Algorithm

SynFlow is instantiated by specifying R as a “path” objective, denoted R_{SF}, which sums the products of the absolute values of the weights along every input–output path: R_{SF}(\theta) = 1^\top \left(\prod_{l=1}^L |\theta^{[l]}| \right) 1. The per-parameter Synaptic Flow score is

S_{SF}(\theta) = \left(\frac{\partial R_{SF}}{\partial \theta}\right) \circ \theta

These scores are non-negative and satisfy both neuron-wise and cut-wise conservation. Pruning proceeds iteratively according to the following process:

Input: untrained network f(x; θ₀), compression ratio ρ, number of iterations n
0. Initialize mask μ ← all-ones (retain all parameters)
For k = 1…n:
    1. θ ← μ ∘ θ₀                      # Apply current mask
    2. R ← 1^T (∏_{l=1}^L |θ^{[l]}|) 1 # SynFlow objective (path-sum)
    3. S ← (∂R/∂θ) ∘ θ                 # Synaptic Flow score
    4. τ ← the (1 − ρ^{-k/n}) quantile of S   # retain a ρ^{-k/n} fraction
    5. μ ← (S > τ)                            # prune the lowest-scoring parameters
Return f(x; μ ∘ θ₀)

The key algorithmic feature is the exponential pruning schedule: each iteration removes few enough parameters that the pruned score never exceeds the total score contained in any single layer, thereby guaranteeing that no layer collapses prematurely.
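The pseudocode above can be sketched as runnable NumPy for the fully connected case, where the gradient of R_{SF} has a closed form. The helper name synflow_prune, the layer shapes, and the iteration counts are illustrative assumptions, not the authors’ reference implementation:

```python
import numpy as np

def synflow_prune(weights, rho, n_iters=100):
    """Iterative SynFlow pruning for an MLP given as a list of
    (fan_out, fan_in) weight matrices; returns one binary mask per layer."""
    masks = [np.ones_like(w) for w in weights]
    total = sum(w.size for w in weights)
    for k in range(1, n_iters + 1):
        A = [np.abs(w) * m for w, m in zip(weights, masks)]   # masked |theta|
        # Downstream factors v_l = prod_{j<l} A_j 1 and upstream u_l = 1^T prod_{j>l} A_j.
        v = [np.ones(A[0].shape[1])]
        for a in A[:-1]:
            v.append(a @ v[-1])
        u = [np.ones(A[-1].shape[0])]
        for a in reversed(A[1:]):
            u.append(u[-1] @ a)
        u = u[::-1]
        # Synaptic Flow score S = (dR_SF/dtheta) o theta, layer by layer.
        scores = [u[l][:, None] * A[l] * v[l][None, :] for l in range(len(A))]
        flat = np.concatenate([s.ravel() for s in scores])
        n_keep = int(np.ceil(rho ** (-k / n_iters) * total))  # exponential schedule
        tau = np.sort(flat)[::-1][n_keep - 1]                 # weakest kept parameter
        masks = [((s >= tau) & (m > 0)).astype(float) for s, m in zip(scores, masks)]
    return masks

# Illustrative usage on a random 8 -> 16 -> 16 -> 4 network at 10x compression.
rng = np.random.default_rng(1)
ws = [rng.standard_normal(s) for s in [(16, 8), (16, 16), (4, 16)]]
masks = synflow_prune(ws, rho=10, n_iters=20)
print([int(m.sum()) for m in masks])   # surviving parameters per layer
```

Note that scores of already-pruned parameters are zero, so the threshold never resurrects them, and the schedule guarantees the final retained count is at least ⌈N/ρ⌉.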

3. Mathematical Formulation of the Synaptic Flow Score

In a fully connected network with layers W^{[1]}, \dots, W^{[L]}, the score for weight w^{[l]}_{ij} is: S_{SF}(w^{[l]}_{ij}) = \left[1^\top \prod_{k=l+1}^{L} |W^{[k]}|\right]_i \cdot |w^{[l]}_{ij}| \cdot \left[ \prod_{k=1}^{l-1} |W^{[k]}|\, 1 \right]_j. This expresses the parameter’s centrality to the propagation of absolute-weight “flow” through all input–output paths that traverse w^{[l]}_{ij}. All scores S_{SF}(\theta) are guaranteed to be non-negative, strictly positive for \theta \neq 0, and conserved across layers.
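A quick sanity check, on an illustrative three-layer network, confirms that this closed form agrees with a finite-difference estimate of (∂R_{SF}/∂θ) ∘ θ:

```python
import numpy as np

rng = np.random.default_rng(2)
W = [rng.standard_normal(s) for s in [(4, 3), (5, 4), (2, 5)]]
A = [np.abs(w) for w in W]

def R_SF(weights):
    # Path objective 1^T |W3||W2||W1| 1, computed as a forward pass on all-ones.
    v = np.ones(weights[0].shape[1])
    for w in weights:
        v = np.abs(w) @ v
    return v.sum()

# Closed-form scores for the middle layer (l = 2):
u = np.ones(2) @ A[2]            # upstream factor  [1^T |W^[3]|]
v = A[0] @ np.ones(3)            # downstream factor [|W^[1]| 1]
S_closed = u[:, None] * A[1] * v[None, :]

# Finite-difference estimate of S = (dR_SF/dw) * w for entry (i, j) = (0, 0).
eps = 1e-6
W_pert = [w.copy() for w in W]
W_pert[1][0, 0] += eps
S_fd = (R_SF(W_pert) - R_SF(W)) / eps * W[1][0, 0]
assert np.allclose(S_closed[0, 0], S_fd, rtol=1e-4)
```

Because R_{SF} is piecewise linear in each weight, the forward difference is essentially exact away from zero-valued weights.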

4. Critical Compression and Layer-Collapse Guarantees

Theoretical analysis (Theorem 3) establishes the sufficiency of three properties for avoiding layer-collapse: (i) strictly positive scores, (ii) layer-wise conservation, and (iii) pruning less total score per iteration than is contained in any single layer. SynFlow meets all three criteria and is thus provably layer-collapse-free. This implies the algorithm can compress a network down to one parameter per layer, the maximal critical compression, without destroying trainability.

A plausible implication is that the architecture- and data-independent nature of the algorithm allows for rapid screening of model sparsity at initialization, circumventing the need for repeated cycles of training or any access to training data.

5. Objective, Constraint, and Ranking Proxy

The objective in SynFlow pruning is to maximize the total retained synaptic flow

\max \sum_{\theta\ \text{kept}} S_{SF}(\theta)

subject to the constraint that exactly N/\rho parameters are retained (\rho = compression ratio; N = total number of parameters). While the optimal global mask \mu \in \{0,1\}^{N} maximizing 1^{\top} \left(\prod_{l} |(\mu \circ \theta_0)^{[l]}| \right) 1 would be combinatorially expensive to find, the algorithm employs the S_{SF} score as an efficient proxy for this selection.
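A one-shot version of this ranking proxy (score, sort, keep the top N/ρ) takes only a few lines; the two-layer network, seed, and ρ = 4 below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(4)
W1, W2 = rng.standard_normal((8, 6)), rng.standard_normal((5, 8))
A1, A2 = np.abs(W1), np.abs(W2)

# Per-parameter SynFlow scores (two-layer closed form).
S1 = (np.ones(5) @ A2)[:, None] * A1
S2 = A2 * (A1 @ np.ones(6))[None, :]
scores = np.concatenate([S1.ravel(), S2.ravel()])

rho = 4                                    # retain N / rho parameters
n_keep = scores.size // rho
tau = np.sort(scores)[::-1][n_keep - 1]
mask = scores >= tau                       # global top-k by score

assert mask.sum() >= n_keep                # budget met (ties may keep extras)
# The retained flow is positive: at least one input-output path survives.
M1 = (S1 >= tau) * A1
M2 = (S2 >= tau) * A2
assert np.ones(5) @ M2 @ M1 @ np.ones(6) > 0
```

The retained-flow check at the end is the one-shot analogue of the layer-collapse concern: the kept mask still carries nonzero flow from input to output. The full algorithm repeats this selection iteratively rather than trusting a single ranking.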

6. Computational Aspects and Implementation

SynFlow requires, per iteration, one forward pass to compute the path objective and one backward pass for the Synaptic Flow scores. With n = 100 iterations, the total computational cost is 100 forward–backward passes, which is both lightweight and independent of dataset size. Competing methods such as SNIP and GraSP require a number of data samples on the order of ten times the number of classes per gradient call, typically 1\,000–10\,000 passes.

Batch normalization (BN) breaks activation homogeneity; thus, SynFlow must be run with BN in evaluation mode, freezing statistics to ensure the correctness of the gradients and conservation. Numerical underflow or overflow of RSFR_{SF} is possible for deep architectures; single-precision arithmetic sufficed for VGG/ResNet-scale models, but deeper models may require layer rescaling or double precision.
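The precision issue is easy to reproduce; the depth and width below are illustrative. The final lines sketch one mitigation: dividing each layer by a per-layer constant multiplies every saliency score by the same overall factor, so rankings are unchanged while R_{SF} stays representable:

```python
import numpy as np

rng = np.random.default_rng(3)
depth, width = 100, 256
Ws = [np.abs(rng.standard_normal((width, width))) for _ in range(depth)]

def path_sum(ws, dtype):
    # R_SF as a forward pass of all-ones through the absolute weights.
    v = np.ones(width, dtype=dtype)
    for w in ws:
        v = w.astype(dtype) @ v
    return float(v.sum())

with np.errstate(over="ignore"):
    r32 = path_sum(Ws, np.float32)   # magnitudes grow ~(0.8 * width)^depth
r64 = path_sum(Ws, np.float64)
assert np.isinf(r32) and np.isfinite(r64)

# Per-layer constant rescaling (here: mean row sum) keeps activations near 1.
Ws_scaled = [w / (w @ np.ones(width)).mean() for w in Ws]
assert np.isfinite(path_sum(Ws_scaled, np.float32))
```

At this depth, single precision overflows to infinity within a couple of dozen layers, double precision still fits, and the rescaled network stays finite even in float32.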

7. Empirical Evaluation and Comparative Performance

SynFlow was evaluated on twelve network–dataset pairs: VGG-11, VGG-16, ResNet-18, and WideResNet-18, each crossed with CIFAR-10, CIFAR-100, and Tiny ImageNet. Compression ratios up to \rho = 10\,000 (99.99\% sparsity) were tested. Baselines included random pruning, magnitude-based pruning, SNIP, and GraSP.

Key results include:

  • In the high-compression regime (\rho > 30), SynFlow consistently outperformed or matched all other initialization pruning methods, with no sudden collapse in test accuracy.
  • In the moderate regime (\rho < 30), SynFlow was competitive, sometimes narrowly edged out at low sparsity by SNIP or GraSP, but those methods suffered catastrophic collapse at higher sparsities.
  • Iterated SNIP, a contemporaneous approach, also suffered early collapse and required at least 1\,000 backward passes.

The evidence indicates SynFlow is the first provably layer-collapse-free, data-agnostic, globally masking pruning method at initialization capable of achieving state-of-the-art results at extreme sparsities (Tanaka et al., 2020).

References

  • Tanaka, H., Kunin, D., Yamins, D. L. K., & Ganguli, S. (2020). Pruning Neural Networks without Any Data by Iteratively Conserving Synaptic Flow. Advances in Neural Information Processing Systems 33 (NeurIPS 2020).
