L2-Norm Synaptic Scaling
- L2-norm-based synaptic scaling is a normalization technique that maintains a fixed Euclidean norm for synaptic weights, ensuring balanced inputs and outputs.
- It utilizes multiplicative normalization, projection, or iterative reparameterization to enhance learning dynamics and improve generalization in both artificial and spiking neural networks.
- Practical applications include improved unsupervised STDP learning in SNNs and efficient network optimization in deep ReLU networks, leading to faster convergence and better performance.
L2-norm-based synaptic scaling is a class of normalization and homeostatic plasticity mechanisms that explicitly force synaptic weight vectors, or local collections of a neuron's input or output synapses, to maintain a fixed L2-norm. This operation is well defined in both artificial neural networks (ANNs) and biologically inspired spiking neural networks (SNNs) and can be exact (by projection or multiplicative normalization), block-wise iterative (as in equi-normalization for entire networks), or arise from gradient-based minimization of an L2-regularized objective. L2-norm-based scaling preserves positive rescaling invariances of feedforward neural architectures, enforces balanced synaptic statistics at both neuron and network scales, and can substantially improve learning dynamics, generalization, and homeostasis.
1. Mathematical Definition and Mechanism
For a vector of synaptic weights $w = (w_1, \dots, w_n)$ associated with a given neuron or layer, the L2-norm is defined as:

$$\|w\|_2 = \sqrt{\sum_{i=1}^{n} w_i^2}$$

L2-norm-based scaling applies a multiplicative normalization step to maintain a target norm $\rho$:

$$w \leftarrow \rho \, \frac{w}{\|w\|_2}$$
This operation can be performed immediately after each local plasticity update (e.g., pairwise spike-timing-dependent plasticity/STDP in SNNs), or as part of an iterative blockwise reparameterization acting globally on the entire network (Touda et al., 16 Jan 2026; Stock et al., 2019).
When applied to a feedforward or recurrent ANN, each hidden neuron can have its incoming weights scaled by a positive scalar $\lambda$ and outgoing weights by $1/\lambda$ without changing the overall function in ReLU or BiLU state regimes. The minimization of the global L2 norm of the weights within this equivalence class can be solved exactly and yields a canonical balanced representation (Baldi et al., 2024; Stock et al., 2019).
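The multiplicative normalization step above is a one-line operation; the following minimal NumPy sketch (function and variable names are illustrative, not from the cited papers) shows that it pins the norm to the target $\rho$ while preserving the relative sizes of the weights:

```python
import numpy as np

def l2_normalize(w, target_norm=1.0, eps=1e-12):
    """Scale weight vector w multiplicatively so that ||w||_2 == target_norm."""
    norm = np.linalg.norm(w)
    return w * (target_norm / max(norm, eps))

w = np.array([3.0, 4.0])                      # ||w||_2 = 5
w_scaled = l2_normalize(w, target_norm=2.0)   # -> [1.2, 1.6], norm exactly 2
# Only the overall scale changes; the ratio w[0]/w[1] is preserved.
```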
2. Local and Global Synaptic Balance
Synaptic balance under the L2 norm condition is realized when, for every neuron $i$,

$$\|w^{\mathrm{in}}_i\|_2^2 = \|w^{\mathrm{out}}_i\|_2^2$$

To achieve balance, local rescalings are applied, multiplying all incoming weights of neuron $i$ by $\lambda_i$ and all outgoing ones by $1/\lambda_i$. The optimal rescaling factor enforcing balance is

$$\lambda_i = \sqrt{\frac{\|w^{\mathrm{out}}_i\|_2}{\|w^{\mathrm{in}}_i\|_2}}$$
Applying these balancing moves stochastically or sequentially induces the network to converge to a unique balanced state that minimizes the global L2-regularized cost, as proven via a strictly convex optimization over the rescaling variables with architecture-dependent linear constraints (Baldi et al., 2024). This canonicalization is function-preserving and produces a unique representative for each class of rescaling-equivalent weights.
In the equi-normalization scheme for deep ReLU networks, block coordinate minimization is employed to cycle through layers, updating local scaling factors and rescaling layer weights until the global L2 cost is minimized within the equivalence class (Stock et al., 2019).
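A single local balancing move can be sketched as follows. This toy example (the helper name and the tiny one-hidden-unit network are illustrative assumptions) rescales one ReLU unit's incoming weights by $\lambda = \sqrt{\|w^{\mathrm{out}}\|_2 / \|w^{\mathrm{in}}\|_2}$ and its outgoing weights by $1/\lambda$, leaving the network function unchanged while reducing the global L2 cost:

```python
import numpy as np

def balance_neuron(w_in, w_out):
    """One local balancing move: scale incoming weights by lam and outgoing
    weights by 1/lam, with lam chosen to equalize the two L2 norms."""
    lam = np.sqrt(np.linalg.norm(w_out) / np.linalg.norm(w_in))
    return lam * w_in, w_out / lam

w_in = np.array([1.0, 2.0])    # incoming weights of one hidden ReLU unit
w_out = np.array([8.0])        # its single outgoing weight

w_in_b, w_out_b = balance_neuron(w_in, w_out)

relu = lambda h: np.maximum(h, 0.0)
x = np.array([1.0, 0.5])
y_before = w_out[0] * relu(w_in @ x)      # original network output
y_after = w_out_b[0] * relu(w_in_b @ x)   # identical: ReLU is positively homogeneous
# After the move, ||w_in_b||_2 == ||w_out_b||_2 and the summed squared
# weights are strictly smaller than before (unless already balanced).
```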
3. Application in Spiking Neural Networks
In SNNs implementing unsupervised plasticity, L2-norm-based synaptic scaling has been shown to improve classification performance and stabilize learning, especially in winner-take-all network topologies. The procedure consists of (1) a local STDP update per synapse at the millisecond scale, followed by (2) a normalization step that projects the entire afferent weight vector for a postsynaptic neuron onto the L2 sphere of radius $\rho$:
For each excitatory neuron, after STDP,
- Accumulate the STDP-induced updates $\Delta w_j$ into $w_j \leftarrow w_j + \Delta w_j$,
- Set $w_j \leftarrow \rho \, w_j / \|w\|_2$.
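The two-step loop above can be sketched in a few lines. This is a deliberately simplified illustration, not the full STDP rule from the cited work: the potentiation-only update, the trace model, the learning rate, and the target norm $\rho = 2$ are all assumed for demonstration; the key point is that the L2 projection after every update pins the afferent norm regardless of how much STDP potentiates:

```python
import numpy as np

rho = 2.0   # target L2 norm (assumed value; tuned empirically in practice)
rng = np.random.default_rng(0)
w = rng.uniform(0.0, 1.0, size=784)   # afferent weights of one excitatory neuron

def stdp_step(w, pre_trace, post_spike, lr=0.01):
    """Simplified potentiation-only STDP update (illustrative only)."""
    return w + lr * post_spike * pre_trace

for _ in range(100):
    pre_trace = rng.uniform(0.0, 1.0, size=784)  # stand-in for presynaptic traces
    w = stdp_step(w, pre_trace, post_spike=1.0)  # (1) local plasticity update
    w = rho * w / np.linalg.norm(w)              # (2) projection onto the L2 sphere
# Despite purely potentiating updates, ||w||_2 stays fixed at rho.
```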
This approach, with careful tuning of the L2 target and STDP time constants, led to MNIST test accuracy of 88.84% and Fashion-MNIST test accuracy of 68.01% after one epoch (400 excitatory/inhibitory neurons) (Touda et al., 16 Jan 2026).
Comparative Effectiveness
- L2-norm scaling outperforms L1-norm scaling on both MNIST and Fashion-MNIST datasets, generating sharper class-selective receptive fields and more stable dynamics during unsupervised STDP learning.
- L2-scaling avoids excessive penalization of large synapses compared to L1-scaling, providing better preservation of relative weights and preventing the collapse or explosion of synaptic strengths (Touda et al., 16 Jan 2026).
4. Integration in Artificial Neural Network Optimization
Modern deep neural networks, especially those utilizing ReLU activations, possess a positive scaling invariance: hidden units can have incoming weights scaled by $\lambda$ and outgoing weights by $1/\lambda$ without affecting their function. L2-norm-based synaptic scaling (e.g., equi-normalization) exploits this by driving the global weight decay term to its minimum over the equivalence class:
For a deep feedforward network with weight matrices $W_1, \dots, W_L$, one minimizes

$$\min_{D_1, \dots, D_{L-1}} \; \sum_{\ell=1}^{L} \|\tilde{W}_\ell\|_2^2, \qquad \tilde{W}_\ell = D_\ell W_\ell D_{\ell-1}^{-1}$$

using multiplicative layer-wise rescalings (with diagonal positive $D_\ell$, and $D_0 = D_L = I$) such that each rescaled weight block satisfies balance constraints. This approach, inspired by the Sinkhorn-Knopp algorithm, involves cycling over layers, updating local scaling vectors, and rescaling weight matrices using closed-form updates (Stock et al., 2019).
ENorm can be efficiently interleaved with SGD steps. After each mini-batch SGD update, a single ENorm cycle suffices to maintain or restore balance, and momentum buffers are transformed under the same scaling. No additional hyperparameters are required.
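For a two-layer ReLU network the closed-form update can be shown compactly. The sketch below (an illustrative simplification of an ENorm-style pass, not the paper's implementation) balances each hidden unit by equalizing the norm of its row in the first weight matrix with the norm of the matching column in the second, preserving the function while reducing the global L2 cost:

```python
import numpy as np

def enorm_cycle(W1, W2):
    """One balancing pass for a two-layer ReLU net y = W2 @ relu(W1 @ x).
    Per-hidden-unit closed-form scaling d_i equalizes ||row_i(W1)|| with
    ||col_i(W2)|| (sketch of an equi-normalization-style update)."""
    d = np.sqrt(np.linalg.norm(W2, axis=0) / np.linalg.norm(W1, axis=1))
    return W1 * d[:, None], W2 / d[None, :]

rng = np.random.default_rng(1)
W1 = rng.normal(size=(4, 3))   # input -> 4 hidden ReLU units
W2 = rng.normal(size=(2, 4))   # hidden -> output
W1b, W2b = enorm_cycle(W1, W2)

relu = lambda h: np.maximum(h, 0.0)
x = rng.normal(size=3)
# The rescaled network computes the same function (positive homogeneity of
# ReLU), and sum of squared weights is no larger than before the pass.
```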
Empirically, ENorm provides an effective batch-free normalization alternative to BN/GN, achieving or exceeding their performance especially for small batch sizes, with lower computational/memory cost (Stock et al., 2019).
| Normalization | Batch dependency | Typical behavior (ResNet-18/ImageNet) |
|---|---|---|
| ENorm | No | Matches or outperforms BN at small batch sizes, at lower compute/memory cost |
| BN | Yes | Strong for large batches |
| GN | No | Stable; slightly below ENorm |
5. Theoretical Properties and Convergence
The L2 balancing operation forms a strictly convex program whose unique global optimal state is characterized by all neurons having matched incoming and outgoing L2 costs. Local balancing updates strictly reduce the regularizer unless balance already holds; sequences of updates converge monotonically to the global optimum (Baldi et al., 2024, Stock et al., 2019).
Alternating balancing sweeps with gradient-based optimization preserves this convergence. Empirically, networks pre-balanced or periodically balanced in this manner show accelerated loss minimization and improved generalization, with up to 20% fewer epochs to reach a given loss under interleaved balancing (Baldi et al., 2024).
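The monotone-convergence property can be demonstrated numerically. In this sketch (network sizes, seeds, and helper names are assumptions for illustration), random single-unit balancing moves are applied to a three-layer network and the global L2 cost is tracked; each move strictly reduces the cost unless the chosen unit is already balanced:

```python
import numpy as np

rng = np.random.default_rng(2)
# Three weight matrices => two hidden layers of balanceable units.
Ws = [rng.normal(size=(5, 3)), rng.normal(size=(4, 5)), rng.normal(size=(2, 4))]

def l2_cost(Ws):
    return sum(np.sum(W**2) for W in Ws)

def balance_unit(Ws, layer, i):
    """Balance hidden unit i between its incoming row in Ws[layer] and its
    outgoing column in Ws[layer + 1]; never increases the L2 cost."""
    lam = np.sqrt(np.linalg.norm(Ws[layer + 1][:, i]) /
                  np.linalg.norm(Ws[layer][i, :]))
    Ws[layer][i, :] *= lam
    Ws[layer + 1][:, i] /= lam

costs = [l2_cost(Ws)]
for _ in range(200):
    layer = int(rng.integers(0, 2))              # pick hidden layer 0 or 1
    i = int(rng.integers(0, Ws[layer].shape[0])) # pick one unit in it
    balance_unit(Ws, layer, i)
    costs.append(l2_cost(Ws))
# The cost sequence is non-increasing and settles toward the balanced optimum.
```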
In SNNs, immediate application of L2-projection after each STDP step ensures that STDP-induced positive feedback does not lead to uncontrolled growth, but instead facilitates emergence of class-selective, stable representations (Touda et al., 16 Jan 2026).
6. Biological and Functional Interpretations
L2-norm-based synaptic scaling arises naturally in biologically motivated synaptic plasticity models. In the natural-gradient framework, weight-dependent multiplicative scaling terms can conserve the L2 norm under appropriate balance conditions (Kreutzer et al., 2020). The resulting plasticity rule admits a regime in which the weight updates leave the L2 norm unchanged. Its heterosynaptic homeostatic components realize both uniform and proportional compensatory plasticity, which help enforce synaptic stability and norm control (Kreutzer et al., 2020).
This perspective supports L2 scaling as both computationally and biologically relevant, ensuring alignment between functional (performance-oriented) and homeostatic (structure-preserving) objectives.
7. Practical Recommendations and Empirical Insights
- L2-norm synaptic scaling is particularly suited to unsupervised SNNs trained via STDP, as it enables stable learning and emergent selectivity without backpropagation or supervision (Touda et al., 16 Jan 2026).
- In deep ANNs, equi-normalization or local balancing interleaved with standard SGD regularization yields improved conditioning, accelerates convergence, and reduces batch size or memory requirements compared to traditional normalization schemes (Stock et al., 2019, Baldi et al., 2024).
- The target L2 norm should be empirically tuned; excessively low values suppress feature selectivity, while high values can cause saturation or instability.
- In both SNNs and ANNs, L2 scaling fixes the scaling gauge redundancy in parameter space, leading to improved generalization and canonical representations.
- For implementation, apply multiplicative L2 normalization immediately after local plasticity updates (in SNNs) or interleaved with training epochs/SGD steps (in ANNs).
References
- "Equi-normalization of Neural Networks" (Stock et al., 2019)
- "Effects of Introducing Synaptic Scaling on Spiking Neural Network Learning" (Touda et al., 16 Jan 2026)
- "A Theory of Synaptic Neural Balance: From Local to Global Order" (Baldi et al., 2024)
- "Natural-gradient learning for spiking neurons" (Kreutzer et al., 2020)