Hebbian Principal Component Analysis

Updated 7 March 2026

HPCA is a biologically plausible algorithm that extracts the principal subspace via local Hebbian and anti-Hebbian plasticity rules.
It implements online unsupervised learning with both linear and nonlinear rules, bridging classical PCA and modern neural architectures.
HPCA supports both single-layer and deep networks, offering rapid convergence, efficient feature extraction, and resource-constrained adaptation.

Hebbian Principal Component Analysis (HPCA) denotes a class of biologically plausible, online learning algorithms that extract the principal subspace of data using local Hebbian and anti-Hebbian plasticity rules. HPCA learns the principal components or principal subspace of high-dimensional data, without supervision, in neural architectures whose synaptic updates depend only on the coincident activity of pre- and postsynaptic units rather than on batch optimization or nonlocal computation. It encompasses both linear and nonlinear rules and supports both single-layer and multi-layer (deep) architectures. HPCA is closely connected to classical machine learning algorithms (e.g., Oja’s rule, the Generalized Hebbian Algorithm), to network neuroscience, and to applications in deep networks for efficient feature extraction and transfer learning (Pehlevan et al., 2015, Pehlevan et al., 2015, Lagani et al., 2020, Lagani et al., 2022, Lipshutz et al., 19 Jan 2026).

1. Formal Objectives and Cost Functions

The canonical HPCA objective is to find an encoding $y = F x$ (for weights or “filters” $F$) such that the rows of $F$ span the top-$k$ eigenspace of the data covariance. This formulation arises directly from several cost functions:

Similarity matching (strain/MDS objective):

$\min_Y \| X^\top X - Y^\top Y \|_F^2,$

where the columns of $X$ and $Y$ hold the input samples and the corresponding outputs. Solutions $Y$ span a principal subspace of $X$, though additional constraints (e.g., a decorrelation penalty) are required to recover strict PCA bases (Pehlevan et al., 2015, Pehlevan et al., 2015).

Variance maximization:

$\underset{W^\top W = I_k}{\max} \operatorname{Tr}(W^\top C W)$

where $C = E[xx^\top]$ is the data covariance and $y = W^\top x$ projects the data onto orthonormal directions that maximize the projected variance (Johard et al., 2017). In the notation above, $F = W^\top$.

Reconstruction error:

$L(F) = E[\| x - F^\top F x\|^2]$

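or, for sequential learning, $L(w_i) = E[\| x - \sum_{j=1}^{i} y_j w_j \|^2]$, where $w_i$ is the $i$-th row of $F$ and $y_j = w_j^\top x$ (Lagani et al., 2020, Lagani et al., 2022).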

The inclusion of decorrelation penalties (e.g., off-diagonal suppression in the output covariance $Y Y^\top$) in the similarity-matching objective enforces extraction of principal components with orthogonal outputs (Pehlevan et al., 2015, Pehlevan et al., 2015). Nonlinear extensions compose the output with a nonlinearity $f(\cdot)$, supporting richer codes (Lagani et al., 2020, Lagani et al., 2022).
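The relationship between the variance-maximization and reconstruction-error objectives can be checked numerically. The following is a minimal NumPy sketch, not taken from the cited works: the function names, the synthetic Gaussian data, and the dimensions are illustrative assumptions. It compares a batch PCA solution against a random orthonormal filter bank and verifies that, for orthonormal rows of $F$, reconstruction error plus captured variance equals the total variance.

```python
import numpy as np

def top_k_filters(X, k):
    """Return F (k x n) whose rows are the top-k eigenvectors of C = X X^T / T."""
    C = X @ X.T / X.shape[1]
    _, eigvecs = np.linalg.eigh(C)            # eigenvalues come back in ascending order
    return eigvecs[:, ::-1][:, :k].T          # largest-eigenvalue directions first

def reconstruction_error(F, X):
    """Sample estimate of E[||x - F^T F x||^2] over the columns of X."""
    residual = X - F.T @ (F @ X)
    return float(np.mean(np.sum(residual ** 2, axis=0)))

def projected_variance(F, X):
    """Tr(F C F^T): the variance captured by the rows of F."""
    C = X @ X.T / X.shape[1]
    return float(np.trace(F @ C @ F.T))

# Synthetic data (an assumption for illustration): rotated anisotropic Gaussian.
rng = np.random.default_rng(0)
n, T, k = 8, 5000, 3
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
scales = np.array([5.0, 4.0, 3.0, 1.0, 0.8, 0.6, 0.4, 0.2])
X = Q @ (scales[:, None] * rng.standard_normal((n, T)))    # columns are samples

F_pca = top_k_filters(X, k)
F_rand = np.linalg.qr(rng.standard_normal((n, k)))[0].T    # random orthonormal rows
total_variance = float(np.trace(X @ X.T / T))

print("reconstruction error (PCA, random):",
      reconstruction_error(F_pca, X), reconstruction_error(F_rand, X))
print("captured variance (PCA) and total variance:",
      projected_variance(F_pca, X), total_variance)
```

Because the two objectives differ only by the constant total variance when $F F^\top = I_k$, minimizing one is equivalent to maximizing the other on orthonormal filters.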

2. Derivation of Local Hebbian and Anti-Hebbian Update Rules

A defining characteristic of HPCA is that synaptic updates are functions of only locally available information — namely, the pre- and postsynaptic activities and, for laterally connected networks, activity of local neighbors. The architecture typically features:

Feedforward Hebbian term: Reinforcement of a synapse if presynaptic input $x_j$ and postsynaptic output $y_i$ are coactive; update proportional to $y_i x_j$ for weights $F_{ij}$ (Pehlevan et al., 2015).
Lateral anti-Hebbian term: Depression of lateral weights if $y_i$ and $y_j$ cofire; update proportional to $y_i y_j$ for the inhibitory lateral weights $M_{ij}$ (Pehlevan et al., 2015, Pehlevan et al., 2015).

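A general update (for the $i$-th output neuron) takes the form

$y_i = w_i^\top x - \sum_{j \neq i} M_{ij}\, y_j, \qquad \Delta w_i = \eta\, y_i\,(x - y_i w_i), \qquad \Delta M_{ij} = \eta\, y_i\,(y_j - y_i M_{ij}),$

where $w_i$ (the $i$-th row of $F$) is a feedforward weight vector, $M_{ij}$ are lateral (inhibitory) weights, and $\eta$ is a learning rate. In the nonlinear variant, $y_i$ is replaced by $f(y_i)$ (Lagani et al., 2020, Lagani et al., 2022). For deflationary or sequential extraction, only earlier component outputs ($j < i$) enter the decorrelation sum, as in the “Simple Hebbian PCA” (Johard et al., 2017).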

Distinct algorithms emerge from particular choices:

Sanger’s/GHA rule: Incorporates lateral weights to enforce orthogonality; requires $O(k^2)$ lateral parameters (Lagani et al., 2020, Johard et al., 2017).
Simple Hebbian PCA: A stacked node architecture omitting all lateral weights, orthogonalizing by subtracting explicit projections of earlier outputs (Johard et al., 2017).
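As an illustration of the sequential (deflationary) idea, the sketch below implements the textbook matrix form of Sanger's Generalized Hebbian Algorithm, $\Delta W = \eta\,(y x^\top - \mathrm{LT}[y y^\top]\, W)$, in which neuron $i$ subtracts the projections of neurons $j \le i$. It is a minimal sketch under assumptions: the learning rate, initialization, and synthetic data are illustrative choices, and it is not the specific implementation of any cited paper.

```python
import numpy as np

def gha_step(W, x, lr):
    """One Sanger/GHA update for filters W (k x n) on a single sample x (n,)."""
    y = W @ x
    # Hebbian term y x^T minus a lower-triangular decorrelation term:
    # row i only subtracts contributions of outputs j <= i.
    return W + lr * (np.outer(y, x) - np.tril(np.outer(y, y)) @ W)

def run_gha(X, k, lr=0.002, seed=0):
    """Stream the columns of X through the GHA update once."""
    rng = np.random.default_rng(seed)
    W = 0.1 * rng.standard_normal((k, X.shape[0]))   # small random start (assumption)
    for x in X.T:
        W = gha_step(W, x, lr)
    return W

# Synthetic data with known principal directions (columns of Q).
rng = np.random.default_rng(1)
n, T, k = 5, 30000, 2
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
stds = np.array([3.0, 2.0, 1.0, 0.5, 0.5])
X = Q @ (stds[:, None] * rng.standard_normal((n, T)))

W = run_gha(X, k)
# |cosine| between each learned filter and the matching true principal direction.
alignment = np.abs(np.sum(W * Q[:, :k].T, axis=1)) / np.linalg.norm(W, axis=1)
print("alignment with true components:", alignment)
```

The lower-triangular mask is what makes the extraction ordered: the first filter follows Oja's rule on the raw input, and each later filter sees the input with earlier components removed.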

3. Network Architectures and Algorithmic Implementations

3.1 Single-Layer Networks

HPCA rules are implementable in single-layer architectures, with each principal neuron receiving weighted input from the data and, optionally, lateral inputs from other neurons. The update rules can be written as:

Activity dynamics:

$y_i \leftarrow w_i^\top x - \sum_{j \neq i} M_{ij}\, y_j,$

iterated to a fixed point (Pehlevan et al., 2015, Pehlevan et al., 2015).

Weights update (local):

$w_i \leftarrow w_i + \frac{y_i\,(x - y_i w_i)}{D_i},$

$M_{ij} \leftarrow M_{ij} + \frac{y_i\,(y_j - y_i M_{ij})}{D_i} \quad (j \neq i),$

where $w_i$ is the feedforward weight vector of neuron $i$, $M_{ij}$ are lateral weights, and $D_i$, accumulated as $D_i \leftarrow D_i + y_i^2$, is a running estimate of postsynaptic activity (Pehlevan et al., 2015, Pehlevan et al., 2015).
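A single-layer Hebbian/anti-Hebbian network of this kind can be sketched in a few lines of NumPy. The sketch below rests on stated assumptions rather than on a verified transcription of the cited algorithms: feedforward and lateral weights receive Oja-like local updates with a per-neuron step size $1/D_i$, where $D_i$ accumulates squared output activity; the fixed point of the activity dynamics is obtained by a direct linear solve instead of iteration; and the initial values, the function names, and the synthetic data are hypothetical.

```python
import numpy as np

def hebbian_antihebbian_pca(X, k, d0=1.0, seed=0):
    """Online single-layer network; the columns of X are presented one at a time."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    W = 0.1 * rng.standard_normal((k, n))   # feedforward weights (Hebbian)
    M = np.zeros((k, k))                    # lateral weights (anti-Hebbian), zero diagonal
    D = np.full(k, d0)                      # running sum of squared outputs per neuron
    for x in X.T:
        # Fixed point of y <- W x - M y, solved directly instead of iterating.
        y = np.linalg.solve(np.eye(k) + M, W @ x)
        D += y ** 2
        # Local updates: each uses only pre/post activity and the neuron's own D_i.
        W += (np.outer(y, x) - (y ** 2)[:, None] * W) / D[:, None]
        M += (np.outer(y, y) - (y ** 2)[:, None] * M) / D[:, None]
        np.fill_diagonal(M, 0.0)            # no self-inhibition
    F = np.linalg.solve(np.eye(k) + M, W)   # effective filters, so that y = F x
    return F, W, M, D

def subspace_error(F, U):
    """Norm of the part of span(rows of F) lying outside span(columns of U)."""
    basis = np.linalg.qr(F.T)[0]
    return float(np.linalg.norm(basis - U @ (U.T @ basis)))

# Synthetic data with a clear gap after the second principal direction.
rng = np.random.default_rng(3)
n, T, k = 6, 10000, 2
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
stds = np.array([3.0, 2.0, 0.1, 0.1, 0.1, 0.1])
X = Q @ (stds[:, None] * rng.standard_normal((n, T)))

F, W, M, D = hebbian_antihebbian_pca(X, k)
err = subspace_error(F, Q[:, :k])
print("subspace error:", err)
```

Updating $D_i$ before the weights keeps the effective step $y_i^2 / D_i$ at or below one, so each update is a bounded interpolation between the old weight and the new Hebbian term.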

3.2 Layer-Wise and Deep Extensions

In neural networks, HPCA can be applied as a layer-wise unsupervised pre-training algorithm, with filters trained by the local HPCA rule (including nonlinearities as desired) (Lagani et al., 2022, Lagani et al., 2020). After unsupervised HPCA pre-training, the entire network can be fine-tuned by supervised SGD or hybrid local-global rules.

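Sample pseudocode for the nonlinear HPCA rule per convolutional layer (Lagani et al., 2022):

for each patch $x$ extracted from the layer input:
    compute $y_i = w_i^\top x$ for every filter $w_i$
    update $w_i \leftarrow w_i + \eta\, f(y_i)\,\big(x - \sum_{j \le i} f(y_j)\, w_j\big)$

where $f$ is the nonlinearity and $\eta$ the learning rate.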
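A patch-based version of the nonlinear rule can be sketched as follows. This is a hedged illustration, not the cited implementation: it assumes the update $\Delta w_i = \eta\, f(y_i)\,(x - \sum_{j \le i} f(y_j)\, w_j)$ with $f = \tanh$, single-channel inputs, stride-1 patches, and patch-mean removal, and the names `extract_patches`, `nonlinear_hpca_step`, and `train_layer` are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def extract_patches(image, size):
    """All size x size patches of a 2-D image (stride 1), flattened to rows."""
    windows = sliding_window_view(image, (size, size))
    return windows.reshape(-1, size * size)

def nonlinear_hpca_step(W, x, lr, f=np.tanh):
    """One update of filters W (k x d) on a flattened patch x (d,)."""
    a = f(W @ x)                                  # nonlinear outputs f(y_i)
    partial = np.cumsum(a[:, None] * W, axis=0)   # row i: sum over j <= i of f(y_j) w_j
    return W + lr * a[:, None] * (x[None, :] - partial)

def train_layer(images, k, size=3, lr=0.01, seed=0):
    """Learn k filters of shape size x size from a stack of 2-D images."""
    rng = np.random.default_rng(seed)
    W = 0.1 * rng.standard_normal((k, size * size))
    for image in images:
        patches = extract_patches(image, size)
        patches = patches - patches.mean(axis=1, keepdims=True)   # remove patch mean
        for x in patches:
            W = nonlinear_hpca_step(W, x, lr)
    return W.reshape(k, size, size)

rng = np.random.default_rng(0)
images = rng.standard_normal((4, 12, 12))   # random stand-in for real inputs
filters = train_layer(images, k=5)
print(filters.shape)
```

With the identity in place of `np.tanh`, the step reduces to the linear sequential rule, which is a convenient sanity check when adapting the sketch.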

4. Convergence Properties and Global Stability

Global convergence of HPCA dynamics has been established:

Two-phase convergence: In phase 1, the neural filters orthonormalize exponentially fast; in phase 2, the system follows a gradient flow on a nonconvex potential, whose global minima correspond to the true PCA subspace (Lipshutz et al., 19 Jan 2026).
- Orthonormalization manifold: Solutions are driven to $F F^\top = I_k$.
- Gradient descent on potential: On the orthonormal set, $F$ follows the gradient flow of a nonconvex potential whose global minima correspond to the principal subspace.
Attractor structure: Only the top-$k$ eigenspace is linearly stable; all other (nonprincipal) equilibria are unstable and of zero measure (Lipshutz et al., 19 Jan 2026, Pehlevan et al., 2015, Pehlevan et al., 2015).
Biological interpretability: The dynamics use only local (biologically plausible) information, and such networks can “drop out” underutilized units and recruit new ones if the input distribution drifts (Pehlevan et al., 2015).
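The qualitative picture of orthonormalization followed by convergence to the principal subspace can be observed in a much simpler system. The sketch below integrates Oja's subspace flow, $\dot F = F C (I - F^\top F)$, with explicit Euler steps and logs both diagnostics. Note the substitution: this is a classical stand-in chosen for brevity, not the similarity-matching dynamics analysed in the cited convergence result, and the covariance, step size, and initialization are illustrative assumptions.

```python
import numpy as np

def oja_subspace_flow(F0, C, dt=0.01, steps=5000):
    """Euler integration of dF/dt = F C (I - F^T F), logging two diagnostics."""
    F = F0.copy()
    k, n = F.shape
    _, eigvecs = np.linalg.eigh(C)
    U = eigvecs[:, ::-1][:, :k]                  # top-k eigenvectors of C
    P_minor = np.eye(n) - U @ U.T                # projector onto the minor subspace
    orth_err, sub_err = [], []
    for _ in range(steps):
        F = F + dt * F @ C @ (np.eye(n) - F.T @ F)
        orth_err.append(np.linalg.norm(F @ F.T - np.eye(k)))   # distance from F F^T = I
        sub_err.append(np.linalg.norm(F @ P_minor))            # leakage outside top-k space
    return F, np.array(orth_err), np.array(sub_err)

rng = np.random.default_rng(0)
C = np.diag([4.0, 3.0, 1.0, 0.5, 0.2])           # known spectrum, gap after k = 2
F0 = 0.5 * rng.standard_normal((2, 5))
F, orth_err, sub_err = oja_subspace_flow(F0, C)
print("final orthonormality error:", orth_err[-1])
print("final subspace error:", sub_err[-1])
```

Plotting `orth_err` and `sub_err` on a logarithmic axis is a simple way to compare how quickly each quantity decays for a given spectrum.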

5. Algorithmic Complexity, Parameterization, and Comparison

HPCA’s per-iteration computational and communication requirements depend on the architectural instantiation:

Method	Main Weights	Orthogonalization	Communication
Batch SVD	$\min_Y \\| X^\top X - Y^\top Y \\|_F^2,$ 7	Full matrix op.	Centralized
Oja/GHA	$\min_Y \\| X^\top X - Y^\top Y \\|_F^2,$ 8	Lateral weights	Local
Simple Hebbian PCA	$\min_Y \\| X^\top X - Y^\top Y \\|_F^2,$ 9	Relay outputs	$Y$ 0/step

Parameter reduction: Methods that omit lateral weights, such as Simple Hebbian PCA, halve the storage demands compared to GHA (Johard et al., 2017).
Distributed/streaming suitability: Feedforward-only HPCA rules (without lateral weights) are well-adapted to streaming or decentralized scenarios (Johard et al., 2017).
Normalization: Many rules require explicit or implicit normalization of each weight vector ($\|w_i\| = 1$) to constrain growth (Lagani et al., 2020, Johard et al., 2017).
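The difference between explicit and implicit normalization can be made concrete for a single neuron. In the sketch below, the first function applies a plain Hebbian step followed by renormalization, while the second uses Oja's rule, whose decay term $-\eta\, y^2 w$ keeps the norm near one without an explicit division. Learning rate, initialization, and data are illustrative assumptions.

```python
import numpy as np

def hebb_explicit_norm(X, lr=0.002, seed=0):
    """Plain Hebbian step followed by explicit renormalization to unit length."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(X.shape[0])
    w /= np.linalg.norm(w)
    for x in X.T:
        w = w + lr * (w @ x) * x      # the Hebbian term alone would grow without bound
        w /= np.linalg.norm(w)        # explicit normalization
    return w

def oja_implicit_norm(X, lr=0.002, seed=0):
    """Oja's rule: the decay term -lr * y^2 * w normalizes implicitly."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(X.shape[0])
    w /= np.linalg.norm(w)
    for x in X.T:
        y = w @ x
        w = w + lr * y * (x - y * w)
    return w

rng = np.random.default_rng(2)
n, T = 4, 20000
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
stds = np.array([3.0, 1.0, 0.5, 0.5])
X = Q @ (stds[:, None] * rng.standard_normal((n, T)))
top = Q[:, 0]                          # true leading principal direction

w_hebb = hebb_explicit_norm(X)
w_oja = oja_implicit_norm(X)
print("norms:", np.linalg.norm(w_hebb), np.linalg.norm(w_oja))
print("alignment:", abs(w_hebb @ top), abs(w_oja @ top) / np.linalg.norm(w_oja))
```

The explicit variant needs the norm of the whole weight vector at every step, whereas the implicit variant uses only the neuron's own output, which is why the latter is usually regarded as the more local of the two.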

6. Practical Applications and Empirical Results

HPCA’s locality, online adaptability, and plug-in compatibility with deep networks support several practical uses:

Feature extraction for CBIR: Pre-training convolutional layers with nonlinear HPCA rules boosts mean average precision (mAP) in content-based image retrieval on the CIFAR-10 and CIFAR-100 datasets, especially in low-label regimes (mAP gains of up to +3.6% over VAE pre-training or no pre-training) (Lagani et al., 2022).
Hybrid deep architectures: Replacing the higher layers of a CNN with HPCA-trained layers yields performance competitive with full backpropagation at reduced training cost (e.g., Conv5+classifier trained by HPCA: 83.9% vs 84.95% on CIFAR-10) (Lagani et al., 2020).
Speed: HPCA convergence (for practical DNNs) typically occurs in 1–2 epochs, versus 10–20 for supervised SGD (Lagani et al., 2020).
Transfer learning and resource-constrained training: Efficient shallow models can be trained on devices with restricted resources or for fast adaptation (Lagani et al., 2020).
Neuroscience predictions: The dropout of neurons with low cumulative activity and anti-Hebbian suppression of correlated outputs map to observed biological phenomena such as synaptic silencing or inhibitory potentiation (Pehlevan et al., 2015).

Empirically, under i.i.d. Gaussian data, HPCA convergence is expected to scale as $Y$ 2 steps; recovered subspace error decays as $Y$ 3 for standard learning rates and eigenvalue gaps (Johard et al., 2017). However, several works note the absence of direct experimental benchmarks in some theoretical proposals, with such validation identified as future work (Johard et al., 2017).

7. Extensions, Limitations, and Open Questions

Limitations: When replacing mid-network layers in deep architectures, HPCA’s performance can lag behind standard backpropagation. Kernel extensions and ICA-based rules are suggested as future enhancements (Lagani et al., 2020). Theoretical analysis for adversarial robustness and generalization also remains open.
Nonstationarity adaptation: Exponential forgetting in the strain cost (a discount factor that down-weights older samples) allows HPCA to track drifting statistics, which is beneficial in nonstationary environments (Pehlevan et al., 2015).
Comparison to classical methods: HPCA strictly enforces local learning and exact PCA subspace extraction, distinguishing it from models that perform only partial decorrelation or require global information.
Stability analyses: Global convergence has been established for the matched-timestep ODE case; further work explores timescale separation and robustness to input non-Gaussianity or higher-order dependence (Lipshutz et al., 19 Jan 2026).
Hybrid and supervised variants: Incorporation of supervised Hebbian rules in final layers, and combination with standard SGD/backprop for end-to-end training (Lagani et al., 2020, Lagani et al., 2022).
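The effect of exponential forgetting on tracking can be illustrated outside the network setting. The sketch below is not the HPCA network itself: it maintains an exponentially discounted covariance estimate, $C_t = \beta\, C_{t-1} + (1 - \beta)\, x_t x_t^\top$, and reads off its leading eigenvector after every sample. The discount factor `beta`, the abrupt switch in the data, and all names are assumptions made for illustration.

```python
import numpy as np

def discounted_top_direction(X, beta=0.99):
    """Leading eigenvector of an exponentially discounted covariance, per sample."""
    n = X.shape[0]
    C = np.zeros((n, n))
    directions = []
    for x in X.T:
        C = beta * C + (1.0 - beta) * np.outer(x, x)   # older samples decay as beta^age
        directions.append(np.linalg.eigh(C)[1][:, -1]) # eigenvector of largest eigenvalue
    return np.array(directions)

# Drifting input: the dominant direction switches from axis 0 to axis 1 halfway.
rng = np.random.default_rng(4)
half = 2000
phase1 = np.array([3.0, 0.3, 0.3])[:, None] * rng.standard_normal((3, half))
phase2 = np.array([0.3, 3.0, 0.3])[:, None] * rng.standard_normal((3, half))
X = np.concatenate([phase1, phase2], axis=1)

directions = discounted_top_direction(X)
print("before switch:", np.abs(directions[half - 1]))
print("after switch: ", np.abs(directions[-1]))
```

A smaller `beta` shortens the effective memory and speeds up tracking at the cost of a noisier estimate, which is the same trade-off a forgetting factor introduces in the online network.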

Further research directions include extensions to kernel-based and independent component analysis rules, scaling HPCA to very large or heterogeneous datasets, and integration with neuromorphic hardware for energy-efficient processing (Lagani et al., 2020).