Orthogonal Attractor States in Neural Networks

Updated 28 November 2025

Orthogonal attractor states are emergent neural representations in attractor networks characterized by near mutual orthogonality, enabling efficient memory storage and pattern completion.
The Free Energy Principle guides network dynamics along with Hebbian/anti-Hebbian learning to naturally enforce orthogonality and enhance generalization.
Empirical measures such as Pearson correlations and angular deviations validate that attractor states form an approximate orthonormal basis, reducing redundancy.

Orthogonal attractor states are emergent neural representations in attractor networks characterized by (approximate) mutual orthogonality, thereby maximizing representational capacity while minimizing redundancy. These states are fundamental to both biological and artificial intelligence systems, serving as prototypical explanations for efficient memory storage, pattern completion, and generalization within complex dynamical systems. Recent work demonstrates how such states can arise naturally from the Free Energy Principle (FEP) applied to random dynamical systems, without requiring explicitly imposed learning or inference rules (Spisak et al., 28 May 2025).

1. Variational Free-Energy Principle and Attractor Networks

The Free Energy Principle formulates the dynamics of complex systems in terms of the minimization of variational free energy. For a system with internal states $\mu$ , sensory states $s$ , active states $a$ , and external states $\eta$ , the FEP posits the minimization of the variational free energy functional:

$F(s, a, \mu) = \mathbb{E}_{q_\mu(\eta)}\left[\ln q_\mu(\eta) - \ln p(s, a, \eta)\right]$

where $q_\mu(\eta)$ is the approximate posterior over external states, and $p(s,a,\eta)$ denotes the generative model (joint probability of all states). Internal states $\mu$ undergo dynamics

$\dot{\mu} = -\nabla_\mu F(s, a, \mu)$

such that $\mu$ infers the most likely causes $\eta$ given sensory and active states, enforcing a Markov-blanket structure that maintains separation between internal and external dynamics. When applied to recurrent networks, this framework yields attractor landscapes where local minima correspond to stable memory or concept states.

2. Inference Dynamics: Hopfield-like State Updates

For a network node $\sigma_i$ , local free-energy minimization can be recast as

$F_i = D_{KL}\left[q(\sigma_i)\,\|\,p(\sigma_i)\right] - \mathbb{E}_{q(\sigma_i)}\left[\ln p(\sigma_{\backslash i}\mid\sigma_i)\right]$

where $q(\sigma_i)$ and $p(\sigma_i)$ are continuous Bernoulli distributions each parameterized by a log-odds bias $b$ . Free-energy gradients yield the nodal update

$\sigma_i \leftarrow L\left(b_i + \sum_j J_{ij} \sigma_j\right)$

where $L(\cdot)$ denotes the Langevin function, providing a sigmoid-like activation. Thus, each $\sigma_i$ dynamically adjusts to the network's current state, reminiscent of Hopfield network inference but generalized under the FEP formalism (Spisak et al., 28 May 2025).

3. Hebbian/Anti-Hebbian Learning and Emergence of Orthogonality

Learning proceeds as a stochastic, online Hebbian-contrastive process by minimizing free energy with respect to the coupling $J_{ij}$ :

$\Delta J_{ij} \propto \sigma_i \sigma_j - L\left(b_i + \sum_k J_{ik}\sigma_k\right) \sigma_j$

where the first (Hebbian) term reflects observed correlations and the second (anti-Hebbian) subtracts predicted correlations. This learning rule is formally equivalent to Sanger's rule for online principal component analysis (PCA), which is known to enforce mutual orthogonality in learned components. As such, new patterns are stored in the subspace orthogonal to already-learned attractors, ensuring minimal overlap between representations (Spisak et al., 28 May 2025).

4. Analytical Mechanism for Attractor Orthogonalization

A key result is that attractors $\sigma^{(\mu)}$ arise as modes of the steady-state distribution:

$p(\boldsymbol{\sigma}) \propto \exp\left\{\sum_i b_i \sigma_i + \frac{1}{2}\sum_{i,j}J^\dagger_{ij} \sigma_i \sigma_j\right\}$

where $J^\dagger = \frac{1}{2}(J + J^\top)$ . The learning dynamics ensure that each new attractor is stored in the space orthogonal to previous attractors, and the weight matrix evolves as

$J \approx \sum_{\mu=1}^M \sigma^{(\mu)} \sigma^{(\mu)\top}$

such that $\sigma^{(\mu)} \cdot \sigma^{(\nu)} \approx 0$ for $\mu \ne \nu$ . Consequently, the coupling matrix $J$ becomes symmetric, with the attractors forming an approximate orthonormal basis. This orthogonalization increases memory capacity and reduces redundancy, tightly linking to minimization of the FEP's complexity term $D_{KL}[q\,\|\,p]$ (Spisak et al., 28 May 2025).

5. Structure and Spectral Properties of the Coupling Matrix

At convergence, only the symmetric interaction matrix $J^\dagger$ matters:

$J^\dagger = Q\Lambda Q^\top$

where $Q$ is the orthonormal matrix whose columns are the attractor vectors $[\sigma^{(1)},..., \sigma^{(M)}]$ and $\Lambda$ is the diagonal eigenvalue matrix. Each attractor thus becomes an eigenvector corresponding to a unique eigenvalue, ensuring global orthogonality and diagonalizability of $J^\dagger$ . The network's stationary distribution therefore reflects a decomposition into non-overlapping, maximally distinct modes (Spisak et al., 28 May 2025).

6. Empirical Measurements of Orthogonality

Orthogonality is empirically validated by assessing pairwise Pearson correlations $r_{\mu\nu} = \mathrm{corr}(\sigma^{(\mu)}, \sigma^{(\nu)})$ and the associated angular distances $\theta_{\mu\nu} = \arccos(r_{\mu\nu})$ . The mean-squared angular deviation from $90^\circ$ ,

$\Delta^2 = \frac{2}{M(M-1)}\sum_{\mu<\nu}\left(\theta_{\mu\nu}-\frac{\pi}{2}\right)^2$

quantifies the approach to orthogonality. Simulations demonstrate that, even for correlated inputs (e.g., $r=0.77$ ), learned attractors evolve to $r\approx -0.19$ , with $\theta\approx 100^\circ\pm15^\circ$ and low $\Delta^2$ . In more complex tasks (e.g., handwritten-digit recognition), grid searches over hyperparameters reveal regimes where attractors form an approximate orthonormal basis—mean pairwise correlations near zero and angular distributions tightly clustered around $90^\circ$ (Spisak et al., 28 May 2025).

7. Functional Implications and Theoretical Significance

The emergence of orthogonal attractor states through FEP-driven Hebbian/anti-Hebbian learning offers several computational advantages. Such networks maximize retrieval and one-shot generalization, efficiently span the input subspace, and enhance the mutual information between hidden causes and observable data. The self-organization of orthogonality provides a principled, biologically plausible route to high-capacity memory architectures—eliminating the need for explicitly designed orthogonalization or ad hoc training rules. This unifying theory links the FEP, emergent attractor dynamics, and orthogonal representations, providing novel insights for both neuroscience and artificial intelligence (Spisak et al., 28 May 2025).

PDF Markdown Chat (Pro)

References (1)

Self-orthogonalizing attractor neural networks emerging from the free energy principle (2025)

Whiteboard

Generate a whiteboard explanation of this topic.

Follow Topic

Get notified by email when new papers are published related to Orthogonal Attractor States.