Continuous-State Hopfield Networks

Updated 18 November 2025
  • Continuous-state Hopfield networks are continuous-valued associative memory models that generalize binary Hopfield models; their dynamics can be cast as natural gradient flows on a Riemannian manifold.
  • They integrate deterministic and stochastic dynamics, leveraging mirror descent, Wasserstein gradient flows, and proximal algorithms to optimize memory retrieval.
  • Modern implementations achieve exponential storage capacity with robust attractor landscapes, linking attention mechanisms and thermodynamic principles.

A continuous-state Hopfield network generalizes classical binary Hopfield associative memories to systems where each neuron’s state evolves continuously. These models encompass both continuous-time recurrent dynamical systems and a large class of discrete- or continuous-time, continuous-valued memory architectures. Modern research has established diverse geometric, thermodynamic, and algorithmic perspectives on their dynamics, storage capacity, and attractor structure, including links to natural gradient flows, Wasserstein geometry, nonequilibrium thermodynamics, and attention mechanisms.

1. Deterministic Continuous-State Hopfield Dynamics

The core deterministic model comprises $n$ neurons with state $x \in (0,1)^n$ and hidden state $x_H \in \mathbb{R}^n$, minimizing a smooth cost function $f(x) \geq 0$. The continuous-time dynamics are given by

$$\frac{d x_H}{d t} = -\nabla f(x), \qquad x = \sigma(x_H),$$

where each $\sigma_i$ is a strictly increasing $C^1$ homeomorphism $\mathbb{R} \to (0,1)$. Eliminating $x_H$ yields an ODE on $x$,

$$\frac{d x}{dt} = -G(x)^{-1} \nabla f(x), \qquad G(x) = \mathrm{diag}\left(1/\sigma'_i(\sigma^{-1}_i(x_i))\right).$$

This is natural gradient descent over the Riemannian manifold $M=(0,1)^n$ under the metric tensor $G(x)$, with monotonic decrease of the Lyapunov energy $E(x)=f(x)$ and convergence to equilibrium points. The geometric structure is governed by the choice of activation. For example, with

$$\sigma_i(u) = \tfrac12\left[\tanh\big(\beta_i(u - \tfrac12)\big) + 1\right]$$

the induced metric is

$$g_{ii}(x) = \frac{1}{2\beta_i\, x_i(1-x_i)},$$

so that trajectories follow steepest-descent directions of an explicitly non-Euclidean metric (Halder et al., 2019).

Natural gradient flow dynamics are equivalent to mirror descent for an appropriate convex mirror map $\psi$, where each step in dual space preserves the geometric structure. The choice of $\sigma$ thus encodes both memory dynamics and the underlying geometry, enabling flexible control over trajectory structure and stationary points.
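As a concrete (and deliberately minimal) illustration of these dynamics, the sketch below integrates the hidden-state flow $dx_H/dt = -\nabla f(x)$, $x = \sigma(x_H)$ with the scaled-tanh activation above and a toy quadratic cost; all parameter values (gains, step size, cost) are illustrative assumptions rather than choices from the cited work.

```python
import numpy as np

# Minimal sketch of the deterministic flow dx_H/dt = -grad f(x), x = sigma(x_H),
# with the scaled-tanh activation and a toy quadratic cost f.
# All parameter values are illustrative assumptions.

n = 4
beta = np.full(n, 4.0)                    # per-neuron gains beta_i (assumed)
target = np.array([0.9, 0.1, 0.8, 0.2])   # interior minimizer of the toy cost

def grad_f(x):
    # f(x) = 0.5 * ||x - target||^2  =>  grad f(x) = x - target
    return x - target

def sigma(u):
    # sigma_i(u) = 0.5 * [tanh(beta_i * (u - 1/2)) + 1], a homeomorphism R -> (0,1)
    return 0.5 * (np.tanh(beta * (u - 0.5)) + 1.0)

# Forward-Euler integration of the hidden-state dynamics
x_H = np.zeros(n)
dt = 0.05
for _ in range(4000):
    x_H -= dt * grad_f(sigma(x_H))

x = sigma(x_H)
print("final visible state:", np.round(x, 3))                 # approaches `target`
print("energy E(x) = f(x) :", 0.5 * np.sum((x - target) ** 2))
```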

2. Stochastic Extensions and Wasserstein Geometry

Introducing isotropic, state-dependent noise at fixed temperature $T=\beta^{-1}$ produces the diffusion machine

$$dx = -G^{-1}\nabla E\,dt + \sqrt{2\beta^{-1} G^{-1}}\,dW,$$

with associated Fokker–Planck dynamics for the state probability density $\rho(x,t)$:

$$\partial_t \rho = \nabla \cdot \left[\rho\, G^{-1}\nabla f + T\, G^{-1}\nabla \rho\right] = \nabla \cdot \left[\rho\, G^{-1}\nabla \big(f + T \log \rho\big)\right].$$

The corresponding free-energy functional,

$$F[\rho] = \int \rho(x)\, f(x)\,dx + T \int \rho(x)\log \rho(x)\,dx,$$

acts as a Lyapunov function for the infinite-dimensional evolution. This evolution is a Wasserstein gradient flow under a ground metric defined by $G(x)$, providing a variational and geometric framework for understanding probabilistic Hopfield evolution and the long-term structure of state distributions. The squared Wasserstein-$G$ distance between densities,

$$W_G^2(\mu, \nu) = \inf_{\pi \in \Pi(\mu, \nu)} \int d_G(x, y)^2 \,\pi(dx, dy),$$

directly links the local geometry of the activation functions to the global evolution of ensembles of network states (Halder et al., 2019).
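For intuition, the following sketch runs an Euler–Maruyama simulation of the diffusion machine in the simplest Euclidean special case $G = I$, where the SDE reduces to overdamped Langevin dynamics and the stationary density is the Gibbs measure $\propto \exp(-f/T)$; the temperature, step size, and quadratic toy cost are assumptions for illustration (the state-dependent metric case would additionally require handling the multiplicative noise).

```python
import numpy as np

# Euler-Maruyama simulation of the diffusion machine in the special case G = I,
# where dx = -grad f(x) dt + sqrt(2 T) dW and the stationary density is
# proportional to exp(-f(x)/T). Parameters below are illustrative assumptions.

rng = np.random.default_rng(0)

T = 0.1                      # temperature T = 1/beta
dt, n_steps = 1e-3, 20_000
n_particles = 2_000

def grad_f(x):
    # toy quadratic energy f(x) = 0.5 * ||x - 0.5||^2
    return x - 0.5

x = rng.uniform(0.0, 1.0, size=(n_particles, 2))   # ensemble of 2-D states
for _ in range(n_steps):
    x += -grad_f(x) * dt + np.sqrt(2.0 * T * dt) * rng.standard_normal(x.shape)

# For this quadratic f the Gibbs measure is Gaussian: mean 0.5, variance T.
print("empirical mean    :", x.mean(axis=0))   # ~ [0.5, 0.5]
print("empirical variance:", x.var(axis=0))    # ~ [0.1, 0.1]
```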

3. Modern Hopfield Networks and Continuous-Time Memory

"Modern" Hopfield networks encode patterns xiRdx_i \in \mathbb{R}^d via a log-sum-exponential energy: E(x)=1βlogi=1Nexp(βxix)+12xx+const,E(x) = -\frac{1}{\beta} \log \sum_{i=1}^N \exp(\beta x_i^\top x) + \frac{1}{2}x^\top x + \mathrm{const}, resulting in a parallel update rule equivalent to transformer-style attention heads: xnew=Xsoftmax(βXx),x^\text{new} = X\, \mathrm{softmax}(\beta X^\top x), where XX is the matrix of stored patterns (Ramsauer et al., 2020, Schäfl et al., 2022). These systems exhibit single-pattern attractors, metastable subset averages, and global fixed-point attractors, with provably exponential storage capacity in dimension and global convergence. The retrieval error after one update is exponentially small in the pattern separation.

Recent work extends this formulation to continuous-time memories, replacing the discrete sum over memories by an integral,

$$E(m; \bar{x}(\cdot)) = -\tfrac{1}{\beta} \log \int_0^1 \exp\big(\beta\,\bar{x}(t)^\top m\big)\,dt + \tfrac{1}{2}\|m\|^2 + \mathrm{const},$$

where $\bar{x}(t)=B^\top\psi(t)$ is a compressed, continuous representation. The dynamics become

$$\frac{dm}{d\tau} = \int_0^1 p(t \mid m)\,\bar{x}(t)\,dt - m,$$

with the softmax replaced by a Gibbs density over the continuum. Empirical evidence shows that such compression retains retrieval quality while reducing computational resources when $N \ll L$, where $N$ is the number of basis functions and $L$ the original number of patterns (Santos et al., 14 Feb 2025).
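The sketch below illustrates this continuous-memory update under simplifying assumptions: a Gaussian RBF basis $\psi(t)$, $B$ fitted to $L$ sampled patterns by ridge least squares, and the integrals over $t$ replaced by a uniform quadrature grid. It is meant only to make the update rule concrete, not to reproduce the construction of Santos et al.

```python
import numpy as np

# Rough sketch of the continuous-time memory update
#   m <- integral_0^1 p(t|m) xbar(t) dt,  p(t|m) proportional to exp(beta * xbar(t)^T m),
# with xbar(t) = B^T psi(t). The RBF basis, the ridge fit of B, and the
# quadrature grid are simplifying assumptions made for illustration only.

rng = np.random.default_rng(2)
d, L, N, beta = 32, 200, 40, 4.0          # state dim, #patterns, #basis fns

# L "original" patterns, viewed as samples of a signal x(t) on a grid in [0,1]
t_data = np.linspace(0.0, 1.0, L)
X = rng.standard_normal((L, d))

# Gaussian RBF basis psi(t) in R^N
centers = np.linspace(0.0, 1.0, N)
width = 1.5 / N
def psi(t):
    t = np.atleast_1d(t)
    return np.exp(-0.5 * ((t[:, None] - centers[None, :]) / width) ** 2)

# Compress: fit B (N x d) so that psi(t_i)^T B ~ x_i (ridge least squares)
Psi = psi(t_data)                                               # (L, N)
B = np.linalg.solve(Psi.T @ Psi + 1e-3 * np.eye(N), Psi.T @ X)  # (N, d)

def xbar(t):
    return psi(t) @ B                                           # (len(t), d)

t_grid = np.linspace(0.0, 1.0, 1000)       # quadrature grid for the t-integrals
Xbar = xbar(t_grid)

def update(m):
    logits = beta * Xbar @ m
    w = np.exp(logits - logits.max())
    w /= w.sum()                           # discretized Gibbs density p(t|m)
    return Xbar.T @ w

# Query near the memory at t = 0.3, then iterate the update to a fixed point
target = xbar(np.array([0.3]))[0]
m = target + 0.2 * rng.standard_normal(d)
for _ in range(20):
    m = update(m)

cos = m @ target / (np.linalg.norm(m) * np.linalg.norm(target))
print("cosine(retrieved m, xbar(0.3)):", round(float(cos), 3))
```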

4. Nonequilibrium Thermodynamics and Asymmetric Models

Continuous-time Hopfield-like associative memories have been generalized to low-rank, possibly asymmetric CTRNNs, with states $x_i$ evolving via

$$\tau_i\,\dot{x}_i = -x_i + \sum_j J_{ij}\,\phi(x_j) + \theta_i + \eta_i(t).$$

With odd sigmoid activations (typically $\phi(x)=\tanh(x)$), these models encode stored patterns in the coupling matrix $J_{ij}$, including through low-rank kernels parametrized by an asymmetric matrix $A$:

$$J_{ij} = \frac{1}{N}\sum_{a,b}A_{ab}\,\xi_i^a\xi_j^b.$$

A symmetric $J$ yields classical Lyapunov dynamics (monotonic energy decrease, fixed-point attractors). An asymmetric $J$ drives the system out of detailed balance, producing positive entropy production in steady state and cyclic or even chaotic attractors, and supports sequence retrieval and complex temporal evolution of macroscopic order parameters. The macroscopic observables (overlaps with stored patterns) satisfy closed deterministic (mean-field) or stochastic (finite-$N$) ODEs or SDEs, permitting direct study of entropy, dissipation, and the impact of nonequilibrium driving on memory structure (Aguilera et al., 14 Nov 2025).
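A toy numerical sketch of this low-rank CTRNN (with illustrative sizes and coupling strengths, not the exact setup of the cited work) contrasts a symmetric choice of $A$, which relaxes to a retrieval fixed point, with an asymmetric choice, which produces rotating, sequence-like overlap dynamics:

```python
import numpy as np

# Toy low-rank CTRNN sketch (illustrative parameters, not the cited setup):
#   tau * dx_i/dt = -x_i + sum_j J_ij tanh(x_j) + theta_i,
#   J = (1/N) Xi A Xi^T, with binary patterns Xi and coupling kernel A.

rng = np.random.default_rng(3)
N, P = 500, 2                               # neurons, stored patterns
tau, dt, steps = 1.0, 0.01, 4000            # total time = 40 tau

Xi = rng.choice([-1.0, 1.0], size=(N, P))   # random binary patterns

def simulate(A, x0):
    J = Xi @ A @ Xi.T / N
    x = x0.copy()
    overlaps = []
    for _ in range(steps):
        x += dt / tau * (-x + J @ np.tanh(x))        # theta_i = 0, no noise
        overlaps.append(Xi.T @ np.tanh(x) / N)       # overlaps m_a(t)
    return np.array(overlaps)

x0 = 0.9 * Xi[:, 0] + 0.1 * rng.standard_normal(N)   # cue near pattern 0

# Symmetric A: gradient-like dynamics, relaxation to a retrieval fixed point
m_sym = simulate(1.5 * np.eye(P), x0)
print("symmetric J : final overlaps", np.round(m_sym[-1], 2))

# Asymmetric A: detailed balance broken, overlaps rotate between the patterns
A_asym = 1.5 * np.eye(P) + np.array([[0.0, -1.0], [1.0, 0.0]])
m_asym = simulate(A_asym, x0)
print("asymmetric J: overlaps sampled every 8 tau\n", np.round(m_asym[::800], 2))
```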

5. Chaotic Dynamics and Piecewise-Affine Constructions

Continuous-state Hopfield networks with non-monotone, piecewise-affine activation functions and non-symmetric weight matrices may exhibit provable chaos. In one construction, a discrete-time network with

$$x(k+1) = F(x(k)), \qquad F_i(x) = f_i\big((W x)_i\big),$$

and special $f_i$ (with two breakpoints and non-monotonicity), generates a Cantor-set attractor with sensitive dependence on initial conditions. This construction exploits recent results in the topological dynamics of piecewise contractions and demonstrates that, for appropriate $W$, the $\omega$-limit set of typical orbits is a compact, minimal Cantor set. Hence, unlike monotone (gradient-flow) continuous Hopfield networks, these constructions furnish systems with uncountably many, non-periodic but repeatable memory patterns (Pires, 2022).
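As a generic illustration of sensitive dependence in this class of maps (not the specific piecewise-contraction construction of Pires), one can iterate a small network with a non-monotone piecewise-affine activation and a non-symmetric $W$ and watch two nearby initial conditions separate:

```python
import numpy as np

# Generic illustration of sensitive dependence in x(k+1) = F(x(k)),
# F_i(x) = f((W x)_i), with a non-monotone piecewise-affine activation (a tent
# function) and a non-symmetric W. Not the piecewise-contraction construction
# of Pires (2022); parameters are illustrative assumptions.

def tent(u):
    # piecewise-affine, non-monotone map of [0,1] onto itself
    return 1.0 - np.abs(2.0 * u - 1.0)

W = np.array([[0.9, 0.1],
              [0.2, 0.8]])       # non-symmetric; rows sum to 1, so W x stays in [0,1]^2

def step(x):
    return tent(W @ x)

x = np.array([0.123, 0.456])
y = x + np.array([1e-10, 0.0])   # nearby initial condition

for k in range(1, 61):
    x, y = step(x), step(y)
    if k % 10 == 0:
        print(f"k = {k:2d}   separation = {np.linalg.norm(x - y):.2e}")
# The separation grows by orders of magnitude before saturating at O(1),
# the hallmark of sensitive dependence on initial conditions.
```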

Table: Deterministic, Stochastic, and Chaotic Continuous-State Hopfield Dynamics

| Model Type | Governing Equation/Formulation | Typical Long-Term Behavior |
|---|---|---|
| Natural gradient (deterministic) | $dx/dt = -G(x)^{-1}\nabla f(x)$ (Halder et al., 2019) | Gradient descent to fixed-point attractor |
| Diffusion (stochastic) | $dx = -G^{-1}\nabla E\,dt + \sqrt{2\beta^{-1}G^{-1}}\,dW$ (Halder et al., 2019) | Equilibrium measure under $F[\rho]$ |
| Modern (attention-based) | $x^{\mathrm{new}} = X\,\mathrm{softmax}(\beta X^\top x)$ (Ramsauer et al., 2020, Santos et al., 14 Feb 2025) | Single update to stored/metastable state |
| Asymmetric CTRNN | $\dot{x}_i = -x_i + \sum_j J_{ij}\phi(x_j) + \theta_i$ (Aguilera et al., 14 Nov 2025) | Limit cycle/chaos if $J$ asymmetric |
| Piecewise-affine (chaotic) | $x(k+1) = F(x(k))$, $F_i$ piecewise-affine, non-monotone (Pires, 2022) | Chaotic Cantor-set attractors |

6. Proximal Algorithms and Computational Aspects

Continuous-state Hopfield models admit powerful algorithmic implementations:

  • Proximal Steps: Discretizing natural gradient flows yields variable-metric Moreau–Yosida proximal operators,

$$x_{k+1} = \arg\min_{x \in M}\; \tfrac{1}{2}\,d_G(x, x_k)^2 + h\,E(x),$$

where $d_G$ is the geodesic distance under $G(x)$.

  • JKO Schemes: For measure-valued stochastic dynamics, the Wasserstein gradient flow inspires Jordan–Kinderlehrer–Otto recursions in probability space,

$$\rho_k = \arg\min_{\rho \in P_2(M)}\; \tfrac{1}{2}\,W_G(\rho, \rho_{k-1})^2 + h\, F[\rho].$$

Efficient numerical solution employs gradient-based inner loops, mirror or natural gradient steps for the deterministic case, and particle or Sinkhorn-based methods for stochastic/diffusion cases (Halder et al., 2019).
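The sketch below implements one variable-metric proximal step with a plain gradient-descent inner loop, using the diagonal tanh-induced metric of Section 1, for which the geodesic distance separates per coordinate via $\phi_i(s) = \arcsin(2s-1)/\sqrt{2\beta_i}$; the cost function, step sizes, and iteration counts are illustrative assumptions.

```python
import numpy as np

# One variable-metric proximal step
#   x_{k+1} = argmin_x  0.5 * d_G(x, x_k)^2 + h * E(x)
# for the diagonal tanh-induced metric g_ii(x) = 1/(2 beta_i x_i (1 - x_i)).
# For this product metric the geodesic distance separates per coordinate via
# phi_i(s) = arcsin(2s - 1) / sqrt(2 beta_i). The inner solver is plain gradient
# descent; cost, step sizes, and iteration counts are illustrative assumptions.

n = 4
beta = np.full(n, 4.0)
target = np.array([0.9, 0.1, 0.8, 0.2])
h = 0.5                                        # proximal step size

def grad_E(x):
    return x - target                          # E(x) = 0.5 * ||x - target||^2

def phi(x):
    return np.arcsin(2.0 * x - 1.0) / np.sqrt(2.0 * beta)

def dphi(x):
    return 1.0 / np.sqrt(2.0 * beta * x * (1.0 - x))

def prox_step(xk, inner_iters=500, lr=0.05):
    x = xk.copy()
    for _ in range(inner_iters):
        # gradient of 0.5 * ||phi(x) - phi(xk)||^2 + h * E(x)
        g = (phi(x) - phi(xk)) * dphi(x) + h * grad_E(x)
        x = np.clip(x - lr * g, 1e-6, 1.0 - 1e-6)
    return x

x = np.full(n, 0.5)
for _ in range(30):
    x = prox_step(x)
print("proximal iterates approach the minimizer:", np.round(x, 3))
```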

In modern architectures, Hopfield and attention modules enable content-based lookup, pooling, and associative retrieval inside deep networks with fast, highly parallelizable updates, and massively increased capacity due to continuous representations (Schäfl et al., 2022, Santos et al., 14 Feb 2025).

7. Capacity, Retrieval, and Attractor Landscape

The attractor structure of continuous-state Hopfield networks extends the classic fixed-point paradigm:

  • Capacity: Modern continuous-state Hopfield networks store $O(\exp(c d))$ random patterns in $d$-dimensional space, a profound improvement over the $O(d)$ scaling of binary Hopfield models (Ramsauer et al., 2020, Santos et al., 14 Feb 2025); see the numerical sketch after this list.
  • Retrieval: Fixed-point and continuous-time memory networks retrieve patterns via rapid, typically single-step convergence (or through continuous flow), generalizing subset averaging and supporting robust attractor basins. For well-separated patterns, the retrieval error decays exponentially in the pattern separation.
  • Attractor types: Beyond fixed points, the landscape includes metastable subset averages and, in asymmetric or non-monotonic cases, cyclic and chaotic attractors. The structure is governed by the geometry of activations, symmetry properties of the weight matrix, and memory compression or continuous extension schemes.
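A quick, informal numerical check of the capacity claim (illustrative parameters, not a formal estimate): with $d = 32$, a single attention-style update retrieves stored patterns reliably even when the number of patterns far exceeds $d$.

```python
import numpy as np

# Informal capacity check: fraction of noisy queries retrieved by one update,
# as the number of stored patterns N grows far beyond the dimension d.
# beta, noise level, and the 5% error criterion are illustrative assumptions.

rng = np.random.default_rng(4)
d, beta, noise = 32, 1.0, 0.1

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

for N in (32, 256, 2048, 16384):
    X = rng.standard_normal((d, N))
    hits = 0
    for _ in range(100):
        j = rng.integers(N)
        query = X[:, j] + noise * rng.standard_normal(d)
        retrieved = X @ softmax(beta * X.T @ query)
        hits += np.linalg.norm(retrieved - X[:, j]) < 0.05 * np.linalg.norm(X[:, j])
    print(f"N = {N:6d}: {hits}/100 queries retrieved within 5% relative error")
```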

In sum, continuous-state Hopfield networks provide a unifying and highly expressive framework for memory, optimization, attention, and dynamical modeling, encompassing gradient-flow, probabilistic, and even chaotic regimes, with deep ties to modern architectures and theoretical advances in geometry, thermodynamics, and optimization (Halder et al., 2019, Santos et al., 14 Feb 2025, Aguilera et al., 14 Nov 2025, Pires, 2022, Ramsauer et al., 2020, Schäfl et al., 2022).
