Mean-Field Limits of Neural Networks

Updated 4 July 2026
  • Mean-field limits of neural networks are continuum descriptions that replace discrete neurons with a probability law over parameters as the width tends to infinity.
  • They use PDEs, integro-differential equations, and McKean–Vlasov formulations to describe how the network evolves during training.
  • This framework yields results on optimization, finite-width corrections, and global convergence.

Mean-field limits of neural networks are asymptotic descriptions in which a network with width tending to infinity, together with a compatible scaling of optimization, is replaced by a deterministic evolution of a probability law over parameters, neurons, functions, or paths. In this regime, empirical averages over neurons converge to continuum objects, stochastic gradient dynamics are recast as continuity equations, integro-differential systems, or McKean–Vlasov equations, and the resulting limit is generally nonlinear and feature-learning rather than a fixed-kernel linearization (Sirignano et al., 2019, Golikov, 2020, Araújo et al., 2019).

1. Scaling regimes and state variables

The basic two-layer mean-field model writes

$$f_m(x;\theta_1,\dots,\theta_m)=m^{-1}\sum_{i=1}^m a_i\,\phi(w_i^T x), \qquad \theta_i=(a_i,w_i)\in\mathbb R^{1+d_0},$$

with empirical measure

$$\rho_k^m=\frac1m\sum_{i=1}^m \delta_{\theta_i^{(k)}}.$$

The population loss is

$$R(\rho)=\mathbb E_{x,y}[\ell(y,f[\rho;x])], \qquad f[\rho;x]=\int a\,\phi(w^T x)\,\rho(da,dw).$$

Under mean-field scaling, initialization is $O(1)$, learning rates are width-dependent, and parameter increments are $O(1/m)$, so that finite parameter changes persist in the infinite-width limit (Golikov, 2020).
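The $1/m$ factor in the increments can be seen directly from the model. The following is a minimal sketch, assuming a tanh activation, a squared loss on a single sample, and hypothetical helper names `forward` and `grad_a`; none of these choices come from the cited papers. It duplicates every neuron, which leaves the empirical measure and the output unchanged, and compares the per-neuron gradients.

```python
import math

def forward(params, x):
    """Mean-field two-layer output f_m(x) = (1/m) * sum_i a_i * tanh(w_i . x)."""
    m = len(params)
    return sum(a * math.tanh(sum(wj * xj for wj, xj in zip(w, x)))
               for a, w in params) / m

def grad_a(params, x, y):
    """Gradient of the squared loss 0.5 * (f_m(x) - y)**2 w.r.t. each a_i."""
    m = len(params)
    residual = forward(params, x) - y
    return [residual * math.tanh(sum(wj * xj for wj, xj in zip(w, x))) / m
            for _, w in params]

# Two neurons theta_i = (a_i, w_i). Duplicating them leaves the empirical
# measure, and hence the network output, unchanged.
base = [(1.0, [0.5, -0.2]), (-0.7, [0.1, 0.9])]
doubled = base + base
x, y = [1.0, 2.0], 0.3

f_base, f_doubled = forward(base, x), forward(doubled, x)
g_base, g_doubled = grad_a(base, x, y), grad_a(doubled, x, y)
```

Doubling the width this way keeps `f_base` and `f_doubled` equal up to rounding while halving each per-neuron gradient, which is the $1/m$ factor behind the increment scaling.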

For deep fully connected networks, the scaling is layer dependent. In the formulation of Sirignano and Spiliopoulos, an $L$-hidden-layer network has forward normalization by $1/N_{\ell-1}$ at each hidden layer and $1/N_L$ at the output. For the $L$-layer case, the learning-rate scaling is

$$\alpha_C=\frac{N_L}{N_1},\qquad \alpha_{W,1}=1,\qquad \alpha_{W,\ell}=\frac{N_\ell}{N_1}\quad (\ell=2,\dots,L).$$

This scaling “balances contributions of each layer in the infinite-width limit”; with a constant rate, the network “freezes” (Sirignano et al., 2019).
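The layerwise factors are simple ratios of widths. A small sketch of how they could be tabulated is given here; the function name `layer_lr_scalings` is hypothetical, and how the factors multiply a base learning rate is left open because the text above does not specify it.

```python
def layer_lr_scalings(widths):
    """Given widths = [N_1, ..., N_L], return (alpha_C, [alpha_W1, ..., alpha_WL]).

    alpha_C = N_L / N_1, alpha_W1 = 1, alpha_Wl = N_l / N_1 for l = 2..L.
    """
    if not widths or any(n <= 0 for n in widths):
        raise ValueError("widths must be a non-empty list of positive numbers")
    n1, n_last = widths[0], widths[-1]
    alpha_c = n_last / n1
    alpha_w = [1.0] + [n / n1 for n in widths[1:]]
    return alpha_c, alpha_w

# Example with three hidden layers of widths N_1=100, N_2=200, N_3=400.
alpha_c, alpha_w = layer_lr_scalings([100, 200, 400])
```

With equal widths every factor equals 1, so the layer dependence only appears when the hidden layers have different sizes.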

Several mathematically distinct state descriptions have been developed for these limits.

| Setting | State variable | Limit evolution |
| --- | --- | --- |
| Two-layer mean-field training | measure over parameters $\theta=(a,w)$ | discrete map or parameter-space PDE |
| Deep sequential limit | empirical layerwise parameter distributions | deterministic integro-differential system / Liouville PDE |
| Three-layer neuronal embedding | abstract neurons on a fixed probability space | deterministic ODE system |
| Deep path-space limit | law of input-output paths | McKean–Vlasov ODE |
| Functional-space three-layer limit | measure on a functional space | kernel gradient flow with time-varying kernel |

The deep sequential formulation uses empirical parameter distributions over products of layerwise parameters and output weights, seeded by an initial law ρkm=1mi=1mδθi(k).\rho_k^m=\frac1m\sum_{i=1}^m \delta_{\theta_i^{(k)}}.4 with compact support and continuous densities (Sirignano et al., 2019). The three-layer neuronal-embedding framework places finite-width networks and the infinite-width limit on a single probability space ρkm=1mi=1mδθi(k).\rho_k^m=\frac1m\sum_{i=1}^m \delta_{\theta_i^{(k)}}.5, with deterministic anchor functions ρkm=1mi=1mδθi(k).\rho_k^m=\frac1m\sum_{i=1}^m \delta_{\theta_i^{(k)}}.6, so that initialization for all widths is coupled through sampled abstract neurons (Pham et al., 2021). A different deep construction treats each input-to-output parameter path as a particle and studies the empirical distribution of such paths, yielding a path-space McKean–Vlasov limit for deep networks with fixed random features near the input and output (Araújo et al., 2019). For partially trained three-layer models with a fixed random first layer, the relevant state is a measure on a functional space, ρkm=1mi=1mδθi(k).\rho_k^m=\frac1m\sum_{i=1}^m \delta_{\theta_i^{(k)}}.7, where neurons are represented by output weights and pre-activation functions (Chen et al., 2022).

These formulations are not interchangeable. A plausible implication is that “the” mean-field limit of a neural network is not a single universal object, but a family of asymptotic descriptions indexed by architecture, training rule, and parametrization.

2. Limiting equations and derivation methods

In the two-layer setting, discrete-time gradient descent on the empirical measure is expressed by a deterministic Markov operator: ρkm=1mi=1mδθi(k).\rho_k^m=\frac1m\sum_{i=1}^m \delta_{\theta_i^{(k)}}.8 where ρkm=1mi=1mδθi(k).\rho_k^m=\frac1m\sum_{i=1}^m \delta_{\theta_i^{(k)}}.9 pushes each atom by the parameter update R(ρ)=Ex,y[(y,f[ρ;x])],f[ρ;x]=aϕ(wTx)ρ(da,dw).R(\rho)=\mathbb E_{x,y}[\ell(y,f[\rho;x])], \qquad f[\rho;x]=\int a\,\phi(w^T x)\,\rho(da,dw).0. The corresponding continuous-time limit, obtained by rescaling time by R(ρ)=Ex,y[(y,f[ρ;x])],f[ρ;x]=aϕ(wTx)ρ(da,dw).R(\rho)=\mathbb E_{x,y}[\ell(y,f[\rho;x])], \qquad f[\rho;x]=\int a\,\phi(w^T x)\,\rho(da,dw).1, is the parameter-space PDE

R(ρ)=Ex,y[(y,f[ρ;x])],f[ρ;x]=aϕ(wTx)ρ(da,dw).R(\rho)=\mathbb E_{x,y}[\ell(y,f[\rho;x])], \qquad f[\rho;x]=\int a\,\phi(w^T x)\,\rho(da,dw).2

The induced network evolution is likewise closed at the level of the output function R(ρ)=Ex,y[(y,f[ρ;x])],f[ρ;x]=aϕ(wTx)ρ(da,dw).R(\rho)=\mathbb E_{x,y}[\ell(y,f[\rho;x])], \qquad f[\rho;x]=\int a\,\phi(w^T x)\,\rho(da,dw).3 through a kernel built from R(ρ)=Ex,y[(y,f[ρ;x])],f[ρ;x]=aϕ(wTx)ρ(da,dw).R(\rho)=\mathbb E_{x,y}[\ell(y,f[\rho;x])], \qquad f[\rho;x]=\int a\,\phi(w^T x)\,\rho(da,dw).4 and the current measure R(ρ)=Ex,y[(y,f[ρ;x])],f[ρ;x]=aϕ(wTx)ρ(da,dw).R(\rho)=\mathbb E_{x,y}[\ell(y,f[\rho;x])], \qquad f[\rho;x]=\int a\,\phi(w^T x)\,\rho(da,dw).5 (Golikov, 2020).
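For orientation, one standard way to write such a parameter-space PDE, using the population loss $R(\rho)$ defined above, is as a continuity equation driven by the first variation of $R$. The symbols $\rho_t$ and $\Psi$, the notation $\partial_2\ell$ for the derivative of the loss in its second argument, and the unit time normalization are assumptions of this sketch and need not match the conventions of the cited paper.

```latex
\partial_t \rho_t
  = \nabla_\theta \cdot \bigl( \rho_t \, \nabla_\theta \Psi(\theta;\rho_t) \bigr),
\qquad
\Psi(\theta;\rho)
  = \mathbb E_{x,y}\bigl[ \partial_2 \ell\bigl(y, f[\rho;x]\bigr)\, a\,\phi(w^T x) \bigr],
\qquad \theta=(a,w).
```

In this form each parameter moves with velocity $-\nabla_\theta \Psi(\theta;\rho_t)$, and the dependence of $\Psi$ on $\rho_t$ through $f[\rho_t;x]$ is what makes the evolution nonlinear.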

For deep networks, Sirignano and Spiliopoulos obtain a limit output

R(ρ)=Ex,y[(y,f[ρ;x])],f[ρ;x]=aϕ(wTx)ρ(da,dw).R(\rho)=\mathbb E_{x,y}[\ell(y,f[\rho;x])], \qquad f[\rho;x]=\int a\,\phi(w^T x)\,\rho(da,dw).6

where the parameter trajectories satisfy coupled deterministic ODEs indexed by initialization. In the two-hidden-layer case these equations govern R(ρ)=Ex,y[(y,f[ρ;x])],f[ρ;x]=aϕ(wTx)ρ(da,dw).R(\rho)=\mathbb E_{x,y}[\ell(y,f[\rho;x])], \qquad f[\rho;x]=\int a\,\phi(w^T x)\,\rho(da,dw).7, R(ρ)=Ex,y[(y,f[ρ;x])],f[ρ;x]=aϕ(wTx)ρ(da,dw).R(\rho)=\mathbb E_{x,y}[\ell(y,f[\rho;x])], \qquad f[\rho;x]=\int a\,\phi(w^T x)\,\rho(da,dw).8, and R(ρ)=Ex,y[(y,f[ρ;x])],f[ρ;x]=aϕ(wTx)ρ(da,dw).R(\rho)=\mathbb E_{x,y}[\ell(y,f[\rho;x])], \qquad f[\rho;x]=\int a\,\phi(w^T x)\,\rho(da,dw).9, while in measure form each layer obeys a continuity equation

O(1)O(1)0

The velocity fields are explicit nonlocal functionals of O(1)O(1)1, O(1)O(1)2, and the current network output (Sirignano et al., 2019).

The deep path-space theory replaces neuronwise empirical measures by a law O(1)O(1)3 over parameter paths. A typical particle satisfies a self-consistent ODE

O(1)O(1)4

with drift obtained by replacing finite-width backpropagation averages by integrals under O(1)O(1)5. This yields existence, uniqueness, and propagation of chaos for the simultaneous width-O(1)O(1)6 limit of all hidden layers (Araújo et al., 2019).

The proofs of these limits rely on different technical packages. The sequential deep limit uses tightness in Skorokhod space, uniform moment bounds via Gronwall, martingale-vanishing arguments, identification by test-function calculus, and uniqueness via coupling or fixed-point arguments (Sirignano et al., 2019). The discrete-time two-layer theory proves convergence by showing that the operator O(1)O(1)7 is Lipschitz-continuous in O(1)O(1)8 and then inducting on the iteration index (Golikov, 2020). The deep McKean–Vlasov path-space theory constructs a closed subset of measures on path space on which the McKean map is well defined and eventually contracting under Wasserstein distance (Araújo et al., 2019). The neuronal-embedding framework avoids measure-over-measure closure issues by coupling all widths to abstract neurons on a fixed probability space (Pham et al., 2021).

A recurrent point of methodology is that mean-field derivations are not purely formal law-of-large-numbers arguments. In deep models, the main obstruction is not only width but also the nested dependence created by backpropagation through multiple layers.

3. Optimization, expressivity, and global convergence

Under suitable assumptions, the limiting dynamics can inherit strong optimization properties. In the deep analysis of Sirignano and Spiliopoulos, assuming O(1)O(1)9 and, for global convergence, that O(1/m)O(1/m)0 is bounded, non-constant, monotone, and hence discriminatory, together with full support assumptions on the data measure and the initialization, the limit system is a gradient flow in probability space with Lyapunov function

O(1/m)O(1/m)1

They prove O(1/m)O(1/m)2 and that any stationary measure yields zero loss; any limit point O(1/m)O(1/m)3 as O(1/m)O(1/m)4 satisfies O(1/m)O(1/m)5 for all O(1/m)O(1/m)6 (Sirignano et al., 2019).

For unregularized three-layer networks, Nguyen and Pham prove a global convergence theorem in the mean-field regime under bounded-Lipschitz assumptions on O(1/m)O(1/m)7 and O(1/m)O(1/m)8, full support of the first-layer initialization on O(1/m)O(1/m)9, convergence of the mean-field parameters LL0, and density of LL1 in LL2. If LL3 is convex, then

LL4

More generally, when LL5 and LL6 is a deterministic function of LL7, then LL8. A central ingredient is a universal-approximation property at any finite training time, obtained through an algebraic topology argument showing that the first-layer support remains all of LL9 (Pham et al., 2021).

The functional-space mean-field theory of partially trained three-layer networks yields a different optimization picture. There the limit output

1/N11/N_{\ell-1}0

obeys a kernel gradient flow with a time-varying kernel

1/N11/N_{\ell-1}1

Because 1/N11/N_{\ell-1}2 is symmetric and, under mild assumptions, remains strictly positive-definite for all 1/N11/N_{\ell-1}3, the empirical 1/N11/N_{\ell-1}4-loss satisfies a linear-rate decay estimate of the form

1/N11/N_{\ell-1}5

whenever 1/N11/N_{\ell-1}6 uniformly in time (Chen et al., 2022).
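The mechanism behind such a linear-rate estimate can be checked numerically on a toy problem. The sketch below is an illustration under stated assumptions, not the construction of the cited paper: the residual vector `r` plays the role of the prediction error on two training points, it follows `dr/dt = -K(t) r`, and the hypothetical 2x2 kernel `kernel(t)` has smallest eigenvalue at least 0.5 by Gershgorin's theorem. The loss `0.5 * |r|^2` should then stay below `exp(-2 * 0.5 * t)` times its initial value.

```python
import math

def kernel(t):
    """Hypothetical symmetric 2x2 time-varying kernel with eigenvalues >= 0.5."""
    return [[2.0 + math.sin(t), 0.5], [0.5, 2.0 + math.cos(t)]]

def simulate(r0, h=0.01, steps=500):
    """Explicit Euler for dr/dt = -K(t) r; returns a list of (t_k, 0.5*|r_k|^2)."""
    r, history = list(r0), []
    for k in range(steps + 1):
        t = k * h
        history.append((t, 0.5 * (r[0] ** 2 + r[1] ** 2)))
        K = kernel(t)
        r = [r[0] - h * (K[0][0] * r[0] + K[0][1] * r[1]),
             r[1] - h * (K[1][0] * r[0] + K[1][1] * r[1])]
    return history

LAMBDA0 = 0.5  # uniform lower bound on the smallest eigenvalue of kernel(t)
history = simulate([1.0, -2.0])
loss0 = history[0][1]
bound_holds = all(loss <= loss0 * math.exp(-2.0 * LAMBDA0 * t) + 1e-12
                  for t, loss in history)
```

The step size is chosen so that `h` times the largest eigenvalue stays below 1, which keeps the discrete iteration inside the same exponential envelope as the continuous flow.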

These results clarify a common misconception: nonconvexity of the finite-width parameterization does not preclude strong convergence statements for the mean-field limit. The available theorems are conditional on activation regularity, support conditions, convergence modes, or positivity of time-varying kernels, but they establish that multilayer mean-field dynamics can be globally optimizing rather than merely descriptive.

4. Fluctuations, finite-width corrections, and trajectorial stability

The law-of-large-numbers limit is only the first term in the large-width expansion. For a single hidden layer, the centered fluctuation

1/N11/N_{\ell-1}7

converges in a dual Sobolev space 1/N11/N_{\ell-1}8 to a Gaussian process solving a linear stochastic partial differential equation

1/N11/N_{\ell-1}9

or equivalently

1/NL1/N_L0

The proof uses weak convergence, relative compactness, martingale problems, and uniqueness in a suitable Sobolev space (Sirignano et al., 2018).
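The $\sqrt{m}$ scaling of such fluctuations can be illustrated at initialization, where no training dynamics are involved. The sketch below assumes scalar inputs, a tanh activation, and i.i.d. standard Gaussian $a_i$ and $w_i$, so that the limiting output is zero; these choices and the helper names are assumptions for the example. It estimates the variance of $\sqrt{m}\,f_m(x)$ at two widths, which should be roughly width-independent.

```python
import math
import random

def output_at_init(m, x, rng):
    """f_m(x) = (1/m) * sum_i a_i * tanh(w_i * x) with a_i, w_i ~ N(0, 1)."""
    return sum(rng.gauss(0, 1) * math.tanh(rng.gauss(0, 1) * x)
               for _ in range(m)) / m

def scaled_variance(m, x=1.0, trials=400, seed=0):
    """Sample variance of sqrt(m) * f_m(x); the infinite-width output is 0 here."""
    rng = random.Random(seed)
    vals = [math.sqrt(m) * output_at_init(m, x, rng) for _ in range(trials)]
    mean = sum(vals) / trials
    return sum((v - mean) ** 2 for v in vals) / (trials - 1)

var_small = scaled_variance(50)    # width m = 50
var_large = scaled_variance(400)   # width m = 400
```

Both estimates approximate the variance of a single term $a\,\tanh(w x)$, so the unscaled output fluctuates at order $m^{-1/2}$ around its limit. The evolution of these fluctuations during training is what the limiting stochastic equation describes and is not simulated here.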

For multilayer networks, Nguyen and Pham derive a second-order mean-field limit through the neuronal-embedding framework. The rescaled parameter deviations

1/NL1/N_L1

are approximated in 1/NL1/N_L2 by a linear ODE system driven by a Gaussian sampling fluctuation process. They further obtain a central-limit theorem for the output fluctuation 1/NL1/N_L3, with convergence in finite-dimensional moments at rate 1/NL1/N_L4. Under additional assumptions, the width-scaled asymptotic variance

1/NL1/N_L5

is non-increasing and tends to zero as 1/NL1/N_L6, so gradient descent in the mean-field regime progressively biases training toward solutions with “minimal fluctuation” in the learned output function (Pham et al., 2021).

Mean-field theory is also used as a quantitative approximation to finite-width training trajectories. Sirignano and Spiliopoulos state that finite networks, “even moderately wide,” follow the mean-field curves closely and refer to a CIFAR10 example (Sirignano et al., 2019). Nguyen’s nonrigorous multilayer formalism reports that as the uniform width 1/NL1/N_L7 grows, the full training-loss and test-loss curves “lock in” onto a limiting trajectory; for depths 1/NL1/N_L8, curves for 1/NL1/N_L9 “essentially coincide” (Nguyen, 2019). In the discrete-time two-layer theory, the mean-field limit is shown to approximate finite-width networks better than the NTK limit when learning rates are not very small, precisely because it retains the LL0 interaction term associated with feature learning (Golikov, 2020).

This finite-width perspective changes the role of the mean-field limit. It is not only an asymptotic simplification; it is also a controlled surrogate for wide-but-finite dynamics, with explicit next-order corrections in shallow networks and second-order fluctuation systems in multilayer settings.

5. Relation to NTK, $\mu$P, and control-theoretic extensions

Mean-field and NTK limits arise from different scalings. Under NTK scaling,

LL2

with LL3 learning rate and LL4 parameter movement, the network admits a first-order Taylor linearization and the kernel converges to a fixed LL5. By contrast, the mean-field scaling preserves parameter movement of order LL6 over LL7 iterations and retains the nonlinear LL8 term. The discrete-time comparison shows that if LL9, then αC=NLN1,αW,1=1,αW,=NN1(=2,,L).\alpha_C=\frac{N_L}{N_1},\qquad \alpha_{W,1}=1,\qquad \alpha_{W,\ell}=\frac{N_\ell}{N_1}\quad (\ell=2,\dots,L).0 matters at finite width and the mean-field limit tracks finite-width networks better; if αC=NLN1,αW,1=1,αW,=NN1(=2,,L).\alpha_C=\frac{N_L}{N_1},\qquad \alpha_{W,1}=1,\qquad \alpha_{W,\ell}=\frac{N_\ell}{N_1}\quad (\ell=2,\dots,L).1, αC=NLN1,αW,1=1,αW,=NN1(=2,,L).\alpha_C=\frac{N_L}{N_1},\qquad \alpha_{W,1}=1,\qquad \alpha_{W,\ell}=\frac{N_\ell}{N_1}\quad (\ell=2,\dots,L).2 becomes negligible and the NTK approximation suffices (Golikov, 2020).
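The two regimes are commonly distinguished by the output normalization: $m^{-1}$ for mean-field, as in $f_m$ above, and $m^{-1/2}$ for NTK. The sketch below assumes that convention, scalar inputs, and a tanh activation; the function name `two_layer_output` and its `exponent` parameter are hypothetical.

```python
import math

def two_layer_output(a, w, x, exponent):
    """m**(-exponent) * sum_i a_i * tanh(w_i * x) for scalar inputs."""
    m = len(a)
    return m ** (-exponent) * sum(ai * math.tanh(wi * x) for ai, wi in zip(a, w))

# Four identical neurons, so the sum equals 4 * tanh(1).
a, w, x = [1.0] * 4, [1.0] * 4, 1.0
f_mean_field = two_layer_output(a, w, x, exponent=1.0)  # 1/m normalization
f_ntk = two_layer_output(a, w, x, exponent=0.5)         # 1/sqrt(m) normalization
```

With $O(1)$ non-centered weights the mean-field output stays $O(1)$ as $m$ grows, while the NTK-normalized output grows like $\sqrt{m}$; the NTK regime therefore relies on centered random initialization to keep the output bounded.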

The same framework also identifies an “intermediate” lazy limit that is neither NTK nor mean-field and shows a depth-dependent optimizer effect: for αC=NLN1,αW,1=1,αW,=NN1(=2,,L).\alpha_C=\frac{N_L}{N_1},\qquad \alpha_{W,1}=1,\qquad \alpha_{W,\ell}=\frac{N_\ell}{N_1}\quad (\ell=2,\dots,L).3, plain gradient descent with MF-type hyper-scaling has no non-trivial discrete-time mean-field limit, while RMSProp does. Under RMSProp, normalization removes the layer-wise width dependence, all layers move αC=NLN1,αW,1=1,αW,=NN1(=2,,L).\alpha_C=\frac{N_L}{N_1},\qquad \alpha_{W,1}=1,\qquad \alpha_{W,\ell}=\frac{N_\ell}{N_1}\quad (\ell=2,\dots,L).4, and one recovers a non-trivial discrete-time mean-field limit for any depth (Golikov, 2020). This is a precise statement about discrete-time scaling rather than a universal impossibility result for deep mean-field analysis.

The αC=NLN1,αW,1=1,αW,=NN1(=2,,L).\alpha_C=\frac{N_L}{N_1},\qquad \alpha_{W,1}=1,\qquad \alpha_{W,\ell}=\frac{N_\ell}{N_1}\quad (\ell=2,\dots,L).5P literature extends the scope of mean-field analysis beyond classical αC=NLN1,αW,1=1,αW,=NN1(=2,,L).\alpha_C=\frac{N_L}{N_1},\qquad \alpha_{W,1}=1,\qquad \alpha_{W,\ell}=\frac{N_\ell}{N_1}\quad (\ell=2,\dots,L).6 parametrizations. For noisy gradient descent with entropic regularization in wide two-layer networks under αC=NLN1,αW,1=1,αW,=NN1(=2,,L).\alpha_C=\frac{N_L}{N_1},\qquad \alpha_{W,1}=1,\qquad \alpha_{W,\ell}=\frac{N_\ell}{N_1}\quad (\ell=2,\dots,L).7P, Acciaio, Heiss, Pammer, and Yan formulate the dynamics as a Fokker–Planck PDE

αC=NLN1,αW,1=1,αW,=NN1(=2,,L).\alpha_C=\frac{N_L}{N_1},\qquad \alpha_{W,1}=1,\qquad \alpha_{W,\ell}=\frac{N_\ell}{N_1}\quad (\ell=2,\dots,L).8

prove global existence and uniqueness in a maximal weighted-moment class αC=NLN1,αW,1=1,αW,=NN1(=2,,L).\alpha_C=\frac{N_L}{N_1},\qquad \alpha_{W,1}=1,\qquad \alpha_{W,\ell}=\frac{N_\ell}{N_1}\quad (\ell=2,\dots,L).9, obtain a uniform-in-time squared-Wasserstein propagation rate ρkm=1mi=1mδθi(k).\rho_k^m=\frac1m\sum_{i=1}^m \delta_{\theta_i^{(k)}}.00, characterize identifiability modulo finite-rank realization symmetry, and derive a sparse-dictionary decomposition of the long-time limit under a Barron–Hermite target condition (Xodarev, 23 May 2026).

A complementary control-theoretic extension interprets the mean-field gradient flow of a two-layer network as a McKean–Vlasov stochastic-control problem. The measure-valued continuity equation

ρkm=1mi=1mδθi(k).\rho_k^m=\frac1m\sum_{i=1}^m \delta_{\theta_i^{(k)}}.01

is linked to a Hamilton–Jacobi–Bellman equation on ρkm=1mi=1mδθi(k).\rho_k^m=\frac1m\sum_{i=1}^m \delta_{\theta_i^{(k)}}.02 and a Dynamic Programming Principle. This yields a Finsler-type metric ρkm=1mi=1mδθi(k).\rho_k^m=\frac1m\sum_{i=1}^m \delta_{\theta_i^{(k)}}.03 on probability measures and the variational characterization

ρkm=1mi=1mδθi(k).\rho_k^m=\frac1m\sum_{i=1}^m \delta_{\theta_i^{(k)}}.04

for early stopping, up to a controllable error. The long-time limit selects, among global minimizers, one of minimal ρkm=1mi=1mδθi(k).\rho_k^m=\frac1m\sum_{i=1}^m \delta_{\theta_i^{(k)}}.05 (Acciaio et al., 21 Mar 2026).

Mean-field analysis has also been adapted to particle-based optimizers other than gradient descent. For two-layer networks trained by consensus-based optimization, the network-width limit lifts parameters to ρkm=1mi=1mδθi(k).\rho_k^m=\frac1m\sum_{i=1}^m \delta_{\theta_i^{(k)}}.06, the particle-ensemble limit lifts the ensemble to ρkm=1mi=1mδθi(k).\rho_k^m=\frac1m\sum_{i=1}^m \delta_{\theta_i^{(k)}}.07, and the resulting dynamics become a gradient flow on the Wasserstein-over-Wasserstein space. In this setting the population variance contracts as

ρkm=1mi=1mδθi(k).\rho_k^m=\frac1m\sum_{i=1}^m \delta_{\theta_i^{(k)}}.08

under the stated barycenter and absolute-continuity assumptions (Deyn et al., 26 Nov 2025).
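Variance contraction of this kind is easiest to see in a stripped-down setting. The sketch below is an assumption-laden illustration, not the measure-valued dynamics of the cited work: particles are scalars rather than measures, the noise term of consensus-based optimization is omitted, and `alpha`, `lam`, and `h` are hypothetical parameter names. Each particle drifts toward a Gibbs-weighted consensus point, so one step shrinks the population variance by the factor $(1-\lambda h)^2$.

```python
import math

def cbo_step(particles, objective, alpha=5.0, lam=1.0, h=0.1):
    """One noise-free consensus-based optimization step on scalar particles."""
    fmin = min(objective(p) for p in particles)
    # Gibbs weights, shifted by fmin for numerical stability.
    weights = [math.exp(-alpha * (objective(p) - fmin)) for p in particles]
    consensus = sum(wt * p for wt, p in zip(weights, particles)) / sum(weights)
    # Every particle moves a fraction lam * h of the way to the consensus point.
    return [p - lam * h * (p - consensus) for p in particles]

def variance(particles):
    """Population variance of a list of scalars."""
    mean = sum(particles) / len(particles)
    return sum((p - mean) ** 2 for p in particles) / len(particles)

particles = [-2.0, -0.5, 0.3, 1.7, 3.0]
v0 = variance(particles)
stepped = cbo_step(particles, objective=lambda p: (p - 1.0) ** 2)
v1 = variance(stepped)
```

Because all particles are pulled toward the same point by the same fraction, pairwise differences shrink by $1-\lambda h$ regardless of the objective; the objective only determines where the ensemble collapses.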

6. Broader meanings in neural-network research

The phrase “mean-field limit of neural networks” is not confined to supervised training of artificial feedforward models. In statistical mechanics and theoretical neuroscience, it refers to several neighboring but non-identical asymptotic programs.

For Hopfield-like and related rate networks, mean-field limits describe thermodynamic or stochastic population equations. One recent thermodynamic result proves that a measure-concentration assumption on order parameters suffices for existence of the asymptotic free energy of the Hopfield model and recovers the replica-symmetric free-energy formula through a decomposition into hard and soft spin-glass free energies (Agliari et al., 2024). A separate universality result establishes the mean-field equations for large networks of Hopfield-like neurons on ρkm=1mi=1mδθi(k).\rho_k^m=\frac1m\sum_{i=1}^m \delta_{\theta_i^{(k)}}.09 without assuming i.i.d. zero-mean Gaussian synaptic weights; the limit is stochastic and characterized by a mean function ρkm=1mi=1mδθi(k).\rho_k^m=\frac1m\sum_{i=1}^m \delta_{\theta_i^{(k)}}.10 and a correlation function ρkm=1mi=1mδθi(k).\rho_k^m=\frac1m\sum_{i=1}^m \delta_{\theta_i^{(k)}}.11, with effective noise given through a Volterra equation (Faugeras et al., 2024). For correlated Gaussian synaptic weights, Faugeras, Maclaurin, and Tanré prove an annealed large deviation principle and identify a unique Gaussian, non-Markovian minimizer, leading to an infinite countable family of linear non-Markovian SDEs in the limit (1901.10248).

For spiking or integrate-and-fire networks, mean-field limits are often PDE limits for empirical measures rather than parameter distributions. In dense stochastic integrate-and-fire networks with arbitrary synaptic weights satisfying ρkm=1mi=1mδθi(k).\rho_k^m=\frac1m\sum_{i=1}^m \delta_{\theta_i^{(k)}}.12 scaling, the empirical measure converges, up to subsequence, to a spatially extended PDE indexed by a graphon variable ρkm=1mi=1mδθi(k).\rho_k^m=\frac1m\sum_{i=1}^m \delta_{\theta_i^{(k)}}.13: ρkm=1mi=1mδθi(k).\rho_k^m=\frac1m\sum_{i=1}^m \delta_{\theta_i^{(k)}}.14 with ρkm=1mi=1mδθi(k).\rho_k^m=\frac1m\sum_{i=1}^m \delta_{\theta_i^{(k)}}.15 determined by a graphon kernel ρkm=1mi=1mδθi(k).\rho_k^m=\frac1m\sum_{i=1}^m \delta_{\theta_i^{(k)}}.16 (Jabin et al., 2024). For sparse integrate-and-fire networks, a tree-indexed extension of the BBGKY hierarchy yields convergence of one-particle observables to a non-exchangeable Vlasov equation under generalized mean-field scaling and non-vanishing diffusion (Jabin et al., 2023). Replica-mean-field theory for intensity-based spiking networks takes a different route: instead of letting interactions vanish as ρkm=1mi=1mδθi(k).\rho_k^m=\frac1m\sum_{i=1}^m \delta_{\theta_i^{(k)}}.17, it considers infinitely many interacting replicas of a fixed finite network and derives stationary ODEs and self-consistency equations via the Poisson Hypothesis, preserving finite-size effects such as saturation and sparse-induced metastability (Baccelli et al., 2019).

Even in rate-based noisy networks, the term can mean a macroscopic reduction of neuronal activity rather than a parameter-space transport equation. For a random network of noisy rate neurons on an Erdős–Rényi graph, a second-order stochastic mean-field model for the mean rate ρkm=1mi=1mδθi(k).\rho_k^m=\frac1m\sum_{i=1}^m \delta_{\theta_i^{(k)}}.18 and variance ρkm=1mi=1mδθi(k).\rho_k^m=\frac1m\sum_{i=1}^m \delta_{\theta_i^{(k)}}.19 distinguishes the effects of external and internal noise: in the thermodynamic limit, external noise reshapes the deterministic mean-field vector field, while internal noise affects only the variance equation (Klinshov et al., 2015).

These adjacent literatures show that “mean-field” is a family resemblance term. In machine learning, it usually denotes infinite-width training dynamics in parameter or function space; in statistical mechanics and neuroscience, it often denotes thermodynamic limits of interacting neuronal states, empirical membrane-potential laws, or free-energy formulas. The shared structure is passage from many-body randomness to a deterministic or self-consistent continuum description, but the state variables, limiting equations, and proof techniques differ substantially.
