Stability of Singular Distribution (SoSD)

Updated 4 July 2026

SoSD is a family of phenomena describing the stability of singular components—such as measures, spectra, and drifts—across various mathematical and applied domains.
In operator theory, SoSD characterizes the relative stability of the singular spectrum using a rigged resolvent and Hilbert–Schmidt control through coupling resonances.
Across numerical PDEs, stochastic dynamics, and language-model pre-training, SoSD provides practical criteria for energy dissipation, convergence of singular profiles, and spectral stabilization.

Stability of Singular Distribution (SoSD) is a nonstandard research expression used for several mathematically distinct stability phenomena involving singular measures, singular spectra, singular drifts, singular PDE profiles, or singular-value distributions. In the supplied arXiv literature, the term ranges from the relative stability of the singular spectral measure of a self-adjoint operator under the limiting absorption principle, to unconditional energy stability for discretizations of singular flows, to law-level stability for McKean–Vlasov and singular stochastic differential equations, to the asymptotic stability of boundary spike layers converging to Dirac masses, and, in a distinct machine-learning usage, to early stabilization of trace-normalized singular-value spectra during language-model pre-training (Azamov, 2021, Bartels et al., 2017, Galeati et al., 2022, Carrillo et al., 2019, Zhang et al., 26 May 2026).

1. Terminological scope

In the supplied literature, SoSD does not denote a single standardized invariant. Rather, it labels a family of problems in which an object with a singular component is evolved, perturbed, regularized, or linearized, and one asks whether that singular component remains controlled, converges, becomes unstable, or determines mutual singularity of laws. This suggests a family resemblance rather than a uniform doctrine.

Domain	Singular object	Stability notion
Spectral theory	Singular spectral measure $E^{(s)}(H)$	Hilbert–Schmidt control after sandwiching by a rigging $F$ on large compact subsets (Azamov, 2021)
Numerical/PDE analysis	Interfaces, spike layers, singular forces	Energy dissipation, asymptotic stability, or persistence of singular profiles (Bartels et al., 2017, Cannone et al., 2020, Carrillo et al., 2019)
Stochastic analysis	Laws and densities under singular drift or measure dependence	Well-posedness, quantitative continuous dependence, or spectral instability of equilibria (Röckner et al., 2018, Galeati et al., 2022, Wang, 2021, Raynal et al., 2022, Zhang, 5 Oct 2025)
Path-space probability	Laws of dilatively stable processes	Mutual singularity of distributions for different scaling exponents (Igloi et al., 2011)
Machine learning	Trace-normalized singular-value spectrum of weights	Early stabilization of normalized spectra during pre-training (Zhang et al., 26 May 2026)

A recurring pattern is the separation of a singular object from a regular background, followed by a stability statement in a specific topology: Hilbert–Schmidt class in operator theory, weighted or unweighted $L^q$ control in PDE, $L^k$, $L^\infty$, Wasserstein, or weighted total variation metrics in stochastic analysis, and Frobenius-norm variation of normalized singular spectra in the language-model setting. A common misconception is that SoSD must mean invariance of the singular part. In several of these usages it does not. In particular, the operator-theoretic formulation explicitly does not claim invariance of singular spectrum, and the machine-learning formulation concerns stabilization of a normalized spectral shape rather than freezing of the underlying parameter matrices.

2. Operator-theoretic SoSD: relative stability of singular spectrum

The most explicit formalization of SoSD in the supplied corpus appears in the note on relative stability of singular spectrum. Let $H_0$ be self-adjoint on a separable Hilbert space $\mathcal H$, let $F:\mathcal H\to\mathcal K$ be bounded with trivial kernel and co-kernel, and define the sandwiched resolvent

$T_z(H_0)=F(H_0-z)^{-1}F^*.$

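Assume $T_z(H_0)$ is compact for $\operatorname{Im} z\neq 0$, and that for almost every $\lambda\in\Lambda$ the norm limit

$T_{\lambda+i0}(H_0)=\lim_{y\to 0^+}T_{\lambda+iy}(H_0)$

exists on an open interval $\Lambda\subset\mathbb R$. For bounded self-adjoint $J$ on $\mathcal K$, set $V=F^*JF$ and $H_r=H_0+rV$. The main theorem states that for any $\varepsilon>0$ there exists a compact subset $K\subset\Lambda$ with $|\Lambda\setminus K|<\varepsilon$ such that for any coupling constant $r$ in the coupling interval under consideration the operator $FE^{(s)}_{H_r}(K)F^*$ is Hilbert–Schmidt (Azamov, 2021).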

This is a relative stability statement rather than an invariance theorem. The singular spectrum is not claimed to be unchanged, nor are singular subspaces claimed to be unitarily equivalent. The point is that, outside a subset of arbitrarily small Lebesgue measure, the singular distribution becomes tame in the rigged sense: when sandwiched by $F$ and $F^*$, the singular operator-valued spectral measure is controlled in the Hilbert–Schmidt class. The note therefore complements Weyl’s theorem for essential spectrum and Kato–Rosenblum for absolutely continuous spectrum by isolating a residual stability property for the singular component.

The mechanism is expressed through coupling resonances. For $L^q$ 4, the meromorphic dependence of $L^q$ 5 admits, on suitable sectors $L^q$ 6, a finite-pole Laurent expansion

$L^q$ 7

where the $L^q$ 8 are coupling resonance functions and the residues

$L^q$ 9

are finite-rank operators (Azamov, 2021). After removing a set of small measure from $L^k$ 0, only finitely many “impacting resonances” intersect the coupling interval $L^k$ 1, and their residues are continuous in trace-class norm. Stone’s formula then separates an absolutely continuous part coming from the holomorphic term and a singular part controlled by these finite-rank residues. The resulting sandwiched singular distribution is Hilbert–Schmidt on $L^k$ 2.
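For orientation, the sandwiched form of Stone's formula that underlies this separation can be written out. This is the standard identity combined with the definition of $T_z$ above, not a formula quoted from the note; the symbols $H$ (a self-adjoint operator), $E_H$ (its spectral measure), and the endpoints $a<b$ are notation assumed here, and the limit is taken in the strong operator topology.

$\tfrac12\,F\big(E_H([a,b])+E_H((a,b))\big)F^*=\lim_{y\to 0^+}\frac{1}{\pi}\int_a^b \operatorname{Im}T_{\lambda+iy}(H)\,d\lambda,\qquad \operatorname{Im}T_z(H)=\tfrac{1}{2i}\big(T_z(H)-T_z(H)^*\big).$

Since $T_z(H)^*=T_{\bar z}(H)$, the integrand equals $F\,\operatorname{Im}(H-z)^{-1}F^*$, so control of the boundary values of $T_z$ translates directly into control of the sandwiched spectral measure.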

The note also turns the theorem into a falsification principle for the limiting absorption principle (LAP). If one can show that for every compact $K\subset\Lambda$ with $|\Lambda\setminus K|$ small there exists a coupling constant $r$ such that the sandwiched singular spectral projection $FE^{(s)}_{H_r}(K)F^*$ fails to be Hilbert–Schmidt, then LAP cannot hold on the interval $\Lambda$. In this sense SoSD is simultaneously a stability theorem and a diagnostic for instability of boundary resolvent behavior.

3. Deterministic PDE and numerical realizations

In numerical analysis of singular flows, SoSD is formulated as stability of the discrete evolution of singular structures. For the singular $p$-Laplace flow with $1\le p<2$, and in particular the $p=1$ total variation flow, the semi-implicit scheme

$L^\infty$ 1

freezes the singular factor at the previous time step while treating the weighted elliptic operator implicitly. Under the structural assumptions that $L^\infty$ 2 is convex, $L^\infty$ 3, $L^\infty$ 4, and $L^\infty$ 5 is positive, nonincreasing, and continuous, the iterates satisfy

$L^\infty$ 6

and for TV-flow the sharper identity

$L^\infty$ 7

holds for all $L^\infty$ 8. In this usage, SoSD means unconditional energy dissipation and monotone relaxation of singular structures such as interfaces and concentrated diffusion, although the fully discrete error bounds still display an unfavorable dependence on $L^\infty$ 9 and inverse powers of $H_0$ 0 (Bartels et al., 2017).
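The energy-dissipation property can be checked numerically in a toy setting. The following is a minimal one-dimensional sketch, assuming a regularized weight $1/\sqrt{|u_x|^2+\varepsilon^2}$ frozen at the previous step, forward differences with natural boundary conditions, and a deliberately large time step; the function names, the grid, and the parameters `tau` and `eps` are illustrative assumptions and not the discretization analyzed by Bartels et al.

```python
import numpy as np

def tv_energy(u, h, eps):
    """Regularized total variation: h * sum sqrt(|Du|^2 + eps^2)."""
    du = np.diff(u) / h
    return float(h * np.sum(np.sqrt(du**2 + eps**2)))

def semi_implicit_step(u_prev, tau, h, eps):
    """One step: the weight is frozen at u_prev, the weighted Laplacian is implicit."""
    n = u_prev.size
    # Forward-difference matrix of shape (n-1, n): (D u)_i = (u_{i+1} - u_i) / h.
    D = (np.eye(n, k=1)[:-1] - np.eye(n)[:-1]) / h
    # Singular factor evaluated at the previous iterate (assumed regularization eps).
    w = 1.0 / np.sqrt((D @ u_prev) ** 2 + eps**2)
    # Solve (I + tau * D^T diag(w) D) u = u_prev.
    A = np.eye(n) + tau * (D.T @ (w[:, None] * D))
    return np.linalg.solve(A, u_prev)

# Illustrative setup: step-function initial data, large time step.
n, tau, eps = 64, 0.1, 1e-3
x = np.linspace(0.0, 1.0, n)
h = x[1] - x[0]
u = (x > 0.5).astype(float)

energies = [tv_energy(u, h, eps)]
for _ in range(20):
    u = semi_implicit_step(u, tau, h, eps)
    energies.append(tv_energy(u, h, eps))
```

In this sketch the recorded `energies` should be nonincreasing for any positive `tau`, which is the discrete analogue of the unconditional stability described above; the concavity of the square root is what makes the frozen weight compatible with the energy.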

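For the three-dimensional Navier–Stokes system on $\mathbb R^3$ with singular external forces, the relevant singular objects are not spectral measures but singular solutions generated by measure or distributional forcing. The framework uses pseudomeasure spaces

$\mathcal{PM}^a=\Big\{f\in\mathcal S'(\mathbb R^3):\ \hat f\in L^1_{\mathrm{loc}}(\mathbb R^3),\ \|f\|_{\mathcal{PM}^a}=\operatorname*{ess\,sup}_{\xi\in\mathbb R^3}|\xi|^a|\hat f(\xi)|<\infty\Big\}.$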

Small data in $H_0$ 3 and small forcing in $H_0$ 4 yield global mild solutions, and the far-field asymptotics are controlled by the linear heat flow and the forced term. For stationary solutions, small perturbations of singular forcing preserve both far-field and local singular profiles. In particular, if $H_0$ 5 for $H_0$ 6, then corresponding stationary solutions satisfy $H_0$ 7, and quantitative $H_0$ 8 stability follows in ranges depending on $H_0$ 9. For the Cauchy problem, if $\mathcal H$ 0 for $\mathcal H$ 1, then

$\mathcal H$ 2

for admissible $\mathcal H$ 3. In this PDE setting, SoSD denotes robustness of singular asymptotics, including persistence of Slezkin–Landau-type profiles under small perturbations of singular forces (Cannone et al., 2020).

For the one-dimensional half-line Keller–Segel system with logarithmic sensitivity and nonlinear consumption,

$\mathcal H$ 4

the singular limit is measure-valued. The unique boundary spike-layer steady state has explicit power-law profiles $\mathcal H$ 5 and $\mathcal H$ 6, and as $\mathcal H$ 7 or $\mathcal H$ 8 one has $\mathcal H$ 9 in the sense of distributions while $F:\mathcal H\to\mathcal K$ 0 forms a boundary layer. Stability is proved after the Cole–Hopf transformation $F:\mathcal H\to\mathcal K$ 1 and passage to antiderivative variables $F:\mathcal H\to\mathcal K$ 2, $F:\mathcal H\to\mathcal K$ 3. Weighted energy estimates, together with Hardy’s inequality, yield global existence and asymptotic nonlinear stability: $F:\mathcal H\to\mathcal K$ 4 Here SoSD refers to stability of regular steady states whose singular limit is the boundary Dirac mass $F:\mathcal H\to\mathcal K$ 5 (Carrillo et al., 2019).

4. Singular stochastic dynamics and McKean–Vlasov stability

In stochastic analysis, SoSD is chiefly a question of law-level well-posedness and quantitative dependence for SDEs with singular drift or distribution dependence. A basic distribution-dependent SDE of this type is

$dX_t=b(t,X_t,\mu_{X_t})\,dt+\sigma(t,X_t,\mu_{X_t})\,dW_t,$

where $\mu_{X_t}$ denotes the law of $X_t$.

Under uniform ellipticity and Hölder continuity in $F:\mathcal H\to\mathcal K$ 7 for $F:\mathcal H\to\mathcal K$ 8, Krylov–Röckner integrability $F:\mathcal H\to\mathcal K$ 9 for the drift, and Lipschitz dependence on the measure variable either in $T_z(H_0)=F(H_0-z)^{-1}F^*.$ 0 or in a weighted total variation norm, strong or weak well-posedness follows by combining Krylov estimates, Khasminskii-type exponential integrability, a Zvonkin transform, and stability estimates for the associated backward parabolic PDE. The law flow is then stable in Wasserstein-type or weighted total-variation metrics, and uniqueness transfers to the nonlinear Fokker–Planck equation through the superposition principle (Röckner et al., 2018).

A more quantitative perturbative theory is developed for singular Itô and Stratonovich SDEs with Sobolev diffusion coefficients. For two SDEs with coefficients $T_z(H_0)=F(H_0-z)^{-1}F^*.$ 1, the stability estimate

$T_z(H_0)=F(H_0-z)^{-1}F^*.$ 2

holds under $T_z(H_0)=F(H_0-z)^{-1}F^*.$ 3 and $T_z(H_0)=F(H_0-z)^{-1}F^*.$ 4. This identifies negative Sobolev norms as natural drift perturbation metrics after the Zvonkin transform. The same framework yields applications to McKean–Vlasov equations, strong compactness, and Wong–Zakai approximations, so here SoSD is a quantitative continuity property of solution laws with respect to singular-coefficient perturbations (Galeati et al., 2022).

A related law-stability result appears for density-dependent singular SDEs

$T_z(H_0)=F(H_0-z)^{-1}F^*.$ 5

where the drift is singular in $T_z(H_0)=F(H_0-z)^{-1}F^*.$ 6 but Lipschitz in the density, both pointwise and globally in a local $T_z(H_0)=F(H_0-z)^{-1}F^*.$ 7-norm. For $T_z(H_0)=F(H_0-z)^{-1}F^*.$ 8 above an explicit threshold $T_z(H_0)=F(H_0-z)^{-1}F^*.$ 9, weak well-posedness and density stability hold: $T_z(H_0)$ 0 An analogous estimate holds for reflecting SDEs on $T_z(H_0)$ 1-domains with Neumann boundary conditions. In this usage, SoSD is explicit uniform-in-time Lipschitz continuity of the nonlinear density flow (Wang, 2021).

For stable-driven McKean–Vlasov SDEs with distributional interaction kernel,

$T_z(H_0)$ 2

the singularity is carried by the kernel $T_z(H_0)$ 3 with $T_z(H_0)$ 4. The main effect is regularization by noise: the $T_z(H_0)$ 5-stable semigroup and the convolution with $T_z(H_0)$ 6 regularize the distributional kernel sufficiently to obtain weak well-posedness under a threshold condition labeled $T_z(H_0)$ 7, presented in the scanned text as $T_z(H_0)$ 8, and strong well-posedness under the stronger condition $T_z(H_0)$ 9, presented as $F$ 00. The paper emphasizes that the McKean–Vlasov nonlinearity permits the scaling-optimal threshold $F$ 01, improving over the linear singular-drift threshold $F$ 02 (Raynal et al., 2022).

Not all stochastic SoSD is stabilizing. For distribution-dependent SDEs with multiple stationary laws, a spectral criterion yields instability of a stationary distribution $F$ 03. If the generator $F$ 04 of the linearized semigroup has spectrum intersecting the open right half-plane and the dual linearized semigroup is quasi-compact on $F$ 05, then $F$ 06 is unstable in a weighted Kantorovich–Rubinstein metric. Concrete examples include granular media models with double-well structure, where a symmetry-breaking mode produces a positive real-part eigenvalue. Thus, in the stochastic literature, SoSD encompasses both robust law-level dependence and spectral criteria for failure of stability (Zhang, 5 Oct 2025).
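The coexistence of several stationary laws can be illustrated with a toy computation. The sketch below assumes a one-dimensional granular-media-type model $dX_t=-V'(X_t)\,dt-\alpha(X_t-\mathbb E X_t)\,dt+\sigma\,dW_t$ with double-well potential $V(x)=x^4/4-x^2/2$; the model, the parameters `alpha` and `sigma`, and the function names are assumptions made for illustration and are not taken from the cited paper. A stationary law with mean $m$ has density proportional to $\exp\big(-\tfrac{2}{\sigma^2}(V(x)+\tfrac{\alpha}{2}(x-m)^2)\big)$, so stationary means are the solutions of a scalar self-consistency equation.

```python
import numpy as np

def stationary_mean(m, alpha=1.0, sigma=0.5):
    """Mean of the Gibbs density obtained when the interaction is centred at m."""
    x = np.linspace(-4.0, 4.0, 4001)
    V = x**4 / 4 - x**2 / 2                 # assumed double-well confinement
    U = V + 0.5 * alpha * (x - m) ** 2      # effective potential for frozen mean m
    logw = -2.0 * U / sigma**2
    w = np.exp(logw - logw.max())           # unnormalized density, overflow-safe
    return float(np.sum(x * w) / np.sum(w)) # uniform grid: spacing cancels

def nonzero_fixed_point(lo=0.1, hi=2.0, iters=60):
    """Bisection for a positive root of stationary_mean(m) = m."""
    g = lambda m: stationary_mean(m) - m
    assert g(lo) > 0 > g(hi), "bracket does not contain a sign change"
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

m_sym = stationary_mean(0.0)   # the symmetric law has mean zero
m_star = nonzero_fixed_point() # a symmetry-broken stationary mean
```

By symmetry `-m_star` is also a solution, so this parameter choice yields at least three candidate stationary laws. The sketch only locates them; deciding which are unstable is the role of the spectral criterion on the linearized generator described above, which is not computed here.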

5. Path-space singularity and scaling exponents

A different meaning of singular distribution appears in the theory of dilatively stable processes. An $F$ 07-dilatively stable process has infinitely divisible finite-dimensional distributions and obeys the scaling relation

$F$ 08

equivalently

$F$ 09

for the log-characteristic exponent. In the stationary-increment case, the covariance scales like $F$ 10, and the local Hölder regularity is governed by $F$ 11 with corrections depending on $F$ 12. The main path-regularity theorem gives continuous modifications with local $F$ 13-Hölder paths for every $F$ 14 when $F$ 15, every $F$ 16 when $F$ 17, and every $F$ 18 when $F$ 19 and $F$ 20 (Igloi et al., 2011).

The principal SoSD statement in this context is not continuous dependence under perturbation but mutual singularity of distributions on path space. If $F$ 21 and $F$ 22 are two mean-zero $F$ 23-dilatively stable processes with stationary increments and almost surely continuous paths on a closed interval $F$ 24, then

$F$ 25

The proof constructs disjoint Borel support sets in $F$ 26 from sharp limsup scaling of increments along sequences $F$ 27. Different Hurst-type exponents force different almost sure local scaling classes, hence disjoint supports. In this sense the stability exponent $F$ 28 rigidly determines the singularity class of the path law (Igloi et al., 2011).

This usage is conceptually distinct from the others. The singularity is mutual singularity of probability measures on function space, not singularity of a spectral measure, drift term, or PDE profile. Nevertheless, it fits the broader family resemblance: a parameter controlling local singular structure also determines a sharp law-level classification.

6. Spectral SoSD in language-model pre-training

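A recent and domain-specific usage defines the singular distribution of a weight matrix $W$ as its trace-normalized singular value spectrum. If $W=U\Sigma V^\top$ is a singular value decomposition, the singular distribution is

$\bar\Sigma=\Sigma/\operatorname{tr}(\Sigma)$

and the basic stability metric is

$\Delta_t=\|\bar\Sigma_{t+1}-\bar\Sigma_t\|_F.$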

This quantity is tracked layerwise for attention matrices $F$ 33 and MLP projections. The reported empirical phenomenon is a two-phase trajectory in which validation loss first decreases rapidly and then enters a slow-descent regime, while $F$ 34 exhibits an initial impulse of order $F$ 35 followed by a metastable floor around $F$ 36. The onset of this spectral stabilization is reported to synchronize with the slow-descent regime across GPT-2 Small and Medium on FineWeb, LLaMA 0.5B and 2B on C4, multiple learning-rate schedules, AdamW and Muon, and with and without weight decay (Zhang et al., 26 May 2026).
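A per-matrix version of this diagnostic is straightforward to compute. The following is a minimal sketch, assuming that trace normalization means dividing the singular values by their sum and that the metric is the Euclidean norm of the difference between consecutive normalized spectra; the function names `singular_distribution` and `sosd_metric` and the random test matrices are hypothetical and not taken from the paper.

```python
import numpy as np

def singular_distribution(W):
    """Singular values of W divided by their sum (assumed trace normalization)."""
    s = np.linalg.svd(W, compute_uv=False)  # returned in descending order
    return s / s.sum()

def sosd_metric(W_prev, W_next):
    """Norm of the change in normalized spectrum between two checkpoints."""
    diff = singular_distribution(W_next) - singular_distribution(W_prev)
    return float(np.linalg.norm(diff))

# Illustrative checkpoints: a random matrix and a slightly perturbed copy.
rng = np.random.default_rng(0)
W0 = rng.standard_normal((64, 32))
W1 = W0 + 0.01 * rng.standard_normal((64, 32))

p0 = singular_distribution(W0)
delta = sosd_metric(W0, W1)
# Rescaling leaves the normalized spectrum unchanged, so this is ~0.
scale_invariant = sosd_metric(W0, 3.0 * W0)
```

Because the normalized spectrum is invariant under rescaling of the matrix, the metric responds only to changes in spectral shape, which is why it can stabilize while the raw weights keep moving.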

The theoretical model is a simplified one-layer, single-head Transformer trained by full-batch gradient descent. Under strictly increasing weight norms and regularity assumptions, the main theorem states that for each $F$ 37 there exists a threshold time after which

$F$ 38

Explicit asymptotic threshold times $F$ 39, $F$ 40, and a global $F$ 41 are given. The mechanism is that growth of $F$ 42 suppresses the relative size of normalized spectral updates, so the trace-normalized singular-value distribution stabilizes even though the raw matrices continue to evolve (Zhang et al., 26 May 2026).
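The norm-growth mechanism can be seen in a toy experiment that is independent of any training dynamics. The sketch below applies the same raw update `dW` to copies of a matrix at increasing scales and records the change in normalized spectrum; the matrix sizes, the scales, and the helper `normalized_spectrum` are assumptions chosen for illustration, and the experiment is not a reproduction of the theorem's setting.

```python
import numpy as np

def normalized_spectrum(W):
    """Singular values divided by their sum (assumed trace normalization)."""
    s = np.linalg.svd(W, compute_uv=False)
    return s / s.sum()

rng = np.random.default_rng(1)
W = rng.standard_normal((64, 32))
dW = 0.01 * rng.standard_normal((64, 32))  # raw update of fixed size

changes = []
for c in (1.0, 10.0, 100.0):               # growing weight norm
    before = normalized_spectrum(c * W)
    after = normalized_spectrum(c * W + dW)
    changes.append(float(np.linalg.norm(after - before)))
```

Since the normalized spectrum of `c * W + dW` equals that of `W + dW / c`, the recorded changes shrink roughly in proportion to `1 / c`, which mirrors the suppression of normalized spectral updates by weight-norm growth.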

A second theorem links SoSD to optimization dynamics. In Phase I, the loss decrease satisfies

$F$ 43

whereas in Phase II, once SoSD holds and $F$ 44,

$F$ 45

This is not a low-rank-collapse statement. It is a statement about stabilization of the normalized spectral shape, not freezing of parameters or vanishing of singular values. In this setting, SoSD becomes a spectral diagnostic for the transition from fast descent to slow asymptotic improvement, and schedule or optimizer choices are interpreted through their effect on the SoSD scale $F$ 46 and the post-stabilization floor of $F$ 47 (Zhang et al., 26 May 2026).

Across these literatures, the term therefore remains context-sensitive. The shared idea is not a single theorem but a recurring research strategy: isolate a singular component, choose a topology in which it is observable, and prove either that it remains controlled, converges to a singular limit, determines disjoint law classes, or becomes unstable through a spectral mechanism.