Mode Interpolation Phenomenon

Updated 2 March 2026

Mode interpolation is the process of blending learned or mathematical representations to yield intermediate solutions with distinct performance profiles.
In deep networks, interpolating linearly between two trained models can boost out-of-distribution accuracy, while in generative models, interpolation between data modes can create hallucinated outputs.
In reduced-order modeling, interpolating modal coefficients across parameters enables rapid surrogate predictions, and in signal decomposition, extracting modes as orthogonal projections within a fixed interpolation space yields artifact-free decompositions.

Mode interpolation refers to a range of phenomena in which mathematical or learned representations—often called "modes"—are combined (typically via interpolation in weight or data space) to produce new, intermediate solutions. This phenomenon manifests in diverse domains, including neural network parameter spaces, generative models, reduced-order modeling, and signal decomposition. In each, interpolation between modes yields distinctive theoretical and practical consequences: improved generalization, hallucinated samples, superior surrogate models, or rigorous decompositions.

1. Linear Mode Interpolation in Deep Networks

In neural networks, mode interpolation refers to connecting two trained parameter vectors, $w_0$ and $w_1$ (e.g., a zero-shot and a fine-tuned CLIP model), via the linear path $w_\alpha = \alpha w_0 + (1-\alpha) w_1$ for $\alpha \in [0,1]$, so that $\alpha = 1$ recovers $w_0$ and $\alpha = 0$ recovers $w_1$. It is closely tied to linear mode connectivity (LMC), the property that the loss stays low along this entire path (Abdollahpoorrostam, 2024, Zhan et al., 8 Mar 2025).

A central observation is that an $\alpha^*$ often exists along this path at which the out-of-distribution (OOD) accuracy of the interpolated model $w_{\alpha^*}$ strictly exceeds that of either endpoint; robust fine-tuning (RFT) of CLIP exploits exactly this. The OOD accuracy function $\mathcal{A}_\alpha = \mathcal{A}(w_\alpha; S_\mathrm{OOD})$ defines the performance landscape, and the OOD gain is $\max_\alpha \mathcal{A}_\alpha - \mathcal{A}(w_0)$, the improvement of the best interpolated model over the zero-shot endpoint $w_0$. A positive gain indicates successful interpolation, whereas a failure mode, in which no $\alpha$ improves on $w_0$, points to a pathology in the weight space.
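The sweep over $\alpha$ and the OOD-gain definition can be sketched in a few lines. This is a minimal illustration, assuming parameters are stored as a dict of flat float lists and that the caller supplies an OOD evaluation function; `interpolate`, `ood_gain`, and `toy_accuracy` are hypothetical names, and the quadratic toy accuracy is invented purely to make the example runnable.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical representation: layer name -> flat list of weights.
Params = Dict[str, List[float]]


def interpolate(w0: Params, w1: Params, alpha: float) -> Params:
    """w_alpha = alpha * w0 + (1 - alpha) * w1, applied layer by layer."""
    return {
        name: [alpha * a + (1.0 - alpha) * b for a, b in zip(w0[name], w1[name])]
        for name in w0
    }


def ood_gain(
    w0: Params,
    w1: Params,
    ood_accuracy: Callable[[Params], float],
    num_points: int = 11,
) -> Tuple[float, float]:
    """Sweep alpha on an even grid over [0, 1]; return (alpha*, gain) with
    gain = max_alpha A(w_alpha) - A(w0). A non-positive gain is a failure mode."""
    alphas = [i / (num_points - 1) for i in range(num_points)]
    scores = [ood_accuracy(interpolate(w0, w1, a)) for a in alphas]
    best = max(range(num_points), key=scores.__getitem__)
    return alphas[best], scores[best] - ood_accuracy(w0)


def toy_accuracy(w: Params) -> float:
    """Invented stand-in for OOD accuracy that peaks when every weight is 0.5."""
    return 1.0 - sum((x - 0.5) ** 2 for x in w["layer"])


if __name__ == "__main__":
    zero_shot = {"layer": [0.0, 0.0]}
    fine_tuned = {"layer": [1.0, 1.0]}
    print(ood_gain(zero_shot, fine_tuned, toy_accuracy))  # (0.5, 0.5)
```

A grid sweep is the simplest choice; with a real model each grid point costs one full OOD evaluation, so the grid size trades resolution against compute.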

In overparameterized neural networks, linear mode connectivity (LMC) can often be attained modulo permutation invariance: after the hidden units of one network are permuted to align with those of the other, the loss barrier along the interpolation path (typically measured as the largest excess of the loss over the straight line joining the two endpoint losses) shrinks or vanishes. In two-layer ReLU teacher–student models, the loss barrier modulo permutation exhibits double descent as the width grows, decays as $O(m^{-1/2})$ for network width $m$ much greater than the number of teacher neurons, and does not depend on the input dimensionality (Zhan et al., 8 Mar 2025).
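A tiny two-layer ReLU example makes both ingredients concrete: permuting hidden units leaves the network function unchanged, yet interpolating between two permuted copies can cross a high-loss region, and aligning the units first removes that barrier. This sketch assumes brute-force weight matching (feasible only for very small widths, and not the procedure of Zhan et al.) and measures the barrier as the largest excess of the loss over the straight line between the endpoint losses, with the same $\alpha w_0 + (1-\alpha) w_1$ convention as above. All function names are hypothetical.

```python
import itertools
from typing import List, Sequence, Tuple

# A hidden unit is (input_weights, output_weight); a network is a list of units.
Unit = Tuple[Tuple[float, ...], float]
Net = List[Unit]
Data = List[Tuple[Sequence[float], float]]


def forward(net: Net, x: Sequence[float]) -> float:
    """Two-layer ReLU network: f(x) = sum_k v_k * relu(w_k . x)."""
    return sum(v * max(0.0, sum(a * b for a, b in zip(w, x))) for w, v in net)


def mse(net: Net, data: Data) -> float:
    return sum((forward(net, x) - y) ** 2 for x, y in data) / len(data)


def lerp(a: Net, b: Net, alpha: float) -> Net:
    """Unit-wise alpha * a + (1 - alpha) * b."""
    return [
        (tuple(alpha * p + (1 - alpha) * q for p, q in zip(wa, wb)),
         alpha * va + (1 - alpha) * vb)
        for (wa, va), (wb, vb) in zip(a, b)
    ]


def align(reference: Net, other: Net) -> Net:
    """Reorder the units of `other` to minimize squared weight distance to
    `reference` (brute force over all permutations; tiny widths only)."""
    def cost(perm: Tuple[int, ...]) -> float:
        return sum(
            sum((p - q) ** 2 for p, q in zip(reference[i][0], other[j][0]))
            + (reference[i][1] - other[j][1]) ** 2
            for i, j in enumerate(perm)
        )
    best = min(itertools.permutations(range(len(other))), key=cost)
    return [other[j] for j in best]


def loss_barrier(a: Net, b: Net, data: Data, steps: int = 21) -> float:
    """max_t [L(t a + (1 - t) b) - (t L(a) + (1 - t) L(b))] over an even grid of t."""
    la, lb = mse(a, data), mse(b, data)
    grid = [i / (steps - 1) for i in range(steps)]
    return max(mse(lerp(a, b, t), data) - (t * la + (1 - t) * lb) for t in grid)


if __name__ == "__main__":
    net_a: Net = [((1.0, 0.0), 1.0), ((0.0, 1.0), -1.0)]
    net_b: Net = net_a[::-1]  # same function, hidden units permuted
    data: Data = [((1.0, 0.0), 1.0), ((0.0, 1.0), -1.0)]
    print(loss_barrier(net_a, net_b, data))                # 1.0
    print(loss_barrier(net_a, align(net_a, net_b), data))  # ~0: aligned copies coincide
```

Practical implementations replace the brute-force search with a linear-assignment solver over a unit-similarity matrix, which scales to realistic widths.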

2. Mode Interpolation and Hallucination in Generative Diffusion Models

In diffusion-based generative models, mode interpolation manifests as the generation of samples outside the support of the true data distribution, produced by interpolating between distinct data modes (Aithal et al., 2024). Formally, for a true data density $q(x)$, the set of points with $q(x) > \epsilon$ is its $\epsilon$-support, and a hallucination is a sample $x$ with $q(x) \leq \epsilon$.
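To make the definition concrete, the following sketch assumes a known, equal-weight 1D Gaussian mixture as the true density $q$; the mode locations, spread, and threshold are illustrative choices, not values from Aithal et al.

```python
import math
from typing import Sequence


def mixture_density(x: float, means: Sequence[float], std: float) -> float:
    """q(x) for an equal-weight 1D Gaussian mixture with a shared standard deviation."""
    norm = 1.0 / (std * math.sqrt(2.0 * math.pi))
    return sum(norm * math.exp(-0.5 * ((x - m) / std) ** 2) for m in means) / len(means)


def is_hallucination(x: float, means: Sequence[float], std: float, eps: float) -> bool:
    """True when x falls outside the eps-support, i.e. q(x) <= eps."""
    return mixture_density(x, means, std) <= eps


if __name__ == "__main__":
    modes, spread, eps = [-2.0, 2.0], 0.1, 1e-3
    print(is_hallucination(2.0, modes, spread, eps))  # False: on a mode
    print(is_hallucination(0.0, modes, spread, eps))  # True: midway between modes
```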

Diffusion models are trained with a mean-squared-error objective to predict the added Gaussian noise (also written $\epsilon$, not to be confused with the support threshold $\epsilon$ above). The MSE-optimal predictor is a conditional mean, so the learned denoiser effectively performs a smoothing (convex averaging) over multimodal true posteriors. For mixtures of discrete or well-separated modes (e.g., point masses or narrowly supported Gaussians), the model's prediction can therefore interpolate between modes: for $x^{(i)}, x^{(j)}$ in the data support and $\theta \in (0,1)$, the sampler may produce $x = \theta x^{(i)} + (1-\theta)x^{(j)}$ even though $x$ itself lies outside the original support, which constitutes a hallucination.
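The averaging effect can be seen in a toy case where the MSE-optimal predictor has a closed form: data uniform over a few point masses, noised as $x_t = \sqrt{\bar\alpha}\,x_0 + \sqrt{1-\bar\alpha}\,\epsilon$ (a standard variance-preserving parameterization, assumed here). The optimal clean-sample estimate is a softmax-weighted average of the modes, and the optimal noise prediction follows from it by linearity. This is an exact conditional mean, not a trained network; it only illustrates why averaging over modes yields in-between predictions. Function names are hypothetical.

```python
import math
from typing import Sequence


def optimal_x0(x_t: float, modes: Sequence[float], alpha_bar: float) -> float:
    """E[x0 | x_t] when x0 is uniform over the point masses `modes` and
    x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * noise, noise ~ N(0, 1)."""
    scale, var = math.sqrt(alpha_bar), 1.0 - alpha_bar
    logits = [-((x_t - scale * m) ** 2) / (2.0 * var) for m in modes]
    top = max(logits)  # subtract the max before exponentiating (stable softmax)
    weights = [math.exp(z - top) for z in logits]
    return sum(w * m for w, m in zip(weights, modes)) / sum(weights)


def optimal_noise(x_t: float, modes: Sequence[float], alpha_bar: float) -> float:
    """The matching MSE-optimal noise prediction, E[noise | x_t]."""
    x0_hat = optimal_x0(x_t, modes, alpha_bar)
    return (x_t - math.sqrt(alpha_bar) * x0_hat) / math.sqrt(1.0 - alpha_bar)


if __name__ == "__main__":
    modes = [-1.0, 1.0]
    print(optimal_x0(0.0, modes, 0.5))  # 0.0: the average of the two modes
    print(optimal_x0(3.0, modes, 0.5))  # close to 1.0: posterior is nearly one-hot
```

The estimate $0.0$ is not in the support $\{-1, 1\}$; a network that smooths this function keeps such in-between predictions.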

Empirical studies with 1D/2D Gaussian mixtures reveal that such interpolation fills the gaps between data modes, and that the prevalence of hallucinations falls but does not disappear with more training data or greater mode separation. In higher dimensions and with complex shapes, artifacts correspond to combinations never present in the original data.

The variance of the model's predictions along the reverse-sampling trajectory is a marker of hallucinations induced by mode interpolation: discarding high-variance samples during generation removes ~95% of hallucinations while retaining most in-support samples. This filtering is crucial for preventing distributional collapse in recursive training on synthetic data only (Aithal et al., 2024).
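A minimal sketch of variance-based filtering, assuming each sample's trajectory is recorded as the model's predicted clean sample at every reverse step (reduced to a scalar here for simplicity). The window `last_k` and the `threshold` are hypothetical knobs that would need tuning on held-out data; they are not values from the cited work.

```python
import statistics
from typing import List, Sequence


def trajectory_variance(x0_preds: Sequence[float], last_k: int) -> float:
    """Population variance of the predicted clean sample over the final
    `last_k` reverse-sampling steps."""
    return statistics.pvariance(x0_preds[-last_k:])


def filter_by_variance(
    trajectories: List[Sequence[float]], last_k: int, threshold: float
) -> List[int]:
    """Indices of trajectories whose late-stage variance is at most `threshold`."""
    return [
        i for i, traj in enumerate(trajectories)
        if trajectory_variance(traj, last_k) <= threshold
    ]


if __name__ == "__main__":
    stable = [0.98, 1.0, 1.01, 1.0]    # prediction settles on a mode
    unstable = [-0.9, 0.4, -0.2, 0.1]  # prediction keeps switching between modes
    print(filter_by_variance([stable, unstable], last_k=3, threshold=0.01))  # [0]
```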

3. Mode Interpolation in Reduced-Order Parametric Modeling

In parametric reduced-order modeling—especially in the context of proper orthogonal decomposition (POD)—mode interpolation refers to interpolating modal coefficients across parameter space to rapidly predict system responses at unseen parameters (Hardy et al., 2023).

Given full-order solutions $x(t;\mu)$ sampled at parameters $\{\mu_j\}$, POD yields orthonormal modes $\{\phi_i\}$ and associated time-dependent coefficients $a_i(t;\mu_j)$. The coefficients $a_i(t;\mu)$ are then interpolated to any new parameter $\mu^*$ using methods such as radial basis functions, splines, or Gaussian processes, and the reduced-order field is reconstructed as $x_\mathrm{ROM}(t;\mu^*) = \sum_{i=1}^r \phi_i\, a_i(t;\mu^*)$.
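The pipeline can be sketched with NumPy under simplifying assumptions: a single global POD basis computed from all snapshots, a scalar parameter sampled at increasing values, and piecewise-linear interpolation (`np.interp`) standing in for the RBF, spline, or Gaussian-process interpolants named above. `pod_basis` and `rom_predict` are hypothetical helpers.

```python
import numpy as np


def pod_basis(snapshots: np.ndarray, r: int) -> np.ndarray:
    """Leading r POD modes (as columns) of an (N, M) snapshot matrix, via thin SVD."""
    u, _, _ = np.linalg.svd(snapshots, full_matrices=False)
    return u[:, :r]


def rom_predict(mus: np.ndarray, fields: np.ndarray, mu_star: float, r: int) -> np.ndarray:
    """Predict x(t; mu_star) from full-order fields sampled at increasing `mus`.

    fields has shape (K, N, T): one space-time solution per sampled parameter.
    """
    k, n, t = fields.shape
    # One global basis from all snapshots, stacked column-wise: (N, K * T).
    snapshots = fields.transpose(1, 0, 2).reshape(n, k * t)
    phi = pod_basis(snapshots, r)                   # (N, r)
    coeffs = np.einsum("ni,knt->kit", phi, fields)  # a_i(t; mu_k): (K, r, T)
    # Piecewise-linear interpolation of each a_i(t; .) in mu.
    a_star = np.empty((r, t))
    for i in range(r):
        for j in range(t):
            a_star[i, j] = np.interp(mu_star, mus, coeffs[:, i, j])
    return phi @ a_star                             # x_ROM(t; mu_star): (N, T)


if __name__ == "__main__":
    s = np.linspace(0.0, np.pi, 20)
    time = np.linspace(0.0, 1.0, 5)
    a = np.outer(np.sin(s), np.cos(time))
    b = np.outer(np.sin(2 * s), 1.0 + time)
    mus = np.array([0.0, 1.0, 2.0])
    fields = np.stack([m * a + (1 - m) * b for m in mus])
    pred = rom_predict(mus, fields, 1.5, r=2)
    print(np.max(np.abs(pred - (1.5 * a - 0.5 * b))))  # tiny: rounding error only
```

In the usage example the field is affine in $\mu$ and lies in a two-dimensional subspace, so $r = 2$ and linear interpolation reproduce it to rounding error; real problems need a tuned $r$ and a smoother interpolant.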

The interpolation of modal coefficients tends to be robust: the low-order (high-energy) modes vary smoothly in parameter space and carry most of the solution energy, so errors from poorly interpolated higher-order (low-energy) modes contribute little to the reconstructed field. This decay of modal energy is what keeps the prediction error low, enabling very fast queries while maintaining high accuracy within the sampled parameter domain (Hardy et al., 2023).
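A common way to act on modal energy decay is to choose $r$ as the smallest rank whose modes capture a target fraction of the total energy $\sum_i \sigma_i^2$ (squared singular values of the snapshot matrix). The sketch below assumes that criterion; the 99.9% default is a widespread convention, not a value from Hardy et al., and `truncation_rank` is a hypothetical name.

```python
from typing import Sequence


def truncation_rank(singular_values: Sequence[float], energy: float = 0.999) -> int:
    """Smallest r whose leading modes capture `energy` of the total modal energy,
    taking each mode's energy as its squared singular value."""
    energies = [s * s for s in singular_values]
    total = sum(energies)
    captured = 0.0
    for r, e in enumerate(energies, start=1):
        captured += e
        if captured >= energy * total:
            return r
    return len(energies)


if __name__ == "__main__":
    print(truncation_rank([10.0, 3.0, 1.0, 0.1, 0.01]))  # 3
```

The criterion ties directly to error: by the Eckart–Young theorem, the squared Frobenius error of the best rank-$r$ approximation of the snapshot matrix equals the discarded energy $\sum_{i>r}\sigma_i^2$.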

4. Mode Interpolation in Dynamic Mode Decomposition and Manifold Geometry

Model reduction approaches for time-dependent PDEs that combine dynamic mode decomposition (DMD) with manifold-based interpolation also rely on mode interpolation (Hess et al., 2022). Here, the subspace spanned by the $r$ leading DMD modes at each parameter point (e.g., each Grashof number in Rayleigh–Bénard convection) is treated as a point on the Grassmann manifold $\mathrm{Gr}(N, r)$ of $r$-dimensional subspaces of an $N$-dimensional space.

This Riemannian interpolation works in a tangent space: the sampled subspaces are mapped to the tangent space at a chosen base point with the Grassmann logarithm, interpolated linearly there, and mapped back to the manifold with the Grassmann exponential. The reduced Koopman operators, which lie in $GL(r)$, are interpolated as well. Reconstructing the full Koopman operator at unsampled parameters in this way can produce dynamical behavior absent from every training sample, such as frequencies that occur at no sampled parameter.
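The log, interpolate, exp step can be sketched with the standard thin-SVD formulas for the Grassmann logarithm and exponential used in subspace interpolation for parametric ROMs. Assumptions: real orthonormal bases (complex DMD modes would need conjugate transposes), a scalar parameter with piecewise-linear interpolation between bracketing samples, and an invertible $U_0^\top U_i$ for every sample; interpolation of the reduced Koopman operators in $GL(r)$ is not shown. Function names are hypothetical.

```python
import numpy as np


def grassmann_log(u0: np.ndarray, u1: np.ndarray) -> np.ndarray:
    """Tangent vector at span(u0) pointing toward span(u1); both (N, r), orthonormal.
    Assumes u0.T @ u1 is invertible."""
    m = u0.T @ u1
    # Part of u1 orthogonal to span(u0), mapped through (u0^T u1)^{-1}.
    lift = (u1 - u0 @ m) @ np.linalg.inv(m)
    q, s, vt = np.linalg.svd(lift, full_matrices=False)
    return q @ np.diag(np.arctan(s)) @ vt


def grassmann_exp(u0: np.ndarray, gamma: np.ndarray) -> np.ndarray:
    """Map a tangent vector at span(u0) back to an (N, r) orthonormal basis."""
    q, s, vt = np.linalg.svd(gamma, full_matrices=False)
    return u0 @ vt.T @ np.diag(np.cos(s)) @ vt + q @ np.diag(np.sin(s)) @ vt


def interpolate_subspace(params: np.ndarray, bases: list, base: int,
                         mu_star: float) -> np.ndarray:
    """Log-map every sampled basis to the tangent space at bases[base],
    interpolate linearly between the two bracketing samples, then exp-map back."""
    u0 = bases[base]
    logs = [grassmann_log(u0, u) for u in bases]
    k = int(np.clip(np.searchsorted(params, mu_star) - 1, 0, len(params) - 2))
    w = (mu_star - params[k]) / (params[k + 1] - params[k])
    return grassmann_exp(u0, (1.0 - w) * logs[k] + w * logs[k + 1])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    bases = [np.linalg.qr(rng.standard_normal((6, 2)))[0] for _ in range(3)]
    params = np.array([0.0, 1.0, 2.0])
    u_mid = interpolate_subspace(params, bases, base=1, mu_star=0.5)
    print(np.allclose(u_mid.T @ u_mid, np.eye(2)))  # True: result is orthonormal
```

Comparing projectors $UU^\top$ rather than bases is the right test for equality of subspaces, since a basis is only determined up to an $r \times r$ rotation.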

Numerical experiments confirm that, with appropriately small parameter sample spacing, the interpolated models achieve low mean and maximum errors, robustly approximating complex phenomena such as phase transitions and frequency bifurcations (Hess et al., 2022).

5. Sharpness and Straggler-Layer Effects in Neural Mode Interpolation

In neural network mode interpolation, the geometry of the interpolation path is shaped by sharpness, both global and layer-wise. Adaptive average-case sharpness is the expected increase in loss when each weight is perturbed by Gaussian noise whose scale is proportional to that weight's magnitude. Global sharpness does not reliably predict OOD generalization or the success of RFT in architectures such as CLIP, but layer-wise sharpness is critical (Abdollahpoorrostam, 2024).
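The definition above translates into a short Monte Carlo estimate. This sketch assumes a caller-supplied loss over dict-of-lists parameters; the perturbation scale `rho`, the sample count, and restricting the noise to selected layers (to obtain layer-wise sharpness) are illustrative choices rather than the settings of Abdollahpoorrostam (2024), and `adaptive_sharpness` is a hypothetical name.

```python
import random
from typing import Callable, Dict, List, Optional, Sequence

Params = Dict[str, List[float]]


def adaptive_sharpness(
    loss: Callable[[Params], float],
    params: Params,
    rho: float = 0.05,
    n_samples: int = 64,
    layers: Optional[Sequence[str]] = None,
    seed: int = 0,
) -> float:
    """Monte Carlo estimate of E[L(w + rho * |w| * z)] - L(w), z ~ N(0, I).

    If `layers` is given, only those layers are perturbed (layer-wise sharpness).
    """
    rng = random.Random(seed)
    targets = set(params if layers is None else layers)
    base = loss(params)
    total = 0.0
    for _ in range(n_samples):
        noisy = {
            name: ([w + rho * abs(w) * rng.gauss(0.0, 1.0) for w in ws]
                   if name in targets else list(ws))
            for name, ws in params.items()
        }
        total += loss(noisy)
    return total / n_samples - base


def quadratic_loss(p: Params) -> float:
    """Toy loss that depends only on layer "a" (layer "b" is ignored)."""
    return sum(w * w for w in p["a"])


if __name__ == "__main__":
    weights = {"a": [1.0, 2.0], "b": [3.0]}
    # Expected value is rho**2 * (1 + 4) = 1.25, up to Monte Carlo noise.
    print(adaptive_sharpness(quadratic_loss, weights, rho=0.5, n_samples=4000, layers=["a"]))
    print(adaptive_sharpness(quadratic_loss, weights, layers=["b"]))  # 0.0
```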

A "straggler layer" is a layer whose sharpness stays pinned near zero along the interpolation path. Such pinning correlates with the OOD failure mode in which no $\alpha$ yields a gain over the zero-shot baseline. Inducing $50\%$ random sparsity in the identified straggler layers before interpolation restores the possibility of successful RFT, as shown by improved OOD accuracy in previously failing cases. Monitoring layer-wise sharpness along the path, and intervening on straggler layers when they appear, is therefore a practical strategy for RFT (Abdollahpoorrostam, 2024).
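The two practical steps, flagging straggler layers and sparsifying them, can be sketched as follows. Assumptions: layer-wise sharpness has already been measured at a grid of $\alpha$ values, the tolerance `tol` and the layer names are hypothetical, and which endpoint's weights to sparsify before interpolation is left to the caller because it is not specified here.

```python
import random
from typing import Dict, List, Sequence

Params = Dict[str, List[float]]


def find_stragglers(
    sharpness_along_path: Dict[str, Sequence[float]], tol: float = 1e-3
) -> List[str]:
    """Layers whose sharpness stays within `tol` of zero at every sampled alpha."""
    return [
        name for name, values in sharpness_along_path.items()
        if all(abs(v) <= tol for v in values)
    ]


def random_sparsify(params: Params, layers: Sequence[str],
                    fraction: float = 0.5, seed: int = 0) -> Params:
    """Return a copy of `params` with a random `fraction` of the weights
    in each listed layer set to zero; other layers are copied unchanged."""
    rng = random.Random(seed)
    out = {name: list(ws) for name, ws in params.items()}
    for name in layers:
        k = int(round(fraction * len(out[name])))
        for idx in rng.sample(range(len(out[name])), k):
            out[name][idx] = 0.0
    return out


if __name__ == "__main__":
    # Hypothetical sharpness values at three alphas for two layers.
    path_sharpness = {"layer_a": [0.0, 2e-4, 1e-4], "layer_b": [0.01, 0.3, 0.05]}
    stragglers = find_stragglers(path_sharpness)
    print(stragglers)  # ['layer_a']
    sparse = random_sparsify({"layer_a": [1.0] * 8, "layer_b": [1.0] * 4}, stragglers)
    print(sparse["layer_a"].count(0.0))  # 4
```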

6. Mode Interpolation Artifacts in Signal Decomposition

In time-frequency analysis and signal processing, interpolation has historically introduced artifacts into empirical mode decomposition (EMD) and variational mode decomposition (VMD), whose modes are neither orthogonal nor strictly band-limited and, in the case of EMD, are built from locally varying spline interpolation of signal extrema. The orthogonal mode decomposition method instead defines a unique, fixed interpolation-function space $\mathcal{I}$ of finite-length discrete signals and extracts modes within it as orthogonal projections onto narrow-band subspaces (Li et al., 2024).

This procedure yields mathematically unique, orthogonal, artifact-free decompositions that are immune to the mode mixing and boundary effects caused by interpolation in less rigid frameworks. Because modes are constructed as projections rather than from interpolated envelopes, the end effects and spurious frequency artifacts common to EMD are eliminated (Li et al., 2024).
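Why projection-based modes avoid interpolation artifacts can be illustrated with a much simpler stand-in. This sketch is not Li et al.'s construction: it swaps in the DFT basis for the interpolation-function space $\mathcal{I}$ and uses caller-chosen band edges given as rfft bin indices (an assumption), and `band_modes` is a hypothetical name. Because the bands partition the spectrum, the modes are orthogonal by Parseval's theorem and sum exactly to the input, with no envelope fitting and hence no spline end effects.

```python
from typing import List, Sequence

import numpy as np


def band_modes(signal: np.ndarray, edges: Sequence[int]) -> List[np.ndarray]:
    """Split a real signal into modes by orthogonal projection onto disjoint
    DFT frequency bands; band k keeps rfft bins edges[k] .. edges[k+1] - 1.

    If edges run from 0 to len(signal) // 2 + 1, the bands partition the
    spectrum, so the modes are mutually orthogonal and sum to the signal.
    """
    n = len(signal)
    spectrum = np.fft.rfft(signal)
    modes = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        masked = np.zeros_like(spectrum)
        masked[lo:hi] = spectrum[lo:hi]
        modes.append(np.fft.irfft(masked, n=n))
    return modes


if __name__ == "__main__":
    n = 64
    k = np.arange(n)
    x = np.sin(2 * np.pi * 3 * k / n) + 0.5 * np.cos(2 * np.pi * 20 * k / n)
    low, high = band_modes(x, [0, 10, n // 2 + 1])
    print(np.allclose(low + high, x), abs(np.dot(low, high)) < 1e-9)  # True True
    print(np.allclose(low, np.sin(2 * np.pi * 3 * k / n)))          # True
```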

7. Summary Table: Principal Contexts of Mode Interpolation

Domain	Mode Interpolation Mechanism	Canonical Effects / Issues
Deep Networks (CLIP, LMC)	Linear weight-space paths, possibly after permutation	OOD gains/failures, loss barriers, sharpness
Diffusion Models	Sample-space interpolation between data modes	Hallucinated outputs (artifacts)
ROMs (POD, DMD, manifolds)	Modal coefficient interpolation, tangent-space manifold paths	Rapid parametric predictions, new behaviors
Signal Decomposition	Orthogonal projection in interpolation-function space	Artifact-free, orthogonal narrow-band modes

In all these settings, mode interpolation acts as a unifying principle with direct consequences for generalization, signal representation, generative fidelity, and surrogate modeling. The mathematical and algorithmic formalizations are domain-specific, but the underlying phenomenon, interpolating between representations to obtain new behavior that is often useful and sometimes pathological, recurs in every one of them (Abdollahpoorrostam, 2024, Aithal et al., 2024, Li et al., 2024, Zhan et al., 8 Mar 2025, Hardy et al., 2023, Hess et al., 2022).