Space-Time Noise Inversion Methods

Updated 4 July 2026

Space-Time Noise Inversion is a framework for recovering spatiotemporal unknowns (sources, parameters, noise statistics, or latent noise maps) from data in which temporal shifts and spatial correlations are intrinsic to the inverse problem.
It draws on methods such as extended-source inversion, fixed-point reconstruction, and tensorization to stabilize the inversion and control noise amplification.
Applications span acoustic, subdiffusive, and stochastic PDEs, ambient-noise interferometry, and diffusion models for image and video editing and for detecting generated video.

Space-time noise inversion denotes a class of inverse and reconstruction problems in which the unknown is distributed over space and time, or in which noise itself becomes the object of inversion, estimation, transport, or control across spatial and temporal degrees of freedom. In the literature, the term spans several technically distinct settings: inverse source recovery in wave and subdiffusion equations, stochastic PDE identification under space-time Gaussian noise, ambient-noise interferometry with correlated sources, microlocal analysis of how additive noise is transformed by inversion, and diffusion-model inversion methods that recover or refine time-indexed latent noise maps for image and video editing (Symes, 2021). A common structural feature is that inversion is not performed on a static parameter alone; rather, one must handle temporal shifts, temporal support, time-stepping trajectories, or temporal correlations together with spatial structure, and noise is either amplified, regularized, or explicitly encoded as part of the solution.

1. Foundational inverse-problem formulations

A mathematically complete prototype appears in an acoustic transmission inverse problem in which one seeks both the slowness $m$ and the transient waveform $w$ from a measured trace at a receiver. The causal acoustic pressure field satisfies

$m^2 \partial_t^2 p(x,t) - \Delta p(x,t) = w(t)\,\delta(x-X_s), \qquad p(x,t)=0\ \text{for } t<0,$

and the trace at $X_r$ is

$F[m]w(t) = p(X_r,t) = \frac{1}{4\pi r}\,w(t-mr), \qquad r = |X_r-X_s|.$

The unknowns are therefore the slowness $m\in M=(m_{\min},m_{\max})$ and the source waveform $w\in L^2(\mathbb R)$, with prior compact support

$W_\lambda = \{ w\in L^2(\mathbb R): \operatorname{supp}(w)\subset[-\lambda,\lambda]\}$

and noisy data constrained by

$\|F_\lambda[m]w - d\| \le \epsilon \|d\|,\qquad \epsilon\in[0,1)$

(Symes, 2021).
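The forward map is a delayed copy of the waveform attenuated by $1/(4\pi r)$. The minimal NumPy sketch below evaluates it on a time grid and builds noisy data at a prescribed relative noise level. The smooth bump used as a member of $W_\lambda$, the values of $r$, $m$, $\lambda$, $\epsilon$, and the Gaussian noise model are illustrative assumptions, not values from the paper.

```python
import numpy as np


def bump(t, lam):
    """Smooth pulse supported in [-lam, lam], i.e. a member of W_lambda."""
    s = np.clip(t / lam, -1.0, 1.0)
    return (1.0 - s**2) ** 2


def forward_trace(w, t, m, r):
    """Trace F[m]w(t) = w(t - m r) / (4 pi r) for a waveform w given as a callable."""
    return w(t - m * r) / (4.0 * np.pi * r)


def relative_misfit(pred, d):
    """||pred - d|| / ||d|| on a uniform grid (the sample spacing cancels)."""
    return np.linalg.norm(pred - d) / np.linalg.norm(d)


# Illustrative values (assumptions): r in m, slowness in s/m, lam in s.
r, m_true, lam, eps = 1000.0, 5e-4, 0.05, 0.1
t = np.linspace(0.0, 1.0, 10001)


def source(tt):
    return bump(tt, lam)


clean = forward_trace(source, t, m_true, r)

# Gaussian noise rescaled to a fraction eps of the clean-trace norm.
rng = np.random.default_rng(0)
noise = rng.standard_normal(t.size)
noise *= eps * np.linalg.norm(clean) / np.linalg.norm(noise)
d = clean + noise

print(clean.max(), 1.0 / (4.0 * np.pi * r))   # peak of the delayed pulse
print(relative_misfit(clean, d))               # close to eps
```

The clean trace is nonzero only on $[mr-\lambda,\,mr+\lambda]$; this support property is what the cycle-skipping argument below relies on.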

In a distinct parabolic setting, a space-time dependent source factor is reconstructed in a subdiffusion equation on a cylindrical domain. The forward model is

$\partial_t^\alpha u(t,x',x_d) - \Delta u(t,x',x_d) = \mathcal{F}(t,x',x_d),$

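with a separable source, in which $\mathcal{F}$ is the product of an unknown space-time dependent factor and a second factor, and the inverse problem is to recover the space-time factor from the trace of the solution measured on the lateral boundary of the cylinder.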

Here the unknown is explicitly space-time dependent, and the measured trace must be differentiated in time and tangential space during reconstruction, which the paper identifies as a central source of noise amplification (Cen et al., 7 May 2026).

Another canonical formulation arises in stochastic PDEs. For the stochastic heat equation

$\partial_t u(t,x) = \Delta u(t,x) + u(t,x)\,\dot W(t,x), \qquad u(0,x) = u_0(x),$

the inverse unknown is not a deterministic coefficient but the covariance operator $Q$ of a random multiplicative potential. The associated random potential $\dot W$ is described as space-time Gaussian noise that is spatially colored and temporally white: $\mathbb{E}\big[\dot W(t,x)\,\dot W(s,y)\big] = \delta(t-s)\,q(x,y)$, with $q$ the kernel of $Q$. The data consist of correlation operators at a final time, generated from a complete orthonormal basis of initial conditions (Li et al., 7 Feb 2025).

These formulations show that “space-time noise inversion” is not restricted to one PDE class. It includes hyperbolic, fractional diffusive, and stochastic evolutions, provided that temporal structure and noise handling are intrinsic to the inverse map.

2. Stabilization, fixed-point reconstruction, and explicit error control

A central issue in space-time inverse problems is the instability created by temporal misalignment. In the acoustic model, the standard least-squares objective

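$J_{\mathrm{LS}}[m,w] = \tfrac12\,\|F[m]w - d\|^2, \qquad w\in W_\lambda,$

exhibits cycle skipping. For noise-free data $d = F[m_*]w_*$ with $w_*\in W_\lambda$, the data are supported in $[m_*r-\lambda,\,m_*r+\lambda]$ and every prediction $F[m]w$ in $[mr-\lambda,\,mr+\lambda]$. If

$|m - m_*|\,r > 2\lambda,$

these intervals are disjoint, so the shifted traces are orthogonal and

$J_{\mathrm{LS}}[m,w] = \tfrac12\,\|F[m]w\|^2 + \tfrac12\,\|d\|^2 \ \ge\ \tfrac12\,\|d\|^2, \qquad \text{with equality at } w=0.$

The reduced objective $\min_{w\in W_\lambda} J_{\mathrm{LS}}[m,w] = \tfrac12\|d\|^2$ is therefore constant on this whole range of slownesses, so every such $m$ is a local minimizer. The pathology is a concrete support-overlap failure induced by the wrong travel time (Symes, 2021).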
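The flat region can be reproduced numerically. In this sketch (an illustrative bump waveform and parameter values, which are assumptions, with noise-free data $d = F[m_*]w_*$), the inner minimization over $W_\lambda$ has a closed form: after the change of variables $s = t - mr$, the best $w\in W_\lambda$ equals $4\pi r\,d(s+mr)$ on $[-\lambda,\lambda]$ and vanishes outside, so the residual is just the data energy that falls outside that window.

```python
import numpy as np


def bump(t, lam):
    """Smooth pulse supported in [-lam, lam] (stand-in for the true waveform)."""
    s = np.clip(t / lam, -1.0, 1.0)
    return (1.0 - s**2) ** 2


def reduced_ls(m, m_true, r, lam, s, ds):
    """min over w in W_lambda of 0.5 * ||F[m]w - d||^2 for noise-free data.

    With s = t - m r, 4 pi r d(s + m r) = w_true(s - (m_true - m) r); the optimal
    w copies this shifted pulse on [-lam, lam], leaving only the part outside.
    """
    g = bump(s - (m_true - m) * r, lam)
    outside = np.abs(s) > lam
    return 0.5 * np.sum(g[outside] ** 2) * ds / (4.0 * np.pi * r) ** 2


r, m_true, lam = 1000.0, 5e-4, 0.05            # illustrative values (assumptions)
s = np.linspace(-1.0, 1.0, 20001)
ds = s[1] - s[0]
half_data_norm = 0.5 * np.sum(bump(s, lam) ** 2) * ds / (4.0 * np.pi * r) ** 2

for dm in (0.0, 5e-5, 2e-4, 3e-4):              # |dm| r > 2 lam once dm > 1e-4
    print(dm, reduced_ls(m_true + dm, m_true, r, lam, s, ds) / half_data_norm)
```

Once $|m-m_*|\,r$ exceeds $2\lambda$, the printed ratio stays at one, so a gradient-based update receives no information about which direction to move $m$.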

The stabilization mechanism in that setting is extended-source inversion. The support constraint is replaced by a weighted quadratic penalty,

$J_\alpha[m,w] = \tfrac12\,\|F[m]w - d\|^2 + \tfrac{\alpha^2}{2}\,\|Aw\|^2,$

with multiplication operator

$(Aw)(t) = t\,w(t).$

Using variable projection, the inner minimizer satisfies the normal equation

$\big(F[m]^{T}F[m] + \alpha^2 A^{T}A\big)\,w = F[m]^{T}d.$

Since $F[m]^{T}F[m] = (4\pi r)^{-2}I$ and $(F[m]^{T}d)(t) = d(t+mr)/(4\pi r)$, this has the explicit solution

$w_\alpha[m](t) = \dfrac{4\pi r\; d(t+mr)}{1 + (4\pi r\,\alpha\,t)^2}.$

For noise-free data and the specific parameter choice made in the paper, any stationary point of the reduced objective $\tilde J_\alpha[m] = J_\alpha\big[m, w_\alpha[m]\big]$ satisfies an explicit bound on the slowness error $|m-m_*|$. For noisy data whose noise level is a fraction $\epsilon$ of $\|d\|$, the paper gives a corresponding bound that also depends on $\epsilon$.

The qualitative conclusion stated there is that the slowness error grows at most linearly with source support size and only mildly with relative noise (Symes, 2021).
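A sketch of why the extended formulation behaves better, under the assumptions that the penalty is $\tfrac{\alpha^2}{2}\|Aw\|^2$ with $(Aw)(t) = t\,w(t)$ and that $w$ ranges over all of $L^2(\mathbb R)$. Substituting the closed-form inner solution $w_\alpha[m](t) = 4\pi r\,d(t+mr)/(1+(4\pi r\alpha t)^2)$ and writing $\beta = 4\pi r\alpha$ and $g(s) = 4\pi r\,d(s+mr)$, the reduced objective collapses to $\frac{1}{2(4\pi r)^2}\int g(s)^2\,\frac{\beta^2 s^2}{1+\beta^2 s^2}\,ds$, which penalizes data energy away from $s=0$ smoothly instead of with a hard cutoff. The bump waveform and all numeric values are illustrative assumptions.

```python
import numpy as np


def bump(t, lam):
    """Smooth pulse supported in [-lam, lam] (stand-in for the true waveform)."""
    s = np.clip(t / lam, -1.0, 1.0)
    return (1.0 - s**2) ** 2


def reduced_extended(m, m_true, r, lam, beta, s, ds):
    """Reduced extended-source objective for noise-free data, beta = 4 pi r alpha.

    Assumes the penalty 0.5 * alpha^2 * ||t w||^2 and an unconstrained w.
    """
    g = bump(s - (m_true - m) * r, lam)            # 4 pi r d(s + m r)
    weight = (beta * s) ** 2 / (1.0 + (beta * s) ** 2)
    return 0.5 * np.sum(g**2 * weight) * ds / (4.0 * np.pi * r) ** 2


r, m_true, lam, beta = 1000.0, 5e-4, 0.05, 20.0    # illustrative values
s = np.linspace(-1.0, 1.0, 20001)
ds = s[1] - s[0]
m_grid = m_true + np.linspace(-3e-4, 3e-4, 61)     # m_grid[30] is m_true
J = np.array([reduced_extended(m, m_true, r, lam, beta, s, ds) for m in m_grid])

m_best = m_grid[np.argmin(J)]
print(m_best, m_true)
print(J[-1] > J[45] > J[30])   # still increasing beyond |m - m_true| = 2 lam / r
```

Unlike the support-constrained least-squares objective, which is constant once $|m-m_*|\,r > 2\lambda$, this reduced objective keeps increasing with $|m-m_*|$, so in this toy its minimizer is located without cycle skipping.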

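In the subdiffusion problem, the reconstruction is reformulated as a fixed point. Combining the governing equation with the measured trace, which enters through its time-fractional and tangential derivatives, and with an auxiliary condition stated in the paper, the exact source becomes a fixed point of an explicit map built from these differentiated data.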

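The paper then constructs a fully discrete fixed-point method using Galerkin FEM in space and backward-Euler convolution quadrature in time, and proves that the discrete map is a contraction in a weighted discrete norm. Consequently, by the Banach fixed-point theorem, the discrete problem has a unique fixed point and the fixed-point iteration converges to it geometrically from any starting guess.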
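The role of the weighted norm can be illustrated on a toy causal map. This sketch is not the paper's operator: it assumes a hypothetical affine map $f\mapsto b + Kf$ in which $K$ is a lower-triangular (Volterra-type, i.e. causal in time) quadrature matrix, and an exponentially time-weighted max norm, a common choice for such maps that we assume here. $K$ has norm $3$ in the plain max norm, yet it is a contraction in the weighted norm, so Picard iteration converges geometrically.

```python
import numpy as np

n, tau, k_const, sigma = 100, 0.01, 3.0, 10.0     # hypothetical toy parameters
t = tau * np.arange(1, n + 1)

# Hypothetical causal map: (K f)_i = tau * k_const * sum_{j <= i} f_j.
K = tau * k_const * np.tril(np.ones((n, n)))
b = np.sin(2.0 * np.pi * t)

# Weighted max norm ||f||_w = max_i exp(-sigma t_i) |f_i|. The induced operator
# norm of K is the max row sum of W K W^{-1}, with W = diag(exp(-sigma t)).
W = np.diag(np.exp(-sigma * t))
W_inv = np.diag(np.exp(sigma * t))
plain_norm = np.abs(K).sum(axis=1).max()
q = np.abs(W @ K @ W_inv).sum(axis=1).max()


def weighted_norm(f):
    return np.max(np.exp(-sigma * t) * np.abs(f))


f_star = np.linalg.solve(np.eye(n) - K, b)        # exact fixed point of f = b + K f
f = np.zeros(n)
errors = []
for _ in range(40):
    f = b + K @ f                                  # Picard (fixed-point) iteration
    errors.append(weighted_norm(f - f_star))

print(plain_norm, q)       # plain norm 3, weighted contraction factor below 1
print(errors[:3], errors[-1])
```

Each iteration shrinks the weighted error by at least the factor $q$, which is the geometric convergence guaranteed by a contraction.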

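Its a priori error estimate makes the role of noise amplification explicit: in addition to the discretization error, the bound contains two terms proportional to the noise level. One is attributed to amplification by fractional time differentiation of the noisy trace, the other to amplification by the discrete spatial Laplacian, and both grow as the time step and mesh size are refined (Cen et al., 7 May 2026).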
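Why fractional differentiation of noisy data is delicate can be seen directly from backward-Euler convolution quadrature, whose weights are the Taylor coefficients of $(1-\zeta)^\alpha$ scaled by $\tau^{-\alpha}$. The sketch below uses illustrative values of $\alpha$, the step sizes, and the noise level, and a bounded sawtooth perturbation $\delta(-1)^n$ chosen as a simple example rather than the paper's noise model. For this perturbation the discrete derivative has magnitude close to $2^\alpha\tau^{-\alpha}\delta$, so refining the time step increases the noise contribution.

```python
import numpy as np


def cq_weights(alpha, n):
    """Coefficients of (1 - z)^alpha up to z^n (backward-Euler CQ, before tau^-alpha)."""
    w = np.empty(n + 1)
    w[0] = 1.0
    for j in range(1, n + 1):
        w[j] = w[j - 1] * (j - 1 - alpha) / j
    return w


def cq_fractional_derivative(v, alpha, tau):
    """(d_tau^alpha v)_k = tau^-alpha * sum_{j=0}^{k} w_j v_{k-j}."""
    w = cq_weights(alpha, len(v) - 1)
    return np.convolve(w, v)[: len(v)] / tau**alpha


alpha, delta, T = 0.5, 1e-3, 1.0                  # illustrative values
results = []
for n_steps in (100, 400):
    tau = T / n_steps
    noise = delta * (-1.0) ** np.arange(n_steps + 1)   # bounded sawtooth error
    d_noise = cq_fractional_derivative(noise, alpha, tau)
    results.append((tau, abs(d_noise[-1]) / delta))
    print(tau, results[-1][1], 2**alpha * tau**-alpha)
```

Reducing $\tau$ by a factor of four multiplies the amplification by $4^\alpha = 2$ here, which is the trade-off between discretization error and noise that the error estimate makes explicit.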

Taken together, these results establish a recurring principle: space-time inversion becomes tractable when the reconstruction operator is modified so that temporal misfit is relaxed or differentiated data are embedded in a contractive or penalized scheme with explicit noise-dependent bounds.

3. Stochastic and operator-theoretic identification under space-time noise

In the stochastic heat equation with multiplicative space-time Gaussian noise, the object of inversion is the covariance operator $Q$, rather than a deterministic source. The decisive step is tensorization. For the mild solution $u$, the correlation kernel is

$\rho(t,x,y) = \mathbb{E}\big[u(t,x)\,u(t,y)\big].$

The paper introduces a symmetric tensor-product space in which $\rho(t,\cdot,\cdot)$ lives, and shows that the correlation obeys a closed deterministic evolution,

$\partial_t \rho = (\Delta_x + \Delta_y)\,\rho + \mathcal{Q}\rho,$

where $\mathcal{Q}$ packages the covariance information into an operator on the tensor space; for Itô noise it acts as multiplication by the covariance kernel $q(x,y)$ (Li et al., 7 Feb 2025).
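A finite-dimensional analogue makes the closure visible. Assume an Itô system $du = Au\,dt + u\odot dW$, with a finite-difference Laplacian $A$ standing in for $\Delta$ and spatially correlated increments $dW$ of covariance $Q\,dt$. The second moment $P = \mathbb{E}[uu^{T}]$ of the Euler-Maruyama scheme then obeys the deterministic recursion $P \leftarrow (I+A\,dt)P(I+A\,dt)^{T} + dt\,(Q\odot P)$, a discrete counterpart of $\dot P = AP + PA^{T} + Q\odot P$, where $\odot$ is the entrywise product. The sketch compares this recursion with a Monte Carlo estimate; the grid size, covariance, and time step are illustrative assumptions.

```python
import numpy as np

n, dt, n_steps, n_samples = 4, 1e-3, 200, 20000   # illustrative values
h = 1.0 / (n + 1)
x = h * np.arange(1, n + 1)

# Dirichlet finite-difference Laplacian and a spatially colored noise covariance.
A = (np.diag(-2.0 * np.ones(n)) + np.diag(np.ones(n - 1), 1)
     + np.diag(np.ones(n - 1), -1)) / h**2
Q = 0.5 * np.exp(-np.abs(x[:, None] - x[None, :]) / 0.3)
u0 = np.sin(np.pi * x)
M = np.eye(n) + dt * A


def moment_recursion(P0):
    """Deterministic evolution of E[u u^T] for the Euler-Maruyama scheme."""
    P = P0.copy()
    for _ in range(n_steps):
        P = M @ P @ M.T + dt * (Q * P)     # Q * P is the entrywise product
    return P


def monte_carlo(u0, seed=0):
    """Sample estimate of E[u u^T] at the final step."""
    rng = np.random.default_rng(seed)
    L = np.linalg.cholesky(Q)
    U = np.tile(u0, (n_samples, 1))         # one sample path per row
    for _ in range(n_steps):
        dW = np.sqrt(dt) * rng.standard_normal((n_samples, n)) @ L.T
        U = U @ M.T + U * dW                # u <- (I + A dt) u + u * dW
    return U.T @ U / n_samples


P_exact = moment_recursion(np.outer(u0, u0))
P_mc = monte_carlo(u0)
rel_err = np.linalg.norm(P_mc - P_exact) / np.linalg.norm(P_exact)
print(rel_err)   # small, limited by Monte Carlo sampling error
```

The randomness disappears entirely from `moment_recursion`, which is the finite-dimensional version of converting the SPDE into a deterministic evolution for its correlations.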

The data are built from a polarization identity. For Laplacian eigenfunctions $\{\phi_k\}$, let $u_k$ denote the solution started from $\phi_k$ and $u_{j\pm k}$ the solution started from $\phi_j\pm\phi_k$, all driven by the same noise. Because the equation is linear in $u$,

$\tfrac14\Big(\mathbb{E}\big[u_{j+k}(T)\otimes u_{j+k}(T)\big] - \mathbb{E}\big[u_{j-k}(T)\otimes u_{j-k}(T)\big]\Big) = \tfrac12\,\mathbb{E}\big[u_j(T)\otimes u_k(T) + u_k(T)\otimes u_j(T)\big],$

which extracts cross-correlations from autocorrelation data. The main identifiability statement is that, under a technical assumption stated in the paper, the covariance operator $Q$ is uniquely determined by the collection of these final-time correlations over all index pairs $(j,k)$, at any fixed time $T>0$ (Li et al., 7 Feb 2025).
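Because the second-moment evolution is linear in the initial second moment, the polarization step can be checked exactly in a finite-dimensional analogue. In this sketch the Itô model $du = Au\,dt + u\odot dW$, the grid, and the covariance are illustrative assumptions, and eigenvectors of the discrete Laplacian play the role of $\phi_j$. Correlations started from $\phi_j\pm\phi_k$ combine into the correlation started from the symmetrized cross term $\tfrac12(\phi_j\phi_k^{T}+\phi_k\phi_j^{T})$.

```python
import numpy as np

n, dt, n_steps = 6, 1e-3, 300                        # illustrative values
h = 1.0 / (n + 1)
x = h * np.arange(1, n + 1)
A = (np.diag(-2.0 * np.ones(n)) + np.diag(np.ones(n - 1), 1)
     + np.diag(np.ones(n - 1), -1)) / h**2
Q = 0.5 * np.exp(-np.abs(x[:, None] - x[None, :]) / 0.3)
M = np.eye(n) + dt * A


def final_correlation(P0):
    """E[u(T) u(T)^T] from E[u(0) u(0)^T] = P0 (Euler-Maruyama moment recursion)."""
    P = P0.copy()
    for _ in range(n_steps):
        P = M @ P @ M.T + dt * (Q * P)
    return P


_, phi = np.linalg.eigh(A)          # columns: discrete Laplacian eigenvectors


def cross_correlation(j, k):
    """Symmetrized final-time cross-correlation obtained by polarization."""
    a, b = phi[:, j], phi[:, k]
    plus = final_correlation(np.outer(a + b, a + b))
    minus = final_correlation(np.outer(a - b, a - b))
    return 0.25 * (plus - minus)


a, b = phi[:, 0], phi[:, 2]
direct = final_correlation(0.5 * (np.outer(a, b) + np.outer(b, a)))
print(np.max(np.abs(cross_correlation(0, 2) - direct)))
```

Only autocorrelation-type data (initial conditions $\phi_j\pm\phi_k$) enter `cross_correlation`, yet the result equals the cross-correlation propagated directly.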

This result is notable because the stochasticity is eliminated at the level of second moments: the random SPDE is converted into a deterministic semigroup identification problem on a tensor-product space. A plausible implication is that “noise inversion” here means inversion through noise statistics rather than inversion despite noise. The required data are strong—final-time correlations for a complete orthonormal basis of initial conditions—but the theorem is correspondingly exact.

The same operator-theoretic perspective appears in microlocal treatments of additive noise. For a linear inverse problem $Af = g$ with noisy data $g + n$ and parametrix $B$, the reconstruction noise $Bn$ is analyzed by semiclassical defect measures. If $\mu_n$ is the defect measure of the data noise, then for an elliptic Fourier integral operator $A$ whose canonical relation $C$ is a local diffeomorphism,

$\mu_{Bn} = \dfrac{C^{*}\mu_n}{\sigma_0(A^{*}A)},$

where $C^{*}\mu_n$ is the pullback of $\mu_n$ through $C$ and $\sigma_0(A^{*}A)$ is the principal symbol of the normal operator.

Thus inversion transports the phase-space power spectrum through the canonical map and weights it by the inverse normal-operator symbol (Stefanov et al., 2020). In this sense, space-time noise inversion is not only a reconstruction task but also a transport law for noise statistics under the inverse map.
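In the simplest translation-invariant case the canonical relation is the identity, so the transport law reduces to reweighting the noise power spectrum by $1/\sigma_0(A^{*}A)$. The sketch below assumes a hypothetical 1D Fourier multiplier $A$ with symbol $|\xi|^{-1/2}$, so that $\sigma_0(A^{*}A) = |\xi|^{-1}$ mimics the $1/|\xi|$ behaviour of the two-dimensional Radon normal operator. It inverts $A$ on white noise and checks that the averaged periodogram of the reconstructed noise grows like $\sigma^2|\xi|$. It illustrates only the reweighting, not the semiclassical limit or a nontrivial canonical map; all sizes are illustrative.

```python
import numpy as np

n, sigma, trials = 1024, 0.7, 400          # illustrative values
rng = np.random.default_rng(0)
xi = np.fft.fftfreq(n, d=1.0 / n)          # integer frequencies


def inverse_symbol(freq):
    """Symbol of B = A^{-1} for the hypothetical multiplier A with symbol |xi|^{-1/2}."""
    return np.sqrt(np.abs(freq))           # set to 0 at xi = 0 (no DC inversion)


power = np.zeros(n)
for _ in range(trials):
    noise = sigma * rng.standard_normal(n)             # i.i.d. data noise
    recon_hat = inverse_symbol(xi) * np.fft.fft(noise)
    power += np.abs(recon_hat) ** 2 / n                # periodogram of B n
power /= trials

pos = slice(1, n // 2)                                 # positive frequencies
ratio = power[pos] / (sigma**2 * np.abs(xi[pos]))      # should hover near 1
slope = np.polyfit(np.log(xi[pos]), np.log(power[pos]), 1)[0]
print(ratio.mean(), slope)                             # both close to 1
```

The white input spectrum leaves with weight $|\xi| = 1/\sigma_0(A^{*}A)$: the reconstruction has more noise power at high frequencies even though the data noise was flat.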

4. Correlated-noise interferometry and microlocal transformation of noise

In ambient-noise seismic interferometry, the ideal objective is to recover an inter-receiver Green’s function from cross-correlated observations. The recorded field is

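$\hat u(x,\omega) = \int \hat G(x,y,\omega)\,\hat s(y,\omega)\,dy,$

with Green's function $\hat G$ and source spectrum $\hat s(y,\omega)$, and the interferometric product is

$\hat C(\omega) = \hat u(x_1,\omega)\,\overline{\hat u(x_2,\omega)} = \iint \hat G(x_1,y,\omega)\,\overline{\hat G(x_2,y',\omega)}\;\hat s(y,\omega)\,\overline{\hat s(y',\omega)}\;dy\,dy'.$

Standard theory assumes spatially delta-correlated sources,

$\mathbb{E}\big[\hat s(y,\omega)\,\overline{\hat s(y',\omega)}\big] = S(\omega)\,\delta(y-y'),$

which yields the correct phase after stacking. The paper emphasizes that many real ambient sources, such as trains, highway traffic, and ocean waves, are inherently correlated both in space and time, so off-diagonal terms in the double integral survive and generate crosstalk (Ayala-Garcia et al., 2021).

To analyze that contamination, the interferometric quantity is split into a $y$-diagonal part and an off-diagonal part, $\hat C = \hat C_{\mathrm{diag}} + \hat C_{\mathrm{off}}$, where $\hat C_{\mathrm{off}}$ collects contributions from $y\neq y'$. For moving sources, Doppler effects further distort the phase; for a source moving with velocity component $v_r(t)$ toward the receiver in a medium with wave speed $c$, the instantaneous recorded frequency is

$\omega_{\mathrm{rec}}(t) = \dfrac{\omega_0}{1 - v_r(t)/c}.$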

The paper’s conclusion is explicit: correlated sources always perturb and sometimes obscure the phase one wishes to retrieve (Ayala-Garcia et al., 2021).
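The diagonal/off-diagonal split can be computed directly for a discretized line of sources. The sketch below assumes a homogeneous 3D medium with Green's function $e^{i\omega|x-y|/c}/(4\pi|x-y|)$, two receivers, and a source cross-spectral matrix $K_{kl} = \mathbb{E}[\hat s_k\overline{\hat s_l}]$; the geometry, frequency, and exponential correlation length are illustrative assumptions. With $K$ diagonal the off-diagonal term vanishes identically; with spatially correlated sources it does not, which is the crosstalk the paper describes.

```python
import numpy as np


def greens(x, ys, omega, c=1.0):
    """Free-space 3D Green's function from source points ys (rows) to receiver x."""
    dist = np.linalg.norm(ys - x, axis=1)
    return np.exp(1j * omega * dist / c) / (4.0 * np.pi * dist)


def interferometric_parts(x1, x2, ys, K, omega):
    """E[u(x1) conj(u(x2))] split into y = y' and y != y' contributions."""
    g1, g2 = greens(x1, ys, omega), greens(x2, ys, omega)
    total = g1 @ K @ np.conj(g2)
    diag = np.sum(g1 * np.diag(K) * np.conj(g2))
    return total, diag, total - diag


x1, x2 = np.array([0.0, 0.0, 0.0]), np.array([1.0, 0.0, 0.0])
ys = np.stack([np.linspace(-5.0, 5.0, 41), 2.0 * np.ones(41), np.zeros(41)], axis=1)
omega = 2.0 * np.pi                                   # illustrative frequency

K_white = np.eye(len(ys))                             # delta-correlated sources
dist_yy = np.abs(ys[:, None, 0] - ys[None, :, 0])
K_corr = np.exp(-dist_yy / 1.0)                       # spatially correlated sources

_, _, off_white = interferometric_parts(x1, x2, ys, K_white, omega)
total_c, diag_c, off_corr = interferometric_parts(x1, x2, ys, K_corr, omega)
print(abs(off_white), abs(off_corr), abs(diag_c))
```

Only the off-diagonal term depends on the correlation structure of the sources, which is why a method that suppresses it does not need to know the source kernel.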

Its mitigation strategy is random windowing. Windowed recordings are cross-correlated over many random short windows, producing

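$\hat C^{\mathrm{rw}}(\omega) = \dfrac1N\sum_{n=1}^{N} \hat u_n(x_1,\omega)\,\overline{\hat u_n(x_2,\omega)},$

where $\hat u_n$ is the spectrum of the $n$-th randomly positioned window, with the key property that averaging over random window positions attenuates the off-diagonal ($y\neq y'$) contributions while retaining the diagonal ones.

The method therefore acts as a $y$-diagonal filter that suppresses crosstalk without requiring explicit knowledge of the source kernel (Ayala-Garcia et al., 2021).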

Microlocal inversion theory provides a complementary description of how noise is altered by the inverse operator itself. For interpolated white noise with i.i.d. samples of variance $\sigma^2$, the defect measure is proportional to $\sigma^2$ and uniform in position, with a frequency profile set by the interpolation scheme, while under inversion by an elliptic FIO it is pulled back and reweighted as noted above (Stefanov et al., 2020). For the two-dimensional Radon transform $R$, the normal-operator symbol is

$\sigma_0(R^{*}R)(x,\xi) = \dfrac{c}{|\xi|}, \qquad c>0,$

so the inverse amplifies high frequencies. For parallel-geometry inversion without an additional filter, the reconstructed-noise defect measure therefore grows linearly in $|\xi|$ (scaled by $\sigma^2$), while with a filter $\phi$ it is additionally weighted by $|\phi|^2$.

For fan-beam geometry, the reconstructed noise becomes position- and direction-dependent, through a geometric weight in the defect measure that depends on both the point $x$ and the direction of $\xi$.

These results show that inversion does not preserve “white noise”; it reshapes noise power in phase space, and the geometry of the acquisition enters explicitly (Stefanov et al., 2020).

5. Diffusion-model noise inversion as latent space-time reconstruction

In diffusion models, space-time noise inversion refers to recovery or refinement of a sequence of latent noise maps indexed by diffusion time. In a standard DDPM, an image is associated with a sequence

$\{x_T,\, z_T,\, z_{T-1},\, \ldots,\, z_1\},$

one map per diffusion timestep. An edit-friendly inversion procedure recovers such a sequence so that the reverse DDPM chain reconstructs the target image exactly. The resulting noise maps are not independent across timesteps and are not standard normal; instead they are correlated across time and contain spatial patterns that reflect the target image (Huberman-Spiegelglas et al., 2023). The paper explicitly characterizes this latent as space-time structured because there is one recovered map per timestep.
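A minimal sketch of the edit-friendly construction, assuming its usual description: each $x_t$ is sampled independently from the forward marginal $\sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\tilde\epsilon_t$, and each $z_t$ is then solved from the DDPM reverse update $x_{t-1} = \mu_\theta(x_t,t) + \sigma_t z_t$. Replaying the reverse chain from $x_T$ with these $z_t$ therefore returns $x_0$ exactly. The linear $\beta$ schedule, the choice $\sigma_t^2 = \beta_t$, and especially `eps_model` (a hypothetical placeholder, not a trained network) are assumptions.

```python
import numpy as np

T = 50
betas = np.linspace(1e-4, 0.05, T)          # assumed linear schedule, index t-1
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)
sigmas = np.sqrt(betas)                     # assumed sigma_t^2 = beta_t


def eps_model(x, t):
    """Hypothetical stand-in for a trained noise predictor eps_theta(x_t, t)."""
    return np.tanh(x) * (t + 1) / T


def reverse_mean(x_t, t):
    """DDPM mean mu_theta(x_t, t); t is the 1-based diffusion step."""
    i = t - 1
    eps = eps_model(x_t, t)
    return (x_t - betas[i] / np.sqrt(1.0 - alpha_bars[i]) * eps) / np.sqrt(alphas[i])


def edit_friendly_inversion(x0, rng):
    """Return x_T and noise maps {z_t} such that the reverse chain reproduces x0."""
    xs = [x0]
    for t in range(1, T + 1):               # independent samples, not a Markov chain
        xs.append(np.sqrt(alpha_bars[t - 1]) * x0
                  + np.sqrt(1.0 - alpha_bars[t - 1]) * rng.standard_normal(x0.shape))
    zs = {t: (xs[t - 1] - reverse_mean(xs[t], t)) / sigmas[t - 1]
          for t in range(T, 0, -1)}
    return xs[T], zs


def generate(x_T, zs):
    """Run the stochastic reverse chain with prescribed noise maps."""
    x = x_T
    for t in range(T, 0, -1):
        x = reverse_mean(x, t) + sigmas[t - 1] * zs[t]
    return x


rng = np.random.default_rng(0)
x0 = rng.standard_normal((8, 8))            # toy "image"
x_T, zs = edit_friendly_inversion(x0, rng)
x_rec = generate(x_T, zs)
print(np.max(np.abs(x_rec - x0)))           # reconstruction error at roundoff level
print(np.std(zs[1]), np.std(zs[T]))         # noise maps need not be unit-variance
```

Because consecutive $x_t$ are drawn independently, the recovered $z_t$ are not i.i.d. standard normal; they absorb structure from $x_0$, which is what makes them useful for editing.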

This perspective is developed further in inversion methods for real-image editing. Noise Map Guidance treats DDIM inversion latents

$\{z_t^{*}\}_{t=0}^{T}$

as spatial tensors that preserve layout information and uses them as conditions in reverse denoising. The method is formulated through an energy-guidance term based on the mismatch between the current latent and the inversion trajectory $\{z_t^{*}\}$; the gradient of this energy corrects the model's noise prediction, giving a noise-map-guided prediction that is then used in the reverse step.

Conditioning is then applied in two stages at each timestep: first on the noise map, then on the text prompt (Cho et al., 2024). The paper’s emphasis is that DDIM inversion alone gives a latent trajectory, but classifier-free guidance can drift away from it during editing; direct conditioning on the noise maps preserves spatial context without per-timestep optimization.

Editable Noise Map Inversion makes the target prompt part of inversion itself. Starting from the diffusion forward process

$x_t = \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon, \qquad \epsilon\sim\mathcal N(0,I),$

it introduces an editability term, which ties the noise maps to the target prompt, together with a reconstruction loss, which keeps the source image recoverable from those noise maps, and optimizes the noise maps against both objectives.

The reported interpretation is that the target image is “imprinted” into the noise maps: inversion is no longer a pure reconstruction problem, but a search for noise trajectories that preserve content while remaining compatible with the intended edit (Kang et al., 30 Sep 2025).

A related but distinct formulation appears in Noise Combination Sampling for linear inverse problems. Instead of adding an explicit measurement-score guidance term, the method synthesizes the reverse-process noise directly: the fresh Gaussian noise in each reverse update is replaced by a combination of noise vectors from a codebook, chosen so that the resulting step carries the measurement information.

The paper explicitly describes this as time-dependent noise inversion: at each reverse step, the algorithm finds the best noise vector in a codebook subspace that encodes the measurement information (Su et al., 24 Oct 2025).

These diffusion-model works support a broader usage of the term in machine learning: noise is not merely a perturbation to suppress, but a latent object to invert, refine, or condition across diffusion time while preserving spatial structure.

6. Video, temporal consistency, and latent temporal signatures

When diffusion inversion is extended from images to video, the temporal dimension becomes explicit. Editable Noise Map Inversion is applied frame by frame and then integrated into Video-P2P. Each frame is inverted into noise maps using the same editable-noise principle, and attention control across all frames is used to keep edits consistent. The paper states that this is not a single joint optimization over a spatiotemporal latent volume; rather, it is a frame-wise noise inversion method extended to video with temporal-consistency mechanisms (Kang et al., 30 Sep 2025). That qualification is important because it distinguishes frame-wise temporal coordination from a fully coupled space-time latent optimization.

A forensic use of temporal noise inversion appears in DBINDS. The method inverts each frame $x_i$ into an initial diffusion noise tensor

$z_i = \operatorname{Inv}(x_i),$

where $\operatorname{Inv}$ denotes the diffusion inversion, forming for an 8-frame clip the sequence

$(z_1, z_2, \ldots, z_8).$

DBINDS then constructs the Initial Noise Difference Sequence

$(z_2 - z_1,\; z_3 - z_2,\; \ldots,\; z_8 - z_7).$

The stated motivation is that adjacent differences emphasize temporal change and reduce redundancy, so the representation captures space-time noise inversion dynamics rather than one frame’s latent code in isolation (Wu et al., 12 Nov 2025).
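The difference construction itself is a one-line array operation. In the sketch below the per-frame inverted noises are assumed given (the inversion step is not implemented), the tensor layout `(frames, channels, height, width)` is an assumption, and the energy summaries are simple hypothetical stand-ins rather than the feature definitions used in DBINDS.

```python
import numpy as np


def noise_difference_sequence(noises):
    """Adjacent differences z_{i+1} - z_i along the frame axis (F frames -> F-1)."""
    return np.diff(np.asarray(noises), axis=0)


def toy_energy_features(diffs):
    """Hypothetical global / temporal / spatial energy summaries (illustrative only)."""
    global_energy = float(np.mean(diffs**2))
    temporal_energy = np.mean(diffs**2, axis=(1, 2, 3))      # one value per frame pair
    spatial_energy = float(np.mean(np.diff(diffs, axis=2) ** 2)
                           + np.mean(np.diff(diffs, axis=3) ** 2))
    return global_energy, temporal_energy, spatial_energy


rng = np.random.default_rng(0)
noises = rng.standard_normal((8, 4, 16, 16))     # stand-in for 8 inverted frame noises
diffs = noise_difference_sequence(noises)
g, temporal, spatial = toy_energy_features(diffs)
print(diffs.shape, temporal.shape)               # (7, 4, 16, 16) (7,)
```

A clip whose inverted noise does not change from frame to frame yields an all-zero difference sequence, so any temporal signal in these features comes from frame-to-frame change alone.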

The feature extraction that follows is explicitly multi-domain and multi-scale. Examples include global, temporal, and spatial energy measures and a fused gradient magnitude.

DBINDS then uses engineered spatiotemporal, frequency-domain, statistical, texture, and PCA-based features together with a cost-sensitive LightGBM classifier. Its core claim is that real videos and AI-generated videos produce systematically different temporal evolutions of recovered diffusion starting noises (Wu et al., 12 Nov 2025).

A plausible synthesis across these video-oriented papers is that temporal consistency is being enforced or measured at the level of latent noise trajectories rather than only in pixel space. That differs from classical denoising, where time is usually treated as an axis along which signals are smoothed; here time indexes either the video frames, the reverse diffusion process, or both.

One recurring misconception is that “space-time noise inversion” refers only to denoising. The cited literature is broader. In the acoustic and subdiffusion settings, the problem is source or parameter recovery from noisy data, with error bounds and contractive iterations (Symes, 2021); (Cen et al., 7 May 2026). In the stochastic heat equation, the object recovered is the covariance operator of a random potential, using second-order statistics rather than denoised trajectories (Li et al., 7 Feb 2025). In microlocal inversion, the principal question is how inversion transforms the noise power spectrum, not how to remove it (Stefanov et al., 2020). In diffusion models, the central operation is often inversion into latent noise maps, not inversion from noise-corrupted observations (Huberman-Spiegelglas et al., 2023); (Cho et al., 2024).

A second misconception is that more accurate inversion always improves downstream performance. DBINDS explicitly reports that more inversion steps do not necessarily improve detection performance: 1 step performs poorly, 5 steps give the best accuracy-efficiency tradeoff, and 10, 15, and 20 steps often degrade detection performance (Wu et al., 12 Nov 2025). This suggests that the optimal inverse representation may depend on the downstream objective rather than on reconstruction fidelity alone.

A third misconception is that temporal stacking or averaging necessarily resolves correlated-noise artifacts. In seismic interferometry, stacking only works when the source-correlation kernel becomes sufficiently localized; with correlated phases across frequencies or narrowband structure, crosstalk can persist (Ayala-Garcia et al., 2021). The same practical lesson appears in fractional source reconstruction, where refining the time step or the spatial mesh alone does not always improve reconstruction unless the noise terms amplified by fractional time differentiation and by the discrete Laplacian are also controlled (Cen et al., 7 May 2026).

Beyond inversion in the strict sense, there are adjacent methods that use space-time structure to recover clean signals from noisy observations. An unstructured-mesh nonlinear denoising method reconstructs a clean target signal by embedding a clean reference signal from the same dynamical system and averaging noisy observations over Voronoi cells in phase space, with Delaunay interpolation for continuity (Kirtland et al., 2022). TICaN, in turn, performs time-domain electromagnetic inversion by placing a U-net denoising block before an EfficientNet-based inversion block; it reconstructs constitutive-parameter images directly from noisy time-domain field images and reports strong robustness under Gaussian, Rayleigh, and Uniform noise down to 1 dB SNR (Gao et al., 2022). These works are not identical to the PDE- and diffusion-based formulations above, but they reinforce the same structural idea: noise handling becomes an integral part of the inverse map when the data are intrinsically spatiotemporal.

In summary, the contemporary literature supports using “space-time noise inversion” as an umbrella term for inverse procedures in which temporal evolution and spatial structure are inseparable, and in which noise may be a nuisance to regularize, a statistic to identify, a phase-space quantity to transport, or a latent trajectory to recover and manipulate.