Understanding Latent Diffusability via Fisher Geometry

Published 3 Apr 2026 in cs.LG | (2604.02751v1)

Abstract: Diffusion models often degrade when trained in latent spaces (e.g., VAEs), yet the formal causes remain poorly understood. We quantify latent-space diffusability through the rate of change of the Minimum Mean Squared Error (MMSE) along the diffusion trajectory. Our framework decomposes this MMSE rate into contributions from Fisher Information (FI) and Fisher Information Rate (FIR). We demonstrate that while global isometry ensures FI alignment, FIR is governed by the encoder's local geometric properties. Our analysis explicitly decouples latent geometric distortion into three measurable penalties: dimensional compression, tangential distortion, and curvature injection. We derive theoretical conditions for FIR preservation across spaces, ensuring maintained diffusability. Experiments across diverse autoencoding architectures validate our framework and establish these efficient FI and FIR metrics as a robust diagnostic suite for identifying and mitigating latent diffusion failure.

Summary

  • The paper quantifies latent diffusability in diffusion models by decomposing the rate of change of the denoising error (MMSE) into Fisher Information and Fisher Information Rate components.
  • It decouples geometric distortions into dimensional compression, tangential distortion, and curvature injection, each reflecting different aspects of encoder regularity.
  • Empirical validations on toy data and real images show that geometry-preserving encoders yield stable denoising performance and higher generative quality.

Information-Geometric Analysis of Latent Diffusability in Diffusion Models

Introduction

The paper "Understanding Latent Diffusability via Fisher Geometry" (2604.02751) presents a formal framework for analyzing the degradations observed when score-based diffusion models are applied in latent spaces, particularly those generated by autoencoders such as VAEs. The authors address the so-called diffusability problem—whereby generative performance in latent space is compromised due to geometric distortions introduced by the encoder—via a rigorous decomposition rooted in information geometry. The approach centers on Fisher Information (FI) and Fisher Information Rate (FIR), quantifying both the intrinsic complexity and the geometric penalties along the diffusion trajectory. Theoretical results are matched with controlled toy settings and real-world data (FFHQ), offering practical diagnostic metrics for latent-space suitability in generative modeling.

Formal Definition of Diffusability and Fisher Metrics

The work introduces a principled, information-theoretic measure of "diffusability," formalized as the rate of change of Minimum Mean Squared Error (MMSE) in denoising along the diffusion path. The MMSE is decomposed, via the I-MMSE identity, into two terms: global FI (the squared norm of the score function) and FIR (the dissipation rate, i.e., the Hessian's squared Frobenius norm). This decomposition captures two distinct sources of denoising resistance: baseline noise gain (linear with dimension and geometric stretch/compression) and high-order geometric complexity (quadratic penalty at high noise scales).
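
For intuition, both quantities are available in closed form for an isotropic Gaussian, where the noisy marginal of y = x + √τ·ε is again Gaussian. The numpy sketch below (a toy illustration under Gaussian assumptions, not the paper's estimator) checks a Monte Carlo FI estimate against the analytic value and evaluates the constant Hessian Frobenius norm that plays the role of the FIR integrand:

```python
import numpy as np

rng = np.random.default_rng(0)
d, sigma2, tau = 2, 1.0, 0.5      # latent dimension, data variance, diffusion noise level
var = sigma2 + tau                # marginal variance of y = x + sqrt(tau) * eps

# sample from the noisy marginal N(0, var * I)
y = rng.normal(scale=np.sqrt(var), size=(200_000, d))

# score of the Gaussian marginal is -y / var; FI is its expected squared norm
score = -y / var
fi_mc = np.mean(np.sum(score**2, axis=1))
fi_exact = d / var

# for a Gaussian the score Jacobian is the constant -I / var, so the
# Hessian's squared Frobenius norm (the FIR-type term) is d / var**2
fir = d / var**2

print(fi_mc, fi_exact, fir)
```

The Monte Carlo estimate should agree with d/(σ² + τ) to within sampling error, illustrating how the two terms separate: FI scales linearly with dimension, while the Hessian term carries an extra factor of 1/(σ² + τ).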

The authors derive explicit bounds on these metrics, showing:

  • FI preservation requires global near-isometry of the encoder: formalized via bi-Lipschitz constants bounding the Jacobian along the data manifold. This ensures the dominant error term reflects intrinsic noise, not latent artifacts.
  • FIR preservation imposes stricter, local regularity: the encoder must be locally near-isometric and low-curvature, with nonlinearity and manifold crinkling directly contributing to FIR deviation.
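
The near-isometry condition above can be probed directly from the singular values of the encoder Jacobian. A minimal sketch (the `near_isometry_gap` helper and the toy encoders are illustrative assumptions, not the paper's metric):

```python
import numpy as np

rng = np.random.default_rng(1)

def near_isometry_gap(J):
    """Largest deviation of a singular value of the encoder Jacobian J
    from 1 -- a simple proxy for the bi-Lipschitz gap."""
    s = np.linalg.svd(J, compute_uv=False)
    return np.max(np.abs(s - 1.0))

# a toy linear "encoder" with orthonormal rows is exactly isometric
# on its row space, so the gap is ~0
Q, _ = np.linalg.qr(rng.normal(size=(8, 4)))   # 8x4 with orthonormal columns
delta_iso = near_isometry_gap(Q.T)             # 4x8 encoder Jacobian

# scaling one direction by 1.5 introduces a tangential distortion of 0.5
delta_stretch = near_isometry_gap(np.diag([1.5, 1.0, 1.0, 1.0]) @ Q.T)
print(delta_iso, delta_stretch)
```

For nonlinear encoders the same check applies pointwise to the local Jacobian along the data manifold, which is where the stricter FIR condition bites.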

Decoupling Geometric Distortions

A central contribution is the precise decoupling of geometric distortions into measurable penalties:

  • Dimensional Compression: The drop from the ambient data dimension D to the latent dimension d manifests as a constant offset that is universal and easy to characterize (Figure 1).

Figure 1: Illustration of dimension-induced distortion in latent-space FIR, with penalty (D−d)/τ².

  • Tangential Distortion: Local non-isometry of the encoder Jacobian (δ), i.e., stretching or compression along the manifold, which arises even for linear mappings. This is directly bounded in the paper's stability theorems (Figure 2).

Figure 2: Visualization of the tangential metric distortion effect, highlighting δ-induced deviation.

  • Curvature Injection: Second-order nonlinearity (penalty ε/√τ) arising from the encoder Hessian. Even smooth, low-distortion encoders incur quadratic penalties if curvature is not controlled, especially at low noise scales (Figure 3).

Figure 3: Depiction of curvature-induced FIR deviation, significant for nonlinear encoders.

These penalties are linked to score Jacobian structure, demonstrating that even when latent distributions are smoothed by diffusion, artificial curvature remnants from the encoder can severely impair generative performance.
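
The three penalties scale quite differently with the noise level, which is what makes them separable diagnostics. A small sketch tabulating them (only the scalings (D−d)/τ², δ, and ε/√τ come from the text; the unit constants and the illustrative values are assumptions):

```python
import numpy as np

def fir_penalties(D, d, delta, eps, tau):
    """Tabulate the three FIR penalty scalings named in the text.
    Constants are set to 1 here for illustration."""
    return {
        "dimensional_compression": (D - d) / tau**2,
        "tangential_distortion": delta,
        "curvature_injection": eps / np.sqrt(tau),
    }

# illustrative values: 64-dim data, 16-dim latent, mild distortion/curvature
for tau in (2.0, 0.5, 0.1):
    p = fir_penalties(D=64, d=16, delta=0.05, eps=0.2, tau=tau)
    print(tau, p)
```

Sweeping τ makes the structure visible: the dimension term blows up fastest as τ shrinks, the curvature term grows like 1/√τ, and the tangential term is noise-independent.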

Theoretical Stability Bounds

Theoretical results include:

  • Linear Stability (Theorem 1): Under linear encoders acting on flat support manifolds, FIR deviation scales linearly with tangential distortion and dimension drop. The empirical relationship D_R ≤ C(δ + (D−d)/(D−m)) is confirmed in toy settings (Figure 4).

Figure 4: FIR deviation scaling as a function of tangential distortion δ and ambient dimension d.

  • Nonlinear Stability (Theorem 2): For nonlinear encoders, FIR deviation splits into dimension drop, tangential distortion, and a curvature-dependent penalty ε/√τ, with higher sensitivity at small τ. Encoders trained under geometry-preserving losses (e.g., GPE) inherently suppress these penalties, as shown via explicit bounds tied to spectral regularity and empirical curvature estimates.
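
An empirical curvature estimate of the kind the bound depends on can be sketched with finite differences over data samples. This 1-D toy (the estimator and the choice of tanh as a stand-in encoder are illustrative assumptions, not the paper's method) shows a smooth nonlinear map picking up nonzero curvature while a linear map stays at zero:

```python
import numpy as np

rng = np.random.default_rng(2)

def curvature_estimate(f, xs, h=1e-4):
    """Finite-difference estimate of E[|f''(x)|] over samples xs -- a
    crude proxy for the curvature term epsilon in the nonlinear bound."""
    f2 = (f(xs + h) - 2.0 * f(xs) + f(xs - h)) / h**2
    return np.mean(np.abs(f2))

xs = rng.normal(size=10_000)
eps_tanh = curvature_estimate(np.tanh, xs)            # smooth, bounded curvature
eps_lin = curvature_estimate(lambda x: 0.9 * x, xs)   # linear map: zero curvature
print(eps_tanh, eps_lin)
```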

Empirical Validation

Gaussian Toy Experiments

A suite of experiments on two-dimensional Gaussian data and simple encoders (ReLU, Leaky ReLU, tanh, GELU) demonstrates the validity of the FI/FIR decomposition. ReLU, with its Dirac-delta second derivatives, produces orders-of-magnitude higher FIR deviation than smooth activations, empirically confirming the theoretical divergence predicted for nonsmooth mappings (Figure 5).

Figure 5: FI and FIR curves for toy models; sharp deviation for nonsmooth encoders (e.g., ReLU).
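
The divergence for nonsmooth activations is easy to see directly: a central second difference of ReLU at its kink scales like 1/h (the signature of a Dirac-delta second derivative), while a smooth activation such as tanh stays finite. A minimal sketch:

```python
import numpy as np

def second_diff(f, x, h):
    # central second difference (f(x+h) - 2 f(x) + f(x-h)) / h^2
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / h**2

relu = lambda x: np.maximum(x, 0.0)

# at the ReLU kink the second difference grows without bound as h -> 0,
# while tanh's second derivative at 0 is exactly 0
for h in (1e-1, 1e-2, 1e-3):
    print(h, second_diff(relu, 0.0, h), second_diff(np.tanh, 0.0, h))
```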

FIR Deviation and Real Image Data

On FFHQ images processed via GPE and VAE encoding, models trained on GPE latents show both FI and FIR curves tightly aligned with the pixel baseline, reflecting near-isometric and low-curvature mappings, whereas VAEs induce severe FIR deviation correlating with generation failure (Figure 6).

Figure 6: FI and FIR comparison between pixel-space, GPE, and VAE latent diffusion models on FFHQ.

FIR deviation is shown to reliably predict generative quality: models with large FIR deviation (VAEs, NVAE with smaller spatial resolution) show degraded sample fidelity and FID scores, as confirmed in Figure 7.

Figure 7: FIR deviation as a function of encoder architecture and latent dimension; high deviation predicts generative failure.

Power Spectrum and Geometry Invariance

The authors contrast FI/FIR with spectral methods, highlighting that, unlike the power spectrum (which is neither permutation- nor geometry-invariant), Fisher metrics directly capture geometric complexity and score smoothness in a representation-agnostic way (Figure 8).

Figure 8: Power spectrum analysis across representations; the geometric invariance of FI/FIR supports universal diagnostics.
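
The invariance contrast can be demonstrated in a few lines: permuting coordinates changes the power spectrum but leaves score-based quantities untouched. In this sketch an isotropic Gaussian score serves as an illustrative stand-in for the model score, so its FI integrand reduces to the squared norm, which any permutation preserves:

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(size=256)
x_p = x[rng.permutation(256)]          # same values, permuted coordinates

# the power spectrum changes under a coordinate permutation...
ps = np.abs(np.fft.rfft(x))**2
ps_p = np.abs(np.fft.rfft(x_p))**2

# ...but the squared score norm of an isotropic Gaussian model depends
# only on ||x||^2, which permutations preserve
fi_term = np.sum(x**2)
fi_term_p = np.sum(x_p**2)

print(np.allclose(ps, ps_p), np.isclose(fi_term, fi_term_p))
```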

Practical and Theoretical Implications

The proposed FI and FIR framework enables pre-training diagnostics of latent spaces before committing to expensive diffusion model training. This allows model designers to select or optimize encoders (GPE, sphere/hyperbolic autoencoders, etc.) that guarantee low deviation in Fisher metrics, thus ensuring stable and high-fidelity generation.

Theoretical implications include rigorous guarantees for the interplay between latent geometry and denoising complexity; practical implications extend to architecture design, regularization strategies, and model selection for latent diffusion.

Future Directions

Further research should examine the spectral bias of neural optimization in controlling encoder curvature globally, tighten bounds for deep nonlinear architectures, and link Fisher geometry to other measures of complexity (e.g., curvature-aware loss functions or manifold priors). Improved analytic techniques for measuring FIR in high-dimensional neural fields and adaptive regularization protocols based on Fisher metric feedback are promising for advancing generative modeling in latent spaces.

Conclusion

The paper establishes a rigorous, information-geometric foundation for latent diffusability in diffusion models. By quantifying and decoupling geometric penalties through FI and FIR, the authors provide both theoretical guarantees and practical diagnostics for encoder design. Empirical findings confirm that geometry-preserving encoders maintain denoising complexity close to the data space, while standard VAEs and nonlinear mappings inject significant geometric artifacts leading to generative failure. This framework is positioned to influence future developments in latent generative modeling, where explicit control of manifold geometry is essential for robust diffusion.
