
Interpretable Diffusion via Info Decomposition

  • The paper introduces a framework that leverages Shannon information measures to decompose and quantify semantic relationships in diffusion models.
  • It presents pointwise and feature-wise attribution methods that enable precise localization and editing of outputs at pixels, tokens, and latent dimensions.
  • Algorithmic advances support compositional analysis, unsupervised object localization, and diagnostic auditing, enhancing controllability in generative tasks.

Interpretable Diffusion via Information Decomposition refers to a suite of theoretical, algorithmic, and empirical advances that link diffusion-based generative models to information theory, enabling rigorous quantification and attribution of the learned structure within these models. Techniques under this umbrella provide detailed explanations for how, where, and why diffusion models capture semantic relationships—at the granularity of tokens, pixels, features, or latent dimensions—by explicitly decomposing measures such as mutual information and conditional mutual information during or after model training.

1. Theoretical Foundations: Diffusion as Information Decomposition

Denoising diffusion models, central to modern generative modeling for images, text, and more, can be interpreted through the lens of information decomposition. The foundational mathematical insight is that the forward noising process in diffusion admits an exact relation with Shannon information measures. Given an observed variable $X$ (e.g., an image or data sample) and a condition or prompt $C$ (e.g., a text caption), the variance-preserving forward process generates noisy observations $Y_\alpha = \sqrt{\gamma(\alpha)}\, X + \sqrt{1 - \gamma(\alpha)}\, \varepsilon$, with $\varepsilon \sim \mathcal{N}(0, I)$ and log-SNR parameter $\alpha$.
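In code, the forward process is a one-liner. The sketch below assumes the common sigmoid schedule $\gamma(\alpha) = \sigma(\alpha)$, one convenient choice (not prescribed by the text) under which $\alpha$ is exactly the log-SNR:

```python
import torch

def gamma(alpha: torch.Tensor) -> torch.Tensor:
    # gamma(alpha) = sigmoid(alpha) gives SNR = gamma / (1 - gamma) = exp(alpha),
    # so alpha is the log-SNR.
    return torch.sigmoid(alpha)

def forward_noise(x: torch.Tensor, alpha: torch.Tensor):
    """Sample y_alpha = sqrt(gamma(alpha)) * x + sqrt(1 - gamma(alpha)) * eps."""
    g = gamma(alpha)
    eps = torch.randn_like(x)
    y = torch.sqrt(g) * x + torch.sqrt(1.0 - g) * eps
    return y, eps
```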

A diffusion model fits the MMSE denoiser at each $\alpha$, which, crucially, allows the log-likelihood and (conditional) mutual information to be written as time-integrals of denoising errors:

$$I(X; C) = \mathbb{E}_{x,c}\left[\ell(x; c)\right] = \frac{1}{2} \int_0^\infty \mathbb{E}\left[\|\varepsilon - \hat{\epsilon}_\alpha(y_\alpha)\|^2 - \|\varepsilon - \hat{\epsilon}_\alpha(y_\alpha, c)\|^2\right] d\alpha$$

This yields a tractable, architecture-agnostic estimator for mutual information and its decomposition at any subsample (e.g., pixel or latent dimension) granularity (Kong et al., 2023).
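The identity suggests a direct Monte Carlo estimator. The sketch below assumes a pretrained denoiser exposed through a hypothetical `eps_hat(y, snr)` / `eps_hat(y, snr, c)` pair (unconditional and conditional noise predictions) and integrates the MMSE gap over an SNR grid with the trapezoid rule; the grid range and sample counts are illustrative, not values from the paper:

```python
import torch

@torch.no_grad()
def pointwise_info(x, c, eps_hat, snrs, n_mc=4):
    """Monte Carlo estimate of the pointwise information l(x; c): half the
    unconditional-vs-conditional MMSE gap, integrated over SNR with the
    trapezoid rule. Averaging l(x; c) over samples (x, c) estimates I(X; C).
    `snrs` is a 1-D tensor of SNR values, e.g. torch.logspace(-4, 4, 64)."""
    gaps = []
    for snr in snrs:
        g = snr / (1.0 + snr)                 # signal fraction gamma in (0, 1)
        gap = x.new_zeros(())
        for _ in range(n_mc):                 # average over noise draws
            eps = torch.randn_like(x)
            y = g.sqrt() * x + (1.0 - g).sqrt() * eps
            gap += ((eps - eps_hat(y, snr)).pow(2).sum()
                    - (eps - eps_hat(y, snr, c)).pow(2).sum()) / n_mc
        gaps.append(gap)
    return 0.5 * torch.trapz(torch.stack(gaps), snrs)
```

Because the estimator only queries the denoiser, it works with any pretrained architecture and needs no auxiliary networks, consistent with the architecture-agnostic claim above.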

In discrete settings, similar decompositions arise via the Information-Minimum Denoising Score Entropy (I-MDSE) identity, where the instantaneous decay of mutual information along the forward diffusion trajectory is measured by the optimal DSE loss, and integrating this rate recovers data likelihood exactly (Jeon et al., 28 Oct 2025).

2. Pointwise and Feature-wise Information Attribution

Moving beyond global measures, information decomposition for diffusion enables the attribution of information flow to individual variables, whether spatial (pixels), latent features, or semantic units (tokens).

A key outcome is the existence of non-negative, orthogonally-decomposed pointwise estimators. For instance, the orthogonal pointwise conditional mutual information at a given pixel $j$:

$$i^o_j(x; c) = \frac{1}{2} \int_0^\infty \mathbb{E}\left[\left(\hat{\epsilon}_\alpha(y_\alpha) - \hat{\epsilon}_\alpha(y_\alpha, c)\right)_j^2\right] d\alpha$$

assigns a precise quantitative value to the informativeness of $c$ for that pixel in sample $x$. This can be readily aggregated to generate heatmaps illustrating where and how a prompt influences the generated output (Kong et al., 2023).
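Under the same assumed `eps_hat` interface as above, the pointwise estimator becomes a per-pixel heatmap simply by skipping the spatial sum; a minimal sketch:

```python
import torch

@torch.no_grad()
def cmi_heatmap(x, c, eps_hat, snrs, n_mc=4):
    """Per-pixel orthogonal pointwise CMI i^o_j(x; c): the squared gap between
    unconditional and conditional denoiser outputs at each pixel, integrated
    over SNR. Returns a non-negative map with the same shape as x."""
    maps = []
    for snr in snrs:
        g = snr / (1.0 + snr)
        acc = torch.zeros_like(x)
        for _ in range(n_mc):
            eps = torch.randn_like(x)
            y = g.sqrt() * x + (1.0 - g).sqrt() * eps
            diff = eps_hat(y, snr) - eps_hat(y, snr, c)
            acc += diff.pow(2) / n_mc         # per-pixel squared gap
        maps.append(acc)
    # integrate pixel-wise over the SNR grid (stacked along the last dim)
    return 0.5 * torch.trapz(torch.stack(maps, dim=-1), snrs)
```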

In high-dimensional problems such as sensory neuroscience, axiomatic decomposition of mutual information—requiring completeness, nonnegativity, locality, and additivity—admits a unique solution: the per-feature information is an integral of Fisher information along the noise path, efficiently estimated via diffusion (Laquitaine et al., 16 May 2025).

3. Partial Information Decomposition and Higher-Order Attribution

Partial Information Decomposition (PID) extends the analysis to disentangle unique, redundant, and synergistic information among multiple input sources. In the diffusion model setting, PID is applied to interpret the contribution of individual prompt tokens and their interactions to generated outputs at both image and pixel level (Zawar et al., 7 Jun 2024).

Given prompts $T = \{T_1, \dots, T_n\}$, for two inputs $Y_1, Y_2$ and output $X$, the decomposition:

$$I(Y_1, Y_2; X) = r(Y_1, Y_2; X) + u(Y_1 \backslash Y_2; X) + u(Y_2 \backslash Y_1; X) + s(Y_1, Y_2; X)$$

produces fine-grained attribution maps. Here:

  • Redundancy $r$ measures information shared by the two sources.
  • Unique information $u$ quantifies what is carried only by a specific source.
  • Synergy $s$ captures information accessible only via joint observation.

Empirical results demonstrate the ability to localize objects via unique information, expose model bias through redundancy (e.g., gender-occupation associations), and identify synergistic dependencies that resolve word ambiguity (Zawar et al., 7 Jun 2024).
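To make the bookkeeping concrete, the sketch below computes a two-source PID from three mutual-information estimates (obtainable, e.g., from the diffusion-based estimator above). It uses the minimum-mutual-information redundancy $r = \min(I(Y_1;X), I(Y_2;X))$ as one simple, standard choice; the specific redundancy measure used by Zawar et al. may differ, so treat this as illustrative arithmetic rather than the paper's estimator:

```python
from dataclasses import dataclass

@dataclass
class PID:
    redundancy: float   # r(Y1, Y2; X)
    unique_1: float     # u(Y1 \ Y2; X)
    unique_2: float     # u(Y2 \ Y1; X)
    synergy: float      # s(Y1, Y2; X)

def pid_two_sources(i_y1: float, i_y2: float, i_joint: float) -> PID:
    """Two-source PID from I(Y1;X), I(Y2;X), and I(Y1,Y2;X). Once a redundancy
    measure is fixed (here: min of the marginal MIs), the remaining terms
    follow from I(Yi;X) = r + u_i and I(Y1,Y2;X) = r + u1 + u2 + s."""
    r = min(i_y1, i_y2)
    u1 = i_y1 - r
    u2 = i_y2 - r
    s = i_joint - r - u1 - u2
    return PID(r, u1, u2, s)
```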

4. Algorithmic Realizations: Practical Estimators and Editing Procedures

Information decomposition methods translate into concrete algorithms for both interpretation and editing:

  • Pointwise and pixel-wise decomposition: Integrate squared denoising differences over SNR (or time) for conditional and unconditional runs, using pre-trained diffusion models. These estimators can be batched for efficiency and require no re-training or auxiliary networks (Kong et al., 2023).
  • Eigen-decomposition in self-attention: Analytical decomposition of U-Net self-attention weight matrices provides semantic editing directions by extracting eigenvectors corresponding to interpretable axes of variation. Perturbations are injected into the U-Net's self-attention latents during specific diffusion intervals. This method yields nearly orthogonal, sample-independent editing directions and supports rapid, disentangled edits across datasets (Anand et al., 26 Oct 2025).
  • Hierarchical Koopman lifting: The nonlinear diffusion process is elevated to a multi-scale, globally linear latent space (Koopman subspace). Each scale encodes distinct spatial and spectral features, enabling closed-form, editable trajectories and spectral-mode-specific interventions (Bai et al., 14 Oct 2025).
  • Time-free and coupled-sampling estimators: For discrete diffusion, new estimators collapse time-integration into a single random mask and tightly control variance for likelihood or likelihood-ratio estimation, making possible efficient and robust post hoc audits of model behavior (Jeon et al., 28 Oct 2025); a generic sketch of the single-mask idea follows this list.
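As a rough illustration of the single-random-mask idea, the sketch below implements the generic likelihood estimator for a masked (absorbing-state) discrete diffusion model: draw one mask ratio per sample, mask tokens independently, and weight the masked-token cross-entropy by the inverse ratio. The `MASK_ID` constant and `denoiser` interface are hypothetical, and the variance-control and coupled-sampling machinery of Jeon et al. is omitted:

```python
import torch
import torch.nn.functional as F

MASK_ID = 0  # hypothetical id of the absorbing [MASK] token

@torch.no_grad()
def nll_single_mask(tokens, denoiser, n_mc=16):
    """Time-free NLL sketch: one mask ratio t per draw, each token masked with
    probability t, and the masked-token cross-entropy weighted by 1/t.
    Averaging over draws approximates the time integral in the identity."""
    total = 0.0
    for _ in range(n_mc):
        t = torch.rand(()).clamp_min(1e-3)   # single random mask ratio
        mask = torch.rand_like(tokens, dtype=torch.float) < t
        noisy = torch.where(mask, torch.full_like(tokens, MASK_ID), tokens)
        logits = denoiser(noisy)             # assumed shape: (seq_len, vocab)
        ce = F.cross_entropy(logits[mask], tokens[mask], reduction="sum")
        total += ce / t / n_mc
    return total
```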

5. Applications: Compositionality, Localization, Editing, and Model Diagnosis

Information decomposition approaches have been leveraged for multiple interpretability and control tasks:

  • Compositional understanding: A model's capability to capture relational structure is quantitatively measured by information-decomposition scores on established compositionality benchmarks, outperforming CLIP and direct attention-based methods (Kong et al., 2023).
  • Unsupervised object localization: Pixel-wise conditional mutual information maps identify object regions grounded in textual prompts, outperforming standard attention (DAAM) and supporting integration for improved segmentation (Kong et al., 2023).
  • Prompt intervention and editing: Selective removal or modification of input tokens, guided by decomposed information scores, predicts actual output changes, outperforming attention for intervention forecasting and supporting prompt pruning to eliminate redundant inputs (Zawar et al., 7 Jun 2024).
  • Latent space disentanglement: InfoDiffusion regularizes the diffusion process via mutual information maximization, yielding low-dimensional latents that align with semantic factors and can be manipulated independently for controlled generation (Wang et al., 2023).
  • Diagnostics and auditing: Time-integral and time-free estimators support out-of-distribution detection and training-data influence analysis, enabling sharp auditing of data provenance (Jeon et al., 28 Oct 2025).

6. Generalization to Time Series, Neural Coding, and Discrete Domains

Techniques for interpretable diffusion via information decomposition generalize beyond image-text settings to other modalities:

  • Time series (Diffusion-TS): Disentangled temporal decomposition yields trend, seasonality, and residual layers in the diffusion decoder, each with explicit semantic roles accessible at every generation step. Fourier-based penalties sharpen interpretability and sample quality while supporting conditional tasks like imputation and forecasting with no architectural change (Yuan et al., 4 Mar 2024); a shape-level sketch of such a decomposed decoder head follows this list.
  • Sensory neural coding: Decomposition of mutual information into per-feature or per-stimulus contributions is made tractable for biological and artificial neural data via diffusion-based Fisher information estimation, satisfying axiomatic requirements crucial for neuroscience applications (Laquitaine et al., 16 May 2025).
  • Discrete domains: Information-theoretic discrete diffusion analysis shows that the standard denoising losses tightly align with mutual information decay and log-likelihoods, enabling principled, exact estimation and attribution for token-based data such as text or DNA (Jeon et al., 28 Oct 2025).
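Below is a shape-level sketch of a decomposed decoder head in the spirit of the Diffusion-TS bullet above, with fixed polynomial and Fourier bases whose coefficients are predicted from the model state. The module name, dimensions, and basis choices are illustrative and not taken from the Diffusion-TS code:

```python
import torch
import torch.nn as nn

class DecomposedHead(nn.Module):
    """Illustrative decoder head: the output is an explicit sum of a polynomial
    trend, a Fourier-basis seasonality, and a residual, so each component
    stays inspectable at every generation step."""
    def __init__(self, d_model: int, seq_len: int, poly_deg: int = 3, n_freq: int = 8):
        super().__init__()
        t = torch.linspace(0, 1, seq_len)         # normalized time axis
        # Fixed bases; the network learns only their coefficients.
        self.register_buffer("poly", torch.stack([t ** k for k in range(poly_deg + 1)], dim=-1))
        angles = t[:, None] * (2 * torch.pi * torch.arange(1, n_freq + 1))[None, :]
        self.register_buffer("fourier", torch.cat([angles.sin(), angles.cos()], dim=-1))
        self.trend_coef = nn.Linear(d_model, poly_deg + 1)
        self.season_coef = nn.Linear(d_model, 2 * n_freq)
        self.residual = nn.Linear(d_model, seq_len)

    def forward(self, h: torch.Tensor):
        # h: (batch, d_model) summary of the current denoising state
        trend = self.trend_coef(h) @ self.poly.T       # (batch, seq_len)
        season = self.season_coef(h) @ self.fourier.T  # (batch, seq_len)
        resid = self.residual(h)                       # (batch, seq_len)
        parts = {"trend": trend, "seasonality": season, "residual": resid}
        return trend + season + resid, parts
```

Because the three components are summed explicitly, each can be inspected or manipulated independently at any denoising step, which is what gives the decomposition its interpretive value.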

7. Limitations and Future Directions

Limitations include reliance on denoiser optimality (MMSE fidelity), imperfect generalization for higher-order PID beyond pairs or low-arity groups, and computational cost for conditional diffusion models in very high dimensions (Kong et al., 2023, Laquitaine et al., 16 May 2025). Current approaches primarily address the sum-decomposition or pairwise redundancy/synergy, whereas full higher-order interactions remain an open challenge for scaling and interpretation (Zawar et al., 7 Jun 2024). The need for empirical tuning of regularization weights in mutual information objectives, together with modeling assumptions such as isotropic noise or specific corruption kernels, further limits generality, although recent algorithmic advances such as spectral Koopman analysis and time-free discrete likelihood estimation indicate ways forward (Bai et al., 14 Oct 2025, Jeon et al., 28 Oct 2025).

A plausible implication is that as model architectures grow more complex, diffusion-based information decomposition will become foundational for auditing, controlling, and understanding the semantics of generative models in research and applied domains. Further research directions include learning decompositions in the training loop, extending analysis to multi-modal (audio, genomics, video) data, and developing estimator consistency guarantees under architectural imperfections.
