Information-Minimum Denoising Score Entropy
- I-MDSE is a theoretical framework that precisely relates the instantaneous decay of mutual information to the minimum DSE loss in discrete diffusion models.
- It establishes an exact likelihood decomposition, analogous to the I-MMSE identity, by integrating the DSE loss to recover negative log-likelihood without variational gaps.
- The framework enables practical likelihood estimation in score-based generative modeling through continuous-time Markov processes and is validated with empirical studies.
The Information-Minimum Denoising Score Entropy (I-MDSE) is a fundamental theoretical relation in discrete diffusion models, establishing an exact connection between the rate of mutual information decay in a continuous-time Markov process and the minimum achievable value of the Denoising Score Entropy (DSE) loss function. Analogous to the I-MMSE identity for Gaussian diffusion, I-MDSE enables a tight, time-integral decomposition of the negative log-likelihood in discrete data spaces. This framework provides both rigorous justification and practical tools for likelihood estimation in discrete score-based generative modeling, without loose variational approximations (Jeon et al., 28 Oct 2025).
1. Discrete Diffusion, DSE Loss, and Score Estimation
The I-MDSE framework is formulated within the discrete diffusion paradigm where the data space is a finite set $\mathcal{X}$, with possible extension to sequences as $\mathcal{X}^L$. The forward process is a continuous-time Markov chain (CTMC) governed by a transition rate matrix $Q_t = \sigma(t)\,Q$, where $\sigma(t)$ is a nonnegative, smooth rate schedule. The process evolves the initial data distribution $p_0$ towards a stationary distribution $\pi$ as $t \to \infty$.
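As a concrete illustration of such a forward process, the sketch below builds the transition kernel of a uniform-rate CTMC on a hypothetical alphabet of $K = 4$ symbols with constant schedule $\sigma(t) = 1$; the rate matrix and closed-form kernel are assumptions for this toy setting, not the paper's configuration:

```python
import numpy as np

# Uniform-rate CTMC on K symbols: Q(x, y) = 1 for x != y, Q(x, x) = -(K - 1),
# with constant schedule sigma(t) = 1 (illustrative choice).
K = 4
Q = np.ones((K, K)) - K * np.eye(K)          # rows sum to zero

def forward_kernel(t):
    """p_{t|0}(y | x) = expm(t * Q), in closed form for the uniform rate matrix."""
    return np.exp(-K * t) * np.eye(K) + (1.0 - np.exp(-K * t)) / K

# Every row is a probability distribution over states, and as t grows each row
# approaches the uniform stationary distribution pi = (1/K, ..., 1/K).
```

For this particular rate matrix the matrix exponential has the closed form used above; a general rate matrix would need a numerical matrix exponential such as `scipy.linalg.expm`.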
In this context, a score network $s_\theta(x_t, t)_y$ is trained to approximate the marginal ratio $p_t(y)/p_t(x_t)$ for $y \neq x_t$. The central training loss is the Denoising Score Entropy (DSE) loss, defined pointwise for clean $x_0$ and noisy $x_t$ as:

$$\mathcal{L}_{\mathrm{DSE}}(x_0, x_t; \theta) = \sum_{y \neq x_t} Q_t(x_t, y)\left[ s_\theta(x_t)_y - \frac{p_{t|0}(y \mid x_0)}{p_{t|0}(x_t \mid x_0)}\,\log s_\theta(x_t)_y + K\!\left(\frac{p_{t|0}(y \mid x_0)}{p_{t|0}(x_t \mid x_0)}\right) \right],$$

where $K(a) = a(\log a - 1)$. Each summand is convex in $s_\theta(x_t)_y$ and vanishes at the conditional ratio $p_{t|0}(y \mid x_0)/p_{t|0}(x_t \mid x_0)$; a single network must fit all $x_0$ simultaneously, however, so the expected loss is minimized by the marginal ratio $s^*(x_t)_y = p_t(y)/p_t(x_t)$. The pointwise minimum DSE is:

$$\ell^*_{\mathrm{DSE}}(t, x_0) = \mathbb{E}_{x_t \sim p_{t|0}(\cdot \mid x_0)}\!\left[ \sum_{y \neq x_t} Q_t(x_t, y)\left( s^*(x_t)_y - \frac{p_{t|0}(y \mid x_0)}{p_{t|0}(x_t \mid x_0)}\,\log s^*(x_t)_y + K\!\left(\frac{p_{t|0}(y \mid x_0)}{p_{t|0}(x_t \mid x_0)}\right) \right) \right],$$

and the marginal variant is

$$\mathcal{L}^*_{\mathrm{DSE}}(t) = \mathbb{E}_{x_0 \sim p_0}\big[\ell^*_{\mathrm{DSE}}(t, x_0)\big].$$
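A minimal sketch of the pointwise DSE objective, assuming hypothetical arrays `s`, `a`, and `q` holding the scores, conditional ratios, and rates over the states $y \neq x_t$ (illustrative code, not the paper's implementation):

```python
import numpy as np

def dse_pointwise(s, a, q):
    """Pointwise DSE loss at a noisy state x_t.

    s -- score estimates s_theta(x_t)_y for each y != x_t
    a -- conditional ratios p_{t|0}(y|x_0) / p_{t|0}(x_t|x_0)
    q -- forward rates Q_t(x_t, y)
    """
    K_a = a * (np.log(a) - 1.0)      # normalizing term K(a) = a(log a - 1)
    return np.sum(q * (s - a * np.log(s) + K_a))
```

Each summand is convex in `s` with minimum value zero at `s = a`, so the function is nonnegative and vanishes exactly when the score matches the conditional ratio.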
2. The I-MDSE Theorem: Information Decay and Score Entropy
The central result, the I-MDSE theorem, expresses the instantaneous rate of mutual information decay as the negative of the minimum denoising score entropy:
- Pointwise: For every $x_0$ and $t > 0$,
$$\frac{d}{dt}\, D_{\mathrm{KL}}\big(p_{t|0}(\cdot \mid x_0) \,\|\, p_t\big) = -\,\ell^*_{\mathrm{DSE}}(t, x_0).$$
- Marginal: Taking expectation over $x_0 \sim p_0$ yields
$$\frac{d}{dt}\, I(X_0; X_t) = -\,\mathcal{L}^*_{\mathrm{DSE}}(t).$$
These identities show that the minimum DSE loss captures the exact, instantaneous loss of information about $X_0$ under the forward diffusion at time $t$. Since $\mathcal{L}^*_{\mathrm{DSE}}(t) \geq 0$, mutual information is monotonically decreasing in $t$, as expected for diffusion processes.
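The marginal identity can be checked numerically; the sketch below assumes a small uniform-rate chain with a hypothetical data distribution and substitutes the true marginal ratio for a trained score network:

```python
import numpy as np

# Numerical check that d/dt I(X_0; X_t) equals minus the minimum DSE loss,
# on a small uniform-rate CTMC (illustrative setup, not the paper's experiments).
K = 4
Q = np.ones((K, K)) - K * np.eye(K)
p0 = np.array([0.5, 0.2, 0.2, 0.1])          # hypothetical data distribution

def kernel(t):
    """p_{t|0}(y|x) for the uniform rate matrix (closed-form expm(t*Q))."""
    return np.exp(-K * t) * np.eye(K) + (1.0 - np.exp(-K * t)) / K

def mutual_info(t):
    P = kernel(t)
    pt = p0 @ P                               # marginal p_t
    return np.sum(p0[:, None] * P * np.log(P / pt[None, :]))

def min_dse(t):
    """Expected DSE loss with the score fixed at the true marginal ratio."""
    P = kernel(t)
    pt = p0 @ P
    total = 0.0
    for x0 in range(K):
        for x in range(K):
            for y in range(K):
                if y == x:
                    continue
                r = pt[y] / pt[x]             # optimal (marginal) score
                a = P[x0, y] / P[x0, x]       # conditional ratio target
                total += p0[x0] * P[x0, x] * Q[x, y] * (
                    r - a * np.log(r) + a * (np.log(a) - 1.0))
    return total

t, h = 0.3, 1e-5
dI_dt = (mutual_info(t + h) - mutual_info(t - h)) / (2.0 * h)
# dI_dt agrees with -min_dse(t) up to finite-difference error
```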
3. Derivation and Mathematical Foundations
The derivation proceeds by analyzing the time derivative of the mutual information $I(X_0; X_t)$ for a CTMC, using the chain rule for path-measure KL and Dynkin's formula:

$$\frac{d}{dt}\, I(X_0; X_t) = \mathbb{E}\!\left[ \sum_{y \neq X_t} Q_t(X_t, y)\left( \log\frac{p_{t|0}(y \mid X_0)}{p_{t|0}(X_t \mid X_0)} - \log\frac{p_t(y)}{p_t(X_t)} \right) \right].$$

Algebraic manipulations reveal that, evaluated at the true ratio $s^*(x_t)_y = p_t(y)/p_t(x_t)$, the DSE loss matches this expectation up to sign. Rearranging yields the I-MDSE identity.
4. Log-Likelihood Decomposition and Tightness
Integrating the I-MDSE identity from $t = 0$ to some $T$ yields:

$$D_{\mathrm{KL}}\big(p_{T|0}(\cdot \mid x_0) \,\|\, p_T\big) - D_{\mathrm{KL}}\big(p_{0|0}(\cdot \mid x_0) \,\|\, p_0\big) = -\int_0^T \ell^*_{\mathrm{DSE}}(t, x_0)\, dt.$$

Since $p_{0|0}(\cdot \mid x_0)$ is a point mass at $x_0$, the second KL term equals $-\log p_0(x_0)$. As $T \to \infty$ and $p_T \to \pi$, provided $\pi$ is independent of $x_0$, the terminal KL term vanishes or is computable. The decomposition is then

$$-\log p_0(x_0) = \int_0^\infty \ell^*_{\mathrm{DSE}}(t, x_0)\, dt.$$
This result is exact: unlike typical variational bounds on negative log-likelihood, no looseness remains when the optimal score $s^*$ is used. In applied settings, a neural network $s_\theta$ approximates $s^*$, introducing only estimator error, not a variational gap.
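On a toy chain, integrating the pointwise minimum DSE over a log-spaced grid recovers the per-sample negative log-likelihood; the uniform-rate chain, data distribution, and grid below are illustrative assumptions:

```python
import numpy as np

# Sketch: recover -log p_0(x_0) by integrating the pointwise minimum DSE loss
# over time, on a uniform-rate CTMC with an illustrative data distribution.
K = 4
p0 = np.array([0.5, 0.2, 0.2, 0.1])

def kernel(t):
    return np.exp(-K * t) * np.eye(K) + (1.0 - np.exp(-K * t)) / K

def pointwise_min_dse(t, x0):
    """ell*(t, x0): DSE loss at the true marginal score, averaged over x_t.

    The forward rate Q(x, y) = 1 for x != y is implicit (uniform chain).
    """
    P = kernel(t)
    pt = p0 @ P
    total = 0.0
    for x in range(K):
        for y in range(K):
            if y == x:
                continue
            r = pt[y] / pt[x]            # optimal marginal score
            a = P[x0, y] / P[x0, x]      # conditional ratio
            total += P[x0, x] * (r - a * np.log(r) + a * (np.log(a) - 1.0))
    return total

x0 = 0
ts = np.geomspace(1e-9, 8.0, 4000)       # log-spaced grid, truncated at T = 8
vals = np.array([pointwise_min_dse(t, x0) for t in ts])
nll = np.sum(0.5 * (vals[1:] + vals[:-1]) * np.diff(ts))   # trapezoidal rule
# nll is close to -log p0[x0]
```

The log-spaced grid matters: the integrand grows like $\log(1/t)$ near $t = 0$, which a uniform grid resolves poorly.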
5. Practical Estimation and Empirical Deployment
Estimation of $-\log p_0(x_0)$ in practice proceeds as follows:
- Time discretization: Choose a grid $0 < t_1 < \cdots < t_N \le T$, uniform or logarithmically spaced.
- Sampling: For each $t_i$, sample $x_{t_i} \sim p_{t_i|0}(\cdot \mid x_0)$ using the CTMC's known forward law (direct sampling, thinning, etc.).
- Loss evaluation: Compute $\mathcal{L}_{\mathrm{DSE}}(x_0, x_{t_i}; \theta)$ with the trained score network.
- Integration: Form the Riemann sum $\sum_i (t_i - t_{i-1})\, \mathcal{L}_{\mathrm{DSE}}(x_0, x_{t_i}; \theta)$. As $N \to \infty$ and $T \to \infty$, this converges to the exact log-likelihood integral.
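The steps above can be sketched end-to-end on an illustrative uniform-rate chain, with the exact marginal ratio standing in for the trained network $s_\theta$, so that only discretization and Monte Carlo error remain; all constants are arbitrary choices:

```python
import numpy as np

# Monte Carlo sketch of the likelihood estimator on a uniform-rate CTMC.
rng = np.random.default_rng(0)
K = 4
p0 = np.array([0.5, 0.2, 0.2, 0.1])      # hypothetical data distribution
x0, n_mc = 0, 256                        # clean sample; draws of x_t per grid point

def kernel(t):
    return np.exp(-K * t) * np.eye(K) + (1.0 - np.exp(-K * t)) / K

def dse(xt, pt, P):
    """DSE integrand at (x0, x_t); Q(x, y) = 1 for x != y, score = true ratio."""
    val = 0.0
    for y in range(K):
        if y != xt:
            r = pt[y] / pt[xt]           # optimal score p_t(y)/p_t(x_t)
            a = P[x0, y] / P[x0, xt]     # conditional ratio
            val += r - a * np.log(r) + a * (np.log(a) - 1.0)
    return val

ts = np.geomspace(1e-6, 8.0, 400)
est = 0.0
for t, dt in zip(ts, np.diff(ts, prepend=0.0)):
    P = kernel(t)
    pt = p0 @ P
    xts = rng.choice(K, size=n_mc, p=P[x0])              # forward-law samples
    per_state = np.array([dse(s, pt, P) for s in range(K)])
    est += dt * np.bincount(xts, minlength=K) @ per_state / n_mc
# est approximates -log p0[x0] up to discretization and sampling noise
```

Swapping in a learned `s_theta(x_t)` for the exact ratio `pt[y] / pt[xt]` turns this into the practical estimator, whose remaining error is the score-estimation error.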
Empirical evaluations, including on synthetic DNA-alphabet CTMCs, demonstrate that this estimator closely matches ground-truth likelihoods. The score network trained via DSE simultaneously learns the required integrand for likelihood estimation (Jeon et al., 28 Oct 2025).
6. Underlying Assumptions and Theoretical Significance
The I-MDSE identity, as well as its log-likelihood decomposition, relies on several conditions:
- The forward diffusion is a CTMC with smooth, time-dependent rates.
- The DSE loss is minimized exactly at the true marginal ratio $s^*(x_t)_y = p_t(y)/p_t(x_t)$.
- The forward process approaches a stationary distribution $\pi$ as $t \to \infty$, with $\pi$ independent of $x_0$, to ensure the terminal KL term vanishes or is trivial.
Provided these hold, I-MDSE is an equality, not a bound. This contrasts with standard variational inference, which incurs additional slack. The result thus theoretically validates score-matching for discrete diffusion as a tight likelihood estimator.
7. Extensions and Broader Context
I-MDSE is part of a broader information-theoretic framework for discrete diffusion, which also includes the Information-Minimum Denoising Cross-Entropy (I-MDCE) relation for masked processes and conditional likelihoods. These developments allow time-free likelihood estimation, likelihood-ratio estimation via Monte Carlo coupling, and principled treatment of prompt-response or in-context prediction tasks (Jeon et al., 28 Oct 2025). The I-MDSE result is the discrete diffusion analogue of the classical I-MMSE identity in Gaussian settings, thus generalizing information-theoretic score-based learning beyond the continuous domain.