Continuous-Time Discrete Diffusion

Updated 26 July 2025
  • Continuous-time discrete diffusion is a stochastic process that evolves continuously in time but transitions between discrete states, incorporating memory effects via arbitrary waiting time distributions.
  • It generalizes classical diffusion equations by using integro-differential formulations and accommodates both Markovian and non-Markovian regimes through exponential or power-law waiting times.
  • This framework underpins advanced applications in machine learning, physics, and finance by enabling efficient simulation methods like uniformization and tau-leaping for generative modeling.

A continuous-time discrete diffusion process is a stochastic process characterized by continuous temporal evolution but a discrete state space, in which probability mass propagates between states according to transition rates defined by the generator of a continuous-time Markov chain (CTMC). This framework underpins modern modeling approaches in statistical physics, generative machine learning, econometrics, and quantitative finance, allowing principled treatment of both classical (e.g., random walks, birth–death processes) and modern high-dimensional structured data such as text, graphs, and CAD sketches.

1. Mathematical Foundations and the Integro-Differential Equation

The core mathematical formalism of continuous-time discrete diffusion is encapsulated by the evolution equation for the probability distribution $p(x, t)$ over discrete states $x$ as a function of time $t$. For the generic continuous-time random walk (CTRW) on a lattice or network, the master equation takes an integro-differential form that incorporates a general waiting time probability density $g(t)$ between transitions:

$$\frac{d}{dt} \left[ p(x, t) - \int_0^t g(t - t_1)\, p(x, t_1)\, dt_1 \right] = D \int_0^t g(t - t_1)\, \frac{\partial^2}{\partial x^2} p(x, t_1)\, dt_1$$

or equivalently,

$$\frac{\partial p(x, t)}{\partial t} - \int_0^t g(t - t_1)\, \frac{\partial p(x, t_1)}{\partial t_1}\, dt_1 = D \int_0^t g(t - t_1)\, \frac{\partial^2}{\partial x^2} p(x, t_1)\, dt_1$$

This equation generalizes the standard diffusion (heat) equation by admitting memory effects and nonlocal temporal kernels through $g(t)$. For exponential waiting times, the process is Markovian and the equation reduces to the standard diffusion equation. For long-tailed (e.g., power-law) waiting time distributions, the system exhibits non-Markovian, subdiffusive, or anomalous transport regimes (1007.2186).

The CTMC perspective connects this with discrete state spaces, where the generator matrix $Q$ determines transition rates between states, and the Kolmogorov forward equation governs the time evolution:

$$\frac{d}{dt} p(t) = p(t)\, Q$$

or, for time-inhomogeneous cases,

$$\frac{d}{dt} p(t) = p(t)\, Q(t)$$

with transition kernel $P_{x, y}(s, t) = \left[\exp\big((t - s)\, Q\big)\right]_{x, y}$ in the time-homogeneous case.
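
As a concrete numerical illustration, the following minimal sketch (assuming NumPy and SciPy; the three-state generator is an arbitrary toy example, not taken from any cited work) builds the kernel $P(s, t) = \exp((t - s) Q)$ and evolves an initial distribution forward in time.

```python
import numpy as np
from scipy.linalg import expm

# Toy 3-state generator: off-diagonal entries are jump rates,
# each diagonal entry makes its row sum to zero.
Q = np.array([[-1.0,  0.7,  0.3],
              [ 0.4, -0.9,  0.5],
              [ 0.2,  0.8, -1.0]])

s, t = 0.0, 2.5                  # time interval [s, t]
P = expm((t - s) * Q)            # transition kernel P_{x,y}(s, t)

p0 = np.array([1.0, 0.0, 0.0])   # start in state 0 with probability 1
pt = p0 @ P                      # Kolmogorov forward evolution: p(t) = p(s) P(s, t)

print(P)
print(pt, pt.sum())              # each row of P, and p(t), sums to 1
```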

2. Waiting Time Distributions and Diffusion Regimes

The waiting time distribution $g(t)$ determines the nature of the continuous-time discrete diffusion:

  • Exponential, $g_1(t) = A e^{-A t}$: The process is memoryless, and the mean-square displacement (MSD) grows linearly, $\langle x^2 \rangle(t) = \langle x^2 \rangle_0 + 2 D A t$; the propagator is a Gaussian.
  • Power-law / generalized Mittag–Leffler, $g_2(t) = \frac{t^{\alpha - 1}}{T^\alpha}\, E_{\alpha,\alpha}\!\left(-(t/T)^\alpha\right)$: For $0 < \alpha < 1$, the MSD grows sublinearly, $\langle x^2 \rangle(t) \propto t^\alpha$, and the probability distribution is non-Gaussian at all times, reflecting anomalous (subdiffusive) kinetics (1007.2186); a simulation sketch of both regimes follows this list.
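
The two regimes can be reproduced with a minimal Monte Carlo sketch, assuming NumPy; a Pareto-tailed density (which shares the $t^{-(1+\alpha)}$ tail) is used here as a simple stand-in for the Mittag–Leffler waiting time law, and all parameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def msd(wait_sampler, times, n_walkers=2000, max_jumps=20000):
    """Monte Carlo MSD of a 1D lattice CTRW with +/-1 steps and given waiting times."""
    out = np.zeros_like(times, dtype=float)
    for _ in range(n_walkers):
        arrival = np.cumsum(wait_sampler(max_jumps))     # jump epochs
        path = np.cumsum(rng.choice([-1, 1], size=max_jumps))  # unbiased lattice walk
        n_jumps = np.searchsorted(arrival, times)        # jumps completed by each time
        pos = np.where(n_jumps > 0, path[np.clip(n_jumps - 1, 0, None)], 0)
        out += pos.astype(float) ** 2
    return out / n_walkers

A, alpha = 1.0, 0.6
times = np.array([1.0, 10.0, 100.0, 1000.0])

exp_msd = msd(lambda n: rng.exponential(1.0 / A, size=n), times)
# Pareto waits with tail index alpha < 1: stand-in for the Mittag-Leffler density
par_msd = msd(lambda n: rng.pareto(alpha, size=n) + 1.0, times)

print("exponential waits, MSD ~ t:      ", exp_msd)
print("heavy-tailed waits, MSD ~ t^alpha:", par_msd)
```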

In the CTMC framework, these behaviors correspond to different choices of the generator $Q$ and the spectral properties of the rate matrix.

3. Construction and Reversal of CTMCs

A detailed, general construction of continuous-time discrete diffusion for modern machine learning and generative modeling employs the CTMC formalism for both the forward noising (diffusion) and the reverse (denoising, generative) processes (Campbell et al., 2022, Sun et al., 2022, Santos et al., 2023).

  • Forward process (noising): The process evolves by random transitions according to a generator $R_t$. Each infinitesimal time increment $dt$ causes independent state transitions according to $q_{t \mid t - dt}(x' \mid x) = \delta_{x', x} + R_t(x, x')\, dt + o(dt)$.
  • Reverse process (generation): The time reversal yields another CTMC with generator

$$\tilde{R}_t(x, y) = R_t(y, x)\, \frac{q_t(y)}{q_t(x)}$$

where $q_t(x)$ is the time-$t$ marginal of the forward process. Intractable ratios of $q_t(\cdot)$ can be parameterized and estimated using neural networks.
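
The reversal formula can be illustrated directly on a small state space. The sketch below (NumPy only; the forward generator and the marginal $q_t$ are toy values rather than outputs of any trained model) forms the off-diagonal reverse rates and fills in the diagonal so that each row sums to zero, as a valid generator requires.

```python
import numpy as np

# Toy forward generator R (rows sum to zero) and an assumed time-t marginal q_t.
R = np.array([[-0.8,  0.5,  0.3],
              [ 0.2, -0.6,  0.4],
              [ 0.1,  0.6, -0.7]])
q = np.array([0.5, 0.3, 0.2])

# Off-diagonal reverse rates: R_rev[x, y] = R[y, x] * q[y] / q[x]
R_rev = R.T * q[None, :] / q[:, None]
np.fill_diagonal(R_rev, 0.0)
np.fill_diagonal(R_rev, -R_rev.sum(axis=1))   # diagonal makes each row sum to zero

print(R_rev)
print(R_rev.sum(axis=1))   # ~0 for every row: a valid CTMC generator
```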

Both processes admit efficient simulation via uniformization (simulating Poisson jump times) (Chen et al., 12 Feb 2024), tau-leaping (for scalable approximate sampling) (Campbell et al., 2022), or predictor-corrector schemes (Siraudin et al., 10 Jun 2024).
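
For intuition, here is a minimal uniformization sketch, assuming a time-homogeneous generator $R$ for simplicity (NumPy only; not tied to any cited implementation): candidate jump times arrive as a Poisson process with rate $\lambda = \max_x |R(x, x)|$, and at each candidate time the state moves according to the uniformized matrix $P = I + R/\lambda$, possibly as a self-transition.

```python
import numpy as np

rng = np.random.default_rng(1)

def uniformization_sample(R, x0, T):
    """Exact simulation of a time-homogeneous CTMC with generator R over [0, T]."""
    lam = np.max(-np.diag(R))            # uniformization rate, >= every exit rate
    P = np.eye(R.shape[0]) + R / lam     # uniformized transition matrix (rows sum to 1)
    x, t = x0, 0.0
    while True:
        t += rng.exponential(1.0 / lam)  # candidate jump times: Poisson(lam) process
        if t > T:
            return x
        x = rng.choice(R.shape[0], p=P[x])  # may be a self-transition ("virtual" jump)

R = np.array([[-1.0,  0.7,  0.3],
              [ 0.4, -0.9,  0.5],
              [ 0.2,  0.8, -1.0]])
samples = [uniformization_sample(R, x0=0, T=2.5) for _ in range(5000)]
print(np.bincount(samples, minlength=3) / 5000)   # empirical distribution at time T
```

Because the rate matrix and horizon match the matrix-exponential sketch in Section 1, the empirical distribution should agree with the corresponding row of $\exp(2.5\, Q)$ up to Monte Carlo error.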

4. Score Matching, Denoising, and Training Objectives

Continuous-time discrete diffusion models leverage analogues of “score matching” in discrete spaces, despite the absence of true gradients. Categorical ratio matching is a core technique:

$$\frac{q_t(y)}{q_t(x)} \approx \frac{p_\theta(X^d = y \mid x^{(-d)})}{p_\theta(X^d = x^d \mid x^{(-d)})}$$

where $p_\theta$ is a neural estimator of the conditional marginal, $x^{(-d)}$ omits coordinate $d$, and the states $x$ and $y$ differ only in coordinate $d$.
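
As a small illustration of how such a ratio is read off a network's output (the model interface here is hypothetical): if the estimator returns unnormalized logits over the vocabulary of coordinate $d$ given $x^{(-d)}$, the softmax normalizer cancels and the ratio depends only on two logits.

```python
import numpy as np

def ratio_from_logits(logits, y, x_d):
    """Approximate q_t(y)/q_t(x) for states differing only in coordinate d,
    given model logits for p_theta(X^d = . | x^(-d))."""
    # softmax(logits)[y] / softmax(logits)[x_d]: the partition function cancels
    return np.exp(logits[y] - logits[x_d])

logits = np.array([2.0, 0.5, -1.0, 0.1])      # hypothetical logits over a 4-symbol vocabulary
print(ratio_from_logits(logits, y=1, x_d=0))  # equals e^{0.5 - 2.0}
```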

Training is performed by minimizing an evidence lower bound (ELBO) or, in some formulations, a cross-entropy loss over conditional marginals (Sun et al., 2022). In continuous time, the variational objective becomes a continuous-time ELBO (CT-ELBO) that coincides, in the small-step limit, with the traditional discrete-time bound (Campbell et al., 2022). For generative applications, both numerical (Euler-type) and analytical samplers are used to simulate the reverse jump process and reconstruct data from noise.

Theoretical convergence guarantees in discrete spaces are tight, with KL and total variation divergence scaling as $O(d \log(d/\epsilon))$ under mild estimator assumptions (Chen et al., 12 Feb 2024, Zhang et al., 3 Oct 2024). This matches or surpasses analogous results in the $\mathbb{R}^d$ SDE diffusion literature.

5. Applications: From Physics to Machine Learning

The continuous-time discrete diffusion framework is broadly deployed in:

  • Disordered and biological systems: Modeling random walks, anomalous diffusion, and subdiffusive transport due to non-exponential waiting times (1007.2186).
  • Language, text, and sequence modeling: Extending discrete denoising diffusion models using CTMCs for text generation and translation—capturing token-level stochasticity, supporting context-aware and non-simultaneous diffusion with superior BLEU scores on WMT and IWSLT tasks (Li et al., 28 May 2025).
  • Graph and molecular generation: Discrete-state continuous-time diffusion elegantly models graph evolution by defining CTMCs over node/edge types, preserving permutation invariance and enabling efficient tau-leaping sampling—yielding improvements on chemical and synthetic graph benchmarks (Xu et al., 19 May 2024, Siraudin et al., 10 Jun 2024).
  • Structured sketch and layout synthesis: Unified continuous–discrete Gaussian–Softmax diffusion enables joint modeling of continuous parameters and discrete labels (e.g., in CAD sketches or UI layouts), substantially improving FID and NLL over previous baselines by leveraging blended class superpositions and respecting permutation invariance (Chereddy et al., 15 Jul 2025).

Additionally, the ability to merge reinforcement learning objectives, detailed balance training, and maximum-entropy control with continuous-time sampling connects classical policy learning with modern diffusion-based samplers (Berner et al., 10 Jan 2025).

6. Algorithmic Developments and Sampling Strategies

Several efficient sampling and training algorithms have emerged:

  • Uniformization: Simulates CTMCs exactly by (i) simulating Poisson-distributed jump times, and (ii) at each random time, applying a modified transition determined by the generator (Chen et al., 12 Feb 2024).
  • Tau-leaping: Approximates many small jumps by simultaneous Poisson-distributed transitions over fixed intervals, enabling scalable, high-dimensional simulation without per-jump iteration (Campbell et al., 2022, Xu et al., 19 May 2024); a minimal sketch follows this list.
  • Gaussian–Softmax diffusion: Allows discrete variables to be perturbed in the logit space with Gaussian noise, yielding soft memberships and seamless integration with continuous diffusion for mixed data (Chereddy et al., 15 Jul 2025).
  • Context-aware time prediction: Supports non-simultaneous token denoising by learning intrinsic (per-token) time states and adapting the reverse process to semantic content (Li et al., 28 May 2025).
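
A minimal tau-leaping sketch for a factorized state space, assuming NumPy; the per-coordinate reverse-rate function is a random stand-in for a learned model, and applying at most one change per coordinate per leap is one simple way to resolve conflicting jumps, not necessarily the convention used in the cited works.

```python
import numpy as np

rng = np.random.default_rng(2)
S = 5                                   # vocabulary size per coordinate

def reverse_rates(x, t):
    """Stand-in for a learned reverse-rate model: a (D, S) array of rates
    for moving each coordinate d of x to every candidate value."""
    r = rng.uniform(0.0, 1.0, size=(x.size, S))
    r[np.arange(x.size), x] = 0.0       # no rate for staying at the current value
    return r

def tau_leap(x, t0, t1, tau=0.05):
    """Approximate reverse-process simulation from t0 down to t1 by tau-leaping."""
    x = x.copy()
    t = t0
    while t > t1:
        rates = reverse_rates(x, t)             # rates frozen over the leap
        jumps = rng.poisson(rates * tau)        # Poisson jump counts per channel
        for d in range(x.size):
            hit = np.flatnonzero(jumps[d])
            if hit.size:                        # at most one change per coordinate,
                x[d] = rng.choice(hit)          # chosen uniformly among triggered targets
        t -= tau
    return x

x_T = rng.integers(0, S, size=8)        # start from a noisy 8-coordinate sample
print(x_T, "->", tau_leap(x_T, t0=1.0, t1=0.0))
```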

7. Theoretical and Empirical Impact

The continuous-time discrete diffusion framework has produced models with state-of-the-art results across multiple domains.

The theoretical machinery—closed-form kernels for forward processes, tight bounds for reverse process error, and mappings between discrete and continuous limits (e.g., the Ehrenfest process $\to$ the Ornstein–Uhlenbeck process) (Winkler et al., 6 May 2024)—provides a robust analytical foundation for future advances in discrete data modeling and generative machine learning.


In summary, continuous-time discrete diffusion processes unify and extend classical random walk, Markov chain, and stochastic differential equation paradigms for discrete spaces. By admitting arbitrary waiting time distributions, supporting efficient training and sampling, and enabling tight theoretical guarantees, these models underpin new state-of-the-art generative and inference methods across domains where the discrete nature of data is fundamental.