Non-Markovian Diffusion Framework

Updated 23 February 2026

Non-Markovian discrete diffusion is a framework where stochastic processes depend on multiple past states, incorporating features like heterogeneous sojourn times and memory effects.
Mathematical formulations leverage higher-order recursion relations, propagators with memory kernels, and continuum limits that yield position-dependent diffusion equations.
Algorithmic realizations include Monte Carlo simulations, geometric graph random walks, and sequence generation models; the latter are reported to outperform Markovian discrete diffusion baselines on structured generation tasks.

A non-Markovian discrete diffusion framework is a class of models for random walks or stochastic processes in discrete settings, where the evolution law at each step depends on more than just the immediate previous state. Non-Markovianity emerges due to heterogeneous sojourn times, temporal correlations, spatial or structural inhomogeneities, or memory encoded in sequences. Such frameworks offer rigorous mathematical characterizations, algorithmic procedures, and sometimes continuum limits for various phenomena where Markovian approximations are inadequate, including heterogeneous media, temporal networks, and discrete generative models.

1. Formulations of Non-Markovianity in Discrete Diffusion

Multiple mechanisms generate non-Markovianity in discrete diffusion frameworks:

Heterogeneous Sojourn Times: In a 1D lattice, each site can be assigned a sojourn time $\tau(x)$, so a particle arriving at $x$ waits $\tau(x)\,\Delta t$ before jumping. The resulting law for the occupation probability $p_n^j$ (time layer $n$, site $j$) links the current layer to the previous layer and, depending on position, to the layer two steps back. This explicit dependence on the “history” breaks the Markov property (Chung et al., 2023).
Partially Absorbing Barriers and Geometric Graphs: On a geometric graph $\mathcal{G}=(V, E)$ , a particle jumping to a vertex $v_\ell$ can be absorbed with probability $\rho_\ell$ or continue randomly along adjacent edges. The transition law is non-Markovian since the path history through partially absorbing barriers shapes the propagation kernel (Buhl, 2018).
Disorder in Temporal Networks: Time-ordered edge activations on networks (temporal networks) engender order-correlations, as the probability of a walker traversing an edge depends on both the current position and the arrival route (one-step memory). This leads to non-Markovian master equations unless the path-ordered memory is explicitly tracked (Scholtes et al., 2013).
Trajectory-Dependent Diffusion for Generative Models: For discrete diffusion models in structured data (e.g., language), recent frameworks allow the noising and denoising phases to depend on the entire sequence (past and future steps) rather than only the current one, yielding a fundamentally non-Markovian transition structure (Zhang et al., 13 Feb 2025).
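
The absorption rule in the geometric-graph item above can be illustrated with a minimal sketch. It is a deliberate simplification and not the construction of Buhl (2018): edges have no length, motion along an edge is a single hop, and the star graph, the `rho` values, and the function name `absorbing_walk` are all hypothetical choices made for the example.

```python
import random

def absorbing_walk(adj, rho, start, max_steps, rng):
    """Hop between vertices until absorbed or max_steps hops; return (path, absorbed)."""
    path = [start]
    v = start
    for _ in range(max_steps):
        v = rng.choice(adj[v])      # continue along a uniformly chosen adjacent edge
        path.append(v)
        if rng.random() < rho[v]:   # on arrival at v: absorbed with probability rho[v]
            return path, True
    return path, False

# Hypothetical star graph: hub 0 joined to leaves 1, 2, 3.
adj = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
# Leaf 1 is partially absorbing, leaf 2 is reflecting, leaf 3 is fully absorbing.
rho = {0: 0.0, 1: 0.5, 2: 0.0, 3: 1.0}

rng = random.Random(0)
runs = [absorbing_walk(adj, rho, start=0, max_steps=1000, rng=rng) for _ in range(5000)]
absorbed_at = [path[-1] for path, absorbed in runs if absorbed]
frac_leaf3 = absorbed_at.count(3) / len(absorbed_at)
```

For this toy graph each excursion from the hub ends at leaf 3 with probability 1/3, is absorbed at leaf 1 with probability 1/6, and otherwise returns to the hub, so the fraction absorbed at leaf 3 should be close to 2/3.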

2. Mathematical Structures and Continuum Limits

The mathematical analysis of non-Markovian discrete diffusion leverages both combinatorial and analytic techniques:

Difference and Recursion Laws: Typical recursion relations span multiple time layers:

$p_{n}^{j}\;=\;\frac12 \begin{cases} p_{n-1}^{\,j-1}+p_{n-1}^{\,j+1}, & j<0,\\ p_{n-1}^{-1}+p_{n-2}^{+1}, & j=0,\\ p_{n-2}^{\,j-1}+p_{n-2}^{\,j+1}, & j>0. \end{cases}$

Markovian reductions may be constructed on sub-sampled time grids, enabling analytic tractability (Chung et al., 2023).
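
The three-branch recursion above can be iterated directly. The sketch below is illustrative only: the initial condition (all mass at $j=0$ on layer $n=0$), the convention that layers before $n=0$ are empty, and the truncation of the lattice to $|j| \le$ `half_width` are assumptions added for the example, and `evolve` is a hypothetical name.

```python
def evolve(n_steps, half_width, start=0):
    """Iterate the recursion; p[n][j + half_width] stores p_n^j."""
    size = 2 * half_width + 1
    p = [[0.0] * size for _ in range(n_steps + 1)]
    p[0][start + half_width] = 1.0

    def get(n, j):
        # Assumption: layers before n = 0 and sites outside the lattice hold zero.
        if n < 0 or abs(j) > half_width:
            return 0.0
        return p[n][j + half_width]

    for n in range(1, n_steps + 1):
        for j in range(-half_width, half_width + 1):
            if j < 0:       # left region: one-layer memory
                val = get(n - 1, j - 1) + get(n - 1, j + 1)
            elif j == 0:    # interface: mixes layers n-1 and n-2
                val = get(n - 1, -1) + get(n - 2, 1)
            else:           # right region: two-layer memory
                val = get(n - 2, j - 1) + get(n - 2, j + 1)
            p[n][j + half_width] = 0.5 * val
    return p

p = evolve(n_steps=6, half_width=8)
# Nonzero entries of layer n = 2, keyed by site index j.
layer2 = {j: p[2][j + 8] for j in range(-8, 9) if p[2][j + 8] > 0}
```

With these conventions, layer $n=2$ holds $1/4$ at $j=-2$, $1/4$ at $j=0$, and $1/2$ at $j=1$. The asymmetry comes entirely from the different time lags on the two sides of the origin.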

Propagators and Memory Kernels: The propagator $K(y, x, t)_{ji}$ for the position at time $t$ can be expressed as a series incorporating all possible reflection and absorption histories:

$K(y,x,t)_{j i} = \delta_{j i} q_{0}(y,x,t) + \sum_{k=1}^{\infty}\sum_{(\ell_1,\dots,\ell_k)} \left(\prod_{m=1}^k\frac{2 c_{\ell_m}}{d_{\ell_m}}\right) q_k(\cdot,\cdot, t)$

(Buhl, 2018).

Continuum Limits and Effective Equations:
- For spatially heterogeneous sojourn times, the scaling limit ($\Delta x \to 0$, $\Delta t \to 0$ with $\Delta x^2/\Delta t$ fixed) yields a heterogeneous diffusion PDE:

$\tau(x) \partial_t w = \frac12 \partial_{xx} w, \qquad v_t = \frac12 \partial_{xx}\!\left(\frac{v}{\tau(x)}\right)$

(Chung et al., 2023).
- For geometric graphs, continuum propagators can be written either as sums of Gaussians (with non-Markovian weights) or as sums over eigenmodes with band gaps determined by barrier parameters (Buhl, 2018).
Higher-Order Markov Representations: For temporal networks, a second-order Markov chain (states are pairs of consecutive edges) is constructed so that, despite genuine non-Markovianity on the original network, the higher-order chain is Markovian in the pair space (Scholtes et al., 2013).
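
The two forms of the heterogeneous diffusion PDE given for the sojourn-time model are consistent with one another under the substitution $v = \tau(x)\,w$. That relation is not stated explicitly in the text and is offered here only as a plausible reading. Since $\tau$ does not depend on $t$:

$v = \tau(x)\,w \quad\Longrightarrow\quad \partial_t v = \tau(x)\,\partial_t w = \tfrac12\,\partial_{xx} w = \tfrac12\,\partial_{xx}\!\left(\frac{v}{\tau(x)}\right)$

Under this reading the equation for $v$ is in conservation form, so $\int v\,dx$ is constant in time whenever the boundary fluxes vanish, whereas the equation for $w$ is not.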

3. Algorithmic Realizations and Sampling Procedures

Practical frameworks instantiate non-Markovian discrete diffusion as algorithmic procedures:

Monte Carlo Simulation with Position-Dependent Waiting: For heterogeneous sojourn times, particles are propagated according to local $\tau(x)$ , updating positions via Gaussian increments until a fixed macroscopic time is reached, and endpoint histograms are compared to analytic Green's functions (Chung et al., 2023).
Geometric Graph Random Walks: Steps are performed along edges, with random branch selection and stochastic absorption at vertices. The non-Markovian effects are manifest in how memories of past traversals influence absorption and overall path lengths (Buhl, 2018).
Discrete Diffusion in Structured Sequence Generation: In non-Markovian causal discrete diffusion (“CaDDi”), corruption is applied to the initial data independently at every time step, and the denoising model conditions on the whole trajectory of noisier states rather than on the current state alone ($p_\theta(x_{t-1} | x_{t:T})$). This allows the generation to revisit and improve previous states, and is implemented with standard causal transformers whose positional embeddings are extended over both the sequence and diffusion-step dimensions (Zhang et al., 13 Feb 2025).
Second-Order Random Walks in Temporal Networks: The walker proceeds on the space of ordered pairs (edges), using empirically measured transition probabilities, thereby encoding memory without requiring the full event sequence (Scholtes et al., 2013).
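
The Monte Carlo procedure with position-dependent waiting can be sketched as follows. This is one possible reading, not the scheme of Chung et al. (2023): it assumes an Euler-Maruyama discretization of $dX = \sqrt{1/\tau(X)}\,dW$ in the Itô sense, whose Fokker-Planck equation is $v_t = \frac12 \partial_{xx}(v/\tau)$. The two-phase $\tau$, the step size, and the function names are hypothetical.

```python
import math
import random

def simulate_endpoints(tau, x0, t_final, dt, n_particles, rng):
    """Endpoints of particles moved by Gaussian increments of variance dt / tau(x)."""
    n_steps = round(t_final / dt)
    endpoints = []
    for _ in range(n_particles):
        x = x0
        for _ in range(n_steps):
            # Assumed Ito update: the local tau is evaluated at the current position.
            x += math.sqrt(dt / tau(x)) * rng.gauss(0.0, 1.0)
        endpoints.append(x)
    return endpoints

def variance(xs):
    """Population variance of a list of floats."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

rng = random.Random(1)
# Homogeneous check: tau == 1 should give endpoint variance close to t_final.
homog = simulate_endpoints(lambda x: 1.0, 0.0, 1.0, 0.01, 2000, rng)
# Hypothetical two-phase medium: longer sojourns (tau = 4) on the right half-line.
hetero = simulate_endpoints(lambda x: 4.0 if x > 0 else 1.0, 0.0, 1.0, 0.01, 2000, rng)
```

A histogram of `hetero` is the quantity one would compare against an analytic Green's function. The homogeneous run serves as a sanity check, since its endpoint variance should be close to `t_final`.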

4. Analytical Results: Green's Functions, Eigenstructure, and Mixing

Explicit solutions and metrics for non-Markovian discrete diffusion frameworks include:

Green’s Functions for Heterogeneous Diffusion: Closed-form integral expressions for the fundamental solution, e.g.,

$G(t,x;a) = \frac{1}{\gamma(x)} W(t, x; a)$

with $W$ a piecewise-integral involving the local diffusion coefficient, enable direct comparison with simulation (Chung et al., 2023).

Image and Eigenmode Expansions: On geometric graphs,

$\bar K(y, x, t)_{ji} = \sum_{n=-\infty}^\infty \left[(S_n)_{ji} \bar g_n(y, x, t) + (J S_n)_{ji} \bar g_n(L - y, x, t)\right]$

and

$\bar K(y, x, t)_{ji} = \sum_{n=0}^\infty e^{-k_n^2 D t} u_{jn}(y) u_{in}(x)$

These express the propagator as an infinite sum of Gaussians or of plane-wave eigenmodes, respectively. Spectral features such as line splitting and band gaps emerge when partial absorption is present (Buhl, 2018).

Spectral Predictors of Diffusion Speed: In temporal networks, the mixing time to stationarity is governed by the second-largest eigenvalue $\lambda_2$ of the memory-encoded transition matrix. The slow-down/speed-up ratio

$S^* = \frac{\ln|\tilde{\lambda}_2|}{\ln|\lambda_2|}$

quantifies how much memory in the system accelerates or retards diffusion relative to its Markovian null model (Scholtes et al., 2013).
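
A short worked example of the ratio, reading $\tilde{\lambda}_2$ as the second-largest eigenvalue of the Markovian null model (implied but not stated above) and using the standard estimate that the distance to stationarity decays like $|\lambda_2|^k$ after $k$ steps. The eigenvalues 0.9 and 0.5 are made-up inputs, and both function names are hypothetical.

```python
import math

def slowdown_factor(lambda2_memory, lambda2_null):
    """S* = ln|lambda2_null| / ln|lambda2_memory|; values above 1 mean memory slows diffusion."""
    return math.log(abs(lambda2_null)) / math.log(abs(lambda2_memory))

def steps_to_tolerance(lambda2, eps):
    """Steps k at which |lambda2|**k drops to eps (leading-order estimate)."""
    return math.log(eps) / math.log(abs(lambda2))

# Hypothetical eigenvalues: memory-encoded chain 0.9, null model 0.5.
s_star = slowdown_factor(0.9, 0.5)   # approximately 6.58
ratio = steps_to_tolerance(0.9, 1e-3) / steps_to_tolerance(0.5, 1e-3)
```

The ratio of the two step counts equals `s_star` for any tolerance `eps`, which is why $S^*$ can be read as a tolerance-independent slow-down (if above 1) or speed-up (if below 1) factor.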

5. Applications and Empirical Findings

Non-Markovian discrete diffusion frameworks are broadly applicable:

Complex Media and Heterogeneous Materials: The modeling of diffusion across interfaces with different waiting times or partially permeable membranes necessitates position-dependent memory kernels (Chung et al., 2023, Buhl, 2018).
Porous Structures and Biological Networks: The geometric-graph approach naturally extends to model diffusion in porous materials, vascular or airway trees, and mesoscopic boundary-structured domains (Buhl, 2018).
Sequence Modeling and Machine Learning: CaDDi outperforms standard Markovian discrete diffusion models and approaches the performance of large autoregressive LLMs on structured sequence generation, protein design tasks (e.g., ACYP protein: pLDDT=92.9, TM-score=0.97, RMSD=0.90, 100% homology) and infilling in text, with downstream improvements in perplexity and diversity (Zhang et al., 13 Feb 2025).
Temporal Networks: The analytic machinery accurately predicts under which circumstances ordering effects—causal correlations of events—either slow down or accelerate information diffusion, with applications in communications, transportation, and transmission in realistic time-varying systems (Scholtes et al., 2013).

6. Special Cases, Reductions, and Generalizations

Markovian Limits: Non-Markovian frameworks often recover well-known Markovian processes in limits such as homogeneous sojourn times ( $\tau\equiv 1$ ), no partial absorption ( $c_\ell=1$ ), or synthetic models with no order correlations, yielding the standard heat equation, classical image expansions, or memoryless random walks respectively (Chung et al., 2023, Buhl, 2018, Scholtes et al., 2013).
Framework Unification: In the CaDDi model, setting the number of diffusion steps to $T=1$ and restricting the context to the current position maps the framework back to standard causal language modeling; architectural minimalism ensures interoperability with pretrained models (Zhang et al., 13 Feb 2025).
Markovization via State-Expansion: Higher-order Markovization (e.g., via edge pairs in temporal networks) renders certain classes of non-Markovian systems accessible to standard spectral analysis while preserving empirically observed memory effects (Scholtes et al., 2013).
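
Markovization by state expansion can be illustrated by estimating edge-to-edge transition probabilities from observed paths. The sketch below is a toy under stated assumptions: the input is a list of node sequences standing in for time-respecting paths, the example data are invented, and `second_order_transitions` is a hypothetical name rather than a routine from Scholtes et al. (2013).

```python
from collections import Counter, defaultdict

def second_order_transitions(paths):
    """Estimate P(next edge | current edge) from observed node sequences."""
    counts = defaultdict(Counter)
    for path in paths:
        # Each consecutive triple (a, b, c) is one transition (a, b) -> (b, c).
        for a, b, c in zip(path, path[1:], path[2:]):
            counts[(a, b)][(b, c)] += 1
    probs = {}
    for edge, successors in counts.items():
        total = sum(successors.values())
        probs[edge] = {nxt: k / total for nxt, k in successors.items()}
    return probs

# Invented paths through a hub "c": arrivals from "a" always continue to "d",
# arrivals from "b" always continue to "e".
paths = [["a", "c", "d"]] * 3 + [["b", "c", "e"]] * 3
T2 = second_order_transitions(paths)
```

A first-order walk on the same data would leave "c" for "d" or "e" with probability 1/2 each, regardless of the arrival route. The chain on edges keeps the one-step memory while remaining Markovian in the expanded state space.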

7. Outlook and Theoretical Significance

Non-Markovian discrete diffusion frameworks rigorously generalize the random-walk paradigm to systems with memory, spatial or temporal heterogeneity, or causal ordering. They offer explicit propagators, continuum limits, and algorithmic recipes for Monte Carlo or sequence modeling tasks. Spectral criteria enable analytical predictions of global quantities such as mixing times and steady-state distributions. Empirical validations reveal that these models can capture essential behaviors (e.g., steady-state densities, propagation speeds, enhanced sampling diversity) not accounted for by Markovian approaches, providing essential tools for both theoretical physics and modern machine learning (Chung et al., 2023, Zhang et al., 13 Feb 2025, Buhl, 2018, Scholtes et al., 2013).