First-Hitting Sampler (FHS)
- First-Hitting Sampler (FHS) is a class of probabilistic algorithms that sample the distribution of a stochastic process at the first time it reaches a specified target set.
- It leverages explicit path decompositions, combining quasi-stationary mixing phases with geometric tails to ensure unbiased simulation in both discrete and continuous settings.
- FHS finds practical applications in rare-event analysis and generative modeling, offering rigorous error bounds, finite computational cost, and adaptability to high-dimensional spaces.
A First-Hitting Sampler (FHS) is a class of probabilistic sampling algorithms that generate random samples from the distribution of a stochastic process at the first time it reaches a specified target set or boundary. This framework generalizes the classical idea of strong stationary times and is applied to both discrete and continuous Markov processes (including diffusion models and Markov chains), underpinning unbiased simulation in rare-event analysis, generative modeling, sequential decision problems, and more. FHS fundamentally exploits the path decomposition at the first hitting time, offering rigorous guarantees, sharp error bounds, and practical efficiency in high-dimensional and structured spaces.
1. Foundational Principles of First-Hitting Sampling
FHS leverages the explicit pathwise decomposition of Markov processes at the stopping, or hitting, time of a target set. In a discrete-time Markov chain $(X_n)_{n \ge 0}$ on a state space $\mathcal{X}$ with target set $A \subset \mathcal{X}$, the first hitting time is $\tau_A = \min\{n \ge 0 : X_n \in A\}$. In diffusions or continuous-time Markov processes (CTMCs), the first hitting time is defined analogously as $\tau_A = \inf\{t \ge 0 : X_t \in A\}$, the infimum over times at which the process enters the absorbing boundary.
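As a concrete illustration of the definition, a minimal sketch of computing $\tau_A$ by direct simulation (the toy chain and target set here are invented for the example):

```python
import random

def first_hitting_time(step, in_target, x0, rng, t_max=10_000):
    """tau_A = min { n >= 0 : X_n in A } for a simulated discrete-time chain."""
    x, n = x0, 0
    while not in_target(x):
        if n >= t_max:
            return None, x          # censored: no hit within the horizon
        x = step(x, rng)
        n += 1
    return n, x

rng = random.Random(42)
# Toy chain: simple random walk on the integers, target set A = {x : |x| >= 3}.
tau, x_tau = first_hitting_time(
    step=lambda x, r: x + r.choice((-1, 1)),
    in_target=lambda x: abs(x) >= 3,
    x0=0, rng=rng,
)
```

Because the walk moves by unit steps, the exit state satisfies $|X_{\tau_A}| = 3$ exactly, and $\tau_A$ is at least 3 and has the same parity as 3.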
The FHS approach arises from the study of strong stationary times and their generalizations. The latter, especially Conditionally Strong Quasi-Stationary Times (CSQST), allow one to decompose the first-hitting probability law into a "mixing phase" toward a quasi-stationary regime and an independent geometric (or exponential) tail, which can be sampled exactly under explicit, verifiable conditions (e.g., ergodicity, primitivity) (Manzo et al., 2016). In continuous spaces, parametrix expansions or Doob's $h$-transform are employed to represent exit-time densities and bridge laws (Frikha et al., 2016, Ye et al., 2022).
2. FHS in Discrete Markov Chains and Quasi-Stationarity
In a discrete Markov chain with transition matrix $P$, target/absorbing set $A$, and complement $A^c$, FHS exploits the quasi-stationary structure of the substochastic restriction $P_{A^c}$. After initial mixing to the quasi-stationary law $\mu^*$ in a conditionally strong quasi-stationary time $T^*$ (the CSQST), the residual time until hitting $A$ is exactly geometric with parameter $1-\lambda$ (where $\lambda$ is the spectral radius of $P_{A^c}$). The joint law of $(\tau_A, X_{\tau_A})$ then decomposes as

$$\tau_A \;\stackrel{d}{=}\; T^* + G,$$

with $G \sim \mathrm{Geom}(1-\lambda)$ independent of the mixing phase, up to a computable shift. This separation underpins an exact sampling algorithm: first mix to quasi-stationarity via the minimal CSQST, then simulate a geometric tail to the first hit, sampling the exit location from the quasi-stationary exit distribution (Manzo et al., 2016). All components are computable from the underlying chain's eigenstructure, with pathwise and probabilistic correctness guarantees.
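A minimal numerical sketch of the geometric-tail component: extract the substochastic block for the non-target states, find its Perron eigenvalue $\lambda$ and quasi-stationary law by power iteration, and sample the residual hitting time as a geometric variable. The 4-state chain and the plain power iteration below are illustrative assumptions, not the construction of Manzo et al.:

```python
import random

# Hypothetical 4-state chain; state 3 is the absorbing target set A = {3}.
P = [
    [0.5, 0.3, 0.1, 0.1],
    [0.2, 0.5, 0.2, 0.1],
    [0.1, 0.2, 0.5, 0.2],
    [0.0, 0.0, 0.0, 1.0],
]

def substochastic_block(P, target):
    """Restrict P to the complement of the target set."""
    keep = [i for i in range(len(P)) if i not in target]
    return keep, [[P[i][j] for j in keep] for i in keep]

def quasi_stationary(Q, iters=2000):
    """Left power iteration: mu Q = lam mu with mu a probability vector."""
    n = len(Q)
    mu = [1.0 / n] * n
    lam = 0.0
    for _ in range(iters):
        new = [sum(mu[i] * Q[i][j] for i in range(n)) for j in range(n)]
        lam = sum(new)          # mu sums to 1, so this converges to the Perron eigenvalue
        mu = [x / lam for x in new]
    return mu, lam

keep, Q = substochastic_block(P, {3})
mu, lam = quasi_stationary(Q)

def sample_residual_hitting_time(rng):
    """Geometric tail: from quasi-stationarity, P(tau > n) = lam**n."""
    k = 1
    while rng.random() < lam:
        k += 1
    return k

rng = random.Random(0)
times = [sample_residual_hitting_time(rng) for _ in range(10_000)]
mean_tau = sum(times) / len(times)   # should be close to 1 / (1 - lam)
```

The empirical mean of the sampled residual times matches the geometric mean $1/(1-\lambda)$, confirming the tail parameter.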
3. FHS for Continuous Diffusions and Stochastic Differential Equations
For stochastic differential equations (SDEs) and continuous diffusions, the FHS method is grounded in the parametrix expansion of hitting-time densities and Doob's $h$-transform for path conditioning. Specifically, for a diffusion stopped at an absorbing level $a$, the law of the first hitting time $\tau_a$ is represented as a convergent series of density kernels (frozen bridge densities, Lévy densities, and parametrix corrections), each admitting Gaussian-type bounds and analytic expressions involving Hermite polynomials (Frikha et al., 2016). The Monte Carlo FHS simulates Poisson-distributed random inter-arrival times and propagates "frozen" Euler bridges between these times, terminating upon hitting $a$ and correcting for bias through analytic weights—yielding an unbiased estimator for expectations involving $\tau_a$. This method is robust to low regularity, supports heavy-tailed or truncated parametrix series, and achieves finite expected computational cost proportional to the mean hitting time.
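For contrast, a naive Euler first-hitting simulator (a hedged sketch; the drift, diffusion coefficient, and parameters below are illustrative) exhibits the discretization bias that the parametrix estimator is designed to remove—the discrete path can only register a crossing at grid times:

```python
import math
import random

def euler_first_hit(x0, barrier, drift, sigma, dt, t_max, rng):
    """Naive Euler scheme for dX = b(X) dt + s(X) dW, stopped at the barrier.
    The path is only checked at grid times, so the hitting time is biased
    (undershoot between grid points) -- exactly what the parametrix FHS corrects."""
    x, t = x0, 0.0
    while t < t_max:
        x += drift(x) * dt + sigma(x) * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        t += dt
        if x >= barrier:
            return t
    return math.inf  # censored: no hit before t_max

rng = random.Random(1)
# Standard Brownian motion started at 0, barrier at 1 (illustrative choices).
hits = [euler_first_hit(0.0, 1.0, lambda x: 0.0, lambda x: 1.0,
                        dt=0.01, t_max=20.0, rng=rng) for _ in range(100)]
frac_hit = sum(h < math.inf for h in hits) / len(hits)
```

For Brownian motion, most paths cross the barrier well before the horizon, but each recorded hitting time is grid-aligned; the parametrix weights of Frikha et al. remove this systematic error.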
4. FHS in Diffusion-Based Generative Modeling
The FHS paradigm has recently found impactful use in discrete latent diffusion models for generative modeling of symbolic data, such as text sequences and categorical images. In the masked diffusion setting, the process is described by a CTMC on the product token space augmented with a mask symbol, with mask-absorbing rates for each coordinate. FHS here proceeds by iteratively unmasking coordinates in continuous time, exactly matching the jump-time and jump-index laws of the reverse CTMC. The sampling path is described by sequences of (jump time, coordinate) pairs, with randomness realized by i.i.d. uniform variables and coordinate selections (Liang et al., 26 Feb 2026).
Crucially, error analysis reveals that FHS's sampling discrepancy is solely attributable to score estimation (i.e., the error in predicting conditional token probabilities with the learned network), with no discretization or surrogate-initialization error. This contrasts with standard $\tau$-leaping Euler schemes, where discretization error persists even with perfect scores. The pathwise KL bound for FHS takes the form

$$\mathrm{KL}\big(P_{\mathrm{path}} \,\big\|\, \widehat{P}_{\mathrm{path}}\big) \;\le\; \varepsilon_{\mathrm{score}},$$

where $\varepsilon_{\mathrm{score}}$ is the cumulative score-estimation error; this bound is tight in an information-theoretic sense (Liang et al., 26 Feb 2026). The dimension-free, vocabulary-free guarantee is significant for high-dimensional symbolic domains.
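The event-driven structure of the sampler can be sketched as follows (toy code: the i.i.d.-uniform masking schedule, the three-token vocabulary, and the uniform score stand-in are assumptions for illustration, not the trained network or schedule of Liang et al.):

```python
import random

def fhs_unmask(d, predict_token, rng):
    """Toy first-hitting sampler for a masked CTMC: draw each coordinate's
    unmasking time exactly (here: i.i.d. uniforms), then process events in
    decreasing reverse time -- no time-discretization grid is involved."""
    schedule = sorted(((rng.random(), i) for i in range(d)), reverse=True)
    x = [None] * d                       # None plays the role of the mask symbol
    events = []
    for t, i in schedule:                # reverse time runs from t = 1 down to 0
        x[i] = predict_token(x, i, rng)  # one score-model call per jump event
        events.append((t, i))
    return x, events

rng = random.Random(7)
vocab = ["a", "b", "c"]
# Stand-in for the learned conditional: uniform over the toy vocabulary.
uniform_score = lambda x, i, r: r.choice(vocab)
x, events = fhs_unmask(8, uniform_score, rng)
```

The sampler makes exactly one network call per unmasking event, and the only approximation enters through `predict_token`—mirroring the claim that score estimation is the sole error source.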
5. Algorithmic Formulations and Implementation
Several algorithmic variants of FHS exist, corresponding to discrete chains, diffusions, and masked CTMCs. Key components include:
- Precomputation of the transition eigendata and the local separation curve (discrete setting).
- Sampling of CSQST and geometric components for the total hitting time (discrete Markov chains).
- Poisson grid simulation, frozen bridges, and unbiased reweighting for one-dimensional diffusions (Frikha et al., 2016).
- Direct jump-time computations, coordinate selection, and neural score predictions for masked diffusion models (Liang et al., 26 Feb 2026).
- Conditioning SDEs on exit locations via Doob's $h$-transform and absorbing-surface boundary laws for FHS in generative modeling (Ye et al., 2022).
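The Doob-conditioning idea in the last item can be illustrated on the simplest case (a hedged sketch: Brownian motion on an interval conditioned to exit at the right endpoint, not the FHDM construction itself). For $h(x) = (x-b)/(a-b)$, which is harmonic with $h(b)=0$ and $h(a)=1$, the $h$-transform adds the drift $h'(x)/h(x) = 1/(x-b)$:

```python
import math
import random

def conditioned_exit_side(x0, a, b, dt, rng, t_max=200.0):
    """Euler-simulate Brownian motion on (b, a) with the Doob drift
    1/(x - b), which conditions the path to exit at the right endpoint a."""
    x, t = x0, 0.0
    while t < t_max:
        x += (1.0 / (x - b)) * dt + math.sqrt(dt) * rng.gauss(0.0, 1.0)
        t += dt
        if x >= a:
            return "a"
        if x <= b:
            return "b"   # possible only through Euler discretization error
    return "censored"

rng = random.Random(3)
sides = [conditioned_exit_side(0.5, 1.0, 0.0, 1e-3, rng) for _ in range(50)]
```

Nearly every simulated path exits at $a$, as the conditioning demands; the rare left exits are artifacts of the Euler step, not of the transform.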
A table summarizing key FHS algorithmic elements in various settings:
| Setting | Path Decomposition | Practical Steps |
|---|---|---|
| Discrete Markov chains | CSQST + geometric tail | Eigenproblem, separation, sampling CSQST/tail (Manzo et al., 2016) |
| 1D Diffusions | Parametrix series, Poisson grid | Simulate Poisson events, Euler bridges, weight corrections (Frikha et al., 2016) |
| Masked Diffusion CTMCs | Exact unmasking sequence | Sample jump-times/indices, neural score prediction per event (Liang et al., 26 Feb 2026) |
6. Convergence, Error Bounds, and Complexity
FHS often delivers strong, explicit guarantees. In masked diffusion models, the dimensionality and vocabulary size do not affect error bounds or convergence rates; the only source of statistical error is the score estimation accuracy, as opposed to schemes like Euler/$\tau$-leaping that incur additional initialization and discretization errors. Moreover, the information-theoretic lower bound for FHS matches the upper bound, confirming the tightness of the analysis (Liang et al., 26 Feb 2026).
For elliptic diffusions, Gaussian-type bounds and complete control over the variance of Monte Carlo weights ensure practical unbiasedness and finite computational overhead for moderate parametrix truncation or Poisson intensity parameter choice (Frikha et al., 2016). In discrete metastable regimes, the CSQST is typically much shorter than the mean hitting time, making FHS computationally near-optimal (Manzo et al., 2016).
7. Applications and Empirical Outcomes
FHS has demonstrated substantial benefits in a variety of domains:
- In generative modeling on point clouds, graphs, and categorical images, FHS (and the associated First Hitting Diffusion Models, FHDM) achieves higher sample quality and substantially reduces the number of diffusion steps required—often by an order of magnitude—compared to fixed-time, non-adaptive schedulers (Ye et al., 2022).
- In rare-event simulation and stochastic process analysis, FHS enables exact estimation of hitting-time distributions even in complex, non-reversible Markov chains or SDEs with general drift/diffusion structure (Manzo et al., 2016, Frikha et al., 2016).
- In masked language modeling and large-scale symbolic data, FHS's guarantee of zero discretization error and tight convergence is particularly advantageous for scalability and sharp performance control (Liang et al., 26 Feb 2026).
A plausible implication is that future methodologies for generative modeling and rare-event simulation across a broad array of discrete and continuous domains will increasingly standardize on algorithms grounded in first-hitting path decompositions, supplanting classical fixed-time diffusion or decoupled sampling paradigms.
References:
- (Liang et al., 26 Feb 2026) Sharp Convergence Rates for Masked Diffusion Models
- (Ye et al., 2022) First Hitting Diffusion Models for Generating Manifold, Graph and Categorical Data
- (Frikha et al., 2016) On the first hitting times of one dimensional elliptic diffusions
- (Manzo et al., 2016) Strong times and first hitting