High-accuracy and dimension-free sampling with diffusions
Abstract: Diffusion models have shown remarkable empirical success in sampling from rich multi-modal distributions. Their inference relies on numerically solving a certain differential equation. This differential equation cannot be solved in closed form, and its resolution via discretization typically requires many small iterations to produce \emph{high-quality} samples. More precisely, prior works have shown that the iteration complexity of discretization methods for diffusion models scales polynomially in the ambient dimension and the inverse accuracy $1/\varepsilon$. In this work, we propose a new solver for diffusion models relying on a subtle interplay between low-degree approximation and the collocation method (Lee, Song, Vempala 2018), and we prove that its iteration complexity scales \emph{polylogarithmically} in $1/\varepsilon$, yielding the first ``high-accuracy'' guarantee for a diffusion-based sampler that only uses (approximate) access to the scores of the data distribution. In addition, our bound does not depend explicitly on the ambient dimension; more precisely, the dimension affects the complexity of our solver through the \emph{effective radius} of the support of the target distribution only.
Explain it Like I'm 14
Overview
This paper studies how to make diffusion models (the kind that generate images and other data) sample faster and more accurately. Normally, these models follow a “reverse” process described by a math rule called an ODE (ordinary differential equation). Because you can’t solve this ODE exactly, you simulate it step-by-step on a computer. Previous methods needed lots of tiny steps—especially when the data are really high-dimensional or when you want very high accuracy. This paper introduces a new way to take much bigger, smarter steps, so you can get high-quality samples with far fewer iterations.
What questions does the paper ask?
- Can we build a sampler for diffusion models whose number of steps grows only like a polylogarithm in 1/ε (extremely slowly as you demand more accuracy), instead of a polynomial (much faster growth)?
- Can we do this without the runtime directly depending on the data’s dimension (like the number of pixels), but instead depending on a simpler “radius” measure of where the data live?
- Can we achieve this using only access to the “score” function (the gradient of the log-density) that diffusion models learn?
Methods explained simply
Think of the forward diffusion as adding blur/noise to an image over time. The reverse diffusion tries to carefully remove that blur to recover a clean sample. The reverse process follows an ODE whose “speed and direction” are determined by the score function. Standard solvers use small, fixed steps and assume the speed doesn’t change much within each step—this forces very tiny steps in high dimensions or for high accuracy.
The core idea in this paper:
- Along the true reverse path, the score’s time behavior is surprisingly simple: each coordinate of the score changes over time like a low-degree polynomial (a smooth curve with only a few “wiggles”).
- If you know a function is well-approximated by a low-degree polynomial, you can sample it at a few carefully chosen times, fit the polynomial, and then use it to predict the path far more accurately over a whole time window.
They use the collocation method (also known as Picard iteration) to do this:
- Pick special time points (nodes) in a small window. These nodes come from a basis built using Chebyshev polynomials, which are good at stable approximation.
- Evaluate the score at those nodes (using the model’s score estimates).
- Fit a low-degree polynomial to approximate how the score changes over time in that window.
- Integrate this polynomial to step the path forward.
- Repeat in short windows to cover the full reverse process.
Analogy: Instead of driving by looking only at your speedometer at the start of each minute, you check your speed at a few smartly chosen moments, fit a smooth curve to how your speed changes, and then follow that curve. You can drive farther per “planning cycle” without veering off the route.
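The window update can be sketched in a few lines of Python. This is a minimal illustration of the collocation/Picard idea, not necessarily the algorithm exactly as analyzed in the paper; the function name, the use of NumPy's Chebyshev utilities, and the default degree and iteration count are choices made here for readability.

```python
import numpy as np
from numpy.polynomial import chebyshev as C

def picard_collocation_window(f, x0, t0, h, degree=4, n_picard=6):
    """Advance x'(t) = f(t, x) from t0 to t0 + h with one collocation window:
    fit a low-degree polynomial (in time) to f along the current guess for the
    trajectory, integrate it, and repeat (Picard iteration).
    x0 is a 1-D array (the state at time t0); n_picard >= 1 is assumed."""
    n_nodes = degree + 1
    # Chebyshev roots on [-1, 1], mapped affinely into the window [t0, t0 + h].
    s = np.cos((2 * np.arange(1, n_nodes + 1) - 1) * np.pi / (2 * n_nodes))
    t_nodes = t0 + 0.5 * h * (s + 1.0)

    x0 = np.asarray(x0, dtype=float)
    x_nodes = np.tile(x0, (n_nodes, 1))          # initial guess: constant trajectory
    for _ in range(n_picard):
        # Evaluate the drift (in diffusion models: the score-based vector field) at the nodes.
        f_nodes = np.stack([f(t, x) for t, x in zip(t_nodes, x_nodes)])
        # Fit each coordinate of t -> f(t, x(t)) by a degree-`degree` Chebyshev polynomial.
        coef = C.chebfit(s, f_nodes, degree)
        # Picard update: x(t) = x0 + (h/2) * integral_{-1}^{s(t)} p(u) du.
        icoef = C.chebint(coef, lbnd=-1.0)
        x_nodes = x0 + 0.5 * h * C.chebval(s, icoef).T

    return x0 + 0.5 * h * C.chebval(1.0, icoef)  # value at the end of the window (s = 1)
```

Stitching such windows along a denoising schedule covers the whole reverse process; because each Picard sweep contracts the error, only a handful of iterations per window are needed.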
Technical terms in everyday language:
- ODE: A rule that tells you how your position changes over time (like a GPS instruction that continuously updates).
- Score function: Points you in directions where data are more likely (like a wind that pushes you toward realistic samples).
- Collocation/Picard iteration: A way to find a curve that fits a differential rule by matching at certain points and iteratively refining it.
- Low-degree polynomial approximation: Capturing a smooth signal with a simple curve (few pieces), so it’s easy to compute and predict.
- Lipschitz: A function that doesn’t change too wildly—small input changes lead to controlled output changes.
Assumptions:
- The target distribution looks like “bounded plus noise”: imagine all clean data points lie within a ball of radius R, and then a little Gaussian noise with size σ is added. This covers common cases like mixtures of Gaussians and approximates real practice (early stopping in diffusion); a schematic formalization appears right after this list.
- The learned score is Lipschitz and its error has sub-exponential tails (meaning large errors are very unlikely). These are standard smoothness and reliability conditions in theory.
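In symbols, a minimal way to state the first assumption (the notation below is chosen for illustration and need not match the paper's):

```latex
% "Bounded plus noise": the target q is a compactly supported distribution p_*
% convolved with Gaussian noise of level sigma.
q \;=\; p_* \ast \mathcal{N}(0, \sigma^2 I_d),
\qquad \operatorname{supp}(p_*) \subseteq B(0, R).
% Example: a mixture of isotropic Gaussians,
% q = \textstyle\sum_k \pi_k\, \mathcal{N}(\mu_k, \sigma^2 I_d)
% \text{ with } \|\mu_k\| \le R .
```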
Main findings
This section summarizes the key results and why they matter.
- High-accuracy, fewer iterations: The number of steps needed scales polylogarithmically in 1/ε, where ε is your target accuracy. Polylogarithmic is much better than polynomial—it means that demanding very high accuracy does not blow up the runtime.
- Dimension-free rate (in a precise sense): The iteration complexity does not depend directly on the ambient dimension d. Instead, it depends on the ratio R/σ, which reflects the effective radius where the clean data live and how much noise you add. In many practical settings (like mixtures of Gaussians), R/σ can be far smaller than the polynomial-in-d factors appearing in standard bounds.
- Works with approximate scores: The method only needs access to the score function estimates (what diffusion models actually learn), not exact log densities. It tolerates realistic estimation error under mild tail and smoothness conditions.
- Strong accuracy guarantees:
- Wasserstein-2 (W2) closeness: The generated samples are close to the target distribution in W2 distance with iteration complexity about (R/σ)² times polylog(1/ε).
- Total variation (TV) closeness: Under an extra mild smoothness assumption (Lipschitz true score), you can convert W2 closeness into TV closeness after a short regularization step. This yields a sampler with TV error ε in roughly (R/σ)² iterations; a schematic statement of both guarantees follows below.
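Schematically, hiding constants and the precise logarithmic factors, the two guarantees above can be written as follows (the notation here is for readability only):

```latex
% Wasserstein-2 guarantee (schematic):
W_2\bigl(\text{law of output},\, q\bigr) \le \varepsilon
\quad\text{after}\quad
N = O\!\Bigl(\tfrac{R^2}{\sigma^2}\Bigr)\cdot \operatorname{polylog}(1/\varepsilon)
\ \text{iterations.}

% Total-variation guarantee (schematic), assuming a Lipschitz true score and
% a short underdamped Langevin regularization step at the end:
\mathrm{TV}\bigl(\text{law of output},\, q\bigr) \le \varepsilon
\quad\text{after}\quad
N = \widetilde{O}\!\Bigl(\tfrac{R^2}{\sigma^2}\Bigr)
\ \text{iterations.}
```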
Why this is important:
- Prior diffusion solvers generally had iteration counts that grew polynomially with 1/ε and often with dimension d. This work gives the first “high-accuracy” guarantee (polylog(1/ε)) for diffusion-based sampling using score access alone, and a dimension-independent complexity that’s much more favorable when d is large.
Implications and impact
- Faster sampling for high-quality outputs: If you want very accurate samples (e.g., high-fidelity images), this approach can reach that accuracy with far fewer steps than standard solvers.
- Scales better to high dimensions: Since the runtime doesn’t directly depend on d, it’s promising for large-scale problems (images, audio, other high-dimensional data). The complexity depends on R/σ, which can be moderate in practical scenarios (like mixtures of Gaussians).
- New theoretical tools: The paper shows that the score along the reverse path has bounded high-order time derivatives, enabling low-degree polynomial approximation. This opens doors for improved numerical methods in generative modeling.
- Limitations and future directions: The method assumes the data are “bounded plus noise” and that score errors have sub-exponential tails. Future work may relax these assumptions, handle weaker moment conditions, or reduce dependence on σ. It would also be interesting to see practical implementations and benchmarks in real-world diffusion models.
Overall, the paper offers a principled way to take bigger, smarter steps in the reverse diffusion, achieving very high accuracy with iteration counts that grow very slowly as you tighten the error, and without paying a heavy price for high dimensionality.
Knowledge Gaps
Below is a consolidated list of concrete knowledge gaps, limitations, and open questions that remain unresolved and could guide future research.
- Relaxing the “bounded plus noise” assumption: Replace compact support of the clean distribution with weaker tail conditions (e.g., subgaussian, finite moments) while preserving dimension-free iteration complexity and the low-degree time approximation of the score.
- Improving dependence on R/σ: Reduce the polynomial dependence (currently roughly (R/σ)² in the iteration count, and stronger in the derivative bounds) to linear or polylogarithmic, or prove matching lower bounds that show the current dependence is unavoidable.
- Eliminating sub-exponential score error tails: Replace the “sub-exponential score error” assumption with the standard score accuracy assumption and still obtain polylogarithmic dependence on 1/ε.
- High-accuracy in TV or KL without auxiliary MCMC: Achieve polylogarithmic-in-1/ε guarantees directly in total variation or KL using only a diffusion-based solver, avoiding post-processing via underdamped Langevin.
- Logarithmic dependence: Determine whether the high-accuracy regime can match low-accuracy results with only logarithmic scaling in the relevant problem parameters; identify whether the current analysis can be sharpened or if new techniques are needed.
- Extending beyond OU/VP processes: Generalize to other forward processes (e.g., VE SDEs, non-OU drifts, data-dependent schedules) and guided sampling (e.g., classifier guidance), with corresponding time-derivative bounds and collocation guarantees.
- Continuous-time score access vs. discrete networks: Collocation needs score estimates s_t at arbitrary time nodes; design and analyze interpolation/parametrization schemes (e.g., time-embedding, spline-in-time score nets) that control the induced approximation error.
- Adaptive windowing and degree selection: Devise principled, data-driven rules for choosing the window length h, the polynomial degree, and the Picard depth based on online error estimators, with provable end-to-end accuracy and minimal score queries.
- Quantifying hidden constants: Make explicit the constants hidden in the asymptotic notation and the exponential factors appearing in the derivative bounds, to assess practical feasibility and to guide implementation choices.
- Lipschitzness of the true score: Identify broad, verifiable conditions (e.g., for Gaussian mixtures and other structured families) under which the vanilla score is globally or locally Lipschitz, or design analyses that require only local/average Lipschitzness.
- Sensitivity to parameter misspecification: Analyze robustness to inexact R and σ; develop practical estimators of these quantities and quantify how estimation error affects iteration complexity and accuracy.
- Removing dimension terms in coupling preconditions: The coupling lemma’s precondition depends on the ambient dimension d; refine the analysis to fully eliminate ambient-dimension effects in preconditions and failure probabilities.
- Small-σ regimes and schedule design: The analysis imposes conditions on the noise level σ; characterize guarantees when σ is smaller or when using nonstandard/learned time schedules, and provide principled ways to choose σ.
- Runtime and memory complexity: Translate iteration complexity into end-to-end wall-clock and memory costs, accounting for the per-call cost of score evaluations in high dimensions and the conditioning of collocation matrices.
- Training and sample complexity for score error tails: Provide theoretical links between training data size/model capacity and the sub-exponential tail parameter of the score error, including how these tails behave across time.
- Numerical stability of collocation: Analyze the conditioning of the collocation matrix Aφ, susceptibility to Runge phenomena, and floating-point errors; explore alternative bases (e.g., splines, orthogonal polynomials with improved boundedness) and preconditioning.
- Combining with low intrinsic dimension: Integrate the proposed method with intrinsic-dimension-adaptive analyses to handle supports that are large in radius but low-dimensional (or vice versa), and clarify trade-offs.
- Beyond Gaussian noise: Generalize to anisotropic Gaussian or non-Gaussian noise convolutions; re-derive time-derivative bounds and contraction properties under these noises.
- Lower bounds in the score-oracle model: Establish oracle complexity lower bounds for high-accuracy sampling given only score access, to clarify whether polylogarithmic dependence on 1/ε is optimal.
- TV conversion without Lipschitz score: Remove the reliance on global Lipschitzness of the true score in the TV upgrade (e.g., via alternative regularizers/couplings or local smoothness conditions).
- Accounting for failure probabilities across windows: Provide a tight, compositional analysis of failure events when stitching many small-time windows, ensuring that high-probability guarantees remain uniform across the entire trajectory.
- Applicability under classifier-free guidance: Extend time-derivative and stability bounds to guided drifts that include data-dependent conditioning and guidance terms; determine how guidance scales the required polynomial degree and window size.
- Reverse-time SDE counterpart: Develop a stochastic collocation or hybrid approach for reverse SDEs and analyze whether similar high-accuracy iteration complexity (polylogarithmic in 1/ε) is achievable with stochastic discretizations.
- Empirical validation: Benchmark the proposed solver on realistic high-dimensional datasets (e.g., images), quantify score-query savings versus higher-order solvers, and test sensitivity to misspecifications and training-induced score errors.
Practical Applications
Immediate Applications
Below are specific, deployable use cases that leverage the paper’s high-accuracy, dimension-robust diffusion sampler based on low-degree time-approximation and collocation (Picard) steps.
- Faster, accuracy-controlled inference for pre-trained diffusion models in generative AI
- Sector: software and media (image, audio, video generation), cloud inference, edge AI
- What it enables: Replace standard discretization (e.g., Euler/UniPC/DPM variants) with a collocation-based solver whose iteration count scales polylogarithmically in the inverse target accuracy 1/ε and with the effective radius ratio R/σ rather than the ambient dimension d. This can cut latency and energy consumption while maintaining or improving sample quality at tight error budgets.
- Potential tools/products/workflows:
- A “CollocationDiffusion” solver module in PyTorch/TensorFlow (e.g., an extension to HuggingFace Diffusers) that:
- Builds Chebyshev-based collocation nodes and matrix Aφ once.
- Runs short windows of Picard iterations with score queries at the collocation nodes.
- Provides an epsilon controller that targets either W2 or TV accuracy (with the underdamped Langevin corrector for TV as needed).
- Integration into inference schedulers that choose the window length h on the order of σ²/R² and the degree/iterations adaptively (a minimal driver sketch follows after this item).
- Assumptions/dependencies:
- Access to time-indexed score estimates s_t from the pre-trained model.
- Lipschitz score estimate and sub-exponential tails for score error; may require spectral norm regularization, gradient clipping, or smoothness-promoting training objectives.
- Early stopping/noise schedules ensuring σ > 0; estimation of effective radius R (can be data- or model-dependent).
- For TV guarantees: Lipschitzness of the true score and a short underdamped Langevin post-processing step.
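A minimal sketch of such an inference driver is below. It assumes an OU/VP forward process, for which a common convention writes the reverse probability-flow drift as y + s_{T−t}(y), and it takes the per-window solver (e.g., the collocation step sketched earlier) as a plug-in; the function names, the window constant c, and the early-stopping time are illustrative assumptions, not the paper's prescriptions.

```python
import numpy as np

def reverse_pf_drift(score_fn, T):
    """Reverse-time probability-flow drift for an OU forward process
    (a common convention): y'(t) = y + s_{T - t}(y)."""
    def drift(t, y):
        return y + score_fn(T - t, y)
    return drift

def sample_with_windows(score_fn, y_init, T, t_end, R, sigma, window_solver, c=0.5):
    """Run the reverse probability-flow ODE from time T down to the early-stopping
    time t_end, splitting it into windows of length h ~ c * sigma^2 / R^2 and
    delegating each window to `window_solver(drift, y, t, h)`."""
    drift = reverse_pf_drift(score_fn, T)
    h = c * sigma**2 / R**2           # window length suggested by the R/sigma scaling
    t = 0.0
    y = np.asarray(y_init, dtype=float)
    horizon = T - t_end
    while t < horizon - 1e-12:
        step = min(h, horizon - t)    # do not overshoot the early-stopping time
        y = window_solver(drift, y, t, step)
        t += step
    return y
```

In practice, `score_fn` would wrap the pre-trained network's time-conditioned score estimate, `y_init` would be drawn from the Gaussian prior at time T, and an epsilon controller could adjust c, the polynomial degree, and the Picard depth to hit a W2 or TV target.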
- Dimension-robust sampling for Gaussian mixtures and bounded-support generative tasks
- Sector: academia (statistical learning, clustering benchmarks), software (synthetic data generation)
- What it enables: High-accuracy sampling with iteration complexity depending on R/σ, not d, which is advantageous in regimes where R is small relative to d (e.g., Gaussian mixtures with center separations O(√log k)). This facilitates fast generation of synthetic multimodal data and benchmark datasets.
- Potential tools/products/workflows:
- A synthetic generator that inputs mixture parameters (centers, σ) and outputs high-quality samples quickly with provable accuracy control.
- Assumptions/dependencies:
- Target distribution satisfies “bounded plus noise” (compact support convolved with Gaussian).
- Score access for the convolved distribution along the noise process, or a trained score-based model (a closed-form example for Gaussian mixtures is sketched below).
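As a concrete illustration, the score of such a mixture is available in closed form along the forward noise process. The sketch below assumes the standard OU/variance-preserving convention X_t = e^{−t} X_0 + Gaussian noise; the function name and parameterization are chosen here for illustration.

```python
import numpy as np

def gmm_score(t, x, centers, sigma, weights=None):
    """Exact score of q_t when q_0 is a mixture of isotropic Gaussians
    N(mu_k, sigma^2 I) evolved by the OU forward process (X_t = e^{-t} X_0 + noise)."""
    centers = np.asarray(centers, dtype=float)     # shape (k, d): mixture means mu_k
    k = centers.shape[0]
    weights = np.full(k, 1.0 / k) if weights is None else np.asarray(weights, dtype=float)

    decay = np.exp(-t)
    var = decay**2 * sigma**2 + (1.0 - decay**2)   # per-coordinate variance of q_t
    means = decay * centers                        # noised mixture centers

    # Posterior responsibilities of each component given x (stabilized softmax).
    logits = np.log(weights) - np.sum((x - means) ** 2, axis=1) / (2.0 * var)
    logits -= logits.max()
    resp = np.exp(logits)
    resp /= resp.sum()

    # Score of a Gaussian mixture: responsibility-weighted pull toward each center.
    return (resp[:, None] * (means - x)).sum(axis=0) / var
```

Here the effective radius can be taken as R = max_k ||mu_k||, so the solver's iteration count scales with (R/σ)² regardless of the ambient dimension d.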
- Latency and energy reductions for inference servers and on-device generative apps
- Sector: energy, cloud, mobile/embedded
- What it enables: Fewer score calls per sample at tight accuracy (polylog(1/ε)), enabling lower energy per generated asset and lower cost-per-inference. Useful in real-time or interactive applications (photo filters, avatars, creative tools).
- Potential tools/products/workflows:
- Deployment scripts that switch to collocation-based sampler under tight SLAs (latency, power).
- Monitoring dashboards that track achieved W2/TV targets and compute savings.
- Assumptions/dependencies:
- Reliable score access on-device or via API.
- Window scheduling tuned to σ and an empirical/estimated R; misestimation can degrade benefits.
- Accuracy-budgeting workflows for compliance and reproducibility
- Sector: policy/compliance, enterprise software
- What it enables: Exponential error contraction per Picard window allows explicit error budgeting and logs (e.g., “this sample is within ε in W2/TV”). Supports risk-managed content generation pipelines.
- Potential tools/products/workflows:
- “Provable accuracy” switches for enterprise diffusion services that record window sizes, degrees, iterations, and derived bounds on final error (plus optional underdamped Langevin correction to TV).
- Assumptions/dependencies:
- Model and data satisfy the paper’s regularity (Lipschitz score est., sub-exponential error tails).
- Post-processing step for TV if required by policy.
- Parallelized score evaluation at collocation nodes for throughput gains
- Sector: HPC/software engineering
- What it enables: Batch score queries across collocation nodes per window map cleanly to GPU/TPU kernels (matrix–vector multiplication with Aφ), increasing throughput without increasing iteration count.
- Potential tools/products/workflows:
- CUDA/ROCm kernels for batched score evaluation and Picard updates; integration with mixed-precision to further speed up inference.
- Assumptions/dependencies:
- Stable numerical implementation (Chebyshev basis, Aφ construction) and careful window sizing (h ≤ 1/(2γφ)); a small matrix-form sketch follows below.
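To see why the per-window work is hardware-friendly, note that fitting and integrating the polynomial is linear in the stacked score values, so a Picard sweep reduces to one batched score evaluation followed by a small matrix product. The sketch below builds such an integration matrix from Chebyshev interpolation as a stand-in for the paper's Aφ (whose exact construction may differ); the names are illustrative.

```python
import numpy as np
from numpy.polynomial import chebyshev as C

def integration_matrix(n_nodes):
    """Return (A, s) where s are Chebyshev nodes on [-1, 1] and
    A[i, j] = integral from -1 to s[i] of the j-th nodal interpolation basis.
    One Picard sweep over a window of length h is then the batched update
        X_new = x0 + (h / 2) * A @ F,
    with F stacking the drift/score evaluated at all nodes in a single batched call."""
    s = np.cos((2 * np.arange(1, n_nodes + 1) - 1) * np.pi / (2 * n_nodes))
    nodal_basis = np.eye(n_nodes)                    # column j = values of the j-th basis
    coef = C.chebfit(s, nodal_basis, n_nodes - 1)    # interpolate each basis column
    icoef = C.chebint(coef, lbnd=-1.0)               # antiderivatives vanishing at -1
    return C.chebval(s, icoef).T, s
```

Because A is tiny (its side equals the polynomial degree plus one), throughput is dominated by the batched score calls, which map directly onto GPU/TPU kernels.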
Long-Term Applications
These uses require additional research, scaling, or development—primarily to relax assumptions (compact support + Gaussian noise, sub-exponential score error) and to engineer robust, general-purpose solvers.
- General-purpose, high-accuracy diffusion samplers beyond compact-support-plus-noise
- Sector: software/ML research
- What it enables: Extending the collocation method to broader classes (heavy tails, non-Gaussian forward processes, discrete modalities), removing sub-exponential score error and strong Lipschitz requirements.
- Potential tools/products/workflows:
- “Universal Collocation Solver” that auto-detects model regularity and chooses basis/degree, possibly combining with existing higher-order SDE solvers and Metropolis-inspired corrections.
- Assumptions/dependencies:
- New theory to bound time-derivatives of the score under weaker conditions; robust training procedures to yield smoother scores.
- Auto-schedulers that estimate effective radius R and noise level σ online
- Sector: software tooling
- What it enables: Runtime measurement or estimation of R/σ to adapt window length and degree for the collocation solver, achieving near-optimal iteration complexity across diverse models/datasets.
- Potential tools/products/workflows:
- “AutoCollocation” modules that infer R from sample norms or learned embeddings and adjust step sizes/degrees; fallback to conventional solvers when estimates are uncertain.
- Assumptions/dependencies:
- Reliable, low-overhead estimation procedures; confidence-aware scheduling to avoid instability when R scales with d (e.g., pixel-space images with large d).
- Real-time diffusion-based planners in robotics and control
- Sector: robotics
- What it enables: High-accuracy sampling in fewer iterations for generative planners and trajectory samplers, potentially enabling real-time operation in complex, multimodal planning landscapes.
- Potential tools/products/workflows:
- Integration of collocation-based sampling with diffusion planners; error budgets per planning horizon; GPU-accelerated batched trajectory proposals.
- Assumptions/dependencies:
- Availability of reliable score models for planning distributions; bounded-plus-noise structure or alternative theory to justify low-degree time approximation.
- High-accuracy scenario generation in finance and econometrics
- Sector: finance
- What it enables: Sampling from complex multimodal distributions (e.g., stress scenarios) with tight error control and reduced compute, improving tail-risk estimation and Monte Carlo pipelines.
- Potential tools/products/workflows:
- Score-based scenario engines with collocation sampling; provable accuracy modes for regulatory reporting or internal validation.
- Assumptions/dependencies:
- Score models that faithfully represent target distributions; verification of Lipschitzness and tail conditions; careful calibration of σ.
- Synthetic data generation in healthcare and privacy-preserving analytics
- Sector: healthcare, privacy
- What it enables: Faster, accuracy-controlled generation of medical images/EHR-like datasets for augmentation and simulation, with explicit error control for downstream validation.
- Potential tools/products/workflows:
- Synthetic data platforms embedding collocation-based samplers; quality gates based on W2/TV thresholds; optional DP training to mitigate privacy leakage.
- Assumptions/dependencies:
- Validity of compact-support-plus-noise for the chosen representation (e.g., latent spaces); reliable score models; regulatory acceptance of accuracy metrics.
- Scientific modeling and weather/climate generative surrogates
- Sector: energy, climate science
- What it enables: High-accuracy sampling from complex surrogate models with fewer iterations, enabling larger ensembles or faster turnaround in forecasting/simulation.
- Potential tools/products/workflows:
- Collocation-integrated generative surrogates for state-space sampling; ensemble controllers with explicit ε-targets.
- Assumptions/dependencies:
- Score access for the surrogate distribution; extension of the theory to non-OU forward processes if needed.
- Standards and policy around energy-efficient, accuracy-reported generative AI
- Sector: policy/standards
- What it enables: Reporting frameworks that include iteration complexity vs. target accuracy, R/σ estimates, and energy per sample, encouraging greener and more transparent deployment.
- Potential tools/products/workflows:
- Benchmarks and certification processes that adopt error-controlled sampling and publish energy/latency trade-offs; recommended practices for epsilon-controlled content generation.
- Assumptions/dependencies:
- Community adoption; agreed-upon metrics and validators (e.g., practical proxies for W2/TV in high dimensions).
Notes on feasibility across applications:
- The method excels when the effective radius R is not scaling linearly with d; mixtures with small center norms are particularly favorable. For raw pixel-space image synthesis, R may scale with √d, dampening “dimension-free” benefits; latent-space formulations may help.
- TV guarantees require additional smoothness and a short underdamped Langevin step; this is practical but adds a dependency on corrector tuning.
- Ensuring Lipschitz score estimates with sub-exponential error tails may motivate changes to training (e.g., spectral normalization, gradient penalties) and careful calibration of denoising schedules.
Glossary
Below is an alphabetical list of advanced domain-specific terms from the paper, each with a short definition and a verbatim usage example from the text.
- Ambient dimension: The number of coordinates in the space in which data or distributions reside, often denoted by d and affecting algorithmic complexity. "iteration complexity of discretization methods for diffusion models scales polynomially in the ambient dimension and the inverse accuracy $1/\varepsilon$."
- Chebyshev polynomial: A classical orthogonal polynomial family used for approximation; its roots serve as interpolation nodes with favorable numerical properties. "where $c_i$ is the $i$-th root of the Chebyshev polynomial of degree $n$."
- Collocation method: A numerical ODE-solving technique that frames the solution as a fixed-point problem and uses basis functions and interpolation at selected nodes. "The collocation method is a numerical scheme for approximating the solution to an ordinary differential equation through fixed-point iteration."
- Convolution: An operation combining two distributions to produce a third; here, mixing a compactly supported distribution with Gaussian noise. "we consider sampling from a distribution which is the convolution of a compactly supported distribution with Gaussian noise."
- Coupling: A joint distribution over two random variables with specified marginals, used to compare distributions (e.g., in Wasserstein distance). "where is the set of couplings."
- Denoising schedule: The sequence of time points at which the reverse process (or solver) is applied to progressively remove noise. "denoising schedule "
- Diffusion models: Generative models that sample by simulating a learned reverse-time process that removes noise added by a forward diffusion. "Diffusion models are the dominant paradigm in image generation, among other modalities."
- Effective radius: A distribution-dependent geometric scale (here captured by R) controlling complexity in place of explicit dependence on ambient dimension. "the dimension affects the complexity of our solver through the effective radius of the support of the target distribution only."
- Euler–Maruyama discretization: A first-order numerical method for SDEs/ODEs that uses the drift (and diffusion) at the start of each step. "Traditionally, to simulate the continuous-time ODE in discrete time, some numerical method like Euler-Maruyama discretization is used."
- Forward process: The noise-adding stochastic process (e.g., OU) that maps clean data to progressively noisier states. "The forward process is a noise process driven by a stochastic differential equation of the form"
- High-accuracy: A regime in sampling guarantees with iteration complexity scaling polylogarithmically in the target error. "yielding the first ``high-accuracy'' guarantee for a diffusion-based sampler"
- Intrinsic dimension: A measure of the effective complexity of the support (e.g., via covering numbers), lower than the ambient dimension. "for any distribution whose intrinsic dimension, which they quantify in terms of covering number of the support denoted , DDPM can sample with iteration complexity ."
- Jacobian: The matrix of first-order partial derivatives of a vector-valued function; here, of the score estimate compared to the true score. "it assumes its Jacobian of the score estimate is close to that of the true score."
- Lipschitz (score estimate): A regularity property ensuring the score estimate does not change too rapidly; bounds enable stability of solvers. "Given access to Lipschitz score estimates"
- Log-concave sampling: Sampling methods and theory for distributions whose log-density is concave, offering strong convergence properties. "In the log-concave sampling literature, there is a well-understood taxonomy along this axis"
- Metropolis adjustment: A correction step (e.g., in MCMC) that removes discretization bias by accepting/rejecting proposals according to a ratio. "high-accuracy methods which correct for discretization bias, e.g., via Metropolis adjustment, and get iteration complexity polylogarithmic in $1/\varepsilon$."
- Metropolis–Hastings filter: The specific accept/reject mechanism in MCMC that ensures the target distribution is preserved despite discretization. "in order to implement a Metropolis-Hastings filter."
- Mixture of isotropic Gaussians: A distribution composed of several Gaussian components with identity covariance, serving as a natural example in the paper’s assumptions. "A natural example of a distribution satisfying Assumption~\ref{assumption:bounded_plus_noise} is a mixture of isotropic Gaussians"
- Ornstein–Uhlenbeck (OU) process: A Gaussian Markov process with linear drift toward zero; the standard forward process in many diffusion models. "in the example of the standard OU process, we can take "
- Picard iteration: A fixed-point iteration method for solving integral-form ODEs; here, implemented via polynomial collocation. "the collocation method, i.e. Picard iteration, has been used in prior diffusion model theory work"
- Polynomial interpolation: Approximating a function by a polynomial that matches the function at selected nodes. "by polynomial interpolation we can find nodes such that $\phi_j(c_i) = \mathds{1}[i = j]$"
- Posterior mean: The conditional expectation of latent clean variables given a noisy observation, central to score/time-derivative identities. "Define the posterior mean ."
- Probability flow ODE: A deterministic reverse-time ODE that exactly transports the forward marginal back to the data distribution under exact scores. "One version of this reverse process is given by the probability flow ODE"
- Reverse process: The learned or constructed process that removes noise, transforming the forward terminal distribution back toward the data. "The reverse process is designed to undo this noise process"
- Score estimates: Approximations of the gradient of the log-density at each time, used in place of the true score for practical sampling. "In practice, this is run using score estimates instead of the actual score functions"
- Score function: The gradient of the log-density of a distribution; in diffusion models, time-dependent along the reverse process. "where $\nabla \ln q_t$ is called the score function."
- Sub-exponential score error: An assumption that the norm of the score estimation error has tails decaying like exp(−z/α), enabling robust control. "has subexponential tails"
- Total variation distance: A strong divergence metric measuring the maximum difference in probability assigned to events by two distributions. "total variation (TV) distance"
- Tweedie’s formula: A classical identity relating posterior expectations to derivatives of the log-density; here used as an analogy for time-derivative identities. "This result can be understood in the same spirit as Tweedie's formula"
- Underdamped Langevin dynamics: A second-order (in time) Markov process with momentum that can regularize distributions and aid in TV guarantees. "regularizing properties of underdamped Langevin dynamics"
- Wasserstein coupling: A method to compare distributions by optimally transporting mass; used in analyzing discretization/approximation errors. "appealing to a naive Wasserstein coupling argument"
- Wasserstein-2 distance: An optimal transport metric measuring the squared Euclidean transport cost between distributions’ mass. "satisfying "
- Vector field (of an ODE): The function defining the instantaneous velocity of the state in the ODE; here, drift plus score. "The time derivative of the vector field of probability flow ODE can be calculated as"