Deterministic/Design-Based Sampling Overview

Updated 1 June 2026

Deterministic/Design-based Sampling is a strategy that employs fixed, algorithmically specified designs to select sampling units, decoupling uncertainty from the underlying population.
It underpins classical finite-population inference with methods like the Horvitz–Thompson estimator and Bessel’s correction, ensuring unbiased estimates and precise variance calculations.
Modern extensions include adaptive sampling for signal reconstruction, deterministic compressed sensing, and flow-based generative models, enhancing performance over random methods.

Deterministic/Design-based Sampling refers to a broad class of sampling strategies in statistical inference, signal processing, numerical integration, robotics, and uncertainty quantification, in which sampling locations or units are selected according to a fixed, algorithmically specified (“design-based”) scheme rather than by randomization. The deterministic paradigm stands in contrast to model-based or random sampling and is foundational in finite population sampling, optimal experimental design, compressed sensing, motion planning, and modern generative modeling. It explicitly decouples inference from probabilistic assumptions on the data-generating mechanism for the population, with randomness entering only via the sampling design or not at all.

1. Foundations: Population, Design, and the Scope of Randomness

In design-based sampling, the finite population is regarded as fixed and unknown (e.g., $y_1,\ldots,y_N$), and all randomness arises only from the explicit sampling protocol. Classical survey sampling operates within this regime: a random subset $s$ of specified size $n$ is drawn from $U=\{1,\ldots,N\}$ according to a user-defined design $p(s)$, yielding first-order inclusion probabilities $\pi_k = \Pr\{k\in s \mid F\}$ and joint inclusion probabilities $\pi_{ij} = \Pr\{i,j\in s \mid F\}$ (with $\pi_{ii}=\pi_i$). The Horvitz–Thompson estimator $\widehat{t}_{HT} = \sum_{k\in s} y_k/\pi_k$ is unbiased for the total $t=\sum_{k\in U} y_k$ under any design with $\pi_k>0$ for all $k$, with variance $\operatorname{Var}(\widehat{t}_{HT} \mid F) = \sum_{i\in U}\sum_{j\in U} \frac{\pi_{ij} - \pi_i \pi_j}{\pi_i\pi_j} y_i y_j$ (O'Neill, 2024).
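The unbiasedness and variance claims for the Horvitz–Thompson estimator can be checked by brute force on a toy population. The sketch below assumes simple random sampling without replacement, for which $\pi_k = n/N$ and $\pi_{ij} = n(n-1)/(N(N-1))$ for $i \ne j$; the population values and the helper names `ht_total` and `ht_variance` are illustrative and not taken from the cited work.

```python
from fractions import Fraction
from itertools import combinations

def ht_total(sample, y, pi):
    """Horvitz-Thompson estimate of the population total from the sampled units."""
    return sum(Fraction(y[k]) / pi[k] for k in sample)

def ht_variance(y, pi, pi_joint):
    """Design variance: sum over all (i, j) of (pi_ij - pi_i pi_j) / (pi_i pi_j) * y_i y_j."""
    size = len(y)
    return sum(
        (pi_joint[i][j] - pi[i] * pi[j]) / (pi[i] * pi[j]) * y[i] * y[j]
        for i in range(size)
        for j in range(size)
    )

# Hypothetical fixed population; the design is SRS without replacement (an assumption).
y = [3, 7, 8, 12, 20]
N, n = len(y), 2
pi = [Fraction(n, N)] * N
pi_joint = [
    [pi[i] if i == j else Fraction(n * (n - 1), N * (N - 1)) for j in range(N)]
    for i in range(N)
]

# Enumerate every sample the design can produce; each has probability 1 / C(N, n).
samples = list(combinations(range(N), n))
p_s = Fraction(1, len(samples))
estimates = [ht_total(s, y, pi) for s in samples]

# Expectation and variance taken over the design only; y is never treated as random.
design_mean = sum(p_s * t for t in estimates)
design_var = sum(p_s * (t - design_mean) ** 2 for t in estimates)

print(design_mean == sum(y))
print(design_var == ht_variance(y, pi, pi_joint))
```

Exact rational arithmetic is used so that the comparison with the closed-form variance is an identity rather than a floating-point approximation.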

Design-based inference applies conditioning to the realized empirical distribution $F$ of the finite population, restricting probability and expectation operators to the randomness induced by the design and never to the unknown values $y_k$ themselves. This conditionality is central: $\pi_k$ and $\pi_{ij}$ are explicit, model-free quantities. Parameters such as the population mean and variance retain purely descriptive status in the design-based context.

A unified framework distinguishes three levels: (i) the infinite superpopulation (model-based), generating population values $y_1,\ldots,y_N$; (ii) the finite realized population (design-based, fixed); and (iii) the sample (drawn via the design). All probabilities, expectations, and variances are conditioned on the realized population in design-based calculations (O'Neill, 2024).

2. Population Variance, Bessel’s Correction, and Inference

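A fundamental aspect of deterministic sampling is the treatment of population and sample variances. For a finite population mean $\bar{y}_U = \frac{1}{N}\sum_{k=1}^{N} y_k$, natural definitions for variance include:

$\sigma^2 = \frac{1}{N}\sum_{k=1}^{N} (y_k-\bar{y}_U)^2$ (no correction),
$S^2 = \frac{1}{N-1}\sum_{k=1}^{N} (y_k-\bar{y}_U)^2$ (Bessel-corrected).

In design-based theory, only the sample selection is random, so the sample variance $s^2 = \frac{1}{n-1}\sum_{k\in s} (y_k-\bar{y}_s)^2$, with $\bar{y}_s$ the sample mean, is unbiased for the population $S^2$, not for $\sigma^2$. This justifies the universal use of Bessel’s correction in complex designs and in classical simple random sampling. Variance estimation and confidence intervals require the finite-population correction (FPC); under simple random sampling without replacement, $\operatorname{Var}(\bar{y}_s \mid F) = \left(1-\frac{n}{N}\right)\frac{S^2}{n}$ (O'Neill, 2024).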

Through explicit presentation of all conditioning and adoption of Bessel’s correction both at the sample and population level, the design-based paradigm yields logically self-consistent inferences and clear operational interpretations for all estimators and variance formulas.
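A small enumeration makes the role of Bessel’s correction and the finite-population correction concrete. The sketch below assumes simple random sampling without replacement from a hypothetical five-unit population and checks, in exact arithmetic, that the Bessel-corrected sample variance averages to the Bessel-corrected population variance and that the variance of the sample mean equals $(1-n/N)$ times that population variance divided by $n$. All names are illustrative.

```python
from fractions import Fraction
from itertools import combinations

def mean(values):
    return Fraction(sum(values), len(values))

def bessel_variance(values):
    """Variance with divisor len(values) - 1 (Bessel-corrected)."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)

y = [3, 7, 8, 12, 20]  # hypothetical fixed population
N, n = len(y), 3

S2 = bessel_variance(y)      # population variance with divisor N - 1
sigma2 = S2 * (N - 1) / N    # uncorrected population variance with divisor N

# Every possible sample under SRS without replacement, each equally likely.
samples = [[y[k] for k in s] for s in combinations(range(N), n)]
p_s = Fraction(1, len(samples))

# Design expectation of the Bessel-corrected sample variance.
expected_s2 = sum(p_s * bessel_variance(s) for s in samples)

# Design variance of the sample mean versus the FPC formula.
var_of_mean = sum(p_s * (mean(s) - mean(y)) ** 2 for s in samples)
fpc_formula = (1 - Fraction(n, N)) * S2 / n

print(expected_s2 == S2, expected_s2 == sigma2)
print(var_of_mean == fpc_formula)
```

Setting `n = N` in the formula gives a variance of zero, which matches the intuition that a full census leaves no design-based uncertainty about the mean.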

3. Modern Extensions: Adaptive and Optimal Deterministic Sampling

Deterministic sampling emerged as a dominant paradigm not only in survey inference but across computational and signal-processing disciplines:

3a. High-Resolution Adaptive Sampling

High-resolution adaptive sampling for deterministic, continuously differentiable signals $\varphi(t)$ (with $t\in[0,1)$) is linked via a fundamental duality to optimal high-rate quantization. For piecewise-constant reconstructions from $N$ samples, the mean squared error (MSE) is minimized by distributing sample density $\lambda(t) \propto \left(\varphi'(t)^2\right)^{1/3}$, i.e., proportional to the cube root of the local gradient energy. The optimal total MSE is $\frac{1}{12N^2}\left(\int_0^1 \left(\varphi'(t)^2\right)^{1/3}\,dt\right)^3$ (Dar et al., 2016). The practical algorithm places segmentation points $\{a_i\}_{i=0}^{N}$ so each cell has identical $\left(\varphi'(t)^2\right)^{1/3}$ "mass," and reconstruction leverages these points (plus extrema) to achieve optimal error. This design-based approach yields order-of-magnitude reductions in error or transmitted bit-rate for the same sample budget compared to uniform or tree-based schemes.
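The equal-mass rule can be tried numerically. The following is a minimal sketch, not the cited authors' implementation: it assumes a signal on the unit interval, approximates integrals on a fine grid, places cell boundaries so that each cell carries the same integral of the cube root of the squared derivative, and reconstructs with the cell mean rather than with the extrema-based reconstruction mentioned above. The test signal and all function names are hypothetical.

```python
import math
from bisect import bisect_left

def phi(t):
    """Hypothetical test signal: a smooth step that is steep near t = 0.5."""
    return math.tanh(20.0 * (t - 0.5))

def dphi(t):
    """Derivative of phi."""
    return 20.0 / math.cosh(20.0 * (t - 0.5)) ** 2

M = 20000                                  # fine grid used for numerical integration
grid = [(j + 0.5) / M for j in range(M)]   # midpoints of the fine cells

def adaptive_boundaries(n_cells):
    """Boundaries with an equal integral of (phi'^2)^(1/3) in every cell."""
    cum, total = [], 0.0
    for t in grid:
        total += (dphi(t) ** 2) ** (1.0 / 3.0) / M
        cum.append(total)
    inner = []
    for i in range(1, n_cells):
        j = bisect_left(cum, total * i / n_cells)  # first fine cell reaching the target mass
        inner.append((j + 1) / M)
    return [0.0] + inner + [1.0]

def uniform_boundaries(n_cells):
    return [i / n_cells for i in range(n_cells + 1)]

def piecewise_constant_mse(bounds):
    """MSE of the piecewise-constant fit that uses the mean of phi in each cell."""
    sq_err, j = 0.0, 0
    for hi in bounds[1:]:
        cell = []
        while j < M and grid[j] < hi:
            cell.append(phi(grid[j]))
            j += 1
        if cell:
            m = sum(cell) / len(cell)
            sq_err += sum((v - m) ** 2 for v in cell)
    return sq_err / M

mse_adaptive = piecewise_constant_mse(adaptive_boundaries(16))
mse_uniform = piecewise_constant_mse(uniform_boundaries(16))
print(mse_adaptive, mse_uniform)
```

For a signal whose variation is concentrated in a narrow region, as here, the adaptive boundaries crowd into the steep part and the error drops well below that of the uniform grid with the same number of cells.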

3b. Deterministic Sampling in Compressed Sensing

Classical compressed sensing utilizes random subsampling for guarantees such as the Restricted Isometry Property (RIP), but deterministic sampling schemes can match or nearly match these guarantees. For sparse trigonometric polynomials, explicit deterministic sampling via Weil's exponential sums produces a sensing matrix with controlled coherence and uniform exact recovery guarantees: the number of samples that suffices for all vectors of a given sparsity in a given ambient dimension is nearly optimal up to logarithmic factors (Xu, 2010). Empirical results confirm practical recovery rates indistinguishable from random sampling.

A hybrid (partially deterministic) compressed sensing scheme arises in settings where some measurements (e.g., low-frequency FFT coefficients) must be included deterministically. Recent work (Plan et al., 6 Apr 2026) formalizes this as optimized Bernoulli selection, computing inclusion probabilities $p_i$ to minimize worst-case noise amplification subject to a sampling budget. High-coherence rows are deterministically included ($p_i = 1$); the rest are randomized. Closed-form formulas provide sample-complexity and denoising guarantees superior to classic schemes, and numerical results show the hybrid scheme outperforming both purely deterministic and fully randomized approaches over a range of sparse and generative priors.
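To illustrate how a budgeted Bernoulli design can contain deterministically included rows, the sketch below uses a generic capped proportional allocation: each row gets probability equal to a common scale times its score, capped at one, with the scale chosen so that the probabilities sum to the budget. This is a stand-in chosen for illustration and is not the optimized selection rule of the cited work; the scores, the budget, and the function names are all hypothetical.

```python
import random

def capped_probabilities(scores, budget):
    """Return p_i = min(1, c * scores[i]) with c found by bisection so that sum(p_i) = budget."""
    if min(scores) <= 0 or not 0 < budget <= len(scores):
        raise ValueError("scores must be positive and 0 < budget <= number of rows")
    lo, hi = 0.0, 1.0 / min(scores)  # at c = hi every probability is capped at 1
    for _ in range(100):
        c = 0.5 * (lo + hi)
        if sum(min(1.0, c * s) for s in scores) < budget:
            lo = c
        else:
            hi = c
    return [min(1.0, hi * s) for s in scores]

def bernoulli_select(probabilities, rng):
    """Include row i independently with probability p_i; rows with p_i = 1 are always kept."""
    return [i for i, p in enumerate(probabilities) if rng.random() < p]

# Hypothetical per-row scores (for example, local coherences) and an expected budget of 3 rows.
scores = [10.0, 5.0, 1.0, 1.0, 1.0, 1.0]
probs = capped_probabilities(scores, budget=3)
rows = bernoulli_select(probs, random.Random(0))
print(probs)
print(rows)
```

With these scores the two high-score rows receive probability one and are therefore in every realized sample, while the remaining budget is spread evenly over the low-score rows.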

4. Algorithmic Determinism in Multivariate and Geometric Sampling

Deterministic sampling methods based on minimizing discrepancy and exploiting geometric structure have proliferated in multivariate density approximation, motion planning, and generative modeling.

4a. Projected Cumulative Distributions (PCD) and Radon Projections

For general multivariate densities, deterministic sample placement minimizing projection-based discrepancy yields strong approximation results. The central notion is to minimize the average Cramér–von Mises distance between the one-dimensional CDFs of all projections (the Radon transform) of the density $f(x)$ and a Dirac mixture $\sum_{i=1}^{L} w_i\,\delta(x-x_i)$ (Hanebeck, 2019). The optimization is efficient (sorting and Newton increments in each projection), converges, and empirically yields “blue-noise”–like coverage of high-density regions, outperforming i.i.d. sampling for a given number of samples $L$.

For circular densities, projected cumulative distributions enable deterministic Dirac set selection on the circle $\mathbb{S}^1$ and outperform both minimum-point unscented transforms and Monte Carlo in estimation and filtering, especially for multimodal or concentrated densities (Frisch et al., 2021).

4b. Robotics and Motion Planning: Dispersion-Optimized Sets

In motion planning under driftless nonlinear dynamics, deterministic sampling sets minimizing the dispersion in reachable-set metrics (not just Euclidean distance) guarantee coverage and asymptotic optimality. The Dispertio algorithm finds points that minimize the largest “uncovered” reachable ball, and when used in PRM* planners, achieves deterministic completeness and optimality on both flat and sub-Riemannian manifolds (Palmieri et al., 2019). Empirical benchmarks show superior success rates and convergence speed compared to uniform, Halton, and greedy tree-based samplers.
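Dispersion, the radius of the largest "uncovered" ball, is easy to compute approximately in a simple setting. The sketch below is a Euclidean stand-in on the unit square, not the Dispertio algorithm: the cited method measures coverage with reachable sets of the dynamics, whereas this example uses straight-line distance and a greedy farthest-point insertion over a grid of candidates. All names and the grid resolution are assumptions.

```python
import math

def dispersion(points, resolution=50):
    """Approximate Euclidean dispersion on [0, 1]^2: the largest distance
    from any grid point to its nearest sample."""
    worst = 0.0
    for a in range(resolution + 1):
        for b in range(resolution + 1):
            x = (a / resolution, b / resolution)
            nearest = min(math.dist(x, p) for p in points)
            worst = max(worst, nearest)
    return worst

def greedy_low_dispersion(n_points, resolution=50):
    """Farthest-point insertion: repeatedly add the candidate farthest from the current set."""
    candidates = [
        (a / resolution, b / resolution)
        for a in range(resolution + 1)
        for b in range(resolution + 1)
    ]
    points = [(0.5, 0.5)]
    dist_to_set = [math.dist(c, points[0]) for c in candidates]
    while len(points) < n_points:
        k = max(range(len(candidates)), key=dist_to_set.__getitem__)
        points.append(candidates[k])
        # Update each candidate's distance to the enlarged point set.
        dist_to_set = [
            min(d, math.dist(c, candidates[k]))
            for d, c in zip(dist_to_set, candidates)
        ]
    return points

pts = greedy_low_dispersion(16)
print(dispersion(pts[:1]), dispersion(pts[:4]), dispersion(pts))
```

Because every run produces the same point set, a planner built on it gives the same roadmap and the same coverage guarantee each time, which is the practical appeal of deterministic sets over random ones.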

4c. Model Predictive Control and Signal Recovery

In control-oriented sampling (e.g., model predictive control), deterministic low-discrepancy samples over the design space outperform random-sampled controls in terms of smoothness of resulting control laws, convergence, and computational efficiency (Walker et al., 7 Jan 2026).
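As one common example of a deterministic low-discrepancy set, the sketch below generates Halton points and maps them through the normal inverse CDF to obtain Gaussian-distributed control perturbations. It is an illustration of the general idea only; the construction used in the cited work may differ, and the function names, bases, and the two-dimensional control space are assumptions.

```python
from statistics import NormalDist

def radical_inverse(index, base):
    """Van der Corput radical inverse of a positive integer in the given base."""
    result, scale = 0.0, 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        result += digit * scale
        scale /= base
    return result

def halton(n_points, bases=(2, 3)):
    """First n_points of the Halton sequence (indices start at 1, so points lie in (0, 1))."""
    return [
        tuple(radical_inverse(i, b) for b in bases)
        for i in range(1, n_points + 1)
    ]

def gaussian_control_samples(n_points, mean, std):
    """Map Halton points through the normal inverse CDF, one control dimension per base."""
    normal = NormalDist()
    return [
        tuple(m + s * normal.inv_cdf(u) for u, m, s in zip(point, mean, std))
        for point in halton(n_points)
    ]

# Hypothetical two-dimensional control perturbations around a nominal input.
controls = gaussian_control_samples(8, mean=(0.0, 0.0), std=(1.0, 0.5))
for u in controls:
    print(u)
```

The same eight perturbations are produced on every call, so the resulting control law does not jitter from one solve to the next because of sampling noise.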

5. Deterministic Approaches in Modern Probabilistic Inference

Recent advances in generative modeling, Bayesian computation, and constrained sampling exploit deterministic sampling not as a mere implementation detail but as a method with superior properties.

5a. Deterministic Flows and Score-based Sampling

Deterministic model sampling via Wasserstein gradient flows or ODEs (e.g., “probability-flow” ODEs) exhibits superior geometric regularity and convergence. In high-dimensional diffusion models, every deterministic ODE trajectory lies in a universal low-dimensional "boomerang-shaped" subspace, regardless of architecture, and this geometric property can be exploited with dynamic-programming–based time-scheduling to obtain much better sample quality at low function-evaluation (“NFE”) counts (Chen et al., 11 Jun 2025). Deterministic flows in these settings guarantee monotone decrease of divergence and more stable sample quality (Ilin et al., 25 Apr 2025).
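A one-dimensional toy case shows what deterministic ODE sampling means in practice. The sketch below assumes a variance-exploding parameterization in which the noise level $\sigma$ plays the role of time and the probability-flow ODE reads $dx/d\sigma = -\sigma\,\nabla_x \log p(x;\sigma)$. It also assumes Gaussian data, so the score is available in closed form instead of coming from a trained network. The constants and function names are hypothetical, and the plain Euler scheme is used only for brevity.

```python
import math

DATA_STD = 2.0    # hypothetical: the data distribution is N(0, DATA_STD^2)
SIGMA_MAX = 20.0  # hypothetical largest noise level

def score(x, sigma):
    """Exact score of the noised marginal N(0, DATA_STD^2 + sigma^2)."""
    return -x / (DATA_STD ** 2 + sigma ** 2)

def pf_ode_sample(x_init, n_steps=2000):
    """Euler integration of dx/dsigma = -sigma * score(x, sigma) from SIGMA_MAX down to 0."""
    x, h = x_init, SIGMA_MAX / n_steps
    for i in range(n_steps):
        sigma = SIGMA_MAX - i * h
        dx_dsigma = -sigma * score(x, sigma)
        x -= h * dx_dsigma  # step from sigma to sigma - h
    return x

def closed_form(x_init):
    """Exact solution of the same ODE for this Gaussian toy problem."""
    return x_init * DATA_STD / math.sqrt(DATA_STD ** 2 + SIGMA_MAX ** 2)

x_noise = 15.0  # a fixed starting point at the highest noise level
print(pf_ode_sample(x_noise), closed_form(x_noise))
```

The map from the initial noise to the final sample is a fixed function with no randomness injected along the way, so the step schedule can be tuned against a reproducible trajectory.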

In text-to-3D generation, deterministic ODE sampling (as in Consistent3D) replaces stochastic score-distillation sampling, yielding sharper textures, robust geometry, and improved CLIP R-precision (0.348 versus 0.310–0.336 for baselines) (Wu et al., 2024).

5b. Space-Filling and Minimum Energy Designs

For Bayesian computation where every likelihood evaluation is expensive, Minimum Energy Design (MED) offers a deterministic, weighted-space-filling approach specifically adapted to the shape of the posterior (or likelihood). The MED maximizes the minimum "weighted" pairwise distance between sample points, with the weights derived from the unnormalized posterior, yielding better coverage of high-density regions than Quasi-Monte Carlo (QMC) or MCMC for the same computational budget (Joseph et al., 2017). Enhancements such as Mahalanobis or generalized distances further extend applicability to high-dimensional, correlated regimes.
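The max-min criterion can be sketched with a greedy, one-point-at-a-time construction over a fixed candidate grid. The example assumes that each point is weighted by the unnormalized density raised to the power $1/(2p)$, with $p$ the dimension, and that the next point maximizes the smallest weighted distance to the points already chosen. The density, the candidate grid, and the function names are hypothetical, and a real application would replace the grid search with a proper optimizer.

```python
import math

def log_density(x):
    """Hypothetical unnormalized log-posterior: standard normal up to a constant."""
    return -0.5 * x * x

def greedy_med(candidates, n_points, dim=1):
    """Greedy max-min design under the weighted distance w(x) * w(x') * |x - x'|."""
    gamma = 1.0 / (2.0 * dim)
    weight = [math.exp(gamma * log_density(c)) for c in candidates]
    chosen = [max(range(len(candidates)), key=weight.__getitem__)]  # start at the mode
    while len(chosen) < n_points:
        def criterion(k):
            # Smallest weighted distance from candidate k to the current design.
            return min(
                weight[k] * weight[i] * abs(candidates[k] - candidates[i])
                for i in chosen
            )
        chosen.append(max(range(len(candidates)), key=criterion))
    return [candidates[i] for i in chosen]

candidates = [i / 100 for i in range(-500, 501)]
design = greedy_med(candidates, n_points=9)
print(sorted(design))
```

Only the candidate grid is evaluated, and in an expensive-likelihood setting those density values would be cached and reused across iterations.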

5c. Piecewise Deterministic Markov Process (PDMP) Sampling

PDMP-based deterministic samplers, especially with mirror-maps for domain constraints, provide efficient, score-based samplers for constrained domains. These flows are unbiased, avoid discretization error, and maintain ergodicity and mixing in constrained or non-Euclidean spaces, outperforming SDE-based constrained samplers both theoretically and in Wasserstein error for practical MCMC (Demano et al., 7 Aug 2025).
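The flavor of PDMP sampling, deterministic motion punctuated by random events, can be seen in a one-dimensional Zig-Zag process. The sketch below targets a standard normal on the whole real line and does not include the mirror map or domain constraints of the cited work. For this target the event times can be drawn exactly, so there is no step size and no discretization error. The function name, seed, and event count are assumptions.

```python
import math
import random

def zigzag_second_moment(n_events, seed=0):
    """1D Zig-Zag process targeting N(0, 1); returns the time average of x(t)^2."""
    rng = random.Random(seed)
    x, v = 0.0, 1.0
    total_time, integral_x2 = 0.0, 0.0
    for _ in range(n_events):
        a = v * x                           # switching rate along the path is max(0, a + s)
        e = -math.log(1.0 - rng.random())   # Exp(1) variate; 1 - U avoids log(0)
        # Exact event time: solves the integrated rate equal to e.
        tau = -a + math.sqrt(max(a, 0.0) ** 2 + 2.0 * e)
        x_new = x + v * tau                 # deterministic linear motion between events
        # Exact integral of x(t)^2 over the linear segment.
        integral_x2 += (x_new ** 3 - x ** 3) / (3.0 * v)
        total_time += tau
        x, v = x_new, -v                    # the velocity flips at each event
    return integral_x2 / total_time

estimate = zigzag_second_moment(50000)
print(estimate)
```

Expectations are computed as integrals along the continuous path rather than as averages over discrete draws, which is why the second moment here is accumulated segment by segment.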

6. Role and Scope in Causal Inference and Uncertainty Quantification

The design-based methodology underlies all classical causal inference (e.g., Rubin’s SUTVA), survey sampling, and regression standard error estimation when the sample or assignment process is partially or entirely deterministic. In the general potential-outcomes framework with arbitrary interference, design-based inference constructs estimands (expected potential outcomes, AEPO, EED) that remain meaningful and estimable even in the absence of SUTVA. Under the weaker No Unmodeled Revealable Variation Assumption (NURVA), unbiased estimation holds for exposure-specific averages, but substantive external validity is not guaranteed (Aronow et al., 15 May 2025).

In regression, “design-based” standard errors are both smaller and more robust than classical sampling-based (Eicker–Huber–White) errors unless constant treatment effects or correct model specification are assured. The difference depends on the finite population correction and on residual heterogeneity; design-based inference avoids artificial inflation of uncertainty in full-population datasets or deterministic assignments (Abadie et al., 2017).

7. Summary Table of Key Application Domains and Methods

Domain	Deterministic/Design-based Sampling Method	Representative Paper
Finite-population inference	Explicit design; HT estimator, Bessel correction	(O'Neill, 2024, Aronow et al., 15 May 2025)
Signal processing	Gradient energy–adaptive, companded, time-axis design	(Dar et al., 2016)
Multivariate density	PCD (Radon projections), Newton minimization	(Hanebeck, 2019, Frisch et al., 2021)
Compressed sensing	Weil sum, partially deterministic Bernoulli selectors	(Xu, 2010, Plan et al., 6 Apr 2026)
Motion planning	Dispersion-optimal sets (Dispertio)	(Palmieri et al., 2019)
Model predictive control	Low-discrepancy Gaussian samples, permutation-invariant designs	(Walker et al., 7 Jan 2026)
Bayesian computation	Minimum Energy Design (MED)	(Joseph et al., 2017)
Generative modeling	Deterministic PF-ODE flows, geometric scheduling	(Chen et al., 11 Jun 2025, Wu et al., 2024)
Constrained MCMC	Mirror-PDMP deterministic sampling	(Demano et al., 7 Aug 2025)

Deterministic/design-based sampling is thus a unifying methodological theme with deep theoretical roots in statistics, pragmatic advantages in computational science, and proven efficacy in modern machine learning, robotics, and uncertainty quantification.