
Deterministic/Design-Based Sampling Overview

Updated 1 June 2026
  • Deterministic/Design-based Sampling is a strategy that employs fixed, algorithmically specified designs to select sampling units, decoupling uncertainty from the underlying population.
  • It underpins classical finite-population inference with methods like the Horvitz–Thompson estimator and Bessel’s correction, yielding unbiased estimates and exact design-based variance formulas.
  • Modern extensions include adaptive sampling for signal reconstruction, deterministic compressed sensing, and flow-based generative models, which can outperform purely random sampling methods.

Deterministic/Design-based Sampling refers to a broad class of sampling strategies in statistical inference, signal processing, numerical integration, robotics, and uncertainty quantification, in which sampling locations or units are selected according to a fixed, algorithmically specified (“design-based”) scheme rather than by randomization. The deterministic paradigm stands in contrast to model-based or random sampling and is foundational in finite population sampling, optimal experimental design, compressed sensing, motion planning, and modern generative modeling. It explicitly decouples inference from probabilistic assumptions on the data-generating mechanism for the population, with randomness entering only via the sampling design or not at all.

1. Foundations: Population, Design, and the Scope of Randomness

In design-based sampling, the finite population is regarded as fixed and unknown (e.g., $y_1,\ldots,y_N$), and all randomness arises only from the explicit sampling protocol. Classical survey sampling operates within this regime: a random subset $s$ of specified size $n$ is drawn from $U=\{1,\ldots,N\}$ according to a user-defined design $p(s)$, yielding inclusion probabilities $\pi_k = \Pr\{k\in s \mid F\}$ and joint inclusion probabilities $\pi_{ij}$ (with $\pi_{ii}=\pi_i$). The celebrated Horvitz–Thompson estimator $\widehat{t}_{HT} = \sum_{k\in s} y_k/\pi_k$ is unbiased for the total $t=\sum_{k\in U} y_k$ under any design with all $\pi_k>0$, with variance $\operatorname{Var}(\widehat{t}_{HT}\mid F) = \sum_{i\in U}\sum_{j\in U} \frac{\pi_{ij} - \pi_i \pi_j}{\pi_i\pi_j}\, y_i y_j$ (O'Neill, 2024).
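To make the design-based logic concrete, the sketch below fixes a tiny population and defines a hypothetical fixed-size design by listing $p(s)$ for every sample of size 2. It then enumerates all samples to check, in exact rational arithmetic, that the Horvitz–Thompson estimator is unbiased and that its variance matches the closed form above. The population values and design weights are illustrative assumptions.

```python
from fractions import Fraction
from itertools import combinations

# Fixed population values y_1..y_N (illustrative); they are never treated as random.
y = [Fraction(v) for v in (3, 7, 1, 9)]
N = len(y)

# Hypothetical fixed-size design (n = 2): unequal probabilities p(s) for the 6 possible samples.
raw = {s: Fraction(i + 1) for i, s in enumerate(combinations(range(N), 2))}
design = {s: w / sum(raw.values()) for s, w in raw.items()}

# First- and second-order inclusion probabilities implied by p(s); pi2[i][i] equals pi[i].
pi = [sum(p for s, p in design.items() if k in s) for k in range(N)]
pi2 = [[sum(p for s, p in design.items() if i in s and j in s) for j in range(N)]
       for i in range(N)]

def horvitz_thompson(sample):
    """HT estimate of the population total from one realized sample."""
    return sum(y[k] / pi[k] for k in sample)

t = sum(y)
# Exact design moments: average over all possible samples, weighted by p(s).
expectation = sum(p * horvitz_thompson(s) for s, p in design.items())
variance = sum(p * (horvitz_thompson(s) - t) ** 2 for s, p in design.items())
# Closed-form Horvitz-Thompson variance.
formula = sum((pi2[i][j] - pi[i] * pi[j]) / (pi[i] * pi[j]) * y[i] * y[j]
              for i in range(N) for j in range(N))
print(expectation == t, variance == formula)
```

Because every quantity is a `Fraction`, both checks are exact equalities rather than floating-point approximations.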

Design-based inference conditions on the realized empirical distribution $F$ of the finite population. It restricts probability and expectation operators to the randomness induced by the design and never treats the unknown values $y_1,\ldots,y_N$ themselves as random. This conditionality is central: the inclusion probabilities $\pi_k$ and $\pi_{ij}$ are explicit, model-free quantities. Parameters such as the population mean and variance retain purely descriptive status in the design-based context.

A unified framework distinguishes three levels: (i) the infinite superpopulation (model-based), which generates the population values $y_1,\ldots,y_N$; (ii) the finite realized population (design-based, fixed); and (iii) the sample (drawn via the design). All probabilities, expectations, and variances are conditioned on the realized population in design-based calculations (O'Neill, 2024).

2. Population Variance, Bessel’s Correction, and Inference

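A fundamental aspect of deterministic sampling is the treatment of population and sample variances. For a finite population with mean $\bar{y}_U = \frac{1}{N}\sum_{k\in U} y_k$, natural definitions of the population variance include:

  • $\sigma^2 = \frac{1}{N}\sum_{k\in U}(y_k-\bar{y}_U)^2$ (no correction),
  • $S^2 = \frac{1}{N-1}\sum_{k\in U}(y_k-\bar{y}_U)^2$ (Bessel-corrected).

In design-based theory, only the sample selection is random. Under simple random sampling without replacement, the sample variance $s^2 = \frac{1}{n-1}\sum_{k\in s}(y_k-\bar{y}_s)^2$ is therefore unbiased for the Bessel-corrected population variance $S^2$, not for $\sigma^2$. This justifies applying Bessel’s correction at the population level as well as the sample level, in classical simple random sampling and, by extension, in more complex designs. Variance estimation and confidence intervals require the finite-population correction (FPC) $1-n/N$, as in $\operatorname{Var}(\bar{y}_s\mid F) = \left(1-\frac{n}{N}\right)\frac{S^2}{n}$ (O'Neill, 2024).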
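The unbiasedness claim can be checked by brute force. The sketch below assumes simple random sampling without replacement from a small illustrative population. It enumerates every sample and verifies exactly that $E[s^2] = S^2 \neq \sigma^2$ and that $\operatorname{Var}(\bar{y}_s) = (1-n/N)\,S^2/n$.

```python
from fractions import Fraction
from itertools import combinations
from statistics import mean, pvariance, variance

# Fixed, illustrative finite population; only the sample selection is random.
y = [Fraction(v) for v in (2, 4, 4, 5, 7, 9)]
N, n = len(y), 3

sigma2 = pvariance(y)  # divisor N (no correction)
S2 = variance(y)       # divisor N - 1 (Bessel-corrected)

# Simple random sampling without replacement: all C(N, n) samples equally likely.
samples = list(combinations(y, n))
prob = Fraction(1, len(samples))

expected_s2 = sum(prob * variance(s) for s in samples)            # E[s^2] over the design
var_mean = sum(prob * (mean(s) - mean(y)) ** 2 for s in samples)  # Var(sample mean)
fpc_formula = (1 - Fraction(n, N)) * S2 / n                        # FPC-based formula

print(expected_s2 == S2, expected_s2 == sigma2, var_mean == fpc_formula)
```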

Through explicit presentation of all conditioning and adoption of Bessel’s correction both at the sample and population level, the design-based paradigm yields logically self-consistent inferences and clear operational interpretations for all estimators and variance formulas.

3. Modern Extensions: Adaptive and Optimal Deterministic Sampling

Deterministic sampling has become a prominent paradigm not only in survey inference but across computational and signal-processing disciplines:

3a. High-Resolution Adaptive Sampling

High-resolution adaptive sampling of deterministic, continuously differentiable signals $\varphi(t)$ on an interval (normalized here to $t\in[0,1]$) is linked via a fundamental duality to optimal high-rate quantization. For piecewise-constant reconstructions from $K$ samples, the mean squared error (MSE) is minimized by the sample density $\lambda^*(t) \propto |\varphi'(t)|^{2/3} = \left(\varphi'(t)^2\right)^{1/3}$, i.e., proportional to the cube root of the local gradient energy. The optimal total MSE then scales as $\frac{1}{K^2}\left(\int_0^1 |\varphi'(t)|^{2/3}\,dt\right)^3$ (Dar et al., 2016). The practical algorithm places segmentation points $0=t_0<t_1<\cdots<t_K=1$ so that each cell carries the same “mass” $\int_{t_{i-1}}^{t_i}|\varphi'(t)|^{2/3}\,dt$. Reconstruction then uses these points (plus extrema) to achieve the optimal error. For the same sample budget, this design-based approach yields order-of-magnitude reductions in error or transmitted bit-rate compared to uniform or tree-based schemes.
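Below is a minimal numerical sketch of the companding idea with $K$ cells. It rests on three assumptions:

- The test signal $\varphi(t)=\tanh(20(t-1/2))$ is hypothetical.
- Cell edges are chosen by inverting the cumulative integral of $|\varphi'|^{2/3}$.
- Each cell is reconstructed by the signal value at its midpoint. This is a simplification; the reconstruction described above also uses extrema.

With midpoint reconstruction, a cell of width $\Delta$ contributes roughly $\varphi'^2\Delta^2/12$ mean squared error. The high-resolution prediction is therefore about $\frac{1}{12K^2}\left(\int_0^1|\varphi'|^{2/3}dt\right)^3$, which the code prints for comparison.

```python
import numpy as np

def phi(t):
    """Illustrative smooth signal with a steep transition at t = 0.5 (an assumption)."""
    return np.tanh(20.0 * (t - 0.5))

def dphi(t):
    """Derivative of phi."""
    return 20.0 / np.cosh(20.0 * (t - 0.5)) ** 2

def adaptive_edges(dphi, n_cells, grid_size=200_001):
    """Cell edges with equal mass of |phi'(t)|^(2/3), i.e. density ~ cube root of phi'^2."""
    t = np.linspace(0.0, 1.0, grid_size)
    density = np.abs(dphi(t)) ** (2.0 / 3.0)
    # Cumulative trapezoidal integral of the sampling density.
    mass = np.concatenate(([0.0], np.cumsum(0.5 * (density[1:] + density[:-1]) * np.diff(t))))
    targets = np.linspace(0.0, mass[-1], n_cells + 1)
    return np.interp(targets, mass, t)  # invert the cumulative mass ("companding")

def piecewise_constant_mse(phi, edges, eval_size=400_001):
    """MSE on [0, 1] when each cell is reconstructed by the sample at its midpoint."""
    t = np.linspace(0.0, 1.0, eval_size)
    mids = 0.5 * (edges[:-1] + edges[1:])
    values = phi(mids)
    cell = np.clip(np.searchsorted(edges, t, side="right") - 1, 0, len(mids) - 1)
    return float(np.mean((phi(t) - values[cell]) ** 2))

def predicted_mse(dphi, n_cells, grid_size=200_001):
    """High-resolution prediction (1 / (12 K^2)) * (integral of |phi'|^(2/3))^3."""
    t = np.linspace(0.0, 1.0, grid_size)
    integral = float(np.mean(np.abs(dphi(t)) ** (2.0 / 3.0)))
    return integral ** 3 / (12.0 * n_cells ** 2)

n_cells = 64
edges = adaptive_edges(dphi, n_cells)
uniform = piecewise_constant_mse(phi, np.linspace(0.0, 1.0, n_cells + 1))
adaptive = piecewise_constant_mse(phi, edges)
print(f"uniform {uniform:.2e}  adaptive {adaptive:.2e}  predicted {predicted_mse(dphi, n_cells):.2e}")
```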

3b. Deterministic Sampling in Compressed Sensing

Classical compressed sensing relies on random subsampling for guarantees such as the Restricted Isometry Property (RIP), but deterministic sampling schemes can match or nearly match these guarantees. For sparse trigonometric polynomials, explicit deterministic sampling via Weil's exponential sums produces a sensing matrix with controlled coherence and uniform exact-recovery guarantees. On the order of $(s\log N)^2$ samples suffice for all $s$-sparse vectors in $N$ dimensions, which is nearly optimal up to logarithmic factors (Xu, 2010). Empirical results show recovery rates comparable to those of random sampling.
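The coherence mechanism can be illustrated in a few lines. The sketch assumes sample points $x_j = (j/p,\; j^2/p) \bmod 1$ for a prime $p$ and bivariate frequencies $0\le k_1,k_2<30$. This is a Weil-sum-style design in the spirit of the construction above, not a reproduction of Xu's exact scheme.

Column inner products are quadratic exponential sums, so Weil's bound gives coherence $\mu \le 1/\sqrt{p}$. The classical coherence condition $s < (1+1/\mu)/2$ then guarantees that orthogonal matching pursuit (OMP) recovers every $s$-sparse coefficient vector.

```python
import numpy as np

def weil_sampling_matrix(p, max_freq):
    """Rows: assumed sample points x_j = (j/p, j^2/p) mod 1, j = 0..p-1.
    Columns: bivariate frequencies (k1, k2) with 0 <= k1, k2 < max_freq < p."""
    idx = np.arange(p)
    k1, k2 = np.meshgrid(np.arange(max_freq), np.arange(max_freq), indexing="ij")
    k1, k2 = k1.ravel(), k2.ravel()
    # Exact integer phases mod p before exponentiating.
    phase = (np.outer(idx, k1) + np.outer(idx * idx, k2)) % p
    return np.exp(2j * np.pi * phase / p) / np.sqrt(p)  # unit-norm columns

def coherence(A):
    """Largest absolute inner product between distinct unit-norm columns."""
    gram = np.abs(A.conj().T @ A)
    np.fill_diagonal(gram, 0.0)
    return float(gram.max())

def omp(A, y, s):
    """Orthogonal matching pursuit with a fixed number s of iterations."""
    residual, support = y.copy(), []
    for _ in range(s):
        support.append(int(np.argmax(np.abs(A.conj().T @ residual))))
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x = np.zeros(A.shape[1], dtype=complex)
    x[support] = coef
    return x

p, max_freq, s = 101, 30, 5
A = weil_sampling_matrix(p, max_freq)  # 101 samples, 900 unknown coefficients
mu = coherence(A)

rng = np.random.default_rng(0)
x = np.zeros(A.shape[1], dtype=complex)
x[rng.choice(A.shape[1], size=s, replace=False)] = rng.standard_normal(s) + 1j * rng.standard_normal(s)
x_hat = omp(A, A @ x, s)
print(f"coherence {mu:.4f}, 1/sqrt(p) {1 / np.sqrt(p):.4f}, max error {np.abs(x_hat - x).max():.1e}")
```

The guarantee here is uniform: it holds for every 5-sparse vector, so the random draw above is only a convenient test case.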

A hybrid, partially deterministic compressed sensing scheme arises in settings where some measurements (e.g., low-frequency FFT coefficients) must be included deterministically. Recent work (Plan et al., 6 Apr 2026) formalizes this as optimized Bernoulli selection. It computes inclusion probabilities $\pi_i$ that minimize worst-case noise amplification subject to a sampling budget. High-coherence rows are included deterministically ($\pi_i = 1$); the rest are randomized. Closed-form formulas provide sample-complexity and denoising guarantees superior to those of classical schemes. In numerical experiments, the approach outperforms both purely deterministic and fully randomized sampling across a range of sparse and generative priors.
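The optimization of Plan et al. is not reproduced here. As a hedged stand-in, the sketch below uses a simpler, common heuristic. Inclusion probabilities are proportional to hypothetical per-row importance weights (for example, local coherences) and capped at 1. The excess budget is then redistributed so that $\sum_i \pi_i$ equals the sampling budget. Rows that saturate at $\pi_i=1$ are exactly the deterministically included ones; the rest are drawn by independent Bernoulli trials.

```python
import numpy as np

def capped_inclusion_probs(weights, budget):
    """Return pi_i = min(1, c * w_i) with c chosen so that sum(pi) == budget.

    The weights are hypothetical per-row importance scores (e.g. local coherences);
    rows that saturate at pi_i = 1 are included deterministically."""
    w = np.asarray(weights, dtype=float)
    if np.any(w <= 0) or not 0 < budget <= w.size:
        raise ValueError("need positive weights and 0 < budget <= number of rows")
    fixed = np.zeros(w.size, dtype=bool)
    while True:
        free = ~fixed
        pi = np.ones_like(w)
        pi[free] = (budget - fixed.sum()) * w[free] / w[free].sum()
        over = free & (pi > 1.0)
        if not over.any():
            return pi
        fixed |= over  # saturate these rows and redistribute the remaining budget

def bernoulli_select(pi, rng):
    """Independent Bernoulli(pi_i) inclusion; rows with pi_i = 1 are always kept."""
    return np.flatnonzero(rng.random(pi.size) < pi)

# Hypothetical example: importance decaying with frequency index, budget of 16 of 64 rows.
pi = capped_inclusion_probs(1.0 / (1.0 + np.arange(64)), budget=16)
rows = bernoulli_select(pi, np.random.default_rng(0))
print(int((pi == 1.0).sum()), "deterministic rows;", len(rows), "rows selected")
```

The expected number of selected rows equals the budget; only the rows with $\pi_i<1$ contribute randomness.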

4. Algorithmic Determinism in Multivariate and Geometric Sampling

Deterministic sampling methods based on minimizing discrepancy and exploiting geometric structure have proliferated in multivariate density approximation, motion planning, and generative modeling.

4a. Projected Cumulative Distributions (PCD) and Radon Projections

For general multivariate densities, deterministic sample placement that minimizes a projection-based discrepancy yields strong approximation results. The central idea is to minimize the average Cramér–von Mises distance between the one-dimensional CDFs of all projections (the Radon transform) of the density $f(x)$ and of a Dirac mixture $\tilde f(x)=\sum_{i=1}^{L} w_i\,\delta(x-\hat x_i)$ (Hanebeck, 2019). The optimization is efficient (sorting and Newton increments in each projection) and converges with rate …. Empirically, it yields “blue-noise”-like coverage of high-density regions, outperforming i.i.d. sampling for a given number of samples $L$.
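In one dimension, consider the Cramér–von Mises distance $\int (F-\tilde F)^2\,dx$ between $F$ and an equally weighted Dirac mixture with $L$ components. Its derivative with respect to the $i$-th sorted point is $(F(x_i)-\frac{i-1}{L})^2-(F(x_i)-\frac{i}{L})^2$. This vanishes when $x_i$ sits at the $(2i-1)/(2L)$ quantile of $F$.

The sketch below builds on that fact under simplifying assumptions that are not part of the cited method:

- The target is represented by a large reference sample rather than analytically.
- Each iteration uses a few random projection directions.
- The Newton step is replaced by averaging the per-projection quantile corrections.

It illustrates the projection-and-sorting structure of PCD; it is not Hanebeck's algorithm.

```python
import numpy as np

def pcd_sketch(reference, n_points, n_iter=60, n_dirs=8, seed=0):
    """Place n_points equally weighted Diracs by matching 1D projected quantiles.

    reference: (M, d) array standing in for the target density (an assumption;
    the PCD method itself works with the density's projections directly)."""
    rng = np.random.default_rng(seed)
    d = reference.shape[1]
    x = reference[rng.choice(len(reference), size=n_points, replace=False)].copy()
    levels = (2.0 * np.arange(1, n_points + 1) - 1.0) / (2.0 * n_points)
    for _ in range(n_iter):
        dirs = rng.standard_normal((n_dirs, d))
        dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)  # random unit directions
        step = np.zeros_like(x)
        for theta in dirs:
            proj = x @ theta
            target = np.quantile(reference @ theta, levels)  # 1D CvM-optimal positions
            order = np.argsort(proj)                         # sorting assigns ranks
            delta = np.empty(n_points)
            delta[order] = target - proj[order]              # k-th smallest -> k-th quantile
            step += np.outer(delta, theta)
        x += step / n_dirs                                   # average the 1D corrections
    return x

rng = np.random.default_rng(42)
cov = np.array([[1.0, 0.6], [0.6, 1.0]])
reference = rng.standard_normal((10_000, 2)) @ np.linalg.cholesky(cov).T
points = pcd_sketch(reference, n_points=50)
print(points.mean(axis=0), np.cov(points.T, bias=True))
```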

For circular densities, projected cumulative distributions enable deterministic Dirac set selection on the circle $\mathbb{S}^1$. The resulting sets outperform both minimum-point unscented transforms and Monte Carlo in estimation and filtering, especially for multimodal or concentrated densities (Frisch et al., 2021).

4b. Robotics and Motion Planning: Dispersion-Optimized Sets

In motion planning under driftless nonlinear dynamics, deterministic sampling sets minimizing the dispersion in reachable-set metrics (not just Euclidean distance) guarantee coverage and asymptotic optimality. The Dispertio algorithm finds points that minimize the largest “uncovered” reachable ball, and when used in PRM* planners, achieves deterministic completeness and optimality on both flat and sub-Riemannian manifolds (Palmieri et al., 2019). Empirical benchmarks show superior success rates and convergence speed compared to uniform, Halton, and greedy tree-based samplers.
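Dispertio's reachable-set metric is beyond a short example. The sketch below therefore swaps in the Euclidean metric on the unit square to show what dispersion measures: the radius of the largest ball, evaluated here on a dense probe grid, that contains no sample. The deterministic design is greedy farthest-point traversal, a classic 2-approximation for this minimum-dispersion (k-center) problem on a finite candidate set. It is not the Dispertio algorithm.

```python
import numpy as np

def dispersion(points, probe):
    """Largest Euclidean distance from a probe location to its nearest point."""
    d = np.linalg.norm(probe[:, None, :] - points[None, :, :], axis=-1)
    return float(d.min(axis=1).max())

def farthest_point_set(candidates, n):
    """Greedy farthest-point traversal: repeatedly add the candidate farthest from the set."""
    chosen = [0]
    nearest = np.linalg.norm(candidates - candidates[0], axis=1)
    for _ in range(n - 1):
        idx = int(np.argmax(nearest))
        chosen.append(idx)
        nearest = np.minimum(nearest, np.linalg.norm(candidates - candidates[idx], axis=1))
    return candidates[chosen]

g = np.linspace(0.0, 1.0, 101)
grid = np.stack(np.meshgrid(g, g), axis=-1).reshape(-1, 2)  # candidates and probes
greedy = farthest_point_set(grid, 64)
iid = np.random.default_rng(0).random((64, 2))
print(f"greedy {dispersion(greedy, grid):.3f}  iid {dispersion(iid, grid):.3f}")
```

By construction, every pair of greedy points lies at least the final dispersion apart, which is what makes the set a well-spread cover.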

4c. Model Predictive Control and Signal Recovery

In control-oriented sampling (e.g., model predictive control), deterministic low-discrepancy samples over the design space outperform random-sampled controls in terms of smoothness of resulting control laws, convergence, and computational efficiency (Walker et al., 7 Jan 2026).
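One common way to obtain deterministic low-discrepancy Gaussian samples is to push Halton points through the inverse standard-normal CDF, coordinate by coordinate. This construction is assumed here for illustration; the specific design of Walker et al. may differ. The sketch uses only the Python standard library; the resulting points could, for example, serve as control perturbations.

```python
from statistics import NormalDist

PRIMES = (2, 3, 5, 7, 11, 13)  # one Halton base per dimension

def radical_inverse(index, base):
    """Van der Corput radical inverse: mirror the base-b digits of index about the point."""
    result, scale = 0.0, 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        result += digit * scale
        scale /= base
    return result

def gaussian_halton(n, dim):
    """n low-discrepancy N(0, I_dim) samples via Halton points and the inverse normal CDF."""
    if dim > len(PRIMES):
        raise ValueError("extend PRIMES for higher dimensions")
    inv = NormalDist().inv_cdf
    # Start at index 1 so that no coordinate equals 0, where inv_cdf is undefined.
    return [[inv(radical_inverse(i, PRIMES[k])) for k in range(dim)] for i in range(1, n + 1)]

perturbations = gaussian_halton(256, 2)  # e.g. perturbations for a 2-input controller
means = [sum(col) / len(col) for col in zip(*perturbations)]
print(means)
```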

5. Deterministic Approaches in Modern Probabilistic Inference

Recent advances in generative modeling, Bayesian computation, and constrained sampling exploit deterministic sampling not as a mere implementation detail but as a method with superior properties.

5a. Deterministic Flows and Score-based Sampling

Deterministic sampling via Wasserstein gradient flows or ODEs (e.g., “probability-flow” ODEs) exhibits superior geometric regularity and convergence. In high-dimensional diffusion models, every deterministic ODE trajectory lies in a universal low-dimensional "boomerang-shaped" subspace, regardless of architecture. This geometric property can be exploited with dynamic-programming-based time scheduling to obtain much better sample quality at low numbers of function evaluations (NFE) (Chen et al., 11 Jun 2025). Deterministic flows in these settings guarantee a monotone decrease of divergence and more stable sample quality (Ilin et al., 25 Apr 2025).
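A toy example shows what a deterministic probability-flow sampler does. It assumes a one-dimensional data distribution $N(\mu, s^2)$ and the variance-exploding noising process $x_t = x_0 + t\,z$. Under these assumptions the score of every marginal is known in closed form, and the probability-flow ODE is $dx/dt = -t\,\nabla_x \log p_t(x)$.

Integrating this ODE from $t=T$ down to $0$ with Euler steps maps Gaussian noise deterministically onto the data distribution. The exact solution $x_0 = \mu + (x_T-\mu)\,s/\sqrt{s^2+T^2}$ is available for checking. This illustrates deterministic sampling only; it does not implement the trajectory-geometry scheduling of Chen et al.

```python
import numpy as np

MU, S = 2.0, 0.5  # hypothetical 1D data distribution N(MU, S^2)

def score(x, t):
    """Exact score of the noised marginal N(MU, S^2 + t^2) for x_t = x_0 + t * z."""
    return -(x - MU) / (S ** 2 + t ** 2)

def pf_ode_sample(x_T, T, n_steps=5000):
    """Euler steps on the probability-flow ODE dx/dt = -t * score(x, t), from t = T to t = 0."""
    ts = np.linspace(T, 0.0, n_steps + 1)
    x = np.asarray(x_T, dtype=float)
    for t, t_next in zip(ts[:-1], ts[1:]):
        x = x + (t_next - t) * (-t * score(x, t))
    return x

T = 10.0
rng = np.random.default_rng(0)
x_T = MU + np.sqrt(S ** 2 + T ** 2) * rng.standard_normal(10_000)  # exact noised marginal at t = T
x_0 = pf_ode_sample(x_T, T)                                        # deterministic given x_T
exact = MU + (x_T - MU) * S / np.sqrt(S ** 2 + T ** 2)             # closed-form ODE solution
print(np.abs(x_0 - exact).max(), x_0.mean(), x_0.std())
```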

In text-to-3D generation, deterministic ODE sampling (as in Consistent3D) replaces stochastic score-distillation sampling, yielding sharper textures, robust geometry, and improved CLIP R-precision (0.348 versus 0.310–0.336 for baselines) (Wu et al., 2024).

5b. Space-Filling and Minimum Energy Designs

For Bayesian computation where every likelihood evaluation is expensive, Minimum Energy Design (MED) offers a deterministic, weighted-space-filling approach specifically adapted to the shape of the posterior (or likelihood). The MED maximizes the minimum "weighted" pairwise distance between sample points, with the weights derived from the unnormalized posterior, yielding better coverage of high-density regions than Quasi-Monte Carlo (QMC) or MCMC for the same computational budget (Joseph et al., 2017). Enhancements such as Mahalanobis or generalized distances further extend applicability to high-dimensional, correlated regimes.
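The sketch below runs a greedy MED on a finite candidate grid under several assumptions:

- The target is a hypothetical unnormalized Gaussian log-posterior.
- In $p$ dimensions, each point carries the charge $q(x) = f(x)^{-1/(2p)}$. To our understanding, this is the charge form used in the MED literature.
- Points are added one at a time by minimizing $\max_i q(x)\,q(x_i)/d(x,x_i)$, the limiting form of the energy criterion.

The published procedure is more elaborate. This sketch only shows how posterior-weighted charges push design points toward high-density regions.

```python
import numpy as np

def greedy_med(candidates, log_f, n_points):
    """Greedy minimum-energy design on a finite candidate set (a simplification).

    Charge q(x) = f(x)^(-1/(2p)); each step adds the candidate minimising
    max_i q(x) q(x_i) / d(x, x_i), tracked on the log scale for stability."""
    p = candidates.shape[1]
    log_q = -log_f / (2.0 * p)
    chosen = [int(np.argmax(log_f))]           # start at the highest-density candidate
    worst = np.full(len(candidates), -np.inf)  # running max over i of log[q q_i / d]
    for _ in range(n_points - 1):
        last = chosen[-1]
        dist = np.linalg.norm(candidates - candidates[last], axis=1)
        with np.errstate(divide="ignore"):
            term = log_q + log_q[last] - np.log(dist)  # +inf at the point itself
        worst = np.maximum(worst, term)
        worst[chosen] = np.inf                 # never pick an existing point again
        chosen.append(int(np.argmin(worst)))
    return candidates[chosen]

g = np.linspace(-3.0, 3.0, 61)
cands = np.stack(np.meshgrid(g, g), axis=-1).reshape(-1, 2)
center = np.array([1.0, -0.5])
log_f = -np.sum((cands - center) ** 2, axis=1)  # hypothetical unnormalised log-posterior
design = greedy_med(cands, log_f, 30)
print(design[:5])
```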

5c. Piecewise Deterministic Markov Process (PDMP) Sampling

PDMP-based samplers, whose trajectories evolve deterministically between random jump events, provide efficient, score-based samplers for constrained domains, especially when combined with mirror maps that handle the domain constraints. These samplers avoid discretization error and are therefore unbiased, and they maintain ergodicity and mixing in constrained or non-Euclidean spaces. They outperform SDE-based constrained samplers both theoretically and in Wasserstein error for practical MCMC (Demano et al., 7 Aug 2025).
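For intuition about why PDMP samplers need no time discretization, the sketch below implements the one-dimensional Zig-Zag process for a standard normal target. This is a textbook PDMP, not the mirror-map sampler of Demano et al.

The position moves deterministically as $x(t)=x+vt$ with $v=\pm 1$. The velocity flips at events of a Poisson process with rate $\max(0, v\,x(t))$, whose event times can be drawn exactly by inverting the integrated rate. Expectations are estimated by integrating along the continuous path.

```python
import numpy as np

def zigzag_gaussian(total_time, seed=0):
    """1D Zig-Zag sampler for U(x) = x^2 / 2; returns time averages of x and x^2."""
    rng = np.random.default_rng(seed)
    x, v, clock = 0.0, 1.0, 0.0
    int_x = int_x2 = 0.0
    while clock < total_time:
        a = v * x  # the event rate along the path is max(0, a + s)
        # Exact event time: solve int_0^tau max(0, a + s) ds = E with E ~ Exp(1).
        tau = -a + np.sqrt(max(a, 0.0) ** 2 + 2.0 * rng.exponential())
        tau = min(tau, total_time - clock)
        x_new = x + v * tau                         # deterministic motion between events
        int_x += (x_new ** 2 - x ** 2) / (2.0 * v)  # integral of x(t) over the segment
        int_x2 += (x_new ** 3 - x ** 3) / (3.0 * v) # integral of x(t)^2 over the segment
        x, v, clock = x_new, -v, clock + tau        # flip the velocity at the event
    return int_x / total_time, int_x2 / total_time

m1, m2 = zigzag_gaussian(20_000.0)
print(m1, m2)  # should be close to E[x] = 0 and E[x^2] = 1
```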

6. Role and Scope in Causal Inference and Uncertainty Quantification

The design-based methodology underlies much of classical causal inference (e.g., randomization inference in Rubin’s potential-outcomes framework under SUTVA), survey sampling, and regression standard-error estimation when the sample or assignment process is partially or entirely deterministic. In the general potential-outcomes framework with arbitrary interference, design-based inference constructs estimands (expected potential outcomes, AEPO, EED) that remain meaningful and estimable even in the absence of SUTVA. Under the weaker No Unmodeled Revealable Variation Assumption (NURVA), unbiased estimation holds for exposure-specific averages, but substantive external validity is not guaranteed (Aronow et al., 15 May 2025).
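Under SUTVA and complete randomization, the design-based argument is easy to verify by enumeration. The potential outcomes are fixed and only the treatment assignment is random, so the difference in means is exactly unbiased for the finite-population average treatment effect. The potential outcomes below are illustrative assumptions; interference and the exposure-based estimands of Aronow et al. are not modeled.

```python
from fractions import Fraction
from itertools import combinations

# Fixed potential outcomes y_i(1), y_i(0) for N = 6 units (illustrative values; SUTVA assumed).
y1 = [Fraction(v) for v in (5, 3, 8, 6, 2, 7)]
y0 = [Fraction(v) for v in (4, 3, 5, 6, 1, 3)]
N, n_treated = len(y1), 3

# Finite-population average treatment effect: a fixed, descriptive quantity.
ate = sum(a - b for a, b in zip(y1, y0)) / N

def diff_in_means(treated):
    """Difference in observed means for one realized assignment."""
    control = [i for i in range(N) if i not in treated]
    return (sum(y1[i] for i in treated) / len(treated)
            - sum(y0[i] for i in control) / len(control))

# Complete randomization: each of the C(N, n_treated) treated sets is equally likely.
assignments = list(combinations(range(N), n_treated))
expected = sum(diff_in_means(t) for t in assignments) / len(assignments)
print(expected == ate)
```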

In regression, design-based standard errors are generally smaller than classical sampling-based (Eicker–Huber–White) standard errors. The latter are conservative unless treatment effects are constant or the model is correctly specified. The size of the difference depends on the finite-population correction and on residual heterogeneity. Design-based inference thus avoids artificially inflating uncertainty when the dataset covers the full population or assignment is deterministic (Abadie et al., 2017).
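The conservativeness of sampling-based variance estimates already appears in the simplest randomized experiment. This is Neyman's classical setting, not the regression setting of Abadie et al. The sketch below uses illustrative potential outcomes with $n_1 = 3$ of $N = 6$ units treated. It does three things:

- computes the exact randomization variance of the difference in means;
- checks that variance against Neyman's formula $S_1^2/n_1 + S_0^2/n_0 - S_\tau^2/N$;
- compares it with the expectation of the usual estimator $s_1^2/n_1 + s_0^2/n_0$.

The usual estimator omits the $-S_\tau^2/N$ term, so it is exact only when treatment effects are constant.

```python
from fractions import Fraction
from itertools import combinations
from statistics import variance

def design_variance_check(y1, y0, n1):
    """Exact randomization variance vs. Neyman's formula vs. E[naive variance estimate]."""
    N = len(y1)
    n0 = N - n1
    tau = sum(a - b for a, b in zip(y1, y0)) / N
    draws, naive = [], []
    for treated in combinations(range(N), n1):  # complete randomization
        control = [i for i in range(N) if i not in treated]
        t_vals = [y1[i] for i in treated]
        c_vals = [y0[i] for i in control]
        draws.append(sum(t_vals) / n1 - sum(c_vals) / n0)
        naive.append(variance(t_vals) / n1 + variance(c_vals) / n0)  # usual estimate
    k = len(draws)
    true_var = sum((d - tau) ** 2 for d in draws) / k
    effects = [a - b for a, b in zip(y1, y0)]
    neyman = variance(y1) / n1 + variance(y0) / n0 - variance(effects) / N
    return true_var, neyman, sum(naive) / k

# Illustrative potential outcomes with heterogeneous effects, and a constant-effect variant.
y1 = [Fraction(v) for v in (5, 3, 8, 6, 2, 7)]
y0 = [Fraction(v) for v in (4, 3, 5, 6, 1, 3)]
tv, ney, naive_mean = design_variance_check(y1, y0, 3)
tv_c, ney_c, naive_c = design_variance_check([v + 2 for v in y0], y0, 3)
print(tv == ney, naive_mean > tv, tv_c == naive_c)
```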

7. Summary Table of Key Application Domains and Methods

| Domain | Deterministic/Design-based Sampling Method | Representative Paper |
| --- | --- | --- |
| Finite-population inference | Explicit design; HT estimator, Bessel’s correction | (O'Neill, 2024; Aronow et al., 15 May 2025) |
| Signal processing | Gradient-energy-adaptive, companded, time-axis design | (Dar et al., 2016) |
| Multivariate density | PCD (Radon projections), Newton minimization | (Hanebeck, 2019; Frisch et al., 2021) |
| Compressed sensing | Weil sums, partially deterministic Bernoulli selectors | (Xu, 2010; Plan et al., 6 Apr 2026) |
| Motion planning | Dispersion-optimal sets (Dispertio) | (Palmieri et al., 2019) |
| Model predictive control | Low-discrepancy Gaussian samples, permutation-invariant designs | (Walker et al., 7 Jan 2026) |
| Bayesian computation | Minimum Energy Design (MED) | (Joseph et al., 2017) |
| Generative modeling | Deterministic PF-ODE flows, geometric scheduling | (Chen et al., 11 Jun 2025; Wu et al., 2024) |
| Constrained MCMC | Mirror-PDMP deterministic sampling | (Demano et al., 7 Aug 2025) |

Deterministic/design-based sampling is thus a unifying methodological theme with deep theoretical roots in statistics, pragmatic advantages in computational science, and demonstrated effectiveness in modern machine learning, robotics, and uncertainty quantification.
