
Latin Hypercube Sampling

Updated 5 February 2026
  • Latin Hypercube Sampling is a stratified sampling method that partitions each variable into equal-probability intervals to ensure complete coverage and reduce variance.
  • The algorithm uses random permutations and uniform jitter in each dimension, capturing the full range of each univariate marginal and supporting efficient numerical integration.
  • Extensions such as copula-based, constrained, and adaptive LHS methods enhance its applicability to dependent inputs, non-rectangular domains, and high-dimensional simulation challenges.

Latin Hypercube Sampling (LHS) is a stratified sampling design widely utilized for uncertainty quantification, numerical integration, simulation-based optimization, and experiment design in high-dimensional spaces. LHS strategically ensures that univariate marginals are perfectly stratified—each variable's range is partitioned into equal-probability intervals, so every interval is represented in the sample—while randomizing the joint multivariate allocation. This yields improved univariate coverage, variance reduction, and space-filling properties, especially valuable when computational budgets constrain the number of simulation runs.

1. Formal Definition and Algorithmic Construction

Consider $N$ samples $x_1, \ldots, x_N$ in $[0,1]^d$. Standard LHS is constructed as follows:

  1. Marginal stratification: For each dimension $j = 1, \ldots, d$, subdivide $[0,1]$ into $N$ intervals $I_{j,k} = [(k-1)/N,\, k/N)$ for $k = 1, \ldots, N$.
  2. Random permutation: For each $j$, draw a random permutation $\pi_j$ of $\{1, \ldots, N\}$.
  3. Uniform jitter: For each $i = 1, \ldots, N$ and each $j = 1, \ldots, d$, draw $U_{i,j} \sim \mathrm{Uniform}(0,1)$.
  4. Sample placement: Set

$$x_i^j = \frac{\pi_j(i) - 1 + U_{i,j}}{N} \qquad \text{or equivalently} \qquad x_i^j = \frac{\pi_j(i) - U_{i,j}}{N}.$$

This ensures that, for each coordinate $j$, the set $\{x_i^j\}_{i=1}^N$ contains exactly one point in each stratum.

The defining property is that in every one-dimensional projection, all $N$ strata are hit exactly once, but the joint design is not stratified beyond the imposed marginal constraints; high-dimensional uniformity is attained only in expectation. Standard implementations exist in R (lhs), Python (pyDOE, expandLHS), and MATLAB (lhsdesign) (Kucherenko et al., 2015, Boschini et al., 29 Aug 2025).
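The four construction steps above can be sketched in a few lines of NumPy (an illustrative implementation, not one of the packaged implementations):

```python
import numpy as np

def latin_hypercube(n, d, rng=None):
    """Sketch of standard LHS on [0,1]^d: one point per stratum in each dimension."""
    rng = np.random.default_rng(rng)
    # Step 2: an independent random permutation of {0, ..., n-1} per dimension.
    perms = np.array([rng.permutation(n) for _ in range(d)]).T  # shape (n, d)
    # Step 3: uniform jitter within each stratum.
    jitter = rng.uniform(size=(n, d))
    # Step 4: place sample i in stratum perms[i, j] of dimension j.
    return (perms + jitter) / n

x = latin_hypercube(5, 2, rng=0)
# Each column hits each of the 5 strata [k/5, (k+1)/5) exactly once:
strata = np.floor(x * 5).astype(int)
assert all(sorted(col) == [0, 1, 2, 3, 4] for col in strata.T)
```

Here the strata are indexed from 0 rather than 1, matching NumPy conventions; the placement formula is otherwise the first variant above.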

2. Variance Reduction and Central Limit Results

Given $f: [0,1]^d \to \mathbb{R}$ and the integration task $I = \int_{[0,1]^d} f(x)\,dx$, the LHS estimator is

$$\widehat I_{\rm LHS} = \frac{1}{N} \sum_{i=1}^N f(x_i).$$

Variance analysis via ANOVA/functional decomposition of $f$ yields (Kucherenko et al., 2015, Hakimi, 10 Feb 2025):

$$\operatorname{Var}\!\left[\widehat I_{\rm LHS}\right] = \frac{1}{N} \sum_{j=1}^d \operatorname{Var}_{x^j}\!\left(\mathbb{E}_{x^{-j}}[f(x) \mid x^j]\right) - \frac{1}{N^2}\sum_{j \neq k}\operatorname{Cov}\!\left(\mathbb{E}[f \mid x^j],\, \mathbb{E}[f \mid x^k]\right)$$

and, under sufficient smoothness, the decay rate is $O(N^{-2})$ (main-effect dominated), whereas plain Monte Carlo is $O(N^{-1})$. A central limit theorem for LHS guarantees, more generally for $Z$-estimators, that

$$\sqrt{N}\,\bigl(\widehat\theta_N^{\rm LHS} - \theta_0\bigr) \rightarrow_d N\!\left(0,\, A^{-1} B_{\rm LHS} A^{-T}\right),$$

where $B_{\rm LHS} \leq B_{\mathrm{i.i.d.}}$ and $A$ is the Jacobian of the estimating function. The variance reduction is directly traceable to perfect marginal stratification removing first-order (main-effect) variance; compared to the i.i.d. estimator, the LHS estimator always has equal or lower asymptotic variance when main effects are substantial (Hakimi, 10 Feb 2025).
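The removal of main-effect variance is easy to observe numerically. The sketch below is illustrative; the additive integrand $f(x) = \sum_j x_j$ is chosen here precisely because its variance is entirely main-effect, so LHS should reduce the estimator variance dramatically relative to plain Monte Carlo:

```python
import numpy as np

rng = np.random.default_rng(42)
n, d, reps = 64, 3, 500

def f(x):
    return x.sum(axis=1)  # additive integrand; true value of the integral is d/2 = 1.5

def lhs(n, d):
    perms = np.array([rng.permutation(n) for _ in range(d)]).T
    return (perms + rng.uniform(size=(n, d))) / n

# Repeat each estimator many times and compare empirical variances.
mc_est  = [f(rng.uniform(size=(n, d))).mean() for _ in range(reps)]
lhs_est = [f(lhs(n, d)).mean() for _ in range(reps)]

var_mc, var_lhs = np.var(mc_est), np.var(lhs_est)
assert var_lhs < var_mc  # LHS variance is far smaller for an additive f
```

For purely additive integrands the gap is large (beyond even the $O(N^{-2})$ rate quoted above); for integrands dominated by interactions the two variances would be comparable.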

3. Comparative Performance and Function Typology

Empirical convergence analysis across various analytic test functions yields key performance regimes (Kucherenko et al., 2015):

| Function Type | MC Exponent $\alpha$ | QMC Exponent $\alpha$ | LHS Exponent $\alpha$ |
|---|---|---|---|
| Type A (low $d_T$) | $\approx 0.5$ | up to 0.94 | $\approx 0.5$ |
| Type B (low-order interactions) | $\approx 0.5$ | up to 0.96 | 0.69–0.75 |
| Type C (high-order interactions) | $\approx 0.5$ | 0.64–0.68 | $\approx 0.5$ |
  • Type B (superposition dimension $d_S \ll d$): LHS improves on MC and may outperform QMC for small $N$ due to alignment with low-order effects.
  • Type A/C: LHS and MC perform similarly unless $N$ is very small; QMC is strictly superior asymptotically.
  • Guideline: for functions with additive or low-order dominant structure and small-to-moderate sample sizes, LHS is advantageous; QMC is generally preferable for unknown typology and large $N$ (Kucherenko et al., 2015).

4. Generalizations and Design Extensions

4.1 Partially Stratified and Latinized Stratified Sampling

Partially Stratified Sampling (PSS) interpolates between full univariate LHS (variance reduction for main effects only) and full stratified sampling (SS, variance reduction for interactions) (Shields et al., 2015).

  • PSS Construction: Partition the space into subspaces (blocks), perform SS in each, and combine via random pairing.
  • Latinized Stratified Sampling (LSS): Assigns LHS structure within each block, leveraging orthogonal arrays for simultaneous marginal and low-order interaction stratification:

$$\operatorname{Var}[T_O] = \operatorname{Var}[T_R] - \frac{1}{n}\sum_{i}\operatorname{Var}[h_i(X_i)] - \frac{1}{n}\sum_{i<j}\operatorname{Var}[h_{ij}(X_i, X_j)] + O(n^{-1})$$

  • Practical guidance: Use PSS or LSS when interactions are expected; group variables with strong Sobol’ indices for block stratification.
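As a concrete illustration of the PSS construction, the sketch below (a simplified example with two 2-D blocks, not the cited implementation) performs full stratified sampling within each block and combines the blocks by random pairing:

```python
import numpy as np

def stratified_2d(m, rng):
    """Full stratified sample in [0,1]^2: one point in each cell of an m x m grid."""
    ii, jj = np.meshgrid(np.arange(m), np.arange(m), indexing="ij")
    cells = np.column_stack([ii.ravel(), jj.ravel()])       # all m^2 grid cells
    return (cells + rng.uniform(size=(m * m, 2))) / m

def pss(m, blocks, rng=None):
    """PSS sketch: stratify each 2-D block fully, then pair blocks at random."""
    rng = np.random.default_rng(rng)
    # Shuffling each block's m^2 points realizes the random pairing across blocks.
    parts = [stratified_2d(m, rng)[rng.permutation(m * m)] for _ in range(blocks)]
    return np.hstack(parts)

x = pss(4, 2, rng=1)  # 16 samples in [0,1]^4, stratified within dims (0,1) and (2,3)
```

Within each block every grid cell is hit exactly once, so two-way interactions inside a block are stratified; interactions across blocks are treated as in plain LHS.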

4.2 LHS with Dependent Inputs

Classical LHS assumes independent marginals; for dependent input structures, extensions include:

  • Copula-based LHSD: Transforms i.i.d. samples via a specified copula $C$ to recover the dependence structure; a central limit theorem applies under bounded variation and right-continuity (Aistleitner et al., 2013).
  • Quantization-based LHS: Uses Voronoi vector quantization for empirical joint structures, preserving stratification without explicit copula knowledge; unbiasedness and variance reduction demonstrated in environmental models (Lambert et al., 2024).
  • Both methods yield lower variance than MC if dependence structure is known or accurately quantized.
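One lightweight way to induce dependence while keeping exact marginal stratification is Iman–Conover-style rank reordering. The sketch below is an illustrative variant (not the copula or quantization constructions of the cited papers): it reorders each LHS column to follow the ranks of correlated Gaussian scores:

```python
import numpy as np

def iman_conover_lhs(n, corr, rng=None):
    """Rank-reordering sketch: impose the rank pattern of correlated normal
    scores on an LHS, preserving each column's exact marginal stratification."""
    rng = np.random.default_rng(rng)
    d = corr.shape[0]
    # Standard LHS uniforms.
    perms = np.array([rng.permutation(n) for _ in range(d)]).T
    u = (perms + rng.uniform(size=(n, d))) / n
    # Correlated normal scores define the target rank pattern.
    scores = rng.standard_normal((n, d)) @ np.linalg.cholesky(corr).T
    out = np.empty_like(u)
    for j in range(d):
        # Smallest LHS value goes to the row with the smallest score, etc.
        out[np.argsort(scores[:, j]), j] = np.sort(u[:, j])
    return out

corr = np.array([[1.0, 0.7], [0.7, 1.0]])
u = iman_conover_lhs(500, corr, rng=0)
```

Because each column is only permuted, the one-point-per-stratum property survives exactly; the induced rank correlation approximates the target (Spearman, not Pearson).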

4.3 Constrained and Adaptive LHS

  • Constraint handling: Standard LHS fails for simplex, mixture, or synthesis constraints. Sequential or divide-and-conquer LHS (e.g., CASTRO) constructs feasible points component-wise, partitioning high-dimensional problems and enforcing linear constraints, with LHSMDU for multidimensional uniformity (Schenk et al., 2024).
  • Adaptive/Sequential LHS: Approaches such as Local Latin Hypercube Refinement (LoLHR) and sequential hierarchical stratification dynamically allocate LHS samples to regions of high importance, guided by surrogate models, sensitivity indices, or clustering (Bogoclu et al., 2021, Krumscheid et al., 2023).

4.4 Expansion and Replication

  • “LHS in LHS” expansion: Incrementally grows an existing LHS by regridding and filling empty bins with new LHS samples, quantified by a “degree of LH-ness” $D \in (0, 1]$ (Boschini et al., 29 Aug 2025).
  • Replicated LHS for Sobol’ indices: Replicated designs enable efficient reordering (“permutation trick”) for computing first-order and total sensitivity indices without additional simulations; averaging schemes reduce estimator variance (Damblin et al., 2021).
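The regrid-and-fill idea behind expansion can be sketched for the special case of doubling a design (a simplified take on the idea, not the cited algorithm verbatim). Each old point occupies exactly one of the $2n$ finer bins per dimension, so the $n$ complementary bins can be filled by new, randomly paired points:

```python
import numpy as np

def lhs(n, d, rng):
    perms = np.array([rng.permutation(n) for _ in range(d)]).T
    return (perms + rng.uniform(size=(n, d))) / n

def double_lhs(x, rng=None):
    """Double an n-point LHS to 2n points while keeping the LHS property."""
    rng = np.random.default_rng(rng)
    n, d = x.shape
    new_bins = np.empty((n, d), dtype=int)
    for j in range(d):
        # Old points occupy n distinct fine bins (one inside each coarse bin).
        occupied = np.floor(x[:, j] * 2 * n).astype(int)
        empty = np.setdiff1d(np.arange(2 * n), occupied)  # exactly n bins remain
        new_bins[:, j] = rng.permutation(empty)           # random pairing across dims
    x_new = (new_bins + rng.uniform(size=(n, d))) / (2 * n)
    return np.vstack([x, x_new])

y = double_lhs(lhs(8, 2, np.random.default_rng(3)), rng=4)  # valid 16-point LHS
```

For growth factors other than powers of two the bookkeeping is more involved, which is why the cited work quantifies partial stratification via the degree of LH-ness.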

5. Space-Filling, Discrepancy, and Optimization Criteria

LHS is widely used as a “space-filling” design, but raw LHS can exhibit clustering or non-uniform subprojections in high dimension $d$. Common criteria for optimizing LHS include:

  • Centered $L^2$-discrepancy or wrap-around discrepancy: Ensures uniform coverage in all low-dimensional projections, essential for robust screening and model calibration (Damblin et al., 2013).
  • Maximin criterion: Maximizes the minimal pairwise Euclidean distance, promoting even spacing; effective for moderate $d$, but projection uniformity can deteriorate.
  • Minimum Spanning Tree criteria: Maximizes mean edge length and minimizes variance, offering a balance between regularity and randomness.
  • Enhanced Stochastic Evolutionary (ESE) and simulated annealing methods support efficient optimization of these criteria for large $N$ and $d$ (Damblin et al., 2013, Boschini et al., 29 Aug 2025).
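A simple hill-climb over within-column swaps, which always preserve the LHS property, illustrates maximin optimization; ESE and simulated annealing refine the same move set with acceptance schedules. This sketch is illustrative, not one of the cited methods:

```python
import numpy as np
from itertools import combinations

def min_dist(x):
    """Smallest pairwise Euclidean distance in the design."""
    return min(np.linalg.norm(a - b) for a, b in combinations(x, 2))

def maximin_lhs(n, d, iters=500, rng=None):
    """Greedy maximin: swap two points' strata in one column, keep the swap
    only if the minimal pairwise distance does not decrease."""
    rng = np.random.default_rng(rng)
    perms = np.array([rng.permutation(n) for _ in range(d)]).T
    x = (perms + rng.uniform(size=(n, d))) / n   # swaps preserve LHS structure
    best = min_dist(x)
    for _ in range(iters):
        j = rng.integers(d)
        a, b = rng.choice(n, size=2, replace=False)
        x[[a, b], j] = x[[b, a], j]              # swap strata of points a, b in dim j
        score = min_dist(x)
        if score >= best:
            best = score
        else:
            x[[a, b], j] = x[[b, a], j]          # revert the swap
    return x, best

x, s = maximin_lhs(10, 2, rng=0)
```

Greedy acceptance gets stuck in local optima quickly; annealing-style occasional acceptance of worse swaps is the standard remedy for large $N$ and $d$.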

6. Negative Dependence, Discrepancy Bounds, and High-Dimensional Coverage

LHS random variables exhibit $\gamma$-negative dependence ($\gamma \leq e^d$ for $d$-dimensional LHS) rather than strict negative orthant dependence (Doerr et al., 2021). This property ensures favorable large-deviation bounds and star-discrepancy scaling:

$$D_N^*(P_{\rm LHS}) \leq 2.6442 \sqrt{\frac{d}{N}}$$

with exponentially small probability of exceeding this bound. Coverage of $t$-dimensional projections is determined solely by $t$, not $d$; the empirical law

$$P(k,n,d,t) = 1 - \left[1 - n^{-(t-1)}\right]^k \approx 1 - \exp\!\left(-k/n^{t-1}\right)$$

yields high coverage with $O(n^{t-1}\log n)$ trials, even as the ambient dimension $d \to \infty$ (Burrage et al., 2015). Orthogonal Sampling (OS) further enforces uniform sub-block coverage, matching or exceeding LHS in subspace uniformity.
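The empirical coverage law is easy to evaluate directly. The snippet below (with an illustrative constant in the trial count, not taken from the cited paper) checks the exact expression against its exponential approximation:

```python
import numpy as np

def coverage(k, n, t):
    """Exact probability from the empirical law P(k,n,d,t); independent of d."""
    return 1 - (1 - n ** -(t - 1)) ** k

n, t = 10, 3
# Trial count suggested by the O(n^{t-1} log n) scaling; the factor 3 is
# an illustrative choice of constant.
k = int(n ** (t - 1) * np.log(n) * 3)
exact = coverage(k, n, t)
approx = 1 - np.exp(-k / n ** (t - 1))
```

With these values both expressions exceed 0.99 and agree to about two decimal places, consistent with $O(n^{t-1}\log n)$ trials sufficing for near-full projection coverage.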

7. Practical Guidance and Limitations

  • Sample size selection: LHS delivers variance reduction and RMSE benefits at moderate $N$ when main effects or additive structure dominate; otherwise, quasi-Monte Carlo or fully stratified designs may outperform it for large $N$ or strong interactions.
  • Constraint management: For non-rectangular domains, standard LHS is infeasible; use constrained/sequential or quantization-based LHS (Schenk et al., 2024, Lambert et al., 2024).
  • Expanding designs: Incremental expansion while preserving stratification requires dedicated algorithms (Boschini et al., 29 Aug 2025).
  • Optimization: For predictive screening, optimize LHS discrepancy for robustness of subprojections; for maximal distance, use maximin or MST-based criteria.
  • High dimension: Negative dependence constants ($\gamma \sim e^d$) mildly impact error bounds; practical utility remains strong for $d \lesssim 10$, with variance-reduced estimators and empirical discrepancy bounds holding for larger $d$.

LHS remains central in surrogate-based optimization, sensitivity analysis, and uncertainty quantification, with ongoing advances in optimized, adaptive, and constraint-handling variants enabling its application to ever more complex scientific and engineering design challenges (Kucherenko et al., 2015, Shields et al., 2015, Boschini et al., 29 Aug 2025, Schenk et al., 2024, Lambert et al., 2024, Damblin et al., 2013, Doerr et al., 2021).
