Serfling’s Inequality: Finite-Sampling Tail Bounds
- Serfling’s inequality is a finite-sampling exponential concentration inequality that extends Hoeffding’s bounds to sampling without replacement.
- It utilizes a telescoping sum and conditional moment generating functions to derive sharp tail bounds incorporating finite-sample corrections.
- It applies directly to hypergeometric tail probabilities and to two-sample empirical processes, where it yields corrected tail bounds for nonparametric tests such as the two-sample Kolmogorov–Smirnov test.
Serfling’s inequality is a finite-sampling exponential concentration inequality that extends Hoeffding’s classical bounds for sums of independent bounded random variables to the case of sampling without replacement from a finite population. It provides sharp exponential tail bounds for deviations of the sample mean from the population mean, with an explicit correction for the sampling fraction, and is particularly relevant in hypergeometric and empirical process contexts (Greene et al., 2015).
1. Formal Statement and Interpretation
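Let $\{c_1, \dots, c_N\}$ be a finite population (“urn”) of $N$ real numbers, with population mean $\mu = N^{-1}\sum_{i=1}^{N} c_i$, variance $\sigma^2 = N^{-1}\sum_{i=1}^{N} (c_i - \mu)^2$, minimum $a = \min_i c_i$, and maximum $b = \max_i c_i$. Consider drawing $n \le N$ elements without replacement, yielding $X_1, \dots, X_n$, with sample mean $\bar{X}_n = n^{-1}\sum_{i=1}^{n} X_i$. Define the finite-sampling fractions $f_n = (n-1)/(N-1)$ and $f_n^* = (n-1)/N$.
Serfling’s inequality states that for any $\lambda > 0$,
$$P\big(\sqrt{n}\,(\bar{X}_n - \mu) \ge \lambda\big) \le \exp\!\left(-\frac{2\lambda^2}{(1 - f_n^*)\,(b-a)^2}\right).$$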
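A frequently employed specialization is the unit-range case $b - a = 1$ (for example, values in $[0, 1]$), where the bound reads
$$P\big(\sqrt{n}\,(\bar{X}_n - \mu) \ge \lambda\big) \le \exp\!\left(-\frac{2\lambda^2}{1 - f_n^*}\right).$$
These forms bound the upper-tail probability that the sample mean exceeds the population mean under sampling without replacement, with an explicit finite-sample correction.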
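As a quick numerical illustration, the bound can be evaluated directly. The sketch below assumes Serfling's form with the correction $1 - (n-1)/N$ in the exponent; the function name `serfling_bound` and its parameter names are illustrative and do not come from the source.

```python
import math


def serfling_bound(lam, n, N, a=0.0, b=1.0):
    """Upper bound on P(sqrt(n) * (sample_mean - mu) >= lam) when n items
    are drawn without replacement from a population of N values in [a, b].

    Assumes the correction factor 1 - (n - 1) / N in the exponent.
    """
    if not 1 <= n <= N:
        raise ValueError("need 1 <= n <= N")
    if lam <= 0 or b <= a:
        raise ValueError("need lam > 0 and b > a")
    correction = 1.0 - (n - 1) / N
    return math.exp(-2.0 * lam ** 2 / (correction * (b - a) ** 2))


# Half the population is sampled, so the exponent is roughly doubled
# relative to the uncorrected value exp(-2 * lam**2).
print(serfling_bound(1.0, n=50, N=100))
print(math.exp(-2.0))
```

With $n = 1$ the correction equals 1 and the function returns the uncorrected Hoeffding value.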
2. Analytical Strategy and Key Proof Elements
The proof adapts Hoeffding’s martingale-based approach for independent variables to the without-replacement regime. The essential steps are:
- Expressing the centered sample mean $\bar{X}_n - \mu$ as a telescoping sum of conditional expectations.
- Stepwise control of the conditional moment-generating function of each increment given the earlier draws, leveraging that at each stage the remaining population values stay bounded in $[a, b]$.
- Demonstrating, by induction, a bound of the form $E\exp\big(\theta\sqrt{n}\,(\bar{X}_n - \mu)\big) \le \exp\big(\theta^2 (b-a)^2 (1 - f_n^*)/8\big)$ for $\theta > 0$.
- Invoking Markov’s inequality and optimizing over $\theta$ to obtain the exponential rate.
The correction factor $1 - f_n^*$ arises directly from the diminishing uncertainty after each observed draw, distinguishing the without-replacement scenario from the i.i.d. case (Greene et al., 2015).
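The last step of the argument can be written out explicitly. This is a sketch that takes the inductive moment-generating-function bound as given, in the form $\exp\big(\theta^2 (b-a)^2 (1 - f_n^*)/8\big)$ with $f_n^* = (n-1)/N$; the auxiliary parameter $\theta > 0$ and the shorthand $c$ are notation introduced here for illustration.

```latex
\begin{aligned}
P\big(\sqrt{n}\,(\bar{X}_n - \mu) \ge \lambda\big)
  &\le e^{-\theta\lambda}\, E\exp\!\big(\theta\sqrt{n}\,(\bar{X}_n - \mu)\big)
  && \text{(Markov's inequality)} \\
  &\le \exp\!\Big(-\theta\lambda + \tfrac{1}{8}\,\theta^2 c\Big),
  && c = (b-a)^2\,(1 - f_n^*), \\
\theta^\ast &= \frac{4\lambda}{c}
  && \text{(minimizer of the exponent)}, \\
-\theta^\ast\lambda + \tfrac{1}{8}\,(\theta^\ast)^2 c
  &= -\frac{4\lambda^2}{c} + \frac{2\lambda^2}{c}
   = -\frac{2\lambda^2}{(1 - f_n^*)\,(b-a)^2}.
\end{aligned}
```

The same calculation with $c = (b-a)^2$ gives the uncorrected Hoeffding exponent, so the finite-sampling factor enters only through the moment-generating-function bound.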
3. Relationship to Hoeffding’s Inequality and Refinements
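Hoeffding’s classical bound for independent random variables $Y_1, \dots, Y_n$ with $a \le Y_i \le b$ and sample mean $\bar{Y}_n$ is
$$P\big(\sqrt{n}\,(\bar{Y}_n - E\bar{Y}_n) \ge \lambda\big) \le \exp\!\left(-\frac{2\lambda^2}{(b-a)^2}\right).$$
For sampling with replacement from $\{c_1, \dots, c_N\}$, substituting $\lambda = \sqrt{n}\,t$ yields
$$P\big(\bar{X}_n - \mu \ge t\big) \le \exp\!\left(-\frac{2 n t^2}{(b-a)^2}\right).$$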
Serfling’s bound multiplies the Hoeffding exponent by $1/(1 - f_n^*)$ and so tightens the concentration in the without-replacement regime. Since $0 < 1 - f_n^* < 1$ whenever $n \ge 2$, the bound strictly improves upon the naive i.i.d. Hoeffding bound for sampling without replacement; for $n = 1$ the two bounds coincide.
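One way to see the practical effect of the correction is to ask how many draws each bound requires before it certifies a given accuracy. The sketch below assumes the deviation forms of the two bounds (obtained by setting $\lambda = \sqrt{n}\,t$) and Serfling's correction $1 - (n-1)/N$; the helper names and the example values of `N`, `t`, and `delta` are illustrative choices, not values from the source.

```python
import math


def hoeffding_tail(t, n, width=1.0):
    """Hoeffding bound on P(sample_mean - mu >= t) for n bounded draws."""
    return math.exp(-2.0 * n * t ** 2 / width ** 2)


def serfling_tail(t, n, N, width=1.0):
    """Serfling bound on P(sample_mean - mu >= t), without replacement.

    Assumes the correction factor 1 - (n - 1) / N in the exponent.
    """
    correction = 1.0 - (n - 1) / N
    return math.exp(-2.0 * n * t ** 2 / (correction * width ** 2))


def smallest_n(tail, delta, n_max):
    """Smallest n in 1..n_max with tail(n) <= delta, or None if there is none."""
    for n in range(1, n_max + 1):
        if tail(n) <= delta:
            return n
    return None


N, t, delta = 1000, 0.05, 0.05  # hypothetical population size and targets
n_hoeffding = smallest_n(lambda n: hoeffding_tail(t, n), delta, N)
n_serfling = smallest_n(lambda n: serfling_tail(t, n, N), delta, N)
print("Hoeffding sample size:", n_hoeffding)
print("Serfling sample size: ", n_serfling)
```

Because the correction shrinks as $n$ approaches $N$, the gap between the two sample sizes widens when the required sample is a large fraction of the population.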
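A conceivable enhancement is to replace $1 - f_n^*$ with the more accurate $1 - f_n$, which has been achieved in special cases but remains open in general. Bennett-type refinements, applying Ehm’s representation of the hypergeometric as a sum of independent Bernoulli variables, yield
$$P\big(\sqrt{n}\,(\bar{X}_n - \mu) \ge \lambda\big) \le \exp\!\left(-\frac{\lambda^2}{2\sigma^2 (1 - f_n)}\,\psi\!\left(\frac{\lambda}{\sqrt{n}\,\sigma^2 (1 - f_n)}\right)\right),$$
where $\psi(y) = 2h(1+y)/y^2$, $h(y) = y(\log y - 1) + 1$. For binary populations ($c_i \in \{0, 1\}$), an explicit Hoeffding-style bound follows (Greene et al., 2015).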
4. Hypergeometric Specialization
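Consider the population made up of $D$ ones and $N - D$ zeros. Here, $\bar{X}_n$ is the sample proportion of ones; $\mu = D/N$; $\sigma^2 = \mu(1 - \mu)$. Since $b - a = 1$, Serfling’s bound yields
$$P\big(\sqrt{n}\,(\bar{X}_n - D/N) \ge \lambda\big) \le \exp\!\left(-\frac{2\lambda^2}{1 - f_n^*}\right).$$
Specifically, for $t > 0$, setting $\lambda = \sqrt{n}\,t$ and writing $1 - f_n^* = (N - n + 1)/N$,
$$P\big(\bar{X}_n - D/N \ge t\big) \le \exp\!\left(-\frac{2 n t^2 N}{N - n + 1}\right).$$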
Such hypergeometric tail bounds are central in settings where binary attributes are counted under finite sampling, for example, in quality control and resampling inference (Greene et al., 2015).
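For a binary population the exact tail is a finite sum of hypergeometric probabilities, so the bound can be checked against it. The sketch below uses only the standard library; the symbol `D` for the number of ones, the function names, and the example values are assumptions made for illustration, and the bound uses Serfling's correction $1 - (n-1)/N$.

```python
import math


def hypergeom_upper_tail(k, n, D, N):
    """Exact P(S >= k), where S counts ones among n draws without
    replacement from a population of D ones and N - D zeros."""
    total = math.comb(N, n)
    upper = min(n, D)
    # math.comb returns 0 when more zeros are requested than exist,
    # so infeasible terms drop out of the sum automatically.
    favourable = sum(
        math.comb(D, j) * math.comb(N - D, n - j)
        for j in range(max(k, 0), upper + 1)
    )
    return favourable / total


def serfling_hypergeom_bound(k, n, D, N):
    """Serfling-type bound on P(S >= k); trivial (1.0) at or below the mean."""
    t = k / n - D / N
    if t <= 0:
        return 1.0
    return math.exp(-2.0 * n * t ** 2 / (1.0 - (n - 1) / N))


N, D, n = 100, 30, 20  # hypothetical lot size, defectives, sample size
for k in (8, 10, 12):
    exact = hypergeom_upper_tail(k, n, D, N)
    bound = serfling_hypergeom_bound(k, n, D, N)
    print(f"k={k:2d}  exact={exact:.3e}  bound={bound:.3e}")
```

The bound is conservative by design: it depends on the population only through $N$ and the range of the values, not through $D$.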
5. Finite-Sampling Correction Terms and Open Questions
The “primitive” finite-sampling correction in Serfling’s exponent is $1 - f_n^* = 1 - (n-1)/N$. The “true” correction $1 - f_n = 1 - (n-1)/(N-1)$, corresponding to the classical finite-population variance reduction, appears in recent Bennett-type and Hoeffding-type inequalities but has not been established for Serfling’s bound in general settings. How close the two factors are matters for the sharpness of empirical process and finite-population inference. Existing results suggest that such a refinement is plausible in further generalizations, especially under independence approximations (Greene et al., 2015).
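The size of the gap between the two corrections can be computed exactly. The sketch below takes the primitive factor to be $1 - (n-1)/N$ and the true factor to be $1 - (n-1)/(N-1)$, and uses exact rational arithmetic; the function name `corrections` and the example values are illustrative.

```python
from fractions import Fraction


def corrections(n, N):
    """Return (primitive, true) finite-sampling factors as exact fractions.

    primitive = 1 - (n - 1) / N        (factor in Serfling's exponent)
    true      = 1 - (n - 1) / (N - 1)  (finite-population variance factor)
    Requires N >= 2 and 1 <= n <= N.
    """
    if N < 2 or not 1 <= n <= N:
        raise ValueError("need N >= 2 and 1 <= n <= N")
    primitive = 1 - Fraction(n - 1, N)
    true = 1 - Fraction(n - 1, N - 1)
    return primitive, true


N = 20  # hypothetical population size
for n in (1, 5, 10, 20):
    primitive, true = corrections(n, N)
    print(n, primitive, true, primitive - true)
```

The difference equals $(n-1)/(N(N-1))$, so the two factors agree at $n = 1$ and differ most at $n = N$, where the true factor vanishes while the primitive one equals $1/N$.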
6. Applications to Two-Sample Empirical Process Statistics
Serfling’s inequality forms the backbone of corrected exponential tail bounds for two-sample Kolmogorov–Smirnov (K–S) statistics. For independent samples $X_1, \dots, X_m$ and $Y_1, \dots, Y_n$ from the same continuous distribution $F$, with empirical CDFs $F_m$ and $G_n$, the two-sample one-sided K–S statistic is
$$D^+_{m,n} = \sup_x \big(F_m(x) - G_n(x)\big).$$
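Viewing the pooled sample of $m + n$ observations as the population, each empirical CDF is built from a sample drawn without replacement, and Serfling’s bound applies. In the balanced case $m = n$, Greene et al. (2015) obtain an exponential tail bound for the one-sided statistic $D^+_{n,n}$ with a finite-sampling correction in the exponent, along with a corresponding bound for the two-sided statistic
$$D_{n,n} = \sup_x \big|F_n(x) - G_n(x)\big|.$$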
The finite-sampling correction adjusts the exponent of the classical Dvoretzky–Kiefer–Wolfowitz–Massart inequality $P\big(\sqrt{n}\,\sup_x |F_n(x) - F(x)| \ge \lambda\big) \le 2\exp(-2\lambda^2)$, stated here for the one-sample empirical CDF $F_n$ of $n$ i.i.d. draws from $F$. For unbalanced samples ($m \ne n$), similar exponential bounds are conjectured to hold. These corrections matter for the tightness and calibration of empirical process-based inference, notably in nonparametric hypothesis testing and distributional comparison (Greene et al., 2015).
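The one-sided and two-sided statistics discussed in this section are simple to compute from data. The following is a minimal sketch using only the standard library; the function name `ks_two_sample` and its return convention are choices made here for illustration, and the code evaluates the statistics only, not the tail bounds.

```python
def ks_two_sample(xs, ys):
    """Return (d_plus, d_abs) for samples xs and ys, where
    d_plus = sup_x (F_m(x) - G_n(x)) and d_abs = sup_x |F_m(x) - G_n(x)|,
    with F_m and G_n the empirical CDFs of xs and ys."""
    m, n = len(xs), len(ys)
    if m == 0 or n == 0:
        raise ValueError("both samples must be non-empty")
    xs_sorted, ys_sorted = sorted(xs), sorted(ys)
    # Both empirical CDFs are right-continuous step functions, so the
    # suprema are attained at pooled data points (or equal 0 at -infinity).
    pooled = sorted(set(xs_sorted) | set(ys_sorted))
    d_plus = 0.0
    d_abs = 0.0
    i = j = 0
    for point in pooled:
        while i < m and xs_sorted[i] <= point:
            i += 1
        while j < n and ys_sorted[j] <= point:
            j += 1
        diff = i / m - j / n  # F_m(point) - G_n(point)
        d_plus = max(d_plus, diff)
        d_abs = max(d_abs, abs(diff))
    return d_plus, d_abs


# The first sample lies entirely below the second, so F_m reaches 1
# before G_n leaves 0 and both statistics equal 1.
print(ks_two_sample([1, 2, 3], [4, 5, 6]))
# With the roles swapped, F_m never exceeds G_n, so the one-sided statistic is 0.
print(ks_two_sample([4, 5, 6], [1, 2, 3]))
```

The one-sided statistic is not symmetric in its arguments, which is why swapping the samples changes `d_plus` but not `d_abs`.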