Weighted Empirical Distribution Methods

Updated 23 October 2025

The weighted empirical distribution is a generalization of the classical empirical cumulative distribution function (ECDF) that uses non-uniform, possibly data-dependent weights to adjust for heterogeneity and bias.
The methodology incorporates higher-order asymptotic expansions, such as Edgeworth and Cornish–Fisher expansions, to improve quantile estimation and confidence interval accuracy.
This framework enhances robust statistical inference and nonparametric analysis by enabling corrections for non-identically distributed data and complex sampling designs.

A weighted empirical distribution generalizes the classical empirical distribution by assigning non-uniform, possibly data-dependent, and potentially predetermined weights to individual observations. This concept is crucial in contemporary statistical inference, nonparametric theory, large-scale resampling, importance sampling, transfer learning, optimal risk minimization under dataset shift, and robust and distributionally-constrained modeling. The weighted empirical distribution appears both as an explicit object of inference—enabling corrections for heterogeneity or bias—and as an implicit tool for constructing statistics, likelihoods, and loss functions in high-dimensional and complex data scenarios.

1. Definition and Fundamental Properties

Given independent observations $X_1, \dotsc, X_n$ (not necessarily identically distributed) and nonnegative weights $w_1, \dotsc, w_n$ normalized so that $\sum_{i=1}^n w_i = n$ (other normalizations appear in some conventions), the weighted empirical distribution $\hat{F}$ is

$\hat{F}(x) = \frac{1}{n} \sum_{i=1}^n w_i \, \mathbb{I}(X_i < x)$

When $w_i \equiv 1$ , $\hat{F}$ reduces to the classical empirical cumulative distribution function (ECDF). In general, $\hat{F}$ is a discrete probability measure supported on the sample points but with non-uniform masses.
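The definition can be sketched in a few lines of NumPy. This is a minimal illustration rather than a reference implementation: the function name `weighted_ecdf` is hypothetical, and rescaling the weights to sum to $n$ is an assumption made here so that any nonnegative weights can be passed in.

```python
import numpy as np

def weighted_ecdf(x, sample, weights):
    """Evaluate F_hat(x) = (1/n) * sum_i w_i * 1{X_i < x} at each point in x."""
    sample = np.asarray(sample, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if sample.shape != weights.shape:
        raise ValueError("sample and weights must have the same shape")
    if np.any(weights < 0):
        raise ValueError("weights must be nonnegative")
    n = sample.size
    # Assumption: rescale so the weights sum to n, matching the normalization in the text.
    weights = weights * n / weights.sum()
    x = np.atleast_1d(np.asarray(x, dtype=float))
    # indicator[j, i] is True when X_i < x_j (strict inequality, as in the definition).
    indicator = sample[None, :] < x[:, None]
    return (indicator * weights[None, :]).sum(axis=1) / n

sample = [1.0, 2.0, 3.0, 4.0]
print(weighted_ecdf(2.5, sample, [2.0, 1.0, 0.5, 0.5]))  # [0.75]
print(weighted_ecdf(2.5, sample, [1.0, 1.0, 1.0, 1.0]))  # [0.5], the classical ECDF
```

With unit weights the function returns the classical ECDF value; shifting mass toward the two smallest observations raises the estimate at $x = 2.5$ from $0.5$ to $0.75$.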

The expectation of $\hat{F}(x)$ under sampling is

$\mathbb{E}\,\hat{F}(x) = \frac{1}{n} \sum_{i=1}^n w_i F_i(x)$

where $F_i$ is the true distribution function of $X_i$ .
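The expectation formula can be checked by simulation. The sketch below assumes, purely for illustration, that $X_i \sim N(m_i, 1)$ with different means $m_i$; the means, weights, and function names are hypothetical choices, not taken from the source.

```python
import math
import numpy as np

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def expected_weighted_ecdf(x, means, weights):
    """(1/n) * sum_i w_i * F_i(x), assuming F_i is the N(means[i], 1) CDF."""
    n = len(means)
    return sum(w * normal_cdf(x - m) for w, m in zip(weights, means)) / n

def monte_carlo_weighted_ecdf(x, means, weights, reps=200_000, seed=0):
    """Average F_hat(x) over many independent samples of size n."""
    rng = np.random.default_rng(seed)
    means = np.asarray(means, dtype=float)
    weights = np.asarray(weights, dtype=float)
    # Each row is one sample (X_1, ..., X_n) with non-identical distributions.
    draws = rng.normal(loc=means, scale=1.0, size=(reps, means.size))
    f_hat = (weights * (draws < x)).sum(axis=1) / means.size
    return float(f_hat.mean())

means = [0.0, 1.0, 3.0]      # hypothetical location of each X_i
weights = [1.5, 1.0, 0.5]    # nonnegative weights summing to n = 3
exact = expected_weighted_ecdf(0.5, means, weights)
approx = monte_carlo_weighted_ecdf(0.5, means, weights)
print(exact, approx)  # the two values should agree to about two decimal places
```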

Weighted empirical distributions facilitate correct representation of data under non-identically distributed samples, enable efficiency improvements through deterministic or data-adaptive reweighting, and underpin modern procedures for robust modeling and inference in heterogeneous data environments (Withers et al., 2010).

2. Edgeworth, Cornish–Fisher, and Higher Order Expansions

Weighted empirical distributions enable not only classical Central Limit Theorem (CLT) results for plug-in estimators $T(\hat{F})$ , but also refined, higher-order distributional approximations. Any smooth functional $T(\cdot)$ —e.g., mean, variance, quantiles, or functionals arising in statistical estimation—can have its sampling distribution expanded to third order as

$P(Y_n < x) \approx \Phi(x) - n^{-1/2}h_1(x)\varphi(x) - n^{-1} h_2(x)\varphi(x) - \cdots$

where $Y_n = n^{1/2}[T(\hat{F}) - T(F)] / a_{21}^{1/2}$ , $\Phi$ and $\varphi$ are the standard normal CDF and density, and $h_1(x), h_2(x),\dots$ are Hermite-polynomial-based correction terms determined by cumulants involving von Mises derivatives and the weight structure.

These expansions provide accurate quantile inference and coverage for confidence intervals beyond first-order CLT normality, correcting both bias and variance due to heterogeneity and weighting, and yielding meaningful improvements in classical statistical inferential tasks (Withers et al., 2010).
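To make the shape of the first correction term concrete, the sketch below implements the textbook one-term Edgeworth approximation for the simplest special case only: unit weights, i.i.d. observations, and $T$ equal to the mean standardized by the true standard deviation. In that case $h_1(x) = \gamma\,(x^2 - 1)/6$, where $\gamma$ is the skewness of the observations. The general weighted, non-identically-distributed $h_1$ of Withers et al. (2010) is not reproduced here, and the function name is hypothetical.

```python
import math
from statistics import NormalDist

def edgeworth_cdf_standardized_mean(x, skewness, n):
    """One-term Edgeworth approximation to P(Y_n < x) for the standardized
    mean of n i.i.d. observations (unit weights assumed)."""
    std_normal = NormalDist()
    # h_1(x) = skewness * He_2(x) / 6, with Hermite polynomial He_2(x) = x^2 - 1.
    h1 = skewness * (x * x - 1.0) / 6.0
    return std_normal.cdf(x) - h1 * std_normal.pdf(x) / math.sqrt(n)

# Exponential observations have skewness 2; compare with the plain normal value 0.5.
print(edgeworth_cdf_standardized_mean(0.0, skewness=2.0, n=25))
```

For a symmetric distribution (`skewness=0`) the correction vanishes and the function returns $\Phi(x)$; for right-skewed data the approximation at $x = 0$ rises above $0.5$, reflecting that the standardized mean falls below its expectation more than half the time.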

3. Cumulant Expansions and von Mises Derivatives

To obtain higher-order expansions for $T(\hat{F})$ , Withers et al. (2010) employ von Mises–Taylor expansions for functionals defined not only on probability measures but also on signed measures (of total measure $1$). The general expansion is

$T(G) - T(F) = \sum_{r=1}^\infty \frac{1}{r!} T_{(r)}(F)\bigl( G-F \bigr)^r$

where $T_{(r)}$ is the $r$ -th order von Mises derivative, defined so $\int T_{(r)}(x_1, \ldots, x_r) \, dF(x_1) = 0$ for all $r \geq 1$ .
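As a worked illustration (a standard example, not quoted from the source), take the variance functional $T(F) = \sigma^2$ with mean $\mu$ under $F$. Its first two von Mises derivatives are

$T_{(1)}(x) = (x-\mu)^2 - \sigma^2, \qquad T_{(2)}(x_1, x_2) = -2\,(x_1-\mu)(x_2-\mu)$

and all higher derivatives vanish because the variance is quadratic in $F$. Both derivatives integrate to zero against $dF(x_1)$, so integrating against $G - F$ is the same as integrating against $G$. Writing $\mu_G$ and $\sigma_G^2$ for the mean and variance under $G$, the expansion terminates after two terms and is exact:

$T(G) - T(F) = \int T_{(1)}(x)\,dG(x) + \tfrac{1}{2}\iint T_{(2)}(x_1,x_2)\,dG(x_1)\,dG(x_2) = \bigl[\sigma_G^2 + (\mu_G-\mu)^2 - \sigma^2\bigr] - (\mu_G-\mu)^2 = \sigma_G^2 - \sigma^2$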

Cumulant expansions for $T(\hat{F})$ take the form

$\kappa_r = \sum_{j=r-1}^\infty a_{rj}\, n^{-j}$

with explicit, weight-dependent terms; e.g., the coefficients involve moments and combinations such as

$a_{21} = \{1^2\},\qquad a_{32} = \{1^3\} + 3\{1,2,12\},\qquad \text{etc.}$

Here a superscript denotes a power: $\{1^2\}$ is the expectation of the squared first von Mises derivative, and $\{1,2,12\}$ is the expectation of the product of the first derivative at two sample points with the second derivative evaluated at both, in each case averaged against the weight profile $w_1, \dotsc, w_n$ . The $r$ -th order moment structure further involves cross-product terms coupling distinct sample points, reflecting both heterogeneity and weighting (Withers et al., 2010).

4. Practical Applications: Estimators and Edge Cases

a) Sample Mean and Variance

For the mean, the influence function is $x - \mu_1$ , where $\mu_1$ is the mean, and the cumulant coefficients reduce to moments of the centered distributions scaled by the $r$ -th moment of the weights. For the sample variance $T(F) = \mu_2 - (\mu_1)^2$ , higher-order derivatives and mixed moments appear, which are essential under non-identical distributions; e.g.,

$a_{21} = \{1^2\},\qquad a_{32} = M_3 - 3 M_{24} + 2M_{222} - 6(M_3 - M_{22})^2$

with $M_k$ denoting empirical moments.
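For the weighted mean $T(\hat{F}) = \frac{1}{n}\sum_i w_i X_i$ the role of the weight moments can be made explicit from two elementary facts: cumulants of independent variables add, and $\kappa_r(cX) = c^r \kappa_r(X)$. Together they give $\kappa_r\bigl(T(\hat{F})\bigr) = n^{-r} \sum_i w_i^r\, \kappa_r(X_i)$. The sketch below evaluates this identity; the function name and the example inputs are hypothetical.

```python
import numpy as np

def weighted_mean_cumulant(r, weights, obs_cumulants):
    """r-th cumulant of (1/n) * sum_i w_i * X_i for independent X_i.

    obs_cumulants[i] is the r-th cumulant of X_i, so non-identical
    distributions are handled by passing different values per observation.
    """
    weights = np.asarray(weights, dtype=float)
    obs_cumulants = np.asarray(obs_cumulants, dtype=float)
    n = weights.size
    return float(np.sum(weights**r * obs_cumulants) / n**r)

weights = [2.0, 1.0, 0.5, 0.5]
# Standard exponential observations: second cumulant 1, third cumulant 2.
var_weighted = weighted_mean_cumulant(2, weights, [1.0] * 4)   # 0.34375
var_uniform = weighted_mean_cumulant(2, [1.0] * 4, [1.0] * 4)  # 0.25, i.e. sigma^2 / n
third_weighted = weighted_mean_cumulant(3, weights, [2.0] * 4)
print(var_weighted, var_uniform, third_weighted)
```

With bounded weights the $r$ -th cumulant is of order $n^{-(r-1)}$, and uneven weights inflate it relative to the unit-weight case through the factor $\frac{1}{n}\sum_i w_i^r$.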

b) Studentized Mean, Coefficient of Variation

The framework extends directly to plug-in functionals such as the Studentized mean and sample coefficient of variation, where the higher order corrections account for weighted, heterogeneous sampling. In each case, practical inference—confidence intervals, hypothesis tests—benefits from improved asymptotics reflecting both the weights and the varying sampling distributions (Withers et al., 2010).

5. Edgeworth–Cornish–Fisher Expansions: Quantiles and Distributional Approximations

The explicit cumulant expansion allows the use of Edgeworth–Cornish–Fisher (ECF) formulas to correct the estimated distribution and quantiles for $T(\hat{F})$ :

$P(Y_n < x) \sim \Phi(x) + n^{-1/2} P_1(x) + n^{-1} P_2(x) + \text{higher terms}$

where $P_1(x), P_2(x), \dots$ are determined via Hermite expansions in terms of the standardized cumulant coefficients. This sequence yields more accurate p-values and critical values for functionals under both heterogeneity and weighting, and directly extends classical asymptotic inference.
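On the quantile side, the corresponding one-term Cornish–Fisher correction can be sketched for the same simple special case of unit weights and i.i.d. observations: the $p$ -quantile of the standardized mean is approximately $z_p + n^{-1/2}\,\gamma\,(z_p^2 - 1)/6$, where $z_p$ is the standard normal quantile and $\gamma$ the skewness. This is the textbook formula, not the weighted expansion of Withers et al. (2010), and the function name is hypothetical.

```python
import math
from statistics import NormalDist

def cornish_fisher_quantile(p, skewness, n):
    """One-term Cornish-Fisher approximation to the p-quantile of the
    standardized mean of n i.i.d. observations (unit weights assumed)."""
    z = NormalDist().inv_cdf(p)
    # The first-order correction shifts the normal quantile by skewness * He_2(z) / (6 sqrt(n)).
    return z + skewness * (z * z - 1.0) / (6.0 * math.sqrt(n))

# Upper 97.5% critical value for exponential data (skewness 2) with n = 25,
# compared with the uncorrected normal value of about 1.96.
print(cornish_fisher_quantile(0.975, skewness=2.0, n=25))
```

For right-skewed data the corrected upper critical value exceeds the normal one, which is the mechanism by which such expansions improve confidence interval coverage.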

6. Methodological Extensions: Non-i.i.d. Data and Signed Measures

A crucial methodological development is the extension of von Mises functional derivatives to signed measures of total measure $1$—a necessary move since the weighted empirical distribution may involve negative or non-uniform weighting (e.g., in leave-one-out or importance resampling). The normalization $\int T_{(r)}(x_1,\ldots,x_r)\,dF(x_1) = 0$ for all $r \ge 1$ ensures the expansion is valid in generality.

This approach is essential for robust inference under

Non-identical distributions (distinct $F_i$ for each $X_i$ ),
Preassigned or data-adaptive weighting (e.g., regression, survey sampling, sandwich estimation),
Settings where classical plug-in theory fails.

Thus, the framework generalizes both the sample measures themselves and the associated inferential expansions (Withers et al., 2010).

7. Implications for Nonparametric and Robust Inference

The explicit third order expansions and cumulant-based inferential methodology provide a substantial improvement over traditional CLT-level inference. The ability to model and correct for both heterogeneity and sampling design is fundamental for

Nonparametric inference with complex surveys or regression residuals,
Robust estimation, model validation, and hypothesis testing in high-variance environments,
Bayesian and likelihood-based procedures requiring accurate quantile or tail approximations.

By integrating weighting into both the empirical measure and its higher order derivative structure, the approach provides a rigorous, systematic framework for precision inference in real-world, heterogeneous data scenarios.

Summary Table: Core Components of Weighted Empirical Distribution Results

Component	Mathematical Expression / Role	Key Implications
Weighted empirical CDF	$\hat{F}(x) = \frac{1}{n} \sum_i w_i\, \mathbb{I}(X_i<x)$	Accommodates weights and heterogeneity
Mean under weights	$E\hat{F}(x) = \frac{1}{n}\sum_i w_i F_i(x)$	Accurate expectation under non-i.i.d.
Von Mises expansion	$T(G)-T(F) = \sum_{r=1}^\infty \frac{1}{r!}T_{(r)}(F)(G-F)^r$	Generalizes Taylor expansion for functionals
Cumulant expansion	$\kappa_r = \sum_{j=r-1}^\infty a_{rj} n^{-j}$	Enables Edgeworth and Cornish–Fisher expansions (finite-sample correction)
Higher order asymptotics	Edgeworth–Cornish–Fisher expansions to $O(n^{-3/2})$	Improved confidence intervals, p-values

References

For third order asymptotic expansions and cumulants for weighted empirical distributions: "The distribution and quantiles of functionals of weighted empirical distributions when observations have different distributions" (Withers et al., 2010).
For applications in regression, rank statistics, and Bayes theory: see Withers et al. (2010) and the references therein.


