Smoothed Wilcoxon Rank Scores
- Smoothed Wilcoxon Rank Scores are nonparametric estimators that replace discrete rank indicators with kernel-smoothed functions to yield continuous, tie-robust statistics.
- They enhance traditional Wilcoxon procedures by improving efficiency in correlation estimation and hypothesis testing under monotone, non-Gaussian associations.
- Practical performance hinges on the choice of kernel and bandwidth, which governs asymptotic normality and the accuracy of p-value approximations in small samples.
Smoothed Wilcoxon rank scores are a family of nonparametric statistics and estimators in which the discrete rank indicators of classical Wilcoxon-type tests are replaced with smooth, kernel-based functions of the data. The resulting statistics are continuous in the data, retain the distribution-free behavior of Wilcoxon procedures to first order, and offer practical benefits: ties are handled automatically, and efficiency can improve under monotone but non-Gaussian associations. The approach has been developed in several directions, including robust correlation estimation, one-sample and two-sample location inference, and hypothesis testing, where the smoothed statistics are asymptotically equivalent to the classical signed-rank and rank-sum procedures while admitting highly accurate approximations to their null distributions (Tasdan et al., 12 Nov 2025, Maesono et al., 2016, Moriyama et al., 2017).
1. Smoothed Empirical Cumulative Distribution Functions and Kernel-Smoothed Ranks
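The core step is the replacement of the empirical cumulative distribution function (ecdf) with a smoothed, or "kernelized", ecdf. For a sample $X_1, \dots, X_n$, the classical ecdf is
$$F_n(x) = \frac{1}{n}\sum_{j=1}^{n} I(X_j \le x).$$
The smoothed version replaces the indicator with a continuous cumulative distribution function (CDF) $K$, typically a kernel CDF such as the standard normal CDF $\Phi$:
$$\tilde F_n(x) = \frac{1}{n}\sum_{j=1}^{n} K\!\left(\frac{x - X_j}{h}\right),$$
with bandwidth $h = h_n$ satisfying $h \to 0$ and $nh \to \infty$ as $n \to \infty$. For each sample point $X_i$, the smoothed rank is then
$$\tilde R_i = n\,\tilde F_n(X_i) = \sum_{j=1}^{n} K\!\left(\frac{X_i - X_j}{h}\right).$$
Replacing $K$ with the indicator $I(u \ge 0)$ recovers the integer-valued ranks $R_i = nF_n(X_i)$. With a continuous $K$, the smoothing produces real-valued, tie-robust ranks that approach the classical ranks as $h \to 0$, up to a constant shift of $K(0) - 1 = -1/2$ for a symmetric kernel, which cancels after centering (Tasdan et al., 12 Nov 2025).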
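The following Python sketch computes the smoothed ecdf $\tilde F_n$ and the smoothed ranks $\tilde R_i = n\tilde F_n(X_i)$, assuming a standard normal kernel CDF. The helper names (`gauss_cdf`, `indicator_cdf`, `smoothed_ecdf`, `smoothed_ranks`) are hypothetical, and the O(n^2) double loop favors clarity over speed. Passing the indicator kernel $I(u \ge 0)$ reproduces the classical integer ranks.

```python
import math
from typing import Callable, List, Sequence

def gauss_cdf(u: float) -> float:
    """Standard normal CDF Phi, used here as the kernel CDF K."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))

def indicator_cdf(u: float) -> float:
    """Degenerate kernel I(u >= 0); plugging it in gives the classical ecdf."""
    return 1.0 if u >= 0 else 0.0

def smoothed_ecdf(x: float, sample: Sequence[float], h: float,
                  K: Callable[[float], float] = gauss_cdf) -> float:
    """F~_n(x) = (1/n) * sum_j K((x - X_j) / h)."""
    return sum(K((x - xj) / h) for xj in sample) / len(sample)

def smoothed_ranks(sample: Sequence[float], h: float,
                   K: Callable[[float], float] = gauss_cdf) -> List[float]:
    """R~_i = n * F~_n(X_i), evaluated at every sample point."""
    n = len(sample)
    return [n * smoothed_ecdf(xi, sample, h, K) for xi in sample]

data = [2.3, 0.7, 1.9, 3.1]
print(smoothed_ranks(data, h=1.0, K=indicator_cdf))  # [3.0, 1.0, 2.0, 4.0]
print(smoothed_ranks(data, h=0.3))  # real-valued ranks, same ordering as data
```

With the Gaussian kernel the self-term contributes $K(0) = 1/2$, so for very small $h$ each smoothed rank sits $1/2$ below its classical rank.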
2. Construction of Smoothed Wilcoxon Rank Scores and Correlation Estimators
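The Wilcoxon linear score function for rank $i$ is
$$a(i) = \sqrt{12}\left(\frac{i}{n+1} - \frac{1}{2}\right),$$
with the analogous extension to the smoothed case, $\tilde a_i = \sqrt{12}\left(\tilde R_i/(n+1) - 1/2\right)$. These scores are used to build generalized inner-product statistics. For paired data $(X_i, Y_i)$, let $\tilde a_i$ be the smoothed scores of the $X$-sample and $\tilde b_i$ those of the $Y$-sample, with means $\bar a$ and $\bar b$. The smoothed Wilcoxon correlation estimator is
$$\tilde r_W = \frac{\sum_{i=1}^{n}(\tilde a_i - \bar a)(\tilde b_i - \bar b)}{\sqrt{\sum_{i=1}^{n}(\tilde a_i - \bar a)^2 \,\sum_{i=1}^{n}(\tilde b_i - \bar b)^2}}.$$
Because the linear scores are affine in the ranks, this is algebraically identical to the classical Spearman correlation evaluated on smoothed ranks, $\tilde r_W = \operatorname{corr}(\tilde R, \tilde S)$, where $\tilde S_i$ are the smoothed ranks of the $Y_i$. The estimator can thus be read as a "smoothed Spearman-type estimator", a continuous extension of Wilcoxon's statistic that handles ties while preserving the nonparametric spirit (Tasdan et al., 12 Nov 2025).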
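A minimal Python sketch of the correlation estimator, assuming it is the Pearson correlation of the smoothed Wilcoxon scores (the centering and normalization are assumptions about the exact form) and a Gaussian kernel CDF. All function names are hypothetical, and separate bandwidths `h_x`, `h_y` are used because the two variables may live on different scales.

```python
import math
from typing import List, Sequence

def gauss_cdf(u: float) -> float:
    """Standard normal CDF, assumed here as the kernel CDF K."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))

def smoothed_ranks(sample: Sequence[float], h: float) -> List[float]:
    """Smoothed ranks R~_i = sum_j K((X_i - X_j) / h)."""
    return [sum(gauss_cdf((xi - xj) / h) for xj in sample) for xi in sample]

def wilcoxon_scores(ranks: Sequence[float]) -> List[float]:
    """Linear Wilcoxon scores sqrt(12) * (R / (n + 1) - 1/2)."""
    n = len(ranks)
    return [math.sqrt(12.0) * (r / (n + 1) - 0.5) for r in ranks]

def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Centered inner product of a and b, normalized to [-1, 1]."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / math.sqrt(saa * sbb)

def smoothed_wilcoxon_corr(x: Sequence[float], y: Sequence[float],
                           h_x: float, h_y: float) -> float:
    """Hypothetical r~_W: correlation of the smoothed Wilcoxon scores."""
    a = wilcoxon_scores(smoothed_ranks(x, h_x))
    b = wilcoxon_scores(smoothed_ranks(y, h_y))
    return pearson(a, b)

x = [0.2, 1.5, 0.9, 2.8, 2.1, 3.7]
y = [math.exp(v) for v in x]  # monotone but non-linear in x
print(smoothed_wilcoxon_corr(x, y, h_x=0.5, h_y=5.0))
```

Because the scores are affine in the ranks, `pearson(smoothed_ranks(x, 0.5), smoothed_ranks(y, 5.0))` returns the same value, which is the "Spearman on smoothed ranks" reading of the estimator.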
3. Smoothed Wilcoxon-Type Tests for One-Sample and Two-Sample Problems
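In the one-sample signed-rank setting, with a sample $X_1, \dots, X_n$ symmetric about $0$ under the null, the classical statistic $T = \sum_{1 \le i \le j \le n} I(X_i + X_j > 0)$ is smoothed to
$$\tilde T = \sum_{1 \le i \le j \le n} K\!\left(\frac{X_i + X_j}{h}\right),$$
where $K$ is a kernel CDF. Under the null, the mean and variance of $\tilde T$ match those of the classical statistic up to asymptotically negligible terms, and their leading-order terms do not depend on the parent distribution (Maesono et al., 2016).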
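The sketch below computes the classical signed-rank statistic in its Walsh-pair form $T = \sum_{i \le j} I(X_i + X_j > 0)$ alongside a smoothed version $\tilde T = \sum_{i \le j} K((X_i + X_j)/h)$, assuming a Gaussian kernel CDF. Studentizing with the classical null mean $n(n+1)/4$ and variance $n(n+1)(2n+1)/24$ is an assumption based on the moment-matching statement above, not the exact correction used by Maesono et al.; the function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

def gauss_cdf(u: float) -> float:
    """Standard normal CDF, assumed here as the kernel CDF K."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))

def signed_rank_statistics(x: Sequence[float], h: float) -> Tuple[float, float]:
    """Classical T and smoothed T~, both summed over Walsh pairs i <= j."""
    n = len(x)
    t_classic = t_smooth = 0.0
    for i in range(n):
        for j in range(i, n):
            walsh_sum = x[i] + x[j]
            t_classic += 1.0 if walsh_sum > 0 else 0.0
            t_smooth += gauss_cdf(walsh_sum / h)
    return t_classic, t_smooth

def standardize(t: float, n: int) -> float:
    """Studentize with the classical null mean and variance of T."""
    mean = n * (n + 1) / 4.0
    var = n * (n + 1) * (2 * n + 1) / 24.0
    return (t - mean) / math.sqrt(var)

sample = [0.5, -1.2, 2.0, 0.3, -0.4]
t, t_smooth = signed_rank_statistics(sample, h=0.2)
print(t, t_smooth, standardize(t_smooth, len(sample)))
```

Here `t` equals the usual sum of ranks of $|X_i|$ over positive observations (9 for this sample), while `t_smooth` is a real number that moves continuously with the data.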
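For two-sample inference with samples $X_1, \dots, X_m$ and $Y_1, \dots, Y_n$, the discrete sum in the Wilcoxon rank-sum statistic, written in Mann–Whitney form as
$$W = \sum_{i=1}^{m}\sum_{j=1}^{n} I(Y_j - X_i > 0),$$
is replaced with its smoothed analogue
$$\tilde W = \sum_{i=1}^{m}\sum_{j=1}^{n} K\!\left(\frac{Y_j - X_i}{h}\right).$$
The key effects are that the statistic becomes real-valued, its null distribution is close to normal (enabling an accurate normal approximation), and it avoids the lattice-related discreteness artifacts that distort $p$-values in small samples (Moriyama et al., 2017).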
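A minimal sketch of a smoothed rank-sum test in Mann–Whitney form, assuming a Gaussian kernel CDF and studentizing with the classical no-ties null moments $mn/2$ and $mn(m+n+1)/12$; both choices are assumptions for illustration, and the function name is hypothetical. The two-sided $p$-value uses `math.erfc`, since $\operatorname{erfc}(|z|/\sqrt{2}) = 2(1 - \Phi(|z|))$ without the cancellation of `1 - cdf`.

```python
import math
from typing import Sequence, Tuple

def gauss_cdf(u: float) -> float:
    """Standard normal CDF, assumed as the kernel CDF K."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))

def smoothed_rank_sum_test(x: Sequence[float], y: Sequence[float],
                           h: float) -> Tuple[float, float, float]:
    """Smoothed Mann-Whitney statistic W~ with a two-sided normal p-value."""
    m, n = len(x), len(y)
    w = sum(gauss_cdf((yj - xi) / h) for xi in x for yj in y)
    mean = m * n / 2.0                  # classical null mean of W
    var = m * n * (m + n + 1) / 12.0    # classical null variance of W (no ties)
    z = (w - mean) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return w, z, p

x = [1.1, 2.4, 0.8, 1.9]
y = [2.2, 3.5, 2.9, 1.7, 3.1]
print(smoothed_rank_sum_test(x, y, h=0.3))
```

Unlike the classical $W$, which can take only $mn + 1$ integer values, `w` varies continuously, so the resulting $p$-values have no lattice gaps.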
4. Asymptotic Properties and Efficiency
Across all smoothed Wilcoxon variants, the null means and variances are, to first order, free of the underlying distribution. For the smoothed Spearman-type rank correlation estimator $\tilde r_W$:
- Under independence, $\mathbb{E}[\tilde r_W] = 0$ and $\operatorname{Var}(\tilde r_W) = 1/(n-1)$, exactly as for Spearman's $r_S$.
- More generally, for a fixed value of the true association parameter $\rho_W$, a central limit theorem holds: $\sqrt{n}\,(\tilde r_W - \rho_W) \xrightarrow{d} N(0, \sigma_W^2)$.
- The asymptotic variance with Wilcoxon linear scores is strictly smaller than that of the classical Spearman's $r_S$ in many monotone but non-Gaussian settings; simulated MSE reduction up to $10$– is observed (Tasdan et al., 12 Nov 2025).
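The null moments can be checked exactly by enumeration. If $X$ and $Y$ are independent and each sample is i.i.d., every re-pairing of the $Y$-sample with the $X$-sample is equally likely, and for any two non-constant score vectors the Pearson correlation has permutation mean $0$ and permutation variance $1/(n-1)$. The sketch assumes a Gaussian kernel CDF and the Pearson form of the estimator; it uses smoothed ranks directly, which is equivalent to using linear Wilcoxon scores because the correlation is invariant under positive affine maps. Function names are hypothetical.

```python
import itertools
import math
from typing import List, Sequence, Tuple

def gauss_cdf(u: float) -> float:
    """Standard normal CDF, assumed as the kernel CDF K."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))

def smoothed_ranks(sample: Sequence[float], h: float) -> List[float]:
    """R~_i = sum_j K((X_i - X_j) / h)."""
    return [sum(gauss_cdf((xi - xj) / h) for xj in sample) for xi in sample]

def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Sample Pearson correlation."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / math.sqrt(saa * sbb)

def permutation_moments(x: Sequence[float], y: Sequence[float],
                        h: float) -> Tuple[float, float]:
    """Exact mean and variance of the smoothed rank correlation over all
    n! re-pairings of y with x (its null distribution under independence)."""
    a = smoothed_ranks(x, h)
    b = smoothed_ranks(y, h)
    rs = [pearson(a, [b[k] for k in perm])
          for perm in itertools.permutations(range(len(b)))]
    mean = sum(rs) / len(rs)
    var = sum((r - mean) ** 2 for r in rs) / len(rs)
    return mean, var

x = [0.3, 1.7, 0.9, 2.6, 1.2, 3.0]
y = [5.1, 4.4, 6.0, 4.9, 5.7, 4.0]
mean, var = permutation_moments(x, y, h=0.4)
print(mean, var, 1.0 / (len(x) - 1))  # mean ~ 0, var ~ 1/(n-1) = 0.2
```

The result holds for any bandwidth, because it depends only on the score vectors being non-constant.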
For the smoothed Wilcoxon signed-rank and rank-sum tests, the Pitman asymptotic relative efficiency (ARE) with respect to their classical analogues is $1$; the two statistics are asymptotically equivalent, for example $(\tilde T - T)/\sqrt{\operatorname{Var}(T)} \xrightarrow{p} 0$. Refined Edgeworth expansions with higher-order remainder terms are available, leading to highly accurate $p$-value approximations even for moderate sample sizes (Maesono et al., 2016, Moriyama et al., 2017).
5. Handling of Ties and Robustness to Data Discreteness
Smoothed Wilcoxon rank scores handle ties automatically through the kernel function. If $X_i = X_j$, then $\tilde R_i = \tilde R_j$, so both observations receive the same real-valued smoothed rank without resorting to ad-hoc averaging or random tie-breaking. This avoids the small bias that such tie-handling rules introduce in classical rank-based methods (Tasdan et al., 12 Nov 2025). In the two-sample setting, smoothing removes the gaps in attainable $p$-values caused by the discreteness of the rank-sum statistic, yielding continuous $p$-values with accurate calibration (Moriyama et al., 2017).
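The following sketch, assuming a Gaussian kernel CDF and hypothetical helper names, shows the tie behavior concretely. Tied observations receive identical smoothed ranks for any $h$, and as $h \to 0$ each tied pair contributes $K(0) = 1/2$, so the smoothed ranks converge to the classical midranks shifted by $K(0) - 1 = -1/2$. The kernel reproduces the average-rank convention without any explicit tie handling.

```python
import math
from typing import List, Sequence

def gauss_cdf(u: float) -> float:
    """Standard normal CDF, assumed as the kernel CDF K."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))

def smoothed_ranks(sample: Sequence[float], h: float) -> List[float]:
    """R~_i = sum_j K((X_i - X_j) / h); tied values see identical sums."""
    return [sum(gauss_cdf((xi - xj) / h) for xj in sample) for xi in sample]

def midranks(sample: Sequence[float]) -> List[float]:
    """Classical average ranks: tied values share the mean of their positions."""
    s = sorted(sample)
    n = len(s)
    out = []
    for v in sample:
        first = s.index(v) + 1          # 1-based first position of v
        last = n - s[::-1].index(v)     # 1-based last position of v
        out.append((first + last) / 2.0)
    return out

tied = [1.0, 2.0, 2.0, 3.0, 2.0]
print(smoothed_ranks(tied, h=0.5))  # the three 2.0 entries share one rank
print(midranks(tied))               # [1.0, 3.0, 3.0, 5.0, 3.0]
```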
6. Implementation: Kernel and Bandwidth Choices
The choice of kernel and bandwidth is central for the practical performance of smoothed Wilcoxon procedures:
- The kernel should be symmetric and typically of higher order (e.g., fourth order) to remove leading bias terms in the Edgeworth expansion used for $p$-values.
- The bandwidth must satisfy $h \to 0$ and $nh \to \infty$ for asymptotic normality; typical choices, tuned for the refined Edgeworth expansions, are given in Maesono et al. (2016).
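As an illustration of a fourth-order choice (not necessarily the kernel used in the cited papers), the Gaussian-based density $k(u) = \tfrac{1}{2}(3 - u^2)\phi(u)$ has CDF $K(u) = \Phi(u) + \tfrac{1}{2}u\,\phi(u)$. The sketch checks symmetry and the moment conditions $\int k = 1$, $\int u^2 k = 0$, $\int u^4 k \ne 0$ with a midpoint rule; the helper names are hypothetical.

```python
import math
from typing import Callable

def phi(u: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def Phi(u: float) -> float:
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))

def k4(u: float) -> float:
    """Fourth-order Gaussian-based kernel density (3 - u^2) * phi(u) / 2."""
    return 0.5 * (3.0 - u * u) * phi(u)

def K4(u: float) -> float:
    """Integral of k4: Phi(u) + u * phi(u) / 2."""
    return Phi(u) + 0.5 * u * phi(u)

def moment(f: Callable[[float], float], p: int,
           lo: float = -12.0, hi: float = 12.0, steps: int = 24000) -> float:
    """Midpoint-rule approximation of the integral of u^p * f(u) over [lo, hi]."""
    du = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        u = lo + (i + 0.5) * du
        total += (u ** p) * f(u)
    return total * du

print(moment(k4, 0), moment(k4, 2), moment(k4, 4))  # ~1, ~0, ~-3
print(K4(2.5))  # exceeds 1: k4 is negative for |u| > sqrt(3)
```

The design trade-off is visible in the last line: because a fourth-order density takes negative values, $K$ is not monotone and can leave $[0, 1]$, so smoothed ranks built from it are no longer guaranteed to preserve the data ordering.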
A practical computation path:
- Construct the smoothed ecdf $\tilde F_n$ with kernel CDF $K$ and bandwidth $h$.
- Compute the smoothed ranks $\tilde R_i = n\tilde F_n(X_i)$ for all data points.
- For correlation: apply the Wilcoxon linear scores and form the inner-product estimator.
- For tests: compute the smoothed sum and studentize it using the limiting null variance.
- Approximate $p$-values with the normal (or Edgeworth-corrected) distribution (Tasdan et al., 12 Nov 2025, Maesono et al., 2016, Moriyama et al., 2017).
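The steps above can be sketched end to end for a test of independence based on the smoothed correlation estimator. This assumes a Gaussian kernel CDF, the Pearson form of the estimator on Wilcoxon scores, studentization by the exact null variance $1/(n-1)$ of that form, and a plain normal approximation without the Edgeworth correction; the function name and the bandwidths in the usage line are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

def gauss_cdf(u: float) -> float:
    """Standard normal CDF, assumed as the kernel CDF K."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))

def smoothed_ranks(sample: Sequence[float], h: float) -> List[float]:
    """Steps 1-2: n * F~_n(X_i) = sum_j K((X_i - X_j) / h)."""
    return [sum(gauss_cdf((xi - xj) / h) for xj in sample) for xi in sample]

def wilcoxon_scores(ranks: Sequence[float]) -> List[float]:
    """Step 3a: linear Wilcoxon scores sqrt(12) * (R / (n + 1) - 1/2)."""
    n = len(ranks)
    return [math.sqrt(12.0) * (r / (n + 1) - 0.5) for r in ranks]

def smoothed_correlation_test(x: Sequence[float], y: Sequence[float],
                              h_x: float, h_y: float) -> Tuple[float, float, float]:
    """Return (r, z, two-sided p) for a hypothetical smoothed Spearman-type test."""
    a = wilcoxon_scores(smoothed_ranks(x, h_x))
    b = wilcoxon_scores(smoothed_ranks(y, h_y))
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    # Step 3b: centered inner product, normalized to a correlation
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    r = sab / math.sqrt(saa * sbb)
    # Step 4: studentize with the null variance 1/(n - 1)
    z = r * math.sqrt(n - 1)
    # Step 5: two-sided normal approximation (no Edgeworth correction)
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return r, z, p

x = [1.2, 0.4, 2.2, 3.1, 1.8, 2.7, 0.9, 3.6]
y = [0.8, 0.1, 1.9, 2.5, 1.1, 2.9, 0.7, 3.3]
print(smoothed_correlation_test(x, y, h_x=0.3, h_y=0.3))
```

For small $n$ the plain normal approximation in step 5 is the weakest link; the Edgeworth-corrected version described above would replace only that line.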
7. Applications and Empirical Efficiency Gains
Simulation studies show that under strong monotone but non-Gaussian association, the smoothed Wilcoxon correlation estimator outperforms both Spearman's $r_S$ and Kendall's $\tau$, reducing MSE by $10$– while matching their performance under Gaussian data (Tasdan et al., 12 Nov 2025). In testing scenarios, the smoothed procedures attain empirical sizes close to the nominal significance level and avoid the biases seen with classical Wilcoxon tests. Under heavy-tailed alternatives, smoothed median-type tests can outperform the smoothed rank-sum test, whereas under light-tailed alternatives the smoothed Wilcoxon rank scores give the best power among the procedures compared (Maesono et al., 2016, Moriyama et al., 2017).
In summary, the smoothed Wilcoxon rank score constructions offer a principled nonparametric approach yielding continuous, tie-robust, asymptotically normal statistics. They preserve efficiency and distribution-free properties while resolving issues associated with data discreteness and bias in classical rank-based methods (Tasdan et al., 12 Nov 2025, Maesono et al., 2016, Moriyama et al., 2017).