
Smoothed Wilcoxon Rank Scores

Updated 19 November 2025
  • Smoothed Wilcoxon Rank Scores are nonparametric estimators that replace discrete rank indicators with kernel-smoothed functions to yield continuous, tie-robust statistics.
  • They enhance traditional Wilcoxon procedures by improving efficiency in correlation estimation and hypothesis testing under monotone, non-Gaussian associations.
  • Practical implementation hinges on optimal kernel and bandwidth choices to ensure asymptotic normality and accurate p-value approximations in small sample sizes.

Smoothed Wilcoxon rank scores are a family of nonparametric statistics and estimators in which the classical discrete rank indicators of Wilcoxon-type tests are replaced with smooth (kernel-based) functions of the data. The resulting statistics are continuous in the data, inherit the distribution-free properties of Wilcoxon procedures, and offer practical benefits in handling ties and in efficiency under monotone but non-Gaussian associations. The method has been developed in several directions, including robust correlation estimation, one-sample and two-sample location inference, and hypothesis testing, where it provides a high-accuracy approximation to the classical signed-rank and rank-sum procedures (Tasdan et al., 12 Nov 2025, Maesono et al., 2016, Moriyama et al., 2017).

1. Smoothed Empirical Cumulative Distribution Functions and Kernelized Ranks

The core step is the replacement of the empirical cumulative distribution function (ecdf) with a smoothed or “kernelized” ecdf. The classical ecdf for a sample $\{X_j\}_{j=1}^n$ is

$$F_n(x)=\frac{1}{n}\sum_{j=1}^n\mathbf{1}\{X_j\leq x\}.$$

The smoothed version substitutes the indicator with a continuous cumulative distribution function (CDF) $H$, typically a kernel CDF such as the standard normal:

$$\widetilde F_n(x)=\frac{1}{n}\sum_{j=1}^n H\left(\frac{x - X_j}{h}\right),$$

with bandwidth $h=h_n>0$ satisfying $h_n\to 0$, $n h_n\to\infty$, and $n h_n^4\to 0$ as $n\to\infty$. For each sample point $X_i$, the smoothed rank is then

$$\widetilde R_i = n\,\widetilde F_n(X_i) = \sum_{j=1}^n H\left(\frac{X_i - X_j}{h}\right).$$

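Letting $h\to 0$ recovers the classical integer-valued ranks, up to the constant offset contributed by the self-comparison term $H(0)$ (equal to $1/2$ for a symmetric kernel). The smoothing operation therefore produces real-valued, tie-robust ranks that approach classical ranks in the limit (Tasdan et al., 12 Nov 2025).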
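The smoothed-rank definition can be sketched in a few lines of Python. This is a minimal illustration, assuming the standard normal CDF as the kernel CDF $H$; the function names `normal_cdf` and `smoothed_ranks` are hypothetical and not taken from the cited papers.

```python
import math

def normal_cdf(u):
    """Standard normal CDF, used here as the kernel CDF H (an assumed choice)."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))

def smoothed_ranks(x, h):
    """Smoothed ranks R~_i = sum_j H((x_i - x_j) / h), including the j = i term."""
    return [sum(normal_cdf((xi - xj) / h) for xj in x) for xi in x]

# Tied observations (the two 1.2 values) receive identical smoothed ranks.
print(smoothed_ranks([0.3, 1.2, 1.2, 2.5], h=0.5))

# With a very small bandwidth and no ties, each value is close to the
# classical rank minus 1/2, because the self term contributes H(0) = 0.5.
print(smoothed_ranks([3.0, 1.0, 2.0], h=1e-6))
```

Because $H(u)+H(-u)=1$ for a symmetric kernel, the smoothed ranks computed this way always sum to $n^2/2$, which is a convenient sanity check on an implementation.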

2. Construction of Smoothed Wilcoxon Rank Scores and Correlation Estimators

The Wilcoxon linear score function for rank $r\in\{1,\dots,n\}$ is

$$a(r) = \sqrt{12}\left(\frac{r}{n+1} - \frac{1}{2}\right),$$

with analogous extension to the smoothed case:

$$a(\widetilde R_i) = \sqrt{12}\left(\frac{\widetilde R_i}{n+1} - \frac{1}{2}\right).$$

These scores are used to build generalized inner-product statistics. For estimating rank correlations, the smoothed Wilcoxon correlation estimator is

$$\widehat\rho_{sa} = \frac{1}{s_a} \sum_{i=1}^n a(\widetilde R_i^X)\,a(\widetilde R_i^Y), \qquad s_a = \frac{n(n-1)}{n+1},$$

which, after algebraic manipulation, is equivalent to the classical Spearman correlation but evaluated on smoothed ranks:

$$\widehat\rho_{sa} = \frac{\sum_{i=1}^n \left(\widetilde R_i^X - \frac{n+1}{2}\right)\left(\widetilde R_i^Y - \frac{n+1}{2}\right)}{n(n^2-1)/12}.$$

This approach can be interpreted as a "smoothed Spearman-type estimator" or a continuous extension of Wilcoxon’s statistic, handling ties and preserving the nonparametric spirit (Tasdan et al., 12 Nov 2025).
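The two forms of the estimator can be compared numerically. The sketch below implements both formulas exactly as stated in this section, assuming a standard normal kernel CDF and a single bandwidth `h` shared by both variables; all function names are hypothetical.

```python
import math

def normal_cdf(u):
    """Standard normal CDF, used as the kernel CDF H (an assumed choice)."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))

def smoothed_ranks(x, h):
    """Smoothed ranks R~_i = sum_j H((x_i - x_j) / h)."""
    return [sum(normal_cdf((xi - xj) / h) for xj in x) for xi in x]

def wilcoxon_score(r, n):
    """Wilcoxon linear score a(r) = sqrt(12) * (r / (n + 1) - 1/2)."""
    return math.sqrt(12.0) * (r / (n + 1) - 0.5)

def smoothed_wilcoxon_corr(x, y, h):
    """Inner-product form: (1 / s_a) * sum_i a(R~_i^X) * a(R~_i^Y)."""
    n = len(x)
    rx, ry = smoothed_ranks(x, h), smoothed_ranks(y, h)
    s_a = n * (n - 1) / (n + 1)
    return sum(wilcoxon_score(a, n) * wilcoxon_score(b, n)
               for a, b in zip(rx, ry)) / s_a

def spearman_form(x, y, h):
    """Spearman-type form: centred smoothed ranks over n(n^2 - 1) / 12."""
    n = len(x)
    c = (n + 1) / 2.0
    rx, ry = smoothed_ranks(x, h), smoothed_ranks(y, h)
    return sum((a - c) * (b - c) for a, b in zip(rx, ry)) / (n * (n * n - 1) / 12.0)

x = [0.1, 0.7, 1.5, 2.2, 3.9]
y = [1.0, 1.4, 2.9, 3.1, 5.0]
print(smoothed_wilcoxon_corr(x, y, h=0.5), spearman_form(x, y, h=0.5))
```

The two printed values agree up to floating-point rounding, which confirms the algebraic equivalence. One caveat about this sketch: with the formulas exactly as written here, the smoothed ranks average $n/2$ rather than $(n+1)/2$, so the value is not guaranteed to stay inside $[-1, 1]$; any additional centering convention used in the cited paper is not reproduced here.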

3. Smoothed Wilcoxon-Type Tests for One-Sample and Two-Sample Problems

In the one-sample signed-rank scenario, the smoothed Wilcoxon statistic for a sample $\{X_i\}$ symmetric about $0$ is

$$W_n^* = \sum_{1\le i<j\le n} K\left(\frac{X_i+X_j}{2h_n}\right) + \frac{1}{2}\sum_{i=1}^n K\left(\frac{2X_i}{2h_n}\right),$$

where $K(u)$ is a kernel CDF. Under the null, the mean and variance match those of the classical statistic up to $O(n h_n^2)$, and the leading order does not depend on the parent distribution (Maesono et al., 2016).
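As a rough illustration, the one-sample statistic can be computed directly from the displayed formula. The sketch assumes a standard normal kernel CDF for $K$; `smoothed_signed_rank` is a hypothetical name.

```python
import math

def normal_cdf(u):
    """Standard normal CDF, used as the kernel CDF K (an assumed choice)."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))

def smoothed_signed_rank(x, h):
    """W_n^* = sum_{i<j} K((x_i + x_j) / (2h)) + (1/2) * sum_i K(x_i / h)."""
    n = len(x)
    pair_sum = sum(normal_cdf((x[i] + x[j]) / (2.0 * h))
                   for i in range(n) for j in range(i + 1, n))
    # The diagonal term K(2 x_i / (2h)) simplifies to K(x_i / h).
    diag_sum = 0.5 * sum(normal_cdf(xi / h) for xi in x)
    return pair_sum + diag_sum

x = [0.4, -1.1, 2.3, 0.7]
w_pos = smoothed_signed_rank(x, h=0.5)
w_neg = smoothed_signed_rank([-xi for xi in x], h=0.5)
print(w_pos, w_neg, w_pos + w_neg)
```

Since $K(u)+K(-u)=1$ for a symmetric kernel, the statistic on the sample and on its sign-flipped copy sum to $n^2/2$. Under symmetry about $0$ the formula as written is therefore centered at $n^2/4$.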

For two-sample inference, the discrete sum in the Wilcoxon rank-sum statistic

$$W_2 = \sum_{i=1}^m\sum_{j=1}^n \mathbf{1}\{Y_j > X_i\}$$

is replaced with its smoothed analogue:

$$\widetilde W_2 = \sum_{i=1}^m\sum_{j=1}^n K\left(\frac{Y_j - X_i}{h}\right).$$

The key effect is that the statistic becomes real-valued, its distribution under the null is close to normality (enabling accurate normal approximation), and it avoids the lattice-related discreteness artifacts that distort $p$-values in small samples (Moriyama et al., 2017).
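A short sketch makes the relationship between the indicator sum and its smoothed analogue concrete. It assumes a standard normal kernel CDF for $K$; both function names are hypothetical.

```python
import math

def normal_cdf(u):
    """Standard normal CDF, used as the kernel CDF K (an assumed choice)."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))

def classical_rank_sum(x, y):
    """W_2 = number of pairs (i, j) with y_j > x_i."""
    return sum(1 for xi in x for yj in y if yj > xi)

def smoothed_rank_sum(x, y, h):
    """W~_2 = sum_i sum_j K((y_j - x_i) / h)."""
    return sum(normal_cdf((yj - xi) / h) for xi in x for yj in y)

x = [1.0, 2.0, 3.0]
y = [2.5, 4.0]
print(classical_rank_sum(x, y))          # integer-valued count
print(smoothed_rank_sum(x, y, h=0.5))    # real-valued analogue
print(smoothed_rank_sum(x, y, h=1e-6))   # approaches the count as h shrinks
```

With a moderate bandwidth, pairs whose values are close contribute fractional amounts rather than 0 or 1, which is what removes the lattice structure of the classical statistic.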

4. Asymptotic Properties and Efficiency

Across all smoothed Wilcoxon variants, asymptotic expectations and variances under the null hypothesis are free of the underlying distribution to first order. For the smoothed Spearman-type rank correlation estimator $\widehat\rho_{sa}$:

  • Under independence, $E[\widehat\rho_{sa}] = 0$ and $\operatorname{Var}[\widehat\rho_{sa}]\sim 1/(n-1)$.
  • More generally, for a fixed value of the true association parameter $\rho$, a CLT holds: $\sqrt{n}\left(\widehat\rho_{sa} - \rho\right) \rightsquigarrow N\left(0,\,\sigma_{sa}^2(\rho)\right)$.
  • The asymptotic variance for Wilcoxon linear scores is strictly smaller than for classical Spearman’s $\rho$ under many monotonic but non-Gaussian settings; simulated MSE reductions of $10$–$50\%$ are reported (Tasdan et al., 12 Nov 2025).

For the smoothed Wilcoxon signed-rank and rank-sum tests, the Pitman asymptotic relative efficiency (ARE) with respect to their classical analogues is $1$; the two standardized statistics are asymptotically equivalent: $Z_n^* - Z_n \stackrel{p}{\longrightarrow} 0$. Refined Edgeworth expansions with remainder $o(n^{-1})$ are available, leading to highly accurate $p$-value approximations even for moderate sample sizes (Maesono et al., 2016, Moriyama et al., 2017).

5. Handling of Ties and Robustness to Data Discreteness

Smoothed Wilcoxon rank scores automatically handle ties via the kernel function. If $X_i = X_j$, the corresponding kernel term is $H(0)=1/2$ for a symmetric kernel, so both observations receive the same, non-integer smoothed rank without resorting to ad hoc averaging or random tie-breaking. This feature eliminates the small bias present in classical rank-based methods (Tasdan et al., 12 Nov 2025). In the two-sample context, smoothing removes gaps in attainable $p$-values caused by the discreteness of the rank-sum statistic, yielding continuous $p$-values with accurate calibration (Moriyama et al., 2017).

6. Implementation: Kernel and Bandwidth Choices

The choice of kernel and bandwidth is central for the practical performance of smoothed Wilcoxon procedures:

  • The kernel $k$ should be symmetric, typically of higher order (e.g., 4th-order) to eliminate $O(n^{-1/2})$ bias in the Edgeworth expansion for $p$-values.
  • Bandwidth $h$ must satisfy $h_n\to 0$, $n h_n\to\infty$ for asymptotic normality; typical choices include $h_n = n^{-1/4}$, $h_n = n^{-1/3}$, or $h_n = n^{-1/3}(\log n)^{-1}$ for refined Edgeworth expansions (Maesono et al., 2016).
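To get a feel for the bandwidth rates listed above, the following sketch tabulates the three candidates for a few sample sizes, together with the quantities $n h_n$ and $n h_n^4$ that appear in the rate conditions. The helper name `bandwidth_candidates` and the dictionary keys are hypothetical, and the rates are applied without any data-dependent scale factor.

```python
import math

def bandwidth_candidates(n):
    """Return the three bandwidth rates named in the text for sample size n > 1."""
    return {
        "n^(-1/4)": n ** (-1.0 / 4.0),
        "n^(-1/3)": n ** (-1.0 / 3.0),
        "n^(-1/3)/log(n)": n ** (-1.0 / 3.0) / math.log(n),
    }

for n in (20, 200, 2000):
    for name, h in bandwidth_candidates(n).items():
        # n*h should grow with n; n*h^4 should shrink for the stricter condition.
        print(f"n={n:5d}  {name:16s} h={h:.4f}  n*h={n * h:9.2f}  n*h^4={n * h ** 4:.4f}")
```

All three rates satisfy $h_n\to 0$ and $n h_n\to\infty$. The rate $n^{-1/4}$ keeps $n h_n^4$ fixed at $1$, so it sits on the boundary of the stricter condition $n h_n^4\to 0$ stated for the smoothed ecdf, whereas the two $n^{-1/3}$-based rates satisfy it.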

A practical computation path:

  1. Construct the smoothed ecdf $\widetilde F_n(x)$ with kernel $H$ and bandwidth $h$.
  2. Compute smoothed ranks $\widetilde R_i$ for all data points.
  3. For correlation: apply Wilcoxon linear scores and form the inner-product estimator.
  4. For tests: compute the smoothed sum and studentize according to the limiting variance.
  5. Approximate $p$-values using the normal (or Edgeworth-corrected) distribution (Tasdan et al., 12 Nov 2025, Maesono et al., 2016, Moriyama et al., 2017).
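For the testing branch (steps 4 and 5), a minimal two-sample sketch might look as follows. It makes several assumptions that are not taken from the cited papers: a standard normal kernel CDF, standardization by the classical rank-sum null mean $mn/2$ and variance $mn(m+n+1)/12$ as a first-order stand-in for the limiting variance, and a plain normal approximation with no Edgeworth correction. The function name `smoothed_rank_sum_test` is hypothetical.

```python
import math

def normal_cdf(u):
    """Standard normal CDF, used both as kernel CDF and as reference distribution."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))

def smoothed_rank_sum_test(x, y, h):
    """Return (statistic, z-score, two-sided p-value) for the smoothed rank-sum."""
    m, n = len(x), len(y)
    w = sum(normal_cdf((yj - xi) / h) for xi in x for yj in y)
    mean0 = m * n / 2.0                    # exact null mean for a symmetric kernel
    var0 = m * n * (m + n + 1) / 12.0      # ASSUMPTION: classical null variance
    z = (w - mean0) / math.sqrt(var0)
    p_value = 2.0 * (1.0 - normal_cdf(abs(z)))  # plain normal approximation
    return w, z, p_value

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [11.0, 12.0, 13.0, 14.0, 15.0]
print(smoothed_rank_sum_test(x, y, h=0.5))
```

A refined implementation would replace the classical variance with the bandwidth-dependent variance and the normal tail with the Edgeworth-corrected approximation described in the cited work; neither is attempted here.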

7. Applications and Empirical Efficiency Gains

Simulation studies document that under data with strong monotone but non-Gaussian association, the smoothed Wilcoxon correlation estimator outperforms both classical Spearman $\rho$ and Kendall $\tau$, reducing MSE by $10$–$50\%$ while matching performance under Gaussian data (Tasdan et al., 12 Nov 2025). In testing scenarios, smoothed procedures exhibit empirical size close to nominal significance levels and avoid biases seen with classical Wilcoxon tests. Under heavy-tailed alternatives, smoothed medians can outperform the smoothed rank-sum, while under light-tailed alternatives, the smoothed Wilcoxon rank scores exhibit optimal power (Maesono et al., 2016, Moriyama et al., 2017).

In summary, the smoothed Wilcoxon rank score constructions offer a principled nonparametric approach yielding continuous, tie-robust, asymptotically normal statistics. They preserve efficiency and distribution-free properties while resolving issues associated with data discreteness and bias in classical rank-based methods (Tasdan et al., 12 Nov 2025, Maesono et al., 2016, Moriyama et al., 2017).
