A consistent test of independence based on a sign covariance related to Kendall's tau

Published 24 Jul 2010 in math.ST and stat.ME | (1007.4259v6)

Abstract: The most popular ways to test for independence of two ordinal random variables are by means of Kendall's tau and Spearman's rho. However, such tests are not consistent, only having power for alternatives with "monotonic" association. In this paper, we introduce a natural extension of Kendall's tau, called $\tau^*$, which is non-negative and zero if and only if independence holds, thus leading to a consistent independence test. Furthermore, normalization gives a rank correlation which can be used as a measure of dependence, taking values between zero and one. A comparison with alternative measures of dependence for ordinal random variables is given, and it is shown that, in a well-defined sense, $\tau^*$ is the simplest, similarly to Kendall's tau being the simplest of ordinal measures of monotone association. Simulation studies show our test compares well with the alternatives in terms of average $p$-values.

Authors (2)

Summary

The paper introduces τ*, a sign-based covariance measure that is zero if and only if independence holds, ensuring test consistency.
It is built from sign patterns over quadruples of observations and is used with a Monte Carlo permutation test, giving a distribution-free procedure; random subsampling of quadruples keeps the computation manageable.
Empirical evaluations demonstrate that τ* outperforms traditional measures like Kendall's tau and Spearman's rho, particularly under non-monotonic dependence.

Introduction and Motivation

Tests of independence for ordinal random variables based on Kendall's tau and Spearman's rho have been standard due to their robustness and ease of implementation. However, these tests are not consistent: both coefficients can equal zero when the variables are dependent, so a value of zero does not imply independence. As a result, the tests only have power against alternatives with monotonic association and can miss non-monotonic or otherwise complex dependence.

This work introduces $\tau^*$, a new symmetric, sign-based covariance statistic, as a natural extension of Kendall's tau. It is provably non-negative and is zero if and only if independence holds, which addresses the central deficiency of traditional rank correlation measures and directly enables the construction of a consistent test of independence for ordinal data.

Definition and Formal Properties

Given i.i.d. pairs $(X_1,Y_1),\ldots,(X_n,Y_n)$ with $n \ge 4$, the population-level quantity is defined through four of these pairs as: $\tau^* = E \bigl[ a(X_1, X_2, X_3, X_4) \, a(Y_1, Y_2, Y_3, Y_4) \bigr]$ where the sign function $a$ is

$a(z_1, z_2, z_3, z_4) = \operatorname{sign}\bigl( |z_1-z_2| + |z_3-z_4| - |z_1-z_3| - |z_2-z_4| \bigr).$

The empirical statistic $t^*$ replaces the population expectation with an average over all quadruples of observations in the sample.
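
As a minimal sketch of the definitions above, the kernel and a brute-force $t^*$ could be written as follows. It assumes the empirical statistic is taken as a V-statistic, that is, an average over all $n^4$ ordered index quadruples with repeats allowed; a U-statistic over distinct indices is an equally plausible reading. The function names `sign_kernel` and `t_star_naive` are illustrative and do not come from the paper.

```python
from itertools import product


def sign_kernel(z1, z2, z3, z4):
    """Sign function a(z1, z2, z3, z4) from the definition above."""
    s = abs(z1 - z2) + abs(z3 - z4) - abs(z1 - z3) - abs(z2 - z4)
    return (s > 0) - (s < 0)  # sign as -1, 0 or +1


def t_star_naive(x, y):
    """Brute-force t*: average of a(x...) * a(y...) over all ordered quadruples.

    Assumption: V-statistic form (indices may repeat). The cost is O(n^4),
    so this is only suitable for very small samples.
    """
    n = len(x)
    if n != len(y):
        raise ValueError("x and y must have the same length")
    total = 0
    for i, j, k, l in product(range(n), repeat=4):
        ax = sign_kernel(x[i], x[j], x[k], x[l])
        ay = sign_kernel(y[i], y[j], y[k], y[l])
        total += ax * ay
    return total / n**4


# Tiny non-monotonic example: y = x^2 on a symmetric grid.
x = [-1, 0, 1]
y = [1, 0, 1]
print(t_star_naive(x, y))  # positive, although the relationship is not monotone
```

Under this V-statistic reading, $t^*$ equals $\tau^*$ evaluated at the empirical distribution, so it is itself non-negative.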

Main theorem: For all (possibly mixed discrete/continuous) bivariate distributions, $\tau^* \ge 0$, with equality if and only if $X$ and $Y$ are independent. This establishes $\tau^*$ as a bona fide measure of dependence for ordinal data and makes the independence test based on it consistent.

A normalized version, $\tau_b^* = \tau^*(X,Y)/\sqrt{\tau^*(X,X)\tau^*(Y,Y)}$, defined analogously to Kendall's tau-$b$, takes values between zero and one and can be used as a rank correlation.

Probabilistic Interpretation

Like Kendall's tau, $\tau^*$ admits an interpretation in terms of probabilities of concordant and discordant sub-configurations within quadruples of points: $\tau^* = \frac{2\Pi_{C_4} - \Pi_{D_4}}{3}$ where $\Pi_{C_4}$ and $\Pi_{D_4}$ denote the probabilities that four points are in concordant or discordant configuration, by a certain rigorous generalization of pairwise notions to quadruples. This connects $\tau^*$ probabilistically to symmetry and dependence structure in the joint distribution.

Comparison to Existing Dependence Measures

Inconsistency of Traditional Measures

Kendall's tau and Spearman's rho correlate sign functions over pairs or triples, but can be zero despite dependence if association is non-monotonic. Consequently, tests based on these are not consistent against all alternatives.
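
A small illustration of this failure mode, assuming a hand-rolled tau-a (the helper name `kendall_tau_a` is hypothetical): here $y$ is a deterministic function of $x$, yet the concordant and discordant pairs cancel exactly.

```python
def kendall_tau_a(x, y):
    """Kendall's tau-a: mean of sign(x_i - x_j) * sign(y_i - y_j) over pairs i < j."""

    def sign(v):
        return (v > 0) - (v < 0)

    n = len(x)
    total = sum(
        sign(x[i] - x[j]) * sign(y[i] - y[j])
        for i in range(n)
        for j in range(i + 1, n)
    )
    return total / (n * (n - 1) / 2)


# y is fully determined by x, but the association is U-shaped, not monotone.
x = [-2, -1, 0, 1, 2]
y = [v * v for v in x]
print(kendall_tau_a(x, y))  # 0.0
```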

Other consistent rank-based dependence measures in the literature include:

Hoeffding's $H$ : Based on squared differences between joint and product marginal cdfs, non-negative and zero for independence, but unfortunately, $H$ can also vanish under dependence for discrete variables.
Blum-Kiefer-Rosenblatt $D$: Similar to $H$, but the squared difference is integrated with respect to the product of the marginal distributions rather than the joint distribution; $D$ is provably zero if and only if independence holds.
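
For orientation, the two coefficients are commonly written as below, with $F_{12}$ the joint cdf and $F_1$, $F_2$ the marginal cdfs. This notation is a standard formulation assumed here, not quoted from the text above.

$$H = \int \bigl(F_{12}(x,y) - F_1(x)F_2(y)\bigr)^2 \, dF_{12}(x,y), \qquad D = \int \bigl(F_{12}(x,y) - F_1(x)F_2(y)\bigr)^2 \, dF_1(x)\, dF_2(y).$$

A worked case shows how $H$ can vanish under dependence. Take $P(X=0,Y=1) = P(X=1,Y=0) = 1/2$. At the two support points, $F_{12}(0,1) = 1/2 = F_1(0)F_2(1)$ and $F_{12}(1,0) = 1/2 = F_1(1)F_2(0)$, so the integrand is zero wherever the joint distribution has mass and $H = 0$. The product measure also charges $(0,0)$, where $F_{12}(0,0) = 0$ but $F_1(0)F_2(0) = 1/4$, giving $D = (1/4)^2 \cdot (1/4) = 1/64 > 0$.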

Simplicity and Ordinality

$\tau^*$ is shown to be a function of the copula (and thus invariant under strictly increasing marginal transforms). Unlike $D$ and $H$ , which build on higher-order configurations or require complex probabilistic setups (five points for $H$ ), $\tau^*$ is computed from quadruples and only uses sign and absolute value calculations, making it algebraically and statistically simpler while retaining full sensitivity to arbitrary dependence.

Relation to Distance Covariance

The distance covariance (Székely-Rizzo-Bakirov, Gretton et al.) is a related, consistent measure, based on Euclidean distances rather than ranks; its sign-based version is $\tau^*$ . However, the theoretical machinery (kernel-based interpretations and positive definite forms) that powers distance covariance does not transfer to $\tau^*$ . The proof of its key properties is thus substantially more intricate.
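
The connection can be made explicit with a short calculation. This is a sketch for univariate $X$ and $Y$ with finite first moments, and the symbol $h$ is introduced here for illustration. Let

$$h(z_1,z_2,z_3,z_4) = |z_1-z_2| + |z_3-z_4| - |z_1-z_3| - |z_2-z_4|, \qquad a = \operatorname{sign}(h).$$

Expanding the product term by term and grouping the sixteen expectations by how many indices the two distances share gives

$$E\bigl[h(X_1,\ldots,X_4)\,h(Y_1,\ldots,Y_4)\bigr] = 4\Bigl(E|X_1-X_2||Y_1-Y_2| + E|X_1-X_2|\,E|Y_1-Y_2| - 2\,E|X_1-X_2||Y_1-Y_3|\Bigr),$$

which is four times the squared distance covariance. Replacing $h$ by its sign in both factors yields $\tau^*$.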

Computational and Practical Aspects

A Monte Carlo permutation test using $t^*$ provides a practical method for inference. Despite the quartic computational complexity in $n$ , random subsampling of quadruples offers an efficient approximation with strong empirical performance. The test is exact under the permutation distribution, conditioning on the observed marginals (as is standard in rank statistics).
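
The procedure described above might be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: quadruples are drawn uniformly with replacement, the same quadruples are reused for every permutation, and the names `sign_kernel`, `permutation_pvalue`, `n_quads` and `n_perm` are hypothetical.

```python
import numpy as np


def sign_kernel(z):
    """Vectorised sign function a, applied to each row (z1, z2, z3, z4) of z."""
    return np.sign(
        np.abs(z[:, 0] - z[:, 1]) + np.abs(z[:, 2] - z[:, 3])
        - np.abs(z[:, 0] - z[:, 2]) - np.abs(z[:, 1] - z[:, 3])
    )


def permutation_pvalue(x, y, n_quads=2000, n_perm=199, seed=0):
    """Monte Carlo permutation p-value for a subsampled approximation of t*."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.shape[0]
    # Assumption: random index quadruples, drawn with replacement and held fixed.
    idx = rng.integers(0, n, size=(n_quads, 4))
    ax = sign_kernel(x[idx])  # x-side kernel values do not change under permutation
    observed = np.mean(ax * sign_kernel(y[idx]))
    exceed = 0
    for _ in range(n_perm):
        y_perm = rng.permutation(y)  # break any dependence between x and y
        if np.mean(ax * sign_kernel(y_perm[idx])) >= observed:
            exceed += 1
    # The +1 terms count the observed statistic as one of the permutations.
    return (1 + exceed) / (1 + n_perm)


# Usage on synthetic U-shaped data (made up purely for illustration).
rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, size=60)
y = x**2 + 0.05 * rng.normal(size=60)
print(permutation_pvalue(x, y))
```

Holding the sampled quadruples fixed makes the approximate statistic a fixed function of the data, so the permutation argument still applies to it; the cost per permutation is linear in `n_quads` rather than quartic in $n$.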

Empirical Evaluation

Simulations indicate that the test based on $t^*$ compares well with tests based on $D$ and $H$ in terms of average $p$-values, especially in settings with complex or non-monotonic dependence structures. In examples (e.g., artificial contingency tables and real data from clinical studies), the $\tau^*$-based test yields significant $p$-values where Kendall's tau and Pearson's $\chi^2$ fail. Average $p$-value analyses across synthetic alternatives ("zig-zag", "cross", etc.) indicate that $\tau^*$ either matches or outperforms existing consistent ordinal measures, except in settings that specifically favor $H$.

Notably, Hoeffding's test can perform very poorly under certain alternatives (e.g., "parallel lines"), while $D$ and $t^*$ maintain relatively stable power.

Theoretical and Practical Implications

The introduction of $\tau^*$ provides the statistical community with a consistent, interpretable, and conceptually simple test of independence for ordinal data. The fact that $\tau^*$ is a copula-based sign covariance expands the toolbox for dependence assessment in nonparametric statistics and aligns with the trend of distribution-free, robust inference methods.

Practically, $\tau^*$ enables reliable independence testing in situations where non-monotonic association or complex marginal structures would defeat traditional rank-based techniques, common in contemporary social and biomedical studies with mixed or low-resolution ordinal data.

Theoretically, the elegant yet non-trivial characterization of $\tau^*$ 's null properties may prompt future work on kernel-free, combinatorial dependence measures and deeper connections between rank statistics and copula theory in higher dimensions.

Conclusion

The $\tau^*$ statistic establishes the existence of a consistent, interpretable, and computationally tractable test for independence against arbitrary alternatives in the setting of ordinal data. This advances both theoretical understanding and practical methodology for nonparametric independence testing, closing a notable gap in the statistical canon for rank-based association analysis (1007.4259).