On two-sample testing for data with arbitrarily missing values (2403.15327v1)
Abstract: We develop a new rank-based approach for univariate two-sample testing in the presence of missing data which makes no assumptions about the missingness mechanism. This approach is a theoretical extension of the Wilcoxon-Mann-Whitney test that controls the Type I error by providing exact bounds for the test statistic after accounting for the number of missing values. Greater statistical power is shown when the method is extended to account for a bounded domain. Furthermore, exact bounds are provided on the proportions of data that can be missing in the two samples while yielding a significant result. Simulations demonstrate that our method has good power, typically for cases of $10\%$ to $20\%$ missing data, while standard imputation approaches fail to control the Type I error. We illustrate our method on complex clinical trial data in which patients' withdrawal from the trial leads to missing values.
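To make the idea of bounding the test statistic concrete, the following is a minimal sketch, not the paper's implementation. It assumes an unbounded domain, encodes missing values as `None`, and uses the Mann-Whitney count $U = \#\{(i,j) : x_i > y_j\}$ with ties counted as one half. The function name `u_statistic_bounds` is hypothetical. Every pair involving a missing value could fall either way, so the lower bound treats all such pairs as contributing 0 and the upper bound treats them as contributing 1.

```python
from typing import Optional, Sequence, Tuple


def u_statistic_bounds(
    x: Sequence[Optional[float]],
    y: Sequence[Optional[float]],
) -> Tuple[float, float]:
    """Bound U = #{(i, j): x_i > y_j} (+0.5 per tie) when some values are missing.

    Missing values are encoded as None (an assumption of this sketch).
    No assumption is made about why they are missing.
    """
    x_obs = [v for v in x if v is not None]
    y_obs = [v for v in y if v is not None]
    x_missing = len(x) - len(x_obs)
    y_missing = len(y) - len(y_obs)

    # Contribution of pairs in which both values were observed.
    u_obs = 0.0
    for a in x_obs:
        for b in y_obs:
            if a > b:
                u_obs += 1.0
            elif a == b:
                u_obs += 0.5

    # Pairs with at least one missing value: each missing x pairs with
    # every y, and each observed x pairs with every missing y.
    undetermined = x_missing * len(y) + len(x_obs) * y_missing

    # Lower bound: missing x below everything, missing y above everything.
    # Upper bound: the reverse assignment.
    return u_obs, u_obs + undetermined


if __name__ == "__main__":
    lower, upper = u_statistic_bounds([1.2, None, 3.4, 2.2], [0.5, 2.0, None])
    print(lower, upper)
```

A conservative decision rule in this spirit would reject the null hypothesis only if the statistic is significant at both ends of the interval. The width of the interval grows with the number of missing values, which is consistent with the abstract's remark that power is good mainly for modest missingness.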