A Complete Characterisation of Structured Missingness (2307.02650v1)

Published 5 Jul 2023 in stat.ME, stat.AP, and stat.ML

Abstract: Our capacity to process large complex data sources is ever-increasing, providing us with new, important applied research questions to address, such as how to handle missing values in large-scale databases. Mitra et al. (2023) noted the phenomenon of Structured Missingness (SM), which is where missingness has an underlying structure. Existing taxonomies for defining missingness mechanisms typically assume that variables' missingness indicator vectors $M_1, M_2, \ldots, M_p$ are independent after conditioning on the relevant portion of the data matrix $\mathbf{X}$. As this is often unsuitable for characterising SM in multivariate settings, we introduce a taxonomy for SM, where each $M_j$ can depend on $\mathbf{M}_{-j}$ (i.e., all missingness indicator vectors except $M_j$), in addition to $\mathbf{X}$. We embed this new framework within the well-established decomposition of mechanisms into MCAR, MAR, and MNAR (Rubin, 1976), allowing us to recast mechanisms into a broader setting, where we can consider the combined effect of $\mathbf{X}$ and $\mathbf{M}_{-j}$ on $M_j$. We also demonstrate, via simulations, the impact of SM on inference and prediction, and consider contextual instances of SM arising in a de-identified nationwide (US-based) clinico-genomic database (CGDB). We hope to stimulate interest in SM, and encourage timely research into this phenomenon.
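To make the distinction in the abstract concrete, here is a minimal sketch in Python (standard library only) of a two-variable case. The mechanism is an illustrative assumption, not the simulation design used in the paper. $M_1$ is MCAR with probability 0.3. $M_2$ depends on $M_1$ but not on $\mathbf{X}$. Each indicator on its own is independent of $\mathbf{X}$, so a taxonomy that treats indicators as conditionally independent would label both MCAR. Yet $M_2$ clearly depends on $\mathbf{M}_{-2} = M_1$. The function names `simulate_structured_missingness` and `conditional_rate` are hypothetical.

```python
import random


def simulate_structured_missingness(n, seed=0):
    """Draw n rows of (x1_obs, x2_obs, m1, m2) under an illustrative mechanism.

    Assumed mechanism (hypothetical, for illustration only):
      * X1 and X2 are independent standard normal variables.
      * M1 = 1 (X1 missing) with probability 0.3, regardless of X (MCAR).
      * M2 = 1 (X2 missing) with probability 0.8 if M1 = 1, else 0.05,
        so M2 depends on the other indicator M1 rather than on X.
    Missing entries are returned as None.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x1 = rng.gauss(0.0, 1.0)
        x2 = rng.gauss(0.0, 1.0)
        m1 = int(rng.random() < 0.3)
        # Structured missingness: the probability that X2 is missing
        # is driven by whether X1 is missing.
        m2 = int(rng.random() < (0.8 if m1 else 0.05))
        rows.append((None if m1 else x1, None if m2 else x2, m1, m2))
    return rows


def conditional_rate(rows, given_m1):
    """Empirical P(M2 = 1 | M1 = given_m1) over the simulated rows."""
    m2_values = [m2 for (_, _, m1, m2) in rows if m1 == given_m1]
    return sum(m2_values) / len(m2_values)


rows = simulate_structured_missingness(10_000)
print(f"P(M2=1 | M1=1) ~ {conditional_rate(rows, 1):.2f}")
print(f"P(M2=1 | M1=0) ~ {conditional_rate(rows, 0):.2f}")
```

With a sample this large, the two printed rates should land near the assumed 0.8 and 0.05. A gap of this kind cannot be expressed by a taxonomy that assumes the indicators are independent once $\mathbf{X}$ is conditioned on.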

References (34)
  1. Hoboken, NJ: John Wiley & Sons, 3rd edn.
  2. Statistical Science, 33, 160–183. URL: https://doi.org/10.1214/18-STS646.
  3. Annals of Oncology, 31, 1561–1568. URL: https://www.sciencedirect.com/science/article/pii/S0923753420399701.
  4. New York: The MIT Press.
  5. Breiman, L. (2001) Random Forests. Machine Learning, 45, 5–32.
  6. Boca Raton, FL: Chapman & Hall / CRC.
  7. Journal of Statistical Software, 45, 1–67.
  8. Chichester: John Wiley & Sons, 2nd edn.
  9. The Annals of Applied Statistics, 4, 266–298.
  10. Couper, M. (1998) Measuring survey quality in a CASIC environment. Proceedings of the Survey Research Methods Section of the ASA at JSM1998, 41–49.
  11. 10.
  12. Journal of the Royal Statistical Society Series C: Applied Statistics, 43, 49–73.
  13. The Lancet Oncology, 21, 271–282. URL: https://www.sciencedirect.com/science/article/pii/S1470204519306916.
  14. Bioinformatics, 35, 1278–1283.
  15. Enders, C. K. (2010) Applied missing data analysis. New York, NY: The Guilford Press.
  16. Nature Biotechnology, 31, 1023–1031.
  17. Journal of the American Statistical Association, 93, 846–857.
  18. Blood, The Journal of the American Society of Hematology, 127, 3004–3014.
  19. Wiley Interdisciplinary Reviews: Computational Statistics, e1626.
  20. Laird, N. M. (1988) Missing data in longitudinal studies. Statistics in Medicine, 7, 305–315.
  21. Journal of Computational and Graphical Statistics, 23, 877–892. URL: https://doi.org/10.1080/10618600.2013.826583.
  22. Nature Machine Intelligence, 5, 13–23.
  23. Journal of the American Statistical Association, 116, 1023–1037.
  24. Boca Raton, FL: CRC Press.
  25. Chichester: John Wiley & Sons.
  26. European Urology, 76, 831–842. URL: https://www.sciencedirect.com/science/article/pii/S0302283819306682.
  27. Survey Methodology, 27, 85–96.
  28. Rubin, D. (1987) Multiple Imputation for Nonresponse in Surveys. John Wiley & Sons.
  29. Rubin, D. B. (1976) Inference and missing data. Biometrika, 63, 581–592.
  30. The Annals of Applied Statistics, 6, 1814–1837. URL: https://doi.org/10.1214/12-AOAS555.
  31. Journal of Clinical Oncology, 35, 2514.
  32. BMJ Open, 5, e007450.
  33. Van Buuren, S. (2018) Flexible imputation of missing data. Boca Raton, FL: CRC Press, 2nd edn.
  34. PLoS ONE, 15, e0237802.
Citations (2)
