
Imputation of missing values in multi-view data (2210.14484v4)

Published 26 Oct 2022 in stat.ML, cs.LG, and stat.ME

Abstract: Data for which a set of objects is described by multiple distinct feature sets (called views) is known as multi-view data. When missing values occur in multi-view data, all features in a view are likely to be missing simultaneously. This may lead to very large quantities of missing data which, especially when combined with high-dimensionality, can make the application of conditional imputation methods computationally infeasible. However, the multi-view structure could be leveraged to reduce the complexity and computational load of imputation. We introduce a new imputation method based on the existing stacked penalized logistic regression (StaPLR) algorithm for multi-view learning. It performs imputation in a dimension-reduced space to address computational challenges inherent to the multi-view context. We compare the performance of the new imputation method with several existing imputation algorithms in simulated data sets and a real data application. The results show that the new imputation method leads to competitive results at a much lower computational cost, and makes the use of advanced imputation algorithms such as missForest and predictive mean matching possible in settings where they would otherwise be computationally infeasible.
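The computational argument in the abstract can be made concrete with a small counting sketch. It is not the authors' implementation. It assumes, as in stacked multi-view learning, that each view is summarized by a single view-level prediction per object, so that a missing view costs one missing value in the reduced space rather than one per feature. The view sizes, the missingness pattern, the prediction values, and the helper names `count_missing_cells` and `mean_impute_columns` are all hypothetical, and column-mean imputation is only a simple stand-in for the more advanced imputers the abstract mentions (such as missForest or predictive mean matching).

```python
from typing import List, Optional, Tuple


def count_missing_cells(
    view_sizes: List[int], view_missing: List[List[bool]]
) -> Tuple[int, int]:
    """Return (raw_cells, reduced_cells) that would need imputation.

    view_sizes[v] is the number of features in view v; view_missing[i][v]
    is True when view v is entirely missing for object i.
    """
    raw_cells = 0
    reduced_cells = 0
    for row in view_missing:
        for size, is_missing in zip(view_sizes, row):
            if is_missing:
                raw_cells += size    # every feature of the view is missing
                reduced_cells += 1   # only one view-level prediction is missing
    return raw_cells, reduced_cells


def mean_impute_columns(z: List[List[Optional[float]]]) -> List[List[float]]:
    """Fill None entries with the mean of the observed values in that column.

    Stand-in imputer only; assumes every column has at least one observed value.
    """
    filled = [list(row) for row in z]
    for j in range(len(z[0])):
        observed = [row[j] for row in z if row[j] is not None]
        col_mean = sum(observed) / len(observed)
        for row in filled:
            if row[j] is None:
                row[j] = col_mean
    return filled


# Hypothetical setting: 4 objects, 3 views with 1000, 5000 and 200 features.
view_sizes = [1000, 5000, 200]
view_missing = [
    [False, False, False],
    [False, True, False],
    [True, True, False],
    [False, False, True],
]
raw_cells, reduced_cells = count_missing_cells(view_sizes, view_missing)

# Hypothetical view-level predictions (one column per view); None marks a
# prediction that is unavailable because the whole view is missing.
z = [
    [0.9, 0.8, 0.7],
    [0.2, None, 0.1],
    [None, None, 0.6],
    [0.4, 0.3, None],
]
z_filled = mean_impute_columns(z)

print(raw_cells, reduced_cells)
```

In this toy setting the imputer has to fill 4 values in the reduced space instead of 11200 values in the original feature space, which is the kind of saving that could make computationally heavier imputation algorithms practical.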
