Imputation of missing values in multi-view data (2210.14484v4)
Abstract: Data in which a set of objects is described by multiple distinct feature sets (called views) is known as multi-view data. When missing values occur in multi-view data, all features in a view are likely to be missing simultaneously. This can lead to very large quantities of missing data, which, especially when combined with high dimensionality, can make conditional imputation methods computationally infeasible. However, the multi-view structure can be leveraged to reduce the complexity and computational load of imputation. We introduce a new imputation method based on the existing stacked penalized logistic regression (StaPLR) algorithm for multi-view learning. It performs imputation in a dimension-reduced space to address the computational challenges inherent to the multi-view context. We compare the performance of the new imputation method with several existing imputation algorithms on simulated data sets and in a real data application. The results show that the new imputation method achieves competitive results at a much lower computational cost, and makes it possible to use advanced imputation algorithms such as missForest and predictive mean matching in settings where they would otherwise be computationally infeasible.
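The idea of imputing in a dimension-reduced space can be illustrated with a minimal sketch. It is not the authors' implementation, and every function name, parameter, and data set in it is hypothetical. It assumes that each view is first reduced to a single column of cross-validated predictions from a penalized logistic regression, that a missing view then corresponds to one missing entry in that column, and that any imputation method can be applied to the resulting small matrix. Here a plain ridge-penalized logistic regression fitted by gradient descent stands in for the base learner, and column-mean imputation stands in for more advanced imputers such as missForest or predictive mean matching.

```python
import numpy as np

def fit_ridge_logistic(X, y, lam=1.0, lr=0.1, n_iter=500):
    """Fit an L2-penalized logistic regression by gradient descent."""
    n, p = X.shape
    w, b = np.zeros(p), 0.0
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        w -= lr * (X.T @ (prob - y) / n + lam * w / n)
        b -= lr * np.mean(prob - y)
    return w, b

def predict_proba(X, w, b):
    """Predicted probability of class 1."""
    return 1.0 / (1.0 + np.exp(-(X @ w + b)))

def cv_view_predictions(X, y, n_folds=5, seed=0):
    """Out-of-fold predicted probabilities for a single view."""
    rng = np.random.default_rng(seed)
    folds = rng.permutation(len(y)) % n_folds  # balanced random fold labels
    z = np.empty(len(y))
    for k in range(n_folds):
        test = folds == k
        w, b = fit_ridge_logistic(X[~test], y[~test])
        z[test] = predict_proba(X[test], w, b)
    return z

def reduced_space_imputation(views, y):
    """views: list of (n, p_v) arrays; a NaN row marks a missing view."""
    Z = np.full((len(y), len(views)), np.nan)
    for v, X in enumerate(views):
        observed = ~np.isnan(X).any(axis=1)
        # Train the base learner only on objects for which this view exists.
        Z[observed, v] = cv_view_predictions(X[observed], y[observed])
    # Assumption: column-mean imputation as a stand-in for any imputer.
    Z_imputed = np.where(np.isnan(Z), np.nanmean(Z, axis=0), Z)
    # Unpenalized meta-learner on the imputed n-by-V matrix (a simplification).
    meta = fit_ridge_logistic(Z_imputed, y, lam=0.0)
    return Z, Z_imputed, meta

# Simulated example: 3 views of 50 features each, 120 objects.
rng = np.random.default_rng(1)
n = 120
y = rng.integers(0, 2, n).astype(float)
views = [rng.normal(size=(n, 50)) + 0.5 * y[:, None] for _ in range(3)]
views[1][:30] = np.nan  # the second view is missing for the first 30 objects

Z, Z_imp, (w_meta, b_meta) = reduced_space_imputation(views, y)
print("missing values in the raw features:", int(np.isnan(views[1]).sum()))
print("missing values in the reduced space:", int(np.isnan(Z).sum()))
```

In this sketch, the imputer only ever sees a matrix with one column per view, so its cost depends on the number of views rather than on the number of features. The meta-learner here is an unconstrained logistic regression chosen for brevity; other choices are possible.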