Beyond Random Missingness: Clinically Rethinking for Healthcare Time Series Imputation (2405.17508v3)
Abstract: This study investigates the impact of masking strategies on time series imputation models in healthcare settings. While current approaches predominantly rely on random masking for model evaluation, this practice fails to capture the structured nature of missing patterns in clinical data. Using the PhysioNet Challenge 2012 dataset, we analyse how different masking implementations affect both imputation accuracy and downstream clinical predictions across eleven imputation methods. Our results demonstrate that masking choices significantly influence model performance, while recurrent architectures show more consistent performance across strategies. Analysis of downstream mortality prediction reveals that imputation accuracy doesn't necessarily translate to optimal clinical prediction capabilities. Our findings emphasise the need for clinically-informed masking strategies that better reflect real-world missing patterns in healthcare data, suggesting current evaluation frameworks may need reconsideration for reliable clinical deployment.
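The contrast the abstract draws between random masking and structured missingness can be made concrete with a small sketch. The code below is an illustration only, not the paper's implementation: the helper names (`random_point_mask`, `block_mask`, `longest_gap`), the 48-step by 3-feature shape, and the masking parameters are all assumptions chosen for the example. It shows that independently hiding single observations and hiding one contiguous run per feature can leave very different gap structures in the data.

```python
import random


def random_point_mask(n_steps, n_features, rate, seed=0):
    """Hide each observation independently with probability `rate`.

    Hypothetical helper: True marks a value hidden from the imputer.
    """
    rng = random.Random(seed)
    return [[rng.random() < rate for _ in range(n_features)]
            for _ in range(n_steps)]


def block_mask(n_steps, n_features, block_len, seed=0):
    """Hide one contiguous run of `block_len` time steps per feature.

    Hypothetical helper: a crude stand-in for structured gaps, such as
    a measurement that is not recorded for several consecutive hours.
    """
    rng = random.Random(seed)
    mask = [[False] * n_features for _ in range(n_steps)]
    for f in range(n_features):
        start = rng.randint(0, n_steps - block_len)  # inclusive bounds
        for t in range(start, start + block_len):
            mask[t][f] = True
    return mask


def longest_gap(mask, feature):
    """Length of the longest run of consecutive hidden steps for a feature."""
    best = run = 0
    for row in mask:
        run = run + 1 if row[feature] else 0
        best = max(best, run)
    return best


if __name__ == "__main__":
    point = random_point_mask(n_steps=48, n_features=3, rate=0.1)
    block = block_mask(n_steps=48, n_features=3, block_len=8)
    for f in range(3):
        print(f"feature {f}: longest gap random={longest_gap(point, f)}, "
              f"block={longest_gap(block, f)}")
```

With block masking, every feature has a gap of exactly `block_len` steps, so an imputer evaluated this way must bridge long stretches without nearby observations. Random point masking usually leaves observed neighbours on both sides of a hidden value.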