CausalConceptTS: Causal Attributions for Time Series Classification using High Fidelity Diffusion Models (2405.15871v1)
Abstract: Despite the excellent performance of machine learning models, understanding their decisions remains a long-standing goal. While commonly used attribution methods in explainable AI attempt to address this issue, they typically rely on associational rather than causal relationships. In this study, within the context of time series classification, we introduce a novel framework to assess the causal effect of concepts, i.e., predefined segments within a time series, on specific classification outcomes. To achieve this, we leverage state-of-the-art diffusion-based generative models to estimate counterfactual outcomes. We compare these causal attributions with closely related associational attributions, both theoretically and empirically. We demonstrate the insights gained by our approach on a diverse set of qualitatively different time series classification tasks. Although causal and associational attributions often share some similarities, in all cases they differ in important details, underscoring the risks of drawing causal conclusions from associational data alone. We believe that the proposed approach is also widely applicable in other domains, particularly where predefined segmentations are available, and can shed light on the limits of associational attributions.
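The idea of scoring a concept by comparing the classifier output on the observed series with its output on counterfactual series can be illustrated with a small toy sketch. This is not the paper's estimator: `toy_classifier`, `toy_imputer`, and `concept_effect` are hypothetical names introduced here, and the Gaussian imputer is only an assumed stand-in for the diffusion-based generative model mentioned in the abstract.

```python
import numpy as np


def toy_classifier(x):
    """Hypothetical classifier: class-1 probability from the mean of samples 20..39."""
    z = x[..., 20:40].mean(axis=-1)
    return 1.0 / (1.0 + np.exp(-4.0 * z))


def toy_imputer(x, mask, n_samples, rng):
    """Assumed stand-in for a conditional generative model.

    Keeps the unmasked samples of x and refills the masked segment with
    Gaussian noise centred on the mean of the unmasked part.
    """
    samples = np.repeat(x[None, :], n_samples, axis=0)
    fill_mean = x[~mask].mean()
    samples[:, mask] = rng.normal(fill_mean, 0.1, size=(n_samples, int(mask.sum())))
    return samples


def concept_effect(x, mask, classifier, imputer, n_samples=200, seed=0):
    """Observed output minus the mean output over resampled counterfactuals."""
    rng = np.random.default_rng(seed)
    counterfactuals = imputer(x, mask, n_samples, rng)
    return float(classifier(x) - classifier(counterfactuals).mean())


# Toy series of length 60 with a plateau in the segment the classifier reads.
x = np.zeros(60)
x[20:40] = 1.0

relevant = np.zeros(60, dtype=bool)
relevant[20:40] = True      # concept covering the plateau
irrelevant = np.zeros(60, dtype=bool)
irrelevant[40:60] = True    # concept the classifier never looks at

effect_relevant = concept_effect(x, relevant, toy_classifier, toy_imputer)
effect_irrelevant = concept_effect(x, irrelevant, toy_classifier, toy_imputer)
print(effect_relevant, effect_irrelevant)
```

In this toy setup, resampling the segment the classifier depends on shifts the prediction substantially, while resampling the unused segment leaves it essentially unchanged. A realistic counterfactual model would need to produce segments consistent with the rest of the series, which is the role the abstract assigns to the diffusion model.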