Multivariate Functional Linear Discriminant Analysis for the Classification of Short Time Series with Missing Data (2402.13103v1)
Abstract: Functional linear discriminant analysis (FLDA) is a powerful tool that extends LDA-mediated multiclass classification and dimension reduction to univariate time-series functions. However, in the age of large multivariate and incomplete data, there is a need for a computationally tractable approach that accounts for the statistical dependencies between features and can handle missing values. We develop a multivariate version of FLDA (MUDRA) to address this need and describe an efficient expectation/conditional-maximization (ECM) algorithm to infer its parameters. We assess its predictive power on the "Articulary Word Recognition" data set and show that it improves on the state of the art, especially in the case of missing data. MUDRA allows interpretable classification of data sets with large proportions of missing data, which will be particularly useful for medical or psychological data sets.