Spectrally-Corrected and Regularized Linear Discriminant Analysis for Spiked Covariance Model (2210.03859v3)
Published 8 Oct 2022 in stat.ML and cs.LG
Abstract: This paper proposes an improved linear discriminant analysis method called spectrally-corrected and regularized LDA (SRLDA). The method combines the design ideas of the spectrally-corrected sample covariance matrix and regularized discriminant analysis. Using a large-dimensional random matrix analysis framework, the paper proves that SRLDA has a globally optimal solution for linear classification under the spiked model assumption. In simulations, the SRLDA classifier outperforms RLDA and ILDA and comes closer to the theoretical classifier. Experiments on different data sets show that SRLDA performs better in classification and dimensionality reduction than currently used tools.
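To make the combination of spectral correction and regularization concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the paper's estimator: the names `fit_sketch` and `predict`, the parameters `n_spikes` and `gamma`, and the correction rule itself (keep the leading sample eigenvalues, flatten the remaining ones to their average, then add a ridge term) are all hypothetical choices made for this example. The abstract does not specify SRLDA's actual correction or how its parameters are chosen.

```python
import numpy as np

def fit_sketch(X0, X1, n_spikes=2, gamma=1.0):
    """Two-class linear rule from a corrected, ridge-regularized covariance.

    Hypothetical helper: the correction rule is an assumption for
    illustration and is not taken from the paper.
    """
    n0, n1 = len(X0), len(X1)
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)

    # Pooled sample covariance of the two classes.
    Xc = np.vstack([X0 - mu0, X1 - mu1])
    S = Xc.T @ Xc / (n0 + n1 - 2)

    # eigh returns eigenvalues in ascending order; reverse to descending.
    evals, evecs = np.linalg.eigh(S)
    evals, evecs = evals[::-1], evecs[:, ::-1]

    # Assumed spectral correction: keep the top n_spikes eigenvalues and
    # replace the remaining (bulk) eigenvalues with their average.
    corrected = evals.copy()
    corrected[n_spikes:] = evals[n_spikes:].mean()

    # Ridge-type regularization: invert (S_corrected + gamma * I) in the
    # eigenbasis by scaling each eigenvector column.
    inv = (evecs / (corrected + gamma)) @ evecs.T

    # Standard LDA direction and intercept with empirical class priors.
    w = inv @ (mu1 - mu0)
    b = -0.5 * w @ (mu0 + mu1) + np.log(n1 / n0)
    return w, b

def predict(X, w, b):
    """Assign class 1 where the linear score is positive, else class 0."""
    return (X @ w + b > 0).astype(int)
```

With training arrays `X0` and `X1` of shape `(n_samples, n_features)`, calling `w, b = fit_sketch(X0, X1)` and then `predict(X_new, w, b)` returns 0/1 labels. In practice `n_spikes` and `gamma` would need to be estimated or tuned; the fixed defaults here are placeholders.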