Robust Principal Component Analysis using Density Power Divergence (2309.13531v1)
Abstract: Principal component analysis (PCA) is a widely used statistical tool, employed primarily for dimensionality reduction. However, it is known to be adversely affected by outlying observations in the sample, which are common in practice. Robust PCA methods based on M-estimators have theoretical benefits, but their robustness drops substantially for high-dimensional data. At the other end of the spectrum, robust PCA algorithms that solve principal component pursuit or similar optimization problems have a high breakdown point. However, they lack theoretical richness and demand more computational power than M-estimators. We introduce a novel robust PCA estimator based on the minimum density power divergence estimator. It combines the theoretical strength of M-estimators and minimum divergence estimators with a high breakdown guarantee regardless of the data dimension. We present a computationally efficient algorithm for computing this estimator. Our theoretical findings are supported by extensive simulations and comparisons with existing robust PCA methods. We also showcase the proposed algorithm's applicability on two benchmark datasets and on a credit card transactions dataset for fraud detection.
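The abstract does not spell out the estimator. The following is therefore only a minimal sketch of how density power divergence (DPD) can robustify PCA. It is a simple plug-in approach, not the algorithm proposed in the paper.

For a tuning parameter α > 0, the DPD of Basu et al. (1998) between a data density g and a model density f is d_α(g, f) = ∫ [ f^(1+α) − (1 + 1/α) g f^α + (1/α) g^(1+α) ] dx. The minimum DPD estimator (MDPDE) minimizes ∫ f_θ^(1+α) − (1 + 1/α) n⁻¹ Σ_i f_θ(X_i)^α.

The sketch makes the following assumptions:
- It uses a Gaussian working model N(μ, Σ). Under this model the MDPDE estimating equations become a weighted mean and a weighted, bias-corrected covariance, with weights w_i = exp(−α d_i² / 2), where d_i is the Mahalanobis distance of observation i.
- It then eigendecomposes the resulting scatter matrix.
- The function names `mdpde_gaussian` and `dpd_pca`, the median/MAD starting values, and α = 0.3 are hypothetical choices made for illustration.

```python
import numpy as np


def mdpde_gaussian(X, alpha=0.3, max_iter=500, tol=1e-9):
    """Minimum DPD estimate of location and scatter under a Gaussian working model.

    Fixed-point iteration on the MDPDE estimating equations:
        mu    = sum_i w_i x_i / sum_i w_i
        Sigma = sum_i w_i r_i r_i^T / (sum_i w_i - n * alpha * (1 + alpha)^(-(p + 2) / 2))
    where r_i = x_i - mu, d_i^2 = r_i^T Sigma^{-1} r_i and w_i = exp(-alpha * d_i^2 / 2).
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    # Robust starting values (an assumption): coordinatewise median and MAD^2 on the diagonal.
    mu = np.median(X, axis=0)
    mad = 1.4826 * np.median(np.abs(X - mu), axis=0)
    sigma = np.diag(mad ** 2)
    # Correction term that makes Sigma consistent at the Gaussian model.
    correction = n * alpha * (1.0 + alpha) ** (-(p + 2) / 2.0)
    for _ in range(max_iter):
        R = X - mu
        d2 = np.einsum("ij,jk,ik->i", R, np.linalg.inv(sigma), R)  # squared Mahalanobis distances
        w = np.exp(-0.5 * alpha * d2)  # DPD weights: outlying points get weight near 0
        mu_new = (w @ X) / w.sum()
        R = X - mu_new
        denom = w.sum() - correction
        if denom <= 0:
            raise ValueError("non-positive scale denominator; try a smaller alpha")
        sigma_new = (R * w[:, None]).T @ R / denom
        done = np.allclose(mu_new, mu, atol=tol) and np.allclose(sigma_new, sigma, atol=tol)
        mu, sigma = mu_new, sigma_new
        if done:
            break
    return mu, sigma


def dpd_pca(X, n_components, alpha=0.3):
    """Plug-in robust PCA: eigendecompose the MDPDE scatter matrix."""
    mu, sigma = mdpde_gaussian(X, alpha=alpha)
    evals, evecs = np.linalg.eigh(sigma)  # eigh returns eigenvalues in ascending order
    order = np.argsort(evals)[::-1][:n_components]
    return mu, evals[order], evecs[:, order]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Clean data whose first principal direction is the first coordinate axis.
    clean = rng.normal(size=(500, 5)) * np.array([5.0, 2.0, 1.0, 1.0, 1.0])
    # About 9% outliers placed far out along the fifth coordinate.
    outliers = rng.normal(size=(50, 5)) * 0.5 + np.array([0.0, 0.0, 0.0, 0.0, 40.0])
    X = np.vstack([clean, outliers])
    _, classical = np.linalg.eigh(np.cov(X, rowvar=False))
    mu, evals, V = dpd_pca(X, n_components=1)
    print("classical PC1:", np.round(classical[:, -1], 2))
    print("DPD plug-in PC1:", np.round(V[:, 0], 2))
```

As α → 0, all weights tend to 1 and the correction term vanishes. The estimate then reduces to the sample mean and the 1/n sample covariance, so the sketch recovers classical PCA. Larger α downweights outlying observations more strongly, at some cost in efficiency at the uncontaminated model. In the example, the outliers pull the classical first component toward the fifth axis, while the DPD weights of those points are essentially zero.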