Covariance Supervised Principal Component Analysis (2506.19247v1)
Abstract: Principal component analysis (PCA) is a widely used unsupervised dimensionality reduction technique in machine learning, applied across fields such as bioinformatics, computer vision, and finance. However, when response variables are available, PCA does not guarantee that the derived principal components are informative about them. Supervised PCA (SPCA) methods address this limitation by incorporating response variables into the learning process, typically through an objective function similar to that of PCA. Existing SPCA methods do not adequately address the challenge of deriving projections that are both interpretable and informative with respect to the response variable. The only existing approach that attempts to overcome this relies on a mathematically complicated manifold optimization scheme that is sensitive to hyperparameter tuning. We propose covariance-supervised principal component analysis (CSPCA), a novel SPCA method that projects data into a lower-dimensional space by balancing (1) the covariance between projections and responses and (2) the explained variance, with the trade-off controlled by a regularization parameter. The projection matrix is obtained in closed form through a simple eigenvalue decomposition. To improve computational efficiency on high-dimensional datasets, we extend CSPCA using the standard Nyström method. Simulations and real-world applications demonstrate that CSPCA achieves strong performance across numerous evaluation metrics.
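To make the idea of a closed-form, eigendecomposition-based trade-off concrete, the following is a minimal sketch and not the paper's actual formulation. The abstract does not state the objective, so the matrix combined here (a convex mix of the sample covariance of X and the squared cross-covariance between X and Y), the function name `cspca_sketch`, and the parameter `alpha` are all assumptions made purely for illustration.

```python
import numpy as np


def cspca_sketch(X, Y, n_components, alpha=0.5):
    """Illustrative supervised projection; NOT the paper's exact objective.

    Assumed objective: maximize trace(W^T M W) subject to W^T W = I, where
    M = (1 - alpha) * S_xx + alpha * S_xy S_xy^T.
    alpha is a hypothetical trade-off weight in [0, 1].
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    if Y.ndim == 1:
        Y = Y[:, None]  # treat a single response as an (n, 1) matrix

    n = X.shape[0]
    Xc = X - X.mean(axis=0)  # center features
    Yc = Y - Y.mean(axis=0)  # center responses

    S_xx = Xc.T @ Xc / (n - 1)  # feature covariance, shape (p, p)
    S_xy = Xc.T @ Yc / (n - 1)  # feature-response cross-covariance, shape (p, q)

    # Symmetric matrix mixing explained variance and covariance with Y.
    M = (1.0 - alpha) * S_xx + alpha * (S_xy @ S_xy.T)

    # eigh returns eigenvalues in ascending order for symmetric matrices.
    eigvals, eigvecs = np.linalg.eigh(M)
    order = np.argsort(eigvals)[::-1][:n_components]
    W = eigvecs[:, order]  # orthonormal projection matrix, shape (p, k)

    return W, Xc @ W  # projection matrix and projected (centered) data
```

In this sketch, setting `alpha=0` reduces the problem to ordinary PCA, while larger values weight the directions that covary with the response more heavily. How the paper scales the two terms, and how the Nyström extension approximates the decomposition, is not specified in the abstract and is not reproduced here.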