A random version of principal component analysis in data clustering

Published 27 Oct 2016 in q-bio.QM and cs.LG | (1610.08664v1)

Abstract: Principal component analysis (PCA) is a widespread technique for data analysis that relies on the covariance-correlation matrix of the analyzed data. However, to work properly with high-dimensional data, PCA imposes severe mathematical constraints on the minimum number of distinct replicates or samples that must be included in the analysis. Here we show that a modified algorithm works not only on well-dimensioned datasets but also on degenerate ones.
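
The sample-count constraint mentioned in the abstract can be illustrated with a minimal sketch. It is not the modified algorithm of the paper, which the abstract does not describe. It only shows, on synthetic Gaussian data, that a covariance matrix estimated from fewer samples than features is rank-deficient. The function name `covariance_rank`, the tolerance `rel_tol`, and the dataset sizes are assumptions chosen for illustration.

```python
import numpy as np

def covariance_rank(n_samples, n_features, seed=0, rel_tol=1e-10):
    """Numerical rank of the sample covariance matrix of synthetic data.

    Hypothetical helper for illustration only; not the paper's method.
    """
    rng = np.random.default_rng(seed)
    # Synthetic data: rows are samples (replicates), columns are features.
    X = rng.normal(size=(n_samples, n_features))
    # Center each feature, then form the (n_features x n_features) covariance.
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / (n_samples - 1)
    # The covariance matrix is symmetric, so eigvalsh applies.
    eigvals = np.linalg.eigvalsh(cov)
    # Count eigenvalues that are non-negligible relative to the largest one.
    return int(np.sum(eigvals > rel_tol * eigvals.max()))

# Degenerate case: fewer samples than features.
rank_few = covariance_rank(n_samples=5, n_features=20)
# Well-dimensioned case: many more samples than features.
rank_many = covariance_rank(n_samples=200, n_features=20)

print("rank with 5 samples, 20 features:  ", rank_few)
print("rank with 200 samples, 20 features:", rank_many)
```

Because centering removes one degree of freedom, the rank in the first case cannot exceed `n_samples - 1`, so at most four principal components carry nonzero variance even though the data have 20 features. With 200 samples the covariance matrix can reach full rank.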