Optimal whitening and decorrelation (1512.00809v4)

Published 2 Dec 2015 in stat.ME and stat.ML

Abstract: Whitening, or sphering, is a common preprocessing step in statistical analysis to transform random variables to orthogonality. However, due to rotational freedom there are infinitely many possible whitening procedures. Consequently, there is a diverse range of sphering methods in use, for example based on principal component analysis (PCA), Cholesky matrix decomposition and zero-phase component analysis (ZCA), among others. Here we provide an overview of the underlying theory and discuss five natural whitening procedures. Subsequently, we demonstrate that investigating the cross-covariance and the cross-correlation matrix between sphered and original variables allows to break the rotational invariance and to identify optimal whitening transformations. As a result we recommend two particular approaches: ZCA-cor whitening to produce sphered variables that are maximally similar to the original variables, and PCA-cor whitening to obtain sphered variables that maximally compress the original variables.

Summary

The paper examines various whitening and decorrelation procedures, providing an analytical framework and objective criteria for choosing the optimal method based on application goals.
It identifies five natural methods (PCA, ZCA, Cholesky, PCA-cor, and ZCA-cor), discussing their mathematical bases and distinctions, including whether each is invariant to rescaling of the variables.
The optimal choice depends on the goal: PCA/PCA-cor are preferred for optimal data compression, while ZCA/ZCA-cor are best for preserving similarity to the original data. The correlation-based variants (ZCA-cor and PCA-cor) are the ones suited to scale-invariant tasks.

Exploring Optimal Whitening and Decorrelation

The paper "Optimal Whitening and Decorrelation" by Kessy, Lewin, and Strimmer provides a comprehensive examination of the statistical preprocessing technique known as whitening and presents an analytical framework for understanding and choosing among the many possible whitening procedures. Whitening is a linear transformation, used in many multivariate analyses, that turns correlated random variables into uncorrelated variables with unit variance. Because any rotation of a whitened vector is still white, there are infinitely many valid whitening transformations, and the paper asks which of them is optimal for a given application.

Theoretical Framework

The authors lay out the linear algebra behind whitening transformations. Whitening maps a random vector $x$ with covariance matrix $\Sigma$ to $z = W x$, whose covariance matrix is the identity. The key mathematical challenge is rotational freedom: a whitening matrix $W$ must satisfy the constraint $W \Sigma W^T = I$ (equivalently $W^T W = \Sigma^{-1}$), and every matrix of the form $W = Q \Sigma^{-1/2}$ with $Q$ orthogonal meets it, so the constraint alone does not single out one $W$.
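The rotational freedom can be checked numerically. The NumPy sketch below assumes an illustrative 3-by-3 covariance matrix (our choice, not taken from the paper) and two hypothetical helpers, `inv_sqrt_psd` and `random_orthogonal`; it confirms that $W = Q \Sigma^{-1/2}$ satisfies both forms of the whitening constraint for several random orthogonal $Q$.

```python
import numpy as np

# Illustrative covariance matrix (an assumption for this sketch, not from the paper).
sigma = np.array([[4.0, 2.0, 0.6],
                  [2.0, 3.0, 0.5],
                  [0.6, 0.5, 1.0]])


def inv_sqrt_psd(m):
    """Inverse square root of a symmetric positive-definite matrix via eigh."""
    evals, evecs = np.linalg.eigh(m)
    return evecs @ np.diag(evals ** -0.5) @ evecs.T


def random_orthogonal(d, rng):
    """Random orthogonal matrix from the QR decomposition of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.standard_normal((d, d)))
    return q * np.sign(np.diag(r))  # flip column signs; Q stays orthogonal


rng = np.random.default_rng(0)
sigma_inv_sqrt = inv_sqrt_psd(sigma)

for _ in range(3):
    q = random_orthogonal(3, rng)
    w = q @ sigma_inv_sqrt                             # W = Q Sigma^{-1/2}
    assert np.allclose(w @ sigma @ w.T, np.eye(3))     # W Sigma W^T = I
    assert np.allclose(w.T @ w, np.linalg.inv(sigma))  # W^T W = Sigma^{-1}
print("all three rotated matrices whiten sigma")
```

Each rotation gives a different $W$, and hence different whitened variables, which is why additional criteria are needed to pick one.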

Five Natural Whitening Procedures

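The paper identifies five commonly used methods. Below, $\Sigma = U \Lambda U^T$ is the eigendecomposition of the covariance matrix, $V$ is the diagonal matrix of variances, and $P = V^{-1/2} \Sigma V^{-1/2} = G \Theta G^T$ is the correlation matrix with its eigendecomposition:

ZCA (Zero-Phase Component Analysis): uses $W^{\text{ZCA}} = \Sigma^{-1/2} = U \Lambda^{-1/2} U^T$, the unique symmetric whitening matrix (also known as Mahalanobis whitening), which keeps the whitened variables as close as possible to the original ones.
PCA (Principal Component Analysis): uses $W^{\text{PCA}} = \Lambda^{-1/2} U^T$, i.e. it rotates the data onto the principal axes (the directions of maximum variance) and rescales each component to unit variance, which makes it the natural choice for dimension reduction.
Cholesky Whitening: uses $W^{\text{Chol}} = L^T$, where $L$ is the lower-triangular Cholesky factor of the inverse covariance matrix, $\Sigma^{-1} = L L^T$. The whitening matrix is therefore triangular, and the result depends on the ordering of the variables.
ZCA-cor: applies ZCA whitening to the standardized variables, $W^{\text{ZCA-cor}} = P^{-1/2} V^{-1/2}$, using the eigenvectors of the correlation matrix rather than the covariance matrix; it is advocated where scale invariance is required.
PCA-cor: applies PCA whitening to the standardized variables, $W^{\text{PCA-cor}} = \Theta^{-1/2} G^T V^{-1/2}$. Like ZCA-cor it is invariant to rescaling of the original variables, so the compression it achieves does not depend on measurement units.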
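The five whitening matrices can be written down directly from these formulas. The following is a minimal NumPy sketch, not the authors' own implementation; the function name `whitening_matrix`, the example covariance matrix, and the eigenvector sign convention (non-negative diagonal of $U$ and $G$, which we assume in order to make the PCA-type matrices unique) are our assumptions.

```python
import numpy as np


def _sorted_eigh(m):
    """Eigendecomposition with eigenvalues in decreasing order and eigenvector
    signs flipped so the diagonal of the eigenvector matrix is non-negative
    (an assumed convention that removes the sign ambiguity)."""
    evals, evecs = np.linalg.eigh(m)
    order = np.argsort(evals)[::-1]
    evals, evecs = evals[order], evecs[:, order]
    signs = np.where(np.diag(evecs) < 0, -1.0, 1.0)
    return evals, evecs * signs


def whitening_matrix(sigma, method):
    """Hypothetical helper: return W such that z = W x has identity covariance."""
    sd = np.sqrt(np.diag(sigma))            # standard deviations (diagonal of V^{1/2})
    v_inv_sqrt = np.diag(1.0 / sd)          # V^{-1/2}
    corr = sigma / np.outer(sd, sd)         # correlation matrix P
    if method == "ZCA":
        lam, u = _sorted_eigh(sigma)
        return u @ np.diag(lam ** -0.5) @ u.T                  # Sigma^{-1/2}
    if method == "PCA":
        lam, u = _sorted_eigh(sigma)
        return np.diag(lam ** -0.5) @ u.T                      # Lambda^{-1/2} U^T
    if method == "Cholesky":
        l = np.linalg.cholesky(np.linalg.inv(sigma))           # Sigma^{-1} = L L^T
        return l.T                                             # upper triangular
    if method == "ZCA-cor":
        theta, g = _sorted_eigh(corr)
        return g @ np.diag(theta ** -0.5) @ g.T @ v_inv_sqrt   # P^{-1/2} V^{-1/2}
    if method == "PCA-cor":
        theta, g = _sorted_eigh(corr)
        return np.diag(theta ** -0.5) @ g.T @ v_inv_sqrt       # Theta^{-1/2} G^T V^{-1/2}
    raise ValueError(f"unknown method: {method}")


# Illustrative covariance matrix (an assumption for this sketch).
sigma = np.array([[4.0, 2.0, 0.6],
                  [2.0, 3.0, 0.5],
                  [0.6, 0.5, 1.0]])

for method in ("ZCA", "PCA", "Cholesky", "ZCA-cor", "PCA-cor"):
    w = whitening_matrix(sigma, method)
    assert np.allclose(w @ sigma @ w.T, np.eye(3)), method
    print(method, np.round(w, 3), sep="\n")
```

All five matrices pass the same whitening check; they differ only in the rotation $Q$ they implicitly choose.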

Optimal Whitening

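By examining the cross-covariance matrix $\Phi = \operatorname{cov}(z, x) = W \Sigma$ and the cross-correlation matrix $\Psi = \operatorname{cor}(z, x) = \Phi V^{-1/2}$ between whitened and original variables, the paper shows how to break the rotational invariance and establishes criteria for optimal whitening. Two goals are considered: maximal similarity between $z$ and $x$, measured by the trace of $\Phi$ or $\Psi$, and maximal compression, i.e. concentrating as much of the squared cross-covariance or cross-correlation as possible in the first few whitened components.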

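ZCA maximizes $\operatorname{tr}(\Phi)$, which is equivalent to minimizing the expected squared distance between the whitened and the (centered) original variables; it is optimal for applications needing minimal deviation from the original data representation.
ZCA-cor maximizes $\operatorname{tr}(\Psi)$, making the whitened variables maximally similar to the standardized original variables; unlike ZCA, it is invariant to rescaling of the variables.
PCA and PCA-cor are optimal for compression: the first whitened component captures the largest possible share of the squared cross-covariances (PCA) or squared cross-correlations (PCA-cor), the second the largest share of the remainder, and so on. PCA-cor is the scale-invariant option.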
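These criteria can be checked numerically. For centered $x$, $E\lVert z - x \rVert^2 = \operatorname{tr}(I) + \operatorname{tr}(\Sigma) - 2\operatorname{tr}(\Phi)$, so maximizing $\operatorname{tr}(\Phi)$ is the same as minimizing that distance. The sketch below assumes an illustrative covariance matrix and an eigenvector sign convention (both our choices, not the paper's data); it computes $\Phi$ and $\Psi$ for all five methods, confirms which method maximizes each trace, and shows that the row sums of squared entries for PCA and PCA-cor reproduce the eigenvalues of $\Sigma$ and $P$ in decreasing order.

```python
import numpy as np

# Illustrative covariance matrix (an assumption for this sketch).
sigma = np.array([[4.0, 2.0, 0.6],
                  [2.0, 3.0, 0.5],
                  [0.6, 0.5, 1.0]])
sd = np.sqrt(np.diag(sigma))
v_inv_sqrt = np.diag(1.0 / sd)           # V^{-1/2}
corr = sigma / np.outer(sd, sd)          # correlation matrix P


def eig_desc(m):
    """eigh with decreasing eigenvalues and non-negative eigenvector diagonal (assumed convention)."""
    evals, evecs = np.linalg.eigh(m)
    order = np.argsort(evals)[::-1]
    evecs = evecs[:, order]
    return evals[order], evecs * np.where(np.diag(evecs) < 0, -1.0, 1.0)


lam, u = eig_desc(sigma)
theta, g = eig_desc(corr)
whiteners = {
    "ZCA": u @ np.diag(lam ** -0.5) @ u.T,
    "PCA": np.diag(lam ** -0.5) @ u.T,
    "Cholesky": np.linalg.cholesky(np.linalg.inv(sigma)).T,
    "ZCA-cor": g @ np.diag(theta ** -0.5) @ g.T @ v_inv_sqrt,
    "PCA-cor": np.diag(theta ** -0.5) @ g.T @ v_inv_sqrt,
}

phi = {m: w @ sigma for m, w in whiteners.items()}   # Phi = cov(z, x) = W Sigma
psi = {m: p @ v_inv_sqrt for m, p in phi.items()}    # Psi = cor(z, x) = Phi V^{-1/2}

# Similarity: the trace criteria.
tr_phi = {m: np.trace(p) for m, p in phi.items()}
tr_psi = {m: np.trace(p) for m, p in psi.items()}
print(max(tr_phi, key=tr_phi.get))   # ZCA
print(max(tr_psi, key=tr_psi.get))   # ZCA-cor

# Compression: squared cross-covariances / cross-correlations per whitened component.
pca_rows = (phi["PCA"] ** 2).sum(axis=1)         # equals lam, largest first
pcacor_rows = (psi["PCA-cor"] ** 2).sum(axis=1)  # equals theta, sums to 3
print(pca_rows, pcacor_rows)
```

The total of the squared cross-correlations is fixed at the number of variables for every whitening method; PCA-cor simply front-loads it into the leading components.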

Practical Implications and Application

The authors illustrate these procedures on datasets such as the iris flower data, showing how the different methods produce different whitened variables. They emphasize the usefulness of ZCA-cor and PCA-cor for tasks demanding scale invariance and offer guidelines for choosing between them: ZCA-cor when the whitened variables should stay interpretable as close counterparts of the original variables, and PCA-cor when the goal is to compress the data.
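Scale invariance can also be seen on data. Because the iris data are not reproduced here, the sketch below uses synthetic correlated data (an assumption), a hypothetical `whiten` helper, and a hypothetical change of units that multiplies the first variable by 100. Under this rescaling the ZCA-whitened output changes, while the ZCA-cor output does not.

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic correlated data: 500 observations of 3 variables (not the iris data).
mixing = np.array([[1.0, 0.5, 0.2],
                   [0.0, 1.0, 0.4],
                   [0.0, 0.0, 1.0]])
x = rng.standard_normal((500, 3)) @ mixing


def inv_sqrt(m):
    """Inverse square root of a symmetric positive-definite matrix."""
    evals, evecs = np.linalg.eigh(m)
    return evecs @ np.diag(evals ** -0.5) @ evecs.T


def whiten(data, method):
    """Hypothetical helper: whiten the rows of `data` with ZCA or ZCA-cor."""
    xc = data - data.mean(axis=0)
    sigma = np.cov(xc, rowvar=False)
    if method == "ZCA":
        w = inv_sqrt(sigma)                                          # Sigma^{-1/2}
    elif method == "ZCA-cor":
        sd = np.sqrt(np.diag(sigma))
        w = inv_sqrt(sigma / np.outer(sd, sd)) @ np.diag(1.0 / sd)   # P^{-1/2} V^{-1/2}
    else:
        raise ValueError(f"unsupported method: {method}")
    return xc @ w.T   # observations are rows, so z = W x becomes X W^T


scale = np.array([100.0, 1.0, 1.0])   # hypothetical unit change for variable 1
for method in ("ZCA", "ZCA-cor"):
    unchanged = np.allclose(whiten(x, method), whiten(x * scale, method))
    print(method, unchanged)
# ZCA False
# ZCA-cor True
```

The standardization step $V^{-1/2}$ absorbs any positive rescaling of the variables, which is exactly why the correlation-based variants are scale-invariant.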

Conclusion

This work significantly clarifies the landscape of whitening methods in data preprocessing by replacing ad hoc choices with objective criteria for selecting a whitening transformation. These criteria support more tailored, application-specific use of whitening in data science tasks ranging from feature extraction to neural network preprocessing.

Overall, this paper not only elucidates the theoretical intricacies underlying different whitening transformations but also provides a solid practical framework for selecting appropriate whitening techniques based on specific data analysis goals.