Suppose X is a
random (column) vector with non-singular covariance matrix \Sigma and mean 0. Then the transformation Y = W X with a
whitening matrix W satisfying the condition W^\mathrm{T} W = \Sigma^{-1} yields the whitened random vector Y whose covariance is the identity matrix (indeed, \operatorname{cov}(Y) = W \Sigma W^\mathrm{T} = I, since \Sigma = W^{-1} W^{-\mathrm{T}}). If X has non-zero mean \mu, whitening is instead performed by Y = W (X - \mu).

There are infinitely many whitening matrices W satisfying the above condition. Commonly used choices are W = \Sigma^{-1/2} (Mahalanobis or ZCA whitening), W = L^\mathrm{T} where L is the Cholesky decomposition of \Sigma^{-1} (Cholesky whitening), or W derived from the eigen-system of \Sigma (PCA whitening).

Optimal whitening transforms can be singled out by investigating the cross-covariance and cross-correlation of X and Y. For example, the unique whitening transformation achieving maximal component-wise correlation between the original X and the whitened Y is given by the whitening matrix W = P^{-1/2} V^{-1/2}, where P is the correlation matrix and V the diagonal variance matrix of X.

== Whitening a data matrix ==
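As a minimal sketch of the procedure applied to a sample data matrix (an illustrative NumPy example, not part of the original text; the data, dimensions, and variable names are assumptions), the following estimates \Sigma from centered data, builds the ZCA/Mahalanobis whitening matrix W = \Sigma^{-1/2} from the eigendecomposition of \Sigma, and checks that W^\mathrm{T} W = \Sigma^{-1} and that the whitened data have identity covariance:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical example data: n observations (rows) of a correlated 3-d vector.
n = 10_000
A = rng.normal(size=(3, 3))
X = rng.normal(size=(n, 3)) @ A.T

# Center the data, since whitening is Y = W (X - mu) for non-zero mean.
Xc = X - X.mean(axis=0)

# Sample covariance Sigma and the ZCA/Mahalanobis whitening matrix
# W = Sigma^{-1/2}, via the eigendecomposition of the symmetric Sigma.
Sigma = Xc.T @ Xc / (n - 1)
eigvals, U = np.linalg.eigh(Sigma)
W = U @ np.diag(eigvals ** -0.5) @ U.T  # symmetric inverse square root

Y = Xc @ W.T  # whitened data: each row is W (x - mu)

# W satisfies the defining condition W^T W = Sigma^{-1},
# and the sample covariance of Y is the identity matrix.
print(np.allclose(W.T @ W, np.linalg.inv(Sigma)))      # True
print(np.allclose(Y.T @ Y / (n - 1), np.eye(3)))       # True
```

Other whitening matrices (Cholesky, PCA, or the correlation-based W = P^{-1/2} V^{-1/2}) can be substituted for W above; all satisfy the same defining condition and differ only by a rotation.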