Given n samples of m-dimensional data, represented as the m-by-n matrix X=[\mathbf{x}_1,\mathbf{x}_2,\ldots,\mathbf{x}_n], the sample mean is
:\overline{\mathbf{x}} = \frac{1}{n}\sum_{j=1}^n \mathbf{x}_j
where \mathbf{x}_j is the j-th column of X. The
scatter matrix is the m-by-m positive semi-definite matrix
:S = \sum_{j=1}^n (\mathbf{x}_j-\overline{\mathbf{x}})(\mathbf{x}_j-\overline{\mathbf{x}})^T = \sum_{j=1}^n (\mathbf{x}_j-\overline{\mathbf{x}})\otimes(\mathbf{x}_j-\overline{\mathbf{x}}) = \left( \sum_{j=1}^n \mathbf{x}_j \mathbf{x}_j^T \right) - n \overline{\mathbf{x}} \overline{\mathbf{x}}^T
where (\cdot)^T denotes the matrix transpose and each product of a column vector with its transpose is an outer product (written with \otimes in the second form). The scatter matrix may be expressed more succinctly as
:S = X\,C_n\,X^T
where C_n is the n-by-n centering matrix.

==Application==
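As a numerical illustration of the definition above, the following minimal sketch computes the scatter matrix of a small data set in three ways: from the sum of outer products of the centered samples, from the raw second moments minus n times the outer product of the mean, and from the product with the centering matrix. The function names and the sample data are illustrative assumptions, not part of the definition. The centering matrix is taken here to be C_n = I_n - (1/n) 1 1^T, its usual definition, which the text above does not spell out. Exact rational arithmetic is used so that the three results can be compared for equality.

```python
from fractions import Fraction


def transpose(A):
    return [list(row) for row in zip(*A)]


def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


def sample_mean(X):
    """Mean of the n columns of the m-by-n matrix X (stored as m rows)."""
    n = len(X[0])
    return [sum(row) / n for row in X]


def scatter_by_definition(X):
    """S = sum_j (x_j - mean)(x_j - mean)^T, one outer product per sample."""
    m, n = len(X), len(X[0])
    mean = sample_mean(X)
    S = [[0] * m for _ in range(m)]
    for j in range(n):
        d = [X[i][j] - mean[i] for i in range(m)]  # centered j-th column
        for a in range(m):
            for b in range(m):
                S[a][b] += d[a] * d[b]
    return S


def scatter_by_raw_moments(X):
    """S = (sum_j x_j x_j^T) - n * mean mean^T."""
    m, n = len(X), len(X[0])
    mean = sample_mean(X)
    XXt = matmul(X, transpose(X))  # equals sum_j x_j x_j^T
    return [[XXt[a][b] - n * mean[a] * mean[b] for b in range(m)]
            for a in range(m)]


def centering_matrix(n):
    """C_n = I_n - (1/n) * (n-by-n matrix of ones); assumed definition."""
    return [[(1 if i == j else 0) - Fraction(1, n) for j in range(n)]
            for i in range(n)]


def scatter_by_centering(X):
    """S = X C_n X^T."""
    n = len(X[0])
    return matmul(matmul(X, centering_matrix(n)), transpose(X))


# Illustrative data: m = 2 dimensions, n = 4 samples (one sample per column).
X = [[Fraction(v) for v in row] for row in [[1, 2, 3, 6], [2, 0, 4, 2]]]

S = scatter_by_definition(X)
assert S == scatter_by_raw_moments(X) == scatter_by_centering(X)
print([[int(v) for v in row] for row in S])  # prints [[14, 2], [2, 8]]
```

For this data the sample mean is (3, 2)^T, and all three expressions give the same symmetric 2-by-2 matrix. For floating-point data the three forms would generally agree only up to rounding error.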