== Relation to the autocorrelation matrix ==

The auto-covariance matrix \operatorname{K}_{\mathbf{X}\mathbf{X}} is related to the autocorrelation matrix \operatorname{R}_{\mathbf{X}\mathbf{X}} by

\operatorname{K}_{\mathbf{X}\mathbf{X}} = \operatorname{E}[(\mathbf{X} - \operatorname{E}[\mathbf{X}])(\mathbf{X} - \operatorname{E}[\mathbf{X}])^\mathsf{T}] = \operatorname{R}_{\mathbf{X}\mathbf{X}} - \operatorname{E}[\mathbf{X}] \operatorname{E}[\mathbf{X}]^\mathsf{T},

where the autocorrelation matrix is defined as \operatorname{R}_{\mathbf{X}\mathbf{X}} = \operatorname{E}[\mathbf{X} \mathbf{X}^\mathsf{T}].
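This identity is straightforward to check numerically. Below is a minimal NumPy sketch (illustrative only, not part of the article; sample sizes and mixing matrices are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
# Correlated sample with a nonzero mean, so the identity is non-trivial
X = rng.normal(size=(100_000, 3)) @ rng.normal(size=(3, 3)) + np.array([1.0, -2.0, 0.5])

mu = X.mean(axis=0)              # sample estimate of E[X]
R = (X.T @ X) / len(X)           # sample autocorrelation matrix E[X X^T]
K = R - np.outer(mu, mu)         # K_XX = R_XX - E[X] E[X]^T

# Matches the direct (biased) sample covariance, up to rounding:
print(np.allclose(K, np.cov(X, rowvar=False, bias=True)))
```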
== Relation to the correlation matrix ==

An entity closely related to the covariance matrix is the matrix of Pearson product-moment correlation coefficients between each of the random variables in the random vector \mathbf{X}, which can be written as

\operatorname{corr}(\mathbf{X}) = \big(\operatorname{diag}(\operatorname{K}_{\mathbf{X}\mathbf{X}})\big)^{-\frac{1}{2}} \, \operatorname{K}_{\mathbf{X}\mathbf{X}} \, \big(\operatorname{diag}(\operatorname{K}_{\mathbf{X}\mathbf{X}})\big)^{-\frac{1}{2}},

where \operatorname{diag}(\operatorname{K}_{\mathbf{X}\mathbf{X}}) is the matrix of the diagonal elements of \operatorname{K}_{\mathbf{X}\mathbf{X}} (i.e., a diagonal matrix of the variances of X_i for i = 1, \dots, n).

Equivalently, the correlation matrix can be seen as the covariance matrix of the standardized random variables X_i/\sigma(X_i) for i = 1, \dots, n:

\operatorname{corr}(\mathbf{X}) = \begin{bmatrix} 1 & \frac{\operatorname{E}[(X_1 - \mu_1)(X_2 - \mu_2)]}{\sigma(X_1)\sigma(X_2)} & \cdots & \frac{\operatorname{E}[(X_1 - \mu_1)(X_n - \mu_n)]}{\sigma(X_1)\sigma(X_n)} \\ \frac{\operatorname{E}[(X_2 - \mu_2)(X_1 - \mu_1)]}{\sigma(X_2)\sigma(X_1)} & 1 & \cdots & \frac{\operatorname{E}[(X_2 - \mu_2)(X_n - \mu_n)]}{\sigma(X_2)\sigma(X_n)} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\operatorname{E}[(X_n - \mu_n)(X_1 - \mu_1)]}{\sigma(X_n)\sigma(X_1)} & \frac{\operatorname{E}[(X_n - \mu_n)(X_2 - \mu_2)]}{\sigma(X_n)\sigma(X_2)} & \cdots & 1 \end{bmatrix}.

Each element on the principal diagonal of a correlation matrix is the correlation of a random variable with itself, which always equals 1. Each off-diagonal element is between −1 and +1 inclusive.
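In code, this rescaling is a single elementwise division. A minimal NumPy sketch (the matrix K below is an invented, positive-definite example, not from the article):

```python
import numpy as np

K = np.array([[4.0, 2.0, 0.6],
              [2.0, 9.0, -1.2],
              [0.6, -1.2, 1.0]])           # example covariance matrix

d = np.sqrt(np.diag(K))                    # marginal standard deviations sigma(X_i)
corr = K / np.outer(d, d)                  # equals diag(K)^{-1/2} K diag(K)^{-1/2}

print(np.diag(corr))                       # ones on the principal diagonal
print(np.abs(corr).max() <= 1.0)           # all entries lie in [-1, 1]
```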
== Inverse of the covariance matrix ==

The inverse of this matrix, \operatorname{K}_{\mathbf{X}\mathbf{X}}^{-1}, if it exists, is the inverse covariance matrix, also known as the precision matrix (or concentration matrix).

Just as the covariance matrix can be written as the rescaling of a correlation matrix by the marginal standard deviations:

\operatorname{cov}(\mathbf{X}) = \begin{bmatrix} \sigma_{x_1} & & & 0 \\ & \sigma_{x_2} & & \\ & & \ddots & \\ 0 & & & \sigma_{x_n} \end{bmatrix} \times \begin{bmatrix} 1 & \rho_{x_1, x_2} & \cdots & \rho_{x_1, x_n} \\ \rho_{x_2, x_1} & 1 & \cdots & \rho_{x_2, x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \rho_{x_n, x_1} & \rho_{x_n, x_2} & \cdots & 1 \end{bmatrix} \times \begin{bmatrix} \sigma_{x_1} & & & 0 \\ & \sigma_{x_2} & & \\ & & \ddots & \\ 0 & & & \sigma_{x_n} \end{bmatrix},

so, using the ideas of partial correlation and partial variance, the inverse covariance matrix can be expressed analogously:

\operatorname{cov}(\mathbf{X})^{-1} = \begin{bmatrix} \frac{1}{\sigma_{x_1\mid x_2\dots}} & & & 0 \\ & \frac{1}{\sigma_{x_2\mid x_1,x_3\dots}} & & \\ & & \ddots & \\ 0 & & & \frac{1}{\sigma_{x_n\mid x_1\dots x_{n-1}}} \end{bmatrix} \times \begin{bmatrix} 1 & -\rho_{x_1, x_2\mid x_3\dots} & \cdots & -\rho_{x_1, x_n\mid x_2\dots x_{n-1}} \\ -\rho_{x_2, x_1\mid x_3\dots} & 1 & \cdots & -\rho_{x_2, x_n\mid x_1,x_3\dots x_{n-1}} \\ \vdots & \vdots & \ddots & \vdots \\ -\rho_{x_n, x_1\mid x_2\dots x_{n-1}} & -\rho_{x_n, x_2\mid x_1,x_3\dots x_{n-1}} & \cdots & 1 \end{bmatrix} \times \begin{bmatrix} \frac{1}{\sigma_{x_1\mid x_2\dots}} & & & 0 \\ & \frac{1}{\sigma_{x_2\mid x_1,x_3\dots}} & & \\ & & \ddots & \\ 0 & & & \frac{1}{\sigma_{x_n\mid x_1\dots x_{n-1}}} \end{bmatrix}.

This duality motivates a number of other dualities between marginalizing and conditioning for Gaussian random variables.
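Concretely, the diagonal of the precision matrix holds the reciprocal partial variances, and its rescaled off-diagonal entries are negated partial correlations. A minimal NumPy sketch (example values invented for illustration, not from the article):

```python
import numpy as np

K = np.array([[4.0, 2.0, 0.6],
              [2.0, 9.0, -1.2],
              [0.6, -1.2, 1.0]])     # example positive-definite covariance matrix

P = np.linalg.inv(K)                 # precision (concentration) matrix

s = np.sqrt(np.diag(P))              # s_i = 1 / sigma_{x_i | rest of the variables}
partial_corr = -P / np.outer(s, s)   # rho_{x_i, x_j | rest} for i != j
np.fill_diagonal(partial_corr, 1.0)

# Rebuild P from the decomposition in the text:
# D * (1 on the diagonal, minus partial correlations off it) * D, D = diag(s)
M = -partial_corr
np.fill_diagonal(M, 1.0)
print(np.allclose(P, np.diag(s) @ M @ np.diag(s)))   # True
```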
== Basic properties ==

For \operatorname{K}_{\mathbf{X}\mathbf{X}} = \operatorname{var}(\mathbf{X}) = \operatorname{E}\left[\left(\mathbf{X} - \operatorname{E}[\mathbf{X}]\right)\left(\mathbf{X} - \operatorname{E}[\mathbf{X}]\right)^\mathsf{T}\right] and \boldsymbol{\mu}_\mathbf{X} = \operatorname{E}[\mathbf{X}], where \mathbf{X} = (X_1,\ldots,X_n)^\mathsf{T} is an n-dimensional random vector, the following basic properties apply:

• \operatorname{K}_{\mathbf{X}\mathbf{X}} = \operatorname{E}[\mathbf{X}\mathbf{X}^\mathsf{T}] - \boldsymbol{\mu}_\mathbf{X}\boldsymbol{\mu}_\mathbf{X}^\mathsf{T}

• \operatorname{K}_{\mathbf{X}\mathbf{X}} is positive-semidefinite, i.e. \mathbf{a}^\mathsf{T} \operatorname{K}_{\mathbf{X}\mathbf{X}} \mathbf{a} \ge 0 for all \mathbf{a} \in \mathbb{R}^n.

Proof: By the affine-transformation property below, a linear transformation \mathbf{Y} = \mathbf{A}\mathbf{X} of a random vector \mathbf{X} with covariance matrix \mathbf{\Sigma}_\mathbf{X} = \operatorname{cov}(\mathbf{X}) transforms the covariance matrix as \mathbf{\Sigma}_\mathbf{Y} = \operatorname{cov}(\mathbf{Y}) = \mathbf{A}\,\mathbf{\Sigma}_\mathbf{X}\,\mathbf{A}^\mathsf{T}. Since \mathbf{\Sigma}_\mathbf{X} is symmetric, it can be diagonalized by an orthogonal transformation: there exists an orthogonal matrix \mathbf{A} (so that \mathbf{A}^\mathsf{T} = \mathbf{A}^{-1}) such that \mathbf{A}\,\mathbf{\Sigma}_\mathbf{X}\,\mathbf{A}^\mathsf{T} = \mathbf{A}\,\mathbf{\Sigma}_\mathbf{X}\,\mathbf{A}^{-1} = \operatorname{diag}(\lambda_1,\ldots,\lambda_n), where \lambda_1,\ldots,\lambda_n are the eigenvalues of \mathbf{\Sigma}_\mathbf{X}. But this diagonal matrix is the covariance matrix of the random vector \mathbf{Y} = \mathbf{A}\mathbf{X}, so its main diagonal consists of the variances of the elements of \mathbf{Y}. As variance is always non-negative, \lambda_i \geq 0 for every i, and hence \mathbf{\Sigma}_\mathbf{X} is positive-semidefinite.

• \operatorname{K}_{\mathbf{X}\mathbf{X}} is symmetric, i.e. \operatorname{K}_{\mathbf{X}\mathbf{X}}^\mathsf{T} = \operatorname{K}_{\mathbf{X}\mathbf{X}}.

• For any constant (i.e. non-random) m \times n matrix \mathbf{A} and constant m \times 1 vector \mathbf{a}, one has \operatorname{var}(\mathbf{A X} + \mathbf{a}) = \mathbf{A}\,\operatorname{var}(\mathbf{X})\,\mathbf{A}^\mathsf{T} (checked numerically in the sketch below).

• If \mathbf{Y} is another random vector with the same dimension as \mathbf{X}, then \operatorname{var}(\mathbf{X} + \mathbf{Y}) = \operatorname{var}(\mathbf{X}) + \operatorname{cov}(\mathbf{X},\mathbf{Y}) + \operatorname{cov}(\mathbf{Y}, \mathbf{X}) + \operatorname{var}(\mathbf{Y}), where \operatorname{cov}(\mathbf{X}, \mathbf{Y}) is the cross-covariance matrix of \mathbf{X} and \mathbf{Y}.
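A minimal NumPy sketch (illustrative, not from the article) checking two of these properties on sample data: the affine-transformation rule and positive-semidefiniteness:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200_000, 3)) @ rng.normal(size=(3, 3))  # correlated sample

A = np.array([[1.0, 2.0, 0.0],
              [0.0, -1.0, 3.0]])     # constant m x n matrix (m = 2, n = 3)
a = np.array([5.0, -7.0])            # constant shift; drops out of the variance

var_X = np.cov(X, rowvar=False)
var_Y = np.cov(X @ A.T + a, rowvar=False)           # var(AX + a)

print(np.allclose(var_Y, A @ var_X @ A.T))          # affine property holds
print(np.all(np.linalg.eigvalsh(var_X) >= -1e-12))  # eigenvalues non-negative
```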
== Block matrices ==

The joint mean \boldsymbol\mu and joint covariance matrix \boldsymbol\Sigma of \mathbf{X} and \mathbf{Y} can be written in block form

\boldsymbol\mu = \begin{bmatrix} \boldsymbol{\mu}_X \\ \boldsymbol{\mu}_Y \end{bmatrix}, \qquad \boldsymbol\Sigma = \begin{bmatrix} \operatorname{K}_\mathbf{XX} & \operatorname{K}_\mathbf{XY} \\ \operatorname{K}_\mathbf{YX} & \operatorname{K}_\mathbf{YY} \end{bmatrix},

where \operatorname{K}_\mathbf{XX} = \operatorname{var}(\mathbf{X}), \operatorname{K}_\mathbf{YY} = \operatorname{var}(\mathbf{Y}) and \operatorname{K}_\mathbf{XY} = \operatorname{K}^\mathsf{T}_\mathbf{YX} = \operatorname{cov}(\mathbf{X}, \mathbf{Y}). \operatorname{K}_\mathbf{XX} and \operatorname{K}_\mathbf{YY} can be identified as the variance matrices of the marginal distributions of \mathbf{X} and \mathbf{Y} respectively.

If \mathbf{X} and \mathbf{Y} are jointly normally distributed, \mathbf{X}, \mathbf{Y} \sim\ \mathcal{N}(\boldsymbol\mu, \boldsymbol\Sigma), then the conditional distribution of \mathbf{Y} given \mathbf{X} is \mathbf{Y} \mid \mathbf{X} \sim\ \mathcal{N}(\boldsymbol{\mu}_{\mathbf{Y}|\mathbf{X}}, \operatorname{K}_{\mathbf{Y}|\mathbf{X}}), defined by the conditional mean

\boldsymbol{\mu}_{\mathbf{Y}|\mathbf{X}} = \boldsymbol{\mu}_\mathbf{Y} + \operatorname{K}_\mathbf{YX} \operatorname{K}_\mathbf{XX}^{-1} \left( \mathbf{X} - \boldsymbol{\mu}_\mathbf{X} \right)

and the conditional variance

\operatorname{K}_{\mathbf{Y}|\mathbf{X}} = \operatorname{K}_\mathbf{YY} - \operatorname{K}_\mathbf{YX} \operatorname{K}_\mathbf{XX}^{-1} \operatorname{K}_\mathbf{XY}.

The matrix \operatorname{K}_\mathbf{YX} \operatorname{K}_\mathbf{XX}^{-1} is known as the matrix of regression coefficients, while in linear algebra \operatorname{K}_{\mathbf{Y}|\mathbf{X}} is the Schur complement of \operatorname{K}_\mathbf{XX} in \boldsymbol\Sigma.

The matrix of regression coefficients may often be given in transpose form, \operatorname{K}_\mathbf{XX}^{-1} \operatorname{K}_\mathbf{XY}, suitable for post-multiplying a row vector of explanatory variables \mathbf{X}^\mathsf{T} rather than pre-multiplying a column vector \mathbf{X}. In this form the coefficients correspond to those obtained by inverting the matrix of the normal equations of ordinary least squares (OLS).
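These two formulas give a direct recipe for Gaussian conditioning. A minimal NumPy sketch (all block parameters invented for illustration, not from the article):

```python
import numpy as np

# Block parameters of a jointly normal (X, Y), chosen for illustration
mu_x = np.array([0.0, 1.0])
mu_y = np.array([2.0])
K_xx = np.array([[2.0, 0.3],
                 [0.3, 1.0]])
K_yx = np.array([[0.5, -0.2]])       # cov(Y, X); K_xy is its transpose
K_yy = np.array([[1.5]])

x_obs = np.array([1.0, 0.0])         # an observed value of X

beta = K_yx @ np.linalg.inv(K_xx)        # matrix of regression coefficients
mu_cond = mu_y + beta @ (x_obs - mu_x)   # conditional mean mu_{Y|X}
K_cond = K_yy - beta @ K_yx.T            # Schur complement K_{Y|X}

print(mu_cond, K_cond)
```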
== Partial covariance matrix ==