==Convexity==
A general
linear combination of probability density functions is not necessarily a probability density, since it may be negative or it may integrate to something other than 1. However, a
convex combination of probability density functions preserves both of these properties (non-negativity and integrating to 1), and thus mixture densities are themselves probability density functions.
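Both closure properties are easy to check numerically. The following sketch uses an illustrative two-component normal mixture (the parameter values are chosen for this example, not taken from the text): the convex combination integrates to 1 and is non-negative everywhere, while a general linear combination with a negative coefficient dips below zero.

```python
import math

# Two illustrative component densities: N(0, 1) and N(3, 1).
def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def mixture_pdf(x, w=0.3):
    # Convex combination: weights 0.3 and 0.7 are non-negative and sum to 1.
    return w * normal_pdf(x, 0.0, 1.0) + (1 - w) * normal_pdf(x, 3.0, 1.0)

# The mixture integrates to 1 (midpoint rule on a wide grid) and is
# non-negative at every grid point.
dx = 0.001
xs = [-10.0 + (k + 0.5) * dx for k in range(24000)]
total = sum(mixture_pdf(x) * dx for x in xs)
print(round(total, 4))  # ≈ 1.0

# A general (non-convex) linear combination, e.g. 1.5*p1 - 0.5*p2,
# can be negative and so is not a density:
print(1.5 * normal_pdf(3.0, 0.0, 1.0) - 0.5 * normal_pdf(3.0, 3.0, 1.0))  # negative
```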
==Moments==
Let X_1, ..., X_n denote random variables from the n component distributions, and let X denote a random variable from the mixture distribution. Then, for any function H(\cdot) for which \operatorname{E}[H(X_i)] exists, and assuming that the component densities p_i(x) exist,
\begin{align} \operatorname{E}[H(X)] & = \int_{-\infty}^\infty H(x) \sum_{i = 1}^n w_i p_i(x) \, dx \\ & = \sum_{i = 1}^n w_i \int_{-\infty}^\infty p_i(x) H(x) \, dx = \sum_{i = 1}^n w_i \operatorname{E}[H(X_i)]. \end{align}
The j-th moment about zero (i.e. choosing H(x) = x^j) is simply a weighted average of the j-th moments of the components. Moments about the mean \mu involve a binomial expansion:
\begin{align} \operatorname{E}\left[{\left(X - \mu\right)}^j\right] & = \sum_{i=1}^n w_i \operatorname{E}\left[{\left(X_i - \mu_i + \mu_i - \mu\right)}^j\right] \\ & = \sum_{i=1}^n w_i \sum_{k=0}^j \binom{j}{k} {\left(\mu_i - \mu\right)}^{j-k} \operatorname{E}\left[{\left(X_i - \mu_i\right)}^k\right], \end{align}
where \mu_i denotes the mean of the i-th component.

In the case of a mixture of one-dimensional distributions with weights w_i, means \mu_i and variances \sigma_i^2, the total mean and variance will be:
\operatorname{E}[X] = \mu = \sum_{i = 1}^n w_i \mu_i ,
\begin{align} \operatorname{E}\left[(X - \mu)^2\right] & = \sigma^2 \\ & = \operatorname{E}[X^2] - \mu^2 & (\text{standard variance reformulation})\\ & = \left(\sum_{i=1}^n w_i \operatorname{E}\left[X_i^2\right]\right) - \mu^2 \\ & = \sum_{i=1}^n w_i(\sigma_i^2 + \mu_i^2)- \mu^2 & ( \sigma_i^2 = \operatorname{E}[X_i^2] - \mu_i^2 \implies \operatorname{E}[X_i^2] = \sigma_i^2 + \mu_i^2) \end{align}
These relations highlight the potential of mixture distributions to display non-trivial higher-order moments such as
skewness and
kurtosis (
fat tails) and multi-modality, even in the absence of such features within the components themselves. Marron and Wand (1992) give an illustrative account of the flexibility of this framework.
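The weighted-moment identities above can be checked numerically, and the same computation illustrates the point about higher-order moments: a mixture of two symmetric components can have markedly non-zero skewness. A sketch with illustrative parameter values (not from the text):

```python
import math

# Illustrative two-component normal mixture.
w = [0.7, 0.3]
mu = [0.0, 4.0]
sd = [1.0, 1.0]

def pdf(x):
    return sum(w[i] * math.exp(-0.5 * ((x - mu[i]) / sd[i]) ** 2)
               / (sd[i] * math.sqrt(2.0 * math.pi)) for i in range(2))

def moment(h, lo=-10.0, hi=14.0, steps=24000):
    # Numerical E[h(X)] under the mixture density (midpoint rule).
    dx = (hi - lo) / steps
    return sum(h(lo + (k + 0.5) * dx) * pdf(lo + (k + 0.5) * dx)
               for k in range(steps)) * dx

# Weighted-moment identities for the mean and variance:
mean = sum(w[i] * mu[i] for i in range(2))                        # mu = sum w_i mu_i
var = sum(w[i] * (sd[i] ** 2 + mu[i] ** 2) for i in range(2)) - mean ** 2

print(abs(moment(lambda x: x) - mean) < 1e-4)                     # True
print(abs(moment(lambda x: x ** 2) - (var + mean ** 2)) < 1e-4)   # True

# Third central moment via the binomial expansion: each normal component
# has zero third central moment, so only the k = 0 and k = 2 terms survive.
m3 = sum(w[i] * ((mu[i] - mean) ** 3 + 3 * (mu[i] - mean) * sd[i] ** 2)
         for i in range(2))
print(abs(moment(lambda x: (x - mean) ** 3) - m3) < 1e-3)         # True

# Both components are symmetric, yet the mixture is skewed:
print(m3 / var ** 1.5)  # ≈ 0.59
```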
==Modes==
The question of
multimodality is simple for some cases, such as mixtures of
exponential distributions: all such mixtures are
unimodal. However, for the case of mixtures of
normal distributions, the question is a complex one. Conditions for the number of modes in a multivariate normal mixture are explored by Ray & Lindsay extending earlier work on univariate and multivariate distributions. Here the problem of evaluation of the modes of an n component mixture in a D dimensional space is reduced to identification of critical points (local minima, maxima and
saddle points) on a
manifold referred to as the
ridgeline surface, which is the image of the ridgeline function x^{*}(\alpha) = \left[ \sum_{i=1}^{n} \alpha_i \Sigma_i^{-1} \right]^{-1} \times \left[ \sum_{i=1}^{n} \alpha_i \Sigma_i^{-1} \mu_i \right], where \alpha belongs to the (n-1)-dimensional standard
simplex: \mathcal{S}_n = \left\{ \alpha \in \mathbb{R}^n: \alpha_i \in [0,1], \sum_{i=1}^n \alpha_i = 1 \right\} and \Sigma_i \in \Reals^{D\times D},\, \mu_i \in \Reals^D correspond to the covariance and mean of the i-th component. Ray & Lindsay consider the case in which n - 1 < D, showing a one-to-one correspondence between modes of the mixture and those on the
ridge elevation function h(\alpha) = q(x^*(\alpha)), where q denotes the mixture density; thus one may identify the modes by solving \frac{d h(\alpha)}{d \alpha} = 0 with respect to \alpha and determining the value x^*(\alpha).

Using graphical tools, the potential multi-modality of mixtures with number of components n \in \{2,3\} is demonstrated; in particular, it is shown that the number of modes may exceed n and that the modes need not coincide with the component means. For two components they develop a graphical tool for analysis by instead solving the aforementioned derivative equation with respect to the first mixing weight w_1 (which also determines the second mixing weight through w_2 = 1-w_1) and expressing the solutions as a function \Pi(\alpha), \,\alpha \in [0,1], so that the number and location of modes for a given value of w_1 correspond to the number of intersections of the graph with the line \Pi(\alpha) = w_1. This in turn can be related to the number of oscillations of the graph and therefore to solutions of \frac{d \Pi(\alpha)}{d \alpha} = 0, leading to an explicit solution for the case of a two-component mixture with \Sigma_1 = \Sigma_2 = \Sigma (sometimes called a
homoscedastic mixture) given by 1 - \alpha(1-\alpha) d_M(\mu_1, \mu_2, \Sigma)^2 where d_M(\mu_1,\mu_2,\Sigma) = \sqrt{(\mu_2-\mu_1)^\mathsf{T} \Sigma^{-1} (\mu_2-\mu_1)} is the
Mahalanobis distance between \mu_1 and \mu_2. Since the above expression is quadratic in \alpha, it follows that in this instance there are at most two modes irrespective of the dimension or the weights.

For normal mixtures with general n > 2 and D > 1, a lower bound for the maximum number of possible modes is known, and, conditionally on the assumption that the maximum number is finite, so is an upper bound.{{citation | doi = 10.1093/imaiai/iaz013}} For those combinations of n and D for which the maximum number is known, it matches the lower bound.
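As an illustrative check (not part of Ray & Lindsay's analysis, and with parameter values chosen for this sketch), the homoscedastic two-component case can be examined in one dimension. With \Sigma_1 = \Sigma_2 = 1 the Mahalanobis distance reduces to |\mu_2 - \mu_1| and the ridgeline x^*(\alpha) is just the segment joining the two means, so modes can be counted by scanning the density on a fine grid:

```python
import math

# One-dimensional homoscedastic sketch: Sigma_1 = Sigma_2 = 1, so the
# Mahalanobis distance is |mu2 - mu1| and all modes lie between the means.
def count_modes(mu1, mu2, w1, steps=24000):
    def q(x):  # two-component mixture density
        return (w1 * math.exp(-0.5 * (x - mu1) ** 2) +
                (1.0 - w1) * math.exp(-0.5 * (x - mu2) ** 2)) / math.sqrt(2.0 * math.pi)
    lo, hi = mu1 - 4.0, mu2 + 4.0   # padded grid containing all critical points
    dx = (hi - lo) / steps
    ys = [q(lo + k * dx) for k in range(steps + 1)]
    # a mode shows up as a strict local maximum on the grid
    return sum(1 for i in range(1, len(ys) - 1) if ys[i - 1] < ys[i] > ys[i + 1])

# For equal weights the classical unimodality threshold is a
# Mahalanobis distance of 2:
print(count_modes(0.0, 1.5, 0.5))   # 1 (separation below threshold)
print(count_modes(0.0, 4.0, 0.5))   # 2 (well-separated means)
print(count_modes(0.0, 10.0, 0.2))  # 2 (never more than two, any weights)
```

The last call illustrates the quadratic bound discussed above: however large the separation and however unequal the weights, the count never exceeds two.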