The joint density of \boldsymbol{y} and \boldsymbol{u} can be written as: f(\boldsymbol{y},\boldsymbol{u}) = f(\boldsymbol{y} \mid \boldsymbol{u}) \, f(\boldsymbol{u}). Assuming normality, \boldsymbol{u} \sim \mathcal{N}(\boldsymbol{0},G), \boldsymbol{\epsilon} \sim \mathcal{N}(\boldsymbol{0},R) and \mathrm{Cov}(\boldsymbol{u},\boldsymbol{\epsilon})=\boldsymbol{0}, maximizing the joint density over \boldsymbol{\beta} and \boldsymbol{u} gives Henderson's "mixed model equations" (MME) for linear mixed models:

: \begin{pmatrix} X'R^{-1}X & X'R^{-1}Z \\ Z'R^{-1}X & Z'R^{-1}Z + G^{-1} \end{pmatrix} \begin{pmatrix} \hat{\boldsymbol{\beta}} \\ \hat{\boldsymbol{u}} \end{pmatrix} = \begin{pmatrix} X'R^{-1}\boldsymbol{y} \\ Z'R^{-1}\boldsymbol{y} \end{pmatrix}

where, for example, X' is the matrix transpose of X and R^{-1} is the matrix inverse of R. A more compact formulation of the above is

: ([X\,Z]'R^{-1}[X\,Z]+G^{-1})[\hat{\boldsymbol{\beta}}\; \hat{\boldsymbol{u}}]' = [X\,Z]'R^{-1}\boldsymbol{y}

or, with [X\,Z]=W,

: (W'R^{-1}W+G^{-1})[\hat{\boldsymbol{\beta}}\; \hat{\boldsymbol{u}}]' = W'R^{-1}\boldsymbol{y}

where, with a slight abuse of notation, G^{-1} becomes

: \begin{pmatrix} 0 & 0 \\ 0 & G^{-1} \end{pmatrix}

so that the penalty applies only to the random-effects block. The solutions of the MME, \textstyle\hat{\boldsymbol{\beta}} and \textstyle\hat{\boldsymbol{u}}, are the best linear unbiased estimate of \boldsymbol{\beta} and the best linear unbiased predictor of \boldsymbol{u}, respectively. This is a consequence of the Gauss–Markov theorem when the conditional variance of the outcome is not scalable to the identity matrix. When the conditional variance is known, the inverse-variance weighted least squares estimate is the best linear unbiased estimate. However, the conditional variance is rarely, if ever, known, so it is desirable to jointly estimate the variance components and the weighted parameter estimates when solving the MMEs.
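When G and R are known, however, the MME can be solved directly with dense linear algebra. The following is a minimal sketch with simulated data, not the implementation of any particular package:

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical small example: n observations, p fixed effects, q groups.
n, p, q = 100, 2, 5
X = np.column_stack([np.ones(n), rng.normal(size=n)])       # fixed-effects design
Z = rng.multinomial(1, [1 / q] * q, size=n).astype(float)   # group indicators
G = 0.5 * np.eye(q)   # covariance of u (assumed known here)
R = 1.0 * np.eye(n)   # covariance of the residuals (assumed known here)

beta_true = np.array([1.0, 2.0])
u_true = rng.multivariate_normal(np.zeros(q), G)
y = X @ beta_true + Z @ u_true + rng.multivariate_normal(np.zeros(n), R)

# Henderson's mixed model equations, built blockwise.
Rinv = np.linalg.inv(R)
Ginv = np.linalg.inv(G)
lhs = np.block([[X.T @ Rinv @ X, X.T @ Rinv @ Z],
                [Z.T @ Rinv @ X, Z.T @ Rinv @ Z + Ginv]])
rhs = np.concatenate([X.T @ Rinv @ y, Z.T @ Rinv @ y])

sol = np.linalg.solve(lhs, rhs)
beta_hat, u_hat = sol[:p], sol[p:]   # BLUE of beta, BLUP of u
</syntaxhighlight>

In practice G and R must themselves be estimated, which motivates the iterative methods described under Software below.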
==Choice of random effects structure==
One choice that analysts face with mixed models is which random effects (i.e., grouping variables, random intercepts, and random slopes) to include. One prominent recommendation in the context of confirmatory hypothesis testing is to adopt a "maximal" random effects structure, including all possible random effects justified by the experimental design, as a means to control Type I error rates.
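For a design with one within-subject predictor, the maximal structure includes both a random intercept and a random slope per grouping unit. A minimal sketch with the statsmodels package (the data are simulated; column names are illustrative):

<syntaxhighlight lang="python">
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical long-format data: outcome y, predictor x, grouping factor subject.
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "subject": np.repeat(np.arange(10), 20),
    "x": rng.normal(size=200),
})
u0 = rng.normal(scale=0.5, size=10)   # per-subject intercept deviations
u1 = rng.normal(scale=0.3, size=10)   # per-subject slope deviations
s = df["subject"].to_numpy()
df["y"] = 1.0 + u0[s] + (2.0 + u1[s]) * df["x"] + rng.normal(size=200)

# "Maximal" structure for one within-subject predictor:
# a random intercept and a random slope for x, varying by subject.
model = smf.mixedlm("y ~ x", data=df, groups=df["subject"], re_formula="~x")
result = model.fit()
print(result.summary())
</syntaxhighlight>

In lme4's formula syntax, the equivalent specification is y ~ x + (x | subject).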
==Software==
One method used to fit such mixed models is that of the expectation–maximization algorithm (EM), in which the variance components are treated as unobserved nuisance parameters in the joint likelihood. Currently, this is the method implemented in statistical software such as Python's statsmodels package, and, as an initial step only, in R's nlme package function lme(). The solution to the mixed model equations is a maximum likelihood estimate when the distribution of the errors is normal.
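To make the EM idea concrete, the following is a minimal sketch in the style of classical EM algorithms for variance components, for the simplest case G = \sigma_u^2 I and R = \sigma_e^2 I. It alternates conditional expectations of the random effects with variance updates; the function name and setup are illustrative, not any package's code:

<syntaxhighlight lang="python">
import numpy as np

def em_variance_components(y, X, Z, n_iter=200, tol=1e-8):
    """EM-style estimation for y = X b + Z u + e with
    u ~ N(0, s2u * I) and e ~ N(0, s2e * I) (simplest special case)."""
    n, q = len(y), Z.shape[1]
    s2u, s2e = 1.0, 1.0  # starting values
    for _ in range(n_iter):
        V = s2u * (Z @ Z.T) + s2e * np.eye(n)
        Vinv = np.linalg.inv(V)
        # GLS update of the fixed effects at the current variances.
        beta = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ y)
        r = y - X @ beta
        # E-step: conditional mean and covariance of u given y.
        u_hat = s2u * Z.T @ Vinv @ r
        u_cov = s2u * np.eye(q) - s2u**2 * (Z.T @ Vinv @ Z)
        # M-step: update variances from the conditional moments.
        e_hat = r - Z @ u_hat
        s2u_new = (u_hat @ u_hat + np.trace(u_cov)) / q
        s2e_new = (e_hat @ e_hat + s2e * n - s2e**2 * np.trace(Vinv)) / n
        converged = abs(s2u_new - s2u) + abs(s2e_new - s2e) < tol
        s2u, s2e = s2u_new, s2e_new
        if converged:
            break
    return beta, u_hat, s2u, s2e
</syntaxhighlight>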
There are several other methods to fit mixed models, including using the EM algorithm initially and then Newton–Raphson (used by R package nlme's lme(), SAS MIXED, and SPSS MIXED); penalized least squares, to get a profiled log likelihood depending only on the (low-dimensional) variance–covariance parameters of \boldsymbol{u}, i.e., its covariance matrix \boldsymbol{G}, followed by modern direct optimization of that reduced objective function (used by R's lme4 package lmer() and the Julia package MixedModels.jl); and direct optimization of the likelihood (used by, e.g., R's glmmTMB).
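The following is a minimal dense-matrix sketch of the penalized least squares / profiled log-likelihood idea, for a single variance ratio and the ML criterion. It mirrors the general approach of lme4 but is not its implementation; names and the one-parameter setup are illustrative (y, X, Z as in the earlier MME sketch):

<syntaxhighlight lang="python">
import numpy as np
from scipy.optimize import minimize_scalar

def profiled_deviance(log_lam, y, X, Z):
    """ML deviance profiled over beta and the residual variance,
    for the simplest case G = sigma^2 * lam^2 * I, R = sigma^2 * I."""
    lam = np.exp(log_lam)
    n, p = X.shape
    q = Z.shape[1]
    ZL = lam * Z
    # Penalized least squares: minimize ||y - X beta - ZL u||^2 + ||u||^2
    # via an augmented least squares system.
    A = np.vstack([np.hstack([ZL, X]),
                   np.hstack([np.eye(q), np.zeros((q, p))])])
    rhs = np.concatenate([y, np.zeros(q)])
    sol, *_ = np.linalg.lstsq(A, rhs, rcond=None)
    r2 = np.sum((rhs - A @ sol) ** 2)  # penalized residual sum of squares
    # Log-determinant term from the Cholesky factor of (ZL'ZL + I).
    L = np.linalg.cholesky(ZL.T @ ZL + np.eye(q))
    logdet = 2.0 * np.sum(np.log(np.diag(L)))
    return logdet + n * (1.0 + np.log(2.0 * np.pi * r2 / n))

# The outer optimization is over the low-dimensional variance
# parameters only (here a single scalar), e.g.:
# opt = minimize_scalar(profiled_deviance, args=(y, X, Z))
</syntaxhighlight>

Production implementations differ mainly in using sparse Cholesky factorizations and efficient update methods in place of the dense algebra shown here.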
Notably, while the canonical form proposed by Henderson is useful for theory, many popular software packages use a different formulation for numerical computation in order to take advantage of sparse matrix methods (e.g. lme4 and MixedModels.jl). In the context of Bayesian methods, the brms package provides a user-friendly interface for fitting mixed models in R using Stan, allowing for the incorporation of prior distributions and the estimation of posterior distributions. In Python, Bambi provides a similarly streamlined approach for fitting mixed effects models using PyMC.
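For example, the random-intercept-and-slope model from above could be fit in a Bayesian framework with Bambi roughly as follows (a sketch reusing the hypothetical data frame df from the statsmodels example; priors are left at Bambi's defaults):

<syntaxhighlight lang="python">
import bambi as bmb

# df has columns y, x, and subject, as in the statsmodels sketch above.
model = bmb.Model("y ~ x + (x | subject)", df)
results = model.fit(draws=1000)  # posterior sampling via PyMC
</syntaxhighlight>

==See also==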