The goal of the forward algorithm is to compute the joint probability p(x_t,y_{1:t}), where for notational convenience we have abbreviated x(t) as x_t and (y(1), y(2), ..., y(t)) as y_{1:t}. Once the joint probability p(x_t,y_{1:t}) is computed, the other probabilities p(x_t|y_{1:t}) and p(y_{1:t}) are easily obtained. Both the state x_t and the observation y_t are assumed to be discrete, finite random variables. The hidden Markov model's state transition probabilities p(x_t|x_{t-1}), observation/emission probabilities p(y_t|x_t), and initial prior probability p(x_0) are assumed to be known, and the sequence of observations y_{1:t} is assumed to be given. Computing p(x_t,y_{1:t}) naively would require
marginalizing over all possible state sequences \{x_{1:t-1}\}, the number of which grows exponentially with t. Instead, the forward algorithm takes advantage of the
conditional independence rules of the
hidden Markov model (HMM) to perform the calculation recursively.

To demonstrate the recursion, let

::\alpha(x_t) = p(x_t,y_{1:t}) = \sum_{x_{t-1}}p(x_t,x_{t-1},y_{1:t}).

Using the chain rule to expand p(x_t,x_{t-1},y_{1:t}), we can then write

::\alpha(x_t) = \sum_{x_{t-1}}p(y_t|x_t,x_{t-1},y_{1:t-1})p(x_t|x_{t-1},y_{1:t-1})p(x_{t-1},y_{1:t-1}).

Because y_t is conditionally independent of everything but x_t, and x_t is conditionally independent of everything but x_{t-1}, this simplifies to

::\alpha(x_t) = p(y_t|x_t)\sum_{x_{t-1}}p(x_t|x_{t-1})\alpha(x_{t-1}).

Thus, since p(y_t|x_t) and p(x_t|x_{t-1}) are given by the model's emission distributions and transition probabilities, which are assumed to be known, one can quickly calculate \alpha(x_t) from \alpha(x_{t-1}) and avoid incurring exponential computation time.

The recursion formula given above can be written in a more compact form. Let a_{ij}=p(x_t=i|x_{t-1}=j) be the transition probabilities and b_{ij}=p(y_t=i|x_t=j) be the emission probabilities; then

::\mathbf{\alpha}_t = \mathbf{b}_t^T \odot \mathbf{A} \mathbf{\alpha}_{t-1}

where \mathbf{A} = [a_{ij}] is the transition probability matrix, \mathbf{b}_t is the i-th row of the emission probability matrix \mathbf{B} = [b_{ij}], corresponding to the actual observation y_t = i at time t, and \mathbf{\alpha}_t = [\alpha(x_t=1),\ldots, \alpha(x_t=n)]^T is the alpha vector. The \odot denotes the Hadamard (element-wise) product between the transpose of \mathbf{b}_t and \mathbf{A} \mathbf{\alpha}_{t-1}.

The initial condition is set in accordance with the prior probability over x_0 as

::\alpha(x_0) = p(y_0|x_0)p(x_0),

assuming an observation y_0 is available at time 0; if the observations begin at time 1, the initial condition is simply \alpha(x_0) = p(x_0).

Once the joint probability \alpha(x_t) = p(x_t,y_{1:t}) has been computed using the forward algorithm, we can easily obtain the related joint probability p(y_{1:t}) as

::p(y_{1:t}) = \sum_{x_t} p(x_t, y_{1:t}) = \sum_{x_t} \alpha(x_t)

and the required conditional probability p(x_t|y_{1:t}) as

::p(x_t|y_{1:t}) = \frac{p(x_t,y_{1:t})}{p(y_{1:t})} = \frac{\alpha(x_t)}{\sum_{x_t} \alpha(x_t)}.

Once the
conditional probability has been calculated, we can also find the point estimate of x_t. For instance, the maximum a posteriori (MAP) estimate of x_t is given by

::\widehat{x}_t^{MAP} = \arg \max_{x_t} \; p(x_t|y_{1:t}) = \arg \max_{x_t} \; \alpha(x_t),

while the minimum mean squared error (MMSE) estimate of x_t, which is meaningful when the states take numerical values, is given by

::\widehat{x}_t^{MMSE} = \mathbb{E}[x_t|y_{1:t}] = \sum_{x_t} x_t p(x_t|y_{1:t}) = \frac{\sum_{x_t} x_t \alpha(x_t)}{\sum_{x_t} \alpha(x_t)}.

The forward algorithm is easily modified to account for observations from variants of the hidden Markov model as well, such as the Markov jump linear system.

==Pseudocode==
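As a concrete illustration of the recursion and of how p(y_{1:t}) and p(x_t|y_{1:t}) fall out of the alpha vectors, here is a minimal Python sketch. The two-state model parameters below are illustrative assumptions, not part of the algorithm; observations are taken to begin at time 1, so the initial alpha vector is just the prior.

```python
# Forward algorithm for a discrete HMM, implementing the recursion
# alpha_t(i) = p(y_t | x_t=i) * sum_j p(x_t=i | x_{t-1}=j) * alpha_{t-1}(j)

def forward(A, B, prior, observations):
    """Return the list of alpha vectors, one per observation.

    A[i][j]  = p(x_t = i | x_{t-1} = j)   (transition matrix, columns sum to 1)
    B[k][j]  = p(y_t = k | x_t = j)       (emission matrix, columns sum to 1)
    prior[j] = p(x_0 = j)
    observations: sequence of observation indices y_1, ..., y_t
    """
    n = len(prior)
    alphas = []
    alpha = prior[:]  # alpha(x_0) = p(x_0): no observation at time 0
    for y in observations:
        # matrix-vector product A @ alpha, then element-wise (Hadamard)
        # product with the row of B selected by the observation y
        alpha = [B[y][i] * sum(A[i][j] * alpha[j] for j in range(n))
                 for i in range(n)]
        alphas.append(alpha)
    return alphas

# Illustrative two-state model (assumed values for demonstration only)
A = [[0.7, 0.4],
     [0.3, 0.6]]
B = [[0.9, 0.2],
     [0.1, 0.8]]
prior = [0.5, 0.5]

alphas = forward(A, B, prior, [0, 0, 1])
alpha_t = alphas[-1]
likelihood = sum(alpha_t)                      # p(y_{1:t})
posterior = [a / likelihood for a in alpha_t]  # p(x_t | y_{1:t})
```

Note that each step costs O(n^2) for n states, so a sequence of length t is processed in O(n^2 t) time rather than the O(n^t) of naive enumeration over state sequences. In practice the alpha vectors are often renormalized at each step to avoid floating-point underflow on long sequences.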
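The point estimates described above can likewise be read off the final alpha vector. A short sketch, using hypothetical alpha values and numeric state labels (the latter are required for the MMSE estimate to be meaningful):

```python
# Point estimates from an alpha vector alpha(x_t) = p(x_t, y_{1:t}).
# The alpha values below are assumed purely for illustration.

alpha_t = [0.03, 0.12]   # alpha(x_t = 0), alpha(x_t = 1)
states = [0, 1]          # numeric state labels (needed for MMSE)

normaliser = sum(alpha_t)                      # p(y_{1:t})
posterior = [a / normaliser for a in alpha_t]  # p(x_t | y_{1:t})

# MAP estimate: the normaliser is constant in x_t, so maximising
# the posterior is the same as maximising alpha itself.
x_map = max(states, key=lambda i: alpha_t[i])

# MMSE estimate: posterior mean of the (numeric) state.
x_mmse = sum(x * p for x, p in zip(states, posterior))
```

Here the MAP estimate picks the single most probable state, while the MMSE estimate averages over the posterior and generally falls between state labels.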