There are several methods for actually computing the QR decomposition, such as the
Gram–Schmidt process,
Householder transformations, or
Givens rotations. Each has a number of advantages and disadvantages.
===Using the Gram–Schmidt process===
Consider the
Gram–Schmidt process applied to the columns of the full column rank matrix {{nowrap|A = \begin{bmatrix}\mathbf{a}_1 & \cdots & \mathbf{a}_n\end{bmatrix},}} with
inner product \langle\mathbf{v}, \mathbf{w}\rangle = \mathbf{v}^\textsf{T} \mathbf{w} (or \langle\mathbf{v}, \mathbf{w}\rangle = \mathbf{v}^\dagger \mathbf{w} for the complex case). Define the
projection:
:\operatorname{proj}_{\mathbf{u}}\mathbf{a} = \frac{\left\langle\mathbf{u}, \mathbf{a}\right\rangle}{\left\langle\mathbf{u}, \mathbf{u}\right\rangle}{\mathbf{u}}
then:
:\begin{align} \mathbf{u}_1 &= \mathbf{a}_1, & \mathbf{e}_1 &= \frac{\mathbf{u}_1}{\|\mathbf{u}_1\|} \\ \mathbf{u}_2 &= \mathbf{a}_2 - \operatorname{proj}_{\mathbf{u}_1} \mathbf{a}_2, & \mathbf{e}_2 &= \frac{\mathbf{u}_2}{\|\mathbf{u}_2\|} \\ \mathbf{u}_3 &= \mathbf{a}_3 - \operatorname{proj}_{\mathbf{u}_1} \mathbf{a}_3 - \operatorname{proj}_{\mathbf{u}_2} \mathbf{a}_3, & \mathbf{e}_3 &= \frac{\mathbf{u}_3}{\|\mathbf{u}_3\|} \\ & \;\; \vdots & & \;\; \vdots \\ \mathbf{u}_k &= \mathbf{a}_k - \sum_{j=1}^{k-1}\operatorname{proj}_{\mathbf{u}_j} \mathbf{a}_k, & \mathbf{e}_k &= \frac{\mathbf{u}_k}{\|\mathbf{u}_k\|} \end{align}
We can now express the vectors \mathbf{a}_i over our newly computed orthonormal basis:
:\begin{align} \mathbf{a}_1 &= \left\langle\mathbf{e}_1, \mathbf{a}_1\right\rangle \mathbf{e}_1 \\ \mathbf{a}_2 &= \left\langle\mathbf{e}_1, \mathbf{a}_2\right\rangle \mathbf{e}_1 + \left\langle\mathbf{e}_2, \mathbf{a}_2\right\rangle \mathbf{e}_2 \\ \mathbf{a}_3 &= \left\langle\mathbf{e}_1, \mathbf{a}_3\right\rangle \mathbf{e}_1 + \left\langle\mathbf{e}_2, \mathbf{a}_3\right\rangle \mathbf{e}_2 + \left\langle\mathbf{e}_3, \mathbf{a}_3\right\rangle \mathbf{e}_3 \\ &\;\;\vdots \\ \mathbf{a}_k &= \sum_{j=1}^k \left\langle \mathbf{e}_j, \mathbf{a}_k \right\rangle \mathbf{e}_j \end{align}
where {{nowrap|\left\langle\mathbf{e}_i, \mathbf{a}_i\right\rangle = \left\|\mathbf{u}_i\right\|.}} This can be written in matrix form:
:A = QR
where:
:Q = \begin{bmatrix}\mathbf{e}_1 & \cdots & \mathbf{e}_n\end{bmatrix}
and
:R = \begin{bmatrix} \langle\mathbf{e}_1, \mathbf{a}_1\rangle & \langle\mathbf{e}_1, \mathbf{a}_2\rangle & \langle\mathbf{e}_1, \mathbf{a}_3\rangle & \cdots & \langle\mathbf{e}_1, \mathbf{a}_n\rangle \\ 0 & \langle\mathbf{e}_2, \mathbf{a}_2\rangle & \langle\mathbf{e}_2, \mathbf{a}_3\rangle & \cdots & \langle\mathbf{e}_2, \mathbf{a}_n\rangle \\ 0 & 0 & \langle\mathbf{e}_3, \mathbf{a}_3\rangle & \cdots & \langle\mathbf{e}_3, \mathbf{a}_n\rangle \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & \langle\mathbf{e}_n, \mathbf{a}_n\rangle \end{bmatrix}.
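The procedure above translates directly into code. Below is a minimal NumPy sketch of classical Gram–Schmidt QR (the function name gram_schmidt_qr is ours; a robust implementation would also guard against rank deficiency):
<syntaxhighlight lang="python">
import numpy as np

def gram_schmidt_qr(A):
    """QR decomposition via the classical Gram-Schmidt process.

    Assumes A is a real matrix with full column rank. This is an
    illustrative sketch, not a numerically robust routine.
    """
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for k in range(n):
        u = A[:, k].copy()
        for j in range(k):
            R[j, k] = Q[:, j] @ A[:, k]   # R[j, k] = <e_j, a_k>
            u -= R[j, k] * Q[:, j]        # subtract proj_{u_j} a_k
        R[k, k] = np.linalg.norm(u)       # <e_k, a_k> = ||u_k||
        Q[:, k] = u / R[k, k]
    return Q, R
</syntaxhighlight>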
====Example====
Consider the decomposition of
: A = \begin{bmatrix} 12 & -51 & 4 \\ 6 & 167 & -68 \\ -4 & 24 & -41 \end{bmatrix}.
Recall that an orthogonal matrix Q has the property {{nowrap|Q^\textsf{T} Q = I.}} Then, we can calculate Q by means of Gram–Schmidt as follows:
: \begin{align} U = \begin{bmatrix} \mathbf u_1 & \mathbf u_2 & \mathbf u_3 \end{bmatrix} &= \begin{bmatrix} 12 & -69 & -58/5 \\ 6 & 158 & 6/5 \\ -4 & 30 & -33 \end{bmatrix}; \\ Q = \begin{bmatrix} \frac{\mathbf u_1}{\|\mathbf u_1\|} & \frac{\mathbf u_2}{\|\mathbf u_2\|} & \frac{\mathbf u_3}{\|\mathbf u_3\|} \end{bmatrix} &= \begin{bmatrix} 6/7 & -69/175 & -58/175 \\ 3/7 & 158/175 & 6/175 \\ -2/7 & 6/35 & -33/35 \end{bmatrix}. \end{align}
Thus, we have
: \begin{align} Q^\textsf{T} A &= Q^\textsf{T}Q\,R = R; \\ R &= Q^\textsf{T}A = \begin{bmatrix} 14 & 21 & -14 \\ 0 & 175 & -70 \\ 0 & 0 & 35 \end{bmatrix}. \end{align}
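Assuming the gram_schmidt_qr sketch above, this example can be checked numerically:
<syntaxhighlight lang="python">
A = np.array([[12., -51.,   4.],
              [ 6., 167., -68.],
              [-4.,  24., -41.]])
Q, R = gram_schmidt_qr(A)
print(np.round(R))            # [[14, 21, -14], [0, 175, -70], [0, 0, 35]]
print(np.allclose(Q @ R, A))  # True
</syntaxhighlight>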
===Relation to RQ decomposition===
The RQ decomposition transforms a matrix A into the product of an upper triangular matrix R (also known as right-triangular) and an orthogonal matrix Q. The only difference from the QR decomposition is the order of these matrices. QR decomposition is Gram–Schmidt orthogonalization of the columns of A, started from the first column; RQ decomposition is Gram–Schmidt orthogonalization of the rows of A, started from the last row.
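In practice, an RQ decomposition is often computed by reusing a QR routine on a flipped matrix. The sketch below shows this trick for a square real matrix (the function name rq is ours; SciPy also ships a general-purpose scipy.linalg.rq):
<syntaxhighlight lang="python">
import numpy as np

def rq(A):
    """RQ decomposition A = R @ Q of a square matrix via QR.

    Uses the common flip trick: QR-decompose (P A)^T, where P reverses
    the row order, then un-flip the factors. Sketch only.
    """
    A = np.asarray(A, dtype=float)
    Qh, Rh = np.linalg.qr(A[::-1, :].T)  # QR of (P A)^T
    Q = Qh.T[::-1, :]                    # P Qh^T: orthogonal
    R = Rh.T[::-1, :][:, ::-1]           # P Rh^T P: upper triangular
    return R, Q
</syntaxhighlight>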
====Advantages and disadvantages====
The Gram–Schmidt process is inherently numerically unstable. While the application of the projections has an appealing geometric analogy to orthogonalization, the orthogonalization itself is prone to numerical error, so the computed columns of Q can drift far from orthogonality. A significant advantage is the ease of implementation.
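The instability is easy to observe on an ill-conditioned matrix. A small demonstration, assuming the gram_schmidt_qr sketch above (the Hilbert matrix is used only as a convenient ill-conditioned test case):
<syntaxhighlight lang="python">
import numpy as np
from scipy.linalg import hilbert

A = hilbert(10)                 # severely ill-conditioned 10x10 matrix
Q, _ = gram_schmidt_qr(A)
print(np.linalg.norm(Q.T @ Q - np.eye(10)))    # large: orthogonality is lost

Qh, _ = np.linalg.qr(A)         # Householder-based LAPACK routine
print(np.linalg.norm(Qh.T @ Qh - np.eye(10)))  # ~1e-15: orthogonality preserved
</syntaxhighlight>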
===Using Householder reflections===
A Householder reflection (or Householder transformation) is a transformation that takes a vector and reflects it about some plane or hyperplane. We can use this operation to calculate the QR factorization of an m-by-n matrix A with {{nowrap|m \ge n.}}
Q can be used to reflect a vector in such a way that all coordinates but one disappear. Let \mathbf{x} be an arbitrary real m-dimensional column vector of A such that \|\mathbf{x}\| = |\alpha| for a scalar α.
α should get the same sign as the k-th coordinate of {{nowrap|\mathbf{x},}} where x_k is to be the pivot coordinate after which all entries are 0 in matrix A's final upper triangular form. If the algorithm is implemented using floating-point arithmetic, then α should get the opposite sign to avoid loss of significance (for example, when \mathbf{x} is almost collinear with \mathbf{e}_1, \|\mathbf{u}\| becomes "small" and \mathbf{u} / \|\mathbf{u}\| is numerically unstable; the extreme case is {{nowrap|\|\mathbf{u}\| = 0,}} which causes the division to result in NaN). In the complex case, set
:\alpha = -e^{i \arg x_k} \|\mathbf{x}\|
and substitute transposition by conjugate transposition in the construction of Q below.

Then, where \mathbf{e}_1 is the vector {{nowrap|\begin{bmatrix} 1 & 0 & \cdots & 0 \end{bmatrix}^\textsf{T},}} \|\cdot\| is the Euclidean norm and I is an m \times m identity matrix, set
: \begin{align} \mathbf{u} &= \mathbf{x} - \alpha\mathbf{e}_1, \\ \mathbf{v} &= \frac{\mathbf{u}}{\|\mathbf{u}\|}, \\ Q &= I - 2 \mathbf{v}\mathbf{v}^\textsf{T}. \end{align}
Or, if A is complex,
: Q = I - 2\mathbf{v}\mathbf{v}^\dagger.
Q is an m-by-m Householder matrix, which is both symmetric and orthogonal (Hermitian and unitary in the complex case), and
: Q\mathbf{x} = \begin{bmatrix} \alpha \\ 0 \\ \vdots \\ 0 \end{bmatrix}.
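In code, a single reflector can be built as in the following NumPy sketch (householder_reflector is our name; it follows the floating-point sign convention above, real case only):
<syntaxhighlight lang="python">
import numpy as np

def householder_reflector(x):
    """Householder matrix Q with Q @ x = [alpha, 0, ..., 0]^T (real case).

    alpha takes the sign opposite to x[0], as recommended above, to
    avoid loss of significance when x is nearly parallel to e_1.
    """
    x = np.asarray(x, dtype=float)
    alpha = -np.copysign(np.linalg.norm(x), x[0])
    e1 = np.zeros_like(x)
    e1[0] = 1.0
    u = x - alpha * e1
    v = u / np.linalg.norm(u)        # unit normal of the reflecting hyperplane
    return np.eye(len(x)) - 2.0 * np.outer(v, v)
</syntaxhighlight>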
This can be used to gradually transform an m-by-n matrix A to upper triangular form. First, we multiply A with the Householder matrix Q_1 we obtain when we choose the first matrix column for \mathbf{x}. This results in a matrix Q_1 A with zeros in the left column (except for the first row).
: Q_1A = \begin{bmatrix} \alpha_1 & \star & \cdots & \star \\ 0 & & & \\ \vdots & & A' & \\ 0 & & & \end{bmatrix}
This can be repeated for A′ (obtained from Q_1 A by deleting the first row and first column), resulting in a Householder matrix Q′_2. Note that Q′_2 is smaller than Q_1. Since we really want it to operate on Q_1 A instead of A′, we need to expand it to the upper left, filling in a 1, or in general:
:Q_k = \begin{bmatrix} I_{k-1} & 0 \\ 0 & Q_k' \end{bmatrix}.
After t iterations of this process, where {{nowrap|t = \min(m - 1, n),}}
:R = Q_t \cdots Q_2 Q_1 A
is an upper triangular matrix. So, with
:\begin{align} Q^\textsf{T} &= Q_t \cdots Q_2 Q_1, \\ Q &= Q_1^\textsf{T} Q_2^\textsf{T} \cdots Q_t^\textsf{T}, \end{align}
A = QR is a QR decomposition of A.
This method has greater numerical stability than the Gram–Schmidt method above. In numerical tests the computed factors Q_c and R_c satisfy
:\frac{\|Q R - Q_c R_c\|_\infty}{\|A\|_\infty} = O(\varepsilon)
at machine precision. Also, orthogonality is preserved:
:\|Q_c^\mathsf{T} Q_c - I\|_\infty = O(\varepsilon).
However, the accuracy of Q_c and R_c decreases with the condition number of A:
:\|Q - Q_c\|_\infty = O(\varepsilon\,\kappa_\infty(A)), \quad \frac{\|R - R_c\|_\infty}{\|R\|_\infty} = O(\varepsilon\,\kappa_\infty(A)).
For a well-conditioned example ({{nowrap|n = 4000,}} {{nowrap|\kappa_\infty(A) \approx 3\times10^{3}}}):
:\frac{\|Q R - Q_c R_c\|_\infty}{\|A\|_\infty} \approx 1.6\times10^{-15}, \quad \|Q - Q_c\|_\infty \approx 1.6\times10^{-15},
:\frac{\|R - R_c\|_\infty}{\|R\|_\infty} \approx 4.3\times10^{-14}, \quad \|Q_c^\mathsf{T}Q_c - I\|_\infty \approx 1.1\times10^{-13}.
In an ill-conditioned test ({{nowrap|n = 4000,}} {{nowrap|\kappa_\infty(A) \approx 4\times10^{18}}}):
:\frac{\|Q R - Q_c R_c\|_\infty}{\|A\|_\infty} \approx 1.3\times10^{-15}, \quad \|Q - Q_c\|_\infty \approx 5.2\times10^{-4},
:\frac{\|R - R_c\|_\infty}{\|R\|_\infty} \approx 1.2\times10^{-4}, \quad \|Q_c^\mathsf{T}Q_c - I\|_\infty \approx 1.1\times10^{-13}.
Summing the number of floating point multiplications in the k-th step of the QR decomposition by Householder transformations over all steps (for a square matrix of size n), the complexity of the algorithm is given by
:\frac{2}{3}n^3 + n^2 + \frac{1}{3}n - 2 = O\left(n^3\right).
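Putting the pieces together gives a minimal sketch of the full procedure, using the householder_reflector helper above (a real implementation would store the vectors \mathbf{v} rather than forming each Q_k explicitly):
<syntaxhighlight lang="python">
def householder_qr(A):
    """QR decomposition by Householder reflections (real m-by-n, m >= n)."""
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    R = A.copy()
    Q = np.eye(m)
    for k in range(min(m - 1, n)):
        # Reflector for column k of the trailing submatrix, embedded
        # into the identity: Q_k = diag(I, Q'_k), as in the text.
        Qk = np.eye(m)
        Qk[k:, k:] = householder_reflector(R[k:, k])
        R = Qk @ R        # R = Q_t ... Q_2 Q_1 A
        Q = Q @ Qk.T      # Q = Q_1^T Q_2^T ... Q_t^T
    return Q, R
</syntaxhighlight>
Since each Q_k is symmetric, the transpose in the accumulation of Q is redundant; it is kept only to mirror the formula above.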
====Example====
Let us calculate the decomposition of
: A = \begin{bmatrix} 12 & -51 & 4 \\ 6 & 167 & -68 \\ -4 & 24 & -41 \end{bmatrix}.
First, we need to find a reflection that transforms the first column of matrix
A, vector {{nowrap|\mathbf{a}_1 = \begin{bmatrix} 12 & 6 & -4 \end{bmatrix}^\textsf{T},}} into {{nowrap|\left\|\mathbf{a}_1\right\| \mathbf{e}_1 = \begin{bmatrix} \alpha & 0 & 0\end{bmatrix}^\textsf{T}.}} Now,
: \mathbf{u} = \mathbf{x} - \alpha\mathbf{e}_1,
and
: \mathbf{v} = \frac{\mathbf{u}}{\|\mathbf{u}\|}.
Here,
: \alpha = 14 and \mathbf{x} = \mathbf{a}_1 = \begin{bmatrix} 12 & 6 & -4 \end{bmatrix}^\textsf{T}.
Therefore
: \mathbf{u} = \begin{bmatrix} -2 & 6 & -4 \end{bmatrix}^\textsf{T} = 2 \begin{bmatrix} -1 & 3 & -2 \end{bmatrix}^\textsf{T}
and {{nowrap|\mathbf{v} = \frac{1}{\sqrt{14}}\begin{bmatrix} -1 & 3 & -2 \end{bmatrix}^\textsf{T},}} and then
: \begin{align} Q_1 ={} &I - \frac{2}{\sqrt{14}\sqrt{14}} \begin{bmatrix} -1 \\ 3 \\ -2 \end{bmatrix} \begin{bmatrix} -1 & 3 & -2 \end{bmatrix} \\ ={} &I - \frac{1}{7}\begin{bmatrix} 1 & -3 & 2 \\ -3 & 9 & -6 \\ 2 & -6 & 4 \end{bmatrix} \\ ={} &\begin{bmatrix} 6/7 & 3/7 & -2/7 \\ 3/7 & -2/7 & 6/7 \\ -2/7 & 6/7 & 3/7 \end{bmatrix}. \end{align}
Now observe:
:Q_1A = \begin{bmatrix} 14 & 21 & -14 \\ 0 & -49 & -14 \\ 0 & 168 & -77 \end{bmatrix},
so we already have almost a triangular matrix. We only need to zero the (3, 2) entry. Take the (1, 1)
minor, and then apply the process again to
:A' = M_{11} = \begin{bmatrix} -49 & -14 \\ 168 & -77 \end{bmatrix}.
By the same method as above, we obtain the matrix of the Householder transformation
:Q_2 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & -7/25 & 24/25 \\ 0 & 24/25 & 7/25 \end{bmatrix}
after performing a direct sum with 1 to make sure the next step in the process works properly. Now, we find
:Q = Q_1^\textsf{T} Q_2^\textsf{T} = \begin{bmatrix} 6/7 & -69/175 & 58/175 \\ 3/7 & 158/175 & -6/175 \\ -2/7 & 6/35 & 33/35 \end{bmatrix}.
Or, to four decimal digits,
:\begin{align} Q &= Q_1^\textsf{T} Q_2^\textsf{T} = \begin{bmatrix} 0.8571 & -0.3943 & 0.3314 \\ 0.4286 & 0.9029 & -0.0343 \\ -0.2857 & 0.1714 & 0.9429 \end{bmatrix} \\ R &= Q_2 Q_1 A = Q^\textsf{T} A = \begin{bmatrix} 14 & 21 & -14 \\ 0 & 175 & -70 \\ 0 & 0 & -35 \end{bmatrix}. \end{align}
The matrix Q is orthogonal and R is upper triangular, so {{nowrap|A = QR}} is the required QR decomposition.
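Assuming the householder_qr sketch above (whose sign convention for α is the opposite of the one used in this worked example, so corresponding rows of R and columns of Q may be negated):
<syntaxhighlight lang="python">
A = np.array([[12., -51.,   4.],
              [ 6., 167., -68.],
              [-4.,  24., -41.]])
Q, R = householder_qr(A)
print(np.allclose(Q @ R, A))   # True
print(np.round(np.abs(R)))     # |R| = [[14, 21, 14], [0, 175, 70], [0, 0, 35]]
</syntaxhighlight>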
====Advantages and disadvantages====
The use of Householder transformations is inherently the simplest of the numerically stable QR decomposition algorithms due to the use of reflections as the mechanism for producing zeroes in the R matrix. However, the Householder reflection algorithm is bandwidth-heavy and difficult to parallelize, as every reflection that produces a new zero element changes the entirety of both the Q and R matrices.
====Parallel implementation of Householder QR====
The Householder QR method can be implemented in parallel with algorithms such as the TSQR algorithm (which stands for Tall Skinny QR). This algorithm can be applied when the matrix A has {{nowrap|m \gg n.}} It uses a binary reduction tree to compute local Householder QR decompositions at each node in the forward pass, and reconstitutes the Q matrix in the backward pass. The binary tree structure aims to decrease the amount of communication between processors and so increase performance.
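The reduction idea can be illustrated in a few lines of NumPy. This is a serial, one-level sketch (the names tsqr_r and blocks are ours); only the R factor is computed, and the backward pass that reconstitutes Q is omitted:
<syntaxhighlight lang="python">
import numpy as np

def tsqr_r(A, blocks=4):
    """R factor of a tall-skinny A via a TSQR-style reduction.

    Each row block is factored independently (in parallel, in a real
    implementation); the stacked local R factors are then factored
    once more. The result agrees with the R of A up to row signs.
    """
    # Local QRs of the row blocks (the leaves of the reduction tree).
    Rs = [np.linalg.qr(blk)[1] for blk in np.array_split(A, blocks)]
    # One reduction step: QR of the stacked local R factors.
    return np.linalg.qr(np.vstack(Rs))[1]
</syntaxhighlight>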
===Using Givens rotations===
QR decompositions can also be computed with a series of Givens rotations. Each rotation zeroes an element in the subdiagonal of the matrix, forming the R matrix. The concatenation of all the Givens rotations forms the orthogonal Q matrix. In practice, Givens rotations are not actually performed by building a whole matrix and doing a matrix multiplication. A Givens rotation procedure is used instead, which does the equivalent of the sparse Givens matrix multiplication without the extra work of handling the sparse elements. The Givens rotation procedure is useful in situations where only relatively few off-diagonal elements need to be zeroed, and is more easily parallelized than Householder transformations.
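A minimal real-valued sketch of such a procedure (the helper names are ours; this uses one common elimination order, working up each column from the bottom):
<syntaxhighlight lang="python">
import numpy as np

def givens(a, b):
    """Return (c, s) such that [[c, s], [-s, c]] @ [a, b]^T = [r, 0]^T."""
    r = np.hypot(a, b)
    return (1.0, 0.0) if r == 0.0 else (a / r, b / r)

def givens_qr(A):
    """QR decomposition by Givens rotations.

    Each rotation touches only two rows of R (and two columns of Q),
    which is why no full rotation matrix ever needs to be formed.
    """
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    R, Q = A.copy(), np.eye(m)
    for j in range(n):                  # zero the subdiagonal, column by column
        for i in range(m - 1, j, -1):   # work upward from the bottom row
            c, s = givens(R[i - 1, j], R[i, j])
            G = np.array([[c, s], [-s, c]])
            R[[i - 1, i], :] = G @ R[[i - 1, i], :]     # update two rows of R
            Q[:, [i - 1, i]] = Q[:, [i - 1, i]] @ G.T   # accumulate Q
    return Q, R
</syntaxhighlight>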
====Example====
Let us calculate the decomposition of
: A = \begin{bmatrix} 12 & -51 & 4 \\ 6 & 167 & -68 \\ -4 & 24 & -41 \end{bmatrix}.
First, we need to form a rotation matrix that will zero the lowermost left element, {{nowrap|1=a_{31} = -4.}} We form this matrix using the Givens rotation method, and call the matrix G_1. We will first rotate the vector {{nowrap|\begin{bmatrix} 12 & -4 \end{bmatrix},}} to point along the x axis. This vector has an angle {{nowrap|\theta = \arctan\left(\frac{-(-4)}{12}\right).}} We create the orthogonal Givens rotation matrix G_1:
:\begin{align} G_1 &= \begin{bmatrix} \cos(\theta) & 0 & -\sin(\theta) \\ 0 & 1 & 0 \\ \sin(\theta) & 0 & \cos(\theta) \end{bmatrix} \\ &\approx \begin{bmatrix} 0.94868 & 0 & -0.31622 \\ 0 & 1 & 0 \\ 0.31622 & 0 & 0.94868 \end{bmatrix} \end{align}
And the result of G_1A now has a zero in the a_{31} element.
:G_1A \approx \begin{bmatrix} 12.64911 & -55.97231 & 16.76007 \\ 6 & 167 & -68 \\ 0 & 6.64078 & -37.6311 \end{bmatrix}
We can similarly form Givens matrices G_2 and G_3, which will zero the sub-diagonal elements a_{21} and {{nowrap|a_{32},}} forming a triangular matrix R. The orthogonal matrix Q^\textsf{T} is formed from the product of all the Givens matrices {{nowrap|Q^\textsf{T} = G_3 G_2 G_1.}} Thus, we have {{nowrap|G_3 G_2 G_1 A = Q^\textsf{T} A = R,}} and the QR decomposition is {{nowrap|A = QR.}}
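Assuming the givens_qr sketch above (which uses a different elimination order than this example, so the individual rotations differ while the final factors agree up to signs):
<syntaxhighlight lang="python">
A = np.array([[12., -51.,   4.],
              [ 6., 167., -68.],
              [-4.,  24., -41.]])
Q, R = givens_qr(A)
print(np.allclose(Q @ R, A))            # True
print(np.allclose(np.tril(R, -1), 0))   # R is upper triangular
</syntaxhighlight>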
====Advantages and disadvantages====
The QR decomposition via Givens rotations is the most involved to implement, as the ordering of the rows required to fully exploit the algorithm is not trivial to determine. However, it has a significant advantage in that each new zero element a_{ij} affects only the row with the element to be zeroed (i) and a row above (j). This makes the Givens rotation algorithm more bandwidth-efficient and parallelizable than the Householder reflection technique.
===Using fast matrix multiplication===
It is possible to compute the QR decomposition in a fast way with the use of fast matrix multiplication algorithms, in time O(n^\omega), where \omega is the exponent of matrix multiplication (currently {{nowrap|\omega \approx 2.37}}).

==Connection to a determinant or a product of eigenvalues==