The Cayley–Hamilton theorem is an immediate consequence of the existence of the Jordan normal form for matrices over algebraically closed fields. In this section, direct proofs are presented instead. As the examples above show, obtaining the statement of the Cayley–Hamilton theorem for an n \times n matrix A = \left(a_{ij}\right)_{i,j=1}^n requires two steps: first the coefficients c_i of the characteristic polynomial are determined by developing, as a polynomial in t, the determinant
\begin{align} p(t) & = \det(t I_n - A) = \begin{vmatrix}t-a_{1,1}&-a_{1,2}&\cdots&-a_{1,n} \\ -a_{2,1}&t-a_{2,2}&\cdots&-a_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ -a_{n,1}&-a_{n,2}& \cdots& t-a_{n,n} \end{vmatrix} \\[5pt] & = t^n+c_{n-1}t^{n-1}+\cdots+c_1t+c_0, \end{align}
and then these coefficients are used in a linear combination of powers of A that is equated to the n \times n zero matrix:
A^n+c_{n-1}A^{n-1} + \cdots + c_1 A + c_0 I_n = \begin{pmatrix} 0 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & 0 \end{pmatrix}.
The left-hand side can be worked out to an n \times n matrix whose entries are (enormous) polynomial expressions in the entries a_{ij} of A, so the Cayley–Hamilton theorem states that each of these n^2 expressions equals 0. For any fixed value of n, these identities can be obtained by tedious but straightforward algebraic manipulations. None of these computations, however, can show why the Cayley–Hamilton theorem should be valid for matrices of all possible sizes n, so a uniform proof for all n is needed.
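As a quick sanity check of the statement (not a proof), the two steps above can be carried out mechanically for a small example. The following minimal sketch assumes the SymPy library is available; the 2 × 2 matrix used is the same one that reappears in the examples further down.
<syntaxhighlight lang="python">
from sympy import Matrix, symbols, eye, zeros

t = symbols('t')
A = Matrix([[1, 2], [3, 4]])              # a small example matrix

# Step 1: expand det(t*I - A) to read off the coefficients c_i.
p = (t * eye(2) - A).det().expand()       # t**2 - 5*t - 2
coeffs = p.as_poly(t).all_coeffs()        # [1, -5, -2], highest power first

# Step 2: form the corresponding linear combination of powers of A.
n = len(coeffs) - 1
pA = zeros(2, 2)
for k, c in enumerate(coeffs):
    pA += c * A**(n - k)

print(p)    # the characteristic polynomial
print(pA)   # Matrix([[0, 0], [0, 0]]), as the theorem asserts for this A
</syntaxhighlight>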
===Preliminaries===
If a vector v of size n is an eigenvector of A with eigenvalue \lambda, in other words if A\cdot v=\lambda v, then
\begin{align} p(A)\cdot v & = A^n\cdot v+c_{n-1}A^{n-1}\cdot v+\cdots+c_1A\cdot v+c_0I_n\cdot v \\[6pt] & = \lambda^nv+c_{n-1}\lambda^{n-1}v+\cdots+c_1\lambda v+c_0 v=p(\lambda)v, \end{align}
which is the zero vector since p(\lambda)=0 (the eigenvalues of A are precisely the roots of p(t)). This holds for all possible eigenvalues \lambda, so the two matrices equated by the theorem certainly give the same (null) result when applied to any eigenvector. Now if A admits a basis of eigenvectors, in other words if A is diagonalizable, then the Cayley–Hamilton theorem must hold for A, since two matrices that give the same values when applied to each element of a basis must be equal:
A=XDX^{-1}, \quad D=\operatorname{diag}(\lambda_i), \quad i=1,2,\ldots,n
p_A(\lambda)=|\lambda I-A|=\prod_{i=1}^n (\lambda-\lambda_i)\equiv \sum_{k=0}^n c_k\lambda^k
p_A(A)=\sum_{k=0}^n c_k A^k=X p_A(D)X^{-1}=X C X^{-1}
C_{ii}=\sum_{k=0}^n c_k\lambda_i^k=\prod_{j=1}^n(\lambda_i-\lambda_j)=0, \qquad C_{i,j\neq i}=0
\therefore p_A(A)=XCX^{-1}=O .
Consider now the function e\colon M_n \to M_n which maps n \times n matrices to n \times n matrices given by the formula e(A)=p_A(A), i.e. which takes a matrix A and plugs it into its own characteristic polynomial. Not all matrices are diagonalizable, but for matrices with complex coefficients many of them are: the set D of diagonalizable complex square matrices of a given size is dense in the set of all such square matrices (for a matrix to be diagonalizable it suffices, for instance, that its characteristic polynomial not have any multiple roots). Now viewed as a function e\colon \C^{n^2}\to \C^{n^2} (since matrices have n^2 entries), we see that this function is continuous. This is true because the entries of the image of a matrix are given by polynomials in the entries of the matrix. Since
e(D) = \left\{\begin{pmatrix} 0 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & 0 \end{pmatrix}\right\}
and since the set D is dense, by continuity this function must map the entire set of n \times n complex matrices to the zero matrix. The Cayley–Hamilton theorem therefore holds for complex matrices, and consequently also for \Q- or \R-valued matrices (a numerical illustration of this continuity argument is sketched at the end of this subsection).
While this provides a valid proof, the argument is not very satisfactory, since the identities represented by the theorem do not in any way depend on the nature of the matrix (diagonalizable or not), nor on the kind of entries allowed (for matrices with real entries the diagonalizable ones do not form a dense set, and it seems strange that one would have to consider complex matrices to see that the Cayley–Hamilton theorem holds for them). We shall therefore now consider only arguments that prove the theorem directly for any matrix using algebraic manipulations only; these also have the benefit of working for matrices with entries in any commutative ring. There is a great variety of such proofs of the Cayley–Hamilton theorem, of which several will be given here. They vary in the amount of abstract algebraic notions required to understand the proof. The simplest proofs use just those notions needed to formulate the theorem (matrices, polynomials with numeric entries, determinants), but involve technical computations that render somewhat mysterious the fact that they lead precisely to the correct conclusion. It is possible to avoid such details, but at the price of involving more subtle algebraic notions: polynomials with coefficients in a non-commutative ring, or matrices with unusual kinds of entries.
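The continuity argument above can be illustrated numerically. In the sketch below (NumPy assumed), a non-diagonalizable nilpotent Jordan block is approached by diagonalizable perturbations with distinct eigenvalues; evaluating each matrix in its own characteristic polynomial stays at (numerically) zero all the way to the limit, which is consistent with, though of course no substitute for, the density argument.
<syntaxhighlight lang="python">
import numpy as np

def char_poly_at_self(M):
    """Evaluate the characteristic polynomial of M at M itself."""
    coeffs = np.poly(M)                 # [1, c_{n-1}, ..., c_0] for det(t*I - M)
    n = M.shape[0]
    return sum(c * np.linalg.matrix_power(M, n - k) for k, c in enumerate(coeffs))

N = np.array([[0.0, 1.0], [0.0, 0.0]])  # nilpotent Jordan block: not diagonalizable

for eps in (1e-1, 1e-3, 1e-6):
    A_eps = N + np.diag([eps, 2 * eps]) # distinct eigenvalues, hence diagonalizable
    print(eps, np.linalg.norm(char_poly_at_self(A_eps)))

print(np.linalg.norm(char_poly_at_self(N)))  # the limit matrix satisfies p(N) = 0 as well
</syntaxhighlight>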
===Adjugate matrices===
All proofs below use the notion of the adjugate matrix \operatorname{adj}(M) of an n \times n matrix M, the transpose of its cofactor matrix. This is a matrix whose coefficients are given by polynomial expressions in the coefficients of M (in fact, by certain (n-1) \times (n-1) determinants), in such a way that the following fundamental relations hold:
\operatorname{adj}(M)\cdot M=\det(M)I_n=M\cdot\operatorname{adj}(M)~.
These relations are a direct consequence of the basic properties of determinants: evaluation of the (i,j) entry of the matrix product on the left gives the expansion by column i of the determinant of the matrix obtained from M by replacing column i by a copy of column j, which is \det(M) if i=j and zero otherwise; the matrix product on the right is similar, but for expansions by rows.
Being a consequence of just algebraic expression manipulation, these relations are valid for matrices with entries in any commutative ring (commutativity must be assumed for determinants to be defined in the first place). This is important to note here, because these relations will be applied below for matrices with non-numeric entries such as polynomials.
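These fundamental relations are easy to check in a computer algebra system. The sketch below (SymPy assumed) verifies them for a 3 × 3 matrix of independent symbols, so the check is a polynomial identity in the entries rather than a numerical coincidence.
<syntaxhighlight lang="python">
from sympy import symbols, Matrix, eye, zeros

a = symbols('a0:9')
M = Matrix(3, 3, list(a))                 # a generic 3x3 matrix of symbolic entries
adjM = M.adjugate()                       # transpose of the cofactor matrix

# Both fundamental relations reduce to the zero matrix once the entries are expanded.
print((adjM * M - M.det() * eye(3)).expand() == zeros(3, 3))   # True
print((M * adjM - M.det() * eye(3)).expand() == zeros(3, 3))   # True
</syntaxhighlight>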
===A direct algebraic proof===
This proof uses just the kind of objects needed to formulate the Cayley–Hamilton theorem: matrices with polynomials as entries. The matrix t I_n-A whose determinant is the characteristic polynomial of A is such a matrix, and since polynomials form a commutative ring, it has an adjugate
B=\operatorname{adj}(tI_n-A).
Then, according to the right-hand fundamental relation of the adjugate, one has
(t I_n - A)B = \det(t I_n - A) I_n = p(t) I_n.
Since B is also a matrix with polynomials in t as entries, one can, for each i, collect the coefficients of t^i in each entry to form a matrix B_i of numbers, such that one has
B = \sum_{i = 0}^{n - 1} t^i B_i.
(The way the entries of B are defined makes clear that no powers higher than t^{n-1} occur.) While this looks like a polynomial with matrices as coefficients, we shall not consider such a notion; it is just a way to write a matrix with polynomial entries as a linear combination of constant matrices, and the coefficient t^i has been written to the left of the matrix to stress this point of view.
Now, one can expand the matrix product in our equation:
\begin{align} p(t) I_n &= (t I_n - A)B \\ &=(t I_n - A)\sum_{i = 0}^{n - 1} t^i B_i \\ &=\sum_{i = 0}^{n - 1} tI_n\cdot t^i B_i - \sum_{i = 0}^{n - 1} A\cdot t^i B_i \\ &=\sum_{i = 0}^{n - 1} t^{i + 1} B_i- \sum_{i = 0}^{n - 1} t^i AB_i \\ &=t^n B_{n - 1} + \sum_{i = 1}^{n - 1} t^i(B_{i - 1} - AB_i) - AB_0. \end{align}
Writing
p(t)I_n=t^nI_n+t^{n-1}c_{n-1}I_n+\cdots+tc_1I_n+c_0I_n,
one obtains an equality of two matrices with polynomial entries, written as linear combinations of constant matrices with powers of t as coefficients. Such an equality can hold only if in any matrix position the entry that is multiplied by a given power t^i is the same on both sides; it follows that the constant matrices with coefficient t^i in both expressions must be equal. Writing these equations for i from n down to 0, one finds
B_{n - 1} = I_n, \qquad B_{i - 1} - AB_i = c_i I_n\quad \text{for }1 \leq i \leq n-1, \qquad -A B_0 = c_0 I_n.
Finally, multiply the equation of the coefficients of t^i from the left by A^i, and sum up:
A^n B_{n-1} + \sum\limits_{i=1}^{n-1}\left( A^i B_{i-1} - A^{i+1}B_i\right) -A B_0 = A^n+c_{n-1} A^{n-1} + \cdots + c_1A + c_0I_n.
The left-hand side is a telescoping sum and cancels completely; the right-hand side equals p(A):
0 = p(A).
This completes the proof.
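The bookkeeping in this proof can be made concrete. The following sketch (SymPy assumed, same example matrix as before) extracts the coefficient matrices B_i from the adjugate of tI_n - A, checks the three families of equations obtained by comparing coefficients, and checks that the telescoping sum indeed yields p(A) = 0.
<syntaxhighlight lang="python">
from sympy import Matrix, symbols, eye, zeros, Poly

t = symbols('t')
A = Matrix([[1, 2], [3, 4]])
n = A.shape[0]

B = (t * eye(n) - A).adjugate()               # matrix with polynomial entries
p = (t * eye(n) - A).det().expand()
c = Poly(p, t).all_coeffs()[::-1]             # c[k] = coefficient of t**k, with c[n] = 1

# B_k = matrix of the coefficients of t**k in the entries of B
Bs = [B.applyfunc(lambda e, k=k: e.expand().coeff(t, k)) for k in range(n)]

print(Bs[n - 1] == eye(n))                                            # B_{n-1} = I_n
print(all(Bs[i - 1] - A * Bs[i] == c[i] * eye(n) for i in range(1, n)))
print(-A * Bs[0] == c[0] * eye(n))                                    # -A B_0 = c_0 I_n

# Multiplying the t**i equation by A**i on the left and summing telescopes to p(A) = 0.
pA = A**n
for i in range(n):
    pA += c[i] * A**i
print(pA == zeros(n, n))                                              # True
</syntaxhighlight>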
===A proof using polynomials with matrix coefficients===
This proof is similar to the first one, but tries to give meaning to the notion of polynomial with matrix coefficients that was suggested by the expressions occurring in that proof. This requires considerable care, since it is somewhat unusual to consider polynomials with coefficients in a non-commutative ring, and not all reasoning that is valid for commutative polynomials can be applied in this setting. Notably, while arithmetic of polynomials over a commutative ring models the arithmetic of polynomial functions, this is not the case over a non-commutative ring (in fact there is no obvious notion of polynomial function in this case that is closed under multiplication). So when considering polynomials in t with matrix coefficients, the variable t must not be thought of as an "unknown", but as a formal symbol that is to be manipulated according to given rules; in particular one cannot just set t to a specific value. (Addition, by contrast, causes no difficulty:
(f+g)(x) = \sum_i \left (f_i+g_i \right )x^i = \sum_i{f_i x^i} + \sum_i{g_i x^i} = f(x) + g(x).)
Let M(n,R) be the ring of n \times n matrices with entries in some ring R (such as the real or complex numbers) that has A as an element. Matrices whose entries are polynomials in t, such as t I_n - A or its adjugate B in the first proof, are elements of M(n,R[t]). By collecting like powers of t, such matrices can be written as "polynomials" in t with constant matrices as coefficients; write M(n,R)[t] for the set of such polynomials. Since this set is in bijection with M(n,R[t]), one defines arithmetic operations on it correspondingly; in particular multiplication is given by
\left( \sum_i M_i t^i \right) \!\!\left( \sum_j N_j t^j \right) = \sum_{i,j} (M_i N_j) t^{i+j},
respecting the order of the coefficient matrices from the two operands; obviously this gives a non-commutative multiplication. Thus, the identity
(t I_n - A)B = p(t) I_n
from the first proof can be viewed as one involving a multiplication of elements in M(n,R)[t].
At this point, it is tempting to simply set t equal to the matrix A, which makes the first factor on the left equal to the zero matrix, and the right-hand side equal to p(A); however, this is not an allowed operation when the coefficients do not commute. It is possible to define a "right-evaluation map" \operatorname{ev}_A\colon M(n,R)[t] \to M(n,R), which replaces each t^i by the matrix power A^i of A, where one stipulates that the power is always to be multiplied on the right to the corresponding coefficient. But this map is not a ring homomorphism: the right-evaluation of a product differs in general from the product of the right-evaluations. This is so because multiplication of polynomials with matrix coefficients does not model multiplication of expressions containing unknowns: a product Mt^i \cdot Nt^j = (M\cdot N) t^{i+j} is defined assuming that t commutes with N, but this may fail if t is replaced by the matrix A.
One can work around this difficulty in the particular situation at hand, since the above right-evaluation map does become a ring homomorphism if the matrix A is in the center of the ring of coefficients, so that it commutes with all the coefficients of the polynomials (the argument proving this is straightforward, exactly because commuting t with coefficients is now justified after evaluation). Now, A is not always in the center of M(n,R), but we may replace M(n,R) with a smaller ring provided it contains all the coefficients of the polynomials in question: I_n, A, and the coefficients B_i of the polynomial B. The obvious choice for such a subring is the centralizer Z of A, the subring of all matrices that commute with A; by definition A is in the center of Z. This centralizer obviously contains I_n and A, but one has to show that it contains the matrices B_i. To do this, one combines the two fundamental relations for adjugates, writing out the adjugate B as a polynomial:
\begin{align} \left(\sum_{i = 0}^m B_i t^i\right)\!(t I_n - A) &= (tI_n - A) \sum_{i = 0}^m B_i t^i \\ \sum_{i = 0}^m B_i t^{i + 1} - \sum_{i = 0}^m B_i A t^i &= \sum_{i = 0}^m B_i t^{i + 1} - \sum_{i = 0}^m A B_i t^i \\ \sum_{i = 0}^m B_i A t^i &= \sum_{i = 0}^m A B_i t^i . \end{align}
Equating the coefficients shows that for each i, we have A B_i = B_i A, as desired. Having found the proper setting in which \operatorname{ev}_A is indeed a homomorphism of rings, one can complete the proof as suggested above:
\begin{align} \operatorname{ev}_A\left(p(t)I_n\right) &= \operatorname{ev}_A((tI_n-A)B) \\[5pt] p(A) &= \operatorname{ev}_A(tI_n - A)\cdot \operatorname{ev}_A(B) \\[5pt] p(A) &= (AI_n-A) \cdot \operatorname{ev}_A(B) = O\cdot\operatorname{ev}_A(B)=O. \end{align}
This completes the proof.
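The failure of right-evaluation to respect products can be seen on a two-term example. In the sketch below (SymPy assumed), the coefficient matrices M and N are chosen ad hoc so that they do not commute with A; for f(t) = Mt and g(t) = N one compares ev_A(f·g) with ev_A(f)·ev_A(g).
<syntaxhighlight lang="python">
from sympy import Matrix

A = Matrix([[1, 2], [3, 4]])
M = Matrix([[0, 1], [0, 0]])      # an ad-hoc coefficient that does not commute with A
N = Matrix([[1, 0], [1, 1]])      # another ad-hoc coefficient

# f(t) = M*t, g(t) = N (a constant polynomial); their product is (M*N)*t.
# Right-evaluation at t = A puts the power of A to the right of each coefficient.
ev_f  = M * A                     # ev_A(f)
ev_g  = N                         # ev_A(g)
ev_fg = (M * N) * A               # ev_A(f*g)

print(ev_fg == ev_f * ev_g)       # False: ev_A is not multiplicative in general
</syntaxhighlight>
By contrast, the coefficient matrices B_i extracted in the sketch after the first proof all commute with A (they are, as the next subsection shows, polynomials in A), which is exactly why passing to the centralizer of A repairs the argument.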
===A synthesis of the first two proofs===
In the first proof, one was able to determine the coefficients B_i of B based on the right-hand fundamental relation for the adjugate only. In fact the first n equations derived can be interpreted as determining the quotient B of the Euclidean division of the polynomial p(t)I_n on the left by the monic polynomial tI_n-A, while the final equation expresses the fact that the remainder is zero. This division is performed in the ring of polynomials with matrix coefficients. Indeed, even over a non-commutative ring, Euclidean division by a monic polynomial P is defined, and always produces a unique quotient and remainder with the same degree condition as in the commutative case, provided it is specified at which side one wishes P to be a factor (here that is to the left). To see that quotient and remainder are unique (which is the important part of the statement here), it suffices to write PQ+r = PQ'+r' as P(Q-Q') = r'-r and observe that since P is monic, P(Q-Q') cannot have a degree less than that of P, unless Q=Q'.
But the dividend p(t)I_n and divisor tI_n-A used here both lie in the subring (R[A])[t], where R[A] is the subring of the matrix ring M(n,R) generated by A: the R-linear span of all powers of A. Therefore, the Euclidean division can in fact be performed within that commutative polynomial ring, and of course it then gives the same quotient B and remainder 0 as in the larger ring; in particular this shows that B in fact lies in (R[A])[t]. But, in this commutative setting, it is valid to set t to A in the equation
p(t)I_n=(tI_n-A)B;
in other words, to apply the evaluation map
\operatorname{ev}_A\colon (R[A])[t]\to R[A],
which is a ring homomorphism. This gives
p(A)=0\cdot\operatorname{ev}_A(B)=0,
just as in the second proof, as desired.
In addition to proving the theorem, the above argument tells us that the coefficients B_i of B are polynomials in A, while from the second proof we only knew that they lie in the centralizer Z of A; in general Z is a larger subring than R[A], and not necessarily commutative. In particular the constant term B_0=\operatorname{adj}(-A) lies in R[A]. Since A is an arbitrary square matrix, this proves that \operatorname{adj}(A) can always be expressed as a polynomial in A (with coefficients that depend on A).
In fact, the equations found in the first proof allow successively expressing B_{n-1}, \ldots, B_1, B_0 as polynomials in A, which leads to the identity
\operatorname{adj}(-A) = \sum_{i=1}^{n} c_i A^{i-1} = A^{n-1}+c_{n-1}A^{n-2}+\cdots+c_2A+c_1I_n,
valid for all n \times n matrices, where
p(t)=t^n+c_{n-1}t^{n-1}+\cdots+c_1t+c_0
is the characteristic polynomial of A (with c_n=1). Note that this identity also implies the statement of the Cayley–Hamilton theorem: one may move \operatorname{adj}(-A) to the right-hand side, multiply the resulting equation (on the left or on the right) by A, and use the fact that
-A\cdot \operatorname{adj}(-A) = \operatorname{adj}(-A)\cdot (-A) = \det(-A) I_n = c_0I_n.
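Assuming the identity as reconstructed above, it can be verified symbolically for a generic matrix; the sketch below (SymPy assumed) checks adj(-A) = c_1 I + c_2 A + ... + c_n A^{n-1}, with c_n = 1, for a 3 × 3 matrix of independent symbols.
<syntaxhighlight lang="python">
from sympy import symbols, Matrix, eye, zeros, Poly

t = symbols('t')
a = symbols('a0:9')
A = Matrix(3, 3, list(a))                     # a generic 3x3 matrix
n = 3

p = (t * eye(n) - A).det().expand()
c = Poly(p, t).all_coeffs()[::-1]             # c[0], ..., c[n], with c[n] = 1

rhs = zeros(n, n)
for i in range(1, n + 1):
    rhs += c[i] * A**(i - 1)                  # c_1*I + c_2*A + ... + c_n*A**(n-1)

print(((-A).adjugate() - rhs).expand() == zeros(n, n))   # True
</syntaxhighlight>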
===A proof using matrices of endomorphisms===
As was mentioned above, the matrix p(A) in the statement of the theorem is obtained by first evaluating the determinant and then substituting the matrix A for t; doing that substitution into the matrix t I_n - A before evaluating the determinant is not meaningful. Nevertheless, it is possible to give an interpretation where p(A) is obtained directly as the value of a certain determinant, but this requires a more complicated setting, one of matrices over a ring in which one can interpret both the entries A_{i,j} of A, and all of A itself. One could take for this the ring M(n,R) of n \times n matrices over R, where the entry A_{i,j} is realised as A_{i,j} I_n, and A as itself. But considering matrices with matrices as entries might cause confusion with block matrices, which is not intended, as that gives the wrong notion of determinant (recall that the determinant of a matrix is defined as a sum of products of its entries, and in the case of a block matrix this is generally not the same as the corresponding sum of products of its blocks!). It is clearer to distinguish A from the endomorphism \varphi of an n-dimensional vector space V (or free R-module if R is not a field) defined by it in a basis e_1, \ldots, e_n, and to take matrices over the ring \operatorname{End}(V) of all such endomorphisms. Then \varphi is a possible matrix entry, while A designates the element of M(n,\operatorname{End}(V)) whose (i,j) entry is the endomorphism of scalar multiplication by A_{i,j}; similarly I_n will be interpreted as an element of M(n,\operatorname{End}(V)). However, since \operatorname{End}(V) is not a commutative ring, no determinant is defined on M(n,\operatorname{End}(V)); this can only be done for matrices over a commutative subring of \operatorname{End}(V). Now the entries of the matrix \varphi I_n-A all lie in the subring R[\varphi] generated by the identity and \varphi, which is commutative. Then a determinant map M(n,R[\varphi]) \to R[\varphi] is defined, and \det(\varphi I_n - A) evaluates to the value p(\varphi) of the characteristic polynomial of A at \varphi (this holds independently of the relation between A and \varphi); the Cayley–Hamilton theorem states that p(\varphi) is the null endomorphism.
In this form, the following proof can be obtained from that of the more general statement related to the Nakayama lemma (one takes for the ideal in that proposition the whole ring R). The fact that A is the matrix of \varphi in the basis e_1, \ldots, e_n means that
\varphi(e_i) = \sum_{j = 1}^n A_{j,i} e_j \quad\text{for }i=1,\ldots,n.
One can interpret these as n components of one equation in V^n, whose members can be written using the matrix-vector product M(n,\operatorname{End}(V)) \times V^n \to V^n that is defined as usual, but with individual entries \psi\in\operatorname{End}(V) and v\in V being "multiplied" by forming \psi(v); this gives
\varphi I_n \cdot E = A^\operatorname{tr}\cdot E,
where E\in V^n is the element whose i-th component is e_i (in other words it is the basis e_1, \ldots, e_n of V written as a column of vectors). Writing this equation as
(\varphi I_n-A^\operatorname{tr})\cdot E = 0\in V^n,
one recognizes the transpose of the matrix \varphi I_n-A considered above, and its determinant (as an element of R[\varphi]) is also p(\varphi). To derive from this equation that p(\varphi)=0\in\operatorname{End}(V), one left-multiplies by the adjugate matrix of \varphi I_n-A^\operatorname{tr}, which is defined in the matrix ring M(n,R[\varphi]), giving
\begin{align} 0 &= \operatorname{adj}(\varphi I_n-A^\operatorname{tr}) \cdot \left((\varphi I_n-A^\operatorname{tr})\cdot E\right) \\[1ex] &= \left(\operatorname{adj}(\varphi I_n-A^\operatorname{tr}) \cdot (\varphi I_n-A^\operatorname{tr})\right) \cdot E \\[1ex] &= \left(\det(\varphi I_n - A^\operatorname{tr})I_n\right) \cdot E \\[1ex] &= (p(\varphi)I_n)\cdot E; \end{align}
the associativity of matrix-matrix and matrix-vector multiplication used in the first step is a purely formal property of those operations, independent of the nature of the entries. Now component i of this equation says that p(\varphi)(e_i) = 0\in V; thus p(\varphi) vanishes on all the e_i, and since these elements generate V it follows that p(\varphi)=0\in\operatorname{End}(V), completing the proof.
One additional fact that follows from this proof is that the matrix A whose characteristic polynomial is taken need not be identical to the value \varphi substituted into that polynomial; it suffices that \varphi be an endomorphism of V satisfying the initial equations
\varphi(e_i) = \sum_j A_{j,i} e_j
for some sequence of elements e_1,\ldots,e_n that generate V (which space might have smaller dimension than n, or in case the ring R is not a field it might not be a free module at all).
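This closing remark can be illustrated with a concrete (hand-picked, hence purely illustrative) choice: below, φ is represented by a 2 × 2 matrix B acting on V = Q², and three vectors that merely generate V (they are not a basis) play the role of e_1, e_2, e_3; the 3 × 3 matrix A recording φ(e_i) = Σ_j A_{j,i} e_j then still satisfies p_A(φ) = 0, even though A and B have different sizes. SymPy is assumed.
<syntaxhighlight lang="python">
from sympy import Matrix, symbols, eye, zeros, Poly

t = symbols('t')

B = Matrix([[1, 1], [0, 1]])                            # the endomorphism phi, as a 2x2 matrix
E = [Matrix([1, 0]), Matrix([0, 1]), Matrix([1, 1])]    # three generators of V, not a basis

# A records phi(e_i) = sum_j A[j, i] * e_j  (one valid choice for B and E above)
A = Matrix([[1, 0, 1],
            [0, 0, 0],
            [0, 1, 1]])

# check that A really encodes the action of phi on the chosen generators
ok = all(B * E[i] == sum((A[j, i] * E[j] for j in range(3)), zeros(2, 1)) for i in range(3))
print(ok)                                               # True

# characteristic polynomial of the 3x3 matrix A, evaluated at the 2x2 matrix B
p = (t * eye(3) - A).det().expand()
c = Poly(p, t).all_coeffs()[::-1]
pB = zeros(2, 2)
for k in range(len(c)):
    pB += c[k] * B**k
print(pB == zeros(2, 2))                                # True: p_A(phi) = 0
</syntaxhighlight>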
A bogus "proof": One persistent elementary but
incorrect argument for the theorem is to "simply" take the definition p(\lambda) = \det(\lambda I_n - A) and substitute for , obtaining p(A)=\det(A I_n - A) = \det(A - A) = \det(\mathbf{0}) = 0. There are many ways to see why this argument is wrong. First, in the Cayley–Hamilton theorem, is an
matrix. However, the right hand side of the above equation is the value of a determinant, which is a
scalar. So they cannot be equated unless (i.e. is just a scalar). Second, in the expression \det(\lambda I_n - A), the variable λ actually occurs at the diagonal entries of the matrix \lambda I_n - A. To illustrate, consider the characteristic polynomial in the previous example again: \det\!\begin{pmatrix}\lambda-1&-2\\-3&\lambda-4\end{pmatrix}. If one substitutes the entire matrix for in those positions, one obtains \det\!\begin{pmatrix} \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} - 1 & -2 \\ -3 &\begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} - 4\end{pmatrix}, in which the "matrix" expression is simply not a valid one. Note, however, that if scalar multiples of identity matrices instead of scalars are subtracted in the above, i.e. if the substitution is performed as \det\!\begin{pmatrix} \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} - I_2 & -2I_2 \\ -3I_2 & \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} - 4I_2 \end{pmatrix}, then the determinant is indeed zero, but the expanded matrix in question does not evaluate to A I_n-A; nor can its determinant (a scalar) be compared to
p(
A) (a matrix). So the argument that p(A) = \det(A I_n - A) = 0 still does not apply. Actually, if such an argument holds, it should also hold when other
multilinear forms instead of determinant is used. For instance, if we consider the
permanent function and define q(\lambda) = \operatorname{perm}(\lambda I_n - A), then by the same argument, we should be able to "prove" that . But this statement is demonstrably wrong: in the 2-dimensional case, for instance, the permanent of a matrix is given by \operatorname{perm}\!\begin{pmatrix} a & b \\ c & d \end{pmatrix} = ad + bc. So, for the matrix in the previous example, \begin{align} q(\lambda) & = \operatorname{perm}(\lambda I_2 - A) = \operatorname{perm}\!\begin{pmatrix} \lambda - 1 & -2 \\ -3 & \lambda-4 \end{pmatrix} \\[6pt] & = (\lambda - 1)(\lambda - 4) + (-2)(-3) = \lambda^2 - 5\lambda + 10. \end{align} Yet one can verify that q(A) = A^2 - 5A + 10I_2 = 12I_2 \neq 0. One of the proofs for Cayley–Hamilton theorem above bears some similarity to the argument that p(A) = \det(AI_n-A) = 0. By introducing a matrix with non-numeric coefficients, one can actually let live inside a matrix entry, but then A I_n is not equal to , and the conclusion is reached differently.
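The permanent counterexample is easy to check mechanically; the sketch below (SymPy assumed) writes out the 2 × 2 permanent by hand rather than relying on any built-in, and reproduces q(A) = 12 I_2.
<syntaxhighlight lang="python">
from sympy import Matrix, symbols, eye, expand

lam = symbols('lambda')
A = Matrix([[1, 2], [3, 4]])

def perm2(M):
    """Permanent of a 2x2 matrix: a*d + b*c (no sign on the off-diagonal term)."""
    return M[0, 0] * M[1, 1] + M[0, 1] * M[1, 0]

q = expand(perm2(lam * eye(2) - A))
print(q)                                  # lambda**2 - 5*lambda + 10

# substituting A for lambda in q does NOT give the zero matrix
qA = A**2 - 5 * A + 10 * eye(2)
print(qA)                                 # Matrix([[12, 0], [0, 12]]) = 12*I_2, not 0
</syntaxhighlight>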
===Proofs using methods of abstract algebra===
Basic properties of Hasse–Schmidt derivations on the exterior algebra A = \bigwedge M of some R-module M (supposed to be free and of finite rank) have been used to prove the Cayley–Hamilton theorem.
===A combinatorial proof===
A proof based on developing the Leibniz formula for the characteristic polynomial was given by Straubing, and a generalization was given using the trace monoid theory of Foata and Cartier.

==Abstraction and generalizations==