==Relationship with total derivative==

The gradient is closely related to the total derivative (total differential) df: they are transpose (dual) to each other. Using the convention that vectors in \R^n are represented by column vectors, and that covectors (linear maps \R^n \to \R) are represented by row vectors, the gradient \nabla f and the derivative df are expressed as a column and a row vector, respectively, with the same components, but transpose of each other:
\nabla f(p) = \begin{bmatrix}\frac{\partial f}{\partial x_1}(p) \\ \vdots \\ \frac{\partial f}{\partial x_n}(p) \end{bmatrix} ; \qquad df_p = \begin{bmatrix}\frac{\partial f}{\partial x_1}(p) & \cdots & \frac{\partial f}{\partial x_n}(p) \end{bmatrix} .
While these both have the same components, they differ in what kind of mathematical object they represent: at each point, the derivative is a cotangent vector, a linear form (or covector) which expresses how much the (scalar) output changes for a given infinitesimal change in (vector) input, while at each point, the gradient is a tangent vector, which represents an infinitesimal change in (vector) input. In symbols, the gradient is an element of the tangent space at a point, \nabla f(p) \in T_p \R^n, while the derivative is a map from the tangent space to the real numbers, df_p \colon T_p \R^n \to \R. The tangent spaces at each point of \R^n can be "naturally" identified with the vector space \R^n itself, and similarly the cotangent space at each point can be naturally identified with the dual vector space (\R^n)^* of covectors; thus the value of the gradient at a point can be thought of as a vector in the original \R^n, not just as a tangent vector.

Computationally, given a tangent vector, the vector can be multiplied by the derivative (as matrices), which is equal to taking the dot product with the gradient:
(df_p)(v) = \begin{bmatrix}\frac{\partial f}{\partial x_1}(p) & \cdots & \frac{\partial f}{\partial x_n}(p) \end{bmatrix} \begin{bmatrix}v_1 \\ \vdots \\ v_n\end{bmatrix} = \sum_{i=1}^n \frac{\partial f}{\partial x_i}(p) v_i = \begin{bmatrix}\frac{\partial f}{\partial x_1}(p) \\ \vdots \\ \frac{\partial f}{\partial x_n}(p) \end{bmatrix} \cdot \begin{bmatrix}v_1 \\ \vdots \\ v_n\end{bmatrix} = \nabla f(p) \cdot v .
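The identity (df_p)(v) = \nabla f(p) \cdot v is easy to check numerically. The following is a minimal sketch in Python with NumPy, using a hypothetical function f(x, y) = x^2 y + y chosen purely for illustration: applying the derivative as a 1 × n matrix and taking the dot product with the gradient give the same number.

```python
import numpy as np

# Hypothetical example function f(x, y) = x**2 * y + y, with
# partial derivatives df/dx = 2*x*y and df/dy = x**2 + 1.
def grad_f(p):
    x, y = p
    return np.array([2 * x * y, x**2 + 1])

p = np.array([1.0, 2.0])   # base point
v = np.array([0.5, 1.0])   # tangent vector at p

df_p = grad_f(p).reshape(1, -1)               # derivative as a 1 x 2 row vector
by_matrix = (df_p @ v.reshape(-1, 1)).item()  # matrix product df_p * v
by_dot = np.dot(grad_f(p), v)                 # dot product grad f(p) . v

print(by_matrix, by_dot)  # both print 4.0
```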
==Differential or (exterior) derivative==

The best linear approximation to a differentiable function f : \R^n \to \R at a point x in \R^n is a linear map from \R^n to \R which is often denoted by df_x or Df(x) and called the differential or total derivative of f at x. The function df, which maps x to df_x, is called the total differential or exterior derivative of f and is an example of a differential 1-form.

Much as the derivative of a function of a single variable represents the slope of the tangent to the graph of the function, the directional derivative of a function in several variables represents the slope of the tangent hyperplane in the direction of the vector.

The gradient is related to the differential by the formula
(\nabla f)_x \cdot v = df_x(v)
for any v \in \R^n, where \cdot is the dot product: taking the dot product of a vector with the gradient is the same as taking the directional derivative along the vector.

If \R^n is viewed as the space of (dimension n) column vectors (of real numbers), then one can regard df as the row vector with components
\left( \frac{\partial f}{\partial x_1}, \dots, \frac{\partial f}{\partial x_n}\right),
so that df_x(v) is given by matrix multiplication. Assuming the standard Euclidean metric on \R^n, the gradient is then the corresponding column vector, that is,
(\nabla f)_i = df^\mathsf{T}_i .
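Since df_x is the best linear approximation at x, the difference quotient (f(x + hv) - f(x))/h should converge to df_x(v) = (\nabla f)_x \cdot v as h \to 0. Below is a short numerical sketch of this, using a hypothetical function f(x) = x_1^2 + 3 x_1 x_2 assumed only for illustration.

```python
import numpy as np

# Hypothetical example: f(x) = x1**2 + 3*x1*x2, whose gradient is
# (2*x1 + 3*x2, 3*x1).
def f(x):
    return x[0]**2 + 3 * x[0] * x[1]

def grad_f(x):
    return np.array([2 * x[0] + 3 * x[1], 3 * x[0]])

x = np.array([1.0, -2.0])
v = np.array([0.3, 0.1])

exact = np.dot(grad_f(x), v)            # df_x(v) computed via the gradient
for h in (1e-1, 1e-3, 1e-5):
    approx = (f(x + h * v) - f(x)) / h  # one-sided difference quotient
    print(h, approx, exact)             # approx tends to exact as h -> 0
```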
==Linear approximation to a function==

The best linear approximation to a function can be expressed in terms of the gradient, rather than the derivative. The gradient of a function f from the Euclidean space \R^n to \R at any particular point x_0 in \R^n characterizes the best linear approximation to f at x_0. The approximation is as follows:
f(x) \approx f(x_0) + (\nabla f)_{x_0}\cdot(x-x_0)
for x close to x_0, where (\nabla f)_{x_0} is the gradient of f computed at x_0, and the dot denotes the dot product on \R^n. This equation is equivalent to the first two terms in the multivariable Taylor series expansion of f at x_0.
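As a concrete illustration, the sketch below compares f with this tangent-plane approximation at a point; the function f(x, y) = e^x \sin y is a hypothetical choice, not taken from the text. Consistent with the Taylor-series interpretation, the error shrinks roughly quadratically as x approaches x_0.

```python
import numpy as np

# Hypothetical example: f(x, y) = exp(x) * sin(y), with gradient
# (exp(x) * sin(y), exp(x) * cos(y)).
def f(p):
    return np.exp(p[0]) * np.sin(p[1])

def grad_f(p):
    return np.array([np.exp(p[0]) * np.sin(p[1]),
                     np.exp(p[0]) * np.cos(p[1])])

x0 = np.array([0.0, np.pi / 4])
for step in (0.5, 0.05, 0.005):
    x = x0 + step                                # a nearby point
    linear = f(x0) + np.dot(grad_f(x0), x - x0)  # first-order approximation
    print(step, abs(f(x) - linear))              # error shrinks like step**2
```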
==Relationship with Fréchet derivative==

Let U be an open set in \R^n. If the function f : U \to \R is differentiable, then the differential of f is the Fréchet derivative of f. Thus \nabla f is a function from U to the space \R^n such that
\lim_{h\to 0} \frac{|f(x+h)-f(x) -\nabla f(x)\cdot h|}{\|h\|} = 0,
where \cdot is the dot product.

As a consequence, the usual properties of the derivative hold for the gradient, though the gradient is not a derivative itself, but rather dual to the derivative:
;Linearity
:The gradient is linear in the sense that if f and g are two real-valued functions differentiable at the point a \in \R^n, and \alpha and \beta are two constants, then \alpha f + \beta g is differentiable at a, and moreover
:\nabla\left(\alpha f+\beta g\right)(a) = \alpha \nabla f(a) + \beta\nabla g(a).
;Product rule
:If f and g are real-valued functions differentiable at a point a \in \R^n, then the product rule asserts that the product fg is differentiable at a, and
:\nabla (fg)(a) = f(a)\nabla g(a) + g(a)\nabla f(a).
;Chain rule
:Suppose that f : A \to \R is a real-valued function defined on a subset A of \R^n, and that f is differentiable at a point a. There are two forms of the chain rule applying to the gradient. First, suppose that the function g is a parametric curve; that is, a function g : I \to \R^n maps a subset I \subset \R into \R^n. If g is differentiable at a point c \in I such that g(c) = a, then
:(f\circ g)'(c) = \nabla f(a)\cdot g'(c),
:where \circ is the composition operator: (f \circ g)(x) = f(g(x)). More generally, if instead I \subset \R^k, then the following holds:
:\nabla (f\circ g)(c) = \big(Dg(c)\big)^\mathsf{T} \big(\nabla f(a)\big),
:where \big(Dg\big)^\mathsf{T} denotes the transpose Jacobian matrix, as checked numerically in the sketch below.
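The following sketch verifies the transpose-Jacobian form of the chain rule using hypothetical maps g : \R^2 \to \R^2 and f : \R^2 \to \R (chosen only for illustration), comparing \big(Dg(c)\big)^\mathsf{T} \nabla f(g(c)) against central finite differences of f \circ g.

```python
import numpy as np

# Hypothetical maps: f(y) = y1**2 * y2 and g(c) = (c1 + c2**2, 3*c1*c2).
def f(y):                      # f : R^2 -> R
    return y[0]**2 * y[1]

def grad_f(y):
    return np.array([2 * y[0] * y[1], y[0]**2])

def g(c):                      # g : R^2 -> R^2
    return np.array([c[0] + c[1]**2, 3 * c[0] * c[1]])

def jacobian_g(c):             # Dg(c), a 2 x 2 matrix
    return np.array([[1.0, 2 * c[1]],
                     [3 * c[1], 3 * c[0]]])

c = np.array([1.0, 2.0])
via_chain_rule = jacobian_g(c).T @ grad_f(g(c))  # Dg(c)^T grad f(g(c))

# Central finite differences of f∘g, component by component.
h = 1e-6
fd = np.array([(f(g(c + h * e)) - f(g(c - h * e))) / (2 * h)
               for e in np.eye(2)])
print(via_chain_rule, fd)      # the two should agree closely
```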
:For the second form of the chain rule, suppose that h : I \to \R is a real-valued function on a subset I of \R, and that h is differentiable at the point f(a) \in I. Then
:\nabla (h\circ f)(a) = h'\big(f(a)\big)\nabla f(a).

==Further properties and applications==