In the case of
W1, we have c(x, y) = d(x, y), and there is a particularly simple way to state that a function is c-convex in this case: a function f is
c-convex iff it is Lipschitz, with
Lipschitz constant \leq 1. In this case, f^c = -f, so the Kantorovich duality states thatW_1 (\mu, \nu) = \sup \left\{ \left. \int_{M} f(x) \, \mathrm{d} (\mu - \nu) (x) \, \right| \text{ continuous } f : M \to \mathbb{R}, \operatorname{Lip} (f) \leq 1 \right\},This form shows that
W1 is an
integral probability metric. If there exists a f that reaches the supremum exactly, then such a function is called a
Kantorovich potential for this optimal transport problem. For example, for any k > 2, the optimal transport plan of moving an upside-down unit hemisphere centered at (0, 0, k) to a rightside-up unit hemisphere centered at (0, 0, 0) is simply moving it along the
z-direction. This is proven by using the Kantorovich potential f(x, y, z) = z.
Connection to Radon measure Compare this with the definition of the
Radon metric:\rho (\mu, \nu) := \sup \left\{ \left. \int_M f(x) \, \mathrm{d} (\mu - \nu) (x) \, \right| \text{ continuous } f : M \to [-1, 1] \right\}.If the metric
d of the metric space (
M,
d) is bounded by some constant
C, then2 W_1 (\mu, \nu) \leq C \rho (\mu, \nu),and so convergence in the Radon metric (identical to
total variation convergence when
M is a
Polish space) implies convergence in the Wasserstein metric, but not vice versa.
Geometric interpretation Monge's original insight was that, in the case where X is a Riemannian manifold with the geodesic distance metric, the duality of optimal transport is geometrically meaningful. Let \gamma be an optimal transport plan from \mu to \nu. To avoid complications and make the imagery clear, suppose that \gamma is a
deterministic plan. That is, it is equivalent to a transport
map T: \operatorname{supp}(\mu) \to \operatorname{supp}(\nu), so that any infinitesimal lump of coal at x is transported to T(x). The entire transport map can then be drawn as a family of geodesic curve segments in X connecting each x to its corresponding T(x). These are
transport rays. Monge noted that the transport rays do not intersect at an angle, because otherwise the plan is not optimal. Concretely, suppose that there exist x, x' such that x \to T(x) and x' \to T(x') intersect at an X-shape at some point y, then we can redirect x to T(x') and vice versa, and "pull apart" the X-shape into a )(-shape, and thus cost less in transport. Therefore, the transport rays make up a non-intersecting family of geodesic arcs.
Planar case Monge studied first the case where X = \R^2, in which case transport rays are line segments. He studied in particular the case where the transport rays cover a solid region of the plane. That is, we have a 1-parameter family of lines, a
line congruence. He showed that these are orthogonal to a 1-parameter family of curves. That is, there exists some partially defined f: \R^2 \to \R, such that each contour curve f^{-1}(k) is orthogonal to the transport rays, and furthermore, the curves are separated by their distance. That is, starting on some point z on the curve f^{-1}(k), and move along the transport ray that the point z is on for a distance \delta k, we would arrive at a point on the curve f^{-1}(k + \delta k). This is the Kantorovich potential. Any point in the thin slice can be moved to any other point in the other thin slice without changing the cost. All such plans are optimal. Furthermore, any other plan is suboptimal. In this way, the Kantorovich potential field entirely solves the problem. Each line in the line congruence divides the plane into two halves, such that the mass of \mu, \nu in the two halves are the same. The line congruence produces a caustic curve, and the involutes of the caustic curve are the contours of the Kantorovich potential field.
Spatial case Next, Monge noted that when X = \R^3, transport rays are line segments. He studied in particular the case where the transport rays cover a solid region of space. That is, we have a 2-parameter line congruence. He showed that these are orthogonal to a 1-parameter family of surfaces. That is, there exists some partially defined f: \R^3 \to \R, such that each contour surface f^{-1}(k) is orthogonal to the transport rays, and furthermore, the surfaces are separated by their distance. That is, starting on some point z on the surface f^{-1}(k), and move along the transport ray that the point z is on for a distance \delta k, we would arrive at a point on the surface f^{-1}(k + \delta k). This is the Kantorovich potential. Any infinitesimal circle in the transport rays sweeps out a thin tube in space, and it encloses one thin filament in \mu and another in \nu. Any point in one thin filament can slide along the tube to any other point in the other thin filament without changing the cost. All such plans are optimal. Furthermore, any other plan is suboptimal. In this way, the Kantorovich potential field entirely solves the problem. Unlike the planar case, a non-intersecting family of line congruences cannot in general be normal to any surface. For example, consider the standard
contact structure on \R^3, which can be understood as a field of infinitesimal planes. Now, perpendicular to every infinitesimal plane, draw a directed ray. This gives us a "twisted" line congruence. There is no surface perpendicular to the congruence, because there is no surface that can be tangent to every infinitesimal plane. However, Monge showed that in this particular case, this 2-parameter line congruence can be split into a 1-parameter family of 1-parameter line congruences, in such a way that each such 1-parameter line congruence is a
developable surface. In fact, there are two ways to split. Thus, the 2-parameter line congruence is generated as the grid of intersections between two 1-parameter families of developable surfaces. He showed that a line congruence satisfying such a condition
is normal to a 1-parameter family of surfaces, and thus he constructed the surfaces. In modern language, he showed that this is an
integrable foliation of space by lines. Each contour surface f^{-1}(k) intersects these two families of developable surfaces at two families of curves, and Monge named them "
lines of curvature". He would later study those of the
ellipsoid in 1795. Let the surfaces be contours of the equation f(x, y, z) = const. The lines of curvature can be computed from the equation, and 4 lines of curvature produce an infinitesimal square tube. The volumes of \mu and \nu cut by the square tube are equal. This is the
partial differential equation satisfied by the surface.
General case In general, at each point z on a transport ray x \to T(x), define the unit velocity vector v_z. This produces a vector field. The vector field cannot have curl in it, since if there is a curl, then effectively, some lump of coal is being transported in a cycle, which is suboptimal. Therefore, the vector field is
irrotational, thus locally integrable as the gradient field \nabla f of some function X \to\R. This corresponds to the general fact that an optimal transport plan is
c-cyclically monotonic. In this way, an optimal transport map produces the Kantorovich potential. Note that in general, the Kantorovich potential is not everywhere differentiable, and there may be singular points on its surface. For example, consider the transport from the unit circle in \R^2 to the origin. Its optimal transport rays are radii of the circle, and its Kantorovich potential has a sharp point at origin. However, it is still differentiable
generically, i.e. almost everywhere (like in
Sard's theorem). Conversely, given a continuous f : X \to \R such that almost everywhere it is differentiable with gradient \nabla f of unit length, it is a Kantorovich potential, and its gradient flow generates an optimal transport map. This duality between the transport
rays and the potential
field can be regarded as a
Huygens–Fresnel principle.
Economic interpretation Economically, this can be interpreted as follows. Consider a market of coals. The price of coal vary over space, so define the price function f: X \to \R, where f(x) is the price of coal at location x. The market may have opportunity for spatial
arbitrage, means that \exists x, x' \in X such that f(x) - f(x') > d(x, x'). An enterprising merchant can then buy coal at x' and sell coal at x, earning a net profit of f(x) - f(x') - d(x, x'). The price function is
c-convex iff it is free of spatial arbitrage. For example, a market of constant pricing f = c_0 has no spatial arbitrage, since the coal is the same price everywhere, so any movement of coal would waste transportation without earning any profit. Given such a market, a configuration of coal \mu has a total market-value of \int_X f d\mu, and two configurations \mu, \nu may have differing market-values \int_X f d(\mu - \nu). The two parts of duality then state: • W_1(\mu, \nu) \geq \int_X f d(\mu - \nu) for any arbitrage-free market pricing, because otherwise, one can purchase \nu , transport to \mu using an optimal transport plan at a cost of W_1(\mu, \nu) , then sell it, creating profit, so f is not arbitrage-free after all. • In the case of exact equality W_1(\mu, \nu) = \int_X f d(\mu - \nu) , the price f can be interpreted as the
shadow price of any optimal transport plan, and any transport plan that exactly breaks even is an optimal plan. The shadow price interpretation is Kantorovich's original understanding of the duality between optimal planning and market pricing. ==
W2 ==