Let \beta be the radix of the floating-point system and p the precision. Consider several easy cases first: • If x is zero then x - y = -y, and if y is zero then x - y = x, so the result is trivial because floating-point negation is always exact. • If x = y the result is zero and thus exact. • If x then we must also have y/2 \leq x so y . In this case, x - y = -(-x - -y), so the result follows from the theorem restricted to x, y \geq 0. • If x \leq y, we can write x - y = -(y - x) with x/2 \leq y \leq 2 x, so the result follows from the theorem restricted to x \geq y. For the rest of the proof, assume 0
without loss of generality. Write x, y > 0 in terms of their positive integral significands s_x, s_y \leq \beta^p - 1 and minimal exponents e_x, e_y: \begin{align} x &= s_x \cdot \beta^{e_x - p + 1} \\ y &= s_y \cdot \beta^{e_y - p + 1} \end{align} Note that x and y may be subnormal—we do not assume s_x, s_y \geq \beta^{p - 1}. The subtraction gives: \begin{align} x - y &= s_x \cdot \beta^{e_x - p + 1} - s_y \cdot \beta^{e_y - p + 1} \\ &= s_x \beta^{e_x - e_y} \cdot \beta^{e_y - p + 1} - s_y \cdot \beta^{e_y - p + 1} \\ &= (s_x \beta^{e_x - e_y} - s_y) \cdot \beta^{e_y - p + 1}. \end{align} Let s' = s_x \beta^{e_x - e_y} - s_y. Since 0 we have: • e_y \leq e_x, so e_x - e_y \geq 0, from which we can conclude \beta^{e_x - e_y} is an integer and therefore so is s' = s_x \beta^{e_x - e_y} - s_y; and • x - y > 0, so s' > 0. Further, since x \leq 2 y, we have x - y \leq y, so that s' \cdot \beta^{e_y - p + 1} = x - y \leq y = s_y \cdot \beta^{e_y - p + 1} which implies that 0 Hence x - y = s' \cdot \beta^{e_y - p + 1}, \quad \text{for} \quad 0 so x - y is a floating-point number. Note: Even if x and y are normal,
i.e., s_x, s_y \geq \beta^{p - 1}, we cannot prove that s' \geq \beta^{p - 1} and therefore cannot prove that x - y is also normal. For example, the difference of the two smallest positive normal floating-point numbers x = (\beta^{p - 1} + 1) \cdot \beta^{e_{\mathrm{min}} - p + 1} and y = \beta^{p - 1} \cdot \beta^{e_{\mathrm{min}} - p + 1} is x - y = 1 \cdot \beta^{e_{\mathrm{min}} - p + 1} which is necessarily subnormal. In floating-point number systems without
subnormal numbers, such as CPUs in nonstandard flush-to-zero mode instead of the standard gradual underflow, the Sterbenz lemma does not apply. == Relation to catastrophic cancellation ==