The essence of directed information is causal conditioning. The probability of x^n causally conditioned on y^n is defined as

:P(x^n||y^n) \triangleq \prod_{i=1}^n P(x_i|x^{i-1},y^{i}).

This is similar to the chain rule for conventional conditioning,

:P(x^n|y^n) = \prod_{i=1}^n P(x_i|x^{i-1},y^{n}),

except that one conditions on "past" and "present" symbols y^{i} rather than on all symbols y^{n}. To condition on "past" symbols only, one can introduce a delay by prepending a constant symbol:

:P(x^n||(0,y^{n-1})) \triangleq \prod_{i=1}^n P(x_i|x^{i-1},y^{i-1}).

It is common to abuse notation by writing P(x^n||y^{n-1}) for this expression, although formally all strings should have the same number of symbols. One may also causally condition on multiple strings:

:P(x^n||y^n,z^n) \triangleq \prod_{i=1}^n P(x_i|x^{i-1},y^{i},z^{i}).
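The following Python sketch makes the definition concrete. It is a minimal illustration only: the names prefix_marginal, causal_prob, and the toy pmf are constructed for this example, not standard library functions. It evaluates P(x^n||y^n) as the product of the prefix conditionals, with an optional delay argument covering the P(x^n||(0,y^{n-1})) case, and checks the decomposition P(x^n,y^n) = P(x^n||y^n)\,P(y^n||(0,x^{n-1})), which follows from the ordinary chain rule applied in the order y_1, x_1, y_2, x_2, \ldots

<syntaxhighlight lang="python">
from itertools import product
import random

def prefix_marginal(joint, lx, ly):
    """Marginal pmf P(x^lx, y^ly) of prefixes, computed from the full joint pmf."""
    m = {}
    for (x, y), p in joint.items():
        key = (x[:lx], y[:ly])
        m[key] = m.get(key, 0.0) + p
    return m

def causal_prob(joint, x, y, delay=0):
    """P(x^n || y^{n-delay}) = prod_{i=1}^n P(x_i | x^{i-1}, y^{i-delay})."""
    n = len(x)
    prob = 1.0
    for i in range(1, n + 1):
        k = max(i - delay, 0)  # length of the y-prefix visible at step i
        num = prefix_marginal(joint, i, k)[(x[:i], y[:k])]          # P(x^i, y^k)
        den = prefix_marginal(joint, i - 1, k)[(x[:i - 1], y[:k])]  # P(x^{i-1}, y^k)
        prob *= num / den  # the conditional P(x_i | x^{i-1}, y^k)
    return prob

# Toy joint pmf over pairs of binary strings of length 2 (random, full support).
random.seed(0)
keys = [(x, y) for x in product((0, 1), repeat=2) for y in product((0, 1), repeat=2)]
weights = [random.random() for _ in keys]
total = sum(weights)
joint = {k: w / total for k, w in zip(keys, weights)}

# Chain-rule check: P(x^n, y^n) = P(x^n || y^n) * P(y^n || (0, x^{n-1})).
joint_yx = {(y, x): p for (x, y), p in joint.items()}
for (x, y), p in joint.items():
    decomposed = causal_prob(joint, x, y) * causal_prob(joint_yx, y, x, delay=1)
    assert abs(decomposed - p) < 1e-9
</syntaxhighlight>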
==Causally conditioned entropy==
The causally conditioned entropy is defined as

:H(X^n||Y^n) = \mathbf E\left[ -\log P(X^n||Y^n) \right] = \sum_{i=1}^n H(X_i|X^{i-1},Y^{i}).

Similarly, one may causally condition on multiple strings and write

:H(X^n||Y^n,Z^n) = \mathbf E\left[ -\log P(X^n||Y^n,Z^n) \right].
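As a sanity check on the equality of the two expressions above, the sketch below (self-contained, with the same toy joint pmf and hypothetical helper names as before) evaluates H(X^n||Y^n) once as the expectation \mathbf E[-\log P(X^n||Y^n)] and once as the sum \sum_{i=1}^n H(X_i|X^{i-1},Y^{i}), and confirms that the two agree.

<syntaxhighlight lang="python">
import math
import random
from itertools import product

def prefix_marginal(joint, lx, ly):
    """Marginal pmf P(x^lx, y^ly) of prefixes, computed from the full joint pmf."""
    m = {}
    for (x, y), p in joint.items():
        key = (x[:lx], y[:ly])
        m[key] = m.get(key, 0.0) + p
    return m

def causally_conditioned_entropy(joint, n):
    """H(X^n || Y^n) = E[-log P(X^n || Y^n)], in bits."""
    h = 0.0
    for (x, y), p in joint.items():
        if p == 0.0:
            continue
        log_causal = 0.0  # accumulates log2 P(x^n || y^n) term by term
        for i in range(1, n + 1):
            num = prefix_marginal(joint, i, i)[(x[:i], y[:i])]          # P(x^i, y^i)
            den = prefix_marginal(joint, i - 1, i)[(x[:i - 1], y[:i])]  # P(x^{i-1}, y^i)
            log_causal += math.log2(num / den)
        h -= p * log_causal
    return h

def step_entropy(joint, i):
    """H(X_i | X^{i-1}, Y^i), the i-th term of the sum."""
    mi = prefix_marginal(joint, i, i)
    mprev = prefix_marginal(joint, i - 1, i)
    return -sum(p * math.log2(p / mprev[(x[:i - 1], y)])
                for (x, y), p in mi.items() if p > 0.0)

# Same toy joint pmf as above: pairs of binary strings of length 2.
random.seed(0)
keys = [(x, y) for x in product((0, 1), repeat=2) for y in product((0, 1), repeat=2)]
weights = [random.random() for _ in keys]
total = sum(weights)
joint = {k: w / total for k, w in zip(keys, weights)}

# The expectation form and the sum of per-step conditional entropies must agree.
h = causally_conditioned_entropy(joint, 2)
assert abs(h - sum(step_entropy(joint, i) for i in (1, 2))) < 1e-9
</syntaxhighlight>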
==Properties==