Coding theory is one of the most important and direct applications of information theory. It can be subdivided into source coding theory and channel coding theory. Using a statistical description for data, information theory quantifies the number of bits needed to describe the data, which is the information entropy of the source; a short sketch at the end of this overview illustrates the computation.

• Data compression (source coding): there are two formulations for the compression problem:
• Lossless data compression: the data must be reconstructed exactly;
• Lossy data compression: allocates the bits needed to reconstruct the data within a specified fidelity level, measured by a distortion function. This subset of information theory is called rate–distortion theory.
• Error-correcting codes (channel coding): while data compression removes as much redundancy as possible, an error-correcting code adds just the right kind of redundancy (i.e., error correction) needed to transmit the data efficiently and faithfully across a noisy channel.

This division of coding theory into compression and transmission is justified by the information transmission theorems, or source–channel separation theorems, which justify the use of bits as the universal currency for information in many contexts. However, these theorems only hold in the situation where one transmitting user wishes to communicate to one receiving user. In scenarios with more than one transmitter (the multiple-access channel), more than one receiver (the broadcast channel) or intermediary "helpers" (the relay channel), or more general networks, compression followed by transmission may no longer be optimal.

For general sources and channels that are not necessarily stationary or ergodic, information-spectrum methods characterize coding limits using asymptotic distributions of information density rather than only single-letter entropies or mutual information. A related problem, channel resolvability, asks what rate is required for channel inputs to approximate a target output distribution; Te Sun Han and Sergio Verdú connected this approximation problem to coding theorems for general channels. Hayashi later derived general nonasymptotic and asymptotic formulas connecting channel resolvability and identification capacity, and applied these formulas to secrecy analysis for the wiretap channel.
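A minimal sketch of the entropy computation mentioned above: the number of bits per symbol needed to describe data, estimated here from empirical symbol frequencies (the function name and toy string are illustrative, not from any particular library):

<syntaxhighlight lang="python">
from collections import Counter
from math import log2

def entropy(data):
    """Shannon entropy, in bits per symbol, of the empirical distribution of data."""
    counts = Counter(data)
    n = len(data)
    return -sum((c / n) * log2(c / n) for c in counts.values())

# A source emitting 'a' three times as often as 'b' needs less than
# 1 bit/symbol on average: H = -(3/4)log2(3/4) - (1/4)log2(1/4).
print(entropy("aaab" * 1000))  # ~0.811 bits/symbol
</syntaxhighlight>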
===Source theory===
Any process that generates successive messages can be considered a source of information. A memoryless source is one in which each message is an independent identically distributed random variable, whereas the properties of ergodicity and stationarity impose less restrictive constraints. All such sources are stochastic. These terms are well studied in their own right outside information theory.
====Rate====
Information rate is the average entropy per symbol. For memoryless sources, this is merely the entropy of each symbol, while, in the case of a stationary stochastic process, it is

:r = \lim_{n \to \infty} H(X_n|X_{n-1},X_{n-2},X_{n-3}, \ldots);

that is, the conditional entropy of a symbol given all the previous symbols generated. For the more general case of a process that is not necessarily stationary, the average rate is

:r = \lim_{n \to \infty} \frac{1}{n} H(X_1, X_2, \dots, X_n);

that is, the limit of the joint entropy per symbol. For stationary sources, these two expressions give the same result. The information rate between a process X and a jointly distributed process Y is defined as

:r = \lim_{n \to \infty} \frac{1}{n} I(X_1, X_2, \dots, X_n; Y_1, Y_2, \dots, Y_n).

It is common in information theory to speak of the "rate" or "entropy" of a language. This is appropriate, for example, when the source of information is English prose. The rate of a source of information is related to its redundancy and how well it can be compressed, the subject of source coding.
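For a concrete instance of the conditional-entropy limit above, a stationary Markov source satisfies H(X_n|X_{n-1},X_{n-2},\ldots) = H(X_n|X_{n-1}), so the rate reduces to the entropy of the transition rows weighted by the stationary distribution. A minimal sketch assuming NumPy; the helper name is illustrative:

<syntaxhighlight lang="python">
import numpy as np

def markov_entropy_rate(P):
    """Entropy rate, in bits/symbol, of a stationary Markov chain with
    row-stochastic transition matrix P."""
    # Stationary distribution: left eigenvector of P with eigenvalue 1.
    evals, evecs = np.linalg.eig(P.T)
    mu = np.real(evecs[:, np.argmin(np.abs(evals - 1))])
    mu /= mu.sum()
    # Entropy of each transition row, H(X_n | X_{n-1} = i).
    row_entropy = np.array([-sum(p * np.log2(p) for p in row if p > 0) for row in P])
    return float(mu @ row_entropy)

# A "sticky" binary chain: the next symbol is predictable, so the rate is low.
P = np.array([[0.9, 0.1],
              [0.1, 0.9]])
print(markov_entropy_rate(P))  # ~0.469 bits/symbol, the binary entropy of 0.1
</syntaxhighlight>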
===Channel capacity===
Communication over a channel is the primary motivation of information theory. However, channels often fail to produce exact reconstruction of a signal; noise, periods of silence, and other forms of signal corruption often degrade quality.

Consider the communications process over a discrete channel. A simple model of the process is shown below:

:\xrightarrow[\text{Message}]{W} \begin{array}{ |c| }\hline \text{Encoder} \\ f_n \\ \hline\end{array} \xrightarrow[\mathrm{Encoded \atop sequence}]{X^n} \begin{array}{ |c| }\hline \text{Channel} \\ p(y|x) \\ \hline\end{array} \xrightarrow[\mathrm{Received \atop sequence}]{Y^n} \begin{array}{ |c| }\hline \text{Decoder} \\ g_n \\ \hline\end{array} \xrightarrow[\mathrm{Estimated \atop message}]{\hat W}

Here X represents the space of messages transmitted, and Y the space of messages received during a unit time over our channel. Let p(y|x) be the conditional probability distribution function of Y given X. We will consider p(y|x) to be an inherent fixed property of our communications channel (representing the nature of the noise of our channel). Then the joint distribution of X and Y is completely determined by our channel and by our choice of f(x), the marginal distribution of messages we choose to send over the channel. Under these constraints, we would like to maximize the rate of information, or the signal, we can communicate over the channel. The appropriate measure for this is the mutual information, and this maximum mutual information is called the channel capacity and is given by

:C = \max_{f} I(X;Y).\!

This capacity has the following property related to communicating at information rate R (where R is usually bits per symbol). For any information rate R < C and coding error ε > 0, for large enough N, there exists a code of length N and rate ≥ R and a decoding algorithm such that the maximal probability of block error is ≤ ε; that is, it is always possible to transmit with arbitrarily small block error. In addition, for any rate R > C, it is impossible to transmit with arbitrarily small block error.
Channel coding is concerned with finding such nearly optimal codes that can be used to transmit data over a noisy channel with a small coding error at a rate near the channel capacity.
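For a discrete memoryless channel given as a finite transition matrix, the maximization C = \max_{f} I(X;Y) can be carried out numerically with the Blahut–Arimoto algorithm. A sketch assuming NumPy, with the channel specified as a row-stochastic matrix W[x, y] = p(y|x):

<syntaxhighlight lang="python">
import numpy as np

def blahut_arimoto(W, iters=200):
    """Approximate the capacity, in bits per channel use, of a discrete
    memoryless channel with row-stochastic transition matrix W[x, y] = p(y|x)."""
    p = np.full(W.shape[0], 1.0 / W.shape[0])   # input distribution, improved each pass

    def row_divergences(p):
        # d[x] = D( p(.|x) || q ) in bits, where q(y) is the induced output distribution.
        q = p @ W
        with np.errstate(divide="ignore", invalid="ignore"):
            return np.where(W > 0, W * np.log2(W / q), 0.0).sum(axis=1)

    for _ in range(iters):
        d = row_divergences(p)
        p = p * 2.0 ** d / (p @ 2.0 ** d)       # multiplicative update toward the maximizer
    return float(p @ row_divergences(p))        # I(X;Y) = sum_x p(x) D( p(.|x) || q )

# Binary symmetric channel with crossover 0.1: capacity is 1 - H_b(0.1), about 0.531.
W = np.array([[0.9, 0.1],
              [0.1, 0.9]])
print(blahut_arimoto(W))
</syntaxhighlight>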
====Capacity of particular channel models====
• A continuous-time analog communications channel subject to Gaussian noise; see the Shannon–Hartley theorem.
• A binary symmetric channel (BSC) with crossover probability p is a binary input, binary output channel that flips the input bit with probability p. The BSC has a capacity of 1 − H_b(p) bits per channel use, where H_b is the binary entropy function to the base-2 logarithm:

:H_\text{b}(p) = -p\log_2 p - (1-p)\log_2(1-p).

• A binary erasure channel (BEC) with erasure probability p is a binary input, ternary output channel. The possible channel outputs are 0, 1, and a third symbol 'e' called an erasure. The erasure represents complete loss of information about an input bit. The capacity of the BEC is 1 − p bits per channel use.
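These closed forms are simple to evaluate directly; a short sketch (function names are illustrative), with the Shannon–Hartley capacity B\log_2(1+S/N) included for the Gaussian channel:

<syntaxhighlight lang="python">
from math import log2

def h_b(p):
    """Binary entropy function, in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

def bsc_capacity(p):
    """Binary symmetric channel with crossover probability p: C = 1 - H_b(p)."""
    return 1 - h_b(p)

def bec_capacity(p):
    """Binary erasure channel with erasure probability p: C = 1 - p."""
    return 1 - p

def awgn_capacity(bandwidth_hz, snr):
    """Shannon-Hartley: C = B log2(1 + S/N) bits per second, with snr a linear
    power ratio (not dB)."""
    return bandwidth_hz * log2(1 + snr)

print(bsc_capacity(0.1))          # ~0.531 bits per channel use
print(bec_capacity(0.1))          # 0.9 bits per channel use
print(awgn_capacity(3000, 1000))  # ~29,900 bit/s for a 3 kHz, 30 dB channel
</syntaxhighlight>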
====Channels with memory and directed information====
In practice many channels have memory. Namely, at time i the channel is given by the conditional probability

:P(y_i|x_i,x_{i-1},x_{i-2},\ldots,x_1,y_{i-1},y_{i-2},\ldots,y_1).

It is often more convenient to use the notation x^i=(x_i,x_{i-1},x_{i-2},\ldots,x_1), in which case the channel becomes

:P(y_i|x^i,y^{i-1}).

In such a case the capacity is given by the mutual information rate when there is no feedback available, and by the directed information rate whether or not feedback is available (if there is no feedback, the directed information equals the mutual information).
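A brute-force sketch of the directed information I(X^n \to Y^n) = \sum_{i=1}^n I(X^i; Y_i | Y^{i-1}) for a short block, computed from an explicitly tabulated joint distribution. The helper names and toy example are illustrative, and exhaustive tabulation is only feasible for tiny alphabets and block lengths:

<syntaxhighlight lang="python">
from collections import defaultdict
from itertools import product
from math import log2

def marginal(joint, nx, ny):
    """Marginal pmf of (X^nx, Y^ny) from a joint pmf over full sequence pairs."""
    m = defaultdict(float)
    for (xs, ys), pr in joint.items():
        m[(xs[:nx], ys[:ny])] += pr
    return m

def directed_information(joint, n):
    """I(X^n -> Y^n) in bits, for joint mapping (x-tuple, y-tuple) -> probability."""
    total = 0.0
    for i in range(1, n + 1):
        pxy_i  = marginal(joint, i, i)      # p(x^i, y^i)
        pxy_im = marginal(joint, i, i - 1)  # p(x^i, y^{i-1})
        py_i   = marginal(joint, 0, i)      # p(y^i)
        py_im  = marginal(joint, 0, i - 1)  # p(y^{i-1})
        for (xs, ys), pr in pxy_i.items():
            if pr > 0:
                num = pr / pxy_im[(xs, ys[:-1])]             # p(y_i | x^i, y^{i-1})
                den = py_i[((), ys)] / py_im[((), ys[:-1])]  # p(y_i | y^{i-1})
                total += pr * log2(num / den)                # I(X^i; Y_i | Y^{i-1}) term
    return total

# Toy case: two i.i.d. uniform input bits sent through a noiseless channel,
# so one bit flows per use and I(X^2 -> Y^2) = 2 bits.
joint = {((x1, x2), (x1, x2)): 0.25 for x1, x2 in product([0, 1], repeat=2)}
print(directed_information(joint, 2))  # 2.0
</syntaxhighlight>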
===Fungible information===
Fungible information is the information for which the means of encoding is not important. Classical information theorists and computer scientists are mainly concerned with information of this sort. It is sometimes referred to as speakable information.

==Applications to other fields==