Binary entropy function

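In information theory, the binary entropy function, denoted \operatorname H(p) or \operatorname H_\mathrm{b}(p), is defined as the entropy of a Bernoulli process with probability p of one of two values, and is given by the formula:

\operatorname H_\mathrm{b}(p) = -p \log p - (1 - p) \log (1 - p),

with the convention 0 \log 0 = 0 at p = 0 and p = 1. The base of the logarithm sets the unit of information: base 2 gives bits (shannons) and base e gives nats.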

Notation

Binary entropy \operatorname H_\mathrm{b}(p) is a special case of \Eta(X), the entropy function. \operatorname H_\mathrm{b}(p) is distinguished from the general entropy function \Eta(X) in that the former takes a single real number as a parameter, whereas the latter takes a distribution or random variable as a parameter. Thus the binary entropy (of p) is the entropy of the particular distribution X \sim \operatorname{Ber}(p), so \operatorname H_\mathrm{b}(p) = \Eta\bigl(\operatorname{Ber}(p)\bigr). Writing the probabilities of the two values as p and q, so that p + q = 1 and q = 1 - p, this corresponds to

\operatorname H_\mathrm{b}(p) = -p \log p - (1 - p) \log (1 - p) = -p \log p - q \log q = - \sum_{x \in \{0, 1\}} \operatorname{Pr}(X=x) \cdot \log \operatorname{Pr}(X=x) = \Eta\bigl(\operatorname{Ber}(p)\bigr).

Sometimes the binary entropy function is also written as \operatorname H_2(p). However, it is different from, and should not be confused with, the Rényi entropy of order 2, which is also denoted \Eta_2(X).
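The identity \operatorname H_\mathrm{b}(p) = \Eta\bigl(\operatorname{Ber}(p)\bigr), and the contrast with the Rényi entropy, can be illustrated with a minimal Python sketch. It assumes base-2 logarithms and represents a distribution as a plain list of probabilities; the function names `shannon_entropy`, `binary_entropy` and `renyi2_entropy` are hypothetical helpers, not part of any library. The order-2 Rényi entropy is computed as -\log \sum_x \Pr(X=x)^2.

```python
import math


def shannon_entropy(probs, base=2.0):
    """General entropy H(X) = -sum Pr(x) log Pr(x) of a finite distribution.

    Terms with Pr(x) = 0 are skipped, following the convention 0 log 0 = 0.
    """
    return -sum(q * math.log(q, base) for q in probs if q > 0)


def binary_entropy(p, base=2.0):
    """Closed form H_b(p) = -p log p - (1 - p) log(1 - p)."""
    if p in (0.0, 1.0):
        return 0.0  # 0 log 0 = 0 at the endpoints
    return -p * math.log(p, base) - (1 - p) * math.log(1 - p, base)


def renyi2_entropy(probs, base=2.0):
    """Renyi entropy of order 2 (collision entropy): -log sum Pr(x)^2."""
    return -math.log(sum(q * q for q in probs), base)


for p in (0.1, 0.25, 0.5):
    bernoulli = [1 - p, p]  # Pr(X=0) = q, Pr(X=1) = p
    print(f"p={p}: H_b={binary_entropy(p):.4f}, "
          f"H(Ber(p))={shannon_entropy(bernoulli):.4f}, "
          f"Renyi H_2={renyi2_entropy(bernoulli):.4f}")
```

For p = 1/4 the closed form and the general formula both give about 0.811 bits, while the order-2 Rényi entropy gives about 0.678 bits. The two coincide at p = 1/2, where both equal 1 bit.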

Explanation

In terms of information theory, entropy is considered to be a measure of the uncertainty in a message. To put it intuitively, suppose p=0. At this probability, the event is certain never to occur, so there is no uncertainty at all, and the entropy is 0. If p=1, the result is again certain, so the entropy is 0 here as well. When p=1/2, the uncertainty is at a maximum: if one were to place a fair bet on the outcome, knowing the probabilities would give no advantage. In this case the entropy reaches its maximum value of 1 bit (with base-2 logarithms). Intermediate values fall between these extremes; for instance, if p=1/4, there is still some uncertainty about the outcome, but one can predict it correctly more often than not, so the entropy is less than 1 bit: \operatorname H_\mathrm{b}(1/4) \approx 0.811 bits.
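A small sketch of the cases discussed above, assuming base-2 logarithms and the convention 0 \log 0 = 0 at the endpoints. It evaluates \operatorname H_\mathrm{b} at p = 0, 1/4, 1/2 and 1, and scans a grid of p values to confirm that the maximum of 1 bit occurs at p = 1/2. The helper name `binary_entropy` is illustrative only.

```python
import math


def binary_entropy(p):
    """Binary entropy in bits; returns 0 at p = 0 and p = 1 (0 log 0 = 0)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if p == 0.0 or p == 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


for p in (0.0, 0.25, 0.5, 1.0):
    print(f"H_b({p}) = {binary_entropy(p):.3f} bits")

# Scan p = 0, 0.001, ..., 1 and locate the most uncertain case.
grid = [i / 1000 for i in range(1001)]
p_max = max(grid, key=binary_entropy)
print(f"maximum at p = {p_max}, H_b = {binary_entropy(p_max):.3f}")
```

This prints 0.000, 0.811, 1.000 and 0.000 bits for the four cases and reports the maximum at p = 0.5. The grid scan is only a numerical illustration of where the maximum lies.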

Properties

Derivative

The derivative of the binary entropy function may be expressed as the negative of the logit function:

{d \over dp} \operatorname H_\mathrm{b}(p) = - \operatorname{logit}_a(p) = -\log_a\left( \frac{p}{1-p} \right),

{d^2 \over dp^2} \operatorname H_\mathrm{b}(p) = - \frac{1}{p(1-p) \ln a},

where a denotes the base of the logarithm.

Convex conjugate

The convex conjugate (specifically, the Legendre transform) of the negative binary entropy (with base e) is the softplus function. This is because, by the definition of the Legendre transform, the derivatives of a function and of its conjugate are inverse functions: the derivative of negative binary entropy is the logit, whose inverse function is the logistic function, which is the derivative of softplus. Softplus can be interpreted as logistic loss, so by duality, minimizing logistic loss corresponds to maximizing entropy. This justifies the principle of maximum entropy as loss minimization.

Taylor series

The Taylor series of the binary entropy function (in bits) at 1/2 is

\operatorname H_\mathrm{b}(p) = 1 - \frac{1}{2\ln 2} \sum^{\infty}_{n=1} \frac{(1-2p)^{2n}}{n(2n-1)},

which converges to the binary entropy function for all values 0 \le p \le 1.

Bounds

The following bounds hold for 0 < p < 1:

\ln(2) \cdot \log_2(p) \cdot \log_2(1-p) \leq \operatorname H_\mathrm{b}(p) \leq \log_2(p) \cdot \log_2(1-p)

and

4p(1-p) \leq \operatorname H_\mathrm{b}(p) \leq (4p(1-p))^{1/\ln 4},

where \ln denotes the natural logarithm and \operatorname H_\mathrm{b} is measured in bits.
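The identities and bounds in this section can be spot-checked numerically. The sketch below uses only the Python standard library: it compares finite-difference derivatives with the logit formulas (base 2), approximates the convex conjugate of the negative binary entropy (base e) by a grid search and compares it with softplus, \ln(1 + e^x), sums a truncated Taylor series, and tests both pairs of bounds on a grid of p values. The helper names (`h_bits`, `h_nats`, `conjugate_of_neg_entropy`, `h_taylor`), step sizes, grid resolution and truncation length are arbitrary illustrative choices; a check of this kind supports the formulas but does not prove them.

```python
import math

LN2 = math.log(2.0)


def h_bits(p):
    """Binary entropy in bits, for 0 < p < 1."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def h_nats(p):
    """Binary entropy in nats (natural logarithm), for 0 < p < 1."""
    return -p * math.log(p) - (1 - p) * math.log(1 - p)


# Derivative: compare central differences with -logit_2(p) and -1/(p(1-p) ln 2).
p, h = 0.3, 1e-4
d1 = (h_bits(p + h) - h_bits(p - h)) / (2 * h)
d2 = (h_bits(p + h) - 2 * h_bits(p) + h_bits(p - h)) / h**2
print(d1, -math.log2(p / (1 - p)))
print(d2, -1 / (p * (1 - p) * LN2))


# Convex conjugate: (-H)^*(x) = sup_p [x p + H(p)] should equal softplus(x).
def conjugate_of_neg_entropy(x, steps=100_000):
    """Approximate the supremum over a uniform grid of interior points of (0, 1)."""
    return max(x * (i / steps) + h_nats(i / steps) for i in range(1, steps))


for x in (-2.0, 0.0, 1.5):
    print(x, conjugate_of_neg_entropy(x), math.log1p(math.exp(x)))


# Taylor series at 1/2 (in bits), truncated after n_terms terms.
def h_taylor(p, n_terms=500):
    u = (1 - 2 * p) ** 2
    s = sum(u**n / (n * (2 * n - 1)) for n in range(1, n_terms + 1))
    return 1 - s / (2 * LN2)


for p in (0.05, 0.25, 0.5, 0.8):
    print(p, h_taylor(p), h_bits(p))

# Bounds on 0 < p < 1, with a small tolerance for floating-point rounding
# (several of the bounds become equalities at p = 1/2).
tol = 1e-12
for i in range(1, 1000):
    p = i / 1000
    hb = h_bits(p)
    prod = math.log2(p) * math.log2(1 - p)
    assert LN2 * prod <= hb + tol and hb <= prod + tol
    q = 4 * p * (1 - p)
    assert q <= hb + tol and hb <= q ** (1 / math.log(4)) + tol
print("bounds hold on the grid")
```

The grid search approaches the supremum from below, so its values sit slightly under softplus; refining `steps` shrinks the gap.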

Source: Wikipedia
