BrownBoost uses a non-convex potential loss function, and thus does not fit into the
AdaBoost framework. The non-convex optimization provides a method to avoid
overfitting noisy data sets. However, in contrast to boosting algorithms that analytically minimize a convex loss function (e.g.
AdaBoost and
LogitBoost), BrownBoost solves a system of two equations in two unknowns using standard numerical methods.

The only parameter of BrownBoost (c in the algorithm) is the "time" the algorithm runs. The theory of BrownBoost states that each hypothesis takes a variable amount of time (t in the algorithm) which is directly related to the weight given to the hypothesis, \alpha. The time parameter in BrownBoost is analogous to the number of iterations T in AdaBoost. A larger value of c means that BrownBoost will treat the data as if it were less noisy and therefore give up on fewer examples; conversely, a smaller value of c means that BrownBoost will treat the data as noisier and give up on more examples.
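Since, as discussed below, the final error is likely to be near 1-\mbox{erf}(\sqrt{c}), one way to choose c is to invert that relation for a target error rate. A minimal sketch (the target rate of 0.1 is an arbitrary example):

```python
from scipy.special import erf, erfinv

eps = 0.1                      # hypothetical target error rate on the noisy data
c = erfinv(1.0 - eps) ** 2     # invert eps = 1 - erf(sqrt(c))

print(c)                       # ~1.35: the "time" BrownBoost is given to run
print(1.0 - erf(c ** 0.5))     # sanity check: recovers eps = 0.1
```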
During each iteration of the algorithm, a hypothesis is selected with some advantage over random guessing. The weight of this hypothesis, \alpha, and the "amount of time passed" during the iteration, t, are solved for simultaneously as a system of two non-linear equations (1. the hypothesis must be uncorrelated with the updated example weights, and 2. the potential must be held constant) in two unknowns (the hypothesis weight \alpha and the time passed t). The system can be solved by bisection (as implemented in the JBoost software package) or by Newton's method (as described in the original paper by Freund).
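The inner solve can also be sketched with a generic numerical root-finder in place of hand-rolled bisection or Newton iterations. The constraint forms below (exponential example weights and an erf-based potential) follow Freund's paper and the algorithm definition below; the helper name solve_alpha_t, the array names, and the starting guess are illustrative:

```python
import numpy as np
from scipy.special import erf
from scipy.optimize import fsolve

def solve_alpha_t(r, s, hy, c):
    """Solve BrownBoost's inner system for (alpha, t).

    r  -- current margins r_i(x_j), one per example
    s  -- amount of time remaining
    hy -- h(x_j) * y_j, i.e. +1 where the hypothesis is correct, -1 otherwise
    c  -- BrownBoost's time parameter
    """
    def equations(v):
        alpha, t = v
        z = r + alpha * hy + s - t                  # shifted margins after the step
        corr = np.sum(np.exp(-z**2 / c) * hy)       # 1. correlation with new weights -> 0
        dpot = np.sum(erf(z / np.sqrt(c))
                      - erf((r + s) / np.sqrt(c)))  # 2. change in potential -> 0
        return [corr, dpot]

    alpha, t = fsolve(equations, x0=[0.5, 0.01])
    return alpha, t
```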
Once these equations are solved, the margins of each example (r_i(x_j) in the algorithm) and the amount of time remaining s are updated appropriately. This process is repeated until there is no time remaining.

The initial potential is \frac{1}{m}\sum_{j=1}^m 1-\mbox{erf}(\sqrt{c}) = 1-\mbox{erf}(\sqrt{c}). Since a constraint of each iteration is that the potential be held constant, the final potential is \frac{1}{m}\sum_{j=1}^m 1-\mbox{erf}(r(x_j)/\sqrt{c}) = 1-\mbox{erf}(\sqrt{c}), where r(x_j) is the final margin of example x_j. The final error is therefore likely to be near 1-\mbox{erf}(\sqrt{c}).
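Both displayed potentials are instances of a single per-example potential 1-\mbox{erf}((r_j + s)/\sqrt{c}): with zero initial margins and s = c it reduces to the initial value 1-\mbox{erf}(\sqrt{c}), and with s = 0 it reduces to the final value. A minimal sketch checking this (array names are illustrative):

```python
import numpy as np
from scipy.special import erf

def avg_potential(r, s, c):
    """Average potential (1/m) * sum_j (1 - erf((r_j + s) / sqrt(c)))."""
    return np.mean(1.0 - erf((r + s) / np.sqrt(c)))

c = 2.0
r0 = np.zeros(10)                                # margins start at zero
# Initially s = c, so the average potential is 1 - erf(sqrt(c)).
assert np.isclose(avg_potential(r0, c, c), 1.0 - erf(np.sqrt(c)))
# At the end s = 0, giving (1/m) * sum_j (1 - erf(r_j / sqrt(c))),
# which matches the final potential above.
```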
However, the final potential function is not the 0–1 loss error function. For the final error to be exactly 1-\mbox{erf}(\sqrt{c}), the variance of the loss function would have to decrease linearly with respect to time so as to form the 0–1 loss function at the end of the boosting iterations. This has not yet been discussed in the literature and is not part of the definition of the algorithm below.

The final classifier is a linear combination of weak hypotheses and is evaluated in the same manner as most other boosting algorithms.
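Evaluation of the learned combination is the usual weighted vote over the weak hypotheses; a minimal sketch, assuming each hypothesis is a callable returning ±1:

```python
import numpy as np

def predict(x, hypotheses, alphas):
    """Final classifier: sign of the alpha-weighted vote of the weak hypotheses."""
    return np.sign(sum(a * h(x) for a, h in zip(alphas, hypotheses)))
```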
==BrownBoost learning algorithm definition==