Bayesian optimization is used on problems of the form \max_{x \in X} f(x), where X is the set of all possible parameters x, typically a subset of \mathbb{R}^d with at most about 20 dimensions (X \subseteq \mathbb{R}^d, d \le 20) whose membership is easy to evaluate. Bayesian optimization is particularly advantageous for problems where f(x) is costly to evaluate. The objective function f is continuous but of unknown structure, and is therefore referred to as a "black box": upon evaluation, only the value f(x) is observed, and its derivatives are not evaluated.
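The following is a minimal sketch of such a problem setup; the objective f here is a cheap synthetic stand-in for what would in practice be an expensive black-box evaluation (for example, training a model or running a simulation), and the function, bounds, and dimension are all hypothetical.

<syntaxhighlight lang="python">
import numpy as np

# Hypothetical stand-in for an expensive black-box objective f : X -> R.
# In a real problem, one call might train a model or run a long simulation.
def f(x: np.ndarray) -> float:
    return float(-np.sum((x - 0.3) ** 2))

bounds = [(0.0, 1.0)] * 3     # X = [0, 1]^3, so d = 3 <= 20; membership is a cheap box check
x = np.array([0.5, 0.2, 0.9])
y = f(x)                      # only the value f(x) is observed; no gradient information
</syntaxhighlight>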
Since the objective function is unknown, the Bayesian strategy is to treat it as a random function and place a
prior over it. The prior captures beliefs about the behavior of the function. After gathering the function evaluations, which are treated as data, the prior is updated to form the
posterior distribution over the objective function. The posterior distribution, in turn, is used to construct an acquisition function (often also referred to as an infill sampling criterion) that determines the next query point.
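The overall procedure can be sketched as a loop that alternates between updating the surrogate posterior and maximizing the acquisition function. The sketch below assumes a Gaussian process surrogate (via scikit-learn) and the expected-improvement acquisition maximized over random candidate points; the objective f and all settings are illustrative, not a definitive implementation.

<syntaxhighlight lang="python">
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

rng = np.random.default_rng(0)

def f(x):
    # cheap stand-in for the expensive black-box objective
    return -((x - 0.7) ** 2).sum(axis=-1)

def expected_improvement(gp, X_cand, y_best):
    # EI for maximization: E[max(f(x) - y_best, 0)] under the GP posterior
    mu, sigma = gp.predict(X_cand, return_std=True)
    sigma = np.maximum(sigma, 1e-12)
    z = (mu - y_best) / sigma
    return (mu - y_best) * norm.cdf(z) + sigma * norm.pdf(z)

# a few initial evaluations of the objective
X = rng.uniform(0.0, 1.0, size=(5, 2))
y = np.array([f(x) for x in X])

for _ in range(20):
    # update the posterior over f, treating the evaluations gathered so far as data
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True).fit(X, y)
    # maximize the acquisition function over random candidates to pick the next query point
    X_cand = rng.uniform(0.0, 1.0, size=(1000, 2))
    x_next = X_cand[np.argmax(expected_improvement(gp, X_cand, y.max()))]
    X = np.vstack([X, x_next])
    y = np.append(y, f(x_next))

print("best x:", X[np.argmax(y)], "best f(x):", y.max())
</syntaxhighlight>

In practice the acquisition function is usually maximized with a dedicated inner optimizer rather than random candidates, but the structure of the loop is the same: many cheap surrogate computations per expensive objective evaluation.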
There are several methods used to define the prior/posterior distribution over the objective function. The two most common methods use Gaussian processes in a method called
kriging.
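As a brief sketch of the prior-to-posterior update behind this method, the code below uses scikit-learn's GaussianProcessRegressor; the observations are hypothetical and the Matérn kernel is one common default, not the only choice.

<syntaxhighlight lang="python">
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

# grid on which to inspect the surrogate (one-dimensional for readability)
X_plot = np.linspace(0.0, 1.0, 200).reshape(-1, 1)

# the unfitted GP represents the prior: beliefs about f before any evaluations
gp = GaussianProcessRegressor(kernel=Matern(nu=2.5))
prior_samples = gp.sample_y(X_plot, n_samples=3, random_state=0)

# hypothetical evaluations of the black-box objective
X_obs = np.array([[0.1], [0.4], [0.9]])
y_obs = np.array([0.2, 0.8, -0.1])

# conditioning on the data yields the posterior (the kriging predictor):
# a mean estimate of f and a pointwise uncertainty at unobserved x
gp.fit(X_obs, y_obs)
mean, std = gp.predict(X_plot, return_std=True)
</syntaxhighlight>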
Another, less expensive method uses the Tree-structured Parzen Estimator to construct two distributions, one for 'high' points and one for 'low' points, and then finds the location that maximizes the expected improvement.
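A minimal one-dimensional sketch of this idea, assuming maximization and a simple quantile split into 'high' and 'low' points; real tree-structured Parzen estimator implementations (for example, in Hyperopt) differ in details such as how candidates are sampled and how the densities are modeled.

<syntaxhighlight lang="python">
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)

def f(x):
    return -(x - 0.7) ** 2   # cheap stand-in objective

# evaluations gathered so far
x_obs = rng.uniform(0.0, 1.0, 20)
y_obs = f(x_obs)

# split the observations at a quantile gamma: 'high' (good) vs. 'low' (bad) points
gamma = 0.25
cut = np.quantile(y_obs, 1.0 - gamma)
good, bad = x_obs[y_obs >= cut], x_obs[y_obs < cut]

# kernel-density estimates of the two distributions, l(x) and g(x)
l = gaussian_kde(good)
g = gaussian_kde(bad)

# the expected improvement is maximized by maximizing the ratio l(x)/g(x)
cand = rng.uniform(0.0, 1.0, 1000)
x_next = cand[np.argmax(l(cand) / g(cand))]
</syntaxhighlight>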
Standard Bayesian optimization relies upon each x \in X being easy to evaluate, and problems that deviate from this assumption are known as exotic Bayesian optimization problems. Optimization problems can become exotic if it is known that there is noise, if evaluations are made in parallel, if the quality of evaluations relies upon a tradeoff between difficulty and accuracy, if random environmental conditions are present, or if the evaluation involves derivatives.

==Acquisition functions==