Bayesian optimization

==Strategy==
[[File:GpParBayesAnimationSmall.gif|thumb|440x330px|Bayesian optimization of a function (black) with Gaussian processes (purple). Three acquisition functions (blue) are shown at the bottom.<ref>{{Citation|last=Wilson|first=Samuel|title=ParBayesianOptimization R package|date=2019-11-22|url=https://github.com/AnotherSamWilson/ParBayesianOptimization|access-date=2019-12-12}}</ref>]]
Bayesian optimization is typically used on problems of the form <math display="inline">\max_{x \in X} f(x)</math>, where <math display="inline">X</math> is the set of all possible parameters <math display="inline">x</math>, typically of at most 20 [[dimension]]s for best results (<math display="inline">X \subseteq \mathbb{R}^d,\ d \le 20</math>), and whose membership can easily be evaluated. Bayesian optimization is particularly advantageous for problems where <math display="inline">f(x)</math> is difficult to evaluate due to its computational cost. The objective function, <math display="inline">f</math>, is continuous but has no known structure, and is therefore referred to as a "black box". Upon its evaluation, only <math display="inline">f(x)</math> is observed and its [[derivative]]s are not evaluated.<ref name=":0">{{cite arXiv|last=Frazier|first=Peter I.|date=2018-07-08|title=A Tutorial on Bayesian Optimization|class=stat.ML|eprint=1807.02811}}</ref>
 
Since the objective function is unknown, the Bayesian strategy is to treat it as a random function and place a [[Prior distribution|prior]] over it. The prior captures beliefs about the behavior of the function. After gathering the function evaluations, which are treated as data, the prior is updated to form the [[posterior distribution]] over the objective function. The posterior distribution, in turn, is used to construct an acquisition function (often also referred to as infill sampling criteria) that determines the next query point.
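
The loop described above (place a prior, update it with observed evaluations, maximize an acquisition function to choose the next query point) can be illustrated with a minimal one-dimensional sketch. Everything specific in it is an assumption made for illustration and is not prescribed by the text: a zero-mean Gaussian process prior with a squared-exponential kernel of fixed length scale, expected improvement as the acquisition function, and a brute-force grid search over candidates. The function names (<code>fit</code>, <code>predict</code>, <code>bayes_opt</code>) are hypothetical.

```python
import math

def rbf(a, b, length=0.2):
    """Squared-exponential kernel (an assumed prior covariance)."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)

def cholesky(A):
    """Lower-triangular L with L L^T = A, guarded against round-off."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(max(s, 1e-12)) if i == j else s / L[j][j]
    return L

def solve_lower(L, b):
    """Forward substitution: solve L y = b."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y

def solve_upper(L, y):
    """Back substitution: solve L^T x = y."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def fit(xs, ys, jitter=1e-6):
    """Condition the zero-mean GP prior on the evaluations gathered so far."""
    K = [[rbf(a, b) + (jitter if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    L = cholesky(K)
    alpha = solve_upper(L, solve_lower(L, ys))
    return L, alpha

def predict(xs, L, alpha, x_new):
    """Posterior mean and standard deviation at x_new."""
    k = [rbf(a, x_new) for a in xs]
    mean = sum(ki * ai for ki, ai in zip(k, alpha))
    v = solve_lower(L, k)
    var = max(rbf(x_new, x_new) - sum(vi * vi for vi in v), 1e-12)
    return mean, math.sqrt(var)

def expected_improvement(mean, std, best):
    """Expected improvement over the incumbent, for maximization."""
    z = (mean - best) / std
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (mean - best) * cdf + std * pdf

def bayes_opt(f, lo, hi, n_init=3, n_iter=10, n_grid=201):
    """Maximize f on [lo, hi] using n_init + n_iter evaluations."""
    xs = [lo + (hi - lo) * (i + 0.5) / n_init for i in range(n_init)]
    ys = [f(x) for x in xs]
    grid = [lo + (hi - lo) * i / (n_grid - 1) for i in range(n_grid)]
    for _ in range(n_iter):
        L, alpha = fit(xs, ys)          # prior -> posterior
        best = max(ys)
        candidates = [g for g in grid if all(abs(g - x) > 1e-9 for x in xs)]
        # The next query point maximizes the acquisition function.
        x_next = max(candidates, key=lambda g: expected_improvement(
            *predict(xs, L, alpha, g), best))
        xs.append(x_next)
        ys.append(f(x_next))
    i_best = max(range(len(ys)), key=ys.__getitem__)
    return xs[i_best], ys[i_best]

if __name__ == "__main__":
    # Hypothetical cheap objective standing in for an expensive black box.
    print(bayes_opt(lambda x: -(x - 0.3) ** 2, 0.0, 1.0))
```

Only the values of <code>f</code> are used, never its derivatives, which matches the black-box setting. The grid search and the fixed kernel hyperparameters keep the sketch short; in practice the acquisition function is usually maximized with a numerical optimizer and the kernel hyperparameters are estimated from the data.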