The term is generally attributed to {{ill|Jonas Mockus|lt}}, who coined it in a series of publications on global optimization in the 1970s and 1980s.<ref>{{cite book |first=Jonas |last=Močkus |title=Optimization Techniques IFIP Technical Conference Novosibirsk, July 1–7, 1974 |chapter=On bayesian methods for seeking the extremum |doi=10.1007/3-540-07165-2_55 |series=Lecture Notes in Computer Science |date=1975 |volume=27 |pages=400–404 |isbn=978-3-540-07165-5 |doi-access=free }}</ref><ref>{{cite journal |first=Jonas |last=Močkus |title=On Bayesian Methods for Seeking the Extremum and their Application |journal=IFIP Congress |year=1977 |pages=195–200 }}</ref><ref name="Mockus1989">{{cite book |first=J. |last=Močkus |title=Bayesian Approach to Global Optimization |publisher=Kluwer Academic |___location=Dordrecht |year=1989 |isbn=0-7923-0115-3 }}</ref>
=== Early history ===
==== From the 1960s to the 1980s ====
The earliest idea of Bayesian optimization<ref>{{Cite book |last=Garnett |first=Roman |title=Bayesian Optimization |date=2023 |publisher=Cambridge University Press |isbn=978-1-108-42578-0}}</ref> dates to 1964, when the American applied mathematician Harold J. Kushner published "A New Method of Locating the Maximum Point of an Arbitrary Multipeak Curve in the Presence of Noise".<ref>{{cite journal |last=Kushner |first=H. J. |title=A New Method of Locating the Maximum Point of an Arbitrary Multipeak Curve in the Presence of Noise |journal=Journal of Basic Engineering |volume=86 |pages=97–106 |year=1964}}</ref> Kushner modelled an unknown one-dimensional objective function with a [[Wiener process]] prior and proposed selecting each new evaluation point by maximizing the probability of improving on the best value observed so far, a criterion now known as probability of improvement.
By the 1980s, the framework now used for Bayesian optimization had been explicitly established. In 1978, Jonas Mockus and his co-authors proposed the expected improvement criterion in the chapter "The application of Bayesian methods for seeking the extremum",<ref>{{cite book |last1=Močkus |first1=J. |last2=Tiešis |first2=V. |last3=Žilinskas |first3=A. |chapter=The application of Bayesian methods for seeking the extremum |title=Towards Global Optimization |volume=2 |publisher=North-Holland |year=1978}}</ref> and expected improvement remains one of the most widely used acquisition functions in Bayesian optimization.
==== From the 1990s to the present ====
In the 1990s, Bayesian optimization gradually began to transition from pure theory to real-world applications. In 1998, Donald R. Jones<ref>{{Cite web |title=Donald R. Jones |url=https://scholar.google.com/citations?user=CZhZ4MYAAAAJ&hl=en |access-date=2025-02-25 |website=scholar.google.com}}</ref> and his co-workers published a paper titled "Efficient Global Optimization of Expensive Black-Box Functions",<ref>{{cite journal |last1=Jones |first1=Donald R. |last2=Schonlau |first2=Matthias |last3=Welch |first3=William J. |title=Efficient Global Optimization of Expensive Black-Box Functions |journal=Journal of Global Optimization |volume=13 |issue=4 |pages=455–492 |year=1998 |doi=10.1023/A:1008306431147}}</ref> in which they proposed the EGO (Efficient Global Optimization) algorithm, combining a [[kriging]] surrogate model with the expected improvement criterion and applying it to expensive engineering design problems.
In the 21st century, with the rise of artificial intelligence and robotics, Bayesian optimization has been widely used in machine learning and deep learning and has become an important tool for [[Hyperparameter optimization|hyperparameter tuning]].<ref>T. T. Joy, S. Rana, S. Gupta and S. Venkatesh, "Hyperparameter tuning for big data using Bayesian optimisation," 2016 23rd International Conference on Pattern Recognition (ICPR), Cancun, Mexico, 2016, pp. 2574-2579, doi: 10.1109/ICPR.2016.7900023.</ref> Companies such as Google, Facebook and OpenAI have added Bayesian optimization to their deep learning frameworks to improve search efficiency. However, Bayesian optimization still faces challenges. For example, because a [[Gaussian process]]<ref>{{Cite book|title=Neural Networks and Machine Learning|contribution=Introduction to Gaussian processes|first=D. J. C.|last=Mackay|editor-first=C. M.|editor-last=Bishop|contribution-url=https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=e045b76dc5daf9f4656ac10b456c5d1d9de5bc84|archive-url=https://web.archive.org/web/20240423144014/https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=e045b76dc5daf9f4656ac10b456c5d1d9de5bc84|archive-date=2024-04-23|access-date=2025-03-06|series=NATO ASI Series|volume=168|pages=133–165|year=1998|url-status=live}}</ref> is commonly used as the surrogate model, training cost grows rapidly with the number of observations (cubically, in the standard formulation), so the method becomes slow and computationally expensive when many evaluations are available. This limits how well it scales to more complex applications such as drug development and medical experiments.
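As an illustrative sketch only, the following example tunes a single hyperparameter with the open-source scikit-optimize library; the classifier, the search range and the evaluation budget are arbitrary assumptions made for the example rather than recommended settings.

<syntaxhighlight lang="python">
# Minimal sketch of Bayesian hyperparameter tuning with scikit-optimize.
# The model, the range for C and the call budget are illustrative choices.
from skopt import gp_minimize
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_digits(return_X_y=True)

def validation_error(params):
    """Objective: cross-validated error of a classifier for a given log10(C)."""
    (log_c,) = params
    model = LogisticRegression(C=10.0 ** log_c, max_iter=2000)
    return 1.0 - cross_val_score(model, X, y, cv=3).mean()

# Gaussian-process-based Bayesian optimization over log10(C) in [-4, 4].
result = gp_minimize(validation_error, dimensions=[(-4.0, 4.0)],
                     n_calls=20, random_state=0)
print("best C:", 10.0 ** result.x[0], "estimated error:", result.fun)
</syntaxhighlight>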
==Strategy==
[[File:GpParBayesAnimationSmall.gif|thumb|440x330px|Bayesian optimization of a function (black) with Gaussian processes (purple). Three acquisition functions (blue) are shown at the bottom.<ref>{{Citation|last=Wilson|first=Samuel|title=ParBayesianOptimization R package|date=2019-11-22|url=https://github.com/AnotherSamWilson/ParBayesianOptimization|access-date=2019-12-12}}</ref>]]
Bayesian optimization is typically applied to problems of the form <math>\max_{x \in A} f(x)</math>, where <math>A</math> is a feasible set whose membership is easy to check and <math>f</math> is an objective function that is expensive to evaluate, has no known closed form or derivatives, and can only be observed through (possibly noisy) point evaluations. The method is best suited to problems of moderate dimension, typically about twenty variables or fewer.
Since the objective function is unknown, the Bayesian strategy is to treat it as a random function and place a [[Prior distribution|prior]] over it. The prior captures beliefs about the behavior of the function. After gathering the function evaluations, which are treated as data, the prior is updated to form the [[posterior distribution]] over the objective function. The posterior distribution, in turn, is used to construct an acquisition function (often also referred to as infill sampling criteria) that determines the next query point.
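A minimal sketch of this loop is given below, using a Gaussian process surrogate from [[scikit-learn]] and the expected improvement acquisition function; the toy objective, the bounds and the evaluation budget are assumptions made for illustration, not part of any particular published method.

<syntaxhighlight lang="python">
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def objective(x):
    # Hypothetical expensive black-box function (stand-in for a real experiment).
    return -np.sin(3 * x) - x ** 2 + 0.7 * x

rng = np.random.default_rng(0)
bounds = (-2.0, 2.0)

# Initial design: a few random evaluations of the objective.
X = rng.uniform(*bounds, size=(4, 1))
y = objective(X).ravel()

gpr = GaussianProcessRegressor(kernel=Matern(nu=2.5), alpha=1e-6, normalize_y=True)

for _ in range(20):
    # Update the posterior over the objective with all evaluations so far.
    gpr.fit(X, y)

    # Maximize the expected-improvement acquisition function on a dense grid.
    candidates = np.linspace(*bounds, 1000).reshape(-1, 1)
    mu, sigma = gpr.predict(candidates, return_std=True)
    best = y.max()
    z = (mu - best) / np.maximum(sigma, 1e-12)
    ei = (mu - best) * norm.cdf(z) + sigma * norm.pdf(z)

    # Query the objective at the most promising point and add it to the data.
    x_next = candidates[np.argmax(ei)].reshape(1, -1)
    X = np.vstack([X, x_next])
    y = np.append(y, objective(x_next).ravel())

print("best point found:", X[np.argmax(y)], "value:", y.max())
</syntaxhighlight>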
There are several methods used to define the prior/posterior distribution over the objective function. The most common approach uses a [[Gaussian process]], as in [[kriging]]. Another, less expensive, method uses the [[Parzen-Tree Estimator|tree-structured Parzen estimator]] to construct two distributions for 'high' and 'low' points, and then finds the ___location that maximizes the expected improvement.<ref>J. S. Bergstra, R. Bardenet, Y. Bengio, B. Kégl: [http://papers.nips.cc/paper/4443-algorithms-for-hyper-parameter-optimization.pdf Algorithms for Hyper-Parameter Optimization]. Advances in Neural Information Processing Systems: 2546–2554 (2011)</ref>
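The idea behind the tree-structured Parzen estimator can be sketched in one dimension as follows; the toy objective, the 25% "good" quantile and the candidate count are illustrative assumptions, and practical implementations (such as the one described by Bergstra et al.) use more elaborate, tree-structured densities over mixed search spaces.

<syntaxhighlight lang="python">
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(1)

def objective(x):
    # Hypothetical cheap stand-in for an expensive function; minimized here.
    return np.sin(3 * x) + x ** 2

# Past evaluations of the objective.
X = rng.uniform(-2, 2, size=50)
y = objective(X)

# Split observations into "good" (lowest 25%) and "bad" points.
gamma = 0.25
threshold = np.quantile(y, gamma)
good, bad = X[y <= threshold], X[y > threshold]

# Kernel density estimates l(x) over good points and g(x) over bad points.
l = gaussian_kde(good)
g = gaussian_kde(bad)

# Sample candidates from l(x) and pick the one maximizing l(x)/g(x);
# under the TPE model this is equivalent to maximizing expected improvement.
candidates = l.resample(100).ravel()
x_next = candidates[np.argmax(l(candidates) / g(candidates))]
print("next point to evaluate:", x_next)
</syntaxhighlight>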
Standard Bayesian optimization relies upon each <math>x \in A</math> being easy to evaluate, and problems that deviate from this assumption are known as ''exotic'' Bayesian optimization problems. These arise, for example, when evaluations are noisy, when they are performed in parallel, when their cost and accuracy can be traded off against each other, when random environmental conditions affect the outcome, or when derivative information is available.
==Acquisition functions==