== History ==

The term is generally attributed to {{ill|Jonas Mockus|lt}}, who coined it in a series of publications on global optimization in the 1970s and 1980s.<ref>{{cite book |first=Jonas |last=Močkus |title=Optimization Techniques IFIP Technical Conference Novosibirsk, July 1–7, 1974 |chapter=On bayesian methods for seeking the extremum |doi=10.1007/3-540-07165-2_55 |series=Lecture Notes in Computer Science |date=1975 |volume=27 |pages=400–404 |isbn=978-3-540-07165-5 |doi-access=free }}</ref><ref>{{cite journal |first=Jonas |last=Močkus |title=On Bayesian Methods for Seeking the Extremum and their Application |journal=IFIP Congress |year=1977 |pages=195–200 }}</ref><ref name="Mockus1989">{{cite book |first=J. |last=Močkus |title=Bayesian Approach to Global Optimization |publisher=Kluwer Academic |___location=Dordrecht |year=1989 |isbn=0-7923-0115-3 }}</ref>
 
=== Early mathematics foundations ===
 
==== From the 1960s to the 1980s ====
The earliest idea of Bayesian optimization<ref>{{Cite book |last=Garnett |first=Roman |title=Bayesian Optimization |date=2023 |publisher=Cambridge University Press |isbn=978-1-108-42578-0}}</ref> dates to 1964, when the American applied mathematician Harold J. Kushner<ref>{{Cite web|url=https://vivo.brown.edu/display/hkushner|title=Kushner, Harold|website=vivo.brown.edu}}</ref> published [https://asmedigitalcollection.asme.org/fluidsengineering/article/86/1/97/392213/A-New-Method-of-Locating-the-Maximum-Point-of-an "A New Method of Locating the Maximum Point of an Arbitrary Multipeak Curve in the Presence of Noise"]. Although the paper did not propose Bayesian optimization by name, it introduced a method for locating the maximum of an arbitrary multipeak curve in a noisy environment, and it provided an important theoretical foundation for subsequent work on Bayesian optimization.
 
By the 1980s, the framework now used for Bayesian optimization had been explicitly established. In 1978, the Lithuanian scientist Jonas Mockus,<ref>{{Cite web |title=Jonas Mockus |url=https://en.ktu.edu/people/jonas-mockus/ |access-date=2025-03-06 |website=Kaunas University of Technology |language=en}}</ref> in his paper "The Application of Bayesian Methods for Seeking the Extremum", discussed how to use Bayesian methods to find the extremum of a function under various uncertain conditions. In this paper, Mockus first proposed the [https://schneppat.com/expected-improvement_ei.html expected improvement (EI)] criterion, which remains one of the core sampling strategies of Bayesian optimization. The criterion balances exploration and exploitation by sampling the point that maximizes the expected improvement over the best value observed so far. Because of the usefulness and lasting influence of this principle, Mockus is widely regarded as the founder of Bayesian optimization. Although expected improvement was among the earliest core sampling strategies proposed for Bayesian optimization, it is not the only one; later acquisition functions include probability of improvement (PI) and the upper confidence bound (UCB).<ref>{{Cite journal |last1=Kaufmann |first1=Emilie |last2=Cappe |first2=Olivier |last3=Garivier |first3=Aurelien |date=2012-03-21 |title=On Bayesian Upper Confidence Bounds for Bandit Problems |url=https://proceedings.mlr.press/v22/kaufmann12.html |journal=Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics |language=en |publisher=PMLR |pages=592–600}}</ref>
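Under a Gaussian process surrogate, the expected improvement criterion has a standard closed form (written here for maximization; the notation is illustrative rather than Mockus's original). With posterior mean <math>\mu(x)</math> and standard deviation <math>\sigma(x)</math> at a candidate point <math>x</math>, incumbent best value <math>f^*</math>, and <math>\Phi</math> and <math>\varphi</math> denoting the standard normal distribution and density functions:

<math display="block">\operatorname{EI}(x) = \mathbb{E}\left[\max\left(f(x) - f^*,\, 0\right)\right] = \sigma(x)\left[z\,\Phi(z) + \varphi(z)\right], \qquad z = \frac{\mu(x) - f^*}{\sigma(x)},</math>

with <math>\operatorname{EI}(x) = 0</math> when <math>\sigma(x) = 0</math>. The <math>z\,\Phi(z)</math> term rewards candidates whose predicted mean already exceeds the incumbent (exploitation), while the <math>\varphi(z)</math> term rewards posterior uncertainty (exploration).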
 
==== From theory to practice ====
In the 1990s, Bayesian optimization began to transition gradually from pure theory to real-world applications. In 1998, Donald R. Jones<ref>{{Cite web |title=Donald R. Jones |url=https://scholar.google.com/citations?user=CZhZ4MYAAAAJ&hl=en |access-date=2025-02-25 |website=scholar.google.com}}</ref> and his coworkers published the paper "Efficient Global Optimization of Expensive Black-Box Functions".<ref>{{Cite journal |last1=Jones |first1=Donald R. |last2=Schonlau |first2=Matthias |last3=Welch |first3=William J. |title=Efficient Global Optimization of Expensive Black-Box Functions |journal=Journal of Global Optimization |year=1998 |volume=13 |issue=4 |pages=455–492 |doi=10.1023/A:1008306431147}}</ref> In it, they used a Gaussian process (GP) as a surrogate model and elaborated on the expected improvement principle proposed by Mockus in 1978. Through the efforts of Jones and his colleagues, Bayesian optimization began to be applied in fields such as computer science and engineering. However, its computational demands relative to the computing power available at the time still limited its development considerably.
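The loop these authors helped popularize can be sketched compactly: fit a Gaussian process to the evaluations gathered so far, score candidate points with expected improvement, evaluate the highest-scoring point, and repeat. The following minimal sketch illustrates that loop; the toy objective, the squared-exponential kernel, the fixed length scale, and the grid of candidates are assumptions made for this example, not details from the 1998 paper.

<syntaxhighlight lang="python">
# Minimal illustrative sketch of a Gaussian-process / expected-improvement
# loop. The objective, kernel, length scale, and candidate grid are
# arbitrary choices for this example, not taken from the cited papers.
import numpy as np
from scipy.stats import norm


def objective(x):
    # Toy black-box function to maximize.
    return -(x - 2.0) ** 2 + np.sin(5.0 * x)


def rbf_kernel(a, b, length_scale=0.5):
    # Squared-exponential covariance between 1-D point sets a and b.
    d = a.reshape(-1, 1) - b.reshape(1, -1)
    return np.exp(-0.5 * (d / length_scale) ** 2)


def gp_posterior(x_train, y_train, x_query, noise=1e-6):
    # Exact GP regression: posterior mean and standard deviation.
    K = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    K_s = rbf_kernel(x_query, x_train)          # (n_query, n_train)
    mu = K_s @ np.linalg.solve(K, y_train)
    v = np.linalg.solve(K, K_s.T)
    var = 1.0 - np.sum(K_s.T * v, axis=0)       # prior variance is 1 here
    return mu, np.sqrt(np.maximum(var, 1e-12))


def expected_improvement(mu, sigma, best_y):
    # Closed-form EI for maximization (see the formula above).
    z = (mu - best_y) / sigma
    return sigma * (z * norm.cdf(z) + norm.pdf(z))


x_train = np.array([0.0, 1.0, 4.0])    # initial design
y_train = objective(x_train)
grid = np.linspace(0.0, 5.0, 501)      # candidate points

for _ in range(10):                    # ten optimization iterations
    mu, sigma = gp_posterior(x_train, y_train, grid)
    ei = expected_improvement(mu, sigma, y_train.max())
    x_next = grid[np.argmax(ei)]             # most promising candidate
    x_train = np.append(x_train, x_next)     # evaluate it and add the result
    y_train = np.append(y_train, objective(x_next))

print(f"best x = {x_train[y_train.argmax()]:.3f}, best y = {y_train.max():.3f}")
</syntaxhighlight>

In practice, the acquisition function is usually maximized with a continuous optimizer rather than a fixed grid, and the kernel hyperparameters are fitted to the observed data rather than held constant.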
 
In the 21st century, with the rise of artificial intelligence and robotics, Bayesian optimization has been widely used in machine learning and deep learning and has become an important tool for [[Hyperparameter optimization|hyperparameter tuning]].<ref>{{Cite book |last1=Joy |first1=T. T. |last2=Rana |first2=S. |last3=Gupta |first3=S. |last4=Venkatesh |first4=S. |chapter=Hyperparameter tuning for big data using Bayesian optimisation |title=2016 23rd International Conference on Pattern Recognition (ICPR) |___location=Cancun, Mexico |year=2016 |pages=2574–2579 |doi=10.1109/ICPR.2016.7900023}}</ref> Companies such as Google, Facebook and OpenAI have added Bayesian optimization to their deep learning frameworks to improve search efficiency. However, Bayesian optimization still faces challenges. Because it typically uses a Gaussian process<ref>{{Cite book |title=Neural Networks and Machine Learning |contribution=Introduction to Gaussian processes |first=D. J. C. |last=Mackay |editor-first=C. M. |editor-last=Bishop |contribution-url=https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=e045b76dc5daf9f4656ac10b456c5d1d9de5bc84 |archive-url=http://web.archive.org/web/20240423144014/https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=e045b76dc5daf9f4656ac10b456c5d1d9de5bc84 |archive-date=2024-04-23 |access-date=2025-03-06 |series=NATO ASI Series |volume=168 |pages=133–165 |year=1998}}</ref> as a surrogate model, and exact Gaussian process inference scales cubically with the number of observations, training becomes slow and computationally expensive once many evaluations have accumulated. This makes the method difficult to apply to larger-scale problems such as drug development and medical experiments.