{{Short description|Problem of choosing optimal parameters for a machine learning algorithm}}
In [[machine learning]], '''hyperparameter optimization'''<ref>Matthias Feurer and Frank Hutter. [https://link.springer.com/content/pdf/10.1007%2F978-3-030-05318-5_1.pdf Hyperparameter optimization]. In: ''AutoML: Methods, Systems, Challenges'', pages 3–38.</ref> or '''tuning''' is the problem of choosing a set of optimal [[Hyperparameter (machine learning)|hyperparameters]] for a learning algorithm. A hyperparameter is a [[parameter]] whose value is used to control the learning process and which must be configured before the learning process starts.<ref>{{cite journal |last1=Yang|first1=Li|title=On hyperparameter optimization of machine learning algorithms: Theory and practice|journal=Neurocomputing|year=2020|volume=415|pages=295–316|doi=10.1016/j.neucom.2020.07.061|arxiv=2007.15745 }}</ref><ref>{{cite arXiv |vauthors=Franceschi L, Donini M, Perrone V, Klein A, Archambeau C, Seeger M, Pontil M, Frasconi P |title=Hyperparameter Optimization in Machine Learning |year=2024 |class=stat.ML |eprint=2410.22854 }}</ref>
Hyperparameter optimization finds a tuple of hyperparameters that yields an optimal model which minimizes a predefined [[loss function]] on given independent data.
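Hyperparameter optimization can therefore be framed as finding
:<math>\lambda^{*} = \underset{\lambda \in \Lambda}{\operatorname{arg\,min}}\; \mathcal{L}(\lambda),</math>
where <math>\Lambda</math> denotes the space of admissible hyperparameter configurations and <math>\mathcal{L}(\lambda)</math> is the loss (for example, the validation error) obtained by training the learning algorithm with hyperparameters <math>\lambda</math> and evaluating the resulting model on independent data.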
== Approaches ==
=== Grid search ===
The traditional way of performing hyperparameter optimization has been ''grid search'', or a ''parameter sweep'', which is simply an [[exhaustive search]] through a manually specified subset of the hyperparameter space of a learning algorithm. A grid search algorithm must be guided by some performance metric, typically measured by [[Cross-validation (statistics)|cross-validation]] on the training set or evaluation on a hold-out validation set.<ref>{{cite journal | vauthors = Chicco D | title = Ten quick tips for machine learning in computational biology | journal = BioData Mining | date = 2017 | volume = 10 | pages = 35 | doi = 10.1186/s13040-017-0155-3 | pmid = 29234465 | pmc = 5721660 | doi-access = free }}</ref>
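For illustration, the following Python sketch performs an exhaustive search over a small, manually specified grid of [[support vector machine]] hyperparameters, scoring each combination by cross-validation; the estimator, parameter values and dataset are arbitrary illustrative choices rather than part of any standard:

<syntaxhighlight lang="python">
# Illustrative exhaustive grid search over two SVM hyperparameters,
# scoring each combination by 5-fold cross-validation.
from itertools import product

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

param_grid = {
    "C": [0.1, 1, 10, 100],          # regularization strength
    "gamma": [0.001, 0.01, 0.1, 1],  # RBF kernel width
}

best_score, best_params = -1.0, None
for C, gamma in product(param_grid["C"], param_grid["gamma"]):
    # Mean cross-validated accuracy for this hyperparameter combination.
    score = cross_val_score(SVC(C=C, gamma=gamma), X, y, cv=5).mean()
    if score > best_score:
        best_score, best_params = score, {"C": C, "gamma": gamma}

print(best_params, best_score)
</syntaxhighlight>

Machine-learning libraries commonly provide equivalent built-in utilities, such as <code>GridSearchCV</code> in [[scikit-learn]].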
=== Bayesian optimization ===
[[Bayesian optimization]] builds a probabilistic model of the function mapping from hyperparameter values to the objective evaluated on a validation set, and uses it to select the most promising hyperparameter configuration to evaluate next. It has been shown<ref>{{cite book
 | vauthors = Thornton C, Hutter F, Hoos HH, Leyton-Brown K
 | year = 2013
 | chapter = Auto-WEKA: Combined selection and hyperparameter optimization of classification algorithms
 | title = Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
| bibcode = 2012arXiv1208.3719T
| arxiv = 1208.3719
}}</ref><ref name="krnc">{{Citation
|last=Kernc
|title=SAMBO: Sequential And Model-Based Optimization: Efficient global optimization in Python
|date=2024
|url=https://zenodo.org/records/14461363
|access-date=2025-01-30
|doi=10.5281/zenodo.14461363
}}</ref> to obtain better results in fewer evaluations compared to grid search and random search, due to the ability to reason about the quality of experiments before they are run.
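The following Python sketch illustrates this loop using a [[Gaussian process]] surrogate model and the expected-improvement acquisition function; the single tuned hyperparameter, the dataset and all settings are arbitrary illustrative choices and do not correspond to any particular library:

<syntaxhighlight lang="python">
# Illustrative Bayesian optimization of one hyperparameter (an SVM's
# regularization strength C) with a Gaussian-process surrogate and the
# expected-improvement acquisition function.
import numpy as np
from scipy.stats import norm
from sklearn.datasets import load_digits
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)

def objective(log_c):
    # Negative cross-validated accuracy: lower is better.
    return -cross_val_score(SVC(C=10.0 ** log_c), X, y, cv=3).mean()

bounds = (-3.0, 3.0)                          # search log10(C) in [1e-3, 1e3]
rng = np.random.default_rng(0)
samples = list(rng.uniform(*bounds, size=3))  # a few random initial points
losses = [objective(s) for s in samples]

for _ in range(10):
    # Fit the surrogate model to all observations made so far.
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
    gp.fit(np.array(samples).reshape(-1, 1), np.array(losses))

    # Expected improvement over the best loss so far, on a dense grid.
    grid = np.linspace(*bounds, 200).reshape(-1, 1)
    mu, sigma = gp.predict(grid, return_std=True)
    best = min(losses)
    z = (best - mu) / np.maximum(sigma, 1e-9)
    ei = (best - mu) * norm.cdf(z) + sigma * norm.pdf(z)

    nxt = grid[np.argmax(ei), 0]              # most promising point to try next
    samples.append(nxt)
    losses.append(objective(nxt))

print("best log10(C):", samples[int(np.argmin(losses))])
</syntaxhighlight>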
=== Gradient-based optimization ===
For specific learning algorithms, it is possible to compute the gradient with respect to hyperparameters and then optimize the hyperparameters using [[gradient descent]]. The first usage of these techniques was focused on neural networks.<ref>{{cite book |last1=Larsen|first1=Jan|last2= Hansen |first2=Lars Kai|last3=Svarer|first3=Claus|last4=Ohlsson|first4=M|title=Neural Networks for Signal Processing VI. Proceedings of the 1996 IEEE Signal Processing Society Workshop |chapter=Design and regularization of neural networks: The optimal use of a validation set |date=1996|pages=62–71|doi=10.1109/NNSP.1996.548336|isbn=0-7803-3550-3|citeseerx=10.1.1.415.3266|s2cid=238874|chapter-url=http://orbit.dtu.dk/files/4545571/Svarer.pdf}}</ref> Since then, these methods have been extended to other models such as [[support vector machine]]s<ref>{{cite journal |author1=Olivier Chapelle |author2=Vladimir Vapnik |author3=Olivier Bousquet |author4=Sayan Mukherjee |title=Choosing multiple parameters for support vector machines |journal=Machine Learning |year=2002 |volume=46 |pages=131–159 |url=http://www.chapelle.cc/olivier/pub/mlj02.pdf | doi = 10.1023/a:1012450327387 |doi-access=free }}</ref> or logistic regression.<ref>{{cite journal |author1 =Chuong B|author2= Chuan-Sheng Foo|author3=Andrew Y Ng|journal = Advances in Neural Information Processing Systems |volume=20|title = Efficient multiple hyperparameter learning for log-linear models|year =2008|url=http://papers.nips.cc/paper/3286-efficient-multiple-hyperparameter-learning-for-log-linear-models.pdf}}</ref>
A different approach to obtaining a gradient with respect to hyperparameters consists of differentiating the steps of an iterative optimization algorithm using [[automatic differentiation]].<ref>{{cite journal|last1=Domke|first1=Justin|title=Generic Methods for Optimization-Based Modeling|journal=Aistats|date=2012|volume=22|url=http://www.jmlr.org/proceedings/papers/v22/domke12/domke12.pdf|access-date=2017-12-09|archive-date=2014-01-24|archive-url=https://web.archive.org/web/20140124182520/http://jmlr.org/proceedings/papers/v22/domke12/domke12.pdf|url-status=dead}}</ref><ref name=abs1502.03492>{{cite arXiv |last1=Maclaurin|first1=Dougal|last2=Duvenaud|first2=David|last3=Adams|first3=Ryan P.|eprint=1502.03492|title=Gradient-based Hyperparameter Optimization through Reversible Learning|class=stat.ML|date=2015}}</ref><ref>{{cite journal |last1=Franceschi |first1=Luca |last2=Donini |first2=Michele |last3=Frasconi |first3=Paolo |last4=Pontil |first4=Massimiliano |title=Forward and Reverse Gradient-Based Hyperparameter Optimization |journal=Proceedings of the 34th International Conference on Machine Learning |date=2017 |arxiv=1703.01785 |bibcode=2017arXiv170301785F |url=http://proceedings.mlr.press/v70/franceschi17a/franceschi17a-supp.pdf}}</ref>
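The following NumPy sketch illustrates this idea on a regularized [[least squares]] problem: the derivative of the validation loss with respect to the regularization strength is accumulated in forward mode through a fixed number of unrolled gradient steps, and the hyperparameter is then updated by gradient descent; the data, model and constants are arbitrary illustrative choices:

<syntaxhighlight lang="python">
# Illustrative hypergradient via forward-mode differentiation of an
# unrolled inner optimization of a ridge-style least-squares model.
import numpy as np

rng = np.random.default_rng(0)
X_tr, X_val = rng.normal(size=(80, 5)), rng.normal(size=(40, 5))
true_w = rng.normal(size=5)
y_tr = X_tr @ true_w + 0.5 * rng.normal(size=80)
y_val = X_val @ true_w + 0.5 * rng.normal(size=40)

def hypergradient(lam, inner_steps=200, inner_lr=0.01):
    """Return (validation loss, d loss / d lam) after unrolled inner training."""
    w = np.zeros(5)
    dw_dlam = np.zeros(5)          # forward-mode derivative of w w.r.t. lam
    n = len(y_tr)
    for _ in range(inner_steps):
        # Gradient of the regularized training loss with respect to w.
        grad_w = 2 * X_tr.T @ (X_tr @ w - y_tr) / n + 2 * lam * w
        # Derivative of that gradient with respect to lam (w depends on lam).
        dgrad_dlam = 2 * X_tr.T @ (X_tr @ dw_dlam) / n + 2 * w + 2 * lam * dw_dlam
        w = w - inner_lr * grad_w
        dw_dlam = dw_dlam - inner_lr * dgrad_dlam
    resid = X_val @ w - y_val
    val_loss = resid @ resid / len(y_val)
    dval_dlam = 2 * X_val.T @ resid / len(y_val) @ dw_dlam
    return val_loss, dval_dlam

# Hyperparameter optimization: plain gradient descent on lam itself.
lam, hyper_lr = 1.0, 0.05
for _ in range(50):
    loss, g = hypergradient(lam)
    lam = max(lam - hyper_lr * g, 0.0)   # keep the regularization non-negative
print("tuned lambda:", lam, "validation loss:", loss)
</syntaxhighlight>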
In a different approach, a hypernetwork is trained to map hyperparameter values to (approximately) optimal model weights, so that the hyperparameters themselves can be optimized by gradient descent through the hypernetwork.
Apart from hypernetwork approaches, gradient-based methods can also be used to optimize discrete hyperparameters by adopting a continuous relaxation of the parameters.
=== Evolutionary optimization ===
In hyperparameter optimization, evolutionary optimization uses [[evolutionary algorithm]]s to search the space of hyperparameters, following a process inspired by the biological concept of [[evolution]]:
# Create an initial population of random solutions (i.e., randomly generate tuples of hyperparameters, typically 100+)
# Evaluate the hyperparameter tuples and acquire their [[Fitness function|fitness]] (e.g., 10-fold [[Cross-validation (statistics)|cross-validation]] accuracy of the machine learning algorithm with those hyperparameters)
# Rank the hyperparameter tuples by their relative fitness
# Replace the worst-performing hyperparameter tuples with new ones generated through [[Crossover (genetic algorithm)|crossover]] and [[Mutation (genetic algorithm)|mutation]]
# Repeat steps 2–4 until satisfactory algorithm performance is reached or performance stops improving, as illustrated in the sketch below
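The following Python sketch follows these steps with an arbitrary population size, search ranges, and crossover and mutation scheme, using cross-validated accuracy as the fitness function:

<syntaxhighlight lang="python">
# Illustrative evolutionary search over two SVM hyperparameters: random
# initial population, fitness by cross-validation, ranking, replacement of
# the worst tuples via crossover and mutation, repeated for a few generations.
import random

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
random.seed(0)

def random_tuple():
    # Log-uniform samples for (C, gamma).
    return (10 ** random.uniform(-2, 2), 10 ** random.uniform(-3, 1))

def fitness(params):
    C, gamma = params
    return cross_val_score(SVC(C=C, gamma=gamma), X, y, cv=3).mean()

population = [random_tuple() for _ in range(10)]             # step 1
for generation in range(5):                                  # step 5: repeat
    scored = sorted(population, key=fitness, reverse=True)   # steps 2-3
    survivors = scored[: len(scored) // 2]
    children = []
    while len(survivors) + len(children) < len(population):  # step 4
        a, b = random.sample(survivors, 2)
        child = (a[0], b[1])                                            # crossover
        child = tuple(v * 10 ** random.gauss(0, 0.2) for v in child)    # mutation
        children.append(child)
    population = survivors + children

best = max(population, key=fitness)
print("best (C, gamma):", best)
</syntaxhighlight>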
Evolutionary optimization has been used in hyperparameter optimization for statistical machine learning algorithms,<ref name="bergstra11" /> [[automated machine learning]], typical neural network <ref name="kousiouris1">{{cite journal |vauthors=Kousiouris G, Cuccinotta T, Varvarigou T | year = 2011 | title= The effects of scheduling, workload type and consolidation scenarios on virtual machine performance and their prediction through optimized artificial neural networks | url= https://www.sciencedirect.com/science/article/abs/pii/S0164121211000951 | journal = Journal of Systems and Software | volume = 84 | issue = 8 | pages = 1270–1291| doi = 10.1016/j.jss.2011.04.013 | hdl = 11382/361472 | hdl-access = free }}</ref> and [[Deep learning#Deep neural networks|deep neural network]] architecture search,<ref name="miikkulainen1">{{cite arXiv | vauthors = Miikkulainen R, Liang J, Meyerson E, Rawal A, Fink D, Francon O, Raju B, Shahrzad H, Navruzyan A, Duffy N, Hodjat B | year = 2017 | title = Evolving Deep Neural Networks |eprint=1703.00548| class = cs.NE }}</ref><ref name="jaderberg1">{{cite arXiv | vauthors = Jaderberg M, Dalibard V, Osindero S, Czarnecki WM, Donahue J, Razavi A, Vinyals O, Green T, Dunning I, Simonyan K, Fernando C, Kavukcuoglu K | year = 2017 | title = Population Based Training of Neural Networks |eprint=1711.09846| class = cs.LG }}</ref> as well as training of the weights in deep neural networks.<ref name="such1">{{cite arXiv | vauthors = Such FP, Madhavan V, Conti E, Lehman J, Stanley KO, Clune J | year = 2017 | title = Deep Neuroevolution: Genetic Algorithms Are a Competitive Alternative for Training Deep Neural Networks for Reinforcement Learning |eprint=1712.06567| class = cs.NE }}</ref>
=== Early stopping-based ===
[[File:Successive-halving-for-eight-arbitrary-hyperparameter-configurations.png|thumb|Successive halving for eight arbitrary hyperparameter configurations. The approach starts with eight models with different configurations and consecutively applies successive halving until only one model remains.]]
A class of early stopping-based hyperparameter optimization algorithms is purpose-built for large search spaces of continuous and discrete hyperparameters, particularly when the computational cost to evaluate the performance of a set of hyperparameters is high. Irace implements the iterated racing algorithm, which focuses the search around the most promising configurations, using statistical tests to discard the ones that perform poorly.<ref name="irace">{{cite journal |last1=López-Ibáñez |first1=Manuel |last2=Dubois-Lacoste |first2=Jérémie |last3=Pérez Cáceres |first3=Leslie |last4=Stützle |first4=Thomas |last5=Birattari |first5=Mauro |date=2016 |title=The irace package: Iterated Racing for Automatic Algorithm Configuration |journal=Operations Research Perspective |volume=3 |issue=3 |pages=43–58 |doi=10.1016/j.orp.2016.09.002|doi-access=free |hdl=10419/178265 |hdl-access=free }}</ref><ref name="race">{{cite journal |last1=Birattari |first1=Mauro |last2=Stützle |first2=Thomas |last3=Paquete |first3=Luis |last4=Varrentrapp |first4=Klaus |date=2002 |title=A Racing Algorithm for Configuring Metaheuristics |journal=Gecco 2002 |pages=11–18}}</ref>
Another early stopping hyperparameter optimization algorithm is successive halving (SHA),<ref>{{cite arXiv|last1=Jamieson|first1=Kevin|last2=Talwalkar|first2=Ameet|date=2015-02-27|title=Non-stochastic Best Arm Identification and Hyperparameter Optimization|eprint=1502.07943|class=cs.LG}}</ref> which begins as a random search but periodically prunes low-performing models, thereby focusing computational resources on more promising models. Asynchronous successive halving (ASHA)<ref>{{cite arXiv|last1=Li|first1=Liam|last2=Jamieson|first2=Kevin|last3=Rostamizadeh|first3=Afshin|last4=Gonina|first4=Ekaterina|last5=Hardt|first5=Moritz|last6=Recht|first6=Benjamin|last7=Talwalkar|first7=Ameet|date=2020-03-16|title=A System for Massively Parallel Hyperparameter Tuning|class=cs.LG|eprint=1810.05934v5}}</ref> further improves upon SHA's resource utilization profile by removing the need to synchronously evaluate and prune low-performing models. Hyperband<ref>{{cite journal|last1=Li|first1=Lisha|last2=Jamieson|first2=Kevin|last3=DeSalvo|first3=Giulia|last4=Rostamizadeh|first4=Afshin|last5=Talwalkar|first5=Ameet|date=2020-03-16|title=Hyperband: A Novel Bandit-Based Approach to Hyperparameter Optimization|journal=Journal of Machine Learning Research|volume=18|pages=1–52|arxiv=1603.06560}}</ref> is a higher-level early stopping-based algorithm that invokes SHA or ASHA multiple times with varying levels of pruning aggressiveness, in order to be more widely applicable and to require fewer inputs.
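The following Python sketch illustrates successive halving; for simplicity, surviving configurations are retrained from scratch at each rung with a doubled epoch budget rather than resumed, and the candidate configurations, dataset and budgets are arbitrary illustrative choices:

<syntaxhighlight lang="python">
# Illustrative successive halving: start many randomly sampled configurations
# with a small training budget, keep the better half at each rung, and double
# the budget for the survivors. Budgets here are numbers of SGD epochs.
import random

from sklearn.datasets import load_digits
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)
random.seed(0)

# Sample candidate hyperparameter configurations at random.
configs = [{"alpha": 10 ** random.uniform(-6, -1)} for _ in range(16)]

budget = 5                                     # initial number of epochs
while len(configs) > 1:
    # Evaluate every surviving configuration under the current budget.
    scores = []
    for cfg in configs:
        model = SGDClassifier(alpha=cfg["alpha"], max_iter=budget,
                              tol=None, random_state=0)
        model.fit(X_tr, y_tr)
        scores.append(model.score(X_val, y_val))
    # Keep the top half ("halving") and double the budget for the next rung.
    ranked = sorted(zip(scores, range(len(configs))), reverse=True)
    configs = [configs[i] for _, i in ranked[: len(configs) // 2]]
    budget *= 2

print("selected configuration:", configs[0])
</syntaxhighlight>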
== See also ==
* [[Self-tuning]]
* [[XGBoost]]
* [[Optuna]]
== References ==
{{Reflist}}