In [[machine learning]], '''hyperparameter optimization''' or tuning is the problem of choosing a set of optimal [[Hyperparameter (machine learning)|hyperparameters]] for a learning algorithm.
The same kind of machine learning model can require different constraints, weights or learning rates to generalize different data patterns. These measures are called hyperparameters, and have to be tuned so that the model can optimally solve the machine learning problem. Hyperparameter optimization finds a tuple of hyperparameters that yields an optimal model which minimizes a predefined [[loss function]] on given independent data.<ref name=abs1502.02127>{{cite arxiv |eprint=1502.02127|last1=Claesen|first1=Marc|title=Hyperparameter Search in Machine Learning|author2=Bart De Moor|class=cs.LG|year=2015}}</ref> The objective function takes a tuple of hyperparameters and returns the associated loss.<ref name=abs1502.02127/> [[Cross-validation (statistics)|Cross-validation]] is often used to estimate this generalization performance.<ref name="bergstra" />
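For illustration, the objective function described above can be sketched in a few lines of Python. This is a minimal, self-contained example (the ridge-regression model, the synthetic data, and the candidate values are illustrative assumptions, not taken from any cited work): the hyperparameter is the regularization strength, and the loss it maps to is estimated by k-fold cross-validation.

```python
# Sketch: treating hyperparameter choice as minimizing a cross-validated loss.
# Model: ridge regression, whose regularization strength `lam` is the hyperparameter.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + rng.normal(scale=0.1, size=100)

def ridge_fit(X, y, lam):
    # Closed-form ridge solution: (X^T X + lam * I)^(-1) X^T y
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def cv_loss(lam, k=5):
    """Objective function: maps a hyperparameter value to a loss
    (mean squared error averaged over k cross-validation folds)."""
    folds = np.array_split(np.arange(len(y)), k)
    losses = []
    for fold in folds:
        train = np.setdiff1d(np.arange(len(y)), fold)
        w = ridge_fit(X[train], y[train], lam)
        losses.append(np.mean((X[fold] @ w - y[fold]) ** 2))
    return float(np.mean(losses))

# Hyperparameter optimization then means searching this objective,
# here over a small candidate set.
candidates = [0.01, 0.1, 1.0, 10.0]
best_lam = min(candidates, key=cv_loss)
```

Every method in the sections below is a different strategy for searching this kind of objective; exhaustively evaluating a fixed candidate set, as here, is the simplest one.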
== Approaches ==
| title = Practical Bayesian Optimization of Machine Learning Algorithms
| journal = Advances in Neural Information Processing Systems
| year = 2012
| url = http://papers.nips.cc/paper/4522-practical-bayesian-optimization-of-machine-learning-algorithms.pdf
| arxiv = 1206.2944
| class = stat.ML
}}</ref><ref name="thornton">{{Citation
| last = Thornton
| first = Chris
| title = Auto-WEKA: Combined selection and hyperparameter optimization of classification algorithms
| journal = Knowledge discovery and data mining
| year = 2013
| url = http://www.cs.ubc.ca/labs/beta/Projects/autoweka/papers/autoweka.pdf| bibcode = 2012arXiv1208.3719T
| arxiv = 1208.3719 | class = cs.LG }}</ref> to obtain better results in fewer experiments than grid search and random search, due to the ability to reason about the quality of experiments before they are run.

=== Gradient-based optimization ===
For specific learning algorithms, it is possible to compute the gradient with respect to hyperparameters and then optimize the hyperparameters using gradient descent. The first usage of these techniques was focused on neural networks.<ref>{{cite journal |last1=Larsen|first1=Jan|last2= Hansen |first2=Lars Kai|last3=Svarer|first3=Claus|last4=Ohlsson|first4=M|title=Design and regularization of neural networks: the optimal use of a validation set|journal=Proceedings of the 1996 IEEE Signal Processing Society Workshop|date=1996}}</ref> Since then, these methods have been extended to other models such as [[support vector machine]]s<ref>{{cite journal |author1=Olivier Chapelle |author2=Vladimir Vapnik |author3=Olivier Bousquet |author4=Sayan Mukherjee |title=Choosing multiple parameters for support vector machines |journal=Machine Learning |year=2002 |volume=46 |pages=131–159 |url=http://www.chapelle.cc/olivier/pub/mlj02.pdf | doi = 10.1023/a:1012450327387 }}</ref> or logistic regression.<ref>{{cite journal |author1 =Chuong B|author2= Chuan-Sheng Foo|author3=Andrew Y Ng|journal = Advances in Neural Information Processing Systems 20|title = Efficient multiple hyperparameter learning for log-linear models|year =2008}}</ref>
A different approach to obtaining a gradient with respect to hyperparameters is to differentiate the steps of an iterative optimization algorithm using [[automatic differentiation]].<ref>{{cite journal|last1=Domke|first1=Justin|title=Generic Methods for Optimization-Based Modeling}}</ref>
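A minimal sketch of this idea (illustrative only, not the exact method of the cited papers): each gradient-descent update of the model weights is differentiated with respect to the regularization hyperparameter, carrying the derivative forward through every step; the chain rule then gives the "hypergradient" of the validation loss.

```python
# Sketch: forward-mode differentiation of unrolled gradient-descent steps
# to get d(validation loss)/d(lam) for a ridge-regression hyperparameter.
# The data and step sizes are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)
Xtr, Xval = rng.normal(size=(80, 3)), rng.normal(size=(40, 3))
w_true = np.array([1.0, -1.0, 2.0])
ytr = Xtr @ w_true + rng.normal(scale=0.1, size=80)
yval = Xval @ w_true + rng.normal(scale=0.1, size=40)

def hypergradient(lam, steps=200, eta=0.01):
    n, d = Xtr.shape
    w = np.zeros(d)   # model weights
    dw = np.zeros(d)  # dw/dlam, carried through every update
    for _ in range(steps):
        # Training gradient: d/dw of ||Xw - y||^2/n + lam * ||w||^2
        grad = 2 * Xtr.T @ (Xtr @ w - ytr) / n + 2 * lam * w
        # Differentiate the update rule w <- w - eta * grad w.r.t. lam:
        dgrad = 2 * Xtr.T @ (Xtr @ dw) / n + 2 * w + 2 * lam * dw
        w, dw = w - eta * grad, dw - eta * dgrad
    # Chain rule: dL_val/dlam = (dL_val/dw) . (dw/dlam)
    val_grad_w = 2 * Xval.T @ (Xval @ w - yval) / len(yval)
    return float(val_grad_w @ dw)

# The hypergradient can itself drive gradient descent on the hyperparameter:
lam = 0.5
for _ in range(10):
    lam -= 0.1 * hypergradient(lam)
```

In practice an automatic-differentiation system performs the `dgrad` bookkeeping mechanically instead of by hand, which is what makes the approach applicable beyond simple models.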
=== Evolutionary optimization ===
Line 111 ⟶ 121:
# Repeat steps 2-4 until satisfactory algorithm performance is reached or algorithm performance is no longer improving
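The steps above can be sketched as a short program. This is a toy illustration under stated assumptions: the fitness function is a stand-in for cross-validated model performance, and the population size, mutation scale, and hyperparameter ranges are arbitrary choices.

```python
# Sketch of an evolutionary hyperparameter search loop.
import random

random.seed(0)

def fitness(hp):
    # Stand-in objective: pretend the best setting is lr=0.1, reg=1.0.
    lr, reg = hp
    return -((lr - 0.1) ** 2 + (reg - 1.0) ** 2)

def mutate(hp, scale=0.05):
    # Perturb each hyperparameter with Gaussian noise, keeping it positive.
    return tuple(max(1e-4, v + random.gauss(0, scale)) for v in hp)

def crossover(a, b):
    # Take each hyperparameter from one of the two parents at random.
    return tuple(random.choice(pair) for pair in zip(a, b))

# 1. Create an initial population of random hyperparameter tuples.
population = [(random.uniform(0, 1), random.uniform(0, 2)) for _ in range(20)]

for generation in range(30):
    # 2.-3. Evaluate the tuples and rank them by fitness.
    population.sort(key=fitness, reverse=True)
    parents = population[:10]
    # 4. Replace the worst performers with mutated crossovers of the best.
    children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                for _ in range(10)]
    population = parents + children
    # Stopping criterion simplified to a fixed number of generations.

best = max(population, key=fitness)
```

Because the best tuples are carried over unchanged (elitism), the best fitness in the population never decreases from one generation to the next.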
Evolutionary optimization has been used in hyperparameter optimization for statistical machine learning algorithms,<ref name="bergstra11" /> [[automated machine learning]],<ref name="tpot1" /><ref name="tpot2" /> and [[Deep_learning#Deep_neural_networks|deep neural network]] architecture search.<ref name="miikkulainen1">{{cite arxiv | vauthors = Miikkulainen R, Liang J, Meyerson E, Rawal A, Fink D, Francon O, Raju B, Shahrzad H, Navruzyan A, Duffy N, Hodjat B | year = 2017 | title = Evolving Deep Neural Networks | eprint = 1703.00548}}</ref>
=== Others ===
[[Radial basis function|RBF]]<ref name=abs1705.08520>{{cite arxiv |eprint=1705.08520|last1=Diaz|first1=Gonzalo|title=An effective algorithm for hyperparameter optimization of neural networks|last2=Fokoue|first2=Achille|last3=Nannicini|first3=Giacomo|last4=Samulowitz|first4=Horst|class=cs.AI|year=2017}}</ref> and [[spectral method|spectral]]<ref name=abs1706.00764>{{cite arxiv |eprint=1706.00764|last1=Hazan|first1=Elad|title=Hyperparameter Optimization: A Spectral Approach|last2=Klivans|first2=Adam|last3=Yuan|first3=Yang|class=cs.LG|year=2017}}</ref> approaches have also been developed.
== Software ==
| pages = 3915–3919
| url = http://jmlr.org/papers/volume15/martinezcantin14a/martinezcantin14a.pdf
| bibcode = 2014arXiv1405.7430M
| arxiv = 1405.7430
| class = cs.LG
}}</ref> an efficient implementation of Bayesian optimization in C/C++ with support for Python, Matlab and Octave.
* [https://github.com/yelp/MOE MOE] is a Python/C++/CUDA library implementing Bayesian Global Optimization using Gaussian Processes.
* [http://www.cs.ubc.ca/labs/beta/Projects/autoweka/ Auto-WEKA]<ref name="autoweka">{{cite journal | vauthors = Kotthoff L, Thornton C, Hoos HH, Hutter F, Leyton-Brown K | year = 2017 | title = Auto-WEKA 2.0: Automatic model selection and hyperparameter optimization in WEKA | url = http://jmlr.org/papers/v18/16-261.html | journal = Journal of Machine Learning Research }}</ref>
* [https://github.com/automl/auto-sklearn Auto-sklearn]<ref name="autosklearn">{{cite journal | vauthors = Feurer M, Klein A, Eggensperger K, Springenberg J, Blum M, Hutter F | year = 2015 | title = Efficient and Robust Automated Machine Learning | url = https://papers.nips.cc/paper/5872-efficient-and-robust-automated-machine-learning | journal = Advances in Neural Information Processing Systems 28 (NIPS 2015) }}</ref>
===Gradient based===
===Evolutionary===
* [https://github.com/rhiever/tpot TPOT]<ref name="tpot1">{{cite journal | vauthors = Olson RS, Urbanowicz RJ, Andrews PC, Lavender NA, Kidd L, Moore JH | year = 2016 | title = Automating biomedical data science through tree-based pipeline optimization | url = https://link.springer.com/chapter/10.1007/978-3-319-31204-0_9 | journal = Proceedings of EvoStar 2016 | volume = 9597 }}</ref>
* [https://github.com/joeddav/devol devol] is a Python package that performs Deep Neural Network architecture search using [[genetic programming]].