Hyperparameter optimization: Difference between revisions

In [[machine learning]], '''hyperparameter optimization''' or tuning is the problem of choosing a set of optimal [[Hyperparameter (machine learning)|hyperparameters]] for a learning algorithm.
 
The same kind of machine learning model can require different constraints, weights or learning rates to generalize different data patterns. These measures are called hyperparameters, and have to be tuned so that the model can optimally solve the machine learning problem. Hyperparameter optimization finds a tuple of hyperparameters that yields an optimal model which minimizes a predefined [[loss function]] on given independent data.<ref name=abs1502.02127>{{cite arxiv |eprint=1502.02127|last1=Claesen|first1=Marc|title=Hyperparameter Search in Machine Learning|author2=Bart De Moor|class=cs.LG|year=2015}}</ref> The objective function takes a tuple of hyperparameters and returns the associated loss.<ref name=abs1502.02127/> [[Cross-validation (statistics)|Cross-validation]] is often used to estimate this generalization performance.<ref name="bergstra" />
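As a toy illustration (all names and values below are hypothetical, not drawn from the cited references), the objective can be treated as a black box that maps a hyperparameter tuple to a loss, here minimized by exhaustive search over a small grid:

```python
import itertools

# Hypothetical toy objective: in a real setting this would train a model
# with the given hyperparameters and return a cross-validated loss.
def objective(learning_rate, regularization):
    return (learning_rate - 0.1) ** 2 + (regularization - 0.01) ** 2

# Exhaustive search over a small, manually specified grid of tuples.
grid = itertools.product([0.01, 0.1, 1.0], [0.001, 0.01, 0.1])
best = min(grid, key=lambda tup: objective(*tup))
print(best)  # the tuple with the lowest loss on this grid: (0.1, 0.01)
```

Cross-validation inside `objective` would replace the toy formula with an estimate of generalization performance for each candidate tuple.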
 
== Approaches ==
| volume = 10
| issue = 35
| pages = 1–17
| date = December 2017
| pmid = 29234465
| title = Practical Bayesian Optimization of Machine Learning Algorithms
| journal = Advances in Neural Information Processing Systems
| volume = 1206
| pages = arXiv:1206.2944
| year = 2012
| url = http://papers.nips.cc/paper/4522-practical-bayesian-optimization-of-machine-learning-algorithms.pdf
| bibcode = 2012arXiv1206.2944S
| arxiv = 1206.2944
| class = stat.ML
}}</ref><ref name="thornton">{{Citation
| last = Thornton
| first = Chris
| title = Auto-WEKA: Combined selection and hyperparameter optimization of classification algorithms
| journal = Knowledge discovery and data mining
| volume = 1208
| pages = arXiv:1208.3719
| year = 2013
| url = http://www.cs.ubc.ca/labs/beta/Projects/autoweka/papers/autoweka.pdf| bibcode = 2012arXiv1208.3719T
| arxiv = 1208.3719
| class = cs.LG
}}</ref> to obtain better results in fewer experiments than grid search and random search, due to the ability to reason about the quality of experiments before they are run.
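The select-evaluate-update loop behind these methods can be caricatured in a few lines. The sketch below deliberately replaces the Gaussian-process surrogate with a crude nearest-neighbour mean plus a distance-based uncertainty bonus, purely to show how a model of past experiments guides the choice of the next one; the objective is a hypothetical stand-in for an expensive validation loss:

```python
# Hedged sketch of sequential model-based optimization. A crude
# nearest-neighbour surrogate stands in for the Gaussian process used in
# practice; the point is the loop: model past results, pick the most
# promising untried point, evaluate it, update the model.

def objective(x):
    # Hypothetical expensive black-box validation loss of one
    # hyperparameter; its minimum is at x = 0.3.
    return (x - 0.3) ** 2

candidates = [i / 100 for i in range(101)]             # search space [0, 1]
observed = {0.0: objective(0.0), 1.0: objective(1.0)}  # initial evaluations

for _ in range(10):
    def acquisition(x):
        nearest = min(observed, key=lambda p: abs(p - x))
        mean = observed[nearest]        # surrogate prediction of the loss
        uncertainty = abs(nearest - x)  # grows away from evaluated points
        # Lower-confidence-bound style score: prefer low predicted loss,
        # but also reward unexplored regions.
        return mean - 0.5 * uncertainty
    x_next = min((c for c in candidates if c not in observed), key=acquisition)
    observed[x_next] = objective(x_next)

best = min(observed, key=observed.get)  # best hyperparameter found
```

Because the acquisition score trades predicted loss against uncertainty, the loop concentrates its twelve total evaluations near the minimum rather than spreading them uniformly, which is the advantage over grid and random search noted above.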
 
=== Gradient-based optimization ===
For specific learning algorithms, it is possible to compute the gradient with respect to hyperparameters and then optimize the hyperparameters using gradient descent. The first usage of these techniques was focused on neural networks.<ref>{{cite journal |last1=Larsen|first1=Jan|last2= Hansen |first2=Lars Kai|last3=Svarer|first3=Claus|last4=Ohlsson|first4=M|title=Design and regularization of neural networks: the optimal use of a validation set|journal=Proceedings of the 1996 IEEE Signal Processing Society Workshop|date=1996}}</ref> Since then, these methods have been extended to other models such as [[support vector machine]]s<ref>{{cite journal |author1=Olivier Chapelle |author2=Vladimir Vapnik |author3=Olivier Bousquet |author4=Sayan Mukherjee |title=Choosing multiple parameters for support vector machines |journal=Machine Learning |year=2002 |volume=46 |pages=131–159 |url=http://www.chapelle.cc/olivier/pub/mlj02.pdf | doi = 10.1023/a:1012450327387 }}</ref> or logistic regression.<ref>{{cite journal |author1 =Chuong B|author2= Chuan-Sheng Foo|author3=Andrew Y Ng|journal = Advances in Neural Information Processing Systems 20|title = Efficient multiple hyperparameter learning for log-linear models|year =2008}}</ref>
 
A different approach to obtaining a gradient with respect to hyperparameters consists in differentiating the steps of an iterative optimization algorithm using [[automatic differentiation]].<ref>{{cite journal|last1=Domke|first1=Justin|title=Generic Methods for Optimization-Based Modeling|journal=AISTATS|date=2012|volume=22|url=http://www.jmlr.org/proceedings/papers/v22/domke12/domke12.pdf}}</ref><ref name=abs1502.03492>{{cite arXiv |last1=Maclaurin|first1=Douglas|last2=Duvenaud|first2=David|last3=Adams|first3=Ryan P.|eprint=1502.03492|title=Gradient-based Hyperparameter Optimization through Reversible Learning|class=stat.ML|date=2015}}</ref>
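For intuition, here is a minimal sketch of the hypergradient idea (toy data, a single weight, all values illustrative): with a closed-form ridge-regression solution, the validation loss can be differentiated with respect to the regularization strength via the chain rule, and the hyperparameter tuned by ordinary gradient descent:

```python
# Hedged sketch: one-weight ridge regression has a closed-form solution,
# so the validation loss is an explicit function of the regularization
# hyperparameter lam and can be minimized by gradient descent on lam.

x_train, y_train = [1.0, 2.0, 3.0], [1.2, 1.9, 3.3]  # noisy training data
x_val,   y_val   = [1.0, 2.0],      [1.0, 2.0]       # validation data

sxy = sum(x * y for x, y in zip(x_train, y_train))
sxx = sum(x * x for x in x_train)

def weight(lam):
    # Closed-form ridge solution: argmin_w sum (w*x - y)^2 + lam * w^2
    return sxy / (sxx + lam)

def val_loss(lam):
    w = weight(lam)
    return sum((w * x - y) ** 2 for x, y in zip(x_val, y_val))

def hypergradient(lam):
    # d(val_loss)/d(lam), chained through the closed-form weight.
    w = weight(lam)
    dw_dlam = -sxy / (sxx + lam) ** 2
    dloss_dw = sum(2 * (w * x - y) * x for x, y in zip(x_val, y_val))
    return dloss_dw * dw_dlam

lam = 0.0
for _ in range(200):
    lam = max(0.0, lam - 10.0 * hypergradient(lam))  # keep lam >= 0
print(lam)  # approaches 0.9, the validation-optimal value on this toy data
```

When no closed form exists, reverse-mode automatic differentiation through the training iterations plays the role of `hypergradient` here.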
 
=== Evolutionary optimization ===
# Repeat steps 2–4 until satisfactory algorithm performance is reached or algorithm performance is no longer improving
 
Evolutionary optimization has been used in hyperparameter optimization for statistical machine learning algorithms,<ref name="bergstra11" /> [[automated machine learning]],<ref name="tpot1" /><ref name="tpot2" /> [[Deep_learning#Deep_neural_networks|deep neural network]] architecture search,<ref name="miikkulainen1">{{cite arxiv | vauthors = Miikkulainen R, Liang J, Meyerson E, Rawal A, Fink D, Francon O, Raju B, Shahrzad H, Navruzyan A, Duffy N, Hodjat B | year = 2017 | title = Evolving Deep Neural Networks |eprint=1703.00548| class = cs.NE }}</ref><ref name="jaderberg1">{{cite arxiv | vauthors = Jaderberg M, Dalibard V, Osindero S, Czarnecki WM, Donahue J, Razavi A, Vinyals O, Green T, Dunning I, Simonyan K, Fernando C, Kavukcuoglu K | year = 2017 | title = Population Based Training of Neural Networks |eprint=1711.09846| class = cs.LG }}</ref> as well as training of the weights in deep neural networks.<ref name="such1">{{cite arxiv | vauthors = Such FP, Madhavan V, Conti E, Lehman J, Stanley KO, Clune J | year = 2017 | title = Deep Neuroevolution: Genetic Algorithms Are a Competitive Alternative for Training Deep Neural Networks for Reinforcement Learning |eprint=1712.06567| class = cs.NE }}</ref>
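The numbered procedure above can be sketched as follows; the fitness function is a hypothetical stand-in for cross-validated performance, and the population size, mutation scale, and budget are illustrative:

```python
import random

random.seed(0)  # for reproducibility of this illustration

# Hypothetical fitness: in practice this would be cross-validated model
# performance for a hyperparameter tuple; this toy surrogate peaks at
# hyperparameters (0.3, 0.7).
def fitness(h):
    return -((h[0] - 0.3) ** 2 + (h[1] - 0.7) ** 2)

# Step 1: create an initial population of random hyperparameter tuples.
population = [(random.random(), random.random()) for _ in range(20)]

for generation in range(30):
    # Steps 2-3: evaluate the tuples and rank them by fitness.
    population.sort(key=fitness, reverse=True)
    parents = population[:5]  # truncation selection keeps the best tuples
    # Step 4: replace the worst performers with mutated copies of parents,
    # perturbing each hyperparameter with Gaussian noise, clipped to [0, 1].
    children = [
        tuple(min(1.0, max(0.0, g + random.gauss(0.0, 0.1)))
              for g in random.choice(parents))
        for _ in range(15)
    ]
    population = parents + children
    # Step 5: repeat until performance stops improving (fixed budget here).

best = max(population, key=fitness)
```

Because the best tuples survive each generation unchanged (elitism), the best fitness found is monotonically non-decreasing over generations.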
 
=== Others ===
[[Radial basis function|RBF]]<ref name=abs1705.08520>{{cite arxiv |eprint=1705.08520|last1=Diaz|first1=Gonzalo|title=An effective algorithm for hyperparameter optimization of neural networks|last2=Fokoue|first2=Achille|last3=Nannicini|first3=Giacomo|last4=Samulowitz|first4=Horst|class=cs.AI|year=2017}}</ref> and [[spectral method|spectral]]<ref name=abs1706.00764>{{cite arxiv |eprint=1706.00764|last1=Hazan|first1=Elad|title=Hyperparameter Optimization: A Spectral Approach|last2=Klivans|first2=Adam|last3=Yuan|first3=Yang|class=cs.LG|year=2017}}</ref> approaches have also been developed.
 
== Software ==
| pages = 3915–3919
| url = http://jmlr.org/papers/volume15/martinezcantin14a/martinezcantin14a.pdf
| bibcode = 2014arXiv1405.7430M
| arxiv = 1405.7430
| class = cs.LG
}}</ref> an efficient implementation of Bayesian optimization in C/C++ with support for Python, Matlab and Octave.
* [https://github.com/yelp/MOE MOE] is a Python/C++/CUDA library implementing Bayesian Global Optimization using Gaussian Processes.
* [http://www.cs.ubc.ca/labs/beta/Projects/autoweka/ Auto-WEKA]<ref name="autoweka">{{cite journal | vauthors = Kotthoff L, Thornton C, Hoos HH, Hutter F, Leyton-Brown K | year = 2017 | title = Auto-WEKA 2.0: Automatic model selection and hyperparameter optimization in WEKA | url = http://jmlr.org/papers/v18/16-261.html | journal = Journal of Machine Learning Research | pages = 1–5 }}</ref> is a Bayesian hyperparameter optimization layer on top of [[Weka (machine learning)|WEKA]].
* [https://github.com/automl/auto-sklearn Auto-sklearn]<ref name="autosklearn">{{cite journal | vauthors = Feurer M, Klein A, Eggensperger K, Springenberg J, Blum M, Hutter F | year = 2015 | title = Efficient and Robust Automated Machine Learning | url = https://papers.nips.cc/paper/5872-efficient-and-robust-automated-machine-learning | journal = Advances in Neural Information Processing Systems 28 (NIPS 2015) | pages = 2962–2970 }}</ref> is a Bayesian hyperparameter optimization layer on top of [[scikit-learn]].
 
===Gradient based===
 
===Evolutionary===
* [https://github.com/rhiever/tpot TPOT]<ref name="tpot1">{{cite journal | vauthors = Olson RS, Urbanowicz RJ, Andrews PC, Lavender NA, Kidd L, Moore JH | year = 2016 | title = Automating biomedical data science through tree-based pipeline optimization | url = https://link.springer.com/chapter/10.1007/978-3-319-31204-0_9 | journal = Proceedings of EvoStar 2016 | volume = 9597 | pages = 123–137 | doi = 10.1007/978-3-319-31204-0_9 | series = Lecture Notes in Computer Science | isbn = 978-3-319-31203-3 }}</ref><ref name="tpot2">{{cite journal | vauthors = Olson RS, Bartley N, Urbanowicz RJ, Moore JH | year = 2016 | title = Evaluation of a Tree-based Pipeline Optimization Tool for Automating Data Science | url = https://dl.acm.org/citation.cfm?id=2908918 | journal = Proceedings of EvoBIO 2016 | pages = 485–492 | doi = 10.1145/2908812.2908918 | isbn = 9781450342063 }}</ref> is a Python package that automatically creates and optimizes full machine learning pipelines using [[genetic programming]].
* [https://github.com/joeddav/devol devol] is a Python package that performs Deep Neural Network architecture search using [[genetic programming]].