Hyperparameter optimization: Difference between revisions

 
Grid search suffers from the [[curse of dimensionality]], but is often [[embarrassingly parallel]] because typically the hyperparameter settings it evaluates are independent of each other.<ref name="bergstra"/>
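As an illustration (not taken from the cited source), a minimal grid search over a hypothetical two-hyperparameter toy objective can be sketched as follows; every combination is evaluated, and since the evaluations are independent, the loop could be parallelized:

```python
import itertools

def toy_loss(lr, reg):
    # Hypothetical objective with its minimum at lr=0.1, reg=0.01.
    return (lr - 0.1) ** 2 + (reg - 0.01) ** 2

# A discrete grid of candidate values for each hyperparameter.
grid = {
    "lr": [0.001, 0.01, 0.1, 1.0],
    "reg": [0.0001, 0.001, 0.01, 0.1],
}

# Exhaustively evaluate all 4 x 4 = 16 combinations and keep the best.
best = min(
    (dict(zip(grid, combo)) for combo in itertools.product(*grid.values())),
    key=lambda p: toy_loss(**p),
)
print(best)  # -> {'lr': 0.1, 'reg': 0.01}
```

Note that the number of grid points grows exponentially with the number of hyperparameters, which is the curse of dimensionality mentioned above.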
 
=== Random search ===
{{main article|Random search}}
 
Since grid searching is exhaustive and therefore potentially expensive, several alternatives have been proposed. In particular, random search, which simply samples hyperparameter settings a fixed number of times, has been found to be more effective than exhaustive search in high-dimensional spaces. The reason is that some hyperparameters often have little effect on the loss: a random sample covers many distinct values of each influential hyperparameter, whereas an exhaustive grid spends many evaluations varying parameters that ultimately do not affect the loss.<ref name="bergstra">{{cite journal
| title = Random Search for Hyper-Parameter Optimization
| first1 = James
| last1 = Bergstra
| first2 = Yoshua
| last2 = Bengio
| journal = Journal of Machine Learning Research
| volume = 13
| year = 2012
| pages = 281–305
| url = http://jmlr.csail.mit.edu/papers/volume13/bergstra12a/bergstra12a.pdf
}}</ref>
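A minimal sketch of random search over the same kind of toy objective (the objective and ranges here are hypothetical, not from the cited paper) samples each hyperparameter independently for a fixed budget of trials; log-uniform sampling is commonly used for scale parameters such as learning rates:

```python
import random

def toy_loss(lr, reg):
    # Hypothetical objective with its minimum at lr=0.1, reg=0.01.
    return (lr - 0.1) ** 2 + (reg - 0.01) ** 2

random.seed(0)
trials = []
for _ in range(64):
    # Sample each hyperparameter independently from a log-uniform range.
    params = {
        "lr": 10 ** random.uniform(-3, 0),
        "reg": 10 ** random.uniform(-4, -1),
    }
    trials.append((toy_loss(**params), params))

best_loss, best_params = min(trials, key=lambda t: t[0])
```

Unlike the grid, the 64 trials here take 64 distinct values of *each* hyperparameter, which is why random search covers an influential dimension more finely for the same budget.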
 
=== Bayesian optimization ===
| year = 2013
| url = http://www.cs.ubc.ca/labs/beta/Projects/autoweka/papers/autoweka.pdf}}</ref> to obtain better results in fewer experiments than grid search and random search, due to the ability to reason about the quality of experiments before they are run.
 
=== Gradient-based optimization ===