=== Early stopping-based ===
[[File:Successive-halving-for-eight-arbitrary-hyperparameter-configurations.png|thumb|Successive halving for eight arbitrary hyperparameter configurations. The approach starts with eight models with different configurations and repeatedly halves the pool of surviving models until only one remains.]]
A class of early stopping-based hyperparameter optimization algorithms is purpose-built for large search spaces of continuous and discrete hyperparameters, particularly when the computational cost of evaluating a set of hyperparameters is high. The irace package implements the iterated racing algorithm, which focuses the search around the most promising configurations, using statistical tests to discard those that perform poorly.<ref name="irace">{{cite journal |last1=López-Ibáñez |first1=Manuel |last2=Dubois-Lacoste |first2=Jérémie |last3=Pérez Cáceres |first3=Leslie |last4=Stützle |first4=Thomas |last5=Birattari |first5=Mauro |date=2016 |title=The irace package: Iterated Racing for Automatic Algorithm Configuration |journal=Operations Research Perspectives |volume=3 |pages=43–58 |doi=10.1016/j.orp.2016.09.002 |doi-access=free |hdl=10419/178265 |hdl-access=free }}</ref><ref name="race">{{cite journal |last1=Birattari |first1=Mauro |last2=Stützle |first2=Thomas |last3=Paquete |first3=Luis |last4=Varrentrapp |first4=Klaus |date=2002 |title=A Racing Algorithm for Configuring Metaheuristics |journal=GECCO 2002 |pages=11–18}}</ref>
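The racing idea can be sketched in a few lines of Python. The following is only an illustrative sketch, not the irace implementation: a paired ''t''-test stands in for irace's statistical elimination (irace uses the Friedman test by default), and <code>evaluate</code> is a hypothetical user-supplied function returning the cost of a configuration on one problem instance.

<syntaxhighlight lang="python">
import numpy as np
from scipy import stats

def race(configs, evaluate, instances, alpha=0.05, min_results=5):
    """Illustrative racing sketch: evaluate surviving configurations on
    one problem instance at a time and discard any configuration that a
    paired t-test finds significantly worse than the incumbent best."""
    alive = list(range(len(configs)))
    costs = [[] for _ in configs]  # cost history per configuration
    for inst in instances:
        for i in alive:
            costs[i].append(evaluate(configs[i], inst))
        if len(costs[alive[0]]) < min_results:
            continue  # too few results for a meaningful statistical test
        best = min(alive, key=lambda i: np.mean(costs[i]))
        survivors = [best]
        for i in alive:
            if i == best:
                continue
            # Paired t-test over the shared instances: drop i only if it
            # is significantly worse than the current best.
            _, p = stats.ttest_rel(costs[i], costs[best])
            if p < alpha and np.mean(costs[i]) > np.mean(costs[best]):
                continue
            survivors.append(i)
        alive = survivors
    return configs[min(alive, key=lambda i: np.mean(costs[i]))]
</syntaxhighlight>

Because all survivors are evaluated on the same instances, the test is paired, which is what lets poor configurations be discarded early instead of being run on every instance.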
Another early stopping-based hyperparameter optimization algorithm is successive halving (SHA),<ref>{{cite arXiv|last1=Jamieson|first1=Kevin|last2=Talwalkar|first2=Ameet|date=2015-02-27|title=Non-stochastic Best Arm Identification and Hyperparameter Optimization|eprint=1502.07943|class=cs.LG}}</ref> which begins as a random search but periodically prunes low-performing models, thereby focusing computational resources on more promising models. Asynchronous successive halving (ASHA)<ref>{{cite arXiv|last1=Li|first1=Liam|last2=Jamieson|first2=Kevin|last3=Rostamizadeh|first3=Afshin|last4=Gonina|first4=Ekaterina|last5=Hardt|first5=Moritz|last6=Recht|first6=Benjamin|last7=Talwalkar|first7=Ameet|date=2020-03-16|title=A System for Massively Parallel Hyperparameter Tuning|class=cs.LG|eprint=1810.05934v5}}</ref> further improves upon SHA's resource utilization by removing the need to synchronously evaluate and prune low-performing models. Hyperband<ref>{{cite journal|last1=Li|first1=Lisha|last2=Jamieson|first2=Kevin|last3=DeSalvo|first3=Giulia|last4=Rostamizadeh|first4=Afshin|last5=Talwalkar|first5=Ameet|date=2018|title=Hyperband: A Novel Bandit-Based Approach to Hyperparameter Optimization|journal=Journal of Machine Learning Research|volume=18|pages=1–52|arxiv=1603.06560}}</ref> is a higher-level early stopping-based algorithm that invokes SHA or ASHA multiple times with varying levels of pruning aggressiveness, making it more widely applicable while requiring fewer user-specified inputs.
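The relationship between SHA and Hyperband can be illustrated with a short Python sketch. The callbacks are hypothetical: <code>evaluate(config, budget)</code> is assumed to return a validation loss (lower is better) after training with the given resource budget, and <code>sample_config</code> is assumed to draw one random configuration; the bracket schedule only approximates the one in the Hyperband paper.

<syntaxhighlight lang="python">
import math

def successive_halving(configs, evaluate, min_budget=1, eta=2):
    """Sketch of SHA: train all surviving configurations at the current
    budget, keep the best 1/eta of them, and give the survivors eta
    times more budget, until a single configuration remains."""
    budget = min_budget
    while len(configs) > 1:
        scored = sorted(configs, key=lambda c: evaluate(c, budget))
        configs = scored[: max(1, len(configs) // eta)]
        budget *= eta  # survivors earn more training resources
    return configs[0]

def hyperband(sample_config, evaluate, max_budget, eta=3):
    """Sketch of Hyperband: run several SHA brackets that trade off the
    number of random configurations against the starting budget per
    configuration, from aggressive (many configs, tiny budgets) to
    conservative (few configs, large budgets)."""
    s_max = int(math.log(max_budget, eta))
    best, best_loss = None, float("inf")
    for s in range(s_max, -1, -1):
        n = math.ceil((s_max + 1) * eta ** s / (s + 1))
        configs = [sample_config() for _ in range(n)]
        winner = successive_halving(
            configs, evaluate, min_budget=max_budget / eta ** s, eta=eta
        )
        loss = evaluate(winner, max_budget)
        if loss < best_loss:
            best, best_loss = winner, loss
    return best
</syntaxhighlight>

Running several brackets is what removes SHA's main required input: a single SHA run must commit in advance to one trade-off between the number of configurations and the budget each receives, whereas Hyperband hedges across a range of such trade-offs.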