[[Reinforcement learning]] (RL) can underpin a NAS search strategy. Barret Zoph and [[Quoc Viet Le]]<ref name="Zoph 2016" /> applied NAS with RL to the [[CIFAR-10]] dataset and found a network architecture that rivals the best manually designed architectures in accuracy, with an error rate of 3.65%, 0.09 percentage points better and 1.05x faster than a related hand-designed model. On the [[Treebank|Penn Treebank]] dataset, the same approach composed a recurrent cell that outperforms [[Long short-term memory|LSTM]], reaching a test set perplexity of 62.4, 3.6 perplexity better than the prior leading system. On the PTB character language modeling task it achieved 1.214 bits per character.<ref name="Zoph 2016">{{cite arXiv|last1=Zoph|first1=Barret|last2=Le|first2=Quoc V.|date=2016-11-04|title=Neural Architecture Search with Reinforcement Learning|eprint=1611.01578 |class=cs.LG}}</ref>
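In this setting, a controller proposes candidate architectures and is updated by policy gradient so that architectures with higher validation accuracy become more likely. The following is a minimal sketch of that REINFORCE loop; the toy search space, the per-layer softmax controller (standing in for the paper's RNN controller), and the stand-in reward function are illustrative assumptions rather than the published method.

<syntaxhighlight lang="python">
# Minimal sketch of an RL-based NAS loop (REINFORCE), under toy assumptions:
# the "reward" below stands in for training a child network and measuring
# its validation accuracy, which is what a real NAS controller optimises.
import numpy as np

rng = np.random.default_rng(0)
CHOICES = [1, 3, 5]        # candidate kernel sizes per layer (toy search space)
NUM_LAYERS = 3
logits = np.zeros((NUM_LAYERS, len(CHOICES)))  # per-layer softmax controller

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def reward(idx):
    # Placeholder: real NAS trains the sampled child network here and
    # returns its validation accuracy. This toy prefers larger kernels.
    return sum(CHOICES[j] for j in idx) / (max(CHOICES) * NUM_LAYERS)

baseline, lr = 0.0, 0.5
for step in range(200):
    probs = softmax(logits)
    idx = [rng.choice(len(CHOICES), p=p) for p in probs]  # sample a child
    r = reward(idx)
    baseline = 0.9 * baseline + 0.1 * r   # moving-average variance reduction
    for i, j in enumerate(idx):           # REINFORCE update on the logits
        grad = -probs[i]
        grad[j] += 1.0                    # gradient of log p(choice j)
        logits[i] += lr * (r - baseline) * grad

print("most likely kernel sizes:", [CHOICES[j] for j in logits.argmax(axis=1)])
</syntaxhighlight>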
Learning a model architecture directly on a large dataset can be a lengthy process. NASNet<ref name="Zoph 2017" /><ref>{{Cite news|url=https://research.googleblog.com/2017/11/automl-for-large-scale-image.html|title=AutoML for large scale image classification and object detection|last1=Zoph|first1=Barret|date=November 2, 2017|work=Research Blog|access-date=2018-02-20|last2=Vasudevan|first2=Vijay|language=en-US|last3=Shlens|first3=Jonathon|last4=Le|first4=Quoc V.}}</ref> addressed this issue by transferring a building block designed for a small dataset to a larger dataset. The design was constrained to use two types of [[Convolutional neural network|convolutional]] cells that serve two main functions when processing an input feature map: ''normal cells'', which return a feature map of the same extent (height and width), and ''reduction cells'', in which the returned feature map's height and width are reduced by a factor of two. For the reduction cell, the initial operation applied to the cell's inputs uses a stride of two, halving the height and width.
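A minimal sketch of the two cell types, assuming [[PyTorch]]; the real NASNet cells combine several searched operations, whereas each cell below is reduced to a single convolution for illustration.

<syntaxhighlight lang="python">
# Minimal sketch of the two NASNet cell types under simplifying assumptions:
# each cell is a single convolution rather than a searched combination of ops.
import torch
import torch.nn as nn

class NormalCell(nn.Module):
    """Returns a feature map with the same height and width as its input."""
    def __init__(self, channels):
        super().__init__()
        self.op = nn.Conv2d(channels, channels, kernel_size=3,
                            stride=1, padding=1)  # stride 1: extent preserved
    def forward(self, x):
        return torch.relu(self.op(x))

class ReductionCell(nn.Module):
    """Returns a feature map whose height and width are halved."""
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.op = nn.Conv2d(in_channels, out_channels, kernel_size=3,
                            stride=2, padding=1)  # stride 2: halves H and W
    def forward(self, x):
        return torch.relu(self.op(x))

x = torch.randn(1, 16, 32, 32)
print(NormalCell(16)(x).shape)         # torch.Size([1, 16, 32, 32])
print(ReductionCell(16, 32)(x).shape)  # torch.Size([1, 32, 16, 16])
</syntaxhighlight>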
In the so-called Efficient Neural Architecture Search (ENAS), a controller discovers architectures by learning to search for an optimal subgraph within a large graph. The controller is trained with [[Reinforcement learning|policy gradient]] to select a subgraph that maximizes the expected reward on the validation set. The model corresponding to the subgraph is trained to minimize a canonical [[cross entropy]] loss. Because multiple child models share parameters, ENAS requires fewer GPU-hours than other approaches and 1000-fold fewer than "standard" NAS. On CIFAR-10, the ENAS design achieved a test error of 2.89%, comparable to NASNet. On Penn Treebank, the ENAS design reached a test perplexity of 55.8.<ref>{{cite arXiv|last1=Pham|first1=Hieu|last2=Guan|first2=Melody Y.|last3=Zoph|first3=Barret|last4=Le|first4=Quoc V.|last5=Dean|first5=Jeff|date=2018-02-09|title=Efficient Neural Architecture Search via Parameter Sharing|eprint=1802.03268|class=cs.LG}}</ref>
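A minimal sketch of the weight-sharing idea, again assuming PyTorch; the operation pool, its size, and the random sampling in place of a learned controller are illustrative assumptions.

<syntaxhighlight lang="python">
# Minimal sketch of ENAS-style weight sharing under toy assumptions: all
# candidate operations live in one shared pool ("the large graph"), and every
# sampled child architecture reuses, rather than re-initialises, those weights.
import random
import torch
import torch.nn as nn

OPS = ["conv3x3", "conv5x5", "identity"]

class SharedPool(nn.Module):
    """One parameter set per candidate operation per layer; a child model
    is a path through this pool, i.e. a subgraph of the large graph."""
    def __init__(self, channels, num_layers):
        super().__init__()
        self.layers = nn.ModuleList()
        for _ in range(num_layers):
            self.layers.append(nn.ModuleDict({
                "conv3x3": nn.Conv2d(channels, channels, 3, padding=1),
                "conv5x5": nn.Conv2d(channels, channels, 5, padding=2),
                "identity": nn.Identity(),
            }))
    def forward(self, x, arch):
        # arch is a list of op names, one per layer: the sampled subgraph.
        for layer, op in zip(self.layers, arch):
            x = layer[op](x)
        return x

pool = SharedPool(channels=8, num_layers=4)
x = torch.randn(2, 8, 16, 16)

# Two different children sampled (here at random, not by a controller)
# share all parameters, so evaluating a new child needs no training
# from scratch.
child_a = [random.choice(OPS) for _ in range(4)]
child_b = [random.choice(OPS) for _ in range(4)]
print(child_a, pool(x, child_a).shape)
print(child_b, pool(x, child_b).shape)
</syntaxhighlight>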
== Evolution ==
An alternative approach to NAS is based on [[evolutionary algorithm]]s, which have been employed by several groups.<ref>{{cite arXiv|last1=Real|first1=Esteban|last2=Moore|first2=Sherry|last3=Selle|first3=Andrew|last4=Saxena|first4=Saurabh|last5=Suematsu|first5=Yutaka Leon|last6=Tan|first6=Jie|last7=Le|first7=Quoc|last8=Kurakin|first8=Alex|date=2017-03-03|title=Large-Scale Evolution of Image Classifiers|eprint=1703.01041|class=cs.NE}}</ref><ref>{{Cite arXiv|last1=Suganuma|first1=Masanori|last2=Shirakawa|first2=Shinichi|last3=Nagao|first3=Tomoharu|date=2017-04-03|title=A Genetic Programming Approach to Designing Convolutional Neural Network Architectures|class=cs.NE|eprint=1704.00764v2|language=en}}</ref><ref name=":0">{{Cite arXiv|last1=Liu|first1=Hanxiao|last2=Simonyan|first2=Karen|last3=Vinyals|first3=Oriol|last4=Fernando|first4=Chrisantha|last5=Kavukcuoglu|first5=Koray|date=2017-11-01|title=Hierarchical Representations for Efficient Architecture Search|class=cs.LG|eprint=1711.00436v2|language=en}}</ref><ref name="Real 2018">{{cite arXiv|last1=Real|first1=Esteban|last2=Aggarwal|first2=Alok|last3=Huang|first3=Yanping|last4=Le|first4=Quoc V.|date=2018-02-05|title=Regularized Evolution for Image Classifier Architecture Search|eprint=1802.01548|class=cs.NE}}</ref><ref>{{cite arXiv|last1=Miikkulainen|first1=Risto|last2=Liang|first2=Jason|last3=Meyerson|first3=Elliot|last4=Rawal|first4=Aditya|last5=Fink|first5=Dan|last6=Francon|first6=Olivier|last7=Raju|first7=Bala|last8=Shahrzad|first8=Hormoz|last9=Navruzyan|first9=Arshak|last10=Duffy|first10=Nigel|last11=Hodjat|first11=Babak|date=2017-03-04|title=Evolving Deep Neural Networks|class=cs.NE|eprint=1703.00548}}</ref><ref>{{Cite book|last1=Xie|first1=Lingxi|last2=Yuille|first2=Alan|title=2017 IEEE International Conference on Computer Vision (ICCV) |chapter=Genetic CNN |chapter-url=https://ieeexplore.ieee.org/document/8237416|year=2017|pages=1388–1397|doi=10.1109/ICCV.2017.154|arxiv=1703.01513|isbn=978-1-5386-1032-9|s2cid=206770867}}</ref><ref name="Elsken 2018" /> An evolutionary algorithm for neural architecture search generally performs the following procedure.<ref name="liu2021survey">{{cite journal|last1=Liu|first1=Yuqiao|last2=Sun|first2=Yanan|last3=Xue|first3=Bing|last4=Zhang|first4=Mengjie|last5=Yen|first5=Gary G|last6=Tan|first6=Kay Chen|title=A Survey on Evolutionary Neural Architecture Search|journal=IEEE Transactions on Neural Networks and Learning Systems|year=2021|volume=PP|issue=2 |pages=1–21|doi=10.1109/TNNLS.2021.3100554|pmid=34357870|arxiv=2008.10937|s2cid=221293236}}</ref> First, a pool of candidate architectures together with their validation scores (fitness) is initialised. At each step the architectures in the candidate pool are mutated (e.g., replacing a 5x5 convolution with a 3x3 convolution), the novel architectures are trained and evaluated on the validation set, and the lowest-scoring architectures in the pool are replaced by the better-performing new architectures, as sketched below.
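A minimal sketch of this loop, assuming a toy architecture encoding (a list of kernel sizes) and a stand-in fitness function; in practice each fitness evaluation means training the candidate network and measuring its validation score.

<syntaxhighlight lang="python">
# Minimal sketch of an evolutionary NAS loop under toy assumptions: the
# fitness function below stands in for training a network and measuring
# its validation accuracy.
import random

random.seed(0)
CHOICES = [1, 3, 5]   # candidate kernel sizes
ARCH_LEN = 4          # architecture = one kernel size per layer

def fitness(arch):
    # Placeholder for "train the network and measure validation accuracy".
    return sum(arch) / (max(CHOICES) * ARCH_LEN)

def mutate(arch):
    # Mutation step: swap one operation, e.g. a 5x5 conv for a 3x3 conv.
    child = list(arch)
    child[random.randrange(ARCH_LEN)] = random.choice(CHOICES)
    return child

# Initialise the candidate pool with random architectures and their fitness.
pool = [[random.choice(CHOICES) for _ in range(ARCH_LEN)] for _ in range(8)]
pool = [(arch, fitness(arch)) for arch in pool]

for step in range(50):
    parent, _ = random.choice(pool)       # pick a parent from the pool
    child = mutate(parent)                # mutate it into a new candidate
    pool.append((child, fitness(child)))  # evaluate the novel architecture
    pool.remove(min(pool, key=lambda p: p[1]))  # drop the lowest-scoring one

print("best architecture:", max(pool, key=lambda p: p[1]))
</syntaxhighlight>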
== Bayesian optimization ==
== Neural architecture search benchmarks ==
Neural architecture search often requires large computational resources, due to its expensive training and evaluation phases; this in turn entails a large carbon footprint. To overcome this limitation, NAS benchmarks<ref>Ying, C., Klein, A., Christiansen, E., Real, E., Murphy, K. and Hutter, F., 2019, May. NAS-Bench-101: [[arxiv:1902.09635|Towards reproducible neural architecture search]]. In ''International Conference on Machine Learning'' (pp. 7105–7114). PMLR.</ref><ref>Zela, A., Siems, J. and Hutter, F., 2020. NAS-Bench-1Shot1: Benchmarking and dissecting one-shot neural architecture search. ''arXiv preprint [[arXiv:2001.10422]]''.</ref><ref>Dong, X. and Yang, Y., 2020. NAS-Bench-201: Extending the scope of reproducible neural architecture search. ''arXiv preprint [[arXiv:2001.00326]]''.</ref><ref>Siems, J., Zimmer, L., Zela, A., Lukasik, J., Keuper, M. and Hutter, F., 2020. NAS-Bench-301 and the case for surrogate benchmarks for neural architecture search. ''arXiv preprint [[arXiv:2008.09777]]''.</ref> have been introduced, from which one can query or predict the final performance of a neural architecture in seconds. A NAS benchmark is defined as a dataset with a fixed train-test split, a search space, and a fixed training pipeline (hyperparameters). There are primarily two types of NAS benchmark: surrogate and tabular. A surrogate benchmark uses a surrogate model (e.g. a neural network) to predict the performance of an architecture from the search space, while a tabular benchmark looks up the actual performance of an architecture that was trained up to convergence.
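A minimal sketch of how a tabular benchmark is consumed, assuming a hypothetical lookup table; the encoding and the accuracy values are illustrative and do not come from any published benchmark.

<syntaxhighlight lang="python">
# Minimal sketch of querying a tabular NAS benchmark, under the assumption
# of a hypothetical lookup table; real benchmarks such as NAS-Bench-101 ship
# precomputed training results for every architecture in their search space.

# Hypothetical table: architecture encoding -> final test accuracy.
TABLE = {
    ("conv3x3", "conv3x3", "maxpool"): 0.941,
    ("conv3x3", "conv1x1", "maxpool"): 0.933,
    ("conv1x1", "conv1x1", "maxpool"): 0.917,
}

def query(arch):
    """Return the precomputed performance in seconds instead of GPU-days."""
    return TABLE[tuple(arch)]

best = max(TABLE, key=TABLE.get)
print("best cell:", best, "accuracy:", query(best))
</syntaxhighlight>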
==See also==