Neural architecture search: Difference between revisions

[[Reinforcement learning]] (RL) can underpin a NAS search strategy. Barret Zoph and [[Quoc Viet Le]]<ref name="Zoph 2016" /> applied NAS with RL to the [[CIFAR-10]] dataset and obtained a network architecture that rivals the best manually designed architectures in accuracy, with a test error rate of 3.65%, which is 0.09 percent better and 1.05x faster than a related hand-designed model. On the [[Treebank|Penn Treebank]] dataset, the method composed a recurrent cell that outperforms [[Long short-term memory|LSTM]], reaching a test set perplexity of 62.4, or 3.6 perplexity better than the prior leading system. On the PTB character language modeling task it achieved 1.214 bits per character.<ref name="Zoph 2016">{{cite arXiv|last1=Zoph|first1=Barret|last2=Le|first2=Quoc V.|date=2016-11-04|title=Neural Architecture Search with Reinforcement Learning|eprint=1611.01578 |class=cs.LG}}</ref>
 
Learning a model architecture directly on a large dataset can be a lengthy process. NASNet<ref name="Zoph 2017" /><ref>{{Cite news|url=https://research.googleblog.com/2017/11/automl-for-large-scale-image.html|title=AutoML for large scale image classification and object detection|last1=Zoph|first1=Barret|date=November 2, 2017|work=Research Blog|access-date=2018-02-20|last2=Vasudevan|first2=Vijay|language=en-US|last3=Shlens|first3=Jonathon|last4=Le|first4=Quoc V.}}</ref> addressed this issue by transferring a building block designed for a small dataset to a larger dataset. The design was constrained to use two types of [[Convolutional neural network|convolutional]] cells that return feature maps and serve two main functions when convolving an input feature map: ''normal cells'' that return maps of the same extent (height and width) and ''reduction cells'' in which the returned feature map height and width are reduced by a factor of two. For the reduction cell, the initial operation applied to the cell's inputs uses a stride of two (to reduce the height and width).<ref name="Zoph 2017" /> The learned aspect of the design included elements such as which lower layer(s) each higher layer took as input, the transformations applied at that layer, and how multiple outputs are merged at each layer. In the studied example, the best convolutional layer (or "cell") was designed for the CIFAR-10 dataset and then applied to the [[ImageNet]] dataset by stacking copies of this cell, each with its own parameters. The approach yielded accuracy of 82.7% top-1 and 96.2% top-5. This exceeded the best human-invented architectures at a cost of 9 billion fewer [[FLOPS]], a reduction of 28%. The system continued to exceed the manually-designed alternative at varying computation levels. The image features learned from image classification can be transferred to other computer vision problems. For example, for object detection, the learned cells integrated with the Faster-RCNN framework improved performance by 4.0% on the [[COCO (dataset)|COCO]] dataset.<ref name="Zoph 2017" />
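
The effect of stacking normal and reduction cells on feature-map extent can be traced with a few lines of Python. This is only an illustrative sketch: the function name, the cell ordering, and the 32x32 input are assumptions made for the example, and the halving rule assumes "same"-style padding so that a stride of two rounds odd extents up.

```python
def feature_map_extents(height, width, cell_sequence):
    """Trace (height, width) through a stack of 'normal' and 'reduction' cells."""
    extents = [(height, width)]
    for cell in cell_sequence:
        if cell == "normal":
            # Normal cells return maps of the same extent.
            pass
        elif cell == "reduction":
            # Stride-two first operation: extent is halved (rounded up, assumed padding).
            height, width = (height + 1) // 2, (width + 1) // 2
        else:
            raise ValueError(f"unknown cell type: {cell}")
        extents.append((height, width))
    return extents


# Hypothetical stack: two normal cells between each reduction cell.
stack = ["normal", "normal", "reduction", "normal", "normal", "reduction", "normal", "normal"]
trace = feature_map_extents(32, 32, stack)
```

With this assumed stack, a 32x32 input ends at 8x8 after the two reduction cells, while every normal cell leaves the extent unchanged.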
 
In the so-called Efficient Neural Architecture Search (ENAS), a controller discovers architectures by learning to search for an optimal subgraph within a large graph. The controller is trained with [[Reinforcement learning|policy gradient]] to select a subgraph that maximizes the expected reward on the validation set. The model corresponding to the subgraph is trained to minimize a canonical [[cross entropy]] loss. Because multiple child models share parameters, ENAS requires fewer GPU-hours than other approaches and 1000-fold fewer than "standard" NAS. On CIFAR-10, the ENAS design achieved a test error of 2.89%, comparable to NASNet. On Penn Treebank, the ENAS design reached test perplexity of 55.8.<ref>{{cite arXiv|last1=Pham|first1=Hieu|last2=Guan|first2=Melody Y.|last3=Zoph|first3=Barret|last4=Le|first4=Quoc V.|last5=Dean|first5=Jeff|date=2018-02-09|title=Efficient Neural Architecture Search via Parameter Sharing|eprint=1802.03268|class=cs.LG}}</ref>
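
The policy-gradient idea behind such a controller can be illustrated with a very small sketch. The code below is not ENAS itself: it omits the recurrent controller and the parameter sharing between child models, and reduces the search to a single categorical choice among candidate operations. The function names, the moving-average baseline, and the fixed reward values (standing in for validation accuracy) are all assumptions made for illustration.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_controller(rewards, steps=2000, lr=0.1, baseline_decay=0.9, seed=0):
    """REINFORCE on a one-decision 'controller' with one logit per candidate.

    rewards[i] is a made-up validation reward for choosing candidate i.
    """
    rng = random.Random(seed)
    logits = [0.0] * len(rewards)
    baseline = 0.0  # moving average of past rewards, used to reduce variance
    for _ in range(steps):
        probs = softmax(logits)
        action = rng.choices(range(len(rewards)), weights=probs)[0]
        reward = rewards[action]
        advantage = reward - baseline
        # d/d(logit_i) of log pi(action) is 1[i == action] - probs[i].
        for i in range(len(logits)):
            grad_log_prob = (1.0 if i == action else 0.0) - probs[i]
            logits[i] += lr * advantage * grad_log_prob
        baseline = baseline_decay * baseline + (1.0 - baseline_decay) * reward
    return softmax(logits)


# Hypothetical rewards for three candidate operations.
final_probs = train_controller([0.1, 0.5, 0.9])
```

After training, the sampling distribution should shift toward the candidate with the highest reward, which is the behaviour the controller relies on when selecting subgraphs.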
 
== Evolution ==
An alternative approach to NAS is based on [[evolutionary algorithm]]s, which has been employed by several groups.<ref>{{cite arXiv|last1=Real|first1=Esteban|last2=Moore|first2=Sherry|last3=Selle|first3=Andrew|last4=Saxena|first4=Saurabh|last5=Suematsu|first5=Yutaka Leon|last6=Tan|first6=Jie|last7=Le|first7=Quoc|last8=Kurakin|first8=Alex|date=2017-03-03|title=Large-Scale Evolution of Image Classifiers|eprint=1703.01041|class=cs.NE}}</ref><ref>{{Cite arXiv|last1=Suganuma|first1=Masanori|last2=Shirakawa|first2=Shinichi|last3=Nagao|first3=Tomoharu|date=2017-04-03|title=A Genetic Programming Approach to Designing Convolutional Neural Network Architectures|class=cs.NE|eprint=1704.00764v2|language=en}}</ref><ref name=":0">{{Cite arXiv|last1=Liu|first1=Hanxiao|last2=Simonyan|first2=Karen|last3=Vinyals|first3=Oriol|last4=Fernando|first4=Chrisantha|last5=Kavukcuoglu|first5=Koray|date=2017-11-01|title=Hierarchical Representations for Efficient Architecture Search|class=cs.LG|eprint=1711.00436v2|language=en}}</ref><ref name="Real 2018">{{cite arXiv|last1=Real|first1=Esteban|last2=Aggarwal|first2=Alok|last3=Huang|first3=Yanping|last4=Le|first4=Quoc V.|date=2018-02-05|title=Regularized Evolution for Image Classifier Architecture Search|eprint=1802.01548|class=cs.NE}}</ref><ref>{{cite arXiv|last1=Miikkulainen|first1=Risto|last2=Liang|first2=Jason|last3=Meyerson|first3=Elliot|last4=Rawal|first4=Aditya|last5=Fink|first5=Dan|last6=Francon|first6=Olivier|last7=Raju|first7=Bala|last8=Shahrzad|first8=Hormoz|last9=Navruzyan|first9=Arshak|last10=Duffy|first10=Nigel|last11=Hodjat|first11=Babak|date=2017-03-04|title=Evolving Deep Neural Networks|class=cs.NE|eprint=1703.00548}}</ref><ref>{{Cite book|last1=Xie|first1=Lingxi|last2=Yuille|first2=Alan|title=2017 IEEE International Conference on Computer Vision (ICCV) |chapter=Genetic CNN 
|chapter-url=https://ieeexplore.ieee.org/document/8237416|year=2017|pages=1388–1397|doi=10.1109/ICCV.2017.154|arxiv=1703.01513|isbn=978-1-5386-1032-9|s2cid=206770867}}</ref><ref name="Elsken 2018" /> An evolutionary algorithm for neural architecture search generally performs the following procedure.<ref name="liu2021survey">{{cite journal|last1=Liu|first1=Yuqiao|last2=Sun|first2=Yanan|last3=Xue|first3=Bing|last4=Zhang|first4=Mengjie|last5=Yen|first5=Gary G|last6=Tan|first6=Kay Chen|title=A Survey on Evolutionary Neural Architecture Search|journal=IEEE Transactions on Neural Networks and Learning Systems|year=2021|volume=PP|issue=2 |pages=1–21|doi=10.1109/TNNLS.2021.3100554|pmid=34357870|arxiv=2008.10937|s2cid=221293236}}</ref> First, a pool consisting of different candidate architectures along with their validation scores (fitness) is initialised. At each step, the architectures in the candidate pool are mutated (e.g., a 3x3 convolution instead of a 5x5 convolution). Next, the new architectures are trained from scratch for a few epochs and their validation scores are obtained. The lowest-scoring architectures in the candidate pool are then replaced with the better, newer architectures. This procedure is repeated multiple times, and thus the candidate pool is refined over time. Mutations in the context of evolving ANNs are operations such as adding or removing a layer, changing the type of a layer (e.g., from convolution to pooling), changing the hyperparameters of a layer, or changing the training hyperparameters. On [[CIFAR-10]] and [[ImageNet]], evolution and RL performed comparably, while both slightly outperformed [[random search]].<ref name="Real 2018" /><ref name=":0" />
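
The procedure described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not any published system: the layer encoding, the mutation set, and especially <code>toy_fitness</code> (which stands in for "train for a few epochs and measure the validation score") are hypothetical, and replacement is done by merging parents and children and keeping the top scorers.

```python
import random

LAYER_TYPES = ["conv", "pool"]   # hypothetical layer types
KERNEL_SIZES = [1, 3, 5]         # hypothetical layer hyperparameter


def random_layer(rng):
    return (rng.choice(LAYER_TYPES), rng.choice(KERNEL_SIZES))


def random_architecture(rng, depth=4):
    return [random_layer(rng) for _ in range(depth)]


def mutate(arch, rng):
    """Return a mutated copy: add/remove a layer, or change its type or kernel."""
    child = list(arch)
    op = rng.choice(["add", "remove", "change_type", "change_kernel"])
    if op == "add":
        child.insert(rng.randrange(len(child) + 1), random_layer(rng))
    elif op == "remove" and len(child) > 1:
        del child[rng.randrange(len(child))]
    elif op == "change_type":
        i = rng.randrange(len(child))
        kind, k = child[i]
        child[i] = ("pool" if kind == "conv" else "conv", k)
    else:
        # Also reached when "remove" is drawn for a single-layer architecture.
        i = rng.randrange(len(child))
        kind, k = child[i]
        child[i] = (kind, rng.choice([s for s in KERNEL_SIZES if s != k]))
    return child


def toy_fitness(arch):
    """Placeholder score; a real search would train and validate the network."""
    return sum(1.0 for kind, k in arch if kind == "conv" and k == 3) - 0.1 * len(arch)


def evolve(pool_size=8, steps=50, seed=0):
    rng = random.Random(seed)
    # Initialise the pool with (fitness, architecture) pairs.
    pool = [(toy_fitness(a), a) for a in (random_architecture(rng) for _ in range(pool_size))]
    for _ in range(steps):
        children = [mutate(a, rng) for _, a in pool]
        scored = [(toy_fitness(c), c) for c in children]
        # Keep the best pool_size candidates, so low scorers are replaced
        # only by better, newer architectures.
        pool = sorted(pool + scored, key=lambda p: p[0], reverse=True)[:pool_size]
    return pool


best_score, best_arch = evolve()[0]
```

Because the pool keeps the top scorers from the union of parents and children, the best fitness in the pool never decreases from one step to the next in this sketch.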
 
== Bayesian optimization ==
 
== Neural architecture search benchmarks ==
Neural architecture search often requires large computational resources, due to its expensive training and evaluation phases. This in turn leads to a large carbon footprint for the evaluation of these methods. To overcome this limitation, NAS benchmarks<ref>Ying, C., Klein, A., Christiansen, E., Real, E., Murphy, K. and Hutter, F., 2019, May. Nas-bench-101: [[arxiv:1902.09635|Towards reproducible neural architecture search]]. In ''International Conference on Machine Learning'' (pp. 7105-7114). PMLR.</ref><ref>Zela, A., Siems, J. and Hutter, F., 2020. Nas-bench-1shot1: Benchmarking and dissecting one-shot neural architecture search. ''arXiv preprint [[arXiv:2001.10422]]''.</ref><ref>Dong, X. and Yang, Y., 2020. Nas-bench-201: Extending the scope of reproducible neural architecture search. ''arXiv preprint [[arXiv:2001.00326]]''.</ref><ref>Siems, J., Zimmer, L., Zela, A., Lukasik, J., Keuper, M. and Hutter, F., 2020. Nas-bench-301 and the case for surrogate benchmarks for neural architecture search. ''arXiv preprint [[arXiv:2008.09777]]''.</ref> have been introduced, from which one can either query or predict the final performance of neural architectures in seconds. A NAS benchmark is defined as a dataset with a fixed train-test split, a search space, and a fixed training pipeline (hyperparameters). There are primarily two types of NAS benchmarks: a surrogate NAS benchmark and a tabular NAS benchmark. A surrogate benchmark uses a surrogate model (e.g., a neural network) to predict the performance of an architecture from the search space. On the other hand, a tabular benchmark queries the actual performance of an architecture trained up to convergence. Both of these benchmarks are queryable and can be used to efficiently simulate many NAS algorithms using only a CPU to query the benchmark instead of training an architecture from scratch.
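
The difference between the two benchmark types, and how a search algorithm can be simulated by querying one, can be sketched as below. The class names, the architecture encoding, and all numeric values are made up for illustration; they do not reproduce the interface or the contents of any of the benchmarks cited above, and the additive "surrogate model" is only a stand-in for a fitted predictor such as a neural network.

```python
import random
from typing import Dict, Sequence, Tuple

Arch = Tuple[str, ...]  # hypothetical encoding: a tuple of operation names


class TabularBenchmark:
    """Looks up the stored result of an architecture that was actually trained."""

    def __init__(self, table: Dict[Arch, float]) -> None:
        self._table = dict(table)

    def query(self, arch: Arch) -> float:
        return self._table[arch]  # KeyError if the architecture is not in the table


class SurrogateBenchmark:
    """Predicts performance with a model instead of a lookup (toy additive model)."""

    def __init__(self, op_scores: Dict[str, float], bias: float) -> None:
        self._op_scores = dict(op_scores)
        self._bias = bias

    def query(self, arch: Arch) -> float:
        return self._bias + sum(self._op_scores.get(op, 0.0) for op in arch)


def random_search(benchmark, candidates: Sequence[Arch], budget: int, seed: int = 0) -> Arch:
    """Simulate random search: sample `budget` architectures and query each one."""
    rng = random.Random(seed)
    sampled = rng.sample(list(candidates), budget)
    return max(sampled, key=benchmark.query)


# Illustrative, made-up accuracies.
TABLE = {
    ("conv3x3", "conv3x3"): 0.93,
    ("conv3x3", "maxpool"): 0.90,
    ("maxpool", "maxpool"): 0.71,
}
tabular = TabularBenchmark(TABLE)
surrogate = SurrogateBenchmark({"conv3x3": 0.10, "conv5x5": 0.09, "maxpool": 0.02}, bias=0.70)

best = random_search(tabular, list(TABLE), budget=3)
unseen_prediction = surrogate.query(("conv5x5", "conv3x3"))
```

The tabular benchmark can only answer for architectures in its table, whereas the surrogate returns a prediction for any architecture in the search space, including ones that were never trained. In both cases the search loop runs on a CPU without training anything.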
 
==See also==