Neural architecture search

 
== Evolution ==
An alternative approach to NAS is based on [[Evolutionary algorithm|evolutionary algorithms]], which has been employed by several groups.<ref>{{cite arXiv|last1=Real|first1=Esteban|last2=Moore|first2=Sherry|last3=Selle|first3=Andrew|last4=Saxena|first4=Saurabh|last5=Suematsu|first5=Yutaka Leon|last6=Tan|first6=Jie|last7=Le|first7=Quoc|last8=Kurakin|first8=Alex|date=2017-03-03|title=Large-Scale Evolution of Image Classifiers|eprint=1703.01041|class=cs.NE}}</ref><ref>{{Cite arXiv|last=Suganuma|first=Masanori|last2=Shirakawa|first2=Shinichi|last3=Nagao|first3=Tomoharu|date=2017-04-03|title=A Genetic Programming Approach to Designing Convolutional Neural Network Architectures|arxiv=1704.00764v2|language=en}}</ref><ref name=":0">{{Cite arXiv|last=Liu|first=Hanxiao|last2=Simonyan|first2=Karen|last3=Vinyals|first3=Oriol|last4=Fernando|first4=Chrisantha|last5=Kavukcuoglu|first5=Koray|date=2017-11-01|title=Hierarchical Representations for Efficient Architecture Search|arxiv=1711.00436v2|language=en}}</ref><ref name="Real 2018">{{cite arXiv|last1=Real|first1=Esteban|last2=Aggarwal|first2=Alok|last3=Huang|first3=Yanping|last4=Le|first4=Quoc V.|date=2018-02-05|title=Regularized Evolution for Image Classifier Architecture Search|eprint=1802.01548|class=cs.NE}}</ref><ref>{{Cite journal|last=Miikkulainen|first=Risto|last2=Liang|first2=Jason|last3=Meyerson|first3=Elliot|last4=Rawal|first4=Aditya|last5=Fink|first5=Dan|last6=Francon|first6=Olivier|last7=Raju|first7=Bala|last8=Shahrzad|first8=Hormoz|last9=Navruzyan|first9=Arshak|last10=Duffy|first10=Nigel|last11=Hodjat|first11=Babak|date=2017-03-04|title=Evolving Deep Neural Networks|url=http://arxiv.org/abs/1703.00548|journal=arXiv:1703.00548 [cs]}}</ref><ref>{{Cite journal|last=Xie|first=Lingxi|last2=Yuille|first2=Alan|date=|title=Genetic CNN|url=https://ieeexplore.ieee.org/document/8237416|journal=2017 IEEE International Conference on Computer Vision (ICCV)|pages=1388–1397|doi=10.1109/ICCV.2017.154}}</ref><ref name="Elsken 2018" /> An evolutionary algorithm for neural architecture search generally performs the following procedure.<ref name="liu2021survey">{{cite arXiv|last1=Liu|first1=Yuqiao|last2=Sun|first2=Yanan|last3=Xue|first3=Bing|last4=Zhang|first4=Mengjie|last5=Yen|first5=Gary G|last6=Tan|first6=Kay Chen|date=2020-08-25|title=A Survey on Evolutionary Neural Architecture Search|eprint=2008.10937|class=cs.NE}}</ref> First, a pool of different candidate architectures is initialised together with their validation scores (fitness). At each step, the architectures in the candidate pool are mutated (e.g., using a 3x3 convolution instead of a 5x5 convolution). The new architectures are then trained from scratch for a few epochs and their validation scores are obtained. Next, the lowest-scoring architectures in the candidate pool are replaced with the better, newer architectures. This procedure is repeated many times, so the candidate pool is refined over time. Mutations in the context of evolving ANNs are operations such as adding or removing a layer, changing the type of a layer (e.g., from convolution to pooling), changing the hyperparameters of a layer, or changing the training hyperparameters. On [[CIFAR-10]] and [[ImageNet]], evolution and RL performed comparably, while both slightly outperformed [[random search]].<ref name="Real 2018" /><ref name=":0" />
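
The procedure above can be sketched in Python as follows. This is a minimal illustration, not the method of any cited paper: the search space (a tuple of per-layer kernel sizes), the <code>proxy_fitness</code> function that stands in for training a few epochs, and the uniform choice of the parent are all hypothetical simplifications.

```python
import random

# Hypothetical search space: an architecture is a tuple of per-layer kernel sizes.
KERNEL_CHOICES = (1, 3, 5, 7)
NUM_LAYERS = 4


def proxy_fitness(arch):
    """Stand-in for "train for a few epochs and read the validation score".

    This made-up score simply prefers 3x3 kernels; a real search would train `arch`.
    """
    return -sum(abs(k - 3) for k in arch)


def mutate(arch, rng):
    """Change the kernel size of one randomly chosen layer (e.g. 5x5 -> 3x3)."""
    layer = rng.randrange(len(arch))
    new_kernel = rng.choice([k for k in KERNEL_CHOICES if k != arch[layer]])
    return arch[:layer] + (new_kernel,) + arch[layer + 1:]


def evolve(pool_size=10, steps=200, seed=0):
    rng = random.Random(seed)
    # 1. Initialise a pool of random candidates together with their fitness.
    pool = []
    for _ in range(pool_size):
        arch = tuple(rng.choice(KERNEL_CHOICES) for _ in range(NUM_LAYERS))
        pool.append((proxy_fitness(arch), arch))
    for _ in range(steps):
        # 2. Mutate a candidate taken from the pool.
        _, parent = rng.choice(pool)
        child = mutate(parent, rng)
        # 3. Evaluate the new architecture.
        child_fitness = proxy_fitness(child)
        # 4. Replace the lowest-scoring member if the new architecture is better.
        worst = min(range(len(pool)), key=lambda i: pool[i][0])
        if child_fitness > pool[worst][0]:
            pool[worst] = (child_fitness, child)
    # Return the best (fitness, architecture) pair in the final pool.
    return max(pool)
```

Calling <code>evolve()</code> returns the best <code>(fitness, architecture)</code> pair. Because a member is only ever replaced by a better one, the best fitness in the pool never decreases. Other variants differ mainly in which member is removed; aging evolution, for example, removes the oldest member instead of the worst.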
 
== Bayesian Optimization ==
Line 38:
To reduce computational cost, many recent NAS methods rely on the weight-sharing idea.<ref>Pham, H., Guan, M.Y., Zoph, B., Le, Q.V., Dean, J.: [[arxiv:1802.03268|Efficient neural architecture search via parameter sharing]]. In: Proceedings of the 35th International Conference on Machine Learning (2018).</ref><ref>Li, L., Talwalkar, A.: [[arxiv:1902.07638|Random search and reproducibility for neural architecture search]]. In: Proceedings of the Conference on Uncertainty in Artificial Intelligence (2019).</ref> In this approach, a single overparameterized supernetwork (also known as the one-shot model) is defined. A supernetwork is a very large [[directed acyclic graph]] (DAG) whose subgraphs are different candidate neural networks. Thus, in a supernetwork, the weights are shared among a large number of different sub-architectures that have edges in common, each of which is considered a path within the supernet. The essential idea is to train one supernetwork that spans many options for the final design rather than generating and training thousands of networks independently. In addition to the network weights, a set of architecture parameters is learned to express a preference for one module over another. Such methods reduce the required computational resources to only a few GPU days.
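
Weight sharing can be illustrated with a small NumPy sketch. The setup is hypothetical: a four-node DAG, three made-up candidate operations per edge, and one weight matrix per (edge, operation) pair. A sub-architecture chooses one operation per edge, and every sub-architecture that makes the same choice on an edge reads the same matrix.

```python
import numpy as np

# Hypothetical supernet: a DAG on 4 nodes where every edge i -> j (i < j)
# offers the same three candidate operations.
NUM_NODES = 4
OPS = ("linear", "relu_linear", "zero")
DIM = 8


class SuperNet:
    def __init__(self, seed=0):
        rng = np.random.default_rng(seed)
        self.edges = [(i, j) for j in range(1, NUM_NODES) for i in range(j)]
        # One weight matrix per (edge, op). These are the only parameters, and they are
        # shared by every sub-architecture that selects that op on that edge.
        self.weights = {
            (edge, op): 0.1 * rng.standard_normal((DIM, DIM))
            for edge in self.edges
            for op in OPS
            if op != "zero"
        }

    def sample(self, rng):
        """A sub-architecture picks one op per edge, i.e. one path through the supernet."""
        return {edge: OPS[rng.integers(len(OPS))] for edge in self.edges}

    def forward(self, x, arch):
        """Run the sub-architecture `arch` using the shared weights."""
        nodes = [x]
        for j in range(1, NUM_NODES):
            total = np.zeros_like(x)
            for i in range(j):
                op = arch[(i, j)]
                if op == "zero":
                    continue  # the "zero" op drops this edge
                h = nodes[i] @ self.weights[((i, j), op)]
                total += np.maximum(h, 0.0) if op == "relu_linear" else h
            nodes.append(total)
        return nodes[-1]
```

In this toy supernet, 12 weight matrices serve 3**6 = 729 possible sub-architectures, which is why training the supernetwork once is far cheaper than training every candidate separately. Sampling sub-architectures uniformly, as <code>sample</code> does, is only one of several possible strategies.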
 
More recent works further combine this weight-sharing paradigm with a continuous relaxation of the search space,<ref>H. Cai, L. Zhu, and S. Han. [[arxiv:1812.00332|Proxylessnas: Direct neural architecture search on target task and hardware]]. ICLR, 2019.</ref><ref>X. Dong and Y. Yang. [[arxiv:1910.04465|Searching for a robust neural architecture in four gpu hours]]. In IEEE Conference on Computer Vision and Pattern Recognition. IEEE Computer Society, 2019.</ref><ref name="darts">H. Liu, K. Simonyan, and Y. Yang. [[arxiv:1806.09055|Darts: Differentiable architecture search]]. In ICLR, 2019</ref><ref>S. Xie, H. Zheng, C. Liu, and L. Lin. [[arxiv:1812.09926|Snas: stochastic neural architecture search]]. ICLR, 2019.</ref> which enables the use of gradient-based optimization methods. These approaches are generally referred to as differentiable NAS and have proven very efficient in exploring the search space of neural architectures. One of the most popular gradient-based NAS algorithms is DARTS.<ref name="darts" /> However, DARTS faces problems such as performance collapse due to an inevitable aggregation of skip connections, and poor generalization, which were tackled by many later algorithms.<ref>Chu, Xiangxiang and Zhou, Tianbao and Zhang, Bo and Li, Jixiang. [[arxiv:1911.12126|Fair darts: Eliminating unfair advantages in differentiable architecture search]]. In ECCV, 2020</ref><ref name="zela2020">Arber Zela, Thomas Elsken, Tonmoy Saikia, Yassine Marrakchi, Thomas Brox, Frank Hutter. [[arxiv:1909.09656|Understanding and Robustifying Differentiable Architecture Search]]. In ICLR, 2020</ref><ref name="chen2020">Xiangning Chen, Cho-Jui Hsieh. [[arxiv:2002.05283|Stabilizing Differentiable Architecture Search via Perturbation-based Regularization]]. In ICML, 2020</ref><ref>Yuhui Xu, Lingxi Xie, Xiaopeng Zhang, Xin Chen, Guo-Jun Qi, Qi Tian, Hongkai Xiong. [[arxiv:1907.05737|PC-DARTS: Partial Channel Connections for Memory-Efficient Architecture Search]]. In ICLR, 2020</ref> Methods such as those of Zela et al.<ref name="zela2020" /> and Chen and Hsieh<ref name="chen2020" /> aim to robustify DARTS and make the validation accuracy landscape smoother by introducing a Hessian norm based regularisation and random smoothing/adversarial attack, respectively. The cause of the performance degradation was later analyzed from the perspective of architecture selection.<ref>Ruochen Wang, Minhao Cheng, Xiangning Chen, Xiaocheng Tang, Cho-Jui Hsieh. [[arxiv:2108.04392|Rethinking Architecture Selection in Differentiable NAS]]. In ICLR, 2021</ref>
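
The continuous relaxation used by differentiable NAS can be illustrated on a single edge. In the sketch below, the parameter-free toy operations, the squared-error loss and plain gradient descent are simplifying assumptions, and the network-weight update of the full DARTS bilevel optimization is omitted. The edge output is a softmax-weighted sum of all candidate operations, so the loss is differentiable with respect to the architecture parameters <code>alpha</code>; after the search, the edge is discretized by keeping the operation with the largest weight.

```python
import numpy as np

# Hypothetical candidate operations on one edge; real cells use convolutions, pooling, etc.
CANDIDATE_OPS = {
    "identity": lambda x: x,  # skip connection
    "relu": lambda x: np.maximum(x, 0.0),
    "zero": lambda x: np.zeros_like(x),
}


def softmax(alpha):
    z = np.exp(alpha - np.max(alpha))
    return z / z.sum()


def mixed_op(x, alpha):
    """Continuous relaxation: a softmax(alpha)-weighted sum of all candidate ops."""
    weights = softmax(alpha)
    return sum(w * op(x) for w, op in zip(weights, CANDIDATE_OPS.values()))


def alpha_grad(x, alpha, target):
    """Gradient of L = 0.5 * ||mixed_op(x, alpha) - target||^2 with respect to alpha."""
    w = softmax(alpha)
    outs = np.stack([op(x) for op in CANDIDATE_OPS.values()])  # (num_ops, dim)
    m = w @ outs
    # Softmax Jacobian: d m / d alpha_k = w_k * (o_k - m).
    return w * ((outs - m) @ (m - target))


def search_alpha(x, target, steps=200, lr=1.0):
    """Plain gradient descent on the architecture parameters only."""
    alpha = np.zeros(len(CANDIDATE_OPS))
    for _ in range(steps):
        alpha = alpha - lr * alpha_grad(x, alpha, target)
    return alpha


def discretize(alpha):
    """After the search, keep the op with the largest architecture weight."""
    return list(CANDIDATE_OPS)[int(np.argmax(alpha))]


x = np.array([-1.0, 2.0, -3.0, 4.0])
target = np.maximum(x, 0.0)  # toy data produced by the "relu" op
alpha = search_alpha(x, target)
print(discretize(alpha))
```

Here the toy target is generated by the <code>relu</code> operation, so gradient descent on <code>alpha</code> shifts the softmax weight toward that operation, and discretization selects it.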
 
Differentiable NAS has been shown to produce competitive results using a fraction of the search time required by RL-based search methods. For example, FBNet (short for Facebook Berkeley Network) demonstrated that supernetwork-based search produces networks that outperform the speed-accuracy tradeoff curve of MnasNet and MobileNetV2 on the ImageNet image-classification dataset. FBNet accomplishes this using over 400x ''less'' search time than was used for MnasNet.<ref name="FBNet">{{cite arXiv|eprint=1812.03443|last1=Wu|first1=Bichen|title=FBNet: Hardware-Aware Efficient ConvNet Design via Differentiable Neural Architecture Search|last2=Dai|first2=Xiaoliang|last3=Zhang|first3=Peizhao|last4=Wang|first4=Yanghan|last5=Sun|first5=Fei|last6=Wu|first6=Yiming|last7=Tian|first7=Yuandong|last8=Vajda|first8=Peter|last9=Jia|first9=Yangqing|last10=Keutzer|first10=Kurt|class=cs.CV|date=24 May 2019}}</ref><ref name="MobileNetV2">{{cite arXiv|eprint=1801.04381|last1=Sandler|first1=Mark|title=MobileNetV2: Inverted Residuals and Linear Bottlenecks|last2=Howard|first2=Andrew|last3=Zhu|first3=Menglong|last4=Zhmoginov|first4=Andrey|last5=Chen|first5=Liang-Chieh|class=cs.CV|year=2018}}</ref><ref>{{Cite web|url=http://sites.ieee.org/scv-cas/files/2019/05/2019-05-22-ieee-co-design-trim.pdf|title=Co-Design of DNNs and NN Accelerators|last=Keutzer|first=Kurt|date=2019-05-22|website=IEEE|url-status=|archive-url=|archive-date=|access-date=2019-09-26}}</ref> Further, SqueezeNAS demonstrated that supernetwork-based NAS produces neural networks that outperform the speed-accuracy tradeoff curve of MobileNetV3 on the Cityscapes semantic segmentation dataset, while using over 100x less search time than the MobileNetV3 authors' RL-based search.<ref name="SqueezeNAS">{{cite arXiv|eprint=1908.01748|last1=Shaw|first1=Albert|title=SqueezeNAS: Fast neural architecture search for faster semantic segmentation|last2=Hunter|first2=Daniel|last3=Iandola|first3=Forrest|last4=Sidhu|first4=Sammy|class=cs.CV|year=2019}}</ref><ref>{{Cite news|url=https://www.eetimes.com/document.asp?doc_id=1335063|title=Does Your AI Chip Have Its Own DNN?|last=Yoshida|first=Junko|date=2019-08-25|work=EE Times|access-date=2019-09-26}}</ref>
 
==NAS Benchmarks==
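NAS research is often very computationally expensive, which makes it difficult to reproduce experiments and imposes a barrier to entry for researchers without access to large-scale computation.<ref name=":1">Ying, C., Klein, A., Christiansen, E., Real, E., Murphy, K., & Hutter, F. (2019, May). Nas-bench-101: Towards reproducible neural architecture search. In International Conference on Machine Learning (pp. 7105-7114). PMLR.</ref> Tabular or surrogate NAS benchmarks facilitate more efficient, effective and reproducible research on NAS. A tabular benchmark stores precomputed training and evaluation results for every architecture in a fixed search space, while a surrogate benchmark predicts these results with a model fitted to a sample of trained architectures; in both cases, evaluating a candidate becomes a cheap query instead of a full training run.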
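
The following sketch shows how a tabular benchmark is typically used. The search space, the operation names and every number in the table are hypothetical placeholders generated at random, not values from any real benchmark. Random search obtains each candidate's score by a dictionary lookup rather than by training, while still being able to report the compute that the search would have cost.

```python
import itertools
import random

# Hypothetical tiny search space: one of three ops on each of three edges (27 architectures).
OPS = ("conv3x3", "conv1x1", "maxpool3x3")
NUM_EDGES = 3


def build_fake_table(seed=0):
    """Stand-in for a tabular benchmark: every architecture maps to a precomputed record.

    The values are random placeholders, not real training results.
    """
    rng = random.Random(seed)
    table = {}
    for arch in itertools.product(OPS, repeat=NUM_EDGES):
        table[arch] = {
            "val_acc": rng.uniform(0.80, 0.95),
            "train_seconds": rng.uniform(1000.0, 3000.0),
        }
    return table


def random_search(table, budget, seed=0):
    """Random search that queries the table instead of training each candidate."""
    rng = random.Random(seed)
    archs = list(table)
    best_arch, best_acc, simulated_cost = None, float("-inf"), 0.0
    for _ in range(budget):
        arch = rng.choice(archs)
        record = table[arch]  # a lookup replaces a full training run
        simulated_cost += record["train_seconds"]  # compute the search would have used
        if record["val_acc"] > best_acc:
            best_arch, best_acc = arch, record["val_acc"]
    return best_arch, best_acc, simulated_cost
```

Because every query is deterministic and nearly free, different NAS methods can be compared on exactly the same data and rerun many times with different seeds.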
 
The following is a list of popular NAS benchmarks: