Neural architecture search

RL-based or evolution-based NAS requires thousands of GPU-days of search/training to achieve state-of-the-art computer-vision results, as described in the NASNet, MnasNet, and MobileNetV3 papers.<ref name="Zoph 2017" /><ref name="mNASNet2">{{cite arXiv|eprint=1807.11626|last1=Tan|first1=Mingxing|title=MnasNet: Platform-Aware Neural Architecture Search for Mobile|last2=Chen|first2=Bo|last3=Pang|first3=Ruoming|last4=Vasudevan|first4=Vijay|last5=Sandler|first5=Mark|last6=Howard|first6=Andrew|last7=Le|first7=Quoc V.|class=cs.CV|year=2018}}</ref><ref name="MobileNetV3">{{cite arXiv|date=2019-05-06|title=Searching for MobileNetV3|eprint=1905.02244|class=cs.CV|last1=Howard|first1=Andrew|last2=Sandler|first2=Mark|last3=Chu|first3=Grace|last4=Chen|first4=Liang-Chieh|last5=Chen|first5=Bo|last6=Tan|first6=Mingxing|last7=Wang|first7=Weijun|last8=Zhu|first8=Yukun|last9=Pang|first9=Ruoming|last10=Vasudevan|first10=Vijay|last11=Le|first11=Quoc V.|last12=Adam|first12=Hartwig}}</ref>
 
To reduce computational cost, many recent NAS methods rely on the weight-sharing idea.<ref>{{cite arXiv |eprint=1802.03268 |last1=Pham |first1=Hieu |last2=Guan |first2=Melody Y. |last3=Zoph |first3=Barret |last4=Le |first4=Quoc V. |last5=Dean |first5=Jeff |title=Efficient Neural Architecture Search via Parameter Sharing |date=2018 |class=cs.LG }}</ref><ref>{{cite arXiv |eprint=1902.07638 |last1=Li |first1=Liam |last2=Talwalkar |first2=Ameet |title=Random Search and Reproducibility for Neural Architecture Search |date=2019 |class=cs.LG }}</ref> In this approach, a single over-parameterized supernetwork (also known as the one-shot model) is defined. A supernetwork is a very large [[Directed acyclic graph|directed acyclic graph]] (DAG) whose subgraphs are different candidate neural networks. The weights are thus shared among a large number of sub-architectures that have edges in common, each of which is treated as a path within the supernet. The essential idea is to train a single supernetwork that spans many options for the final design, rather than generating and training thousands of networks independently. In addition to the learned network weights, a set of architecture parameters is learned to express the preference for one module over another. Such methods reduce the required computational resources to only a few GPU-days.
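The weight-sharing idea can be illustrated with a minimal NumPy sketch (this is an illustration only, not code from any cited paper; the operation names and sizes are hypothetical). Every candidate operation's weights are stored once, and each sub-architecture is just a path that reuses them:

```python
import numpy as np

rng = np.random.default_rng(0)

# Shared weights for the candidate operations on a supernet edge.
# Every sub-architecture reuses these same arrays instead of owning copies.
shared_ops = {
    "conv3x3": rng.normal(size=(4, 4)),
    "conv5x5": rng.normal(size=(4, 4)),
    "identity": np.eye(4),
}

def run_subnetwork(x, chosen_ops):
    """Evaluate one candidate architecture (a path through the supernet)."""
    for name in chosen_ops:
        x = np.tanh(shared_ops[name] @ x)  # weights are shared, not copied
    return x

x = rng.normal(size=4)
# Two different sub-architectures evaluated with the same shared parameters.
out_a = run_subnetwork(x, ["conv3x3", "identity"])
out_b = run_subnetwork(x, ["conv5x5", "conv3x3"])
```

Training updates the shared arrays regardless of which path sampled them, which is why thousands of candidates can be assessed for the cost of training one network.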
 
More recent works further combine this weight-sharing paradigm with a continuous relaxation of the search space,<ref>{{cite arXiv |eprint=1812.00332 |last1=Cai |first1=Han |last2=Zhu |first2=Ligeng |last3=Han |first3=Song |title=ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware |date=2018 |class=cs.LG }}</ref><ref>{{cite arXiv |eprint=1910.04465 |last1=Dong |first1=Xuanyi |last2=Yang |first2=Yi |title=Searching for a Robust Neural Architecture in Four GPU Hours |date=2019 |class=cs.CV }}</ref><ref name="H. Liu, K. Simonyan 1806">{{cite arXiv |eprint=1806.09055 |last1=Liu |first1=Hanxiao |last2=Simonyan |first2=Karen |last3=Yang |first3=Yiming |title=DARTS: Differentiable Architecture Search |date=2018 |class=cs.LG }}</ref><ref>{{cite arXiv |eprint=1812.09926 |last1=Xie |first1=Sirui |last2=Zheng |first2=Hehui |last3=Liu |first3=Chunxiao |last4=Lin |first4=Liang |title=SNAS: Stochastic Neural Architecture Search |date=2018 |class=cs.LG }}</ref> which enables the use of gradient-based optimization methods. These approaches are generally referred to as differentiable NAS and have proven very efficient at exploring the search space of neural architectures. One of the most popular gradient-based NAS algorithms is DARTS.<ref name="H. Liu, K. Simonyan 1806"/> However, DARTS faces problems such as performance collapse, caused by an inevitable aggregation of skip connections, and poor generalization, which were tackled by many subsequent algorithms.<ref>{{cite arXiv |eprint=1911.12126 |last1=Chu |first1=Xiangxiang |last2=Zhou |first2=Tianbao |last3=Zhang |first3=Bo |last4=Li |first4=Jixiang |title=Fair DARTS: Eliminating Unfair Advantages in Differentiable Architecture Search |date=2019 |class=cs.LG }}</ref><ref name="Arber Zela 1909">{{cite arXiv |eprint=1909.09656 |last1=Zela |first1=Arber |last2=Elsken |first2=Thomas |last3=Saikia |first3=Tonmoy |last4=Marrakchi |first4=Yassine |last5=Brox |first5=Thomas |last6=Hutter |first6=Frank |title=Understanding and Robustifying Differentiable Architecture Search |date=2019 |class=cs.LG }}</ref><ref name="Xiangning Chen 2002">{{cite arXiv |eprint=2002.05283 |last1=Chen |first1=Xiangning |last2=Hsieh |first2=Cho-Jui |title=Stabilizing Differentiable Architecture Search via Perturbation-based Regularization |date=2020 |class=cs.LG }}</ref><ref>{{cite arXiv |eprint=1907.05737 |last1=Xu |first1=Yuhui |last2=Xie |first2=Lingxi |last3=Zhang |first3=Xiaopeng |last4=Chen |first4=Xin |last5=Qi |first5=Guo-Jun |last6=Tian |first6=Qi |last7=Xiong |first7=Hongkai |title=PC-DARTS: Partial Channel Connections for Memory-Efficient Architecture Search |date=2019 |class=cs.CV }}</ref> Some of these methods<ref name="Arber Zela 1909"/><ref name="Xiangning Chen 2002"/> aim to robustify DARTS and smooth the validation-accuracy landscape by introducing a Hessian-norm-based regularization and random smoothing/adversarial attacks, respectively.
The cause of the performance degradation was later analyzed from the architecture-selection perspective.<ref>{{cite arXiv |eprint=2108.04392 |last1=Wang |first1=Ruochen |last2=Cheng |first2=Minhao |last3=Chen |first3=Xiangning |last4=Tang |first4=Xiaocheng |last5=Hsieh |first5=Cho-Jui |title=Rethinking Architecture Selection in Differentiable NAS |date=2021 |class=cs.LG }}</ref>
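The continuous relaxation behind differentiable NAS can be sketched as a softmax-weighted "mixed operation" on each supernet edge (a toy NumPy illustration of the general idea, not the DARTS reference implementation; the candidate operations and values are made up):

```python
import numpy as np

def softmax(a):
    e = np.exp(a - a.max())
    return e / e.sum()

# Hypothetical candidate operations on one edge of the supernet.
ops = [
    lambda x: np.maximum(x, 0.0),  # ReLU-like op
    lambda x: x,                   # identity / skip connection
    lambda x: 0.5 * x,             # scaled linear op
]

# Learnable architecture parameters, one per candidate op.
alpha = np.array([0.2, 1.5, -0.3])

def mixed_op(x, alpha):
    """Continuous relaxation: softmax-weighted sum of all candidate ops,
    so the choice of operation becomes differentiable in alpha."""
    w = softmax(alpha)
    return sum(wi * op(x) for wi, op in zip(w, ops))

x = np.array([-1.0, 2.0])
y = mixed_op(x, alpha)

# After search, each edge is discretized to the op with the largest alpha.
best = int(np.argmax(alpha))  # index 1: the skip connection
```

Because the output depends smoothly on `alpha`, architecture parameters and network weights can both be updated by gradient descent; the final discretization step (`argmax`) is exactly the architecture-selection stage that the robustness analyses above scrutinize.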
 
Differentiable NAS has been shown to produce competitive results using a fraction of the search time required by RL-based search methods. For example, FBNet (short for Facebook Berkeley Network) demonstrated that supernetwork-based search produces networks that outperform the speed-accuracy tradeoff curve of MnasNet and MobileNetV2 on the ImageNet image-classification dataset, using over 400x ''less'' search time than was used for MnasNet.<ref name="FBNet">{{cite arXiv|eprint=1812.03443|last1=Wu|first1=Bichen|title=FBNet: Hardware-Aware Efficient ConvNet Design via Differentiable Neural Architecture Search|last2=Dai|first2=Xiaoliang|last3=Zhang|first3=Peizhao|last4=Wang|first4=Yanghan|last5=Sun|first5=Fei|last6=Wu|first6=Yiming|last7=Tian|first7=Yuandong|last8=Vajda|first8=Peter|last9=Jia|first9=Yangqing|last10=Keutzer|first10=Kurt|class=cs.CV|date=24 May 2019}}</ref><ref name="MobileNetV2">{{cite arXiv|eprint=1801.04381|last1=Sandler|first1=Mark|title=MobileNetV2: Inverted Residuals and Linear Bottlenecks|last2=Howard|first2=Andrew|last3=Zhu|first3=Menglong|last4=Zhmoginov|first4=Andrey|last5=Chen|first5=Liang-Chieh|class=cs.CV|year=2018}}</ref><ref>{{Cite web|url=http://sites.ieee.org/scv-cas/files/2019/05/2019-05-22-ieee-co-design-trim.pdf|title=Co-Design of DNNs and NN Accelerators|last=Keutzer|first=Kurt|date=2019-05-22|website=IEEE|access-date=2019-09-26}}</ref> Further, SqueezeNAS demonstrated that supernetwork-based NAS produces neural networks that outperform the speed-accuracy tradeoff curve of MobileNetV3 on the Cityscapes semantic-segmentation dataset, using over 100x less search time than the MobileNetV3 authors' RL-based search.<ref name="SqueezeNAS">{{cite arXiv|eprint=1908.01748|last1=Shaw|first1=Albert|title=SqueezeNAS: Fast neural architecture search for faster semantic segmentation|last2=Hunter|first2=Daniel|last3=Iandola|first3=Forrest|last4=Sidhu|first4=Sammy|class=cs.CV|year=2019}}</ref><ref>{{Cite news|url=https://www.eetimes.com/document.asp?doc_id=1335063|title=Does Your AI Chip Have Its Own DNN?|last=Yoshida|first=Junko|date=2019-08-25|work=EE Times|access-date=2019-09-26}}</ref>
 
== Neural architecture search benchmarks ==
Neural architecture search often requires large computational resources because of its expensive training and evaluation phases, which also gives the evaluation of these methods a large carbon footprint. To overcome this limitation, NAS benchmarks<ref>{{cite arXiv |eprint=1902.09635 |last1=Ying |first1=Chris |last2=Klein |first2=Aaron |last3=Real |first3=Esteban |last4=Christiansen |first4=Eric |last5=Murphy |first5=Kevin |last6=Hutter |first6=Frank |title=NAS-Bench-101: Towards Reproducible Neural Architecture Search |date=2019 |class=cs.LG }}</ref><ref>{{cite arXiv |eprint=2001.10422 |last1=Zela |first1=Arber |last2=Siems |first2=Julien |last3=Hutter |first3=Frank |title=NAS-Bench-1Shot1: Benchmarking and Dissecting One-shot Neural Architecture Search |date=2020 |class=cs.LG }}</ref><ref>{{cite arXiv |eprint=2001.00326 |last1=Dong |first1=Xuanyi |last2=Yang |first2=Yi |title=NAS-Bench-201: Extending the Scope of Reproducible Neural Architecture Search |date=2020 |class=cs.CV }}</ref><ref>{{cite arXiv |eprint=2008.09777 |last1=Zela |first1=Arber |last2=Siems |first2=Julien |last3=Zimmer |first3=Lucas |last4=Lukasik |first4=Jovita |last5=Keuper |first5=Margret |last6=Hutter |first6=Frank |title=Surrogate NAS Benchmarks: Going Beyond the Limited Search Spaces of Tabular NAS Benchmarks |date=2020 |class=cs.LG }}</ref> have been introduced, from which the final performance of a neural architecture can be queried or predicted in seconds. A NAS benchmark is defined as a dataset with a fixed train-test split, a search space, and a fixed training pipeline (hyperparameters). There are primarily two types of NAS benchmarks: surrogate and tabular. A surrogate benchmark uses a surrogate model (e.g., a neural network) to predict the performance of an architecture from the search space, whereas a tabular benchmark queries the actual performance of an architecture trained to convergence. Both types are queryable and can be used to efficiently simulate many NAS algorithms using only a CPU, by querying the benchmark instead of training an architecture from scratch.
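A tabular benchmark reduces "evaluating an architecture" to a table lookup, so an entire search algorithm can be simulated on a CPU. The following toy sketch assumes a hypothetical four-architecture table; real benchmarks such as NAS-Bench-101 store trained-to-convergence metrics for every architecture in a much larger search space:

```python
import random

random.seed(0)

# Hypothetical precomputed table: architecture encoding -> final test accuracy.
benchmark_table = {
    ("conv3x3", "conv3x3"): 0.91,
    ("conv3x3", "skip"): 0.89,
    ("skip", "conv3x3"): 0.90,
    ("skip", "skip"): 0.72,
}

def random_search(n_queries):
    """Random search simulated entirely via table lookups."""
    best_arch, best_acc = None, -1.0
    for _ in range(n_queries):
        arch = random.choice(list(benchmark_table))
        acc = benchmark_table[arch]  # query instead of training from scratch
        if acc > best_acc:
            best_arch, best_acc = arch, acc
    return best_arch, best_acc

arch, acc = random_search(10)
```

The same lookup interface lets different NAS algorithms be compared on identical training pipelines, which is the reproducibility benefit the benchmark papers emphasize.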
 
==See also==
* {{Cite journal |last1=Elsken |first1=Thomas |last2=Metzen |first2=Jan Hendrik |last3=Hutter |first3=Frank |date=August 8, 2019 |title=Neural Architecture Search: A Survey |url=http://jmlr.org/papers/v20/18-598.html |journal=Journal of Machine Learning Research |volume=20 |issue=55 |pages=1–21 |arxiv=1808.05377}}
* {{cite journal |last1=Liu |first1=Yuqiao |last2=Sun |first2=Yanan |last3=Xue |first3=Bing |last4=Zhang |first4=Mengjie |last5=Yen |first5=Gary G |last6=Tan |first6=Kay Chen |year=2021 |title=A Survey on Evolutionary Neural Architecture Search |journal=IEEE Transactions on Neural Networks and Learning Systems |volume=PP |issue=2 |pages=1–21 |arxiv=2008.10937 |doi=10.1109/TNNLS.2021.3100554 |pmid=34357870 |s2cid=221293236}}
* {{cite arXiv |last1=White |first1=Colin |title=Neural Architecture Search: Insights from 1000 Papers |date=2023-01-25 |eprint=2301.08727 |last2=Safari |first2=Mahmoud |last3=Sukthanker |first3=Rhea |last4=Ru |first4=Binxin |last5=Elsken |first5=Thomas |last6=Zela |first6=Arber |last7=Dey |first7=Debadeepta |last8=Hutter |first8=Frank |class=cs.LG }}
 
==References==