RL- or evolution-based NAS requires thousands of GPU-days of searching/training to achieve state-of-the-art computer vision results, as described in the NASNet, MnasNet and MobileNetV3 papers.<ref name="Zoph 2017" /><ref name="mNASNet2">{{cite arXiv|eprint=1807.11626|last1=Tan|first1=Mingxing|title=MnasNet: Platform-Aware Neural Architecture Search for Mobile|last2=Chen|first2=Bo|last3=Pang|first3=Ruoming|last4=Vasudevan|first4=Vijay|last5=Sandler|first5=Mark|last6=Howard|first6=Andrew|last7=Le|first7=Quoc V.|class=cs.CV|year=2018}}</ref><ref name="MobileNetV3">{{cite arXiv|date=2019-05-06|title=Searching for MobileNetV3|eprint=1905.02244|class=cs.CV|last1=Howard|first1=Andrew|last2=Sandler|first2=Mark|last3=Chu|first3=Grace|last4=Chen|first4=Liang-Chieh|last5=Chen|first5=Bo|last6=Tan|first6=Mingxing|last7=Wang|first7=Weijun|last8=Zhu|first8=Yukun|last9=Pang|first9=Ruoming|last10=Vasudevan|first10=Vijay|last11=Le|first11=Quoc V.|last12=Adam|first12=Hartwig}}</ref>
To reduce computational cost, many recent NAS methods rely on the weight-sharing idea.<ref>Pham, H., Guan, M.Y., Zoph, B., Le, Q.V., Dean, J.: [[arxiv:1802.03268|Efficient neural architecture search via parameter sharing]]. In: Proceedings of the 35th International Conference on Machine Learning (2018).</ref><ref>Li, L., Talwalkar, A.: [[arxiv:1902.07638|Random search and reproducibility for neural architecture search]]. In: Proceedings of the Conference on Uncertainty in Artificial Intelligence (2019).</ref> In this approach, a single overparameterized supernetwork (also known as the one-shot model) is defined. A supernetwork is a very large [[directed acyclic graph]] (DAG) whose subgraphs are different candidate neural networks. Thus, in a supernetwork, the weights are shared among a large number of different sub-architectures that have edges in common, each of which is considered a path within the supernetwork. The essential idea is to train one supernetwork that spans many options for the final design rather than generating and training thousands of networks independently. In addition to the network weights, a set of architecture parameters is learned to express the preference for one module over another. Such methods reduce the required computational resources to only a few GPU-days.
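The weight-sharing idea can be illustrated with a minimal sketch. It is a toy model, not the implementation of any cited method: the class <code>ToySupernet</code>, the three candidate operations and the scalar "weights" are all hypothetical names and simplifications chosen for illustration. Each layer offers several candidate operations, a sub-architecture is a path that picks one operation per layer, and every path that picks the same operation in the same layer reuses the same weight.

```python
import random

# Hypothetical candidate operations. Each takes an input x and a scalar weight w.
CANDIDATE_OPS = {
    "scale": lambda x, w: w * x,
    "shift": lambda x, w: x + w,
    "skip": lambda x, w: x,  # identity; ignores its weight
}


class ToySupernet:
    """Toy supernetwork: one shared weight per (layer, operation) pair."""

    def __init__(self, num_layers, seed=0):
        rng = random.Random(seed)
        # weights[layer][op_name] is shared by every path that uses that op.
        self.weights = [
            {name: rng.uniform(0.5, 1.5) for name in CANDIDATE_OPS}
            for _ in range(num_layers)
        ]

    def forward(self, x, path):
        # A path selects one operation per layer, i.e. one sub-architecture.
        for layer_weights, op_name in zip(self.weights, path):
            x = CANDIDATE_OPS[op_name](x, layer_weights[op_name])
        return x

    def sample_path(self, rng):
        return [rng.choice(list(CANDIDATE_OPS)) for _ in self.weights]


net = ToySupernet(num_layers=3)
num_subarchitectures = len(CANDIDATE_OPS) ** len(net.weights)

# Two different sub-architectures that share the "scale" edge in layer 0.
path_a = ["scale", "skip", "shift"]
path_b = ["scale", "shift", "skip"]
path_c = ["skip", "skip", "skip"]  # does not use the shared edge

before = [net.forward(1.0, p) for p in (path_a, path_b, path_c)]
net.weights[0]["scale"] += 1.0  # stand-in for one training update
after = [net.forward(1.0, p) for p in (path_a, path_b, path_c)]
```

In this sketch a single update to the shared weight changes the output of both <code>path_a</code> and <code>path_b</code>, while <code>path_c</code> is unaffected. This is the sense in which training one supernetwork trains many candidate networks at once.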
More recent works further combine this weight-sharing paradigm with a continuous relaxation of the search space,<ref>H. Cai, L. Zhu, and S. Han. [[arxiv:1812.00332|ProxylessNAS: Direct neural architecture search on target task and hardware]]. ICLR, 2019.</ref><ref>X. Dong and Y. Yang. [[arxiv:1910.04465|Searching for a robust neural architecture in four GPU hours]]. In IEEE Conference on Computer Vision and Pattern Recognition. IEEE Computer Society, 2019.</ref><ref name="H. Liu, K. Simonyan 1806">H. Liu, K. Simonyan, and Y. Yang. [[arxiv:1806.09055|DARTS: Differentiable architecture search]]. In ICLR, 2019</ref><ref>S. Xie, H. Zheng, C. Liu, and L. Lin. [[arxiv:1812.09926|SNAS: stochastic neural architecture search]]. ICLR, 2019.</ref> which enables the use of gradient-based optimization methods. These approaches are generally referred to as differentiable NAS and have proven very efficient in exploring the search space of neural architectures. One of the most popular gradient-based algorithms for NAS is DARTS.<ref name="H. Liu, K. Simonyan 1806"/> However, DARTS suffers from problems such as performance collapse, caused by an inevitable aggregation of skip connections, and poor generalization, both of which were tackled by many later algorithms.<ref>Chu, Xiangxiang and Zhou, Tianbao and Zhang, Bo and Li, Jixiang. [[arxiv:1911.12126|Fair DARTS: Eliminating unfair advantages in differentiable architecture search]]. In ECCV, 2020</ref><ref name="Arber Zela 1909">Arber Zela, Thomas Elsken, Tonmoy Saikia, Yassine Marrakchi, Thomas Brox, Frank Hutter. [[arxiv:1909.09656|Understanding and Robustifying Differentiable Architecture Search]]. In ICLR, 2020</ref><ref name="Xiangning Chen 2002">Xiangning Chen, Cho-Jui Hsieh. [[arxiv:2002.05283|Stabilizing Differentiable Architecture Search via Perturbation-based Regularization]]. In ICML, 2020</ref><ref>Yuhui Xu, Lingxi Xie, Xiaopeng Zhang, Xin Chen, Guo-Jun Qi, Qi Tian, Hongkai Xiong. [[arxiv:1907.05737|PC-DARTS: Partial Channel Connections for Memory-Efficient Architecture Search]]. In ICLR, 2020</ref> Some methods<ref name="Arber Zela 1909"/><ref name="Xiangning Chen 2002"/> aim to robustify DARTS and smooth the validation accuracy landscape by introducing Hessian-norm-based regularisation and random smoothing/adversarial attack, respectively. The cause of the performance degradation was later analyzed from the perspective of architecture selection.<ref>Ruochen Wang, Minhao Cheng, Xiangning Chen, Xiaocheng Tang, Cho-Jui Hsieh. [[arxiv:2108.04392|Rethinking Architecture Selection in Differentiable NAS]]. In ICLR, 2021</ref>
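The continuous relaxation can be sketched on a single edge as follows. This is a simplified illustration under stated assumptions, not the DARTS algorithm itself: the three candidate operations, the single training pair <code>(x, y)</code>, the learning rate and all function names are hypothetical, the operations have no weights of their own, and the alternating optimisation of network weights and architecture parameters used by differentiable NAS methods is omitted. The discrete choice among operations is replaced by a softmax-weighted mixture, so the architecture parameters <code>alpha</code> can be updated by gradient descent, and the final architecture keeps the operation with the largest parameter.

```python
import math


def softmax(values):
    # Numerically stable softmax over a list of floats.
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical weight-free candidate operations on one edge.
OPS = {
    "identity": lambda x: x,
    "double": lambda x: 2.0 * x,
    "zero": lambda x: 0.0,
}
OP_NAMES = list(OPS)


def mixed_op(x, alpha):
    # Continuous relaxation: softmax-weighted sum of all candidate outputs.
    probs = softmax(alpha)
    outputs = [OPS[name](x) for name in OP_NAMES]
    mixed = sum(p * o for p, o in zip(probs, outputs))
    return mixed, probs, outputs


def loss_and_grad(x, y, alpha):
    # Squared error and its analytic gradient with respect to alpha:
    # d(mixed)/d(alpha_k) = p_k * (o_k - mixed).
    mixed, probs, outputs = mixed_op(x, alpha)
    error = mixed - y
    grad = [2.0 * error * p * (o - mixed) for p, o in zip(probs, outputs)]
    return error ** 2, grad


alpha = [0.0, 0.0, 0.0]  # architecture parameters, one per operation
x, y = 1.0, 2.0          # toy data: the target is best fit by "double"
learning_rate = 0.5      # assumed value for this sketch

initial_loss, _ = loss_and_grad(x, y, alpha)
for _ in range(200):
    _, grad = loss_and_grad(x, y, alpha)
    alpha = [a - learning_rate * g for a, g in zip(alpha, grad)]
final_loss, _ = loss_and_grad(x, y, alpha)

# Discretisation step: keep the operation with the largest alpha.
selected_op = OP_NAMES[max(range(len(alpha)), key=lambda i: alpha[i])]
```

Because the mixture is differentiable in <code>alpha</code>, the search reduces to ordinary gradient descent. The final <code>argmax</code> step is the architecture selection stage whose effect on performance is analysed in the later work cited above.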