Content deleted Content added
COI / refspam |
m Various citation & identifier cleanup, plus AWB genfixes |
||
Line 1:
{{Short description|Machine learning-powered structure design}}
{{Machine learning bar}}
▲{{otheruses|Nas (disambiguation)}}
'''Neural architecture search''' (NAS)<ref name="survey">{{Cite journal|url=http://jmlr.org/papers/v20/18-598.html|title=Neural Architecture Search: A Survey|first1=Thomas|last1=Elsken|first2=Jan Hendrik|last2=Metzen|first3=Frank|last3=Hutter|date=August 8, 2019|journal=Journal of Machine Learning Research|volume=20|issue=55|pages=1–21|via=jmlr.org|bibcode=2018arXiv180805377E|arxiv=1808.05377}}</ref><ref name="survey2">{{cite arXiv|last1=Wistuba|first1=Martin|last2=Rawat|first2=Ambrish|last3=Pedapati|first3=Tejaswini|date=2019-05-04|title=A Survey on Neural Architecture Search|eprint=1905.01392|class=cs.LG}}</ref> is a technique for automating the design of [[artificial neural network]]s (ANN), a widely used model in the field of [[machine learning]]. NAS has been used to design networks that are on par or outperform hand-designed architectures.<ref name="Zoph 2016" /><ref name="Zoph 2017">{{cite arXiv|last1=Zoph|first1=Barret|last2=Vasudevan|first2=Vijay|last3=Shlens|first3=Jonathon|last4=Le|first4=Quoc V.|date=2017-07-21|title=Learning Transferable Architectures for Scalable Image Recognition|eprint=1707.07012|class=cs.CV}}</ref> Methods for NAS can be categorized according to the search space, search strategy and performance estimation strategy used:<ref name="survey" />
Line 18:
== Evolution ==
An alternative approach to NAS is based on [[
== Bayesian Optimization ==
[[Bayesian Optimization]] which has proven to be an efficient method for hyperparameter optimization can also be applied to NAS. In this context the objective function maps an architecture to its validation error after being trained for a number of epochs. At each iteration BO uses a surrogate to model this objective function based on previously obtained architectures and their validation errors. One then chooses the next architecture to evaluate by maximizing an acquisition function, such as expected improvement, which provides a balance between exploration and exploitation. Acquisition function maximization and objective function evaluation are often computationally expensive for NAS, and make the application of BO challenging in this context. Recently BANANAS<ref>{{
==Hill-climbing==
Line 38:
To reduce computational cost, many recent NAS methods rely on the weight-sharing idea.<ref>Pham, H., Guan, M.Y., Zoph, B., Le, Q.V., Dean, J.: [[arxiv:1802.03268|Efficient neural architecture search via parameter sharing]]. In: Proceedings of the 35th International Conference on Machine Learning (2018).</ref><ref>Li, L., Talwalkar, A.: [[arxiv:1902.07638|Random search and reproducibility for neural architecture search]]. In: Proceedings of the Conference on Uncertainty in Artificial Intelligence (2019).</ref> In this approach a single overparameterized supernetwork (also known as the one-shot model) is defined. A supernetwork is a very large Directed Acyclic Graph (DAG) whose subgraphs are different candidate neural networks.Thus in a supernetwork the weights are shared among a large number of different sub-architectures that have edges in common, each of which is considered as a path within the supernet. The essential idea is to train one supernetwork that spans many options for the final design rather than generating and training thousands of networks independently. In addition to the learned parameters, a set of architecture parameters are learnt to depict preference for one module over another. Such methods reduce the required computational resources to only a few GPU days.
More recent works further combine this weight-sharing paradigm, with a continuous relaxation of the search space,<ref>H. Cai, L. Zhu, and S. Han. [[arxiv:1812.00332|Proxylessnas: Direct neural architecture search on target task and hardware]]. ICLR, 2019.</ref><ref>X. Dong and Y. Yang. [[arxiv:1910.04465|Searching for a robust neural architecture in four gpu hours]]. In IEEE Conference on Computer Vision and Pattern Recognition. IEEE Computer Society, 2019.</ref><ref name="H. Liu, K. Simonyan 1806">H. Liu, K. Simonyan, and Y. Yang. [[arxiv:1806.09055|Darts: Differentiable architecture search]]. In ICLR, 2019</ref><ref>S. Xie, H. Zheng, C. Liu, and L. Lin. [[arxiv:1812.09926|Snas: stochastic neural architecture search]]. ICLR, 2019.</ref> which enables the use of gradient-based optimization methods. These approaches are generally referred to as differentiable NAS and have proven very efficient in exploring the search space of neural architectures. One of the most popular algorithms amongst the gradient-based methods for NAS is DARTS.<ref
Differentiable NAS has shown to produce competitive results using a fraction of the search-time required by RL-based search methods. For example, FBNet (which is short for Facebook Berkeley Network) demonstrated that supernetwork-based search produces networks that outperform the speed-accuracy tradeoff curve of mNASNet and MobileNetV2 on the ImageNet image-classification dataset. FBNet accomplishes this using over 400x ''less'' search time than was used for mNASNet.<ref name="FBNet">{{cite arXiv|eprint=1812.03443|last1=Wu|first1=Bichen|title=FBNet: Hardware-Aware Efficient ConvNet Design via Differentiable Neural Architecture Search|last2=Dai|first2=Xiaoliang|last3=Zhang|first3=Peizhao|last4=Wang|first4=Yanghan|last5=Sun|first5=Fei|last6=Wu|first6=Yiming|last7=Tian|first7=Yuandong|last8=Vajda|first8=Peter|last9=Jia|first9=Yangqing|last10=Keutzer|first10=Kurt|class=cs.CV|date=24 May 2019}}</ref><ref name="MobileNetV2">{{cite arXiv|eprint=1801.04381|last1=Sandler|first1=Mark|title=MobileNetV2: Inverted Residuals and Linear Bottlenecks|last2=Howard|first2=Andrew|last3=Zhu|first3=Menglong|last4=Zhmoginov|first4=Andrey|last5=Chen|first5=Liang-Chieh|class=cs.CV|year=2018}}</ref><ref>{{Cite web|url=http://sites.ieee.org/scv-cas/files/2019/05/2019-05-22-ieee-co-design-trim.pdf|title=Co-Design of DNNs and NN Accelerators|last=Keutzer|first=Kurt|date=2019-05-22|website=IEEE|url-status=|archive-url=|archive-date=|access-date=2019-09-26}}</ref> Further, SqueezeNAS demonstrated that supernetwork-based NAS produces neural networks that outperform the speed-accuracy tradeoff curve of MobileNetV3 on the Cityscapes semantic segmentation dataset, and SqueezeNAS uses over 100x less search time than was used in the MobileNetV3 authors' RL-based search.<ref name="SqueezeNAS">{{cite arXiv|eprint=1908.01748|last1=Shaw|first1=Albert|title=SqueezeNAS: Fast neural architecture search for faster semantic segmentation|last2=Hunter|first2=Daniel|last3=Iandola|first3=Forrest|last4=Sidhu|first4=Sammy|class=cs.CV|year=2019}}</ref><ref>{{Cite news|url=https://www.eetimes.com/document.asp?doc_id=1335063|title=Does Your AI Chip Have Its Own DNN?|last=Yoshida|first=Junko|date=2019-08-25|work=EE Times|access-date=2019-09-26}}</ref>
|