{{Short description|Machine learning-powered structure design}}
{{Machine learning bar}}
'''Neural architecture search''' ('''NAS''')<ref name="survey">{{Cite journal|url=http://jmlr.org/papers/v20/18-598.html|title=Neural Architecture Search: A Survey|first1=Thomas|last1=Elsken|first2=Jan Hendrik|last2=Metzen|first3=Frank|last3=Hutter|date=August 8, 2019|journal=Journal of Machine Learning Research|volume=20|issue=55|pages=1–21
* The ''search space'' defines the type(s) of ANN that can be designed and optimized.
* The ''performance estimation strategy'' evaluates the performance of a possible ANN from its design (without constructing and training it).
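The interplay of these components can be illustrated with a deliberately small loop. The sketch below is a minimal illustration under stated assumptions, not any published method. The search space (`SEARCH_SPACE`), the use of plain random search as the search strategy, and the invented scoring formula in the hypothetical `toy_estimate` are all assumptions made for the example. `toy_estimate` stands in for a performance estimation strategy that scores a design without training it.

```python
import random
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical search space: each architectural decision and its allowed choices.
SEARCH_SPACE: Dict[str, List] = {
    "num_layers": [2, 4, 8],
    "width": [32, 64, 128],
    "op": ["conv3x3", "conv5x5", "maxpool"],
}

Architecture = Dict[str, object]


def sample_architecture(space: Dict[str, List], rng: random.Random) -> Architecture:
    """Search strategy (random search): pick every decision uniformly at random."""
    return {name: rng.choice(choices) for name, choices in space.items()}


def toy_estimate(arch: Architecture) -> float:
    """Hypothetical performance estimator that scores a design without training it.

    The formula is invented: it prefers 3x3 convolutions and depth,
    and penalizes total size (layers * width).
    """
    op_score = {"conv3x3": 1.0, "conv5x5": 0.8, "maxpool": 0.2}[arch["op"]]
    size = arch["num_layers"] * arch["width"]
    return op_score + 0.1 * arch["num_layers"] - 0.001 * size


def search(
    space: Dict[str, List],
    estimate: Callable[[Architecture], float],
    budget: int,
    seed: int = 0,
) -> Tuple[Optional[Architecture], float]:
    """Sample `budget` architectures and keep the best-scoring one."""
    rng = random.Random(seed)
    best_arch: Optional[Architecture] = None
    best_score = float("-inf")
    for _ in range(budget):
        arch = sample_architecture(space, rng)
        score = estimate(arch)
        if score > best_score:
            best_arch, best_score = arch, score
    return best_arch, best_score


best, score = search(SEARCH_SPACE, toy_estimate, budget=500)
print(best, round(score, 3))
```

Swapping `sample_architecture` for a learned policy or an evolutionary procedure changes the search strategy without touching the search space or the estimator, which is the separation the three components are meant to capture.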
NAS is closely related to [[hyperparameter optimization]]<ref>Matthias Feurer and Frank Hutter. [https://link.springer.com/content/pdf/10.1007%2F978-3-030-05318-5_1.pdf Hyperparameter optimization]. In: ''AutoML: Methods, Systems, Challenges'', pages 3–38.</ref> and [[meta-learning (computer science)|meta-learning]]<ref>{{Cite book|chapter-url=https://link.springer.com/chapter/10.1007/978-3-030-05318-5_2|doi = 10.1007/978-3-030-05318-5_2|chapter = Meta-Learning|title = Automated Machine Learning|series = The Springer Series on Challenges in Machine Learning|year = 2019|last1 = Vanschoren|first1 = Joaquin|pages = 35–61|isbn = 978-3-030-05317-8|s2cid = 239362577}}</ref> and is a subfield of [[automated machine learning]] (AutoML).<ref>{{Cite journal |last1=Salehin |first1=Imrus |last2=Islam |first2=Md. Shamiul |last3=Saha |first3=Pritom |last4=Noman |first4=S. M. |last5=Tuni |first5=Azra |last6=Hasan |first6=Md. Mehedi |last7=Baten |first7=Md. Abu |date=2024-01-01 |title=AutoML: A systematic review on automated machine learning with neural architecture search |journal=Journal of Information and Intelligence |volume=2 |issue=1 |pages=52–81 |doi=10.1016/j.jiixd.2023.10.002 |issn=2949-7159|doi-access=free }}</ref>
==Reinforcement learning==
[[Reinforcement learning]] (RL) can underpin a NAS search strategy. Barret Zoph and [[Quoc Viet Le]]<ref name="Zoph 2016" /> applied NAS with RL targeting the [[CIFAR-10]] dataset and achieved a network architecture that rivals the best manually designed architecture for accuracy, with an error rate of 3.65%, 0.09 percentage points lower than, and 1.05x faster than, a related hand-designed model. On the [[Treebank|Penn Treebank]] dataset, that approach composed a recurrent cell that outperforms the [[Long short-term memory|LSTM]], reaching a test-set perplexity of 62.4, 3.6 perplexity lower than the prior leading system. On the PTB character-level language modeling task it achieved 1.214 bits per character.<ref name="Zoph 2016">{{cite arXiv|last1=Zoph|first1=Barret|last2=Le|first2=Quoc V.|date=2016-11-04|title=Neural Architecture Search with Reinforcement Learning|eprint=1611.01578 |class=cs.LG}}</ref>
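The RL formulation can be sketched as a policy over architectural decisions whose parameters are updated with a REINFORCE-style policy-gradient rule, using the reward earned by each sampled architecture and a moving-average baseline to reduce variance. This is a minimal sketch under simplifying assumptions: the controller is an independent softmax per decision rather than a recurrent network, and the hypothetical `toy_reward` replaces the expensive step of training a child network and measuring its validation accuracy. The search space and reward values are invented for illustration.

```python
import math
import random
from typing import Dict, List

# Hypothetical search space: one categorical decision per key.
SEARCH_SPACE: Dict[str, List[str]] = {
    "op": ["conv3x3", "conv5x5", "maxpool", "identity"],
    "activation": ["relu", "tanh", "sigmoid"],
}

# Invented per-choice contributions used by the toy reward below.
OP_SCORE = {"conv3x3": 0.9, "conv5x5": 0.7, "maxpool": 0.4, "identity": 0.1}
ACT_SCORE = {"relu": 0.1, "tanh": 0.0, "sigmoid": -0.1}


def toy_reward(choice: Dict[str, int]) -> float:
    """Hypothetical stand-in for 'train the child network, return validation accuracy'."""
    op = SEARCH_SPACE["op"][choice["op"]]
    act = SEARCH_SPACE["activation"][choice["activation"]]
    return OP_SCORE[op] + ACT_SCORE[act]


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


class Controller:
    """Policy with an independent softmax per decision.

    Zoph and Le used a recurrent controller; this factorized version is a simplification.
    """

    def __init__(self, space: Dict[str, List[str]], lr: float = 0.2,
                 baseline_decay: float = 0.9, seed: int = 0) -> None:
        self.space = space
        self.logits = {name: [0.0] * len(opts) for name, opts in space.items()}
        self.lr = lr
        self.baseline_decay = baseline_decay
        self.baseline = None  # moving average of past rewards
        self.rng = random.Random(seed)

    def sample(self) -> Dict[str, int]:
        """Sample one choice index per decision from the current policy."""
        return {
            name: self.rng.choices(range(len(logits)), weights=softmax(logits))[0]
            for name, logits in self.logits.items()
        }

    def update(self, choice: Dict[str, int], reward: float) -> None:
        """One REINFORCE step: logits += lr * (reward - baseline) * grad log pi(choice)."""
        if self.baseline is None:
            self.baseline = reward
        advantage = reward - self.baseline
        for name, idx in choice.items():
            probs = softmax(self.logits[name])
            for j, p in enumerate(probs):
                # Gradient of the log-softmax of choice idx w.r.t. logit j: 1[j == idx] - p_j
                grad = (1.0 if j == idx else 0.0) - p
                self.logits[name][j] += self.lr * advantage * grad
        self.baseline = (self.baseline_decay * self.baseline
                         + (1.0 - self.baseline_decay) * reward)

    def greedy(self) -> Dict[str, str]:
        """Most probable choice for every decision."""
        return {
            name: self.space[name][max(range(len(logits)), key=logits.__getitem__)]
            for name, logits in self.logits.items()
        }


controller = Controller(SEARCH_SPACE)
for _ in range(3000):
    arch = controller.sample()
    controller.update(arch, toy_reward(arch))
print(controller.greedy())
```

After a few thousand updates the greedy choice should settle on the decisions with the highest toy reward. With real child networks, every call to the reward function involves training a model, which is what makes this style of search expensive.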
Learning a model architecture directly on a large dataset can be a lengthy process. NASNet<ref name="Zoph 2017" /><ref>{{Cite news|url=https://research.googleblog.com/2017/11/automl-for-large-scale-image.html|title=AutoML for large scale image classification and object detection|last1=Zoph|first1=Barret|date=November 2, 2017|work=Research Blog|access-date=2018-02-20|last2=Vasudevan|first2=Vijay|language=en-US|last3=Shlens|first3=Jonathon|last4=Le|first4=Quoc V.}}</ref> addressed this issue by transferring a building block designed for a small dataset to a larger dataset. The design was constrained to use two types of [[Convolutional neural network|convolutional]] cells that return feature maps serving two main functions when convolving an input feature map: ''normal cells'', which return maps of the same extent (height and width), and ''reduction cells'', in which the height and width of the returned feature map are reduced by a factor of two. For the reduction cell, the initial operation applied to the
In the so-called Efficient Neural Architecture Search (ENAS), a controller discovers architectures by learning to search for an optimal subgraph within a large graph. The controller is trained with [[Reinforcement learning|policy gradient]] to select a subgraph that maximizes the expected reward on the validation set. The model corresponding to the subgraph is trained to minimize a canonical [[cross entropy]] loss. Because multiple child models share parameters, ENAS requires fewer GPU-hours than other approaches and 1000-fold fewer than "standard" NAS. On CIFAR-10, the ENAS design achieved a test error of 2.89%, comparable to NASNet. On Penn Treebank, the ENAS design reached a test perplexity of 55.8.<ref>{{cite arXiv
== Evolution ==
An alternative approach to NAS is based on [[evolutionary algorithm]]s and has been employed by several groups.<ref>{{cite arXiv|last1=Real|first1=Esteban|last2=Moore|first2=Sherry|last3=Selle|first3=Andrew|last4=Saxena|first4=Saurabh|last5=Suematsu|first5=Yutaka Leon|last6=Tan|first6=Jie|last7=Le|first7=Quoc|last8=Kurakin|first8=Alex|date=2017-03-03|title=Large-Scale Evolution of Image Classifiers|eprint=1703.01041|class=cs.NE}}</ref><ref>{{Cite arXiv|last1=Suganuma|first1=Masanori|last2=Shirakawa|first2=Shinichi|last3=Nagao|first3=Tomoharu|date=2017-04-03|title=A Genetic Programming Approach to Designing Convolutional Neural Network Architectures|class=cs.NE|eprint=1704.00764v2|language=en}}</ref><ref name=":0">{{Cite arXiv|last1=Liu|first1=Hanxiao|last2=Simonyan|first2=Karen|last3=Vinyals|first3=Oriol|last4=Fernando|first4=Chrisantha|last5=Kavukcuoglu|first5=Koray|date=2017-11-01|title=Hierarchical Representations for Efficient Architecture Search|class=cs.LG|eprint=1711.00436v2|language=en}}</ref><ref name="Real 2018">{{cite arXiv|last1=Real|first1=Esteban|last2=Aggarwal|first2=Alok|last3=Huang|first3=Yanping|last4=Le|first4=Quoc V.|date=2018-02-05|title=Regularized Evolution for Image Classifier Architecture Search|eprint=1802.01548|class=cs.NE}}</ref><ref>{{cite arXiv|last1=Miikkulainen|first1=Risto|last2=Liang|first2=Jason|last3=Meyerson|first3=Elliot|last4=Rawal|first4=Aditya|last5=Fink|first5=Dan|last6=Francon|first6=Olivier|last7=Raju|first7=Bala|last8=Shahrzad|first8=Hormoz|last9=Navruzyan|first9=Arshak|last10=Duffy|first10=Nigel|last11=Hodjat|first11=Babak|date=2017-03-04|title=Evolving Deep Neural Networks|class=cs.NE|eprint=1703.00548}}</ref><ref>{{Cite book|last1=Xie|first1=Lingxi|last2=Yuille|first2=Alan|title=2017 IEEE International Conference on Computer Vision (ICCV) |chapter=Genetic CNN
== Bayesian optimization ==
Line 26:
== Multi-objective search ==
While most approaches focus solely on finding architectures with maximal predictive performance, in most practical applications other objectives are also relevant, such as memory consumption, model size or inference time (i.e., the time required to obtain a prediction). For this reason, researchers have developed [[Multi-objective optimization|multi-objective]] search methods.<ref name="Elsken 2018">{{cite arXiv|last1=Elsken|first1=Thomas|last2=Metzen|first2=Jan Hendrik|last3=Hutter|first3=Frank|date=2018-04-24|title=Efficient Multi-objective Neural Architecture Search via Lamarckian Evolution|eprint=1804.09081|class=stat.ML}}</ref><ref name="Zhou 2018">{{cite web|url=https://www.sysml.cc/doc/2018/94.pdf|title=Neural Architect: A Multi-objective Neural Architecture Search with Performance Prediction|last1=Zhou|first1=Yanqi|last2=Diamos|first2=Gregory|date=|website=|publisher=Baidu|access-date=2019-09-27|archive-date=2019-09-27|archive-url=https://web.archive.org/web/20190927090457/https://www.sysml.cc/doc/2018/94.pdf|url-status=dead}}</ref>
LEMONADE<ref name="Elsken 2018" /> is an evolutionary algorithm that adopted [[Lamarckism]] to efficiently optimize multiple objectives. In every generation, child networks are generated to improve the [[Pareto efficiency#Pareto frontier|Pareto frontier]] with respect to the current population of ANNs.
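Maintaining a Pareto frontier rests on a dominance test between candidate networks. The following minimal sketch assumes two objectives that are both minimized, validation error and parameter count, and uses made-up candidate values. `improves_front` is a hypothetical helper that accepts a child only if no current member dominates it; it does not reproduce LEMONADE's own sampling scheme or its Lamarckian weight inheritance.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    name: str
    error: float     # validation error, lower is better
    params_m: float  # parameters in millions, lower is better


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if `a` is no worse than `b` in every objective and strictly better in one."""
    no_worse = a.error <= b.error and a.params_m <= b.params_m
    strictly_better = a.error < b.error or a.params_m < b.params_m
    return no_worse and strictly_better


def pareto_front(population: List[Candidate]) -> List[Candidate]:
    """Members of the population that no other member dominates (simple O(n^2) check)."""
    return [c for c in population if not any(dominates(o, c) for o in population)]


def improves_front(child: Candidate, population: List[Candidate]) -> bool:
    """Hypothetical acceptance test: the child is not dominated by any current member."""
    return not any(dominates(o, child) for o in population)


# Made-up candidates, for illustration only.
POPULATION = [
    Candidate("A", 0.05, 20.0),
    Candidate("B", 0.06, 5.0),
    Candidate("C", 0.07, 6.0),   # dominated by B
    Candidate("D", 0.04, 50.0),
    Candidate("E", 0.05, 25.0),  # dominated by A
]

print([c.name for c in pareto_front(POPULATION)])
```

The quadratic number of comparisons is negligible next to the cost of training the candidate networks, so a straightforward check like this is usually adequate for small populations.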
RL- and evolution-based NAS methods require thousands of GPU-days of searching and training to achieve state-of-the-art computer vision results, as described in the NASNet, mNASNet and MobileNetV3 papers.<ref name="Zoph 2017" /><ref name="mNASNet2">{{cite arXiv|eprint=1807.11626|last1=Tan|first1=Mingxing|title=MnasNet: Platform-Aware Neural Architecture Search for Mobile|last2=Chen|first2=Bo|last3=Pang|first3=Ruoming|last4=Vasudevan|first4=Vijay|last5=Sandler|first5=Mark|last6=Howard|first6=Andrew|last7=Le|first7=Quoc V.|class=cs.CV|year=2018}}</ref><ref name="MobileNetV3">{{cite arXiv|date=2019-05-06|title=Searching for MobileNetV3|eprint=1905.02244|class=cs.CV|last1=Howard|first1=Andrew|last2=Sandler|first2=Mark|last3=Chu|first3=Grace|last4=Chen|first4=Liang-Chieh|last5=Chen|first5=Bo|last6=Tan|first6=Mingxing|last7=Wang|first7=Weijun|last8=Zhu|first8=Yukun|last9=Pang|first9=Ruoming|last10=Vasudevan|first10=Vijay|last11=Le|first11=Quoc V.|last12=Adam|first12=Hartwig}}</ref>
To reduce computational cost, many recent NAS methods rely on the weight-sharing idea.<ref>
More recent works further combine this weight-sharing paradigm with a continuous relaxation of the search space,<ref>
Differentiable NAS has been shown to produce competitive results using a fraction of the search time required by RL-based search methods. For example, FBNet (short for Facebook Berkeley Network) demonstrated that supernetwork-based search produces networks that outperform the speed-accuracy tradeoff curve of mNASNet and MobileNetV2 on the ImageNet image-classification dataset. FBNet accomplishes this using over 400x ''less'' search time than was used for mNASNet.<ref name="FBNet">{{cite arXiv|eprint=1812.03443|last1=Wu|first1=Bichen|title=FBNet: Hardware-Aware Efficient ConvNet Design via Differentiable Neural Architecture Search|last2=Dai|first2=Xiaoliang|last3=Zhang|first3=Peizhao|last4=Wang|first4=Yanghan|last5=Sun|first5=Fei|last6=Wu|first6=Yiming|last7=Tian|first7=Yuandong|last8=Vajda|first8=Peter|last9=Jia|first9=Yangqing|last10=Keutzer|first10=Kurt|class=cs.CV|date=24 May 2019}}</ref><ref name="MobileNetV2">{{cite arXiv|eprint=1801.04381|last1=Sandler|first1=Mark|title=MobileNetV2: Inverted Residuals and Linear Bottlenecks|last2=Howard|first2=Andrew|last3=Zhu|first3=Menglong|last4=Zhmoginov|first4=Andrey|last5=Chen|first5=Liang-Chieh|class=cs.CV|year=2018}}</ref><ref>{{Cite web|url=
== Neural architecture search benchmarks ==
Neural architecture search often requires large computational resources because of its expensive training and evaluation phases. This in turn leads to a large carbon footprint for evaluating these methods. To overcome this limitation, NAS benchmarks<ref>
==See also==
*[[Neural Network Intelligence]]
*[[automated machine learning|Automated Machine Learning]]
*[[hyperparameter optimization|Hyperparameter Optimization]]
== Further reading ==
Survey articles.
* {{cite arXiv |eprint=1905.01392 |class=cs.LG |first1=Martin |last1=Wistuba |first2=Ambrish |last2=Rawat |title=A Survey on Neural Architecture Search |date=2019-05-04 |last3=Pedapati |first3=Tejaswini}}
* {{Cite journal |last1=Elsken |first1=Thomas |last2=Metzen |first2=Jan Hendrik |last3=Hutter |first3=Frank |date=August 8, 2019 |title=Neural Architecture Search: A Survey |url=http://jmlr.org/papers/v20/18-598.html |journal=Journal of Machine Learning Research |volume=20 |issue=55 |pages=1–21 |arxiv=1808.05377}}
* {{cite journal |last1=Liu |first1=Yuqiao |last2=Sun |first2=Yanan |last3=Xue |first3=Bing |last4=Zhang |first4=Mengjie |last5=Yen |first5=Gary G |last6=Tan |first6=Kay Chen |year=2021 |title=A Survey on Evolutionary Neural Architecture Search |journal=IEEE Transactions on Neural Networks and Learning Systems |volume= 34|issue=2 |pages=1–21 |arxiv=2008.10937 |doi=10.1109/TNNLS.2021.3100554 |pmid=34357870 |s2cid=221293236}}
* {{cite arXiv |last1=White |first1=Colin |title=Neural Architecture Search: Insights from 1000 Papers |date=2023-01-25 |eprint=2301.08727 |last2=Safari |first2=Mahmoud |last3=Sukthanker |first3=Rhea |last4=Ru |first4=Binxin |last5=Elsken |first5=Thomas |last6=Zela |first6=Arber |last7=Dey |first7=Debadeepta |last8=Hutter |first8=Frank|class=cs.LG }}
==References==
{{Differentiable computing}}
[[Category:Artificial intelligence engineering]]
[[Category:Artificial intelligence]]