Neural architecture search: Difference between revisions

Revision as of 01:01, 4 February 2024 by Headbomb (edit summary: ce)
Latest revision as of 00:29, 27 August 2025 by Citation bot (edit summary: Removed URL that duplicated identifier. \| Use this bot. Report bugs. \| Suggested by Headbomb \| Linked from Wikipedia:WikiProject_Academic_Journals/Journals_cited_by_Wikipedia/Sandbox \| #UCB_webform_linked 893/990)
(12 intermediate revisions by 8 users not shown)
Line 1: {{Short description\|Machine learning-powered structure design}} {{Machine learning bar}} '''Neural architecture search''' ('''NAS''')<ref name="survey">{{Cite journal\|url=http://jmlr.org/papers/v20/18-598.html\|title=Neural Architecture Search: A Survey\|first1=Thomas\|last1=Elsken\|first2=Jan Hendrik\|last2=Metzen\|first3=Frank\|last3=Hutter\|date=August 8, 2019\|journal=Journal of Machine Learning Research\|volume=20\|issue=55\|pages=1–21\|arxiv=1808.05377}}</ref><ref name="survey2">{{cite arXiv\|last1=Wistuba\|first1=Martin\|last2=Rawat\|first2=Ambrish\|last3=Pedapati\|first3=Tejaswini\|date=2019-05-04\|title=A Survey on Neural Architecture Search\|eprint=1905.01392\|class=cs.LG}}</ref> is a technique for automating the design of [[artificial neural network]]s (ANN), a widely used model in the field of [[machine learning]]. NAS has been used to design networks that are on par with or outperform hand-designed architectures.<ref name="Zoph 2016" /><ref name="Zoph 2017">{{cite arXiv\|last1=Zoph\|first1=Barret\|last2=Vasudevan\|first2=Vijay\|last3=Shlens\|first3=Jonathon\|last4=Le\|first4=Quoc V.\|date=2017-07-21\|title=Learning Transferable Architectures for Scalable Image Recognition\|eprint=1707.07012\|class=cs.CV}}</ref> Methods for NAS can be categorized according to the search space, search strategy and performance estimation strategy used:<ref name="survey" /> * The ''search space'' defines the type(s) of ANN that can be designed and optimized. Line 7: * The ''performance estimation strategy'' evaluates the performance of a possible ANN from its design (without constructing and training it). NAS is closely related to [[hyperparameter optimization]]<ref>Matthias Feurer and Frank Hutter. [https://link.springer.com/content/pdf/10.1007%2F978-3-030-05318-5_1.pdf Hyperparameter optimization]. 
In: ''AutoML: Methods, Systems, Challenges'', pages 3–38.</ref> and [[meta-learning (computer science)\|meta-learning]]<ref>{{Cite book\|chapter-url=https://link.springer.com/chapter/10.1007/978-3-030-05318-5_2\|doi = 10.1007/978-3-030-05318-5_2\|chapter = Meta-Learning\|title = Automated Machine Learning\|series = The Springer Series on Challenges in Machine Learning\|year = 2019\|last1 = Vanschoren\|first1 = Joaquin\|pages = 35–61\|isbn = 978-3-030-05317-8\|s2cid = 239362577}}</ref> and is a subfield of [[automated machine learning]] (AutoML).<ref>{{Cite journal \|last1=Salehin \|first1=Imrus \|last2=Islam \|first2=Md. Shamiul \|last3=Saha \|first3=Pritom \|last4=Noman \|first4=S. M. \|last5=Tuni \|first5=Azra \|last6=Hasan \|first6=Md. Mehedi \|last7=Baten \|first7=Md. Abu \|date=2024-01-01 \|title=AutoML: A systematic review on automated machine learning with neural architecture search \|journal=Journal of Information and Intelligence \|volume=2 \|issue=1 \|pages=52–81 \|doi=10.1016/j.jiixd.2023.10.002 \|issn=2949-7159\|doi-access=free }}</ref> ==Reinforcement learning== Line 14: Learning a model architecture directly on a large dataset can be a lengthy process. NASNet<ref name="Zoph 2017" /><ref>{{Cite news\|url=https://research.googleblog.com/2017/11/automl-for-large-scale-image.html\|title=AutoML for large scale image classification and object detection\|last1=Zoph\|first1=Barret\|date=November 2, 2017\|work=Research Blog\|access-date=2018-02-20\|last2=Vasudevan\|first2=Vijay\|language=en-US\|last3=Shlens\|first3=Jonathon\|last4=Le\|first4=Quoc V.}}</ref> addressed this issue by transferring a building block designed for a small dataset to a larger dataset. 
The design was constrained to use two types of [[Convolutional neural network\|convolutional]] cells, which serve two main functions when convolving an input feature map: ''normal cells'' that return feature maps of the same extent (height and width) and ''reduction cells'' in which the height and width of the returned feature map are reduced by a factor of two. For the reduction cell, the initial operation applied to the cell's inputs uses a stride of two (to reduce the height and width).<ref name="Zoph 2017" /> The learned aspects of the design included which lower layer(s) each higher layer took as input, the transformations applied at that layer, and how to merge multiple outputs at each layer. In the studied example, the best convolutional layer (or "cell") was designed for the CIFAR-10 dataset and then applied to the [[ImageNet]] dataset by stacking copies of this cell, each with its own parameters. The approach yielded accuracy of 82.7% top-1 and 96.2% top-5. This exceeded the best human-invented architectures at a cost of 9 billion fewer [[FLOPS]], a reduction of 28%. The system continued to exceed the manually designed alternative at varying computation levels. The image features learned from image classification can be transferred to other computer vision problems. For example, for object detection, the learned cells integrated with the Faster-RCNN framework improved performance by 4.0% on the [[COCO (dataset)\|COCO]] dataset.<ref name="Zoph 2017" /> In Efficient Neural Architecture Search (ENAS), a controller discovers architectures by learning to search for an optimal subgraph within a large graph. The controller is trained with [[Reinforcement learning\|policy gradient]] to select a subgraph that maximizes the expected reward on the validation set. The model corresponding to the subgraph is trained to minimize a canonical [[cross entropy]] loss. 
Because multiple child models share parameters, ENAS requires fewer GPU-hours than other approaches and 1000-fold fewer than "standard" NAS. On CIFAR-10, the ENAS design achieved a test error of 2.89%, comparable to NASNet. On Penn Treebank, the ENAS design reached a test perplexity of 55.8.<ref>{{cite arXiv~~\|last1=Hieu\|first1=Pham\|last2=Y.\|first2=Guan,~~ ~~Melody\|last3=Barret\|first3=Zoph\|last4=V.\|first4=Le, Quoc\|last5=Jeff\|first5=Dean~~\|date=2018-02-09\|title=Efficient Neural Architecture Search via Parameter Sharing\|eprint=1802.03268\|class=cs.LG \|last1=Pham \|first1=Hieu \|last2=Guan \|first2=Melody Y. \|last3=Zoph \|first3=Barret \|last4=Le \|first4=Quoc V. \|last5=Dean \|first5=Jeff }}</ref> == Evolution == An alternative approach to NAS is based on [[evolutionary algorithm]]s, which have been employed by several groups.<ref>{{cite arXiv\|last1=Real\|first1=Esteban\|last2=Moore\|first2=Sherry\|last3=Selle\|first3=Andrew\|last4=Saxena\|first4=Saurabh\|last5=Suematsu\|first5=Yutaka Leon\|last6=Tan\|first6=Jie\|last7=Le\|first7=Quoc\|last8=Kurakin\|first8=Alex\|date=2017-03-03\|title=Large-Scale Evolution of Image Classifiers\|eprint=1703.01041\|class=cs.NE}}</ref><ref>{{Cite arXiv\|last1=Suganuma\|first1=Masanori\|last2=Shirakawa\|first2=Shinichi\|last3=Nagao\|first3=Tomoharu\|date=2017-04-03\|title=A Genetic Programming Approach to Designing Convolutional Neural Network Architectures\|class=cs.NE\|eprint=1704.00764v2\|language=en}}</ref><ref name=":0">{{Cite arXiv\|last1=Liu\|first1=Hanxiao\|last2=Simonyan\|first2=Karen\|last3=Vinyals\|first3=Oriol\|last4=Fernando\|first4=Chrisantha\|last5=Kavukcuoglu\|first5=Koray\|date=2017-11-01\|title=Hierarchical Representations for Efficient Architecture Search\|class=cs.LG\|eprint=1711.00436v2\|language=en}}</ref><ref name="Real 2018">{{cite arXiv\|last1=Real\|first1=Esteban\|last2=Aggarwal\|first2=Alok\|last3=Huang\|first3=Yanping\|last4=Le\|first4=Quoc V.\|date=2018-02-05\|title=Regularized Evolution for Image 
Classifier Architecture Search\|eprint=1802.01548\|class=cs.NE}}</ref><ref>{{cite arXiv\|last1=Miikkulainen\|first1=Risto\|last2=Liang\|first2=Jason\|last3=Meyerson\|first3=Elliot\|last4=Rawal\|first4=Aditya\|last5=Fink\|first5=Dan\|last6=Francon\|first6=Olivier\|last7=Raju\|first7=Bala\|last8=Shahrzad\|first8=Hormoz\|last9=Navruzyan\|first9=Arshak\|last10=Duffy\|first10=Nigel\|last11=Hodjat\|first11=Babak\|date=2017-03-04\|title=Evolving Deep Neural Networks\|class=cs.NE\|eprint=1703.00548}}</ref><ref>{{Cite book\|last1=Xie\|first1=Lingxi\|last2=Yuille\|first2=Alan\|title=2017 IEEE International Conference on Computer Vision (ICCV) \|chapter=Genetic CNN ~~\|chapter-url=https://ieeexplore.ieee.org/document/8237416~~\|year=2017\|pages=1388–1397\|doi=10.1109/ICCV.2017.154\|arxiv=1703.01513\|isbn=978-1-5386-1032-9\|s2cid=206770867}}</ref><ref name="Elsken 2018" /> An Evolutionary Algorithm for Neural Architecture Search generally performs the following procedure.<ref name="liu2021survey">{{cite journal\|last1=Liu\|first1=Yuqiao\|last2=Sun\|first2=Yanan\|last3=Xue\|first3=Bing\|last4=Zhang\|first4=Mengjie\|last5=Yen\|first5=Gary G\|last6=Tan\|first6=Kay Chen\|title=A Survey on Evolutionary Neural Architecture Search\|journal=IEEE Transactions on Neural Networks and Learning Systems\|year=2021\|volume=PP 34\|issue=2 \|pages=1–21\|doi=10.1109/TNNLS.2021.3100554\|pmid=34357870\|arxiv=2008.10937\|s2cid=221293236}}</ref> First a pool consisting of different candidate architectures along with their validation scores (fitness) is initialised. At each step the architectures in the candidate pool are mutated (e.g.: 3x3 convolution instead of a 5x5 convolution). Next the new architectures are trained from scratch for a few epochs and their validation scores are obtained. This is followed by replacing the lowest scoring architectures in the candidate pool with the better, newer architectures. 
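The procedure described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the method of any cited paper: architectures are hypothetical lists of kernel sizes, `toy_fitness` is a made-up stand-in for training a network for a few epochs and measuring its validation score, and only one pool member is mutated per step to keep the example short.

```python
import random

KERNEL_CHOICES = [1, 3, 5, 7]  # hypothetical search space: one kernel size per layer


def toy_fitness(arch):
    """Stand-in for 'train for a few epochs and return the validation score'.

    Assumption for illustration only: 3x3 kernels are ideal, so the score is
    higher the closer every layer is to kernel size 3.
    """
    return -sum(abs(k - 3) for k in arch)


def mutate(arch, rng):
    """Return a copy of arch with one randomly chosen layer's kernel changed."""
    child = list(arch)
    i = rng.randrange(len(child))
    child[i] = rng.choice([k for k in KERNEL_CHOICES if k != child[i]])
    return child


def evolve(pool_size=8, num_layers=4, steps=200, seed=0):
    rng = random.Random(seed)
    # Initialise the pool with random architectures and their fitness.
    pool = []
    for _ in range(pool_size):
        arch = [rng.choice(KERNEL_CHOICES) for _ in range(num_layers)]
        pool.append((toy_fitness(arch), arch))
    for _ in range(steps):
        # Mutate a randomly chosen member of the pool and score the child.
        _, parent = rng.choice(pool)
        child = mutate(parent, rng)
        child_score = toy_fitness(child)
        # Replace the lowest-scoring member if the child is better.
        worst = min(range(pool_size), key=lambda j: pool[j][0])
        if child_score > pool[worst][0]:
            pool[worst] = (child_score, child)
    # Return the (score, architecture) pair with the highest score.
    return max(pool, key=lambda entry: entry[0])


best_score, best_arch = evolve()
print(best_score, best_arch)
```

Because a child only ever replaces the worst pool member when it scores higher, the best score in the pool cannot decrease from one step to the next.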
This procedure is repeated multiple times, so the candidate pool is refined over time. Mutations in the context of evolving ANNs are operations such as adding or removing a layer, changing the type of a layer (e.g., from convolution to pooling), changing the hyperparameters of a layer, or changing the training hyperparameters. On [[CIFAR-10]] and [[ImageNet]], evolution and RL performed comparably, while both slightly outperformed [[random search]].<ref name="Real 2018" /><ref name=":0" /> == Bayesian optimization == Line 26: == Multi-objective search == While most approaches focus solely on finding architectures with maximal predictive performance, other objectives are relevant for most practical applications, such as memory consumption, model size or inference time (i.e., the time required to obtain a prediction). For this reason, researchers have developed [[Multi-objective optimization\|multi-objective]] search methods.<ref name="Elsken 2018">{{cite arXiv\|last1=Elsken\|first1=Thomas\|last2=Metzen\|first2=Jan Hendrik\|last3=Hutter\|first3=Frank\|date=2018-04-24\|title=Efficient Multi-objective Neural Architecture Search via Lamarckian Evolution\|eprint=1804.09081\|class=stat.ML}}</ref><ref name="Zhou 2018">{{cite web\|url=https://www.sysml.cc/doc/2018/94.pdf\|title=Neural Architect: A Multi-objective Neural Architecture Search with Performance Prediction\|last1=Zhou\|first1=Yanqi\|last2=Diamos\|first2=Gregory\|date=\|website=\|publisher=Baidu\|access-date=2019-09-27\|archive-date=2019-09-27\|archive-url=https://web.archive.org/web/20190927090457/https://www.sysml.cc/doc/2018/94.pdf\|url-status=dead}}</ref> LEMONADE<ref name="Elsken 2018" /> is an evolutionary algorithm that adopts [[Lamarckism]] to efficiently optimize multiple objectives. In every generation, child networks are generated to improve the [[Pareto efficiency#Pareto frontier\|Pareto frontier]] with respect to the current population of ANNs. 
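As a rough illustration of what a Pareto frontier means in multi-objective NAS, the following sketch filters a set of candidates down to the non-dominated ones. The candidate names and their (error, latency) values are invented for the example, and the function is a generic dominance filter, not LEMONADE itself.

```python
def dominates(a, b):
    """True if a is at least as good as b on every objective and strictly
    better on at least one. All objectives are minimised."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(candidates):
    """Return the names of candidates that no other candidate dominates."""
    front = []
    for name, objectives in candidates.items():
        if not any(dominates(other, objectives)
                   for other_name, other in candidates.items()
                   if other_name != name):
            front.append(name)
    return front


# Hypothetical (validation error %, inference latency ms) pairs.
candidates = {
    "net_a": (5.0, 30.0),
    "net_b": (6.5, 12.0),
    "net_c": (7.0, 25.0),   # worse than net_b on both objectives
    "net_d": (4.2, 80.0),
}

print(pareto_front(candidates))  # ['net_a', 'net_b', 'net_d']
```

No single network on the frontier is best on both objectives; choosing among them depends on how the application weighs accuracy against latency.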
Line 35: RL or evolution-based NAS require thousands of GPU-days of searching/training to achieve state-of-the-art computer vision results as described in the NASNet, mNASNet and MobileNetV3 papers.<ref name="Zoph 2017" /><ref name="mNASNet2">{{cite arXiv\|eprint=1807.11626\|last1=Tan\|first1=Mingxing\|title=MnasNet: Platform-Aware Neural Architecture Search for Mobile\|last2=Chen\|first2=Bo\|last3=Pang\|first3=Ruoming\|last4=Vasudevan\|first4=Vijay\|last5=Sandler\|first5=Mark\|last6=Howard\|first6=Andrew\|last7=Le\|first7=Quoc V.\|class=cs.CV\|year=2018}}</ref><ref name="MobileNetV3">{{cite arXiv\|date=2019-05-06\|title=Searching for MobileNetV3\|eprint=1905.02244\|class=cs.CV\|last1=Howard\|first1=Andrew\|last2=Sandler\|first2=Mark\|last3=Chu\|first3=Grace\|last4=Chen\|first4=Liang-Chieh\|last5=Chen\|first5=Bo\|last6=Tan\|first6=Mingxing\|last7=Wang\|first7=Weijun\|last8=Zhu\|first8=Yukun\|last9=Pang\|first9=Ruoming\|last10=Vasudevan\|first10=Vijay\|last11=Le\|first11=Quoc V.\|last12=Adam\|first12=Hartwig}}</ref> To reduce computational cost, many recent NAS methods rely on the weight-sharing idea.<ref>{{cite ~~arxiv~~arXiv \|eprint=1802.03268 \|last1=Pham \|first1=Hieu \|last2=Guan \|first2=Melody Y. \|last3=Zoph \|first3=Barret \|last4=Le \|first4=Quoc V. \|last5=Dean \|first5=Jeff \|title=Efficient Neural Architecture Search via Parameter Sharing \|date=2018 \|class=cs.LG }}</ref><ref>{{cite ~~arxiv~~arXiv \|eprint=1902.07638 \|last1=Li \|first1=Liam \|last2=Talwalkar \|first2=Ameet \|title=Random Search and Reproducibility for Neural Architecture Search \|date=2019 \|class=cs.LG }}</ref> In this approach, a single overparameterized supernetwork (also known as the one-shot model) is defined. A supernetwork is a very large [[Directed acyclic graph\|Directed Acyclic Graph]] (DAG) whose subgraphs are different candidate neural networks. 
Thus, in a supernetwork, the weights are shared among a large number of different sub-architectures that have edges in common, each of which is treated as a path within the supernetwork. The essential idea is to train one supernetwork that spans many options for the final design rather than generating and training thousands of networks independently. In addition to the shared weights, a set of architecture parameters is learned to express the preference for one module over another. Such methods reduce the required computational resources to only a few GPU-days. More recent works further combine this weight-sharing paradigm with a continuous relaxation of the search space,<ref>{{cite ~~arxiv~~arXiv \|eprint=1812.00332 \|last1=Cai \|first1=Han \|last2=Zhu \|first2=Ligeng \|last3=Han \|first3=Song \|title=ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware \|date=2018 \|class=cs.LG }}</ref><ref>{{cite ~~arxiv~~arXiv \|eprint=1910.04465 \|last1=Dong \|first1=Xuanyi \|last2=Yang \|first2=Yi \|title=Searching for a Robust Neural Architecture in Four GPU Hours \|date=2019 \|class=cs.CV }}</ref><ref name="H. Liu, K. Simonyan 1806">{{cite ~~arxiv~~arXiv \|eprint=1806.09055 \|last1=Liu \|first1=Hanxiao \|last2=Simonyan \|first2=Karen \|last3=Yang \|first3=Yiming \|title=DARTS: Differentiable Architecture Search \|date=2018 \|class=cs.LG }}</ref><ref>{{cite ~~arxiv~~arXiv \|eprint=1812.09926 \|last1=Xie \|first1=Sirui \|last2=Zheng \|first2=Hehui \|last3=Liu \|first3=Chunxiao \|last4=Lin \|first4=Liang \|title=SNAS: Stochastic Neural Architecture Search \|date=2018 \|class=cs.LG }}</ref> which enables the use of gradient-based optimization methods. These approaches are generally referred to as differentiable NAS and have proven very efficient at exploring the search space of neural architectures. One of the most popular gradient-based NAS algorithms is DARTS.<ref name="H. Liu, K. 
Simonyan 1806"/> However, DARTS faces problems such as performance collapse due to an inevitable aggregation of skip connections and poor generalization which were tackled by many future algorithms.<ref>{{cite ~~arxiv~~arXiv \|eprint=1911.12126 \|last1=Chu \|first1=Xiangxiang \|last2=Zhou \|first2=Tianbao \|last3=Zhang \|first3=Bo \|last4=Li \|first4=Jixiang \|title=Fair DARTS: Eliminating Unfair Advantages in Differentiable Architecture Search \|date=2019 \|class=cs.LG }}</ref><ref name="Arber Zela 1909">{{cite ~~arxiv~~arXiv \|eprint=1909.09656 \|last1=Zela \|first1=Arber \|last2=Elsken \|first2=Thomas \|last3=Saikia \|first3=Tonmoy \|last4=Marrakchi \|first4=Yassine \|last5=Brox \|first5=Thomas \|last6=Hutter \|first6=Frank \|title=Understanding and Robustifying Differentiable Architecture Search \|date=2019 \|class=cs.LG }}</ref><ref name="Xiangning Chen 2002">{{cite ~~arxiv~~arXiv \|eprint=2002.05283 \|last1=Chen \|first1=Xiangning \|last2=Hsieh \|first2=Cho-Jui \|title=Stabilizing Differentiable Architecture Search via Perturbation-based Regularization \|date=2020 \|class=cs.LG }}</ref><ref>{{cite ~~arxiv~~arXiv \|eprint=1907.05737 \|last1=Xu \|first1=Yuhui \|last2=Xie \|first2=Lingxi \|last3=Zhang \|first3=Xiaopeng \|last4=Chen \|first4=Xin \|last5=Qi \|first5=Guo-Jun \|last6=Tian \|first6=Qi \|last7=Xiong \|first7=Hongkai \|title=PC-DARTS: Partial Channel Connections for Memory-Efficient Architecture Search \|date=2019 \|class=cs.CV }}</ref> Methods like <ref name="Arber Zela 1909"/><ref name="Xiangning Chen 2002"/> aim at robustifying DARTS and making the validation accuracy landscape smoother by introducing a Hessian norm based regularisation and random smoothing/adversarial attack respectively. 
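A minimal sketch of the continuous relaxation behind differentiable NAS follows, assuming scalar inputs and three hypothetical candidate operations on a single edge. The edge computes a softmax-weighted sum of all candidates; after search, it is discretised by keeping the candidate with the largest architecture parameter. The operation names and parameter values are illustrative assumptions, and the gradient-based updates of the parameters are omitted.

```python
import math

# Hypothetical candidate operations on one edge of the supernetwork.
CANDIDATE_OPS = {
    "skip": lambda x: x,
    "double": lambda x: 2.0 * x,
    "zero": lambda x: 0.0,
}


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def mixed_op(x, alphas):
    """Continuous relaxation: weighted sum of every candidate's output."""
    weights = softmax([alphas[name] for name in CANDIDATE_OPS])
    return sum(w * op(x) for w, op in zip(weights, CANDIDATE_OPS.values()))


def discretise(alphas):
    """Keep only the candidate with the largest architecture parameter."""
    return max(alphas, key=alphas.get)


alphas = {"skip": 0.0, "double": 0.0, "zero": 0.0}  # uniform at initialisation
print(mixed_op(3.0, alphas))   # average of 3.0, 6.0 and 0.0
alphas["double"] = 5.0         # pretend the search has favoured "double"
print(discretise(alphas))
```

Because the mixed output is a smooth function of the architecture parameters, they can be updated by gradient descent alongside the network weights, which is what the relaxation is meant to enable.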
The cause of the performance degradation was later analyzed from the perspective of architecture selection.<ref>{{cite ~~arxiv~~arXiv \|eprint=2108.04392 \|last1=Wang \|first1=Ruochen \|last2=Cheng \|first2=Minhao \|last3=Chen \|first3=Xiangning \|last4=Tang \|first4=Xiaocheng \|last5=Hsieh \|first5=Cho-Jui \|title=Rethinking Architecture Selection in Differentiable NAS \|date=2021 \|class=cs.LG }}</ref> Differentiable NAS has been shown to produce competitive results using a fraction of the search time required by RL-based search methods. For example, FBNet (which is short for Facebook Berkeley Network) demonstrated that supernetwork-based search produces networks that outperform the speed-accuracy tradeoff curve of mNASNet and MobileNetV2 on the ImageNet image-classification dataset. FBNet accomplished this using over 400x ''less'' search time than was used for mNASNet.<ref name="FBNet">{{cite arXiv\|eprint=1812.03443\|last1=Wu\|first1=Bichen\|title=FBNet: Hardware-Aware Efficient ConvNet Design via Differentiable Neural Architecture Search\|last2=Dai\|first2=Xiaoliang\|last3=Zhang\|first3=Peizhao\|last4=Wang\|first4=Yanghan\|last5=Sun\|first5=Fei\|last6=Wu\|first6=Yiming\|last7=Tian\|first7=Yuandong\|last8=Vajda\|first8=Peter\|last9=Jia\|first9=Yangqing\|last10=Keutzer\|first10=Kurt\|class=cs.CV\|date=24 May 2019}}</ref><ref name="MobileNetV2">{{cite arXiv\|eprint=1801.04381\|last1=Sandler\|first1=Mark\|title=MobileNetV2: Inverted Residuals and Linear Bottlenecks\|last2=Howard\|first2=Andrew\|last3=Zhu\|first3=Menglong\|last4=Zhmoginov\|first4=Andrey\|last5=Chen\|first5=Liang-Chieh\|class=cs.CV\|year=2018}}</ref><ref>{{Cite web\|url=~~http~~https://~~sites~~site.ieee.org/scv-cas/files/2019/05/2019-05-22-ieee-co-design-trim.pdf\|title=Co-Design of DNNs and NN Accelerators\|last=Keutzer\|first=Kurt\|date=2019-05-22\|website=IEEE~~\|url-status=\|archive-url=\|archive-date=~~\|access-date=2019-09-26}}</ref> Further, SqueezeNAS demonstrated that supernetwork-based NAS produces neural 
networks that outperform the speed-accuracy tradeoff curve of MobileNetV3 on the Cityscapes semantic segmentation dataset, and SqueezeNAS uses over 100x less search time than was used in the MobileNetV3 authors' RL-based search.<ref name="SqueezeNAS">{{cite arXiv\|eprint=1908.01748\|last1=Shaw\|first1=Albert\|title=SqueezeNAS: Fast neural architecture search for faster semantic segmentation\|last2=Hunter\|first2=Daniel\|last3=Iandola\|first3=Forrest\|last4=Sidhu\|first4=Sammy\|class=cs.CV\|year=2019}}</ref><ref>{{Cite news\|url=https://www.eetimes.com/document.asp?doc_id=1335063\|title=Does Your AI Chip Have Its Own DNN?\|last=Yoshida\|first=Junko\|date=2019-08-25\|work=EE Times\|access-date=2019-09-26}}</ref> == Neural architecture search benchmarks == Neural architecture search often requires large computational resources, due to its expensive training and evaluation phases. This further leads to a large carbon footprint required for the evaluation of these methods. To overcome this limitation, NAS benchmarks<ref>{{cite ~~arxiv~~arXiv \|eprint=1902.09635 \|last1=Ying \|first1=Chris \|last2=Klein \|first2=Aaron \|last3=Real \|first3=Esteban \|last4=Christiansen \|first4=Eric \|last5=Murphy \|first5=Kevin \|last6=Hutter \|first6=Frank \|title=NAS-Bench-101: Towards Reproducible Neural Architecture Search \|date=2019 \|class=cs.LG }}</ref><ref>{{cite ~~arxiv~~arXiv \|eprint=2001.10422 \|last1=Zela \|first1=Arber \|last2=Siems \|first2=Julien \|last3=Hutter \|first3=Frank \|title=NAS-Bench-1Shot1: Benchmarking and Dissecting One-shot Neural Architecture Search \|date=2020 \|class=cs.LG }}</ref><ref>{{cite ~~arxiv~~arXiv \|eprint=2001.00326 \|last1=Dong \|first1=Xuanyi \|last2=Yang \|first2=Yi \|title=NAS-Bench-201: Extending the Scope of Reproducible Neural Architecture Search \|date=2020 \|class=cs.CV }}</ref><ref>{{cite ~~arxiv~~arXiv \|eprint=2008.09777 \|last1=Zela \|first1=Arber \|last2=Siems \|first2=Julien \|last3=Zimmer \|first3=Lucas \|last4=Lukasik 
\|first4=Jovita \|last5=Keuper \|first5=Margret \|last6=Hutter \|first6=Frank \|title=Surrogate NAS Benchmarks: Going Beyond the Limited Search Spaces of Tabular NAS Benchmarks \|date=2020 \|class=cs.LG }}</ref> have been introduced, from which one can either query or predict the final performance of neural architectures in seconds. A NAS benchmark is defined as a dataset with a fixed train-test split, a search space, and a fixed training pipeline (hyperparameters). There are primarily two types of NAS benchmarks: surrogate and tabular. A surrogate benchmark uses a surrogate model (e.g., a neural network) to predict the performance of an architecture from the search space. A tabular benchmark, by contrast, returns the actual performance of an architecture trained to convergence. Both types are queryable, so many NAS algorithms can be simulated efficiently using only a CPU to query the benchmark instead of training architectures from scratch. 
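To illustrate how a tabular benchmark lets a NAS algorithm be simulated on a CPU, the sketch below runs random search against a lookup table. The table contents, the architecture encoding and the accuracy formula are all invented for the example; real benchmarks such as NAS-Bench-101 ship their own data files and query interfaces, which are not reproduced here.

```python
import itertools
import random

OPS = ["conv3x3", "conv1x1", "maxpool"]  # hypothetical operation set

# Hypothetical tabular benchmark: every 3-layer architecture mapped to a
# precomputed "final accuracy". The numbers are synthetic, not measured.
TABLE = {
    arch: round(0.80 + 0.05 * arch.count("conv3x3") - 0.02 * arch.count("maxpool"), 2)
    for arch in itertools.product(OPS, repeat=3)
}


def query(arch):
    """Look up the stored accuracy instead of training the architecture."""
    return TABLE[arch]


def random_search(num_queries, seed=0):
    """Simulate random search using only table lookups."""
    rng = random.Random(seed)
    best_arch, best_acc = None, float("-inf")
    for _ in range(num_queries):
        arch = tuple(rng.choice(OPS) for _ in range(3))
        acc = query(arch)
        if acc > best_acc:
            best_arch, best_acc = arch, acc
    return best_arch, best_acc


print(len(TABLE))  # 27 architectures (3 operations, 3 layers)
print(random_search(num_queries=20))
```

A surrogate benchmark would expose the same kind of `query` function but compute the returned value with a fitted model rather than a stored table.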
==See also==
* [[Neural Network Intelligence]]
* [[automated machine learning\|Automated Machine Learning]]
* [[hyperparameter optimization\|Hyperparameter Optimization]]

== Further reading ==
Line 52 ⟶ 54:
* {{cite arXiv \|eprint=1905.01392 \|class=cs.LG \|first1=Martin \|last1=Wistuba \|first2=Ambrish \|last2=Rawat \|title=A Survey on Neural Architecture Search \|date=2019-05-04 \|last3=Pedapati \|first3=Tejaswini}}
* {{Cite journal \|last1=Elsken \|first1=Thomas \|last2=Metzen \|first2=Jan Hendrik \|last3=Hutter \|first3=Frank \|date=August 8, 2019 \|title=Neural Architecture Search: A Survey \|url=http://jmlr.org/papers/v20/18-598.html \|journal=Journal of Machine Learning Research \|volume=20 \|issue=55 \|pages=1–21 \|arxiv=1808.05377}}
* {{cite journal \|last1=Liu \|first1=Yuqiao \|last2=Sun \|first2=Yanan \|last3=Xue \|first3=Bing \|last4=Zhang \|first4=Mengjie \|last5=Yen \|first5=Gary G \|last6=Tan \|first6=Kay Chen \|year=2021 \|title=A Survey on Evolutionary Neural Architecture Search \|journal=IEEE Transactions on Neural Networks and Learning Systems \|volume=PP 34\|issue=2 \|pages=1–21 \|arxiv=2008.10937 \|doi=10.1109/TNNLS.2021.3100554 \|pmid=34357870 \|s2cid=221293236}}
* {{cite ~~arxiv~~arXiv \|last1=White \|first1=Colin \|title=Neural Architecture Search: Insights from 1000 Papers \|date=2023-01-25 \|~~arxiv~~eprint=2301.08727 \|last2=Safari \|first2=Mahmoud \|last3=Sukthanker \|first3=Rhea \|last4=Ru \|first4=Binxin \|last5=Elsken \|first5=Thomas \|last6=Zela \|first6=Arber \|last7=Dey \|first7=Debadeepta \|last8=Hutter \|first8=Frank\|class=cs.LG }}

==References==
Line 59 ⟶ 61:
{{Differentiable computing}}
[[Category:Artificial intelligence engineering]]
[[Category:Artificial intelligence]]