Neural architecture search: Difference between revisions

Learning a model architecture directly on a large dataset can be a lengthy process. NASNet<ref name="Zoph 2017" /><ref>{{Cite news|url=https://research.googleblog.com/2017/11/automl-for-large-scale-image.html|title=AutoML for large scale image classification and object detection|last1=Zoph|first1=Barret|date=November 2, 2017|work=Research Blog|access-date=2018-02-20|last2=Vasudevan|first2=Vijay|language=en-US|last3=Shlens|first3=Jonathon|last4=Le|first4=Quoc V.}}</ref> addressed this issue by transferring a building block designed for a small dataset to a larger dataset. The design was constrained to use two types of [[Convolutional neural network|convolutional]] cells: ''normal cells'', which return a feature map of the same extent (height and width) as the input, and ''reduction cells'', which halve the feature map's height and width. In the reduction cell, the initial operation applied to the cell's inputs uses a stride of two to reduce the height and width.<ref name="Zoph 2017" /> The learned aspects of the design included which lower layer(s) each higher layer took as input, the transformations applied at that layer, and how to merge multiple outputs at each layer. In the studied example, the best convolutional layer (or "cell") was designed for the CIFAR-10 dataset and then applied to the [[ImageNet]] dataset by stacking copies of this cell, each with its own parameters. The approach yielded 82.7% top-1 and 96.2% top-5 accuracy, exceeding the best human-designed architectures while requiring 9 billion fewer [[FLOPS]], a 28% reduction. The system continued to exceed the manually designed alternatives at varying computation levels. The image features learned from image classification can be transferred to other computer vision problems. For example, in object detection, integrating the learned cells with the Faster-RCNN framework improved performance by 4.0% on the [[COCO (dataset)|COCO]] dataset.<ref name="Zoph 2017" />
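The role of the two cell types can be illustrated with a minimal sketch that only tracks feature-map shapes through a stack of cells; the function names and the channel-doubling convention are illustrative assumptions, not the NASNet implementation:

```python
# Illustrative sketch (not the NASNet code): tracking the extent of a
# feature map through a stack of "normal" and "reduction" cells.

def normal_cell(shape):
    """Returns a feature map of the same height and width as its input."""
    h, w, c = shape
    return (h, w, c)

def reduction_cell(shape):
    """A stride-2 initial operation halves the height and width;
    doubling the channel count here is a common convention, assumed
    for illustration."""
    h, w, c = shape
    return (h // 2, w // 2, c * 2)

def stack(shape, cells):
    """Applies a sequence of cells, as when stacking copies of a
    learned cell to scale up from CIFAR-10 to ImageNet."""
    for cell in cells:
        shape = cell(shape)
    return shape

# A CIFAR-10-like input: 32x32 RGB image.
out = stack((32, 32, 3),
            [normal_cell, normal_cell, reduction_cell,
             normal_cell, reduction_cell])
print(out)  # (8, 8, 12)
```

Each reduction cell halves the spatial extent, so the depth of the stack, not the cell design itself, determines the final resolution.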
 
In the so-called Efficient Neural Architecture Search (ENAS), a controller discovers architectures by learning to search for an optimal subgraph within a large graph. The controller is trained with [[Reinforcement learning|policy gradient]] to select a subgraph that maximizes the expected reward on a validation set. The model corresponding to the subgraph is trained to minimize a canonical [[cross entropy]] loss. Because multiple child models share parameters, ENAS requires fewer GPU-hours than other approaches and 1000-fold fewer than "standard" NAS. On CIFAR-10, the ENAS design achieved a test error of 2.89%, comparable to NASNet. On Penn Treebank, the ENAS design reached test perplexity of 55.8.<ref>{{cite arXiv |last1=Pham |first1=Hieu |last2=Guan |first2=Melody Y. |last3=Zoph |first3=Barret |last4=Le |first4=Quoc V. |last5=Dean |first5=Jeff |date=2018-02-09 |title=Efficient Neural Architecture Search via Parameter Sharing |eprint=1802.03268 |class=cs.LG}}</ref>
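The policy-gradient training of such a controller can be sketched in miniature. The example below is a toy REINFORCE update over a single architectural decision with hypothetical, hard-coded validation rewards; the real ENAS controller is a recurrent network making many decisions, so this shows only the shape of the update, not the published method:

```python
import math
import random

# Toy sketch of a policy-gradient (REINFORCE) controller choosing one
# operation for a subgraph. The operation names and reward values are
# invented for illustration.
random.seed(0)

OPS = ["conv3x3", "conv5x5", "maxpool"]
logits = [0.0, 0.0, 0.0]
# Hypothetical validation reward obtained if the sampled subgraph
# uses the given operation.
REWARD = {"conv3x3": 0.9, "conv5x5": 0.7, "maxpool": 0.5}

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def sample(probs):
    r, acc = random.random(), 0.0
    for i, p in enumerate(probs):
        acc += p
        if r <= acc:
            return i
    return len(probs) - 1

lr, baseline = 0.5, 0.7  # baseline reduces gradient variance
for _ in range(200):
    probs = softmax(logits)
    i = sample(probs)                # sample a subgraph (here: one op)
    reward = REWARD[OPS[i]]          # stand-in for validation accuracy
    # REINFORCE: grad of log pi(i) w.r.t. logits is one_hot(i) - probs.
    for j in range(len(logits)):
        grad = (1.0 if j == i else 0.0) - probs[j]
        logits[j] += lr * (reward - baseline) * grad

best = OPS[max(range(len(OPS)), key=lambda j: logits[j])]
print(best)
```

Choices whose reward exceeds the baseline are reinforced, so the controller's distribution shifts toward the higher-reward operation over the course of training.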
 
== Evolution ==