Region Based Convolutional Neural Networks

{{Short description|Machine learning model family}}
[[File:R-cnn.svg|thumb|272x272px|R-CNN architecture]]
'''Region-based Convolutional Neural Networks (R-CNN)''' are a family of machine learning models for [[computer vision]], and specifically [[object detection]] and localization.<ref name=":0">{{Cite book |last1=Zhang |first1=Aston |title=Dive into deep learning |last2=Lipton |first2=Zachary |last3=Li |first3=Mu |last4=Smola |first4=Alexander J. |date=2024 |publisher=Cambridge University Press |isbn=978-1-009-38943-3 |___location=Cambridge New York Port Melbourne New Delhi Singapore |chapter=14.8. Region-based CNNs (R-CNNs) |chapter-url=https://d2l.ai/chapter_computer-vision/rcnn.html}}</ref> The original goal of R-CNN was to take an input image and produce a set of [[Minimum bounding box|bounding boxes]] as output, where each bounding box contains an object together with its category (e.g. car or pedestrian). In general, R-CNN architectures perform selective search<ref name=":1">{{Cite journal |last1=Uijlings |first1=J. R. R. |last2=van de Sande |first2=K. E. A. |last3=Gevers |first3=T. |last4=Smeulders |first4=A. W. M. |date=2013-09-01 |title=Selective Search for Object Recognition |url=https://link.springer.com/article/10.1007/s11263-013-0620-5 |journal=International Journal of Computer Vision |language=en |volume=104 |issue=2 |pages=154–171 |doi=10.1007/s11263-013-0620-5 |issn=1573-1405 |url-access=subscription}}</ref> over feature maps produced by a CNN.
 
R-CNN has been extended to perform other computer vision tasks, such as: tracking objects from a drone-mounted camera,<ref>{{Cite news |last=Nene |first=Vidi |date=Aug 2, 2019 |title=Deep Learning-Based Real-Time Multiple-Object Detection and Tracking via Drone |url=https://dronebelow.com/2019/08/02/deep-learning-based-real-time-multiple-object-detection-and-tracking-via-drone/ |access-date=Mar 28, 2020 |work=Drone Below}}</ref> locating text in an image,<ref>{{Cite news |last=Ray |first=Tiernan |date=Sep 11, 2018 |title=Facebook pumps up character recognition to mine memes |url=https://www.zdnet.com/article/facebook-pumps-up-character-recognition-to-mine-memes/ |access-date=Mar 28, 2020 |publisher=[[ZDNET]]}}</ref> and enabling object detection in [[Google Lens]].<ref>{{Cite news |last=Sagar |first=Ram |date=Sep 9, 2019 |title=These machine learning methods make google lens a success |url=https://analyticsindiamag.com/these-machine-learning-techniques-make-google-lens-a-success/ |access-date=Mar 28, 2020 |work=Analytics India}}</ref>
 
Mask R-CNN is also one of seven tasks in the MLPerf Training Benchmark, which is a competition to speed up the training of neural networks.<ref>{{cite arXiv |eprint=1910.01500v3 |class=math.LG |first=Peter |last=Mattson |title=MLPerf Training Benchmark |date=2019 |display-authors=etal}}</ref>
 
* November 2013: '''R-CNN'''.<ref name=":2" />
* April 2015: '''Fast R-CNN'''.<ref name=":3">{{Cite book |last=Girshick |first=Ross |date=7–13 December 2015 |chapter=Fast R-CNN |title=2015 IEEE International Conference on Computer Vision (ICCV) |publisher=IEEE |pages=1440–1448 |doi=10.1109/ICCV.2015.169 |isbn=978-1-4673-8391-2}}</ref>
* June 2015: '''Faster R-CNN'''.<ref name=":4">{{Cite journal |last1=Ren |first1=Shaoqing |last2=He |first2=Kaiming |last3=Girshick |first3=Ross |last4=Sun |first4=Jian |date=2017-06-01 |title=Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks |url=http://ieeexplore.ieee.org/document/7485869/ |journal=IEEE Transactions on Pattern Analysis and Machine Intelligence |volume=39 |issue=6 |pages=1137–1149 |doi=10.1109/TPAMI.2016.2577031 |pmid=27295650 |issn=0162-8828 |arxiv=1506.01497 |bibcode=2017ITPAM..39.1137R}}</ref>
* March 2017: '''Mask R-CNN'''.<ref name=":5">{{Cite book |last1=He |first1=Kaiming |last2=Gkioxari |first2=Georgia |last3=Dollar |first3=Piotr |last4=Girshick |first4=Ross |date=October 2017 |chapter=Mask R-CNN |title=2017 IEEE International Conference on Computer Vision (ICCV) |publisher=IEEE |pages=2980–2988 |doi=10.1109/ICCV.2017.322 |isbn=978-1-5386-1032-9}}</ref>
* December 2017: '''Cascade R-CNN''' is trained with increasing Intersection over Union (IoU, also known as the [[Jaccard index]]) thresholds, making each stage more selective against nearby false positives.<ref>{{cite arXiv |last1=Cai |first1=Zhaowei |last2=Vasconcelos |first2=Nuno |date=2017 |title=Cascade R-CNN: Delving into High Quality Object Detection |eprint=1712.00726 |class=cs.CV}}</ref>
* June 2019: '''Mesh R-CNN''' adds the ability to generate a 3D mesh from a 2D image.<ref>{{Cite journal |last1=Gkioxari |first1=Georgia |last2=Malik |first2=Jitendra |last3=Johnson |first3=Justin |date=2019 |title=Mesh R-CNN |url=https://openaccess.thecvf.com/content_ICCV_2019/html/Gkioxari_Mesh_R-CNN_ICCV_2019_paper.html |pages=9785–9795|arxiv=1906.02739 }}</ref>
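Cascade R-CNN's stage thresholds are expressed in terms of IoU, which for two axis-aligned boxes can be computed as below. This is a minimal illustration; the corner-coordinate box format (x1, y1, x2, y2) is an assumption of this sketch, not a convention mandated by any of the papers.

```python
def iou(box_a, box_b):
    """Intersection over Union (Jaccard index) of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Intersection rectangle (empty if the boxes do not overlap)
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```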
 
== Architecture ==
For review articles, see the following.<ref name=":0" /><ref>{{Cite news |last=Farooq |first=Umer |date=February 15, 2018 |title=From R-CNN to Mask R-CNN |url=https://medium.com/@umerfarooq_26378/from-r-cnn-to-mask-r-cnn-d6367b196cfd |access-date=March 12, 2020 |work=Medium}}</ref><ref>{{Cite news |last=Weng |first=Lilian |date=December 31, 2017 |title=Object Detection for Dummies Part 3: R-CNN Family |url=https://lilianweng.github.io/lil-log/2017/12/31/object-recognition-for-dummies-part-3.html |access-date=March 12, 2020 |work=Lil'Log}}</ref>
 
=== Selective search ===
Given an image (or an image-like feature map), '''selective search''' (also called Hierarchical Grouping) first segments the image by the algorithm in (Felzenszwalb and Huttenlocher, 2004),<ref>{{Cite journal |last1=Felzenszwalb |first1=Pedro F. |last2=Huttenlocher |first2=Daniel P. |date=2004-09-01 |title=Efficient Graph-Based Image Segmentation |url=https://link.springer.com/article/10.1023/B:VISI.0000022288.19776.77 |journal=International Journal of Computer Vision |language=en |volume=59 |issue=2 |pages=167–181 |doi=10.1023/B:VISI.0000022288.19776.77 |issn=1573-1405|url-access=subscription }}</ref> then performs the following:<ref name=":1" />
 
 '''Input:''' (colour) image
 '''Output:''' Set of object ___location hypotheses L
 
 Segment image into initial regions R = {r<sub>1</sub>, ..., r<sub>n</sub>} using Felzenszwalb and Huttenlocher (2004)
 Initialise similarity set S = ∅
 '''foreach''' neighbouring region pair (r<sub>i</sub>, r<sub>j</sub>) '''do'''
     Calculate similarity s(r<sub>i</sub>, r<sub>j</sub>)
     S = S ∪ s(r<sub>i</sub>, r<sub>j</sub>)
 '''while''' S ≠ ∅ '''do'''
     Get highest similarity s(r<sub>i</sub>, r<sub>j</sub>) = max(S)
     Merge corresponding regions r<sub>t</sub> = r<sub>i</sub> ∪ r<sub>j</sub>
     Remove similarities regarding r<sub>i</sub>: S = S \ s(r<sub>i</sub>, r<sub>∗</sub>)
     Remove similarities regarding r<sub>j</sub>: S = S \ s(r<sub>∗</sub>, r<sub>j</sub>)
     Calculate similarity set S<sub>t</sub> between r<sub>t</sub> and its neighbours
     S = S ∪ S<sub>t</sub>
     R = R ∪ r<sub>t</sub>
 Extract object ___location boxes L from all regions in R
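The grouping loop above can be sketched in Python as follows. This is an illustrative simplification: the region contents (here, sets of pixel indices), the adjacency structure, and the similarity measure are all supplied by the caller, and the initial Felzenszwalb–Huttenlocher segmentation is assumed to have been computed already.

```python
def hierarchical_grouping(regions, adjacency, similarity):
    """Greedy hierarchical grouping from selective search (simplified sketch).

    regions:    dict region_id -> frozenset of pixel indices (initial segments)
    adjacency:  dict region_id -> set of neighbouring region ids
    similarity: callable (region_a, region_b) -> float
    Returns every region ever formed (R in the pseudocode); object ___location
    boxes L would be extracted from these regions.
    """
    regions = dict(regions)
    adjacency = {i: set(n) for i, n in adjacency.items()}
    # S: similarity for each neighbouring pair, keyed by the unordered pair
    S = {frozenset((i, j)): similarity(regions[i], regions[j])
         for i in adjacency for j in adjacency[i] if i < j}
    hierarchy = dict(regions)
    next_id = max(regions) + 1
    while S:
        pair = max(S, key=S.get)          # highest-similarity pair
        i, j = sorted(pair)
        t = next_id
        next_id += 1
        merged = regions[i] | regions[j]  # r_t = r_i ∪ r_j
        regions[t] = merged
        hierarchy[t] = merged
        # Remove all similarities involving r_i or r_j
        S = {p: s for p, s in S.items() if not (p & pair)}
        # The new region inherits the neighbours of the merged pair
        adjacency[t] = (adjacency[i] | adjacency[j]) - pair
        for n in adjacency[t]:
            adjacency[n] = (adjacency[n] - pair) | {t}
            S[frozenset((t, n))] = similarity(regions[t], regions[n])
        del regions[i], regions[j], adjacency[i], adjacency[j]
    return hierarchy
```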
 
=== R-CNN ===
Given an input image, R-CNN begins by applying a mechanism called selective search<ref name=":1" /> to extract [[Region of interest|regions of interest]] (ROI), where each ROI is a rectangle that may represent the boundary of an object in the image. Depending on the scenario, there may be as many as {{nobr|two thousand}} ROIs. After that, each ROI is fed through a neural network to produce output features. For each ROI's output features, an ensemble of [[support-vector machine]] classifiers is used to determine what type of object (if any) is contained within the ROI.<ref name=":2">{{Cite journal |last1=Girshick |first1=Ross |last2=Donahue |first2=Jeff |last3=Darrell |first3=Trevor |last4=Malik |first4=Jitendra |date=2016-01-01 |title=Region-Based Convolutional Networks for Accurate Object Detection and Segmentation |url=http://ieeexplore.ieee.org/document/7112511/ |journal=IEEE Transactions on Pattern Analysis and Machine Intelligence |volume=38 |issue=1 |pages=142–158 |doi=10.1109/TPAMI.2015.2437384 |pmid=26656583 |bibcode=2016ITPAM..38..142G |issn=0162-8828}}</ref>
{{-}}
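The per-proposal pipeline can be sketched roughly as below. The function names, the feature extractor, and the one-score-per-class SVM scoring rule are illustrative assumptions of this sketch, not the reference implementation (which also warps each crop to a fixed input size before the CNN).

```python
import numpy as np

def rcnn_detect(image, proposals, cnn_features, svm_weights, svm_biases):
    """Score each selective-search proposal independently (R-CNN sketch).

    image:        (H, W) array
    proposals:    list of (x1, y1, x2, y2) boxes from selective search
    cnn_features: callable mapping an image crop -> 1-D feature vector
    svm_weights:  (num_classes, feature_dim) array, one binary SVM per class
    svm_biases:   (num_classes,) array
    """
    detections = []
    for (x1, y1, x2, y2) in proposals:
        crop = image[y1:y2, x1:x2]            # in practice, warped to a fixed size
        f = cnn_features(crop)
        scores = svm_weights @ f + svm_biases  # one SVM score per object class
        c = int(np.argmax(scores))
        if scores[c] > 0:                      # positive margin => detection
            detections.append(((x1, y1, x2, y2), c, float(scores[c])))
    return detections
```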
 
=== Fast R-CNN ===
[[File:Fast-rcnn.svg|thumb|Fast R-CNN]]While the original R-CNN independently computed the neural network features on each of as many as two thousand regions of interest, Fast R-CNN runs the neural network once on the whole image.<ref name=":3" />
[[File:RoI_pooling_animated.gif|thumb|268x268px|RoI pooling to size 2×2. In this example the region proposal (an input parameter) has size 7×5.]]
 
At the end of the network is a '''ROIPooling''' module, which slices each ROI out of the network's output tensor and pools it to a fixed size, so that the final layers can classify it. As in the original R-CNN, Fast R-CNN uses selective search to generate its region proposals.
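A single-channel RoI max-pooling step can be sketched as follows; the exact way the region is split into grid cells varies between implementations, so this is an illustrative version.

```python
import numpy as np

def roi_pool(feature_map, roi, output_size=(2, 2)):
    """Max-pool one region of interest to a fixed spatial size.

    feature_map: (H, W) array (a single channel, for clarity)
    roi:         (x1, y1, x2, y2) in integer feature-map coordinates
    """
    x1, y1, x2, y2 = roi
    region = feature_map[y1:y2, x1:x2]
    h, w = region.shape
    oh, ow = output_size
    out = np.empty((oh, ow), dtype=feature_map.dtype)
    # Split the region into an oh x ow grid of (roughly equal) cells
    ys = np.linspace(0, h, oh + 1).round().astype(int)
    xs = np.linspace(0, w, ow + 1).round().astype(int)
    for i in range(oh):
        for j in range(ow):
            out[i, j] = region[ys[i]:ys[i + 1], xs[j]:xs[j + 1]].max()
    return out
```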
{{-}}
 
=== Faster R-CNN ===
[[File:Faster-rcnn.svg|thumb|Faster R-CNN]]While Fast R-CNN used selective search to generate ROIs, Faster R-CNN integrates the ROI generation into the neural network itself.<ref name=":4" />
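Faster R-CNN's region proposal network scores a fixed set of anchor boxes centred on every feature-map position. Anchor enumeration can be sketched as below, using the scale and aspect-ratio defaults reported in the paper; the (x1, y1, x2, y2) output format and the cell-centring convention are assumptions of this sketch.

```python
import numpy as np

def generate_anchors(feat_h, feat_w, stride=16,
                     scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Enumerate anchor boxes over a feat_h x feat_w feature map.

    Each feature-map cell gets len(scales) * len(ratios) anchors whose
    centres lie at the cell centre in input-image coordinates.
    """
    anchors = []
    for y in range(feat_h):
        for x in range(feat_w):
            cx, cy = (x + 0.5) * stride, (y + 0.5) * stride
            for s in scales:
                for r in ratios:          # r = height / width; area stays s**2
                    w, h = s / np.sqrt(r), s * np.sqrt(r)
                    anchors.append((cx - w / 2, cy - h / 2,
                                    cx + w / 2, cy + h / 2))
    return np.array(anchors)
```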
{{-}}
 
=== Mask R-CNN ===
[[File:Mask-rcnn.svg|thumb|Mask R-CNN]]While previous versions of R-CNN focused on object detection, Mask R-CNN adds instance segmentation. Mask R-CNN also replaced ROIPooling with a new method called ROIAlign, which can represent fractions of a pixel.<ref name=":5" />
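The difference from ROIPooling can be illustrated with a simplified ROIAlign: each output cell is filled by bilinear interpolation at a fractional coordinate, instead of max-pooling over cells quantised to whole pixels. Sampling a single point per cell here is a simplification of this sketch; implementations typically average several sample points per cell.

```python
import numpy as np

def bilinear_sample(feature_map, y, x):
    """Bilinearly interpolate a (H, W) feature map at fractional (y, x)."""
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1 = min(y0 + 1, feature_map.shape[0] - 1)
    x1 = min(x0 + 1, feature_map.shape[1] - 1)
    dy, dx = y - y0, x - x0
    return ((1 - dy) * (1 - dx) * feature_map[y0, x0]
            + (1 - dy) * dx * feature_map[y0, x1]
            + dy * (1 - dx) * feature_map[y1, x0]
            + dy * dx * feature_map[y1, x1])

def roi_align(feature_map, roi, output_size=(2, 2)):
    """Simplified ROIAlign: one bilinear sample at the centre of each cell.

    roi is (x1, y1, x2, y2) with fractional coordinates allowed, which is
    the point of ROIAlign: no quantisation to whole feature-map cells.
    """
    x1, y1, x2, y2 = roi
    oh, ow = output_size
    ch, cw = (y2 - y1) / oh, (x2 - x1) / ow
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = bilinear_sample(feature_map,
                                        y1 + (i + 0.5) * ch,
                                        x1 + (j + 0.5) * cw)
    return out
```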
 
== References ==
<references />
 
== Further reading ==
 
* {{Cite web |last=Parthasarathy |first=Dhruv |date=2017-04-27 |title=A Brief History of CNNs in Image Segmentation: From R-CNN to Mask R-CNN |url=https://blog.athelas.com/a-brief-history-of-cnns-in-image-segmentation-from-r-cnn-to-mask-r-cnn-34ea83205de4 |access-date=2024-09-11 |website=Medium |language=en}}
 
[[Category:Object recognition and categorization]]
[[Category:Deep learning]]