Region Based Convolutional Neural Networks

{{Short description|Machine learning model family}}
[[File:R-cnn.svg|thumb|272x272px|R-CNN architecture]]
'''Region-based Convolutional Neural Networks (R-CNN)''' are a family of machine learning models for [[computer vision]], and specifically [[object detection]] and localization.<ref name=":0">{{Cite book |last1=Zhang |first1=Aston |title=Dive into deep learning |last2=Lipton |first2=Zachary |last3=Li |first3=Mu |last4=Smola |first4=Alexander J. |date=2024 |publisher=Cambridge University Press |isbn=978-1-009-38943-3 |___location=Cambridge New York Port Melbourne New Delhi Singapore |chapter=14.8. Region-based CNNs (R-CNNs) |chapter-url=https://d2l.ai/chapter_computer-vision/rcnn.html}}</ref> The original goal of R-CNN was to take an input image and produce a set of [[Minimum bounding box|bounding boxes]] as output, where each bounding box contains an object together with its category (e.g. car or pedestrian). In general, R-CNN architectures perform selective search<ref name=":1">{{Cite journal |last1=Uijlings |first1=J. R. R. |last2=van de Sande |first2=K. E. A. |last3=Gevers |first3=T. |last4=Smeulders |first4=A. W. M. |date=2013-09-01 |title=Selective Search for Object Recognition |url=https://link.springer.com/article/10.1007/s11263-013-0620-5 |journal=International Journal of Computer Vision |language=en |volume=104 |issue=2 |pages=154–171 |doi=10.1007/s11263-013-0620-5 |issn=1573-1405 |url-access=subscription}}</ref> over feature maps produced by a CNN.
 
R-CNN has been extended to perform other computer vision tasks, such as: tracking objects from a drone-mounted camera,<ref>{{Cite news |last=Nene |first=Vidi |date=Aug 2, 2019 |title=Deep Learning-Based Real-Time Multiple-Object Detection and Tracking via Drone |url=https://dronebelow.com/2019/08/02/deep-learning-based-real-time-multiple-object-detection-and-tracking-via-drone/ |access-date=Mar 28, 2020 |work=Drone Below}}</ref> locating text in an image,<ref>{{Cite news |last=Ray |first=Tiernan |date=Sep 11, 2018 |title=Facebook pumps up character recognition to mine memes |url=https://www.zdnet.com/article/facebook-pumps-up-character-recognition-to-mine-memes/ |access-date=Mar 28, 2020 |publisher=[[ZDNET]]}}</ref> and enabling object detection in [[Google Lens]].<ref>{{Cite news |last=Sagar |first=Ram |date=Sep 9, 2019 |title=These machine learning methods make google lens a success |url=https://analyticsindiamag.com/these-machine-learning-techniques-make-google-lens-a-success/ |access-date=Mar 28, 2020 |work=Analytics India}}</ref>
 
Mask R-CNN is also one of seven tasks in the MLPerf Training Benchmark, which is a competition to speed up the training of neural networks.<ref>{{cite arXiv |eprint=1910.01500v3 |class=math.LG |first=Peter |last=Mattson |title=MLPerf Training Benchmark |date=2019 |display-authors=etal}}</ref>
 
* November 2013: '''R-CNN'''.<ref name=":2" />
* April 2015: '''Fast R-CNN'''.<ref name=":3">{{Cite book |last=Girshick |first=Ross |date=7–13 December 2015 |chapter=Fast R-CNN |title=2015 IEEE International Conference on Computer Vision (ICCV) |publisher=IEEE |pages=1440–1448 |doi=10.1109/ICCV.2015.169 |isbn=978-1-4673-8391-2}}</ref>
* June 2015: '''Faster R-CNN'''.<ref name=":4">{{Cite journal |last1=Ren |first1=Shaoqing |last2=He |first2=Kaiming |last3=Girshick |first3=Ross |last4=Sun |first4=Jian |date=2017-06-01 |title=Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks |url=http://ieeexplore.ieee.org/document/7485869/ |journal=IEEE Transactions on Pattern Analysis and Machine Intelligence |volume=39 |issue=6 |pages=1137–1149 |doi=10.1109/TPAMI.2016.2577031 |pmid=27295650 |issn=0162-8828 |arxiv=1506.01497 |bibcode=2017ITPAM..39.1137R}}</ref>
* March 2017: '''Mask R-CNN'''.<ref name=":5">{{Cite book |last1=He |first1=Kaiming |last2=Gkioxari |first2=Georgia |last3=Dollar |first3=Piotr |last4=Girshick |first4=Ross |date=October 2017 |chapter=Mask R-CNN |title=2017 IEEE International Conference on Computer Vision (ICCV) |publisher=IEEE |pages=2980–2988 |doi=10.1109/ICCV.2017.322 |isbn=978-1-5386-1032-9}}</ref>
* December 2017: '''Cascade R-CNN''' is trained with increasing Intersection over Union (IoU, also known as the [[Jaccard index]]) thresholds, making each stage more selective against nearby false positives.<ref>{{cite arXiv |last1=Cai |first1=Zhaowei |last2=Vasconcelos |first2=Nuno |date=2017 |title=Cascade R-CNN: Delving into High Quality Object Detection |eprint=1712.00726 |class=cs.CV}}</ref>
* June 2019: '''Mesh R-CNN''' adds the ability to generate a 3D mesh from a 2D image.<ref>{{Cite journal |last1=Gkioxari |first1=Georgia |last2=Malik |first2=Jitendra |last3=Johnson |first3=Justin |date=2019 |title=Mesh R-CNN |url=https://openaccess.thecvf.com/content_ICCV_2019/html/Gkioxari_Mesh_R-CNN_ICCV_2019_paper.html |pages=9785–9795|arxiv=1906.02739 }}</ref>
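Cascade R-CNN's stage thresholds are expressed in terms of IoU, which for two axis-aligned boxes can be computed as below. This is a minimal illustration; the corner-coordinate box format (x1, y1, x2, y2) is an assumption of this sketch, not a convention mandated by any of the papers.

```python
def iou(box_a, box_b):
    """Intersection over Union (Jaccard index) of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Intersection rectangle (empty if the boxes do not overlap)
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```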
 
== Architecture ==
For review articles, see the following.<ref name=":0" /><ref>{{Cite news |last=Farooq |first=Umer |date=February 15, 2018 |title=From R-CNN to Mask R-CNN |url=https://medium.com/@umerfarooq_26378/from-r-cnn-to-mask-r-cnn-d6367b196cfd |access-date=March 12, 2020 |work=Medium}}</ref><ref>{{Cite news |last=Weng |first=Lilian |date=December 31, 2017 |title=Object Detection for Dummies Part 3: R-CNN Family |url=https://lilianweng.github.io/lil-log/2017/12/31/object-recognition-for-dummies-part-3.html |access-date=March 12, 2020 |work=Lil'Log}}</ref>
 
=== Selective search ===
Given an image (or an image-like feature map), '''selective search''' (also called Hierarchical Grouping) first segments the image by the algorithm in (Felzenszwalb and Huttenlocher, 2004),<ref>{{Cite journal |last1=Felzenszwalb |first1=Pedro F. |last2=Huttenlocher |first2=Daniel P. |date=2004-09-01 |title=Efficient Graph-Based Image Segmentation |url=https://link.springer.com/article/10.1023/B:VISI.0000022288.19776.77 |journal=International Journal of Computer Vision |language=en |volume=59 |issue=2 |pages=167–181 |doi=10.1023/B:VISI.0000022288.19776.77 |issn=1573-1405|url-access=subscription }}</ref> then performs the following:<ref name=":1" />
 
 '''Input:''' (colour) image
 '''Output:''' Set of object ___location hypotheses L
 
 Segment image into initial regions R = {r<sub>1</sub>, ..., r<sub>n</sub>} using Felzenszwalb and Huttenlocher (2004)
 Initialise similarity set S = ∅
 '''foreach''' neighbouring region pair (r<sub>i</sub>, r<sub>j</sub>) '''do'''
     Calculate similarity s(r<sub>i</sub>, r<sub>j</sub>)
     S = S ∪ s(r<sub>i</sub>, r<sub>j</sub>)
 '''while''' S ≠ ∅ '''do'''
     Get highest similarity s(r<sub>i</sub>, r<sub>j</sub>) = max(S)
     Merge corresponding regions r<sub>t</sub> = r<sub>i</sub> ∪ r<sub>j</sub>
     Remove similarities regarding r<sub>i</sub>: S = S \ s(r<sub>i</sub>, r<sub>∗</sub>)
     Remove similarities regarding r<sub>j</sub>: S = S \ s(r<sub>∗</sub>, r<sub>j</sub>)
     Calculate similarity set S<sub>t</sub> between r<sub>t</sub> and its neighbours
     S = S ∪ S<sub>t</sub>
     R = R ∪ r<sub>t</sub>
 Extract object ___location boxes L from all regions in R
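The grouping loop above can be sketched in Python as follows. This is an illustrative simplification: the region contents (here, sets of pixel indices), the adjacency structure, and the similarity measure are all supplied by the caller, and the initial Felzenszwalb–Huttenlocher segmentation is assumed to have been computed already.

```python
def hierarchical_grouping(regions, adjacency, similarity):
    """Greedy hierarchical grouping from selective search (simplified sketch).

    regions:    dict region_id -> frozenset of pixel indices (initial segments)
    adjacency:  dict region_id -> set of neighbouring region ids
    similarity: callable (region_a, region_b) -> float
    Returns every region ever formed (R in the pseudocode); object ___location
    boxes L would be extracted from these regions.
    """
    regions = dict(regions)
    adjacency = {i: set(n) for i, n in adjacency.items()}
    # S: similarity for each neighbouring pair, keyed by the unordered pair
    S = {frozenset((i, j)): similarity(regions[i], regions[j])
         for i in adjacency for j in adjacency[i] if i < j}
    hierarchy = dict(regions)
    next_id = max(regions) + 1
    while S:
        pair = max(S, key=S.get)          # highest-similarity pair
        i, j = sorted(pair)
        t = next_id
        next_id += 1
        merged = regions[i] | regions[j]  # r_t = r_i ∪ r_j
        regions[t] = merged
        hierarchy[t] = merged
        # Remove all similarities involving r_i or r_j
        S = {p: s for p, s in S.items() if not (p & pair)}
        # The new region inherits the neighbours of the merged pair
        adjacency[t] = (adjacency[i] | adjacency[j]) - pair
        for n in adjacency[t]:
            adjacency[n] = (adjacency[n] - pair) | {t}
            S[frozenset((t, n))] = similarity(regions[t], regions[n])
        del regions[i], regions[j], adjacency[i], adjacency[j]
    return hierarchy
```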
 
=== R-CNN ===
Given an input image, R-CNN begins by applying a mechanism called selective search<ref name=":1" /> to extract [[Region of interest|regions of interest]] (ROI), where each ROI is a rectangle that may represent the boundary of an object in the image. Depending on the scenario, there may be as many as {{nobr|two thousand}} ROIs. After that, each ROI is fed through a neural network to produce output features. For each ROI's output features, an ensemble of [[support-vector machine]] classifiers is used to determine what type of object (if any) is contained within the ROI.<ref name=":2">{{Cite journal |last1=Girshick |first1=Ross |last2=Donahue |first2=Jeff |last3=Darrell |first3=Trevor |last4=Malik |first4=Jitendra |date=2016-01-01 |title=Region-Based Convolutional Networks for Accurate Object Detection and Segmentation |url=http://ieeexplore.ieee.org/document/7112511/ |journal=IEEE Transactions on Pattern Analysis and Machine Intelligence |volume=38 |issue=1 |pages=142–158 |doi=10.1109/TPAMI.2015.2437384 |pmid=26656583 |bibcode=2016ITPAM..38..142G |issn=0162-8828}}</ref>
{{-}}
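The per-proposal pipeline can be sketched roughly as below. The function names, the feature extractor, and the one-score-per-class SVM scoring rule are illustrative assumptions of this sketch, not the reference implementation (which also warps each crop to a fixed input size before the CNN).

```python
import numpy as np

def rcnn_detect(image, proposals, cnn_features, svm_weights, svm_biases):
    """Score each selective-search proposal independently (R-CNN sketch).

    image:        (H, W) array
    proposals:    list of (x1, y1, x2, y2) boxes from selective search
    cnn_features: callable mapping an image crop -> 1-D feature vector
    svm_weights:  (num_classes, feature_dim) array, one binary SVM per class
    svm_biases:   (num_classes,) array
    """
    detections = []
    for (x1, y1, x2, y2) in proposals:
        crop = image[y1:y2, x1:x2]            # in practice, warped to a fixed size
        f = cnn_features(crop)
        scores = svm_weights @ f + svm_biases  # one SVM score per object class
        c = int(np.argmax(scores))
        if scores[c] > 0:                      # positive margin => detection
            detections.append(((x1, y1, x2, y2), c, float(scores[c])))
    return detections
```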
 
=== Fast R-CNN ===
[[File:Fast-rcnn.svg|thumb|Fast R-CNN]]While the original R-CNN independently computed the neural network features on each of as many as two thousand regions of interest, Fast R-CNN runs the neural network once on the whole image.<ref name=":3" />
[[File:RoI_pooling_animated.gif|thumb|268x268px|RoI pooling to size 2×2. In this example the region proposal (an input parameter) has size 7×5.]]
 
At the end of the network is a '''ROIPooling''' module, which slices each ROI out of the network's output tensor and pools it to a fixed size, so that the final layers can classify it. As in the original R-CNN, Fast R-CNN uses selective search to generate its region proposals.
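A single-channel RoI max-pooling step can be sketched as follows; the exact way the region is split into grid cells varies between implementations, so this is an illustrative version.

```python
import numpy as np

def roi_pool(feature_map, roi, output_size=(2, 2)):
    """Max-pool one region of interest to a fixed spatial size.

    feature_map: (H, W) array (a single channel, for clarity)
    roi:         (x1, y1, x2, y2) in integer feature-map coordinates
    """
    x1, y1, x2, y2 = roi
    region = feature_map[y1:y2, x1:x2]
    h, w = region.shape
    oh, ow = output_size
    out = np.empty((oh, ow), dtype=feature_map.dtype)
    # Split the region into an oh x ow grid of (roughly equal) cells
    ys = np.linspace(0, h, oh + 1).round().astype(int)
    xs = np.linspace(0, w, ow + 1).round().astype(int)
    for i in range(oh):
        for j in range(ow):
            out[i, j] = region[ys[i]:ys[i + 1], xs[j]:xs[j + 1]].max()
    return out
```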
{{-}}
 
=== Faster R-CNN ===
[[File:Faster-rcnn.svg|thumb|Faster R-CNN]]While Fast R-CNN used selective search to generate ROIs, Faster R-CNN integrates the ROI generation into the neural network itself.<ref name=":4" />
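Faster R-CNN's region proposal network scores a fixed set of anchor boxes centred on every feature-map position. Anchor enumeration can be sketched as below, using the scale and aspect-ratio defaults reported in the paper; the (x1, y1, x2, y2) output format and the cell-centring convention are assumptions of this sketch.

```python
import numpy as np

def generate_anchors(feat_h, feat_w, stride=16,
                     scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Enumerate anchor boxes over a feat_h x feat_w feature map.

    Each feature-map cell gets len(scales) * len(ratios) anchors whose
    centres lie at the cell centre in input-image coordinates.
    """
    anchors = []
    for y in range(feat_h):
        for x in range(feat_w):
            cx, cy = (x + 0.5) * stride, (y + 0.5) * stride
            for s in scales:
                for r in ratios:          # r = height / width; area stays s**2
                    w, h = s / np.sqrt(r), s * np.sqrt(r)
                    anchors.append((cx - w / 2, cy - h / 2,
                                    cx + w / 2, cy + h / 2))
    return np.array(anchors)
```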
{{-}}
 
=== Mask R-CNN ===
[[File:Mask-rcnn.svg|thumb|Mask R-CNN]]While previous versions of R-CNN focused on object detection, Mask R-CNN adds instance segmentation. Mask R-CNN also replaced ROIPooling with a new method called ROIAlign, which can represent fractions of a pixel.<ref name=":5" />
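The difference from ROIPooling can be illustrated with a simplified ROIAlign: each output cell is filled by bilinear interpolation at a fractional coordinate, instead of max-pooling over cells quantised to whole pixels. Sampling a single point per cell here is a simplification of this sketch; implementations typically average several sample points per cell.

```python
import numpy as np

def bilinear_sample(feature_map, y, x):
    """Bilinearly interpolate a (H, W) feature map at fractional (y, x)."""
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1 = min(y0 + 1, feature_map.shape[0] - 1)
    x1 = min(x0 + 1, feature_map.shape[1] - 1)
    dy, dx = y - y0, x - x0
    return ((1 - dy) * (1 - dx) * feature_map[y0, x0]
            + (1 - dy) * dx * feature_map[y0, x1]
            + dy * (1 - dx) * feature_map[y1, x0]
            + dy * dx * feature_map[y1, x1])

def roi_align(feature_map, roi, output_size=(2, 2)):
    """Simplified ROIAlign: one bilinear sample at the centre of each cell.

    roi is (x1, y1, x2, y2) with fractional coordinates allowed, which is
    the point of ROIAlign: no quantisation to whole feature-map cells.
    """
    x1, y1, x2, y2 = roi
    oh, ow = output_size
    ch, cw = (y2 - y1) / oh, (x2 - x1) / ow
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = bilinear_sample(feature_map,
                                        y1 + (i + 0.5) * ch,
                                        x1 + (j + 0.5) * cw)
    return out
```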
 
== References ==
<references />
 
== Further reading ==
 
* {{Cite web |last=Parthasarathy |first=Dhruv |date=2017-04-27 |title=A Brief History of CNNs in Image Segmentation: From R-CNN to Mask R-CNN |url=https://blog.athelas.com/a-brief-history-of-cnns-in-image-segmentation-from-r-cnn-to-mask-r-cnn-34ea83205de4 |access-date=2024-09-11 |website=Medium |language=en}}
 
[[Category:Object recognition and categorization]]
[[Category:Deep learning]]