{{Short description|Machine learning model family}}
[[File:R-cnn.svg|thumb|272x272px|R-CNN architecture]]
'''Region-based Convolutional Neural Networks (R-CNN)''' are a family of machine learning models for [[computer vision]], specifically [[object detection]] and localization.<ref name=":0">{{Cite book |last1=Zhang |first1=Aston |title=Dive into deep learning |last2=Lipton |first2=Zachary |last3=Li |first3=Mu |last4=Smola |first4=Alexander J. |date=2024 |publisher=Cambridge University Press |isbn=978-1-009-38943-3 |___location=Cambridge New York Port Melbourne New Delhi Singapore |chapter=14.8. Region-based CNNs (R-CNNs) |chapter-url=https://d2l.ai/chapter_computer-vision/rcnn.html}}</ref> The original goal of R-CNN was to take an input image and produce a set of [[Minimum bounding box|bounding boxes]] as output, where each bounding box contains an object together with the category (e.g. car or pedestrian) of that object. In general, R-CNN architectures perform selective search<ref name=":1">{{Cite journal |last1=Uijlings |first1=J. R. R. |last2=van de Sande |first2=K. E. A. |last3=Gevers |first3=T. |last4=Smeulders |first4=A. W. M. |date=2013-09-01 |title=Selective Search for Object Recognition |url=https://link.springer.com/article/10.1007/s11263-013-0620-5 |journal=International Journal of Computer Vision |volume=104 |issue=2 |pages=154–171 |doi=10.1007/s11263-013-0620-5 |issn=1573-1405|url-access=subscription }}</ref> over feature maps output by a CNN.
 
R-CNN has been extended to perform other computer vision tasks, such as: tracking objects from a drone-mounted camera,<ref>{{Cite news |last=Nene |first=Vidi |date=Aug 2, 2019 |title=Deep Learning-Based Real-Time Multiple-Object Detection and Tracking via Drone |url=https://dronebelow.com/2019/08/02/deep-learning-based-real-time-multiple-object-detection-and-tracking-via-drone/ |access-date=Mar 28, 2020 |work=Drone Below}}</ref> locating text in an image,<ref>{{Cite news |last=Ray |first=Tiernan |date=Sep 11, 2018 |title=Facebook pumps up character recognition to mine memes |url=https://www.zdnet.com/article/facebook-pumps-up-character-recognition-to-mine-memes/ |access-date=Mar 28, 2020 |publisher=[[ZDNET]]}}</ref> and enabling object detection in [[Google Lens]].<ref>{{Cite news |last=Sagar |first=Ram |date=Sep 9, 2019 |title=These machine learning methods make google lens a success |url=https://analyticsindiamag.com/these-machine-learning-techniques-make-google-lens-a-success/ |access-date=Mar 28, 2020 |work=Analytics India}}</ref>
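As a minimal illustration of the detection output described above (names and box format are hypothetical, not part of any R-CNN implementation), each detection pairs a bounding box with a category label, and two boxes are commonly compared by their intersection over union (IoU), the ratio of overlap area to combined area:

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes.

    Boxes are given as (x1, y1, x2, y2) with x1 < x2 and y1 < y2
    (a common convention; actual detectors may differ).
    """
    # Coordinates of the overlap rectangle.
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])

    # Overlap area is zero when the boxes do not intersect.
    inter = max(0, x2 - x1) * max(0, y2 - y1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

An IoU of 1.0 means identical boxes and 0.0 means no overlap; detection pipelines typically use such a score to decide whether two proposals refer to the same object.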
 
=== Selective search ===
Given an image (or an image-like feature map), '''selective search''' (also called Hierarchical Grouping) first segments the image using the efficient graph-based segmentation algorithm of Felzenszwalb and Huttenlocher (2004),<ref>{{Cite journal |last1=Felzenszwalb |first1=Pedro F. |last2=Huttenlocher |first2=Daniel P. |date=2004-09-01 |title=Efficient Graph-Based Image Segmentation |url=https://link.springer.com/article/10.1023/B:VISI.0000022288.19776.77 |journal=International Journal of Computer Vision |language=en |volume=59 |issue=2 |pages=167–181 |doi=10.1023/B:VISI.0000022288.19776.77 |issn=1573-1405|url-access=subscription }}</ref> then performs the following:<ref name=":1" />
<pre>
Input: (colour) image