Region Based Convolutional Neural Networks: Difference between revisions

AnomieBOT (talk | contribs)
m Dating maintenance tags: {{Cn}}
added links to page describing the aforementioned "Selective Search" algorithm.
Line 7:
The original goal of R-CNN was to take an input image and produce a set of bounding boxes as output, where each bounding box encloses an object and is labeled with the object's category (e.g. car or pedestrian). More recently, R-CNN has been extended to perform other computer vision tasks. The following covers some of the versions of R-CNN that have been developed.
 
* November 2013: '''R-CNN'''. Given an input image, R-CNN begins by applying a mechanism called [[Selective Search (Object Recognition)|Selective Search]] to extract [[Region of interest|regions of interest]] (ROI), where each ROI is a rectangle that may represent the boundary of an object in the image. Depending on the scenario, there may be as many as {{nobr|two thousand}} ROIs. After that, each ROI is fed through a neural network to produce output features. For each ROI's output features, a collection of [[support-vector machine]] classifiers is used to determine what type of object (if any) is contained within the ROI.{{cn|date=June 2024}}
* April 2015: '''Fast R-CNN'''. While the original R-CNN independently computed the neural network features on each of as many as two thousand regions of interest, Fast R-CNN runs the neural network once on the whole image. At the end of the network is a novel method called ROIPooling, which slices out each ROI from the network's output tensor, reshapes it to a fixed size, and classifies it. As in the original R-CNN, Fast R-CNN uses [[Selective Search (Object Recognition)|Selective Search]] to generate its region proposals.<ref name=":0">{{Cite news|last=Bhatia|first=Richa|url=https://analyticsindiamag.com/what-is-region-of-interest-pooling/|title=What is region of interest pooling?|date=September 10, 2018|work=Analytics India|access-date=March 12, 2020}}</ref>
* June 2015: '''Faster R-CNN'''. While Fast R-CNN used [[Selective Search (Object Recognition)|Selective Search]] to generate ROIs, Faster R-CNN integrates the ROI generation into the neural network itself.<ref name=":0" />
* March 2017: '''Mask R-CNN'''. While previous versions of R-CNN focused on object detection, Mask R-CNN adds instance segmentation. Mask R-CNN also replaces ROIPooling with a new method called ROIAlign, which samples the feature map at fractional pixel coordinates instead of rounding ROI boundaries to whole pixels.<ref>{{Cite news|last=Farooq|first=Umer|url=https://medium.com/@umerfarooq_26378/from-r-cnn-to-mask-r-cnn-d6367b196cfd|title=From R-CNN to Mask R-CNN|date=February 15, 2018|work=Medium|access-date=March 12, 2020}}</ref><ref>{{Cite news|last=Weng|first=Lilian|url=https://lilianweng.github.io/lil-log/2017/12/31/object-recognition-for-dummies-part-3.html|title=Object Detection for Dummies Part 3: R-CNN Family|date=December 31, 2017|work=Lil'Log|access-date=March 12, 2020}}</ref>
* June 2019: '''Mesh R-CNN''' adds the ability to generate a 3D mesh from a 2D image.<ref>{{Cite news|last=Wiggers|first=Kyle|url=https://venturebeat.com/2019/10/29/facebook-highlights-ai-that-converts-2d-objects-into-3d-shapes/|title=Facebook highlights AI that converts 2D objects into 3D shapes|date=October 29, 2019|work=VentureBeat|access-date=March 12, 2020}}</ref>
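
The difference between ROIPooling and ROIAlign described in the list above can be illustrated with a small sketch. The code below is not taken from any of the R-CNN implementations; it assumes a single-channel feature map stored as a list of rows, and the function names <code>roi_max_pool</code>, <code>bilinear_sample</code> and <code>roi_align</code> are hypothetical. As a further simplification, the ROIAlign sketch takes one bilinear sample at the centre of each output bin, whereas practical implementations typically average several samples per bin.

```python
from typing import List, Tuple

FeatureMap = List[List[float]]  # assumed layout: one channel, list of rows


def ceil_div(n: int, d: int) -> int:
    """Integer ceiling division for positive d."""
    return -(-n // d)


def roi_max_pool(fm: FeatureMap, roi: Tuple[int, int, int, int],
                 out_h: int, out_w: int) -> FeatureMap:
    """ROIPooling sketch: the ROI is given in whole feature-map cells as
    (x0, y0, x1, y1), with x1 and y1 exclusive. The ROI is split into
    out_h x out_w bins on integer boundaries and each bin is max-pooled."""
    x0, y0, x1, y1 = roi
    roi_h, roi_w = y1 - y0, x1 - x0
    pooled = []
    for i in range(out_h):
        r0 = y0 + (i * roi_h) // out_h           # floor of the bin start
        r1 = y0 + ceil_div((i + 1) * roi_h, out_h)  # ceil of the bin end
        row = []
        for j in range(out_w):
            c0 = x0 + (j * roi_w) // out_w
            c1 = x0 + ceil_div((j + 1) * roi_w, out_w)
            row.append(max(fm[r][c] for r in range(r0, r1)
                           for c in range(c0, c1)))
        pooled.append(row)
    return pooled


def bilinear_sample(fm: FeatureMap, y: float, x: float) -> float:
    """Interpolate the feature map at a fractional (y, x) position,
    treating cell values as located at integer coordinates."""
    h, w = len(fm), len(fm[0])
    y = min(max(y, 0.0), h - 1)  # clamp to the valid range
    x = min(max(x, 0.0), w - 1)
    ya, xa = int(y), int(x)
    yb, xb = min(ya + 1, h - 1), min(xa + 1, w - 1)
    dy, dx = y - ya, x - xa
    top = fm[ya][xa] * (1 - dx) + fm[ya][xb] * dx
    bottom = fm[yb][xa] * (1 - dx) + fm[yb][xb] * dx
    return top * (1 - dy) + bottom * dy


def roi_align(fm: FeatureMap, roi: Tuple[float, float, float, float],
              out_h: int, out_w: int) -> FeatureMap:
    """ROIAlign sketch: the ROI (x0, y0, x1, y1) keeps its fractional
    coordinates; each bin is sampled once, at its centre (simplification)."""
    x0, y0, x1, y1 = roi
    bin_h, bin_w = (y1 - y0) / out_h, (x1 - x0) / out_w
    return [[bilinear_sample(fm, y0 + (i + 0.5) * bin_h,
                             x0 + (j + 0.5) * bin_w)
             for j in range(out_w)]
            for i in range(out_h)]


if __name__ == "__main__":
    fm = [[float(4 * r + c) for c in range(4)] for r in range(4)]
    print(roi_max_pool(fm, (0, 0, 4, 4), 2, 2))
    print(roi_align(fm, (0.5, 0.5, 2.5, 2.5), 2, 2))
```

In this sketch the pooling variant can only work with ROI boundaries that have already been rounded to whole cells, while the alignment variant accepts the fractional boundaries directly, which is the property the Mask R-CNN entry refers to.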