== History ==
The original goal of R-CNN was to take an input image and produce a set of bounding boxes as output, where each bounding box contains an object and also the category (e.g. car or pedestrian) of the object. More recently, R-CNN has been extended to perform other computer vision tasks, such as semantic segmentation (where the goal is to identify the object category that each pixel in an image belongs to) and mesh generation (converting a 2D image into a 3D mesh). The following covers some of the versions of R-CNN that have been developed.
* November 2013: '''R-CNN'''. Given an input image, R-CNN begins by applying a mechanism called Selective Search to extract [[Region of interest|regions of interest]] (ROI), where each ROI is a rectangle that may represent the boundary of an object in the image. Depending on the scenario, there may be as many as two thousand ROIs. After that, each ROI is fed through a neural network to produce output features. For each ROI's output features, a collection of [[support-vector machine]] classifiers is used to determine what type of object (if any) is in the ROI.<ref>{{Cite news|last=Ghandi|first=Rohith|url=https://towardsdatascience.com/r-cnn-fast-r-cnn-faster-r-cnn-yolo-object-detection-algorithms-36d53571365e|title=R-CNN, Fast R-CNN, Faster R-CNN, YOLO — Object Detection Algorithms|date=July 9, 2018|work=Towards Data Science|access-date=March 12, 2020|url-status=live}}</ref>
* April 2015: '''Fast R-CNN'''. While the original R-CNN ran the neural network separately on each of as many as two thousand ROIs, Fast R-CNN runs the neural network once on the whole image. At the end of the network is a novel method called ROIPooling, which slices out each ROI from the network's output tensor, reshapes it, and classifies it.<ref name=":0">{{Cite news|last=Bhatia|first=Richa|url=https://analyticsindiamag.com/what-is-region-of-interest-pooling/|title=What is region of interest pooling?|date=September 10, 2018|work=Analytics India|access-date=March 12, 2020|url-status=live}}</ref> As in the original R-CNN, Fast R-CNN uses Selective Search to generate its region proposals.
* June 2015: '''Faster R-CNN'''. While Fast R-CNN used Selective Search to generate ROIs, Faster R-CNN integrates the ROI generation into the neural network itself.<ref name=":0" />
* March 2017: '''Mask R-CNN'''. While previous versions of R-CNN focused on object detection, Mask R-CNN adds instance segmentation. Mask R-CNN also replaced ROIPooling with a new method called ROIAlign, which can represent fractions of a pixel.<ref>{{Cite news|last=Farooq|first=Umer|url=https://medium.com/@umerfarooq_26378/from-r-cnn-to-mask-r-cnn-d6367b196cfd|title=From R-CNN to Mask R-CNN|date=February 15, 2018|work=Medium|access-date=March 12, 2020|url-status=live}}</ref><ref>{{Cite news|last=Weng|first=Lilian|url=https://lilianweng.github.io/lil-log/2017/12/31/object-recognition-for-dummies-part-3.html|title=Object Detection for Dummies Part 3: R-CNN Family|date=December 31, 2017|work=Lil'Log|access-date=March 12, 2020|url-status=live}}</ref>
* June 2019: '''Mesh R-CNN'''. Mesh R-CNN adds the ability to generate a 3D mesh from a 2D image.<ref>{{Cite news|last=Wiggers|first=Kyle|url=https://venturebeat.com/2019/10/29/facebook-highlights-ai-that-converts-2d-objects-into-3d-shapes/|title=Facebook highlights AI that converts 2D objects into 3D shapes|date=October 29, 2019|work=VentureBeat|access-date=March 12, 2020|url-status=live}}</ref>
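
The difference between ROIPooling and ROIAlign described above can be illustrated with a minimal Python sketch. It is not the implementation used in any of the published systems: the function names <code>roi_max_pool</code> and <code>bilinear_sample</code>, the single-channel feature map stored as a list of lists, and the end-exclusive <code>(x0, y0, x1, y1)</code> ROI convention are all assumptions made for illustration. The first function pools an ROI into a fixed-size grid using whole-cell bin edges; the second reads the feature map at a fractional position by bilinear interpolation, which is the kind of sub-pixel sampling ROIAlign relies on.

```python
import math

def roi_max_pool(feature_map, roi, out_h, out_w):
    """Max-pool an ROI (x0, y0, x1, y1; integers, end-exclusive) into an out_h x out_w grid."""
    x0, y0, x1, y1 = roi
    roi_h, roi_w = y1 - y0, x1 - x0
    pooled = []
    for i in range(out_h):
        # Bin edges are rounded to whole cells (the quantization step).
        r0 = y0 + math.floor(i * roi_h / out_h)
        r1 = y0 + math.ceil((i + 1) * roi_h / out_h)
        row = []
        for j in range(out_w):
            c0 = x0 + math.floor(j * roi_w / out_w)
            c1 = x0 + math.ceil((j + 1) * roi_w / out_w)
            row.append(max(feature_map[r][c]
                           for r in range(r0, r1)
                           for c in range(c0, c1)))
        pooled.append(row)
    return pooled

def bilinear_sample(feature_map, y, x):
    """Interpolate the feature map at a fractional (y, x) position inside the map."""
    y_lo, x_lo = math.floor(y), math.floor(x)
    # Clamp the upper neighbours so positions on the last row or column stay in range.
    y_hi = min(y_lo + 1, len(feature_map) - 1)
    x_hi = min(x_lo + 1, len(feature_map[0]) - 1)
    dy, dx = y - y_lo, x - x_lo
    top = (1 - dx) * feature_map[y_lo][x_lo] + dx * feature_map[y_lo][x_hi]
    bottom = (1 - dx) * feature_map[y_hi][x_lo] + dx * feature_map[y_hi][x_hi]
    return (1 - dy) * top + dy * bottom

# Hypothetical 4x4 single-channel feature map used only for this example.
feature_map = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
print(roi_max_pool(feature_map, (0, 0, 4, 4), 2, 2))  # [[6, 8], [14, 16]]
print(bilinear_sample(feature_map, 0.5, 0.5))         # 3.5
```

In this sketch a 3×3 ROI pooled to 2×2 produces overlapping bins because the bin edges are rounded to whole cells, whereas <code>bilinear_sample</code> can be evaluated at any fractional coordinate without rounding.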
== Applications ==
Region-based convolutional neural networks have been used for tracking objects from a drone-mounted camera,<ref>{{Cite news|last=Nene|first=Vidi|url=https://dronebelow.com/2019/08/02/deep-learning-based-real-time-multiple-object-detection-and-tracking-via-drone/|title=Deep Learning-Based Real-Time Multiple-Object Detection and Tracking via Drone|date=Aug 2, 2019|work=Drone Below|access-date=Mar 28, 2020|url-status=live}}</ref> locating text in an image,<ref>{{Cite news|last=Ray|first=Tiernan|url=https://www.zdnet.com/article/facebook-pumps-up-character-recognition-to-mine-memes/|title=Facebook pumps up character recognition to mine memes|date=Sep 11, 2018|work=ZDNet|access-date=Mar 28, 2020|url-status=live}}</ref> and enabling object detection in [[Google Lens]].<ref>{{Cite news|last=Sagar|first=Ram|url=https://analyticsindiamag.com/these-machine-learning-techniques-make-google-lens-a-success/|title=These machine learning methods make google lens a success|date=Sep 9, 2019|work=Analytics India|access-date=Mar 28, 2020|url-status=live}}</ref> Mask R-CNN serves as one of seven tasks in the MLPerf Training Benchmark, which is a competition to speed up the training of neural networks.<ref>{{cite arXiv|eprint=1910.01500v3|class=math.LG|first=Peter|last=Mattson|title=MLPerf Training Benchmark|date=2019|display-authors=etal}}</ref>
== References ==