Revision as of 22:46, 29 March 2020 by LxNorm (talk | contribs) (72 edits): applications. Tag: Visual edit
Revision as of 22:59, 29 March 2020 by LxNorm (talk | contribs) (72 edits): m, minor edit. Tag: Visual edit
Line 3:
== History ==
The original goal of R-CNN was to take an input image and produce a set of bounding boxes as output, where each bounding box contains an object and also the category (e.g. car or pedestrian) of the object. ~~Today~~More recently, R-CNN has been extended to perform other ~~tasks such as semantic segmentation (where the goal is to identify the object category that each pixel in an image belongs to) and mesh generation (converting a 2D image into a 3D mesh)~~computer vision tasks. The following covers some of the versions of R-CNN that have been developed.
* November 2013: '''R-CNN'''. Given an input image, R-CNN begins by applying a mechanism called Selective Search to extract [[Region of interest|regions of interest]] (ROI), ~~which~~where each ROI is a rectangle that may ~~contain~~represent the boundary of an object in the image. Depending on the scenario, there may be as many as ~~2000~~two thousand ROIs. After that, each ROI is fed through a neural network to produce output features. For each ROI's output features, a collection of [[support-vector machine]] classifiers is used to determine what type of object (if any) is in the ROI.<ref>{{Cite news|last=Ghandi|first=Rohith|url=https://towardsdatascience.com/r-cnn-fast-r-cnn-faster-r-cnn-yolo-object-detection-algorithms-36d53571365e|title=R-CNN, Fast R-CNN, Faster R-CNN, YOLO — Object Detection Algorithms|date=July 9, 2018|work=Towards Data Science|access-date=March 12, 2020|url-status=live}}</ref>
* April 2015: '''Fast R-CNN'''. While the original R-CNN ran the neural network on each of as many as 2000 regions of interest (ROI), Fast R-CNN runs the neural network once on the whole image. At the end of the network is a novel method called ~~ROI pooling~~ROIPooling, which slices out each ROI from the network's output tensor, reshapes it, and classifies it. 
As in the original R-CNN, the Fast R-CNN uses Selective Search to generate its region proposals.<ref name=":0">{{Cite news|last=Bhatia|first=Richa|url=https://analyticsindiamag.com/what-is-region-of-interest-pooling/|title=What is region of interest pooling?|date=September 10, 2018|work=Analytics India|access-date=March 12, 2020|url-status=live}}</ref> ~~As in the original R-CNN, the Fast R-CNN uses Selective Search to generate its region proposals.~~
* June 2015: '''Faster R-CNN'''. While Fast R-CNN used Selective Search to generate ROIs, Faster R-CNN integrates the ROI generation into the neural network itself.<ref name=":0" />
* March 2017: '''Mask R-CNN'''. While previous versions of R-CNN focused on object detection, Mask R-CNN adds instance segmentation. Mask R-CNN also replaced ~~ROI pooling~~ROIPooling with a new method called ROIAlign, which can represent fractions of a pixel.<ref>{{Cite news|last=Farooq|first=Umer|url=https://medium.com/@umerfarooq_26378/from-r-cnn-to-mask-r-cnn-d6367b196cfd|title=From R-CNN to Mask R-CNN|date=February 15, 2018|work=Medium|access-date=March 12, 2020|url-status=live}}</ref><ref>{{Cite news|last=Weng|first=Lilian|url=https://lilianweng.github.io/lil-log/2017/12/31/object-recognition-for-dummies-part-3.html|title=Object Detection for Dummies Part 3: R-CNN Family|date=December 31, 2017|work=Lil'Log|access-date=March 12, 2020|url-status=live}}</ref>
* (June 2019): '''Mesh R-CNN''' adds the ability to generate a 3D mesh from a 2D image.<ref>{{Cite news|last=Wiggers|first=Kyle|url=https://venturebeat.com/2019/10/29/facebook-highlights-ai-that-converts-2d-objects-into-3d-shapes/|title=Facebook highlights AI that converts 2D objects into 3D shapes|date=October 29, 2019|work=VentureBeat|access-date=March 12, 2020|url-status=live}}</ref> ~~<br />~~
== Applications ==
Region Based Convolutional Neural Networks have been used for tracking objects from a drone-mounted camera,<ref>{{Cite 
news|last=Nene|first=Vidi|url=https://dronebelow.com/2019/08/02/deep-learning-based-real-time-multiple-object-detection-and-tracking-via-drone/|title=Deep Learning-Based Real-Time Multiple-Object Detection and Tracking via Drone|date=Aug 2, 2019|work=Drone Below|access-date=Mar 28, 2020|url-status=live}}</ref> locating text in an image,<ref>{{Cite news|last=Ray|first=Tiernan|url=https://www.zdnet.com/article/facebook-pumps-up-character-recognition-to-mine-memes/|title=Facebook pumps up character recognition to mine memes|date=Sep 11, 2018|work=ZDnet|access-date=Mar 28, 2020|url-status=live}}</ref> and enabling object detection in [[Google Lens]].<ref>{{Cite news|last=Sagar|first=Ram|url=https://analyticsindiamag.com/these-machine-learning-techniques-make-google-lens-a-success/|title=These machine learning methods make google lens a success|date=Sep 9, 2019|work=Analytics India|access-date=Mar 28, 2020|url-status=live}}</ref> Mask R-CNN serves as one of seven tasks in the MLPerf Training Benchmark, which is a competition to speed up the training of neural networks.<ref>{{cite arXiv|eprint=1910.01500v3|class=math.LG|first=Peter|last=Mattson|title=MLPerf Training Benchmark|date=2019|display-authors=etal}}</ref>
~~(starting with a bibliography; will clean up)~~
https://dronebelow.com/2019/08/02/deep-learning-based-real-time-multiple-object-detection-and-tracking-via-drone/
https://venturebeat.com/2019/04/24/google-open-sources-ai-image-segmentation-models-optimized-for-cloud-tpus/
https://syncedreview.com/2019/12/26/facebook-pointrend-rendering-image-segmentation/
https://syncedreview.com/2019/03/12/new-sota-on-instance-segmentation-mask-scoring-r-cnn-tops-mask-r-cnn-on-coco/
* fast training - https://siliconangle.com/2019/07/10/nvidia-sets-new-records-mlperf-ai-benchmark-tests/
The MLPerf benchmark tests how fast a computing platform can train Mask R-CNN.
== References ==

Region Based Convolutional Neural Networks: Difference between revisions