Region Based Convolutional Neural Networks

Latest revision as of 07:39, 27 August 2025.

* November 2013: '''R-CNN'''.<ref name=":2" />
* April 2015: '''Fast R-CNN'''.<ref name=":3">{{Cite book |last=Girshick |first=Ross |chapter=Fast R-CNN |date=7–13 December 2015 |title=2015 IEEE International Conference on Computer Vision (ICCV) |publisher=IEEE |pages=1440–1448 |doi=10.1109/ICCV.2015.169 |isbn=978-1-4673-8391-2}}</ref>
* June 2015: '''Faster R-CNN'''.<ref name=":4">{{Cite journal |last1=Ren |first1=Shaoqing |last2=He |first2=Kaiming |last3=Girshick |first3=Ross |last4=Sun |first4=Jian |date=2017-06-01 |title=Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks |journal=IEEE Transactions on Pattern Analysis and Machine Intelligence |volume=39 |issue=6 |pages=1137–1149 |doi=10.1109/TPAMI.2016.2577031 |pmid=27295650 |issn=0162-8828 |arxiv=1506.01497 |bibcode=2017ITPAM..39.1137R }}</ref>
* March 2017: '''Mask R-CNN'''.<ref name=":5">{{Cite book |last1=He |first1=Kaiming |last2=Gkioxari |first2=Georgia |last3=Dollár |first3=Piotr |last4=Girshick |first4=Ross |chapter=Mask R-CNN |date=October 2017 |title=2017 IEEE International Conference on Computer Vision (ICCV) |publisher=IEEE |pages=2980–2988 |doi=10.1109/ICCV.2017.322 |isbn=978-1-5386-1032-9}}</ref>
* December 2017: '''Cascade R-CNN''' uses a sequence of detection stages trained with increasing Intersection over Union (IoU, also known as the [[Jaccard index]]) thresholds, making each stage more selective against nearby false positives.<ref>{{Cite journal |last1=Cai |first1=Zhaowei |last2=Vasconcelos |first2=Nuno |date=2017 |title=Cascade R-CNN: Delving into High Quality Object Detection |arxiv=1712.00726 }}</ref>
* June 2019: '''Mesh R-CNN''' adds the ability to generate a 3D mesh from a 2D image.<ref>{{Cite journal
|last1=Gkioxari |first1=Georgia |last2=Malik |first2=Jitendra |last3=Johnson |first3=Justin |date=2019 |title=Mesh R-CNN |url=https://openaccess.thecvf.com/content_ICCV_2019/html/Gkioxari_Mesh_R-CNN_ICCV_2019_paper.html |pages=9785–9795 |arxiv=1906.02739 }}</ref>

=== Selective search ===
Given an image (or an image-like feature map), '''selective search''' (also called Hierarchical Grouping) first segments the image into initial regions using the graph-based algorithm of Felzenszwalb and Huttenlocher (2004).<ref>{{Cite journal |last1=Felzenszwalb |first1=Pedro F. |last2=Huttenlocher |first2=Daniel P. |date=2004-09-01 |title=Efficient Graph-Based Image Segmentation |url=https://link.springer.com/article/10.1023/B:VISI.0000022288.19776.77 |journal=International Journal of Computer Vision |language=en |volume=59 |issue=2 |pages=167–181 |doi=10.1023/B:VISI.0000022288.19776.77 |issn=1573-1405 |url-access=subscription }}</ref> It then repeatedly merges the most similar pair of neighbouring regions, as follows:<ref name=":1" />
 '''Input:''' (colour) image
 '''Output:''' Set of object location hypotheses L
 Segment image into initial regions R = {r<sub>1</sub>, ..., r<sub>n</sub>} using Felzenszwalb and Huttenlocher (2004)
 Initialise similarity set S = ∅
 '''foreach''' Neighbouring region pair (r<sub>i</sub>, r<sub>j</sub>) do
     Calculate similarity s(r<sub>i</sub>, r<sub>j</sub>)
     S = S ∪ s(r<sub>i</sub>, r<sub>j</sub>)
 '''while''' S ≠ ∅ do
     Get highest similarity s(r<sub>i</sub>, r<sub>j</sub>) = max(S)
     Merge corresponding regions r<sub>t</sub> = r<sub>i</sub> ∪ r<sub>j</sub>
     Remove similarities regarding r<sub>i</sub>: S = S \ s(r<sub>i</sub>, r∗)
     Remove similarities regarding r<sub>j</sub>: S = S \ s(r∗, r<sub>j</sub>)
     Calculate similarity set S<sub>t</sub> between r<sub>t</sub> and its neighbours
     S = S ∪ S<sub>t</sub>
     R = R ∪ r<sub>t</sub>
 Extract object location boxes L from all regions in R

=== R-CNN ===
[[File:R-cnn.svg|thumb|272x272px|R-CNN architecture]]
Given an input image, R-CNN begins by applying selective search to extract [[Region of interest|regions of interest]] (ROIs). Each ROI is a rectangle that may bound an object in the image. Depending on the scenario, there may be as many as {{nobr|two thousand}} ROIs. Each ROI is then warped to a fixed size and fed through a convolutional neural network to produce a feature vector. For each ROI's features, a set of class-specific linear [[support-vector machine]]s determines what type of object (if any) the ROI contains.<ref name=":2">{{Cite journal |last1=Girshick |first1=Ross |last2=Donahue |first2=Jeff |last3=Darrell |first3=Trevor |last4=Malik |first4=Jitendra |date=2016-01-01 |title=Region-Based Convolutional Networks for Accurate Object Detection and Segmentation |journal=IEEE Transactions on Pattern Analysis and Machine Intelligence |volume=38 |issue=1 |pages=142–158 |doi=10.1109/TPAMI.2015.2437384 |pmid=26656583 |bibcode=2016ITPAM..38..142G |issn=0162-8828}}</ref>
{{-}}

=== Fast R-CNN ===
[[File:RoI_pooling_animated.gif|thumb|268x268px|RoI pooling to an output size of 2×2. In this example the region proposal (an input parameter) has size 7×5.]]
After the network's last convolutional layer comes a '''ROIPooling''' module. It slices each ROI out of the output feature map, max-pools it to a fixed size and passes the result on for classification. As in the original R-CNN, Fast R-CNN uses selective search to generate its region proposals.
{{-}}

=== Faster R-CNN ===
[[File:Faster-rcnn.svg|thumb|Faster R-CNN]]
While Fast R-CNN used selective search to generate ROIs, Faster R-CNN integrates ROI generation into the neural network itself through a region proposal network.<ref name=":4" />
{{-}}

=== Mask R-CNN ===