Given an image (or an image-like feature map), '''selective search''' (also called hierarchical grouping) first segments the image using the graph-based algorithm of Felzenszwalb and Huttenlocher (2004),<ref>{{Cite journal |last=Felzenszwalb |first=Pedro F. |last2=Huttenlocher |first2=Daniel P. |date=2004-09-01 |title=Efficient Graph-Based Image Segmentation |url=https://link.springer.com/article/10.1023/B:VISI.0000022288.19776.77 |journal=International Journal of Computer Vision |language=en |volume=59 |issue=2 |pages=167–181 |doi=10.1023/B:VISI.0000022288.19776.77 |issn=1573-1405}}</ref> then performs the following:<ref name=":1" /><syntaxhighlight>Input: (colour) image
Given an input image, R-CNN begins by applying a mechanism called selective search<ref name=":1" /> to extract [[Region of interest|regions of interest]] (ROI), where each ROI is a rectangle that may represent the boundary of an object in the image. Depending on the scenario, there may be as many as {{nobr|two thousand}} ROIs. Each ROI is then fed through a neural network to produce output features. For each ROI's output features, an ensemble of [[support-vector machine]] classifiers is used to determine what type of object (if any) is contained within the ROI.<ref name=":2">{{Cite journal |last=Girshick |first=Ross |last2=Donahue |first2=Jeff |last3=Darrell |first3=Trevor |last4=Malik |first4=Jitendra |date=2016-01-01 |title=Region-Based Convolutional Networks for Accurate Object Detection and Segmentation |url=http://ieeexplore.ieee.org/document/7112511/ |journal=IEEE Transactions on Pattern Analysis and Machine Intelligence |volume=38 |issue=1 |pages=142–158 |doi=10.1109/TPAMI.2015.2437384 |issn=0162-8828}}</ref>
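The three stages described above (region proposal, feature extraction, per-class SVM scoring) can be illustrated with a minimal Python sketch. It is not the reference implementation: the names <code>classify_rois</code>, <code>propose</code>, <code>extract</code> and <code>svms</code> are hypothetical, the proposal and feature functions are supplied by the caller rather than being a real selective search or neural network, and each classifier is assumed to be a linear SVM stored as a weight vector and a bias.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Box = Tuple[int, int, int, int]            # (x0, y0, x1, y1) rectangle
LinearSVM = Tuple[Sequence[float], float]  # (weights, bias), assumed linear


def svm_score(svm: LinearSVM, features: Sequence[float]) -> float:
    """Decision value of a linear SVM: w . x + b."""
    weights, bias = svm
    return sum(w * f for w, f in zip(weights, features)) + bias


def classify_rois(
    image: object,
    propose: Callable[[object], List[Box]],
    extract: Callable[[object, Box], Sequence[float]],
    svms: Dict[str, LinearSVM],
    max_rois: int = 2000,
) -> List[Tuple[Box, Optional[str], float]]:
    """Score every proposed ROI with one binary SVM per class.

    A label of None means no classifier gave a positive score,
    i.e. the ROI is treated as containing no object.
    """
    detections = []
    for box in propose(image)[:max_rois]:   # stage 1: region proposals
        features = extract(image, box)      # stage 2: features per ROI
        best_label, best_score = None, 0.0
        for label, svm in svms.items():     # stage 3: one SVM per class
            score = svm_score(svm, features)
            if score > best_score:
                best_label, best_score = label, score
        detections.append((box, best_label, best_score))
    return detections
```

In this sketch an ROI is assigned the class whose classifier returns the highest positive decision value; the cap of 2000 proposals mirrors the figure quoted above and is otherwise arbitrary.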