Random sample consensus: Difference between revisions

Content deleted Content added
Ivan1248 (talk | contribs)
m Rollback edit(s) by Cyouabra (talk): Non-constructive edit (UV 0.1.6)
 
(16 intermediate revisions by 9 users not shown)
Line 1:
{{Short description|Statistical method}}
{{Machine learning bar}}
'''Random sample consensus''' ('''RANSAC''') is an [[iterative method]] to estimate parameters of a mathematical model from a set of observed data that contains [[outliers]], when outliers are to be {{clarify span|accorded no influence|date=November 2024}} on the values of the estimates. Therefore, it also can be interpreted as an outlier detection method.<ref>Data Fitting and Uncertainty, T. Strutz, Springer Vieweg (2nd edition, 2016).</ref> It is a non-deterministic algorithm in the sense that it produces a reasonable result only with a certain probability, with this probability increasing as more iterations are allowed. The algorithm was first published by Fischler and Bolles at [[SRI International]] in 1981. They used RANSAC to solve the Location___location Determinationdetermination Problemproblem (LDP), where the goal is to determine the points in the space that project onto an image into a set of landmarks with known locations.
 
RANSAC uses [[Cross-validation (statistics)#Repeated random sub-sampling validation|repeated random sub-sampling]].<ref>{{cite web |last1=Cantzler, |first1=H. "|title=Random sampleSample consensusConsensus (ransacRANSAC)." ''|language=en |publisher=Institute for Perception, Action and Behaviour, Division of Informatics, University of Edinburgh'' |citeseerx=10.1.1.106.3035 |url=http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.106.3035&rep=rep1&type=pdf |archive-url=https://web.archive.org/web/20230204054340/http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.106.3035&rep=rep1&type=pdf |archive-date=2023-02-04 |url-status=dead}}</ref> A basic assumption is that the data consists of "inliers", i.e., data whose distribution can be explained by some set of model parameters, though may be subject to noise, and "outliers", which are data that do not fit the model. The outliers can come, for example, from extreme values of the noise or from erroneous measurements or incorrect hypotheses about the interpretation of data. RANSAC also assumes that, given a (1981usually small) set of inliers, there exists a procedure that can estimate the parameters of a model optimally explaining or fitting this data.
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.106.3035&rep=rep1&type=pdf</ref> A basic assumption is that the data consists of "inliers", i.e., data whose distribution can be explained by some set of model parameters, though may be subject to noise, and "outliers" which are data that do not fit the model. The outliers can come, for example, from extreme values of the noise or from erroneous measurements or incorrect hypotheses about the interpretation of data. RANSAC also assumes that, given a (usually small) set of inliers, there exists a procedure which can estimate the parameters of a model that optimally explains or fits this data.
 
==Example==
Line 16 ⟶ 15:
==Overview==
The RANSAC algorithm is a learning technique to estimate parameters of a model by random sampling of observed data. Given a dataset whose data elements contain both inliers and outliers, RANSAC uses the voting scheme to find the optimal fitting result. Data elements in the dataset are used to vote for one or multiple models. The implementation of this voting scheme is based on two assumptions: that the noisy features will not vote consistently for any single model (few outliers) and there are enough features to agree on a good model (few missing data). The RANSAC algorithm is essentially composed of two steps that are iteratively repeated:
# In the first step, aA sample subset containing minimal number of data items is randomly selected from the input dataset. A fitting model with model parameters is computed using only the elements of this sample subset. The cardinality of the sample subset (e.g., the amount of data in this subset) is sufficient to determine the model parameters.
# In the second step, theThe algorithm checks which elements of the entire dataset are consistent with the model instantiated by the estimated model parameters obtained from the first step. A data element will be considered as an outlier if it does not fit the model within some error threshold defining the maximum data deviation of inliers. (Datadata elements beyond this deviation are outliers.).
The set of inliers obtained for the fitting model is called the ''consensus set''. The RANSAC algorithm will iteratively repeat the above two steps until the obtained consensus set in certain iteration has enough inliers.
 
The input to the RANSAC algorithm is a set of observed data values, a model to fit to the observations, and some [[confidence interval|confidence]] parameters defining outliers. In more details than the aforementioned RANSAC algorithm overview, RANSAC achieves its goal by repeating the following steps:
 
# Select a random subset of the original data. Call this subset the ''hypothetical inliers''.
# A model is fitted to the set of hypothetical inliers.
Line 30 ⟶ 28:
To converge to a sufficiently good model parameter set, this procedure is repeated a fixed number of times, each time producing either the rejection of a model because too few points are a part of the consensus set, or a refined model with a consensus set size larger than the previous consensus set.
 
[[File:RANSAC Inliers and Outliers.png|thumb|center|500px|RANSAC: Inliersinliers and Outliersoutliers. The linear fitting to data points in this example is with 7 inliers (data points fitted well with the model under some criteria). It is not a good fitting, since there is a linear line where most data points are distributed near it (i.e., more inliers).]]
<gallery widths="286" heights="255" perrow="2">
File:RANSAC Inliers and Outliers.png|RANSAC: Inliers and Outliers. The linear fitting to data points in this example is with 7 inliers (data points fitted well with the model under some criteria). It is not a good fitting since there is a linear line where most data points are distributed near it (i.e., more inliers).
</gallery>
 
== Pseudocode ==
Line 176 ⟶ 172:
plt.show()
</syntaxhighlight>
[[File:RANSAC applied to 2D regression problem.png|alt=A scatterplot showing a diagonal line from the bottom left to top right of the figure. A trend line fits closely along the diagonal, without being thrown off by outliers scattered elsewhere in the figure.|center|thumb|Result of running the <code>RANSAC</code> implementation. The orange line shows the least -squares parameters found by the iterative approach, which successfully ignores the outlier points.]]
 
== Parameters ==
Line 205 ⟶ 201:
==Advantages and disadvantages==
{{refimprove section|date=September 2014}}
An advantage of RANSAC is its ability to do [[robust statistics|robust estimation]]<ref>Robust Statistics, Peter. J. Huber, Wiley, 1981 (republished in paperback, 2004), page 1.</ref> of the model parameters, i.e., it can estimate the parameters with a high degree of accuracy even when a significant number of [[outlier]]s are present in the data set. A disadvantage of RANSAC is that there is no upper bound on the time it takes to compute these parameters (except exhaustion). When the number of iterations computed is limited, the solution obtained may not be optimal, and it may not even be one that fits the data in a good way. In this way RANSAC offers a trade-off; by computing a greater number of iterations, the probability of a reasonable model being produced is increased. Moreover, RANSAC is not always able to find the optimal set even for moderately contaminated sets, and it usually performs badly when the number of inliers is less than 50%. Optimal RANSAC <ref>Anders Hast, Johan Nysjö, Andrea Marchetti (2013). "[http://wscg.zcu.cz/WSCG2013/!_2013_J_WSCG-1.pdf Optimal RANSAC – Towards a Repeatable Algorithm for Finding the Optimal Set]". Journal of WSCG 21 (1): 21–30.</ref> was proposed to handle both these problems and is capable of finding the optimal set for heavily contaminated sets, even for an inlier ratio under 5%. Another disadvantage of RANSAC is that it requires the setting of problem-specific thresholds.
 
RANSAC can only estimate one model for a particular data set. As for any one-model approach when two (or more) model instances exist, RANSAC may fail to find either one. The [[Hough transform]] is one alternative robust estimation technique that may be useful when more than one model instance is present. Another approach for multi -model fitting is known as PEARL,<ref>Hossam Isack, Yuri Boykov (2012). "Energy-based Geometric Multi-Model Fitting". International Journal of Computer Vision 97 (2: 1): 23–147. {{doi|10.1007/s11263-011-0474-7}}.</ref> which combines model sampling from data points as in RANSAC with iterative re-estimation of inliers and the multi-model fitting being formulated as an optimization problem with a global energy function describing the quality of the overall solution.
 
==Applications==
Line 229 ⟶ 225:
* [[Resampling (statistics)]]
* Hop-Diffusion Monte Carlo uses randomized sampling involve global jumps and local diffusion to choose the sample at each step of RANSAC for epipolar geometry estimation between very wide-baseline images.<ref>{{cite journal |last1=Brahmachari |first1=Aveek S. |last2=Sarkar |first2=Sudeep |title=Hop-Diffusion Monte Carlo for Epipolar Geometry Estimation between Very Wide-Baseline Images |journal=IEEE Transactions on Pattern Analysis and Machine Intelligence |date=March 2013 |volume=35 |issue=3 |pages=755–762 |doi=10.1109/TPAMI.2012.227|pmid=26353140 |s2cid=2524656 }}</ref>
* [[FSASAC]] (RANSAC based on data filtering and [[simulated annealing]])<ref>W. Ruoyan and W. Junfeng, "[https://ieeexplore.ieee.org/document/9648331 FSASAC: Random Sample Consensus Based on Data Filter and Simulated Annealing]," in IEEE Access, vol. 9, pp. 164935-164948, 2021, doi: 10.1109/ACCESS.2021.3135416.</ref>
 
== See also ==
Line 287 ⟶ 284:
}}
*{{Cite journal
| author1=Sunglok Choi
| |author2=Taemin Kim
| |author3=Wonpil Yu
| |name-list-style=amp
| title=Performance Evaluation of RANSAC Family
| journal=In Proceedings of the British Machine Vision Conference (BMVC)
| year=2009
| url=http://www.bmva.org/bmvc/2009/Papers/Paper355/Paper355.pdf
| access-date=2010-10-01
| archive-date=2020-08-31
| archive-url=https://web.archive.org/web/20200831001552/http://www.bmva.org/bmvc/2009/Papers/Paper355/Paper355.pdf
| url-status=dead
}}