Content deleted Content added
→Overview: The overview of the RANSAC algorithm is cleaned up to make them easier to understand. |
|||
(40 intermediate revisions by 16 users not shown) | |||
Line 1:
{{Short description|Statistical method}}
'''Random sample consensus''' ('''RANSAC''') is an [[iterative method]] to estimate parameters of a mathematical model from a set of observed data that contains [[outliers]], when outliers are to be accorded no influence on the values of the estimates. Therefore, it also can be interpreted as an outlier detection method.<ref>Data Fitting and Uncertainty, T. Strutz, Springer Vieweg (2nd edition, 2016)</ref> It is a non-deterministic algorithm in the sense that it produces a reasonable result only with a certain probability, with this probability increasing as more iterations are allowed. The algorithm was first published by Fischler and Bolles at [[SRI International]] in 1981. They used RANSAC to solve the Location Determination Problem (LDP), where the goal is to determine the points in the space that project onto an image into a set of landmarks with known locations.▼
{{Machine learning bar}}
▲'''Random sample consensus''' ('''RANSAC''') is an [[iterative method]] to estimate parameters of a mathematical model from a set of observed data that contains [[outliers]], when outliers are to be {{clarify span|accorded no influence|date=November 2024}} on the values of the estimates.
RANSAC uses [[Cross-validation (statistics)#Repeated random sub-sampling validation|repeated random sub-sampling]].<ref>{{cite web |last1=Cantzler
==Example==
A simple example is [[Regression analysis|fitting a line]] in two dimensions to a set of observations.
<gallery widths="286" heights="255" perrow="2">
Line 14:
==Overview==
The RANSAC algorithm is a learning technique to estimate parameters of a model by random sampling of observed data. Given a dataset whose data elements contain both inliers and outliers, RANSAC uses the voting scheme to find the optimal fitting result. Data elements in the dataset are used to vote for one or multiple models. The implementation of this voting scheme is based on two assumptions: that the noisy features will not vote consistently for any single model (few outliers) and there are enough features to agree on a good model (few missing data). The RANSAC algorithm is essentially composed of two steps that are iteratively repeated:
#
#
▲# In the first step, a sample subset containing minimal data items is randomly selected from the input dataset. A fitting model and the corresponding model parameters are computed using only the elements of this sample subset. The cardinality of the sample subset (e.g., the number of data in this subset) is the smallest sufficient to determine the model parameters.
The set of inliers obtained for the fitting model is called the ''consensus set''. The RANSAC algorithm will iteratively repeat the above two steps until the obtained consensus set in certain iteration has enough inliers.▼
▲# In the second step, the algorithm checks which elements of the entire dataset are consistent with the model instantiated by the estimated model parameters obtained from the first step. A data element will be considered as an outlier if it does not fit the fitting model determined by the estimated model parameters within some error threshold defining the maximum data deviation of inliers.
▲The set of inliers obtained for the fitting model is called the consensus set. The RANSAC algorithm will iteratively repeat the above two steps until the obtained consensus set in certain iteration has enough inliers.
The input to the RANSAC algorithm is a set of observed data values, a way of fitting some kind of model to the observations, and some [[confidence interval|confidence]] parameters. RANSAC achieves its goal by repeating the following steps:▼
▲The input to the RANSAC algorithm is a set of observed data values, a
# Select a random subset of the original data. Call this subset the ''hypothetical inliers''.
# A model is fitted to the set of hypothetical inliers.
# All
# The estimated model is reasonably good if sufficiently many data points have been classified as a part of the consensus set.
#
[[File:RANSAC Inliers and Outliers.png|thumb|center|500px|RANSAC: inliers and outliers. The linear fitting to data points in this example is with 7 inliers (data points fitted well with the model under some criteria). It is not a good fitting, since there is a linear line where most data points are distributed near it (i.e., more inliers).]]
==
The generic RANSAC algorithm works as
Given:
data – A set of observations.
model – A model to explain the observed data points.
n –
k –
t –
d –
Return:
bestFit – The model parameters which may best fit the data (or null if no good model is found).
iterations = 0
bestFit = null
bestErr = something really large // This parameter is used to sharpen the model parameters to the best data fitting as iterations go on.
'''while''' ''iterations'' < ''k'' '''do'''
maybeInliers := n randomly selected values from data
maybeModel := model parameters fitted to maybeInliers
'''for''' every point in data
'''if''' point fits maybeModel with an error smaller than t then
add point to
'''end if'''
'''end for'''
'''if''' the number of elements in
// This implies that we may have found a good model.
//
betterModel := model parameters fitted to all the points in
thisErr := a measure of how well betterModel fits these points
'''if''' thisErr < bestErr '''then'''
Line 77 ⟶ 74:
'''return''' bestFit
== Example
A Python implementation mirroring the pseudocode. This also defines a <code>LinearRegressor</code> based on least squares, applies <code>RANSAC</code> to a 2D regression problem, and visualizes the outcome:
Line 101 ⟶ 98:
def fit(self, X, y):
for _ in range(self.k):
ids = rng.permutation(X.shape[0])
Line 125 ⟶ 121:
if this_error < self.best_error:
self.best_error = this_error
self.best_fit =
return self
Line 131 ⟶ 127:
def predict(self, X):
return self.best_fit.predict(X)
def square_error_loss(y_true, y_pred):
return (y_true - y_pred) ** 2
def mean_square_error(y_true, y_pred):
return np.sum(square_error_loss(y_true, y_pred)) / y_true.shape[0]
class LinearRegressor:
Line 153 ⟶ 150:
X = np.hstack([np.ones((r, 1)), X])
return X @ self.params
if __name__ == "__main__":
Line 173 ⟶ 171:
plt.plot(line, regressor.predict(line), c="peru")
plt.show()
</syntaxhighlight>
[[File:RANSAC applied to 2D regression problem.png|alt=A scatterplot showing a diagonal line from the bottom left to top right of the figure. A trend line fits closely along the diagonal, without being thrown off by outliers scattered elsewhere in the figure.|center|thumb|Result of running the <code>RANSAC</code> implementation. The orange line shows the least
== Parameters ==
The threshold value to determine when a data point fits a model ({{mvar|t}}), and the number of
Let {{mvar|p}} be the desired probability that the RANSAC algorithm provides at least one useful result after running. In extreme (for simplifying the derivation), RANSAC returns a successful result if in some iteration it selects only inliers from the input data set when it chooses :<math>w</math> = number of inliers in data / number of points in data
A common case is that <math>w</math> is not well known beforehand because of an unknown number of inliers in data before running the RANSAC algorithm, but some rough value can be given. With a given rough value of <math>w</math> and roughly
:<math>
Line 188 ⟶ 187:
</math>
which, after taking the [[logarithm]] of both sides, leads to
:<math>
Line 194 ⟶ 193:
</math>
This result assumes that the {{mvar|n}} data points are selected independently, that is, a point which has been selected once is replaced and can be selected again in the same iteration.
To gain additional confidence, the [[standard deviation]] or multiples thereof can be added to {{mvar|k}}. The standard deviation of {{mvar|k}} is defined as
Line 202 ⟶ 201:
==Advantages and disadvantages==
{{refimprove section|date=September 2014}}
An advantage of RANSAC is its ability to do [[robust statistics|robust estimation]]<ref>Robust Statistics, Peter. J. Huber, Wiley, 1981 (republished in paperback, 2004), page 1.</ref> of the model parameters, i.e., it can estimate the parameters with a high degree of accuracy even when a significant number of [[outlier]]s are present in the data set. A disadvantage of RANSAC is that there is no upper bound on the time it takes to compute these parameters (except exhaustion). When the number of iterations computed is limited, the solution obtained may not be optimal, and it may not even be one that fits the data in a good way. In this way RANSAC offers a trade-off; by computing a greater number of iterations, the probability of a reasonable model being produced is increased. Moreover, RANSAC is not always able to find the optimal set even for moderately contaminated sets, and it usually performs badly when the number of inliers is less than 50%. Optimal RANSAC
RANSAC can only estimate one model for a particular data set. As for any one-model approach when two (or more) model instances exist, RANSAC may fail to find either one. The [[Hough transform]] is one alternative robust estimation technique that may be useful when more than one model instance is present. Another approach for multi
==Applications==
Line 214 ⟶ 213:
RANSAC can be sensitive to the choice of the correct noise threshold that defines which data points fit a model instantiated with a certain set of parameters. If such threshold is too large, then all the hypotheses tend to be ranked equally (good). On the other hand, when the noise threshold is too small, the estimated parameters tend to be unstable ( i.e. by simply adding or removing a datum to the set of inliers, the estimate of the parameters may fluctuate). To partially compensate for this undesirable effect, Torr et al. proposed two modification of RANSAC called MSAC (M-estimator SAmple and Consensus) and MLESAC (Maximum Likelihood Estimation SAmple and Consensus).<ref>P.H.S. Torr and A. Zisserman, [http://www.academia.edu/download/3436793/torr_mlesac.pdf MLESAC: A new robust estimator with application to estimating image geometry]{{dead link|date=July 2022|bot=medic}}{{cbignore|bot=medic}}, Journal of Computer Vision and Image Understanding 78 (2000), no. 1, 138–156.</ref> The main idea is to evaluate the quality of the consensus set ( i.e. the data that fit a model and a certain set of parameters) calculating its likelihood (whereas in the original formulation by Fischler and Bolles the rank was the cardinality of such set). An extension to MLESAC which takes into account the prior probabilities associated to the input dataset is proposed by Tordoff.<ref>B. J. Tordoff and D. W. Murray, [https://ieeexplore.ieee.org/abstract/document/1498749/ Guided-MLESAC: Faster image transform estimation by using matching priors], IEEE Transactions on Pattern Analysis and Machine Intelligence 27 (2005), no. 10, 1523–1535.</ref> The resulting algorithm is dubbed Guided-MLESAC. Along similar lines, Chum proposed to guide the sampling procedure if some a priori information regarding the input data is known, i.e. whether a datum is likely to be an inlier or an outlier. The proposed approach is called PROSAC, PROgressive SAmple Consensus.<ref>[https://dspace.cvut.cz/bitstream/handle/10467/9496/2005-Matching-with-PROSAC-progressive-sample-consensus.pdf?sequence=1 Matching with PROSAC – progressive sample consensus], Proceedings of Conference on Computer Vision and Pattern Recognition (San Diego), vol. 1, June 2005, pp. 220–226</ref>
Chum et al. also proposed a randomized version of RANSAC called R-RANSAC <ref>O. Chum and J. Matas, Randomized RANSAC with Td,d test, 13th British Machine Vision Conference, September 2002. http://www.bmva.org/bmvc/2002/papers/50/</ref> to reduce the computational burden to identify a good consensus set. The basic idea is to initially evaluate the goodness of the currently instantiated model using only a reduced set of points instead of the entire dataset. A sound strategy will tell with high confidence when it is the case to evaluate the fitting of the entire dataset or when the model can be readily discarded. It is reasonable to think that the impact of this approach is more relevant in cases where the percentage of inliers is large. The type of strategy proposed by Chum et al. is called preemption scheme. Nistér proposed a paradigm called Preemptive RANSAC<ref>D. Nistér, [https://pdfs.semanticscholar.org/e712/35d9e17f13186a4da6ee11eede0b64b01c95.pdf Preemptive RANSAC for live structure and motion estimation], IEEE International Conference on Computer Vision (Nice, France), October 2003, pp. 199–206.</ref> that allows real time robust estimation of the structure of a scene and of the motion of the camera. The core idea of the approach consists in generating a fixed number of
Other researchers tried to cope with difficult situations where the noise scale is not known and/or multiple model instances are present. The first problem has been tackled in the work by Wang and Suter.<ref>H. Wang and D. Suter, [https://ieeexplore.ieee.org/abstract/document/1335451/ Robust adaptive-scale parametric model estimation for computer vision]., IEEE Transactions on Pattern Analysis and Machine Intelligence 26 (2004), no. 11, 1459–1474</ref> Toldo et al. represent each datum with the characteristic function of the set of random models that fit the point. Then multiple models are revealed as clusters which group the points supporting the same model. The clustering algorithm, called J-linkage, does not require prior specification of the number of models, nor does it necessitate manual parameters tuning.<ref>R. Toldo and A. Fusiello, [https://pdfs.semanticscholar.org/0455/e5596d734e3dcf60c0179efb6404e62ceabb.pdf Robust multiple structures estimation with J-linkage], European Conference on Computer Vision (Marseille, France), October 2008, pp. 537–547.</ref>
Line 227 ⟶ 225:
* [[Resampling (statistics)]]
* Hop-Diffusion Monte Carlo uses randomized sampling involve global jumps and local diffusion to choose the sample at each step of RANSAC for epipolar geometry estimation between very wide-baseline images.<ref>{{cite journal |last1=Brahmachari |first1=Aveek S. |last2=Sarkar |first2=Sudeep |title=Hop-Diffusion Monte Carlo for Epipolar Geometry Estimation between Very Wide-Baseline Images |journal=IEEE Transactions on Pattern Analysis and Machine Intelligence |date=March 2013 |volume=35 |issue=3 |pages=755–762 |doi=10.1109/TPAMI.2012.227|pmid=26353140 |s2cid=2524656 }}</ref>
* [[FSASAC]] (RANSAC based on data filtering and [[simulated annealing]])<ref>W. Ruoyan and W. Junfeng, "[https://ieeexplore.ieee.org/document/9648331 FSASAC: Random Sample Consensus Based on Data Filter and Simulated Annealing]," in IEEE Access, vol. 9, pp. 164935-164948, 2021, doi: 10.1109/ACCESS.2021.3135416.</ref>
== See also ==
Line 285 ⟶ 284:
}}
*{{Cite journal
| author1=Sunglok Choi
| | | | title=Performance Evaluation of RANSAC Family | journal=In Proceedings of the British Machine Vision Conference (BMVC)
| year=2009
| url=http://www.bmva.org/bmvc/2009/Papers/Paper355/Paper355.pdf
| access-date=2010-10-01
| archive-date=2020-08-31
| archive-url=https://web.archive.org/web/20200831001552/http://www.bmva.org/bmvc/2009/Papers/Paper355/Paper355.pdf
| url-status=dead
}}
|