Statistical model validation

In [[statistics]], '''model validation''' is the task of evaluating whether a chosen [[statistical model]] is appropriate. Inferences from models that appear to fit their data well may nonetheless be flukes, leading researchers to overstate the actual relevance of their model. Model validation tests whether a statistical model holds up under variations in the data. This task is not to be confused with the closely related task of [[model selection]], the process of discriminating between multiple candidate models: model validation does not concern the conceptual design of models so much as it tests the consistency between a chosen model and its stated outputs.
 
There are many methods of model validation; some popular ones are the following. [[Residual sum of squares|Residual plots]] show the differences between the actual data and the model's predictions: patterns or correlations in the residuals may indicate a flaw in the model. [[Cross-validation (statistics)|Cross-validation]] iteratively refits the model, each time leaving out a small sample of the data, and checks how well the model predicts the samples left out; there are [[Cross-validation (statistics)#Types|many kinds of cross-validation]]. [[Predictive modelling|Predictive simulation]] compares simulated data to actual data. [[External validity|External validation]] involves fitting the model to new data. The [[Akaike information criterion]] estimates the relative quality of a model.
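As an illustration of the cross-validation idea described above, the following is a minimal sketch of leave-one-out cross-validation for a simple linear model <code>y = a + b·x</code>, using only the Python standard library. The data and function names here are hypothetical, chosen for illustration; practical work would typically use a library such as scikit-learn.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit for y = a + b*x."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    a = my - b * mx
    return a, b

def loo_cv_errors(xs, ys):
    """Leave-one-out cross-validation: refit the model n times,
    each time holding out one point, and return the prediction
    error on each held-out point."""
    errors = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        a, b = fit_line(train_x, train_y)
        errors.append(ys[i] - (a + b * xs[i]))
    return errors

# Made-up data that are roughly linear (y ~ 2x), so the
# held-out prediction errors should all be small.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
print(loo_cv_errors(xs, ys))
```

If the model were badly misspecified (for example, a line fit to strongly curved data), the held-out errors would be large and systematic rather than small and scattered, which is exactly the signal cross-validation is meant to expose.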
 
==Overview==