Statistical model validation
{{Short description|Evaluating whether a chosen statistical model is appropriate or not}}
{{Redirect|Model validation|the investment banking role|Quantitative analysis (finance) #Model validation}}
In [[statistics]], '''model validation''' is the task of evaluating whether a chosen [[statistical model]] is appropriate. In statistical inference, a model that appears to fit its data well may do so by chance, leading researchers to overestimate the model's actual relevance. To guard against this, model validation tests whether a statistical model holds up under perturbations of the data. Model validation is also called '''model criticism''' or '''model evaluation.'''

This topic is not to be confused with the closely related task of [[model selection]], the process of discriminating between multiple candidate models: model validation is concerned not with the conceptual design of a model but only with the consistency between the chosen model and its stated outputs.
 
There are many ways to validate a model. [[Residual sum of squares|Residual plots]] show the differences between the actual data and the model's predictions: correlations in the residual plots may indicate a flaw in the model. [[Cross-validation (statistics)|Cross validation]] is a method of model validation that iteratively refits the model, each time leaving out a small sample and checking whether the held-out samples are predicted by the model: there are [[Cross-validation (statistics)#Types|many kinds of cross validation]]. [[Predictive modelling|Predictive simulation]] is used to compare simulated data to actual data. [[External validity|External validation]] involves fitting the model to new data. The [[Akaike information criterion]] estimates the quality of a model.
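The cross-validation idea mentioned above can be sketched in a few lines of plain Python. This is an illustrative sketch only — the helper names (`k_fold_cv`, `fit_line`, `predict_line`) are invented for this example, not taken from any library — showing a k-fold scheme in which the model is refit with each fold held out and scored on the held-out points:

```python
import random

def k_fold_cv(xs, ys, k, fit, predict):
    """Estimate out-of-sample error by k-fold cross validation.

    `fit` builds a model from training points; `predict` maps
    (model, x) to a prediction. Both names are illustrative.
    """
    idx = list(range(len(xs)))
    random.Random(0).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    fold_errors = []
    for held_out in folds:
        train = [i for i in idx if i not in held_out]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        sq = [(predict(model, xs[i]) - ys[i]) ** 2 for i in held_out]
        fold_errors.append(sum(sq) / len(sq))
    return sum(fold_errors) / len(fold_errors)  # mean squared error

def fit_line(xs, ys):
    """Ordinary least-squares fit of a straight line y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return (my - b * mx, b)

def predict_line(model, x):
    a, b = model
    return a + b * x

# Noiseless linear data: every fold is predicted almost exactly,
# so the cross-validated error is essentially zero.
xs = list(range(20))
ys = [2.0 * x + 1.0 for x in xs]
cv_error = k_fold_cv(xs, ys, k=5, fit=fit_line, predict=predict_line)
```

With noisy data or a misspecified model, `cv_error` would instead be dominated by the model's out-of-sample prediction error, which is what cross validation is designed to estimate.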
 
==Overview==
Model validation comes in many forms, and the specific method a researcher uses is often constrained by their [[research design]]. In other words, there is no one-size-fits-all method for validating a model. For example, a researcher operating with a very limited set of data, but data about which they have strong prior assumptions, might validate the fit of their model within a Bayesian framework, testing the fit under various prior distributions. By contrast, a researcher with a lot of data who is testing multiple nested models might turn to cross validation, possibly a leave-one-out test. These are two abstract examples, and any actual model validation will have to consider far more intricacies than described here, but they illustrate that model validation methods are always circumstantial.
 
In general, models can be validated using existing data or with new data; both approaches are discussed in the following subsections, along with a note of caution.
Validation based on existing data involves analyzing the [[goodness of fit]] of the model or analyzing whether the [[Errors and residuals|residuals]] seem to be random (i.e. [[#Residual diagnostics|residual diagnostics]]). This method analyzes the model's closeness to the data in order to understand how well the model predicts its own data. One example of this method is in Figure 1, which shows a polynomial function fit to some data. We see that the polynomial function does not conform well to the data, which appear linear, and this might invalidate the polynomial model.
 
Commonly, statistical models on existing data are validated using a validation set, which may also be referred to as a holdout set. A validation set is a set of data points that the user leaves out when fitting a statistical model. After the statistical model is fitted, the validation set is used as a measure of the model's error. If the model fits well on the initial data but has a large error on the validation set, this is a sign of [[overfitting]].
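The holdout idea can be illustrated with a deliberately overfit "model" that simply memorizes its training points. This is a sketch under stated assumptions — the `memorizer` model and the 80/20 split are invented for illustration — showing the signature of overfitting: zero error on the data used for fitting, but substantial error on the validation set:

```python
import random

rng = random.Random(42)
xs = [i / 10 for i in range(100)]
ys = [3.0 * x + rng.gauss(0, 1.0) for x in xs]  # straight line plus noise

# Random split: 80 training points, 20-point validation (holdout) set.
idx = list(range(100))
rng.shuffle(idx)
train, val = idx[:80], idx[80:]

def memorizer(x):
    """Overfit 'model': return the y of the nearest training point,
    memorizing the noise rather than learning the trend."""
    j = min(train, key=lambda i: abs(xs[i] - x))
    return ys[j]

def mse(points):
    errs = [(memorizer(xs[i]) - ys[i]) ** 2 for i in points]
    return sum(errs) / len(errs)

train_error = mse(train)  # exactly zero: every training point is memorized
val_error = mse(val)      # positive: the memorized noise does not generalize
```

The large gap between `train_error` and `val_error` is precisely the signal described above: a model that fits its own data perfectly but errs on held-out data is overfit.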
 
[[Image:Overfitted Data.png|thumb|300px|Data (black dots), which was generated via the straight line and some added noise, is perfectly fitted by a curvy [[polynomial]].]]
 
==Methods for validating==
When doing a validation, there are three notable causes of potential difficulty, according to the ''[[Encyclopedia of Statistical Sciences]]''.<ref name="ESS06">{{citation| first= M. L. | last= Deaton | title= Simulation models, validation of | encyclopedia= [[Encyclopedia of Statistical Sciences]] | editor1-first= S. | editor1-last= Kotz | editor1-link= Samuel Kotz |display-editors=etal | year= 2006 | publisher= [[Wiley (publisher)|Wiley]]}}.</ref> The three causes are these: lack of data; lack of control of the input variables; uncertainty about the underlying [[Probability distribution|probability distributions]] and correlations. The usual methods for dealing with difficulties in validation include the following: checking the assumptions made in constructing the model; examining the available data and related model outputs; applying expert judgment.<ref name="NRC12" /> Note that expert judgment commonly requires expertise in the application area.<ref name="NRC12">{{citation | chapter= Chapter 5: Model validation and prediction | chapter-url= https://www.nap.edu/read/13395/chapter/7 | author= [[National Academies of Sciences, Engineering, and Medicine|National Research Council]] | year= 2012 | title= Assessing the Reliability of Complex Models: Mathematical and statistical foundations of verification, validation, and uncertainty quantification | ___location= Washington, DC | publisher= [[National Academies Press]] | pages= 52–85 | doi= 10.17226/13395 | isbn= 978-0-309-25634-6 }}.</ref>
 
Expert judgment can sometimes be used to assess the validity of a prediction ''without'' obtaining real data: e.g. for the curve in Figure&nbsp;1, an expert might well be able to assess that a substantial extrapolation will be invalid. Additionally, expert judgment can be used in [[Turing test|Turing]]-type tests, where experts are presented with both real data and related model outputs and then asked to distinguish between the two.<ref name= "MB93">{{citation | author1-first= D. G. | author1-last=Mayer | author2-first= D.G. | author2-last= Butler | title= Statistical validation | journal= [[Ecological Modelling]] | year= 1993 | volume= 68 | issue=1–2 | pages= 21–32 | doi= 10.1016/0304-3800(93)90105-2}}.</ref>