Statistical model validation

=== Validation with Existing Data ===
Validation based on existing data involves analyzing the [[goodness of fit]] of the model or analyzing whether the [[Errors and residuals|residuals]] seem to be random (i.e. [[#Residual diagnostics|residual diagnostics]]). This method involves analyzing the model's closeness to the data in order to understand how well the model predicts the data it was fitted to. One example of this method is in Figure 1, which shows a polynomial function fit to some data. The polynomial function does not conform well to the data, which appears linear, and this might invalidate the polynomial model.
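Residual diagnostics can be sketched in a few lines. Below is an illustrative example (the data and helper names are not from the article): a straight line is fitted by ordinary least squares to roughly linear data, and the residuals are then examined. For a well-specified model the residuals should look like random noise, centred near zero with no trend.

```python
# Minimal sketch of residual diagnostics: fit a straight line by
# ordinary least squares, then inspect the residuals for structure.
# The data and function names here are illustrative only.

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x (pure Python)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) \
        / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return a, b

xs = [0, 1, 2, 3, 4, 5]
ys = [0.1, 1.9, 4.2, 5.8, 8.1, 9.9]  # roughly linear data

a, b = fit_line(xs, ys)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

# The OLS residuals sum to (numerically) zero by construction; what
# matters for diagnostics is whether they show any pattern in x.
print(round(abs(sum(residuals)), 6))  # → 0.0
```

If the same diagnostic were run on the curvy polynomial of Figure 1 fitted to genuinely linear data, the residuals of a competing linear fit would be small and patternless, while extrapolations of the polynomial would miss badly.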
 
Commonly, statistical models on existing data are validated using a validation set, which may also be referred to as a holdout set. A validation set is a set of data points that the user leaves out when fitting a statistical model. After the statistical model is fitted, the validation set is used as a measure of the model's error. If the model fits well on the initial data but has a large error on the validation set, this is a sign of overfitting, as seen in Figure 1.
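The holdout idea can be demonstrated with a small sketch, under assumptions not taken from the article: the data are a noisy straight line, one model is an interpolating polynomial through the training points (which overfits, like the curve in Figure 1), and the other is a simple line.

```python
# Hedged sketch of holdout validation: fit two models on a training set,
# then compare their errors on a held-out validation set.
import random

random.seed(0)
data = [(x, 2 * x + random.gauss(0, 0.5)) for x in range(10)]
train, holdout = data[:7], data[7:]   # last 3 points are the holdout set

def lagrange_predict(points, x):
    """Interpolating polynomial through all training points (fits the noise)."""
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        term = yi
        for j, (xj, _) in enumerate(points):
            if i != j:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

def mse(model, points):
    return sum((y - model(x)) ** 2 for x, y in points) / len(points)

poly = lambda x: lagrange_predict(train, x)
line = lambda x: 2 * x  # simple model close to the true trend

# The interpolating polynomial has essentially zero training error but a
# far larger holdout error than the line: the signature of overfitting.
print(mse(poly, train) < 1e-6, mse(poly, holdout) > mse(line, holdout))
```

The gap between the near-zero training error and the large holdout error is exactly the warning sign described above.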
 
[[Image:Overfitted Data.png|thumb|300px|Figure 1.  Data (black dots), which was generated via the straight line and some added noise, is perfectly fitted by a curvy [[polynomial]].]]
If new data becomes available, an existing model can be validated by assessing whether the new data is predicted by the old model. If the new data is not predicted by the old model, then the model might not be valid for the researcher's goals.
 
With this in mind, a modern approach to validating a neural network is to test its performance on ___domain-shifted data, which ascertains whether the model has learned ___domain-invariant features.<ref>{{Cite journal |last=Feng |first=Cheng |last2=Zhong |first2=Chaoliang |last3=Wang |first3=Jie |last4=Zhang |first4=Ying |last5=Sun |first5=Jun |last6=Yokota |first6=Yasuto |date=July 2022 |title=Learning Unforgotten Domain-Invariant Representations for Online Unsupervised Domain Adaptation |url=http://dx.doi.org/10.24963/ijcai.2022/410 |journal=Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence |___location=California |publisher=International Joint Conferences on Artificial Intelligence Organization |doi=10.24963/ijcai.2022/410}}</ref>
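The idea can be illustrated with a toy sketch (not taken from the cited paper): a model's accuracy is compared on in-___domain data and on the same data after a ___domain shift. A large gap suggests the model relied on ___domain-specific features rather than ___domain-invariant ones.

```python
# Illustrative sketch: compare accuracy on source-___domain data with
# accuracy on ___domain-shifted data. All names and data are hypothetical.

def accuracy(model, samples):
    return sum(model(x) == y for x, y in samples) / len(samples)

# Toy 1-D classifier: predicts class 1 when the feature exceeds 0.5.
model = lambda x: 1 if x > 0.5 else 0

# Source ___domain: the feature as seen during training.
source = [(0.2, 0), (0.4, 0), (0.7, 1), (0.9, 1)]

# Shifted ___domain: the same concept, but the feature is offset by +0.4
# (e.g. a brightness change). A ___domain-invariant model would normalise
# the shift away; this threshold model does not.
shifted = [(x + 0.4, y) for x, y in source]

print(accuracy(model, source), accuracy(model, shifted))  # → 1.0 0.5
```

The drop from perfect in-___domain accuracy to chance-level shifted accuracy is the kind of evidence that the model's features are not ___domain-invariant.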
 
=== A Note of Caution ===
 
==Methods for validating==
When doing a validation, there are three notable causes of potential difficulty, according to the ''[[Encyclopedia of Statistical Sciences]]''.<ref name="ESS06">{{citation| first= M. L. | last= Deaton | title= Simulation models, validation of | encyclopedia= [[Encyclopedia of Statistical Sciences]] | editor1-first= S. | editor1-last= Kotz | editor1-link= Samuel Kotz |display-editors=etal | year= 2006 | publisher= [[Wiley (publisher)|Wiley]]}}.</ref> The three causes are these: lack of data; lack of control of the input variables; uncertainty about the underlying probability distributions and correlations. The usual methods for dealing with difficulties in validation include the following: checking the assumptions made in constructing the model; examining the available data and related model outputs; applying expert judgment.<ref name="NRC12" /> Note that expert judgment commonly requires expertise in the application area.<ref name="NRC12">{{citation | chapter= Chapter 5: Model validation and prediction | chapter-url= https://www.nap.edu/read/13395/chapter/7 | author= [[National Academies of Sciences, Engineering, and Medicine|National Research Council]] | year= 2012 | title= Assessing the Reliability of Complex Models: Mathematical and statistical foundations of verification, validation, and uncertainty quantification | ___location= Washington, DC | publisher= [[National Academies Press]] | pages= 52–85 | doi= 10.17226/13395 | isbn= 978-0-309-25634-6 }}. </ref>
 
Expert judgment can sometimes be used to assess the validity of a prediction ''without'' obtaining real data: e.g. for the curve in Figure&nbsp;1, an expert might well be able to assess that a substantial extrapolation will be invalid. Additionally, expert judgment can be used in [[Turing test|Turing]]-type tests, where experts are presented with both real data and related model outputs and then asked to distinguish between the two.<ref name= "MB93">{{citation | author1-first= D. G. | author1-last=Mayer | author2-first= D.G. | author2-last= Butler | title= Statistical validation | journal= [[Ecological Modelling]] | year= 1993 | volume= 68 | issue=1–2 | pages= 21–32 | doi= 10.1016/0304-3800(93)90105-2}}.</ref>
 
=== Cross validation ===
{{Further|Cross-validation (statistics)}}
Cross validation is a resampling method that leaves some parts of the data out of the fitting process and then checks whether the left-out data are close to the model's predictions for them. In practice, cross-validation techniques fit the model many times, each time on a different portion of the data, and compare each fit against the portion that was held out. If the fitted models rarely describe the data they were not trained on, the model is probably misspecified.
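The procedure above can be sketched as k-fold cross-validation in plain Python. The fold assignment, the mean-only "model", and the toy data are illustrative assumptions, not part of the article.

```python
# Minimal k-fold cross-validation sketch: repeatedly hold out one fold,
# fit on the remaining k-1 folds, and score on the held-out fold.

def k_fold_mse(data, k):
    """Average held-out squared error of a model that predicts the training mean."""
    folds = [data[i::k] for i in range(k)]  # round-robin split into k folds
    errors = []
    for i in range(k):
        holdout = folds[i]
        train = [y for j, fold in enumerate(folds) if j != i for y in fold]
        pred = sum(train) / len(train)       # "fit" the model on the k-1 folds
        errors += [(y - pred) ** 2 for y in holdout]
    return sum(errors) / len(errors)

data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
print(round(k_fold_mse(data, k=3), 3))  # → 3.75
```

Because every data point is held out exactly once, the averaged error estimates how the model generalises to data it was not fitted on, which is the quantity the paragraph above describes.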