Statistical model validation

[[Image:Overfitted Data.png|thumb|300px|Figure 1.  Data (black dots), which was generated via the straight line and some added noise, is perfectly fitted by a curvy [[polynomial]].]]
 
Validation based only on the first type (data that was used in the construction of the model) is often inadequate. An extreme example is illustrated in Figure 1. The figure displays data (black dots) that was generated via a straight line plus noise. The figure also displays a curve, which is a [[polynomial]] chosen to fit the data perfectly. The residuals for the curve are all zero. Hence validation based on only the first type of data would conclude that the curve was a good model. Yet the curve is obviously a poor model: interpolation, especially between −5 and −4, would tend to be highly misleading; moreover, any substantial extrapolation would be bad.
 
Thus, validation is usually not based only on data that was used in the construction of the model; rather, validation usually also employs data that was not used in the construction. In other words, validation usually includes testing some of the model's predictions.
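
The contrast between the two types of data can be illustrated with a minimal sketch in Python. It does not reproduce the data of Figure 1: the underlying line (slope 2, intercept 1), the noise level, the random seed, and the helper name <code>compare_fits</code> are all assumptions chosen for illustration. The sketch fits a straight line and an interpolating polynomial to the same 11 noisy points, then compares residuals on the fitting data with errors on new points that were not used in the fit (midpoints between the original inputs, plus one point just outside each end of the range).

```python
import numpy as np
from numpy.polynomial import Polynomial


def compare_fits(seed=0, noise_sd=1.0):
    """Fit a line and an interpolating polynomial, then score both on
    the fitting data and on held-out data (all values are illustrative)."""
    rng = np.random.default_rng(seed)

    def true_line(x):
        # Hypothetical data-generating line; not taken from Figure 1.
        return 2.0 * x + 1.0

    # Data used in the construction of the models: 11 points on [-5, 5].
    x_train = np.arange(-5.0, 6.0)
    y_train = true_line(x_train) + rng.normal(0.0, noise_sd, x_train.size)

    # Data NOT used in the construction: midpoints (interpolation)
    # and one point beyond each end of the range (extrapolation).
    x_test = np.concatenate([x_train[:-1] + 0.5, [-6.0, 6.0]])
    y_test = true_line(x_test) + rng.normal(0.0, noise_sd, x_test.size)

    results = {}
    # Degree 1 is a straight line; degree 10 passes through all 11 points.
    for name, degree in (("line", 1), ("polynomial", x_train.size - 1)):
        model = Polynomial.fit(x_train, y_train, deg=degree)
        train_residuals = y_train - model(x_train)
        test_errors = y_test - model(x_test)
        results[name] = {
            "max_train_residual": float(np.max(np.abs(train_residuals))),
            "test_rmse": float(np.sqrt(np.mean(test_errors ** 2))),
        }
    return results


if __name__ == "__main__":
    for name, scores in compare_fits().items():
        print(name, scores)
```

Under these assumptions, the polynomial's residuals on the fitting data are zero up to rounding error, so validation restricted to that data would prefer it, whereas its error on the held-out points is expected to be much larger than that of the straight line.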
 
A model can be validated only relative to some application area.<ref name="NRC12" /><ref name="BBKK">{{citation | author1-first= J. J. | author1-last= Batzel | author2-first= M. | author2-last= Bachar | author3-first= J. M. | author3-last= Karemaker | author4-first= F. | author4-last= Kappel | pages= 3–19 | chapter= Chapter 1: Merging mathematical and physiological knowledge | editor1-first= J. J. | editor1-last= Batzel | editor2-first= M. | editor2-last= Bachar | editor3-first= F. | editor3-last= Kappel | title= Mathematical Modeling and Validation in Physiology | publisher= [[Springer Science+Business Media|Springer]] | year= 2013 | doi= 10.1007/978-3-642-32882-4_1}}.</ref> A model that is valid for one application might be invalid for some other applications. As an example, consider the curve in Figure&nbsp;1: if the application only used inputs from the interval [0,&nbsp;2], then the curve might well be an acceptable model.
 
==Methods for validating==