Nonparametric regression

Understanding and using these controls on overfitting is essential to effective modeling with nonparametric regression. Nonparametric regression models can become overfit either by including too many predictors or by using small smoothing parameters (also known as bandwidth or tolerance). These controls matter most in difficult cases, such as small data sets or predictors with clumped distributions.
 
The methods for controlling overfitting differ between NPMR and [[generalized linear model]]ing (GLM). The most popular overfitting controls for GLMs are the [[Akaike information criterion]] (AIC) and the [[Bayesian information criterion]] (BIC) for model selection. The AIC and BIC depend on the number of parameters in a model. Because NPMR models do not have explicit parameters as such, these criteria are not directly applicable to NPMR models. Instead, one can control overfitting by setting a minimum average neighborhood size, a minimum data:predictor ratio, and a minimum improvement required to add a predictor to a model.
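For reference, both criteria penalize the maximized likelihood <math>\hat{L}</math> through the explicit parameter count <math>k</math> (with <math>n</math> observations), which is why they have no direct analogue when no parameter count exists:

```latex
\mathrm{AIC} = 2k - 2\ln\hat{L}, \qquad
\mathrm{BIC} = k\ln n - 2\ln\hat{L}
```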
 
Nonparametric regression models sometimes use an AIC based on the "effective number of parameters".<ref>{{cite book |last=Hastie |first=T. |first2=R. |last2=Tibshirani |first3=J. |last3=Friedman |year=2001 |title=The Elements of Statistical Learning |publisher=Springer |___location=New York |page=205 |isbn=0387952845 }}</ref> This penalizes the measure of fit by the trace of the smoothing matrix, essentially how much each data point contributes to estimating itself, summed across all data points. If, however, leave-one-out cross-validation is used in the model-fitting phase, the trace of the smoothing matrix is always zero, corresponding to zero parameters for the AIC. Thus, NPMR with cross-validation in the model-fitting phase already penalizes the measure of fit, such that the error rate on the training data set is expected to approximate the error rate in a validation data set. In other words, the training error rate approximates the prediction (extra-sample) error rate.
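A minimal sketch of this point, assuming a Gaussian-kernel smoother matrix in Python (the names are illustrative and not taken from any NPMR implementation): the trace of the ordinary smoother matrix gives the effective number of parameters, and zeroing each point's self-weight, as leave-one-out cross-validation does, drives that trace to exactly zero.

```python
import numpy as np

def smoother_matrix(x, bandwidth, loo=False):
    """Row-normalized Gaussian-kernel smoother S, so that y_hat = S @ y."""
    w = np.exp(-0.5 * ((x[:, None] - x[None, :]) / bandwidth) ** 2)
    if loo:
        # Leave-one-out: a data point gets no weight in estimating itself
        np.fill_diagonal(w, 0.0)
    return w / w.sum(axis=1, keepdims=True)

x = np.linspace(0.0, 1.0, 30)
S = smoother_matrix(x, 0.1)            # diagonal entries are the self-weights
S_loo = smoother_matrix(x, 0.1, loo=True)

print("effective parameters:", np.trace(S))       # positive
print("effective parameters (LOO):", np.trace(S_loo))  # exactly zero
```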