{{Short description|Category of regression analysis}}
{{Regression bar}}
'''Nonparametric regression''' is a form of [[regression analysis]] in which the predictor does not take a predetermined form but is constructed according to information derived from the data.
== Definition ==
Nonparametric regression assumes the following relationship, given the random variables <math>X</math> and <math>Y</math>:
:<math>
\mathbb{E}[Y\mid X=x] = m(x),
</math>
where <math>m(x)</math> is some deterministic function. [[Linear regression]] is a restricted case of nonparametric regression where <math>m(x)</math> is assumed to be a linear function of the data.
Sometimes a slightly stronger assumption of additive noise is used:
:<math>
Y = m(X) + U,
</math>
where the random variable <math>U</math> is the "noise term", with mean 0.
Without the assumption that <math>m</math> belongs to a specific parametric family of functions, it is impossible to get an unbiased estimate for <math>m</math>; however, most estimators are [[Consistency_(statistics)|consistent]] under suitable conditions.
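As an illustrative sketch (not from the article itself), the conditional mean <math>m(x) = \mathbb{E}[Y\mid X=x]</math> can be estimated without assuming any parametric form for <math>m</math>, for instance by averaging the responses of observations near <math>x</math>. The data below are synthetic, with <math>m(x) = \sin(2\pi x)</math> and the window width <code>h</code> chosen arbitrarily for the example:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data from Y = m(X) + U with m(x) = sin(2*pi*x) and noise U ~ N(0, 0.1)
x = rng.uniform(0.0, 1.0, size=500)
y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.1, size=500)

def local_average(x0, x, y, h=0.05):
    """Estimate m(x0) = E[Y | X = x0] by averaging responses with |x - x0| < h."""
    mask = np.abs(x - x0) < h
    return y[mask].mean()

estimate = local_average(0.25, x, y)  # true value is m(0.25) = sin(pi/2) = 1
```

The estimator is biased for any fixed window width, but it is consistent: as the sample size grows and <code>h</code> shrinks appropriately, the estimate converges to <math>m(x)</math>.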
== Common nonparametric regression algorithms ==
This is a non-exhaustive list of nonparametric models for regression.
* [[nearest neighbor smoothing]] (see also [[k-nearest neighbors algorithm]])
* [[regression tree|regression trees]]
* [[local regression]]
* [[multivariate adaptive regression splines]]
* [[smoothing spline|smoothing splines]]
* [[Artificial neural network|neural networks]]<ref>{{Cite journal |last=Cherkassky |first=Vladimir |last2=Mulier |first2=Filip |date=1994 |editor-last=Cheeseman |editor-first=P. |editor2-last=Oldford |editor2-first=R. W. |title=Statistical and neural network techniques for nonparametric regression |url=https://link.springer.com/chapter/10.1007/978-1-4612-2660-4_39 |journal=Selecting Models from Data |series=Lecture Notes in Statistics |language=en |___location=New York, NY |publisher=Springer |pages=383–392 |doi=10.1007/978-1-4612-2660-4_39 |isbn=978-1-4612-2660-4|url-access=subscription }}</ref>
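For concreteness, the first entry in the list can be sketched in a few lines: a nearest-neighbor smoother predicts <math>y</math> at a query point as the average response of the <math>k</math> closest training points. This is a minimal illustration with synthetic data; the choice <code>k=5</code> and the absolute-distance metric are arbitrary:

```python
import numpy as np

def knn_regress(x0, x, y, k=5):
    """Predict y at x0 as the mean response of the k nearest training points."""
    idx = np.argsort(np.abs(x - x0))[:k]
    return y[idx].mean()

rng = np.random.default_rng(1)
x = rng.uniform(0.0, 1.0, 200)
y = x ** 2 + rng.normal(0.0, 0.05, 200)

pred = knn_regress(0.5, x, y)  # true value is m(0.5) = 0.25
```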
== Examples ==
=== Gaussian process regression or Kriging ===
{{Main|Gaussian process regression}}
In Gaussian process regression, also known as Kriging, a Gaussian prior is assumed for the regression curve. The errors are assumed to have a [[multivariate normal distribution]] and the regression curve is estimated by its [[posterior mode]]. The Gaussian prior may depend on unknown hyperparameters, which are usually estimated via [[empirical Bayes]].
The hyperparameters typically specify a prior covariance kernel. In case the kernel should also be inferred nonparametrically from the data, the [[Information_field_theory#Critical_filter|critical filter]] can be used.
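A minimal sketch of the posterior computation follows. Here the kernel hyperparameters (length scale, signal variance) and the noise level are fixed by hand rather than estimated via empirical Bayes, and the data are synthetic; since the posterior is Gaussian, its mean coincides with the posterior mode used as the curve estimate:

```python
import numpy as np

def rbf_kernel(a, b, length_scale=0.2, variance=1.0):
    """Squared-exponential covariance kernel (the Gaussian prior over curves)."""
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / length_scale) ** 2)

rng = np.random.default_rng(2)
x_train = np.linspace(0.0, 1.0, 20)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0.0, 0.1, 20)

noise_var = 0.1 ** 2  # assumed known here; often fit by empirical Bayes
K = rbf_kernel(x_train, x_train) + noise_var * np.eye(20)

x_test = np.array([0.25])
k_star = rbf_kernel(x_test, x_train)

# Posterior mean of the regression curve at x_test (equal to the posterior
# mode, since the posterior distribution is Gaussian).
post_mean = k_star @ np.linalg.solve(K, y_train)
```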
=== Kernel regression ===
{{Main|Kernel regression}}
[[File:NonparRegrGaussianKernel.png|thumb| Example of a curve (red line) fit to a small data set (black points) with nonparametric regression using a Gaussian kernel smoother. The pink shaded area illustrates the kernel function applied to obtain an estimate of y for a given value of x. The kernel function defines the weight given to each data point in producing the estimate for a target point.]]{{Unreferenced section|date=August 2020}}
Kernel regression estimates the continuous dependent variable from a limited set of data points by [[Convolution|convolving]] the data points' locations with a [[kernel function]]—approximately speaking, the kernel function specifies how to "blur" the influence of the data points so that their values can be used to predict the value for nearby locations.
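The Nadaraya–Watson form of this idea can be sketched as follows: each observation contributes a weight given by a Gaussian kernel centred at the query point, and the estimate is the weighted average of the responses. The data and the bandwidth value here are illustrative choices, not prescribed by the article:

```python
import numpy as np

def nadaraya_watson(x0, x, y, bandwidth=0.05):
    """Kernel-weighted average: each data point contributes according to a
    Gaussian kernel centred at x0 (a wider bandwidth means more "blurring")."""
    w = np.exp(-0.5 * ((x - x0) / bandwidth) ** 2)
    return np.sum(w * y) / np.sum(w)

rng = np.random.default_rng(3)
x = rng.uniform(0.0, 1.0, 300)
y = np.cos(2 * np.pi * x) + rng.normal(0.0, 0.1, 300)

est = nadaraya_watson(0.5, x, y)  # true value is cos(pi) = -1
```

The bandwidth controls the bias–variance trade-off: a larger value smooths over more distant points (lower variance, higher bias), a smaller value tracks the local data more closely.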
=== Regression trees ===
{{Main|Decision tree learning}}
Decision tree learning algorithms can be applied to learn to predict a dependent variable from data.<ref>{{cite book
|last=Breiman |first=Leo
|last2=Friedman |first2=Jerome H.
|last3=Olshen |first3=Richard A.
|last4=Stone |first4=Charles J.
|year=1984
|title=Classification and Regression Trees
|publisher=Wadsworth & Brooks/Cole Advanced Books & Software
|___location=Monterey, CA
|isbn=978-0-412-04841-8
}}</ref> Although the original Classification And Regression Tree (CART) formulation applied only to predicting univariate data, the framework can be used to predict multivariate data, including time series.<ref>{{Cite journal
| last = Segal
| first = M.R.
| year = 1992
| title = Tree-structured methods for longitudinal data
| journal = Journal of the American Statistical Association
| volume = 87
| issue = 418
| pages = 407–418
| jstor =2290271
| doi = 10.2307/2290271
| publisher = American Statistical Association, Taylor & Francis
}}</ref>
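The core of the CART fitting procedure for regression can be sketched briefly: at each node, choose the split threshold that minimises the summed squared error of the two resulting leaf means, and recurse. This toy implementation (single predictor, synthetic step-function data, arbitrary depth and leaf-size limits) illustrates the idea only:

```python
import numpy as np

def fit_tree(x, y, depth=3, min_leaf=5):
    """Recursively split on the threshold minimising the summed squared
    error of the two leaf means (a sketch of the CART criterion)."""
    if depth == 0 or len(y) <= 2 * min_leaf:
        return y.mean()
    order = np.argsort(x)
    xs, ys = x[order], y[order]
    best = None
    for i in range(min_leaf, len(ys) - min_leaf):
        left, right = ys[:i], ys[i:]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, xs[i])
    _, thr = best
    mask = x < thr
    return (thr,
            fit_tree(x[mask], y[mask], depth - 1, min_leaf),
            fit_tree(x[~mask], y[~mask], depth - 1, min_leaf))

def predict(node, x0):
    """Walk the tree until a leaf (a plain mean value) is reached."""
    while isinstance(node, tuple):
        thr, left, right = node
        node = left if x0 < thr else right
    return node

rng = np.random.default_rng(4)
x = rng.uniform(0.0, 1.0, 400)
y = np.where(x < 0.5, 0.0, 1.0) + rng.normal(0.0, 0.05, 400)

tree = fit_tree(x, y)
pred = predict(tree, 0.75)  # true step value on this side is 1.0
```

Because the fitted function is piecewise constant, trees handle step-like relationships well; multivariate extensions pick both a predictor and a threshold at each split.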
==See also==
* [[Local regression]]
* [[Non-parametric statistics]]
* [[Semiparametric regression]]
==References==
{{Reflist}}
==Further reading==
* {{cite book |last=Bowman |first=A. W. |first2=A. |last2=Azzalini |year=1997 |title=Applied Smoothing Techniques for Data Analysis |publisher=Clarendon Press |___location=Oxford |isbn= }}
* {{cite book |last=Fan |first=J. |first2=I. |last2=Gijbels|author2-link= Irène Gijbels |year=1996 |title=Local Polynomial Modelling and its Applications |___location=Boca Raton |publisher=Chapman and Hall |isbn=0-412-98321-4 |url=https://books.google.com/books?id=BM1ckQKCXP8C }}
* {{cite book |last=Henderson |first=D. J. |first2=C. F. |last2=Parmeter |title=Applied Nonparametric Econometrics |___location=New York |publisher=Cambridge University Press |year=2015 |isbn=978-1-107-01025-3 |url=https://books.google.com/books?id=hD3WBQAAQBAJ }}
* {{cite book |last=Li |first=Q. |first2=J. |last2=Racine |year=2007 |title=Nonparametric Econometrics: Theory and Practice |___location=Princeton |publisher=Princeton University Press |isbn=978-0-691-12161-1 |url=https://books.google.com/books?id=BI_PiWazY0YC }}
* {{cite book |last=Pagan |first=A. |author-link=Adrian Pagan |first2=A. |last2=Ullah |year=1999 |title=Nonparametric Econometrics |___location=New York |publisher=Cambridge University Press |isbn=0-521-35564-8 |url=https://archive.org/details/nonparametriceco00paga |url-access=registration }}
==External links==
{{Commonscat}}
*[http://www.cs.tut.fi/~lasip Scale-adaptive nonparametric regression] (with Matlab software).
{{statistics}}
[[Category:Nonparametric regression| ]]