Multivariate adaptive regression spline

In [[statistics]], '''multivariate adaptive regression splines''' ('''MARS''') is a form of [[regression analysis]] introduced by [[Jerome H. Friedman]] in 1991.<ref>{{Cite journal | last1 = Friedman | first1 = J. H. | doi = 10.1214/aos/1176347963 | title = Multivariate Adaptive Regression Splines | journal = The Annals of Statistics | volume = 19 | issue = 1 | pages = 1–67 | year = 1991 | pmid = | pmc = |mr=1091842 | zbl = 0765.62064 | jstor = 2241837}}</ref> It is a [[non-parametric regression]] technique and can be seen as an extension of [[linear model]]s that automatically models nonlinearities and interactions between variables.
 
The term "MARS" is trademarked and licensed to Salford Systems. To avoid trademark infringement, many open-source implementations of MARS are called "Earth".<ref>[https://cran.r-project.org/web/packages/earth/index.html CRAN Package earth]</ref><ref>[http://orange.biolab.si/blog/2011/12/20/earth-multivariate-adaptive-regression-splines/ Earth - Multivariate adaptive regression splines in Orange (Python machine learning library)]</ref>
 
== The basics ==
(which is the mean of the response values).
 
MARS then repeatedly adds basis functions in pairs to the model. At each step it finds the pair of basis functions that gives the maximum reduction in sum-of-squares [[Errors and residuals in statistics|residual]] error (it is a [[greedy algorithm]]). The two basis functions in the pair are identical except that a different side of a mirrored hinge function is used for each function. Each new basis function consists of a term already in the model (which could perhaps be the intercept term) multiplied by a new hinge function. A hinge function is defined by a variable and a knot, so to add a new basis function, MARS must search over all combinations of the following:
 
1) existing terms (called ''parent terms'' in this context)
2) all variables (to select one for the new hinge function)
3) all values of each variable (for the knot of the new hinge function).
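This search can be sketched as follows. The sketch below uses NumPy and refits each candidate model with plain least squares rather than the fast update technique real implementations use; the function and variable names are illustrative, not taken from any MARS implementation:

```python
import numpy as np

def hinge(x, knot):
    """Mirrored hinge pair: max(0, x - knot) and max(0, knot - x)."""
    return np.maximum(0, x - knot), np.maximum(0, knot - x)

def forward_step(B, X, y):
    """One greedy forward step (illustrative sketch, not Fast MARS).

    B: current basis matrix (n_samples, n_terms); column 0 is the intercept.
    Returns the residual sum of squares and the basis matrix extended by the
    pair of new columns that reduces it the most.
    """
    best_rss, best_cand = np.inf, None
    for parent in range(B.shape[1]):           # 1) existing (parent) terms
        for var in range(X.shape[1]):          # 2) all variables
            for knot in np.unique(X[:, var]):  # 3) all values of each variable
                plus, minus = hinge(X[:, var], knot)
                cand = np.column_stack([B, B[:, parent] * plus,
                                           B[:, parent] * minus])
                coef, *_ = np.linalg.lstsq(cand, y, rcond=None)
                rss = np.sum((y - cand @ coef) ** 2)
                if rss < best_rss:
                    best_rss, best_cand = rss, cand
    return best_rss, best_cand
```

Because the candidate knots are the observed data values, the triple loop makes the brute-force nature of the search explicit.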
 
To calculate the coefficient of each term, MARS applies a linear regression over the terms.
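For a fixed set of terms this is ordinary least squares on the basis matrix. A minimal illustration with made-up data (the knot location and values below are invented for the example):

```python
import numpy as np

# Illustrative data: a one-variable response with a kink at x = 3.
x = np.array([0., 1., 2., 3., 4., 5., 6.])
y = np.array([5., 5., 5., 5., 7., 9., 11.])

# Basis matrix for the model  y ≈ c0 + c1 * max(0, x - 3).
B = np.column_stack([np.ones_like(x), np.maximum(0, x - 3)])

# One linear regression gives the coefficients of every term at once.
coef, *_ = np.linalg.lstsq(B, y, rcond=None)
```

Here the fit is exact and `coef` recovers the intercept 5 and slope 2 of the hinge term.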
 
This process of adding terms continues until the change in residual error is too small to continue or until the maximum number of terms is reached. The maximum number of terms is specified by the user before model building starts.
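The termination logic can be sketched as a loop; `step_fn`, the threshold names, and the default values here are illustrative assumptions, not parameters of any particular implementation:

```python
def forward_pass(step_fn, rss0, max_terms=21, min_improvement=1e-3):
    """Run greedy forward steps until the improvement is too small
    or the user-specified cap on the number of terms is reached.

    step_fn() is assumed to perform one forward step and return the new RSS.
    """
    rss, n_terms = rss0, 1  # start from the intercept-only model
    while n_terms + 2 <= max_terms:
        new_rss = step_fn()
        if rss - new_rss < min_improvement * rss:
            break  # change in residual error too small to continue
        rss, n_terms = new_rss, n_terms + 2  # terms are added in pairs
    return rss, n_terms
```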
 
The search at each step is done in an essentially [[Brute-force search|brute-force]] fashion, but a key aspect of MARS is that, because of the nature of hinge functions, the search can be done relatively quickly using a fast least-squares update technique. The search can be sped up further with a [[Heuristics|heuristic]] that reduces the number of parent terms to consider at each step ("Fast MARS"<ref>[[Friedman, J. H.]] (1993) ''Fast MARS'', Stanford University Department of Statistics, Technical Report 110</ref>).
=== The backward pass ===
 
The forward pass usually builds an [[overfit]] model. (An overfit model has a good fit to the data used to build the model but will not generalize well to new data.) To build a model with better generalization ability, the backward pass prunes the model. It removes terms one by one, deleting the least effective term at each step until it finds the best submodel. Model subsets are compared using the GCV criterion described below.
 
The backward pass has an advantage over the forward pass: at any step it can choose any term to delete, whereas the forward pass at each step can only see the next pair of terms.
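The pruning loop can be sketched as follows, assuming the usual GCV form RSS / (N·(1 − C/N)²) with an effective-parameter count C; the penalty of three effective parameters per non-intercept term is a conventional default, not a value taken from this article:

```python
import numpy as np

def gcv(rss, n, n_params):
    """Generalized cross-validation score (smaller is better)."""
    return rss / (n * (1 - n_params / n) ** 2)

def backward_pass(B, y, penalty=3):
    """Prune terms one at a time, keeping the submodel with the best GCV.

    B: basis matrix whose column 0 is the intercept (never deleted).
    penalty: assumed cost per non-intercept term in the effective-parameter count.
    """
    n = len(y)
    keep = list(range(B.shape[1]))

    def score(cols):
        coef, *_ = np.linalg.lstsq(B[:, cols], y, rcond=None)
        rss = np.sum((y - B[:, cols] @ coef) ** 2)
        enp = len(cols) + penalty * (len(cols) - 1)  # effective number of parameters
        return gcv(rss, n, enp)

    best_cols, best_gcv = keep[:], score(keep)
    while len(keep) > 1:
        # Delete the least effective term: the one whose removal hurts GCV least.
        trial_gcv, worst = min((score([c for c in keep if c != k]), k)
                               for k in keep[1:])
        keep = [c for c in keep if c != worst]
        if trial_gcv < best_gcv:
            best_gcv, best_cols = trial_gcv, keep[:]
    return best_cols
```

Note that, unlike the forward pass, each deletion step may choose any term in the current model, and the best submodel seen at any point during pruning is returned.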
 
The forward pass adds terms in pairs, but the backward pass typically discards one side of the pair and so terms are often not seen in pairs in the final model. A paired hinge can be seen in the equation for <math>\widehat{y}</math> in the first MARS example above; there are no complete pairs retained in the ozone example.
** [http://www.cs.rtu.lv/jekabsons/regression.html ARESLab: Adaptive Regression Splines toolbox for Matlab]
* Python
** [http://orange.biolab.si/blog/2011/12/20/earth-multivariate-adaptive-regression-splines/ Earth - Multivariate adaptive regression splines]
** [https://github.com/jcrudy/py-earth/ py-earth]