Multivariate adaptive regression spline: Difference between revisions

3) all values of each variable (for the knot of the new hinge function).
 
To calculate the coefficient of each term, MARS applies a linear regression over the terms.
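The coefficient fit can be illustrated with a minimal sketch, assuming ordinary least squares solved through the normal equations. The helper names (<code>hinge</code>, <code>solve_linear_system</code>, <code>fit_coefficients</code>) and the toy data are hypothetical and chosen only for illustration; MARS implementations generally use faster and more numerically stable update schemes.

```python
def hinge(x, knot, sign=1):
    """Hinge function max(0, sign * (x - knot))."""
    return max(0.0, sign * (x - knot))


def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are not modified.
    m = [list(map(float, a[i])) + [float(b[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_coefficients(basis_rows, y):
    """Least squares over the terms via the normal equations (B^T B) c = B^T y."""
    p = len(basis_rows[0])
    btb = [[sum(r[i] * r[j] for r in basis_rows) for j in range(p)]
           for i in range(p)]
    bty = [sum(r[i] * yi for r, yi in zip(basis_rows, y)) for i in range(p)]
    return solve_linear_system(btb, bty)


# Toy data generated from y = 1 + 2 * max(0, x - 3), with no noise.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.0 + 2.0 * hinge(x, 3.0) for x in xs]

# Each row holds the value of every term: an intercept and one hinge function.
basis = [[1.0, hinge(x, 3.0)] for x in xs]
coefficients = fit_coefficients(basis, ys)
```

Because the toy data are generated without noise from the same two terms, the fitted coefficients come out at approximately 1 for the intercept and 2 for the hinge term.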
 
This process of adding terms continues until the change in residual error is too small to continue or until the maximum number of terms is reached. The maximum number of terms is specified by the user before model building starts.
 
The search at each step is usually done in a [[Brute-force search|brute-force]] fashion, but a key aspect of MARS is that because of the nature of hinge functions, the search can be done relatively quickly using a fast least-squares update technique. Brute-force search can be sped up by using a [[Heuristics|heuristic]] that reduces the number of parent terms considered at each step ("Fast MARS"<ref>[[Friedman, J. H.]] (1993) ''Fast MARS'', Stanford University Department of Statistics, Technical Report 110</ref>).
 
=== The backward pass ===
 
The forward pass usually [[overfit|overfits]] the model. (An overfit model has a good fit to the data used to build the model but will not generalize well to new data.) To build a model with better generalization ability, the backward pass prunes the model. It removes terms one by one, deleting the least effective term at each step until it finds the best submodel. Model subsets are compared using the generalized cross-validation (GCV) criterion described below.
 
The backward pass has an advantage over the forward pass: at any step it can choose any term to delete, whereas the forward pass at each step can only see the next pair of terms.
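The pruning loop can be sketched as follows. This is an illustration under stated assumptions, not the procedure of any particular MARS implementation: the names <code>backward_pass</code>, <code>rss_for</code> and <code>toy_rss</code> are hypothetical, the "least effective" term is taken to be the one whose removal leaves the smallest training RSS, the first term (the intercept) is assumed never to be deleted, and the RSS values are made-up numbers. The <code>gcv</code> function follows the GCV formula given in the next section.

```python
def gcv(rss, n_obs, n_terms, penalty=2.0):
    """GCV score, using the effective number of parameters for MARS."""
    effective_params = n_terms + penalty * (n_terms - 1) / 2
    return rss / (n_obs * (1 - effective_params / n_obs) ** 2)


def backward_pass(terms, rss_for, n_obs, penalty=2.0):
    """Delete one term at a time and return the subset with the lowest GCV.

    terms   -- list of term labels; terms[0] is assumed to be the intercept
    rss_for -- callable returning the training RSS of a given subset of terms
    """
    current = list(terms)
    best_subset = list(current)
    best_score = gcv(rss_for(current), n_obs, len(current), penalty)
    while len(current) > 1:
        # Any non-intercept term may be deleted at this step.
        candidates = [[t for t in current if t != drop] for drop in current[1:]]
        # The least effective term is the one whose removal hurts the fit least.
        current = min(candidates, key=rss_for)
        score = gcv(rss_for(current), n_obs, len(current), penalty)
        if score < best_score:
            best_subset, best_score = list(current), score
    return best_subset, best_score


# Made-up RSS reductions per term, for illustration only.
toy_reduction = {"intercept": 0.0, "h1": 60.0, "h2": 25.0, "h3": 0.5, "h4": 0.2}


def toy_rss(subset):
    """Toy RSS: starts at 100 and drops by a fixed amount per included term."""
    return 100.0 - sum(toy_reduction[t] for t in subset)


best_subset, best_score = backward_pass(
    ["intercept", "h1", "h2", "h3", "h4"], toy_rss, n_obs=50
)
```

In this toy setting the two terms that barely reduce the RSS (<code>h3</code> and <code>h4</code>) are pruned, and the selected submodel keeps the intercept, <code>h1</code> and <code>h2</code>.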
 
==== Generalized cross validation ====
{{further|Cross-validation (statistics)|Model selection|Akaike information criterion}}
 
The backward pass compares the performance of model subsets using generalized cross-validation (GCV), a minor variant on the [[Akaike information criterion]] that approximates the [[leave-one-out cross-validation]] score in the special case where errors are Gaussian, or where the squared error loss function is used. Lower values of GCV indicate better models. The GCV is a form of [[Regularization (machine learning)|regularization]]: it trades off goodness-of-fit against model complexity.
 
(We want to estimate how well a model performs on ''new'' data, not on the training data. Such new data is usually not available at the time of model building, so instead we use GCV to estimate what performance would be on new data. The raw [[Residual sum of squares|residual sum-of-squares]] (RSS) on the training data is inadequate for comparing models, because the RSS never decreases as MARS terms are dropped. In other words, if the RSS were used to compare models, the backward pass would always choose the largest model, but the largest model typically does not have the best generalization performance.)
 
The formula for the GCV is
 
: GCV = RSS / (''N'' · (1 − (effective number of parameters) / ''N'')<sup>2</sup>)
where RSS is the residual sum-of-squares measured on the training data and ''N'' is the number of observations (the number of rows in the '''x''' matrix).
 
The effective number of parameters is defined as
 
: (effective number of parameters) = (number of MARS terms) + (penalty) · ((number of MARS terms) − 1) / 2
 
where '''penalty''' is typically 2 (giving results equivalent to the [[Akaike information criterion]]) but can be increased by the user if they so desire.
 
Note that

: ((number of MARS terms) − 1) / 2
 
is the number of hinge-function knots, so the formula penalizes the addition of knots. Thus the GCV formula adjusts (i.e. increases) the training RSS to penalize more complex models. We penalize flexibility because models that are too flexible will model the specific realization of noise in the data instead of just the systematic structure of the data.
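As a worked illustration, the GCV formula can be written as a short function. This is a sketch only: the function names and the example numbers are hypothetical, and the default penalty of 2 is the typical value mentioned above.

```python
def effective_number_of_parameters(n_terms, penalty=2.0):
    """(number of MARS terms) + penalty * ((number of MARS terms) - 1) / 2."""
    return n_terms + penalty * (n_terms - 1) / 2


def gcv(rss, n_obs, n_terms, penalty=2.0):
    """GCV = RSS / (N * (1 - (effective number of parameters) / N)^2)."""
    enp = effective_number_of_parameters(n_terms, penalty)
    if enp >= n_obs:
        # The formula is not meaningful once the penalized size reaches N.
        raise ValueError("effective number of parameters must be below N")
    return rss / (n_obs * (1 - enp / n_obs) ** 2)


# Hypothetical numbers: two models fitted to the same 100 observations.
small_model = gcv(rss=50.0, n_obs=100, n_terms=5)   # 9 effective parameters
large_model = gcv(rss=45.0, n_obs=100, n_terms=21)  # 41 effective parameters
```

With these made-up values the larger model has the lower training RSS but the higher GCV, so the smaller model would be preferred.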
 
Generalized cross-validation is so named because it uses a formula to approximate the error that would be determined by leave-one-out validation. It is just an approximation but works well in practice. GCV was introduced by Craven and [[Grace Wahba|Wahba]] and extended by Friedman for MARS.
 
=== Constraints ===
Line 226 ⟶ 219:
* [[Generalized additive model]]s. Unlike MARS, GAMs fit smooth [[Local regression|loess]] or polynomial [[Spline (mathematics)|splines]] rather than hinge functions, and they do not automatically model variable interactions. The smoother fit and lack of regression terms reduce variance when compared to MARS, but ignoring variable interactions can worsen the bias.
* [[TSMARS]]. Time Series MARS is the term used when MARS models are applied in a time series context. Typically in this setup the predictors are the lagged time series values, resulting in autoregressive spline models. These models, and extensions that include moving average spline models, are described in "Univariate Time Series Modelling and Forecasting using TSMARS: A study of threshold time series autoregressive, seasonal and moving average models using TSMARS".
* [[Bayesian MARS]] (BMARS) uses the same model form, but builds the model using a Bayesian approach. It may arrive at different optimal MARS models because the model building approach is different. The result of BMARS is typically an ensemble of posterior samples of MARS models, which allows for probabilistic prediction.<ref>{{cite journal |last1=Denison |first1=D. G. T. |last2=Mallick |first2=B. K. |last3=Smith |first3=A. F. M. |title=Bayesian MARS |journal=Statistics and Computing |date=1 December 1998 |volume=8 |issue=4 |pages=337–346 |doi=10.1023/A:1008824606259 |s2cid=12570055 |url=https://link.springer.com/content/pdf/10.1023/A:1008824606259.pdf |language=en |issn=1573-1375}}</ref>
 
== See also ==