Content deleted Content added
Kwiki user (talk | contribs) Added citation for some of the pros of MARS method and added "[citation needed]" for some of the additional pros mentioned in those lines since those are not mentioned in the newly added source. This section needs some cleanup and more references |
Link suggestions feature: 3 links added. Tags: Visual edit Mobile edit Mobile web edit Advanced mobile edit Newcomer task Suggested: add links |
||
(36 intermediate revisions by 24 users not shown) | |||
Line 1:
{{Short description|Non-parametric regression technique}}
In [[statistics]], '''multivariate adaptive regression splines''' ('''MARS''') is a form of [[regression analysis]] introduced by [[Jerome H. Friedman]] in 1991.<ref>{{Cite journal | last1 = Friedman | first1 = J. H. | doi = 10.1214/aos/1176347963 | title = Multivariate Adaptive Regression Splines | journal = The Annals of Statistics | volume = 19 | issue = 1 | pages = 1–67 | year = 1991 |mr=1091842 | zbl = 0765.62064 | jstor = 2241837| citeseerx = 10.1.1.382.970 }}</ref> It is a [[non-parametric regression]] technique and can be seen as an extension of [[linear model]]s that automatically models nonlinearities and interactions between variables.
The term "MARS" is trademarked and licensed to Salford Systems. In order to avoid trademark infringements, many open
== The basics ==
Line 28:
: <math>
\
</math>
The hat on the <math>\
a line giving the predicted <math>\
The data at the extremes of ''x'' indicates that the relationship between ''y'' and ''x'' may be non-linear (look at the red dots relative to the regression line at low and high values of ''x''). We thus turn to MARS to automatically build a model taking into account non-linearities. MARS software constructs a model from the given ''x'' and ''y'' as follows
Line 37:
: <math>
\begin{align}
\
& {} + 6.1 \max(0, x - 13) \\
& {} - 3.1 \max(0, 13 - x)
\end{align}
</math>
Line 45:
[[File:Friedmans mars simple model.png|frame|right|A simple MARS model of the same data]]
The figure on the right shows a plot of this function: the predicted <math>\
MARS has automatically produced a kink in the predicted ''y'' to take into account non-linearity. The kink is produced by ''hinge functions''. The hinge functions are the expressions starting with <math>\max</math> (where <math>\max(a,b)</math> is <math>a</math> if <math>a > b</math>, else <math>b</math>). Hinge functions are described in more detail below.
In this simple example, we can easily see from the plot that ''y'' has a non-linear relationship with ''x'' (and might perhaps guess that y varies with the square of ''x''). However, in general there will be multiple [[Dependent and independent variables|independent variables]], and the relationship between ''y'' and these variables will be unclear and not easily visible by plotting. We can use MARS to discover that non-linear relationship.
An example MARS expression with multiple variables is
Line 69 ⟶ 56:
\begin{align}
\mathrm{ozone} = &\ 5.2 \\
&
&
&
&
\end{align}
</math>
[[File:Friedmans mars ozone model.png|frame|right|Variable interaction in a MARS model]]
This expression models air pollution (the ozone level) as a function of the temperature and a few other variables. Note that the last term in the formula (on the last line) incorporates an interaction between <math>\mathrm{wind} </math> and <math>\mathrm{vis}</math>.
The figure on the right plots the predicted <math>\mathrm{ozone}</math> as <math>\mathrm{wind}</math> and <math>\mathrm{vis}</math> vary, with the other variables fixed at their median values. The figure shows that wind does not affect the ozone level unless visibility is low. We see that MARS can build quite flexible regression surfaces by combining hinge functions.
To obtain the above expression, the MARS model building procedure automatically selects which variables to use (some variables are important, others not), the positions of the kinks in the hinge functions, and how the hinge functions are combined.
== The MARS model ==
Line 101 ⟶ 74:
MARS builds models of the form
: <math>\
The model is a weighted sum of basis functions
Line 109 ⟶ 82:
multiplied by its coefficient.
Each [[basis function]] <math>B_i(x)</math> takes one of the following three forms:
1) a constant 1. There is just one such term, [[The Intercept|the intercept]].
In the ozone formula above, the intercept term is 5.2.
2) a ''hinge'' function. A hinge function has the form <math> \max(0, x - \text{constant}) </math> or <math> \max(0, \text{constant} - x) </math>. MARS automatically selects variables and values of those variables for knots of the hinge functions. Examples of such basis functions can be seen in the middle three lines of the ozone formula.
3) a product of two or more hinge functions.
Line 131 ⟶ 94:
== Hinge functions ==
[[File:Friedmans mars hinge functions.png|frame|right|A mirrored pair of hinge functions with a knot at x=3.1]]
{{further|Hinge function}}
: <math>\max(0,x-c)</math>
or
Line 165 ⟶ 128:
(which is the mean of the response values).
MARS then repeatedly adds basis function in pairs to the model. At each step it finds the pair of basis functions that gives the maximum reduction in sum-of-squares [[Errors and residuals in statistics|residual]] error (it is a [[greedy algorithm]]). The two basis functions in the pair are identical except that a different side of a mirrored hinge function is used for each function. Each new basis function consists of a term already in the model (which could perhaps be the intercept term) multiplied by a new hinge function. A hinge function is defined by a variable and a knot, so to add a new basis function, MARS must search over all combinations of the following:
1) existing terms (called ''parent terms'' in this context)
Line 187 ⟶ 136:
3) all values of each variable (for the knot of the new hinge function).
To calculate the coefficient of each term, MARS applies a linear regression over the terms.
This process of adding terms continues until the change in residual error is too small to continue or until the maximum number of terms is reached. The maximum number of terms is specified by the user before model building starts.
The search at each step is usually done in a [[Brute-force search|brute-force]] fashion, but a key aspect of MARS is that because of the nature of hinge functions, the search can be done quickly using a fast least-squares update technique. Brute-force search can be sped up by using a [[Heuristics|heuristic]] that reduces the number of parent terms considered at each step ("Fast MARS"<ref>[[Friedman, J. H.]] (1993) ''Fast MARS'', Stanford University Department of Statistics, Technical Report 110</ref>).
=== The backward pass ===
The forward pass usually [[overfit|overfits]] the model. To build a model with better generalization ability, the backward pass prunes the model, deleting the least effective term at each step until it finds the best submodel. Model subsets are compared using the Generalized cross validation (GCV) criterion described below.
The backward pass has an advantage over the forward pass: at any step it can choose any term to delete, whereas the forward pass at each step can only see the next pair of terms.
The forward pass adds terms in pairs, but the backward pass typically discards one side of the pair and so terms are often not seen in pairs in the final model. A paired hinge can be seen in the equation for <math>\widehat{y}</math> in the first MARS example above; there are no complete pairs retained in the ozone example.
==== Generalized
{{further|Cross-validation (statistics)|Model selection|Akaike information criterion}}
The backward pass
: GCV = RSS / (''N'' · (1 − (effective number of parameters) / ''N'')<sup>2</sup>)
where RSS is the residual sum-of-squares measured on the training data and ''N'' is the number of observations (the number of rows in the '''x''' matrix).
The
: (effective number of parameters) = (number of mars terms) + (penalty) · ((number of Mars terms) − 1 ) / 2
where '''penalty''' is typically 2 (giving results equivalent to the [[Akaike information criterion]]) but can be increased by the user if they so desire.
Note that
: (number of Mars terms − 1 ) / 2
is the number of hinge-function knots, so the formula penalizes the addition of knots. Thus the GCV formula adjusts (i.e. increases) the training RSS to penalize more complex models. We penalize flexibility because models that are too flexible will model the specific realization of noise in the data instead of just the systematic structure of the data.
=== Constraints ===
Line 293 ⟶ 191:
== Pros and cons ==
*MARS models are simple to understand and interpret.<ref name=":0">{{Cite book|title=Applied Predictive Modeling|last1=Kuhn|first1=Max|last2=Johnson|first2=Kjell|date=2013|publisher=Springer New York|isbn=9781461468486|___location=New York, NY|language=en|doi=10.1007/978-1-4614-6849-3}}</ref>
*MARS can handle both continuous and [[categorical data]].<ref>{{cite book | last=Friedman | first=Jerome H. | chapter=Estimating Functions of Mixed Ordinal and Categorical Variables Using Adaptive Splines | author-link=Friedman, J. H.|year=1993|title=New Directions in Statistical Data Analysis and Robustness |editor=Stephan Morgenthaler |editor2=Elvezio Ronchetti |editor3=Werner Stahel|publisher=Birkhauser}}</ref><ref name="Friedman 1991">{{cite journal | last=Friedman | first=Jerome H. | title=Estimating Functions of Mixed Ordinal and Categorical Variables Using Adaptive Splines | website=DTIC | date=1991-06-01 | url=https://apps.dtic.mil/sti/citations/ADA590939 | archive-url=https://web.archive.org/web/20220411085148/https://apps.dtic.mil/sti/citations/ADA590939 | url-status=live | archive-date=April 11, 2022 | access-date=2022-04-11}}</ref>
*MARS (like recursive partitioning) does automatic [[Feature selection|variable selection]] (meaning it includes important variables in the model and excludes unimportant ones). However, there can be some arbitrariness in the selection, especially when there are correlated predictors, and this can affect interpretability.<ref name=":0" />
*Building MARS models often requires little or no data preparation.<ref name=":0" />
* [https://web.stat.tamu.edu/~bmallick/wileybook/book_code.html Code] from the book ''Bayesian Methods for Nonlinear Classification and Regression''<ref>{{cite book |last1=Denison |first1=D. G. T. |last2=Holmes |first2=C. C. |last3=Mallick |first3=B. K. |last4=Smith |first4=A. F. M. |title=Bayesian methods for nonlinear classification and regression |date=2002 |publisher=Wiley |___location=Chichester, England |isbn=978-0-471-49036-4}}</ref> for Bayesian MARS.
== Extensions and related concepts ==
* [[Generalized linear model]]s (GLMs) can be incorporated into MARS models by applying a link function after the MARS model is built. Thus, for example, MARS models can incorporate [[logistic regression]] to predict probabilities.
* [[Nonlinear regression|Non-linear regression]] is used when the underlying form of the function is known and regression is used only to estimate the parameters of that function. MARS, on the other hand, estimates the functions themselves, albeit with severe constraints on the nature of the functions. (These constraints are necessary because discovering a model from the data is an [[inverse problem]] that is not [[Well-posed problem|well-posed]] without constraints on the model.)
* [[Recursive partitioning]] (commonly called CART). MARS can be seen as a generalization of recursive partitioning that allows
* [[Generalized additive model]]s.
* [[TSMARS]]. Time Series Mars is the term used when MARS models are applied in a [[time series]] context. Typically in this set up the predictors are the lagged time series values resulting in autoregressive spline models. These models and extensions to include moving average spline models are described in "Univariate Time Series Modelling and Forecasting using TSMARS: A study of threshold time series autoregressive, seasonal and moving average models using TSMARS".
* [[Bayesian MARS]] (BMARS) uses the same model form, but builds the model using a Bayesian approach. It may arrive at different optimal MARS models because the model building approach is different. The result of BMARS is typically an ensemble of posterior samples of MARS models, which allows for probabilistic prediction.<ref>{{cite journal |last1=Denison |first1=D. G. T. |last2=Mallick |first2=B. K. |last3=Smith |first3=A. F. M. |title=Bayesian MARS |journal=Statistics and Computing |date=1 December 1998 |volume=8 |issue=4 |pages=337–346 |doi=10.1023/A:1008824606259 |s2cid=12570055 |url=https://link.springer.com/content/pdf/10.1023/A:1008824606259.pdf |language=en |issn=1573-1375}}</ref>
== See also ==
Line 341 ⟶ 223:
* Berk R.A. (2008) ''Statistical learning from a regression perspective'', Springer, {{ISBN|978-0-387-77500-5}}
[[Category:Nonparametric regression]]
|