{{Short description|Non-parametric regression technique}}
In [[statistics]], '''multivariate adaptive regression splines''' ('''MARS''') is a form of [[regression analysis]] introduced by [[Jerome H. Friedman]] in 1991.<ref>{{Cite journal | last1 = Friedman | first1 = J. H. | doi = 10.1214/aos/1176347963 | title = Multivariate Adaptive Regression Splines | journal = The Annals of Statistics | volume = 19 | issue = 1 | pages = 1–67 | year = 1991 }}</ref>
The term "MARS" is trademarked and licensed to Salford Systems. To avoid trademark infringement, many open-source implementations of MARS are called "Earth".<ref>[https://cran.r-project.org/web/packages/earth/index.html CRAN Package earth]</ref><ref>[http://orange.biolab.si/blog/2011/12/20/earth-multivariate-adaptive-regression-splines/ Earth – Multivariate adaptive regression splines in Orange (Python machine learning library)]</ref>
Each [[basis function]] <math>B_i(x)</math> takes one of the following three forms:
1) a constant 1. There is just one such term, the intercept.
In the ozone formula above, the intercept term is 5.2.
== Hinge functions ==
[[File:Friedmans mars hinge functions.png|frame|right|A mirrored pair of hinge functions with a knot at x=3.1]]
{{further|Hinge function}}
: <math>\max(0,x-c)</math>
or
: <math>\max(0,c-x)</math>
where <math>c</math> is a constant, called the knot.
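As an illustration only, the mirrored pair shown in the figure can be evaluated with a few lines of Python. The function name <code>hinge</code> and its <code>mirrored</code> flag are hypothetical and are not taken from any MARS package.

```python
def hinge(x, knot, mirrored=False):
    """Return max(0, x - knot), or max(0, knot - x) when mirrored is True."""
    return max(0.0, knot - x) if mirrored else max(0.0, x - knot)


# Evaluate the mirrored pair from the figure (knot at x = 3.1).
knot = 3.1
for x in (1.0, 3.1, 5.0):
    print(x, hinge(x, knot), hinge(x, knot, mirrored=True))
```

Each function is zero on one side of the knot and linear on the other, so a weighted sum of such terms is piecewise linear.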
3) all values of each variable (for the knot of the new hinge function).
To calculate the coefficient of each term, MARS applies a linear regression over the terms.
This process of adding terms continues until the change in residual error is too small to continue or until the maximum number of terms is reached. The maximum number of terms is specified by the user before model building starts.
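The following Python sketch illustrates one simplified step of this search using NumPy. It assumes that each new hinge pair is added to the model directly rather than being multiplied by an existing term, so it covers only the choice of variable and knot. The function names and the synthetic data are hypothetical, and the exhaustive refit shown here ignores the speed-ups used by real implementations.

```python
import numpy as np


def hinge_pair(x, knot):
    """Return the mirrored pair max(0, x - knot) and max(0, knot - x)."""
    return np.maximum(0.0, x - knot), np.maximum(0.0, knot - x)


def fit_rss(B, y):
    """Least-squares fit of y on the columns of B; return (coefficients, RSS)."""
    coef, *_ = np.linalg.lstsq(B, y, rcond=None)
    resid = y - B @ coef
    return coef, float(resid @ resid)


def forward_step(X, y, B):
    """Try every variable and every observed value as a knot.

    Returns (rss, variable index, knot, new basis matrix) for the hinge pair
    that gives the lowest RSS. Simplification: the new terms are not
    multiplied by terms already in the model.
    """
    best = None
    for j in range(X.shape[1]):
        for knot in np.unique(X[:, j]):
            pos, neg = hinge_pair(X[:, j], knot)
            candidate = np.column_stack([B, pos, neg])
            _, rss = fit_rss(candidate, y)
            if best is None or rss < best[0]:
                best = (rss, j, knot, candidate)
    return best


# Hypothetical data: y bends at x0 = 0.5 and does not depend on x1.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(200, 2))
y = 2.0 + 3.0 * np.maximum(0.0, X[:, 0] - 0.5)

B = np.ones((len(y), 1))  # start from the intercept-only model
rss, var, knot, B = forward_step(X, y, B)
coef, _ = fit_rss(B, y)  # coefficients of intercept and the new pair
print(var, knot, rss, coef)
```

Repeating <code>forward_step</code> until the RSS stops improving, or until a preset number of columns is reached, gives the stopping rule described above.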
The search at each step is usually done in a [[Brute-force search|brute-force]] fashion, but a key aspect of MARS is that because of the nature of hinge functions, the search can be done
=== The backward pass ===
The forward pass usually
The backward pass has an advantage over the forward pass: at any step it can choose any term to delete, whereas the forward pass at each step can only see the next pair of terms.
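A minimal sketch of such a pruning loop is given below in Python with NumPy. It assumes that the basis matrix <code>B</code> holds the intercept in column 0 and that model quality is measured by a user-supplied <code>score</code> function where lower is better. All names are hypothetical, and the score used in the example is a simple stand-in that counts each term as one parameter, not the criterion of any particular implementation.

```python
import numpy as np


def rss_of(B, y):
    """Residual sum of squares of the least-squares fit of y on B's columns."""
    coef, *_ = np.linalg.lstsq(B, y, rcond=None)
    resid = y - B @ coef
    return float(resid @ resid)


def backward_pass(B, y, score):
    """Delete one non-intercept column at a time and keep the best subset.

    At each step the column whose removal gives the lowest RSS is deleted.
    Column 0 is assumed to be the intercept and is never deleted.
    Returns (column indices of the best model, its score).
    """
    n_obs = len(y)
    kept = list(range(B.shape[1]))
    best_cols, best_score = list(kept), score(rss_of(B, y), len(kept), n_obs)
    while len(kept) > 1:
        # Any remaining term may be deleted, not only the most recent one.
        trials = [(rss_of(B[:, [c for c in kept if c != d]], y), d)
                  for d in kept[1:]]
        rss, drop = min(trials)
        kept.remove(drop)
        s = score(rss, len(kept), n_obs)
        if s < best_score:
            best_cols, best_score = list(kept), s
    return best_cols, best_score


# Hypothetical data: y depends on x only; z is an unrelated variable.
rng = np.random.default_rng(1)
x, z = rng.uniform(0.0, 1.0, (2, 300))
y = 1.0 + 4.0 * np.maximum(0.0, x - 0.3) + rng.normal(0.0, 0.1, 300)
B = np.column_stack([
    np.ones_like(x),           # intercept
    np.maximum(0.0, x - 0.3),  # term that actually drives y
    np.maximum(0.0, z - 0.5),  # terms built from the unrelated variable
    np.maximum(0.0, 0.5 - z),
])


def score(rss, n_terms, n_obs):
    """Stand-in criterion: RSS inflated by the number of terms."""
    return rss / (n_obs * (1.0 - n_terms / n_obs) ** 2)


best_cols, best_score = backward_pass(B, y, score)
print(best_cols, best_score)
```

Because every remaining column is a deletion candidate at every step, a term added early in the forward pass can still be removed if later terms make it redundant.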
==== Generalized cross validation ====
{{further|Cross-validation (statistics)|Model selection|Akaike information criterion}}
The backward pass compares the performance of different models using Generalized Cross-Validation (GCV), a minor variant on the [[Akaike information criterion]] that approximates the [[leave-one-out cross-validation]] score in the special case where errors are Gaussian, or where the squared error [[loss function]] is used. GCV was introduced by Craven and [[Grace Wahba|Wahba]] and extended by Friedman for MARS; lower values of GCV indicate better models. The formula for the GCV is
: GCV = RSS / (''N'' · (1 − (effective number of parameters) / ''N'')<sup>2</sup>)
where RSS is the residual sum-of-squares measured on the training data and ''N'' is the number of observations (the number of rows in the '''x''' matrix).
The effective number of parameters is defined as
: (effective number of parameters) = (number of MARS terms) + (penalty) · ((number of MARS terms) − 1) / 2
where '''penalty''' is
Note that
: ((number of MARS terms) − 1) / 2
is the number of hinge-function knots, so the formula penalizes the addition of knots. Thus the GCV formula adjusts (i.e. increases) the training RSS to
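The two formulas can be checked with a short Python sketch. The function names and all numbers are made up for illustration, and <code>penalty</code> is left as a required argument rather than being given a default.

```python
def effective_parameters(n_terms, penalty):
    """(number of MARS terms) + penalty * ((number of MARS terms) - 1) / 2."""
    return n_terms + penalty * (n_terms - 1) / 2


def gcv(rss, n_obs, n_terms, penalty):
    """RSS / (N * (1 - (effective number of parameters) / N) ** 2)."""
    eff = effective_parameters(n_terms, penalty)
    if eff >= n_obs:
        raise ValueError("effective number of parameters must be below N")
    return rss / (n_obs * (1.0 - eff / n_obs) ** 2)


# Hypothetical numbers: two candidate models on N = 100 observations.
# penalty = 3 is an illustrative choice, not a value taken from the text.
small = gcv(rss=50.0, n_obs=100, n_terms=5, penalty=3)   # 11 effective parameters
large = gcv(rss=48.0, n_obs=100, n_terms=11, penalty=3)  # 26 effective parameters
print(small, large)
```

In this made-up comparison the larger model has the lower training RSS but the higher GCV, which is how the criterion discourages adding knots.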
=== Constraints ===
== Pros and cons ==
*MARS can handle both continuous and [[categorical data]].<ref>{{cite book | last=Friedman | first=Jerome H. | chapter=Estimating Functions of Mixed Ordinal and Categorical Variables Using Adaptive Splines | author-link=Friedman, J. H.|year=1993|title=New Directions in Statistical Data Analysis and Robustness |editor=Stephan Morgenthaler |editor2=Elvezio Ronchetti |editor3=Werner Stahel|publisher=Birkhauser}}</ref><ref name="Friedman 1991">{{cite journal | last=Friedman | first=Jerome H. | title=Estimating Functions of Mixed Ordinal and Categorical Variables Using Adaptive Splines | website=DTIC | date=1991-06-01 | url=https://apps.dtic.mil/sti/citations/ADA590939 | archive-url=https://web.archive.org/web/20220411085148/https://apps.dtic.mil/sti/citations/ADA590939 | url-status=live | archive-date=April 11, 2022 | access-date=2022-04-11}}</ref>
*MARS (like recursive partitioning) does automatic [[Feature selection|variable selection]] (meaning it includes important variables in the model and excludes unimportant ones). However, there can be some arbitrariness in the selection, especially when there are correlated predictors, and this can affect interpretability.<ref name=":0" />
*Building MARS models often requires little or no data preparation.<ref name=":0" />
* [https://web.stat.tamu.edu/~bmallick/wileybook/book_code.html Code] from the book ''Bayesian Methods for Nonlinear Classification and Regression''<ref>{{cite book |last1=Denison |first1=D. G. T. |last2=Holmes |first2=C. C. |last3=Mallick |first3=B. K. |last4=Smith |first4=A. F. M. |title=Bayesian methods for nonlinear classification and regression |date=2002 |publisher=Wiley |location=Chichester, England |isbn=978-0-471-49036-4}}</ref> for Bayesian MARS.
*MARS models are simple to understand and interpret.<ref name=":0">{{Cite book|title=Applied Predictive Modeling|last=Kuhn|first=Max|last2=Johnson|first2=Kjell|date=2013|publisher=Springer New York|isbn=9781461468486|location=New York, NY|language=en|doi=10.1007/978-1-4614-6849-3}}</ref> Compare the equation for ozone concentration above to, say, the innards of a trained [[Artificial neural network|neural network]] or a [[random forest]].
== Extensions and related concepts ==
* [[Generalized linear model]]s (GLMs) can be incorporated into MARS models by applying a link function after the MARS model is built. Thus, for example, MARS models can incorporate [[logistic regression]] to predict probabilities.
* [[Nonlinear regression|Non-linear regression]] is used when the underlying form of the function is known and regression is used only to estimate the parameters of that function. MARS, on the other hand, estimates the functions themselves, albeit with severe constraints on the nature of the functions. (These constraints are necessary because discovering a model from the data is an [[inverse problem]] that is not [[Well-posed problem|well-posed]] without constraints on the model.)
* [[Recursive partitioning]] (commonly called CART). MARS can be seen as a generalization of recursive partitioning that allows
* [[Generalized additive model]]s.
* [[TSMARS]]. Time Series MARS is the term used when MARS models are applied in a [[time series]] context. Typically, in this setup the predictors are the lagged time series values, resulting in autoregressive spline models. These models, and extensions that include moving average spline models, are described in "Univariate Time Series Modelling and Forecasting using TSMARS: A study of threshold time series autoregressive, seasonal and moving average models using TSMARS".
* [[Bayesian MARS]] (BMARS) uses the same model form, but builds the model using a Bayesian approach. It may arrive at different optimal MARS models because the model building approach is different. The result of BMARS is typically an ensemble of posterior samples of MARS models, which allows for probabilistic prediction.<ref>{{cite journal |last1=Denison |first1=D. G. T. |last2=Mallick |first2=B. K. |last3=Smith |first3=A. F. M. |title=Bayesian MARS |journal=Statistics and Computing |date=1 December 1998 |volume=8 |issue=4 |pages=337–346 |doi=10.1023/A:1008824606259 |s2cid=12570055 |url=https://link.springer.com/content/pdf/10.1023/A:1008824606259.pdf |language=en |issn=1573-1375}}</ref>
== See also ==
* Denison D.G.T., Holmes C.C., Mallick B.K., and Smith A.F.M. (2002) [http://www.stat.tamu.edu/~bmallick/wileybook/book_code.html ''Bayesian Methods for Nonlinear Classification and Regression''], Wiley, {{ISBN|978-0-471-49036-4}}
* Berk R.A. (2008) ''Statistical learning from a regression perspective'', Springer, {{ISBN|978-0-387-77500-5}}
[[Category:Nonparametric regression]]