Multivariate adaptive regression spline: Difference between revisions

Nhkdhg1
Pros and cons: The resulting fitted function is not smooth (not differentiable along hinges).
Kwiki user
Added citation for some of the pros of MARS method and added "[citation needed]" for some of the additional pros mentioned in those lines since those are not mentioned in the newly added source. This section needs some cleanup and more references
Line 304:
 
*MARS models are more flexible than [[linear regression]] models.
*MARS models are simple to understand and interpret.<ref name=":0">{{Cite book|url=http://link.springer.com/10.1007/978-1-4614-6849-3|title=Applied Predictive Modeling|last=Kuhn|first=Max|last2=Johnson|first2=Kjell|date=2013|publisher=Springer New York|isbn=9781461468486|location=New York, NY|language=en|doi=10.1007/978-1-4614-6849-3}}</ref> Compare the equation for ozone concentration above to, say, the innards of a trained [[Artificial neural network|neural network]] or a [[random forest]].
*MARS can handle both continuous and categorical data.<ref>[[Friedman, J. H.]] (1993) ''Estimating Functions of Mixed Ordinal and Categorical Variables Using Adaptive Splines'', New Directions in Statistical Data Analysis and Robustness (Morgenthaler, Ronchetti, Stahel, eds.), Birkhauser</ref> MARS tends to be better than recursive partitioning for numeric data because hinges are more appropriate for numeric variables than the piecewise constant segmentation used by recursive partitioning.
*Building MARS models often requires little or no data preparation.<ref name=":0" /> The hinge functions automatically partition the input data, so the effect of outliers is contained. In this respect MARS is similar to [[recursive partitioning]], which also partitions the data into disjoint regions, although using a different method. (Nevertheless, as with most statistical modeling techniques, known outliers should be considered for removal before training a MARS model.{{Citation needed|date=March 2019}})
*MARS (like recursive partitioning) does ''automatic [[Feature selection|variable selection]]'' (meaning it includes important variables in the model and excludes unimportant ones). However, bear in mind that variable selection is not a clean problem and there can be some arbitrariness in the selection, especially when there are correlated predictors, and this can affect interpretability.<ref name=":0" />
*MARS models tend to have a good bias-variance trade-off. The models are flexible enough to model non-linearity and variable interactions (thus MARS models have fairly low bias), yet the constrained form of MARS basis functions prevents too much flexibility (thus MARS models have fairly low variance).
*MARS is suitable for handling fairly large datasets. It is a routine matter to build a MARS model from an input matrix with, say, 100 predictors and 10<sup>5</sup> observations. Such a model can be built in about a minute on a 1&nbsp;GHz machine, assuming the maximum degree of interaction of MARS terms is limited to one (i.e. additive terms only). A degree two model with the same data on the same 1&nbsp;GHz machine takes longer—about 12 minutes. Be aware that these times are highly data dependent. Recursive partitioning is much faster than MARS.{{Citation needed|date=March 2019}}
*With MARS models, as with any non-parametric regression, parameter confidence intervals and other checks on the model cannot be calculated directly (unlike [[linear regression]] models). [[Cross-validation (statistics)|Cross-validation]] and related techniques must be used for validating the model instead.
*MARS models do not give as good fits as [[Boosting (meta-algorithm)|boosted]] trees, but can be built much more quickly and are more interpretable. (An 'interpretable' model is in a form that makes it clear what the effect of each predictor is.)
*The <code>earth</code>, <code>mda</code>, and <code>polspline</code> implementations do not allow missing values in predictors, but free implementations of regression trees (such as <code>rpart</code> and <code>party</code>) do allow missing values using a technique called surrogate splits.
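
The hinge functions referred to in several of the points above can be illustrated with a minimal Python sketch. The model, its knot at 10 and its coefficients are made up for illustration and are not taken from the article or from any fitted data. The helper names <code>hinge</code>, <code>mars_like_model</code> and <code>one_sided_slope</code> are likewise hypothetical. The sketch only shows that a sum of hinge terms is continuous at a knot while its slope changes there, which is why the fitted function is not differentiable along hinges.

```python
def hinge(x, knot, direction=+1):
    """Hinge basis function.

    direction=+1 gives max(0, x - knot); direction=-1 gives max(0, knot - x).
    """
    return max(0.0, direction * (x - knot))


def mars_like_model(x):
    """Hypothetical additive MARS-style model with a single knot at x = 10.

    The intercept and coefficients are invented for illustration only.
    """
    return 5.0 + 2.0 * hinge(x, 10.0, +1) - 0.5 * hinge(x, 10.0, -1)


def one_sided_slope(f, x, h):
    """Finite-difference slope of f at x; a negative h gives the left-hand slope."""
    return (f(x + h) - f(x)) / h


# The function value at the knot is just the intercept (both hinges are zero),
# so the two linear pieces meet there, but their slopes differ.
value_at_knot = mars_like_model(10.0)
left_slope = one_sided_slope(mars_like_model, 10.0, -1e-3)
right_slope = one_sided_slope(mars_like_model, 10.0, 1e-3)

print("value at knot:", value_at_knot)
print("left slope:", round(left_slope, 6))
print("right slope:", round(right_slope, 6))
```

In this sketch each hinge term is zero on one side of the knot, so a term only influences predictions in its own region. A piecewise-constant split at the same point would instead produce a jump in the predicted value rather than a change of slope.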