Revision as of 14:50, 1 March 2019 by Nhkdhg1 (→Pros and cons: The resulting fitted function is not smooth (not differentiable along hinges).)
Revision as of 17:47, 23 March 2019 by Kwiki user (Added citation for some of the pros of the MARS method and added "[citation needed]" for some of the additional pros mentioned in those lines, since those are not mentioned in the newly added source. This section needs some cleanup and more references.)
Line 304: MARS models are more flexible than [[linear regression]] models. MARS models are simple to understand and interpret.<ref name=":0">{{Cite book\|url=http://link.springer.com/10.1007/978-1-4614-6849-3\|title=Applied Predictive Modeling\|last=Kuhn\|first=Max\|last2=Johnson\|first2=Kjell\|date=2013\|publisher=Springer New York\|isbn=9781461468486\|location=New York, NY\|language=en\|doi=10.1007/978-1-4614-6849-3}}</ref> Compare the equation for ozone concentration above to, say, the innards of a trained [[Artificial neural network\|neural network]] or a [[random forest]]. MARS can handle both continuous and categorical data.<ref>[[Friedman, J. H.]] (1993) ''Estimating Functions of Mixed Ordinal and Categorical Variables Using Adaptive Splines'', New Directions in Statistical Data Analysis and Robustness (Morgenthaler, Ronchetti, Stahel, eds.), Birkhauser</ref> MARS tends to be better than recursive partitioning for numeric data because hinges are more appropriate for numeric variables than the piecewise-constant segmentation used by recursive partitioning. Building MARS models often requires little or no data preparation.<ref name=":0" /> The hinge functions automatically partition the input data, so the effect of outliers is contained. In this respect MARS is similar to [[recursive partitioning]], which also partitions the data into disjoint regions, although using a different method. (Nevertheless, as with most statistical modeling techniques, known outliers should be considered for removal before training a MARS model.{{Citation needed\|date=March 2019}}) MARS (like recursive partitioning) performs ''automatic [[Feature selection\|variable selection]]'' (meaning it includes important variables in the model and excludes unimportant ones). 
However, ~~bear in mind that variable selection is not a clean problem and there is usually some arbitrariness in the selection, especially in the presence of [[Multicollinearity\|collinearity]] and 'concurvity'.~~ there can be some arbitrariness in the selection, especially when there are correlated predictors, and this can affect interpretability.<ref name=":0" /> MARS models tend to have a good bias-variance trade-off. The models are flexible enough to model non-linearity and variable interactions (thus MARS models have fairly low bias), yet the constrained form of MARS basis functions prevents too much flexibility (thus MARS models have fairly low variance). MARS is suitable for handling fairly large datasets. It is a routine matter to build a MARS model from an input matrix with, say, 100 predictors and 10<sup>5</sup> observations. Such a model can be built in about a minute on a 1 GHz machine, assuming the maximum degree of interaction of MARS terms is limited to one (i.e. additive terms only). A degree two model with the same data on the same 1 GHz machine takes longer, about 12 minutes. Be aware that these times are highly data-dependent. Recursive partitioning is much faster than MARS.{{Citation needed\|date=March 2019}} With MARS models, as with any non-parametric regression, parameter confidence intervals and other checks on the model cannot be calculated directly (unlike [[linear regression]] models). [[Cross-validation (statistics)\|Cross-validation]] and related techniques must be used for validating the model instead. MARS models do not give as good fits as [[Boosting (meta-algorithm)\|boosted]] trees, but can be built much more quickly and are more interpretable. (An 'interpretable' model is in a form that makes it clear what the effect of each predictor is.) 
The <code>earth</code>, <code>mda</code>, and <code>polspline</code> implementations do not allow missing values in predictors, but free implementations of regression trees (such as <code>rpart</code> and <code>party</code>) do allow missing values using a technique called surrogate splits.
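The claim above that hinges suit numeric variables better than piecewise-constant segmentation can be illustrated with a minimal sketch. It is not taken from any of the implementations named here: the synthetic data, the fixed knot at 5, and the names <code>hinge</code>, <code>B_hinge</code> and <code>B_step</code> are assumptions made for illustration, and an actual MARS fit would search for knots rather than being given one.

```python
import numpy as np


def hinge(x, knot):
    """Hinge function max(0, x - knot)."""
    return np.maximum(0.0, x - knot)


# Synthetic data (assumed for illustration): flat up to x = 5, then a linear rise.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 200)
y = 2.0 + 3.0 * hinge(x, 5.0) + rng.normal(scale=0.1, size=x.size)

knot = 5.0  # assumed known here; MARS would select it during the forward pass

# Hinge basis: intercept plus a mirrored pair of hinges at the knot.
B_hinge = np.column_stack([np.ones_like(x), hinge(x, knot), np.maximum(0.0, knot - x)])
coef_hinge, *_ = np.linalg.lstsq(B_hinge, y, rcond=None)
rss_hinge = float(np.sum((y - B_hinge @ coef_hinge) ** 2))

# Piecewise-constant basis: a single split at the same point,
# which is what one split of a regression tree would produce.
B_step = np.column_stack([(x <= knot).astype(float), (x > knot).astype(float)])
coef_step, *_ = np.linalg.lstsq(B_step, y, rcond=None)
rss_step = float(np.sum((y - B_step @ coef_step) ** 2))

print(f"RSS with hinge basis:        {rss_hinge:.2f}")
print(f"RSS with piecewise constant: {rss_step:.2f}")
```

On data of this shape the hinge basis follows the linear rise with one term, whereas a piecewise-constant fit would need many more splits to approximate the same slope. The comparison only illustrates that one point and says nothing about the speed or variable-selection properties discussed above.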

Multivariate adaptive regression spline: Difference between revisions