Revision as of 17:47, 23 March 2019 edit Kwiki user (talk \| contribs) 399 edits Added citation for some of the pros of MARS method and added "[citation needed]" for some of the additional pros mentioned in those lines since those are not mentioned in the newly added source. This section needs some cleanup and more references Tag: Visual edit ← Previous edit		Revision as of 19:34, 23 March 2019 edit undo Michael Hardy (talk \| contribs) Administrators 210,577 edits No edit summary Next edit →
Line 196: is specified by the user before model building starts. The search at each step is done in a [[Brute-force search\|brute-force]] fashion, but a key aspect of MARS is that because of the nature of hinge functions the search can be done relatively quickly using a fast least-squares update technique. Actually, the search is not quite brute force. The search can be sped up with a [[Heuristics\|heuristic]] that reduces the number of parent terms to consider at each step ("Fast MARS"<ref>[[Friedman, J. H.]] (1993) ''Fast MARS'', Stanford University Department of Statistics, Technical Report 110</ref>). ~~The search at each step is done in a [[Brute-force search\|brute force]] fashion,~~ ~~but a key aspect of MARS is that~~ ~~because of the nature of hinge functions~~ ~~the search can be done relatively~~ ~~quickly using a fast least-squares update technique.~~ ~~Actually, the search is not quite brute force.~~ ~~The search can be sped up with a [[Heuristics\|heuristic]]~~ ~~that reduces the number~~ ~~of parent terms to consider at each step~~ ~~("Fast MARS"~~ ~~<ref>[[Friedman, J. H.]] (1993)~~ ~~''Fast MARS'',~~ ~~Stanford University Department of Statistics, Technical Report 110~~ ~~</ref>).~~ === The backward pass === Line 228 ⟶ 215: at each step can only see the next pair of terms. The forward pass adds terms in pairs, but the backward pass typically discards one side of the pair ~~but the backward pass typically discards one side of the pair~~ and so terms are often not seen in pairs in the final model. A paired hinge can be seen in the equation for <math>\hat{y}</math> in the first MARS example above; there are no complete pairs retained in the ozone example. ~~A paired hinge can be seen in~~ ~~the equation for <math>\hat{y}</math> in the~~ ~~first MARS example above;~~ ~~there are no complete pairs retained in the ozone example.~~ ==== Generalized ~~Cross~~cross ~~Validation~~validation ==== {{further\|Cross-validation (statistics)\|Model selection}} The backward pass uses ~~Generalized~~generalized ~~Cross~~cross ~~Validation~~validation (GCV) to compare the performance of model subsets in order to choose the best subset: lower values of GCV are better. The GCV is a form of [[Regularization (machine learning)\|regularization]]: it trades off goodness-of-fit against model complexity. ~~The GCV is a form of~~ ~~[[Regularization (machine learning)\|regularization]]:~~ ~~it trades off goodness-of-fit against model complexity.~~ (We want to estimate how well a model performs on ''new'' data, not on the training data. Such new data is usually not available at the time of model building, so instead we use GCV to estimate what performance would be on new data. The raw [[Residual sum of squares\|residual sum-of-squares]] (RSS) on the training data is inadequate for comparing models, because the RSS always increases as MARS terms are dropped. In other words, if the RSS were used to compare models, the backward pass would always choose the largest model—but the largest model typically does not have the best generalization performance.) The formula for the GCV is ~~:'''GCV = RSS / (N * (1 - EffectiveNumberOfParameters / N)^2)'''~~ : GCV = RSS / (''N'' · (1 − (effective number of parameters) / ''N'')<sup>2</sup>) ~~where '''RSS''' is the residual sum-of-squares~~ ~~measured on the training data and '''N''' is the~~ where RSS is the residual sum-of-squares measured on the training data and ''N'' is the number of observations (the number of rows in the '''x''' matrix). The ''EffectiveNumberOfParameters'' is defined in the MARS context as ~~:'''EffectiveNumberOfParameters = NumberOfMarsTerms + Penalty * (NumberOfMarsTerms - 1 ) / 2'''~~ : (effective number of parameters) = (number of mars terms) + (penalty) · ((number of Mars terms) − 1 ) / 2 ~~where '''Penalty''' is about 2 or 3 (the~~ where '''penalty''' is about 2 or 3 (the MARS software allows the user to preset ~~Penalty~~penalty). Note that ~~:'''(NumberOfMarsTerms - 1 ) / 2'''~~ ~~is the number of hinge-function knots,~~ ~~so the formula penalizes the addition of knots.~~ ~~Thus the GCV formula adjusts (i.e. increases) the training RSS to take into~~ ~~account the flexibility of the model.~~ We penalize flexibility because models that are too flexible will model the specific realization of noise in the data instead of just the systematic structure of the data.▼ : (number of Mars terms − 1 ) / 2 ~~Generalized Cross Validation is so named because~~ ~~it uses a formula to approximate the error~~ ▲is the number of hinge-function knots, so the formula penalizes the addition of knots. Thus the GCV formula adjusts (i.e. increases) the training RSS to take into account the flexibility of the model. We penalize flexibility because models that are too flexible will model the specific realization of noise in the data instead of just the systematic structure of the data. ~~that would be determined by leave-one-out validation.~~ ~~It is just an approximation but works well in practice.~~ Generalized cross-validation is so named because it uses a formula to approximate the error that would be determined by leave-one-out validation. It is just an approximation but works well in practice. GCVs were introduced by Craven and [[Grace Wahba\|Wahba]] and extended by Friedman for MARS. ~~GCVs were introduced by Craven and~~ ~~[[Grace Wahba\|Wahba]] and extended by Friedman for MARS.~~ === Constraints ===

Multivariate adaptive regression spline: Difference between revisions