{{otheruses|Minimisation (disambiguation){{!}}Minimisation}}
'''Structural risk minimization''' (SRM) is an inductive principle of use in [[machine learning]]. It addresses the problem of [[overfitting]] by balancing a model's complexity against its success at fitting the training data. The principle was set out by [[Vladimir Vapnik]] and [[Alexey Chervonenkis]].

In practical terms, structural risk minimization is implemented by minimizing <math>E_{train} + \beta H(W)</math>, where <math>E_{train}</math> is the training error, the function <math>H(W)</math> is called a regularization function, and <math>\beta</math> is a constant. <math>H(W)</math> is chosen such that it takes large values on parameters <math>W</math> that belong to high-capacity subsets of the parameter space. Minimizing <math>H(W)</math> in effect limits the capacity of the accessible subsets of the parameter space, thereby controlling the trade-off between minimizing the training error and minimizing the expected gap between the training error and the test error.<ref>{{Cite web|url=http://vision.stanford.edu/cs598_spring07/papers/Lecun98.pdf|title=Gradient-Based Learning Applied to Document Recognition|last=LeCun|first=Yann}}</ref>
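As an illustrative sketch (not part of the original formulation), the penalized objective can be written down directly for a linear predictor, taking the common but here merely illustrative choice <math>H(W) = \lVert W \rVert^2</math>:

<syntaxhighlight lang="python">
import numpy as np

def regularized_risk(W, X, y, beta):
    """E_train + beta * H(W) for a linear predictor X @ W (illustrative sketch)."""
    e_train = np.mean((X @ W - y) ** 2)  # empirical (training) error: mean squared error
    h = np.sum(W ** 2)                   # H(W): takes large values for high-capacity (large-norm) W
    return e_train + beta * h
</syntaxhighlight>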
The SRM problem can be formulated in terms of data. Given <math>n</math> data points consisting of inputs <math>x</math> and labels <math>y</math>, the objective <math>J(\theta)</math> is often expressed in the following manner:
:<math>J(\theta) = \frac{1}{2n} \sum_{i=1}^{n}(h_{\theta}(x^i) - y^i)^2 + \frac{\lambda}{2} \sum_{j=1}^{d} \theta_j^2 </math>
The first term is the mean squared error (MSE) between the predictions of the learned model, <math>h_{\theta}</math>, and the given labels <math>y</math>. This is the training error, <math>E_{train}</math>, discussed earlier. The second term places a prior over the weights that penalizes large weights. The trade-off coefficient, <math>\lambda</math>, is a hyperparameter that places more or less importance on the regularization term. Larger <math>\lambda</math> shrinks the weights more strongly at the expense of a higher training MSE, while smaller <math>\lambda</math> relaxes the regularization, allowing the model to fit the data more closely. Note that as <math>\lambda \to \infty</math> the weights become zero, and as <math>\lambda \to 0</math> the model typically suffers from overfitting.
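A minimal sketch of minimizing the objective <math>J(\theta)</math> above by gradient descent for a linear model <math>h_\theta(x) = \theta^\top x</math>; the synthetic data, learning rate, and iteration count below are illustrative assumptions, not part of the formulation:

<syntaxhighlight lang="python">
import numpy as np

def fit_srm_linear(X, y, lam=0.1, lr=0.01, steps=1000):
    """Gradient descent on J(theta) = MSE/2 + (lam/2)*||theta||^2 (illustrative sketch)."""
    n, d = X.shape
    theta = np.zeros(d)
    for _ in range(steps):
        residual = X @ theta - y                 # h_theta(x^i) - y^i for all i
        grad = X.T @ residual / n + lam * theta  # gradient of the MSE term plus the L2 term
        theta -= lr * grad
    return theta

# Hypothetical usage on synthetic data:
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=100)
theta_hat = fit_srm_linear(X, y, lam=0.1)
</syntaxhighlight>

Increasing <code>lam</code> in this sketch shrinks the recovered weights toward zero, while <code>lam = 0</code> reduces the procedure to ordinary least-squares fitting.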
==See also==
* [[Support Vector Machines]]
* [[Vapnik–Chervonenkis theory]]
* [[Model selection]]
* [[Occam Learning]]
* [[Empirical risk minimization]]
* [[Ridge regression]]
* [[Regularization (mathematics)]]
==References==
{{Reflist}}
==External links==
* [http://www.svms.org/srm/ Structural risk minimization] at the support vector machines website.
[[Category:Machine learning]]