{{otheruses|Minimisation (disambiguation){{!}}Minimisation}}
'''Structural risk minimization''' (SRM) is an inductive principle of use in [[machine learning]]. It addresses the problem of [[overfitting]] by balancing a model's complexity against its success at fitting the training data. The principle was set out by [[Vladimir Vapnik]] and [[Alexey Chervonenkis]].

In practical terms, structural risk minimization is implemented by minimizing <math>E_{train} + \beta H(W)</math>, where <math>E_{train}</math> is the training error, the function <math>H(W)</math> is called a regularization function, and <math>\beta</math> is a constant. <math>H(W)</math> is chosen such that it takes large values on parameters <math>W</math> that belong to high-capacity subsets of the parameter space. Minimizing <math>H(W)</math> in effect limits the capacity of the accessible subsets of the parameter space, thereby controlling the trade-off between minimizing the training error and minimizing the expected gap between the training error and the test error.<ref>{{Cite web|url=http://vision.stanford.edu/cs598_spring07/papers/Lecun98.pdf|title=Gradient-Based Learning Applied to Document Recognition|last=LeCun|first=Yann}}</ref>
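As an illustrative sketch (not part of the original formulation), the penalized objective can be written down directly for a linear predictor, taking the common but here merely illustrative choice <math>H(W) = \lVert W \rVert^2</math>:

<syntaxhighlight lang="python">
import numpy as np

def regularized_risk(W, X, y, beta):
    """E_train + beta * H(W) for a linear predictor X @ W (illustrative sketch)."""
    e_train = np.mean((X @ W - y) ** 2)  # empirical (training) error: mean squared error
    h = np.sum(W ** 2)                   # H(W): takes large values for high-capacity (large-norm) W
    return e_train + beta * h
</syntaxhighlight>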
The SRM problem can be formulated in terms of data. Given <math>n</math> data points consisting of inputs <math>x</math> and labels <math>y</math>, the objective <math>J(\theta)</math> is often expressed in the following manner:
:<math>J(\theta) = \frac{1}{2n} \sum_{i=1}^{n}(h_{\theta}(x^i) - y^i)^2 + \frac{\lambda}{2} \sum_{j=1}^{d} \theta_j^2 </math>
The first term is the mean squared error (MSE) between the predictions of the learned model, <math>h_{\theta}</math>, and the given labels <math>y</math>. This is the training error, <math>E_{train}</math>, discussed earlier. The second term places a prior over the weights that penalizes large weights. The trade-off coefficient, <math>\lambda</math>, is a hyperparameter that places more or less importance on the regularization term. Larger <math>\lambda</math> shrinks the weights more strongly at the expense of a higher training MSE, while smaller <math>\lambda</math> relaxes the regularization, allowing the model to fit the data more closely. Note that as <math>\lambda \to \infty</math> the weights become zero, and as <math>\lambda \to 0</math> the model typically suffers from overfitting.
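A minimal sketch of minimizing the objective <math>J(\theta)</math> above by gradient descent for a linear model <math>h_\theta(x) = \theta^\top x</math>; the synthetic data, learning rate, and iteration count below are illustrative assumptions, not part of the formulation:

<syntaxhighlight lang="python">
import numpy as np

def fit_srm_linear(X, y, lam=0.1, lr=0.01, steps=1000):
    """Gradient descent on J(theta) = MSE/2 + (lam/2)*||theta||^2 (illustrative sketch)."""
    n, d = X.shape
    theta = np.zeros(d)
    for _ in range(steps):
        residual = X @ theta - y                 # h_theta(x^i) - y^i for all i
        grad = X.T @ residual / n + lam * theta  # gradient of the MSE term plus the L2 term
        theta -= lr * grad
    return theta

# Hypothetical usage on synthetic data:
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=100)
theta_hat = fit_srm_linear(X, y, lam=0.1)
</syntaxhighlight>

Increasing <code>lam</code> in this sketch shrinks the recovered weights toward zero, while <code>lam = 0</code> reduces the procedure to ordinary least-squares fitting.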
==See also==
* [[Support Vector Machines]]
* [[Vapnik–Chervonenkis theory]]
* [[Model selection]]
* [[Occam Learning]]
* [[Empirical risk minimization]]
* [[Ridge regression]]
* [[Regularization (mathematics)]]
==References==
{{Reflist}}
==External links==
* [http://www.svms.org/srm/ Structural risk minimization] at the support vector machines website.
[[Category:Machine learning]]