{{otheruses|Minimisation (disambiguation){{!}}Minimisation}}
'''Structural risk minimization (SRM)''' is an inductive principle of use in [[machine learning]]. Commonly in machine learning, a generalized model must be selected from a finite data set, with the consequent problem of [[overfitting]] &ndash; the model becoming too strongly tailored to the particularities of the training set and generalizing poorly to new data. The SRM principle addresses this problem by balancing the model's complexity against its success at fitting the training data. The principle was first set out in a 1974 book<ref>{{Cite book |last=Vapnik |first=V. N. |last2=Chervonenkis |first2=A. Ya. |title=Teoriya raspoznavaniya obrazov |trans-title=Theory of Pattern Recognition |language=Russian |publisher=Nauka |location=Moscow |year=1974}}</ref> by [[Vladimir Vapnik]] and [[Alexey Chervonenkis]] and uses the [[VC dimension]].
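
In its original formulation, SRM considers a nested sequence of hypothesis classes <math>S_1 \subset S_2 \subset \cdots</math> of increasing VC dimension <math>h_1 \le h_2 \le \cdots</math>, and selects the class (and the empirical-risk minimizer within it) for which a guaranteed bound on the true risk is smallest. For binary classification, one common form of the VC generalization bound states that, with probability at least <math>1 - \eta</math>,

:<math>R(f) \le R_\text{emp}(f) + \sqrt{\frac{h\left(\ln\frac{2n}{h} + 1\right) + \ln\frac{4}{\eta}}{n}},</math>

where <math>n</math> is the number of training samples and <math>h</math> is the VC dimension of the class containing <math>f</math>; the first term favors fitting the data, while the second grows with capacity, making the minimization of the right-hand side a formal statement of the trade-off described above.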
 
In practical terms, structural risk minimization is implemented by minimizing <math>E_\text{train} + \beta H(W)</math>, where <math>E_\text{train}</math> is the training error, the function <math>H(W)</math> is called a regularization function, and <math>\beta</math> is a constant. <math>H(W)</math> is chosen such that it takes large values on parameters <math>W</math> that belong to high-capacity subsets of the parameter space. Minimizing <math>H(W)</math> in effect limits the capacity of the accessible subsets of the parameter space, thereby controlling the trade-off between minimizing the training error and minimizing the expected gap between the training error and the test error.<ref>{{Cite web |url=http://vision.stanford.edu/cs598_spring07/papers/Lecun98.pdf |title=Gradient-Based Learning Applied to Document Recognition |last=LeCun |first=Yann |year=1998}}</ref>
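
A common (though by no means the only) choice of regularization function is the squared parameter norm <math>H(W) = \lVert W \rVert^2</math>, which yields the familiar weight-decay or ridge objective. The following sketch is a minimal illustration of this scheme for a linear model with squared loss; the function names and hyperparameter values are illustrative and not taken from the sources cited above.

<syntaxhighlight lang="python">
import numpy as np

def structural_risk(W, X, y, beta):
    """Penalized objective E_train + beta * H(W), with H(W) = ||W||^2."""
    residual = X @ W - y
    e_train = np.mean(residual ** 2)   # training error E_train
    h = np.sum(W ** 2)                 # regularization function H(W)
    return e_train + beta * h

def fit(X, y, beta=0.1, lr=0.01, steps=1000):
    """Minimize the penalized objective by plain gradient descent."""
    n, d = X.shape
    W = np.zeros(d)
    for _ in range(steps):
        # Gradient of mean squared error plus gradient of beta * ||W||^2.
        grad = 2.0 / n * X.T @ (X @ W - y) + 2.0 * beta * W
        W -= lr * grad
    return W

# Usage: larger beta shrinks W toward zero, restricting effective capacity.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([1.0, -2.0, 0.0, 0.5, 3.0]) + 0.1 * rng.normal(size=100)
W = fit(X, y, beta=0.1)
print(structural_risk(W, X, y, beta=0.1))
</syntaxhighlight>

Varying <math>\beta</math> traces out models of different effective capacity; in practice it is typically selected on held-out data, consistent with the trade-off described above.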
==See also==
* [[Model selection]]
* [[Occam Learning]]
* [[Empirical risk minimization]]
* [[Ridge regression]]
* [[Regularization (mathematics)]]
 
==References==
{{Reflist}}
 
 
{{compu-sci-stub}}
{{machine-learning-stub}}