Regularization
In mathematics and statistics, particularly in the fields of machine learning and inverse problems, regularization involves introducing additional information in order to solve an ill-posed problem or to prevent overfitting. This information usually takes the form of a penalty for complexity, such as a restriction to smooth functions or a bound on the norm in a vector space.
One theoretical justification for regularization is that it constitutes an attempt to impose Occam's razor on the solution. From a Bayesian point of view, many regularization techniques correspond to imposing certain prior probability distributions on the model parameters.
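As a minimal illustration of this correspondence (the Gaussian choice of prior here is an assumed example, not the only one), maximum a posteriori estimation of a linear model's coefficients under a zero-mean Gaussian prior reduces to L2-penalized least squares:

```latex
% MAP estimation: maximize log-likelihood plus log-prior.
\hat{\beta}_{\mathrm{MAP}}
  = \arg\max_{\beta}\,\bigl[\log p(Y \mid X,\beta) + \log p(\beta)\bigr]
% With Gaussian noise of variance \sigma^2 and prior
% p(\beta) \propto \exp\!\bigl(-\lVert\beta\rVert_2^2 / 2\tau^2\bigr),
% this is equivalent to ridge (Tikhonov) regression with \lambda = \sigma^2/\tau^2:
  = \arg\min_{\beta}\,\lVert Y - X\beta\rVert_2^2 + \lambda\,\lVert\beta\rVert_2^2 .
```

A Laplace prior plays the same role for the lasso's L1 penalty.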
The same idea arose in many fields of science. For example, the least-squares method can be viewed as a very simple form of regularization. A simple form of regularization applied to integral equations, generally termed Tikhonov regularization after Andrey Nikolayevich Tikhonov, is essentially a trade-off between fitting the data and reducing a norm of the solution. More recently, non-linear regularization methods, including total variation regularization, have become popular.
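A minimal sketch of this trade-off (the synthetic data and the value of lam below are illustrative assumptions): Tikhonov regularization minimizes ||Ax − b||² + λ||x||², whose solution satisfies the regularized normal equations (AᵀA + λI)x = Aᵀb.

```python
import numpy as np

rng = np.random.default_rng(0)

# Ill-posed synthetic system: two nearly collinear columns.
A = rng.normal(size=(50, 2))
A[:, 1] = A[:, 0] + 1e-6 * rng.normal(size=50)
b = A @ np.array([1.0, 1.0]) + 0.01 * rng.normal(size=50)

def tikhonov(A, b, lam):
    """Minimize ||Ax - b||^2 + lam * ||x||^2 via the normal equations."""
    return np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ b)

x_ls  = tikhonov(A, b, 0.0)   # plain least squares: wildly unstable coefficients
x_reg = tikhonov(A, b, 1e-3)  # regularized: small-norm, stable solution

print("least squares:", x_ls, "norm:", np.linalg.norm(x_ls))
print("Tikhonov:     ", x_reg, "norm:", np.linalg.norm(x_reg))
```

Increasing lam pulls the solution norm down at the cost of a slightly worse fit to b; this fit-versus-norm balance is exactly the trade-off described above.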
Regularization in statistics
In statistics and machine learning, regularization is used to prevent overfitting. Typical examples of regularization in statistical machine learning include ridge regression, the lasso, and the L2-norm penalty in support vector machines.
Regularization methods are also used for model selection, where they work by implicitly or explicitly penalizing models based on the number of their parameters. For example, Bayesian learning methods make use of a prior probability that (usually) gives lower probability to more complex models. Well-known model selection techniques include the Akaike information criterion (AIC), minimum description length (MDL), and the Bayesian information criterion (BIC). Alternative methods of controlling overfitting not involving regularization include cross-validation.
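As a sketch of how such criteria trade fit against parameter count (assuming Gaussian errors, for which the maximized log-likelihood of an OLS fit has a closed form in the residual sum of squares; the data below are invented for illustration):

```python
import numpy as np

def aic_bic(X, y):
    """AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L for an OLS fit with Gaussian errors."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ beta) ** 2)
    loglik = -0.5 * n * (np.log(2 * np.pi * rss / n) + 1)  # sigma^2 = RSS / n plugged in
    return 2 * k - 2 * loglik, k * np.log(n) - 2 * loglik

rng = np.random.default_rng(1)
n = 200
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)               # true relationship is linear

X_lin = np.column_stack([np.ones(n), x])              # 2-parameter model
X_cub = np.column_stack([np.ones(n), x, x**2, x**3])  # 4-parameter model

print("linear AIC, BIC:", aic_bic(X_lin, y))
print("cubic  AIC, BIC:", aic_bic(X_cub, y))
# Both criteria typically assign lower (better) scores to the simpler linear model,
# because the extra parameters of the cubic model barely reduce the RSS.
```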
Examples of applications of different methods of regularization to the linear model (with response $Y$, design matrix $X$, and coefficient vector $\beta$) are:

Model | Fit measure | Entropy measure
---|---|---
AIC/BIC | $\lVert Y - X\beta\rVert_2$ | $\lVert\beta\rVert_0$
Ridge regression | $\lVert Y - X\beta\rVert_2$ | $\lVert\beta\rVert_2$
Lasso[1] | $\lVert Y - X\beta\rVert_2$ | $\lVert\beta\rVert_1$
Basis pursuit denoising | $\lVert Y - X\beta\rVert_2$ | $\lambda\lVert\beta\rVert_1$
RLAD[2] | $\lVert Y - X\beta\rVert_1$ | $\lVert\beta\rVert_1$
Dantzig Selector[3] | $\lVert X^\top(Y - X\beta)\rVert_\infty$ | $\lVert\beta\rVert_1$
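A brief sketch contrasting the ridge and lasso rows (the use of scikit-learn and all parameter values are my assumptions, not part of the article): the L2 penalty shrinks every coefficient smoothly toward zero, while the L1 penalty drives irrelevant coefficients exactly to zero, performing variable selection.

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso

rng = np.random.default_rng(2)
n, p = 100, 10
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[:3] = [3.0, -2.0, 1.5]          # only 3 of the 10 features matter
y = X @ beta_true + 0.5 * rng.normal(size=n)

ridge = Ridge(alpha=1.0).fit(X, y)   # penalizes ||beta||_2
lasso = Lasso(alpha=0.1).fit(X, y)   # penalizes ||beta||_1

print("ridge:", np.round(ridge.coef_, 2))  # all coefficients small but nonzero
print("lasso:", np.round(lasso.coef_, 2))  # irrelevant coefficients exactly 0.0
```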
Notes
- ^ Robert Tibshirani, "Regression Shrinkage and Selection via the Lasso", Journal of the Royal Statistical Society, Series B (Methodological), vol. 58, no. 1, 1996, pp. 267–288. Retrieved 19 March 2009.
- ^ Li Wang, Michael D. Gordon & Ji Zhu, "Regularized Least Absolute Deviations Regression and an Efficient Algorithm for Parameter Tuning", December 2006, pp. 690–700.
- ^ Emmanuel Candès & Terence Tao, "The Dantzig selector: Statistical estimation when p is much larger than n", The Annals of Statistics, vol. 35, no. 6, 2007, pp. 2313–2351, DOI:10.1214/009053606000001523.