Normalization (machine learning): Difference between revisions

{{Short description|Machine learning technique}}
{{Machine learning bar}}
In [[machine learning]], '''normalization''' is a statistical technique with various applications. There are two main forms of normalization, namely ''data normalization'' and ''activation normalization''. Data normalization (or [[feature scaling]]) includes methods that rescale input data so that the [[Feature (machine learning)|features]] have the same range, mean, variance, or other statistical properties. For instance, a popular choice of feature scaling method is [[Feature scaling#Rescaling (min-max normalization)|min-max normalization]], where each feature is transformed to have the same range (typically <math>[0,1]</math> or <math>[-1,1]</math>). This solves the problem of different features having vastly different scales, for example if one feature is measured in kilometers and another in nanometers.
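A minimal NumPy sketch of min-max normalization as described above (the feature matrix and the <code>min_max_normalize</code> helper are illustrative, not part of any standard library):

```python
import numpy as np

# Hypothetical feature matrix: column 0 measured in kilometers,
# column 1 measured in nanometers -- vastly different scales.
X = np.array([[1200.0, 3.0],
              [1500.0, 7.0],
              [ 900.0, 5.0]])

def min_max_normalize(X, low=0.0, high=1.0):
    """Rescale each feature (column) of X to the range [low, high]."""
    X_min = X.min(axis=0)
    X_max = X.max(axis=0)
    X_scaled = (X - X_min) / (X_max - X_min)  # each column now in [0, 1]
    return X_scaled * (high - low) + low      # shift/stretch to [low, high]

X_norm = min_max_normalize(X)  # both columns now share the range [0, 1]
```

After this transformation, both features span the same range, so neither dominates distance- or gradient-based computations purely because of its units.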
 
Activation normalization, on the other hand, is specific to [[deep learning]], and includes methods that rescale the activation of [[Hidden layer|hidden neurons]] inside [[Neural network (machine learning)|neural networks]].
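As an illustrative sketch (not a specific method from the article), rescaling hidden activations to zero mean and unit variance across the feature dimension can be written in NumPy as follows; the <code>normalize_activations</code> helper and the example activations are hypothetical:

```python
import numpy as np

def normalize_activations(h, eps=1e-5):
    """Standardize each sample's hidden activations to zero mean and
    unit variance across the feature dimension (a layer-norm-style
    rescaling, without learnable scale/shift parameters)."""
    mean = h.mean(axis=-1, keepdims=True)
    var = h.var(axis=-1, keepdims=True)
    return (h - mean) / np.sqrt(var + eps)  # eps avoids division by zero

h = np.array([[1.0, 2.0, 3.0, 4.0]])  # activations of one hidden layer
h_norm = normalize_activations(h)     # mean ~0, standard deviation ~1
```

Practical methods such as batch normalization differ mainly in which axis the statistics are computed over and in adding learnable scale and shift parameters.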
 
Normalization is often used to:
 
* increase the speed of training convergence,
* reduce sensitivity to variations and feature scales in input data,
* reduce [[overfitting]],
* and produce better model generalization to unseen data.
 
Normalization techniques are often theoretically justified as reducing internal covariate shift, smoothing optimization landscapes, and increasing [[Regularization (mathematics)|regularization]], though they are mainly justified by empirical success.<ref>{{Cite book |last=Huang |first=Lei |url=https://link.springer.com/10.1007/978-3-031-14595-7 |title=Normalization Techniques in Deep Learning |date=2022 |publisher=Springer International Publishing |isbn=978-3-031-14594-0 |series=Synthesis Lectures on Computer Vision |___location=Cham |language=en |doi=10.1007/978-3-031-14595-7}}</ref>
 
== Batch normalization ==