Pruning (artificial neural network)
{{Short description|Trimming artificial neural networks to reduce computational overhead}}
{{other uses|Pruning (disambiguation)}}
{{Multiple issues|
{{Underlinked|date=August 2020}}
{{Orphan|date=June 2020}}
}}
In [[deep learning]], '''pruning''' is the practice of removing [[parameter]]s from an existing [[Neural network (machine learning)|artificial neural network]].<ref>{{cite arXiv|last1=Blalock|first1=Davis|last2=Ortiz|first2=Jose Javier Gonzalez|last3=Frankle|first3=Jonathan|last4=Guttag|first4=John|date=2020-03-06|title=What is the State of Neural Network Pruning?|class=cs.LG|eprint=2003.03033}}</ref> The goal of this process is to reduce the size (parameter count) of the neural network (and therefore the [[computational resource]]s required to run it) whilst maintaining accuracy. This can be compared to the biological process of [[synaptic pruning]], which takes place in [[Mammal|mammalian]] brains during development.<ref>{{Cite journal |last1=Chechik |first1=Gal |last2=Meilijson |first2=Isaac |last3=Ruppin |first3=Eytan |date=October 1998 |title=Synaptic Pruning in Development: A Computational Account |url=https://ieeexplore.ieee.org/document/6790725 |journal=Neural Computation |volume=10 |issue=7 |pages=1759–1777 |doi=10.1162/089976698300017124 |pmid=9744896 |s2cid=14629275 |issn=0899-7667|url-access=subscription }}</ref>
 
== Node (neuron) pruning ==
In the context of [[artificial neural network]]s, pruning may entail removing individual parameters or parameters in groups, such as all those associated with particular [[artificial neurons|neurons]].<ref>{{cite arXiv|last1=Blalock|first1=Davis|last2=Ortiz|first2=Jose Javier Gonzalez|last3=Frankle|first3=Jonathan|last4=Guttag|first4=John|date=2020-03-06|title=What is the State of Neural Network Pruning?|class=cs.LG|eprint=2003.03033}}</ref> The goal of this process is to maintain the accuracy of the network while increasing its efficiency, reducing the computational resources required to run the [[neural network]]. After a network is trained to a desired solution on the training data, units (hidden-layer nodes or interconnections) are analysed to determine which are not contributing to the solution. Several approaches for identifying non-contributing units are described in the literature; a widely used one is to determine non-essential units by a form of sensitivity analysis. So far, few studies have analysed pruning algorithms for the classification of remotely sensed data.
A basic algorithm for pruning is as follows:<ref>Molchanov, P., Tyree, S., Karras, T., Aila, T., & Kautz, J. (2016). ''Pruning convolutional neural networks for resource efficient inference''. arXiv preprint arXiv:1611.06440.</ref><ref>{{Cite web |last=Gildenblat |first=Jacob |date=2017-06-23 |title=Pruning deep neural networks to make them fast and small |url=http://jacobgil.github.io/deeplearning/pruning-deep-learning |access-date=2024-02-04 |website=Github |language=en}}</ref>
#Evaluate the importance of each neuron.
#Rank the [[neurons]] according to their importance (assuming there is a clearly defined measure for "importance").
#Remove the least important neuron.
#Check a termination condition (to be determined by the user) to see whether to continue pruning.
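The four steps above can be sketched in a few lines of NumPy. The choice of importance measure here (the L1 norm of each neuron's incoming weights) and the termination condition (prune at most half of the neurons) are illustrative assumptions, not part of any particular published method:

```python
import numpy as np

def neuron_importance(weights):
    """Step 1: score each output neuron, here by the L1 norm of its
    incoming weights (one of many possible importance measures)."""
    return np.abs(weights).sum(axis=0)

def prune_step(weights):
    """Steps 2-3: rank neurons and remove (zero out) the least important."""
    scores = neuron_importance(weights)
    scores[scores == 0] = np.inf       # skip neurons already pruned
    least = np.argmin(scores)          # lowest-ranked neuron
    weights[:, least] = 0.0            # remove it
    return weights, least

# Toy layer: 3 inputs, 4 output neurons.
rng = np.random.default_rng(0)
w = rng.normal(size=(3, 4))

# Step 4: keep pruning until a user-chosen termination condition holds
# (here: at most half of the neurons removed).
removed = []
while len(removed) < w.shape[1] // 2:
    w, idx = prune_step(w)
    removed.append(idx)

print("pruned neurons:", removed)
```

In practice each pruning step is usually followed by fine-tuning of the remaining weights, which is omitted here.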
 
== Edge (weight) pruning ==
Most work on neural network pruning focuses on removing weights, namely, setting their values to zero. Early work suggested to also change the values of non-pruned weights.<ref>{{Cite journal |last1=Chechik |first1=Gal |last2=Meilijson |first2=Isaac |last3=Ruppin |first3=Eytan |date=April 2001 |title=Effective Neuronal Learning with Ineffective Hebbian Learning Rules |url=https://ieeexplore.ieee.org/document/6789989 |journal=Neural Computation |volume=13 |issue=4 |pages=817–840 |doi=10.1162/089976601300014367 |pmid=11255571 |s2cid=133186 |issn=0899-7667|url-access=subscription }}</ref>
 
=== Magnitude-based pruning ===
Magnitude-based pruning (MB) is the simplest pruning [[algorithm]]. It deletes interconnections with small '[[Salience (neuroscience)|saliency]]', i.e. those whose deletion will have the least effect on the training error. Here, saliency corresponds to the magnitude (absolute weight value) of an interconnection: interconnections with small magnitudes are assumed to have a minor effect on the performance of the network. After reasonable initial training, the interconnection with the smallest magnitude is removed. The network is then retrained, and the process is repeated iteratively until the training error reaches a certain limit.
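A minimal sketch of the magnitude criterion, pruning a fixed fraction of the smallest-magnitude weights in one shot rather than one at a time (the retraining loop between pruning steps is omitted):

```python
import numpy as np

def magnitude_prune(weights, fraction):
    """Zero out the given fraction of connections with the smallest
    absolute weight (their 'saliency' under the MB criterion)."""
    flat = np.abs(weights).ravel()
    k = int(fraction * flat.size)
    if k == 0:
        return weights
    # k-th smallest magnitude serves as the pruning threshold.
    threshold = np.partition(flat, k - 1)[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask

rng = np.random.default_rng(1)
w = rng.normal(size=(4, 4))
w_pruned = magnitude_prune(w, 0.25)   # remove the 25% smallest weights
print(int((w_pruned == 0).sum()), "of", w.size, "weights removed")
```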
 
=== Optimal Brain Damage ===
The [https://nyuscholars.nyu.edu/en/publications/optimal-brain-damage Optimal Brain Damage] (OBD) algorithm, introduced by [http://yann.lecun.com/exdb/publis/pdf/lecun-90b.pdf LeCun, Denker and Solla] in 1990, is based on second-order derivatives of the error function. The aim is to iteratively delete the weights whose removal will cause the least increase of the error in the network. An important practical problem is the size of the [[Hessian matrix]]: its full calculation is time-consuming, so LeCun et al. (1990) assume that the Hessian is diagonal. On the other hand, Hassibi and Stork (1993) argue that the Hessians of every problem they considered are strongly non-diagonal, which leads OBD to eliminate the wrong weights.
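Under the diagonal-Hessian approximation, OBD's saliency for weight <math>w_k</math> is <math>s_k = \tfrac{1}{2} H_{kk} w_k^2</math>, the estimated increase in error from deleting that weight. A sketch, with hypothetical weight and diagonal-Hessian values for illustration:

```python
import numpy as np

def obd_saliencies(weights, hessian_diag):
    """OBD saliency under the diagonal-Hessian approximation:
    s_k = H_kk * w_k**2 / 2, the estimated error increase
    from deleting weight k."""
    return 0.5 * hessian_diag * weights**2

# Toy example: 5 weights with hypothetical diagonal Hessian entries.
w = np.array([0.8, -0.1, 0.5, -0.9, 0.05])
h = np.array([2.0, 1.5, 0.1, 0.4, 3.0])

s = obd_saliencies(w, h)
to_delete = np.argsort(s)[:2]   # delete the two lowest-saliency weights
w[to_delete] = 0.0
```

Note that a large weight with a small Hessian entry can have lower saliency than a small weight with a large one, which is how OBD differs from purely magnitude-based pruning.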

== See also ==
* [[Knowledge distillation]]
* [[Neural Darwinism]]
 
== References ==
{{Reflist}}
 
 
{{Compu-ai-stub}}