Mathematics of neural networks in machine learning: Difference between revisions

Algorithm: Adding wikilinks
1blulere (talk | contribs)
m Update main template with correct article name; if anyone else wishes to normalise 'ANN' -> 'neural network' throughout the article, feel free to do so
 
(8 intermediate revisions by 8 users not shown)
Line 1:
{{Short description|Type of network}}
{{Main|Neural network (machine learning)}}
 
An '''artificial neural network''' (ANN) or '''neural network''' combines biological principles with advanced statistics to solve problems in domains such as [[pattern recognition]] and game-play. ANNs adopt the basic model of neuron analogues connected to each other in a variety of ways.
 
== Structure ==
Line 23 ⟶ 24:
 
=== Propagation function ===
The ''propagation function'' computes the ''input'' <math>p_j(t)</math> to the neuron <math>j</math> from the outputs <math>o_i(t)</math> and typically has the form<ref name="Zell1994ch5.22">{{Cite book|title=Simulation neuronaler Netze|last=Zell|first=Andreas|date=2003|publisher=Addison-Wesley|isbn=978-3-89319-554-1|edition=1st|language=German|trans-title=Simulation of Neural Networks|chapter=chapter 5.2|oclc=249017987}}</ref>
 
: <math> p_j(t) = \sum_i o_i(t) w_{ij}. </math>
 
=== Bias ===
A bias term can be added, changing the form to the following:<ref name="DAWSON1998">{{cite journal|last1=DAWSON|first1=CHRISTIAN W|year=1998|title=An artificial neural network approach to rainfall-runoff modelling|journal=Hydrological Sciences Journal|volume=43|issue=1|pages=47–66|doi=10.1080/02626669809492102|doi-access=free|bibcode=1998HydSJ..43...47D }}</ref>
 
: <math> p_j(t) = \sum_i o_i(t) w_{ij}+ w_{0j}, </math> where <math>w_{0j}</math> is a bias.
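
As a rough illustration, the propagation function with an optional bias can be sketched in Python. The function name <code>propagate</code>, the list-based representation of outputs and weights, and the sample values are assumptions made for this example and do not come from the cited sources.

```python
def propagate(outputs, weights, bias=0.0):
    """Return p_j = sum_i o_i * w_ij + w_0j for a single neuron j.

    outputs: the values o_i(t) feeding into neuron j
    weights: the weights w_ij, one per incoming connection
    bias:    the bias w_0j; the default 0.0 gives the form without a bias
    """
    if len(outputs) != len(weights):
        raise ValueError("outputs and weights must have the same length")
    return sum(o * w for o, w in zip(outputs, weights)) + bias


# Hypothetical values chosen only for illustration.
p_no_bias = propagate([1.0, 2.0, 3.0], [0.5, -1.0, 0.25])
p_with_bias = propagate([1.0, 2.0, 3.0], [0.5, -1.0, 0.25], bias=0.5)
```

Treating the bias as a default argument keeps both forms in one function: omitting it gives the plain weighted sum, and supplying it adds <math>w_{0j}</math>.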
 
== Neural networks as functions ==
{{See also|Graphical models}}
Neural network models can be viewed as defining a function that takes an input (observation) and produces an output (decision) <math>\textstyle f : X \rightarrow Y </math>, or a distribution over <math>\textstyle X</math> or over both <math>\textstyle X</math> and <math>\textstyle Y</math>. Sometimes models are intimately associated with a particular learning rule. The phrase "ANN model" commonly refers to a ''class'' of such functions, whose members are obtained by varying parameters, connection weights, or specifics of the architecture such as the number of neurons, the number of layers or their connectivity.
 
Mathematically, a neuron's network function <math>\textstyle f(x)</math> is defined as a composition of other functions <math>\textstyle g_i(x)</math>, which can themselves be decomposed into further functions. This can be conveniently represented as a network structure, with arrows depicting the dependencies between functions. A widely used type of composition is the ''nonlinear weighted sum'', <math>\textstyle f (x) = K \left(\sum_i w_i g_i(x)\right) </math>, where <math>\textstyle K</math> (commonly referred to as the [[activation function]]<ref>{{Cite web|url=http://www.cse.unsw.edu.au/~billw/mldict.html#activnfn|title=The Machine Learning Dictionary|website=www.cse.unsw.edu.au|access-date=2019-08-18|archive-url=https://web.archive.org/web/20180826151959/http://www.cse.unsw.edu.au/~billw/mldict.html#activnfn|archive-date=2018-08-26|url-status=dead}}</ref>) is some predefined function, such as the [[Hyperbolic function#Standard analytic expressions|hyperbolic tangent]], [[sigmoid function]], [[softmax function]], or [[ReLU|rectifier function]]. The important characteristic of the activation function is that it provides a smooth transition as input values change, i.e. a small change in input produces a small change in output. The following refers to a collection of functions <math>\textstyle g_i</math> as a [[Vector (mathematics and physics)|vector]] <math>\textstyle g = (g_1, g_2, \ldots, g_n)</math>.
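
The nonlinear weighted sum can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the names <code>nonlinear_weighted_sum</code> and <code>relu</code>, the scalar input, and the two sample component functions are all assumptions introduced for the example.

```python
import math


def nonlinear_weighted_sum(x, functions, weights, activation=math.tanh):
    """Return f(x) = K(sum_i w_i * g_i(x)).

    functions:  the component functions g_i, each taking x
    weights:    the weights w_i, one per component function
    activation: the function K (hyperbolic tangent by default)
    """
    if len(functions) != len(weights):
        raise ValueError("functions and weights must have the same length")
    total = sum(w * g(x) for w, g in zip(weights, functions))
    return activation(total)


def relu(z):
    """Rectifier: pass positive values through, map the rest to zero."""
    return max(0.0, z)


# Hypothetical component functions g_1(x) = x and g_2(x) = x^2.
g = [lambda x: x, lambda x: x * x]
w = [1.0, -0.5]

f_tanh = nonlinear_weighted_sum(2.0, g, w)                   # K = tanh
f_relu = nonlinear_weighted_sum(3.0, g, w, activation=relu)  # K = rectifier
```

Passing the activation as a parameter mirrors the role of <math>\textstyle K</math> in the formula: the weighted sum is computed in the same way regardless of which predefined function is applied to it.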
Line 94 ⟶ 97:
[[Pseudocode]] for a [[stochastic gradient descent]] algorithm for training a three-layer network (one hidden layer):
 
 initialize network weights (often small random values).
 '''do'''
     '''for each''' training example named ex '''do'''
Line 111 ⟶ 114:
{{Reflist}}
 
{{Mathematics of}}
 
[[Category:Computational statistics]]
[[Category:Artificial neural networks| ]]