Mathematics of neural networks in machine learning

This makes <math>w_1</math> the minimizing weight found by gradient descent.
 
== Learning pseudocode ==
To implement the algorithm above, explicit formulas are required for the gradient of the function <math>w \mapsto E(f_N(w, x), y)</math>, where the error function is <math>E(y,y')= |y-y'|^2</math>.
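The squared-error loss and its derivative with respect to the prediction can be checked numerically with a central finite difference. This is a minimal sketch, not part of the article; the scalar case of <math>E</math> and the sample values are illustrative assumptions:

```python
# Scalar squared-error loss E(y, y') = |y - y'|^2, as in the text.
def E(y, y_pred):
    return (y - y_pred) ** 2

# Derivative with respect to the prediction:  d/dy' (y - y')^2 = 2(y' - y)
def dE_dy_pred(y, y_pred):
    return 2.0 * (y_pred - y)

# Central finite-difference check at hypothetical values y = 1.0, y' = 0.3
y, y_pred, h = 1.0, 0.3, 1e-6
numeric = (E(y, y_pred + h) - E(y, y_pred - h)) / (2 * h)
analytic = dE_dy_pred(y, y_pred)
```

For a quadratic loss the central difference agrees with the analytic derivative up to floating-point rounding, which makes this a convenient sanity check before deriving the full gradient in <math>w</math> by the chain rule.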
 
=== Pseudocode ===
[[Pseudocode]] for a [[stochastic gradient descent]] algorithm for training a three-layer network (one hidden layer):
 
 initialize network weights (often small random values)
 '''do'''
     '''for each''' training example named ex '''do'''
         prediction = <u>neural-net-output</u>(network, ex)  ''// forward pass''
         actual = <u>teacher-output</u>(ex)
         compute error (prediction - actual) at the output units
         {{nowrap|compute <math>\Delta w_h</math> for all weights from hidden layer to output layer}}  ''// backward pass''
         {{nowrap|compute <math>\Delta w_i</math> for all weights from input layer to hidden layer}}  ''// backward pass continued''
         update network weights  ''// input layer not modified by error estimate''
 '''until''' error rate becomes acceptably low
 '''return''' the network
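The loop above can be sketched in plain Python. This is a minimal illustration rather than the article's code: the network sizes, the sigmoid activation, the bias handling, and the XOR training set are all assumptions made for the example.

```python
import math
import random

random.seed(0)  # fixed seed so the run is reproducible

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Assumed sizes: 2 inputs, 3 hidden units, 1 output.
n_in, n_hid, n_out = 2, 3, 1

# initialize network weights (often small random values);
# the last entry of each row acts as a bias weight.
w_ih = [[random.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hid)]
w_ho = [[random.uniform(-0.5, 0.5) for _ in range(n_hid + 1)] for _ in range(n_out)]

def forward(x):
    # forward pass: input -> hidden -> output, appending a constant 1.0 bias input
    xb = x + [1.0]
    hidden = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in w_ih]
    hb = hidden + [1.0]
    output = [sigmoid(sum(w * v for w, v in zip(row, hb))) for row in w_ho]
    return hidden, output

# Toy training set (XOR), chosen for the example.
examples = [([0.0, 0.0], [0.0]), ([0.0, 1.0], [1.0]),
            ([1.0, 0.0], [1.0]), ([1.0, 1.0], [0.0])]
eta = 0.5  # learning rate

def total_error():
    return sum((forward(x)[1][0] - y[0]) ** 2 for x, y in examples)

err_before = total_error()

for epoch in range(10000):
    for x, y in examples:
        hidden, output = forward(x)                       # forward pass
        # output-layer deltas: (prediction - actual) * sigmoid'
        delta_o = [(o - t) * o * (1 - o) for o, t in zip(output, y)]
        # hidden-layer deltas: backpropagate the output deltas through w_ho
        delta_h = [h * (1 - h) * sum(d * w_ho[k][j] for k, d in enumerate(delta_o))
                   for j, h in enumerate(hidden)]
        # gradient-descent weight updates
        hb = hidden + [1.0]
        xb = x + [1.0]
        for k in range(n_out):
            for j in range(n_hid + 1):
                w_ho[k][j] -= eta * delta_o[k] * hb[j]
        for j in range(n_hid):
            for i in range(n_in + 1):
                w_ih[j][i] -= eta * delta_h[j] * xb[i]
```

Each training example triggers one forward pass and one backward pass, matching the per-example update of stochastic gradient descent; batch gradient descent would instead accumulate the <math>\Delta w</math> terms over all examples before updating.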
 
The lines labeled "backward pass" can be implemented using the backpropagation algorithm, which calculates the gradient of the error of the network regarding the network's modifiable weights.<ref>Werbos, Paul J. (1994). ''The Roots of Backpropagation: From Ordered Derivatives to Neural Networks and Political Forecasting''. New York, NY: John Wiley & Sons, Inc.</ref>