[[File:Gradient descent Hamiltonian Monte Carlo comparison.gif|thumb|upright=0.9|Gradient descent vs Monte Carlo]]
===Adam optimizer===
Adam (short for Adaptive Moment Estimation)<ref name="Adam2014">{{cite arXiv |first1=Diederik |last1=Kingma |first2=Jimmy |last2=Ba |eprint=1412.6980 |title=Adam: A Method for Stochastic Optimization |year=2014 |class=cs.LG }}</ref> is a stochastic optimization algorithm for minimizing a target function <math>\mathcal{G}(\theta)</math>. The algorithm is sketched below.
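The following is a minimal Python sketch of the Adam update rule, using the bias-corrected moment estimates and the default hyperparameters recommended in the Adam paper; the function name <code>adam_minimize</code>, the gradient oracle <code>grad_fn</code>, and the fixed-step stopping criterion are illustrative assumptions rather than part of the cited source.

<syntaxhighlight lang="python">
import numpy as np

def adam_minimize(grad_fn, theta0, alpha=0.001, beta1=0.9, beta2=0.999,
                  eps=1e-8, num_steps=10_000):
    """Minimize a target function G(theta), given its gradient grad_fn(theta),
    using the bias-corrected Adam update rule with the paper's default
    hyperparameters."""
    theta = np.array(theta0, dtype=float)
    m = np.zeros_like(theta)   # first-moment (mean) estimate of the gradient
    v = np.zeros_like(theta)   # second-moment (uncentered variance) estimate
    for t in range(1, num_steps + 1):
        g = grad_fn(theta)                       # gradient of G at theta
        m = beta1 * m + (1 - beta1) * g          # biased first-moment update
        v = beta2 * v + (1 - beta2) * g ** 2     # biased second-moment update
        m_hat = m / (1 - beta1 ** t)             # bias correction
        v_hat = v / (1 - beta2 ** t)
        theta -= alpha * m_hat / (np.sqrt(v_hat) + eps)
    return theta

# Example usage (illustrative): minimize G(theta) = ||theta - 3||^2,
# whose gradient is 2*(theta - 3).
theta_star = adam_minimize(lambda th: 2.0 * (th - 3.0), theta0=np.zeros(2))
</syntaxhighlight>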
===Backpropagation algorithm for multilayer feedforward neural networks===
The backpropagation algorithm<ref name="DLhistory">{{cite arXiv |eprint=2212.11279 |class=cs.NE |first=Juergen |last=Schmidhuber |author-link=Juergen Schmidhuber |title=Annotated History of Modern AI and Deep Learning |date=2022}}</ref> computes the gradients used to train a multilayer feedforward neural network. Combined with the Adam algorithm described above, it yields the training procedure sketched below.
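The following Python sketch shows how backpropagation computes the gradients of a squared-error loss for a network with a single ReLU hidden layer, and how the Adam update above applies them; the architecture, the helper name <code>train_mlp</code>, and the synthetic data in the usage example are illustrative assumptions, not part of the cited sources.

<syntaxhighlight lang="python">
import numpy as np

def train_mlp(X, y, hidden=16, alpha=0.001, beta1=0.9, beta2=0.999,
              eps=1e-8, epochs=500, seed=0):
    """Train a one-hidden-layer feedforward network with backpropagation
    (for the gradients) and Adam (for the parameter updates)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    params = {"W1": rng.normal(0.0, 0.1, (d, hidden)), "b1": np.zeros(hidden),
              "W2": rng.normal(0.0, 0.1, (hidden, 1)), "b2": np.zeros(1)}
    m = {k: np.zeros_like(p) for k, p in params.items()}   # first moments
    v = {k: np.zeros_like(p) for k, p in params.items()}   # second moments

    for t in range(1, epochs + 1):
        # Forward pass
        z1 = X @ params["W1"] + params["b1"]          # pre-activations, (n, hidden)
        h = np.maximum(z1, 0.0)                       # ReLU hidden layer
        y_hat = h @ params["W2"] + params["b2"]       # network output, (n, 1)

        # Backward pass: propagate the error signal of the mean squared error
        delta2 = 2.0 * (y_hat - y.reshape(-1, 1)) / n            # dL/dy_hat
        grads = {"W2": h.T @ delta2, "b2": delta2.sum(axis=0)}
        delta1 = (delta2 @ params["W2"].T) * (z1 > 0)            # through ReLU
        grads["W1"] = X.T @ delta1
        grads["b1"] = delta1.sum(axis=0)

        # Adam update for every parameter, as in the sketch above
        for k in params:
            m[k] = beta1 * m[k] + (1 - beta1) * grads[k]
            v[k] = beta2 * v[k] + (1 - beta2) * grads[k] ** 2
            m_hat = m[k] / (1 - beta1 ** t)
            v_hat = v[k] / (1 - beta2 ** t)
            params[k] -= alpha * m_hat / (np.sqrt(v_hat) + eps)
    return params

# Example usage (illustrative): fit y = sin(x) on a small synthetic dataset
X = np.linspace(-3, 3, 200).reshape(-1, 1)
y = np.sin(X).ravel()
trained_params = train_mlp(X, y)
</syntaxhighlight>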