{{Programming paradigms}}
'''Differentiable programming''' is a [[programming paradigm]] in which a numeric computer program can be [[Differentiation (mathematics)|differentiated]] throughout via [[automatic differentiation]].<ref name="izzo2016_dCGP">{{cite book|last1=Izzo|first1=Dario|last2=Biscani|first2=Francesco|last3=Mereta|first3=Alessio|title=Genetic Programming |chapter=Differentiable Genetic Programming |series=Lecture Notes in Computer Science |year=2017|volume=18|pages=35–51|doi=10.1007/978-3-319-55696-3_3 |arxiv=1611.04766 |isbn=978-3-319-55695-6 |s2cid=17786263 |chapter-url=https://link.springer.com/chapter/10.1007/978-3-319-55696-3_3}}</ref><ref name="baydin2018automatic">{{cite journal|last1=Baydin|first1=Atilim Gunes|last2=Pearlmutter|first2=Barak|last3=Radul|first3=Alexey Andreyevich|last4=Siskind|first4=Jeffrey|title=Automatic differentiation in machine learning: a survey|journal=Journal of Machine Learning Research |year=2018 |volume=18|pages=1–43|url=http://jmlr.org/papers/v18/17-468.html}}</ref><ref>{{cite book |last1=Wang |first1=Fei |chapter=Backpropagation with Callbacks: Foundations for Efficient and Expressive Differentiable Programming |date=2018 |chapter-url=http://papers.nips.cc/paper/8221-backpropagation-with-callbacks-foundations-for-efficient-and-expressive-differentiable-programming.pdf |title=NIPS'18: Proceedings of the 32nd International Conference on Neural Information Processing Systems |pages=10201–12 |access-date=2019-02-13 |last2=Decker |first2=James |last3=Wu |first3=Xilun |last4=Essertel |first4=Gregory |last5=Rompf |first5=Tiark |editor-last=Bengio |editor-first=S. |editor2-last=Wallach |editor2-first=H. |editor3-last=Larochelle |editor3-first=H. 
|editor4-last=Grauman |editor4-first=K |publisher=Curran Associates |url=https://dl.acm.org/doi/proceedings/10.5555/3327546 |ref={{harvid|NIPS'18}} }}</ref><ref name="innes">{{Cite journal|last=Innes|first=Mike|date=2018|title=On Machine Learning and Programming Languages|url=http://www.sysml.cc/doc/2018/37.pdf|journal=SysML Conference 2018|access-date=2019-07-04|archive-date=2019-07-17|archive-url=https://web.archive.org/web/20190717211700/http://www.sysml.cc/doc/2018/37.pdf|url-status=dead}}</ref><ref name="diffprog-zygote">{{Citation|date=2019|title=∂P: A Differentiable Programming System to Bridge Machine Learning and Scientific Computing|arxiv=1907.07587|last1=Innes|first1=Mike|last2=Edelman|first2=Alan|last3=Fischer|first3=Keno|last4=Rackauckas|first4=Chris|last5=Saba|first5=Elliot|author6=Viral B Shah|last7=Tebbutt|first7=Will}}</ref> This allows for [[Gradient method|gradient-based optimization]] of parameters in the program, often via [[gradient descent]], as well as other learning approaches based on higher-order derivative information. Differentiable programming has found use in a wide variety of areas, particularly [[scientific computing]] and [[artificial intelligence]].<ref name="diffprog-zygote" /> One of the early proposals to adopt such a framework in a systematic fashion to improve upon learning algorithms was made by the [[Advanced Concepts Team]] at the [[European Space Agency]] in early 2016.<ref name="differential intelligence">{{Cite web|url=https://www.esa.int/gsp/ACT/projects/differential_intelligence/|title=Differential Intelligence|date=October 2016 |access-date=2022-10-19}}</ref>
==Approaches==
Most differentiable programming frameworks work by constructing a graph containing the control flow and [[data structures]] in the program.<ref name="flux">{{cite arXiv|last1=Innes|first1=Michael|last2=Saba|first2=Elliot|last3=Fischer|first3=Keno|last4=Gandhi|first4=Dhairya|last5=Rudilosso|first5=Marco Concetto|last6=Joy|first6=Neethu Mariya|last7=Karmali|first7=Tejan|last8=Pal|first8=Avik|last9=Shah|first9=Viral|date=2018-10-31|title=Fashionable Modelling with Flux|eprint=1811.01457|class=cs.PL}}</ref> Attempts generally fall into two groups:
* '''Static, [[compiled]] graph'''-based approaches such as [[TensorFlow]],<ref group=note>TensorFlow 1 uses the static graph approach, whereas TensorFlow 2 uses the dynamic graph approach by default.</ref> [[Theano (software)|Theano]], and [[MXNet]]. They tend to allow for good [[compiler optimization]] and easier scaling to large systems, but their static nature limits interactivity and the types of programs that can be created easily (e.g. those involving [[loop (computing)|loops]] or [[recursion]]), as well as making it harder for users to reason effectively about their programs.<ref name="flux" /> A proof-of-concept compiler toolchain called Myia uses a subset of Python as a front end and supports higher-order functions, recursion, and higher-order derivatives.<ref>{{cite book |last1=Merriënboer |first1=Bart van |last2=Breuleux |first2=Olivier |last3=Bergeron |first3=Arnaud |last4=Lamblin |first4=Pascal |chapter=Automatic differentiation in ML: where we are and where we should be going |title={{harvnb|NIPS'18}} |date=3 December 2018 |volume=31 |pages=8771–81 |chapter-url = https://papers.nips.cc/paper/2018/hash/770f8e448d07586afbf77bb59f698587-Abstract.html}}</ref><ref name="myia1">{{Cite web |last1=Breuleux |first1=O. |last2=van Merriënboer |first2=B. |date=2017 |url=https://www.sysml.cc/doc/2018/39.pdf |title=Automatic Differentiation in Myia |access-date=2019-06-24 |archive-date=2019-06-24 |archive-url=https://web.archive.org/web/20190624180156/https://www.sysml.cc/doc/2018/39.pdf |url-status=dead }}</ref><ref name="pytorchtut">{{Cite web|url=https://pytorch.org/tutorials/beginner/examples_autograd/tf_two_layer_net.html |title=TensorFlow: Static Graphs |work=Tutorials: Learning PyTorch |publisher=PyTorch.org |access-date=2019-03-04}}</ref>
* '''[[Operator overloading]], dynamic graph'''-based approaches such as [[PyTorch]] and [[NumPy]]'s autograd package. Their dynamic and interactive nature lets most programs be written and reasoned about more easily. However, they lead to [[interpreter (computing)|interpreter]] overhead (particularly when composing many small operations), poorer scalability, and reduced benefit from compiler optimization.<ref name="myia1" /><ref name="pytorchtut" /> A package for the [[Julia (programming language)|Julia]] programming language{{snd}} [https://github.com/FluxML/Zygote.jl Zygote]{{snd}} works directly on Julia's [[intermediate representation]], allowing it to still be [[compiler optimization|optimized]] by Julia's just-in-time compiler.<ref name="flux" /><ref>{{cite arXiv|last=Innes|first=Michael|date=2018-10-18|title=Don't Unroll Adjoint: Differentiating SSA-Form Programs|eprint=1810.07951|class=cs.PL}}</ref><ref name="diffprog-zygote" />
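The static-graph idea can be illustrated with a toy sketch in plain Python (an assumption-laden illustration, not TensorFlow's or Theano's actual API): the entire computation is first described as a data structure, and only afterwards is it differentiated and executed, which is what makes whole-program optimization possible but interactivity harder.

```python
# Toy sketch of the static-graph approach: build the whole graph first,
# then differentiate it symbolically (producing a new graph) and run it.
# All names here (Node, const, add, mul, evaluate, grad) are hypothetical.

class Node:
    def __init__(self, op, inputs=(), value=None):
        self.op, self.inputs, self.value = op, inputs, value

def const(v):  return Node("const", value=v)
def add(a, b): return Node("add", (a, b))
def mul(a, b): return Node("mul", (a, b))

def evaluate(node):
    """Execute the graph only after it has been fully constructed."""
    if node.op == "const":
        return node.value
    a, b = (evaluate(i) for i in node.inputs)
    return a + b if node.op == "add" else a * b

def grad(node, wrt):
    """Differentiate the graph with respect to one input node,
    returning a brand-new graph that computes the derivative."""
    if node is wrt:
        return const(1.0)
    if node.op == "const":
        return const(0.0)
    a, b = node.inputs
    if node.op == "add":
        return add(grad(a, wrt), grad(b, wrt))
    # product rule: d(a*b) = a*db + b*da
    return add(mul(a, grad(b, wrt)), mul(b, grad(a, wrt)))

# Describe z = x*y + x as a graph, then derive and evaluate dz/dx.
x, y = const(3.0), const(4.0)
z = add(mul(x, y), x)
dz_dx = grad(z, x)   # dz/dx = y + 1
```

Because `dz_dx` is itself a graph, a real framework can simplify or compile it before any numbers flow through, which is the optimization advantage the bullet above describes.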
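The operator-overloading, dynamic-graph approach can likewise be sketched in a few lines of plain Python (a minimal toy, not PyTorch's or autograd's actual implementation): the graph is recorded as a side effect of simply running the program, and gradients are then propagated backwards through the recorded operations via the chain rule.

```python
# Toy sketch of the dynamic-graph approach: overloaded operators record
# each operation as it executes; backward() then applies the chain rule.
# The class and method names here are hypothetical.

class Var:
    """A scalar that remembers which operations produced it."""
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents   # pairs of (parent Var, local derivative)
        self.grad = 0.0

    def __add__(self, other):
        return Var(self.value + other.value,
                   ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        # d(a*b)/da = b and d(a*b)/db = a
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))

    def backward(self, seed=1.0):
        """Reverse sweep: accumulate derivatives into every ancestor."""
        self.grad += seed
        for parent, local in self.parents:
            parent.backward(seed * local)

# The graph is built dynamically, just by running ordinary Python:
x = Var(3.0)
y = Var(4.0)
z = x * y + x     # records mul then add as they happen
z.backward()      # fills in x.grad and y.grad
```

Because recording happens at run time, loops, recursion, and data-dependent control flow come for free, at the cost of the per-operation interpreter overhead noted above.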