{{Programming paradigms}}
'''Differentiable programming''' is a [[programming paradigm]] in which a numeric computer program can be [[Differentiation (mathematics)|differentiated]] throughout via [[automatic differentiation]].<ref name="izzo2016_dCGP">{{cite book|last1=Izzo|first1=Dario|last2=Biscani|first2=Francesco|last3=Mereta|first3=Alessio|title=Genetic Programming |chapter=Differentiable Genetic Programming |series=Lecture Notes in Computer Science |year=2017|volume=18|pages=35–51|doi=10.1007/978-3-319-55696-3_3 |arxiv=1611.04766 |isbn=978-3-319-55695-6 |s2cid=17786263 |chapter-url=https://link.springer.com/chapter/10.1007/978-3-319-55696-3_3}}</ref><ref name="baydin2018automatic">{{cite journal|last1=Baydin|first1=Atilim Gunes|last2=Pearlmutter|first2=Barak|last3=Radul|first3=Alexey Andreyevich|last4=Siskind|first4=Jeffrey|title=Automatic differentiation in machine learning: a survey|journal=Journal of Machine Learning Research |year=2018 |volume=18|pages=1–43|url=http://jmlr.org/papers/v18/17-468.html}}</ref><ref>{{cite book |last1=Wang |first1=Fei |chapter=Backpropagation with Callbacks: Foundations for Efficient and Expressive Differentiable Programming |date=2018 |chapter-url=http://papers.nips.cc/paper/8221-backpropagation-with-callbacks-foundations-for-efficient-and-expressive-differentiable-programming.pdf |title=NIPS'18: Proceedings of the 32nd International Conference on Neural Information Processing Systems |pages=10201–12 |access-date=2019-02-13 |last2=Decker |first2=James |last3=Wu |first3=Xilun |last4=Essertel |first4=Gregory |last5=Rompf |first5=Tiark |editor-last=Bengio |editor-first=S. |editor2-last=Wallach |editor2-first=H. |editor3-last=Larochelle |editor3-first=H. 
|editor4-last=Grauman |editor4-first=K |publisher=Curran Associates |url=https://dl.acm.org/doi/proceedings/10.5555/3327546 |ref={{harvid|NIPS'18}} }}</ref><ref name="innes">{{Cite journal|last=Innes|first=Mike|date=2018|title=On Machine Learning and Programming Languages|url=http://www.sysml.cc/doc/2018/37.pdf|journal=SysML Conference 2018|access-date=2019-07-04|archive-date=2019-07-17|archive-url=https://web.archive.org/web/20190717211700/http://www.sysml.cc/doc/2018/37.pdf|url-status=dead}}</ref><ref name="diffprog-zygote">{{Citation|date=2019|title=∂P: A Differentiable Programming System to Bridge Machine Learning and Scientific Computing|arxiv=1907.07587|last1=Innes|first1=Mike|last2=Edelman|first2=Alan|last3=Fischer|first3=Keno|last4=Rackauckas|first4=Chris|last5=Saba|first5=Elliot|author6=Viral B Shah|last7=Tebbutt|first7=Will}}</ref> This allows for [[Gradient method|gradient-based optimization]] of parameters in the program, often via [[gradient descent]], as well as other learning approaches based on higher-order derivative information. Differentiable programming has found use in a wide variety of areas, particularly [[scientific computing]] and [[artificial intelligence]].<ref name="diffprog-zygote" /> One of the early proposals to adopt such a framework in a systematic fashion to improve upon learning algorithms was made by the [[Advanced Concepts Team]] at the [[European Space Agency]] in early 2016.<ref name="differential intelligence">{{Cite web|url=https://www.esa.int/gsp/ACT/projects/differential_intelligence/|title=Differential Intelligence|date=October 2016 |access-date=2022-10-19}}</ref>
==Approaches==
Most differentiable programming frameworks work by constructing a graph containing the control flow and [[data structures]] in the program.<ref name="flux">{{cite arXiv|last1=Innes|first1=Michael|last2=Saba|first2=Elliot|last3=Fischer|first3=Keno|last4=Gandhi|first4=Dhairya|last5=Rudilosso|first5=Marco Concetto|last6=Joy|first6=Neethu Mariya|last7=Karmali|first7=Tejan|last8=Pal|first8=Avik|last9=Shah|first9=Viral|date=2018-10-31|title=Fashionable Modelling with Flux|eprint=1811.01457|class=cs.PL}}</ref> Attempts generally fall into two groups:
* '''Static, [[compiled]] graph'''-based approaches such as [[TensorFlow]],<ref group=note>TensorFlow 1 uses the static graph approach, whereas TensorFlow 2 uses the dynamic graph approach by default.</ref> [[Theano (software)|Theano]], and [[MXNet]]. They tend to allow for good [[compiler optimization]] and easier scaling to large systems, but their static nature limits interactivity and the types of programs that can be created easily (e.g. those involving [[loop (computing)|loops]] or [[recursion]]), as well as making it harder for users to reason effectively about their programs.<ref name="flux" /> A proof-of-concept compiler toolchain called Myia uses a subset of Python as a front end and supports higher-order functions, recursion, and higher-order derivatives.<ref>{{cite book |last1=Merriënboer |first1=Bart van |last2=Breuleux |first2=Olivier |last3=Bergeron |first3=Arnaud |last4=Lamblin |first4=Pascal |chapter=Automatic differentiation in ML: where we are and where we should be going |title={{harvnb|NIPS'18}} |date=3 December 2018 |volume=31 |pages=8771–81 |chapter-url = https://papers.nips.cc/paper/2018/hash/770f8e448d07586afbf77bb59f698587-Abstract.html}}</ref><ref name="myia1">{{Cite web |last1=Breuleux |first1=O. |last2=van Merriënboer |first2=B. |date=2017 |url=https://www.sysml.cc/doc/2018/39.pdf |title=Automatic Differentiation in Myia |access-date=2019-06-24 |archive-date=2019-06-24 |archive-url=https://web.archive.org/web/20190624180156/https://www.sysml.cc/doc/2018/39.pdf |url-status=dead }}</ref><ref name="pytorchtut">{{Cite web|url=https://pytorch.org/tutorials/beginner/examples_autograd/tf_two_layer_net.html |title=TensorFlow: Static Graphs |work=Tutorials: Learning PyTorch |publisher=PyTorch.org |access-date=2019-03-04}}</ref>
* '''[[Operator overloading]], dynamic graph'''-based approaches such as [[PyTorch]] and [[NumPy]]'s autograd package. Their dynamic and interactive nature lets most programs be written and reasoned about more easily. However, they lead to [[interpreter (computing)|interpreter]] overhead (particularly when composing many small operations), poorer scalability, and reduced benefit from compiler optimization.<ref name="myia1" /><ref name="pytorchtut" /> A package for the [[Julia (programming language)|Julia]] programming language{{snd}} [https://github.com/FluxML/Zygote.jl Zygote]{{snd}} works directly on Julia's [[intermediate representation]], allowing it to still be [[compiler optimization|optimized]] by Julia's just-in-time compiler.<ref name="flux" /><ref>{{cite arXiv|last=Innes|first=Michael|date=2018-10-18|title=Don't Unroll Adjoint: Differentiating SSA-Form Programs|eprint=1810.07951|class=cs.PL}}</ref><ref name="diffprog-zygote" />
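The static-graph idea can be illustrated with a toy sketch in plain Python (an assumption-laden illustration, not TensorFlow's or Theano's actual API): the entire computation is first described as a data structure, and only afterwards is it differentiated and executed, which is what makes whole-program optimization possible but interactivity harder.

```python
# Toy sketch of the static-graph approach: build the whole graph first,
# then differentiate it symbolically (producing a new graph) and run it.
# All names here (Node, const, add, mul, evaluate, grad) are hypothetical.

class Node:
    def __init__(self, op, inputs=(), value=None):
        self.op, self.inputs, self.value = op, inputs, value

def const(v):  return Node("const", value=v)
def add(a, b): return Node("add", (a, b))
def mul(a, b): return Node("mul", (a, b))

def evaluate(node):
    """Execute the graph only after it has been fully constructed."""
    if node.op == "const":
        return node.value
    a, b = (evaluate(i) for i in node.inputs)
    return a + b if node.op == "add" else a * b

def grad(node, wrt):
    """Differentiate the graph with respect to one input node,
    returning a brand-new graph that computes the derivative."""
    if node is wrt:
        return const(1.0)
    if node.op == "const":
        return const(0.0)
    a, b = node.inputs
    if node.op == "add":
        return add(grad(a, wrt), grad(b, wrt))
    # product rule: d(a*b) = a*db + b*da
    return add(mul(a, grad(b, wrt)), mul(b, grad(a, wrt)))

# Describe z = x*y + x as a graph, then derive and evaluate dz/dx.
x, y = const(3.0), const(4.0)
z = add(mul(x, y), x)
dz_dx = grad(z, x)   # dz/dx = y + 1
```

Because `dz_dx` is itself a graph, a real framework can simplify or compile it before any numbers flow through, which is the optimization advantage the bullet above describes.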
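The operator-overloading, dynamic-graph approach can likewise be sketched in a few lines of plain Python (a minimal toy, not PyTorch's or autograd's actual implementation): the graph is recorded as a side effect of simply running the program, and gradients are then propagated backwards through the recorded operations via the chain rule.

```python
# Toy sketch of the dynamic-graph approach: overloaded operators record
# each operation as it executes; backward() then applies the chain rule.
# The class and method names here are hypothetical.

class Var:
    """A scalar that remembers which operations produced it."""
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents   # pairs of (parent Var, local derivative)
        self.grad = 0.0

    def __add__(self, other):
        return Var(self.value + other.value,
                   ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        # d(a*b)/da = b and d(a*b)/db = a
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))

    def backward(self, seed=1.0):
        """Reverse sweep: accumulate derivatives into every ancestor."""
        self.grad += seed
        for parent, local in self.parents:
            parent.backward(seed * local)

# The graph is built dynamically, just by running ordinary Python:
x = Var(3.0)
y = Var(4.0)
z = x * y + x     # records mul then add as they happen
z.backward()      # fills in x.grad and y.grad
```

Because recording happens at run time, loops, recursion, and data-dependent control flow come for free, at the cost of the per-operation interpreter overhead noted above.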