'''Differentiable programming''' is a [[programming paradigm]] in which a numeric computer program can be [[Differentiation (mathematics)|differentiated]] throughout via [[automatic differentiation]].<ref name="izzo2016_dCGP">{{cite journal|last1=Izzo|first1=Dario|last2=Biscani|first2=Francesco|last3=Mereta|first3=Alessio|title=Differentiable genetic programming|journal=European Conference on Genetic Programming (EuroGP)|series=Lecture Notes in Computer Science |year=2017|volume=18|pages=35–51|doi=10.1007/978-3-319-55696-3_3 |arxiv=1611.04766 |isbn=978-3-319-55695-6 |s2cid=17786263 |url=https://link.springer.com/chapter/10.1007/978-3-319-55696-3_3}}</ref>
<ref name="baydin2018automatic">{{cite journal|last1=Baydin|first1=Atilim Gunes|last2=Pearlmutter|first2=Barak|last3=Radul|first3=Alexey Andreyevich|last4=Siskind|first4=Jeffrey|title=Automatic differentiation in machine learning: a survey|journal=Journal of Machine Learning Research |year=2018 |volume=18|pages=1–43|url=http://jmlr.org/papers/v18/17-468.html}}</ref>
==Approaches==
Most differentiable programming frameworks work by constructing a graph containing the control flow and [[data structures]] in the program.<ref name="flux">{{cite arXiv|last1=Innes|first1=Michael|last2=Saba|first2=Elliot|last3=Fischer|first3=Keno|last4=Gandhi|first4=Dhairya|last5=Rudilosso|first5=Marco Concetto|last6=Joy|first6=Neethu Mariya|last7=Karmali|first7=Tejan|last8=Pal|first8=Avik|last9=Shah|first9=Viral|date=2018-10-31|title=Fashionable Modelling with Flux|eprint=1811.01457|class=cs.PL}}</ref> Attempts generally fall into two groups:
* '''Static, [[compiled]] graph'''-based approaches such as [[TensorFlow]],<ref group=note>TensorFlow 1 uses the static graph approach, whereas TensorFlow 2 uses the dynamic graph approach by default.</ref> [[Theano (software)|Theano]], and [[MXNet]]. They tend to allow for good [[compiler optimization]] and easier scaling to large systems, but their static nature limits interactivity and the types of programs that can be created easily (e.g. those involving [[loop (computing)|loops]] or [[recursion]]), as well as making it harder for users to reason effectively about their programs.<ref name="flux" /> A proof-of-concept compiler toolchain called Myia uses a subset of Python as a front end and supports higher-order functions, recursion, and higher-order derivatives.
* '''[[Operator overloading]], dynamic graph'''-based approaches such as [[PyTorch]] and [[AutoGrad (NumPy)|AutoGrad]]. Their dynamic and interactive nature lets most programs be written and reasoned about more easily. However, they lead to [[interpreter (computing)|interpreter]] overhead (particularly when composing many small operations), poorer scalability, and reduced benefit from compiler optimization.<ref name="myia1" /><ref name="pytorchtut" /> A package for the [[Julia (programming language)|Julia]] programming language{{snd}} [https://github.com/FluxML/Zygote.jl Zygote]{{snd}} works directly on Julia's [[intermediate representation]], allowing it to still be [[compiler optimization|optimized]] by Julia's just-in-time compiler.<ref name="flux" /><ref>{{cite arXiv|last=Innes|first=Michael|date=2018-10-18|title=Don't Unroll Adjoint: Differentiating SSA-Form Programs|eprint=1810.07951|class=cs.PL}}</ref><ref name="diffprog-zygote" />
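The operator-overloading, dynamic-graph approach can be illustrated with a minimal self-contained sketch (not the real PyTorch or AutoGrad API). Overloaded arithmetic records a computation graph as ordinary code runs, and a reverse pass then accumulates gradients through it:

```python
class Var:
    """Node in a dynamically built computation graph (reverse-mode AD)."""
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents   # pairs of (parent node, local derivative)
        self.grad = 0.0

    def __add__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        # d(a+b)/da = 1, d(a+b)/db = 1
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        # d(a*b)/da = b, d(a*b)/db = a
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))

    def backward(self, seed=1.0):
        # Reverse pass: propagate the incoming gradient to each parent,
        # scaled by the local derivative recorded during the forward pass.
        self.grad += seed
        for parent, local in self.parents:
            parent.backward(seed * local)

x = Var(3.0)
y = Var(2.0)
z = x * y + x      # the graph is recorded as this line executes
z.backward()
print(x.grad)      # dz/dx = y + 1 = 3.0
print(y.grad)      # dz/dy = x = 3.0
```

Because the graph is rebuilt on every execution, Python control flow (loops, branches, recursion) works transparently; the cost is that each operation passes through the interpreter, which is the overhead the static-graph approaches avoid.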
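For contrast, the static-graph approach can be sketched in the same spirit (again a hypothetical toy, not TensorFlow's or Theano's actual API): the program is first described as a symbolic graph with no data attached, a derivative graph is derived from it by graph transformation, and only then is either graph evaluated. This build-then-run separation is what enables whole-graph compiler optimization but limits interactivity:

```python
class Const:
    def __init__(self, value): self.value = value
    def eval(self, env): return self.value
    def grad(self, wrt): return Const(0.0)

class Input:
    def __init__(self, name): self.name = name
    def eval(self, env): return env[self.name]   # values supplied at run time
    def grad(self, wrt): return Const(1.0 if self is wrt else 0.0)

class Add:
    def __init__(self, a, b): self.a, self.b = a, b
    def eval(self, env): return self.a.eval(env) + self.b.eval(env)
    def grad(self, wrt): return Add(self.a.grad(wrt), self.b.grad(wrt))

class Mul:
    def __init__(self, a, b): self.a, self.b = a, b
    def eval(self, env): return self.a.eval(env) * self.b.eval(env)
    def grad(self, wrt):
        # Product rule applied as a graph rewrite: d(ab) = a'b + ab'
        return Add(Mul(self.a.grad(wrt), self.b), Mul(self.a, self.b.grad(wrt)))

x = Input("x")
graph = Add(Mul(x, x), x)       # f(x) = x*x + x, built before any data exists
dgraph = graph.grad(x)          # derivative graph for f'(x) = 2x + 1
print(graph.eval({"x": 3.0}))   # 12.0
print(dgraph.eval({"x": 3.0}))  # 7.0
```

Because `graph` and `dgraph` are plain data structures known before execution, a framework can simplify, fuse, and compile them ahead of time, which is harder when the graph only exists while the program runs.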