Differentiable programming: Difference between revisions

Revision as of 00:05, 29 March 2022 by 73.223.72.104 (edit summary: →Approaches: grammar)
Latest revision as of 03:27, 18 August 2025 by OAbot (minor edit; edit summary: Open access bot: doi updated in citation with #oabot.)
(47 intermediate revisions by 33 users not shown)
Line 1: {{Short description\|Programming paradigm}} {{Machine learning}} {{Programming paradigms}}▼ '''Differentiable programming''' is a [[programming paradigm]] in which a numeric computer program can be [[Differentiation (mathematics)\|differentiated]] throughout via [[automatic differentiation]].<ref name="izzo2016_dCGP">{{cite book \|doi=10.1007/978-3-319-55696-3_3 \|chapter=Differentiable Genetic Programming \|title=Genetic Programming \|series=Lecture Notes in Computer Science \|date=2017 \|last1=Izzo \|first1=Dario \|last2=Biscani \|first2=Francesco \|last3=Mereta \|first3=Alessio \|volume=10196 \|pages=35–51 \|arxiv=1611.04766 \|isbn=978-3-319-55695-6 \|s2cid=17786263 }}</ref><ref name="baydin2018automatic">{{cite journal \|last1=Baydin \|first1=Atilim Gunes \|last2=Pearlmutter \|first2=Barak A. \|last3=Radul \|first3=Alexey Andreyevich \|last4=Siskind \|first4=Jeffrey Mark \|title=Automatic Differentiation in Machine Learning: a Survey \|journal=Journal of Machine Learning Research \|date=2018 \|volume=18 \|issue=153 \|pages=1–43 \|url=https://jmlr.org/papers/v18/17-468.html }}</ref><ref>{{cite book \|last1=Wang \|first1=Fei \|chapter=Backpropagation with Callbacks: Foundations for Efficient and Expressive Differentiable Programming \|date=2018 \|chapter-url=http://papers.nips.cc/paper/8221-backpropagation-with-callbacks-foundations-for-efficient-and-expressive-differentiable-programming.pdf \|title=NIPS'18: Proceedings of the 32nd International Conference on Neural Information Processing Systems \|pages=10201–10212 \|last2=Decker \|first2=James \|last3=Wu \|first3=Xilun \|last4=Essertel \|first4=Gregory \|last5=Rompf \|first5=Tiark \|editor-last=Bengio \|editor-first=S. \|editor2-last=Wallach \|editor2-first=H.
\|editor3-last=Larochelle \|editor3-first=H. \|editor4-last=Grauman \|editor4-first=K. \|publisher=Curran Associates \|ref={{harvid\|NIPS'18}} }}</ref><ref name="innes">{{Cite journal\|last=Innes\|first=Mike\|date=2018\|title=On Machine Learning and Programming Languages\|url=http://www.sysml.cc/doc/2018/37.pdf\|journal=SysML Conference 2018\|access-date=2019-07-04\|archive-date=2019-07-17\|archive-url=https://web.archive.org/web/20190717211700/http://www.sysml.cc/doc/2018/37.pdf\|url-status=dead}}</ref><ref name="diffprog-zygote">{{cite arXiv \|eprint=1907.07587 \|last1=Innes \|first1=Mike \|last2=Edelman \|first2=Alan \|last3=Fischer \|first3=Keno \|last4=Rackauckas \|first4=Chris \|last5=Saba \|first5=Elliot \|author6=Viral B Shah \|last7=Tebbutt \|first7=Will \|title=A Differentiable Programming System to Bridge Machine Learning and Scientific Computing \|date=2019 \|class=cs.PL }}</ref> This allows for [[Gradient method\|gradient-based optimization]] of parameters in the program, often via [[gradient descent]], as well as other learning approaches that are based on higher-order derivative information.
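As an illustration of gradient-based optimization of a program's parameters, the following is a minimal sketch rather than the method of any framework cited here. It assumes a hypothetical numeric program (`program`) containing an ordinary loop, differentiates it with a small forward-mode dual-number class (`Dual`), and tunes the parameter `w` by gradient descent. All names, the target value, the learning rate and the step count are assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dual:
    """A value paired with its derivative with respect to one chosen input."""
    val: float
    dot: float = 0.0

    def _lift(self, other):
        # Treat plain numbers as constants (derivative 0).
        return other if isinstance(other, Dual) else Dual(float(other))

    def __add__(self, other):
        o = self._lift(other)
        return Dual(self.val + o.val, self.dot + o.dot)

    __radd__ = __add__

    def __sub__(self, other):
        o = self._lift(other)
        return Dual(self.val - o.val, self.dot - o.dot)

    def __rsub__(self, other):
        return self._lift(other) - self

    def __mul__(self, other):
        o = self._lift(other)
        # Product rule.
        return Dual(self.val * o.val, self.dot * o.val + self.val * o.dot)

    __rmul__ = __mul__


def program(w, x):
    """Hypothetical numeric program: a plain loop that computes w**3 * x."""
    y = x
    for _ in range(3):
        y = w * y
    return y


def loss(w):
    # Squared error against an assumed target of 8.0 at x = 1.0 (minimum at w = 2).
    err = program(w, 1.0) - 8.0
    return err * err


def grad(f, w):
    """Derivative of f at w, obtained by seeding the dual part with 1."""
    return f(Dual(w, 1.0)).dot


def gradient_descent(f, w, lr=0.005, steps=500):
    # Repeatedly step against the derivative computed by the dual numbers.
    for _ in range(steps):
        w -= lr * grad(f, w)
    return w


w_opt = gradient_descent(loss, 1.0)
print(round(w_opt, 3))
```

The loop inside `program` is differentiated without any special treatment because the dual numbers simply flow through the same instructions as ordinary floats. Forward mode is used here only for brevity; frameworks aimed at many parameters typically rely on reverse mode.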
Differentiable programming has found use in a wide variety of areas, particularly [[scientific computing]] and [[machine learning]].<ref name="diffprog-zygote" /> One of the early proposals to adopt such a framework in a systematic fashion to improve upon learning algorithms was made by the [[Advanced Concepts Team]] at the [[European Space Agency]] in early 2016.<ref name="differential intelligence">{{Cite web \|url=https://www.esa.int/gsp/ACT/projects/differential_intelligence/ \|title=Differential Intelligence \|date=October 2016 \|access-date=2022-10-19}}</ref>▼ ▲'''Differentiable programming''' is a [[programming paradigm]] in which a numeric computer program can be [[Differentiation (mathematics)\|differentiated]] throughout via [[automatic differentiation]].<ref name="baydin2018automatic">{{cite journal\|last1=Baydin\|first1=Atilim Gunes\|last2=Pearlmutter\|first2=Barak\|last3=Radul\|first3=Alexey Andreyevich\|last4=Siskind\|first4=Jeffrey\|title=Automatic differentiation in machine learning: a survey\|journal=Journal of Machine Learning Research\|year=2018\|volume=18\|pages=1–43\|url=http://jmlr.org/papers/v18/17-468.html}}</ref><ref>{{Citation\|last1=Wang\|first1=Fei\|title=Backpropagation with Callbacks: Foundations for Efficient and Expressive Differentiable Programming\|date=2018\|url=http://papers.nips.cc/paper/8221-backpropagation-with-callbacks-foundations-for-efficient-and-expressive-differentiable-programming.pdf\|work=Advances in Neural Information Processing Systems 31\|pages=10201–10212\|editor-last=Bengio\|editor-first=S.\|publisher=Curran Associates, Inc.\|access-date=2019-02-13\|last2=Decker\|first2=James\|last3=Wu\|first3=Xilun\|last4=Essertel\|first4=Gregory\|last5=Rompf\|first5=Tiark\|editor2-last=Wallach\|editor2-first=H.\|editor3-last=Larochelle\|editor3-first=H.\|editor4-last=Grauman\|editor4-first=K.}}</ref><ref name="innes">{{Cite journal\|last=Innes\|first=Mike\|date=2018\|title=On Machine Learning and
Programming Languages\|url=http://www.sysml.cc/doc/2018/37.pdf\|journal=SysML Conference 2018}}</ref><ref name="diffprog-zygote">{{Citation\|date=2019\|title=∂P: A Differentiable Programming System to Bridge Machine Learning and Scientific Computing\|arxiv=1907.07587\|last1=Innes\|first1=Mike\|last2=Edelman\|first2=Alan\|last3=Fischer\|first3=Keno\|last4=Rackauckas\|first4=Chris\|last5=Saba\|first5=Elliot\|author6=Viral B Shah\|last7=Tebbutt\|first7=Will}}</ref> This allows for [[Gradient method\|gradient-based optimization]] of parameters in the program, often via [[gradient descent]]. Differentiable programming has found use in a wide variety of areas, particularly [[scientific computing]] and [[artificial intelligence]].<ref name="diffprog-zygote" /> ==Approaches== Most differentiable programming frameworks work by constructing a graph containing the control flow and [[data structures]] in the program.<ref name="flux">{{cite arXiv \|eprint=1811.01457 \|last1=Innes \|first1=Michael \|last2=Saba \|first2=Elliot \|last3=Fischer \|first3=Keno \|last4=Gandhi \|first4=Dhairya \|author5=Marco Concetto Rudilosso \|author6=Neethu Mariya Joy \|last7=Karmali \|first7=Tejan \|last8=Pal \|first8=Avik \|last9=Shah \|first9=Viral \|title=Fashionable Modelling with Flux \|date=2018 \|class=cs.PL }}</ref> Attempts generally fall into two groups: * ''' Static, [[compiled]] graph'''-based approaches such as [[TensorFlow]],<ref group=note>TensorFlow 1 uses the static graph approach, whereas TensorFlow 2 uses the dynamic graph approach by default.</ref> [[Theano (software)\|Theano]], and [[MXNet]]. They tend to allow for good [[compiler optimization]] and easier scaling to large systems, but their static nature limits interactivity and the types of programs that can be created easily (e.g.
those involving [[loop (computing)\|loops]] or [[recursion]]), as well as making it harder for users to reason effectively about their programs.<ref name="flux" /> A proof-of-concept compiler toolchain called Myia uses a subset of Python as a front end and supports higher-order functions, recursion, and higher-order derivatives.<ref>{{cite book \|last1=Merriënboer \|first1=Bart van \|last2=Breuleux \|first2=Olivier \|last3=Bergeron \|first3=Arnaud \|last4=Lamblin \|first4=Pascal \|chapter=Automatic differentiation in ML: where we are and where we should be going \|title={{harvnb\|NIPS'18}} \|date=3 December 2018 \|volume=31 \|pages=8771–8781 \|chapter-url = https://papers.nips.cc/paper/2018/hash/770f8e448d07586afbf77bb59f698587-Abstract.html}}</ref><ref name="myia1">{{Cite web \|last1=Breuleux \|first1=O. \|last2=van Merriënboer \|first2=B. \|date=2017 \|url=https://www.sysml.cc/doc/2018/39.pdf \|title=Automatic Differentiation in Myia \|access-date=2019-06-24 \|archive-date=2019-06-24 \|archive-url=https://web.archive.org/web/20190624180156/https://www.sysml.cc/doc/2018/39.pdf \|url-status=dead }}</ref><ref name="pytorchtut">{{Cite web\|url=https://pytorch.org/tutorials/beginner/examples_autograd/tf_two_layer_net.html \|title=TensorFlow: Static Graphs \|work=Tutorials: Learning PyTorch \|publisher=PyTorch.org \|access-date=2019-03-04}}</ref>▼ * '''[[Operator overloading]], dynamic graph'''-based approaches such as [[PyTorch]], [[NumPy]]'s [[autograd]] package, and [https://darioizzo.github.io/audi/ Pyaudi]. Their dynamic and interactive nature lets most programs be written and reasoned about more easily.
However, they lead to [[interpreter (computing)\|interpreter]] overhead (particularly when composing many small operations), poorer scalability, and reduced benefit from compiler optimization.<ref name="myia1" /><ref name="pytorchtut" />▼ The use of just-in-time compilation has emerged recently{{when\|date=April 2025}} as a possible way to overcome some of the bottlenecks of interpreted languages. The C++ package [https://bluescarni.github.io/heyoka/index.html heyoka] and the Python package [https://bluescarni.github.io/heyoka.py/index.html heyoka.py] make extensive use of this technique to offer advanced differentiable programming capabilities, including high-order derivatives. A package for the [[Julia (programming language)\|Julia]] programming language{{snd}} [https://github.com/FluxML/Zygote.jl Zygote]{{snd}} works directly on Julia's [[intermediate representation]].<ref name="flux" /><ref>{{cite arXiv \|eprint=1810.07951 \|last1=Innes \|first1=Michael \|title=Don't Unroll Adjoint: Differentiating SSA-Form Programs \|date=2018 \|class=cs.PL }}</ref><ref name="diffprog-zygote" /> ▲* ''' Static, [[compiled]] graph'''-based approaches such as [[TensorFlow]],<ref group=note>TensorFlow 1 uses the static graph approach, whereas TensorFlow 2 uses the dynamic graph approach by default.</ref> [[Theano (software)\|Theano]], and [[MXNet]]. They tend to allow for good [[compiler optimization]] and easier scaling to large systems, but their static nature limits interactivity and the types of programs that can be created easily (e.g.
those involving [[loop (computing)\|loops]] or [[recursion]]), as well as making it harder for users to reason effectively about their programs.<ref name="flux" /><ref name="myia1">{{Cite web\|url=https://www.sysml.cc/doc/2018/39.pdf\|title=Automatic Differentiation in Myia\|access-date=2019-06-24}}</ref><ref name="pytorchtut">{{Cite web\|url=https://pytorch.org/tutorials/beginner/examples_autograd/tf_two_layer_net.html\|title=TensorFlow: Static Graphs\|access-date=2019-03-04}}</ref> A limitation of earlier approaches is that they are only able to differentiate code written in a suitable manner for the framework, limiting their interoperability with other programs. Newer approaches resolve this issue by constructing the graph from the language's syntax or IR, allowing arbitrary code to be differentiated.<ref name="flux" /><ref name="myia1" />▼ ▲* '''[[Operator overloading]], dynamic graph''' based approaches such as [[PyTorch]] and [[AutoGrad (NumPy)\|AutoGrad]]. Their dynamic and interactive nature lets most programs be written and reasoned about more easily. However, they lead to [[interpreter (computing)\|interpreter]] overhead (particularly when composing many small operations), poorer scalability, and reduced benefit from compiler optimization.<ref name="myia1" /><ref name="pytorchtut" /><ref name="diffprog-zygote" /> ==Applications==▼ ▲Both of these early approaches are only able to differentiate code written in a suitable manner for the framework, limiting their interoperability with other programs.
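The operator-overloading, dynamic-graph approach described above can be sketched in a few lines of plain Python. This is an illustrative toy, not the implementation used by PyTorch or autograd: the class `Var`, its `backward` method and the function `f` are hypothetical names introduced for the example. Each overloaded operation records its inputs and local derivatives as the program runs, so the graph follows whatever loops and branches are actually executed, and a reverse pass then accumulates the gradients.

```python
class Var:
    """A node in a computation graph that is recorded as the program runs."""

    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # pairs of (parent node, local derivative)
        self.grad = 0.0

    def __add__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))

    __rmul__ = __mul__

    def backward(self):
        """Propagate derivatives from this node back to every input."""
        order, seen = [], set()

        def visit(node):
            # Depth-first post-order gives a topological ordering of the graph.
            if id(node) not in seen:
                seen.add(id(node))
                for parent, _ in node.parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            for parent, local in node.parents:
                parent.grad += local * node.grad


def f(x, n):
    """Ordinary Python control flow; the recorded graph depends on n."""
    y = Var(1.0)
    for i in range(n):
        if i % 2 == 0:
            y = y * x  # even iterations multiply
        else:
            y = y + x  # odd iterations add
    return y


x = Var(3.0)
out = f(x, 4)   # computes 2*x**2 + x
out.backward()
print(out.value, x.grad)  # 21.0 13.0
```

Every arithmetic step allocates a Python object and goes through a method call, which is a small-scale picture of the interpreter overhead attributed to this approach when many small operations are composed.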
Differentiable programming has been applied in areas such as combining [[deep learning]] with [[physics engines]] in [[robotics]],<ref>{{cite arXiv \|eprint=1611.01652 \|last1=Degrave \|first1=Jonas \|last2=Hermans \|first2=Michiel \|last3=Dambre \|first3=Joni \|last4=wyffels \|first4=Francis \|title=A Differentiable Physics Engine for Deep Learning in Robotics \|date=2016 \|class=cs.NE }}</ref> solving [[Quantum chemistry#Electronic structure\|electronic-structure]] problems with differentiable [[density functional theory]],<ref name="Li2021">{{cite journal \|last1=Li \|first1=Li \|last2=Hoyer \|first2=Stephan \|last3=Pederson \|first3=Ryan \|last4=Sun \|first4=Ruoxi \|last5=Cubuk \|first5=Ekin D. \|last6=Riley \|first6=Patrick \|last7=Burke \|first7=Kieron \|year=2021 \|title=Kohn-Sham Equations as Regularizer: Building Prior Knowledge into Machine-Learned Physics \|journal=Physical Review Letters \|volume=126 \|issue=3 \|pages=036401 \|arxiv=2009.08551 \|bibcode=2021PhRvL.126c6401L \|doi=10.1103/PhysRevLett.126.036401 \|pmid=33543980 \|doi-access=free}}</ref> differentiable [[Ray tracing (graphics)\|ray tracing]],<ref>{{cite journal \|first1=Tzu-Mao \|last1=Li \|first2=Miika \|last2=Aittala \|first3=Frédo \|last3=Durand \|first4=Jaakko \|last4=Lehtinen \|title=Differentiable Monte Carlo Ray Tracing through Edge Sampling \|journal=ACM Transactions on Graphics \|volume=37 \|issue=6 \|pages=222:1–11 \|date=2018 \|doi=10.1145/3272127.3275109 \|s2cid=52839714 \|url=https://people.csail.mit.edu/tzumao/diffrt/\|doi-access=free }}</ref> [[differentiable imaging]],<ref>{{Cite journal \|last=Chen \|first=Ni \|last2=Cao \|first2=Liangcai \|last3=Poon \|first3=Ting‐Chung \|last4=Lee \|first4=Byoungho \|last5=Lam \|first5=Edmund Y. 
\|title=Differentiable Imaging: A New Tool for Computational Optical Imaging \|url=https://onlinelibrary.wiley.com/doi/10.1002/apxr.202200118 \|journal=Advanced Physics Research \|language=en \|volume=2 \|issue=6 \|doi=10.1002/apxr.202200118 \|issn=2751-1200\|doi-access=free \|hdl=10754/686576 \|hdl-access=free }}</ref><ref>{{Cite journal \|last=Chen \|first=Ni \|last2=Brady \|first2=David J. \|last3=Lam \|first3=Edmund Y. \|date=2025-07-04 \|title=Differentiable Imaging: Progress, Challenges, and Outlook \|url=https://spj.science.org/doi/10.34133/adi.0117 \|journal=Advanced Devices & Instrumentation \|language=en \|volume=6 \|doi=10.34133/adi.0117 \|issn=2767-9713\|doi-access=free }}</ref> [[image processing]],<ref>{{cite journal \|first1=Tzu-Mao \|last1=Li \|first2=Michaël \|last2=Gharbi \|first3=Andrew \|last3=Adams \|first4=Frédo \|last4=Durand \|first5=Jonathan \|last5=Ragan-Kelley \|title=Differentiable Programming for Image Processing and Deep Learning in Halide \|journal=ACM Transactions on Graphics \|volume=37 \|issue=4 \|pages=139:1–13 \|date=August 2018 \|doi=10.1145/3197517.3201383 \|s2cid=46927588 \|url=https://cseweb.ucsd.edu/~tzli/gradient_halide \|doi-access=free \|hdl=1721.1/122623 \|hdl-access=free }}</ref> and [[probabilistic programming]].<ref name="diffprog-zygote"/> ==Multidisciplinary application== A more recent package for the [[Julia (programming language)\|Julia]] programming language{{snd}} [https://github.com/FluxML/Zygote.jl Zygote]{{snd}} resolves the issues that earlier attempts faced by treating the language's syntax as the graph. 
The [[intermediate representation]] of arbitrary code can then be differentiated directly, [[compiler optimization\|optimized]], and compiled.<ref name="flux" /><ref>{{cite arXiv\|last=Innes\|first=Michael\|date=2018-10-18\|title=Don't Unroll Adjoint: Differentiating SSA-Form Programs\|eprint=1810.07951\|class=cs.PL}}</ref> Differentiable programming is making significant strides in various fields beyond its traditional applications. In healthcare and life sciences, for example, it is being used for deep learning in biophysics-based modelling of molecular mechanisms, in areas such as protein structure prediction and drug discovery. These applications demonstrate the potential of differentiable programming in contributing to significant advancements in understanding complex biological systems and improving healthcare solutions.<ref>{{cite journal \|last1=AlQuraishi \|first1=Mohammed \|last2=Sorger \|first2=Peter K. \|title=Differentiable biology: using deep learning for biophysics-based and data-driven modeling of molecular mechanisms \|journal=Nature Methods \|date=October 2021 \|volume=18 \|issue=10 \|pages=1169–1180 \|doi=10.1038/s41592-021-01283-4 \|pmid=34608321 \|pmc=8793939 }}</ref> A programming language "currently under development and is not yet ready for use" called [https://github.com/mila-iqia/myia Myia]<ref name="myia1" /> allows defining a model using a subset of [[Python (programming language)\|Python]], which is compiled to Myia. 
▲==Applications== Differentiable programming has been applied in areas such as combining [[deep learning]] with [[physics engines]] in [[robotics]], solving electronic structure problems with differentiable [[density functional theory]], differentiable [[Ray tracing (graphics)\|ray tracing]], [[image processing]], and [[probabilistic programming]].<ref>{{cite arXiv\|last1=Degrave\|first1=Jonas\|last2=Hermans\|first2=Michiel\|last3=Dambre\|first3=Joni\|last4=wyffels\|first4=Francis\|date=2016-11-05\|title=A Differentiable Physics Engine for Deep Learning in Robotics\|eprint=1611.01652\|class=cs.NE}}</ref><ref name='Li2021'>{{cite journal \|title=Kohn-Sham Equations as Regularizer: Building Prior Knowledge into Machine-Learned Physics \|journal=Physical Review Letters \|year=2021 \|first1=Li \|last1=Li \| first2=Stephan \| last2=Hoyer \| first3=Ryan \| last3=Pederson \| first4=Ruoxi \| last4=Sun \| first5=Ekin D. \| last5=Cubuk \| first6=Patrick \| last6=Riley \|first7=Kieron \| last7=Burke \|volume=126 \|issue=3 \|pages=036401 \|doi=10.1103/PhysRevLett.126.036401 \|pmid=33543980 \|arxiv=2009.08551 \|bibcode=2021PhRvL.126c6401L \|doi-access=free}}</ref><ref>{{Cite web\|url=https://people.csail.mit.edu/tzumao/diffrt/\|title=Differentiable Monte Carlo Ray Tracing through Edge Sampling\|website=people.csail.mit.edu\|access-date=2019-02-13}}</ref><ref>{{Cite web\|url=https://sciml.ai/roadmap/\|title=SciML Scientific Machine Learning Open Source Software Organization Roadmap\|website=sciml.ai\|access-date=2020-07-19}}</ref><ref>{{Cite web\|url=https://people.csail.mit.edu/tzumao/gradient_halide/\|title=Differentiable Programming for Image Processing and Deep Learning in Halide\|website=people.csail.mit.edu\|access-date=2019-02-13}}</ref><ref name="diffprog-zygote"/> ==See also== Line 31 ⟶ 29: {{Differentiable computing}} ▲{{Programming paradigms navbox}} [[Category:Differential calculus]]