Differentiable programming: Difference between revisions

 
'''Differentiable programming''' is a [[programming paradigm]] in which programs can be [[Differentiation (mathematics)|differentiated]] throughout, usually via [[automatic differentiation]].<ref>{{Citation|last=Wang|first=Fei|title=Backpropagation with Callbacks: Foundations for Efficient and Expressive Differentiable Programming|date=2018|url=http://papers.nips.cc/paper/8221-backpropagation-with-callbacks-foundations-for-efficient-and-expressive-differentiable-programming.pdf|work=Advances in Neural Information Processing Systems 31|pages=10201–10212|editor-last=Bengio|editor-first=S.|publisher=Curran Associates, Inc.|access-date=2019-02-13|last2=Decker|first2=James|last3=Wu|first3=Xilun|last4=Essertel|first4=Gregory|last5=Rompf|first5=Tiark|editor2-last=Wallach|editor2-first=H.|editor3-last=Larochelle|editor3-first=H.|editor4-last=Grauman|editor4-first=K.}}</ref><ref name="innes">{{Cite journal|last=Innes|first=Mike|date=2018|title=On Machine Learning and Programming Languages|url=http://www.sysml.cc/doc/37.pdf|journal=SysML Conference 2018|volume=|pages=|via=}}</ref> This allows for [[Gradient method|gradient-based optimization]] of parameters in the program, often via [[gradient descent]].
Differentiable programming has found use in areas such as combining [[deep learning]] with [[physics engines]] in [[robotics]], differentiable [[Ray tracing (graphics)|ray tracing]], and [[image processing]].<ref>{{cite arxiv|last=Degrave|first=Jonas|last2=Hermans|first2=Michiel|last3=Dambre|first3=Joni|last4=wyffels|first4=Francis|date=2016-11-05|title=A Differentiable Physics Engine for Deep Learning in Robotics|eprint=1611.01652|class=cs.NE}}</ref><ref>{{Cite web|url=https://people.csail.mit.edu/tzumao/diffrt/|title=Differentiable Monte Carlo Ray Tracing through Edge Sampling|website=people.csail.mit.edu|access-date=2019-02-13}}</ref><ref>{{Cite web|url=https://people.csail.mit.edu/tzumao/gradient_halide/|title=Differentiable Programming for Image Processing and Deep Learning in Halide|website=people.csail.mit.edu|access-date=2019-02-13}}</ref>
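
As a minimal illustration of gradient-based optimization of a program parameter, the following Python sketch differentiates an ordinary function (including its loop) with forward-mode dual numbers and then applies gradient descent. It is not taken from any of the cited frameworks; the names <code>Dual</code>, <code>loss</code> and <code>fit</code>, the data, and the learning rate are all assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Dual:
    """A value paired with its derivative with respect to one chosen input."""
    val: float
    dot: float = 0.0

    def __add__(self, other):
        other = _lift(other)
        return Dual(self.val + other.val, self.dot + other.dot)

    __radd__ = __add__

    def __sub__(self, other):
        other = _lift(other)
        return Dual(self.val - other.val, self.dot - other.dot)

    def __rsub__(self, other):
        return _lift(other) - self

    def __mul__(self, other):
        other = _lift(other)
        # Product rule: (a * b)' = a' * b + a * b'
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)

    __rmul__ = __mul__


def _lift(x):
    """Treat plain numbers as constants (derivative zero)."""
    return x if isinstance(x, Dual) else Dual(float(x))


def loss(w, data):
    """Squared error of the hypothetical model y = w * x, written as plain Python."""
    total = 0.0
    for x, y in data:
        err = w * x - y
        total = total + err * err
    return total


def fit(data, w=0.0, lr=0.01, steps=200):
    """Gradient descent on w, using the derivative carried by the dual number."""
    for _ in range(steps):
        grad = loss(Dual(w, 1.0), data).dot  # d(loss)/dw at the current w
        w -= lr * grad                       # gradient descent update
    return w


data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # points on the line y = 2x
w_fit = fit(data)
```

Seeding <code>w</code> with derivative 1 makes every intermediate value carry its derivative with respect to <code>w</code>, so <code>loss</code> needs no changes to become differentiable. With this data the fitted parameter approaches 2.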
== Approaches ==
Most differentiable programming frameworks work by constructing a graph containing the control flow and [[data structures]] in the program.<ref name="flux">{{cite arxiv|last=Innes|first=Michael|last2=Saba|first2=Elliot|last3=Fischer|first3=Keno|last4=Gandhi|first4=Dhairya|last5=Rudilosso|first5=Marco Concetto|last6=Joy|first6=Neethu Mariya|last7=Karmali|first7=Tejan|last8=Pal|first8=Avik|last9=Shah|first9=Viral|date=2018-10-31|title=Fashionable Modelling with Flux|eprint=1811.01457|class=cs.PL}}</ref> Earlier attempts generally fall into two groups:
 
* '''Static, [[compiled]] graph-based''' approaches such as [[TensorFlow]], [[Theano]], and [[MXNet]]. They tend to allow for good compiler optimization and easier scaling to large systems, but their static nature limits interactivity and the types of programs that can be created easily (e.g. those involving loops or recursion), and makes it harder for users to reason effectively about their programs.<ref name="flux" /><ref name="myia1">{{Cite web|url=https://github.com/mila-iqia/myia/blob/master/README.rst|title=Myia|access-date=2019-03-04}}</ref>
 
* '''Operator-overloading, dynamic graph-based''' approaches such as [[PyTorch]] and [[AutoGrad (NumPy)|AutoGrad]]. Their dynamic and interactive nature lets most programs be written and reasoned about more easily. However, they incur interpreter overhead (particularly when composing many small operations), scale less well, and gain little benefit from compiler optimization.<ref name="myia1" />
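
The operator-overloading style can be sketched in a few lines of Python. The example below is a toy reverse-mode implementation written for illustration, not the mechanism of PyTorch or AutoGrad; <code>Var</code>, <code>as_var</code>, <code>backward</code> and <code>power</code> are hypothetical names. Each arithmetic operation records its inputs and local derivatives as it runs, so the graph is built dynamically and ordinary Python loops need no special handling.

```python
class Var:
    """A scalar that records how it was computed, forming a graph at run time."""

    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # tuples of (input Var, local derivative)
        self.grad = 0.0

    def __add__(self, other):
        other = as_var(other)
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    __radd__ = __add__

    def __mul__(self, other):
        other = as_var(other)
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))

    __rmul__ = __mul__


def as_var(x):
    """Wrap plain numbers as constant graph nodes."""
    return x if isinstance(x, Var) else Var(float(x))


def backward(output):
    """Accumulate d(output)/d(node) into node.grad, consumers before inputs."""
    order, seen = [], set()

    def visit(node):
        if id(node) not in seen:
            seen.add(id(node))
            for parent, _ in node.parents:
                visit(parent)
            order.append(node)  # appended after all of its inputs

    visit(output)
    output.grad = 1.0
    for node in reversed(order):
        for parent, local in node.parents:
            parent.grad += local * node.grad  # chain rule


def power(x, n):
    """x**n by repeated multiplication; the loop is plain Python control flow."""
    result = as_var(1.0)
    for _ in range(n):
        result = result * x
    return result


x = Var(3.0)
y = power(x, 4) + 2 * x  # y = x**4 + 2x
backward(y)
```

Every multiplication inside the loop allocates a new <code>Var</code> object, which illustrates the per-operation interpreter overhead mentioned above when many small operations are composed.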
 
Both of these earlier approaches can also generally differentiate only code written in a manner suited to the framework, which limits their interoperability with other programs.
 
A more recent framework for the [[Julia (programming language)|Julia]] programming language, called Zygote, addresses these problems by treating the language's syntax as the graph; the [[intermediate representation]] for arbitrary Julia code can then be differentiated directly, [[compiler optimization|optimized]], and compiled.<ref name="flux" /><ref>{{cite arxiv|last=Innes|first=Michael|date=2018-10-18|title=Don't Unroll Adjoint: Differentiating SSA-Form Programs|eprint=1810.07951|class=cs.PL}}</ref> An in-development differentiable programming language called [[Myia (programming language)|Myia]] takes a similar approach.<ref name="myia1" />
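
The general idea of deriving new code from a program's own representation, rather than tracing its execution, can be hinted at with a toy Python sketch. This is not how Zygote or Myia work: those systems transform a compiler intermediate representation and handle control flow, whereas the example below only applies symbolic sum and product rules to the syntax tree of a single arithmetic expression. The function names <code>d</code> and <code>derivative</code> are hypothetical.

```python
import ast


def d(node, wrt):
    """Return a syntax tree for the derivative of `node` with respect to `wrt`."""
    if isinstance(node, ast.Constant):
        return ast.Constant(0.0)
    if isinstance(node, ast.Name):
        return ast.Constant(1.0 if node.id == wrt else 0.0)
    if isinstance(node, ast.BinOp):
        left, right = node.left, node.right
        d_left, d_right = d(left, wrt), d(right, wrt)
        if isinstance(node.op, (ast.Add, ast.Sub)):
            # Sum rule: (l +/- r)' = l' +/- r'
            return ast.BinOp(d_left, node.op, d_right)
        if isinstance(node.op, ast.Mult):
            # Product rule: (l * r)' = l' * r + l * r'
            return ast.BinOp(ast.BinOp(d_left, ast.Mult(), right),
                             ast.Add(),
                             ast.BinOp(left, ast.Mult(), d_right))
    raise NotImplementedError(ast.dump(node))


def derivative(source, wrt="x"):
    """Compile a new function that evaluates d(source)/d(wrt)."""
    tree = ast.parse(source, mode="eval")
    new_tree = ast.Expression(d(tree.body, wrt))
    ast.fix_missing_locations(new_tree)
    code = compile(new_tree, "<derivative>", "eval")
    # Variables are supplied as keyword arguments, e.g. f(x=2.0).
    return lambda **env: eval(code, {}, env)


df = derivative("x*x*x + 3*x")  # derivative is 3*x**2 + 3
slope_at_2 = df(x=2.0)
```

Because the result is ordinary compiled code, it carries no run-time tape; the derivative is produced once, ahead of evaluation, which is the property the syntax-based frameworks exploit at much larger scale.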
 
== References ==