Differentiable programming

{{Programming paradigms}}
 
'''Differentiable programming''', or '''∂P''', is a [[programming paradigm]] in which the programs can be [[Differentiation (mathematics)|differentiated]] throughout, usually via [[automatic differentiation]].<ref>{{Citation|last=Wang|first=Fei|title=Backpropagation with Callbacks: Foundations for Efficient and Expressive Differentiable Programming|date=2018|url=http://papers.nips.cc/paper/8221-backpropagation-with-callbacks-foundations-for-efficient-and-expressive-differentiable-programming.pdf|work=Advances in Neural Information Processing Systems 31|pages=10201–10212|editor-last=Bengio|editor-first=S.|publisher=Curran Associates, Inc.|access-date=2019-02-13|last2=Decker|first2=James|last3=Wu|first3=Xilun|last4=Essertel|first4=Gregory|last5=Rompf|first5=Tiark|editor2-last=Wallach|editor2-first=H.|editor3-last=Larochelle|editor3-first=H.|editor4-last=Grauman|editor4-first=K.}}</ref><ref name="innes">{{Cite journal|last=Innes|first=Mike|date=2018|title=On Machine Learning and Programming Languages|url=http://www.sysml.cc/doc/2018/37.pdf|journal=SysML Conference 2018}}</ref><ref name="diffprog-zygote">{{Citation|date=2019|title=∂P: A Differentiable Programming System to Bridge Machine Learning and Scientific Computing|url=https://arxiv.org/pdf/1907.07587.pdf}}</ref> This allows for [[Gradient method|gradient-based optimization]] of parameters in the program, often via [[gradient descent]]. Differentiable programming has found use in a wide variety of areas, particularly [[scientific computing]] and [[artificial intelligence]].<ref name="diffprog-zygote" />
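As an illustrative sketch of the idea (a minimal example using the Julia package Zygote, discussed below; the names <code>predict</code>, <code>loss</code>, and <code>train</code> are hypothetical, not part of any library), a parameter of an ordinary program can be tuned by repeatedly stepping along the gradient of a loss function:

<syntaxhighlight lang="julia">
using Zygote  # reverse-mode automatic differentiation for Julia

# An ordinary program whose output depends on a parameter w.
predict(w, x) = w * x
loss(w) = (predict(w, 2.0) - 10.0)^2   # squared error against a target of 10

function train(w; steps = 100, lr = 0.1)
    for _ in 1:steps
        g = gradient(loss, w)[1]   # d(loss)/dw, obtained by differentiating the program
        w -= lr * g                # one gradient descent step
    end
    return w
end

train(1.0)  # approaches 5.0, since predict(5.0, 2.0) == 10.0
</syntaxhighlight>

The program being differentiated is ordinary code; the framework's job is to produce the gradient automatically, without the programmer deriving derivatives by hand.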
 
== Etymology ==
The abbreviation "∂P" comes from the initials of "Differentiable Programming" (DP). The D is substituted for [[∂]], partly to avoid confusion with [[dynamic programming]], and partly because the latter is a symbol for [[partial differentiation]].
 
== Approaches ==
Most differentiable programming frameworks work by constructing a graph containing the control flow and [[data structures]] in the program.<ref name="flux">{{cite arxiv|last=Innes|first=Michael|last2=Saba|first2=Elliot|last3=Fischer|first3=Keno|last4=Gandhi|first4=Dhairya|last5=Rudilosso|first5=Marco Concetto|last6=Joy|first6=Neethu Mariya|last7=Karmali|first7=Tejan|last8=Pal|first8=Avik|last9=Shah|first9=Viral|date=2018-10-31|title=Fashionable Modelling with Flux|eprint=1811.01457|class=cs.PL}}</ref> Earlier attempts generally fall into two groups:
* '''Static, [[compiled]] graph'''-based approaches such as [[TensorFlow]],<ref group=note>TensorFlow 1 uses the static graph approach, whereas TensorFlow 2 uses the dynamic graph approach by default.</ref> [[Theano (software)|Theano]], and [[MXNet]]. They tend to allow for good [[compiler optimization]] and easier scaling to large systems, but their static nature limits interactivity and the types of programs that can be created easily (for example, those involving [[loop (computing)|loops]] or [[recursion]]), and makes it harder for users to reason effectively about their programs.<ref name="flux" /><ref name="myia1">{{Cite web|url=https://www.sysml.cc/doc/2018/39.pdf|title=Automatic Differentiation in Myia|access-date=2019-06-24}}</ref><ref name="pytorchtut">{{Cite web|url=https://pytorch.org/tutorials/beginner/examples_autograd/tf_two_layer_net.html|title=TensorFlow: Static Graphs|access-date=2019-03-04}}</ref>
 
* '''[[Operator overloading]], dynamic graph'''-based approaches such as [[PyTorch]] and [[AutoGrad (NumPy)|AutoGrad]]. Their dynamic and interactive nature lets most programs be written and reasoned about more easily. However, they introduce [[interpreter (computing)|interpreter]] overhead (particularly when composing many small operations), scale less well, and struggle to benefit from compiler optimization.<ref name="myia1" /><ref name="pytorchtut" /><ref name="diffprog-zygote" /> A minimal sketch of this approach appears below.
 
Both of these early approaches can only differentiate code written in a manner suitable for the framework, which limits their interoperability with other programs.
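The operator overloading approach can be illustrated with a small, self-contained sketch (the <code>Dual</code> type below is a hypothetical example written for this article, not the mechanism of any particular framework): each value carries its derivative alongside it, and overloaded arithmetic propagates derivatives as the program executes, so ordinary control flow needs no special graph-building API.

<syntaxhighlight lang="julia">
# A minimal sketch of forward-mode automatic differentiation by operator
# overloading: each value carries its derivative with respect to the input.
struct Dual
    val::Float64   # the value
    der::Float64   # its derivative
end

# Overloaded operators propagate derivatives by the usual calculus rules.
Base.:+(a::Dual, b::Dual) = Dual(a.val + b.val, a.der + b.der)
Base.:*(a::Dual, b::Dual) = Dual(a.val * b.val, a.der * b.val + a.val * b.der)

# Ordinary control flow works unchanged: the "graph" is whatever the
# program happens to execute.
function f(x)
    y = x
    for _ in 1:3
        y = y * x          # y = x^4 after the loop
    end
    return y
end

d = f(Dual(2.0, 1.0))      # seed with dx/dx = 1
# d.val == 16.0 and d.der == 32.0, the derivative of x^4 at x = 2
</syntaxhighlight>

For brevity the sketch uses forward mode; frameworks such as PyTorch instead record the executed operations so that gradients can be computed in reverse.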
 
A more recent package for the [[Julia (programming language)|Julia]] programming language, [https://github.com/FluxML/Zygote.jl Zygote], resolves the issues that earlier attempts faced by treating the language's syntax as the graph; the design of the Julia language makes it easy for the [[intermediate representation]] of arbitrary Julia code to be differentiated directly, [[compiler optimization|optimized]], and compiled.<ref name="flux" /><ref>{{cite arxiv|last=Innes|first=Michael|date=2018-10-18|title=Don't Unroll Adjoint: Differentiating SSA-Form Programs|eprint=1810.07951|class=cs.PL}}</ref> A programming language called [[Myia (programming language)|Myia]] also uses a similar approach,<ref name="myia1" /> as does a project for [[Swift (programming language)|Swift]] implemented via compiler transformation on the Swift intermediate language ([https://github.com/apple/swift/blob/tensorflow/docs/SIL.rst SIL]).<ref>{{Cite web|url=https://forums.swift.org/t/pre-pre-pitch-swift-differentiable-programming-design-overview/25992|title=Pre-pre-pitch: Swift Differentiable Programming Design Overview|date=2019-06-17|website=Swift Forums|language=en-US|access-date=2019-06-18}}</ref> Both of these are quite early in development (for example, Swift ∂P lacks support for control flow).
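For instance, a function containing data-dependent control flow can be differentiated with Zygote directly (a small sketch; <code>mypow</code> is an illustrative name, while <code>gradient</code> is Zygote's entry point):

<syntaxhighlight lang="julia">
using Zygote

# Arbitrary Julia code with data-dependent control flow.
function mypow(x, n)
    r = one(x)
    while n > 0      # the loop needs no special graph-building API
        r *= x
        n -= 1
    end
    return r
end

gradient(x -> mypow(x, 3), 2.0)   # returns (12.0,), i.e. d/dx x³ at x = 2
</syntaxhighlight>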
 
== Applications ==
Differentiable programming has been applied in areas such as combining [[deep learning]] with [[physics engines]] in [[robotics]], differentiable [[Ray tracing (graphics)|ray tracing]], [[image processing]], and [[probabilistic programming]].<ref>{{cite arxiv|last=Degrave|first=Jonas|last2=Hermans|first2=Michiel|last3=Dambre|first3=Joni|last4=wyffels|first4=Francis|date=2016-11-05|title=A Differentiable Physics Engine for Deep Learning in Robotics|eprint=1611.01652|class=cs.NE}}</ref><ref>{{Cite web|url=https://people.csail.mit.edu/tzumao/diffrt/|title=Differentiable Monte Carlo Ray Tracing through Edge Sampling|website=people.csail.mit.edu|access-date=2019-02-13}}</ref><ref>{{Cite web|url=https://people.csail.mit.edu/tzumao/gradient_halide/|title=Differentiable Programming for Image Processing and Deep Learning in Halide|website=people.csail.mit.edu|access-date=2019-02-13}}</ref><ref name="diffprog-zygote" />
 
==See also==