Differentiable programming

{{Programming paradigms}}
 
'''Differentiable programming''', or '''∂P''', is a [[programming paradigm]] in which the programs can be [[Differentiation (mathematics)|differentiated]] throughout, usually via [[automatic differentiation]].<ref>{{Citation|last=Wang|first=Fei|title=Backpropagation with Callbacks: Foundations for Efficient and Expressive Differentiable Programming|date=2018|url=http://papers.nips.cc/paper/8221-backpropagation-with-callbacks-foundations-for-efficient-and-expressive-differentiable-programming.pdf|work=Advances in Neural Information Processing Systems 31|pages=10201–10212|editor-last=Bengio|editor-first=S.|publisher=Curran Associates, Inc.|access-date=2019-02-13|last2=Decker|first2=James|last3=Wu|first3=Xilun|last4=Essertel|first4=Gregory|last5=Rompf|first5=Tiark|editor2-last=Wallach|editor2-first=H.|editor3-last=Larochelle|editor3-first=H.|editor4-last=Grauman|editor4-first=K.}}</ref><ref name="innes">{{Cite journal|last=Innes|first=Mike|date=2018|title=On Machine Learning and Programming Languages|url=http://www.sysml.cc/doc/2018/37.pdf|journal=SysML Conference 2018}}</ref><ref name="diffprog-zygote">{{Citation|date=2019|title=∂P: A Differentiable Programming System to Bridge Machine Learning and Scientific Computing|url=https://arxiv.org/pdf/1907.07587.pdf}}</ref> This allows for [[Gradient method|gradient-based optimization]] of parameters in the program, often via [[gradient descent]]. Differentiable programming has found use in a wide variety of areas, particularly [[scientific computing]] and [[artificial intelligence]].<ref name="diffprog-zygote" />
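As an illustrative sketch of the idea (a minimal example using the Julia package Zygote, discussed below; the names <code>predict</code>, <code>loss</code>, and <code>train</code> are hypothetical, not part of any library), a parameter of an ordinary program can be tuned by repeatedly stepping along the gradient of a loss function:

<syntaxhighlight lang="julia">
using Zygote  # reverse-mode automatic differentiation for Julia

# An ordinary program whose output depends on a parameter w.
predict(w, x) = w * x
loss(w) = (predict(w, 2.0) - 10.0)^2   # squared error against a target of 10

function train(w; steps = 100, lr = 0.1)
    for _ in 1:steps
        g = gradient(loss, w)[1]   # d(loss)/dw, obtained by differentiating the program
        w -= lr * g                # one gradient descent step
    end
    return w
end

train(1.0)  # approaches 5.0, since predict(5.0, 2.0) == 10.0
</syntaxhighlight>

The program being differentiated is ordinary code; the framework's job is to produce the gradient automatically, without the programmer deriving derivatives by hand.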
 
== Etymology ==
The abbreviation "∂P" comes from the initials of "Differentiable Programming" (DP). The D is substituted for [[∂]], partly to avoid confusion with [[dynamic programming]], and partly because the latter is a symbol for [[partial differentiation]].
 
== Approaches ==
Most differentiable programming frameworks work by constructing a graph containing the control flow and [[data structures]] in the program.<ref name="flux">{{cite arxiv|last=Innes|first=Michael|last2=Saba|first2=Elliot|last3=Fischer|first3=Keno|last4=Gandhi|first4=Dhairya|last5=Rudilosso|first5=Marco Concetto|last6=Joy|first6=Neethu Mariya|last7=Karmali|first7=Tejan|last8=Pal|first8=Avik|last9=Shah|first9=Viral|date=2018-10-31|title=Fashionable Modelling with Flux|eprint=1811.01457|class=cs.PL}}</ref> Earlier attempts generally fall into two groups:
* '''Static, [[compiled]] graph'''-based approaches such as [[TensorFlow]],<ref group=note>TensorFlow 1 uses the static graph approach, whereas TensorFlow 2 uses the dynamic graph approach by default.</ref> [[Theano (software)|Theano]], and [[MXNet]]. They tend to allow for good [[compiler optimization]] and easier scaling to large systems, but their static nature limits interactivity and the types of programs that can be created easily (for example, those involving [[loop (computing)|loops]] or [[recursion]]), and makes it harder for users to reason effectively about their programs.<ref name="flux" /><ref name="myia1">{{Cite web|url=https://www.sysml.cc/doc/2018/39.pdf|title=Automatic Differentiation in Myia|access-date=2019-06-24}}</ref><ref name="pytorchtut">{{Cite web|url=https://pytorch.org/tutorials/beginner/examples_autograd/tf_two_layer_net.html|title=TensorFlow: Static Graphs|access-date=2019-03-04}}</ref>
 
* '''[[Operator overloading]], dynamic graph'''-based approaches such as [[PyTorch]] and [[AutoGrad (NumPy)|AutoGrad]]. Their dynamic and interactive nature lets most programs be written and reasoned about more easily. However, they introduce [[interpreter (computing)|interpreter]] overhead (particularly when composing many small operations), scale less well, and struggle to benefit from compiler optimization.<ref name="myia1" /><ref name="pytorchtut" /><ref name="diffprog-zygote" /> A minimal sketch of this approach appears below.
 
Both of these early approaches can only differentiate code written in a manner suitable for the framework, which limits their interoperability with other programs.
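The operator overloading approach can be illustrated with a small, self-contained sketch (the <code>Dual</code> type below is a hypothetical example written for this article, not the mechanism of any particular framework): each value carries its derivative alongside it, and overloaded arithmetic propagates derivatives as the program executes, so ordinary control flow needs no special graph-building API.

<syntaxhighlight lang="julia">
# A minimal sketch of forward-mode automatic differentiation by operator
# overloading: each value carries its derivative with respect to the input.
struct Dual
    val::Float64   # the value
    der::Float64   # its derivative
end

# Overloaded operators propagate derivatives by the usual calculus rules.
Base.:+(a::Dual, b::Dual) = Dual(a.val + b.val, a.der + b.der)
Base.:*(a::Dual, b::Dual) = Dual(a.val * b.val, a.der * b.val + a.val * b.der)

# Ordinary control flow works unchanged: the "graph" is whatever the
# program happens to execute.
function f(x)
    y = x
    for _ in 1:3
        y = y * x          # y = x^4 after the loop
    end
    return y
end

d = f(Dual(2.0, 1.0))      # seed with dx/dx = 1
# d.val == 16.0 and d.der == 32.0, the derivative of x^4 at x = 2
</syntaxhighlight>

For brevity the sketch uses forward mode; frameworks such as PyTorch instead record the executed operations so that gradients can be computed in reverse.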
 
A more recent package for the [[Julia (programming language)|Julia]] programming language, [https://github.com/FluxML/Zygote.jl Zygote], resolves the issues that earlier attempts faced by treating the language's syntax as the graph; the design of the Julia language makes it easy for the [[intermediate representation]] of arbitrary Julia code to be differentiated directly, [[compiler optimization|optimized]], and compiled.<ref name="flux" /><ref>{{cite arxiv|last=Innes|first=Michael|date=2018-10-18|title=Don't Unroll Adjoint: Differentiating SSA-Form Programs|eprint=1810.07951|class=cs.PL}}</ref> A programming language called [[Myia (programming language)|Myia]] also uses a similar approach,<ref name="myia1" /> as does a project for [[Swift (programming language)|Swift]] implemented via compiler transformation on the Swift intermediate language ([https://github.com/apple/swift/blob/tensorflow/docs/SIL.rst SIL]).<ref>{{Cite web|url=https://forums.swift.org/t/pre-pre-pitch-swift-differentiable-programming-design-overview/25992|title=Pre-pre-pitch: Swift Differentiable Programming Design Overview|date=2019-06-17|website=Swift Forums|language=en-US|access-date=2019-06-18}}</ref> Both of these are quite early in development (for example, Swift ∂P lacks support for control flow).
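For instance, a function containing data-dependent control flow can be differentiated with Zygote directly (a small sketch; <code>mypow</code> is an illustrative name, while <code>gradient</code> is Zygote's entry point):

<syntaxhighlight lang="julia">
using Zygote

# Arbitrary Julia code with data-dependent control flow.
function mypow(x, n)
    r = one(x)
    while n > 0      # the loop needs no special graph-building API
        r *= x
        n -= 1
    end
    return r
end

gradient(x -> mypow(x, 3), 2.0)   # returns (12.0,), i.e. d/dx x³ at x = 2
</syntaxhighlight>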
 
== Applications ==
Differentiable programming has been applied in areas such as combining [[deep learning]] with [[physics engines]] in [[robotics]], differentiable [[Ray tracing (graphics)|ray tracing]], [[image processing]], and [[probabilistic programming]].<ref>{{cite arxiv|last=Degrave|first=Jonas|last2=Hermans|first2=Michiel|last3=Dambre|first3=Joni|last4=wyffels|first4=Francis|date=2016-11-05|title=A Differentiable Physics Engine for Deep Learning in Robotics|eprint=1611.01652|class=cs.NE}}</ref><ref>{{Cite web|url=https://people.csail.mit.edu/tzumao/diffrt/|title=Differentiable Monte Carlo Ray Tracing through Edge Sampling|website=people.csail.mit.edu|access-date=2019-02-13}}</ref><ref>{{Cite web|url=https://people.csail.mit.edu/tzumao/gradient_halide/|title=Differentiable Programming for Image Processing and Deep Learning in Halide|website=people.csail.mit.edu|access-date=2019-02-13}}</ref><ref name="diffprog-zygote" />
 
==See also==