Differentiable programming: Difference between revisions

m Fix "on machine learning..." reference url
Clarify Tensorflow 1 vs 2
Line 5:
Most differentiable programming frameworks work by constructing a graph containing the control flow and [[data structures]] in the program.<ref name="flux">{{cite arxiv|last=Innes|first=Michael|last2=Saba|first2=Elliot|last3=Fischer|first3=Keno|last4=Gandhi|first4=Dhairya|last5=Rudilosso|first5=Marco Concetto|last6=Joy|first6=Neethu Mariya|last7=Karmali|first7=Tejan|last8=Pal|first8=Avik|last9=Shah|first9=Viral|date=2018-10-31|title=Fashionable Modelling with Flux|eprint=1811.01457|class=cs.PL}}</ref> Earlier attempts generally fall into two groups:
 
* '''Static, [[compiled]] graph'''-based approaches, such as [[TensorFlow]],<ref group=note>TensorFlow 1 uses the static graph approach, whereas TensorFlow 2 uses the dynamic graph approach by default.</ref> [[Theano (software)|Theano]], and [[MXNet]]. They tend to allow for good [[compiler optimization]] and easier scaling to large systems, but their static nature limits interactivity and the types of programs that can be created easily (e.g. those involving [[loop (computing)|loops]] or [[recursion]]), as well as making it harder for users to reason effectively about their programs (illustrated in the first sketch below).<ref name="flux" /><ref name="myia1">{{Cite web|url=https://www.sysml.cc/doc/2018/39.pdf|title=Automatic Differentiation in Myia|access-date=2019-06-24}}</ref><ref name="pytorchtut">{{Cite web|url=https://pytorch.org/tutorials/beginner/examples_autograd/tf_two_layer_net.html|title=TensorFlow: Static Graphs|access-date=2019-03-04}}</ref>
 
* '''[[Operator overloading]], dynamic graph'''-based approaches, such as [[PyTorch]] and [[AutoGrad (NumPy)|AutoGrad]]. Their dynamic and interactive nature lets most programs be written and reasoned about more easily. However, they introduce [[interpreter (computing)|interpreter]] overhead (particularly when composing many small operations), scale less readily, and cannot benefit from compiler optimization (illustrated in the second sketch below).<ref name="myia1" /><ref name="pytorchtut" />
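
A minimal sketch of the static, compiled-graph style, assuming TensorFlow 1.x (the toy computation and variable names are chosen purely for illustration): the entire graph, including the gradient node, is declared symbolically before any concrete values flow through it.

<syntaxhighlight lang="python">
import tensorflow as tf  # assumes TensorFlow 1.x, where the static graph API is the default

# Declare the whole graph symbolically before running anything.
x = tf.placeholder(tf.float32, shape=())   # symbolic input node
y = x * x + 3.0 * x                        # toy computation: y = x^2 + 3x
dy_dx = tf.gradients(y, [x])[0]            # gradient node added to the same static graph

# Only now is the graph executed, with concrete values fed in.
with tf.Session() as sess:
    print(sess.run(dy_dx, feed_dict={x: 2.0}))  # dy/dx at x = 2: 2*2 + 3 = 7.0
</syntaxhighlight>

Because the graph's structure is fixed before execution, native Python control flow cannot be recorded into it; looping constructs such as <code>tf.while_loop</code> must be used instead.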
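A minimal sketch of the operator-overloading, dynamic-graph style, assuming PyTorch (again with a toy computation chosen purely for illustration): the graph is recorded as ordinary Python code executes, so native loops and conditionals are differentiated through directly, at the cost of interpreter and dispatch overhead on every operation.

<syntaxhighlight lang="python">
import torch

x = torch.tensor(2.0, requires_grad=True)

# The graph is built on the fly as these overloaded operators execute.
y = x * x + 3.0 * x          # toy computation: y = x^2 + 3x
for _ in range(2):           # ordinary Python control flow is recorded as it runs
    y = y + torch.sin(y)

y.backward()                 # reverse-mode differentiation through the recorded graph
print(x.grad)                # dy/dx evaluated at x = 2.0
</syntaxhighlight>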
Line 11:
Both of these early approaches can differentiate only code written in a manner suited to the framework, which limits their interoperability with other programs.
 
A more recent package in the [[Julia (programming language)|Julia]] programming language, [https://github.com/FluxML/Zygote.jl Zygote], resolves the issues that earlier attempts faced by treating the language's syntax as the graph; the design of the Julia language makes it easy for the [[intermediate representation]] of arbitrary Julia code to be differentiated directly, [[compiler optimization|optimized]], and compiled.<ref name="flux" /><ref>{{cite arxiv|last=Innes|first=Michael|date=2018-10-18|title=Don't Unroll Adjoint: Differentiating SSA-Form Programs|eprint=1810.07951|class=cs.PL}}</ref> An in-development differentiable programming language called [[Myia (programming language)|Myia]] uses a similar approach,<ref name="myia1" /> as does an in-development project for [[Swift (programming language)|Swift]], implemented via a compiler transformation on the Swift intermediate language ([https://github.com/apple/swift/blob/tensorflow/docs/SIL.rst SIL]).<ref>{{Cite web|url=https://forums.swift.org/t/pre-pre-pitch-swift-differentiable-programming-design-overview/25992|title=Pre-pre-pitch: Swift Differentiable Programming Design Overview|date=2019-06-17|website=Swift Forums|language=en-US|access-date=2019-06-18}}</ref>
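
Zygote itself is invoked from Julia, roughly as <code>gradient(f, x)</code> for an ordinary Julia function <code>f</code>. As a rough, user-facing analogue only, the sketch below uses the JAX library for Python, which likewise differentiates an ordinary function containing native control flow; note, however, that JAX works by tracing with operator overloading rather than by Zygote's source-to-source transformation of the compiler's intermediate representation, and the function shown is invented purely for illustration.

<syntaxhighlight lang="python">
import jax
import jax.numpy as jnp

def f(x):
    # an ordinary function with native control flow, invented for illustration
    y = x
    for _ in range(3):
        y = jnp.sin(y) + x * y
    return y

df = jax.grad(f)   # differentiate the whole function with respect to x
print(df(1.5))
</syntaxhighlight>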
 
==See also==
* [[Machine learning]]
 
==Notes==
{{reflist|group=note}}
 
==References==