Data-oriented design: Difference between revisions

m Cut needless carriage returns: in paragraphs, between items.
mNo edit summary
 
(83 intermediate revisions by 54 users not shown)
Line 1:
{{Short description|Program optimization approach in computing}}
In [[computing]], '''data oriented design''' (not the same as [[data-driven design]]) is a [[program optimization]] approach motivated by [[cache coherency]], used in [[video game]] development (usually in the programming languages [[C (programming language)|C]] or [[C++]]).<ref>{{cite web|title = Data-oriented design|url=http://www.dice.se/wp-content/uploads/2014/12/Introduction_to_Data-Oriented_Design.pdf}}</ref> The approach is to focus on the data layout, separating and sorting [[field (computing)|fields]] according to when they are needed, and to think about transformations of data. Proponents include Mike Acton.<ref>{{cite web|title=CppCon 2014: Mike Acton "Data-Oriented Design and C++"|url=https://www.youtube.com/watch?v=rX0ItVEVjHc}}</ref>
{{Distinguish|Data-driven programming}}
{{More citations needed|date=July 2020}}
In [[computing]], '''data-oriented design''' (not the same as [[data-driven design]]) is a [[program optimization]] approach motivated by efficient usage of the [[CPU cache]], often used in [[video game]] development.<ref name="Llopis">{{cite web|url=http://gamesfromwithin.com/data-oriented-design|title=Data-oriented design|last=Llopis|first=Noel|date=December 4, 2009|website=Data-Oriented Design (Or Why You Might Be Shooting Yourself in The Foot With OOP)|access-date=April 17, 2020}}</ref> The approach is to focus on the data layout, separating and sorting [[field (computing)|fields]] according to when they are needed, and to think about transformations of data. Proponents include Mike Acton,<ref>{{cite web|title=CppCon 2014: Mike Acton "Data-Oriented Design and C++"|website=[[YouTube]]|date=29 September 2014|url=https://www.youtube.com/watch?v=rX0ItVEVjHc}}</ref> [[Scott Meyers]],<ref>{{cite web|title=code::dive conference 2014 - Scott Meyers: Cpu Caches and Why You Care|website=[[YouTube]]|date=5 January 2015|url=https://www.youtube.com/watch?v=WDIkqP4JbkE}}</ref> and [[Jonathan Blow]].
 
The [[parallel array]] (or [[structure of arrays]]) is the main example of data-oriented design. It is contrasted with the ''array of structures'' typical of object-oriented designs.
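
The contrast between the two layouts can be illustrated with a minimal sketch in C++. The type, field, and function names below are hypothetical and are not taken from the cited sources; the example only assumes that a traversal needs a single field (here, <code>mass</code>) of every element.

```cpp
#include <vector>

// Array of structures (AoS): every element stores all of its fields together.
// All names in this sketch are hypothetical, chosen for illustration only.
struct ParticleAoS {
    float x, y, z;     // position
    float vx, vy, vz;  // velocity
    float mass;
};

// Structure of arrays (SoA): one contiguous array per field.
struct ParticlesSoA {
    std::vector<float> x, y, z;
    std::vector<float> vx, vy, vz;
    std::vector<float> mass;
};

// AoS traversal: only `mass` is read, but each step strides over the other
// six floats of the element, so they are pulled into the cache as well.
float total_mass_aos(const std::vector<ParticleAoS>& particles) {
    float total = 0.0f;
    for (const ParticleAoS& p : particles) total += p.mass;
    return total;
}

// SoA traversal: reads only the contiguous `mass` array.
float total_mass_soa(const ParticlesSoA& particles) {
    float total = 0.0f;
    for (float m : particles.mass) total += m;
    return total;
}
```

Both functions compute the same sum. The difference lies only in how much unrelated data each traversal moves through the cache, which is the property data-oriented design aims to control.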
== Motives ==
These methods became especially popular during the [[seventh generation of video game consoles]] which included [[PlayStation 3]] (PS3) and [[Xbox 360]], when the hazards of [[cache misses]] became especially pronounced, due to their use of [[in-order processor]]s with high clock speeds and deep [[Pipeline (computing)|pipelines]] (some of the software issues were similar to those encountered on the [[Itanium]], requiring unrolling for upfront scheduling). In modern systems (even with [[out of order execution]]), [[main memory]] is as many as hundreds of [[clock cycle]]s away from the [[processing element]]s; consequently [[locality of reference]] issues dominate performance, requiring improvement of [[memory access pattern]]s to fix. [[Game console]]s often have relatively weak [[central processing unit]]s (CPUs) to devote more power and transistor budget to the [[graphics processing unit]]s (GPUs). Thus, it is critical that CPU side code is efficient to avoid [[Von Neumann architecture#Von Neumann bottleneck|Von Neumann bottlenecking]].
 
The definition of data-oriented design as a [[programming paradigm]] can be seen as contentious: many believe that it can be used side by side with another paradigm,<ref>{{cite web|access-date=2023-12-20|title=Data-Oriented Design|author=Richard Fabian|date=October 8, 2018|url=https://www.dataorienteddesign.com/dodbook/|website=www.dataorienteddesign.com}}</ref> but, due to its emphasis on data layout, it is also incompatible with most other paradigms.<ref name="Llopis"/>
== Contrast with object-orientation ==
 
The claim is that traditional [[object-oriented programming]] (OOP) design principles result in poor data locality, more so if runtime polymorphism ([[dynamic dispatch]]) is used (which is especially problematic on some processors<ref>{{cite web|title=What's wrong with Object-Oriented Design? Where's the harm in it?|url=http://www.dataorienteddesign.com/dodmain/node17.html}}describes the problems with virtual function calls, e.g., i-cache misses</ref>).<ref>{{cite web|title=Data-oriented design - why you might be shooting yourself in the foot with OOP|url=http://gamesfromwithin.com/data-oriented-design}}</ref> Although OOP does superficially seem to ''organise code around data'', the practice is quite different. OOP is actually about organising [[source code]] around [[data type]]s, rather than physically grouping individual fields and arrays in a format efficient for access by specific functions. It also often hides layout details under [[abstraction layer]]s, while a data-oriented programmer wants to consider this first and foremost.
== Motives ==
These methods became especially popular in the mid to late 2000s during the [[seventh generation of video game consoles]], which included the [[IBM]] [[PowerPC]]-based [[PlayStation 3]] (PS3) and [[Xbox 360]] consoles. Historically, [[game console]]s have often had relatively weak [[central processing unit]]s (CPUs) compared to their top-of-the-line desktop computer counterparts. This is a design choice to devote more power and [[transistor budget]] to the [[graphics processing unit]]s (GPUs). For example, the seventh-generation console CPUs did not use modern [[out-of-order execution]], but were instead [[in-order processor]]s with high clock speeds and deep [[Pipeline (computing)|pipelines]]. In addition, most types of computing systems have [[main memory]] located hundreds of [[clock cycle]]s away from the [[processing element]]s. Furthermore, as CPUs have become faster alongside a large increase in main memory capacity, there is massive data consumption that increases the likelihood of [[cache misses]] and puts pressure on the [[system bus|shared bus]], a limitation known as the [[Von Neumann architecture#Von Neumann bottleneck|Von Neumann bottleneck]]. Consequently, [[locality of reference]] methods have been used to control performance, requiring improvement of [[memory access pattern]]s to fix bottlenecking. Some of the software issues were also similar to those encountered on the [[Itanium]], requiring [[loop unrolling]] for upfront scheduling.
 
== Contrast with object orientation ==
== Other languages ==
{{Original research section|date=September 2021}}
The experimental programming language [[Jonathan Blow#JAI language|JAI]] being developed by [[Jonathan Blow]] has explicit support for data-oriented design, while eschewing the traditional OOP paradigm. This is facilitated by being able to transparently move fields between [[Record (computer science)|records]] without extensive source code changes to functions using them (or without extensive [[boilerplate code]] to enable this), and by adding direct support for ''[[AOS and SOA#Structure of arrays|structure of arrays]]'' (SoA) data layout.<ref>{{cite web|title=Data-oriented demo:SOA,composition|url=https://www.youtube.com/watch?v=ZHqFrNyLlpA}}Demonstration of data-oriented and SOA features in the JAI language, also explaining the motives.</ref>
The claim is that traditional [[object-oriented programming]] (OOP) design principles result in poor data locality,<ref>{{cite web
| title = INTEL ® HPC DEVELOPER CONFERENCE FUEL YOUR INSIGHT IMPROVE VECTORIZATION EFFICIENCY USING INTEL SIMD DATA LAYOUT TEMPLATE (INTEL SDLT)
| url = https://www.intel.com/content/dam/www/public/us/en/documents/presentation/improving-vectorization-efficiency.pdf
}}</ref><ref>{{cite journal
| title = SoAx: A generic C++ Structure of Arrays for handling particles in HPC codes
| author1 = Holger Homann
| author2 = Francois Laenen
| journal = Computer Physics Communications
| date = 2018
| volume = 224
| pages = 325–332
| doi = 10.1016/j.cpc.2017.11.015
| arxiv = 1710.03462
| bibcode = 2018CoPhC.224..325H
| s2cid = 2878169
| language = English
}}</ref> more so if runtime polymorphism ([[dynamic dispatch]]) is used (which is especially problematic on some processors).<ref>{{cite web|title=What's wrong with Object-Oriented Design? Where's the harm in it?|url=http://www.dataorienteddesign.com/dodmain/node17.html}} Describes the problems with virtual function calls, e.g., i-cache misses.</ref><ref name="Llopis"/> Although OOP appears to "organise code around data", it actually organises [[source code]] around [[data type]]s, rather than physically grouping individual fields and arrays in an efficient format for access by specific functions. Moreover, it often hides layout details under [[abstraction layer]]s, while a data-oriented programmer wants to consider this first and foremost.
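
The difference can be sketched in C++ as follows. This is an illustrative example under stated assumptions, not code from the cited sources: the <code>Shape</code> hierarchy and the <code>Shapes</code> container are hypothetical names. The first version reaches each object through a base-class pointer and a virtual call; the second groups the same data by kind into plain contiguous arrays that a single function processes in bulk.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Object-oriented layout: separately allocated objects reached through
// base-class pointers, with one virtual call per element.
// All names in this sketch are hypothetical.
struct Shape {
    virtual ~Shape() = default;
    virtual float area() const = 0;
};

struct Square : Shape {
    float side;
    explicit Square(float s) : side(s) {}
    float area() const override { return side * side; }
};

struct Rect : Shape {
    float w, h;
    Rect(float width, float height) : w(width), h(height) {}
    float area() const override { return w * h; }
};

float total_area_oop(const std::vector<std::unique_ptr<Shape>>& shapes) {
    float total = 0.0f;
    for (const auto& s : shapes) total += s->area();  // pointer chase + indirect call
    return total;
}

// Data-oriented layout: the same data grouped by kind into contiguous arrays.
// Assumes rect_widths and rect_heights always have the same length.
struct Shapes {
    std::vector<float> square_sides;
    std::vector<float> rect_widths;
    std::vector<float> rect_heights;
};

float total_area_dod(const Shapes& shapes) {
    float total = 0.0f;
    for (float s : shapes.square_sides) total += s * s;
    for (std::size_t i = 0; i < shapes.rect_widths.size(); ++i)
        total += shapes.rect_widths[i] * shapes.rect_heights[i];
    return total;
}
```

In the second version the layout is visible at the call site rather than hidden behind an interface, and each loop runs over homogeneous data without dynamic dispatch. Because the two versions add the areas in a different order, their floating-point results may differ slightly for large inputs.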
 
== See also ==
* [[CPU cache]]
* [[AOS and SOA]]
* [[Data-driven programming]]
* [[Entity component system]]
* [[Memory access pattern]]
* [[Video game development]]
Line 19 ⟶ 41:
{{Reflist}}
 
[[Category:Computing]]
[[Category:Software optimization]]
[[Category:Video game development]]
[[Category:Programming paradigms]]