{{Short description|Program optimization approach in computing}}
{{Distinguish|Data-driven programming}}
{{More citations needed|date=July 2020}}
In [[computing]], '''data-oriented design''' is a [[program optimization]] approach motivated by efficient usage of the [[CPU cache]], often used in [[video game]] development (usually in [[C programming language|C]] or [[C++]]).<ref name="Llopis">{{cite web|url=http://gamesfromwithin.com/data-oriented-design|title=Data-Oriented Design (Or Why You Might Be Shooting Yourself in The Foot With OOP)|last=Llopis|first=Noel|date=December 4, 2009|website=Games from Within|access-date=April 17, 2020}}</ref><ref>{{cite web|title=Introduction to Data-Oriented Design|url=http://www.dice.se/wp-content/uploads/2014/12/Introduction_to_Data-Oriented_Design.pdf}}</ref> The approach is to focus on the data layout, separating and sorting [[field (computing)|fields]] according to when they are needed, and to think about transformations of data. Proponents include Mike Acton,<ref>{{cite web|title=CppCon 2014: Mike Acton "Data-Oriented Design and C++"|website=[[YouTube]]|date=29 September 2014|url=https://www.youtube.com/watch?v=rX0ItVEVjHc}}</ref> [[Scott Meyers]],<ref>{{cite web|title=code::dive conference 2014 - Scott Meyers: Cpu Caches and Why You Care|website=[[YouTube]]|date=5 January 2015|url=https://www.youtube.com/watch?v=WDIkqP4JbkE}}</ref> and [[Jonathan Blow]].
 
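The [[parallel array]] (or [[structure of arrays]], SoA) is the main example of data-oriented design: each field is stored in its own contiguous array. It is contrasted with the ''array of structures'' (AoS) typical of object-oriented designs, in which all fields of one object are stored together.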
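
The difference between the two layouts can be sketched in C++ as follows. This is a minimal illustration, not taken from the cited sources: the particle fields and the names <code>ParticleAoS</code>, <code>ParticlesSoA</code>, <code>integrate_aos</code> and <code>integrate_soa</code> are hypothetical. The position update reads only positions and velocities, so in the array-of-structures version the unused <code>mass</code> field still travels through the cache with every element, whereas the structure-of-arrays version touches only the arrays it needs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Array of structures (AoS): all fields of one particle are adjacent.
struct ParticleAoS {
    float x, y, z;     // position
    float vx, vy, vz;  // velocity
    float mass;        // not needed by the position update below
};

// Structure of arrays (SoA): one contiguous array per field.
struct ParticlesSoA {
    std::vector<float> x, y, z;
    std::vector<float> vx, vy, vz;
    std::vector<float> mass;

    std::size_t size() const { return x.size(); }

    // Append one particle by scattering its fields into the arrays.
    void push_back(const ParticleAoS& p) {
        x.push_back(p.x);   y.push_back(p.y);   z.push_back(p.z);
        vx.push_back(p.vx); vy.push_back(p.vy); vz.push_back(p.vz);
        mass.push_back(p.mass);
    }
};

// AoS update: each iteration strides over a whole struct, including mass.
void integrate_aos(std::vector<ParticleAoS>& ps, float dt) {
    for (ParticleAoS& p : ps) {
        p.x += p.vx * dt;
        p.y += p.vy * dt;
        p.z += p.vz * dt;
    }
}

// SoA update: each loop streams through two dense float arrays only.
void integrate_soa(ParticlesSoA& ps, float dt) {
    const std::size_t n = ps.size();
    assert(ps.vx.size() == n && ps.vy.size() == n && ps.vz.size() == n);
    for (std::size_t i = 0; i < n; ++i) ps.x[i] += ps.vx[i] * dt;
    for (std::size_t i = 0; i < n; ++i) ps.y[i] += ps.vy[i] * dt;
    for (std::size_t i = 0; i < n; ++i) ps.z[i] += ps.vz[i] * dt;
}
```

Both functions compute the same result; which one is faster depends on the access pattern of the surrounding program and should be measured rather than assumed.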

The classification of data-oriented design as a [[programming paradigm]] is contentious: many believe that it can be used side by side with another paradigm,<ref>{{cite web|access-date=2023-12-20|title=Data-Oriented Design|author=Richard Fabian|date=October 8, 2018|url=https://www.dataorienteddesign.com/dodbook/|website=www.dataorienteddesign.com}}</ref> but because of its emphasis on data layout, it is also incompatible with most other paradigms.<ref name="Llopis"/>

== Motives ==
These methods became especially popular in the mid-to-late 2000s, during the [[seventh generation of video game consoles]], which included the [[IBM]] [[PowerPC]]-based [[PlayStation 3]] (PS3) and [[Xbox 360]]. Historically, [[game console]]s have often had relatively weak [[central processing unit]]s (CPUs) compared with top-of-the-line desktop computers, a design choice that devotes more power and [[transistor budget]] to the [[graphics processing unit]] (GPU). For example, the seventh-generation console CPUs did not use [[out-of-order execution]]; they were [[in-order processor]]s with high clock speeds and deep [[Pipeline (computing)|pipelines]], which made the cost of [[cache misses]] especially pronounced. In addition, most types of computing systems have [[main memory]] located hundreds of [[clock cycle]]s away from the [[processing element]]s. Furthermore, as CPUs have become faster and main memory capacity has grown, programs consume far more data, which increases the likelihood of cache misses and of traffic over the [[system bus|shared bus]], a limitation known as the [[Von Neumann architecture#Von Neumann bottleneck|Von Neumann bottleneck]]. Consequently, methods that improve [[locality of reference]] and [[memory access pattern]]s have been used to relieve this bottleneck. Some of the software issues were also similar to those encountered on the [[Itanium]], requiring [[loop unrolling]] for upfront scheduling.
 
== Other languages ==
The experimental [[Jonathan Blow#JAI language|JAI programming language]], being developed by [[Jonathan Blow]], has explicit support for data-oriented design while eschewing the traditional [[object-oriented programming]] paradigm. It allows fields to be moved transparently between [[Record (computer science)|record]]s without extensive source code changes to the functions that use them (and without extensive boilerplate to enable this), and it adds direct support for the [[SoA]] data layout.

== Contrast with object orientation ==
{{Original research section|date=September 2021}}

The claim is that traditional [[object-oriented programming]] (OOP) design principles result in poor data locality,<ref>{{cite web
| title = Intel HPC Developer Conference: Improve Vectorization Efficiency Using Intel SIMD Data Layout Template (Intel SDLT)
| url = https://www.intel.com/content/dam/www/public/us/en/documents/presentation/improving-vectorization-efficiency.pdf
}}</ref><ref>{{cite journal
| title = SoAx: A generic C++ Structure of Arrays for handling particles in HPC codes
| author1 = Holger Homann
| author2 = Francois Laenen
| journal = Computer Physics Communications
| date = 2018
| volume = 224
| pages = 325–332
| doi = 10.1016/j.cpc.2017.11.015
| arxiv = 1710.03462
| bibcode = 2018CoPhC.224..325H
| s2cid = 2878169
| language = English
}}</ref> more so if runtime polymorphism ([[dynamic dispatch]]) is used (which is especially problematic on some processors).<ref>{{cite web|title=What's wrong with Object-Oriented Design? Where's the harm in it?|url=http://www.dataorienteddesign.com/dodmain/node17.html}} Describes the problems with virtual function calls, e.g., instruction-cache misses.</ref><ref name="Llopis"/> Although OOP appears to "organise code around data", it actually organises [[source code]] around [[data type]]s rather than physically grouping individual fields and arrays in an efficient format for access by specific functions. Moreover, it often hides layout details under [[abstraction layer]]s, whereas a data-oriented programmer wants to consider layout first and foremost.
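
As a rough illustration of this contrast, the following C++ sketch computes the same total in two ways. It is an assumption-laden example, not drawn from the cited sources: the shape types and the names <code>total_area_oop</code>, <code>ShapeTables</code> and <code>total_area_dod</code> are hypothetical. The first version stores heap-allocated objects behind base-class pointers and resolves <code>area()</code> by dynamic dispatch for every element; the second sorts plain records by kind into contiguous arrays, so the layout is explicit and each loop runs a single code path over densely packed data.

```cpp
#include <memory>
#include <vector>

// Object-oriented layout: each shape is a separate heap object reached
// through a pointer, and area() is resolved by dynamic dispatch.
struct Shape {
    virtual ~Shape() = default;
    virtual float area() const = 0;
};

struct Square final : Shape {
    float side;
    explicit Square(float s) : side(s) {}
    float area() const override { return side * side; }
};

struct Rect final : Shape {
    float w, h;
    Rect(float w_, float h_) : w(w_), h(h_) {}
    float area() const override { return w * h; }
};

float total_area_oop(const std::vector<std::unique_ptr<Shape>>& shapes) {
    float sum = 0.0f;
    for (const auto& s : shapes) sum += s->area();  // one virtual call each
    return sum;
}

// Data-oriented layout: plain records grouped by kind in contiguous arrays.
struct SquareData { float side; };
struct RectData { float w, h; };

struct ShapeTables {
    std::vector<SquareData> squares;
    std::vector<RectData> rects;
};

float total_area_dod(const ShapeTables& t) {
    float sum = 0.0f;
    for (const SquareData& s : t.squares) sum += s.side * s.side;
    for (const RectData& r : t.rects) sum += r.w * r.h;
    return sum;
}
```

The second version gives up the uniform <code>Shape</code> interface in exchange for a visible, predictable memory layout, which is the trade-off the paragraph above describes.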
 
== See also ==
 
* [[CPU cache]]
* [[AOS vs SOA]]
* [[Data-driven programming]]
* [[Entity component system]]
* [[Memory access pattern]]
* [[Video game development]]
 
==References==
{{Reflist}}
 
[[Category:Software optimization]]
[[Category:Video game development]]
 
[[Category:Programming paradigms]]
 
{{stub}}