Data-oriented design: Difference between revisions

Inkeliz (talk | contribs)
Undid revision 1183477772 by SOLARSCUFFLEBOT (talk). This page is about "Data-Oriented Design" and not "Data-Oriented Programming".
m No edit summary
 
(14 intermediate revisions by 7 users not shown)
Line 1:
{{Short description|Program optimization approach in computing modelling programs as transforms}}
{{Distinguish|Data-driven programming}}
{{Distinguish|Data-oriented programming}}
{{More citations needed|date=July 2020}}
In [[computing]], '''data-oriented design''' is a [[program optimization]] approach motivated by efficient usage of the [[CPU cache]], often used in [[video game]] development.<ref name="Llopis">{{cite web |last=Llopis |first=Noel |date=December 4, 2009 |title=Data-oriented design |url=http://gamesfromwithin.com/data-oriented-design |website=Data-Oriented Design (Or Why You Might Be Shooting Yourself in The Foot With OOP) |access-date=April 17, 2020}}</ref> The approach is to focus on the data layout, separating and sorting [[field (computing)|fields]] according to when they are needed, and to think about transformations of data. Proponents include Mike Acton,<ref>{{cite web|title=CppCon 2014: Mike Acton "Data-Oriented Design and C++"|website = [[YouTube]]| date=29 September 2014 |url=https://www.youtube.com/watch?v=rX0ItVEVjHc}}</ref> [[Scott Meyers]],<ref>{{cite web|title=code::dive conference 2014 - Scott Meyers: Cpu Caches and Why You Care|website = [[YouTube]]| date=5 January 2015 |url=https://www.youtube.com/watch?v=WDIkqP4JbkE}}</ref> and [[Jonathan Blow]].
{{Programming paradigms}}
 
The [[parallel array]] (or [[structure of arrays]]) is the main example of data-oriented design. It is contrasted with the ''array of structures'' typical of object-oriented designs.
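The contrast between the two layouts can be sketched as follows. This is a minimal illustration only: the names <code>Particle</code>, <code>ParticlesSoA</code>, <code>total_mass_aos</code> and <code>total_mass_soa</code> are hypothetical, and Python is used for readability; the cache effects that motivate the layout are normally measured in languages such as C or C++.

```python
from __future__ import annotations

from array import array
from dataclasses import dataclass


@dataclass
class Particle:
    """Array-of-structures element: all fields of one particle live together."""
    x: float
    y: float
    mass: float


class ParticlesSoA:
    """Structure of arrays: one contiguous array per field (parallel arrays)."""

    def __init__(self) -> None:
        self.x = array("d")
        self.y = array("d")
        self.mass = array("d")

    def append(self, x: float, y: float, mass: float) -> None:
        # Index i across the three arrays identifies one logical particle.
        self.x.append(x)
        self.y.append(y)
        self.mass.append(mass)


def total_mass_aos(particles: list[Particle]) -> float:
    # Visits whole objects even though only `mass` is needed.
    return sum(p.mass for p in particles)


def total_mass_soa(particles: ParticlesSoA) -> float:
    # Reads a single contiguous array and never touches x or y.
    return sum(particles.mass)


# The same two particles stored in both layouts.
aos = [Particle(0.0, 0.0, 1.5), Particle(1.0, 2.0, 2.5)]
soa = ParticlesSoA()
for p in aos:
    soa.append(p.x, p.y, p.mass)
```

Both functions compute the same result; the difference lies only in which memory a pass over <code>mass</code> has to touch.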
As a design paradigm, '''data-oriented design''' focuses on optimal transformations of data and on modelling programs as '''transforms'''. Transforms are abstractions of code that focus solely on the mapping of inputs to outputs. They do not distinguish between inputs accessed by [[Parameter (computer programming)|parameter]], [[Pointer (computer programming)|pointer]], [[Reference (computer science)|reference]], or [[upvalue]], nor between the corresponding ways of writing outputs. Under this view the concept of a [[Side effect (computer science)|side effect]] is set aside and only the way inputs transform into outputs matters, which makes a transform logically analogous to a [[Function (mathematics)|function]] in mathematics.
 
The definition of data-oriented design as a [[programming paradigm]] is contentious: many believe that it can be used side by side with another paradigm,<ref>{{cite web|access-date=2023-12-20|title=Data-Oriented Design|author=Richard Fabian|date=October 8, 2018|url=https://www.dataorienteddesign.com/dodbook/|website=www.dataorienteddesign.com}}</ref> while its emphasis on data layout is also said to make it incompatible with most other paradigms.<ref name="Llopis"/>
Strategies and patterns that emerge from modelling via transforms often rely on making assumptions about the [[State (computer science)|state]] of a [[Computer program|program]] or [[subprogram]]. Examples such as [https://www.dataorienteddesign.com/dodbook/node4.html Existential Processing]<ref>{{Cite web |title=Existential Processing |url=https://www.dataorienteddesign.com/dodbook/node4.html |access-date=2023-06-01 |website=www.dataorienteddesign.com}}</ref> and [https://www.dataorienteddesign.com/dodbook/node6.html Hierarchical Level of Detail]<ref>{{Cite web |title=Hierarchical Level of Detail |url=https://www.dataorienteddesign.com/dodbook/node6.html |access-date=2023-06-01 |website=www.dataorienteddesign.com}}</ref> are integral applications of the core design principles.
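One way to read existential processing is that an entity's presence in a table, rather than a per-entity test, decides whether a transform applies to it. The sketch below is an illustrative assumption rather than code from the cited source: the health-regeneration rule, the table names <code>damaged</code> and <code>max_health</code>, and both function names are hypothetical.

```python
from __future__ import annotations


def regen_with_flags(entities: list[dict], amount: int) -> None:
    """Condition-based version: every entity is visited and tested."""
    for e in entities:
        if e["health"] < e["max_health"]:
            e["health"] = min(e["max_health"], e["health"] + amount)


def regen_existential(damaged: dict[int, int],
                      max_health: dict[int, int],
                      amount: int) -> None:
    """Existence-based version: only entities with a row in `damaged` are
    processed, so the transform may assume that each one needs healing."""
    for entity_id in list(damaged):  # copy the keys so rows can be removed
        health = damaged[entity_id] + amount
        if health >= max_health[entity_id]:
            del damaged[entity_id]   # fully healed: the row ceases to exist
        else:
            damaged[entity_id] = health


# Condition-based data: one record per entity, healthy or not.
entities = [
    {"health": 100, "max_health": 100},
    {"health": 40, "max_health": 100},
    {"health": 95, "max_health": 100},
]
regen_with_flags(entities, 10)

# Existence-based data: entity 0 is at full health, so it has no row.
max_health = {0: 100, 1: 100, 2: 100}
damaged = {1: 40, 2: 95}
regen_existential(damaged, max_health, 10)
```

In the second version the work done is proportional to the number of damaged entities, not to the total number of entities.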
 
== Motives ==
As a programming paradigm, '''data-oriented programming''' (also commonly referred to as data-oriented design) is about implementing '''transforms''' in the native language, often with [[Procedural programming|procedural]], [[Functional programming|functional]], and [[Array programming|array]] programming, though [[object-oriented programming]] is not excluded. To transform data between different states as efficiently as possible, the approach is first to identify which transforms exist and what they need in order to operate. The second step is to optimize data layouts for these transforms, separating and sorting [[field (computing)|fields]] according to when they are needed, and to think about how data flows through the transform chains.
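The two steps can be sketched as a small transform chain. Everything here is a hypothetical illustration: the transforms <code>integrate</code> and <code>clamp</code>, and the split of fields into "hot" arrays (read on every update) and a "cold" list (read only occasionally), are assumptions chosen for the example.

```python
from array import array


def integrate(positions: array, velocities: array, dt: float) -> array:
    """Transform: (positions, velocities, dt) -> new positions."""
    return array("d", (p + v * dt for p, v in zip(positions, velocities)))


def clamp(positions: array, lo: float, hi: float) -> array:
    """Transform: positions -> positions restricted to [lo, hi]."""
    return array("d", (min(hi, max(lo, p)) for p in positions))


# Hot fields: needed by every update, so stored contiguously on their own.
positions = array("d", [0.0, 9.5, 4.0])
velocities = array("d", [1.0, 2.0, -1.0])

# Cold field: only needed for display, so kept apart from the hot data.
names = ["rock", "bird", "crate"]

# Data flows through the chain; each stage depends only on its inputs.
new_positions = clamp(integrate(positions, velocities, 0.5), 0.0, 10.0)
```

Neither transform reads <code>names</code>, which is why that field is kept out of the arrays the update loop walks over.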
 
In the context of [[computing]], data-oriented programming benefits heavily from [[program optimization]]s motivated by efficient usage of the [[CPU cache]], and is often used in [[video game]] development.<ref name=gamesfromwithin>{{cite web |last=Llopis |first=Noel |date=December 4, 2009 |title=Data-oriented design |url=http://gamesfromwithin.com/data-oriented-design |url-status=live |archive-url=https://web.archive.org/web/20190423233051/http://gamesfromwithin.com/data-oriented-design |archive-date=Apr 23, 2019 |access-date=April 17, 2020 |website=Data-Oriented Design (Or Why You Might Be Shooting Yourself in The Foot With OOP)}}</ref> Proponents include Mike Acton,<ref>{{cite web|title=CppCon 2014: Mike Acton "Data-Oriented Design and C++"|website = [[YouTube]]|url=https://www.youtube.com/watch?v=rX0ItVEVjHc}}</ref> [[Scott Meyers]],<ref>{{cite web|title=code::dive conference 2014 - Scott Meyers: Cpu Caches and Why You Care|website = [[YouTube]]|url=https://www.youtube.com/watch?v=WDIkqP4JbkE}}</ref> [[Jonathan Blow]], and [[Andrew Kelley (computer programmer)|Andrew Kelley]]. The [[parallel array]] (or [[structure of arrays]]) is a commonly referenced example of one such cache-motivated data structure. It is contrasted with the ''array of structures'' typical of object-oriented designs, and the two can be balanced in a hybrid structure of arrays of structures.
 
== Computing motives ==
These methods became especially popular in the mid to late 2000s during the [[seventh generation of video game consoles]], which included the [[IBM]] [[PowerPC]]-based [[PlayStation 3]] (PS3) and [[Xbox 360]] consoles. Historically, [[game console]]s have often had relatively weak [[central processing unit]]s (CPUs) compared to their top-of-the-line desktop computer counterparts. This is a design choice to devote more power and [[transistor budget]] to the [[graphics processing unit]]s (GPUs). For example, the seventh-generation consoles did not use modern [[out-of-order execution]] processors, but instead used [[in-order processor]]s with high clock speeds and deep [[Pipeline (computing)|pipelines]]. In addition, most types of computing systems have [[main memory]] located hundreds of [[clock cycle]]s away from the [[processing element]]s. Furthermore, as CPUs have become faster and main memory capacity has grown substantially, programs consume far more data, which increases the likelihood of [[cache misses]] and of contention on the [[system bus|shared bus]], otherwise known as the [[Von Neumann architecture#Von Neumann bottleneck|Von Neumann bottleneck]]. Consequently, [[locality of reference]] methods have been used to control performance, which requires improving [[memory access pattern]]s to relieve the bottleneck. Some of the software issues were also similar to those encountered on the [[Itanium]], requiring [[loop unrolling]] for upfront scheduling.
 
== Contrast with object orientation ==
{{Original research section|date=September 2021}}
The claim is that traditional [[object-oriented programming]] (OOP) design principles result in poor data locality,<ref>{{cite web
| title = INTEL ® HPC DEVELOPER CONFERENCE FUEL YOUR INSIGHT IMPROVE VECTORIZATION EFFICIENCY USING INTEL SIMD DATA LAYOUT TEMPLATE (INTEL SDLT)
| url = https://www.intel.com/content/dam/www/public/us/en/documents/presentation/improving-vectorization-efficiency.pdf
}}</ref><ref>{{cite journal
| title = SoAx: A generic C++ Structure of Arrays for handling particles in HPC codes
| author1 = Holger Homann
| author2 = Francois Laenen
| journal = Computer Physics Communications
| date = 2018
| volume = 224
| pages = 325–332
| doi = 10.1016/j.cpc.2017.11.015
| arxiv = 1710.03462
| bibcode = 2018CoPhC.224..325H
| s2cid = 2878169
| language = English
}}</ref> more so if runtime [[Polymorphism (computer science)|polymorphism]] ([[dynamic dispatch]]) is used (which is especially problematic on some processors).<ref>{{cite web|title=What's wrong with Object-Oriented Design? Where's the harm in it?|url=http://www.dataorienteddesign.com/dodmain/node17.html}} Describes the problems with virtual function calls, e.g., i-cache misses.</ref><ref name="Llopis"/> Although OOP appears to "organise code around data", it actually organises [[source code]] around the interaction of [[data type]]s and their relationships, rather than physically grouping individual fields and arrays in an efficient format for access by specific functions. Moreover, it often hides layout details under [[abstraction layer]]s, while a data-oriented programmer wants to consider this first and foremost.
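The difference in organisation can be sketched with a small, hypothetical shape example. The classes <code>Shape</code>, <code>Circle</code> and <code>Rect</code> and both functions are assumptions made for illustration. The first version dispatches dynamically on each object in a mixed collection; the second groups the fields by kind and runs one uniform loop per kind. Python does not expose memory layout the way C or C++ does, so only the structure is shown here, not the performance effect.

```python
import math
from array import array


class Shape:
    def area(self) -> float:
        raise NotImplementedError


class Circle(Shape):
    def __init__(self, radius: float) -> None:
        self.radius = radius

    def area(self) -> float:
        return math.pi * self.radius * self.radius


class Rect(Shape):
    def __init__(self, width: float, height: float) -> None:
        self.width = width
        self.height = height

    def area(self) -> float:
        return self.width * self.height


def total_area_oop(shapes: list) -> float:
    # One dynamically dispatched call per object, in whatever order
    # the mixed collection happens to hold them.
    return sum(s.area() for s in shapes)


def total_area_grouped(circle_radii: array,
                       rect_widths: array,
                       rect_heights: array) -> float:
    # One homogeneous loop per kind over contiguous field arrays;
    # no per-element dispatch is needed.
    circles = sum(math.pi * r * r for r in circle_radii)
    rects = sum(w * h for w, h in zip(rect_widths, rect_heights))
    return circles + rects


shapes = [Circle(1.0), Rect(2.0, 3.0), Circle(2.0)]
circle_radii = array("d", [1.0, 2.0])
rect_widths = array("d", [2.0])
rect_heights = array("d", [3.0])
```

Both functions describe the same three shapes and should agree on the total area up to floating-point rounding.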
 
== See also ==