Data Oriented Design
Germany, 2016
by
Jan Scheffczyk
Jana Deufel
Fachhochschule Bingen
Contents
1 Abstract
2 A simple example
  2.1 CPU Caches
  2.2 Designing around the ideal data
  2.3 Key-Value model
3 Data Oriented design put to the test
4 Final considerations
  4.1 Maintenance
  4.2 Parallelization
  4.3 Testing
  4.4 Drawbacks
1 Abstract
Within the last few decades CPU performance has doubled almost annually, leaving memory performance lagging behind. While new memory generations generally increase the total data throughput, the decrease in latency is marginal at best. This disparity makes it difficult to reach high CPU utilization, which can presently only be achieved by minimizing uncached memory reads, thus avoiding the latency bottleneck.
This is the area Data Oriented design operates in. It focuses on the data: its type, how it is laid out in memory, and how it will be read and transformed. It bundles data and their transformations at the lowest possible level. This stands in direct contrast to more classical programming paradigms like Object-Oriented Programming, where objects and their operations are bundled at a high abstraction level that often does not map to the hardware level, thus reducing the likelihood of an efficient implementation. Data Oriented design also promotes easier parallelization, high modularity, ease of testing, and excellent performance.
2 A simple example
The following example illustrates the fundamental idea of data oriented design.
Suppose one wishes to access every element in a two-dimensional array. Two
obvious possibilities come to mind:
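The two code listings themselves are not reproduced in this text. The following is a minimal sketch of the two traversal orders, assuming a square int matrix; the names matrix, N, sumRowMajor and sumColMajor are placeholders and not taken from the original test program (which, per footnote 1, was written in Java), sketched here in C++ to match the later code examples.

#include <cstddef>

const std::size_t N = 1024;
static int matrix[N][N];

// Row-Major: the inner loop walks along a row, i.e. through adjacent memory,
// so several consecutive accesses hit the same cache line.
long long sumRowMajor() {
    long long sum = 0;
    for (std::size_t row = 0; row < N; ++row)
        for (std::size_t col = 0; col < N; ++col)
            sum += matrix[row][col];
    return sum;
}

// Col-Major: the inner loop walks down a column, jumping N * sizeof(int)
// bytes between accesses and therefore touching a new cache line on every step.
long long sumColMajor() {
    long long sum = 0;
    for (std::size_t col = 0; col < N; ++col)
        for (std::size_t row = 0; row < N; ++row)
            sum += matrix[row][col];
    return sum;
}

Both functions visit exactly the same elements; only the order of access differs.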
Figure 3: Row/Col Major with highlights for the cache line
2.1 CPU Caches
¹ Tests have been performed on an i7-4790K with 16 GB DDR3 running Windows 7. The test program was written in Java and is given in [1].
² The L3 cache differentiates between three kinds of cache hits: hit unshared, hit shared, and hit modified.
On a cache miss a whole cache line is fetched from main memory, which is typically 64 bytes³. Thus the data in close proximity to the requested memory address is now also in the cache. Ideally, all data that was fetched is actually needed for the current or an imminent instruction. A data format that enables such processing is the ideal data. This is realized by organizing the needed data in a contiguous section of memory. One very powerful tool to organize data in such a manner is an array.
"I don't know [data structure], but I know an array will beat it." (Scott Meyers)
Now the initial example can be understood. As seen in figure 5a, the first four data sets are stored within the first cache line, i.e. each access is routed to this cache line, causing only a single cache miss. Therefore the whole Row-Major traversal only provokes the fetching of a new cache line in steps 5, 9 and 13. Even better, those cache misses can be eliminated by a process known as pre-fetching, which fetches data before it is needed. The compiler takes care of pre-fetching; however, this is only possible if a simple access pattern, such as a linear array traversal, is used.
Figure 5: Access order (steps 1-16) mapped onto cache lines for the Row-Major (a) and Col-Major (b) traversal.
³ The cache line size depends on the hardware and can differ from system to system. Mobile CPUs might have L1 cache lines of 32 B and L2 cache lines of different sizes [2].
As can be seen in figure 5b, the Col-Major traversal provokes a cache miss on every step⁴. These cache misses cause the performance difference observed in figure 3. In fact, memory and cache access is one of the most common and devastating bottlenecks in modern programming.
Table 1: Cache latencies for the 6th generation of Intel i7 processors [5]
The implications for programmers can be condensed into three simple guidelines [7]:
• Small ≡ fast
⁴ This is only true for matrices that are so large that the first cache line has been replaced before it is needed again.
2.2 Designing around the ideal data
The goal of data oriented design is to format input data in such a way that it can
be efficiently processed. For current generations of CPUs the ideal data format
consists of continuous and homogeneous memory layout as can be seen in figure
6a. Objects (OOP) on the other always form tree-structures because of their
hierarchical nature (e.g. Inheritance-tress, containment trees, messaging trees).
Tree generate a fragmented call structure, as can be seen in figure 6b, resulting
in a frequent change of operations on different data i.e. both I-and-D-Cache
data will be rapidly replaced causing cache misses.
Figure 6: (a) homogeneous components grouped into contiguous arrays (A A A ..., B B B ..., C C C ..., D D D ...); (b) objects forming a fragmented tree structure (A with children B and D, which in turn reference C, E and F).
To achieve the best possible data layout it is often necessary to break down each object, isolate its components, and then group components of the same type together, as sketched below.
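As an illustration of this grouping, here is a sketch only; the GameObject example and all names in it are assumptions of this text, not part of the original benchmark.

#include <vector>

struct vec3 { float x, y, z; };

// Object-style layout: every instance interleaves all of its components.
struct GameObject {
    vec3  position;
    vec3  velocity;
    float health;
};

// Data oriented layout: each component type is grouped in its own contiguous
// array, so a transformation that only touches positions reads nothing else.
struct GameObjects {
    std::vector<vec3>  positions;
    std::vector<vec3>  velocities;
    std::vector<float> healths;
};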
2.3 Key-Value model
The principle of keys and values is often utilized in programming tasks. When searching for an entry, the keys in the table are compared to the searched key until a match has been found. Remember that on a cache miss a full cache line will be fetched and stored, so the rest of the cache line will be filled with the values attached to the key. This data, however, is only needed when the current key is the one that is searched for, which is highly unlikely. In the common case the next key is required for further comparisons.
One task that realizes this concept would be the modeling of a car. A car object would contain attributes like id, weight, height, maxSpeed, etc., where the id represents the key and the other attributes represent the values. Classes store all their fields in the same memory location; thus, when the id is loaded, the rest of the cache line is filled with the object's other attributes. Searching for a specific car object within a contiguous data structure is the exact same procedure as described above. This structure therefore scales towards the worst case regarding cache misses and hence performance. A sketch of the difference is shown below.
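The following sketch contrasts the two layouts for such a search. The CarRecord layout and the function names are assumptions of this text; the sizes are chosen so that one record fills a 64-byte cache line.

#include <cstddef>
#include <cstdint>
#include <vector>

struct CarRecord {
    std::uint32_t id;            // the key
    char values[60];             // weight, height, maxSpeed, ... (the values)
};

// Every comparison loads a whole CarRecord, so one cache line holds roughly
// one key; most of the fetched bytes are never used during the search.
std::size_t findInObjects(const std::vector<CarRecord> &cars, std::uint32_t key) {
    for (std::size_t i = 0; i < cars.size(); ++i)
        if (cars[i].id == key) return i;
    return cars.size();
}

// The keys alone are packed contiguously: a 64-byte cache line now holds
// 16 keys, so far fewer cache lines are fetched per search.
std::size_t findInKeys(const std::vector<std::uint32_t> &ids, std::uint32_t key) {
    for (std::size_t i = 0; i < ids.size(); ++i)
        if (ids[i] == key) return i;
    return ids.size();
}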
Figure: Key-value lookup (a search key is compared against the stored keys to retrieve the associated value).
3 Data Oriented design put to the test
The object oriented car class can be examined in Code 1. To simplify and condense the code for the example, irrelevant attributes like id, weight, and so on have been reduced to a single variable (unrelatedData), which takes up as much space as the individual attributes would have taken. In the object oriented approach all attributes for a specific car are stored within a car object, as well as the methods that operate on this data.
class Car {
    vec3 position;
    vec3 speed;
    // 1: data that is not relevant to the move() method
    char unrelatedData[256];

    void move() {
        position.x += speed.x;
        position.y += speed.y;
        position.z += speed.z;
    }
    // 2: other methods
};
To solve the same task in a data oriented manner, the method and its data need to be separated from the class. When extracting a method, the relevant input data needs to be identified first. In this case all unrelated data has already been masked. Position and speed data need to be separated from the car class and managed independently. In the next step the output data needs to be considered, in this case the updated position. There are different choices for the organization of input and output data.
One option is to create a wrapper for both input and output parameters. The input wrapper holds position and speed data and the output wrapper stores the updated position. The extracted method takes an array of input data, an array of output data, and the size of the arrays. This is a common practice when the language of choice does not support classes, but it is just as valid otherwise. A sketch of this approach follows below.
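A minimal sketch of this wrapper variant, assuming names such as MoveInput, MoveOutput and moveAll that do not appear in the original listing:

struct vec3 { float x, y, z; };                    // as used in Code 1

struct MoveInput  { vec3 position; vec3 speed; };  // input wrapper
struct MoveOutput { vec3 position; };              // output wrapper

// Free function: takes an array of inputs, an array of outputs and their size.
void moveAll(const MoveInput input[], MoveOutput output[], int count) {
    for (int i = 0; i < count; ++i) {
        output[i].position.x = input[i].position.x + input[i].speed.x;
        output[i].position.y = input[i].position.y + input[i].speed.y;
        output[i].position.z = input[i].position.z + input[i].speed.z;
    }
}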
Another option is to store the data in a manager object. The data itself is then stored directly in arrays, or another contiguous data structure, as seen in Code 2. Note that the use of the manager is not limited to the car class but extends to all objects that need to be moved in this manner. The data oriented Car object can add itself to the ManageMove to ensure that it is being processed, in this case moved. Through the returned index the object can access its current position and speed data.
class ManageMove {
    vec3 positions[];
    vec3 speeds[];
    int add(vec3 position, vec3 speed); // returns the index used for later access
    void moveAll();                     // adds each speed to its position
};
To evaluate the expense, it also matters how the method is accessed. The object oriented approach iterates through all cars and calls the update() method (Code Example 4). This is a fairly common approach, however it yields the worst performance. In a typical hierarchy, the root class defines an update() method and each subclass overrides it, providing a unique implementation in which various methods are called, including move(). Therefore every call to update() executes a different task, which in turn requires different instructions. This all but ensures that the I-cache has been completely replaced, so every call of move(), or any other method in fact, will cause an I-cache miss. The instructions then need to be loaded from memory, which causes 100-300 cycles of waiting.
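Code Example 4 itself is not reproduced here; the loop it describes presumably resembles the following sketch, in which the GameEntity base class and the container are assumptions of this text:

#include <vector>

struct GameEntity {
    virtual ~GameEntity() {}
    virtual void update() = 0;  // each subclass provides its own implementation
};

// Roughly the shape of Code Example 4: iterate over all entities and call
// update(). Each call dispatches to a different subclass implementation,
// pulling different instructions into the I-cache on every iteration.
void updateWorld(std::vector<GameEntity*> &entities) {
    for (GameEntity *entity : entities)
        entity->update();   // e.g. Car::update() calls move() among other methods
}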
The data oriented approach simply calls the moveAll() method on the manager object. Therefore the same instructions are applied to all of the manager's data, and only a single I-cache miss is generated for moving all objects. One could further optimize this approach by explicitly using SIMD⁵ operations. A sketch of such a moveAll() implementation is given below.
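A minimal sketch of what the manager's moveAll() could look like, assuming the positions and speeds arrays from Code 2 are held in std::vector (an assumption of this text; Code 2 is only partially reproduced above):

#include <cstddef>
#include <vector>

struct vec3 { float x, y, z; };   // as used in Code 1 and Code 2

class ManageMove {
    std::vector<vec3> positions;
    std::vector<vec3> speeds;
public:
    // Register an object's data; the returned index is kept by the caller.
    int add(vec3 position, vec3 speed) {
        positions.push_back(position);
        speeds.push_back(speed);
        return static_cast<int>(positions.size()) - 1;
    }
    // One tight loop over contiguous data: the same few instructions are
    // reused for every element, and the linear access pattern is easy to
    // pre-fetch (and could be vectorized with SIMD).
    void moveAll() {
        for (std::size_t i = 0; i < positions.size(); ++i) {
            positions[i].x += speeds[i].x;
            positions[i].y += speeds[i].y;
            positions[i].z += speeds[i].z;
        }
    }
};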
One might argue that similar results can be achieved in an object oriented way, since a contiguous data structure of the type Movable would allow calling the move() method for all objects no matter their actual type. However, this would also neglect some of the object oriented notions, such as encapsulation, in a sense.
Figure 9: Car object (each cache line is filled with position, speed, and unrelated data)
Figure 10: ManageMove object (cache lines are filled with positions and speeds only)
Next, the D-cache misses need to be considered, which depend on how the Car class is allocated (as in Code 1). If the allocation is scattered, each access will provoke a D-cache miss. However, even if the allocation is contiguous and the method is accessed as in Code Example 4, the cache lines will store not only the needed data but also the unrelated data, as can be seen in figure 9. Thus, if the object holds additional data, the performance will degrade until the unrelated data fills up the rest of the cache line. If the method needs data from other objects, each access will yield another D-cache miss.
Again, in the case of the ManageMove object only the actually needed data is loaded, thus all cache misses but the first are avoided (figure 10). Furthermore, a simple access pattern is created that can easily be pre-fetched. The unrelated data from the original class would be split up into other managers that perform operations on that data.
Large data fields are usually not stored directly in objects; therefore a maximum size of 256 bytes for the unrelated data array has been chosen (figure 12b). A further increase of unrelated data results in further performance penalties, however not nearly as steep as for the first 256 bytes.
Figure 11: Test results with 256 bytes of unrelated data
4 Final considerations
4.1 Maintenance
The context-free approach also allows for code reuse independent of application and context, as long as the problem, and thus the data transformation, is the same. Object orientation binds context to data in the form of objects, which can in turn hinder code reuse.
4.2 Parallelization
If every manager runs in a separate thread, only the execution order needs to be synchronized. There is no need for any data synchronization since all data is independent. It is even easier to instantiate multiple threads within a single manager object. The manager has a single task and a lot of data that all needs to be processed in the same manner. This is the perfect SIMD scenario and follows the same approach as GPU processing. In the case of the ManageMove class, each thread could run on its own part of the array, which guarantees that there will be no concurrent data access, as sketched below.
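A hedged sketch of this partitioning follows; moveRange, moveAllParallel and the chunking scheme are assumptions of this text rather than part of the original test program:

#include <algorithm>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

struct vec3 { float x, y, z; };

// Each thread updates its own, non-overlapping slice of the arrays,
// so no locks or other data synchronization are required.
void moveRange(std::vector<vec3> &positions, const std::vector<vec3> &speeds,
               std::size_t begin, std::size_t end) {
    for (std::size_t i = begin; i < end; ++i) {
        positions[i].x += speeds[i].x;
        positions[i].y += speeds[i].y;
        positions[i].z += speeds[i].z;
    }
}

void moveAllParallel(std::vector<vec3> &positions, const std::vector<vec3> &speeds,
                     unsigned threadCount) {   // threadCount must be at least 1
    std::vector<std::thread> threads;
    std::size_t chunk = positions.size() / threadCount + 1;
    for (unsigned t = 0; t < threadCount; ++t) {
        std::size_t begin = t * chunk;
        std::size_t end = std::min(begin + chunk, positions.size());
        if (begin >= end) break;
        threads.emplace_back(moveRange, std::ref(positions), std::cref(speeds), begin, end);
    }
    for (std::thread &th : threads) th.join();  // only completion is synchronized
}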
4.3 Testing
The independent entities also make testing easy. Valid data can be generated and used as test input, and the output is checked to see whether the transformation was correct. There are no hidden dependencies since all input data has been identified, which allows the complete algorithm to be tested. A sketch of such a test follows below.
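As a sketch of such a test, using the wrapper-based moveAll() variant from section 3 (all names here are illustrative assumptions of this text):

#include <cassert>

struct vec3 { float x, y, z; };
struct MoveInput  { vec3 position; vec3 speed; };
struct MoveOutput { vec3 position; };

// Transformation under test (the wrapper variant sketched in section 3).
void moveAll(const MoveInput input[], MoveOutput output[], int count) {
    for (int i = 0; i < count; ++i) {
        output[i].position.x = input[i].position.x + input[i].speed.x;
        output[i].position.y = input[i].position.y + input[i].speed.y;
        output[i].position.z = input[i].position.z + input[i].speed.z;
    }
}

// Test: generate known input, run the transformation, check the output.
int main() {
    MoveInput  input[2]  = {{{0, 0, 0}, {1, 2, 3}}, {{2, 2, 2}, {-1, 0, 1}}};
    MoveOutput output[2] = {};
    moveAll(input, output, 2);
    assert(output[0].position.x == 1 && output[0].position.y == 2 && output[0].position.z == 3);
    assert(output[1].position.x == 1 && output[1].position.y == 2 && output[1].position.z == 3);
    return 0;
}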
4.4 Drawbacks
As discussed previously, data oriented design differs heavily from the most common programming paradigm, object orientation, which is taught in most schools and universities. Thus most programmers who read data oriented code for the first time will be confused, as it differs from what they are used to.
Furthermore, data oriented design can lead to higher development costs due to the difficulty of writing perfectly isolated code. It also requires more expertise from the developer, especially if software for specific hardware is being developed.
References
[1] Program code for the tests. https://gitlab.jan.m1234.de/jan/DataOrientedDesignTests.
[5] Agner Fog. Lists of instruction latencies, throughputs and micro-operation breakdowns for Intel, AMD and VIA CPUs. Technical University of Denmark, 2016. http://www.agner.org/optimize/instruction_tables.pdf [Online; accessed July 18, 2016].
[7] Scott Meyers. CPU caches and why you care. code::dive conference, 2014. https://github.com/CppCon/CppCon2014/tree/master/Presentations/Data-Oriented%20Design%20and%20C%2B%2B [Online; accessed July 18, 2016].
[8] Noel Llopis. Data-oriented design (or why you might be shooting yourself in the foot with OOP). http://gamesfromwithin.com/data-oriented-design, 2009. [Online; accessed July 18, 2016].