ECS (Part 1/3) - Introduction to Data-Oriented Design

Vu Phuong Hoang
ECS Part 1: Introduction to Data-Oriented Design
2018

Have you ever ...
▪ found it too hard to reduce lag?
▪ tried to improve core functions?
▪ been defeated by a heavy loop?
Check this out for some experiments.
[Figure: a “simple” loop and its frame rate]

TEST #1
TestCacheMissByOrder.cs
Find the minimum value in a 1000x1000 table

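Test #1 - Result
for (int r = 0; r < ROWS_COUNT; ++r) {
    for (int c = 0; c < COLUMNS_COUNT; ++c) {
        minValue = Math.Min(minValue, table[r][c]);
    }
}
Iterate by each row, then by each column: 4ms

for (int c = 0; c < COLUMNS_COUNT; ++c) {
    for (int r = 0; r < ROWS_COUNT; ++r) {
        minValue = Math.Min(minValue, table[r][c]);
    }
}
Just swap the loop order: 8ms ???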

CPU Cache

▪ Loading data from the cache is much faster than loading it from RAM
▪ Both data & instructions are loaded through the cache
▪ References:
▪ Dogged Determination
▪ Fuzzy Reflection
▪ codeburst.io
[Figure: PS4 data loading latency]

▪ When a value is read from memory, the values next to it are read too
→ Data is loaded in batches (batch size = one cache line)
▪ A cache line is typically 64 bytes
▪ Data already in the cache → cache hit
▪ Data not in the cache → cache miss → it must be loaded from slower memory

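Test #1 - Result explain
H: Cache-Hit, M: Cache-Miss
Row by row (r1, r2, r3, ...):
r1: M H H H ...
r2: M H H H ...
Column by column (c1, c2, c3, ...):
M M M M ...
▪ Each row is stored contiguously, so only the first read of each cache line misses and the following values in that line hit
▪ Going down a column jumps to a different row on every read, so almost every read is a miss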
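The H/M pattern can be reproduced with a toy cache model. This is a hedged C++ sketch (the deck's tests are C#), not a benchmark: it assumes a single fully associative cache with LRU eviction, 64-byte lines and room for 16 lines, and a contiguous table, whereas Test #1 used a jagged `int[][]`. `ToyCache` and `countMisses` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical toy model: one fully associative cache with LRU eviction.
// Real CPUs use multi-level, set-associative caches and prefetchers.
class ToyCache {
public:
    ToyCache(std::size_t lineBytes, std::size_t capacityLines)
        : lineBytes_(lineBytes), capacityLines_(capacityLines) {}

    // Touch one byte address; returns true on a hit, false on a miss.
    bool access(std::size_t byteAddress) {
        const std::size_t line = byteAddress / lineBytes_;
        auto it = std::find(lines_.begin(), lines_.end(), line);
        if (it != lines_.end()) {
            lines_.erase(it);              // hit: mark as most recently used
            lines_.push_back(line);
            return true;
        }
        ++misses_;                         // miss: the whole line is loaded
        if (lines_.size() == capacityLines_) {
            lines_.erase(lines_.begin());  // evict the least recently used line
        }
        lines_.push_back(line);
        return false;
    }

    std::size_t misses() const { return misses_; }

private:
    std::size_t lineBytes_;
    std::size_t capacityLines_;
    std::size_t misses_ = 0;
    std::vector<std::size_t> lines_;       // front = least recently used
};

// Count misses while reading a contiguous rows x cols table of int32 values,
// either row by row (rowFirst == true) or column by column.
std::size_t countMisses(std::size_t rows, std::size_t cols, bool rowFirst,
                        std::size_t lineBytes = 64, std::size_t capacityLines = 16) {
    ToyCache cache(lineBytes, capacityLines);
    const std::size_t outer = rowFirst ? rows : cols;
    const std::size_t inner = rowFirst ? cols : rows;
    for (std::size_t i = 0; i < outer; ++i) {
        for (std::size_t j = 0; j < inner; ++j) {
            const std::size_t r = rowFirst ? i : j;
            const std::size_t c = rowFirst ? j : i;
            cache.access((r * cols + c) * sizeof(std::int32_t));
        }
    }
    return cache.misses();
}
```

For a 64x64 table, reading row by row costs one miss per 16 values (256 misses in total), while reading column by column misses on every read (4096 misses), because the 64 lines of one column never fit in the 16-line cache.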

Test #1 - Take it further
int[][] table = new int[ROWS_COUNT][];
for (int r = 0; r < ROWS_COUNT; ++r)
    table[r] = new int[COLUMNS_COUNT]; // each row is a separate allocation
Iterate 2D (jagged) array: 4ms ???

int CELLS_COUNT = ROWS_COUNT * COLUMNS_COUNT;
int[] flatTable = new int[CELLS_COUNT];
Iterate 1D array: 2ms
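A rough C++ sketch of the two layouts, assuming `std::vector<std::vector<int>>` stands in for the jagged `int[][]` and a single `std::vector<int>` for `flatTable`; the helper names `minJagged`, `minFlat`, `flatten` and `cellAt` are hypothetical. The flat version reads one contiguous block, and cell (r, c) is found at index `r * columns + c`.

```cpp
#include <algorithm>
#include <cassert>
#include <climits>
#include <cstddef>
#include <vector>

// Jagged layout: each row is its own heap allocation, like int[][] in the test.
int minJagged(const std::vector<std::vector<int>>& table) {
    int minValue = INT_MAX;
    for (const auto& row : table)
        for (int v : row)
            minValue = std::min(minValue, v);
    return minValue;
}

// Flat layout: one contiguous block that the loop streams through.
int minFlat(const std::vector<int>& flatTable) {
    int minValue = INT_MAX;
    for (int v : flatTable)
        minValue = std::min(minValue, v);
    return minValue;
}

// Copy a jagged table into the flat layout, row by row.
// Assumes every row has exactly `columns` cells.
std::vector<int> flatten(const std::vector<std::vector<int>>& table, std::size_t columns) {
    std::vector<int> flat;
    flat.reserve(table.size() * columns);
    for (const auto& row : table)
        flat.insert(flat.end(), row.begin(), row.end());
    return flat;
}

// Read cell (r, c) from the flat layout.
int cellAt(const std::vector<int>& flat, std::size_t columns, std::size_t r, std::size_t c) {
    return flat[r * columns + c];
}
```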
Fragmentation
▪ Contiguous data is faster to load
▪ The memory allocator places each new block wherever it fits
▪ Fragmented memory looks like Swiss cheese
▪ Scattered data leads to cache misses
[Figure: Swiss-cheese memory]

TEST #2
TestCacheMissByDataSize.cs
Read values from arrays of different element types (10M elements)

Test #2 - Result
Iterate an array of int (4 bytes): 35ms
Iterate an array of struct (32 bytes): 58ms
A bigger struct (36 bytes) is even worse: 60ms
Why? Answer: the CPU cache, again

Cache Pollution
▪ Unused data is still loaded into the cache
▪ So less useful data fits in each cache line
▪ So there are more cache misses
▪ Sounds like a GameObject in OOP style?
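To put numbers on cache pollution, here is a hedged C++ sketch that counts how many 64-byte lines an array spans when the loop only needs one 4-byte field per element. `Fat32` (a hypothetical 32-byte struct like the one in Test #2) and `linesTouched` are illustrative names, and the array is assumed to start on a cache-line boundary.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical element type: the loop only reads `value`,
// but the whole 32-byte struct is pulled into the cache with it.
struct Fat32 {
    std::int32_t value;
    std::int32_t unused[7];   // 28 bytes the loop never touches
};
static_assert(sizeof(Fat32) == 32, "expected a 32-byte struct");

// Number of cache lines spanned by `count` elements of `elementSize` bytes,
// assuming the array starts on a line boundary.
constexpr std::size_t linesTouched(std::size_t count, std::size_t elementSize,
                                   std::size_t lineBytes = 64) {
    return (count * elementSize + lineBytes - 1) / lineBytes;
}
```

For 1000 elements, plain `int32_t` values span 63 lines, the 32-byte struct spans 500 and a 36-byte one 563, so the same useful data costs roughly 8 to 9 times as many line loads.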

Add 1 byte of data to the struct.
Its size grows from 32 to 36 bytes (you might expect 33).
Why?
Answer: Data alignment
More:
▪ Try appending 1 more byte: the size stays at 36.
▪ Try prepending 1 more byte: the size goes to 40.

Data alignment
▪ Data is placed into aligned “buckets” (4 bytes here, the size of an int) for fast access
▪ When added data doesn’t fit in the current bucket, the next (empty) bucket is used
▪ The wasted, unused bytes are called padding
▪ References:
▪ Stdio.vn
▪ Wikipedia
▪ Song Ho Ahn
[Figure: struct layout with and without data alignment]
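The 32 → 36 → 40 byte sequence can be checked with `sizeof` in C++. This is a sketch assuming a mainstream ABI where `alignof(std::int32_t) == 4` (as on common x86 and ARM targets); the struct names are hypothetical stand-ins for the struct in the test.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical structs mirroring the slide: eight 4-byte ints = 32 bytes.
struct Base       { std::int32_t v[8]; };                  // 32
struct AppendOne  { std::int32_t v[8]; char a; };          // 33 -> padded to 36
struct AppendTwo  { std::int32_t v[8]; char a; char b; };  // 34 -> padded to 36
struct PrependOne { char p; std::int32_t v[8]; char a; };  // 1 + 3 pad + 32 + 1 = 37 -> 40

// The leading char pushes the ints to the next 4-byte bucket.
constexpr std::size_t kPrependIntsOffset = offsetof(PrependOne, v);
```

In `PrependOne`, the leading `char` forces 3 bytes of padding before the ints (they start at offset 4), and the trailing `char` is padded up to the next multiple of 4.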

TEST #3
TestDataAlignment.cs
Change the order of fields in a struct

Test #3 - Result
Original field order: 12 bytes ???
Just re-order fields from biggest to smallest size: 8 bytes
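A hedged C++ illustration of the same effect. The actual struct from TestDataAlignment.cs is not shown, so `Unordered` and `Ordered` are hypothetical, assuming 4-byte alignment for `std::int32_t`. C++ compilers lay fields out in declaration order, so the reordering is up to the programmer.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical struct with small fields around a big one:
// 1 (a) + 3 pad + 4 (b) + 1 (c) + 3 pad = 12 bytes.
struct Unordered { char a; std::int32_t b; char c; };

// Same fields, biggest first: 4 (b) + 1 (a) + 1 (c) + 2 pad = 8 bytes.
struct Ordered { std::int32_t b; char a; char c; };
```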

Cache-miss
▪ Fastest way to load data: NOT LOADING IT :)
▪ Next best ways:
▪ Keep data small (and if you can't, mind data alignment)
▪ Keep data contiguous
▪ Separate data by the function that uses it
▪ In relational databases, we sometimes de-normalize for performance, too!
◆ Problem #1: Encapsulation makes this hard to do
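One way to “separate data by function” is a hot/cold split. The C++ sketch below is an assumption-based illustration: `EnemyFat`, `EnemyMotion`, `EnemyInfo` and `integrate` are hypothetical names, and the sizes assume 4-byte `float`/`int32_t` alignment. The movement loop only streams through the 16-byte hot records, so four of them fit in one 64-byte cache line, instead of less than one 80-byte object.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical "everything in one object" layout (80 bytes).
struct EnemyFat {
    float x, y, vx, vy;          // hot: read every frame by movement
    char name[32];               // cold: only used by the UI
    std::int32_t lootTable[8];   // cold: only used on death
};

// Hot/cold split: movement only touches the 16-byte hot part.
struct EnemyMotion { float x, y, vx, vy; };
struct EnemyInfo   { char name[32]; std::int32_t lootTable[8]; };

// One function processes the hot data of all enemies.
void integrate(std::vector<EnemyMotion>& motion, float dt) {
    for (auto& m : motion) {
        m.x += m.vx * dt;
        m.y += m.vy * dt;
    }
}
```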

Function call
▪ A function is compiled into blocks of instructions
▪ For an indirect (virtual) call, the CPU first looks up the block's address in a table
▪ The CPU loads these blocks into the instruction cache (I$)
▪ So function calls can suffer from cache misses, too!!!
▪ References:
▪ Wikipedia (Instruction Cycle)
▪ Wikipedia (Branch Misprediction)

TEST #4
TestVirtualFunctions.cs
How do overridden functions affect performance?

Test #4 - Result
Direct call: 35ms
1-level indirect call: 61ms
10-level indirect call: 411ms

Function call
▪ Fastest way to call a function: NOT CALLING IT :)
▪ Next best ways:
▪ Keep performance-critical functions small (so they fit in the cache)
▪ Keep class hierarchies narrow
▪ Use 1 function to process many instances, not 1 function call per instance
◆ Problem #2: Encapsulation / Polymorphism makes this hard to do
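A minimal C++ sketch of the last bullet, contrasting one virtual call per instance with one function over plain data. `Shape`, `Square`, `totalAreaVirtual` and `totalAreaBatch` are hypothetical names; both functions compute the same result, but the batch version makes no indirect calls and reads one contiguous array.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// OOP style: one virtual call per instance, dispatched through a vtable.
struct Shape {
    virtual ~Shape() = default;
    virtual float area() const = 0;
};

struct Square : Shape {
    float side;
    explicit Square(float s) : side(s) {}
    float area() const override { return side * side; }
};

float totalAreaVirtual(const std::vector<std::unique_ptr<Shape>>& shapes) {
    float total = 0.0f;
    for (const auto& s : shapes) total += s->area();  // indirect call each time
    return total;
}

// Data-oriented style: 1 function processes the plain data of all squares.
float totalAreaBatch(const std::vector<float>& sides) {
    float total = 0.0f;
    for (float s : sides) total += s * s;             // no per-element call
    return total;
}
```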

Wait, those are the core of OOP!
Encapsulation + Inheritance + Polymorphism

Other OOP problems
▪ Multiple inheritance
▪ Useful in game development, but leads to bad architecture
▪ The “Diamond of death”
◆ Problem #3: There is no easy way to implement multiple inheritance properly
▪ Unit tests
▪ My test uses only some members, but I have to initialize them all!!!
◆ Problem #4: Unit tests involve unrelated constraints
▪ Jobify, False sharing, ...

▪ Focus on how data is laid out in memory
▪ Focus on how data is read / processed
▪ Build functions around data
▪ References:
▪ DICE
▪ Mike Acton (Insomniac Games, Unity)
▪ Richard Fabian
▪ Keith O’Connor (Ubisoft Montreal)

“The purpose of all programs, and all parts of those programs, is to transform data from one form to another.”
- Mike Acton -

“When there is one, there are many.”
- Mike Acton -

“Designing the code around the data, not the other way around.”
- Linus Torvalds -

TEST #5
TestGoodEnoughAlgorithms.cs
Find the closest object

Test #5 - Result
for (int i = 0; i < ELEMENTS_COUNT; ++i) {
    d = GetDistance(center, objects[i].position);
    if (minDistance > d) {
        minDistance = d;
        closestId = i;
    }
}
Iterate array of “GameObjects”: 209ms

for (int i = 0; i < ELEMENTS_COUNT; ++i) {
    d = GetDistance(center, positions[i]);
    if (minDistance > d) {
        minDistance = d;
        closestId = i;
    }
}
Iterate array of positions: 128ms
They’re almost identical, except line #2
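The two loops above, sketched in C++ under stated assumptions: `GameObject` is a hypothetical object whose position sits next to fields this loop never reads, and `Vec3`, `getDistance`, `closestInObjects` and `closestInPositions` are illustrative names. As on the slide, the loops differ only in where the position is read from.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

struct Vec3 { float x, y, z; };

// Hypothetical OOP-style object: position sits next to data this loop never reads.
struct GameObject {
    Vec3 position;
    float health;
    int flags;
    char name[32];
};

float getDistance(const Vec3& a, const Vec3& b) {
    const float dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return std::sqrt(dx * dx + dy * dy + dz * dz);
}

// Array of GameObjects: every iteration drags the whole object into the cache.
int closestInObjects(const Vec3& center, const std::vector<GameObject>& objects) {
    float minDistance = std::numeric_limits<float>::max();
    int closestId = -1;
    for (std::size_t i = 0; i < objects.size(); ++i) {
        const float d = getDistance(center, objects[i].position);
        if (minDistance > d) { minDistance = d; closestId = static_cast<int>(i); }
    }
    return closestId;
}

// Separate positions array: every byte loaded is a byte the loop uses.
int closestInPositions(const Vec3& center, const std::vector<Vec3>& positions) {
    float minDistance = std::numeric_limits<float>::max();
    int closestId = -1;
    for (std::size_t i = 0; i < positions.size(); ++i) {
        const float d = getDistance(center, positions[i]);
        if (minDistance > d) { minDistance = d; closestId = static_cast<int>(i); }
    }
    return closestId;
}
```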

▪ You already know DOD is faster (from the previous test results)
▪ Let’s improve the algorithm (current: 209ms)
▪ Use GetSqDistance (squared distance) instead of GetDistance → 137ms
▪ Skip objects that are too far & pick the 1st close-enough object* → 36ms
▪ Reduce branch misprediction → 34ms
*Humans need a good-enough choice, not the optimal one.

for (int i = 0; i < ELEMENTS_COUNT; ++i) {
    d = GetSqDistance(center, objects[i].position);
    if (d > MAX_SQ_DST) continue;
    if (d < MIN_SQ_DST) { closestId = i; break; }
    // ... original comparison here
}
Iterate array of “GameObjects”: 36ms

for (int i = 0; i < ELEMENTS_COUNT; ++i) {
    d = GetSqDistance(center, positions[i]);
    if (d > MAX_SQ_DST) continue;
    if (d < MIN_SQ_DST) { closestId = i; break; }
    // ... original comparison here
}
Iterate array of positions: 25ms
Your smart algorithm + DOD = AWESOME
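A hedged C++ sketch of the “good-enough” search over a positions array. `minSqDst` and `maxSqDst` play the role of the slide's `MIN_SQ_DST` and `MAX_SQ_DST` (passed as parameters here, which is an assumption), and squared distances are compared so no square root is needed. `getSqDistance` and `closestGoodEnough` are hypothetical names, and the branch-misprediction tweak from the slide is not reproduced.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

struct Vec3 { float x, y, z; };

// Squared distance: same ordering as the real distance, but no sqrt.
float getSqDistance(const Vec3& a, const Vec3& b) {
    const float dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return dx * dx + dy * dy + dz * dz;
}

// Returns the index of a good-enough closest position, or -1 if none is in range.
// maxSqDst: positions farther than this are skipped.
// minSqDst: the first position closer than this is taken immediately.
int closestGoodEnough(const Vec3& center, const std::vector<Vec3>& positions,
                      float minSqDst, float maxSqDst) {
    float minSqDistance = std::numeric_limits<float>::max();
    int closestId = -1;
    for (std::size_t i = 0; i < positions.size(); ++i) {
        const float d = getSqDistance(center, positions[i]);
        if (d > maxSqDst) continue;                     // too far: skip early
        if (d < minSqDst) return static_cast<int>(i);   // close enough: stop here
        if (minSqDistance > d) {                        // original comparison
            minSqDistance = d;
            closestId = static_cast<int>(i);
        }
    }
    return closestId;
}
```

With `minSqDst = 1` the search stops at the first object within distance 1, even if a closer one comes later; passing `minSqDst = 0` turns the early exit off and returns the true closest object within range.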

▪ Reduce data cache misses (Problem #1)
▪ Reduce instruction cache misses and indirect function calls (Problem #2)
▪ Components over inheritance (Problem #3)
▪ Unit test = feed input & assert the output (Problem #4)
▪ References:
▪ Games From Within
▪ Tencent

ECS: Entity, Component, System
A smart DOD architecture?
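A minimal ECS sketch in C++, assuming the simplest possible design: an entity is just an index, each component type lives in its own dense array, and a system is one function over those arrays. `World`, `Position`, `Velocity` and `movementSystem` are hypothetical names; a real ECS also tracks which entities own which components.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Entity: just an id (an index into the component arrays).
using Entity = std::uint32_t;

// Components: plain data, stored contiguously per type.
struct Position { float x, y; };
struct Velocity { float dx, dy; };

// Hypothetical world: one dense array per component type.
// Simplification: every entity has both components.
struct World {
    std::vector<Position> positions;
    std::vector<Velocity> velocities;

    Entity create(Position p, Velocity v) {
        positions.push_back(p);
        velocities.push_back(v);
        return static_cast<Entity>(positions.size() - 1);
    }
};

// System: one function that transforms the data of all entities.
void movementSystem(World& world, float dt) {
    for (std::size_t i = 0; i < world.positions.size(); ++i) {
        world.positions[i].x += world.velocities[i].dx * dt;
        world.positions[i].y += world.velocities[i].dy * dt;
    }
}
```

Testing the system is just feeding input and asserting the output (Problem #4): create entities, run `movementSystem` once, and check the new positions.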

Why should we care?
▪ Performance & flexibility
▪ It’s the FUTURE
▪ Used by the top companies mentioned earlier (Insomniac Games, Ubisoft, EA/DICE, ...)
▪ Sony
▪ Intel
▪ Apple
▪ Riot Games
▪ Unity!!!
▪ More ...

These masterpieces also use ECS
[Images: games that use ECS]
