MIT 6.172 F10 Lec03
Basic Performance Engineering
Saman Amarasinghe
Fall 2010
Basic Performance Engineering
  Matrix Multiply Example
    Maximum use of the compiler/processor/system
  Modifying data structures   <- Today
  Modifying code structures
  Using the right algorithm
  Most Bithacks
Saman Amarasinghe 2009
Bentley's Rules
There is no theory of performance programming.
Performance programming is:
  Knowledge of all the layers involved
  Experience in knowing when and how performance can be a problem
  Skill in detecting and zooming in on the problems
  A good dose of common sense
A set of rules:
  Patterns that occur regularly
  Mistakes many make
  Possibility of substantial performance impact
  Similar to the Design Patterns you learned in 6.005
Bentley's Rules
A. Modifying Data
B. Modifying Code
Bentley's Rules
A. Modifying Data
1. Space for Time
2. Time for Space
3. Space and Time
B. Modifying Code
Bentley's Rules
A. Modifying Data
1. Space for Time
a. Data Structure Augmentation
b. Storing Precomputed Results
c. Caching
d. Lazy Evaluation
2. Time for Space
3. Space and Time
B. Modifying Code
Caching
Store some of the heavily used / recently used results so
they don't need to be recomputed.
When is this viable?
  Function is expensive
  Function is heavily used
  Argument space is large
  There is temporal locality in accessing the arguments
  A single hash value can be calculated from the arguments
    There exists a good hash function
  Results only depend on the arguments
    Function has no side effects
  Coherence:
    Either the cache can be invalidated when the results change
    (the function is deterministic),
    or stale data can be tolerated for a little while
Caching Template Code
typedef struct cacheval {
  argtype1 arg1;
  ...
  argtypen argn;
  resulttype result;
} cacheval;

struct cacheval cache[MAXHASH];

resulttype func_driver(argtype1 a1, ..., argtypen an) {
  resulttype res;
  int bucket;
  bucket = get_hash(a1, a2, ..., an);
  if ((cache[bucket].arg1 == a1) && ... && (cache[bucket].argn == an))
    return cache[bucket].result;
  res = func(a1, ..., an);
  cache[bucket].arg1 = a1;
  ...
  cache[bucket].argn = an;
  cache[bucket].result = res;
  return res;
}
Lazy Evaluation
Defer the computation until the results are really
needed.
When is this viable?
  Only a few results of a large computation are ever used
  Accessing the result can be done by a function call
  The result values can be calculated incrementally
  All the data needed to calculate the results will remain unchanged, or can
  be packaged up
Lazy Template Code
resulttype precompute[MAXARG];
resulttype func_apply(int arg)
{
resulttype res;
if(precompute[arg] != EMPTY)
return precompute[arg];
res = func(arg);
precompute[arg] = res;
return res;
}
Pascal's Triangle
int pascal(int y, int x)
{
  if(x == 0) return 1;
  if(x == y) return 1;
  return pascal(y-1, x-1) + pascal(y-1, x);
}
Normal
int pt[MAXPT][MAXPT];
int pascal(int y, int x) ...
main() {