MIT6 172F10 Lec03

Performance programming is not a "theory": it is a set of rules and common sense. A good dose of common sense helps avoid common performance mistakes. Performance programming is about knowing when and how performance can be a problem.

SPEED LIMIT
PER ORDER OF 6.172

Basic Performance Engineering
Saman Amarasinghe
Fall 2010

Basic Performance Engineering
Maximum use of the compiler/processor/system (Matrix Multiply Example)
Modifying data structures (Today)
Modifying code structures (Today)
Using the right algorithm (Most Bithacks)
Saman Amarasinghe 2009

Bentley's Rules
There is no "theory" of performance programming
Performance programming is:
Knowledge of all the layers involved
Experience in knowing when and how performance can be a problem
Skill in detecting and zooming in on the problems
A good dose of common sense
A set of rules:
Patterns that occur regularly
Mistakes many make
Possibility of substantial performance impact
Similar to the Design Patterns you learned in 6.005

Bentley's Rules
A. Modifying Data
1. Space for Time
a. Data Structure Augmentation
b. Storing Precomputed Results
c. Caching
d. Lazy Evaluation
2. Time for Space
3. Space and Time
B. Modifying Code

Data Structure Augmentation

Add some more information to the data structures to make common operations quicker
When is this viable?
Additional information offers a clear benefit
Calculating the information is cheap/easy
Keeping the information current is not too difficult
Examples?
Faster navigation: doubly linked list and the delete operation
Reduced computation: reference counting

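The doubly linked list example can be sketched as follows. This is a minimal illustration, not code from the lecture: the `node` and `list` types and the function names are assumptions. The extra `prev` pointer is the augmentation, and it lets a node be unlinked without walking the list from the head to find its predecessor.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node type: the prev pointer is the augmentation. */
struct node {
    int val;
    struct node *prev;
    struct node *next;
};

struct list {
    struct node *head;
    struct node *tail;
};

/* Append n at the tail, keeping both links current. */
void list_append(struct list *l, struct node *n)
{
    assert(l != NULL && n != NULL);
    n->next = NULL;
    n->prev = l->tail;
    if (l->tail != NULL)
        l->tail->next = n;
    else
        l->head = n;
    l->tail = n;
}

/* Unlink n in constant time: no search for the predecessor is needed. */
void list_delete(struct list *l, struct node *n)
{
    assert(l != NULL && n != NULL);
    if (n->prev != NULL)
        n->prev->next = n->next;
    else
        l->head = n->next;
    if (n->next != NULL)
        n->next->prev = n->prev;
    else
        l->tail = n->prev;
    n->prev = n->next = NULL;
}
```

The price is one extra pointer per node and two extra stores per insertion, which matches the viability conditions listed on the slide: the information is cheap to compute and easy to keep current.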
Storing Precomputed Results
Store the results of a previous calculation. Reuse the precomputed results rather than redoing the calculation.
When is this viable?
Function is expensive
Function is heavily used
Argument space is small
Results only depend on the arguments
Function has no side effects
Function is deterministic
Examples:
Precomputing Template Code
result precompute[MAXARG];
void func_initialize(void)
{
  int i;
  /* fill the table once, for every possible argument */
  for(i=0; i < MAXARG; i++)
    precompute[i] = func(i);
}
result func_apply(int arg)
{
  return precompute[arg];
}
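A concrete instance of the template might look like the sketch below. The bit-counting task and all names (`bits_set`, `bits_initialize`, `bits_apply`) are assumptions chosen for illustration. The argument space is small (256 byte values), and the function is deterministic with no side effects, so a table filled once can replace the loop.

```c
#include <assert.h>

#define NBYTEVALS 256

/* Hypothetical table: bits_set[v] holds the number of 1 bits in byte v. */
static unsigned char bits_set[NBYTEVALS];

/* The slower function that the lookup replaces. */
static int count_bits_slow(unsigned int v)
{
    int n = 0;
    while (v != 0) {
        n += (int)(v & 1u);
        v >>= 1;
    }
    return n;
}

/* Fill the table once, before the first lookup. */
void bits_initialize(void)
{
    int i;
    for (i = 0; i < NBYTEVALS; i++)
        bits_set[i] = (unsigned char)count_bits_slow((unsigned int)i);
}

/* Count the 1 bits of a 32-bit value with four table lookups. */
int bits_apply(unsigned long v)
{
    assert(v <= 0xfffffffful);  /* only the low 32 bits are counted */
    return bits_set[v & 0xff] + bits_set[(v >> 8) & 0xff] +
           bits_set[(v >> 16) & 0xff] + bits_set[(v >> 24) & 0xff];
}
```

`bits_initialize()` must run before any call to `bits_apply()`. Whether the table beats a hardware population-count instruction depends on the machine, so this is only a shape to measure against.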
Pascal's Triangle
Normal
int pascal(int y, int x)
{
  if(x == 0) return 1;
  if(x == y) return 1;
  return pascal(y-1, x-1) + pascal(y-1, x);
}
Precomputation
int pt[MAXPT][MAXPT];
main() {
  ...
  for(i=0; i < MAXPT; i++) {
    pt[i][0] = 1;
    pt[i][i] = 1;
    for(j=1; j < i; j++)
      pt[i][j] = pt[i-1][j-1] + pt[i-1][j];
  }
  ...
}
int pascal(int y, int x) {
  return pt[y][x];
}
Another example of precomputing
unsigned long fib(int n)
{
  if(n==1) return 1;
  if(n==2) return 1;
  return fib(n-1) + fib(n-2);
}

unsigned long fib(int n)
{
  int i;
  unsigned long prev, curr, tmp;
  if(n==1) return 1;
  if(n==2) return 1;
  prev = 1;
  curr = 1;
  for(i=3; i <=n; i++) {
    tmp = prev + curr;
    prev = curr;
    curr = tmp;
  }
  return curr;
}

Caching
Store some of the heavily used/recently used results so they don't need to be recomputed
When is this viable?
Function is expensive
Function is heavily used
Argument space is large
There is temporal locality in accessing the arguments
A single hash value can be calculated from the arguments
There exists a good hash function
Results only depend on the arguments
Function has no side effects
Coherence
Is required:
Ability to invalidate the cache when the results change
Function is deterministic
Or stale data can be tolerated for a little while
Caching Template Code
typedef struct cacheval {
  argtype1 arg1;
  ...
  argtypen argn;
  resulttype result;
} cacheval;
struct cacheval cache[MAXHASH];
resulttype func_driver(argtype1 a1, ..., argtypen an) {
  resulttype res;
  int bucket;
  bucket = get_hash(a1, a2, ..., an);
  if((cache[bucket].arg1 == a1) && ... && (cache[bucket].argn == an))
    return cache[bucket].result;
  res = func(a1, ..., an);
  cache[bucket].arg1 = a1;
  ...
  cache[bucket].argn = an;
  cache[bucket].result = res;
  return res;
}
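The template can be made concrete as in the sketch below. The cached function (`slow_power`), the hash, and the `valid` flag are all assumptions added for illustration. The flag is not in the slide template; it is included here so that an empty, zero-filled slot is never mistaken for a real result.

```c
#include <assert.h>

#define MAXHASH 64

/* Hypothetical expensive function; calls records how often it really runs. */
static int calls = 0;

static long slow_power(int base, int exp)
{
    long res = 1;
    int i;
    calls++;
    for (i = 0; i < exp; i++)
        res *= base;
    return res;
}

struct cacheval {
    int valid;      /* 0 until the slot holds a real result */
    int base;
    int exp;
    long result;
};

/* Zero-initialized, so every slot starts out invalid. */
static struct cacheval cache[MAXHASH];

static unsigned get_hash(int base, int exp)
{
    return ((unsigned)base * 31u + (unsigned)exp) % MAXHASH;
}

long power_cached(int base, int exp)
{
    struct cacheval *c;
    long res;
    assert(exp >= 0);
    c = &cache[get_hash(base, exp)];
    if (c->valid && c->base == base && c->exp == exp)
        return c->result;            /* hit: skip the computation */
    res = slow_power(base, exp);     /* miss: compute and overwrite the slot */
    c->valid = 1;
    c->base = base;
    c->exp = exp;
    c->result = res;
    return res;
}
```

Each bucket holds a single entry, so two argument pairs that hash to the same bucket evict each other. That keeps the lookup to one comparison, at the cost of extra misses when the hash function is poor.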
Lazy Evaluation
Defer the computation until the results are really needed
When is this viable?
Only a few results of a large computation are ever used
Accessing the result can be done by a function call
The result values can be calculated incrementally
All the data needed to calculate the results will remain unchanged or can be packaged up
Lazy Template Code
resulttype precompute[MAXARG];
resulttype func_apply(int arg)
{
  resulttype res;
  if(precompute[arg] != EMPTY)
    return precompute[arg];
  res = func(arg);
  precompute[arg] = res;
  return res;
}
Pascal's Triangle
Normal
int pascal(int y, int x)
{
  if(x == 0) return 1;
  if(x == y) return 1;
  return pascal(y-1, x-1) + pascal(y-1, x);
}
Precomputation
int pt[MAXPT][MAXPT];
main() {
  ...
  for(i=0; i < MAXPT; i++) {
    pt[i][0] = 1;
    pt[i][i] = 1;
    for(j=1; j < i; j++)
      pt[i][j] = pt[i-1][j-1] + pt[i-1][j];
  }
  ...
}
int pascal(int y, int x) {
  return pt[y][x];
}
Lazy Evaluation
int pt[MAXPT][MAXPT];
int pascal(int y, int x)
{
  int val;
  if(x == 0) return 1;
  if(x == y) return 1;
  if(pt[y][x] > 0) return pt[y][x];
  val = pascal(y-1, x-1) + pascal(y-1, x);
  pt[y][x] = val;
  return val;
}

Bentley's Rules
A. Modifying Data
1. Space for Time
2. Time for Space
a. Packing/Compression
b. Interpreters
3. Space and Time
B. Modifying Code
Packing/Compression
Reduce the space of the data by storing it in a processed form, which requires additional computation to get the data back.
When is it viable?
Storage is at a premium
Old days: most of the time!
Now: embedded devices with very little memory/storage
Now: very large data sets
Ability to drastically reduce the data size in storage
Extraction process is amenable to the required access pattern
Batch: expand it all
Stream
Random access
Packing / Compression
Packing Level
Packing in memory
Packing out of core storage
Packing Methods
Use smaller data size
Eliminate leading zeros
Eliminate repetitions (LZ77)
Heavyweight compression
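As a small illustration of the "use smaller data size" method, the sketch below packs a calendar date into 16 bits instead of three `int` fields. The field layout and the function names are assumptions made for this example. Every read now costs a shift and a mask, which is the time paid for the space saved.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical layout of the packed date:
 *   bits 15..9  year offset from 2000 (0..127)
 *   bits  8..5  month (1..12)
 *   bits  4..0  day (1..31)
 */
uint16_t date_pack(int year, int month, int day)
{
    assert(year >= 2000 && year <= 2127);
    assert(month >= 1 && month <= 12);
    assert(day >= 1 && day <= 31);
    return (uint16_t)(((year - 2000) << 9) | (month << 5) | day);
}

/* Each accessor pays a shift and a mask to recover its field. */
int date_year(uint16_t d)  { return 2000 + (d >> 9); }
int date_month(uint16_t d) { return (d >> 5) & 0xf; }
int date_day(uint16_t d)   { return d & 0x1f; }
```

An array of such dates takes 2 bytes per entry and still supports random access, one of the access patterns named on the previous slide.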
LZ77 Basics
[Animation frames: input stream -> decompress -> output stream; the frames show the newest output symbol on the left]
Input stream: <1, 3> O <2, 4> L A
After the literals L and A pass through, the output stream is: L A
After <2, 4> is decompressed, the output stream is: L A L A L A
After the literal O passes through, the output stream is: O L A L A L A
After <1, 3> is decompressed, the output stream is: O O O O L A L A L A
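The decompression step in these frames can be sketched as below, under the assumption that a pair such as <2, 4> means "go back 2 symbols in the output and copy 4 symbols". The `token` struct and `lz_decompress` are hypothetical names, and the sketch covers only the back-reference idea, not a complete LZ77 format.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token: a literal symbol, or a <distance, length> back-reference. */
struct token {
    int is_ref;
    char literal;     /* used when is_ref == 0 */
    size_t distance;  /* how far back in the output the copy starts */
    size_t length;    /* how many symbols to copy */
};

/* Decode ntok tokens into out (capacity cap); returns the number of symbols written. */
size_t lz_decompress(const struct token *toks, size_t ntok, char *out, size_t cap)
{
    size_t n = 0, t, k;
    for (t = 0; t < ntok; t++) {
        if (!toks[t].is_ref) {
            assert(n < cap);
            out[n++] = toks[t].literal;
        } else {
            assert(toks[t].distance >= 1 && toks[t].distance <= n);
            assert(n + toks[t].length <= cap);
            /* Copy one symbol at a time so a reference may overlap its own output. */
            for (k = 0; k < toks[t].length; k++) {
                out[n] = out[n - toks[t].distance];
                n++;
            }
        }
    }
    return n;
}
```

Copying one symbol at a time matters: a reference whose length exceeds its distance, like <1, 3>, reads symbols that the same copy has just written, which is how a short token expands into a long run.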
Interpreters
Instead of writing a program to do a computation, use a language to describe the computation at a high level and write an interpreter for that language
Benefits:
Nice and clean abstraction of the language
Easy to add/change operations by changing the HLL program
Much more compact representation
Examples:
String processing
Bytecodes
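The bytecode example can be illustrated with a very small stack machine. The opcode set and the `run` function below are assumptions made up for this sketch. The computation is described by a few bytes of data, and one interpreter loop executes any such description.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bytecodes for a tiny stack machine. */
enum { OP_PUSH, OP_ADD, OP_MUL, OP_HALT };

#define STACK_MAX 16

/* Run the program; OP_PUSH is followed by one operand byte. Returns the top of stack. */
int run(const unsigned char *code, size_t len)
{
    int stack[STACK_MAX];
    int sp = 0;
    size_t pc = 0;
    while (pc < len) {
        switch (code[pc++]) {
        case OP_PUSH:
            assert(pc < len && sp < STACK_MAX);
            stack[sp++] = code[pc++];
            break;
        case OP_ADD:
            assert(sp >= 2);
            sp--;
            stack[sp - 1] += stack[sp];
            break;
        case OP_MUL:
            assert(sp >= 2);
            sp--;
            stack[sp - 1] *= stack[sp];
            break;
        case OP_HALT:
            pc = len;   /* stop the loop */
            break;
        default:
            assert(0 && "unknown opcode");
        }
    }
    assert(sp >= 1);
    return stack[sp - 1];
}
```

The program `{OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_PUSH, 4, OP_MUL, OP_HALT}` encodes (2 + 3) * 4 in nine bytes. The trade is the one the rule names: a compact representation, paid for with a dispatch on every operation.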
Bentley's Rules
A. Modifying Data
1. Space for Time
2. Time for Space
3. Space and Time
a. SIMD
B. Modifying Code
SIMD
Store short width data packed into the machine word (our Intel machines are 64 bit)
64 Booleans (unsigned long long)
2 32-bit floats
2 32-bit integers
4 16-bit integers
8 8-bit integers
Intel SSE instructions operate on these multi granularities
Single operation on all the data items
Win-win situation: both faster and less storage
When viable?
If the same operation is performed on all the data items
Items can be stored contiguously in memory
Common case: don't have to pick or operate on each item separately
Example: overlap in battleship boards
#define BSZ 64
int overlap(int board1[BSZ][BSZ],
            int board2[BSZ][BSZ])
{
  int i, j;
  for(i=0; i<BSZ; i++)
    for(j=0; j<BSZ; j++)
      if((board1[i][j] == 1) &&
         (board2[i][j] == 1))
        return 1;
  return 0;
}

#define BSZ 64
int overlap(
  uint64_t board1[BSZ],
  uint64_t board2[BSZ])
{
  int i;
  for(i=0; i<BSZ; i++)
    if((board1[i] & board2[i]) != 0)
      return 1;
  return 0;
}
Bentley's Rules
A. Modifying Data
B. Modifying Code
1. Loop Rules
a. Loop Invariant Code Motion
b. Sentinel Loop Exit Test
c. Loop Elimination by Unrolling
d. Partial Loop Unrolling
e. Loop Fusion
f. Eliminate Wasted Iterations
2. Logic Rules
3. Procedure Rules
4. Expression Rules
5. Parallelism Rules
Loops
If each program instruction is only executed once
3 GHz machine
Requires 12 Gbytes of instructions a second! (1 CPI, 32 bit instructions)
A 100 GB disk full of programs done in 8 seconds
Each program instruction has to run millions of times
Loops
90% of the program execution time is in 10% of the code
All in inner loops
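The figures on this slide follow from a short calculation, assuming one instruction completes per cycle and each instruction occupies 4 bytes:

```latex
\[
3 \times 10^{9}\ \frac{\text{instructions}}{\text{s}}
\times 4\ \frac{\text{bytes}}{\text{instruction}}
= 12 \times 10^{9}\ \frac{\text{bytes}}{\text{s}},
\qquad
\frac{100\ \text{GB}}{12\ \text{GB/s}} \approx 8.3\ \text{s}
\]
```

So straight-line code with no reuse would consume a full 100 GB disk in roughly 8 seconds, which is why nearly all execution time has to come from instructions that run repeatedly.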
Loop Invariant Code Motion
Move as much code as possible out of the loops
Compilers do a good job today
Analyzable code
Provably results are the same in every iteration
Provably no side effects
Viability?
Loop invariant computation that the compiler cannot find
The cost of keeping the value in a register is amortized by the savings
Loop Invariant Code Motion Example
for(i=0; i < N; i++)
  X[i] = X[i] * exp(sqrt(PI/2));

double factor;
factor = exp(sqrt(PI/2));
for(i=0; i < N; i++)
  X[i] = X[i] * factor;
Sentinel Loop Exit Test
When we iterate over data to find a value, we have to check for the end of the data as well as for the value.
Add an extra data item at the end that matches the test
Viability?
Early loop exit condition that can be harnessed as the loop test
When an extra data item can be added at the end
Data array is modifiable
Example of Sentinel Loop Exit Test
#define DSZ 1024
datatype array[DSZ];
int find(datatype val)
{
  int i;
  for(i=0; i<DSZ; i++)
    if(array[i] == val)
      return i;
  return -1;
}
OR
int find(datatype val)
{
  int i;
  i = 0;
  while((i<DSZ)&&(array[i] != val))
    i++;
  if(i==DSZ) return -1;
  return i;
}

#define DSZ 1024
datatype array[DSZ+1];
int find(datatype val)
{
  int i;
  array[DSZ] = val;
  i = 0;
  while(array[i] != val)
    i++;
  if(i == DSZ)
    return -1;
  return i;
}
Loop Elimination by Unrolling
Known loop bounds: can fully unroll the loop
Viability?
Small number of iterations (code blow-up is manageable)
Small loop body (code blow-up is manageable)
Little work in the loop body (loop test cost is non-trivial)
Can get the compiler to do this.
Example:
sum = 0;
for(i=0; i<10; i++)
  sum = sum + A[i];

sum = A[0] + A[1] + A[2] + A[3] +
      A[4] + A[5] + A[6] + A[7] +
      A[8] + A[9];
Partial Loop Unrolling
Make a few copies of the loop body.
Viability?
Work in the loop body is minimal (running the loop test fewer times has a visible impact)
Or the ability to perform optimizations on the combined loop bodies
Can get the compiler to do this
Example:
sum = 0;
for(i=0; i<n; i++)
  sum = sum + A[i];

sum = 0;
for(i=0; i<n-3; i += 4)
  sum += A[i] + A[i+1] + A[i+2] + A[i+3];
for(; i<n; i++)
  sum = sum + A[i];
Loop Fusion
When multiple loops iterate over the same set of data, put the computation in one loop body.
Viability?
No aggregate from one loop is needed in the next
Loop bodies are manageable
Example
amin = INTMAX;
for(i=0; i<n; i++)
  if(A[i] < amin) amin = A[i];
amax = INTMIN;
for(i=0; i<n; i++)
  if(A[i] > amax) amax = A[i];

amin = INTMAX;
amax = INTMIN;
for(i=0; i<n; i++) {
  int atmp = A[i];
  if(atmp < amin) amin = atmp;
  if(atmp > amax) amax = atmp;
}
Eliminate Wasted Iterations
Change the loop bounds so that it will not iterate over an empty loop body
Viability?
For a lot of iterations the loop body is empty
Can change the loop bounds to make the loop tighter
Or ability to change the data structures around (efficiently and correctly)
Example I
for(i=0; i<n; i++)
  for(j = i; j < n - i; j++)
    ...

/* the inner loop is empty once i >= n - i, so stop at (n+1)/2 (covers odd n) */
for(i=0; i<(n+1)/2; i++)
  for(j = i; j < n - i; j++)
    ...
Example II
int val[N];
int visited[N];
for(i=0; i < N; i++)
  visited[i] = 0;
for(i=0; i < N; i++) {
  int minloc;
  int minval = MAXINT;
  for(j=0; j < N; j++)
    if(!visited[j])
      if(val[j] < minval) {
        minval = val[j];
        minloc = j;
      }
  visited[minloc] = 1;
  // process val[minloc]
}

int val[N];
for(i=0; i < N; i++) {
  int minloc;
  int minval = MAXINT;
  for(j=0; j < N - i; j++)
    if(val[j] < minval) {
      minval = val[j];
      minloc = j;
    }
  // process val[minloc]
  val[minloc] = val[N - i - 1];
}
Bentley's Rules
A. Modifying Data
B. Modifying Code
1. Loop Rules
2. Logic Rules
a. Exploit Algebraic Identities
b. Short Circuit Monotone Functions
c. Reordering Tests
d. Precompute Logic Functions
e. Boolean Variable Elimination
3. Procedure Rules
4. Expression Rules
5. Parallelism Rules
Exploit Algebraic Identities
If the evaluation of a logical expression is costly, replace it with an algebraically equivalent expression that is cheaper
Examples
sqr(x) > 0  →  x != 0
sqrt(x*x + y*y) < sqrt(a*a + b*b)  →  x*x + y*y < a*a + b*b
ln(A) + ln(B)  →  ln(A*B)
SIN(X)*SIN(X) + COS(X)*COS(X)  →  1
Short Circuit Monotone Functions
In checking if a monotonically increasing function is over a threshold, don't evaluate beyond the threshold
Example:
sum = 0;
for(i=0; i < N; i++)
  sum = sum + X[i];
if(sum > cutoff) return 0;
return 1;

sum = 0;
i = 0;
while((i < N) && (sum < cutoff))
  sum = sum + X[i++];
if(sum > cutoff) return 0;
return 1;

sum = 0;
i = 0;
X[N] = cutoff;  /* sentinel: guarantees the loop stops (X values are non-negative) */
while (sum < cutoff)
  sum = sum + X[i++];
if(i > N) return 1;  /* only the sentinel pushed sum up to the cutoff */
return 0;
Reordering tests
Logical tests should be arranged such that inexpensive and often successful tests precede expensive and rarely successful ones
Add inexpensive and often successful tests before expensive ones
Example:
if (sqrt(sqr(x1 - x2) +
         sqr(y1 - y2)) < rad1+rad2)
  return COLLISION;
return OK;

if(abs(x1-x2) > rad1+rad2)
  return OK;
if(abs(y1-y2) > rad1+rad2)
  return OK;
if (sqrt(sqr(x1 - x2) +
         sqr(y1 - y2)) < rad1+rad2)
  return COLLISION;
return OK;
Precompute Logic Functions
A logical function over a small finite domain can be replaced by a lookup in a table that represents the domain.
Example:
int palindrome(unsigned char val)
{
  unsigned char l, r;
  int i;
  l = 0x01;
  r = 0x80;
  for(i=0; i <4; i++) {
    if(((val & l) == 0) ^ ((val & r) == 0))
      return 0;
    l = l << 1;
    r = r >> 1;
  }
  return 1;
}

int isPalindrome[256];
// initialize array
int palindrome(unsigned char val)
{
  return isPalindrome[val];
}
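The slide leaves the table setup as "// initialize array". One way to fill it, sketched here under the assumption that the table is built once at start-up by running the bit-by-bit test over all 256 byte values, is shown below. The names `palindrome_slow` and `palindrome_initialize` are hypothetical.

```c
#include <assert.h>

/* isPalindrome[v] is 1 when the 8 bits of v read the same in both directions. */
static int isPalindrome[256];

/* Bit-by-bit test, used only to fill the table. */
static int palindrome_slow(unsigned char val)
{
    unsigned char l = 0x01, r = 0x80;
    int i;
    for (i = 0; i < 4; i++) {
        if (((val & l) == 0) ^ ((val & r) == 0))
            return 0;
        l = (unsigned char)(l << 1);
        r = (unsigned char)(r >> 1);
    }
    return 1;
}

/* Hypothetical initializer standing in for the slide's "// initialize array". */
void palindrome_initialize(void)
{
    int v;
    for (v = 0; v < 256; v++)
        isPalindrome[v] = palindrome_slow((unsigned char)v);
    assert(isPalindrome[0] == 1);  /* the all-zero byte is trivially symmetric */
}

/* After initialization every query is a single array read. */
int palindrome(unsigned char val)
{
    return isPalindrome[val];
}
```

The table could also be written out as a constant initializer so that no start-up work is needed, which would combine this rule with compile-time initialization.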
Boolean Variable Elimination
Eliminate the assignment to a Boolean variable by replacing it with an IF-THEN-ELSE
Example:
int v;
v = Boolean expression;
S1;
if(v)
  S2;
else
  S3;
S4;
if(v)
  S5;

if(Boolean expression) {
  S1;
  S2;
  S4;
  S5;
}
else {
  S1;
  S3;
  S4;
}
Bentley's Rules
A. Modifying Data
B. Modifying Code
1. Loop Rules
2. Logic Rules
3. Procedure Rules
a. Collapse Procedure Hierarchies
b. Coroutines
c. Tail Recursion Elimination
4. Expression Rules
5. Parallelism Rules
Collapse Procedure Hierarchies
Inline small functions into the main body
Eliminates the call overhead
Provides further opportunities for compiler optimization
inline int max(int a, int b) {
  if(a > b) return a;
  return b;
}
int getmax() {
  int xmax, i;
  xmax = MININT;
  for(i=0; i < N; i++)
    xmax = max(xmax, X[i]);
  return xmax;
}

#define max(a, b) (((a) > (b))?(a):(b))
int getmax() {
  int xmax, i;
  xmax = MININT;
  for(i=0; i < N; i++)
    xmax = max(xmax, X[i]);
  return xmax;
}
Coroutines
Multiple passes over data should be combined
Similar to loop fusion (B.1.e), but at a scale and complexity beyond a single loop
Example pattern
Loop {
  Read from I
  ProcessA
  Write to II
}
Loop {
  Read from II
  ProcessB
  Write to III
}

Loop {
  Loop {
    Read from I
    ProcessA
    Write to buffer
  }
  Loop {
    Read from buffer
    ProcessB
    Write to III
  }
}

Tail Recursion Elimination
In a self-recursive function, if the last action is calling itself, eliminate the recursion.
Example pattern
int fact(int n, int res) {
  if(n == 1) return res;
  return fact(n - 1, res*n);
}

int fact(int n, int res) {
  while(1) {
    if(n == 1) return res;
    res = res*n;
    n = n - 1;
  }
}
Bentley's Rules
A. Modifying Data
B. Modifying Code
1. Loop Rules
2. Logic Rules
3. Procedure Rules
4. Expression Rules
a. Compile-time Initialization
b. Common Subexpression Elimination
c. Pairing Computation
5. Parallelism Rules
Compile-Time Initialization
If a value is a constant, make it a compile-time constant.
Saves the effort of calculation
Allows value inlining
More optimization opportunities
Example
...
vol = 2 * pi() * r * r;
...

#define PI 3.14159265358979
#define R 12
...
vol = 2 * PI * R * R;
...
Common Subexpression Elimination
If the same expression is evaluated twice, do it only once
Viability?
Expression has no side effects
The expression value does not change between the evaluations
The cost of keeping a copy is amortized by the complexity of the expression
Too complicated for the compiler to do it automatically
Example
x = sin(a) * sin(a);

double tmp;
tmp = sin(a);
x = tmp * tmp;
Pairing Computation
If two similar functions are called with the same arguments close to each other on many occasions, combine them.
Reduces call overhead
Possibility of sharing the computation cost
More optimization possibilities
Example
x = r * cos(a);
y = r * sin(a);

typedef struct twodouble {
  double d1;
  double d2;
} twodouble;
...
twodouble dd;
dd = sincos(a);
x = r * dd.d1;
y = r * dd.d2;
Bentley's Rules
A. Modifying Data
B. Modifying Code
1. Loop Rules
2. Logic Rules
3. Procedure Rules
4. Expression Rules
5. Parallelism Rules
a. Exploit Implicit Parallelism
b. Exploit Inner Loop Parallelism
c. Exploit Coarse Grain Parallelism
d. Extra Computation to Create Parallelism
Implicit Parallelism
Reduce the loop carried dependences so that software pipelining can execute a compact schedule without stalls.
Example:
xmax = MININT;
for(i=0; i < N; i++)
  if(X[i] > xmax) xmax = X[i];

xmax1 = MININT;
xmax2 = MININT;
for(i=0; i < N - 1; i += 2) {
  if(X[i] > xmax1) xmax1 = X[i];
  if(X[i+1] > xmax2) xmax2 = X[i+1];
}
if((i < N) && (X[i] > xmax1)) xmax1 = X[i];
xmax = (xmax1 > xmax2)?xmax1:xmax2;
Example 2
[Diagram: a singly linked list whose nodes are connected by next pointers]
curr = head;
tot = 0;
while(curr != NULL) {
  tot = tot + curr->val;
  curr = curr->next;
}
return tot;

[Diagram: the same list with an added nextnext pointer in each node, pointing two nodes ahead]
Also see Rule A.1.a Data Structure Augmentation
curr = head;
if(curr == NULL) return 0;
tot1 = 0;
tot2 = 0;
/* also test curr itself: nextnext is NULL at the end of an even-length list */
while(curr != NULL && curr->next != NULL) {
  tot1 = tot1 + curr->val;
  tot2 = tot2 + curr->next->val;
  curr = curr->nextnext;
}
if(curr)
  tot1 = tot1 + curr->val;
return tot1 + tot2;
Exploit Inner Loop Parallelism
Facilitate inner loop vectorization (for SSE type instructions)
How? By gingerly guiding the compiler to do so
Iterative process: look at why the loop is not vectorized and fix those issues
Most of the rules above can be used to simplify the loop so that the compiler can vectorize it
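One common reason a compiler declines to vectorize a loop is that it cannot prove the arrays do not overlap. The sketch below, which is an illustration and not code from the lecture, uses the C99 `restrict` qualifier to make that promise. The function name `vec_add` is an assumption, and whether the loop is actually vectorized depends on the compiler and its flags.

```c
#include <assert.h>
#include <stddef.h>

/*
 * restrict promises the compiler that out, a and b never overlap,
 * which removes one common obstacle to vectorizing the loop.
 */
void vec_add(float *restrict out, const float *restrict a,
             const float *restrict b, size_t n)
{
    size_t i;
    assert(out != NULL && a != NULL && b != NULL);
    /* Unit stride, no early exit, no function calls in the body. */
    for (i = 0; i < n; i++)
        out[i] = a[i] + b[i];
}
```

The promise is the caller's responsibility: passing overlapping arrays to `vec_add` is undefined behavior, so this fix only applies when the data layout really guarantees no aliasing.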
Exploit Coarse Grain Parallelism
Outer loop parallelism (doall and doacross loops)
Task parallelism
Ideal for multicores
You need to do the parallelism yourself (later lectures)
Extra Computation to Create Parallelism
In many cases doing a little more work (or a slower algorithm) can make a sequential program a parallel one. Parallel execution may amortize the cost
Example:
double tot;
tot = 0;
for(i = 0; i < N; i++)
  for(j = 0; j < N; j++)
    tot = tot + A[i][j];

double tot;
double tottmp[N];
for(i = 0; i < N; i++)
  tottmp[i] = 0;
for(i = 0; i < N; i++) { //parallelizable
  double tmp = 0;
  for(j = 0; j < N; j++)
    tmp = tmp + A[i][j];
  tottmp[i] = tottmp[i] + tmp;
}
tot = 0;
for(i = 0; i < N; i++)
  tot = tot + tottmp[i];
Bentley's Rules
A. Modifying Data
1. Space for Time
a. Data Structure Augmentation
b. Storing Precomputed Results
c. Caching
d. Lazy Evaluation
2. Time for Space
a. Packing/Compression
b. Interpreters
3. Space and Time
a. SIMD
B. Modifying Code
1. Loop Rules
a. Loop Invariant Code Motion
b. Sentinel Loop Exit Test
c. Loop Elimination by Unrolling
d. Partial Loop Unrolling
e. Loop Fusion
f. Eliminate Wasted Iterations
2. Logic Rules
a. Exploit Algebraic Identities
b. Short Circuit Monotone Functions
c. Reordering Tests
d. Precompute Logic Functions
e. Boolean Variable Elimination
3. Procedure Rules
a. Collapse Procedure Hierarchies
b. Coroutines
c. Tail Recursion Elimination
4. Expression Rules
a. Compile-time Initialization
b. Common Subexpression Elimination
c. Pairing Computation
5. Parallelism Rules
a. Exploit Implicit Parallelism
b. Exploit Inner Loop Parallelism
c. Exploit Coarse Grain Parallelism
d. Extra Computation to Create Parallelism
Traveling Salesman Problem
Definition
List of cities
Location of each city (x,y coordinates on a 2-D map)
Need to visit all the cities
What order to visit the cities so that the distance traveled is shortest
Exact Shortest Distance Algorithm: Exponential
A Good Greedy Heuristic
Start with any city
Find the closest city that hasn't been visited
Visit that city next
Iterate until all the cities are visited
TSP Example
Original
void get_path_1(int path[]) {
  int visited[MAXCITIES];
  int i, j;
  int curr, closest;
  double cdist;
  double totdist;
  for(i=0; i <MAXCITIES; i++)
    visited[i] = 0;
  curr = MAXCITIES-1;
  visited[curr] = 1;
  totdist = 0;
  path[0] = curr;
  for(i=1; i < MAXCITIES; i++) {
    cdist = MAXDOUBLE;
    for(j=0; j < MAXCITIES; j++)
      if(visited[j] == 0)
        if(dist(curr, j) < cdist) {
          cdist = dist(curr, j);
          closest = j;
        }
    path[i] = closest;
    visited[closest] = 1;
    totdist += dist(curr, closest);
    curr = closest;
  }
}
double dist(int i, int j) {
  return sqrt((Cities[i].x - Cities[j].x)*(Cities[i].x - Cities[j].x)+
              (Cities[i].y - Cities[j].y)*(Cities[i].y - Cities[j].y));
}
TSP Example
B.2.a Exploit Algebraic Identities (get_path_1 becomes get_path_2)
void get_path_2(int path[]) {
  int visited[MAXCITIES];
  int i, j;
  int curr, closest;
  double cdist;
  double totdist;
  for(i=0; i <MAXCITIES; i++)
    visited[i] = 0;
  curr = MAXCITIES-1;
  visited[curr] = 1;
  totdist = 0;
  path[0] = curr;
  for(i=1; i < MAXCITIES; i++) {
    cdist = MAXDOUBLE;
    for(j=0; j < MAXCITIES; j++)
      if(visited[j] == 0)
        if(distsq(curr, j) < cdist) {
          cdist = distsq(curr, j);
          closest = j;
        }
    path[i] = closest;
    visited[closest] = 1;
    totdist += dist(curr, closest);
    curr = closest;
  }
}
double distsq(int i, int j) {
  return ((Cities[i].x - Cities[j].x)*(Cities[i].x - Cities[j].x)+
          (Cities[i].y - Cities[j].y)*(Cities[i].y - Cities[j].y));
}
TSP Example
B.4.b Common Subexpression Elimination (get_path_2 becomes get_path_3)
void get_path_3(int path[])
{
  int visited[MAXCITIES];
  int i, j;
  int curr, closest;
  double cdist, tdist;
  double totdist;
  for(i=0; i <MAXCITIES; i++)
    visited[i] = 0;
  curr = MAXCITIES-1;
  visited[curr] = 1;
  totdist = 0;
  path[0] = curr;
  for(i=1; i < MAXCITIES; i++) {
    cdist = MAXDOUBLE;
    for(j=0; j < MAXCITIES; j++)
      if(visited[j] == 0) {
        tdist = distsq(curr, j);
        if(tdist < cdist) {
          cdist = tdist;
          closest = j;
        }
      }
    path[i] = closest;
    visited[closest] = 1;
    totdist += dist(curr, closest);
    curr = closest;
  }
}
double distsq(int i, int j) {
  return ((Cities[i].x - Cities[j].x)*(Cities[i].x - Cities[j].x)+
          (Cities[i].y - Cities[j].y)*(Cities[i].y - Cities[j].y));
}
TSP Example
B.1.a Loop Invariant Code Motion (get_path_3 becomes get_path_4)
void get_path_4(int path[])
{
  int visited[MAXCITIES];
  int i, j;
  int curr, closest;
  double cdist, tdist;
  double cx, cy;
  double totdist;
  for(i=0; i <MAXCITIES; i++)
    visited[i] = 0;
  curr = MAXCITIES-1;
  visited[curr] = 1;
  totdist = 0;
  path[0] = curr;
  for(i=1; i < MAXCITIES; i++) {
    cdist = MAXDOUBLE;
    cx = Cities[curr].x;
    cy = Cities[curr].y;
    for(j=0; j < MAXCITIES; j++)
      if(visited[j] == 0) {
        tdist = (Cities[j].x - cx)*(Cities[j].x - cx) +
                (Cities[j].y - cy)*(Cities[j].y - cy);
        if(tdist < cdist) {
          cdist = tdist;
          closest = j;
        }
      }
    path[i] = closest;
    visited[closest] = 1;
    totdist += dist(curr, closest);
    curr = closest;
  }
}
TSP Example
B.2.c Reordering Tests (get_path_4 becomes get_path_5)
void get_path_5(int path[])
{
  int visited[MAXCITIES];
  int i, j, curr, closest;
  double cdist, tdist, cx, cy, totdist;
  for(i=0; i <MAXCITIES; i++)
    visited[i] = 0;
  curr = MAXCITIES-1;
  visited[curr] = 1;
  totdist = 0;
  path[0] = curr;
  for(i=1; i < MAXCITIES; i++) {
    cdist = MAXDOUBLE;
    cx = Cities[curr].x;
    cy = Cities[curr].y;
    for(j=0; j < MAXCITIES; j++)
      if(visited[j] == 0) {
        tdist = (Cities[j].x - cx)*(Cities[j].x - cx);
        if(tdist < cdist) {
          tdist += (Cities[j].y - cy)*(Cities[j].y - cy);
          if(tdist < cdist) {
            cdist = tdist;
            closest = j;
          }
        }
      }
    path[i] = closest;
    visited[closest] = 1;
    totdist += dist(curr, closest);
    curr = closest;
  }
}
TSP Example

B.1.c Reordering Tests
void get_path_5(int path[])
{
  int visited[MAXCITIES];
  int i, j, curr, closest;
  double cdist, tdist, cx, cy, totdist;
  for(i=0; i < MAXCITIES; i++)
    visited[i] = 0;
  curr = MAXCITIES-1;
  visited[curr] = 1;
  totdist = 0;
  path[0] = curr;
  for(i=1; i < MAXCITIES; i++) {
    cdist = MAXDOUBLE;
    cx = Cities[curr].x;
    cy = Cities[curr].y;
    for(j=0; j < MAXCITIES; j++)
      if(visited[j] == 0) {
        tdist = (Cities[j].x - cx)*(Cities[j].x - cx);
        if(tdist < cdist) {
          tdist += (Cities[j].y - cy)*(Cities[j].y - cy);
          if(tdist < cdist) {
            cdist = tdist;
            closest = j;
          }
        }
      }
    path[i] = closest;
    visited[closest] = 1;
    totdist += dist(curr, closest);
    curr = closest;
  }
}

B.4.b Common Subexpression Elimination
void get_path_6(int path[])
{
  int visited[MAXCITIES];
  int i, j, curr, closest;
  double cdist, tdist, cx, cy, totdist;
  for(i=0; i < MAXCITIES; i++)
    visited[i] = 0;
  curr = MAXCITIES-1;
  visited[curr] = 1;
  totdist = 0;
  path[0] = curr;
  for(i=1; i < MAXCITIES; i++) {
    cdist = MAXDOUBLE;
    cx = Cities[curr].x;
    cy = Cities[curr].y;
    for(j=0; j < MAXCITIES; j++)
      if(visited[j] == 0) {
        double tx = (Cities[j].x - cx);
        tdist = tx*tx;
        if(tdist < cdist) {
          double ty = (Cities[j].y - cy);
          tdist += ty*ty;
          if(tdist < cdist) {
            cdist = tdist;
            closest = j;
          }
        }
      }
    path[i] = closest;
    visited[closest] = 1;
    totdist += dist(curr, closest);
    curr = closest;
  }
}
TSP Example

B.4.b Common Subexpression Elimination
void get_path_6(int path[])
{
  int visited[MAXCITIES];
  int i, j, curr, closest;
  double cdist, tdist, cx, cy, totdist;
  for(i=0; i < MAXCITIES; i++)
    visited[i] = 0;
  curr = MAXCITIES-1;
  visited[curr] = 1;
  totdist = 0;
  path[0] = curr;
  for(i=1; i < MAXCITIES; i++) {
    cdist = MAXDOUBLE;
    cx = Cities[curr].x;
    cy = Cities[curr].y;
    for(j=0; j < MAXCITIES; j++)
      if(visited[j] == 0) {
        double tx = (Cities[j].x - cx);
        tdist = tx*tx;
        if(tdist < cdist) {
          double ty = (Cities[j].y - cy);
          tdist += ty*ty;
          if(tdist < cdist) {
            cdist = tdist;
            closest = j;
          }
        }
      }
    path[i] = closest;
    visited[closest] = 1;
    totdist += dist(curr, closest);
    curr = closest;
  }
}

B.1.f Eliminate Wasted Iterations
void get_path_7(int path[])
{
  int unvisited[MAXCITIES];
  int i, j, k, curr, closek, closej;
  double cdist, tdist, cx, cy, totdist;
  for(i=0; i < MAXCITIES; i++)
    unvisited[i] = i;
  curr = MAXCITIES-1;
  totdist = 0;
  path[0] = curr;
  for(i=1; i < MAXCITIES; i++) {
    cdist = MAXDOUBLE;
    cx = Cities[curr].x;
    cy = Cities[curr].y;
    for(j=0; j < MAXCITIES-i; j++) {
      double tx;
      k = unvisited[j];
      tx = (Cities[k].x - cx);
      tdist = tx*tx;
      if(tdist < cdist) {
        double ty = (Cities[k].y - cy);
        tdist += ty*ty;
        if(tdist < cdist) {
          cdist = tdist;
          closek = k;
          closej = j;
        }
      }
    }
    path[i] = closek;
    unvisited[closej] = unvisited[MAXCITIES - i - 1];
    totdist += dist(curr, closek);
    curr = closek;
  }
}
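The unvisited array in get_path_7 keeps the still-unvisited city indices packed at the front, so the inner loop never steps over visited cities. A minimal sketch of that compaction on a six-city instance follows. NCITIES, the City type and the name tour_compact are assumptions made for illustration, and the sketch computes the full squared distance for brevity instead of using the early-exit test.

```c
#include <assert.h>
#include <float.h>

#define NCITIES 6   /* hypothetical small instance; the slides use MAXCITIES */

/* Hypothetical city type standing in for the slides' Cities[] entries. */
typedef struct { double x, y; } City;

/* Greedy nearest-neighbor tour starting from the last city.
   unvisited[0..remaining-1] always holds exactly the unvisited indices. */
void tour_compact(const City *cities, int path[NCITIES])
{
    int unvisited[NCITIES];
    int remaining = NCITIES - 1;        /* the start city is excluded */
    for (int i = 0; i < NCITIES; i++)
        unvisited[i] = i;

    int curr = NCITIES - 1;
    path[0] = curr;
    for (int i = 1; i < NCITIES; i++) {
        double best = DBL_MAX;
        int closek = -1, closej = -1;
        for (int j = 0; j < remaining; j++) {
            int k = unvisited[j];
            double tx = cities[k].x - cities[curr].x;
            double ty = cities[k].y - cities[curr].y;
            double d = tx * tx + ty * ty;
            if (d < best) { best = d; closek = k; closej = j; }
        }
        assert(closek >= 0);
        path[i] = closek;
        /* Overwrite the chosen entry with the last live one, then shrink. */
        unvisited[closej] = unvisited[remaining - 1];
        remaining--;
        curr = closek;
    }
}
```

Because removal moves the last live entry into the freed position, the scan order changes from step to step. When two cities are equally close, this version may pick a different one than the visited-flag version.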
TSP Example

B.4.b Common Subexpression Elimination
void get_path_6(int path[])
{
  int visited[MAXCITIES];
  int i, j, curr, closest;
  double cdist, tdist, cx, cy, totdist;
  for(i=0; i < MAXCITIES; i++)
    visited[i] = 0;
  curr = MAXCITIES-1;
  visited[curr] = 1;
  totdist = 0;
  path[0] = curr;
  for(i=1; i < MAXCITIES; i++) {
    cdist = MAXDOUBLE;
    cx = Cities[curr].x;
    cy = Cities[curr].y;
    for(j=0; j < MAXCITIES; j++)
      if(visited[j] == 0) {
        double tx = (Cities[j].x - cx);
        tdist = tx*tx;
        if(tdist < cdist) {
          double ty = (Cities[j].y - cy);
          tdist += ty*ty;
          if(tdist < cdist) {
            cdist = tdist;
            closest = j;
          }
        }
      }
    path[i] = closest;
    visited[closest] = 1;
    totdist += dist(curr, closest);
    curr = closest;
  }
}

B.1.f Eliminate Wasted Iterations
void get_path_8(int path[])
{
  int i, j, curr, closest;
  double cdist, tdist, cx, cy, totdist;
  curr = MAXCITIES-1;
  totdist = 0;
  path[0] = curr;
  for(i=1; i < MAXCITIES; i++) {
    cdist = MAXDOUBLE;
    cx = Cities[curr].x;
    cy = Cities[curr].y;
    for(j=0; j < MAXCITIES-i; j++)
      if(curr != j) {
        double tx = (Cities[j].x - cx);
        tdist = tx*tx;
        if(tdist < cdist) {
          double ty = (Cities[j].y - cy);
          tdist += ty*ty;
          if(tdist < cdist) {
            cdist = tdist;
            closest = j;
          }
        }
      }
    path[i] = closest;
    totdist += dist(curr, closest);
    Cities[curr] = Cities[MAXCITIES - i];
    curr = closest;
  }
}
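get_path_8 compacts the Cities array itself: it overwrites the slot of the city just left and stores slot numbers in path. Once a slot has been reused, a recorded number appears to no longer name the original city. One possible variation, not taken from the slides, works on a scratch copy and carries a parallel id array so that path holds original indices. NCITIES, City, tour_compact_coords, work, id and live are all hypothetical names.

```c
#include <assert.h>
#include <float.h>
#include <string.h>

#define NCITIES 6   /* hypothetical small instance; the slides use MAXCITIES */

/* Hypothetical city type standing in for the slides' Cities[] entries. */
typedef struct { double x, y; } City;

/* Nearest-neighbor tour that compacts a private copy of the coordinates.
   Slots [0, live) of work[] hold the cities that are still unvisited. */
void tour_compact_coords(const City *cities, int path[NCITIES])
{
    City work[NCITIES];   /* scratch copy; the caller's array is untouched */
    int  id[NCITIES];     /* id[s] = original index of the city in slot s  */
    memcpy(work, cities, sizeof work);
    for (int s = 0; s < NCITIES; s++)
        id[s] = s;

    int live = NCITIES;
    int curr = NCITIES - 1;             /* slot of the current city */
    path[0] = id[curr];
    for (int i = 1; i < NCITIES; i++) {
        /* Drop the current city: move the last live slot into its place. */
        City here = work[curr];
        live--;
        work[curr] = work[live];
        id[curr]   = id[live];

        double best = DBL_MAX;
        int closest = -1;
        for (int j = 0; j < live; j++) {
            double tx = work[j].x - here.x;
            double d = tx * tx;
            if (d < best) {             /* x-only test first */
                double ty = work[j].y - here.y;
                d += ty * ty;
                if (d < best) { best = d; closest = j; }
            }
        }
        assert(closest >= 0);
        path[i] = id[closest];          /* report the original index */
        curr = closest;
    }
}
```

The inner loop reads coordinates sequentially and needs neither a visited check nor an index indirection. The cost is one array copy up front and an extra id store per step.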
Performance
MIT OpenCourseWare
http://ocw.mit.edu
6.172 Performance Engineering of Software Systems
Fall 2010
For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.
