Parallel Matlab 2010
08 February 2010
Introduction
Local Parallel Computing
The MD Example
PRIME NUMBER Example
Remote Computing
KNAPSACK Example
SPMD Parallelism
fmincon Example
Codistributed Arrays
A 2D Heat Equation
Conclusion
The word local chooses the local configuration; that is, the
cores assigned to be workers will be on the local machine.
The value "4" is the number of workers you are asking for. It can
be up to 8 on a local machine. It does not have to match the
number of cores you have.
If all is well, the program runs the same as before... but faster.
Output will still appear in the command window in the same way,
and the data will all be available to you.
What has happened is simply that some of the computations were
carried out by other cores in a way that was hidden from you.
The program may seem to have run faster, but it is important to
measure the time exactly:
tic
md_parallel
toc
matlabpool close
tic starts the clock, toc stops the clock and prints the time.
for labs = 0 : 4
  if ( 0 < labs ), matlabpool ( 'open', 'local', labs ), end
  tic
  md_parallel
  toc
  if ( 0 < labs ), matlabpool ( 'close' ), end
end
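To quantify the gain, it helps to store each elapsed time and report the speedup relative to the serial run. A sketch of that bookkeeping (the times array and the printout are illustrative, not part of the original script):

```matlab
%  Illustrative sketch: record each run time and print the speedup
%  relative to the 0-worker (serial) run.
times = zeros ( 5, 1 );
for labs = 0 : 4
  if ( 0 < labs ), matlabpool ( 'open', 'local', labs ), end
  tic
  md_parallel
  times(labs+1) = toc;
  if ( 0 < labs ), matlabpool ( 'close' ), end
end
fprintf ( 1, '  %d workers: %8.2f seconds, speedup %5.2f\n', ...
  [ ( 0 : 4 ); times'; times(1) ./ times' ] );
```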
>> profile on
>> md
>> profile viewer
MD: Profile
Profile Summary
Generated 27-Apr-2009 15:37:30 using cpu time.

Function Name    Calls    Total Time    Self Time*
md               1        415.847 s     0.096 s

*Self time is the time spent in a function excluding the time spent
in its child functions. Self time also includes overhead resulting
from the process of profiling.
Burkardt/Cliff MATLAB Parallel Computing
MD: The COMPUTE Function
f = zeros ( nd, np );
pot = 0.0;
pi2 = pi / 2.0;

for i = 1 : np
  Ri = pos - repmat ( pos(:,i), 1, np );        % array of vectors to 'i'
  D = sqrt ( sum ( Ri.^2 ) );                   % array of distances
  Ri = Ri ( :, ( D > 0.0 ) );
  D = D ( D > 0.0 );                            % save only positive values
  D2 = D .* ( D <= pi2 ) + pi2 * ( D > pi2 );   % truncate the potential
  pot = pot + 0.5 * sum ( sin ( D2 ).^2 );      % accumulate pot. energy
  f(:,i) = Ri * ( sin ( 2*D2 ) ./ D )';         % force on particle 'i'
end

kin = 0.5 * mass * sum ( diag ( vel' * vel ) ); % kinetic energy

return
end
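In MD_PARALLEL the loop above is parallelized by changing one keyword. A hedged sketch of the idea (variable names as in the listing above, not the full source):

```matlab
%  Each pass through the force loop writes only column i of F,
%  so the serial FOR can become a PARFOR; POT is accumulated as a
%  legal PARFOR reduction variable.
parfor i = 1 : np
  Ri = pos - repmat ( pos(:,i), 1, np );        % vectors to particle 'i'
  D = sqrt ( sum ( Ri.^2 ) );                   % distances
  Ri = Ri ( :, ( D > 0.0 ) );
  D = D ( D > 0.0 );
  D2 = D .* ( D <= pi2 ) + pi2 * ( D > pi2 );
  pot = pot + 0.5 * sum ( sin ( D2 ).^2 );      % reduction variable
  f(:,i) = Ri * ( sin ( 2*D2 ) ./ D )';         % sliced output variable
end
```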
function total = prime_number ( n )

  total = 0;

  for i = 2 : n
    prime = 1;
    for j = 2 : sqrt ( i )
      if ( mod ( i, j ) == 0 )
        prime = 0;
        break
      end
    end
    total = total + prime;
  end

  return
end
matlabpool ( 'open', 'local', 4 )

n = 50;
while ( n <= 500000 )
  primes = prime_number_parallel ( n );
  fprintf ( 1, '  %8d  %8d\n', n, primes );
  n = n * 10;
end

matlabpool ( 'close' )
PRIME_NUMBER_PARALLEL_RUN
Run PRIME_NUMBER_PARALLEL with 0, 1, 2, and 4 labs.
There are many thoughts that come to mind from these results!
Why does 500 take less time than 50? (It doesn’t, really).
How can ”1+1” take longer than ”1+0”?
(It does, but it’s probably not as bad as it looks!)
This data suggests two conclusions:
Parallelism doesn’t pay until your problem is big enough;
AND
Parallelism doesn’t pay until you have a decent number of workers.
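The first conclusion can be made quantitative with Amdahl's law: if a fraction s of the work is inherently serial, then p workers can never yield a speedup above 1/(s + (1-s)/p). A small illustration (the serial fraction 0.1 is an assumption, not a measurement):

```matlab
s = 0.1;                                % assumed serial fraction
p = [ 1 2 4 8 ];                        % worker counts
speedup = 1 ./ ( s + ( 1 - s ) ./ p );  % Amdahl's law upper bound
fprintf ( 1, '  %d workers: speedup at most %5.2f\n', [ p; speedup ] );
```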
n = 500000;

%function total = prime_number_parallel ( n )

  total = 0;

  parfor i = 2 : n
    prime = 1;
    for j = 2 : sqrt ( i )      % the inner loop must stay an ordinary FOR:
      if ( mod ( i, j ) == 0 )  % PARFOR loops cannot be nested, and BREAK
        prime = 0;              % is not allowed inside a PARFOR.
        break
      end
    end
    total = total + prime;
  end

%end
wait ( job ); <-- One way to find out when job is done.
Using load, you can examine just a single output variable from a
finished job if you list its name:
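For instance, if the finished script computed a variable named total, one might retrieve just that variable (the name total is a stand-in here):

```matlab
%  Sketch: pull a single variable out of a finished batch job.
load ( job, 'total' )
fprintf ( 1, '  total = %d\n', total );
```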
job_id = batch (
’script_to_run’, ...
’configuration’, ’local’ or ’ithaca_2009b’, ...
’FileDependencies’, ’file’ or {’file1’,’file2’}, ...
’PathDependencies’, ’path’ or {’path1’,’path2’}, ...
’matlabpool’, number of workers (can be zero!) )
Note that you do not include the file extension when naming the
script to run, or the files in the FileDependencies.
See page 13-2 of the PCT User’s Guide for more information.
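As a concrete sketch, a local batch run of a hypothetical script prime_script with 4 workers might look like this (the script and dependency names are stand-ins for your own files):

```matlab
%  Illustrative call only; 'prime_script' and the dependency
%  name are assumptions, not files from these slides.
job = batch ( 'prime_script', ...
  'configuration',    'local', ...
  'FileDependencies', { 'prime_number_parallel' }, ...
  'matlabpool',       4 );
```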
Displaying the job's State property will print out the current value
(by simply printing this file's contents).
Thus, instead of using the wait(job) command, you can simply
check the job's state from time to time to see if it is 'finished', and
otherwise go on talking to MATLAB.
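A minimal polling sketch along those lines (the 10-second pause is arbitrary):

```matlab
%  Sketch: check the job's state periodically instead of blocking in wait().
while ( ~strcmp ( job.State, 'finished' ) )
  pause ( 10 );   % ...or simply inspect job.State between other commands
end
```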
n = length ( w );
for code = 0 : 2^n-1
% Convert CODE into vector of indices in W.
subset = find ( bitget ( code, 1:n ) );
% Did we match the target sum?
if ( sum ( w(subset) ) == t )
return
end
end
return
end
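The bitget trick treats each integer code as a subset: bit i of code is 1 exactly when item i belongs to the subset. For example:

```matlab
%  code = 5 is binary 101, so items 1 and 3 are selected.
subset = find ( bitget ( 5, 1:3 ) )   % yields [ 1 3 ]
```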
n = length ( w );
for code = range(1) : range(2)
% Convert CODE into vector of indices in W.
subset = find ( bitget ( code, 1:n ) );
% Did we match the target sum?
if ( sum ( w(subset) ) == t )
return
end
end
return
end
i2 = -1;
for task = 1 : 4
i1 = i2 + 1;
i2 = floor ( ( 2^n - 1 ) * task / 4 );
createTask ( job, @knapdist, 2, { w, t, [ i1, i2 ] } );
end
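The floor formula splits the 2^n codes into four nearly equal, non-overlapping ranges; a quick check for n = 3:

```matlab
%  For n = 3 the four tasks cover codes 0 through 7 with no gaps.
n = 3;
i2 = -1;
for task = 1 : 4
  i1 = i2 + 1;
  i2 = floor ( ( 2^n - 1 ) * task / 4 );
  fprintf ( 1, '  task %d: codes %d to %d\n', task, i1, i2 );
end
```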
job_id = createJob (
’configuration’, ’local’ or ’ithaca_2009b’, ...
’FileDependencies’, ’file’ or {’file1’,’file2’}, ...
’PathDependencies’, ’path’ or {’path1’,’path2’} )
The createTask command defines the tasks that make up the job.
In particular, it names the MATLAB function that will be called,
the number of output arguments it has, and the values of the input
arguments.
task_id = createTask (
job_id, ... <-- ID of the job
@function, ... <-- MATLAB function to be called
numarg, ... <-- Number of output arguments
{ arg1,arg2,...} ) <-- Input arguments
With the following commands, we submit the job, and then pause
our interactive MATLAB session until the job is finished.
We then retrieve the output arguments from each task, in a cell
array we call results.
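That sequence can be sketched as follows (assuming the job and its four tasks have been defined as above):

```matlab
submit ( job );
wait ( job );                              % block until the job finishes
results = getAllOutputArguments ( job );   % 4x2 cell array, one row per task
```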
for task = 1 : 4
if ( isempty ( results{task,1} ) )
fprintf ( 1, ’Task %d found no solutions.\n’, task );
else
disp ( ’Weights:’ );
disp ( results{task,1} );
disp ( ’Weight Values:’ );
disp ( results{task,2} );
end
end
If you have set up your machine so that the local copy of MATLAB
can talk to remote copies, then this same distributed job can be
run on a remote machine, such as the Virginia Tech ithaca cluster.
All you have to do is change the configuration argument when you
define the job:
spmd
a = ( labindex - 1 ) / numlabs;
b = labindex / numlabs;
fprintf ( 1, ’ A = %f, B = %f\n’, a, b );
end
or
spmd
a = ( labindex - 1 ) / numlabs;
b = labindex / numlabs;
end
for i = 1 : 4 <-- "numlabs" wouldn’t work here!
fprintf ( 1, ’ A = %f, B = %f\n’, a{i}, b{i} );
end
SPMD: The Solution in 4 Parts
Assuming we’ve defined our limits of integration, we now want
to carry out the trapezoid rule for integration:
spmd
x = linspace ( a, b, n );
fx = f ( x );
quad_part = ( b - a ) * ( fx(1) + 2 * sum(fx(2:n-1)) + fx(n) ) ...
  / 2 / ( n - 1 );
fprintf ( 1, ’ Partial approx %f\n’, quad_part );
end
with result:
2 Partial approx 0.874676
4 Partial approx 0.567588
1 Partial approx 0.979915
3 Partial approx 0.719414
SPMD: Combining Partial Results
with result:
Approximation 3.14159265
fprintf ( 1, 'Compute limits\n' );
spmd
  a = ( labindex - 1 ) / numlabs;
  b = labindex / numlabs;
  fprintf ( 1, '  Lab %d works on [%f,%f].\n', labindex, a, b );
end

fprintf ( 1, 'Each lab estimates part of the integral.\n' );
spmd
  if ( n == 1 )
    quad_part = ( b - a ) * f ( ( a + b ) / 2 );
  else
    x = linspace ( a, b, n );
    fx = f ( x );
    quad_part = ( b - a ) * ( fx(1) + 2 * sum ( fx(2:n-1) ) + fx(n) ) ...
      / 2.0 / ( n - 1 );
  end
  fprintf ( 1, '  Approx %f\n', quad_part );
end

return
end
SPMD: Combining Values Directly
spmd
x = linspace ( a, b, n );
fx = f ( x );
quad_part = ( b - a ) * ( fx(1) + 2 * sum(fx(2:n-1)) + fx(n) ) ...
  / 2 / ( n - 1 );
quad = gplus(quad_part);
if ( labindex == 1 )
fprintf ( 1, ’ Approximation %f\n’, quad );
end
end
gop(@max,a), maximum of a;
gop(@min,a), minimum of a;
gop(@and,a), AND of a;
gop(@or,a), OR of a;
gop(@xor,a), XOR of a;
gop(@bitand,a), bitwise AND of a;
gop(@bitor,a), bitwise OR of a;
gop(@bitxor,a), bitwise XOR of a.
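For example, the gplus call shown earlier is just shorthand for a general reduction with @plus:

```matlab
%  Inside an SPMD block: every lab receives the global sum.
spmd
  quad = gop ( @plus, quad_part );
end
```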
For details on how these commands work, start with the MATLAB
HELP facility!
For more information, refer to the documentation for the Parallel
Computing Toolbox.
%  fname points to a user-supplied function with a single input argument.
%  n is a discretization parameter.  The finite-dimensional problem arises
%  by treating the (scalar) control as piecewise constant.
%  The function referenced by fname must define the elements of
%  the underlying optimal control problem.  See 'zermelo' as an example.

%% Problem data
  PAR = feval ( str2func ( fname ), n );

%  some lines omitted

%% Algorithm set up
  OPT = optimset ( optimset ( 'fmincon' ), ...
                   'LargeScale', 'off', ...
                   'Algorithm', 'active-set', ...
                   'Display', 'iter', ...
                   'UseParallel', 'Always' );

  h_cost = @(z) general_cost ( z, PAR );
  h_cnst = @(z) general_constraint ( z, PAR );

%% Run the algorithm
  [ z_star, f_star, exit ] = ...
    fmincon ( h_cost, z0, [], [], [], [], LB, UB, h_cnst, OPT );

  if ( exit >= 0 && isfield ( PAR, 'plot' ) )
    feval ( PAR.plot, z_star, PAR )
  end
ans = 1
ans = 1
>> F2 = G{1, 2}
F2= 1 1
1 1
>> whos
Name Size Bytes Class Attributes
A 2x2 32 double
B 2x2 32 double
C 3x4 96 double
D 1x8 16 char
F1 1x2 184 cell
F2 2x2 32 double
G 2x2 416 cell
>> spmd
V = eye(2) + (labindex -1);
end
>> V{1}
ans = 1 0
0 1
>> V{2}
ans = 2 1
1 2
>> whos
Name Size Bytes Class Attributes
V 1x2 373 Composite
One can construct local arrays on the labs and assemble them into
a codistributed array:
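A minimal sketch of that pattern (the array sizes here are arbitrary): each lab builds its own columns, and codistributed glues them together along dimension 2.

```matlab
spmd
  A_lab = labindex * ones ( 4, 2 );   % two local columns per lab
  A = codistributed ( A_lab, codistributor ( '1d', 2 ) );   % 4 x (2*numlabs)
end
```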
where:
F (x, y , t) is a specified source term,
σ > 0 is the areal density of the material,
Cp > 0 is the thermal capacitance of the material, and
kx > 0 (ky > 0) is the conductivity in the x direction (the
y -direction).
$$\frac{\partial T(x,0)}{\partial y} = \frac{\partial T(x,w)}{\partial y} = 0,$$
$$k_x \frac{\partial T(L,y)}{\partial x} = f(y),$$
$$k_x \frac{\partial T(0,y)}{\partial x} = \alpha(y)\,\big( T(0,y) - \beta(y) \big).$$
where $T^n(x,y) = T(n\,\Delta t, x, y)$, and $\Psi \in H^1(\Omega)$ is a test function:

$$\sum_\jmath \left[ \int_\Omega \Phi_\jmath(x,y)\,\Phi_\imath(x,y)\,d\omega
 + \frac{\Delta t}{\sigma C_p} \left( \int_\Omega k\,\nabla\Phi_\jmath \cdot \nabla\Phi_\imath\,d\omega
 + \int_0^w \alpha(y)\,\Phi_\jmath(0,y)\,\Phi_\imath(0,y)\,dy \right) \right] z_\jmath^{n+1}$$
$$- \sum_\jmath \int_\Omega \Phi_\jmath(x,y)\,\Phi_\imath(x,y)\,d\omega\; z_\jmath^n
 - \frac{\Delta t}{\sigma C_p} \int_\Omega F(x,y,t^{n+1})\,\Phi_\imath\,d\omega$$
$$- \frac{\Delta t}{\sigma C_p} \int_0^w f(y)\,\Phi_\imath(L,y)\,dy
 + \frac{\Delta t}{\sigma C_p} \int_0^w \alpha(y)\,\beta(y)\,\Phi_\imath(0,y)\,dy = 0$$
%% Initialization & geometry
%----lines omitted

%% Set up codistributed structure
%  column pointers and such for codistributed arrays
Vc = codcolon ( 1, n_equations );
lP = localPart ( Vc ); lP_1 = lP(1); lP_end = lP(end);
dPM = distributionPartition ( codistributor ( Vc ) );
col_shft = [ 0 cumsum( dPM(1:end-1) ) ];

%% Build the finite element matrices - Begin loop over elements
for n_el = 1 : n_elements
  nodes_local = e_conn ( n_el, : );  % which nodes are in this element
%  subset of nodes/columns on this lab
  lab_nodes_local = my_extract ( nodes_local, lP_1, lP_end );
  if ~isempty ( lab_nodes_local )  % continue the calculation for this elmnt
%--- calculate local arrays - lines omitted
%
      if ( t_glb >= lP_1 && t_glb <= lP_end )  % is node on this lab?
        t_loc = t_glb - col_shft ( labindex );
        b_lab ( t_loc, 1 ) = b_lab ( t_loc, 1 ) - param.dt * b_loc ( n_t, 1 );
        F_lab ( t_loc, 1 ) = F_lab ( t_loc, 1 ) - param.dt * F_loc ( n_t, 1 );
      end
    end  % for n_t
  end  % if not empty
end  % n_el
%
%  Assemble the lab contributions in a codistributed format
M1 = codistributed ( M1_lab, codistributor ( '1d', 2 ) );
M2 = codistributed ( M2_lab, codistributor ( '1d', 2 ) );
b  = codistributed ( b_lab,  codistributor ( '1d', 1 ) );
F  = codistributed ( F_lab,  codistributor ( '1d', 1 ) );
%  Script to assemble matrices for a 2D diffusion problem

%% set path
addpath './subs_source/oned';  addpath './subs_source/twod'

%% set parameter values and assemble arrays
param = p_data ( );
[ M1, M2, F, b, x, e_conn ] = assemb_co ( param );

%% clean-up path
rmpath './subs_source/oned';  rmpath './subs_source/twod'

%% Steady state solutions
z_tmp = - full ( M2 ) \ full ( F + b );  % Temperature distribution
z_ss  = gather ( z_tmp, 1 );

%% Plot and save a surface plot
if ( labindex == 1 )
  xx = x ( 1 : param.nodesx, 1 );
  yy = x ( 1 : param.nodesx : param.nodesx * param.nodesy, 2 );
  figure
  surf ( xx, yy, reshape ( z_ss, param.nodesx, param.nodesy )' );
  xlabel ( '\bf x' ); ylabel ( '\bf y' ); zlabel ( '\bf T' )
  t_axis = axis;
  print -dpng fig_ss.png
  close all
end
Conclusion: Desktop Experiments
http://www.arc.vt.edu/index.php
Until then, you can get a “friendly user” account by sending mail
to John Burkardt [email protected].
If you want to use parallel MATLAB regularly, you may want to set
up a way to submit jobs from your PC to Ithaca, without logging
in directly.
This requires defining a configuration file on your PC, adding some
scripts to your MATLAB directory, and setting up a secure
connection between your PC and Ithaca. The steps for doing this
are described in the document:
http://people.sc.fsu.edu/~burkardt/pdf/...
matlab_remote_submission.pdf
http://www.mathworks.com/products/parallel-computing/