
Article · February 2000 · Source: CiteSeer


Comparison Between Synchronous and Asynchronous Implementation of Parallel Genetic Programming

Shisanu Tongchim and Prabhas Chongstitvatana
Department of Computer Engineering, Chulalongkorn University, Bangkok 10330, Thailand
[email protected], [email protected]

Abstract

An evolutionary method such as Genetic Programming (GP) can be used to solve a large number of complex problems in various application domains. However, one obvious shortcoming of GP is that it usually requires a substantial amount of processing time to arrive at a solution. In this paper, we present parallel implementations that reduce the processing time by using a coarse-grained model for parallelization and asynchronous migration. The problem chosen to examine the parallel GP is a mobile robot navigation problem. The experimental results show that superlinear speedup of GP can be achieved.

1 Introduction

Genetic Programming has been successfully used to perform automatic generation of mobile robot programs [1]. That work proposed the use of perturbation to improve the robustness of the robot programs: each robot program was evaluated under many environments that differed from the original one. As a result, substantial processing time was required to evaluate the fitness of the robot programs.

To reduce the computational time, this study proposes two parallel implementations. Asynchronous and synchronous parallelization approaches are examined. We also compare the quality of the solutions generated by the serial and parallel GP.

Early work on parallel GP was implemented on a network of transputers by Koza and Andre [2]. Their results showed that the parallel speedup was greater than linear. Dracopoulos and Kent [3] proposed the use of the Bulk Synchronous Parallel (BSP) programming model to parallelize genetic programming. Two approaches to parallel GP were examined on a cluster of Sun workstations: the first was based on a master-slave model, while the second was based on a coarse-grained model. The results showed that the achieved speedup was close to linear. A recent paper by Punch [4] presented an empirical study of some problem-specific factors which affect the effectiveness of parallel GP. Punch concluded that the performance achieved by parallel GP using a coarse-grained model may vary according to the nature of the problem.

The remaining sections are organized as follows: the next section describes the mobile robot navigation problem. Section 3 describes the problem representation in the serial GP. Section 4 presents the parallel solutions. Section 5 presents the experimental results and discussion. Finally, Section 6 provides the conclusions of this work.

2 Mobile Robot Navigation Problem

In our previous work [1], GP was used to generate a robot control program for the mobile robot navigation problem. The task was to control a mobile robot from a starting point to a target point in a simulated environment. The environment was filled with obstacles of several geometrical shapes.

The aim of that work was to generate robust control programs. In the evolution process, each individual was evaluated under many environments that differed from the original one. The results showed that the robustness of the robot programs was improved by such an approach.

3 Serial Algorithm

The terminal set is composed of three primitive movement controls {move, left, right} and one piece of sensor information {isnearer}. The function set is composed of three functions {if-and, if-or, if-not}. The GP parameters are shown in Table 1.

The fitness function is a sum of the fitness values over the environments, each of which is based on the distance of the final position from the target and the number of moves.

In the evolution process, the percentage of disturbance is 20% and the number of training environments is 8.
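The primitive sets above can be made concrete with a small sketch. This is not the authors' implementation: the terminal and function names come from the paper, but the tree shape and the condition semantics below are illustrative assumptions (in particular, how a condition selects among move, left, and right is not shown here).

```python
# A minimal sketch (not the authors' code) of one plausible way to
# represent and evaluate condition trees over the paper's primitive sets.
# {move, left, right} and {isnearer} are the paper's terminals; the
# evaluation rules for if-and / if-or / if-not are assumptions.

SENSOR = "isnearer"

def eval_condition(cond, sensor_value):
    """Evaluate a condition tree given the current isnearer reading."""
    if cond == SENSOR:                      # the single sensor terminal
        return sensor_value
    name, *args = cond                      # a function node: [name, subtrees...]
    vals = [eval_condition(a, sensor_value) for a in args]
    if name == "if-and":
        return all(vals)
    if name == "if-or":
        return any(vals)
    if name == "if-not":
        return not vals[0]
    raise ValueError(f"unknown function: {name}")

# e.g. "if not (isnearer and isnearer)" under a True sensor reading:
print(eval_condition(["if-not", ["if-and", SENSOR, SENSOR]], True))  # -> False
```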
Table 1: GP parameters

    Total population        6000
    Crossover probability   0.9
    Mutation probability    0.1
    Reproduction            5% of total population
    Maximum generation      200

Table 2: Experimental parameters

    Num. of processors      1     2     4     6     10
    Pop. size per node      6000  3000  1500  1000  600
    Environments per node   8     7     4     3     2
    Migration interval      NA    100   50    34    20
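The fitness computation described in Section 3 can be sketched as follows. The structure (a sum over training environments of a score based on final distance and number of moves) follows the paper; the particular weights and helper names are hypothetical.

```python
# Illustrative sketch of the fitness described above: a sum over training
# environments of a per-environment score based on the distance of the
# robot's final position from the target and the number of moves used.
# The weights (1.0 and 0.01) are hypothetical, not the paper's values.

def environment_fitness(final_distance, moves, w_dist=1.0, w_moves=0.01):
    """Lower is better: penalize remaining distance and wasted moves."""
    return w_dist * final_distance + w_moves * moves

def total_fitness(results):
    """Sum the per-environment fitness over all training environments."""
    return sum(environment_fitness(d, m) for d, m in results)

# e.g. three environments: (final distance to target, moves taken)
runs = [(0.0, 120), (2.5, 200), (1.0, 150)]
print(total_fitness(runs))  # -> 8.2
```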
4 Parallel Genetic Programming

In a coarse-grained model, the population is divided into subpopulations that are maintained by different processors. The model is also known as the island model, and each subpopulation is called a deme [2].

Some work on the parallelization of GA and GP using a coarse-grained model [2, 5] shows that the results can achieve superlinear speedup, i.e., speedup greater than the number of processors used. This is caused by two factors: the speedup from distributing the populations across different processors, and the speedup obtained from the increased probability of finding the correct solution as the number of populations is increased.

When the parallel approach is applied to the previous work [1] using a conventional coarse-grained model, the result achieves only linear speedup [6], since the amount of work is fixed – the algorithm terminates when it reaches the maximum generation. Hence, the parallel algorithm does not exploit the probabilistic advantage that the answer may be obtained before the maximum generation. We reduce redundant work by dividing the environments among the processing nodes. After a specific number of generations, the subpopulations are migrated between processors using a fully connected topology. However, this scheme reduces robustness, since each individual spends a shorter period in each training environment. To mend this problem, we increase the number of environments in each node, although the number of environments per node remains less than in the general coarse-grained model.

We implement our parallel algorithm on a dedicated cluster of PC workstations with 350 MHz Pentium II processors, each with 32 MB of RAM, running Linux as the operating system. These machines are connected via 10 Mbps Ethernet. We extend the program used in [1] to run on the cluster by using MPI as the message-passing library.

Several trials were carried out to find an appropriate value for the number of environments per node (see Table 2). The migration is carried out as follows: each node broadcasts its subpopulation to all other nodes with the MPI_Bcast function, and this is repeated for every node. The top 5% of individuals from each subpopulation are exchanged during the migration. Pseudo-code for the migration is shown in Figure 1. The details are discussed in the timing analysis section.

    procedure Migration
    begin
        barrier1        -- wait until all nodes are ready
        for i = 1 to n
        begin
            if (my process id = i)
                broadcast send
            else
                broadcast receive
            barrier2    -- wait for the next broadcast
        end
    end

    Figure 1: The migration process

The total population is held constant for the task and is divided equally among the workstations. The numbers of selected individuals, crossover operations, mutation operations, and reproductions are percentages of the total population. The parallel efficiency is measured by varying the number of nodes, and the results are averaged over 20 runs for each number of nodes.

In the first implementation, the migration between subpopulations is synchronized. Each node is blocked by the MPI_Barrier function until all subpopulations have evolved to the same number of generations. However, the synchronized migration results in uneven work loads among the processors, since the time required to complete the evaluation varies, with the least effective programs taking the longest period and the best programs taking the shortest.
[Figure 2: Robustness (%) vs. disturbance (%) of the generated programs, for 1 node and for 2, 4, 6, and 10 nodes under both synchronous and asynchronous migration]

[Figure 3: Speedup vs. number of processors for the synchronous and asynchronous implementations, compared with the ideal (linear) speedup]

In the second implementation, we attempt to further improve the speedup of the parallel algorithm by using asynchronous migration. When the fastest node reaches a predetermined generation number, a migration request is sent to all subpopulations. The migration takes place at the end of the current generation. At this point, if any populations are still in the fitness evaluation phase, the other nodes must wait; the waiting time will be at most less than one generation.
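The load imbalance that motivates the asynchronous scheme can be shown with a toy numeric illustration (the per-generation timings below are invented, not measured data).

```python
# Toy illustration of the synchronous scheme's barrier idle time: at each
# migration every node waits for the slowest node to finish. The
# asynchronous scheme described above instead bounds each node's wait to
# less than one generation. Timings are hypothetical.

def sync_idle(times):
    """Per-node idle time when all nodes must wait for the slowest one."""
    slowest = max(times)
    return [slowest - t for t in times]

# hypothetical seconds per generation on four nodes
times = [10.0, 12.5, 11.0, 14.0]
print(sync_idle(times))   # -> [4.0, 1.5, 3.0, 0.0]; the slowest never waits
```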
5 Results and Discussion
5.1 Speedup
To make an adequate comparison between the serial algorithm and the parallel algorithm, Cantú-Paz [7] suggests that the two must give the same quality of solution. In this paper, we take the robustness of the program generated by the serial algorithm as a baseline. If the program generated by the parallel algorithm gives the same robustness as the program from the serial algorithm, then an equal amount of work has been done to achieve the same quality of answer. From the robustness graph in Figure 2, the program generated by the parallel GP is better than that of the serial GP. Hence, the amount of work done by the parallel algorithm is not less than that of the serial algorithm.

The parallel speedup is defined as the ratio of the serial execution time to the parallel execution time:

    Speedup = Serial time / Parallel time    (1)

Figure 3 illustrates the speedup observed for the two implementations as a function of the number of processors used. The performance is less than we expected, although both implementations exhibit superlinear speedup. The speedup curves taper off at 10 processors, and the performance of the asynchronous implementation is slightly better than that of the synchronous implementation. In order to discern the cause of this result, a timing analysis is performed in the next section.

5.2 Timing Analysis

[Figure 4: Percentage of time spent in computation and communication]

Figure 4 shows the relative time spent in each section of the implementations. The communication overhead – the sum of the barrier time and the broadcast time – goes up considerably as the number of processors increases. The asynchronous implementation does not help much in reducing the communication overhead at large numbers of processors. Thus, we carried out a more detailed analysis of the communication overhead.

Figure 5 shows the absolute time spent in the major communication functions. The time spent in barriers indicates the time spent waiting for all processes to reach the same point. From the pseudo-code of the migration in Figure 1, the barrier time consists of the time spent waiting for all nodes to be ready for the migration and for the next broadcast.

In the synchronous implementation, the time spent in barriers decreases as the number of processors increases. This is because the barrier time depends on the variation of the computation time of each node.
As the number of nodes is increased, the computation time per node decreases; hence, the barrier time is reduced.

In contrast, the barrier time in the asynchronous implementation increases as the number of processors increases. This is because the time spent in the second barrier (waiting for the next broadcast) increases with the number of nodes. However, the asynchronous implementation eliminates the first barrier, and therefore reduces the total time spent in barriers compared to the synchronous implementation.

The absolute time spent in broadcasts increases considerably – faster than linearly. From inspection of the trace information using a visualization tool, we found that the transmission in the broadcast functions of the MPI implementation we use may be executed more than once, especially for a large number of processors.

The timing analyses reveal the cause of the problem: the performance degradation at 10 processors is caused by the excessive communication time due to the broadcast function. Although the asynchronous migration reduces the barrier time effectively compared to the synchronous migration, the increase in communication time at 10 processors obliterates this advantage. For small numbers of processors (2, 4, 6), the gain from the asynchronous migration is considerable, as the evolution proceeds at the speed of the fastest node.

As the size of the work increases (i.e., as the number of training environments increases), the serial and parallel computation times increase while the time spent in communication stays constant. If the ratio of computation to communication can be kept large (a large work load), then one can expect the parallel performance to improve.

[Figure 5: Absolute time spent in communication – MPI_Bcast and MPI_Barrier times in seconds for 2, 4, 6, and 10 processors, synchronous and asynchronous]

6 Conclusions

The results presented in this paper show a successful speedup of the Genetic Programming process by means of parallel processing. The parallel implementations of Genetic Programming successfully exploit the computing resources of a dedicated cluster of PC workstations. Superlinear speedup of GP can be acquired by adapting a coarse-grained model for parallelization so that less computational work needs to be done. Furthermore, the timing analyses indicate the scalability of the parallel approaches: as the size of the problem increases, the speedup will improve.

References

[1] Chongstitvatana P (1998), Improving robustness of robot programs generated by genetic programming for dynamic environments. Proc. of the IEEE Asia Pacific Conference on Circuits and Systems, pp.523-526

[2] Koza JR, Andre D (1995), Parallel genetic programming on a network of transputers. Proc. of the Workshop on Genetic Programming: From Theory to Real-World Applications, University of Rochester, National Resource Laboratory for the Study of Brain and Behavior, Technical Report 95-2, pp.111-120

[3] Dracopoulos DC, Kent S (1996), Bulk synchronous parallelisation of genetic programming. Proc. of the Third International Workshop on Applied Parallel Computing in Industrial Problems and Optimization (PARA '96), Springer-Verlag, Berlin

[4] Punch B (1998), How effective are multiple populations in genetic programming. Proc. of the Third Annual Conference on Genetic Programming, pp.308-313

[5] Lin S-C, Punch WF, Goodman ED (1994), Coarse-grain parallel genetic algorithms: Categorization and new approach. Proc. of the Sixth IEEE SPDP, pp.28-37

[6] Tongchim S, Chongstitvatana P (1999), Speedup improvement on automatic robot programming by parallel genetic programming. Proc. of the 1999 IEEE International Symposium on Intelligent Signal Processing and Communication Systems (ISPACS '99), Phuket, Thailand

[7] Cantú-Paz E (1999), Designing efficient and accurate parallel genetic algorithms. PhD thesis, University of Illinois at Urbana-Champaign
