TELKOMNIKA, Vol. x, No. x, April 2017, pp. 1 ∼ 10
ISSN: 1693-6930
A Load-Balanced Parallelization of AKS Algorithm
Ardhi Wiratama Baskara Yudha and Reza Pulungan
Department of Computer Science and Electronics, Faculty of Mathematics and Natural Sciences,
Universitas Gadjah Mada, Yogyakarta, Indonesia
e-mails: ardhi.wiratama.b@mail.ugm.ac.id, pulungan@ugm.ac.id
Abstract
The best known deterministic polynomial-time algorithm for primality testing is currently due to
Agrawal, Kayal, and Saxena. This algorithm has a time complexity of $O(\log^{15/2}(n))$. Although the
algorithm is polynomial, its reliance on the congruence of large polynomials results in enormous
computational requirements. In this paper, we propose a parallelization technique for this algorithm
based on message-passing parallelism together with four workload-distribution strategies. We perform
a series of experiments on an implementation of this algorithm in a high-performance computing system
consisting of 15 nodes, each with 4 CPU cores. The experiments indicate that our proposed
parallelization technique introduces a significant speedup over existing implementations. Furthermore,
the dynamic workload-distribution strategy performs better than the others. Overall, the experiments
show that the parallelization obtains a speedup of up to 36 times.
Keywords: Primality testing, AKS algorithm, parallelization, load balancing, high-performance computing.
Copyright © 2017 Universitas Ahmad Dahlan. All rights reserved.
1. Introduction
Prime numbers are the cornerstone of number theory. Mathematicians and number theorists,
since ancient times, have been fascinated by many problems concerning prime numbers. In
modern times, many of the most important cryptographic algorithms rely on big prime numbers to
perform encryption and decryption. One of them is the Rivest-Shamir-Adleman (RSA) algorithm [1],
which is now widely used in web security [2], including in banking transaction security. The RSA
algorithm depends on the fact that it is difficult to find the prime factors of a big integer. The Electronic
Frontier Foundation (EFF) offers $250,000 as a reward to the first individual or group who discov-
ers a prime number with at least 1,000,000,000 decimal digits [3]. Searching for a prime number
is usually based on an efficient algorithm that determines whether a given number is prime or
composite. Such algorithms are called primality testing algorithms.
Most primality testing algorithms are probabilistic, namely they cannot ascertain the
primality of a given number, but only provide a probability that the given number is prime. The
Miller-Rabin primality test [4, 5], for instance, has an error rate below 25%, which means that if the
given number passes this test $k$ times, then the probability that the number is prime is at least
$1 - 0.25^k$ [6]. The Solovay-Strassen primality test [7], on the other hand, has an error rate below 50%.
Probabilistic primality testing algorithms are relatively fast and of low complexity, with tunable accuracy.
However, there are cases that require certainty that a given number is prime or not; and thus,
probabilistic algorithms cannot be used.
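To make the probabilistic setting concrete, the following is a minimal Python sketch of the Miller-Rabin test. It is an illustration only and is not part of the implementation studied in this paper; the function name `is_probable_prime`, the `rounds` parameter, and the trial division by a few small primes are assumptions made for the example.

```python
import random


def is_probable_prime(n: int, rounds: int = 20) -> bool:
    """Miller-Rabin test: False means composite, True means probably prime."""
    if n < 2:
        return False
    # Trial division by a few small primes (an assumption of this sketch).
    for p in (2, 3, 5, 7, 11, 13):
        if n % p == 0:
            return n == p
    # Write n - 1 as d * 2**s with d odd.
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for _ in range(rounds):
        a = random.randrange(2, n - 1)  # random base in [2, n - 2]
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue  # this base does not witness compositeness
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False  # a is a witness: n is certainly composite
    return True


print(is_probable_prime(2**61 - 1))
print(is_probable_prime(323))
```

A prime always passes every round, whereas a composite survives a single round with probability below 25%, which is the source of the tunable accuracy mentioned above.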
In 2002, three Indian computer scientists Agrawal, Kayal, and Saxena [8] proposed a
deterministic—i.e., non-probabilistic—primality testing algorithm that runs in polynomial time; we
will refer to this algorithm as AKS algorithm. This is the first deterministic polynomial-time al-
gorithm for primality testing. Since this seminal paper, the primality testing problem no longer
resides in the complexity classes of NP-Hard, NP, or ZPP [9]. AKS algorithm, interestingly, is
relatively simple and straightforward, while previous work by other researchers attempted to show
that primality testing is of polynomial time complexity by making complex modifications on existing
primality testing algorithms [10].
Since this theoretical breakthrough, many researchers have proposed theoretical and
practical improvements to the algorithm. Notable among them are Lenstra [11] and Bernstein [12].
Bernstein [12] proposed two practical possibilities for accelerating AKS algorithm: a low-level
speedup obtained by improving the integer squaring method, and a high-level speedup obtained by
reducing the number of for-loop iterations in the last step of the algorithm. This included all
state-of-the-art improvements on reducing the iterations of the last for loop and produced a
speedup of many orders of magnitude. These have been incorporated in the latest version of
AKS algorithm.
Received May 9, 201x; Revised August 3, 201x; Accepted August 16, 201x
Lenstra and Pomerance [13, 14], on the other hand, proposed theoretical improvements
to AKS algorithm and obtained a new technique with time complexity $O(\log^6(n))$. They modified
the original AKS algorithm by decreasing the number of iterations in the for loop. This is
done by replacing the use of the cyclotomic polynomials in AKS algorithm by a monic polynomial
f(x) of degree r with integer coefficients such that the ring Z[x]/(f(x), n) is a pseudofield.
Bernstein in [15] proposed a further theoretical improvement to AKS algorithm with time complexity
$O(\log^4(n))$. The proposal also attempted to decrease the number of iterations in the for loop by
replacing the use of the cyclotomic polynomials by random Kummer extensions of Z[x]/n.
Crandall and Papadopoulos [16] implemented a variant of AKS algorithm by Lenstra [11]
and found that empirically the time complexity of the variant is $c \log^6(n)$, where c is around 1,000
clock cycles. Li [17] also implemented the Lenstra variant of AKS algorithm using C++ and the NTL
library to handle the polynomial data structure. In this implementation, a 15-decimal-digit prime
number required around 3,000 seconds to compute on a single-processor computer. Menon in [18]
implemented AKS algorithm in SAGE (Software for Algebra and Geometry Experimentation) and
produced an implementation in which a 25-decimal-digit prime number required more than 4,000
seconds to compute on a single-processor computer. Cao [19] analyzed the storage space
requirement for AKS algorithm and showed that the required storage space for testing a number
with length 1,024 bits is about 1,000,000,000 gigabytes, which is practically infeasible. This is due
to the need to store extremely large polynomials during the computation.
This paper reports on our effort to develop a parallelization technique for AKS algorithm
based on message-passing parallelism (using MPI) and to find out the best workload-distribution
strategy for the parallelization technique.
Organization of the paper: The paper is organized as follows. Section 2 presents the basis of
AKS algorithm. Section 3 describes the proposed parallelization technique, together with four
accompanying workload-distribution strategies. In Section 4, we present the results of our
experiments with the proposed parallelization technique and the four workload-distribution strategies
and provide an analysis. Section 5 concludes the paper.
2. Preliminaries
Let Z be the set of integers and let a and b be two positive integers. Let gcd(a, b) be the
greatest common divisor of a and b. The two integers a and b are relatively prime if and only
if gcd(a, b) = 1. Let φ(a) be Euler’s totient function, namely the number of positive integers
smaller than a that are relatively prime to a. For relatively prime a and r, let $o_r(a)$ be the order of a
modulo r, namely the smallest positive integer k such that $a^k \equiv 1 \pmod{r}$. Let a rem b be the
remainder of the integer division of a by b.
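The definitions above translate directly into code. The following Python sketch is illustrative only; the names `euler_phi` and `order_mod` are hypothetical, and the brute-force loops are chosen for clarity rather than speed.

```python
from math import gcd


def euler_phi(a: int) -> int:
    """Count the positive integers smaller than a that are relatively prime to a."""
    return sum(1 for k in range(1, a) if gcd(k, a) == 1)


def order_mod(a: int, r: int) -> int:
    """Return the smallest k >= 1 such that a**k is congruent to 1 modulo r."""
    if r < 2 or gcd(a, r) != 1:
        raise ValueError("the order is defined here only for r >= 2 and gcd(a, r) = 1")
    k, power = 1, a % r
    while power != 1:
        power = power * a % r
        k += 1
    return k


print(gcd(12, 18))      # 6
print(euler_phi(10))    # 4, namely 1, 3, 7, 9
print(order_mod(2, 7))  # 3, since 2**3 = 8 = 7 + 1
print(17 % 5)           # 2, i.e. 17 rem 5
```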
In the earliest version of their publication, Agrawal, Kayal, and Saxena obtained an algorithm
with the worst-case time complexity of $O(\log^{12}(n))$, where n is the given number. In this
paper, we refer to the latest version (version 6) of their publication [8], in which the latest
AKS algorithm was presented. The latest version has incorporated many improvements proposed
by many researchers and the resulting algorithm runs in polynomial time with the worst-case
complexity of $O(\log^{15/2}(n))$. Prior to the publication of this algorithm, there were other primality
proving algorithms that seemed to run in polynomial time, but AKS algorithm is the first one that is
deterministic as well as of polynomial time [16]. The main idea of AKS algorithm is described in
Lemma 1, which is a generalization of Fermat’s little theorem.
Lemma 1 ([8]) Let a ∈ Z be relatively prime to n ∈ Z and n ≥ 2. Then n is prime if and only if:
$$(x + a)^n \equiv x^n + a \pmod{n}. \tag{1}$$
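For small n, Lemma 1 can be checked directly by expanding $(x + a)^n$ with the binomial theorem: the coefficient of $x^k$ in $(x + a)^n - (x^n + a)$ is $\binom{n}{k} a^{n-k}$ for 0 < k < n and $a^n - a$ for k = 0, and all of them must vanish modulo n. The Python sketch below is illustrative only (the function name `satisfies_lemma1` is hypothetical), and it takes time exponential in the number of digits of n, which is why it is not a practical test.

```python
from math import comb, gcd


def satisfies_lemma1(n: int, a: int = 1) -> bool:
    """Check (x + a)**n == x**n + a (mod n) coefficient by coefficient."""
    if n < 2 or gcd(a, n) != 1:
        raise ValueError("need n >= 2 and a relatively prime to n")
    # Constant term: a**n - a must be divisible by n.
    if (pow(a, n, n) - a) % n != 0:
        return False
    # Middle terms: C(n, k) * a**(n - k) must be divisible by n for 0 < k < n.
    return all(comb(n, k) * pow(a, n - k, n) % n == 0 for k in range(1, n))


print([n for n in range(2, 40) if satisfies_lemma1(n)])
```

With a = 1, which is relatively prime to every n, the list printed consists exactly of the primes below 40.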
To reduce the number of operations performed, both sides of Equation (1) can be simplified
by taking their respective remainders modulo a polynomial $x^r - 1$, for some small positive
r ∈ Z, namely:
$$(x + a)^n \equiv x^n + a \pmod{x^r - 1,\ n}. \tag{2}$$
However, the bi-implication in Lemma 1 then no longer applies, since a non-prime n
may also satisfy Equation (2) for some a and r. Theorem 1, as reformulated by Granville in [10],
forms the cornerstone of AKS algorithm. The theorem basically asserts that for appropriately
selected r’s, if Equation (2) is satisfied by all a in a suitable range, then n must be a prime.
Therefore, r must be selected accordingly.
Theorem 1 ([8, 10]) Given n ∈ Z and n ≥ 2, let r < n be a positive integer satisfying
$o_r(n) > \log^2(n)$. Then n is prime if and only if:
(1) n is not a perfect power,
(2) n does not have any prime factor ≤ r, and
(3) $(x + a)^n \equiv x^n + a \pmod{x^r - 1,\ n}$ for every integer a with $1 \le a \le \lfloor \sqrt{\varphi(r)} \log(n) \rfloor$.
A straightforward implementation of Theorem 1 is given in Algorithm 1, where condition
(1) corresponds to the first if; and conditions (2) and (3) correspond to the first and the last for,
respectively.
Algorithm 1: AKS algorithm
Input: n ∈ Z, n ≥ 2
Output: A string “Prime” or “Composite”
1 begin
2   if $n = a^b$, where a, b ∈ Z and a, b > 1 then
3     return “Composite”
4   end
5   Find the smallest r that satisfies $o_r(n) > \log^2(n)$
6   for a ← 2 to r do
7     if gcd(a, n) > 1 then
8       return “Composite”
9     end
10  end
11  for a ← 1 to $\lfloor \sqrt{\varphi(r)} \log(n) \rfloor$ do
12    if $(x + a)^n \not\equiv x^n + a \pmod{x^r - 1,\ n}$ then
13      return “Composite”
14    end
15  end
16  return “Prime”
17 end
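As a reading aid, the steps of Algorithm 1 can be sketched sequentially in Python. This is a minimal illustration, not the GMP/NTL implementation evaluated in this paper. All function names are hypothetical. Logarithms are assumed to be base 2 and the loop bound is assumed to be $\lfloor \sqrt{\varphi(r)} \log_2(n) \rfloor$, following [8]; the early return for n ≤ r is likewise taken from [8] so that very small inputs are handled. Polynomials are plain coefficient lists multiplied in quadratic time, so the sketch is only usable for small n.

```python
from math import floor, gcd, log2, sqrt


def is_perfect_power(n):
    """True if n = a**b for some integers a, b > 1 (binary search on a)."""
    for b in range(2, n.bit_length() + 1):
        lo, hi = 2, 1 << (n.bit_length() // b + 1)
        while lo <= hi:
            mid = (lo + hi) // 2
            if mid ** b == n:
                return True
            if mid ** b < n:
                lo = mid + 1
            else:
                hi = mid - 1
    return False


def find_r(n):
    """Smallest r with gcd(n, r) = 1 whose order o_r(n) exceeds log2(n)**2."""
    bound, r = log2(n) ** 2, 2
    while True:
        if gcd(n, r) == 1:
            k, power = 1, n % r
            while power != 1:  # compute the order of n modulo r
                power, k = power * n % r, k + 1
            if k > bound:
                return r
        r += 1


def poly_mul(p, q, r, n):
    """Product of two coefficient lists modulo (x**r - 1, n)."""
    out = [0] * r
    for i, pi in enumerate(p):
        if pi:
            for j, qj in enumerate(q):
                out[(i + j) % r] = (out[(i + j) % r] + pi * qj) % n
    return out


def poly_pow(base, e, r, n):
    """base**e modulo (x**r - 1, n) by square-and-multiply."""
    result = [1] + [0] * (r - 1)
    while e:
        if e & 1:
            result = poly_mul(result, base, r, n)
        base = poly_mul(base, base, r, n)
        e >>= 1
    return result


def aks(n):
    if is_perfect_power(n):                      # lines 2-4
        return "Composite"
    r = find_r(n)                                # line 5
    for a in range(2, min(r, n - 1) + 1):        # lines 6-10
        if gcd(a, n) > 1:
            return "Composite"
    if n <= r:                                   # extra step from [8]
        return "Prime"
    phi = sum(1 for k in range(1, r) if gcd(k, r) == 1)
    for a in range(1, floor(sqrt(phi) * log2(n)) + 1):   # lines 11-15
        lhs = poly_pow([a % n, 1] + [0] * (r - 2), n, r, n)
        rhs = [0] * r
        rhs[n % r] = 1                           # x**n reduces to x**(n mod r)
        rhs[0] = (rhs[0] + a) % n
        if lhs != rhs:
            return "Composite"
    return "Prime"


print(find_r(31), aks(31))
```

For n = 31 the sketch selects r = 29 and then checks 26 values of a in the last loop; it is this last loop that the parallelization in Section 3 distributes.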
3. Proposed Method
3.1. Parallel AKS Algorithm
Scrutinizing Algorithm 1, we can see that the algorithm basically comprises 4 steps:
determining whether n is a perfect power (lines 2–4); determining r (line 5); determining whether n
has prime factors ≤ r (lines 6–10); and determining the congruence of polynomials $(x + a)^n$ and
$x^n + a$ modulo $(x^r - 1, n)$ (lines 11–15) for some values of a. Of these four steps, the last takes
most of the computation time of the algorithm, since we are dealing with an enormous n.
Furthermore, when raising the polynomial (x + a) to the power n, albeit modulo $(x^r - 1, n)$, intermediate
results might be enormous polynomials requiring large storage and heavy computation. Our
parallelization effort will be focused on computing this last step. Parallelizing the other steps will incur
communication overhead that, with the current state of networking technology, renders the saving
achieved by the parallelization worthless even for n with hundreds of decimal digits.
As has been noticed by Crandall and Papadopoulos in [16], AKS algorithm is an embarrassingly
parallel algorithm. It can easily be parallelized using a master-slave technique, by distributing
the work of determining the congruence of polynomials $(x + a)^n$ and $x^n + a$ modulo $(x^r - 1, n)$ for
different values of a to different computer nodes in a message-passing parallel system. Figure 1
illustrates this master-slave technique.
[Figure 1: a master node (node 0) communicating with slave nodes 1, 2, ..., u−1, u; each node owns cores 1, 2, ..., v. Legend: node i, core j, communication, owner.]
Figure 1. The design of the parallelization technique
In the beginning, the master node performs the computation of the first three steps of AKS
algorithm sequentially. Once the master node obtains the value of r, it broadcasts the values of n
and r together with other necessary information about the distribution of work (namely the distribution
of the values of a) to all slave nodes. Each slave node then proceeds with the computation of
determining the congruence of polynomials $(x + a)^n$ and $x^n + a$ modulo $(x^r - 1, n)$ for several
values of a.
A slave node communicates only with the master node and only in two cases: (1) when
for some value of a the polynomials are not congruent, and (2) when for all values of a assigned
by the master node, both polynomials are always congruent, and thus signalling that the work as-
signed to the slave node has been completed. Upon receiving a communication of type (1) from a
slave node, the master node immediately dismisses the last for loop and thereby announces that
n is composite; and proceeds to command the rest of the slave nodes to abort their computation.
Receiving communication type (2) from all slave nodes indicates that all slave nodes have com-
pleted their work and all of them find that the two polynomials are congruent for all values of a;
the master node then proceeds to announce that n is prime.
A modern computer system usually has multi-core CPUs. A parallelization technique
where each of these CPU cores in a slave computer node is treated as a slave node as well is
referred to as single-level parallelization. In this technique, each core is assigned by the master
node several values of a to compute separately from other cores in the same computer node.
Communications from all cores in a slave node to the master node must pass through the same
channel of communication and this may result in contention. However, compared to the computa-
tion time spent for each value of a, the overhead produced by this contention is negligible.
3.2. Workload-Distribution Strategies
The single-level parallelization technique requires the distribution of workload from the
master node to all slave nodes. This basically entails distributing the values of a for slave nodes to
work on. Recall from Algorithm 1 that the congruence of polynomials $(x + a)^n$ and $x^n + a$ modulo
$(x^r - 1, n)$ must be determined for $1 \le a \le \lfloor \sqrt{\varphi(r)} \log(n) \rfloor$. Let
$q = \lfloor \sqrt{\varphi(r)} \log(n) \rfloor$ and u be the
number of slave nodes. Further, let % stand for the integer division operator. In the following, we
present four workload-distribution strategies that will be experimented on in this study.
Strategy 1. Figure 2 illustrates the first workload-distribution strategy. A rectangle in the figure
represents a single value of a, while the circle right below the rectangle represents the slave node
responsible for computing that value. This strategy is the simplest of the strategies, where
slave node i is responsible for determining the congruence of polynomials $(x + a)^n$ and $x^n + a$ modulo
$(x^r - 1, n)$, for (i − 1)(q%u) + 1 ≤ a ≤ i(q%u). Hence slave node #1 works on the first q%u values of
a, slave node #2 works on the second q%u values of a, and so on, while slave node #u works on
the remaining values of a. This last slave node may work on fewer than q%u values of a, if q
is not divisible by u.
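The block assignment of Strategy 1 can be sketched in a few lines of Python. The sketch is illustrative only: the function name `block_assignment` is hypothetical, and the block size is assumed to be q/u rounded up, which is the reading under which the last node receives fewer values when q is not divisible by u.

```python
def block_assignment(q: int, u: int) -> dict:
    """Map each slave node 1..u to a consecutive block of values of a."""
    size = -(-q // u)  # q / u rounded up (an assumption of this sketch)
    return {
        i: list(range((i - 1) * size + 1, min(i * size, q) + 1))
        for i in range(1, u + 1)
    }


print(block_assignment(10, 3))
# {1: [1, 2, 3, 4], 2: [5, 6, 7, 8], 3: [9, 10]}
```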
[Figure 2: the values of a, from 1 to q, laid out in consecutive blocks of q%u values; the first block is assigned to node 1, the second to node 2, and so on, with the last block, ending at q, assigned to node u.]
Figure 2. Workload-distribution strategy 1
Strategy 2. One of the main concerns with the first strategy is that one slave node is assigned
only with values of a that are consistently smaller or larger than those assigned to other slave
nodes. All values of a assigned to slave node #1, for instance, are smaller than those assigned
to slave node #2. A larger value of a may result in a longer computation time, since the resulting
intermediate polynomials will have larger coefficients, which in turn take longer to multiply and
require more storage. The second and third strategies try to address this.
[Figure 3: the values of a, from 1 to q, assigned cyclically to nodes 1, 2, ..., u, then again 1, 2, ..., u, and so on; the final partial cycle covers the last q rem u values.]
Figure 3. Workload-distribution strategy 2
Figure 3 illustrates the second workload-distribution strategy. The first slave node will get
a = 1, the second slave node will get a = 2, and so on, until the last slave node will get a = u. This
is then repeated until all values of a are exhausted. Therefore slave node i will be assigned the
values of a of i, i + u, i + 2u, . . . , i + ju, where j is the largest integer that still satisfies i + ju ≤ q.
This strategy manages to avoid assigning one slave node values of a that are consistently smaller
or larger than those assigned to other slave nodes. However, each value of a assigned to a slave
node is always relatively smaller or larger than that assigned to other slave nodes. For every
value i assigned to slave node #1, for instance, the value i + 1 is assigned to slave node #2.
Hence, if a larger value of a always results in longer computation time, slave node #1 will complete
its workload earlier than slave node #2. This problem will be addressed by the third strategy.
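The cyclic assignment of Strategy 2 amounts to taking every u-th value of a starting from the node index. The following Python sketch is illustrative; the function name `round_robin_assignment` is hypothetical.

```python
def round_robin_assignment(q: int, u: int) -> dict:
    """Slave node i receives a = i, i + u, i + 2u, ... up to q."""
    return {i: list(range(i, q + 1, u)) for i in range(1, u + 1)}


print(round_robin_assignment(10, 3))
# {1: [1, 4, 7, 10], 2: [2, 5, 8], 3: [3, 6, 9]}
```

In this small example every value held by node #1 is one less than the corresponding value held by node #2, which is the residual imbalance described above.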
Strategy 3. The third strategy addresses the problem encountered in the second strategy by
ensuring that if a slave node is assigned a small value of a, it will be compensated by another
assignment with a large value of a. Figure 4 illustrates the third workload-distribution strategy.
[Figure 4: the values of a, from 1 to q; values taken from the low end (1, 2, ..., u, u+1, ...) and from the high end (q, q−1, ...) are assigned to the nodes in matching pairs, so that node 1 receives both 1 and q.]
Figure 4. Workload-distribution strategy 3
Hence, since slave node #1 is assigned the value a = 1 (the smallest), it will also be
assigned the value a = q (the largest). Similarly, since slave node #2 is assigned the value a = 2
(the second smallest), it will also be assigned the value a = q − 1 (the second largest). This
is carried out until all values of a are assigned to all slave nodes in similar fashion:
if the value a = i is assigned to slave node j, then the value a = q − i + 1 is also assigned to slave
node j, for i ≤ q%2.
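The pairing of small and large values in Strategy 3 can be sketched as follows. This is one possible reading, not necessarily the exact layout used in the experiments: the lower half of the values is assumed to be dealt out cyclically as in Strategy 2, and each value a = i is paired with its mirror a = q − i + 1, so that 1 is paired with q and 2 with q − 1. The function name `mirrored_assignment` is hypothetical.

```python
def mirrored_assignment(q: int, u: int) -> dict:
    """Give each node a small value of a together with its large mirror value."""
    assignment = {i: [] for i in range(1, u + 1)}
    for i in range(1, (q + 1) // 2 + 1):
        node = (i - 1) % u + 1      # cyclic choice of node (assumed)
        assignment[node].append(i)
        mirror = q - i + 1          # the value paired with i
        if mirror != i:             # the middle value of an odd q has no partner
            assignment[node].append(mirror)
    return assignment


print(mirrored_assignment(10, 3))
# {1: [1, 10, 4, 7], 2: [2, 9, 5, 6], 3: [3, 8]}
```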
Strategy 4. All previous strategies are static, in the sense that workload distributions are
predefined even before execution; a specific node always gets the same set of a’s when the input
n is the same. In this fourth strategy we propose a dynamic strategy where the set of a’s assigned
to a slave node cannot be predicted before it is run. The idea is that, first, each slave node is assigned
a value of a according to its id: slave node #1 gets a = 1, slave node #2 gets a = 2, and so on
until slave node #u gets a = u. After completing its work, a slave node either requests another
remaining value of a from the master node or, if the polynomial congruence check fails, informs
the master node of the failure. In the first case, the master node sends a remaining value of a to the
requesting node; in the second case, it terminates all nodes and outputs a composite result. When no
value of a remains, the master node terminates all slave nodes and outputs a prime result.
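The dynamic strategy can be mimicked on a single machine with threads and a shared queue. The sketch below is a simulation under stated assumptions, not the MPI implementation used in this paper: a `queue.Queue` stands in for the master node handing out remaining values of a, a `threading.Event` stands in for the abort command, and `check` is a hypothetical callback that returns True when the polynomial congruence holds for a given a.

```python
import queue
import threading


def dynamic_dispatch(q, u, check):
    """Simulate strategy 4; return the verdict and the values each node handled."""
    remaining = queue.Queue()
    for a in range(u + 1, q + 1):       # values not covered by the first round
        remaining.put(a)
    composite = threading.Event()       # set when some congruence check fails
    handled = {i: [] for i in range(1, u + 1)}

    def slave(i):
        a = i if i <= q else None       # first assignment follows the node id
        while a is not None and not composite.is_set():
            handled[i].append(a)
            if not check(a):
                composite.set()         # report failure; other nodes stop
                return
            try:
                a = remaining.get_nowait()   # request another remaining value
            except queue.Empty:
                a = None                # nothing left: this node is done

    threads = [threading.Thread(target=slave, args=(i,)) for i in range(1, u + 1)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return ("Composite" if composite.is_set() else "Prime"), handled


verdict, handled = dynamic_dispatch(20, 4, lambda a: True)
print(verdict)
```

Which node handles which value beyond the first round depends on scheduling, which mirrors the unpredictability of the assignment noted above; only the verdict is deterministic.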
4. Experiments and Result
4.1. Implementation
Since we are primarily concerned with big numbers, we use the GNU Multiple Precision
(GMP) arithmetic library version 6.10 to handle integers of arbitrary length. In the first step of
the algorithm, the GMP function mpz_perfect_power_p() is used to check for perfect powers. To
compute the value of r in the second step, we use the function PowerMod() of the NTL library version 9.10.0,
which basically performs integer modular exponentiations. For checking the existence of factors
of the input number that are no more than r in the third step, the NTL function GCD() is used.
Communication between master and slave nodes is performed using the MPICH library
version 3.2. To broadcast the input number n and the value of r, the master and slave nodes use
the function MPI_Bcast(). Since MPI does not support the data type mpz_t defined by GMP or the
data type ZZ defined by NTL, n and r are first converted to arrays of bytes before they are
broadcast. Once they arrive, they are converted back to type mpz_t using the function mpz_init_set_str().
After all the values required to compute the last step are obtained by a slave node, it then
computes the left and right sides of the congruence using the function PowerMod().
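The conversion of big integers to byte arrays and back can be illustrated in Python, whose integers are already of arbitrary length. The sketch below shows only the conversion step and does not call MPI; the helper names and the NUL-separated message layout are assumptions made for the example, with decimal digit strings chosen because the receiving side in the text parses a string.

```python
def pack_int(value: int) -> bytes:
    """Encode a non-negative integer as ASCII decimal digits."""
    if value < 0:
        raise ValueError("only non-negative integers are expected here")
    return str(value).encode("ascii")


def unpack_int(buffer: bytes) -> int:
    """Parse ASCII decimal digits back into an integer."""
    return int(buffer.decode("ascii"))


def pack_message(n: int, r: int) -> bytes:
    """Bundle n and r into one byte array, separated by a NUL byte (assumed layout)."""
    return pack_int(n) + b"\x00" + pack_int(r)


def unpack_message(buffer: bytes) -> tuple:
    """Recover (n, r) from a byte array produced by pack_message."""
    n_part, r_part = buffer.split(b"\x00")
    return unpack_int(n_part), unpack_int(r_part)


n = 68476562763327854359085599065855383   # the 35-digit prime of Table 1
message = pack_message(n, 13457)          # 13457 is an arbitrary placeholder for r
print(len(message), unpack_message(message) == (n, 13457))
```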
4.2. Experimental Setup
All experiments in this study are conducted using High-Performance Computing (HPC)
system provided by Directorate of Information System and Resources (DSSDI) of Universitas
Gadjah Mada. The HPC system has 15 slave nodes, each with 2 dual-core AMD Opteron™
Processor 280 CPUs (hence, 4 CPU cores), 4 GB DDR3 RAM, the OpenSUSE 11.2 64-bit operating system,
and GCC compiler version 6.1.
We experiment on prime numbers ranging from 5 digits to 35 digits in length as shown in
Table 1. The seven prime numbers selected are the largest prime numbers for the corresponding
numbers of digits according to [20].
Table 1. Prime numbers used in experiments
Digits Prime Number
5 99,929
10 9,999,999,929
15 999,998,727,899,999
20 99,999,999,999,999,999,989
25 9,989,999,899,883,889,989,999,899
30 909,090,909,090,909,090,909,090,909,091
35 68,476,562,763,327,854,359,085,599,065,855,383
4.3. Result
Comparing the workload-distribution strategies. Table 2 shows the running times of the
sequential as well as the parallel implementations of AKS algorithm for the seven prime numbers.
The parallel implementations are run on a 60-processor message-passing system, while the se-
quential one is run on one of the processors. It is evident that the dynamic workload distribution
(strategy 4) performs consistently and significantly better than other strategies in all experiments,
which means that this strategy is the most load balanced among the proposed strategies. This
also indicates that the overheads associated with communication times between the master node
and slave nodes are insignificant compared to the computation times for different values of a.
Table 2. Running times (in seconds) of the different workload-distribution strategies
Digits   Sequential   Parallel
                      Strategy 1   Strategy 2   Strategy 3   Strategy 4
5           1.57168      0.41278      0.19912      0.20151      0.11407
10          105.7          5.3          2.1          2.2          1.7
15          712.2         35.3         22.6         24.0         19.6
20        3,236.0        128.2        130.4        128.8        112.3
25        8,848.8        446.6        371.5        374.1        325.4
30       32,343.2      1,421.6      1,317.3      1,972.4      1,179.6
35       70,901.7      4,121.8      4,177.6      4,152.3      2,457.4
From Table 2, it is clear that patterns in the running times of the first three
workload-distribution strategies are not easy to discern. This result is contrary to the authors’ original
expectation, as described in Section 3.2. The result indicates that the computation times required
for the values of a are not proportional to those values: a larger value of a may require less
computation time than a smaller one.
Speedups for various numbers of processors. The previous result shows that workload-distribution
strategy 4 produces the best parallel implementation of AKS algorithm. In this part,
we focus on this strategy and find out the speedups that are achievable for various numbers of
processors. The result is presented in Figure 5.
[Figure 5: speedup (vertical axis, 0 to 40) against the number of processors (horizontal axis, 0 to 60), with one curve for each of 5, 10, 15, 20, 25, 30, and 35 digits.]
Figure 5. Speedups obtained by varying the number of processors
Figure 5 shows that for almost all numbers of digits, speedup mostly grows linearly as the
number of processors used in the computation increases. The apparent exception to this is
when the number of digits is 5. For this case there are only 275 different values of a to check and
each of them requires a relatively short computation time. When the number of processors
exceeds 30, communication overheads become large enough to offset the savings in computation
time from the parallelism. Overall, the largest speedup is obtained when the number of digits is 10
and it is a bit more than 36 times faster than the sequential implementation.
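Speedup here is the ratio of sequential to parallel running time, and dividing it by the number of processors gives the parallel efficiency. The short Python sketch below applies these two standard definitions to two rows of Table 2 (strategy 4 on 60 processors); the function names are hypothetical and efficiency is not a quantity reported in this paper.

```python
def speedup(t_sequential: float, t_parallel: float) -> float:
    """Ratio of sequential to parallel running time."""
    return t_sequential / t_parallel


def efficiency(t_sequential: float, t_parallel: float, processors: int) -> float:
    """Speedup per processor; 1.0 would be ideal linear scaling."""
    return speedup(t_sequential, t_parallel) / processors


# (sequential seconds, strategy 4 seconds) taken from Table 2
rows = {5: (1.57168, 0.11407), 35: (70901.7, 2457.4)}
for digits, (t_seq, t_par) in rows.items():
    print(digits, round(speedup(t_seq, t_par), 2), round(efficiency(t_seq, t_par, 60), 2))
```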
The effects of multi-core processors. The high-performance computing system used in the
experiments consists of computers, each with multi-core processors. When each of these cores
is treated as a node, contention may occur when more than one core communicates
simultaneously with the master node. In this part, for each workload-distribution strategy, we vary
the number of cores per node used in the parallel computation to establish its effect on the
overall computation time. For this purpose, three scenarios are created, namely 1 core per node,
2 cores per node, and 4 cores per node. In all of these scenarios, the overall number of cores is
maintained at 8 in order to set a baseline. Figure 6 depicts the result of the experiments.
From Figure 6, we can conclude that workload-distribution strategies 1, 2, and 3 are almost
unaffected by the number of cores per node used in the parallel computation. This is
understandable since, in these strategies, a slave node rarely communicates with the master
node. Communication between a slave node and the master node occurs only during termination,
namely when the slave node finds that the polynomials are not congruent for a specific value
of a or when it finds that the polynomials are congruent for all assigned values of a.
The effect for workload-distribution strategy 4, however, is stark, and the larger the prime
number, the more pronounced the effect. Having more cores per node results in longer computation
time. This is in line with our expectation, since having more cores per node results in heavier
use of the communication line between the master and the slave nodes. What we did not expect
is for the effect to be so strong (1 core per node is more than twice as fast as 4 cores
per node). This suggests that an HPC system with single-core nodes would produce even better results.
[Figure 6: four panels, one for each of workload-distribution strategies 1 to 4, plotting running time (seconds) against the number of digits of the prime number for three configurations: 2 nodes (8 cores), 4 nodes (8 cores), and 8 nodes (8 cores).]
Figure 6. Running times for various numbers of cores per node
5. Concluding Remarks
In this paper, we proposed a parallelization technique based on message-passing parallelism
for AKS algorithm. We also developed four workload-distribution strategies for this
parallelization technique. From the experiments we have conducted, we conclude that the dynamic
workload-distribution strategy is the most load-balanced one. Furthermore, the difference
between the dynamic strategy and the static strategies is so significant that it is difficult to envision
circumstances in which one would wish to use the static ones. Overall, the dynamic strategy can achieve
a speedup of up to 36 times over the sequential computation. Nevertheless, the dynamic strategy
has one obvious drawback, namely the bottleneck in the communication line towards the master
node. The more nodes involved in the parallelism, the busier the master node and the heavier the
load on the communication line towards the master node. We did not manage to demonstrate this due to the
limited size of the HPC system available to us. We also showed that the number of cores per node has a
strong effect on the dynamic workload-distribution strategy.
Acknowledgement
The authors would like to thank Directorate of Information System and Resources (DSSDI)
of Universitas Gadjah Mada for providing the high-performance computing service used in this re-
search.
References
[1] R. L. Rivest, A. Shamir, and L. Adleman, “A method for obtaining digital signatures and
public-key cryptosystems,” Communications of the ACM, vol. 21, no. 2, pp. 120–126, Feb.
1978. [Online]. Available: http://doi.acm.org/10.1145/359340.359342
[2] A. Muzakir and A. Ashari, “Rancang bangun keamanan web service dengan metode
ws-security,” IJCCS (Indonesian Journal of Computing and Cybernetics Systems), vol. 6,
no. 1, pp. 1–10, 2012. [Online]. Available: https://jurnal.ugm.ac.id/ijccs/article/view/2035
[3] EFF, “EFF offers cooperative computing prizes,” 2009, last accessed: 2017-03-19. [Online].
Available: https://www.eff.org/awards/coop
[4] G. L. Miller, “Riemann’s hypothesis and tests for primality,” Journal of Computer
and System Sciences, vol. 13, no. 3, pp. 300–317, 1976. [Online]. Available:
http://www.sciencedirect.com/science/article/pii/S0022000076800438
[5] M. O. Rabin, “Probabilistic algorithm for testing primality,” Journal of Num-
ber Theory, vol. 12, no. 1, pp. 128–138, 1980. [Online]. Available:
http://www.sciencedirect.com/science/article/pii/0022314X80900840
[6] C. Dong, “Math in network security: A crash course,” 2016, last accessed: 2017-03-19.
[Online]. Available: http://www.doc.ic.ac.uk/∼mrh/330tutor/
[7] R. Solovay and V. Strassen, “A fast Monte-Carlo test for primality,” SIAM Journal on Com-
puting, vol. 6, no. 1, pp. 84–85, 1977. [Online]. Available: http://dx.doi.org/10.1137/0206006
[8] M. Agrawal, N. Kayal, and N. Saxena, “PRIMES is in P,” Annals of Mathematics, vol. 2, pp.
781–793, 2002.
[9] L. K. Nemana and V. C. Venkaiah, “An empirical study towards refining the AKS primality
testing algorithm,” IACR Cryptology ePrint Archive, vol. 2016, p. 362, 2016. [Online].
Available: http://eprint.iacr.org/2016/362
[10] A. Granville, “It is easy to determine whether a given integer is prime,” Bulletin of the Ameri-
can Mathematical Society, vol. 42, no. 1, pp. 3–38, 2005.
[11] H. W. Lenstra, “Primality testing with cyclotomic rings,” Mathematic Institute, University of
Leiden, Tech. Rep., 2002.
[12] D. Bernstein, “Proving primality after Agrawal-Kayal-Saxena,” Department of Mathematics,
Statistics, and Computer Science, University of Illinois, Tech. Rep., 2003. [Online]. Available:
http://cr.yp.to/papers/aks.pdf
[13] H. W. Lenstra, “Primality testing with Gaussian periods,” in FST TCS 2002: Foundations
of Software Technology and Theoretical Computer Science, 22nd Conference Kanpur,
India, December 12-14, 2002, Proceedings, ser. Lecture Notes in Computer Science,
M. Agrawal and A. Seth, Eds., vol. 2556. Springer, 2002, p. 1. [Online]. Available:
http://dx.doi.org/10.1007/3-540-36206-1 1
[14] H. W. Lenstra and C. Pomerance, “Primality testing with Gaussian periods,” Department
of Mathematics, University of Dartmouth, Tech. Rep., 2011. [Online]. Available:
https://math.dartmouth.edu/∼carlp/aks041411.pdf
[15] D. Bernstein, “Proving primality in essentially quartic random time,” Mathematics of compu-
tation, vol. 76, no. 257, pp. 389–403, 2007.
[16] R. E. Crandall and J. S. Papadopoulos, “On the implementation of AKS-class primality tests,”
University of Maryland College Park, Tech. Rep., 2003.
[17] H. Li, “The analysis and implementation of the AKS algorithm and its improvement
algorithms,” Master’s thesis, Department of Computer Science, University of Bath, 2007.
[18] V. Menon, “Deterministic primality testing - understanding the AKS algorithm,” CoRR, vol.
abs/1311.3785, 2013. [Online]. Available: http://arxiv.org/abs/1311.3785
[19] Z. Cao, “A note on the storage requirement for AKS primality testing algorithm,”
IACR Cryptology ePrint Archive, vol. 2013, p. 449, 2013. [Online]. Available:
http://eprint.iacr.org/2013/449
[20] C. K. Caldwell, “The prime pages: prime number research, records, and resources,” 2017,
last accessed: 2017-03-19. [Online]. Available: https://primes.utm.edu
ZJIT: Building a Next Generation Ruby JIT
maximechevalierboisv1
 
2025 Apply BTech CEC .docx
2025 Apply BTech CEC                 .docx2025 Apply BTech CEC                 .docx
2025 Apply BTech CEC .docx
tusharmanagementquot
 
SICPA: Fabien Keller - background introduction
SICPA: Fabien Keller - background introductionSICPA: Fabien Keller - background introduction
SICPA: Fabien Keller - background introduction
fabienklr
 
Dynamics of Structures with Uncertain Properties.pptx
Dynamics of Structures with Uncertain Properties.pptxDynamics of Structures with Uncertain Properties.pptx
Dynamics of Structures with Uncertain Properties.pptx
University of Glasgow
 
Comprehensive-Event-Management-System.pptx
Comprehensive-Event-Management-System.pptxComprehensive-Event-Management-System.pptx
Comprehensive-Event-Management-System.pptx
dd7devdilip
 
Main cotrol jdbjbdcnxbjbjzjjjcjicbjxbcjcxbjcxb
Main cotrol jdbjbdcnxbjbjzjjjcjicbjxbcjcxbjcxbMain cotrol jdbjbdcnxbjbjzjjjcjicbjxbcjcxbjcxb
Main cotrol jdbjbdcnxbjbjzjjjcjicbjxbcjcxbjcxb
SunilSingh610661
 
Novel Plug Flow Reactor with Recycle For Growth Control
Novel Plug Flow Reactor with Recycle For Growth ControlNovel Plug Flow Reactor with Recycle For Growth Control
Novel Plug Flow Reactor with Recycle For Growth Control
Chris Harding
 
How to use nRF24L01 module with Arduino
How to use nRF24L01 module with ArduinoHow to use nRF24L01 module with Arduino
How to use nRF24L01 module with Arduino
CircuitDigest
 
6th International Conference on Big Data, Machine Learning and IoT (BMLI 2025)
6th International Conference on Big Data, Machine Learning and IoT (BMLI 2025)6th International Conference on Big Data, Machine Learning and IoT (BMLI 2025)
6th International Conference on Big Data, Machine Learning and IoT (BMLI 2025)
ijflsjournal087
 
Reese McCrary_ The Role of Perseverance in Engineering Success.pdf
Reese McCrary_ The Role of Perseverance in Engineering Success.pdfReese McCrary_ The Role of Perseverance in Engineering Success.pdf
Reese McCrary_ The Role of Perseverance in Engineering Success.pdf
Reese McCrary
 
Surveying through global positioning system
Surveying through global positioning systemSurveying through global positioning system
Surveying through global positioning system
opneptune5
 
How to Buy Snapchat Account A Step-by-Step Guide.pdf
How to Buy Snapchat Account A Step-by-Step Guide.pdfHow to Buy Snapchat Account A Step-by-Step Guide.pdf
How to Buy Snapchat Account A Step-by-Step Guide.pdf
jamedlimmk
 
Data Structures_Introduction to algorithms.pptx
Data Structures_Introduction to algorithms.pptxData Structures_Introduction to algorithms.pptx
Data Structures_Introduction to algorithms.pptx
RushaliDeshmukh2
 
RICS Membership-(The Royal Institution of Chartered Surveyors).pdf
RICS Membership-(The Royal Institution of Chartered Surveyors).pdfRICS Membership-(The Royal Institution of Chartered Surveyors).pdf
RICS Membership-(The Royal Institution of Chartered Surveyors).pdf
MohamedAbdelkader115
 
Interfacing PMW3901 Optical Flow Sensor with ESP32
Interfacing PMW3901 Optical Flow Sensor with ESP32Interfacing PMW3901 Optical Flow Sensor with ESP32
Interfacing PMW3901 Optical Flow Sensor with ESP32
CircuitDigest
 
Compiler Design Unit1 PPT Phases of Compiler.pptx
Compiler Design Unit1 PPT Phases of Compiler.pptxCompiler Design Unit1 PPT Phases of Compiler.pptx
Compiler Design Unit1 PPT Phases of Compiler.pptx
RushaliDeshmukh2
 
Data Structures_Linear data structures Linked Lists.pptx
Data Structures_Linear data structures Linked Lists.pptxData Structures_Linear data structures Linked Lists.pptx
Data Structures_Linear data structures Linked Lists.pptx
RushaliDeshmukh2
 
Artificial Intelligence introduction.pptx
Artificial Intelligence introduction.pptxArtificial Intelligence introduction.pptx
Artificial Intelligence introduction.pptx
DrMarwaElsherif
 
Degree_of_Automation.pdf for Instrumentation and industrial specialist
Degree_of_Automation.pdf for  Instrumentation  and industrial specialistDegree_of_Automation.pdf for  Instrumentation  and industrial specialist
Degree_of_Automation.pdf for Instrumentation and industrial specialist
shreyabhosale19
 
ZJIT: Building a Next Generation Ruby JIT
ZJIT: Building a Next Generation Ruby JITZJIT: Building a Next Generation Ruby JIT
ZJIT: Building a Next Generation Ruby JIT
maximechevalierboisv1
 

A Load-Balanced Parallelization of AKS Algorithm

TELKOMNIKA, Vol. x, No. x, April 2017, pp. 1 ∼ 10
ISSN: 1693-6930

Ardhi Wiratama Baskara Yudha and Reza Pulungan
Department of Computer Science and Electronics, Faculty of Mathematics and Natural Sciences, Universitas Gadjah Mada, Yogyakarta, Indonesia
e-mails: ardhi.wiratama.b@mail.ugm.ac.id, pulungan@ugm.ac.id

Abstract
The best known deterministic polynomial-time algorithm for primality testing is currently due to Agrawal, Kayal, and Saxena. This algorithm has a time complexity of $O(\log^{15/2}(n))$. Although the algorithm is polynomial, its reliance on the congruence of large polynomials results in enormous computational requirements. In this paper, we propose a parallelization technique for this algorithm based on message-passing parallelism, together with four workload-distribution strategies. We perform a series of experiments on an implementation of this algorithm in a high-performance computing system consisting of 15 nodes, each with 4 CPU cores. The experiments indicate that our proposed parallelization technique introduces a significant speedup over existing implementations. Furthermore, the dynamic workload-distribution strategy performs better than the others. Overall, the experiments show that the parallelization obtains up to 36 times speedup.

Keywords: Primality testing, AKS algorithm, parallelization, load balancing, high-performance computing.

Copyright © 2017 Universitas Ahmad Dahlan. All rights reserved.

1. Introduction
Prime numbers are the cornerstone of number theory. Mathematicians and number theorists have, since ancient times, been fascinated by many problems concerning prime numbers. In modern times, many of the most important cryptographic algorithms rely on big prime numbers to perform encryption and decryption. One of them is the Rivest-Shamir-Adleman (RSA) algorithm [1], which is now widely used in web security [2], including in banking transaction security.
The RSA algorithm depends on the fact that it is difficult to find the prime factors of a big integer. The Electronic Frontier Foundation (EFF) offers $250,000 as a reward to the first individual or group who discovers a prime number with at least 1,000,000,000 decimal digits [3]. Searching for a prime number is usually based on an efficient algorithm that determines whether a given number is prime or composite. Such algorithms are called primality testing algorithms.

Most primality testing algorithms are probabilistic: they cannot ascertain the primality of a given number, but only provide a probability that the given number is prime. The Miller-Rabin primality test [4, 5], for instance, has an error rate below 25%, which means that if the given number passes this test $n$ times, then the probability that the number is prime is at least $1 - 0.25^n$ [6]. The Solovay-Strassen primality test [7], on the other hand, has an error rate below 50%. Probabilistic primality testing algorithms are relatively fast and of low complexity, with tunable accuracy. However, there are cases that require certainty about whether a given number is prime, and in such cases probabilistic algorithms cannot be used.

In 2002, three Indian computer scientists, Agrawal, Kayal, and Saxena [8], proposed a deterministic (i.e., non-probabilistic) primality testing algorithm that runs in polynomial time; we will refer to this algorithm as AKS algorithm. This is the first deterministic polynomial-time algorithm for primality testing. Since this seminal paper, the primality testing problem is known to belong to the complexity class P, rather than merely to larger classes such as NP or ZPP [9]. AKS algorithm is, interestingly, relatively simple and straightforward, whereas previous work by other researchers attempted to show that primality testing is of polynomial time complexity by making complex modifications to existing primality testing algorithms [10].

Since this theoretical breakthrough, many researchers have proposed theoretical and practical improvements to the algorithm. Notable among them are Lenstra [11] and Bernstein [12]. Bernstein [12] proposed two practical possibilities for accelerating AKS algorithm: a low-level speedup obtained by improving the integer squaring method, and a high-level speedup obtained by reducing the number of for-loop iterations in the last step of the algorithm. This included all state-of-the-art improvements on reducing the iterations of the last for loop and produced a speedup of many orders of magnitude. These have been incorporated in the latest version of AKS algorithm.

Received May 9, 201x; Revised August 3, 201x; Accepted August 16, 201x

Lenstra and Pomerance [13, 14], on the other hand, proposed theoretical improvements to AKS algorithm and obtained a new technique with time complexity $O(\log^{6}(n))$. They modified the original AKS algorithm by decreasing the number of iterations in the for loop. This is done by replacing the cyclotomic polynomials used in AKS algorithm with a monic polynomial $f(x)$ of degree $r$ with integer coefficients such that the ring $\mathbb{Z}[x]/(f(x), n)$ is a pseudofield. Bernstein [15] proposed a further theoretical improvement to AKS algorithm with time complexity $O(\log^{4}(n))$. That proposal also attempted to decrease the number of iterations in the for loop, by replacing the cyclotomic polynomials with random Kummer extensions of $\mathbb{Z}[x]/n$.

Crandall and Papadopoulos [16] implemented the variant of AKS algorithm by Lenstra [11] and found empirically that the time complexity of the variant is $c \log^{6}(n)$, where $c$ is around 1,000 clock cycles. Li [17] also implemented the Lenstra variant of AKS algorithm, using C++ and the NTL library to handle the polynomial data structure. In this implementation, a 15-decimal-digit prime number required around 3,000 seconds to test on a single-processor computer. Menon [18] implemented AKS algorithm in SAGE (Software for Algebra and Geometry Experimentation); in that implementation, a 25-decimal-digit prime number required more than 4,000 seconds to test on a single-processor computer.
Cao [19] analyzed the storage space requirement of AKS algorithm and showed that the storage space required for testing a number of length 1,024 bits is about 1,000,000,000 gigabytes, which is practically infeasible. This is due to the need to store extremely large polynomials during the computation.

This paper reports on our effort to develop a parallelization technique for AKS algorithm based on message-passing parallelism (using MPI) and to find the best workload-distribution strategy for the parallelization technique.

Organization of the paper: Section 2 presents the basis of AKS algorithm. Section 3 describes the proposed parallelization technique, together with four accompanying workload-distribution strategies. In Section 4, we present and analyze the results of our experiments with the proposed parallelization technique and the four workload-distribution strategies. Section 5 concludes the paper.

2. Preliminaries
Let $\mathbb{Z}$ be the set of integers and let $a$ and $b$ be two positive integers. Let $\gcd(a, b)$ be the greatest common divisor of $a$ and $b$. The two integers $a$ and $b$ are relatively prime if and only if $\gcd(a, b) = 1$. Let $\varphi(a)$ be Euler's totient function, namely the number of positive integers smaller than $a$ that are relatively prime to $a$. For relatively prime $a$ and $r$, let $o_r(a)$ be the order of $a$ modulo $r$, namely the smallest positive integer $k$ such that $a^k \equiv 1 \pmod{r}$. Let $a \;\mathrm{rem}\; b$ be the remainder of the integer division of $a$ by $b$.

In the earliest version of their publication, Agrawal, Kayal, and Saxena obtained an algorithm with a worst-case time complexity of $O(\log^{12}(n))$, where $n$ is the given number. In this paper, we refer to the latest version (version 6) of their publication [8], in which the latest AKS algorithm was presented.
The latest version incorporates many improvements proposed by many researchers, and the resulting algorithm runs in polynomial time with a worst-case complexity of $O(\log^{15/2}(n))$. Prior to the publication of this algorithm, there were other primality proving algorithms that seemed to run in polynomial time, but AKS algorithm is the first one that is both deterministic and of polynomial time [16]. The main idea of AKS algorithm is described in Lemma 1, which is a generalization of Fermat's little theorem.
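
To make the notation of this section concrete, the following is a minimal Python sketch of Euler's totient function and of the order of $a$ modulo $r$ as defined above. The function names `euler_phi` and `multiplicative_order` are illustrative choices and do not come from the paper; the brute-force loops are only suitable for small arguments.

```python
from math import gcd


def euler_phi(a: int) -> int:
    """Count the positive integers smaller than a that are relatively prime to a.

    This follows the definition in the text literally and is meant for a >= 2.
    """
    return sum(1 for k in range(1, a) if gcd(k, a) == 1)


def multiplicative_order(a: int, r: int) -> int:
    """Return the smallest positive k with a**k congruent to 1 modulo r.

    Requires r >= 2 and gcd(a, r) == 1, otherwise no such k exists.
    """
    if r < 2 or gcd(a, r) != 1:
        raise ValueError("order is only defined for r >= 2 and gcd(a, r) == 1")
    k, power = 1, a % r
    while power != 1:
        power = (power * a) % r
        k += 1
    return k


if __name__ == "__main__":
    print(euler_phi(10))               # integers 1, 3, 7, 9 are coprime to 10
    print(multiplicative_order(2, 7))  # 2, 4, 1: the order is 3
    print(17 % 5)                      # "17 rem 5" in the notation above
```

The order always divides $\varphi(r)$, which is why a lower bound on $o_r(n)$ also forces $r$ to be reasonably large.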

Lemma 1 ([8]) Let $a \in \mathbb{Z}$ be relatively prime to $n \in \mathbb{Z}$ and $n \ge 2$. Then $n$ is prime if and only if:

$(x + a)^n \equiv x^n + a \pmod{n}$.   (1)

To reduce the number of operations performed, both sides of Equation (1) can be simplified by taking their respective remainders modulo a polynomial $x^r - 1$, for some small positive $r \in \mathbb{Z}$, namely:

$(x + a)^n \equiv x^n + a \pmod{x^r - 1,\, n}$.   (2)

However, the bi-implication in Lemma 1 then no longer applies, since a non-prime $n$ may also satisfy Equation (2) for some $a$ and $r$. Theorem 1, as reformulated by Granville in [10], forms the cornerstone of AKS algorithm. The theorem basically asserts that, for an appropriately selected $r$, if Equation (2) is satisfied for sufficiently many values of $a$, then $n$ must be prime. Therefore, $r$ must be selected accordingly.

Theorem 1 ([8, 10]) Given $n \in \mathbb{Z}$ and $n \ge 2$, let $r < n$ be a positive integer satisfying $o_r(n) > \log^2(n)$. Then $n$ is prime if and only if: (1) $n$ is not a perfect power, (2) $n$ does not have any prime factor $\le r$, and (3) $(x + a)^n \equiv x^n + a \pmod{x^r - 1,\, n}$ for every integer $a$, where $1 \le a \le \lfloor \sqrt{\varphi(r)}\, \log(n) \rfloor$.

A straightforward implementation of Theorem 1 is given in Algorithm 1, where condition (1) corresponds to the first if, and conditions (2) and (3) correspond to the first and the last for, respectively.

Algorithm 1: AKS algorithm
Input: $n \in \mathbb{Z}$, $n \ge 2$
Output: A string “Prime” or “Composite”
 1  begin
 2    if $n = a^b$, where $a, b \in \mathbb{Z}$ and $a, b > 1$ then
 3      return “Composite”
 4    end
 5    Find the smallest r that satisfies $o_r(n) > \log^2(n)$
 6    for a ← 2 to r do
 7      if gcd(a, n) > 1 then
 8        return “Composite”
 9      end
10    end
11    for a ← 1 to $\lfloor \sqrt{\varphi(r)}\, \log(n) \rfloor$ do
12      if $(x + a)^n \not\equiv x^n + a \pmod{x^r - 1,\, n}$ then
13        return “Composite”
14      end
15    end
16    return “Prime”
17  end

3. Proposed Method

3.1. Parallel AKS Algorithm
Scrutinizing Algorithm 1, we can see that the algorithm basically comprises four steps: determining whether $n$ is a perfect power (lines 2–4); determining $r$ (line 5); determining whether $n$ has prime factors $\le r$ (lines 6–10); and determining the congruence of polynomials $(x + a)^n$ and $x^n + a$ modulo $(x^r - 1, n)$ for some values of $a$ (lines 11–15). Of these four steps, the last takes most of the computation time of the algorithm, since we are dealing with an enormous $n$. Furthermore, when raising the polynomial $(x + a)$ to the power $n$, even modulo $(x^r - 1, n)$, intermediate results might be enormous polynomials requiring large storage and heavy computation. Our parallelization effort is therefore focused on this last step. Parallelizing the other steps would incur communication overhead that, with the current state of networking technology, renders the saving achieved by the parallelization worthless even for $n$ with hundreds of decimal digits.

As noticed by Crandall and Papadopoulos [16], AKS algorithm is an embarrassingly parallel algorithm. It can easily be parallelized using the master-slave technique, by distributing the work of determining the congruence of polynomials $(x + a)^n$ and $x^n + a$ modulo $(x^r - 1, n)$ for different values of $a$ to different computer nodes in a message-passing parallel system. Figure 1 illustrates this master-slave technique.

Figure 1. The design of the parallelization technique (a master node 0 and slave nodes 1, 2, …, u, each with cores 1, 2, …, v; the legend distinguishes nodes, cores, communication links, and ownership)

In the beginning, the master node performs the first three steps of AKS algorithm sequentially. Once the master node obtains the value of $r$, it broadcasts the values of $n$ and $r$, together with other necessary information about the distribution of work (namely the distribution of the values of $a$), to all slave nodes. Each slave node then proceeds to determine the congruence of polynomials $(x + a)^n$ and $x^n + a$ modulo $(x^r - 1, n)$ for several values of $a$.
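
As an illustration of the four steps, here is a minimal sequential Python sketch; it is not the authors' C++/GMP/NTL implementation. All function names are hypothetical. Two guards are assumptions borrowed from the usual statement of the AKS algorithm rather than from Algorithm 1 as printed: the gcd loop skips $a = n$, and the function answers "prime" when $n \le r$, so that very small inputs (for which no $r < n$ exists) are handled. Logarithms are taken to base 2, polynomial multiplication is the naive quadratic method, and floating-point logarithms are only adequate for small $n$. The function `congruence_holds` is the per-$a$ check that each slave node would perform.

```python
from math import floor, gcd, log2, sqrt


def is_perfect_power(n: int) -> bool:
    """Step 1: True if n == a**b for some integers a, b > 1."""
    for b in range(2, n.bit_length() + 1):
        lo, hi = 1, 1 << (n.bit_length() // b + 1)
        while lo < hi:  # binary search for the smallest lo with lo**b >= n
            mid = (lo + hi) // 2
            if mid ** b < n:
                lo = mid + 1
            else:
                hi = mid
        if lo > 1 and lo ** b == n:
            return True
    return False


def find_r(n: int) -> int:
    """Step 2: smallest r such that the order of n modulo r exceeds log2(n)**2."""
    limit = int(log2(n) ** 2)  # order > log^2(n) iff n**k != 1 (mod r) for all k <= limit
    r = 2
    while True:
        if gcd(n, r) == 1:
            k, power = 1, n % r
            while k <= limit and power != 1:
                power = power * n % r
                k += 1
            if k > limit:
                return r
        r += 1


def euler_phi(r: int) -> int:
    return sum(1 for k in range(1, r) if gcd(k, r) == 1)


def polymul_mod(f, g, r, n):
    """Multiply two polynomials modulo (x**r - 1, n); index i holds the coefficient of x**i."""
    h = [0] * r
    for i, fi in enumerate(f):
        if fi:
            for j, gj in enumerate(g):
                if gj:
                    h[(i + j) % r] = (h[(i + j) % r] + fi * gj) % n
    return h


def congruence_holds(n: int, r: int, a: int) -> bool:
    """Step 4 for one value of a: is (x + a)**n == x**n + a modulo (x**r - 1, n)?"""
    base = [0] * r
    base[0] = a % n
    base[1] = (base[1] + 1) % n          # base = x + a (find_r always returns r >= 2)
    result = [1] + [0] * (r - 1)
    e = n
    while e:                             # square-and-multiply exponentiation
        if e & 1:
            result = polymul_mod(result, base, r, n)
        base = polymul_mod(base, base, r, n)
        e >>= 1
    rhs = [0] * r
    rhs[n % r] = 1                       # x**n reduces to x**(n rem r)
    rhs[0] = (rhs[0] + a) % n
    return result == rhs


def aks_is_prime(n: int) -> bool:
    if n < 2:
        raise ValueError("n must be at least 2")
    if is_perfect_power(n):
        return False
    r = find_r(n)
    for a in range(2, min(r, n - 1) + 1):  # Step 3, skipping a == n (assumed guard)
        if gcd(a, n) > 1:
            return False
    if n <= r:                             # assumed guard from the standard formulation
        return True
    limit = floor(sqrt(euler_phi(r)) * log2(n))
    return all(congruence_holds(n, r, a) for a in range(1, limit + 1))
```

Because the calls to `congruence_holds` for different values of $a$ share only the read-only inputs $n$ and $r$, they can be evaluated on different nodes without any coordination, which is what makes the last step embarrassingly parallel.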

A slave node communicates only with the master node, and only in two cases: (1) when, for some value of $a$, the polynomials are not congruent; and (2) when, for all values of $a$ assigned by the master node, the polynomials are congruent, which signals that the work assigned to the slave node has been completed. Upon receiving a communication of type (1) from a slave node, the master node immediately abandons the last for loop, announces that $n$ is composite, and commands the rest of the slave nodes to abort their computation. Receiving a communication of type (2) from all slave nodes indicates that all slave nodes have completed their work and that all of them found the two polynomials congruent for all values of $a$; the master node then announces that $n$ is prime.

A modern computer system usually has multi-core CPUs. A parallelization technique in which each CPU core in a slave computer node is itself treated as a slave node is referred to as single-level parallelization. In this technique, each core is assigned by the master node several values of $a$ to compute separately from the other cores in the same computer node. Communications from all cores in a slave node to the master node must pass through the same channel of communication, and this may result in contention. However, compared to the computation time spent on each value of $a$, the overhead produced by this contention is negligible.

3.2. Workload-Distribution Strategies
The single-level parallelization technique requires the distribution of workload from the master node to all slave nodes. This basically entails distributing the values of $a$ for the slave nodes to work on. Recall from Algorithm 1 that the congruence of polynomials $(x + a)^n$ and $x^n + a$ modulo $x^r - 1$ must be determined for $1 \le a \le \lfloor \sqrt{\varphi(r)}\, \log(n) \rfloor$. Let $q = \lfloor \sqrt{\varphi(r)}\, \log(n) \rfloor$ and let $u$ be the number of slave nodes. Further, let % stand for the integer division operator. In the following, we present the four workload-distribution strategies that are experimented on in this study.

Strategy 1
Figure 2 illustrates the first workload-distribution strategy. A rectangle in the figure represents a single value of $a$, while the circle right below the rectangle represents the slave node responsible for computing that value. This strategy is the simplest of the strategies: slave node $i$ is responsible for determining the congruence of polynomials $(x + a)^n$ and $x^n + a$ modulo $x^r - 1$ for $(i - 1)(q\%u) + 1 \le a \le i(q\%u)$. Hence slave node #1 works on the first $q\%u$ values of $a$, slave node #2 works on the second $q\%u$ values of $a$, and so on, while slave node #$u$ works on the remaining values of $a$. This last slave node may work on fewer than $q\%u$ values of $a$ if $q$ is not divisible by $u$.

Figure 2. Workload-distribution strategy 1 (the values of a from 1 to q are split into consecutive blocks of q%u values, assigned in order to nodes 1, 2, …, u)

Strategy 2
One of the main concerns with the first strategy is that a slave node is assigned only values of $a$ that are consistently smaller or larger than those assigned to other slave nodes.
All values of $a$ assigned to slave node #1, for instance, are smaller than those assigned to slave node #2. A larger value of $a$ may result in a longer computation time, since the resulting intermediate polynomials will have larger coefficients, which in turn take longer to multiply and require more storage. The second and third strategies try to address this.

Figure 3. Workload-distribution strategy 2 (the values of a from 1 to q are assigned cyclically to nodes 1, 2, …, u; the last, partial round covers q rem u values)

Figure 3 illustrates the second workload-distribution strategy. The first slave node gets $a = 1$, the second slave node gets $a = 2$, and so on, until the last slave node gets $a = u$. This is then repeated until all values of $a$ are exhausted. Therefore, slave node $i$ is assigned the values $i, i + u, i + 2u, \ldots, i + ju$ of $a$, where $j$ is the largest integer that still satisfies $i + ju \le q$.
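
The two static assignments described so far can be sketched in Python as follows. The function names are hypothetical. For strategy 1, the sketch assumes that the block size is $q/u$ rounded up and that blocks are clipped at $q$, which is one reading consistent with the remark that the last slave node may receive fewer values; the text itself only says that % denotes integer division.

```python
def block_assignment(q: int, u: int) -> dict[int, list[int]]:
    """Strategy 1: node i gets the i-th consecutive block of values of a.

    Assumption: the block size is ceil(q / u), and blocks are clipped at q.
    """
    chunk = -(-q // u)  # ceiling division
    return {
        i: list(range((i - 1) * chunk + 1, min(i * chunk, q) + 1))
        for i in range(1, u + 1)
    }


def cyclic_assignment(q: int, u: int) -> dict[int, list[int]]:
    """Strategy 2: node i gets a = i, i + u, i + 2u, ... up to q."""
    return {i: list(range(i, q + 1, u)) for i in range(1, u + 1)}


if __name__ == "__main__":
    print(block_assignment(10, 3))   # node 3 receives the short final block
    print(cyclic_assignment(10, 3))  # node 1 receives 1, 4, 7, 10
```

In both cases every value of $a$ from 1 to $q$ is assigned to exactly one node, so the two strategies differ only in which values end up together.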

This strategy manages to avoid assigning one slave node values of $a$ that are consistently smaller or larger than those assigned to other slave nodes. However, each value of $a$ assigned to a slave node is still always slightly smaller or larger than the corresponding value assigned to another slave node. For every value $i$ assigned to slave node #1, for instance, the value $i + 1$ is assigned to slave node #2. Hence, if a larger value of $a$ always results in a longer computation time, slave node #1 will complete its workload earlier than slave node #2. This problem is addressed by the third strategy.

Strategy 3
The third strategy addresses the problem encountered in the second strategy by ensuring that if a slave node is assigned a small value of $a$, it is compensated by another assignment with a large value of $a$. Figure 4 illustrates the third workload-distribution strategy.

Figure 4. Workload-distribution strategy 3 (the smallest values of a, from 1 upward, and the largest values of a, from q downward, are assigned in mirrored order to nodes 1, 2, …, u)

Hence, since slave node #1 is assigned the value $a = 1$ (the smallest), it is also assigned the value $a = q$ (the largest). Similarly, since slave node #2 is assigned the value $a = 2$ (the second smallest), it is also assigned the value $a = q - 1$ (the second largest). This is carried out until all values of $a$ are assigned to the slave nodes in similar fashion: if the value $a = i$ is assigned to slave node $j$, then the value $a = q - i + 1$ is also assigned to slave node $j$, for $i \le q\%2$.

Strategy 4
All previous strategies are static, in the sense that the workload distributions are predefined before execution; a specific node always gets the same set of $a$'s when the input $n$ is the same. In this fourth strategy, we propose a dynamic strategy in which the set of $a$'s assigned to a slave node cannot be predicted before it is run.
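
As a brief aside on the last of the static strategies, the mirrored pairing of strategy 3 can be sketched in Python as below. The function name is hypothetical, and the sketch assumes that the small values $1, 2, 3, \ldots$ are dealt cyclically to nodes $1, \ldots, u$, with each value $i$ paired with $q + 1 - i$, as implied by the examples $a = 1$ with $a = q$ and $a = 2$ with $a = q - 1$.

```python
def mirrored_assignment(q: int, u: int) -> dict[int, list[int]]:
    """Strategy 3: the node that receives a = i also receives a = q + 1 - i.

    Assumption: the small values i = 1, 2, ... are dealt cyclically to nodes 1..u.
    """
    work: dict[int, list[int]] = {i: [] for i in range(1, u + 1)}
    for i in range(1, (q + 1) // 2 + 1):
        node = (i - 1) % u + 1
        work[node].append(i)
        if q + 1 - i != i:  # for odd q the middle value has no partner
            work[node].append(q + 1 - i)
    return work


if __name__ == "__main__":
    print(mirrored_assignment(10, 3))
```

With this pairing, the sum of each assigned pair is always $q + 1$, so if the cost grew linearly with $a$ every pair would cost the same.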

The idea of the dynamic strategy is as follows. First, each slave node is assigned an $a$ according to its id: slave node #1 gets $a = 1$, slave node #2 gets $a = 2$, and so on, until slave node #$u$ gets $a = u$. After completing its work, a slave node either asks the master node for another remaining $a$, or informs the master node that the polynomial congruence check has failed. In the first case, the master node sends a remaining $a$ to the requesting node; in the second case, the master node terminates all nodes and outputs the composite result. When no remaining $a$ exists, the master node terminates all slave nodes and outputs the prime result.

4. Experiments and Result

4.1. Implementation
Since we are primarily concerned with big numbers, we use the GNU Multiple Precision (GMP) arithmetic library version 6.10 to handle integers of arbitrary length. In the first step of the algorithm, the GMP function mpz_perfect_power_p() is used to check for perfect powers. To compute the value of $r$ in the second step, we use the function PowerMod() of the NTL library version 9.10.0, which performs integer modular exponentiation. To check for factors of the input number that are no larger than $r$ in the third step, the NTL function GCD() is used.

Communication between the master and slave nodes is performed using the MPICH library version 3.2. To broadcast the input number $n$ and the value of $r$, the master and slave nodes use the function MPI_Bcast(). Since MPI supports neither the data type mpz_t defined by GMP nor the data type ZZ defined by NTL, $n$ and $r$ are first converted to arrays of bytes before they are broadcast. Once they arrive, they are converted back to type mpz_t using the function mpz_init_set_str().
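
The round trip through a byte array described above can be illustrated with a small Python sketch. The helper names `to_wire` and `from_wire` are hypothetical, and the choice of a decimal ASCII string as the wire format is an assumption (suggested only by the fact that mpz_init_set_str() parses a string); the sketch does not use MPI itself.

```python
def to_wire(value: int) -> bytes:
    """Encode an arbitrary-precision integer as ASCII decimal digits.

    Assumption: a decimal string is used as the wire format; the resulting
    byte array is what a broadcast of plain bytes could carry.
    """
    return str(value).encode("ascii")


def from_wire(buffer: bytes) -> int:
    """Decode the byte array back into an arbitrary-precision integer."""
    return int(buffer.decode("ascii"))


if __name__ == "__main__":
    n = 10**40 + 12345          # far beyond any fixed-width integer type
    payload = to_wire(n)
    print(len(payload))         # one byte per decimal digit
    print(from_wire(payload) == n)
```

Since the payload length depends on the number being sent, a receiver generally needs to learn the length first, for example through a separate fixed-size message.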

After all the values required to compute the last step are obtained by a slave node, it computes the left and right sides of the congruence using the function PowerMod().

4.2. Experimental Setup
All experiments in this study are conducted using the High-Performance Computing (HPC) system provided by the Directorate of Information System and Resources (DSSDI) of Universitas Gadjah Mada. The HPC system has 15 slave nodes, each with 2 dual-core AMD Opteron™ Processor 280 CPUs (hence, 4 CPU cores), 4 GB DDR3 RAM, the OpenSUSE 11.2 64-bit operating system, and GCC compiler version 6.1.

We experiment on prime numbers ranging from 5 digits to 35 digits in length, as shown in Table 1. The seven prime numbers selected are the largest prime numbers for the corresponding numbers of digits according to [20].

Table 1. Prime numbers used in experiments

Digits   Prime Number
5        99,929
10       9,999,999,929
15       999,998,727,899,999
20       99,999,999,999,999,999,989
25       9,989,999,899,883,889,989,999,899
30       909,090,909,090,909,090,909,090,909,091
35       68,476,562,763,327,854,359,085,599,065,855,383

4.3. Result
Comparing the workload-distribution strategies. Table 2 shows the running times of the sequential as well as the parallel implementations of AKS algorithm for the seven prime numbers. The parallel implementations are run on a 60-processor message-passing system, while the sequential one is run on one of the processors. It is evident that the dynamic workload distribution (strategy 4) performs consistently and significantly better than the other strategies in all experiments, which means that this strategy is the most load-balanced among the proposed strategies. This also indicates that the overheads associated with communication between the master node and the slave nodes are insignificant compared to the computation times for the different values of $a$.

Table 2. Running times (in seconds) of the different workload-distribution strategies

                      Parallel
Digits   Sequential   Strategy 1   Strategy 2   Strategy 3   Strategy 4
5        1.57168      0.41278      0.19912      0.20151      0.11407
10       105.7        5.3          2.1          2.2          1.7
15       712.2        35.3         22.6         24.0         19.6
20       3,236.0      128.2        130.4        128.8        112.3
25       8,848.8      446.6        371.5        374.1        325.4
30       32,343.2     1,421.6      1,317.3      1,972.4      1,179.6
35       70,901.7     4,121.8      4,177.6      4,152.3      2,457.4

From Table 2, no clear pattern can be discerned among the running times of the first three workload-distribution strategies. This result is contrary to the authors' original expectation, as described in Section 3.2. The result indicates that the computation times required for the values of $a$ are not proportional to those values: a larger value of $a$ may require less computation time than a smaller one.

Speedups for various numbers of processors. The previous result shows that workload-distribution strategy 4 produces the best parallel implementation of AKS algorithm. In this part, we focus on this strategy and determine the speedups that are achievable for various numbers of processors. The result is presented in Figure 5.

Figure 5. Speedups obtained by varying the number of processors (x-axis: number of processors, 0 to 60; y-axis: speedup, 0 to 40; one curve each for 5, 10, 15, 20, 25, 30, and 35 digits)

Figure 5 shows that, for almost all numbers of digits, the speedup grows mostly linearly as the number of processors used in the computation increases. The apparent exception is the case where the number of digits is 5. For this case, there are only 275 different values of $a$ to check, and each of them requires a relatively short computation time. When the number of processors exceeds 30, the communication overhead becomes large enough to offset the computation time saved by the parallelism. Overall, the largest speedup is obtained when the number of digits is 10, and it is a bit more than 36 times the computation time of the sequential implementation.

The effects of multi-core processors. The high-performance computing system used in the experiments consists of computers with multi-core processors. When each of these cores is treated as a node, contention may occur when more than one core communicates simultaneously with the master node. In this part, for each workload-distribution strategy, we vary the number of cores per node used in the parallel computation to establish their effect on the overall computation time.
For this purpose, three scenarios are created, namely 1 core per node, 2 cores per node, and 4 cores per node. In all of these scenarios, the overall number of cores is maintained at 8 in order to set a baseline. Figure 6 depicts the result of the experiments.

From Figure 6, we can conclude that workload-distribution strategies 1, 2, and 3 are almost unaffected by the number of cores per node used in the parallel computation. This is understandable since, in these strategies, a slave node rarely communicates with the master node. Communication between a slave node and the master node occurs only during termination, namely when the slave node finds that the polynomials are not congruent for a specific value of $a$, or when it finds that the polynomials are congruent for all assigned values of $a$.

The effect for workload-distribution strategy 4, however, is stark, and the larger the prime number, the more pronounced the effect. Having more cores per node results in a longer computation time. This is in line with our expectation, since having more cores per node results in heavier use of the communication line between the master and the slave nodes. What we did not expect is for the effect to be so strong (1 core per node is more than twice as fast as 4 cores per node). This means that an HPC system with single-core nodes will produce even better results.

Figure 6. Running times for various numbers of cores per node (four panels, one per workload-distribution strategy; x-axis: digits of the prime number; y-axis: running time in seconds; series: 2 nodes (8 cores), 4 nodes (8 cores), and 8 nodes (8 cores))

5. Concluding Remarks
In this paper, we proposed a parallelization technique for AKS algorithm based on message-passing parallelism. We also developed four workload-distribution strategies for this parallelization technique. From the experiments we have conducted, we conclude that the dynamic workload-distribution strategy is the most load-balanced one. Furthermore, the difference between the dynamic strategy and the static strategies is so significant that it is difficult to envision circumstances in which one would wish to use the static ones. Overall, the dynamic strategy can achieve a speedup of up to 36 times over the sequential computation.

Nevertheless, the dynamic strategy has one obvious drawback, namely the bottleneck in the communication line towards the master node. The more nodes are involved in the parallelism, the busier the master node becomes and the heavier the load on the communication line towards the master node.
We did not manage to demonstrate this due to the limited size of the HPC system available to us. We also showed that the number of cores per node has a strong effect on the dynamic workload-distribution strategy.

Acknowledgement

The authors would like to thank the Directorate of Information System and Resources (DSSDI) of Universitas Gadjah Mada for providing the high-performance computing service used in this research.
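
As a rough illustration of the load-balancing conclusion in Section 5, the toy simulation below compares a static and a dynamic assignment of tasks with uneven costs. It is a minimal sketch under stated assumptions, not the implementation used in the experiments: the function names and the task costs are hypothetical, the contiguous-block partition is only a stand-in for a static strategy (it is not claimed to match strategies 1, 2, or 3 exactly), and communication cost is ignored entirely.

```python
import heapq


def static_block_makespan(costs, workers):
    """Finish time when tasks are split into contiguous blocks up front.

    Illustrative stand-in for a static strategy: each worker receives a
    fixed block of (almost) equal length before the computation starts.
    """
    n = len(costs)
    loads = []
    for w in range(workers):
        start = w * n // workers
        end = (w + 1) * n // workers
        loads.append(sum(costs[start:end]))
    return max(loads)


def dynamic_makespan(costs, workers):
    """Finish time when each idle worker asks the master for the next task.

    Returns (makespan, requests), where requests is the number of task
    requests the master has to serve: one per task. Communication time
    itself is NOT modelled here (an assumption of this sketch).
    """
    # Min-heap holding the time at which each worker next becomes idle.
    idle_at = [0.0] * workers
    heapq.heapify(idle_at)
    for cost in costs:
        t = heapq.heappop(idle_at)         # earliest idle worker
        heapq.heappush(idle_at, t + cost)  # it takes the next task
    return max(idle_at), len(costs)


if __name__ == "__main__":
    # Hypothetical per-task costs: cost is not proportional to task index.
    costs = [5, 5, 5, 5, 1, 1, 1, 1]
    print("static :", static_block_makespan(costs, 2))
    print("dynamic:", dynamic_makespan(costs, 2))
```

With these hypothetical costs and 2 workers, the static partition finishes after 20 time units, because one worker receives all the expensive tasks, while the dynamic assignment finishes after 12. The price is that the master serves one request per task (8 here), whereas a static worker only needs to report at termination. This is the communication load that grows with the number of workers.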