The best known deterministic polynomial-time algorithm for primality testing right now is due to
Agrawal, Kayal, and Saxena. This algorithm has a time complexity O(log15=2(n)). Although this algorithm is
polynomial, its reliance on the congruence of large polynomials results in enormous computational requirement.
In this paper, we propose a parallelization technique for this algorithm based on message-passing
parallelism together with four workload-distribution strategies. We perform a series of experiments on an
implementation of this algorithm in a high-performance computing system consisting of 15 nodes, each with
4 CPU cores. The experiments indicate that our proposed parallelization technique introduce a significant
speedup on existing implementations. Furthermore, the dynamic workload-distribution strategy performs
better than the others. Overall, the experiments show that the parallelization obtains up to 36 times speedup.