Sort Open MP
Abstract
Some algorithmic patterns are difficult to express in
OpenMP. In this paper, we use a simple sorting algorithm
to illustrate problems with recursion and the avoidance of
busy waiting. We compare several solution approaches with
respect to programming expense and performance: stacks,
nesting and a workqueue (for recursion), as well as
condition variables and the sched_yield function (for busy
waiting). Enhancements of the OpenMP specification are
suggested where we saw the need. Performance measurements
are included to back up our claims.
1. Introduction
Figure 4. Parallel part of sort_omp_nested_1.0

condition variables like this:
system with and without busy waiting. The heavy load was
built up by using four times as many threads as there are
processors available on the machine. It shows average
wall-clock time to reduce the chance of lucky scheduling
decisions.

On the SUN platform, the Pthreads solutions show
the results we expected: Program sort_pthreads_1.0
takes considerably longer than sort_pthreads_cv and
sort_pthreads_yield. There are two relatively big surprises
for us though: First, the OpenMP versions are slow
compared to the Pthreads versions (in case of the SUN
machine so slow that we decided to cancel the runs after 10
minutes). The reasons for this are not yet clear to us and
still under investigation. Second, on the AMD machine, no
performance difference is noticeable between the different
Pthreads versions. A better scheduler might account for
this, but we are still investigating this question as well.
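
To make the three waiting strategies compared above concrete, the
following is a minimal sketch in C with POSIX threads. It is not the
code of the measured programs: the names gate, wait_mode, waiter and
run_waiters are hypothetical and were chosen only for this
illustration. Each waiter blocks until a flag is set, either by
polling in a tight loop (busy waiting), by polling with a call to
sched_yield between checks, or by sleeping on a condition variable.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdbool.h>

/* All names here are hypothetical, chosen for illustration only. */
enum wait_mode { WAIT_SPIN, WAIT_YIELD, WAIT_CONDVAR };

struct gate {
    pthread_mutex_t lock;
    pthread_cond_t  opened;
    bool            is_open;  /* protected by lock */
    int             passed;   /* waiters that got through; protected by lock */
    enum wait_mode  mode;
};

/* Thread body: wait until the gate is open, then count ourselves. */
static void *waiter(void *arg)
{
    struct gate *g = arg;

    pthread_mutex_lock(&g->lock);
    while (!g->is_open) {
        if (g->mode == WAIT_CONDVAR) {
            /* Sleep until signalled; the lock is released while
               waiting and reacquired before the call returns. */
            pthread_cond_wait(&g->opened, &g->lock);
        } else {
            /* Polling: release the lock so the opener can take it. */
            pthread_mutex_unlock(&g->lock);
            if (g->mode == WAIT_YIELD)
                sched_yield();  /* offer the processor to other threads */
            pthread_mutex_lock(&g->lock);
        }
    }
    g->passed++;
    pthread_mutex_unlock(&g->lock);
    return NULL;
}

/* Start nthreads waiters (at most 64), open the gate, join them all.
   Returns the number of waiters that passed the gate. */
int run_waiters(enum wait_mode mode, int nthreads)
{
    struct gate g;
    pthread_t tid[64];
    int i, rc;

    assert(nthreads > 0 && nthreads <= 64);
    rc = pthread_mutex_init(&g.lock, NULL);
    assert(rc == 0);
    rc = pthread_cond_init(&g.opened, NULL);
    assert(rc == 0);
    g.is_open = false;
    g.passed = 0;
    g.mode = mode;

    for (i = 0; i < nthreads; i++) {
        rc = pthread_create(&tid[i], NULL, waiter, &g);
        assert(rc == 0);
    }

    pthread_mutex_lock(&g.lock);
    g.is_open = true;
    pthread_cond_broadcast(&g.opened);  /* wakes sleepers; no effect on pollers */
    pthread_mutex_unlock(&g.lock);

    for (i = 0; i < nthreads; i++) {
        rc = pthread_join(tid[i], NULL);
        assert(rc == 0);
    }
    pthread_cond_destroy(&g.opened);
    pthread_mutex_destroy(&g.lock);
    (void)rc;  /* silence unused warnings when asserts are compiled out */
    return g.passed;
}
```

All three modes produce the same result, so a call such as
run_waiters(WAIT_CONDVAR, 4) returns 4. They differ only in how much
processor time the waiting threads consume, which is the effect the
measurements under heavy load are meant to expose. The sketch
typically needs to be compiled with the -pthread option.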