Debugging and optimization of multi-thread OpenMP programs
Authors: Andrey Karpov, Evgeniy Romanovsky

Date: 20.11.2009


Abstract
The task of familiarizing programmers with parallel application development is becoming more and more urgent. This article is a brief introduction to creating multi-thread applications based on the OpenMP technology. Approaches to debugging and optimizing parallel applications are described.

In this article we discuss the OpenMP technology, whose main value is the possibility of improving and optimizing already existing code. The OpenMP standard is a set of specifications for parallelizing code in a shared-memory environment. OpenMP can be used if the standard is supported by the compiler. Besides, you need entirely new tools for the debugging stage, when errors are detected, localized and corrected, and for optimization.

A debugger for sequential code is a tool that a programmer knows very well and uses very often. It allows you to trace changes of variables' values while stepping through a program with the help of an advanced user interface. But debugging and testing multi-thread applications is a different matter entirely, and it is precisely this task that becomes the main obstacle to creating effective applications.

Debugging of a sequential program relies on the fact that the program's initial and current states are predictable, being determined by the input data. When a programmer debugs multi-thread code, he usually faces entirely new problems: different operating systems use different scheduling strategies, the computer system's load changes dynamically, process priorities can differ, and so on. Precise reconstruction of the program's state at a certain moment of its execution (a trivial thing in sequential debugging) is very difficult for a parallel program because of its nondeterministic behavior. In other words, the behavior of the processes launched in the system - their execution and waiting for execution, deadlocks and the like - depends on random events occurring in the system. What to do? It is obvious that we need entirely different means for debugging parallel code.

As parallel computer systems have become a common thing on the consumer market, demand for tools for debugging multi-thread applications has increased greatly. We will consider debugging and performance improvement of a multi-thread application based on the OpenMP technology. The full program text, separate fragments of which we will consider further, is given in Appendix 1 at the end of the article.

Let's take the sequential code of the 'Function' function given in Listing 1 as an example. This simple subprogram calculates values of some mathematical function of one argument.

double Function(int N)
{
    double x, y, s = 0;
    for (int i = 1; i <= N; i++) {
        x = (double)i/N;
        y = x;
        for (int j = 1; j <= N; j++) {
            s += j * y;
            y = y * x;
        }
    }
    return s;
}

When calling this function with the N argument equal to 15000, we get 287305025.528.

This function can be easily parallelized by means of the OpenMP standard. Let's add just one line before the first for statement (Listing 2).

double FunctionOpenMP(int N)
{
    double x, y, s = 0;
    #pragma omp parallel for num_threads(2)
    for (int i = 1; i <= N; i++) {
        x = (double)i/N;
        y = x;
        for (int j = 1; j <= N; j++) {
            s += j * y;
            y = y * x;
        }
    }
    return s;
}

Unfortunately, the code we've created is incorrect, and the result of the function is, in general, undefined. For example, it can be 298441282.231. Let's consider the possible causes.

The main cause of errors in parallel programs is incorrect work with shared resources, i.e. resources common to all launched threads, and in particular with shared variables.

This program compiles successfully in the Microsoft Visual Studio 2005 environment, and the compiler doesn't even show any warning messages. However, it is incorrect. To understand this, you should recall that variables in OpenMP programs are divided into shared, which exist as single copies and are available to all the threads, and private, which are local to a particular thread. Besides, there is a rule saying that by default all the variables in OpenMP parallel regions are shared, except for the indexes of parallel loops and variables defined inside these parallel regions.
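To illustrate these rules, here is a minimal sketch (a hypothetical example, not part of the demo program; the names are illustrative) showing which variables are shared and which are private by default:

void SharingRulesExample(int N)
{
    double a = 0;                  // declared before the region: shared by default
    #pragma omp parallel for
    for (int i = 0; i < N; i++)    // the index of the parallel loop is private
    {
        double t = i * 0.5;        // declared inside the region: private
        // writing 'a += t;' here would be a data race, because 'a' is shared
    }
}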

From the example above it is obvious that the x, y and s variables are shared, which is absolutely incorrect. The s variable should be shared, as it is the accumulator in the given algorithm. But when working with x or y, each thread calculates their next value and writes it into the corresponding variable (x or y). In this case the result depends on the order in which the parallel threads execute. In other words, if the first thread calculates the value of x, writes it into the x variable, and then, after the same operations are performed by the second thread, tries to read the value of the x variable, it will get the value written into it by the last thread, i.e. the second one. Such errors, when program behavior depends on the order in which different code sections execute, are called a race condition or data race (a "race" of computing threads; it means that unsynchronized memory accesses occur).
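To see a data race in its simplest form, consider the following sketch (again a hypothetical example, not part of the demo program): two threads increment a shared counter without synchronization, so the final value is usually less than expected.

#include <omp.h>
#include <stdio.h>

int main(void)
{
    int counter = 0;   // shared by default inside the parallel region
    #pragma omp parallel num_threads(2)
    {
        for (int i = 0; i < 1000000; i++)
            counter++; // unsynchronized read-modify-write: a data race
    }
    // We expect 2000000, but the printed value varies from run to run.
    printf("counter = %d\n", counter);
    return 0;
}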

To find such errors we need special software tools. One of them is Intel Thread Checker (the product's site: http://www.viva64.com/go.php?url=526). This program is provided as a module for the Intel VTune Performance Analyzer profiler, extending its existing means of working with multi-thread code. Intel Thread Checker allows you to detect both the errors described above and many other defects, for example, deadlocks (points of mutual locking of computational threads) and memory leaks.

When Intel Thread Checker is installed, a new project category appears in the New Project dialog of the Intel VTune Performance Analyzer application - Threading Wizards (wizards for working with threads) - among which there is Intel Thread Checker Wizard. To launch the example, you should select it and specify the path to the executable program in the next wizard window. The program will be executed, and the profiler will collect all data about the application's operation. An example of such information provided by Intel Thread Checker is shown in Figure 1.
Figure 1 - a lot of critical errors were detected during Thread Checker's operation

As we can see, the number of errors is large even for such a small program. Thread Checker groups the detected errors, estimates how critical they are for program operation, and provides their description, increasing the programmer's productivity. Besides, the Source View tab shows the application's source code with the erroneous sections marked (Figure 2).
Figure 2 - Analysis of multi-thread code by Intel Thread Checker

We should mention that Intel Thread Checker cannot detect errors in some cases: when the code rarely gets control, or is executed on a system with a different architecture. Errors can also be missed when the input data set differs greatly from the data the program processes when used by end users. All this doesn't allow us to be sure that a multi-thread program contains no errors when it is tested by dynamic analysis tools, whose results depend on the environment and the time of execution.

But there is another tool, and it is good news for OpenMP developers. This tool is VivaMP, which offers an alternative approach to verifying parallel programs. VivaMP is built as a static code analyzer and allows you to test the application code without launching it. To learn more about the VivaMP tool, visit the developers' site: http://www.viva64.com/vivamp-tool/.

VivaMP's scopes of use:

   •   Checking the correctness of the code of applications developed on the basis of the OpenMP
       technology.
   •   Aid in mastering the OpenMP technology and integrating it into existing projects.
   •   Creating parallel applications which use resources more effectively.
   •   Searching for errors in existing OpenMP applications.

The VivaMP analyzer integrates into the Visual Studio 2005/2008 environment and provides a simple interface for testing applications (Figure 3).




                  Figure 3 - Launch of VivaMP tool, integrated into Visual Studio 2005
If we launch VivaMP on our example, we'll get messages about errors in 4 different lines where incorrect modification of variables occurs (Figure 4).




                        Figure 4 - The result of VivaMP static analyzer's operation

Of course, static analysis has its disadvantages, just as dynamic analysis does. But used together, these methodologies (the Intel Thread Checker and VivaMP tools) supplement each other very well, and together they provide a rather reliable way of detecting errors in multi-thread applications.

The above-described error of writing into the x and y variables, detected by the Intel Thread Checker and VivaMP tools, can be easily corrected: you only need to add one more clause to the '#pragma omp parallel for' construction: private(x, y). Thus, these two variables will be defined as private, and each computational thread will have its own copies of x and y. But you should also keep in mind that all the threads save the calculated result by adding it to the s variable. Such errors occur when one computational thread tries to write some value into shared memory while the second thread is simultaneously reading it. In the given example this can lead to an incorrect result.

Let's consider the s += j*y instruction. Originally it is assumed that each thread adds the calculated result to the current value of the s variable, and then the same operations are executed by all the other threads. In some cases, however, the two threads begin to execute the s += j*y instruction simultaneously, that is, each of them first reads the current value of the s variable, then adds the result of the j*y multiplication to this value, and writes the final result back into the s variable.
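The following sketch shows the assumed decomposition of this single statement into separate steps (an illustration, not actual compiler output):

double UnsafeAdd(double *s, double addend)
{
    double tmp = *s;     // step 1: read the current value of s
    tmp = tmp + addend;  // step 2: add the product to the private copy
    *s = tmp;            // step 3: write the result back; if another thread
                         // performed step 1 before this write, its subsequent
                         // write will discard this thread's update
    return *s;
}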

Unlike the reading operation, which can be performed in parallel and is quite quick, the writing operation is always sequential. Consequently, if the first thread writes the new value first, the second thread, performing its writing operation after that, will erase the result of the first thread's calculations, because both computational threads read one and the same value of the s variable and then write their data into it. In other words, the value of the s variable that the second thread writes into shared memory doesn't take the calculations performed by the first thread into account at all. You can avoid such a situation by making sure that at any moment only one thread is allowed to execute the s += j*y operation. Such operations are called indivisible or atomic. To tell the compiler that some instruction is atomic, we use the #pragma omp atomic construction. The program code in which the described operations are corrected is shown in Listing 3.

double FixedFunctionOpenMP(int N)
{
    double x, y, s = 0;
    #pragma omp parallel for private(x,y) num_threads(2)
    for (int i = 1; i <= N; i++) {
        x = (double)i/N;
        y = x;
        for (int j = 1; j <= N; j++) {
            #pragma omp atomic
            s += j * y;
            y = y * x;
        }
    }
    return s;
}

Having recompiled the program and analyzed it once again in Thread Checker, we'll see that it doesn't contain critical errors. Only two messages are shown, telling that the parallel threads end when reaching the return operator in the FixedFunctionOpenMP function. In the given example this is fine, because only the code inside this function is parallelized. The VivaMP static analyzer won't show any messages on this code at all, as it is absolutely correct from its viewpoint.

But some work still remains. Let's see if our code has really become more effective after parallelizing. Let's measure the execution time of three functions: 1 - sequential, 2 - parallel incorrect, 3 - parallel correct. The results of this measurement for N=15000 are given in Table 1.

Function                                                          Result             Execution time
Sequential variant of the function                                287305025.528      0.5781 seconds
Incorrect variant of the parallel function                        298441282.231      2.9531 seconds
Correct variant of the parallel function using atomic directive   287305025.528      36.8281 seconds
Table 1 - Results of functions' operation

What do we see in the table? The incorrect parallel variant of the function works several times slower. But we are not interested in that function. The problem is that the correct variant works more than 60 times slower than the sequential one. Do we need such parallelism? Of course not.
The reason is that we have chosen a very ineffective way of summing the result into the s variable by using the atomic directive. This approach causes the threads to wait for each other very often. To avoid constant lock waits during the atomic summing operation, we can use the special reduction clause. The reduction option specifies that the variable will receive the combined value at the exit from the parallel block. The following operations are permissible: +, *, -, &, |, ^, &&, ||. The modified variant of the function is shown in Listing 4.

double OptimizedFunction(int N)
{
    double x, y, s = 0;
    #pragma omp parallel for private(x,y) num_threads(2) reduction(+: s)
    for (int i = 1; i <= N; i++) {
        x = (double)i/N;
        y = x;
        for (int j = 1; j <= N; j++) {
            s += j * y;
            y = y * x;
        }
    }
    return s;
}

In this case we get a variant of the function that is not only correct but also of higher performance (Table 2). The speed of calculations has almost doubled compared to the sequential variant (more exactly, it has increased by a factor of 0.5781 / 0.3125 ≈ 1.85), which is a good result for such functions.

Function                                                             Result            Execution time
Sequential variant of the function                                   287305025.528     0.5781 seconds
Incorrect variant of the parallel function                           298441282.231     2.9531 seconds
Correct variant of the parallel function using atomic directive      287305025.528     36.8281 seconds
Correct variant of the parallel function using reduction directive   287305025.528     0.3125 seconds
Table 2 - Results of functions' operation
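To understand why reduction is so much faster than atomic, it helps to look at roughly what the compiler does behind the scenes. The following sketch is an assumed hand-written equivalent (for illustration only, not the compiler's actual code): each thread accumulates into a private partial sum, and synchronization happens only once per thread instead of on every iteration.

double ManualReduction(int N)
{
    double s = 0;
    #pragma omp parallel num_threads(2)
    {
        double local_s = 0;          // private partial sum, no contention
        #pragma omp for
        for (int i = 1; i <= N; i++) {
            double x = (double)i/N;
            double y = x;
            for (int j = 1; j <= N; j++) {
                local_s += j * y;    // unsynchronized: the variable is private
                y = y * x;
            }
        }
        #pragma omp atomic           // one synchronized addition per thread
        s += local_s;
    }
    return s;
}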

In conclusion, I would like to point out once again that a working parallel program is not always an effective one. And although parallel programming provides many ways to increase code effectiveness, it demands attention and good knowledge of the technologies used. Fortunately, there are tools such as Intel Thread Checker and VivaMP which greatly simplify the creation and testing of multi-thread applications. Dear readers, I wish you good luck in mastering this new field of knowledge!
Appendix 1. Text of the demo program
#include "stdafx.h"

#include <omp.h>

#include <stdlib.h>

#include <windows.h>

class VivaMeteringTimeStruct {
public:
    VivaMeteringTimeStruct()
      { m_userTime = GetCurrentUserTime(); }
    ~VivaMeteringTimeStruct()
      { printf("Time = %.4f seconds\n", GetUserSeconds()); }
    double GetUserSeconds();
private:
    __int64 GetCurrentUserTime() const;
    __int64 m_userTime;
};

__int64 VivaMeteringTimeStruct::GetCurrentUserTime() const
{
    FILETIME creationTime, exitTime, kernelTime, userTime;
    GetThreadTimes(GetCurrentThread(), &creationTime,
                   &exitTime, &kernelTime, &userTime);
    __int64 curTime;
    curTime = userTime.dwHighDateTime;
    curTime <<= 32;
    curTime += userTime.dwLowDateTime;
    return curTime;
}

double VivaMeteringTimeStruct::GetUserSeconds()
{
    __int64 delta = GetCurrentUserTime() - m_userTime;
    return double(delta) / 10000000.0;
}

double Function(int N)
{
    double x, y, s = 0;
    for (int i = 1; i <= N; i++) {
        x = (double)i/N;
        y = x;
        for (int j = 1; j <= N; j++) {
            s += j * y;
            y = y * x;
        }
    }
    return s;
}

double FunctionOpenMP(int N)
{
    double x, y, s = 0;
    #pragma omp parallel for num_threads(2)
    for (int i = 1; i <= N; i++) {
        x = (double)i/N;
        y = x;
        for (int j = 1; j <= N; j++) {
            s += j * y;
            y = y * x;
        }
    }
    return s;
}

double FixedFunctionOpenMP(int N)
{
    double x, y, s = 0;
    #pragma omp parallel for private(x,y) num_threads(2)
    for (int i = 1; i <= N; i++) {
        x = (double)i/N;
        y = x;
        for (int j = 1; j <= N; j++) {
            #pragma omp atomic
            s += j * y;
            y = y * x;
        }
    }
    return s;
}

double OptimizedFunction(int N)
{
    double x, y, s = 0;
    #pragma omp parallel for private(x,y) num_threads(2) reduction(+: s)
    for (int i = 1; i <= N; i++) {
        x = (double)i/N;
        y = x;
        for (int j = 1; j <= N; j++) {
            s += j * y;
            y = y * x;
        }
    }
    return s;
}

int _tmain(int , _TCHAR* [])
{
    int N = 15000;
    {
        VivaMeteringTimeStruct Timer;
        printf("Result = %.3f   ", Function(N));
    }
    {
        VivaMeteringTimeStruct Timer;
        printf("Result = %.3f   ", FunctionOpenMP(N));
    }
    {
        VivaMeteringTimeStruct Timer;
        printf("Result = %.3f   ", FixedFunctionOpenMP(N));
    }
    {
        VivaMeteringTimeStruct Timer;
        printf("Result = %.3f   ", OptimizedFunction(N));
    }
    return 0;
}
