.. _omp112:

Found thread data sharing on the GPU. Expect degraded performance due to data globalization. [OMP112]
=====================================================================================================

This missed remark indicates that a globalized value was found on the target
device that was replaced neither with stack memory by :ref:`OMP110 <omp110>`
nor with shared memory by :ref:`OMP111 <omp111>`. Globalization that has not
been removed must be handled by the runtime and will significantly impact
performance.

The OpenMP standard requires that threads be able to share their data with
each other. However, this is not true by default when offloading to a target
device such as a GPU. Threads on a GPU cannot share their data unless it is
first placed in global or shared memory. To generate standards-compliant code,
the Clang compiler will globalize any variables that could potentially be
shared between the threads. In the majority of cases, globalized variables can
either be returned to a thread-local stack or moved to shared memory. However,
in a few cases globalization is genuinely required, and it will cause a
performance penalty.

Examples
--------

This example shows legitimate data sharing on the device. It is a convoluted
example, but it is completely compliant with the OpenMP standard: thread 1
publishes the address of its local ``x`` through ``p``, and after the barrier
every thread reads it. Without globalization, this code would produce different
results on different target devices.

.. code-block:: c++

  #include <omp.h>
  #include <cstdio>

  #pragma omp declare target
  static int *p;
  #pragma omp end declare target

  void foo() {
    int x = omp_get_thread_num();
    if (omp_get_thread_num() == 1)
      p = &x;

  #pragma omp barrier

    printf("Thread %d: %d\n", omp_get_thread_num(), *p);
  }

  int main() {
  #pragma omp target parallel
    foo();
  }

.. code-block:: console

  $ clang++ -fopenmp -fopenmp-targets=nvptx64 -O1 -Rpass-missed=openmp-opt omp112.cpp
  omp112.cpp:9:7: remark: Found thread data sharing on the GPU. Expect degraded performance
  due to data globalization. [OMP112] [-Rpass-missed=openmp-opt]
    int x = omp_get_thread_num();
        ^

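If the sharing in the example above is intentional, one possible rewrite (a
sketch, not an official recommendation) is to keep the shared value in a
``declare target`` global, which already lives in device global memory, instead
of publishing the address of a thread's local variable through ``p``. Since no
local variable has its address escape, there should be nothing for the compiler
to globalize, so we would not expect this version to trigger OMP112. The name
``shared_x`` is hypothetical, and the ``_OPENMP`` fallback is an assumption
added only so the sketch also compiles as plain C++.

.. code-block:: c++

  #include <cstdio>

  #ifdef _OPENMP
  #include <omp.h>
  #else
  // Serial fallback (assumption for this sketch) so it also builds without -fopenmp.
  static int omp_get_thread_num() { return 0; }
  #endif

  #pragma omp declare target
  // Hypothetical shared value kept in device global memory, which every thread
  // can already see, so no thread-local variable has its address published.
  static int shared_x = 0;
  #pragma omp end declare target

  void foo() {
    // Only thread 1 writes the value; the barrier orders that write before
    // any thread reads it.
    if (omp_get_thread_num() == 1)
      shared_x = omp_get_thread_num();

  #pragma omp barrier

    std::printf("Thread %d: %d\n", omp_get_thread_num(), shared_x);
  }

  int main() {
  #pragma omp target parallel
    foo();
  }
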
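A less convoluted example of globalization that cannot be removed occurs when
calling a function that is not visible from the current translation unit.
Because the compiler cannot see what ``use`` does with the pointer, it must
assume that ``x`` may be shared with other threads.
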
.. code-block:: c++

  extern void use(int *x);

  void foo() {
    int x;
    use(&x);
  }

  int main() {
  #pragma omp target parallel
    foo();
  }

.. code-block:: console

  $ clang++ -fopenmp -fopenmp-targets=nvptx64 -O1 -Rpass-missed=openmp-opt omp112.cpp
  omp112.cpp:4:7: remark: Found thread data sharing on the GPU. Expect degraded performance
  due to data globalization. [OMP112] [-Rpass-missed=openmp-opt]
    int x;
        ^

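If the definition of ``use`` can be made visible to the compiler, for example by
moving it into the same translation unit as in this sketch, the optimizer can
check that ``x`` never escapes the thread that owns it. In that case we would
expect it to move ``x`` back to the stack, as described for
:ref:`OMP110 <omp110>`, rather than reporting OMP112. The body given to ``use``
here is a placeholder assumption; what matters is that it does not store the
pointer anywhere other threads could reach.

.. code-block:: c++

  // Hypothetical definition of `use`, now visible in this translation unit.
  // It only writes through the pointer and never stores the pointer itself.
  void use(int *x) { *x = 0; }

  void foo() {
    int x;
    use(&x);
  }

  int main() {
  #pragma omp target parallel
    foo();
  }
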
Diagnostic Scope
----------------

OpenMP target offloading missed remark.