.. _omp112:

Found thread data sharing on the GPU. Expect degraded performance due to data globalization. [OMP112]
=====================================================================================================

This missed remark indicates that a globalized value was found on the target
device that was neither replaced with stack memory by :ref:`OMP110 <omp110>`
nor with shared memory by :ref:`OMP111 <omp111>`. Globalization that has not
been removed must be handled by the runtime and will significantly impact
performance.

The OpenMP standard requires that threads are able to share their data with
each other. However, this is not true by default when offloading to a target
device such as a GPU. Threads on a GPU cannot share their data unless it is
first placed in global or shared memory. In order to create standards-compliant
code, the Clang compiler will globalize any variables that could potentially be
shared between the threads. In the majority of cases, globalized variables can
either be returned to a thread-local stack or pushed to shared memory. However,
in a few cases globalization is necessary and will cause a performance penalty.
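
As a hedged illustration of why such sharing is needed, the sketch below
declares ``x`` in the initial thread of a target region and then reads it from
the threads of a nested ``parallel`` region. The function name
``read_shared_local`` is hypothetical, and whether the compiler keeps ``x``
globalized, moves it to shared memory, or optimizes it away entirely is an
assumption that depends on the compiler version and optimization level.

.. code-block:: c++

  // Hypothetical helper for illustration; not part of any OpenMP or LLVM API.
  int read_shared_local() {
    int result = 0;
    #pragma omp target map(tofrom : result)
    {
      int x = 7; // Created by the initial thread of the target region.
      #pragma omp parallel
      {
        // Every thread of the parallel region may read x, so x cannot simply
        // live in the private stack of the thread that created it.
        #pragma omp single
        result = x + 1;
      }
    }
    return result;
  }

Because a single thread performs the write, ``read_shared_local()`` returns
``8`` regardless of how many threads execute the parallel region.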

Examples
--------

This example shows legitimate data sharing on the device. It is a convoluted
example, but it is completely compliant with the OpenMP standard. Without
globalization, this code would produce different results on different target
devices.

.. code-block:: c++

  #include <omp.h>
  #include <cstdio>

  #pragma omp declare target
  static int *p; // Device-visible pointer that every thread can read.
  #pragma omp end declare target

  void foo() {
    int x = omp_get_thread_num();
    if (omp_get_thread_num() == 1)
      p = &x; // Thread 1 publishes the address of its local variable.

    #pragma omp barrier

    // Every thread dereferences thread 1's local variable through p.
    printf("Thread %d: %d\n", omp_get_thread_num(), *p);
  }

  int main() {
    #pragma omp target parallel
    foo();
  }

.. code-block:: console

  $ clang++ -fopenmp -fopenmp-targets=nvptx64 -O1 -Rpass-missed=openmp-opt omp112.cpp
  omp112.cpp:9:7: remark: Found thread data sharing on the GPU. Expect degraded performance
  due to data globalization. [OMP112] [-Rpass-missed=openmp-opt]
    int x = omp_get_thread_num();
        ^

A less convoluted example of globalization that cannot be removed occurs when
calling functions that are not visible from the current translation unit.

.. code-block:: c++

  extern void use(int *x); // Defined in another translation unit.

  void foo() {
    int x;
    use(&x); // The pointer escapes to code the compiler cannot inspect.
  }

  int main() {
    #pragma omp target parallel
    foo();
  }

.. code-block:: console

  $ clang++ -fopenmp -fopenmp-targets=nvptx64 -O1 -Rpass-missed=openmp-opt omp112.cpp
  omp112.cpp:4:7: remark: Found thread data sharing on the GPU. Expect degraded performance
  due to data globalization. [OMP112] [-Rpass-missed=openmp-opt]
    int x;
        ^
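
One possible way to avoid this remark, sketched here under the assumption that
the callee's source can be made available, is to define the function in the
same translation unit. The optimizer can then inspect how the pointer is used
and may be able to move ``x`` back to the thread-local stack (see
:ref:`OMP110 <omp110>`). The body of ``use``, the changed return type of
``foo``, and the ``run_on_device`` wrapper are all hypothetical, and whether
the remark actually disappears depends on the compiler version and
optimization level.

.. code-block:: c++

  // Hypothetical variant: use() is defined here instead of being extern, so
  // the compiler can see that the pointer is neither stored nor shared.
  static void use(int *x) { *x += 1; }

  static int foo() {
    int x = 41; // Address is passed to use() but never leaves this thread.
    use(&x);
    return x;
  }

  // Hypothetical wrapper that runs foo() on the device and copies one
  // thread's result back to the host.
  int run_on_device() {
    int result = 0;
    #pragma omp target parallel map(tofrom : result)
    {
      int r = foo();
      #pragma omp single
      result = r;
    }
    return result;
  }

Calling ``run_on_device()`` from ``main`` returns ``42``. Compiling with
``-Rpass=openmp-opt`` in addition to ``-Rpass-missed=openmp-opt`` should show
whether the globalization was removed for a given toolchain.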

Diagnostic Scope
----------------

OpenMP target offloading missed remark.