.. _omp111:

Replaced globalized variable with X bytes of shared memory. [OMP111]
====================================================================

This optimization occurs when a globalized variable's data is shared between
multiple threads but requires only a constant amount of memory that can be
determined at compile time. This is the case when a single thread creates the
memory, which is then shared with every other thread. The memory can then be
placed in a static buffer of shared memory on the device. This optimization
allows users to declare shared memory on the device without using OpenMP's
custom allocators.

Globalization occurs when a pointer to a thread-local variable escapes the
current scope. If a single thread is known to be responsible for creating and
sharing the data, it can instead be mapped directly to the device's shared
memory. Checking whether only a single thread can execute an instruction
requires that the parent functions have internal linkage. Otherwise, an external
caller could invalidate this analysis by having multiple threads call that
function. For this reason the optimization pass makes internal copies of each
function, but it is still recommended to mark functions as internal using
keywords like ``static`` whenever possible.

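
The advice about internal linkage can be illustrated with a minimal sketch. The
names ``fill_tile``, ``sum_tile``, and ``TILE`` are hypothetical and chosen only
for this illustration. The address of ``tile`` escapes into a helper and into a
parallel region, which makes it a candidate for globalization; because the
helper is ``static``, no external caller can invoke it from multiple threads.
Whether the remark is actually emitted depends on the compiler version and
flags.

.. code-block:: c++

  constexpr int TILE = 64; // hypothetical compile-time tile size

  // Internal linkage: every caller lives in this translation unit, so the
  // compiler can reason about which threads execute this function.
  static void fill_tile(double *tile, double value) {
    for (int i = 0; i < TILE; ++i)
      tile[i] = value;
  }

  double sum_tile(double value) {
    double sum = 0.0;
  #pragma omp target map(tofrom : sum)
    {
      double tile[TILE];      // created by the single initial thread
      fill_tile(tile, value); // the address of tile escapes into the helper

  #pragma omp parallel for reduction(+ : sum)
      for (int i = 0; i < TILE; ++i)
        sum += tile[i];       // tile is read by every thread
    }
    return sum;
  }

Without OpenMP enabled the pragmas are ignored and the function still computes
``TILE * value`` on the host.
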
Example
-------

This optimization should apply to any variable declared in an OpenMP target
region that is then shared with every thread in a parallel region. This allows
the user to declare shared memory without using custom allocators. A simple
stencil calculation shows how this can be used.

.. code-block:: c++

  // MC, NC, and dX are compile-time constants assumed to be defined elsewhere.
  void stencil(int M, int N, double *X, double *Y) {
  #pragma omp target teams distribute collapse(2) \
    map(to : X [0:M * N]) map(tofrom : Y [0:M * N])
    for (int i0 = 0; i0 < M; i0 += MC) {
      for (int j0 = 0; j0 < N; j0 += NC) {
        // Tile buffer shared by every thread in the team.
        double sX[MC][NC];

        // Load one tile of X into the shared buffer.
  #pragma omp parallel for collapse(2) shared(sX) default(firstprivate)
        for (int i1 = 0; i1 < MC; ++i1)
          for (int j1 = 0; j1 < NC; ++j1)
            sX[i1][j1] = X[(i0 + i1) * N + (j0 + j1)];

        // Apply the five-point stencil to the interior of the tile.
  #pragma omp parallel for collapse(2) shared(sX) default(firstprivate)
        for (int i1 = 1; i1 < MC - 1; ++i1)
          for (int j1 = 1; j1 < NC - 1; ++j1)
            Y[(i0 + i1) * N + (j0 + j1)] = (sX[i1 + 1][j1] + sX[i1 - 1][j1] +
                                            sX[i1][j1 + 1] + sX[i1][j1 - 1] +
                                            -4.0 * sX[i1][j1]) / (dX * dX);
      }
    }
  }

.. code-block:: console

  $ clang++ -fopenmp -fopenmp-targets=nvptx64 -O1 -Rpass=openmp-opt -fopenmp-version=51 omp111.cpp
  omp111.cpp:10:14: remark: Replaced globalized variable with 8192 bytes of shared memory. [OMP111]
        double sX[MC][NC];
               ^

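
The byte count in the remark is the static size of the array. As a sanity
check, assuming ``MC`` and ``NC`` are both 32 (the example above does not state
their values, so these are illustrative) and an 8-byte ``double``, the tile
occupies 32 * 32 * 8 = 8192 bytes:

.. code-block:: c++

  #include <cstddef>

  // Assumed tile dimensions; the original example does not show them.
  constexpr int MC = 32;
  constexpr int NC = 32;

  // Size in bytes of a tile declared as `double sX[MC][NC]`.
  constexpr std::size_t tile_bytes() { return sizeof(double[MC][NC]); }

  // Arrays are contiguous, so the size is the element count times the
  // element size.
  static_assert(tile_bytes() == MC * NC * sizeof(double),
                "unexpected array size");
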
The default data-sharing attribute for variables captured in an OpenMP parallel
region is ``shared``. Sharing a variable means taking a pointer to it, which
ultimately results in globalization: the variable is mapped to shared memory
even when it could have been placed in registers. To avoid this, make sure each
variable that can be copied into the region is marked ``firstprivate``, either
explicitly or by using the OpenMP 5.1 feature ``default(firstprivate)``.

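
A minimal sketch of the explicit form follows. The function ``scale`` and the
variable ``twice`` are hypothetical names used only for illustration. Every
variable the loop needs is listed in an explicit ``firstprivate`` clause, so
each thread receives its own copy and no pointer to the originals has to be
captured.

.. code-block:: c++

  void scale(double *data, int n, double factor) {
  #pragma omp target map(tofrom : data[0 : n])
    {
      double twice = 2.0 * factor; // local to the target region

      // Without the clause these variables would default to shared in the
      // parallel region and be captured by address.
  #pragma omp parallel for firstprivate(twice, data, n)
      for (int i = 0; i < n; ++i)
        data[i] *= twice;
    }
  }

The explicit clause works with any OpenMP version, whereas
``default(firstprivate)`` requires OpenMP 5.1 support in the compiler.
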
Diagnostic Scope
----------------

OpenMP target offloading optimization remark.