.. raw:: html

  <style type="text/css">
    .none { background-color: #FFCCCC }
    .part { background-color: #FFFF99 }
    .good { background-color: #CCFF99 }
  </style>

.. role:: none
.. role:: part
.. role:: good

.. contents::
   :local:

==================
OpenCL Support
==================

Clang has complete support of OpenCL C versions from 1.0 to 2.0.

Clang also supports :ref:`the C++ for OpenCL kernel language <cxx_for_opencl_impl>`.

There is ongoing work to support :ref:`OpenCL 3.0 <opencl_300>`.

There are also other :ref:`new and experimental features <opencl_experimenal>` available.

For general issues and bugs with OpenCL in Clang, refer to `Bugzilla
<https://bugs.llvm.org/buglist.cgi?component=OpenCL&list_id=172679&product=clang&resolution=--->`__.

Internals Manual
================

This section acts as internal documentation for the design of OpenCL features
as well as some important implementation aspects. It is primarily targeted
at advanced users and toolchain developers integrating frontend
functionality as a component.

OpenCL Metadata
---------------

Clang uses metadata to provide additional OpenCL semantics in the IR that are
needed by backends and the OpenCL runtime.

Each kernel will have function metadata attached to it, specifying the arguments.
Kernel argument metadata is used to provide source level information for querying
at runtime, for example using the `clGetKernelArgInfo
<https://www.khronos.org/registry/OpenCL/specs/opencl-1.2.pdf#167>`_
call.

Note that ``-cl-kernel-arg-info`` enables more information about the original
kernel code to be added, e.g. kernel parameter names will appear in the OpenCL
metadata along with other information.

The IDs used to encode OpenCL's logical address spaces in the argument info
metadata follow the SPIR address space mapping as defined in the SPIR
specification `section 2.2
<https://www.khronos.org/registry/spir/specs/spir_spec-2.0.pdf#18>`_.

OpenCL Specific Options
-----------------------

In addition to the options described in :doc:`UsersManual`, there are the
following options specific to the OpenCL frontend.

.. _opencl_cl_ext:

.. option:: -cl-ext

Enables or disables support of OpenCL extensions. All OpenCL targets provide a
list of extensions that they support. Clang allows amending this list using the
``-cl-ext`` flag with a comma-separated list of extensions prefixed with
``'+'`` or ``'-'``.
The syntax: ``-cl-ext=<(['-'|'+']<extension>[,])+>``, where extensions
can be either one of `the OpenCL published extensions
<https://www.khronos.org/registry/OpenCL>`_
or any vendor extension. Alternatively, ``'all'`` can be used to enable
or disable all known extensions.

Note that this is a frontend-only flag and therefore it requires the use of
flags that forward options to the frontend, e.g. ``-cc1`` or ``-Xclang``.

Example disabling double support for the 64-bit SPIR target:

.. code-block:: console

  $ clang -cc1 -triple spir64-unknown-unknown -cl-ext=-cl_khr_fp64 test.cl

Disabling all extensions except half-precision support (``cl_khr_fp16``) for the
R600 AMD GPU target can be done using:

.. code-block:: console

  $ clang -cc1 -triple r600-unknown-unknown -cl-ext=-all,+cl_khr_fp16 test.cl
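
The effect of a ``-cl-ext`` list can be illustrated with a small standalone C++
model. This is only a sketch of the syntax described in this section, not
Clang's implementation: the function ``applyClExt``, the set-based
representation, and the assumption that entries are applied from left to right
(so that later entries override earlier ones) are all hypothetical. Under these
assumptions, applying ``-all,+cl_khr_fp16`` to any default set leaves only
``cl_khr_fp16`` enabled.

.. code-block:: c++

  #include <set>
  #include <sstream>
  #include <string>

  // Hypothetical model of -cl-ext=<list>; this is not Clang's actual code.
  // `known` holds every extension the model knows about and `enabled` holds
  // the target's default set. Entries are applied left to right (assumption).
  std::set<std::string> applyClExt(const std::set<std::string> &known,
                                   std::set<std::string> enabled,
                                   const std::string &list) {
    std::istringstream in(list);
    std::string item;
    while (std::getline(in, item, ',')) {
      // Entries without a '+' or '-' prefix are skipped in this sketch.
      if (item.size() < 2 || (item[0] != '+' && item[0] != '-'))
        continue;
      const bool enable = item[0] == '+';
      const std::string name = item.substr(1);
      if (name == "all") {
        // 'all' enables or disables every known extension at once.
        if (enable)
          enabled = known;
        else
          enabled.clear();
      } else if (enable) {
        enabled.insert(name);
      } else {
        enabled.erase(name);
      }
    }
    return enabled;
  }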

.. _opencl_fake_address_space_map:

.. option:: -ffake-address-space-map

Overrides the target address space map with a fake map.
This allows adding explicit address space IDs to the bitcode for non-segmented
memory architectures that do not have separate IDs for each of the OpenCL
logical address spaces by default. Passing ``-ffake-address-space-map`` will
add/override address spaces of the compilation target with the following values:
``1-global``, ``2-constant``, ``3-local``, ``4-generic``. The private address
space is represented by the absence of an address space attribute in the IR (see
also :ref:`the section on the address space attribute <opencl_addrsp>`).

.. code-block:: console

  $ clang -cc1 -ffake-address-space-map test.cl

Note that this is a frontend-only flag and therefore it requires the use of
flags that forward options to the frontend, e.g. ``-cc1`` or ``-Xclang``.

OpenCL builtins
---------------

**Clang builtins**

There are some standard OpenCL functions that are implemented as Clang builtins:

- All pipe functions from `section 6.13.16.2/6.13.16.3
  <https://www.khronos.org/registry/cl/specs/opencl-2.0-openclc.pdf#160>`_ of
  the OpenCL v2.0 kernel language specification.

- Address space qualifier conversion functions ``to_global``/``to_local``/``to_private``
  from `section 6.13.9
  <https://www.khronos.org/registry/cl/specs/opencl-2.0-openclc.pdf#101>`_.

- All the ``enqueue_kernel`` functions from `section 6.13.17.1
  <https://www.khronos.org/registry/cl/specs/opencl-2.0-openclc.pdf#164>`_ and
  enqueue query functions from `section 6.13.17.5
  <https://www.khronos.org/registry/cl/specs/opencl-2.0-openclc.pdf#171>`_.

**Fast builtin function declarations**

The implementation of the fast builtin function declarations (available via the
:ref:`-fdeclare-opencl-builtins option <opencl_fast_builtins>`) consists of the
following main components:

- A TableGen definitions file ``OpenCLBuiltins.td``. This contains a compact
  representation of the supported builtin functions. When adding new builtin
  function declarations, this is normally the only file that needs modifying.

- A Clang TableGen emitter defined in ``ClangOpenCLBuiltinEmitter.cpp``. During
  Clang build time, the emitter reads the TableGen definition file and
  generates ``OpenCLBuiltins.inc``. This generated file contains various tables
  and functions that capture the builtin function data from the TableGen
  definitions in a compact manner.

- OpenCL specific code in ``SemaLookup.cpp``. When ``Sema::LookupBuiltin``
  encounters a potential builtin function, it will check if the name corresponds
  to a valid OpenCL builtin function. If so, all overloads of the function are
  inserted using ``InsertOCLBuiltinDeclarationsFromTable`` and overload
  resolution takes place.

.. _opencl_addrsp:

Address spaces attribute
------------------------

Clang has arbitrary address space support using the ``address_space(N)``
attribute, where ``N`` is an integer number in the range specified in the
Clang source code. These address spaces can be used along with the OpenCL
address spaces. However, when such address spaces are converted to/from OpenCL
address spaces, the behavior is not governed by the OpenCL specification.

An OpenCL implementation provides a list of standard address spaces using
keywords: ``private``, ``local``, ``global``, and ``generic``. In the AST and
in the IR each of the address spaces will be represented by a unique number
provided in the Clang source code. The specific IDs for an address space do not
have to match between the AST and the IR. Typically, in the AST address space
numbers represent logical segments, while in the IR they represent physical
segments.
Therefore, machines with flat memory segments can map all AST address space
numbers to the same physical segment ID or skip the address space attribute
completely while generating the IR. However, if the address space information
is needed by the IR passes, e.g. to improve alias analysis, it is recommended
to keep it and only lower to reflect physical memory segments in the late
machine passes. The mapping between logical and target address spaces is
specified in Clang's source code.
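
The distinction between logical and target address spaces can be sketched as a
small lookup table in standalone C++. The enum ``LogicalAS``, the type
``ASMap``, and the function ``toTargetAS`` are hypothetical names and do not
mirror Clang's data structures. The first table uses the values listed for
``-ffake-address-space-map``, and the second models a flat-memory target that
maps every logical address space to the same ID.

.. code-block:: c++

  #include <array>
  #include <cstddef>

  // Hypothetical logical (AST-level) address spaces; for illustration only.
  enum class LogicalAS : std::size_t { Private, Global, Constant, Local, Generic };

  // One target address space ID per logical address space.
  using ASMap = std::array<unsigned, 5>;

  // IDs listed for -ffake-address-space-map. Here 0 stands for "no address
  // space attribute", which is how the private address space appears in the IR.
  const ASMap FakeMap = {0, 1, 2, 3, 4};

  // A flat-memory target may map every logical address space to a single ID.
  const ASMap FlatMap = {0, 0, 0, 0, 0};

  // Looks up the target address space ID for a logical address space.
  unsigned toTargetAS(const ASMap &map, LogicalAS as) {
    return map[static_cast<std::size_t>(as)];
  }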

.. _cxx_for_opencl_impl:

C++ for OpenCL Implementation Status
====================================

Clang implements language version 1.0 published in `the official
release of C++ for OpenCL Documentation
<https://github.com/KhronosGroup/OpenCL-Docs/releases/tag/cxxforopencl-v1.0-r1>`_.

Limited support of experimental C++ libraries is described in the :ref:`experimental features <opencl_experimenal>`.

Bugzilla bugs for this functionality are typically prefixed
with '[C++4OpenCL]' - click `here
<https://bugs.llvm.org/buglist.cgi?component=OpenCL&list_id=204139&product=clang&query_format=advanced&resolution=---&short_desc=%5BC%2B%2B4OpenCL%5D&short_desc_type=allwordssubstr>`_
to view the full bug list.


Missing features or with limited support
----------------------------------------

- Use of ObjC blocks is disabled and therefore the ``enqueue_kernel`` builtin
  function is not supported currently. It is expected that if support for this
  feature is added in the future, it will utilize C++ lambdas instead of ObjC
  blocks.

- IR generation for global destructors is incomplete (See:
  `PR48047 <https://llvm.org/PR48047>`_).

- There is no distinct file extension for sources that are to be compiled
  in C++ for OpenCL mode (See: `PR48097 <https://llvm.org/PR48097>`_).

.. _opencl_300:

OpenCL 3.0 Implementation Status
================================

The following table provides an overview of features in OpenCL C 3.0 and their
implementation status.

.. list-table::
   :header-rows: 1

   * - Category
     - Feature
     - Status
     - Reviews
   * - Command line interface
     - New value for ``-cl-std`` flag
     - :good:`done`
     - https://reviews.llvm.org/D88300
   * - Predefined macros
     - New version macro
     - :good:`done`
     - https://reviews.llvm.org/D88300
   * - Predefined macros
     - Feature macros
     - :part:`worked on`
     - https://reviews.llvm.org/D89869
   * - Feature optionality
     - Generic address space
     - :none:`unclaimed`
     -
   * - Feature optionality
     - Builtin function overloads with generic address space
     - :part:`worked on`
     - https://reviews.llvm.org/D92004
   * - Feature optionality
     - Program scope variables in global memory
     - :none:`unclaimed`
     -
   * - Feature optionality
     - 3D image writes including builtin functions
     - :none:`unclaimed`
     -
   * - Feature optionality
     - read_write images including builtin functions
     - :none:`unclaimed`
     -
   * - Feature optionality
     - C11 atomics memory scopes, ordering and builtin function
     - :part:`worked on`
     - https://reviews.llvm.org/D92004 (functions only)
   * - Feature optionality
     - Device-side kernel enqueue including builtin functions
     - :none:`unclaimed`
     -
   * - Feature optionality
     - Pipes including builtin functions
     - :part:`worked on`
     - https://reviews.llvm.org/D92004 (functions only)
   * - Feature optionality
     - Work group collective functions
     - :part:`worked on`
     - https://reviews.llvm.org/D92004
   * - New functionality
     - RGBA vector components
     - :none:`unclaimed`
     -
   * - New functionality
     - Subgroup functions
     - :part:`worked on`
     - https://reviews.llvm.org/D92004
   * - New functionality
     - Atomic mem scopes: subgroup, all devices including functions
     - :part:`worked on`
     - https://reviews.llvm.org/D92004 (functions only)

.. _opencl_experimenal:

Experimental features
=====================

Clang provides the following new WIP features for developers to experiment
with, give early feedback on, or contribute further improvements to.
Feel free to contact us on `cfe-dev
<https://lists.llvm.org/mailman/listinfo/cfe-dev>`_ or via `Bugzilla
<https://bugs.llvm.org/>`__.

.. _opencl_fast_builtins:

Fast builtin function declarations
----------------------------------

In addition to regular header includes with builtin types and functions using
``-finclude-default-header`` explained in :doc:`UsersManual`, Clang
supports a fast mechanism to declare builtin functions with
``-fdeclare-opencl-builtins``. This does not declare the builtin types and
therefore it has to be used in combination with ``-finclude-default-header``
if full functionality is required.

**Example of Use**:

.. code-block:: console

  $ clang -Xclang -fdeclare-opencl-builtins test.cl

Note that this is a frontend-only flag and therefore it requires the use of
flags that forward options to the frontend, e.g. ``-cc1`` or ``-Xclang``.

As this feature is still in the experimental phase, some changes to the
command line interface might still occur.

C++ libraries for OpenCL
------------------------

There is ongoing work to support C++ standard libraries from `LLVM's libcxx
<https://libcxx.llvm.org/>`_ in OpenCL kernel code using C++ for OpenCL mode.

It is currently possible to include ``type_traits`` from C++17 in the kernel
sources when the following Clang extensions are enabled:
``__cl_clang_function_pointers`` and ``__cl_clang_variadic_functions``
(see :doc:`LanguageExtensions` for more details). The use of non-conformant
features enabled by the extensions does not expose non-conformant behavior
beyond the compilation, i.e. it does not get generated in the IR or binary.
The extensions only appear in metaprogramming
mechanisms to identify or verify the properties of types. This allows providing
the full C++ functionality without a loss of portability. To avoid unsafe use
of the extensions, it is recommended that the extensions are disabled directly
after the header include.

**Example of Use**:

Kernel code using ``type_traits`` is illustrated here.

.. code-block:: c++

  #pragma OPENCL EXTENSION __cl_clang_function_pointers : enable
  #pragma OPENCL EXTENSION __cl_clang_variadic_functions : enable
  #include <type_traits>
  #pragma OPENCL EXTENSION __cl_clang_function_pointers : disable
  #pragma OPENCL EXTENSION __cl_clang_variadic_functions : disable

  using sint_type = std::make_signed<unsigned int>::type;

  __kernel void foo() {
    static_assert(!std::is_same<sint_type, unsigned int>::value);
  }

A possible Clang invocation to compile the example is as follows:

.. code-block:: console

  $ clang -cl-std=clc++ -I<path to libcxx checkout or installation>/include test.cl

Note that ``type_traits`` is a header-only library and therefore no extra
linking step against the standard libraries is required.