.. raw:: html

  <style type="text/css">
    .none { background-color: #FFCCCC }
    .part { background-color: #FFFF99 }
    .good { background-color: #CCFF99 }
  </style>

.. role:: none
.. role:: part
.. role:: good

.. contents::
   :local:

==================
OpenCL Support
==================

Clang has complete support of OpenCL C versions from 1.0 to 2.0.

Clang also supports :ref:`the C++ for OpenCL kernel language <cxx_for_opencl_impl>`.

There is ongoing work to support :ref:`OpenCL 3.0 <opencl_300>`.

There are also other :ref:`new and experimental features <opencl_experimenal>` available.

For general issues and bugs with OpenCL in clang refer to `Bugzilla
<https://bugs.llvm.org/buglist.cgi?component=OpenCL&list_id=172679&product=clang&resolution=--->`__.

Internals Manual
================

This section documents the design of OpenCL features as well as some
important implementation aspects. It is primarily aimed at advanced users
and toolchain developers who integrate the frontend as a component.

OpenCL Metadata
---------------

Clang uses metadata to provide additional OpenCL semantics in the IR that
backends and the OpenCL runtime need.

Each kernel has function metadata attached to it that describes its arguments.
Kernel argument metadata provides source-level information that can be queried
at runtime, for example using the `clGetKernelArgInfo
<https://www.khronos.org/registry/OpenCL/specs/opencl-1.2.pdf#167>`_
call.

Note that ``-cl-kernel-arg-info`` adds more information about the original
kernel code, e.g. kernel parameter names will appear in the OpenCL
metadata along with other information.

The IDs used to encode OpenCL's logical address spaces in the argument info
metadata follow the SPIR address space mapping as defined in the SPIR
specification `section 2.2
<https://www.khronos.org/registry/spir/specs/spir_spec-2.0.pdf#18>`_.

OpenCL Specific Options
-----------------------

In addition to the options described in :doc:`UsersManual`, the following
options are specific to the OpenCL frontend.

.. _opencl_cl_ext:

.. option:: -cl-ext

Enables or disables support of OpenCL extensions. All OpenCL targets provide
a list of extensions that they support. Clang allows this list to be amended
using the ``-cl-ext`` flag with a comma-separated list of extensions prefixed
with ``'+'`` or ``'-'``.
The syntax is ``-cl-ext=<(['-'|'+']<extension>[,])+>``, where each extension
can be either one of `the OpenCL published extensions
<https://www.khronos.org/registry/OpenCL>`_
or any vendor extension. Alternatively, ``'all'`` can be used to enable
or disable all known extensions.

Note that this is a frontend-only flag and therefore it requires the use of
flags that forward options to the frontend, e.g. ``-cc1`` or ``-Xclang``.

Example disabling double support for the 64-bit SPIR target:

   .. code-block:: console

     $ clang -cc1 -triple spir64-unknown-unknown -cl-ext=-cl_khr_fp64 test.cl

Disabling all extensions except half-precision support (``cl_khr_fp16``) for
the R600 AMD GPU target can be done using:

   .. code-block:: console

     $ clang -cc1 -triple r600-unknown-unknown -cl-ext=-all,+cl_khr_fp16 test.cl
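
The way a ``-cl-ext`` list amends a target's extension set can be modelled
with a short host-side C++ sketch. This is not Clang's implementation: the
``ExtensionTable`` type and the ``applyClExt`` function are hypothetical names,
and the sketch assumes that entries are processed left to right, that an entry
without a prefix is treated as ``'+'``, and that ``all`` affects every
extension already present in the table.

.. code-block:: c++

  #include <map>
  #include <sstream>
  #include <string>

  // Hypothetical model of an extension table: extension name -> enabled flag.
  using ExtensionTable = std::map<std::string, bool>;

  // Applies a -cl-ext style list such as "-all,+cl_khr_fp16" to Table.
  // Assumptions of this sketch: left-to-right processing, a missing prefix
  // means '+', and 'all' touches every entry currently in the table.
  void applyClExt(ExtensionTable &Table, const std::string &List) {
    std::istringstream Stream(List);
    std::string Entry;
    while (std::getline(Stream, Entry, ',')) {
      if (Entry.empty())
        continue; // Tolerate a trailing or doubled comma.
      bool Enable = true;
      if (Entry[0] == '+' || Entry[0] == '-') {
        Enable = Entry[0] == '+';
        Entry.erase(0, 1);
      }
      if (Entry == "all") {
        for (auto &KV : Table)
          KV.second = Enable;
      } else {
        Table[Entry] = Enable; // Vendor extensions may be new to the table.
      }
    }
  }

With a table in which ``cl_khr_fp16`` and ``cl_khr_fp64`` both start out
enabled, applying ``-all,+cl_khr_fp16`` leaves only ``cl_khr_fp16`` enabled,
which matches the R600 command line above.
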

.. _opencl_fake_address_space_map:

.. option:: -ffake-address-space-map

Overrides the target address space map with a fake map.
This allows adding explicit address space IDs to the bitcode for non-segmented
memory architectures that do not have separate IDs for each of the OpenCL
logical address spaces by default. Passing ``-ffake-address-space-map`` will
add or override the address spaces of the target being compiled for with the
following values: ``1-global``, ``2-constant``, ``3-local``, ``4-generic``.
The private address space is represented by the absence of an address space
attribute in the IR (see
also :ref:`the section on the address space attribute <opencl_addrsp>`).

   .. code-block:: console

     $ clang -cc1 -ffake-address-space-map test.cl

Note that this is a frontend-only flag and therefore it requires the use of
flags that forward options to the frontend, e.g. ``-cc1`` or ``-Xclang``.

OpenCL builtins
---------------

**Clang builtins**

There are some standard OpenCL functions that are implemented as Clang builtins:

- All pipe functions from `section 6.13.16.2/6.13.16.3
  <https://www.khronos.org/registry/cl/specs/opencl-2.0-openclc.pdf#160>`_ of
  the OpenCL v2.0 kernel language specification.

- Address space qualifier conversion functions ``to_global``/``to_local``/``to_private``
  from `section 6.13.9
  <https://www.khronos.org/registry/cl/specs/opencl-2.0-openclc.pdf#101>`_.

- All the ``enqueue_kernel`` functions from `section 6.13.17.1
  <https://www.khronos.org/registry/cl/specs/opencl-2.0-openclc.pdf#164>`_ and
  enqueue query functions from `section 6.13.17.5
  <https://www.khronos.org/registry/cl/specs/opencl-2.0-openclc.pdf#171>`_.

**Fast builtin function declarations**

The implementation of the fast builtin function declarations (available via the
:ref:`-fdeclare-opencl-builtins option <opencl_fast_builtins>`) consists of the
following main components:

- A TableGen definitions file ``OpenCLBuiltins.td``. This contains a compact
  representation of the supported builtin functions. When adding new builtin
  function declarations, this is normally the only file that needs modifying.

- A Clang TableGen emitter defined in ``ClangOpenCLBuiltinEmitter.cpp``. During
  Clang build time, the emitter reads the TableGen definition file and
  generates ``OpenCLBuiltins.inc``. This generated file contains various tables
  and functions that capture the builtin function data from the TableGen
  definitions in a compact manner.

- OpenCL specific code in ``SemaLookup.cpp``. When ``Sema::LookupBuiltin``
  encounters a potential builtin function, it will check if the name corresponds
  to a valid OpenCL builtin function. If so, all overloads of the function are
  inserted using ``InsertOCLBuiltinDeclarationsFromTable`` and overload
  resolution takes place.
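
The lookup step in the last component can be pictured with the following
simplified C++ model. All names in it (``BuiltinEntry``, ``SignatureTable``,
``BuiltinTable``, ``lookupBuiltinOverloads``) are hypothetical and the tables
are written by hand; the generated ``OpenCLBuiltins.inc`` is organized
differently and encodes types far more compactly. The sketch only shows the
general idea of mapping a name to a run of overloads that are then handed to
overload resolution.

.. code-block:: c++

  #include <cstddef>
  #include <string>
  #include <vector>

  // Hypothetical table entry: a builtin name plus the range of its overloads
  // inside SignatureTable.
  struct BuiltinEntry {
    const char *Name;
    std::size_t FirstSignature; // Index of the first overload.
    std::size_t NumSignatures;  // Number of consecutive overloads.
  };

  // Hand-written stand-in for the generated signature data.
  static const std::vector<std::string> SignatureTable = {
      "float cos(float)",
      "float4 cos(float4)",
      "int max(int, int)",
  };

  static const std::vector<BuiltinEntry> BuiltinTable = {
      {"cos", 0, 2},
      {"max", 2, 1},
  };

  // Returns every overload recorded for Name, or an empty list if Name is not
  // a known builtin. A caller would run overload resolution on the result.
  std::vector<std::string> lookupBuiltinOverloads(const std::string &Name) {
    for (const BuiltinEntry &Entry : BuiltinTable) {
      if (Name == Entry.Name) {
        auto First = SignatureTable.begin() + Entry.FirstSignature;
        return std::vector<std::string>(First, First + Entry.NumSignatures);
      }
    }
    return {};
  }

Keeping the overloads of one function contiguous is a design choice of this
sketch: it lets a single name lookup return all candidates at once, so that
declarations are only materialized for names the source actually uses.
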

.. _opencl_addrsp:

Address spaces attribute
------------------------

Clang has arbitrary address space support using the ``address_space(N)``
attribute, where ``N`` is an integer number in the range specified in the
Clang source code. These address spaces can be used along with the OpenCL
address spaces; however, when such address spaces are converted to or from
OpenCL address spaces, the behavior is not governed by the OpenCL
specification.

An OpenCL implementation provides a list of standard address spaces using
keywords: ``private``, ``local``, ``global``, ``constant``, and ``generic``.
In the AST and in the IR each of the address spaces will be represented by a
unique number provided in the Clang source code. The specific IDs for an
address space do not have to match between the AST and the IR. Typically in
the AST address space numbers represent logical segments while in the IR they
represent physical segments.
Therefore, machines with flat memory segments can map all AST address space
numbers to the same physical segment ID or skip the address space attribute
completely while generating the IR. However, if the address space information
is needed by the IR passes, e.g. to improve alias analysis, it is recommended
to keep it and only lower it to reflect physical memory segments in the late
machine passes. The mapping between logical and target address spaces is
specified in Clang's source code.
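
As a rough illustration of the logical-to-target mapping, the following C++
sketch models two maps: one using the values documented for
``-ffake-address-space-map`` and one for a hypothetical flat-memory target.
The ``LogicalAS`` enumeration, the ``AddressSpaceMap`` type, and ``toTargetAS``
are assumptions made for this sketch and do not correspond to Clang's internal
data structures. Private is modelled as ``0`` because it is represented by the
absence of an address space attribute in the IR.

.. code-block:: c++

  #include <array>

  // Logical OpenCL address spaces as they might be distinguished in the AST.
  // The enumerator names and their order are assumptions of this sketch.
  enum class LogicalAS { Private, Global, Constant, Local, Generic };

  // One target ID per logical address space, indexed by LogicalAS.
  using AddressSpaceMap = std::array<unsigned, 5>;

  // Values documented for -ffake-address-space-map; private stays at 0.
  constexpr AddressSpaceMap FakeMap = {0, 1, 2, 3, 4};

  // A hypothetical flat-memory target that folds everything into one segment.
  constexpr AddressSpaceMap FlatMap = {0, 0, 0, 0, 0};

  // Translates a logical address space to the ID used by the given target.
  inline unsigned toTargetAS(const AddressSpaceMap &Map, LogicalAS AS) {
    return Map[static_cast<unsigned>(AS)];
  }

With ``FlatMap`` every lookup yields the same ID, so IR passes can no longer
tell the logical address spaces apart, which is why keeping distinct IDs until
late lowering is recommended when passes such as alias analysis benefit from
them.
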

.. _cxx_for_opencl_impl:

C++ for OpenCL Implementation Status
====================================

Clang implements language version 1.0 published in `the official
release of C++ for OpenCL Documentation
<https://github.com/KhronosGroup/OpenCL-Docs/releases/tag/cxxforopencl-v1.0-r1>`_.

Limited support of experimental C++ libraries is described in the :ref:`experimental features <opencl_experimenal>`.

Bugzilla bugs for this functionality are typically prefixed
with '[C++4OpenCL]' - click `here
<https://bugs.llvm.org/buglist.cgi?component=OpenCL&list_id=204139&product=clang&query_format=advanced&resolution=---&short_desc=%5BC%2B%2B4OpenCL%5D&short_desc_type=allwordssubstr>`_
to view the full bug list.


Missing features or with limited support
----------------------------------------

- Use of ObjC blocks is disabled and therefore the ``enqueue_kernel`` builtin
  function is not supported currently. It is expected that if support for this
  feature is added in the future, it will utilize C++ lambdas instead of ObjC
  blocks.

- IR generation for global destructors is incomplete (See:
  `PR48047 <https://llvm.org/PR48047>`_).

- There is no distinct file extension for sources that are to be compiled
  in C++ for OpenCL mode (See: `PR48097 <https://llvm.org/PR48097>`_).

.. _opencl_300:

OpenCL 3.0 Implementation Status
================================

The following table provides an overview of features in OpenCL C 3.0 and their
implementation status.

+------------------------------+--------------------------------------------------------------+----------------------+---------------------------------------------------------------------------+
| Category                     | Feature                                                      | Status               | Reviews                                                                   |
+==============================+==============================================================+======================+===========================================================================+
| Command line interface       | New value for ``-cl-std`` flag                               | :good:`done`         | https://reviews.llvm.org/D88300                                           |
+------------------------------+--------------------------------------------------------------+----------------------+---------------------------------------------------------------------------+
| Predefined macros            | New version macro                                            | :good:`done`         | https://reviews.llvm.org/D88300                                           |
+------------------------------+--------------------------------------------------------------+----------------------+---------------------------------------------------------------------------+
| Predefined macros            | Feature macros                                               | :part:`worked on`    | https://reviews.llvm.org/D89869                                           |
+------------------------------+--------------------------------------------------------------+----------------------+---------------------------------------------------------------------------+
| Feature optionality          | Generic address space                                        | :none:`unclaimed`    |                                                                           |
+------------------------------+--------------------------------------------------------------+----------------------+---------------------------------------------------------------------------+
| Feature optionality          | Builtin function overloads with generic address space        | :part:`worked on`    | https://reviews.llvm.org/D92004                                           |
+------------------------------+--------------------------------------------------------------+----------------------+---------------------------------------------------------------------------+
| Feature optionality          | Program scope variables in global memory                     | :none:`unclaimed`    |                                                                           |
+------------------------------+--------------------------------------------------------------+----------------------+---------------------------------------------------------------------------+
| Feature optionality          | 3D image writes including builtin functions                  | :none:`unclaimed`    |                                                                           |
+------------------------------+--------------------------------------------------------------+----------------------+---------------------------------------------------------------------------+
| Feature optionality          | read_write images including builtin functions                | :none:`unclaimed`    |                                                                           |
+------------------------------+--------------------------------------------------------------+----------------------+---------------------------------------------------------------------------+
| Feature optionality          | C11 atomics memory scopes, ordering and builtin function     | :part:`worked on`    | https://reviews.llvm.org/D92004 (functions only)                          |
+------------------------------+--------------------------------------------------------------+----------------------+---------------------------------------------------------------------------+
| Feature optionality          | Device-side kernel enqueue including builtin functions       | :none:`unclaimed`    |                                                                           |
+------------------------------+--------------------------------------------------------------+----------------------+---------------------------------------------------------------------------+
| Feature optionality          | Pipes including builtin functions                            | :part:`worked on`    | https://reviews.llvm.org/D92004 (functions only)                          |
+------------------------------+--------------------------------------------------------------+----------------------+---------------------------------------------------------------------------+
| Feature optionality          | Work group collective functions                              | :part:`worked on`    | https://reviews.llvm.org/D92004                                           |
+------------------------------+--------------------------------------------------------------+----------------------+---------------------------------------------------------------------------+
| New functionality            | RGBA vector components                                       | :none:`unclaimed`    |                                                                           |
+------------------------------+--------------------------------------------------------------+----------------------+---------------------------------------------------------------------------+
| New functionality            | Subgroup functions                                           | :part:`worked on`    | https://reviews.llvm.org/D92004                                           |
+------------------------------+--------------------------------------------------------------+----------------------+---------------------------------------------------------------------------+
| New functionality            | Atomic mem scopes: subgroup, all devices including functions | :part:`worked on`    | https://reviews.llvm.org/D92004 (functions only)                          |
+------------------------------+--------------------------------------------------------------+----------------------+---------------------------------------------------------------------------+

.. _opencl_experimenal:

Experimental features
=====================

Clang provides the following new work-in-progress (WIP) features for
developers to experiment with, give early feedback on, or improve further.
Feel free to contact us on `cfe-dev
<https://lists.llvm.org/mailman/listinfo/cfe-dev>`_ or via `Bugzilla
<https://bugs.llvm.org/>`__.

.. _opencl_fast_builtins:

Fast builtin function declarations
----------------------------------

In addition to regular header includes with builtin types and functions using
``-finclude-default-header`` explained in :doc:`UsersManual`, clang
supports a fast mechanism to declare builtin functions with
``-fdeclare-opencl-builtins``. This does not declare the builtin types and
therefore it has to be used in combination with ``-finclude-default-header``
if full functionality is required.

**Example of Use**:

   .. code-block:: console

     $ clang -Xclang -fdeclare-opencl-builtins test.cl

Note that this is a frontend-only flag and therefore it requires the use of
flags that forward options to the frontend, e.g. ``-cc1`` or ``-Xclang``.

As this feature is still in the experimental phase, some changes might still
occur on the command line interface side.

C++ libraries for OpenCL
------------------------

There is ongoing work to support C++ standard libraries from `LLVM's libcxx
<https://libcxx.llvm.org/>`_ in OpenCL kernel code using C++ for OpenCL mode.

It is currently possible to include ``type_traits`` from C++17 in the kernel
sources when the following clang extensions are enabled:
``__cl_clang_function_pointers`` and ``__cl_clang_variadic_functions``
(see :doc:`LanguageExtensions` for more details). The use of non-conformant
features enabled by the extensions does not expose non-conformant behavior
beyond the compilation, i.e. it does not get generated in the IR or the binary.
The extensions only appear in metaprogramming
mechanisms that identify or verify the properties of types. This makes it
possible to provide the full C++ functionality without a loss of portability.
To avoid unsafe use of the extensions, it is recommended that the extensions
are disabled directly after the header include.

**Example of Use**:

An example of kernel code using ``type_traits`` is shown here.

.. code-block:: c++

  #pragma OPENCL EXTENSION __cl_clang_function_pointers : enable
  #pragma OPENCL EXTENSION __cl_clang_variadic_functions : enable
  #include <type_traits>
  #pragma OPENCL EXTENSION __cl_clang_function_pointers : disable
  #pragma OPENCL EXTENSION __cl_clang_variadic_functions : disable

  using sint_type = std::make_signed<unsigned int>::type;

  __kernel void foo() {
    static_assert(!std::is_same<sint_type, unsigned int>::value);
  }

A possible clang invocation to compile the example is as follows:

   .. code-block:: console

     $ clang -cl-std=clc++ -I<path to libcxx checkout or installation>/include test.cl

Note that ``type_traits`` is a header-only library and therefore no extra
linking step against the standard libraries is required.
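
To see why the two extensions matter only at compile time, consider what
``type_traits`` does with function types. The following sketch is plain
host-side C++17 rather than kernel code, and the alias and constant names are
made up for illustration. Every check is evaluated by the compiler, so none of
them emits a function pointer or a variadic call.

.. code-block:: c++

  #include <type_traits>

  // A function type and a variadic function type. The header has to be able
  // to name such types in order to classify them.
  using plain_fn = int(int);
  using variadic_fn = int(const char *, ...);

  static_assert(std::is_function<plain_fn>::value, "function type");
  static_assert(std::is_function<variadic_fn>::value, "variadic function type");
  static_assert(std::is_pointer<plain_fn *>::value, "function pointer type");
  static_assert(!std::is_function<int>::value, "int is not a function type");

  // The same property as in the kernel example, captured as a constant.
  constexpr bool SignedDiffers =
      !std::is_same<std::make_signed<unsigned int>::type, unsigned int>::value;

  static_assert(SignedDiffers, "make_signed changes unsigned int");
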