====================
``<atomic>`` Design
====================

There were originally 3 designs under consideration. They differ in where most
of the implementation work is done. The functionality exposed to the customer
should be identical (and conforming) for all three designs.


Design A: Minimal work for the library
======================================
The compiler supplies all of the intrinsics as described below. This list of
intrinsics roughly parallels the requirements of the C and C++ atomics proposals.
The C and C++ library implementations simply drop through to these intrinsics.
For anything the platform does not support in hardware, the compiler arranges
for a (compiler-rt) library call to be made that does the job with a mutex,
ignoring the memory ordering parameter in that case (effectively implementing
``memory_order_seq_cst``).

Ultimate efficiency is preferred over run time error checking. Undefined
behavior is acceptable when the inputs do not conform as defined below.

.. code-block:: cpp

  // In every intrinsic signature below, type* atomic_obj may be a pointer to a
  // volatile-qualified type. Memory ordering values map to the following meanings:
  //   memory_order_relaxed == 0
  //   memory_order_consume == 1
  //   memory_order_acquire == 2
  //   memory_order_release == 3
  //   memory_order_acq_rel == 4
  //   memory_order_seq_cst == 5

  // type must be trivially copyable
  // type represents a "type argument"
  bool __atomic_is_lock_free(type);

  // type must be trivially copyable
  // Behavior is defined for mem_ord = 0, 1, 2, 5
  type __atomic_load(const type* atomic_obj, int mem_ord);

  // type must be trivially copyable
  // Behavior is defined for mem_ord = 0, 3, 5
  void __atomic_store(type* atomic_obj, type desired, int mem_ord);

  // type must be trivially copyable
  // Behavior is defined for mem_ord = [0 ... 5]
  type __atomic_exchange(type* atomic_obj, type desired, int mem_ord);

  // type must be trivially copyable
  // Behavior is defined for mem_success = [0 ... 5],
  //   mem_failure <= mem_success
  //   mem_failure != 3
  //   mem_failure != 4
  bool __atomic_compare_exchange_strong(
      type* atomic_obj, type* expected, type desired,
      int mem_success, int mem_failure);

  // type must be trivially copyable
  // Behavior is defined for mem_success = [0 ... 5],
  //   mem_failure <= mem_success
  //   mem_failure != 3
  //   mem_failure != 4
  bool __atomic_compare_exchange_weak(
      type* atomic_obj, type* expected, type desired,
      int mem_success, int mem_failure);

  // type is one of: char, signed char, unsigned char, short, unsigned short, int,
  //   unsigned int, long, unsigned long, long long, unsigned long long,
  //   char16_t, char32_t, wchar_t
  // Behavior is defined for mem_ord = [0 ... 5]
  type __atomic_fetch_add(type* atomic_obj, type operand, int mem_ord);

  // type is one of: char, signed char, unsigned char, short, unsigned short, int,
  //   unsigned int, long, unsigned long, long long, unsigned long long,
  //   char16_t, char32_t, wchar_t
  // Behavior is defined for mem_ord = [0 ... 5]
  type __atomic_fetch_sub(type* atomic_obj, type operand, int mem_ord);

  // type is one of: char, signed char, unsigned char, short, unsigned short, int,
  //   unsigned int, long, unsigned long, long long, unsigned long long,
  //   char16_t, char32_t, wchar_t
  // Behavior is defined for mem_ord = [0 ... 5]
  type __atomic_fetch_and(type* atomic_obj, type operand, int mem_ord);

  // type is one of: char, signed char, unsigned char, short, unsigned short, int,
  //   unsigned int, long, unsigned long, long long, unsigned long long,
  //   char16_t, char32_t, wchar_t
  // Behavior is defined for mem_ord = [0 ... 5]
  type __atomic_fetch_or(type* atomic_obj, type operand, int mem_ord);

  // type is one of: char, signed char, unsigned char, short, unsigned short, int,
  //   unsigned int, long, unsigned long, long long, unsigned long long,
  //   char16_t, char32_t, wchar_t
  // Behavior is defined for mem_ord = [0 ... 5]
  type __atomic_fetch_xor(type* atomic_obj, type operand, int mem_ord);

  // Behavior is defined for mem_ord = [0 ... 5]
  void* __atomic_fetch_add(void** atomic_obj, ptrdiff_t operand, int mem_ord);
  void* __atomic_fetch_sub(void** atomic_obj, ptrdiff_t operand, int mem_ord);

  // Behavior is defined for mem_ord = [0 ... 5]
  void __atomic_thread_fence(int mem_ord);
  void __atomic_signal_fence(int mem_ord);

If desired, the intrinsics taking a single ``mem_ord`` parameter can default
this argument to 5.

If desired, the intrinsics taking two ordering parameters can default ``mem_success``
to 5, and ``mem_failure`` to ``translate_memory_order(mem_success)``, where
``translate_memory_order(mem_success)`` is defined as:

.. code-block:: cpp

  int translate_memory_order(int o) {
    switch (o) {
    case 4:
      return 2;
    case 3:
      return 0;
    }
    return o;
  }

Below are representative C++ implementations of all of the operations. Their
purpose is to document the desired semantics of each operation, assuming
``memory_order_seq_cst``. This is essentially the code that will be called
if the front end calls out to compiler-rt.

.. code-block:: cpp

  template <class T>
  T __atomic_load(T const volatile* obj) {
    unique_lock<mutex> _(some_mutex);
    return *obj;
  }

  template <class T>
  void __atomic_store(T volatile* obj, T desr) {
    unique_lock<mutex> _(some_mutex);
    *obj = desr;
  }

  template <class T>
  T __atomic_exchange(T volatile* obj, T desr) {
    unique_lock<mutex> _(some_mutex);
    T r = *obj;
    *obj = desr;
    return r;
  }

  template <class T>
  bool __atomic_compare_exchange_strong(T volatile* obj, T* exp, T desr) {
    unique_lock<mutex> _(some_mutex);
    if (std::memcmp(const_cast<T*>(obj), exp, sizeof(T)) == 0) // if (*obj == *exp)
    {
      std::memcpy(const_cast<T*>(obj), &desr, sizeof(T)); // *obj = desr;
      return true;
    }
    std::memcpy(exp, const_cast<T*>(obj), sizeof(T)); // *exp = *obj;
    return false;
  }

  // May spuriously return false (even if *obj == *exp)
  template <class T>
  bool __atomic_compare_exchange_weak(T volatile* obj, T* exp, T desr) {
    unique_lock<mutex> _(some_mutex);
    if (std::memcmp(const_cast<T*>(obj), exp, sizeof(T)) == 0) // if (*obj == *exp)
    {
      std::memcpy(const_cast<T*>(obj), &desr, sizeof(T)); // *obj = desr;
      return true;
    }
    std::memcpy(exp, const_cast<T*>(obj), sizeof(T)); // *exp = *obj;
    return false;
  }

  template <class T>
  T __atomic_fetch_add(T volatile* obj, T operand) {
    unique_lock<mutex> _(some_mutex);
    T r = *obj;
    *obj += operand;
    return r;
  }

  template <class T>
  T __atomic_fetch_sub(T volatile* obj, T operand) {
    unique_lock<mutex> _(some_mutex);
    T r = *obj;
    *obj -= operand;
    return r;
  }

  template <class T>
  T __atomic_fetch_and(T volatile* obj, T operand) {
    unique_lock<mutex> _(some_mutex);
    T r = *obj;
    *obj &= operand;
    return r;
  }

  template <class T>
  T __atomic_fetch_or(T volatile* obj, T operand) {
    unique_lock<mutex> _(some_mutex);
    T r = *obj;
    *obj |= operand;
    return r;
  }

  template <class T>
  T __atomic_fetch_xor(T volatile* obj, T operand) {
    unique_lock<mutex> _(some_mutex);
    T r = *obj;
    *obj ^= operand;
    return r;
  }

  void* __atomic_fetch_add(void* volatile* obj, ptrdiff_t operand) {
    unique_lock<mutex> _(some_mutex);
    void* r = *obj;
    (char*&)(*obj) += operand;
    return r;
  }

  void* __atomic_fetch_sub(void* volatile* obj, ptrdiff_t operand) {
    unique_lock<mutex> _(some_mutex);
    void* r = *obj;
    (char*&)(*obj) -= operand;
    return r;
  }

  void __atomic_thread_fence() {
    unique_lock<mutex> _(some_mutex);
  }

  void __atomic_signal_fence() {
    unique_lock<mutex> _(some_mutex);
  }
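
To make the "drop through" concrete, here is a small sketch (illustration only,
not actual libc++ source; the helper name is hypothetical) of how the library
side of ``fetch_add`` on ``int`` would forward to the Design A intrinsic, using
the 0-5 ordering mapping listed above:

.. code-block:: cpp

  // Hypothetical library-side helper under Design A (illustration only).
  // std::atomic<int>::fetch_add(operand, order) reduces to exactly this call,
  // with the memory_order enumerator converted to its integer value
  // (5 == memory_order_seq_cst is the default).
  int __example_fetch_add(int volatile* atomic_obj, int operand, int mem_ord = 5) {
    return __atomic_fetch_add(atomic_obj, operand, mem_ord);
  }
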


Design B: Something in between
==============================
This is a variation of design A that puts the burden on the library to arrange
for the correct manipulation of the run-time memory ordering arguments, calling
the compiler only for well-defined memory orderings. I think of this design as
the worst of A and C, instead of the best of A and C. But I offer it as an
option in the spirit of completeness.

.. code-block:: cpp

  // type must be trivially copyable
  bool __atomic_is_lock_free(const type* atomic_obj);

  // type must be trivially copyable
  type __atomic_load_relaxed(const volatile type* atomic_obj);
  type __atomic_load_consume(const volatile type* atomic_obj);
  type __atomic_load_acquire(const volatile type* atomic_obj);
  type __atomic_load_seq_cst(const volatile type* atomic_obj);

  // type must be trivially copyable
  void __atomic_store_relaxed(volatile type* atomic_obj, type desired);
  void __atomic_store_release(volatile type* atomic_obj, type desired);
  void __atomic_store_seq_cst(volatile type* atomic_obj, type desired);

  // type must be trivially copyable
  type __atomic_exchange_relaxed(volatile type* atomic_obj, type desired);
  type __atomic_exchange_consume(volatile type* atomic_obj, type desired);
  type __atomic_exchange_acquire(volatile type* atomic_obj, type desired);
  type __atomic_exchange_release(volatile type* atomic_obj, type desired);
  type __atomic_exchange_acq_rel(volatile type* atomic_obj, type desired);
  type __atomic_exchange_seq_cst(volatile type* atomic_obj, type desired);

  // type must be trivially copyable
  bool __atomic_compare_exchange_strong_relaxed_relaxed(
      volatile type* atomic_obj, type* expected, type desired);
  bool __atomic_compare_exchange_strong_consume_relaxed(
      volatile type* atomic_obj, type* expected, type desired);
  bool __atomic_compare_exchange_strong_consume_consume(
      volatile type* atomic_obj, type* expected, type desired);
  bool __atomic_compare_exchange_strong_acquire_relaxed(
      volatile type* atomic_obj, type* expected, type desired);
  bool __atomic_compare_exchange_strong_acquire_consume(
      volatile type* atomic_obj, type* expected, type desired);
  bool __atomic_compare_exchange_strong_acquire_acquire(
      volatile type* atomic_obj, type* expected, type desired);
  bool __atomic_compare_exchange_strong_release_relaxed(
      volatile type* atomic_obj, type* expected, type desired);
  bool __atomic_compare_exchange_strong_release_consume(
      volatile type* atomic_obj, type* expected, type desired);
  bool __atomic_compare_exchange_strong_release_acquire(
      volatile type* atomic_obj, type* expected, type desired);
  bool __atomic_compare_exchange_strong_acq_rel_relaxed(
      volatile type* atomic_obj, type* expected, type desired);
  bool __atomic_compare_exchange_strong_acq_rel_consume(
      volatile type* atomic_obj, type* expected, type desired);
  bool __atomic_compare_exchange_strong_acq_rel_acquire(
      volatile type* atomic_obj, type* expected, type desired);
  bool __atomic_compare_exchange_strong_seq_cst_relaxed(
      volatile type* atomic_obj, type* expected, type desired);
  bool __atomic_compare_exchange_strong_seq_cst_consume(
      volatile type* atomic_obj, type* expected, type desired);
  bool __atomic_compare_exchange_strong_seq_cst_acquire(
      volatile type* atomic_obj, type* expected, type desired);
  bool __atomic_compare_exchange_strong_seq_cst_seq_cst(
      volatile type* atomic_obj, type* expected, type desired);

  // type must be trivially copyable
  bool __atomic_compare_exchange_weak_relaxed_relaxed(
      volatile type* atomic_obj, type* expected, type desired);
  bool __atomic_compare_exchange_weak_consume_relaxed(
      volatile type* atomic_obj, type* expected, type desired);
  bool __atomic_compare_exchange_weak_consume_consume(
      volatile type* atomic_obj, type* expected, type desired);
  bool __atomic_compare_exchange_weak_acquire_relaxed(
      volatile type* atomic_obj, type* expected, type desired);
  bool __atomic_compare_exchange_weak_acquire_consume(
      volatile type* atomic_obj, type* expected, type desired);
  bool __atomic_compare_exchange_weak_acquire_acquire(
      volatile type* atomic_obj, type* expected, type desired);
  bool __atomic_compare_exchange_weak_release_relaxed(
      volatile type* atomic_obj, type* expected, type desired);
  bool __atomic_compare_exchange_weak_release_consume(
      volatile type* atomic_obj, type* expected, type desired);
  bool __atomic_compare_exchange_weak_release_acquire(
      volatile type* atomic_obj, type* expected, type desired);
  bool __atomic_compare_exchange_weak_acq_rel_relaxed(
      volatile type* atomic_obj, type* expected, type desired);
  bool __atomic_compare_exchange_weak_acq_rel_consume(
      volatile type* atomic_obj, type* expected, type desired);
  bool __atomic_compare_exchange_weak_acq_rel_acquire(
      volatile type* atomic_obj, type* expected, type desired);
  bool __atomic_compare_exchange_weak_seq_cst_relaxed(
      volatile type* atomic_obj, type* expected, type desired);
  bool __atomic_compare_exchange_weak_seq_cst_consume(
      volatile type* atomic_obj, type* expected, type desired);
  bool __atomic_compare_exchange_weak_seq_cst_acquire(
      volatile type* atomic_obj, type* expected, type desired);
  bool __atomic_compare_exchange_weak_seq_cst_seq_cst(
      volatile type* atomic_obj, type* expected, type desired);

  // type is one of: char, signed char, unsigned char, short, unsigned short, int,
  //   unsigned int, long, unsigned long, long long, unsigned long long,
  //   char16_t, char32_t, wchar_t
  type __atomic_fetch_add_relaxed(volatile type* atomic_obj, type operand);
  type __atomic_fetch_add_consume(volatile type* atomic_obj, type operand);
  type __atomic_fetch_add_acquire(volatile type* atomic_obj, type operand);
  type __atomic_fetch_add_release(volatile type* atomic_obj, type operand);
  type __atomic_fetch_add_acq_rel(volatile type* atomic_obj, type operand);
  type __atomic_fetch_add_seq_cst(volatile type* atomic_obj, type operand);

  // type is one of: char, signed char, unsigned char, short, unsigned short, int,
  //   unsigned int, long, unsigned long, long long, unsigned long long,
  //   char16_t, char32_t, wchar_t
  type __atomic_fetch_sub_relaxed(volatile type* atomic_obj, type operand);
  type __atomic_fetch_sub_consume(volatile type* atomic_obj, type operand);
  type __atomic_fetch_sub_acquire(volatile type* atomic_obj, type operand);
  type __atomic_fetch_sub_release(volatile type* atomic_obj, type operand);
  type __atomic_fetch_sub_acq_rel(volatile type* atomic_obj, type operand);
  type __atomic_fetch_sub_seq_cst(volatile type* atomic_obj, type operand);

  // type is one of: char, signed char, unsigned char, short, unsigned short, int,
  //   unsigned int, long, unsigned long, long long, unsigned long long,
  //   char16_t, char32_t, wchar_t
  type __atomic_fetch_and_relaxed(volatile type* atomic_obj, type operand);
  type __atomic_fetch_and_consume(volatile type* atomic_obj, type operand);
  type __atomic_fetch_and_acquire(volatile type* atomic_obj, type operand);
  type __atomic_fetch_and_release(volatile type* atomic_obj, type operand);
  type __atomic_fetch_and_acq_rel(volatile type* atomic_obj, type operand);
  type __atomic_fetch_and_seq_cst(volatile type* atomic_obj, type operand);

  // type is one of: char, signed char, unsigned char, short, unsigned short, int,
  //   unsigned int, long, unsigned long, long long, unsigned long long,
  //   char16_t, char32_t, wchar_t
  type __atomic_fetch_or_relaxed(volatile type* atomic_obj, type operand);
  type __atomic_fetch_or_consume(volatile type* atomic_obj, type operand);
  type __atomic_fetch_or_acquire(volatile type* atomic_obj, type operand);
  type __atomic_fetch_or_release(volatile type* atomic_obj, type operand);
  type __atomic_fetch_or_acq_rel(volatile type* atomic_obj, type operand);
  type __atomic_fetch_or_seq_cst(volatile type* atomic_obj, type operand);

  // type is one of: char, signed char, unsigned char, short, unsigned short, int,
  //   unsigned int, long, unsigned long, long long, unsigned long long,
  //   char16_t, char32_t, wchar_t
  type __atomic_fetch_xor_relaxed(volatile type* atomic_obj, type operand);
  type __atomic_fetch_xor_consume(volatile type* atomic_obj, type operand);
  type __atomic_fetch_xor_acquire(volatile type* atomic_obj, type operand);
  type __atomic_fetch_xor_release(volatile type* atomic_obj, type operand);
  type __atomic_fetch_xor_acq_rel(volatile type* atomic_obj, type operand);
  type __atomic_fetch_xor_seq_cst(volatile type* atomic_obj, type operand);

  void* __atomic_fetch_add_relaxed(void* volatile* atomic_obj, ptrdiff_t operand);
  void* __atomic_fetch_add_consume(void* volatile* atomic_obj, ptrdiff_t operand);
  void* __atomic_fetch_add_acquire(void* volatile* atomic_obj, ptrdiff_t operand);
  void* __atomic_fetch_add_release(void* volatile* atomic_obj, ptrdiff_t operand);
  void* __atomic_fetch_add_acq_rel(void* volatile* atomic_obj, ptrdiff_t operand);
  void* __atomic_fetch_add_seq_cst(void* volatile* atomic_obj, ptrdiff_t operand);

  void* __atomic_fetch_sub_relaxed(void* volatile* atomic_obj, ptrdiff_t operand);
  void* __atomic_fetch_sub_consume(void* volatile* atomic_obj, ptrdiff_t operand);
  void* __atomic_fetch_sub_acquire(void* volatile* atomic_obj, ptrdiff_t operand);
  void* __atomic_fetch_sub_release(void* volatile* atomic_obj, ptrdiff_t operand);
  void* __atomic_fetch_sub_acq_rel(void* volatile* atomic_obj, ptrdiff_t operand);
  void* __atomic_fetch_sub_seq_cst(void* volatile* atomic_obj, ptrdiff_t operand);

  void __atomic_thread_fence_relaxed();
  void __atomic_thread_fence_consume();
  void __atomic_thread_fence_acquire();
  void __atomic_thread_fence_release();
  void __atomic_thread_fence_acq_rel();
  void __atomic_thread_fence_seq_cst();

  void __atomic_signal_fence_relaxed();
  void __atomic_signal_fence_consume();
  void __atomic_signal_fence_acquire();
  void __atomic_signal_fence_release();
  void __atomic_signal_fence_acq_rel();
  void __atomic_signal_fence_seq_cst();

Design C: Minimal work for the front end
========================================
The ``<atomic>`` header is one of the headers most closely coupled to the compiler.
Ideally, invoking any function from ``<atomic>`` should result in highly optimized
assembly being inserted directly into your application -- assembly that is not
otherwise representable by higher level C or C++ expressions. The design of the
libc++ ``<atomic>`` header started with this goal in mind. A secondary, but still
very important, goal is that the compiler should have to do minimal work to
facilitate the implementation of ``<atomic>``. Without this second goal, practically
speaking, the libc++ ``<atomic>`` header would be doomed to be a barely supported,
second class citizen on almost every platform.

Goals:

- Optimal code generation for atomic operations
- Minimal effort for the compiler to achieve goal 1 on any given platform
- Conformance to the C++0X draft standard

The purpose of this document is to inform compiler writers what they need to do
to enable a high performance libc++ ``<atomic>`` with minimal effort.

The minimal work that must be done for a conforming ``<atomic>``
----------------------------------------------------------------
The only "atomic" operations that must actually be lock free in
``<atomic>`` are represented by the following compiler intrinsics:

.. code-block:: cpp

  __atomic_flag__ __atomic_exchange_seq_cst(__atomic_flag__ volatile* obj, __atomic_flag__ desr) {
    unique_lock<mutex> _(some_mutex);
    __atomic_flag__ result = *obj;
    *obj = desr;
    return result;
  }

  void __atomic_store_seq_cst(__atomic_flag__ volatile* obj, __atomic_flag__ desr) {
    unique_lock<mutex> _(some_mutex);
    *obj = desr;
  }

Where:

- If ``__has_feature(__atomic_flag)`` evaluates to 1 in the preprocessor then
  the compiler must define ``__atomic_flag__`` (e.g. as a typedef to ``int``).
- If ``__has_feature(__atomic_flag)`` evaluates to 0 in the preprocessor then
  the library defines ``__atomic_flag__`` as a typedef to ``bool``.
- To communicate that the above intrinsics are available, the compiler must
  arrange for ``__has_feature`` to return 1 when fed the intrinsic name
  appended with an '_' and the mangled type name of ``__atomic_flag__``.

For example, if ``__atomic_flag__`` is ``unsigned int``:

.. code-block:: cpp

  // __has_feature(__atomic_flag) == 1
  // __has_feature(__atomic_exchange_seq_cst_j) == 1
  // __has_feature(__atomic_store_seq_cst_j) == 1

  typedef unsigned int __atomic_flag__;

  unsigned int __atomic_exchange_seq_cst(unsigned int volatile*, unsigned int) {
    // ...
  }

  void __atomic_store_seq_cst(unsigned int volatile*, unsigned int) {
    // ...
  }

That's it! Compiler writers do the above and you've got a fully conforming
(though sub-par performance) ``<atomic>`` header!

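
As an illustration of why these two intrinsics are sufficient, ``atomic_flag``
can be layered directly on top of them. The sketch below is hypothetical (the
member names and details are not libc++'s), assuming the ``__atomic_flag__``
type described above:

.. code-block:: cpp

  // Sketch only: layering atomic_flag on the two required intrinsics.
  struct atomic_flag {
    __atomic_flag__ __flag_;

    bool test_and_set() volatile {
      // The exchange returns the previous value; non-zero means "already set".
      return __atomic_exchange_seq_cst(&__flag_, __atomic_flag__(true)) != 0;
    }

    void clear() volatile {
      __atomic_store_seq_cst(&__flag_, __atomic_flag__(false));
    }
  };
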

Recommended work for a higher performance ``<atomic>``
------------------------------------------------------
It would be good if the above intrinsics worked with all integral types plus
``void*``. Because this may not be possible to do in a lock-free manner for
all integral types on all platforms, a compiler must communicate each type that
an intrinsic works with. For example, if ``__atomic_exchange_seq_cst`` works
for all types except ``long long`` and ``unsigned long long``, then:

.. code-block:: cpp

  __has_feature(__atomic_exchange_seq_cst_b) == 1  // bool
  __has_feature(__atomic_exchange_seq_cst_c) == 1  // char
  __has_feature(__atomic_exchange_seq_cst_a) == 1  // signed char
  __has_feature(__atomic_exchange_seq_cst_h) == 1  // unsigned char
  __has_feature(__atomic_exchange_seq_cst_Ds) == 1 // char16_t
  __has_feature(__atomic_exchange_seq_cst_Di) == 1 // char32_t
  __has_feature(__atomic_exchange_seq_cst_w) == 1  // wchar_t
  __has_feature(__atomic_exchange_seq_cst_s) == 1  // short
  __has_feature(__atomic_exchange_seq_cst_t) == 1  // unsigned short
  __has_feature(__atomic_exchange_seq_cst_i) == 1  // int
  __has_feature(__atomic_exchange_seq_cst_j) == 1  // unsigned int
  __has_feature(__atomic_exchange_seq_cst_l) == 1  // long
  __has_feature(__atomic_exchange_seq_cst_m) == 1  // unsigned long
  __has_feature(__atomic_exchange_seq_cst_Pv) == 1 // void*

Note that only the ``__has_feature`` flag is decorated with the argument
type. The name of the compiler intrinsic is not decorated, but instead works
like a C++ overloaded function.

Additionally, there are other intrinsics besides ``__atomic_exchange_seq_cst``
and ``__atomic_store_seq_cst``. They are optional, but if the compiler can
generate faster code than the library provides, clients will benefit from the
compiler writer's expertise and knowledge of the targeted platform.

Below is the complete list of *sequentially consistent* intrinsics, and
their library implementations. Template syntax is used to indicate the desired
overloading for integral and ``void*`` types. The template does not represent a
requirement that the intrinsic operate on **any** type!

.. code-block:: cpp

  // T is one of:
  //   bool, char, signed char, unsigned char, short, unsigned short,
  //   int, unsigned int, long, unsigned long,
  //   long long, unsigned long long, char16_t, char32_t, wchar_t, void*

  template <class T>
  T __atomic_load_seq_cst(T const volatile* obj) {
    unique_lock<mutex> _(some_mutex);
    return *obj;
  }

  template <class T>
  void __atomic_store_seq_cst(T volatile* obj, T desr) {
    unique_lock<mutex> _(some_mutex);
    *obj = desr;
  }

  template <class T>
  T __atomic_exchange_seq_cst(T volatile* obj, T desr) {
    unique_lock<mutex> _(some_mutex);
    T r = *obj;
    *obj = desr;
    return r;
  }

  template <class T>
  bool __atomic_compare_exchange_strong_seq_cst_seq_cst(T volatile* obj, T* exp, T desr) {
    unique_lock<mutex> _(some_mutex);
    if (std::memcmp(const_cast<T*>(obj), exp, sizeof(T)) == 0) {
      std::memcpy(const_cast<T*>(obj), &desr, sizeof(T));
      return true;
    }
    std::memcpy(exp, const_cast<T*>(obj), sizeof(T));
    return false;
  }

  template <class T>
  bool __atomic_compare_exchange_weak_seq_cst_seq_cst(T volatile* obj, T* exp, T desr) {
    unique_lock<mutex> _(some_mutex);
    if (std::memcmp(const_cast<T*>(obj), exp, sizeof(T)) == 0) {
      std::memcpy(const_cast<T*>(obj), &desr, sizeof(T));
      return true;
    }
    std::memcpy(exp, const_cast<T*>(obj), sizeof(T));
    return false;
  }

  // T is one of:
  //   char, signed char, unsigned char, short, unsigned short,
  //   int, unsigned int, long, unsigned long,
  //   long long, unsigned long long, char16_t, char32_t, wchar_t

  template <class T>
  T __atomic_fetch_add_seq_cst(T volatile* obj, T operand) {
    unique_lock<mutex> _(some_mutex);
    T r = *obj;
    *obj += operand;
    return r;
  }

  template <class T>
  T __atomic_fetch_sub_seq_cst(T volatile* obj, T operand) {
    unique_lock<mutex> _(some_mutex);
    T r = *obj;
    *obj -= operand;
    return r;
  }

  template <class T>
  T __atomic_fetch_and_seq_cst(T volatile* obj, T operand) {
    unique_lock<mutex> _(some_mutex);
    T r = *obj;
    *obj &= operand;
    return r;
  }

  template <class T>
  T __atomic_fetch_or_seq_cst(T volatile* obj, T operand) {
    unique_lock<mutex> _(some_mutex);
    T r = *obj;
    *obj |= operand;
    return r;
  }

  template <class T>
  T __atomic_fetch_xor_seq_cst(T volatile* obj, T operand) {
    unique_lock<mutex> _(some_mutex);
    T r = *obj;
    *obj ^= operand;
    return r;
  }

  void* __atomic_fetch_add_seq_cst(void* volatile* obj, ptrdiff_t operand) {
    unique_lock<mutex> _(some_mutex);
    void* r = *obj;
    (char*&)(*obj) += operand;
    return r;
  }

  void* __atomic_fetch_sub_seq_cst(void* volatile* obj, ptrdiff_t operand) {
    unique_lock<mutex> _(some_mutex);
    void* r = *obj;
    (char*&)(*obj) -= operand;
    return r;
  }

  void __atomic_thread_fence_seq_cst() {
    unique_lock<mutex> _(some_mutex);
  }

  void __atomic_signal_fence_seq_cst() {
    unique_lock<mutex> _(some_mutex);
  }

One should consult the (currently draft) `C++ Standard <https://wg21.link/n3126>`_
for the detailed definitions of these operations. For example,
``__atomic_compare_exchange_weak_seq_cst_seq_cst`` is allowed to fail
spuriously while ``__atomic_compare_exchange_strong_seq_cst_seq_cst`` is not.

If on your platform the lock-free definition of ``__atomic_compare_exchange_weak_seq_cst_seq_cst``
would be the same as ``__atomic_compare_exchange_strong_seq_cst_seq_cst``, you may omit the
``__atomic_compare_exchange_weak_seq_cst_seq_cst`` intrinsic without a performance cost. The
library prefers your ``__atomic_compare_exchange_strong_seq_cst_seq_cst`` over its own definition
when implementing ``__atomic_compare_exchange_weak_seq_cst_seq_cst``. That is, if you supply the
strong intrinsic but not the weak one, the library arranges for the weak form to call your
strong one.
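
For illustration, this selection can be pictured as the following compile-time
dispatch for ``int`` (a sketch only; the wrapper name is hypothetical, and the
real header generalizes the idea for every type it supports):

.. code-block:: cpp

  // Hypothetical library-side selection for int (illustration only).
  bool __example_cas_weak_int(int volatile* obj, int* exp, int desr) {
  #if __has_feature(__atomic_compare_exchange_weak_seq_cst_seq_cst_i)
    // The compiler offers the weak intrinsic for int: use it directly.
    return __atomic_compare_exchange_weak_seq_cst_seq_cst(obj, exp, desr);
  #elif __has_feature(__atomic_compare_exchange_strong_seq_cst_seq_cst_i)
    // No weak intrinsic, but a strong one exists: the strong form is a valid
    // (never spuriously failing) implementation of the weak form.
    return __atomic_compare_exchange_strong_seq_cst_seq_cst(obj, exp, desr);
  #else
    // Neither intrinsic is available: the library does the job with a mutex.
    unique_lock<mutex> _(some_mutex);
    if (*obj == *exp) {
      *obj = desr;
      return true;
    }
    *exp = *obj;
    return false;
  #endif
  }
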

Taking advantage of weaker memory synchronization
--------------------------------------------------
So far, all of the intrinsics presented require a **sequentially consistent** memory ordering.
That is, no loads or stores can move across the operation (just as if the library had locked
that internal mutex). But ``<atomic>`` supports weaker memory ordering operations. In all,
there are six memory orderings (listed here from strongest to weakest):

.. code-block:: cpp

  memory_order_seq_cst
  memory_order_acq_rel
  memory_order_release
  memory_order_acquire
  memory_order_consume
  memory_order_relaxed

(See the `C++ Standard <https://wg21.link/n3126>`_ for the detailed definitions of each of these orderings).

On some platforms, the compiler vendor can offer some or even all of the above
intrinsics at one or more weaker levels of memory synchronization. This might
mean, for example, not issuing an ``mfence`` instruction on x86.
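
As an illustration (the exact code generation is, of course, up to the back
end), a sequentially consistent store on x86-64 typically needs a full fence,
while a release store does not:

.. code-block:: cpp

  // Typical x86-64 lowering (illustrative only):
  void __atomic_store_seq_cst(int volatile* obj, int desr);  // mov + mfence (or xchg)
  void __atomic_store_release(int volatile* obj, int desr);  // plain mov; ordinary x86
                                                             // stores already provide
                                                             // release ordering
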

If the compiler does not offer any given operation, at any given memory ordering
level, the library will automatically attempt to call the next highest memory
ordering operation. This continues up to ``seq_cst``, and if that doesn't
exist, then the library takes over and does the job with a ``mutex``. This
is a compile-time search and selection operation. At run time, the application
will only see the few inlined assembly instructions for the selected intrinsic.
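
A sketch of that search for an ``acquire`` load of ``int`` (illustration only;
the helper name is hypothetical):

.. code-block:: cpp

  // Hypothetical compile-time selection for an acquire load of int.  The
  // library walks up the ladder: acquire -> seq_cst -> mutex-based fallback.
  int __example_load_acquire_int(int const volatile* obj) {
  #if __has_feature(__atomic_load_acquire_i)
    return __atomic_load_acquire(obj);   // exact ordering available
  #elif __has_feature(__atomic_load_seq_cst_i)
    return __atomic_load_seq_cst(obj);   // a stronger ordering is always acceptable
  #else
    unique_lock<mutex> _(some_mutex);    // library fallback
    return *obj;
  #endif
  }
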

Each intrinsic name is suffixed with the (7-letter) name of the memory ordering
it addresses. For example, a ``load`` with ``relaxed`` ordering is defined by:

.. code-block:: cpp

  T __atomic_load_relaxed(const volatile T* obj);

And announced with:

.. code-block:: cpp

  __has_feature(__atomic_load_relaxed_b) == 1 // bool
  __has_feature(__atomic_load_relaxed_c) == 1 // char
  __has_feature(__atomic_load_relaxed_a) == 1 // signed char
  ...

The ``__atomic_compare_exchange_strong(weak)`` intrinsics are parameterized
on two memory orderings. The first ordering applies when the operation returns
``true`` and the second ordering applies when the operation returns ``false``.
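
For example (an illustrative, hypothetical helper), a simple spinlock
acquisition can use the ``acquire_relaxed`` variant: acquire ordering when the
exchange succeeds, and relaxed ordering on the failure path, where no
synchronization is needed because the lock was not taken:

.. code-block:: cpp

  // Illustration only: acquiring a simple spinlock with the weak CAS.
  void __example_spin_lock(int volatile* lock) {
    int expected = 0;
    while (!__atomic_compare_exchange_weak_acquire_relaxed(lock, &expected, 1)) {
      expected = 0; // the failed CAS wrote the observed value back; reset it
    }
  }
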

Not every memory ordering is appropriate for every operation. ``exchange``
and the ``fetch_XXX`` operations support all 6. But ``load`` only supports
``relaxed``, ``consume``, ``acquire`` and ``seq_cst``. ``store`` only supports
``relaxed``, ``release``, and ``seq_cst``. The ``compare_exchange`` operations
support the following 16 combinations out of the possible 36:

.. code-block:: cpp

  relaxed_relaxed
  consume_relaxed
  consume_consume
  acquire_relaxed
  acquire_consume
  acquire_acquire
  release_relaxed
  release_consume
  release_acquire
  acq_rel_relaxed
  acq_rel_consume
  acq_rel_acquire
  seq_cst_relaxed
  seq_cst_consume
  seq_cst_acquire
  seq_cst_seq_cst

Again, the compiler supplies intrinsics only for the strongest orderings where
it can make a difference. The library takes care of calling the weakest
supplied intrinsic that is at least as strong as the one the customer asked for.

Note about ABI
==============
With any design, the (back end) compiler writer should note that the decision to
implement lock-free operations on any given type (or not) is an ABI-binding decision.
One cannot change from treating a type as not lock free to lock free (or vice-versa)
without breaking the ABI.

For example:

**TU1.cpp**:

.. code-block:: cpp

  extern atomic<long long> A;
  int foo() { return A.compare_exchange_strong(w, x); }


**TU2.cpp**:

.. code-block:: cpp

  extern atomic<long long> A;
  bool bar() { return A.compare_exchange_strong(y, z); }

If only **one** of these calls to ``compare_exchange_strong`` is implemented with
mutex-locked code, then that mutex-locked code will not be executed mutually
exclusively of the one implemented in a lock-free manner.