Introduction to memory order consume
2015
issue.hsu@gmail.com
Outline
• Quick recap of acquire and release semantics
• The purpose of consume semantics
• Today's compiler support
Quick Recap of Acquire and Release Semantics
Memory Order
• In the C++11 standard atomic library, most functions accept a memory_order argument
• Both consume and acquire serve the same purpose
– To help pass non-atomic information safely between threads
• Like acquire operations, a consume operation must be combined with a release operation in another thread
enum memory_order {
memory_order_relaxed,
memory_order_consume,
memory_order_acquire,
memory_order_release,
memory_order_acq_rel,
memory_order_seq_cst
};
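As a minimal sketch of how the memory_order argument is passed in practice, the snippet below calls a few std::atomic member functions with explicit orderings. The Counter variable and the exercise_orders function are hypothetical names chosen for illustration; only the std::atomic calls themselves come from the standard library.

```cpp
#include <atomic>
#include <cassert>

std::atomic<int> Counter(0);  // hypothetical shared counter

// Each call names its ordering explicitly through the memory_order argument.
int exercise_orders() {
    Counter.store(1, std::memory_order_relaxed);
    int old  = Counter.fetch_add(1, std::memory_order_acq_rel);   // old == 1, Counter == 2
    int seen = Counter.load(std::memory_order_acquire);           // seen == 2
    int prev = Counter.exchange(10, std::memory_order_seq_cst);   // prev == 2, Counter == 10
    assert(old == 1 && seen == 2 && prev == 2);
    return Counter.load();  // argument omitted: defaults to memory_order_seq_cst
}
```

When the argument is left out, as in the final load, the operation uses memory_order_seq_cst.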
Example of Acquire and Release
• Declare two shared variables
• The main thread sits in a loop, repeatedly attempting the following sequence of read operations
• An asynchronous task running in another thread performs the write operations
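atomic<int> Guard(0);
int Payload = 0;

// Main thread: poll the guard, then read the payload
for (…)
{
    …
    g = Guard.load(memory_order_acquire);
    if (g != 0)
        p = Payload;
    …
}

// Asynchronous task: write the payload, then publish it
Payload = 42;
Guard.store(1, memory_order_release);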
Example of Acquire and Release
• Suppose the asynchronous task writes to Guard and the main thread then reads the written value
– The write-release then synchronizes-with the read-acquire
– We are guaranteed that p will equal 42, no matter what platform we run this example on
• We've used acquire and release semantics to pass a simple non-atomic integer, Payload, between threads
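The two code fragments above can be assembled into a complete program. The following is a minimal sketch, assuming std::thread for the two threads; the function names publish, wait_and_read, and run_acquire_release_demo are hypothetical and do not appear on the slides.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

std::atomic<int> Guard(0);  // shared flag, as on the slide
int Payload = 0;            // plain (non-atomic) data being published

// Asynchronous task: fill in the payload, then publish it with a store-release.
void publish() {
    Payload = 42;
    Guard.store(1, std::memory_order_release);
}

// Main-thread side: spin until the flag is seen, then read the payload.
int wait_and_read() {
    while (Guard.load(std::memory_order_acquire) == 0) {
        std::this_thread::yield();  // flag not set yet; let the writer run
    }
    return Payload;  // ordered after the writer's store to Payload
}

// Hypothetical driver: runs the reader and the writer on separate threads.
int run_acquire_release_demo() {
    Guard.store(0, std::memory_order_relaxed);
    Payload = 0;
    int seen = -1;
    std::thread reader([&seen] { seen = wait_and_read(); });
    std::thread writer(publish);
    writer.join();
    reader.join();
    assert(seen == 42);
    return seen;
}
```

Calling run_acquire_release_demo() from main is enough to exercise it; on some toolchains the program has to be linked with the platform's thread library.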
The Cost of Acquire Semantics
g = Guard.load(memory_order_acquire);
if (g != 0)
p = Payload;
[Figure: strong memory model vs. weakly-ordered CPU]
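One way to make the cost visible at the source level is to write the same read sequence with a relaxed load followed by an explicit acquire fence. This is only a sketch of an alternative that gives the same ordering guarantee for this sequence; it is not a claim about what any particular compiler emits. The -1 return value and the function names are hypothetical.

```cpp
#include <atomic>
#include <cassert>

std::atomic<int> Guard(0);
int Payload = 0;

// Same read sequence as above, with the ordering expressed as a standalone fence.
int read_with_fence() {
    int g = Guard.load(std::memory_order_relaxed);
    if (g != 0) {
        // The fence keeps the Payload read from being ordered before the Guard read.
        std::atomic_thread_fence(std::memory_order_acquire);
        return Payload;
    }
    return -1;  // hypothetical "not published yet" marker
}

// Single-threaded smoke test of both paths (hypothetical helper).
void smoke_test() {
    assert(read_with_fence() == -1);   // guard not set yet
    Payload = 42;
    Guard.store(1, std::memory_order_release);
    assert(read_with_fence() == 42);   // guard set, payload readable
}
```

On a weakly-ordered CPU the fence typically corresponds to a barrier instruction, which is the overhead consume semantics aim to avoid.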
The Purpose of Consume Semantics
Data Dependency
• PowerPC and ARM are weakly-ordered CPUs, but there are some cases where they do enforce memory ordering at the machine instruction level without the need for explicit memory barrier instructions
– These processors always preserve memory ordering between data-dependent instructions
• When multiple instructions are data-dependent on each other, we call it a data dependency chain
– In the following PowerPC listing, there are two independent data dependency chains
Data Dependency
• Consume semantics are designed to exploit data dependency ordering
• At the source code level, a dependency chain is a sequence of expressions whose evaluations all carry-a-dependency to one another
– Carries-a-dependency is defined in §1.10.9 of the C++11 standard
– It mainly says that one evaluation carries-a-dependency to another if the value of the first is used as an operand of the second
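A source-level dependency chain can be sketched as follows. The Message type, the Table array, and the function names are hypothetical and chosen only to show a chain with more than one link; the comments mark where a value is used as an operand of the next evaluation.

```cpp
#include <atomic>
#include <cassert>

struct Message {  // hypothetical payload type
    int index;
};

int Table[4] = {10, 20, 30, 40};
Message Slot = {0};
std::atomic<Message*> Guard(nullptr);

// Writer side: fill in the message, then publish its address.
void publish(int index) {
    Slot.index = index;
    Guard.store(&Slot, std::memory_order_release);
}

// Reader side: every step uses the value produced by the previous one.
int read_through_chain() {
    Message* m = Guard.load(std::memory_order_consume);  // head of the chain
    if (m == nullptr)
        return -1;            // hypothetical "nothing published" marker
    int i = m->index;         // m is an operand of ->, so this carries a dependency
    assert(i >= 0 && i < 4);
    return Table[i];          // i is an operand of [], extending the chain
}
```

Calling publish(2) followed by read_through_chain() yields Table[2].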
Example of Consume and Release
• Declare two shared variables
• The main thread sits in a loop, repeatedly attempting the following sequence of read operations
• An asynchronous task running in another thread performs the write operations
// Acquire version (from the earlier example)
atomic<int> Guard(0);
int Payload = 0;

// Main thread
for (…)
{
    …
    g = Guard.load(memory_order_acquire);
    if (g != 0)
        p = Payload;
    …
}

// Asynchronous task
Payload = 42;
Guard.store(1, memory_order_release);

// Consume version: Guard now holds a pointer to the payload
atomic<int*> Guard(nullptr);
int Payload = 0;

// Asynchronous task
Payload = 42;
Guard.store(&Payload, memory_order_release);

// Main thread
for (…)
{
    …
    g = Guard.load(memory_order_consume);
    if (g != nullptr)
        p = *g;
    …
}
Example of Consume and Release
• This time, we don't have a synchronizes-with relationship anywhere. What we have instead is called a dependency-ordered-before relationship
• In any dependency-ordered-before relationship, there's a dependency chain starting at the consume operation, and all memory operations performed before the write-release are guaranteed to be visible to that chain
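The consume version can also be assembled into a complete program. This is a minimal sketch, assuming std::thread for the two threads; publish, wait_and_read, and run_consume_release_demo are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

std::atomic<int*> Guard(nullptr);  // carries the address of the payload
int Payload = 0;

// Asynchronous task: write the payload, then publish its address.
void publish() {
    Payload = 42;
    Guard.store(&Payload, std::memory_order_release);
}

// Main-thread side: wait for a non-null pointer, then read through it.
int wait_and_read() {
    int* g = nullptr;
    while ((g = Guard.load(std::memory_order_consume)) == nullptr) {
        std::this_thread::yield();  // nothing published yet
    }
    return *g;  // the address comes from g, so this load is part of the dependency chain
}

// Hypothetical driver: runs the reader and the writer on separate threads.
int run_consume_release_demo() {
    Guard.store(nullptr, std::memory_order_relaxed);
    Payload = 0;
    int seen = -1;
    std::thread reader([&seen] { seen = wait_and_read(); });
    std::thread writer(publish);
    writer.join();
    reader.join();
    assert(seen == 42);
    return seen;
}
```

The only read that relies on the ordering guarantee is *g, whose address is computed from the consumed value.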
Todayโ€™s Compiler Support
Current Compiler Status
• The assembly listings shown earlier for PowerPC and ARMv7 were fabricated
– GCC 4.8.3 and Clang 4.6 don't actually generate that machine code for consume operations
• Current versions of GCC and Clang/LLVM use the heavy strategy (treating consume as acquire) all the time
– As a result, if you compile memory_order_consume for PowerPC or ARMv7 using today's compilers, you'll end up with unnecessary memory barrier instructions
Efficient Compiler Strategy in GCC
• GCC 4.9.2 actually has an efficient compiler strategy in its implementation of memory_order_consume, as described in GCC bug report 59448
– Only available for the AArch64 target in GCC 4.9.2
Example That Illustrates the Compiler Bug
• In this example, we are admittedly abusing C++11's definition of carries-a-dependency by using f in an expression that cancels it out (f - f). Nonetheless, we are still technically playing by the standard's current rules, and thus its ordering guarantees should still apply
#include <atomic>

std::atomic<int> Guard(0);
int Payload[1] = { 0xbadf00d };

int read()
{
    int f = Guard.load(std::memory_order_consume);  // load-consume
    if (f != 0)
        return Payload[f - f];                      // plain load from Payload[f - f]
    return 0;
}

void write()
{
    Payload[0] = 42;                                // plain store to Payload[0]
    Guard.store(1, std::memory_order_release);      // store-release
}
$ aarch64-linux-g++ -std=c++11 -O2 -S consumetest.cpp
A Patch for This Bug
• Andrew Macleod posted a patch for this issue in the bug report. His patch adds the following lines near the end of the get_memmodel function in gcc/builtins.c
• After patching
– $ aarch64-linux-g++ -std=c++11 -O2 -S consumetest.cpp
/* Workaround for Bugzilla 59448. GCC doesn't track consume properly, so
be conservative and promote consume to acquire. */
if (val == MEMMODEL_CONSUME)
val = MEMMODEL_ACQUIRE;
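The same conservative promotion can be imitated in user code. The sketch below is not part of GCC or the standard library: promote_consume and conservative_load are hypothetical helpers that map memory_order_consume to memory_order_acquire before performing a load.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical helper: treat consume as acquire, leave other orderings alone.
constexpr std::memory_order promote_consume(std::memory_order order) {
    return order == std::memory_order_consume ? std::memory_order_acquire : order;
}

static_assert(promote_consume(std::memory_order_consume) == std::memory_order_acquire,
              "consume is promoted to acquire");

// Hypothetical wrapper around std::atomic<T>::load using the promoted ordering.
template <typename T>
T conservative_load(const std::atomic<T>& a, std::memory_order order) {
    // release and acq_rel are not valid orderings for a load
    assert(order != std::memory_order_release && order != std::memory_order_acq_rel);
    return a.load(promote_consume(order));
}
```

Acquire is at least as strong as consume, so code that is correct under consume remains correct after the promotion; the cost is the extra barrier on weakly-ordered CPUs.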
This Bug Doesnโ€™t Happen on PowerPC
• Interestingly, if you compile the same example for PowerPC, there is no bug. This is using the same GCC version 4.9.2 without Andrew's patch applied
– $ powerpc-linux-g++ -std=c++11 -O2 -S consumetest.cpp
gcc-4.9.2/gcc/config/aarch64/atomics.md:

    if (model == MEMMODEL_RELAXED
        || model == MEMMODEL_CONSUME
        || model == MEMMODEL_RELEASE)
      return "ldr<atomic_sfx>\t%<w>0, %1";
    else
      return "ldar<atomic_sfx>\t%<w>0, %1";

gcc-4.9.2/gcc/config/rs6000/sync.md:

    switch (model)
      {
      case MEMMODEL_RELAXED:
        break;
      case MEMMODEL_CONSUME:
      case MEMMODEL_ACQUIRE:
      case MEMMODEL_SEQ_CST:
        emit_insn (gen_loadsync_<mode> (operands[0]));
        break;
The Uncertain Future of memory order consume
• The C++ standard committee is still deciding what to do with memory_order_consume in future revisions of C++
• The author's opinion is that the definition of carries-a-dependency should be narrowed to require that different return values from a load-consume result in different behavior for any dependent statements that are executed
– Using f - f as a dependency is nonsense, and narrowing the definition would free the compiler from having to support such nonsense "dependencies" if it chooses to implement the efficient strategy
– This idea was first proposed by Torvald Riegel on the Linux Kernel Mailing List and is captured among various alternatives described in Paul McKenney's proposal N4036
int my_array[MY_ARRAY_SIZE];
i = atomic_load_explicit(gi, memory_order_consume);
r1 = my_array[i];
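The snippet above can be rendered in C++ as a sketch of a dependency that would survive the narrowed definition: a different index leads to a different address being read. The array size of 8, the use of std::size_t, and the function names publish_slot and read_slot are assumptions made for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

constexpr std::size_t MY_ARRAY_SIZE = 8;  // assumed size, larger than one
int my_array[MY_ARRAY_SIZE];
std::atomic<std::size_t> gi(0);

// Writer side: fill one slot, then publish its index.
void publish_slot(std::size_t i, int value) {
    assert(i < MY_ARRAY_SIZE);
    my_array[i] = value;
    gi.store(i, std::memory_order_release);
}

// Reader side: the loaded index selects which element is read.
int read_slot() {
    std::size_t i = gi.load(std::memory_order_consume);
    return my_array[i];  // a different i gives a different address
}
```

Contrast this with Payload[f - f], where every value of f selects the same element.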
References
• The Purpose of memory_order_consume in C++11
– http://preshing.com/20140709/the-purpose-of-memory_order_consume-in-cpp11/
• Fixing GCC's Implementation of memory_order_consume
– http://preshing.com/20141124/fixing-gccs-implementation-of-memory_order_consume/
• http://en.cppreference.com/w/cpp/atomic/memory_order
• Bug 59448 - Code generation doesn't respect C11 address-dependency
– https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59448
• N4036: Towards Implementation and Use of memory order consume
– https://isocpp.org/files/papers/n4036.pdf
• Demo program
– https://github.com/preshing/ConsumeDemo
