[Offload] regression on sm_60 #138560
Hm, I didn't realize this was a literal limitation of older SMs. I don't think we even use this, but likely with
@llvm/issue-subscribers-offload Author: Ye Luo (ye-luo)
I saw a regression on `sm_60` caused by #122781. After the merge I got:
```
fatal error: error in backend: PTX does not support "atomic" for orderings different than"NotAtomic" or "Monotonic" for sm_60 or older, but order is: "seq_cst".
```
On the develop branch, the error becomes:
```
fatal error: error in backend: PTX does not support "atomic" for orderings different than"NotAtomic" or "Monotonic" for sm_60 or older, but order is: "acquire".
```
That's interesting. I guess for now we can just revert that patch, since I erroneously thought it wasn't necessary because the backend handled it, but I didn't consider whether or not
Other code was modified on top of that patch. Could you take care of the revert?
Summary: Orderings other than monotonic aren't supported for an atomic load on sm_60 and older, so we instead do an atomic add of zero, which returns the same value. It's less efficient, but it works. Fixes llvm#138560
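The workaround in the commit summary relies on a simple property: an atomic read-modify-write that adds zero returns the current value and leaves memory unchanged, so it can stand in for an ordered load. Per the commit, this form compiles for sm_60, where an acquire or seq_cst `load atomic` is rejected. The sketch below illustrates the idea on the host with the GCC/Clang `__atomic` builtins; the helper name `atomic_load_via_add_zero` is hypothetical, and this is not the DeviceRTL code from the actual patch.

```cpp
#include <cassert>      // assert() in the usage example
#include <type_traits>  // std::is_integral

// Hypothetical helper (not the code from the actual patch):
// emulate an ordered atomic load with an atomic "add zero".
// Order must be a GCC/Clang memory-order constant, e.g. __ATOMIC_ACQUIRE
// or __ATOMIC_SEQ_CST.
template <int Order, typename T>
T atomic_load_via_add_zero(T *addr) {
  static_assert(std::is_integral<T>::value,
                "this sketch only handles integer types");
  // Adding zero leaves memory unchanged; fetch_add returns the old value,
  // which is exactly what an ordinary load would have returned.
  return __atomic_fetch_add(addr, T(0), Order);
}

// Usage example: read a shared counter with acquire semantics.
inline void usage_example() {
  unsigned counter = 41;
  unsigned seen = atomic_load_via_add_zero<__ATOMIC_ACQUIRE>(&counter);
  assert(seen == 41u);     // returned the current value
  assert(counter == 41u);  // memory was not modified
}
```

The "less efficient" trade-off is visible in the sketch: every load becomes a read-modify-write, which is typically more expensive than a plain load, especially when many threads read the same location.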
Looking at the PTX spec for `fence`: the `fence` instruction with memory orderings is only supported on sm_70+. Here's the snippet which inserts a fence to implement a memory ordering for load/store:
Perhaps we could provide a brute-force emulation that uses `membar.sys` (which is supported) instead of a fence with a memory order, but that will likely be very inefficient.
Considering that the alternative is the program not compiling, that doesn't seem so bad.
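The brute-force emulation suggested above follows the usual fence-based mapping: perform the access itself with relaxed (`Monotonic`) ordering, which the error message says sm_60 accepts, and add a full fence after a load (for acquire) or before a store (for release). The sketch below is host-side C++ using the GCC/Clang `__atomic` builtins; the helper names are hypothetical, and the idea that the full fence would become `membar.sys` on sm_60 is the comment's suggestion, not something checked against the backend here. It only covers acquire/release and does not try to reproduce seq_cst accesses exactly.

```cpp
#include <cassert>  // assert() in the usage example

// Hypothetical helpers sketching the fence-based fallback discussed above.
// Each access is relaxed ("Monotonic"), the ordering sm_60 accepts; a full
// fence supplies the ordering. On the GPU, the suggestion is that this full
// fence would be membar.sys (an assumption, not verified here).

// Acquire load: relaxed load, then a full fence so later accesses
// cannot be reordered before the load.
template <typename T>
T acquire_load_via_fence(const T *addr) {
  T value = __atomic_load_n(addr, __ATOMIC_RELAXED);
  __atomic_thread_fence(__ATOMIC_SEQ_CST);  // full-barrier stand-in
  return value;
}

// Release store: a full fence so earlier accesses cannot be reordered
// after the store, then a relaxed store.
template <typename T>
void release_store_via_fence(T *addr, T value) {
  __atomic_thread_fence(__ATOMIC_SEQ_CST);  // full-barrier stand-in
  __atomic_store_n(addr, value, __ATOMIC_RELAXED);
}

// Usage example: publish a value, then read it back.
inline void usage_example() {
  int flag = 0;
  release_store_via_fence(&flag, 1);
  assert(acquire_load_via_fence(&flag) == 1);
}
```

Using a seq_cst fence mirrors the "brute-force" nature of the proposal: in the C++ model an acquire fence after the load and a release fence before the store would already suffice, but the premise in the thread is that only the unordered `membar` is available before sm_70.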
Summary: Orderings other than monotonic aren't supported for an atomic load on sm_60 and older, so we instead do an atomic add of zero, which returns the same value. It's less efficient, but it works. Fixes llvm/llvm-project#138560 (cherry picked from commit dfcb8cb)
Reproducer code:
https://github.com/TApplencourt/OvO/blob/master/test_src/cpp/hierarchical_parallelism/memcopy-complex_double/target_teams_distribute_parallel_for.cpp