[AArch64] Why doesn't clang generate stp/ldp for 256-bit value ? #139005

Open
Zhenhang1213 opened this issue May 8, 2025 · 2 comments
Comments

@Zhenhang1213
Contributor

An Arm [document](https://developer.arm.com/documentation/ka004805/latest/) shows that, by using a keyword such as volatile with a 128-bit structure type, two 64-bit values can be treated as a single 128-bit value. I have confirmed that this works with both Clang and GCC. However, when I extend this to a 256-bit data structure, Clang does not generate stp/ldp for it, while GCC does. Is it possible to extend LLVM's behavior to cover this case?

demo:
https://godbolt.org/z/Eoj454o6T
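
For readers who cannot open the link, the scenario described above might look roughly like the following C sketch. It is an assumed reconstruction, not the demo's actual source: the names `pair128_t`, `quad256_t`, `copy128` and `copy256` are hypothetical, and which instructions a compiler emits for these functions depends on its version and flags.

```c
#include <stdint.h>

/* Hypothetical types for illustration; not taken from the linked demo. */
typedef struct { uint64_t lo, hi; } pair128_t;  /* 2 x 64 bits = 128 bits */
typedef struct { uint64_t w[4]; } quad256_t;    /* 4 x 64 bits = 256 bits */

/* Whole-struct copy through volatile-qualified pointers, 128-bit case.
 * Per the report, both Clang and GCC use a paired access here. */
void copy128(volatile pair128_t *dst, const volatile pair128_t *src) {
    *dst = *src;
}

/* The same copy widened to 256 bits.
 * Per the report, GCC still uses ldp/stp here while Clang does not. */
void copy256(volatile quad256_t *dst, const volatile quad256_t *src) {
    *dst = *src;
}
```

Compiling both functions for an AArch64 target at an optimization level such as `-O2` and comparing the assembly from the two compilers is one way to check the difference being reported.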

@llvmbot llvmbot added the clang Clang issues not falling into any other category label May 8, 2025
@EugeneZelenko EugeneZelenko added backend:AArch64 and removed clang Clang issues not falling into any other category labels May 8, 2025
@llvmbot
Member

llvmbot commented May 8, 2025

@llvm/issue-subscribers-backend-aarch64

Author: Austin (Zhenhang1213)

An Arm [document](https://developer.arm.com/documentation/ka004805/latest/) shows that, by using a keyword such as volatile with a 128-bit structure type, two 64-bit values can be treated as a single 128-bit value. I have confirmed that this works with both Clang and GCC. However, when I extend this to a 256-bit data structure, Clang does not generate stp/ldp for it, while GCC does. Is it possible to extend LLVM's behavior to cover this case?

demo:
https://godbolt.org/z/Eoj454o6T

@davemgreen
Collaborator

There is code to handle the volatile load/store to LDP case (https://godbolt.org/z/Yzbb1EoKM), but not when the access comes from a memcpy, which is lowered to "legal" types (i64) and is not folded into LDP (https://godbolt.org/z/n4eWejhMr). Getting LLVM to do the same through a memcpy sounds sensible to me, if you can figure out a way to do it. The memcpy goes via getOptimalMemOpType and gets split up during DAG creation.
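
At the source level, the two shapes contrasted in this comment could be written in C roughly as follows. This is only an illustrative sketch: the linked examples are not reproduced here, the names `u128`, `copy_volatile` and `copy_memcpy` are hypothetical, and `unsigned __int128` is a GCC/Clang extension available on 64-bit targets. The sketch makes no claim about what a particular compiler version emits for either function. It is also an assumption about the front end, not something stated in the thread, that a volatile aggregate assignment reaches the backend as a memcpy and not as a single wide load/store.

```c
#include <stdint.h>
#include <string.h>

/* GCC/Clang extension type; assumed to be available on the target. */
typedef unsigned __int128 u128;

/* A single volatile 128-bit load followed by a single volatile 128-bit
 * store: the "volatile load/store" shape mentioned in the comment. */
void copy_volatile(volatile u128 *dst, const volatile u128 *src) {
    *dst = *src;
}

/* The same 16 bytes moved by a memcpy call: the "from a memcpy" shape.
 * How this is split into machine types is decided by the backend. */
void copy_memcpy(void *dst, const void *src) {
    memcpy(dst, src, sizeof(u128));
}
```

Both functions copy the same 16 bytes; they differ only in how the copy is expressed to the compiler, which is the distinction the comment draws.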
