lldb-server crashes on startup on AArch64 which has SME but not SVE #138717

Open
DavidSpickett opened this issue May 6, 2025 · 6 comments
@DavidSpickett
Collaborator

DavidSpickett commented May 6, 2025

Relates to #135563

I'm running Arm's Foundation Model via shrinkwrap with the following added to the v9.5-a config:

$ git diff
diff --git a/config/arch/v9.5.yaml b/config/arch/v9.5.yaml
index 789e64f..fd29552 100644
--- a/config/arch/v9.5.yaml
+++ b/config/arch/v9.5.yaml
@@ -16,3 +16,16 @@ run:
   params:
     -C cluster0.has_arm_v9-5: 1
     -C cluster1.has_arm_v9-5: 1
+    -C cluster0.has_sve : 1
+    -C cluster1.has_sve : 1
+    -C cluster0.sve.has_sme2 : 0
+    -C cluster1.sve.has_sme2 : 0
+    -C cluster0.sve.has_sme : 1
+    -C cluster1.sve.has_sme : 1
+    -C cluster0.sve.has_sve2 : 1
+    -C cluster1.sve.has_sve2 : 1
+    -C cluster0.sve.sme_only : 1
+    -C cluster1.sve.sme_only : 1
+    -C cluster0.sve.has_sme_fa64: 1
+    -C cluster1.sve.has_sme_fa64: 1
$ shrinkwrap --runtime null build --overlay=arch/v9.5.yaml ns-edk2.yaml
$ shrinkwrap --runtime null run --rtvar=KERNEL=/home/david.spickett/linux.build/arm64/arch/arm64/boot/Image --rtvar=ROOTFS=/home/david.spickett/jammy-arm64-rootfs.img ns-edk2.yaml

(Normally it would use the Docker runtime, but I am already inside a Docker container.)

The kernel and rootfs are built using the scripts in https://github.com/llvm/llvm-project/tree/main/lldb/scripts/lldb-test-qemu.

SME is currently disabled in the kernel (it is marked as depending on BROKEN), but the issues behind that shouldn't break lldb, at least not like this. So I have patched it back in:

$ git diff
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index a182295e6f08..27437f13154e 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -2285,7 +2285,6 @@ config ARM64_SME
        bool "ARM Scalable Matrix Extension support"
        default y
        depends on ARM64_SVE
-       depends on BROKEN
        help
          The Scalable Matrix Extension (SME) is an extension to the AArch64
          execution state which utilises a substantial subset of the SVE

The cpuinfo reported is:

Features	: fp asimd evtstrm crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop asimddp asimdfhm dit uscat ilrcpc flagm sb paca pacg gcs dcpodp flagm2 frint i8mm bf16 dgh rng bti ecv afp sme smei16i64 smef64f64 smei8i32 smef16f32 smeb16f32 smef32f32 wfxt ebf16 cssc mops hbc poe

Note all the sme.* features and no sve.* features.

This means the processor has SME, but outside of streaming mode you can neither access the SVE register file nor use SVE instructions. This is probably what the Apple M4 has, but I don't have one to confirm exactly.
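
As a rough illustration of the "SME but not SVE" check, here is a minimal sketch that classifies a cpuinfo Features line by looking only at the `sve` and `sme` flags. `VectorSupport` and `ClassifyFeatures` are hypothetical names made up for this example; they are not part of lldb or the kernel.

```cpp
#include <cassert>
#include <set>
#include <sstream>
#include <string>

// Hypothetical classification of what the "Features" line advertises.
enum class VectorSupport { None, SveOnly, SveAndSme, SmeOnly };

// Split the line on whitespace and look for the exact flags "sve" and "sme".
// Sub-features such as "smei16i64" are distinct tokens and do not match.
VectorSupport ClassifyFeatures(const std::string &features) {
  std::istringstream in(features);
  std::set<std::string> flags;
  for (std::string flag; in >> flag;)
    flags.insert(flag);

  const bool has_sve = flags.count("sve") != 0;
  const bool has_sme = flags.count("sme") != 0;
  if (has_sve && has_sme)
    return VectorSupport::SveAndSme;
  if (has_sve)
    return VectorSupport::SveOnly;
  if (has_sme)
    return VectorSupport::SmeOnly;
  return VectorSupport::None;
}
```

Fed the Features line above, this would report `SmeOnly`, which is the configuration this issue is about.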

When I start lldb-server it crashes immediately:

$ ./lldb-server gdbserver 0.0.0.0:1234 -- /tmp/test.o
/usr/lib/gcc/aarch64-linux-gnu/11/../../../../include/c++/11/bits/stl_vector.h:1045: reference std::vector<unsigned int>::operator[](size_type) [_Tp = unsigned int, _Alloc = std::allocator<unsigned int>]: Assertion '__n < this->size()' failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.	Program arguments: ./lldb-server gdbserver 0.0.0.0:1234 -- /tmp/test.o
Stack dump without symbol names (ensure you have llvm-symbolizer in your PATH or set the environment var `LLVM_SYMBOLIZER_PATH` to point to it):
0  lldb-server     0x0000aaaaaaf78330
1  lldb-server     0x0000aaaaaaf7635c
2  lldb-server     0x0000aaaaaaf78a3c
3  linux-vdso.so.1 0x0000fffff7ffa7dc __kernel_rt_sigreturn + 0
4  libc.so.6       0x0000fffff77ff200
5  libc.so.6       0x0000fffff77ba67c raise + 28
6  libc.so.6       0x0000fffff77a7130 abort + 228
7  lldb-server     0x0000aaaaaaf0bfb0
8  lldb-server     0x0000aaaaab0da9a8
9  lldb-server     0x0000aaaaaafc09f4
10 lldb-server     0x0000aaaaaafc86bc
11 lldb-server     0x0000aaaaaafc84f0
12 lldb-server     0x0000aaaaaafaaa4c
13 lldb-server     0x0000aaaaaafaa78c
14 lldb-server     0x0000aaaaaafa9340
15 lldb-server     0x0000aaaaab018d90
16 lldb-server     0x0000aaaaaaf0916c
17 lldb-server     0x0000aaaaaaf0b054
18 lldb-server     0x0000aaaaaaf11dac
19 libc.so.6       0x0000fffff77a73fc
20 libc.so.6       0x0000fffff77a74cc __libc_start_main + 152
21 lldb-server     0x0000aaaaaaf08d70
Aborted

If I symbolise this:

$ ./bin/llvm-symbolizer --obj=./bin/lldb-server --adjust-vma=0xaaaaaaaa0000 0x0000aaaaaaf78330 0x0000aaaaaaf7635c 0x0000aaaaaaf78a3c 0x0000fffff7ffa7dc 0x0000fffff77ff200 0x0000fffff77ba67c 0x0000fffff77a7130 0x0000aaaaaaf0bfb0 0x0000aaaaab0da9a8 0x0000aaaaaafc09f4 0x0000aaaaaafc86bc 0x0000aaaaaafc84f0 0x0000aaaaaafaaa4c 0x0000aaaaaafaa78c 0x0000aaaaaafa9340 0x0000aaaaab018d90 0x0000aaaaaaf0916c 0x0000aaaaaaf0b054 0x0000aaaaaaf11dac 0x0000fffff77a73fc 0x0000fffff77a74cc 0x0000aaaaaaf08d70
llvm::sys::PrintStackTrace(llvm::raw_ostream&, int)
??:0:0

llvm::sys::RunSignalHandlers()
??:0:0

SignalHandler(int, siginfo_t*, void*)
Signals.cpp:0:0

??
??:0:0

??
??:0:0

??
??:0:0

??
??:0:0

llvm::support::detail::provider_format_adapter<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>>::~provider_format_adapter()
lldb-gdbserver.cpp:0:0

RegisterInfoPOSIX_arm64::IsSVEReg(unsigned int) const
??:0:0

lldb_private::process_linux::NativeRegisterContextLinux_arm64::ConfigureRegisterContext()
??:0:0

lldb_private::process_linux::NativeThreadLinux::SetStopped()
??:0:0

lldb_private::process_linux::NativeThreadLinux::SetStoppedBySignal(unsigned int, siginfo_t const*)
??:0:0

lldb_private::process_linux::NativeProcessLinux::AddThread(unsigned long, bool)
??:0:0

lldb_private::process_linux::NativeProcessLinux::NativeProcessLinux(int, int, lldb_private::NativeProcessProtocol::NativeDelegate&, lldb_private::ArchSpec const&, lldb_private::process_linux::NativeProcessLinux::Manager&, llvm::ArrayRef<int>)
??:0:0

lldb_private::process_linux::NativeProcessLinux::Manager::Launch(lldb_private::ProcessLaunchInfo&, lldb_private::NativeProcessProtocol::NativeDelegate&)
??:0:0

lldb_private::process_gdb_remote::GDBRemoteCommunicationServerLLGS::LaunchProcess()
??:0:0

handle_launch(lldb_private::process_gdb_remote::GDBRemoteCommunicationServerLLGS&, llvm::ArrayRef<llvm::StringRef>)
??:0:0

main_gdbserver(int, char**)
??:0:0

main
??:0:0

??
??:0:0

??
??:0:0

_start
??:0:0

We crashed in something related to SVE registers. We probably tried to look up a register ID in an empty list, because lldb saw that we don't have SVE and therefore didn't check for streaming SVE either.
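
To make the suspected failure mode concrete, here is a minimal sketch of a range check over a list of register numbers that stays safe when the list is empty. `RegisterSetSketch` and its members are hypothetical and only stand in for the real class; this is not the actual lldb implementation, just the general shape of the guard, assuming the register numbers are stored in ascending order.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a register info class. The vector is empty when
// no SVE register set was configured, which is the suspected state here.
struct RegisterSetSketch {
  std::vector<uint32_t> sve_regnums; // assumed sorted in ascending order

  // Indexing an empty vector with operator[] is undefined behaviour; with
  // libstdc++ assertions enabled it aborts with '__n < this->size()', like
  // the message in the crash output. Checking for empty first avoids that.
  bool IsSVEReg(uint32_t reg) const {
    if (sve_regnums.empty())
      return false;
    return reg >= sve_regnums.front() && reg <= sve_regnums.back();
  }
};
```

With an empty list the check simply reports "not an SVE register" instead of aborting.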

@DavidSpickett
Collaborator Author

Note that my kernel does include https://lore.kernel.org/linux-arm-kernel/[email protected]/, which fixes the CPU info. I have not tried an older kernel yet.

@DavidSpickett
Collaborator Author

Relevant quotes from the ARMARM:

If FEAT_SVE or FEAT_SME is implemented, an SVE scalable vector register file and an SVE scalable predicate
register file.

If FEAT_SME is implemented, this does not imply that FEAT_SVE and FEAT_SVE2 are implemented when the PE is
not in Streaming SVE mode.

So you can also think of this as "streaming SVE only", but the kernel has decided not to show it that way in the cpuinfo. I'm not sure myself what the proper term is; it's as if there is a split between the SVE register state and FEAT_SVE, the instructions themselves.

Anyway, my point is that it is a legal configuration.

@DavidSpickett DavidSpickett changed the title lldb-server crashes on AArch64 which has SME but not SVE lldb-server crashes on startup on AArch64 which has SME but not SVE May 6, 2025
@DavidSpickett
Collaborator Author

#135563 gets us further:

$ ./bin/lldb
(lldb) settings set plugin.process.gdb-remote.packet-timeout 120
(lldb) gdb-remote localhost:1234^C
(lldb) file /tmp/test.o
Current executable set to '/tmp/test.o' (aarch64).
(lldb) gdb-remote localhost:1234
Process 490 stopped
* thread #1, name = 'test.o', stop reason = signal SIGSTOP
    frame #0: 0x0000fffff7fd9c00 ld-linux-aarch64.so.1`_dl_help [inlined] print_hwcaps_subdirectories(state=0x0000000000000000) at dl-usage.c:192:5
(lldb) b main
Breakpoint 1: where = test.o`main at test.c:2:10, address = 0x0000aaaaaaaa0714
(lldb) c
Process 490 resuming
Process 490 stopped
* thread #1, name = 'test.o', stop reason = breakpoint 1.1
    frame #0: 0x0000aaaaaaaa0714 test.o`main at test.c:2:10
   1   	int main() {
-> 2   	  return 0;
   3   	}
(lldb) register read --all
General Purpose Registers:
        x0 = 0x0000000000000001
        x1 = 0x0000fffffffffb58
<...>

Floating Point Registers:
      fpsr = 0x00000000
      fpcr = 0x00000000
96 registers were unavailable.

Scalable Vector Extension Registers:
50 registers were unavailable.

<...>

Scalable Matrix Extension Registers:
      svcr = 0x0000000000000000
       svg = 0x0000000000000004
        za = {0x00 <...>}

Guarded Control Stack Registers:
  gcs_features_enabled = 0x0000000000000000
  gcs_features_locked = 0x0000000000000000
  gcspr_el0 = 0x0000000000000000

This example program does not enter streaming mode, so it makes sense that the SVE registers are unavailable, though I can't rule out an error on our part as well.

What's unexpected is that the FP registers are missing. If we had SVE we would read them via the SVE context, so I expect we aren't falling back to the normal ptrace FP context here.
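
The fallback I have in mind could look roughly like the sketch below. It only models the decision of where to read the FP/vector state from; `FpSource`, `VectorState` and `ChooseFpSource` are hypothetical names, and the policy itself is an assumption about what the fix might be, not what lldb-server currently does.

```cpp
#include <cassert>

// Hypothetical labels for the ptrace register sets involved.
enum class FpSource { StreamingSveRegset, SveRegset, FpsimdRegset };

// Hypothetical summary of what the target supports and its current mode.
struct VectorState {
  bool has_sve;        // non-streaming SVE is implemented
  bool has_sme;        // SME (and so streaming SVE) is implemented
  bool streaming_mode; // the thread is currently in streaming mode
};

// Assumed policy: use the streaming SVE set while in streaming mode, the
// SVE set when non-streaming SVE exists, and otherwise fall back to the
// plain FP/SIMD set instead of reporting the registers as unavailable.
FpSource ChooseFpSource(const VectorState &state) {
  if (state.has_sme && state.streaming_mode)
    return FpSource::StreamingSveRegset;
  if (state.has_sve)
    return FpSource::SveRegset;
  return FpSource::FpsimdRegset;
}
```

In the SME-only configuration outside of streaming mode, this picks the plain FP/SIMD set, which is the case that currently reports 96 registers as unavailable.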

Will try an example with streaming mode enabled too.

@DavidSpickett
Collaborator Author

Entering streaming mode is weirder than I expected:

(lldb) c
jProcess 364 resuming
Process 364 stopped
* thread #1, name = 'test.o', stop reason = breakpoint 1.1
    frame #0: 0x0000aaaaaaaa0714 test.o`main at test.c:2:3
   1   	int main() {
-> 2   	  asm volatile("msr  s0_3_c4_c7_3, xzr" /*smstart*/);
   3   	  return 0;
   4   	}
(lldb) register read z0 v0 svcr
z0           = error: unavailable
v0           = error: unavailable
    svcr = 0x0000000000000000
         = (ZA = 0, SM = 0)
(lldb) n
Process 364 stopped
* thread #1, name = 'test.o', stop reason = step over
    frame #0: 0x0000aaaaaaaa0718 test.o`main at test.c:3:10
   1   	int main() {
   2   	  asm volatile("msr  s0_3_c4_c7_3, xzr" /*smstart*/);
-> 3   	  return 0;
   4   	}
(lldb) register read z0 v0 svcr
z0           = error: unavailable
v0           = error: unavailable
    svcr = 0x0000000000000002
         = (ZA = 1, SM = 0)

The before state makes some sense. We know that we have a form of SVE, so we're trying to use the SVE context to read the FP registers (I forget the reason for this, but it's what the kernel wants us to do). Of course that is going to return nothing; the fix will be to use the normal FP context.

LLDB also doesn't support registers appearing and disappearing at runtime, so even when we fix this, the normal SVE registers will be unavailable outside of streaming mode (this is also why a disabled za register reports as all 0s).

By default, smstart enables both streaming mode and ZA, but we see ZA enabled and not streaming mode. This is likely because we don't get svcr from a ptrace context (none of them contain it); instead we derive it from what we are allowed to read via ptrace. lldb is likely confused about which SVE state it can read, so SM is not set, even though we know the program is now in streaming mode.
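
For reference, a minimal sketch of how the two svcr bits shown above fit together: SM is bit 0 and ZA is bit 1, so 0x2 decodes as (ZA = 1, SM = 0), and after smstart we would expect 0x3. `SvcrFields`, `DecodeSvcr` and `DeriveSvcr` are hypothetical names; the derive step just mirrors the idea of building the value from which states are observed to be active, not lldb's actual code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical view of the two svcr fields shown by "register read svcr".
struct SvcrFields {
  bool sm; // streaming mode enabled (bit 0)
  bool za; // ZA storage enabled (bit 1)
};

inline SvcrFields DecodeSvcr(uint64_t svcr) {
  return SvcrFields{(svcr & 0x1) != 0, (svcr & 0x2) != 0};
}

// Build a pseudo svcr value from what was observed to be active. If the
// streaming SVE state is wrongly judged inactive, SM comes out as 0 even
// though the program is in streaming mode.
inline uint64_t DeriveSvcr(bool streaming_sve_active, bool za_active) {
  return (streaming_sve_active ? 0x1u : 0x0u) | (za_active ? 0x2u : 0x0u);
}
```

So the 0x2 seen after stepping over smstart is what you would get if ZA is detected as active but the streaming SVE state is not.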

This shouldn't be too difficult to fix if you can debug lldb-server itself remotely. I also want to see whether I can get the tests for FP register reading when SVE is present working in this "SME only" configuration.

I can work on this but I have to look at recent kernel changes for SME (https://lists.infradead.org/pipermail/linux-arm-kernel/2025-May/1025549.html) first, and will be unavailable next week as well.
