[DebugInfo][ConstraintElimination] Potential debug value loss in replacing comparisons with the speculated constants #135736

Closed
Apochens opened this issue Apr 15, 2025 · 8 comments · Fixed by #136839

Comments

@Apochens
Contributor

In ConstraintElimination, the pass first collects instructions that can be replaced with a speculated constant in the function checkAndReplaceCondition(), and then removes the collected instructions at ConstraintElimination-L1964.

1963  for (Instruction *I : ToRemove)
1964    I->eraseFromParent();

However, I found that this process leaves poison debug values in the optimized IR. Here is an example in which ConstraintElimination is run on an LLVM IR function from the pass's regression test (and.ll). The debug information is generated by opt -passes=debugify. Poison debug values are produced in the basic block bb1:

; Before the optimization
bb1:                                              ; preds = %entry
  %t.1 = icmp ule i4 %x, %z, !dbg !31
    #dbg_value(i1 %t.1, !13, !DIExpression(), !31)
  %t.2 = icmp ule i4 %x, %y, !dbg !32
    #dbg_value(i1 %t.2, !14, !DIExpression(), !32)
  %r.1 = xor i1 %t.1, %t.2, !dbg !33
; After the optimization
bb1:                                              ; preds = %entry
    #dbg_value(i1 poison, !13, !DIExpression(), !31)
    #dbg_value(i1 poison, !14, !DIExpression(), !32)
  %r.1 = xor i1 true, true, !dbg !33

I think these poison debug values, which correspond to the erased icmp instructions, could be prevented according to rules-for-updating-debug-values. Specifically, we can use either salvageDebugInfo() (i.e., using the DIExpression) or replaceAllDbgUsesWith() (i.e., pointing the debug value at the speculated constant) to preserve the debug value information for the erased icmp instructions.

Moreover, could this issue be caused by Value::replaceUsesWithIf(), which is used in checkAndReplaceCondition(), not updating debug uses? (Value::replaceAllUsesWith() does handle the debug value information for the replaced instruction.)

@llvmbot
Member

llvmbot commented Apr 15, 2025

@llvm/issue-subscribers-debuginfo

Author: Shan Huang (Apochens)

@OCHyams
Contributor

OCHyams commented Apr 17, 2025

Thanks for the detailed report @Apochens. I agree that the debug locations look like they could be preserved, and that you have identified an awkward edge: replaceUsesWithIf doesn't look at debug info uses. Salvaging seems like a decent broad solution, and perhaps all instructions erased in the ToRemove loop should get salvaged first (that needs double-checking - I haven't dived into the code, but it seems sensible on the surface).

replaceAllDbgUsesWith() (i.e., pointing the debug value at the speculated constant) to preserve the debug value information for the erased icmp instructions.

If we know the value can be replaced with a constant, that seems preferable, since it avoids unnecessarily long or complex DIExpressions. But I wonder if replaceAllDbgUsesWith would be too general (I haven't looked too closely at constraint elimination's code - it might make sense to use findDbgUsers and replace the uses if the users fulfill the same conditions specified in the replaceUsesWithIf - but again, I'm not 100% sure without a closer look).

@Apochens
Contributor Author

Salvaging seems like a decent broad solution, and perhaps all instructions erased in the ToRemove loop should get salvaged first (that needs double checking - I haven't dived into the code, but it seems sensible on the surface).

@OCHyams From my investigation of the code, the other instructions in ToRemove have already had Value::replaceAllUsesWith() called on them, which handles their debug values. However, the instructions collected in checkAndReplaceCondition() are rewritten with Value::replaceUsesWithIf(), which does not handle their debug values, resulting in the poison debug values.
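
To make the difference between the two replacement paths concrete, here is a minimal self-contained sketch. It is a toy model, not LLVM code: `ToyFunction`, `Use`, `DebugRecord`, and `debugLocationAfter` are hypothetical names invented for illustration, and the only behavior modeled is what this thread describes (a full replacement also rewrites debug records, a conditional one visits ordinary uses only, and erasing a value turns any debug record still pointing at it into poison).

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Toy stand-ins for IR concepts; none of these types are LLVM's.
struct Use {
  std::string Operand; // name of the value being used
  int Block;           // id of the block containing the user
};

struct DebugRecord {
  std::string Location; // value the record describes, or "poison"
  int Block;
};

struct ToyFunction {
  std::vector<Use> Uses;
  std::vector<DebugRecord> DebugRecords;

  // Models a full replacement: ordinary uses AND debug records are rewritten.
  void replaceAllUses(const std::string &From, const std::string &To) {
    for (Use &U : Uses)
      if (U.Operand == From)
        U.Operand = To;
    for (DebugRecord &D : DebugRecords)
      if (D.Location == From)
        D.Location = To;
  }

  // Models a conditional replacement: only ordinary uses are visited.
  void replaceUsesIf(const std::string &From, const std::string &To,
                     const std::function<bool(const Use &)> &ShouldReplace) {
    for (Use &U : Uses)
      if (U.Operand == From && ShouldReplace(U))
        U.Operand = To;
  }

  bool hasUses(const std::string &V) const {
    for (const Use &U : Uses)
      if (U.Operand == V)
        return true;
    return false;
  }

  // Models erasing a value: debug records still pointing at it become poison.
  void erase(const std::string &V) {
    assert(!hasUses(V) && "erasing a value that still has ordinary uses");
    for (DebugRecord &D : DebugRecords)
      if (D.Location == V)
        D.Location = "poison";
  }
};

// Rebuilds the bb1 situation (one ordinary use and one debug record of %t.1),
// applies one of the two replacement strategies, erases %t.1, and returns
// what the debug record describes afterwards.
std::string debugLocationAfter(bool UseConditionalReplace) {
  ToyFunction F;
  F.Uses.push_back(Use{"%t.1", 1});
  F.DebugRecords.push_back(DebugRecord{"%t.1", 1});
  if (UseConditionalReplace)
    F.replaceUsesIf("%t.1", "true", [](const Use &) { return true; });
  else
    F.replaceAllUses("%t.1", "true");
  F.erase("%t.1");
  return F.DebugRecords.front().Location;
}
```

In this model, `debugLocationAfter(true)` ends with a poison record even though every ordinary use was replaced, while `debugLocationAfter(false)` keeps the constant.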

it might make sense to use findDbgUsers and replace the uses if the users fulfill the same conditions specified in the replaceUsesWithIf - but again, I'm not 100% sure without a closer look).

If you could give me a reference example, I will try to see how to fix the issue. : )

@OCHyams
Contributor

OCHyams commented Apr 17, 2025

If you could give me a reference example, I will try to see how to fix the issue. : )

Here we go, I modified the example you provided:

source_filename = "temp.ll"

define i1 @test_and_ule(i4 %x, i4 %y, i4 %z) !dbg !5 {
entry:
  %c.1 = icmp ule i4 %x, %y, !dbg !11
  %c.2 = icmp ule i4 %y, %z, !dbg !12
  %t.1 = icmp ule i4 %x, %z, !dbg !13
  %and = and i1 %c.1, %c.2, !dbg !14
  br i1 %and, label %bb1, label %exit, !dbg !15

bb1:                                              ; preds = %entry
    #dbg_value(i1 %t.1, !9, !DIExpression(), !13)
  %r.1 = xor i1 %t.1, %t.1, !dbg !16
  br label %exit

exit:                                             ; preds = %bb1, %entry
    #dbg_value(i1 %t.1, !17, !DIExpression(), !13)
  ret i1 %t.1, !dbg !18
}

!llvm.dbg.cu = !{!0}
!llvm.debugify = !{!2, !3}
!llvm.module.flags = !{!4}

!0 = distinct !DICompileUnit(language: DW_LANG_C, file: !1, producer: "debugify", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug)
!1 = !DIFile(filename: "temp.ll", directory: "/")
!2 = !{i32 20}
!3 = !{i32 17}
!4 = !{i32 2, !"Debug Info Version", i32 3}
!5 = distinct !DISubprogram(name: "test_and_ule", linkageName: "test_and_ule", scope: null, file: !1, line: 1, type: !6, scopeLine: 1, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !0, retainedNodes: !8)
!6 = !DISubroutineType(types: !7)
!7 = !{}
!8 = !{!9}
!9 = !DILocalVariable(name: "4", scope: !5, file: !1, line: 5, type: !10)
!10 = !DIBasicType(name: "ty8", size: 8, encoding: DW_ATE_unsigned)
!11 = !DILocation(line: 1, column: 1, scope: !5)
!12 = !DILocation(line: 2, column: 1, scope: !5)
!13 = !DILocation(line: 5, column: 1, scope: !5)
!14 = !DILocation(line: 3, column: 1, scope: !5)
!15 = !DILocation(line: 4, column: 1, scope: !5)
!16 = !DILocation(line: 7, column: 1, scope: !5)
!17 = !DILocalVariable(name: "6", scope: !5, file: !1, line: 7, type: !10)
!18 = !DILocation(line: 20, column: 1, scope: !5)

The pass finds that %t.1 is true in bb1, so the dbg_value there can become true. But the second dbg_value using %t.1 shouldn't be changed to true because we don't know the value of %t.1 in exit.
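
The claim about where %t.1 is known can be checked by brute force over all 4-bit inputs. The sketch below is only an illustration of that reasoning: `ule4`, `Findings`, and `checkAllInputs` are hypothetical helper names, and the loop mirrors the comparisons in @test_and_ule rather than anything the pass itself does.

```cpp
#include <cassert>

// Unsigned less-or-equal on i4 values; inputs must fit in 4 bits.
bool ule4(unsigned A, unsigned B) {
  assert(A < 16 && B < 16 && "operands must be valid i4 values");
  return A <= B;
}

struct Findings {
  bool AlwaysTrueInBb1 = true;   // %t.1 holds on every execution of bb1
  bool CanBeFalseAtExit = false; // some execution reaches exit with %t.1 false
};

// Enumerates every (x, y, z) triple of i4 values and records what is known
// about %t.1 in bb1 and in exit.
Findings checkAllInputs() {
  Findings R;
  for (unsigned X = 0; X < 16; ++X)
    for (unsigned Y = 0; Y < 16; ++Y)
      for (unsigned Z = 0; Z < 16; ++Z) {
        bool C1 = ule4(X, Y);      // %c.1
        bool C2 = ule4(Y, Z);      // %c.2
        bool T1 = ule4(X, Z);      // %t.1
        bool TakesBb1 = C1 && C2;  // branch on %and
        if (TakesBb1 && !T1)
          R.AlwaysTrueInBb1 = false;
        // exit is reached on every path, including entry -> exit directly.
        if (!T1)
          R.CanBeFalseAtExit = true;
      }
  return R;
}
```

For example, x = 5, y = 0, z = 3 skips bb1 and reaches exit with %t.1 false, which is why only the record in bb1 can safely take the constant.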

@Apochens
Contributor Author

@OCHyams It seems that your example would not be optimized by ConstraintElimination, which only replaces the condition with the constant when the condition can be speculated. So, I refined it and here is the revised example.

Based on that example, I propose fixing this issue as shown in the following diff. For each instruction to be removed, I collect its debug value records through findDbgUsers. For each record, I then check the DFSNumIn and DFSNumOut of the basic block where the record resides, using the same conditions specified in the replaceUsesWithIf() call. Note that the ContextInst condition is not checked because the record does not have a corresponding context instruction.

diff --git a/llvm/lib/Transforms/Scalar/ConstraintElimination.cpp b/llvm/lib/Transforms/Scalar/ConstraintElimination.cpp
index ad41ea735..d506809f1 100644
--- a/llvm/lib/Transforms/Scalar/ConstraintElimination.cpp
+++ b/llvm/lib/Transforms/Scalar/ConstraintElimination.cpp
@@ -39,6 +39,7 @@
 #include "llvm/Support/MathExtras.h"
 #include "llvm/Transforms/Utils/Cloning.h"
 #include "llvm/Transforms/Utils/ValueMapper.h"
+#include "llvm/IR/DebugInfo.h"
 
 #include <optional>
 #include <string>
@@ -1456,8 +1457,22 @@ static bool checkAndReplaceCondition(
       return ShouldReplace;
     });
     NumCondsRemoved++;
-    if (Cmp->use_empty())
+    if (Cmp->use_empty()) {
+      // Update DebugValueRecords
+      SmallVector<DbgVariableIntrinsic *> DbgUsers;
+      SmallVector<DbgVariableRecord *> DVRUsers;
+      findDbgUsers(DbgUsers, Cmp, &DVRUsers);
+
+      for (auto *DVR: DVRUsers) {
+        auto *DTN = DT.getNode(DVR->getParent());
+        if (!DTN || DTN->getDFSNumIn() < NumIn || DTN->getDFSNumOut() > NumOut)
+          continue;
+
+        DVR->replaceVariableLocationOp(Cmp, ConstantC);
+      }
+
       ToRemove.push_back(Cmp);
+    }
     return Changed;
   };
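
The DFS-number test in the diff above can be illustrated in isolation. The following is a simplified sketch under stated assumptions: `ToyDomTree` is a hypothetical stand-in for a dominator tree, its numbering scheme is a plain pre/post-order counter that may differ in detail from LLVM's, and `shouldReplaceDebugRecord` only mirrors the shape of the `DFSNumIn`/`DFSNumOut` comparison. The example tree matches a CFG with edges entry -> bb1, entry -> exit, and bb1 -> exit, where entry immediately dominates both other blocks.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Toy dominator tree: Children[N] lists the blocks immediately dominated by N.
struct ToyDomTree {
  std::vector<std::vector<int>> Children;
  std::vector<unsigned> DFSNumIn, DFSNumOut;

  explicit ToyDomTree(std::vector<std::vector<int>> C)
      : Children(std::move(C)), DFSNumIn(Children.size()),
        DFSNumOut(Children.size()) {
    unsigned Counter = 0;
    number(0, Counter); // block 0 is the root
  }

  // Assigns an entry and an exit number to each node in a depth-first walk.
  void number(int Node, unsigned &Counter) {
    assert(Node >= 0 && static_cast<std::size_t>(Node) < Children.size());
    DFSNumIn[Node] = Counter++;
    for (int Child : Children[Node])
      number(Child, Counter);
    DFSNumOut[Node] = Counter++;
  }

  // A dominates B exactly when B's interval is nested inside A's interval.
  bool dominates(int A, int B) const {
    return DFSNumIn[A] <= DFSNumIn[B] && DFSNumOut[B] <= DFSNumOut[A];
  }
};

// Same shape as the check in the diff: a record is skipped when its block's
// interval falls outside [NumIn, NumOut], the region where the fact holds.
bool shouldReplaceDebugRecord(const ToyDomTree &DT, int RecordBlock,
                              unsigned NumIn, unsigned NumOut) {
  return !(DT.DFSNumIn[RecordBlock] < NumIn ||
           DT.DFSNumOut[RecordBlock] > NumOut);
}

constexpr int Entry = 0, Bb1 = 1, Exit = 2;

ToyDomTree makeExampleTree() {
  std::vector<std::vector<int>> Children(3);
  Children[Entry].push_back(Bb1);
  Children[Entry].push_back(Exit);
  return ToyDomTree(Children);
}
```

With this numbering, entry gets the interval [0, 5], bb1 gets [1, 2], and exit gets [3, 4]. A fact established in bb1 covers only records whose block interval lies within [1, 2], so a record in exit is left alone.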

With the patched ConstraintElimination, the poison value in the debug value record becomes true as expected. Is this the correct way to fix it?

bb1:                                              ; preds = %entry
    #dbg_value(i1 true, !13, !DIExpression(), !20)
  %r.1 = xor i1 true, true, !dbg !21
    #dbg_value(i1 %r.1, !14, !DIExpression(), !21)
  br label %exit, !dbg !22

@OCHyams
Contributor

OCHyams commented Apr 22, 2025

Sorry about my test case - I've modified it so that %t.1 is no longer used by the ret, which now demonstrates my point: https://godbolt.org/z/vEschh5rW

Both dbg_values become poison at the moment. With your fix the first dbg_value should be i1 true, but the second should still be poison.

source_filename = "temp.ll"

define i1 @test_and_ule(i4 %x, i4 %y, i4 %z) !dbg !5 {
entry:
  %c.1 = icmp ule i4 %x, %y, !dbg !11
  %c.2 = icmp ule i4 %y, %z, !dbg !12
  %t.1 = icmp ule i4 %x, %z, !dbg !13
  %and = and i1 %c.1, %c.2, !dbg !14
  br i1 %and, label %bb1, label %exit, !dbg !15

bb1:                                              ; preds = %entry
    #dbg_value(i1 %t.1, !9, !DIExpression(), !13)
  %r.1 = xor i1 %t.1, %t.1, !dbg !16
  br label %exit

exit:                                             ; preds = %bb1, %entry
    #dbg_value(i1 %t.1, !17, !DIExpression(), !13)
  ret i1 true, !dbg !18
}

!llvm.dbg.cu = !{!0}
!llvm.debugify = !{!2, !3}
!llvm.module.flags = !{!4}

!0 = distinct !DICompileUnit(language: DW_LANG_C, file: !1, producer: "debugify", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug)
!1 = !DIFile(filename: "temp.ll", directory: "/")
!2 = !{i32 20}
!3 = !{i32 17}
!4 = !{i32 2, !"Debug Info Version", i32 3}
!5 = distinct !DISubprogram(name: "test_and_ule", linkageName: "test_and_ule", scope: null, file: !1, line: 1, type: !6, scopeLine: 1, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !0, retainedNodes: !8)
!6 = !DISubroutineType(types: !7)
!7 = !{}
!8 = !{!9}
!9 = !DILocalVariable(name: "4", scope: !5, file: !1, line: 5, type: !10)
!10 = !DIBasicType(name: "ty8", size: 8, encoding: DW_ATE_unsigned)
!11 = !DILocation(line: 1, column: 1, scope: !5)
!12 = !DILocation(line: 2, column: 1, scope: !5)
!13 = !DILocation(line: 5, column: 1, scope: !5)
!14 = !DILocation(line: 3, column: 1, scope: !5)
!15 = !DILocation(line: 4, column: 1, scope: !5)
!16 = !DILocation(line: 7, column: 1, scope: !5)
!17 = !DILocalVariable(name: "6", scope: !5, file: !1, line: 7, type: !10)
!18 = !DILocation(line: 20, column: 1, scope: !5)

I think the code looks about right. Does this case work as expected?

@Apochens
Contributor Author

@OCHyams Thanks for the correction! I have tried optimizing the modified example with the patched ConstraintElimination, and the result is shown below. As expected, the first dbg_value becomes i1 true, while the second one is poison.

define i1 @test_and_ule(i4 %x, i4 %y, i4 %z) !dbg !5 {
entry:
  %c.1 = icmp ule i4 %x, %y, !dbg !11
  %c.2 = icmp ule i4 %y, %z, !dbg !12
  %and = and i1 %c.1, %c.2, !dbg !13
  br i1 %and, label %bb1, label %exit, !dbg !14

bb1:                                              ; preds = %entry
    #dbg_value(i1 true, !9, !DIExpression(), !15)
  %r.1 = xor i1 true, true, !dbg !16
  br label %exit

exit:                                             ; preds = %bb1, %entry
    #dbg_value(i1 poison, !17, !DIExpression(), !15)
  ret i1 true, !dbg !18
}

!llvm.dbg.cu = !{!0}
!llvm.debugify = !{!2, !3}
!llvm.module.flags = !{!4}

!0 = distinct !DICompileUnit(language: DW_LANG_C, file: !1, producer: "debugify", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug)
!1 = !DIFile(filename: "temp.ll", directory: "/")
!2 = !{i32 20}
!3 = !{i32 17}
!4 = !{i32 2, !"Debug Info Version", i32 3}
!5 = distinct !DISubprogram(name: "test_and_ule", linkageName: "test_and_ule", scope: null, file: !1, line: 1, type: !6, scopeLine: 1, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !0, retainedNodes: !8)
!6 = !DISubroutineType(types: !7)
!7 = !{}
!8 = !{!9}
!9 = !DILocalVariable(name: "4", scope: !5, file: !1, line: 5, type: !10)
!10 = !DIBasicType(name: "ty8", size: 8, encoding: DW_ATE_unsigned)
!11 = !DILocation(line: 1, column: 1, scope: !5)
!12 = !DILocation(line: 2, column: 1, scope: !5)
!13 = !DILocation(line: 3, column: 1, scope: !5)
!14 = !DILocation(line: 4, column: 1, scope: !5)
!15 = !DILocation(line: 5, column: 1, scope: !5)
!16 = !DILocation(line: 7, column: 1, scope: !5)
!17 = !DILocalVariable(name: "6", scope: !5, file: !1, line: 7, type: !10)
!18 = !DILocation(line: 20, column: 1, scope: !5)

I think I can open a PR for this issue, but I'm not sure how to construct the regression test. Should I use this example along with checks as the regression test?

@OCHyams
Contributor

OCHyams commented Apr 23, 2025

I think I can open a PR for this issue, but I'm not sure how to construct the regression test. Should I use this example along with checks as the regression test?

Great, yes that would do - I'm happy to help refine it in the pull request.

Apochens added a commit to Apochens/llvm-project that referenced this issue Apr 23, 2025
Apochens added a commit that referenced this issue May 5, 2025
IanWood1 pushed a commit to IanWood1/llvm-project that referenced this issue May 6, 2025
IanWood1 pushed a commit to IanWood1/llvm-project that referenced this issue May 6, 2025
IanWood1 pushed a commit to IanWood1/llvm-project that referenced this issue May 6, 2025
GeorgeARM pushed a commit to GeorgeARM/llvm-project that referenced this issue May 7, 2025