16.Debugging
Programming
15. Debugging and Testing
Federico Busato
2024-11-05
Table of Contents
1 Debugging Overview
2 Assertions
3 Execution Debugging
Breakpoints
Watchpoints / Catchpoints
Control Flow
Stack and Info
Print
Disassemble
std::breakpoint
4 Memory Debugging
valgrind
5 Hardening Techniques
Stack Usage
Standard Library Checks
Undefined Behavior Protections
Control Flow Protections
6 Sanitizers
Address Sanitizer
Leak Sanitizer
Memory Sanitizers
Undefined Behavior Sanitizer
Sampling-Based Sanitizer
7 Debugging Summary
8 Compiler Warnings
9 Static Analysis
10 Code Testing
Unit Testing
Test-Driven Development (TDD)
Code Coverage
Fuzz Testing
11 Code Quality
clang-tidy
Feature Complete
Debugging Overview
Is this a bug?
Cost of Software Defects 1/2
Cost of Software Defects 2/2
Some examples:
• The Millennium Bug (2000): $100 billion
• The Morris Worm (1988): $10 million (single student)
• Ariane 5 (1996): $370 million
• Knight’s unintended trades (2012): $440 million
• Bitcoin exchange error (2011): $1.5 million
• Pentium FDIV Bug (1994): $475 million
• Boeing 737 MAX (2019): $3.9 million
see also:
11 of the most costly software errors in history
Historical Software Accidents and Errors
List of software bugs
Types of Software Defects
• C++ is a very error-prone language, see 60 terrible tips for a C++
developer
• Human behavior, e.g. copying and pasting code is a very common practice and can
introduce subtle bugs → check the code carefully and make sure its behavior is
fully understood
Program Errors
Static Analysis A proactive strategy that examines the source code for (potential)
errors.
Techniques: warnings, static analysis tools, compile-time checks
Limitations: Turing’s undecidability theorem, exponential code paths
Assertions
Unrecoverable Errors and Assertions
Assertion
#include <cassert>     // assert
#include <cmath>       // std::isfinite
#include <type_traits> // std::is_arithmetic_v
template<typename T>
T sqrt(T value) {
    static_assert(std::is_arithmetic_v<T>,            // precondition
                  "T must be an arithmetic type");
    assert(std::isfinite(value) && value >= 0);       // precondition
    T ret = ... // sqrt computation
    assert(std::isfinite(ret) && ret >= 0 &&          // postcondition
           (value >= 1 ? ret <= value : ret >= value));
    return ret;
}
Assertion
Assertions may slow down the execution. They can be disabled by defining the NDEBUG
macro before including <cassert>
#define NDEBUG // or with the flag "-DNDEBUG"
Additionally, MSVC defines the _DEBUG macro when the /MTd or /MDd flags are
provided to select the debug version of the C run-time library
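One consequence of disabling assertions is worth a short sketch: when NDEBUG is defined, the whole assert expression is removed, including any side effect written inside it. The function name pop_checked below is hypothetical and only illustrates keeping the check and the side effect apart.

```cpp
#include <cassert>
#include <vector>

// Removes and returns the last element of a non-empty vector.
inline int pop_checked(std::vector<int>& values) {
    // WRONG: assert(!values.empty() && (values.pop_back(), true));
    //        the pop_back() would vanish when NDEBUG is defined
    assert(!values.empty());   // check only, safe to compile out
    int last = values.back();
    values.pop_back();         // side effect stays outside the assertion
    return last;
}
```

With this layout the function behaves the same for valid inputs whether or not -DNDEBUG is passed.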
Assertion Enhancements 1/2
The Boost.Assert library provides the BOOST_ASSERT(expr) macro. When
BOOST_ENABLE_ASSERT_HANDLER is defined, a failed assertion calls the following
function, which the user implements and customizes
void boost::assertion_failed(
    const char* expr,     // failed expression
    const char* function, // function name of the failed assertion
    const char* file,     // file name of the failed assertion
    long line);           // line number of the failed assertion
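The handler mechanism can be sketched without Boost. The following is a minimal illustration, not Boost's implementation: the namespace my, the macro MY_ASSERT, the Failure record, and checked_index are all hypothetical names, and only the four-parameter handler signature is taken from the declaration above.

```cpp
#include <cstdio>
#include <string>

namespace my {   // hypothetical namespace, not part of Boost

// Record of the most recent failed assertion (illustrative only).
struct Failure {
    std::string expr, function, file;
    long        line;
};

inline Failure& last_failure() {
    static Failure failure{};
    return failure;
}

// Same parameter list as boost::assertion_failed.
inline void assertion_failed(const char* expr, const char* function,
                             const char* file, long line) {
    last_failure() = Failure{expr, function, file, line};
    std::fprintf(stderr, "%s:%ld: %s: assertion '%s' failed\n",
                 file, line, function, expr);
    // A production handler would typically call std::abort() here.
}

} // namespace my

// Hypothetical macro playing the role of BOOST_ASSERT(expr).
#define MY_ASSERT(expr)                                                   \
    ((expr) ? (void) 0                                                    \
            : my::assertion_failed(#expr, __func__, __FILE__, __LINE__))

inline int checked_index(int index, int size) {
    MY_ASSERT(index >= 0 && index < size);
    return index;
}
```

Calling checked_index(7, 3) reports the failed expression together with its function, file, and line; this sketch records the failure instead of aborting so that the handler can be inspected.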
Assertion Enhancements 2/2
0# bar(int) at /path/to/source/file.cpp:70
1# bar(int) at /path/to/source/file.cpp:70
2# bar(int) at /path/to/source/file.cpp:70
3# bar(int) at /path/to/source/file.cpp:70
4# main at /path/to/main.cpp:93
5# __libc_start_main in /lib/x86_64-linux-gnu/libc.so.6
6# _start
Execution Debugging
Execution Debugging (gdb) 1/2
-O0 Disable any code optimization to help the debugger. It is the default for most
compilers
-g Enable debugging
- stores the symbol table information in the executable (mapping between assembly
and source code lines)
- for some compilers, it may disable certain optimizations
- slows down the compilation phase and the execution
-g3 Produces enhanced debugging information, e.g. macro definitions. Available for
most compilers. Suggested instead of -g
Execution Debugging (gdb) 2/2
Additional flags:
-ggdb3 Generate specific debugging information for gdb.
Equivalent to -g3 with gcc
gdb - Watchpoints / Catchpoints
gdb - Control Flow
gdb - Disassemble
Command            Description
x/nfu <address>    examine address
                   n number of elements,
                   f format (d: int, f: float, etc.),
                   u data size (b: byte, w: word, etc.)
std::breakpoint
C++26 provides the <debugging> library, which allows interaction with a debugger
directly from the source code, without relying on platform-specific intrinsic instructions
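As a hedged sketch of how the <debugging> facilities might be used, the code below guards every call with the __cpp_lib_debugging feature-test macro, since many standard libraries do not ship the header yet. The helper names debugger_attached and checked_percentage are hypothetical; std::is_debugger_present and std::breakpoint_if_debugging are the C++26 library functions.

```cpp
#if __has_include(<version>)
#  include <version>      // feature-test macros
#endif
#if defined(__cpp_lib_debugging)
#  include <debugging>    // C++26
#endif

// Returns true when a debugger is detected; false otherwise, or when the
// standard library does not provide <debugging>.
inline bool debugger_attached() {
#if defined(__cpp_lib_debugging)
    return std::is_debugger_present();
#else
    return false;
#endif
}

// Hypothetical helper: stops in the debugger when the value is out of range.
inline int checked_percentage(int value) {
    if (value < 0 || value > 100) {
#if defined(__cpp_lib_debugging)
        std::breakpoint_if_debugging();  // only breaks when a debugger is attached
#endif
    }
    return value;
}
```

Outside a debugging session the function simply returns its argument, so the check can stay in the code during normal runs.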
gdb - Notes
Terms like buffer overflow, race condition, page fault, null pointer, stack exhaustion,
heap exhaustion/corruption, use-after-free, or double free – all describe memory
safety vulnerabilities
Mitigation:
• Run-time check
• Static analysis
• Avoid unsafe language constructs
valgrind 1/9
$ wget ftp://sourceware.org/pub/valgrind/valgrind-3.21.tar.bz2
$ tar xf valgrind-3.21.tar.bz2
$ cd valgrind-3.21
$ ./configure --enable-lto
$ make -j 12
$ sudo make install
$ sudo apt install libc6-dbg # if needed
Some Linux distributions provide the package through apt install valgrind, but it could be an old version
valgrind 2/9
Basic usage:
• compile with -g
Output example 1:
==60127== Invalid read of size 4 !!out-of-bound access
==60127== at 0x100000D9E: f(int) (main.cpp:86)
==60127== by 0x100000C22: main (main.cpp:40)
==60127== Address 0x10042c148 is 0 bytes after a block of size 40 alloc'd
==60127== at 0x1000161EF: malloc (vg_replace_malloc.c:236)
==60127== by 0x100000C88: f(int) (main.cpp:75)
==60127== by 0x100000C22: main (main.cpp:40)
valgrind 3/9
Output example 2:
!!memory leak
==19182== 40 bytes in 1 blocks are definitely lost in loss record 1 of 1
==19182== at 0x1B8FF5CD: malloc (vg_replace_malloc.c:130)
==19182== by 0x8048385: f (main.cpp:5)
==19182== by 0x80483AB: main (main.cpp:11)
• Definitely lost
• Indirectly lost
• Still reachable
• Possibly lost
When a program terminates, the operating system reclaims all of its heap memory.
Despite this, leaving memory leaks is considered bad practice: a leak that looks
harmless for a single iteration of a functionality accumulates when the same
functionality is executed many times within one process.
A robust program prevents any memory leak even when abnormal conditions occur
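One common way to stay leak-free under abnormal conditions is to let an owning object release the memory. The sketch below assumes a hypothetical function sum_first that may throw after allocating; std::unique_ptr frees the buffer on both the normal and the exceptional path.

```cpp
#include <cstddef>
#include <memory>
#include <stdexcept>

// Hypothetical workload: allocates a buffer, then may fail on invalid input.
inline int sum_first(std::size_t count, std::size_t capacity) {
    auto buffer = std::make_unique<int[]>(capacity);        // owned heap array
    if (count > capacity)
        throw std::out_of_range("count exceeds capacity");  // buffer is still freed
    int sum = 0;
    for (std::size_t i = 0; i < count; ++i) {
        buffer[i] = static_cast<int>(i);
        sum += buffer[i];
    }
    return sum;                                             // buffer is freed here too
}
```

With a raw new[] the early throw would skip the matching delete[], and every failing call would leak capacity * sizeof(int) bytes.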
valgrind 5/9
Definitely lost indicates blocks that are not deleted at the end of the program (return
from the main() function). The common case is local variables pointing to newly
allocated heap memory
void f() {
int* y = new int[3]; // 12 bytes definitely lost
}
int main() {
int* x = new int[10]; // 40 bytes definitely lost
f();
}
valgrind 6/9
Indirectly lost indicates blocks that are only pointed to by other heap blocks that are
themselves lost. The common case is a heap-allocated object whose member points to
newly allocated heap memory
struct A {
int* array;
};
int main() {
A* x = new A; // 8 bytes definitely lost
x->array = new int[4]; // 16 bytes indirectly lost
}
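A possible leak-free counterpart of the example above replaces both raw allocations with owning types. This is only a sketch: OwningA and build_and_release are hypothetical names introduced for illustration.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

struct OwningA {              // hypothetical counterpart of struct A
    std::vector<int> array;   // releases its storage in its destructor
};

inline std::size_t build_and_release() {
    auto x = std::make_unique<OwningA>();   // replaces "new A"
    x->array.resize(4);                     // replaces "new int[4]"
    return x->array.size();
}   // x goes out of scope: the vector storage and the OwningA object are released
```

Because the outer object owns the inner storage, neither a "definitely lost" nor an "indirectly lost" block should remain at exit.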
valgrind 7/9
Still reachable indicates blocks that are not deleted but they are still reachable at the
end of the program
int* array;
int main() {
array = new int[3];
}
// 12 bytes still reachable (global static class could delete it)
#include <cstdlib>
int main() {
    int* array = new int[3];
    std::abort(); // early abnormal termination
                  // 12 bytes still reachable
    ... // maybe it is deleted here
}
valgrind 8/9
Possibly lost indicates blocks that are still reachable but pointer arithmetic makes the
deletion more complex, or even not possible
#include <cstdlib>
int main() {
    int* array = new int[3];
    array++;      // pointer arithmetic
    std::abort(); // early abnormal termination
                  // 12 bytes possibly lost
    ... // maybe it is deleted here but you should be able
        // to revert pointer arithmetic
}
valgrind 9/9
Advanced flags:
• --leak-check=full print details for each “definitely lost” or “possibly lost”
block, including where it was allocated
• --show-leak-kinds=all to combine with --leak-check=full. Print all leak kinds
• --track-fds=yes list open file descriptors on exit (not closed)
Compile-time Stack Usage
• -fstack-usage Makes the compiler output stack usage information for the
program, on a per-function basis
Use compiler flags for stack protection in GCC and Clang
Compile-time Stack Protection
Run-time Stack Usage
libc Buffer Overflow Checks 1/2
_FORTIFY_SOURCE define: the compiler provides buffer overflow checks for the
following functions:
memcpy, mempcpy, memmove, memset, strcpy, stpcpy, strncpy, strcat,
strncat, sprintf, vsprintf, snprintf, vsnprintf, gets.
Recent compilers (e.g. GCC 12+, Clang 9+) can detect buffer overflows with
enhanced coverage, e.g. dynamic pointers, with _FORTIFY_SOURCE=3
Standard Library Preconditions
The standard library provides run-time precondition checks for library calls, such as
bounds checks for strings and containers, null-pointer checks, etc.
-D_GLIBCXX_ASSERTIONS for libstdc++ (GCC)
-D_LIBCPP_ASSERT, _LIBCPP_HARDENING_MODE_EXTENSIVE for libc++ (LLVM)
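To make the difference concrete: an out-of-range operator[] on a container is undefined behavior unless a hardening macro such as _GLIBCXX_ASSERTIONS turns it into a checked failure, whereas at() is always bounds-checked and throws. The helper at_rejects below is a hypothetical name used only to show the always-checked path.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Returns true if reading `index` through at() was rejected.
inline bool at_rejects(const std::vector<int>& values, std::size_t index) {
    try {
        (void) values.at(index);   // bounds-checked regardless of compiler flags
        return false;
    } catch (const std::out_of_range&) {
        return true;
    }
}
```

The hardening flags are aimed at the values[index] form, which compiles to an unchecked access by default.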
47/82
Undefined Behavior Protections 1/2
• -fwrapv Signed integer overflow gets the same semantics as unsigned integer
overflow, with a well-defined wrap-around behavior
• -fno-strict-aliasing Strict aliasing lets the compiler assume that pointers to
different types never refer to the same memory location; accessing an object
through a pointer of an incompatible type is undefined behavior. The flag disables
this assumption
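Both flags relax rules that portable code can also respect directly. The sketch below shows two alternatives under stated assumptions: float_bits assumes a 32-bit IEEE 754 float, and both function names are hypothetical.

```cpp
#include <cstdint>
#include <cstring>
#include <limits>

// Reads the bit pattern of a float without violating strict aliasing.
inline std::uint32_t float_bits(float value) {
    static_assert(sizeof(std::uint32_t) == sizeof(float), "unexpected float size");
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof(bits));  // well-defined, unlike *(uint32_t*)&value
    return bits;
}

// Detects signed overflow of a + b before it happens, so no wrap-around occurs.
inline bool add_would_overflow(int a, int b) {
    if (b > 0)
        return a > std::numeric_limits<int>::max() - b;
    return a < std::numeric_limits<int>::min() - b;
}
```

Compilers typically lower the small memcpy to a single register move, so the well-defined version is not expected to cost more than the cast.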
Undefined Behavior Protections 2/2
Control Flow Protections
Other Run-time Checks
Sanitizers
Address Sanitizer
Sanitizers are used during development and testing to discover and diagnose memory
misuse bugs and potentially dangerous undefined behavior
Sanitizers are implemented in Clang (from 3.1), GCC (from 4.8), and Xcode
Projects using Sanitizers:
• Chromium
• Firefox
• Linux kernel
• Android
Memory error checking in C and C++: Comparing Sanitizers and Valgrind
Address Sanitizer
• github.com/google/sanitizers/wiki/AddressSanitizer
• gcc.gnu.org/onlinedocs/gcc/Instrumentation-Options.html
Leak Sanitizer
• github.com/google/sanitizers/wiki/AddressSanitizerLeakSanitizer
• gcc.gnu.org/onlinedocs/gcc/Instrumentation-Options.html
Memory Sanitizers
-fsanitize-memory-track-origins=2
track origins of uninitialized values
• github.com/google/sanitizers/wiki/MemorySanitizer
• gcc.gnu.org/onlinedocs/gcc/Instrumentation-Options.html
Undefined Behavior Sanitizer
gcc.gnu.org/onlinedocs/gcc/Instrumentation-Options.html
Undefined Behavior Sanitizer
-fsanitize=<options> :
undefined All of the checks other than float-divide-by-zero,
unsigned-integer-overflow, implicit-conversion,
local-bounds and the nullability-* group of checks
local-bounds Out of bounds array indexing, in cases where the array bound can be
statically determined
Sanitizers vs. Valgrind
Valgrind - A neglected tool from the shadows or a serious debugging tool?
Debugging Summary
How to Debug Common Errors
Segmentation fault
• gdb, valgrind, sanitizers
• Segmentation fault right after entering a function → stack overflow
Infinite execution
• gdb + (CTRL + C)
Incorrect results
• valgrind + assertion + gdb + sanitizers
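For the stack overflow case listed under "Segmentation fault", a frequent cause is a very large local array, which can exhaust the stack as soon as the function is entered. The sketch below moves the buffer to the heap; fill_and_sum is a hypothetical function, and the actual stack limit depends on the platform and its configuration.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical function that needs a large scratch buffer.
inline std::uint64_t fill_and_sum(std::size_t count) {
    // std::uint64_t scratch[2000000];  // ~16 MB local array: may crash on function entry
    std::vector<std::uint64_t> scratch(count);   // storage lives on the heap
    std::uint64_t sum = 0;
    for (std::size_t i = 0; i < count; ++i) {
        scratch[i] = i;
        sum += scratch[i];
    }
    return sum;
}
```

Deep or unbounded recursion produces the same symptom and shows up in gdb as a very long backtrace of the same function.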
Compiler Warnings
Compiler Warnings - GCC and Clang
-Wextra Enables some extra warning flags that are not enabled by -Wall (∼15 warnings)
Static analysis is the process of source code examination to find potential issues
Benefits of static code analysis:
Static Analyzers - Clang and GCC
void test() {
int i, a[10];
int x = a[i]; // warning: array subscript is undefined
}
scan-build make
The MSVC Static Analyzer enables code analysis and control options
(e.g. double-free, use-after-free, stdio related, etc.) by adding the
/analyze flag
cppcheck --enable=warning,performance,style,portability,information,error
<src_file/directory>
cmake -DCMAKE_EXPORT_COMPILE_COMMANDS=ON .
cppcheck --enable=<enable_flags> --project=compile_commands.json
Popular Static Analyzers - PVS-Studio, SonarLint
Customers: IBM, Intel, Adobe, Microsoft, Nvidia, Bosch, IdGames, EpicGames, etc.
SonarLint plugin is available for Visual Studio, Visual Studio Code, Eclipse, and IntelliJ IDEA
Other Static Analyzers - FBInfer, DeepCode
Available for Visual Studio Code, Sublime, IntelliJ IDEA, and Atom
see also: A curated list of static analysis tool
Code Testing
Code Testing
see Case Study 4: The $440 Million Software Error at Knight Capital
from: Kat Maddox (on Twitter)
Code Testing
Unit Test A unit is the smallest piece of code that can be logically isolated in a
system. Unit test refers to the verification of a unit. It assumes
full knowledge of the code under test (white-box testing)
Goals: meet specifications/requirements, fast development/debugging
Functional Test Output validation instead of the internal structure (black-box testing)
Goals: performance, regression (same functionalities of previous
version), stability, security (e.g. sanitizers), composability (e.g.
integration test)
Unit Testing 1/3
Unit testing involves breaking your program into pieces, and subjecting each piece to
a series of tests
Unit testing should observe the following key features:
• Isolation: Each unit test should be independent and avoid external interference
from other parts of the code
• Automation: No user interaction, easy to run and manage
• Small Scope: Unit tests focus on small portions of code or specific
functionalities, making it easier to identify bugs
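The three properties above can be illustrated with a framework-free sketch. The unit under test, clamp_to_range, and the test function names are hypothetical; a real project would more likely use a unit testing framework.

```cpp
#include <cassert>

// Unit under test (hypothetical): clamps value into [low, high].
inline int clamp_to_range(int value, int low, int high) {
    if (value < low)  return low;
    if (value > high) return high;
    return value;
}

// Isolation and small scope: one independent test per behavior, no shared state.
inline void test_value_inside_range() { assert(clamp_to_range(5, 0, 10) == 5); }
inline void test_value_below_range()  { assert(clamp_to_range(-3, 0, 10) == 0); }
inline void test_value_above_range()  { assert(clamp_to_range(42, 0, 10) == 10); }

// Automation: a single entry point that runs everything without user input.
inline void run_all_tests() {
    test_value_inside_range();
    test_value_below_range();
    test_value_above_range();
}
```

Since each test covers a single behavior, a failing assertion points directly at the branch of clamp_to_range that is wrong.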
Unit Testing 3/3
JetBrains C++ Developer Ecosystem 2022
Test-Driven Development (TDD)
Test-Driven Development (TDD) - Main advantages
• Understandable behavior. New users can learn how the system works and its
properties from the tests
• Increase confidence. Developers are more confident that their code will work as
intended because it has been extensively tested
• github.com/catchorg/Catch2
• The Little Things: Testing with Catch2
catch 2/2
Code coverage is a measure used to describe the degree to which the source code of
a program is executed when a particular execution/test suite runs
gcov and llvm-profdata/llvm-cov are tools used in conjunction with compiler
instrumentation (gcc, clang) to interpret and visualize the raw code coverage
generated during the execution
gcovr and lcov are utilities for managing gcov/llvm-cov at higher level and
generating code coverage results
program.cpp:
#include <iostream>
#include <string>
Coverage-Guided Fuzz Testing
A fuzzer is a specialized tool that tracks which areas of the code are reached, and
generates mutations on the corpus of input data in order to maximize the code
coverage
LibFuzzer is the library provided by LLVM and feeds fuzzed inputs to the library via
a specific fuzzing entrypoint
The fuzz target function accepts an array of bytes and does something interesting with these
bytes using the API under test:
#include <stddef.h> // size_t
#include <stdint.h> // uint8_t
extern "C" int LLVMFuzzerTestOneInput(const uint8_t* Data,
                                      size_t Size) {
    DoSomethingInterestingWithMyAPI(Data, Size); // placeholder for the API under test
    return 0;
}
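A concrete fuzz target might look like the following sketch. The API under test, sum_payload, is hypothetical: it treats the first byte as a payload length and must reject inputs that are shorter than declared. Only the LLVMFuzzerTestOneInput entry point comes from libFuzzer.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical API under test: the first byte declares the payload length.
// Returns the sum of the payload bytes, or -1 for malformed input.
inline int sum_payload(const std::uint8_t* data, std::size_t size) {
    if (size == 0)
        return -1;
    std::size_t length = data[0];
    if (length > size - 1)
        return -1;                 // bounds check that the fuzzer exercises
    int sum = 0;
    for (std::size_t i = 0; i < length; ++i)
        sum += data[1 + i];
    return sum;
}

extern "C" int LLVMFuzzerTestOneInput(const std::uint8_t* data, std::size_t size) {
    sum_payload(data, size);       // must not crash for any input
    return 0;
}
```

With Clang such a target is typically built with -fsanitize=fuzzer,address, so that an out-of-bounds read caused by a missing length check is reported as soon as a mutated input triggers it.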
Code Quality
Linters - clang-tidy 1/2
lint: The term was derived from the name of the undesirable bits of fiber
clang-tidy provides an extensible framework for diagnosing and fixing typical
programming errors, like style violations, interface misuse, or bugs that can be deduced
via static analysis
$ cmake -DCMAKE_EXPORT_COMPILE_COMMANDS=ON .
$ clang-tidy -p .
clang-tidy searches for the .clang-tidy configuration file located in the closest
parent directory of the input file
• Fuchsia
• Google
• Performance
• Readability