# Linux Crash Dumping

Official builds of Chrome support crash dumping and reporting using the Google
crash servers. This is a guide to how this works.

[TOC]

## Breakpad

Breakpad is an open source library which we use for crash reporting across all
three platforms (Linux, Mac and Windows). For Linux, a substantial amount of
work was required to support cross-process dumping. At the time of writing, this
code is forked from the upstream Breakpad repo. While this situation remains,
the forked code lives in `third_party/breakpad/linux`. The upstream repo is
mirrored in `third_party/breakpad/breakpad`.

The code currently supports i386 only. Getting x86-64 to work should only be a
minor amount of work.

### Minidumps

Breakpad deals in a file format called 'minidumps'. This is a Microsoft format
and thus is defined by in-memory structures which are dumped, raw, to disk. The
main header file for this file format is
`third_party/breakpad/breakpad/src/google_breakpad/common/minidump_format.h`.

At the top level, the minidump file format is a list of key-value pairs. Many of
the keys are defined by the minidump format and contain cross-platform
representations of stacks, threads, etc. For Linux we also define a number of
custom keys containing `/proc/cpuinfo`, `lsb-release`, etc. These are defined in
`third_party/breakpad/breakpad/linux/minidump_format_linux.h`.

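As a rough illustration of the key-value layout described above: the file starts
with a header that points at a directory, and each directory entry pairs a
stream type (the key) with the offset and size of its data (the value). The
sketch below uses simplified stand-in structures. `SketchHeader`,
`SketchDirectoryEntry` and `FindStream` are hypothetical names, not Breakpad's,
and the authoritative definitions are the ones in `minidump_format.h`.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical, simplified mirrors of the on-disk layout. Consult
// minidump_format.h for the real structure definitions.
struct SketchHeader {
  uint32_t signature;             // "MDMP" read as a little-endian integer
  uint32_t version;
  uint32_t stream_count;          // number of directory entries
  uint32_t stream_directory_rva;  // file offset of the first entry
  uint32_t checksum;
  uint32_t time_date_stamp;
  uint64_t flags;
};

struct SketchDirectoryEntry {
  uint32_t stream_type;  // the "key"
  uint32_t data_size;    // the "value" is data_size bytes ...
  uint32_t rva;          // ... starting at this file offset
};

constexpr uint32_t kSketchSignature = 0x504d444d;  // 'M', 'D', 'M', 'P'

// Looks up a stream by type. Returns true and fills |out| when found.
bool FindStream(const std::vector<uint8_t>& file, uint32_t wanted_type,
                SketchDirectoryEntry* out) {
  SketchHeader header;
  if (file.size() < sizeof(header)) return false;
  std::memcpy(&header, file.data(), sizeof(header));
  if (header.signature != kSketchSignature) return false;

  for (uint32_t i = 0; i < header.stream_count; ++i) {
    size_t offset = static_cast<size_t>(header.stream_directory_rva) +
                    static_cast<size_t>(i) * sizeof(SketchDirectoryEntry);
    SketchDirectoryEntry entry;
    if (offset + sizeof(entry) > file.size()) return false;
    std::memcpy(&entry, file.data() + offset, sizeof(entry));
    if (entry.stream_type != wanted_type) continue;
    // Reject an entry whose payload would run past the end of the file.
    if (static_cast<size_t>(entry.rva) + entry.data_size > file.size())
      return false;
    *out = entry;
    return true;
  }
  return false;
}
```

Copying each structure out with `memcpy` rather than casting the buffer pointer
avoids alignment problems when the file contents sit at an arbitrary address.
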
### Catching exceptions

Exceptional conditions (such as invalid memory references, floating point
exceptions, etc.) are signaled by synchronous signals to the thread which caused
them. Synchronous signals are always delivered to the thread which triggered
them, as opposed to asynchronous signals, which can be handled by any thread in
a thread group that hasn't masked that signal.

All the signals that we wish to catch are synchronous except SIGABRT, and we can
always arrange to send SIGABRT to a specific thread. Thus, we find the crashing
thread by looking at the current thread in the signal handler.

The signal handlers run on a pre-allocated stack in case the crash was triggered
by a stack overflow.

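A minimal sketch of running a handler on a pre-allocated stack, assuming plain
POSIX `sigaltstack` and `sigaction` with `SA_ONSTACK`. The function names and
the 64 KiB stack size are choices made for this illustration and are not taken
from Breakpad.

```cpp
#include <signal.h>

#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Size picked arbitrarily for the sketch.
constexpr size_t kAltStackSize = 64 * 1024;
void* g_alt_stack = nullptr;
volatile sig_atomic_t g_ran_on_alt_stack = 0;

void Handler(int /*sig*/, siginfo_t* /*info*/, void* /*ucontext*/) {
  // A local variable lives on whichever stack the handler is running on, so
  // its address tells us whether the alternate stack is in use.
  char marker = 0;
  uintptr_t here = reinterpret_cast<uintptr_t>(&marker);
  uintptr_t base = reinterpret_cast<uintptr_t>(g_alt_stack);
  g_ran_on_alt_stack = (here >= base && here < base + kAltStackSize) ? 1 : 0;
}

bool InstallHandlerOnAlternateStack(int sig) {
  // The stack is allocated up front, long before any crash, never in the
  // handler itself.
  g_alt_stack = std::malloc(kAltStackSize);
  if (g_alt_stack == nullptr) return false;

  stack_t ss = {};
  ss.ss_sp = g_alt_stack;
  ss.ss_size = kAltStackSize;
  ss.ss_flags = 0;
  if (sigaltstack(&ss, nullptr) != 0) return false;

  struct sigaction sa = {};
  sa.sa_sigaction = Handler;
  sa.sa_flags = SA_SIGINFO | SA_ONSTACK;  // SA_ONSTACK selects the alt stack
  sigemptyset(&sa.sa_mask);
  return sigaction(sig, &sa, nullptr) == 0;
}

bool HandlerRanOnAlternateStack() { return g_ran_on_alt_stack == 1; }
```

Calling `InstallHandlerOnAlternateStack(SIGUSR1)` followed by `raise(SIGUSR1)`
exercises the handler without needing a real fault. Note that the alternate
stack is a per-thread setting, so each thread that should be covered needs its
own `sigaltstack` call.
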
Once we have started handling the signal, we have to assume that the address
space is compromised. In order not to fall prey to this and crash (again) in the
crash handler, we observe some rules:

1. We don't enter the dynamic linker. This, observably, can trigger crashes in
    the crash handler. Unfortunately, entering the dynamic linker is very easy
    and can be triggered by calling a function from a shared library whose
    resolution hasn't been cached yet. Since we can't know which functions have
    been cached we avoid calling any of these functions with one exception:
    `memcpy`. Since the compiler can emit calls to `memcpy` we can't really
    avoid it.
1. We don't allocate memory via malloc as the heap may be corrupt. Instead we
    use a custom allocator (in `breakpad/linux/memory.h`) which gets clean pages
    directly from the kernel.

In order to avoid calling into libc we have a couple of header files which wrap
the system calls (`linux_syscall_support.h`) and reimplement a tiny subset of
libc (`linux_libc_support.h`).

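To illustrate the second rule, here is a sketch of a bump allocator that takes
pages straight from the kernel instead of the malloc heap. `SketchPageAllocator`
is a hypothetical class written for this guide and is not the allocator in
Breakpad's `memory.h`. For brevity it calls `mmap` and `munmap` through libc; a
real crash handler would issue the system calls through the wrappers mentioned
above so that it never enters the dynamic linker.

```cpp
#include <sys/mman.h>
#include <unistd.h>

#include <cstddef>
#include <cstdint>

// Hypothetical bump allocator backed by anonymous mappings. Individual
// allocations are never freed; everything is unmapped on destruction.
class SketchPageAllocator {
 public:
  SketchPageAllocator()
      : page_size_(static_cast<size_t>(sysconf(_SC_PAGESIZE))) {}
  SketchPageAllocator(const SketchPageAllocator&) = delete;
  SketchPageAllocator& operator=(const SketchPageAllocator&) = delete;

  ~SketchPageAllocator() {
    while (regions_ != nullptr) {
      Region* next = regions_->next;
      munmap(regions_, regions_->length);
      regions_ = next;
    }
  }

  // Returns 16-byte-aligned, zero-filled memory, or nullptr on failure.
  void* Alloc(size_t bytes) {
    if (bytes == 0) bytes = 1;
    bytes = RoundUp(bytes, 16);
    if (bytes > remaining_) {
      // Map a fresh region large enough for the bookkeeping header plus the
      // request. Anonymous mappings start out zero-filled.
      size_t length = RoundUp(bytes + kHeaderSize, page_size_);
      void* mem = mmap(nullptr, length, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
      if (mem == MAP_FAILED) return nullptr;
      Region* region = static_cast<Region*>(mem);
      region->next = regions_;
      region->length = length;
      regions_ = region;
      cursor_ = static_cast<uint8_t*>(mem) + kHeaderSize;
      remaining_ = length - kHeaderSize;
    }
    void* result = cursor_;
    cursor_ += bytes;
    remaining_ -= bytes;
    return result;
  }

 private:
  struct Region {
    Region* next;
    size_t length;
  };
  static constexpr size_t kHeaderSize = 16;
  static_assert(sizeof(Region) <= kHeaderSize, "header must fit");

  static size_t RoundUp(size_t n, size_t multiple) {
    return (n + multiple - 1) / multiple * multiple;
  }

  size_t page_size_;
  Region* regions_ = nullptr;
  uint8_t* cursor_ = nullptr;
  size_t remaining_ = 0;
};
```

A bump allocator fits this situation because the handler only needs scratch
memory for the lifetime of one dump, so per-allocation bookkeeping (which a
corrupt heap could have damaged) is avoided entirely.
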
### Self dumping

The simple case occurs when the browser process crashes. Here we catch the
signal and `clone` a new process to perform the dumping. We have to use a new
process because a process cannot ptrace itself.

The dumping process then ptrace attaches to all the threads in the crashed
process and writes out a minidump to `/tmp`. This is generic breakpad code.

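The attach-from-a-helper idea can be sketched as follows. This is an
illustration under several assumptions, not Breakpad's code: it uses `fork`
rather than `clone`, attaches only to the main thread, and reads back a single
word instead of writing a minidump. `ReadParentFromHelper`, `HelperResult` and
`g_crash_marker` are hypothetical names. The `prctl(PR_SET_PTRACER, ...)` call
is there because systems running the Yama security module do not let a child
attach to its parent by default. Sandboxes may refuse ptrace altogether, which
the sketch reports as `kAttachRefused`.

```cpp
#include <sys/prctl.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#include <cerrno>

enum class HelperResult { kRead, kAttachRefused, kError };

// A word in the "crashed" process that the helper reads back through ptrace.
long g_crash_marker = 0x5eed;

HelperResult ReadParentFromHelper() {
  // Allow the helper to attach under Yama. The call fails harmlessly on
  // kernels without Yama, so its result is ignored.
  prctl(PR_SET_PTRACER, PR_SET_PTRACER_ANY, 0, 0, 0);

  pid_t child = fork();
  if (child < 0) return HelperResult::kError;
  if (child == 0) {
    // Helper process: attach to the parent, which stops it.
    pid_t parent = getppid();
    if (ptrace(PTRACE_ATTACH, parent, nullptr, nullptr) != 0) _exit(1);
    int stop_status = 0;
    if (waitpid(parent, &stop_status, 0) != parent ||
        !WIFSTOPPED(stop_status)) {
      _exit(2);
    }
    // PEEKDATA returns the word itself, so errors are detected via errno.
    errno = 0;
    long word = ptrace(PTRACE_PEEKDATA, parent, &g_crash_marker, nullptr);
    bool ok = (errno == 0 && word == g_crash_marker);
    ptrace(PTRACE_DETACH, parent, nullptr, nullptr);
    _exit(ok ? 0 : 2);
  }

  // Parent: wait for the helper. This process is stopped while it is being
  // inspected and resumes once the helper detaches.
  int status = 0;
  pid_t waited;
  do {
    waited = waitpid(child, &status, 0);
  } while (waited < 0 && errno == EINTR);
  prctl(PR_SET_PTRACER, 0, 0, 0, 0);  // withdraw the ptrace permission again

  if (waited != child || !WIFEXITED(status)) return HelperResult::kError;
  switch (WEXITSTATUS(status)) {
    case 0:
      return HelperResult::kRead;
    case 1:
      return HelperResult::kAttachRefused;
    default:
      return HelperResult::kError;
  }
}
```

A full dumper would repeat the attach for every thread of the crashed process
and collect registers and stack memory rather than a single word.
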
Then we reach the Chrome-specific parts in `chrome/app/breakpad_linux.cc`. Here
we construct another temporary file and write a MIME wrapping of the crash dump
ready for uploading. We then fork off `wget` to upload the file. Based on Debian
popcon, `wget` is very commonly installed (much more so than `libcurl`) and
`wget` handles the HTTPS gubbins for us.

### Renderer dumping

In the case of a crash in the renderer, we don't want the renderer handling the
crash dumping itself. In the future we will sandbox the renderer, and allowing
it the authority to crash dump itself is too much.

Thus, we split the crash dumping in two parts: the gathering of information,
which is done in process, and the external dumping, which is done out of
process. In the case above, the latter half was done in a `clone`d child. In
this case, the browser process handles it.

When renderers are forked off, they have a `UNIX DGRAM` socket in file
descriptor 4. The signal handler then calls into Chrome-specific code
(`chrome/renderer/render_crash_handler_linux.cc`) when it would otherwise
`clone`. The Chrome-specific code sends a datagram to the socket which contains:

* Information which is only available to the signal handler (such as the
  `ucontext` structure).
* A file descriptor to a pipe which it then blocks on reading from.
* A `CREDENTIALS` structure giving its PID.

The kernel enforces that the renderer isn't lying in the `CREDENTIALS` structure
so it can't ask the browser to crash dump another process.

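The shape of such a datagram can be sketched with the standard Linux socket
calls. The sketch assumes that the `CREDENTIALS` structure corresponds to
`struct ucred` delivered as an `SCM_CREDENTIALS` control message, and that the
file descriptor travels as `SCM_RIGHTS`. The function and type names are
hypothetical, and the payload is an opaque byte string standing in for the
handler-only information.

```cpp
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#include <cstring>

struct CrashDatagram {
  pid_t sender_pid = -1;  // filled in by the kernel, not chosen by the sender
  int passed_fd = -1;
  char payload[64] = {};
  ssize_t payload_size = -1;
};

// Creates a datagram socket pair; fds[0] is the receiving (browser) end and
// asks the kernel to attach the sender's credentials to each message.
bool MakeCrashSocketPair(int fds[2]) {
  if (socketpair(AF_UNIX, SOCK_DGRAM, 0, fds) != 0) return false;
  int on = 1;
  return setsockopt(fds[0], SOL_SOCKET, SO_PASSCRED, &on, sizeof(on)) == 0;
}

// Sender side (the renderer in the text): payload plus one file descriptor.
bool SendCrashDatagram(int sock, const void* payload, size_t size,
                       int fd_to_pass) {
  iovec iov = {const_cast<void*>(payload), size};
  alignas(cmsghdr) char control[CMSG_SPACE(sizeof(int))] = {};
  msghdr msg = {};
  msg.msg_iov = &iov;
  msg.msg_iovlen = 1;
  msg.msg_control = control;
  msg.msg_controllen = sizeof(control);
  cmsghdr* cmsg = CMSG_FIRSTHDR(&msg);
  cmsg->cmsg_level = SOL_SOCKET;
  cmsg->cmsg_type = SCM_RIGHTS;
  cmsg->cmsg_len = CMSG_LEN(sizeof(int));
  std::memcpy(CMSG_DATA(cmsg), &fd_to_pass, sizeof(int));
  return sendmsg(sock, &msg, 0) == static_cast<ssize_t>(size);
}

// Receiver side (the browser): collects payload, credentials and the fd.
bool ReceiveCrashDatagram(int sock, CrashDatagram* out) {
  iovec iov = {out->payload, sizeof(out->payload)};
  alignas(cmsghdr) char
      control[CMSG_SPACE(sizeof(int)) + CMSG_SPACE(sizeof(ucred))] = {};
  msghdr msg = {};
  msg.msg_iov = &iov;
  msg.msg_iovlen = 1;
  msg.msg_control = control;
  msg.msg_controllen = sizeof(control);
  out->payload_size = recvmsg(sock, &msg, 0);
  if (out->payload_size < 0) return false;

  for (cmsghdr* c = CMSG_FIRSTHDR(&msg); c != nullptr;
       c = CMSG_NXTHDR(&msg, c)) {
    if (c->cmsg_level != SOL_SOCKET) continue;
    if (c->cmsg_type == SCM_CREDENTIALS) {
      ucred cred;
      std::memcpy(&cred, CMSG_DATA(c), sizeof(cred));
      out->sender_pid = cred.pid;
    } else if (c->cmsg_type == SCM_RIGHTS) {
      std::memcpy(&out->passed_fd, CMSG_DATA(c), sizeof(int));
    }
  }
  return out->sender_pid != -1 && out->passed_fd != -1;
}
```

The sender never writes its own PID into the message. The receiver enables
`SO_PASSCRED` and the kernel supplies the credentials, which is what makes the
PID trustworthy.
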
The browser then performs the ptrace and minidump writing which would otherwise
be performed in the `clone`d process, and does the MIME wrapping and uploading
as normal.

Once the browser has finished getting information from the crashed renderer via
ptrace, it writes a byte to the file descriptor which was passed from the
renderer. The renderer then wakes up (because it was blocking on reading from
the other end) and rethrows the signal to itself. It then appears to crash
'normally' and other parts of the browser notice the abnormal termination and
display the sad tab.

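The block-then-rethrow step can be sketched like this. A forked child stands in
for the renderer and the parent for the browser; the ptrace and minidump work is
omitted. `CrashHandler` and `RunCrashAndRelease` are hypothetical names, and the
sketch uses libc's `signal`, `read` and `raise` for brevity where a real handler
would go through the syscall wrappers.

```cpp
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#include <cerrno>

int g_wakeup_read_fd = -1;

void CrashHandler(int sig) {
  // Block until the dumping process signals that it has finished.
  char byte;
  ssize_t n;
  do {
    n = read(g_wakeup_read_fd, &byte, 1);
  } while (n < 0 && errno == EINTR);

  // Restore the default action and re-raise. The signal is blocked while its
  // handler runs, so it is delivered as soon as the handler returns and the
  // process then dies with the original signal.
  signal(sig, SIG_DFL);
  raise(sig);
}

// Forks a stand-in "renderer", lets it crash with |sig|, releases it, and
// returns the signal that terminated it (or -1 on any failure).
int RunCrashAndRelease(int sig) {
  int pipefd[2];
  if (pipe(pipefd) != 0) return -1;

  pid_t child = fork();
  if (child < 0) return -1;
  if (child == 0) {
    close(pipefd[1]);
    g_wakeup_read_fd = pipefd[0];
    signal(sig, CrashHandler);
    raise(sig);  // stands in for a real fault
    _exit(0);    // not reached when the re-raise works
  }

  close(pipefd[0]);
  // The browser would ptrace the child and write the minidump here.
  char done = 1;
  bool released = (write(pipefd[1], &done, 1) == 1);
  close(pipefd[1]);

  int status = 0;
  pid_t waited;
  do {
    waited = waitpid(child, &status, 0);
  } while (waited < 0 && errno == EINTR);
  if (!released || waited != child) return -1;
  return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

Because the child terminates from the re-raised signal rather than from
`_exit`, the parent observes an abnormal termination with the original signal
number, which is the property the sad tab logic relies on.
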
## How to test Breakpad support in Chromium

* Build Chromium as normal.
* Run the browser with the environment variable
  [CHROME_HEADLESS=1](https://crbug.com/19663). This enables crash dumping but
  prevents crash dumps from being uploaded and deleted.

  ```shell
  env CHROME_HEADLESS=1 ./out/Debug/chrome-wrapper
  ```
* Visit the special URL `chrome://crash` to trigger a crash in the renderer
  process.
* A crash dump file should appear in the directory
  `~/.config/chromium/Crash Reports`.