# Linux Crash Dumping

Official builds of Chrome support crash dumping and reporting using the Google
crash servers. This is a guide to how this works.

[TOC]

## Breakpad

Breakpad is an open source library which we use for crash reporting across all
three platforms (Linux, Mac and Windows). For Linux, a substantial amount of
work was required to support cross-process dumping. At the time of writing, this
code is forked from the upstream breakpad repo. While this situation
remains, the forked code lives in `third_party/breakpad/linux`. The upstream
repo is mirrored in `third_party/breakpad/breakpad`.

The code currently supports i386 only. Getting x86-64 to work should require
only a minor amount of work.

### Minidumps

Breakpad deals in a file format called 'minidump'. This is a Microsoft format
and thus is defined by in-memory structures which are dumped, raw, to disk. The
main header file for this file format is
`third_party/breakpad/breakpad/src/google_breakpad/common/minidump_format.h`.

At the top level, the minidump file format is a list of key-value pairs. Many of
the keys are defined by the minidump format and contain cross-platform
representations of stacks, threads, etc. For Linux we also define a number of
custom keys containing `/proc/cpuinfo`, `lsb-release`, etc. These are defined in
`third_party/breakpad/breakpad/linux/minidump_format_linux.h`.
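The key-value layout described above can be sketched as a header followed by a directory of stream entries. The structures below are simplified, hypothetical mirrors of the ones in `minidump_format.h` (the names `RawHeader`, `RawDirectoryEntry` and `FindStream` are made up for this example), and the code assumes a little-endian host. Treat it as an illustration of the lookup, not as the Breakpad reader.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Simplified mirrors of the on-disk structures. The names are illustrative;
// the authoritative definitions live in minidump_format.h.
struct RawHeader {
  uint32_t signature;             // "MDMP" when read as little-endian
  uint32_t version;
  uint32_t stream_count;          // number of directory entries
  uint32_t stream_directory_rva;  // file offset of the directory
  uint32_t checksum;
  uint32_t time_date_stamp;
  uint64_t flags;
};

struct RawDirectoryEntry {
  uint32_t stream_type;  // the "key"
  uint32_t data_size;    // the "value": size in bytes ...
  uint32_t rva;          // ... and file offset of the stream data
};

constexpr uint32_t kMinidumpSignature = 0x504d444d;  // "MDMP"

// Looks up the directory entry for `wanted_type`. Returns false if the buffer
// is not a plausible minidump or if no such stream exists.
bool FindStream(const std::vector<uint8_t>& file, uint32_t wanted_type,
                RawDirectoryEntry* out) {
  assert(out != nullptr);
  RawHeader header;
  if (file.size() < sizeof(header)) return false;
  std::memcpy(&header, file.data(), sizeof(header));
  if (header.signature != kMinidumpSignature) return false;

  for (uint32_t i = 0; i < header.stream_count; ++i) {
    const uint64_t offset = uint64_t{header.stream_directory_rva} +
                            uint64_t{i} * sizeof(RawDirectoryEntry);
    if (offset + sizeof(RawDirectoryEntry) > file.size()) return false;
    RawDirectoryEntry entry;
    std::memcpy(&entry, file.data() + offset, sizeof(entry));
    if (entry.stream_type == wanted_type) {
      *out = entry;
      return true;
    }
  }
  return false;
}
```

A caller would read the whole dump into a `std::vector<uint8_t>` and then use the returned `rva` and `data_size` to locate the stream's bytes.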

### Catching exceptions

Exceptional conditions (such as invalid memory references, floating point
exceptions, etc.) are signaled by synchronous signals to the thread which caused
them. Synchronous signals are always delivered to the thread which triggered
them, as opposed to asynchronous signals, which can be handled by any thread in
a thread-group which hasn't masked that signal.

All the signals that we wish to catch are synchronous except SIGABRT, and we can
always arrange to send SIGABRT to a specific thread. Thus, we find the crashing
thread by looking at the current thread in the signal handler.

The signal handlers run on a pre-allocated stack in case the crash was triggered
by a stack overflow.
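A minimal sketch of running a handler on a pre-allocated stack, using the standard `sigaltstack` and `sigaction` calls. The buffer size, the handler body and the names `g_handler_stack`, `CrashSignalHandler` and `InstallHandlerOnAlternateStack` are assumptions made for this example, not taken from the Breakpad sources.

```cpp
#include <signal.h>
#include <cassert>
#include <cstdint>

namespace {

// Reserved up front so the handler never depends on the (possibly exhausted)
// thread stack. The size is an arbitrary choice for this sketch.
alignas(16) char g_handler_stack[64 * 1024];

// Set by the handler: 1 if it ran on g_handler_stack, 0 otherwise.
volatile sig_atomic_t g_ran_on_handler_stack = -1;

void CrashSignalHandler(int /*signo*/, siginfo_t* /*info*/, void* /*uc*/) {
  // The address of a local variable tells us which stack we are running on.
  char probe;
  const uintptr_t here = reinterpret_cast<uintptr_t>(&probe);
  const uintptr_t base = reinterpret_cast<uintptr_t>(g_handler_stack);
  g_ran_on_handler_stack =
      (here >= base && here < base + sizeof(g_handler_stack)) ? 1 : 0;
}

}  // namespace

// Registers the alternate stack for the calling thread and installs the
// handler for `signo`. Returns false if either system call fails.
bool InstallHandlerOnAlternateStack(int signo) {
  assert(signo > 0);
  stack_t stack = {};
  stack.ss_sp = g_handler_stack;
  stack.ss_size = sizeof(g_handler_stack);
  stack.ss_flags = 0;
  if (sigaltstack(&stack, nullptr) != 0) return false;

  struct sigaction action = {};
  sigemptyset(&action.sa_mask);
  action.sa_sigaction = CrashSignalHandler;
  // SA_ONSTACK asks the kernel to switch to the alternate stack on delivery.
  action.sa_flags = SA_SIGINFO | SA_ONSTACK;
  return sigaction(signo, &action, nullptr) == 0;
}
```

Note that the alternate stack is a per-thread setting, so a real handler has to arrange one for every thread that may crash.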

Once we have started handling the signal, we have to assume that the address
space is compromised. In order not to fall prey to this and crash (again) in the
crash handler, we observe some rules:

1.  We don't enter the dynamic linker. This, observably, can trigger crashes in
    the crash handler. Unfortunately, entering the dynamic linker is very easy
    and can be triggered by calling a function from a shared library whose
    resolution hasn't been cached yet. Since we can't know which functions have
    been cached, we avoid calling any of these functions, with one exception:
    `memcpy`. Since the compiler can emit calls to `memcpy`, we can't really
    avoid it.
1.  We don't allocate memory via malloc, as the heap may be corrupt. Instead we
    use a custom allocator (in `breakpad/linux/memory.h`) which gets clean pages
    directly from the kernel.

In order to avoid calling into libc, we have a couple of header files which wrap
the system calls (`linux_syscall_support.h`) and reimplement a tiny subset of
libc (`linux_libc_support.h`).
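The allocation rule above can be illustrated with a small bump allocator that takes anonymous pages straight from `mmap`. This is a hypothetical sketch, not the allocator in `memory.h`: the class layout is invented for the example, and for brevity it calls `mmap` and `sysconf` through libc, whereas the real handler would go through the system call wrappers to stay out of the dynamic linker.

```cpp
#include <sys/mman.h>
#include <unistd.h>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hands out memory from anonymous pages obtained directly from the kernel, so
// it never touches the (possibly corrupt) malloc heap. Memory is released only
// when the allocator is destroyed.
class PageAllocator {
 public:
  PageAllocator() : page_size_(static_cast<size_t>(sysconf(_SC_PAGESIZE))) {}
  ~PageAllocator() {
    while (blocks_ != nullptr) {
      Block* next = blocks_->next;
      munmap(blocks_, blocks_->mapped_size);
      blocks_ = next;
    }
  }
  PageAllocator(const PageAllocator&) = delete;
  PageAllocator& operator=(const PageAllocator&) = delete;

  // Returns zero-filled, 16-byte aligned memory, or nullptr on failure.
  void* Alloc(size_t bytes) {
    assert(bytes > 0);
    bytes = RoundUp16(bytes);
    if (bytes > remaining_ && !MapBlock(bytes)) return nullptr;
    void* result = cursor_;
    cursor_ += bytes;
    remaining_ -= bytes;
    return result;
  }

 private:
  // Stored at the start of each mapping so the destructor can unmap it.
  struct Block {
    Block* next;
    size_t mapped_size;
  };

  static size_t RoundUp16(size_t n) {
    return (n + 15) & ~static_cast<size_t>(15);
  }

  // Maps enough fresh pages for `payload` bytes plus the block header. Any
  // space left in the previous block is simply abandoned.
  bool MapBlock(size_t payload) {
    const size_t header = RoundUp16(sizeof(Block));
    const size_t total =
        (header + payload + page_size_ - 1) / page_size_ * page_size_;
    void* mem = mmap(nullptr, total, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) return false;
    Block* block = static_cast<Block*>(mem);
    block->next = blocks_;
    block->mapped_size = total;
    blocks_ = block;
    cursor_ = static_cast<uint8_t*>(mem) + header;
    remaining_ = total - header;
    return true;
  }

  const size_t page_size_;
  Block* blocks_ = nullptr;
  uint8_t* cursor_ = nullptr;
  size_t remaining_ = 0;
};
```

There is deliberately no `Free`: a crash handler runs briefly and exits, so releasing everything at once keeps the allocator small and hard to corrupt.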

### Self dumping

The simple case occurs when the browser process crashes. Here we catch the
signal and `clone` a new process to perform the dumping. We have to use a new
process because a process cannot ptrace itself.

The dumping process then ptrace attaches to all the threads in the crashed
process and writes out a minidump to `/tmp`. This is generic breakpad code.
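Before a dumper can attach to every thread, it needs the list of thread IDs. One common way to get it on Linux is to read `/proc/<pid>/task`; the sketch below shows only that enumeration step and is an assumption about how such a dumper could work, not a copy of the Breakpad code. `ListThreads` is a hypothetical name, and the standard library is used for brevity even though code running in the crashed address space would have to avoid it.

```cpp
#include <dirent.h>
#include <sys/types.h>
#include <unistd.h>
#include <cassert>
#include <cstdlib>
#include <string>
#include <vector>

// Returns the IDs of all threads in process `pid` by listing /proc/<pid>/task.
// Returns an empty vector if the directory cannot be opened.
std::vector<pid_t> ListThreads(pid_t pid) {
  assert(pid > 0);
  std::vector<pid_t> threads;
  const std::string path = "/proc/" + std::to_string(pid) + "/task";
  DIR* dir = opendir(path.c_str());
  if (dir == nullptr) return threads;
  while (dirent* entry = readdir(dir)) {
    char* end = nullptr;
    const long tid = std::strtol(entry->d_name, &end, 10);
    // Skip "." and ".." and anything else that is not a plain number.
    if (end == entry->d_name || *end != '\0' || tid <= 0) continue;
    threads.push_back(static_cast<pid_t>(tid));
  }
  closedir(dir);
  return threads;
}
```

For example, `ListThreads(getpid())` includes the calling process's own PID, since the main thread's ID equals the process ID. Each returned ID would then be passed to `ptrace` with `PTRACE_ATTACH` by the dumping process.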

Then we reach the Chrome-specific parts in `chrome/app/breakpad_linux.cc`. Here
we construct another temporary file and write a MIME wrapping of the crash dump
ready for uploading. We then fork off `wget` to upload the file. Based on Debian
popcon data, `wget` is very commonly installed (much more so than `libcurl`), and
`wget` handles the HTTPS gubbins for us.
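As a rough illustration of the MIME wrapping step, the sketch below builds a `multipart/form-data` body around the dump bytes. The function name, the field name and the file name are placeholders chosen for this example; the values Chrome actually sends are defined in `breakpad_linux.cc` and are not reproduced here.

```cpp
#include <cassert>
#include <string>

// Wraps `dump` as a single file part of a multipart/form-data body.
// `boundary` must be a string that does not occur inside `dump`.
std::string BuildMultipartBody(const std::string& boundary,
                               const std::string& field_name,
                               const std::string& file_name,
                               const std::string& dump) {
  assert(!boundary.empty());
  std::string body;
  // Opening delimiter and the headers that describe this part.
  body += "--" + boundary + "\r\n";
  body += "Content-Disposition: form-data; name=\"" + field_name +
          "\"; filename=\"" + file_name + "\"\r\n";
  body += "Content-Type: application/octet-stream\r\n\r\n";
  // The raw minidump bytes.
  body += dump;
  // Closing delimiter: the boundary followed by two extra dashes.
  body += "\r\n--" + boundary + "--\r\n";
  return body;
}
```

The resulting body would be written to the temporary file and sent with a `Content-Type: multipart/form-data; boundary=...` request header. With `wget`, that roughly corresponds to its `--post-file` and `--header` options.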

### Renderer dumping

In the case of a crash in the renderer, we don't want the renderer handling the
crash dumping itself. In the future we will sandbox the renderer, and allowing
it the authority to crash dump itself would be too much.

Thus, we split the crash dumping in two parts: the gathering of information,
which is done in process, and the external dumping, which is done out of
process. In the case above, the latter half was done in a `clone`d child. In
this case, the browser process handles it.

When renderers are forked off, they have a `UNIX DGRAM` socket in file
descriptor 4. The signal handler then calls into Chrome-specific code
(`chrome/renderer/render_crash_handler_linux.cc`) when it would otherwise
`clone`. The Chrome-specific code sends a datagram to the socket which contains:

*   Information which is only available to the signal handler (such as the
    `ucontext` structure).
*   A file descriptor to a pipe which it then blocks on reading from.
*   A `CREDENTIALS` structure giving its PID.

The kernel enforces that the renderer isn't lying in the `CREDENTIALS` structure
so it can't ask the browser to crash dump another process.
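The credential check can be sketched with a `socketpair` standing in for the browser/renderer channel. The receiver enables `SO_PASSCRED`, and the kernel then attaches the sender's PID, UID and GID to each datagram as an `SCM_CREDENTIALS` control message. The function names are hypothetical and sender and receiver live in one process here, so this demonstrates the mechanism only, not Chrome's actual message layout.

```cpp
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>
#include <cassert>
#include <cstring>

// Receives one datagram on `fd` and returns the sender's PID as recorded by
// the kernel, or -1 if no credentials were attached. SO_PASSCRED must already
// be enabled on `fd`. (struct ucred needs _GNU_SOURCE, which g++ and clang++
// define by default on Linux.)
pid_t ReceiveSenderPid(int fd, char* payload, size_t payload_size) {
  assert(payload != nullptr && payload_size > 0);
  alignas(struct cmsghdr) char control[CMSG_SPACE(sizeof(struct ucred))];
  struct iovec iov = {payload, payload_size};
  struct msghdr msg = {};
  msg.msg_iov = &iov;
  msg.msg_iovlen = 1;
  msg.msg_control = control;
  msg.msg_controllen = sizeof(control);
  if (recvmsg(fd, &msg, 0) < 0) return -1;

  for (struct cmsghdr* c = CMSG_FIRSTHDR(&msg); c != nullptr;
       c = CMSG_NXTHDR(&msg, c)) {
    if (c->cmsg_level == SOL_SOCKET && c->cmsg_type == SCM_CREDENTIALS) {
      struct ucred cred;
      std::memcpy(&cred, CMSG_DATA(c), sizeof(cred));
      return cred.pid;
    }
  }
  return -1;
}

// Sends a datagram across a socketpair and returns the PID that the kernel
// reported for the sender, or -1 on error.
pid_t DemoCredentialPassing() {
  int fds[2];
  if (socketpair(AF_UNIX, SOCK_DGRAM, 0, fds) != 0) return -1;

  // The receiving end asks the kernel to attach the sender's credentials.
  const int on = 1;
  pid_t pid = -1;
  if (setsockopt(fds[0], SOL_SOCKET, SO_PASSCRED, &on, sizeof(on)) == 0) {
    const char message[] = "crash";
    if (send(fds[1], message, sizeof(message), 0) ==
        static_cast<ssize_t>(sizeof(message))) {
      char buffer[16];
      pid = ReceiveSenderPid(fds[0], buffer, sizeof(buffer));
    }
  }
  close(fds[0]);
  close(fds[1]);
  return pid;
}
```

Because the kernel fills in or validates these values, the receiver can trust the PID without relying on anything in the datagram payload.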

The browser then performs the ptrace and minidump writing which would otherwise
be performed in the `clone`d process and does the MIME wrapping and uploading as
normal.

Once the browser has finished getting information from the crashed renderer via
ptrace, it writes a byte to the file descriptor which was passed from the
renderer. The renderer then wakes up (because it was blocking on reading from
the other end) and rethrows the signal to itself. It then appears to crash
'normally' and other parts of the browser notice the abnormal termination and
display the sad tab.
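The wake-up handshake can be sketched with a pipe between a parent (playing the browser) and a forked child (playing the crashed renderer). `RunWakeAndReraiseDemo` is a hypothetical name, and the ptrace and minidump work is reduced to a comment. The point is only the ordering: block on the pipe, get woken by one byte, restore the default signal action, re-raise.

```cpp
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <cassert>
#include <cerrno>

// Forks a stand-in "renderer" that blocks on a pipe and then re-raises
// `signo` with the default disposition. Returns the signal that terminated
// the child, or -1 on error.
int RunWakeAndReraiseDemo(int signo) {
  assert(signo > 0);
  int fds[2];
  if (pipe(fds) != 0) return -1;

  const pid_t child = fork();
  if (child < 0) {
    close(fds[0]);
    close(fds[1]);
    return -1;
  }

  if (child == 0) {
    // "Renderer": wait until the "browser" has finished inspecting us.
    close(fds[1]);
    char byte;
    ssize_t n;
    do {
      n = read(fds[0], &byte, 1);
    } while (n < 0 && errno == EINTR);
    // Restore the default action so the signal terminates the process.
    signal(signo, SIG_DFL);
    raise(signo);
    _exit(1);  // Only reached if the signal did not terminate us.
  }

  // "Browser": the ptrace work and minidump writing would happen here.
  close(fds[0]);
  const char done = 1;
  const ssize_t written = write(fds[1], &done, 1);
  close(fds[1]);

  int status = 0;
  if (waitpid(child, &status, 0) != child) return -1;
  if (written != 1) return -1;
  return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

Calling `RunWakeAndReraiseDemo(SIGSEGV)` returns `SIGSEGV` when the handshake works, which is the same abnormal termination status the rest of the browser would observe.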

## How to test Breakpad support in Chromium

*   Build Chromium as normal.
*   Run the browser with the environment variable
    [CHROME_HEADLESS=1](https://crbug.com/19663). This enables crash dumping but
    prevents crash dumps from being uploaded and deleted.

    ```shell
    env CHROME_HEADLESS=1 ./out/Debug/chrome-wrapper
    ```
*   Visit the special URL `chrome://crash` to trigger a crash in the renderer
    process.
*   A crash dump file should appear in the directory
    `~/.config/chromium/Crash Reports`.