Max Moroz | 4a8415a | 2019-08-02 17:46:51

# Efficient Fuzzing Guide

Once you have a fuzz target running, you can analyze and tweak it to improve its
efficiency. This document describes techniques to minimize fuzzing time and
maximize your results.

*** note
**Note:** If you haven’t created your first fuzz target yet, see the [Getting
Started Guide].
***

The most direct way to gauge the effectiveness of your fuzz target is to collect
metrics. You can get them manually, or take them from a [ClusterFuzz status]
page after your fuzz target is checked into the Chromium repository.

[TOC]

## Key metrics of a fuzz target

### Execution speed

A fuzzing engine such as libFuzzer typically explores a large search space by
performing randomized mutations, so it needs to run as fast as possible to find
interesting code paths.

Fuzz target speed is measured in executions per second (`exec/s`). It is
printed while a fuzz target is running:

```
#11002 NEW cov: 1337 ft: 10934 corp: 707/409Kb lim: 1098 exec/s: 5333 rss: 27Mb L: 186/1098
```

You should aim for at least 1,000 exec/s from your fuzz target locally before
submitting it to the Chromium repository. If you’re under 1,000, consider the
following improvements:

* [Simplifying initialization/cleanup](#Simplifying-initialization-cleanup)
* [Minimizing memory usage](#Minimizing-memory-usage)

#### Simplifying initialization/cleanup

If your `LLVMFuzzerTestOneInput` function is too complex, it can decrease the
fuzzer’s execution speed. It can also make the fuzzer focus on specific use
cases or fail to account for unexpected scenarios.

Instead of performing setup and teardown on each input, use static
initialization and shared resources. See the [startup initialization] section
of libFuzzer’s documentation for an example.

*** note
**Note:** You can skip freeing static resources. However, all other resources
allocated within the `LLVMFuzzerTestOneInput` function should be de-allocated,
since the function gets called millions of times during a fuzzing session. If
you don’t, you’ll often run out of memory and reduce overall fuzzing efficiency.
***
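
A minimal sketch of this pattern follows. `KeywordMatcher` is a hypothetical stand-in for an expensive-to-construct component, not a Chromium API: it is created once in a function-local static and intentionally never freed, while the per-input copy of the data is an RAII object that is released at the end of every call.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <string>

// Hypothetical component with a costly setup step. It stands in for work
// like loading lookup tables or initializing a library.
class KeywordMatcher {
 public:
  KeywordMatcher() : keywords_{"GET", "HEAD", "POST", "PUT"} {}

  // Returns true if |input| starts with one of the known keywords.
  bool StartsWithKeyword(const std::string& input) const {
    for (const std::string& keyword : keywords_) {
      if (input.compare(0, keyword.size(), keyword) == 0)
        return true;
    }
    return false;
  }

 private:
  const std::set<std::string> keywords_;
};

extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
  // Constructed on the first call and reused for every later input. It is
  // deliberately never deleted: static resources don't need to be freed.
  static const KeywordMatcher* const matcher = new KeywordMatcher();

  // Per-input state lives in an RAII object, so its memory is released at
  // the end of every call instead of piling up across millions of calls.
  const std::string input(reinterpret_cast<const char*>(data), size);
  if (matcher->StartsWithKeyword(input)) {
    // Sanity check: every keyword is at least three bytes long.
    assert(input.size() >= 3);
  }
  return 0;
}
```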

#### Minimizing memory usage

Avoid allocation of dynamic memory wherever possible. Memory instrumentation
works faster for stack-based and static objects than for heap-allocated ones.
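
As an illustration, a target that needs a mutable copy of its input can use a fixed-size stack buffer instead of allocating one on the heap for every input. This is a sketch under assumptions: `StripNulBytes` and `kMaxInputSize` are hypothetical, and skipping inputs above a size limit only makes sense if the code under test has a similar limit.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical upper bound on input size; larger inputs are skipped.
constexpr size_t kMaxInputSize = 4096;

// Hypothetical function under test: removes NUL bytes from |buffer| in place
// and returns the number of bytes kept.
size_t StripNulBytes(uint8_t* buffer, size_t size) {
  size_t kept = 0;
  for (size_t i = 0; i < size; ++i) {
    if (buffer[i] != 0)
      buffer[kept++] = buffer[i];
  }
  return kept;
}

extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
  if (size > kMaxInputSize)
    return 0;
  // A fixed-size stack buffer instead of new[] or std::vector: no heap
  // allocation happens for each input.
  uint8_t buffer[kMaxInputSize];
  std::copy(data, data + size, buffer);
  [[maybe_unused]] const size_t kept = StripNulBytes(buffer, size);
  assert(kept <= size);
  return 0;
}
```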

*** note
**Note:** It’s always a good idea to try different variants for your fuzz target
locally, then submit only the fastest implementation to the Chromium repository.
***

### Code coverage

You can check the percentage of code covered by your fuzz target to gauge
fuzzing effectiveness:

* Review aggregated Chrome coverage from recent runs by checking the [fuzzing
coverage] report. This report can provide insight into how to improve code
coverage.
* Generate a source-level coverage report for your fuzzer by running the
[coverage script] stored in the Chromium repository. The script provides
detailed instructions and a usage example.

*** note
**Note:** The code coverage of a fuzz target depends heavily on the corpus. A
well-chosen corpus will produce much greater code coverage. On the other hand,
a coverage report generated by a fuzz target without a corpus won't cover much
code. If you don’t have a corpus to use, you can download the [corpus from
ClusterFuzz]. For more information on the corpus, see
[Corpus size](#Corpus-size).
***

### Corpus size

A guided fuzzing engine such as libFuzzer considers an input (a.k.a. testcase
or corpus unit) *interesting* if the input results in new code coverage (i.e.,
if the fuzzer reaches code that has not been reached before). The set of all
interesting inputs is called the *corpus*. A corpus is shared across fuzzer runs
and grows over time.
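
The following toy model illustrates what makes an input interesting. It is a simplified sketch, not how libFuzzer is implemented: the hypothetical `ToyTarget` reports which of its branches an input reaches, and `ToyCorpus` keeps an input only if it reaches at least one branch that no previously kept input reached.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical target: returns the IDs of the branches |input| reaches.
// Each nested check stands in for a new piece of code coverage.
std::set<int> ToyTarget(const std::string& input) {
  std::set<int> branches = {0};  // Function entry.
  if (input.size() > 0 && input[0] == 'F') {
    branches.insert(1);
    if (input.size() > 1 && input[1] == 'U') {
      branches.insert(2);
      if (input.size() > 2 && input[2] == 'Z')
        branches.insert(3);
    }
  }
  return branches;
}

// Toy corpus: an input is kept only if it reaches a branch that no
// previously kept input reached.
class ToyCorpus {
 public:
  // Returns true if |input| was interesting (and was therefore added).
  bool MaybeAdd(const std::string& input) {
    bool interesting = false;
    for (int branch : ToyTarget(input)) {
      // insert().second is true only if |branch| was not covered yet.
      if (covered_.insert(branch).second)
        interesting = true;
    }
    if (interesting)
      inputs_.push_back(input);
    return interesting;
  }

  const std::vector<std::string>& inputs() const { return inputs_; }
  size_t covered_branches() const { return covered_.size(); }

 private:
  std::set<int> covered_;
  std::vector<std::string> inputs_;
};
```

Feeding it `"x"`, `"F"`, `"FA"`, and `"FU"` in that order keeps `"x"`, `"F"`, and `"FU"`, but drops `"FA"` because it reaches no new branch.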

If a fuzz target stops discovering new interesting inputs after running for a
while, it typically indicates that the fuzz target is hitting a code barrier
(also called a *coverage plateau*). The corpus for a reasonably complex target
should contain hundreds (if not thousands) of inputs.

If a fuzz target reaches a coverage plateau with a small corpus, the common
causes are checksums and magic numbers. Alternatively, much of the code may
simply be unreachable from your fuzz target. The easiest way to diagnose the
problem is to generate and analyze a [coverage report](#Code-coverage). Then, to
fix the issue, try the following:

* Change the code (e.g., disable CRC checks while fuzzing) with a
[custom build](#Custom-build).
* Prepare or improve the [seed corpus](#Seed-corpus).
* Prepare or improve the [fuzzer dictionary](#Fuzzer-dictionary).
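
To see why checksums and magic numbers stall progress, consider the hypothetical record format below (the `RECD` magic, the additive checksum, and `ParseRecord` are made up for illustration). A purely random 4-byte guess matches the magic with probability 1 in 2^32, and an input that does carry the magic still needs a matching checksum byte, so the code after both checks is rarely reached. libFuzzer's comparison tracing can sometimes help with simple comparisons, but values computed over the whole input, such as a CRC, are much harder to hit.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical record format: "RECD" magic, payload, 1-byte checksum.
constexpr uint8_t kMagic[4] = {'R', 'E', 'C', 'D'};

// Toy checksum: sum of the payload bytes modulo 256.
uint8_t Checksum(const uint8_t* payload, size_t size) {
  uint8_t sum = 0;
  for (size_t i = 0; i < size; ++i)
    sum = static_cast<uint8_t>(sum + payload[i]);
  return sum;
}

// Returns true if |data| is a well-formed record.
bool ParseRecord(const uint8_t* data, size_t size) {
  if (size < sizeof(kMagic) + 1)
    return false;
  // Barrier 1: a random 4-byte guess matches the magic with probability 2^-32.
  if (std::memcmp(data, kMagic, sizeof(kMagic)) != 0)
    return false;
  // Barrier 2: the last byte must equal the checksum of the payload between
  // the magic and the checksum byte.
  const uint8_t* payload = data + sizeof(kMagic);
  const size_t payload_size = size - sizeof(kMagic) - 1;
  if (Checksum(payload, payload_size) != data[size - 1])
    return false;
  // The interesting parsing logic would start here.
  return true;
}

extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
  ParseRecord(data, size);
  return 0;
}
```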

## Ways to improve a fuzz target

### Seed corpus

You can give your fuzz target a starting point by creating a set of valid and
interesting inputs called a *seed corpus*. If you don’t provide a seed corpus,
the fuzzing engine has to guess inputs from scratch, which can take time
(depending on the size of the inputs and the complexity of the target format).
In many cases, providing a seed corpus can increase code coverage by an order of
magnitude.

Seed corpuses work especially well for strictly defined file formats and data
transmission protocols:

* For file format parsers, add valid files from your test suite.
* For protocol parsers, add valid raw streams from your test suite as separate
files.
* For graphics libraries, add a variety of small PNG/JPG/GIF files.
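
If your test suite builds inputs in code instead of reading them from files, a small helper can write each one into its own file in a seed corpus directory. This is a sketch: `WriteSeedCorpus` and the `seed_N` file names are hypothetical, and it requires C++17 for `std::filesystem`.

```cpp
#include <cassert>
#include <cstddef>
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Writes each input to its own file (seed_0, seed_1, ...) in |dir|, creating
// the directory if needed. Returns false on any I/O error.
bool WriteSeedCorpus(const fs::path& dir,
                     const std::vector<std::string>& inputs) {
  std::error_code error;
  fs::create_directories(dir, error);
  if (error)
    return false;
  for (size_t i = 0; i < inputs.size(); ++i) {
    std::ofstream file(dir / ("seed_" + std::to_string(i)),
                       std::ios::binary | std::ios::trunc);
    // Write raw bytes: inputs may contain NUL or other non-text bytes.
    file.write(inputs[i].data(),
               static_cast<std::streamsize>(inputs[i].size()));
    if (!file)
      return false;
  }
  return true;
}
```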

#### Using a corpus locally

If you’re running a fuzz target locally, you can easily designate a corpus by
passing a directory as an argument:

```
./out/libfuzzer/my_fuzzer ~/tmp/my_fuzzer_corpus
```

The fuzzer stores all the interesting inputs it finds in the directory.

#### Creating a Chromium repository seed corpus

When running fuzz targets at scale, ClusterFuzz looks for a seed corpus defined
in the Chromium source repository. You can define one in your `BUILD.gn` file by
adding a `seed_corpus` attribute to your `fuzzer_test` target definition:

```
fuzzer_test("my_fuzzer") {
  ...
  seed_corpus = "test/fuzz/testcases"
  ...
}
```

If you want to specify multiple seed corpus directories, use the `seed_corpuses`
attribute instead:

```
fuzzer_test("my_fuzzer") {
  ...
  seed_corpuses = [ "test/fuzz/testcases", "test/unittest/data" ]
  ...
}
```

All files found in these directories and their subdirectories are stored in a
`<my_fuzzer>_seed_corpus.zip` output archive.

#### Uploading corpus files to GCS

If you can't store your seed corpus in the Chromium repository (e.g., it’s too
large, can’t be open-sourced, etc.), you can upload the corpus to the Google
Cloud Storage (GCS) bucket used by ClusterFuzz.

1) Open the [Corpus GCS Bucket] in your browser.
2) Search for the directory named `<my_fuzzer>`. If the directory does not
exist, create it.
3) In the `<my_fuzzer>` directory, upload your corpus files.

*** note
**Note:** If you upload your corpus to GCS, you don’t need to add the
`seed_corpus` attribute to your `fuzzer_test` target definition. However, adding
the seed corpus to the Chromium repository is the preferred approach.
***

You can do the same thing by using the [gsutil] command-line tool:

```bash
gsutil -m rsync <path_to_corpus> gs://clusterfuzz-corpus/libfuzzer/<my_fuzzer>
```

*** note
**Note:** To write to this bucket using `gsutil`, you must be logged into your
@google.com account (@chromium.org will not work). You can use the `gcloud auth
login` command to log into your account in `gsutil` if you installed `gsutil`
through `gcloud`.
***

#### Minimizing a seed corpus

Your seed corpus is synced to all fuzzing bots for every iteration, so it's
important to minimize it to a small set of interesting inputs before uploading.
Keeping the seed corpus small improves fuzzing efficiency and prevents our bots
from running out of disk space.

You can minimize your seed corpus by using libFuzzer’s `-merge=1` option:

```bash
# Create an empty directory.
mkdir seed_corpus_minimized

# Run the fuzzer with the -merge=1 flag.
./my_fuzzer -merge=1 ./seed_corpus_minimized ./seed_corpus
```

After running the command, the `seed_corpus_minimized` directory will contain a
minimized corpus that gives the same code coverage as your initial `seed_corpus`
directory.
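
Conceptually, merging keeps a subset of the inputs that still reaches every coverage feature reached by the full corpus. The sketch below is a simplified greedy version of that idea, not libFuzzer's actual merge implementation. The `coverage_of` callback is a hypothetical stand-in for running the fuzz target on an input and collecting the features it hits, and trying smaller inputs first is a design choice so that a small input is kept over a larger one with the same coverage.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <set>
#include <string>
#include <vector>

// Returns a subset of |corpus| that reaches the same set of features.
// |coverage_of| is a hypothetical callback returning the features an input hits.
std::vector<std::string> MinimizeCorpus(
    std::vector<std::string> corpus,
    const std::function<std::set<int>(const std::string&)>& coverage_of) {
  // Try smaller inputs first, so a small input is kept over a larger one
  // that reaches the same features.
  std::stable_sort(corpus.begin(), corpus.end(),
                   [](const std::string& a, const std::string& b) {
                     return a.size() < b.size();
                   });
  std::set<int> covered;
  std::vector<std::string> minimized;
  for (const std::string& input : corpus) {
    bool adds_coverage = false;
    for (int feature : coverage_of(input)) {
      if (covered.insert(feature).second)
        adds_coverage = true;
    }
    if (adds_coverage)
      minimized.push_back(input);
  }
  return minimized;
}
```

Greedy selection preserves total coverage, but it doesn't guarantee the smallest possible subset; finding that is an instance of the set-cover problem.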

### Fuzzer dictionary

You can help your fuzzer increase its coverage by providing a set of common
words or values that you expect to find in the input. Such a dictionary works
especially well for certain use cases (e.g., fuzzing file format decoders or
text-based protocols like XML).

To add a fuzzer dictionary:

1) Create a flat ASCII text file that lists one input token per line in the
format `name="value"`. The value must appear in quotes with hex escaping
(`\xNN`) applied to all non-printable, high-bit, or otherwise problematic
characters (`\\` and `\"` shorthands are recognized, too). This syntax is
similar to the one used by the [AFL] fuzzing engine (`-x` option).

*** note
**Note:** `name` can be omitted, but it is a convenient way to document the
meaning of each token. Here’s an example dictionary:
***

```
# Lines starting with '#' and empty lines are ignored.

# Adds "blah" word (w/o quotes) to the dictionary.
kw1="blah"
# Use \\ for backslash and \" for quotes.
kw2="\"ac\\dc\""
# Use \xAB for hex values.
kw3="\xF7\xF8"
# Key name before '=' can be omitted:
"foo\x0Abar"
```

2) Test your dictionary by running your fuzz target locally:

```bash
./out/libfuzzer/my_fuzzer -dict=<path_to_dict> <path_to_corpus>
```

If the dictionary is effective, you should see `NEW` units discovered in the
output.

3) Put the dictionary file in the same directory as your fuzz target, then add
the `dict` attribute to the `fuzzer_test` definition in your `BUILD.gn` file:

```
fuzzer_test("my_fuzzer") {
  ...
  dict = "my_fuzzer.dict"
}
```

The dictionary is checked into the Chromium repository. Once ClusterFuzz picks
up a build from a new revision, it uses the dictionary automatically.
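
To make the escaping rules concrete, here is a sketch of how one dictionary entry could be decoded. `ParseDictEntry` is hypothetical and only follows the rules described above (an optional `name=`, a quoted value, and the `\\`, `\"`, and `\xNN` escapes); it isn't libFuzzer's parser, and it doesn't skip comments or trim whitespace.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>

// Decodes one dictionary entry such as kw3="\xF7\xF8" or "foo\x0Abar".
// Returns std::nullopt if the entry is malformed.
std::optional<std::string> ParseDictEntry(const std::string& line) {
  const size_t open = line.find('"');
  // Need an opening quote, a closing quote at the end, and either no name
  // or a name followed by '='.
  if (open == std::string::npos || line.size() < open + 2 ||
      line.back() != '"' || (open > 0 && line[open - 1] != '='))
    return std::nullopt;
  const size_t close = line.size() - 1;
  std::string value;
  for (size_t i = open + 1; i < close; ++i) {
    const char c = line[i];
    if (c == '"')
      return std::nullopt;  // Unescaped quote inside the value.
    if (c != '\\') {
      value += c;
      continue;
    }
    // Escape sequence: \\, \", or \xNN. It must end before the closing quote.
    if (i + 1 >= close)
      return std::nullopt;
    const char next = line[++i];
    if (next == '\\' || next == '"') {
      value += next;
    } else if (next == 'x' && i + 2 < close &&
               std::isxdigit(static_cast<unsigned char>(line[i + 1])) &&
               std::isxdigit(static_cast<unsigned char>(line[i + 2]))) {
      value += static_cast<char>(std::stoi(line.substr(i + 1, 2), nullptr, 16));
      i += 2;
    } else {
      return std::nullopt;
    }
  }
  return value;
}
```

For example, `kw3="\xF7\xF8"` decodes to the two bytes `0xF7 0xF8`, and `"foo\x0Abar"` decodes to `foo`, a newline byte, and `bar`.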

### Custom build

If you need to change the code being tested by your fuzz target, you can guard
the change with `#ifdef FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION` in your
target code.

*** note
**Note:** Patching target code is not the preferred way of improving the
corresponding fuzz target, but in some cases it might be the only way to do it
(e.g., when there is no intended API to disable checksum verification, or when
the target code uses a random generator that affects the reproducibility of
crashes).
***
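
For example, a checksum check can be compiled out of fuzzing builds while production behavior stays unchanged. `VerifyChecksum`, `ComputeChecksum`, and the XOR checksum are hypothetical; the sketch assumes your fuzzing build defines `FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Toy checksum: XOR of all payload bytes.
uint8_t ComputeChecksum(const uint8_t* payload, size_t size) {
  uint8_t result = 0;
  for (size_t i = 0; i < size; ++i)
    result ^= payload[i];
  return result;
}

// Hypothetical check in the code under test: the last byte of |data| must be
// the checksum of the bytes before it.
bool VerifyChecksum(const uint8_t* data, size_t size) {
  if (size == 0)
    return false;
#ifdef FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION
  // Fuzzing builds accept any checksum so the fuzzer can reach the code
  // behind this check. This branch must never be compiled into production.
  return true;
#else
  return ComputeChecksum(data, size - 1) == data[size - 1];
#endif
}
```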

[AFL]: http://lcamtuf.coredump.cx/afl/
[ClusterFuzz status]: libFuzzer_integration.md#Status-Links
[Corpus GCS Bucket]: https://console.cloud.google.com/storage/clusterfuzz-corpus/libfuzzer
[Getting Started Guide]: getting_started.md
[corpus from ClusterFuzz]: libFuzzer_integration.md#Corpus
[coverage script]: https://cs.chromium.org/chromium/src/tools/code_coverage/coverage.py
[fuzzing coverage]: https://chromium-coverage.appspot.com/reports/latest_fuzzers_only/linux/index.html
[gsutil]: https://cloud.google.com/storage/docs/gsutil
[startup initialization]: https://llvm.org/docs/LibFuzzer.html#startup-initialization