# Efficient Fuzzing Guide

Once you have a fuzz target running, you can analyze and tweak it to improve its
efficiency. This document describes techniques to minimize fuzzing time and
maximize your results.

*** note
**Note:** If you haven’t created your first fuzz target yet, see the [Getting
Started Guide].
***

The most direct way to gauge the effectiveness of your fuzz target is to collect
metrics. You can get them manually, or take them from a [ClusterFuzz status]
page after your fuzz target is checked into the Chromium repository.

[TOC]

## Key metrics of a fuzz target

### Execution speed

A fuzzing engine such as libFuzzer typically explores a large search space by
performing randomized mutations, so it needs to run as fast as possible to find
interesting code paths.

Fuzz target speed is calculated in executions per second (`exec/s`). It is
printed while a fuzz target is running:

```
#11002 NEW cov: 1337 ft: 10934 corp: 707/409Kb lim: 1098 exec/s: 5333 rss: 27Mb L: 186/1098
```

You should aim for at least 1,000 exec/s from your fuzz target locally before
submitting it to the Chromium repository. If you’re under 1,000, consider the
following improvements:

* [Simplifying initialization/cleanup](#Simplifying-initialization-cleanup)
* [Minimizing memory usage](#Minimizing-memory-usage)

#### Simplifying initialization/cleanup

If your `LLVMFuzzerTestOneInput` function is too complex, it can decrease the
fuzzer’s execution speed. It can also cause the fuzzer to target specific
use-cases or fail to account for unexpected scenarios.

Instead of performing setup and teardown on each input, use static
initialization and shared resources. See the [startup initialization] section of
libFuzzer’s documentation for an example.
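
As a rough sketch of that pattern (not code from the Chromium tree), the fuzz
target below builds its expensive state once in a function-local static and
reuses it for every input. `Environment`, its lookup table, and `Checksum` are
hypothetical stand-ins for your own setup and code under test; only
`LLVMFuzzerTestOneInput` is the real libFuzzer entry point.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical state that is expensive to build. It stands in for any costly
// one-time setup your target needs.
struct Environment {
  Environment() : table(256) {
    for (size_t i = 0; i < table.size(); ++i)
      table[i] = static_cast<uint8_t>(i ^ 0x5A);
    ++construction_count;
  }
  std::vector<uint8_t> table;
  static int construction_count;  // Only here to show setup runs once.
};
int Environment::construction_count = 0;

// Hypothetical code under test: folds the input through the lookup table.
uint8_t Checksum(const Environment& env, const uint8_t* data, size_t size) {
  uint8_t acc = 0;
  for (size_t i = 0; i < size; ++i)
    acc ^= env.table[data[i]];
  return acc;
}

extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
  // Constructed on the first call only and intentionally never freed.
  static Environment env;
  assert(Environment::construction_count == 1);
  Checksum(env, data, size);
  return 0;
}
```

Because `env` is static, its constructor runs on the first call only, so the
per-input cost is limited to the code under test.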

*** note
**Note:** You can skip freeing static resources. However, all other resources
allocated within the `LLVMFuzzerTestOneInput` function should be de-allocated,
since the function gets called millions of times during a fuzzing session. If
you don’t, you’ll often run out of memory and reduce overall fuzzing efficiency.
***

#### Minimizing memory usage

Avoid allocation of dynamic memory wherever possible. Memory instrumentation
works faster for stack-based and static objects than for heap-allocated ones.
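
One way this might look, assuming your inputs can be capped at a known size, is
to copy the input into a fixed-size stack buffer instead of a heap-allocated
`std::string`. In the sketch below, `CountVowels` and the `kMaxLen` bound are
hypothetical; pick a bound that suits your target.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical code under test: expects a NUL-terminated string.
size_t CountVowels(const char* text) {
  size_t count = 0;
  for (; *text != '\0'; ++text) {
    if (std::strchr("aeiouAEIOU", *text) != nullptr)
      ++count;
  }
  return count;
}

extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
  // Assumed upper bound on input size; longer inputs are skipped.
  constexpr size_t kMaxLen = 4096;
  if (size > kMaxLen)
    return 0;

  // Stack buffer instead of a per-input heap allocation.
  char buffer[kMaxLen + 1];
  if (size > 0)
    std::memcpy(buffer, data, size);
  buffer[size] = '\0';

  const size_t vowels = CountVowels(buffer);
  assert(vowels <= size);  // Cheap sanity check on the result.
  (void)vowels;
  return 0;
}
```

Whether this is actually faster depends on your target, so it is worth
measuring `exec/s` for both variants.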

*** note
**Note:** It’s always a good idea to try different variants for your fuzz target
locally, then submit only the fastest implementation to the Chromium repository.
***

### Code coverage

You can check the percentage of code covered by your fuzz target to gauge
fuzzing effectiveness:

* Review aggregated Chrome coverage from recent runs by checking the [fuzzing
  coverage] report. This report can provide insight on how to improve code
  coverage.
* Generate a source-level coverage report for your fuzzer by running the
  [coverage script] stored in the Chromium repository. The script provides
  detailed instructions and a usage example.

*** note
**Note:** The code coverage of a fuzz target depends heavily on the corpus. A
well-chosen corpus will produce much greater code coverage. On the other hand,
a coverage report generated by a fuzz target without a corpus won't cover much
code. If you don’t have a corpus to use, you can download the [corpus from
ClusterFuzz]. For more information on the corpus, see
[Corpus size](#Corpus-size).
***

### Corpus size

A guided fuzzing engine such as libFuzzer considers an input (a.k.a. testcase
or corpus unit) *interesting* if the input results in new code coverage (i.e.,
if the fuzzer reaches code that has not been reached before). The set of all
interesting inputs is called the *corpus*. A corpus is shared across fuzzer runs
and grows over time.
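
The toy model below illustrates the "keep it only if it adds coverage" rule. It
is a simplified sketch, not libFuzzer's actual implementation: `RunTarget`
fakes coverage instrumentation for a hypothetical target, and `Corpus` is a
made-up helper type.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Toy stand-in for coverage instrumentation: returns the IDs of the branches
// that `input` exercises in a hypothetical target.
std::set<int> RunTarget(const std::string& input) {
  std::set<int> edges = {0};  // Function entry is always reached.
  if (!input.empty() && input[0] == 'F') {
    edges.insert(1);
    if (input.size() > 1 && input[1] == 'U')
      edges.insert(2);
  }
  return edges;
}

struct Corpus {
  std::vector<std::string> units;  // Interesting inputs kept so far.
  std::set<int> covered;           // Every edge reached by any kept input.

  // Keeps `input` only if it reaches at least one edge not seen before.
  bool AddIfInteresting(const std::string& input) {
    bool is_new = false;
    for (int edge : RunTarget(input)) {
      if (covered.insert(edge).second)
        is_new = true;
    }
    if (is_new)
      units.push_back(input);
    // Each kept unit contributed at least one new edge.
    assert(units.size() <= covered.size());
    return is_new;
  }
};
```

In this model, `"A"` and `"B"` reach the same single edge, so only the first of
them is kept, while `"F"` and `"FU"` each unlock a new branch and are both
added to the corpus.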

If a fuzz target stops discovering new interesting inputs after running for a
while, it typically indicates that the fuzz target is hitting a code barrier
(also called a *coverage plateau*). The corpus for a reasonably complex target
should contain hundreds (if not thousands) of inputs.

If a fuzz target reaches a coverage plateau with a small corpus, the most common
causes are checksums and magic numbers in the input format. It may also be that
much of the code is simply unreachable from your fuzz target. The easiest way to
diagnose the problem is to generate and analyze a
[coverage report](#Code-coverage). Then, to fix the issue, try the following:

* Change the code (e.g., disable CRC checks while fuzzing) with a
  [custom build](#Custom-build).
* Prepare or improve the [seed corpus](#Seed-corpus).
* Prepare or improve the [fuzzer dictionary](#Fuzzer-dictionary).

## Ways to improve a fuzz target

### Seed corpus

You can give your fuzz target a starting point by creating a set of valid and
interesting inputs called a *seed corpus*. If you don’t provide a seed corpus,
the fuzzing engine has to guess inputs from scratch, which can take time
(depending on the size of the inputs and the complexity of the target format).
In many cases, providing a seed corpus can increase code coverage by an order of
magnitude.

Seed corpuses work especially well for strictly defined file formats and data
transmission protocols:

* For file format parsers, add valid files from your test suite.
* For protocol parsers, add valid raw streams from a test suite into separate
  files.
* For graphics libraries, add a variety of small PNG/JPG/GIF files.

#### Using a corpus locally

If you’re running a fuzz target locally, you can easily designate a corpus by
passing a directory as an argument:

```
./out/libfuzzer/my_fuzzer ~/tmp/my_fuzzer_corpus
```

The fuzzer stores all the interesting inputs it finds in the directory.

#### Creating a Chromium repository seed corpus

When running fuzz targets at scale, ClusterFuzz looks for a seed corpus defined
in the Chromium source repository. You can define one in your `BUILD.gn` file by
adding a `seed_corpus` attribute to your `fuzzer_test` target definition:

```
fuzzer_test("my_fuzzer") {
  ...
  seed_corpus = "test/fuzz/testcases"
  ...
}
```

If you want to specify multiple seed corpus directories, use the `seed_corpuses`
attribute instead:

```
fuzzer_test("my_fuzzer") {
  ...
  seed_corpuses = [ "test/fuzz/testcases", "test/unittest/data" ]
  ...
}
```

All files found in these directories and their subdirectories are stored in a
`<my_fuzzer>_seed_corpus.zip` output archive.

#### Uploading corpus files to GCS

If you can't store your seed corpus in the Chromium repository (e.g., it’s too
large, can’t be open-sourced, etc.), you can upload the corpus to the Google
Cloud Storage (GCS) bucket used by ClusterFuzz.

1) Open the [Corpus GCS Bucket] in your browser.
2) Search for the directory named `<my_fuzzer>`. If the directory does not
   exist, create it.
3) In the `<my_fuzzer>` directory, upload your corpus files.

*** note
**Note:** If you upload your corpus to GCS, you don’t need to add the
`seed_corpus` attribute to your `fuzzer_test` target definition. However, adding
a seed corpus to the Chromium repository is the preferred approach.
***

You can do the same thing by using the [gsutil] command line tool:

```bash
gsutil -m rsync <path_to_corpus> gs://clusterfuzz-corpus/libfuzzer/<my_fuzzer>
```

*** note
**Note:** To write to this bucket using `gsutil`, you must be logged into your
@google.com account (@chromium.org will not work). You can use the `gcloud auth
login` command to log into your account in `gsutil` if you installed `gsutil`
through `gcloud`.
***

#### Minimizing a seed corpus

Your seed corpus is synced to all fuzzing bots for every iteration, so it's
important to minimize it to a small set of interesting inputs before uploading.
Keeping the seed corpus small improves fuzzing efficiency and prevents our bots
from running out of disk space.

You can minimize your seed corpus by using libFuzzer’s `-merge=1` option:

```bash
# Create an empty directory.
mkdir seed_corpus_minimized

# Run the fuzzer with -merge=1 flag.
./my_fuzzer -merge=1 ./seed_corpus_minimized ./seed_corpus
```

After running the command, the `seed_corpus_minimized` directory will contain a
minimized corpus that gives the same code coverage as your initial `seed_corpus`
directory.

### Fuzzer dictionary

You can help your fuzzer increase its coverage by providing a set of common
words or values that you expect to find in the input. Such a dictionary works
especially well for certain use-cases (e.g., fuzzing file format decoders or
text-based protocols like XML).

To add a fuzzer dictionary:

1) Create a flat ASCII text file that lists one input token per line in the
   format `name="value"`. The value must appear in quotes with hex escaping
   (`\xNN`) applied to all non-printable, high-bit, or otherwise problematic
   characters (`\\` and `\"` shorthands are recognized, too). This syntax is
   similar to the one used by the [AFL] fuzzing engine (`-x` option).

   *** note
   **Note:** `name` can be omitted, but it is a convenient way to document the
   meaning of each token. Here’s an example dictionary:
   ***

   ```
   # Lines starting with '#' and empty lines are ignored.

   # Adds "blah" word (w/o quotes) to the dictionary.
   kw1="blah"
   # Use \\ for backslash and \" for quotes.
   kw2="\"ac\\dc\""
   # Use \xAB for hex values.
   kw3="\xF7\xF8"
   # Key name before '=' can be omitted:
   "foo\x0Abar"
   ```

2) Test your dictionary by running your fuzz target locally:

   ```bash
   ./out/libfuzzer/my_fuzzer -dict=<path_to_dict> <path_to_corpus>
   ```

   If the dictionary is effective, you should see `NEW` units discovered in the
   output.

3) Add the dictionary file in the same directory as your fuzz target, then add
   the `dict` attribute to the `fuzzer_test` definition in your `BUILD.gn` file:

   ```
   fuzzer_test("my_fuzzer") {
     ...
     dict = "my_fuzzer.dict"
   }
   ```

   The dictionary is submitted to the Chromium repository. Once ClusterFuzz
   picks up a new revision build, the dictionary is used automatically.

### Custom build

If you need to change the code being tested by your fuzz target, you can use an
`#ifdef FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION` macro in your target code.

*** note
**Note:** Patching target code is not a preferred way of improving the
corresponding fuzz target, but in some cases it might be the only way to do it
(e.g., when there is no intended API to disable checksum verification, or when
the target code uses a random generator that affects the reproducibility of
crashes).
***
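
As an illustration, the sketch below skips checksum verification only when the
macro is defined, so production builds keep the check. The packet layout,
`ComputeChecksum`, and `ParsePacket` are hypothetical and exist only to show
where the guard goes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical additive checksum over the payload bytes.
uint8_t ComputeChecksum(const uint8_t* data, size_t size) {
  uint8_t sum = 0;
  for (size_t i = 0; i < size; ++i)
    sum = static_cast<uint8_t>(sum + data[i]);
  return sum;
}

// Hypothetical parser: the last byte of a packet is the checksum of all the
// bytes before it, and the first byte is a version field.
bool ParsePacket(const uint8_t* data, size_t size) {
  if (size < 2)
    return false;
  assert(data != nullptr);
  const size_t payload_size = size - 1;

#ifndef FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION
  // Verified in production builds. Fuzzing builds skip this check so that
  // mutated inputs are not rejected before they reach the parsing logic.
  if (ComputeChecksum(data, payload_size) != data[payload_size])
    return false;
#endif

  // Payload parsing would go here; this sketch only checks the version byte.
  return data[0] == 0x01;
}
```

Keep the guarded region as small as possible so that the fuzzed code stays
close to what ships.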

[AFL]: http://lcamtuf.coredump.cx/afl/
[ClusterFuzz status]: libFuzzer_integration.md#Status-Links
[Corpus GCS Bucket]: https://console.cloud.google.com/storage/clusterfuzz-corpus/libfuzzer
[Getting Started Guide]: getting_started.md
[corpus from ClusterFuzz]: libFuzzer_integration.md#Corpus
[coverage script]: https://cs.chromium.org/chromium/src/tools/code_coverage/coverage.py
[fuzzing coverage]: https://chromium-coverage.appspot.com/reports/latest_fuzzers_only/linux/index.html
[gsutil]: https://cloud.google.com/storage/docs/gsutil
[startup initialization]: https://llvm.org/docs/LibFuzzer.html#startup-initialization