Blame - testing/libfuzzer/efficient_fuzzing.md - chromium/src.git

# Efficient Fuzzing Guide

Once you have a fuzz target running, you can analyze and tweak it to improve its
efficiency. This document describes techniques to minimize fuzzing time and
maximize your results.

*** note
Note: If you haven’t created your first fuzz target yet, see the [Getting
Started Guide].
***

The most direct way to gauge the effectiveness of your fuzz target is to collect
metrics. You can get them manually, or take them from a [ClusterFuzz status]
page after your fuzz target is checked into the Chromium repository.

[TOC]

## Key metrics of a fuzz target

### Execution speed

A fuzzing engine such as libFuzzer typically explores a large search space by
performing randomized mutations, so it needs to run as fast as possible to find
interesting code paths.

Fuzz target speed is calculated in executions per second (`exec/s`). It is
printed while a fuzz target is running:

```
#11002 NEW cov: 1337 ft: 10934 corp: 707/409Kb lim: 1098 exec/s: 5333 rss: 27Mb L: 186/1098
```
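
If you collect this metric by hand from log output, a small helper can pull the
number out of a status line. The following is a minimal C++ sketch, not part of
libFuzzer or Chromium: `ParseExecPerSecond` is a hypothetical name, and the
code assumes the `exec/s: <number>` layout shown in the sample line above.

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>

// Hypothetical helper: returns the exec/s value found in a libFuzzer status
// line, or -1 if the line has no "exec/s: " field.
long ParseExecPerSecond(const std::string& line) {
  const std::string key = "exec/s: ";
  const std::size_t pos = line.find(key);
  if (pos == std::string::npos) return -1;
  // strtol stops at the first non-digit character after the number.
  return std::strtol(line.c_str() + pos + key.size(), nullptr, 10);
}
```

Comparing the returned value against 1,000 gives a quick local check of the
guideline that follows.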

You should aim for at least 1,000 exec/s from your fuzz target locally before
submitting it to the Chromium repository. If you’re under 1,000, consider the
following improvements:

* [Simplifying initialization/cleanup](#Simplifying-initialization-cleanup)
* [Minimizing memory usage](#Minimizing-memory-usage)

#### Simplifying initialization/cleanup

If your `LLVMFuzzerTestOneInput` function is too complex, it can decrease the
fuzzer’s execution speed. It can also cause the fuzzer to target specific
use-cases or fail to account for unexpected scenarios.

Instead of performing setup and teardown on each input, use static
initialization and shared resources. See the [startup initialization] section
of libFuzzer’s documentation for an example.

*** note
Note: You can skip freeing static resources. However, all other resources
allocated within the `LLVMFuzzerTestOneInput` function should be de-allocated,
since the function gets called millions of times during a fuzzing session. If
you don’t, you’ll often run out of memory and reduce overall fuzzing efficiency.
***
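
As an illustration of one-time setup, here is a minimal C++ sketch. It assumes
a hypothetical `ExpensiveParser` class standing in for whatever costly object
your target needs; the class and its counter are invented for this example and
are not Chromium or libFuzzer APIs.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for an object that is expensive to construct
// (for example, a parser that builds a large lookup table).
struct ExpensiveParser {
  ExpensiveParser() : table(1 << 16, 0) { ++construction_count; }
  bool Parse(const uint8_t* data, size_t size) const {
    return size > 0 && table[data[0]] == 0;
  }
  std::vector<uint8_t> table;
  static int construction_count;  // Only used to demonstrate reuse.
};
int ExpensiveParser::construction_count = 0;

extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
  // Function-local static: constructed on the first call only and reused on
  // every later call. It is never freed, which is fine for static resources.
  static ExpensiveParser* parser = new ExpensiveParser();
  parser->Parse(data, size);
  return 0;
}
```

The function-local static keeps the setup cost out of the per-input path while
leaving everything allocated per input to be released normally.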

#### Minimizing memory usage

Avoid allocation of dynamic memory wherever possible. Memory instrumentation
works faster for stack-based and static objects than for heap-allocated ones.

*** note
Note: It’s always a good idea to try different variants for your fuzz target
locally, then submit only the fastest implementation to the Chromium repository.
***
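
One variant worth trying is a fixed-size stack buffer in place of a
heap-allocated copy of the input. The sketch below assumes the code under test
needs its own copy of the data; `CountZeroBytes` and the `kMaxInput` limit are
hypothetical and chosen only for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical function under test: counts the zero bytes in a buffer.
size_t CountZeroBytes(const uint8_t* data, size_t size) {
  return static_cast<size_t>(std::count(data, data + size, uint8_t{0}));
}

extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
  // Stack buffer instead of a std::vector copy: no heap allocation per input.
  // kMaxInput is an assumed upper bound; longer inputs are truncated.
  constexpr size_t kMaxInput = 4096;
  uint8_t buffer[kMaxInput];
  const size_t n = std::min(size, kMaxInput);
  if (n > 0) std::memcpy(buffer, data, n);
  CountZeroBytes(buffer, n);
  return 0;
}
```

Whether this is actually faster for your target is something to measure
locally by comparing `exec/s` between variants.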

### Code coverage

You can check the percentage of code covered by your fuzz target to gauge
fuzzing effectiveness:

* Review aggregated Chrome coverage from recent runs by checking the [fuzzing
  coverage] report. This report can provide insight on how to improve code
  coverage.
* Generate a source-level coverage report for your fuzzer by running the
  [coverage script] stored in the Chromium repository. The script provides
  detailed instructions and a usage example.

*** note
Note: The code coverage of a fuzz target depends heavily on the corpus. A
well-chosen corpus will produce much greater code coverage. On the other hand,
a coverage report generated by a fuzz target without a corpus won't cover much
code. If you don’t have a corpus to use, you can download the [corpus from
ClusterFuzz]. For more information on the corpus, see
[Corpus size](#Corpus-size).
***

### Corpus size

A guided fuzzing engine such as libFuzzer considers an input (a.k.a. testcase
or corpus unit) interesting if the input results in new code coverage (i.e.,
if the fuzzer reaches code that has not been reached before). The set of all
interesting inputs is called the corpus. A corpus is shared across fuzzer runs
and grows over time.
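
The "interesting input" rule can be sketched with a toy model in C++. This is
not how libFuzzer is implemented: real engines track instrumented code edges,
whereas this sketch pretends that the set of distinct byte values in an input
is its coverage. `ToyCoverage` and `ToyCorpus` are hypothetical names.

```cpp
#include <set>
#include <string>
#include <vector>

// Toy stand-in for coverage: the distinct byte values present in the input.
std::set<int> ToyCoverage(const std::string& input) {
  std::set<int> covered;
  for (unsigned char c : input) covered.insert(c);
  return covered;
}

struct ToyCorpus {
  std::set<int> seen;               // Everything covered by the corpus so far.
  std::vector<std::string> inputs;  // The interesting inputs kept so far.

  // Keeps `input` only if it covers something no earlier input covered.
  bool MaybeAdd(const std::string& input) {
    bool is_new = false;
    for (int feature : ToyCoverage(input)) {
      if (seen.insert(feature).second) is_new = true;
    }
    if (is_new) inputs.push_back(input);
    return is_new;
  }
};
```

In this model, once every reachable feature is in `seen`, `MaybeAdd` rejects
every further input and the corpus stops growing.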

If a fuzz target stops discovering new interesting inputs after running for a
while, it typically indicates that the fuzz target is hitting a code barrier
(also called a coverage plateau). The corpus for a reasonably complex target
should contain hundreds (if not thousands) of inputs.

If a fuzz target reaches a coverage plateau with a small corpus, the common
causes are checksums and magic numbers. Or, it may be impossible for your
fuzzer to reach a lot of code. The easiest way to diagnose the problem is to
generate and analyze a [coverage report](#Code-coverage). Then, to fix the
issue, try the following:

* Change the code (e.g., disable CRC checks while fuzzing) with a
  [custom build](#Custom-build).
* Prepare or improve the [seed corpus](#Seed-corpus).
* Prepare or improve the [fuzzer dictionary](#Fuzzer-dictionary).

## Ways to improve a fuzz target

### Seed corpus

You can give your fuzz target a starting point by creating a set of valid and
interesting inputs called a seed corpus. If you don’t provide a seed corpus,
the fuzzing engine has to guess inputs from scratch, which can take time
(depending on the size of the inputs and the complexity of the target format).
In many cases, providing a seed corpus can increase code coverage by an order of
magnitude.

Seed corpuses work especially well for strictly defined file formats and data
transmission protocols:

* For file format parsers, add valid files from your test suite.
* For protocol parsers, add valid raw streams from a test suite into separate
  files.
* For graphics libraries, add a variety of small PNG/JPG/GIF files.

#### Using a corpus locally

If you’re running a fuzz target locally, you can easily designate a corpus by
passing a directory as an argument:

```
./out/libfuzzer/my_fuzzer ~/tmp/my_fuzzer_corpus
```

The fuzzer stores all the interesting inputs it finds in the directory.

#### Creating a Chromium repository seed corpus

When running fuzz targets at scale, ClusterFuzz looks for a seed corpus defined
in the Chromium source repository. You can define one in your `BUILD.gn` file by
adding a `seed_corpus` attribute to your `fuzzer_test` target definition:

```
fuzzer_test("my_fuzzer") {
  ...
  seed_corpus = "test/fuzz/testcases"
  ...
}
```

If you want to specify multiple seed corpus directories, use the `seed_corpuses`
attribute instead:

```
fuzzer_test("my_fuzzer") {
  ...
  seed_corpuses = [ "test/fuzz/testcases", "test/unittest/data" ]
  ...
}
```

All files found in these directories and their subdirectories are stored in a
`<my_fuzzer>_seed_corpus.zip` output archive.

#### Uploading corpus files to GCS

If you can't store your seed corpus in the Chromium repository (e.g., it’s too
large, can’t be open-sourced, etc.), you can upload the corpus to the Google
Cloud Storage (GCS) bucket used by ClusterFuzz.

1) Open the [Corpus GCS Bucket] in your browser.
2) Search for the directory named `<my_fuzzer>`. If the directory does not
exist, create it.
3) In the `<my_fuzzer>` directory, upload your corpus files.

*** note
Note: If you upload your corpus to GCS, you don’t need to add the
`seed_corpus` attribute to your `fuzzer_test` target definition. However, adding
the seed corpus to the Chromium repository is the preferred approach.
***

You can do the same thing by using the [gsutil] command line tool:

```bash
gsutil -m rsync <path_to_corpus> gs://clusterfuzz-corpus/libfuzzer/<my_fuzzer>
```

*** note
Note: To write to this bucket using `gsutil`, you must be logged into your
@google.com account (@chromium.org will not work). You can use the `gcloud auth
login` command to log into your account in `gsutil` if you installed `gsutil`
through `gcloud`.
***

#### Minimizing a seed corpus

Your seed corpus is synced to all fuzzing bots for every iteration, so it's
important to minimize it to a small set of interesting inputs before uploading.
Keeping the seed corpus small improves fuzzing efficiency and prevents our bots
from running out of disk space.

You can minimize your seed corpus by using libFuzzer’s `-merge=1` option:

```bash
# Create an empty directory.
mkdir seed_corpus_minimized

# Run the fuzzer with -merge=1 flag.
./my_fuzzer -merge=1 ./seed_corpus_minimized ./seed_corpus
```

After running the command, the `seed_corpus_minimized` directory will contain a
minimized corpus that gives the same code coverage as your initial `seed_corpus`
directory.

### Fuzzer dictionary

You can help your fuzzer increase its coverage by providing a set of common
words or values that you expect to find in the input. Such a dictionary works
especially well for certain use-cases (e.g., fuzzing file format decoders or
text-based protocols like XML).

To add a fuzzer dictionary:

1) Create a flat ASCII text file that lists one input token per line in the
format `name="value"`. The value must appear in quotes with hex escaping
(`\xNN`) applied to all non-printable, high-bit, or otherwise problematic
characters (`\\` and `\"` shorthands are recognized, too). This syntax is
similar to the one used by the [AFL] fuzzing engine (`-x` option).

*** note
Note: `name` can be omitted, but it is a convenient way to document the
meaning of each token.
***

Here’s an example dictionary:

```
# Lines starting with '#' and empty lines are ignored.

# Adds "blah" word (w/o quotes) to the dictionary.
kw1="blah"
# Use \\ for backslash and \" for quotes.
kw2="\"ac\\dc\""
# Use \xAB for hex values.
kw3="\xF7\xF8"
# Key name before '=' can be omitted:
"foo\x0Abar"
```

2) Test your dictionary by running your fuzz target locally:

```bash
./out/libfuzzer/my_fuzzer -dict=<path_to_dict> <path_to_corpus>
```

If the dictionary is effective, you should see `NEW` units discovered in the
output.

3) Add the dictionary file in the same directory as your fuzz target, then add
the `dict` attribute to the `fuzzer_test` definition in your `BUILD.gn` file:

```
fuzzer_test("my_fuzzer") {
  ...
  dict = "my_fuzzer.dict"
}
```

The dictionary is submitted to the Chromium repository. Once ClusterFuzz
picks up a new revision build, the dictionary is used automatically.
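
If you generate dictionary entries from existing token lists rather than
writing them by hand, the escaping rules from step 1 can be applied in code.
The following C++ sketch is one possible way to do that; `MakeDictEntry` is a
hypothetical helper, not a libFuzzer or Chromium API, and it treats every byte
outside printable ASCII as needing `\xNN` escaping.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: formats one dictionary line as name="value".
// Pass an empty name to emit just the quoted value.
std::string MakeDictEntry(const std::string& name, const std::string& value) {
  std::string out = name.empty() ? std::string("\"") : name + "=\"";
  for (unsigned char c : value) {
    if (c == '\\') {
      out += "\\\\";  // Backslash shorthand.
    } else if (c == '"') {
      out += "\\\"";  // Quote shorthand.
    } else if (c < 0x20 || c >= 0x7F) {
      // Non-printable or high-bit byte: hex-escape as \xNN.
      char hex[5];
      std::snprintf(hex, sizeof(hex), "\\x%02X", static_cast<unsigned>(c));
      out += hex;
    } else {
      out += static_cast<char>(c);
    }
  }
  out += '"';
  return out;
}
```

For example, `MakeDictEntry("kw1", "blah")` yields the line `kw1="blah"`, which
matches the first entry of the example dictionary.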

### Custom build

If you need to change the code being tested by your fuzz target, you can use an
`#ifdef FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION` macro in your target code.

*** note
Note: Patching target code is not a preferred way of improving the
corresponding fuzz target, but in some cases it might be the only way to do it
(e.g., when there is no intended API to disable checksum verification, or when
the target code uses a random generator that affects the reproducibility of
crashes).
***
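
For the checksum case, such a patch might look like the following C++ sketch.
The packet layout (payload followed by a one-byte additive checksum) and the
names `ComputeChecksum` and `IsPacketValid` are assumptions made up for this
example; only the macro name comes from this guide.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical checksum: sum of all payload bytes modulo 256.
uint8_t ComputeChecksum(const uint8_t* data, size_t size) {
  uint8_t sum = 0;
  for (size_t i = 0; i < size; ++i) sum = static_cast<uint8_t>(sum + data[i]);
  return sum;
}

// Hypothetical packet layout: payload bytes followed by one checksum byte.
bool IsPacketValid(const uint8_t* packet, size_t size) {
  if (size < 1) return false;
#ifdef FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION
  // Fuzzing builds only: accept any checksum so that mutated inputs get past
  // this check and reach the code behind it.
  return true;
#else
  return ComputeChecksum(packet, size - 1) == packet[size - 1];
#endif
}
```

In a regular build the checksum is still enforced, so the bypass stays
confined to builds that define the macro.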

[AFL]: http://lcamtuf.coredump.cx/afl/
[ClusterFuzz status]: libFuzzer_integration.md#Status-Links
[Corpus GCS Bucket]: https://console.cloud.google.com/storage/clusterfuzz-corpus/libfuzzer
[Getting Started Guide]: getting_started.md
[corpus from ClusterFuzz]: libFuzzer_integration.md#Corpus
[coverage script]: https://cs.chromium.org/chromium/src/tools/code_coverage/coverage.py
[fuzzing coverage]: https://chromium-coverage.appspot.com/reports/latest_fuzzers_only/linux/index.html
[gsutil]: https://cloud.google.com/storage/docs/gsutil
[startup initialization]: https://llvm.org/docs/LibFuzzer.html#startup-initialization