 # Getting Started with libFuzzer in Chromium

 *** note
 **Prerequisites:** libFuzzer in Chromium is supported on Linux and Mac only.
 ***

 This document will walk you through:

 * setting up your build environment.
 * creating your first fuzz target.
 * running the fuzz target and verifying its vitals.

 ## Configure Build

 *Notice*: the current implementation also supports the `use_afl` argument, but
 libFuzzer is recommended for local development. Running libFuzzer locally
 requires no special configuration and quickly gives meaningful output on
 speed, coverage and other parameters.

 Use the `use_libfuzzer` GN argument together with a sanitizer to generate build
 files:

 ```bash
 # With address sanitizer
 gn gen out/libfuzzer '--args=use_libfuzzer=true is_asan=true is_debug=false enable_nacl=false' --check
 ```

 Supported sanitizer configurations are:

 | GN Argument | Description |
 |--------------|----|
 | `is_asan=true` | enables [Address Sanitizer] to catch problems like buffer overruns. |
 | `is_msan=true` | enables [Memory Sanitizer] to catch problems like uninitialized reads<sup>\[[*](reference.md#MSan)\]</sup>. |
 | `is_ubsan_security=true` | enables [Undefined Behavior Sanitizer] to catch<sup>\[[*](reference.md#UBSan)\]</sup> undefined behavior like integer overflow. |
 | | It is possible to run libFuzzer without any sanitizers, but that is *probably not what you want*. |

 Fuzz targets are built with minimal symbols by default, regardless of the
 values of `is_debug` and `symbol_level`. If you want to run a fuzz target under
 a debugger, you can re-enable symbols by setting `sanitizer_keep_symbols=true`.

 To get the exact GN configurations that are used on our builders, see
 [Build Config].

 ## Write Fuzz Target

 Create a new `<my_fuzzer>.cc` file and define an `LLVMFuzzerTestOneInput` function:

 ```cpp
 #include <stddef.h>
 #include <stdint.h>

 extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
   // put your fuzzing code here and use data+size as input.
   return 0;
 }
 ```
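
To make the skeleton above more concrete, here is a minimal sketch of a complete fuzz target. `ParseKeyValue` is a hypothetical stand-in for the code under test and is not part of Chromium; the point is only to show how the `(data, size)` pair is typically handed to an API.

```cpp
#include <stddef.h>
#include <stdint.h>

#include <string>

// Hypothetical code under test: splits "key=value" and reports whether both
// parts are non-empty. It stands in for whatever parser you actually fuzz.
bool ParseKeyValue(const std::string& input,
                   std::string* key,
                   std::string* value) {
  const size_t pos = input.find('=');
  if (pos == std::string::npos)
    return false;
  *key = input.substr(0, pos);
  *value = input.substr(pos + 1);
  return !key->empty() && !value->empty();
}

extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
  // The input is not NUL-terminated, so always pass the explicit size.
  const std::string input(reinterpret_cast<const char*>(data), size);

  std::string key;
  std::string value;
  // The return value is ignored on purpose: the fuzzing engine looks for
  // crashes and sanitizer reports, not for parse failures.
  ParseKeyValue(input, &key, &value);
  return 0;
}
```

Because the target is an ordinary function, it can also be called directly with a fixed buffer to check that it behaves sensibly before the fuzzing engine drives it.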

 *Note*: Create the fuzz target file `<my_fuzzer>.cc` next to the code that is
 being tested, in the same directory as your other unit tests. Please do not use
 the `testing/libfuzzer/fuzzers` directory: it was used for the initial sample
 fuzz targets and is no longer recommended for landing new fuzz targets.

 [quic_stream_factory_fuzzer.cc] is a good example of a real-world fuzz target.

 ## Define GN Target

 Define a `fuzzer_test` GN target in `BUILD.gn`:

 ```python
 import("//testing/libfuzzer/fuzzer_test.gni")
 fuzzer_test("my_fuzzer") {
   sources = [ "my_fuzzer.cc" ]
   deps = [ ... ]
 }
 ```

 ## Build and Run Fuzz Target Locally

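 Build with ninja as usual and run the target. The commands below use
 `url_parse_fuzzer` as the target name; substitute the name of your own target: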

 ```bash
 ninja -C out/libfuzzer url_parse_fuzzer
 ./out/libfuzzer/url_parse_fuzzer
 ```

 Your fuzz target should produce output like this:

 ```
 INFO: Seed: 1511722356
 INFO: Loaded 2 modules   (115485 guards): 22572 [0x7fe8acddf560, 0x7fe8acdf5610), 92913 [0xaa05d0, 0xafb194),
 INFO: -max_len is not provided; libFuzzer will not generate inputs larger than 4096 bytes
 INFO: A corpus is not provided, starting from an empty corpus
 #2  INITED cov: 961 ft: 48 corp: 1/1b exec/s: 0 rss: 48Mb
 #3  NEW    cov: 986 ft: 70 corp: 2/104b exec/s: 0 rss: 48Mb L: 103/103 MS: 1 InsertRepeatedBytes-
 #4  NEW    cov: 989 ft: 74 corp: 3/106b exec/s: 0 rss: 48Mb L: 2/103 MS: 1 InsertByte-
 #6  NEW    cov: 991 ft: 76 corp: 4/184b exec/s: 0 rss: 48Mb L: 78/103 MS: 2 CopyPart-InsertRepeatedBytes-
 ```

 * A `... NEW ...` line appears when libFuzzer finds new and interesting inputs.
   An efficient fuzz target should be able to find lots of them rather quickly.
 * A `... pulse ...` line appears periodically to show the current status.

 For more information about libFuzzer's output, please refer to [its own
 documentation].

 ### Symbolize Stacktrace

 If your fuzz target crashes when running locally and you see a non-symbolized
 stack trace, make sure that the directory containing the `llvm-symbolizer`
 binary is in your `$PATH`. The symbolizer binary is included in Chromium's
 Clang package, located in the `third_party/llvm-build/Release+Asserts/bin/`
 directory.

 Alternatively, you can set the `external_symbolizer_path` option via the
 `ASAN_OPTIONS` environment variable:

 ```bash
 $ ASAN_OPTIONS=external_symbolizer_path=/my/local/llvm/build/llvm-symbolizer \
     ./fuzzer ./crash-input
 ```

 The same approach works with other sanitizers (e.g. `MSAN_OPTIONS`,
 `UBSAN_OPTIONS`, etc.).

 ## Improving Your Fuzz Target

 Your fuzz target may immediately discover interesting (i.e. crashing) inputs.
 To make it more efficient, a few small steps can take you a long way:

 * Create a seed corpus. Add a `seed_corpus = "src/fuzz-testcases/"` attribute
 to your fuzzer target and add example files to that folder. Read more
 in the [Seed Corpus] section of the [Efficient Fuzzer Guide].
 *Make sure corpus files are appropriately licensed.*
 * Create a mutation dictionary. With a `dict = "protocol.dict"` attribute and
 a dictionary file in `key=value` format, mutations can be more effective.
 See the [Fuzzer Dictionary] section of the [Efficient Fuzzer Guide].
 * Specify testcase length limits. By default, libFuzzer uses `-max_len=4096`
 or takes the longest testcase in the corpus if `-max_len` is not specified.
 ClusterFuzz uses different strategies for different fuzzing sessions, including
 different random values. ClusterFuzz also uses different fuzzing engines (e.g.
 AFL, which has no `-max_len` option). If your target has an input length
 limit that you would like to *strictly enforce*, add a sanity check to the
 beginning of your target function:

 ```cpp
 if (size < kMinInputLength || size > kMaxInputLength)
   return 0;
 ```
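
As a minimal sketch of how that check fits into a complete target: the limit values below and the `APIToBeFuzzed` stand-in are assumptions chosen for illustration, and the counter exists only so that the effect of the check can be observed.

```cpp
#include <stddef.h>
#include <stdint.h>

// Hypothetical limits; pick values that match the format you are fuzzing.
constexpr size_t kMinInputLength = 4;
constexpr size_t kMaxInputLength = 1024;

// Counts the inputs that reached the API (for illustration only).
size_t g_inputs_processed = 0;

// Hypothetical stand-in for the code under test.
void APIToBeFuzzed(const uint8_t* data, size_t size) {
  (void)data;
  (void)size;
  ++g_inputs_processed;
}

extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
  // Reject inputs outside the supported range, whatever length settings the
  // fuzzing engine itself happens to use.
  if (size < kMinInputLength || size > kMaxInputLength)
    return 0;

  APIToBeFuzzed(data, size);
  return 0;
}
```

Enforcing the limit inside the target keeps the behavior the same under every engine and every session, which a command-line flag alone cannot guarantee.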

 ### Disable noisy error message logging

 If the code that you are fuzzing generates a lot of error messages when
 encountering incorrect or invalid data, you need to silence those errors
 in the fuzz target. Otherwise, the fuzz target will be slow and inefficient.

 If the target uses Chromium logging APIs, the best way to do that is to
 override the environment used for logging in your fuzz target:

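 ```cpp
 #include "base/logging.h"

 struct Environment {
   Environment() {
     // Suppress everything below FATAL so that log output does not slow
     // down fuzzing.
     logging::SetMinLogLevel(logging::LOG_FATAL);
   }
 };

 // Constructed once, before the first call to LLVMFuzzerTestOneInput.
 Environment* env = new Environment();
 ```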
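
The one-time setup pattern can be tried outside Chromium with a small self-contained sketch. Everything in the `fake_logging` namespace is a hypothetical stand-in for a real logging library, and the sketch uses a function-local static instead of a global pointer, which is a variation on the snippet above rather than the documented form.

```cpp
#include <stddef.h>
#include <stdint.h>

// Hypothetical stand-in for a logging library.
namespace fake_logging {
enum Level { kInfo = 0, kError = 2, kFatal = 3 };
int g_min_level = kInfo;
int g_messages_written = 0;

void SetMinLogLevel(int level) { g_min_level = level; }

// "Writes" a message only if its level is at or above the minimum.
void Log(int level) {
  if (level >= g_min_level)
    ++g_messages_written;
}
}  // namespace fake_logging

// Counts constructions so the one-time behavior can be observed.
int g_environments_created = 0;

struct Environment {
  Environment() {
    fake_logging::SetMinLogLevel(fake_logging::kFatal);
    ++g_environments_created;
  }
};

extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
  // Function-local static: constructed on the first call only.
  static Environment env;
  (void)env;

  // Pretend the code under test logs an error for every zero byte.
  for (size_t i = 0; i < size; ++i) {
    if (data[i] == 0)
      fake_logging::Log(fake_logging::kError);
  }
  return 0;
}
```

However many inputs the engine feeds in, the environment is set up once and the per-input error messages are dropped instead of being written.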

 ## Mutating Multiple Inputs

 By default, a fuzzing engine such as libFuzzer mutates a single input referenced
 by `uint8_t* data, size_t size`. However, quite often an API under fuzz testing
 accepts multiple arguments of various types rather than a single buffer. There
 are three approaches for such cases:

 ### 1) libprotobuf-mutator

 If you need to mutate multiple inputs of various types and lengths, please see
 [Getting Started with libprotobuf-mutator in Chromium]. That approach lets you
 mutate multiple inputs independently.

 **Caveats:** This approach requires extra effort, but works with APIs and
 data structures of any complexity.

 ### 2) hash-based argument

 Another frequent case of an API under fuzz testing is a function that accepts a
 buffer with data and an integer value that is a bitwise combination of flags.
 For such cases, we recommend calculating a hash value from `(data, size)` and
 using that value to fuzz the additional integer argument, for example:

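 ```cpp
 #include <stddef.h>
 #include <stdint.h>

 #include <functional>
 #include <string>

 extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
   // Derive the extra integer argument from a hash of the whole input.
   std::string str(reinterpret_cast<const char*>(data), size);
   std::size_t data_hash = std::hash<std::string>()(str);
   APIToBeFuzzed(data, size, data_hash);
   return 0;
 }
 ```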

 **Caveats:** A hash value derived from the data is effectively random rather
 than a meaningful value controlled by the fuzzing engine. A single-bit mutation
 results in a completely different hash value, which might lead to new code
 coverage, but the next mutation generates yet another hash value and triggers
 another code path, without providing any real guidance to the fuzzing engine.
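
The caveat can be observed with a small experiment. The sketch below uses a hand-written 64-bit FNV-1a hash instead of `std::hash`, purely so that the result does not depend on the standard library in use; both helper names are assumptions made for this illustration.

```cpp
#include <stddef.h>
#include <stdint.h>

// 64-bit FNV-1a, written out by hand so that the result does not depend on
// the standard library's std::hash implementation.
uint64_t Fnv1a64(const uint8_t* data, size_t size) {
  uint64_t hash = 0xcbf29ce484222325ULL;  // FNV offset basis.
  for (size_t i = 0; i < size; ++i) {
    hash ^= data[i];
    hash *= 0x100000001b3ULL;  // FNV prime.
  }
  return hash;
}

// Number of bit positions in which two hash values differ.
int CountDifferingBits(uint64_t a, uint64_t b) {
  int count = 0;
  for (uint64_t diff = a ^ b; diff != 0; diff &= diff - 1)
    ++count;
  return count;
}
```

Hashing two inputs that differ in a single bit and passing the results to `CountDifferingBits` shows how far apart the derived arguments end up. The engine changed one bit of the input, yet it has no way to steer which bits of the derived value change.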

 ### 3) bytes taken from (data, size)

 You can extract one or more bytes from the data provided by the fuzzing engine
 and use that value for fuzzing other arguments of the target API or for making
 other decisions (e.g. the number of iterations or attempts for calling some
 function). Note that those bytes should not also be used as data for any other
 arguments, e.g.:

 ```cpp
 extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
   // Don't forget to enforce minimal data length.
   if (size < 1)
     return 0;

   // Extract single byte for fuzzing "flags" value.
   uint8_t flags = data[0];

   // Wrong, there is a bias between flags and API input.
   APIToBeFuzzed(data, size, flags);

   // Good, API input and flags are independent.
   APIToBeFuzzed(data + 1, size - 1, flags);

   return 0;
 }
 ```

 This approach addresses the problem of the *hash-based argument* approach, but
 has its own **caveats**:

 * If you extract any bytes from the input (either the first or the last ones),
 you cannot use valid samples as a seed corpus directly. In that case, you'll
 have to generate the seed corpus manually, i.e. add the necessary bytes to the
 valid sample inputs.

 * Imagine that `APIToBeFuzzed()` had a bug, something like the following:

 ```cpp
 void APIToBeFuzzed(uint8_t* buffer, size_t length, uint8_t options) {
   ...
   if (options == 0x66) {
     // Yes, looks ridiculous, but things like that did happen in the real world.
     *(buffer - 1) = -1;
   }
   ...
 }
 ```

 Assuming we used the fuzz target listed above, neither ASan nor the other
 sanitizers would detect the buffer underwrite vulnerability, because the byte
 addressed by `buffer - 1` is valid mapped memory allocated inside the fuzzing
 engine: it is `data[0]`.

 To avoid issues like that one, we have to allocate a separate buffer and
 copy the API input into it, or use a container object, e.g.:

 ```cpp
 #include <vector>

 extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
   // Don't forget to enforce minimal data length.
   if (size < 1)
     return 0;

   // Extract single byte for fuzzing flags value.
   uint8_t flags = data[0];

   // Put API input into a separate container.
   std::vector<uint8_t> buffer(data + 1, data + size);

   APIToBeFuzzed(buffer.data(), buffer.size(), flags);

   return 0;
 }
 ```

 The [base::FuzzedDataProvider] class may be helpful for writing fuzz targets
 that use this approach.
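
To show the general shape of such a helper, here is a minimal self-contained sketch. `InputSplitter` is a hypothetical class written for this illustration and does not reproduce the interface of `base::FuzzedDataProvider`; `APIToBeFuzzed` and the two globals are likewise stand-ins.

```cpp
#include <stddef.h>
#include <stdint.h>

#include <vector>

// Hypothetical helper for illustration only.
class InputSplitter {
 public:
  InputSplitter(const uint8_t* data, size_t size)
      : data_(data), remaining_(size) {}

  // Returns the next byte, or 0 once the input is exhausted.
  uint8_t ConsumeByte() {
    if (remaining_ == 0)
      return 0;
    --remaining_;
    return *data_++;
  }

  // Copies the rest of the input into its own heap allocation, so that it is
  // no longer adjacent to the bytes that were already consumed.
  std::vector<uint8_t> ConsumeRemainingBytes() {
    std::vector<uint8_t> rest(data_, data_ + remaining_);
    data_ += remaining_;
    remaining_ = 0;
    return rest;
  }

 private:
  const uint8_t* data_;
  size_t remaining_;
};

// Record what the hypothetical API received (for illustration only).
uint8_t g_last_flags = 0;
size_t g_last_length = 0;

// Hypothetical stand-in for the code under test.
void APIToBeFuzzed(uint8_t* buffer, size_t length, uint8_t flags) {
  (void)buffer;
  g_last_flags = flags;
  g_last_length = length;
}

extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
  InputSplitter splitter(data, size);
  const uint8_t flags = splitter.ConsumeByte();
  std::vector<uint8_t> buffer = splitter.ConsumeRemainingBytes();
  APIToBeFuzzed(buffer.data(), buffer.size(), flags);
  return 0;
}
```

Returning a default value on exhausted input removes the need for an explicit minimum-length check in the target. Seed corpus files for a target like this still need the extra leading byte, as described in the caveats above.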


 ## Submitting Fuzz Target to ClusterFuzz

 ClusterFuzz builds and executes all `fuzzer_test` targets in the Chromium
 repository. It is extremely important to land your fuzz target in the Chromium
 repository so that ClusterFuzz can run it at scale. Do not rely on just
 running fuzzers locally in your own environment, as that will catch far fewer
 issues. Fuzz targets need to run continuously to catch regressions and improve
 code coverage over time.

 ## Next Steps

 * After your fuzz target has landed, check the [ClusterFuzz status] page
 in a day or two.
 * Check the [Efficient Fuzzer Guide] to better understand your fuzz target's
 performance and for optimization hints.


 [Address Sanitizer]: http://clang.llvm.org/docs/AddressSanitizer.html
 [ClusterFuzz status]: clusterfuzz.md#Status-Links
 [Efficient Fuzzer Guide]: efficient_fuzzer.md
 [Fuzzer Dictionary]: efficient_fuzzer.md#Fuzzer-Dictionary
 [Memory Sanitizer]: http://clang.llvm.org/docs/MemorySanitizer.html
 [Seed Corpus]: efficient_fuzzer.md#Seed-Corpus
 [Undefined Behavior Sanitizer]: http://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html
 [crbug/598448]: https://bugs.chromium.org/p/chromium/issues/detail?id=598448
 [quic_stream_factory_fuzzer.cc]: https://cs.chromium.org/chromium/src/net/quic/chromium/quic_stream_factory_fuzzer.cc
 [Build Config]: reference.md#Builder-configurations
 [its own documentation]: http://llvm.org/docs/LibFuzzer.html#output
 [Getting Started with libprotobuf-mutator in Chromium]: libprotobuf-mutator.md
 [base::FuzzedDataProvider]: https://cs.chromium.org/chromium/src/base/test/fuzzed_data_provider.h