# Libc mem* benchmarks

This framework is designed to evaluate and compare the relative performance of memory function implementations on a particular machine.

It relies on:
- `libc.src.string.<mem_function>_benchmark` to run the benchmarks for a particular `<mem_function>`,
- `libc-benchmark-analysis.py3`, a tool that processes the measurements into reports.

## Benchmarking tool

### Setup

```shell
cd llvm-project
cmake -B/tmp/build -Sllvm -DLLVM_ENABLE_PROJECTS='clang;clang-tools-extra;libc' -DCMAKE_BUILD_TYPE=Release -DLIBC_INCLUDE_BENCHMARKS=Yes -G Ninja
ninja -C /tmp/build libc.src.string.<mem_function>_benchmark
```

> Note: The machine should run in `performance` mode. This is achieved by running:
```shell
cpupower frequency-set --governor performance
```
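
Before a run, it can help to confirm that the governor change took effect. The sketch below is not part of the framework; it assumes a Linux machine that exposes the standard cpufreq files under `/sys/devices/system/cpu`, and the function names are made up for illustration.

```python
from pathlib import Path


def read_governors(cpu_root="/sys/devices/system/cpu"):
    """Return {cpu_name: governor} for every CPU that exposes a cpufreq governor.

    Assumption: the Linux sysfs layout cpuN/cpufreq/scaling_governor.
    """
    governors = {}
    for path in sorted(Path(cpu_root).glob("cpu[0-9]*/cpufreq/scaling_governor")):
        # path is <root>/cpuN/cpufreq/scaling_governor, so the CPU name is two levels up.
        governors[path.parent.parent.name] = path.read_text().strip()
    return governors


def all_performance(governors):
    """True only if at least one governor was found and all of them are 'performance'."""
    return bool(governors) and all(g == "performance" for g in governors.values())


if __name__ == "__main__":
    found = read_governors()
    for cpu, governor in found.items():
        print(f"{cpu}: {governor}")
    if all_performance(found):
        print("all CPUs use the performance governor")
    else:
        print("at least one CPU is not using the performance governor")
```

On machines without cpufreq support the dictionary is empty, and the check reports that the governor is not set rather than assuming it is.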

### Usage

The benchmark can run in two modes:
- stochastic mode returns the average time per call for a particular size distribution (this is the default),
- sweep mode returns the average time per size over a range of sizes.

Each benchmark requires `--study-name` to be set; this name identifies a run and provides a label during analysis. In stochastic mode, you must also provide `--size-distribution-name` to pick one of the available `MemorySizeDistribution`s.

It also provides optional flags:
- `--num-trials`: runs the benchmark several times; the analysis tool can take the repeated trials into account and give confidence intervals.
- `--output`: specifies a file to write the report to; the report goes to standard output if not set.

### Stochastic mode

This is the preferred mode. The function parameters are randomized, which makes it harder for the branch predictor to learn the call pattern.

```shell
/tmp/build/bin/libc.src.string.memcpy_benchmark \
    --study-name="new memcpy" \
    --size-distribution-name="memcpy Google A" \
    --num-trials=30 \
    --output=/tmp/benchmark_result.json
```

The `--size-distribution-name` flag is mandatory and points to one of the [predefined distributions](MemorySizeDistributions.h).

> Note: These distributions are gathered from several important binaries at Google (servers, databases, realtime and batch jobs) and reflect the importance of focusing on small sizes.

Profiling the size distributions of calls into libc functions showed that
most operations act on a small number of bytes.

Function           | % of calls with size ≤ 128 | % of calls with size ≤ 1024
------------------ | --------------------------: | ---------------------------:
memcpy             | 96%                        | 99%
memset             | 91%                        | 99.9%
memcmp<sup>1</sup> | 99.5%                      | ~100%

_<sup>1</sup> - The size refers to the size of the buffers to compare and not
the number of bytes until the first difference._
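
To make the idea of drawing call sizes from a distribution concrete, here is a minimal Python sketch. It is an illustration only: the weights are invented (they are not the values in `MemorySizeDistributions.h`), and `sample_sizes` and `fraction_at_most` are hypothetical helpers, not part of the benchmark.

```python
import random

# Hypothetical weights, for illustration only: EXAMPLE_WEIGHTS[n] is the
# relative frequency of calls with size n. Small sizes dominate, loosely
# mimicking the shape described above.
EXAMPLE_WEIGHTS = [0.0] + [1.0 / n for n in range(1, 4097)]


def sample_sizes(weights, count, seed=0):
    """Draw `count` sizes, where size n is picked proportionally to weights[n]."""
    rng = random.Random(seed)  # seeded so that runs are reproducible
    return rng.choices(range(len(weights)), weights=weights, k=count)


def fraction_at_most(sizes, limit):
    """Fraction of the sampled sizes that are <= limit."""
    return sum(1 for s in sizes if s <= limit) / len(sizes)


if __name__ == "__main__":
    sizes = sample_sizes(EXAMPLE_WEIGHTS, count=100_000)
    for limit in (128, 1024):
        print(f"size <= {limit}: {fraction_at_most(sizes, limit):.1%}")
```

A benchmark built this way calls the function under test once per sampled size and reports the average time per call, so the result is weighted toward the sizes that occur most often.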

### Sweep mode

This mode measures call latency per size over a range of sizes. Because it exercises the same size over and over again, the branch predictor can kick in. It can still be useful for comparing the strengths and weaknesses of particular implementations.

```shell
/tmp/build/bin/libc.src.string.memcpy_benchmark \
    --study-name="new memcpy" \
    --sweep-mode \
    --sweep-max-size=128 \
    --output=/tmp/benchmark_result.json
```

## Analysis tool

### Setup

Make sure `matplotlib`, `pandas` and `seaborn` are set up correctly:

```shell
apt-get install python3-pip
pip3 install matplotlib pandas seaborn
```

You may need `python3-gtk` or a similar package to display the graphs.

### Usage

```shell
python3 libc/benchmarks/libc-benchmark-analysis.py3 /tmp/benchmark_result.json ...
```

When used with __multiple trials Sweep Mode data__ the tool displays the 95% confidence interval.
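
For intuition about what a 95% confidence interval over repeated trials means, here is a minimal sketch using a normal approximation. How the analysis tool actually computes its interval is not described here, so treat this as one common approach rather than the tool's method; the function name and sample values are invented.

```python
import math
import statistics


def confidence_interval_95(samples):
    """Return (low, high) for the mean of `samples` using a normal approximation.

    Assumption: the trials are independent and numerous enough for the normal
    approximation to be reasonable. Requires at least two samples.
    """
    mean = statistics.mean(samples)
    std_error = statistics.stdev(samples) / math.sqrt(len(samples))
    z = statistics.NormalDist().inv_cdf(0.975)  # two-sided 95% quantile, about 1.96
    return mean - z * std_error, mean + z * std_error


if __name__ == "__main__":
    # Invented per-trial timings in nanoseconds for a single size.
    trials_ns = [10.2, 9.8, 10.5, 10.1, 9.9, 10.3]
    low, high = confidence_interval_95(trials_ns)
    print(f"mean time is likely between {low:.2f} ns and {high:.2f} ns")
```

More trials shrink the standard error, which is why a larger `--num-trials` value gives a tighter band.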

When provided with multiple reports at the same time, all the graphs from the same machine are displayed side by side to allow for comparison.

The Y-axis unit can be changed via the `--mode` flag:
- `time` displays the measured time (this is the default),
- `cycles` displays the number of cycles computed from the CPU frequency,
- `bytespercycle` displays the number of bytes per cycle (for `Sweep Mode` reports only).
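
The relationship between the three units can be sketched with simple arithmetic. The functions below are illustrative only; they assume a fixed CPU frequency and are not taken from the analysis tool.

```python
def time_to_cycles(time_ns, cpu_frequency_hz):
    """Convert a duration in nanoseconds to CPU cycles at a fixed frequency."""
    return time_ns * 1e-9 * cpu_frequency_hz


def bytes_per_cycle(size_bytes, time_ns, cpu_frequency_hz):
    """Throughput for a call that processed `size_bytes` in `time_ns`."""
    return size_bytes / time_to_cycles(time_ns, cpu_frequency_hz)


if __name__ == "__main__":
    # Invented measurement: a 64-byte copy taking 10 ns on a 3 GHz core.
    cycles = time_to_cycles(10, 3e9)
    throughput = bytes_per_cycle(64, 10, 3e9)
    print(f"{cycles:.1f} cycles, {throughput:.2f} bytes per cycle")
```

Bytes per cycle needs a known size for each measurement, which is consistent with that mode being limited to sweep reports.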

## Under the hood

To learn more about the design decisions behind the benchmarking framework,
have a look at the [RATIONALE.md](RATIONALE.md) file.