# Libc mem* benchmarks

This framework has been designed to evaluate and compare the relative performance of memory function implementations on a particular machine.

It relies on:
 - `libc.src.string.<mem_function>_benchmark` to run the benchmarks for the particular `<mem_function>`.
 - `libc-benchmark-analysis.py3`, a tool to process the measurements into reports.

## Benchmarking tool

### Setup

```shell
cd llvm-project
cmake -B/tmp/build -Sllvm -DLLVM_ENABLE_PROJECTS='clang;clang-tools-extra;libc' -DCMAKE_BUILD_TYPE=Release -DLIBC_INCLUDE_BENCHMARKS=Yes -G Ninja
ninja -C /tmp/build libc.src.string.<mem_function>_benchmark
```

> Note: The machine should run in `performance` mode. This is achieved by running:
```shell
cpupower frequency-set --governor performance
```

### Usage

The benchmark can run in two modes:
 - **stochastic mode** returns the average time per call for a particular size distribution (this is the default),
 - **sweep mode** returns the average time per size over a range of sizes.
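
The difference between the two modes can be illustrated with a small Python sketch. Everything in it is hypothetical: `fake_latency_ns` stands in for a real measurement, the size weights are made up (they are not one of the shipped distributions), and the functions only mimic what each mode reports, not how the benchmark is implemented.

```python
import random
import statistics

# Hypothetical latency model (in nanoseconds) standing in for a real measurement.
def fake_latency_ns(size):
    return 2.0 + 0.05 * size

# Hypothetical size distribution favouring small sizes: the weight of size s is 1/s.
# This is NOT one of the distributions from MemorySizeDistributions.h.
sizes = list(range(1, 257))
weights = [1.0 / s for s in sizes]

def stochastic_mode(num_calls, rng):
    """Average time per call when sizes are drawn from the distribution."""
    drawn = rng.choices(sizes, weights=weights, k=num_calls)
    return statistics.fmean(fake_latency_ns(s) for s in drawn)

def sweep_mode(max_size):
    """Average time for each individual size in [0, max_size]."""
    return {s: fake_latency_ns(s) for s in range(max_size + 1)}

rng = random.Random(0)  # fixed seed so the sketch is reproducible
per_call = stochastic_mode(10_000, rng)  # a single number
per_size = sweep_mode(128)               # one number per size
```

Stochastic mode collapses a whole distribution into one figure, whereas sweep mode yields a curve indexed by size.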

Each benchmark requires `--study-name` to be set; this name identifies a run and provides a label during analysis. If **stochastic mode** is being used, you must also provide `--size-distribution-name` to pick one of the available `MemorySizeDistribution`s.

It also provides optional flags:
 - `--num-trials`: repeats the benchmark more times; the analysis tool can take this into account and give confidence intervals.
 - `--output`: specifies a file to write the report to, or standard output if not set.

### Stochastic mode

This is the preferred mode to use. The function parameters are randomized and the branch predictor is less likely to kick in.

```shell
/tmp/build/bin/libc.src.string.memcpy_benchmark \
    --study-name="new memcpy" \
    --size-distribution-name="memcpy Google A" \
    --num-trials=30 \
    --output=/tmp/benchmark_result.json
```

The `--size-distribution-name` flag is mandatory and points to one of the [predefined distributions](MemorySizeDistributions.h).

> Note: These distributions are gathered from several important binaries at Google (servers, databases, realtime and batch jobs) and reflect the importance of focusing on small sizes.

Profiling the size distributions of calls into libc functions showed that
most operations act on a small number of bytes.

Function           | % of calls with size ≤ 128 | % of calls with size ≤ 1024
------------------ | --------------------------: | ---------------------------:
memcpy             | 96%                        | 99%
memset             | 91%                        | 99.9%
memcmp<sup>1</sup> | 99.5%                      | ~100%

_<sup>1</sup> - The size refers to the size of the buffers to compare and not
the number of bytes until the first difference._

### Sweep mode

This mode is used to measure call latency per size for a certain range of sizes. Because it exercises the same size over and over again, the branch predictor can kick in. It can still be useful to compare the strengths and weaknesses of particular implementations.

```shell
/tmp/build/bin/libc.src.string.memcpy_benchmark \
    --study-name="new memcpy" \
    --sweep-mode \
    --sweep-max-size=128 \
    --output=/tmp/benchmark_result.json
```

## Analysis tool

### Setup

Make sure to have `matplotlib`, `pandas` and `seaborn` set up correctly:

```shell
apt-get install python3-pip
pip3 install matplotlib pandas seaborn
```

You may need `python3-gtk` or a similar package to display the graphs.

### Usage

```shell
python3 libc/benchmarks/libc-benchmark-analysis.py3 /tmp/benchmark_result.json ...
```

When used with __multiple-trial Sweep Mode data__, the tool displays the 95% confidence interval.
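
As a rough illustration of what a 95% confidence interval over repeated trials means, here is a minimal Python sketch. It assumes a normal approximation (z = 1.96) applied to the standard error of the mean; the analysis tool may compute its interval differently, and both the helper name `confidence_interval_95` and the sample timings are made up for this example.

```python
import math
import statistics

def confidence_interval_95(samples):
    """Return (mean, low, high) using a normal approximation (z = 1.96).

    Assumption: this is one common way to build the interval, not
    necessarily the method used by the analysis tool.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two trials to estimate a spread")
    mean = statistics.fmean(samples)
    # Standard error of the mean, from the sample standard deviation.
    sem = statistics.stdev(samples) / math.sqrt(n)
    half_width = 1.96 * sem
    return mean, mean - half_width, mean + half_width

# Made-up per-trial timings (nanoseconds) for a single size.
trials_ns = [10.2, 9.8, 10.1, 10.4, 9.9]
mean, low, high = confidence_interval_95(trials_ns)
```

More trials shrink the standard error, which is why a higher `--num-trials` value tends to give a tighter interval.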

When provided with multiple reports at the same time, all the graphs from the same machine are displayed side by side to allow for comparison.

The Y-axis unit can be changed via the `--mode` flag:
 - `time` displays the measured time (this is the default),
 - `cycles` displays the number of cycles computed from the CPU frequency,
 - `bytespercycle` displays the number of bytes per cycle (for `Sweep Mode` reports only).
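
The relationship between the three units can be sketched in Python as follows. The function names and the example numbers are hypothetical; the sketch only shows the arithmetic implied by the descriptions above (cycles derived from time and CPU frequency, bytes per cycle derived from a known size), not the tool's actual code.

```python
def time_to_cycles(seconds_per_call, cpu_frequency_hz):
    """Convert a measured time into CPU cycles at a given frequency."""
    return seconds_per_call * cpu_frequency_hz

def bytes_per_cycle(size_bytes, seconds_per_call, cpu_frequency_hz):
    """Throughput for a call that processes a known number of bytes."""
    return size_bytes / time_to_cycles(seconds_per_call, cpu_frequency_hz)

# Made-up example: a 128-byte call taking 20 ns on a 3 GHz CPU.
cycles = time_to_cycles(20e-9, 3e9)
throughput = bytes_per_cycle(128, 20e-9, 3e9)
```

With these made-up inputs the call costs roughly 60 cycles, or a little over 2 bytes per cycle. Computing bytes per cycle needs a single known size per measurement, which sweep mode provides.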

## Under the hood

To learn more about the design decisions behind the benchmarking framework,
have a look at the [RATIONALE.md](RATIONALE.md) file.