I recently worked on some hash table lookup code that could benefit from SIMD optimizations, and I wanted to microbenchmark modulus and hash functions to improve the code quality. However, modern CPUs are complex, and several of their components cause fluctuations during benchmarks: the core design, access times across the CPU cache hierarchy, frequency adjustments for thermal balancing, and so on.
To get more stable benchmarking results on my CPU, an AMD 7950X3D with two types of cores, I looked into the LLVM Benchmarking Tips page, which has excellent information. It suggests using cset shield to isolate CPU cores for exclusive benchmark runs, but unfortunately, it relies on cgroup v1, while my system (Ubuntu 22.04) uses systemd 249 with cgroup v2.
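You can check whether a system runs the unified cgroup v2 hierarchy by looking at the filesystem type mounted at /sys/fs/cgroup. The function below is a small sketch and is not part of the original script. The name cgroup_mode and its output labels are hypothetical. The tmpfs case assumes the usual v1/hybrid layout, where the controller hierarchies are mounted below a tmpfs.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report which cgroup hierarchy is mounted at a path.
# GNU stat: -f queries the filesystem instead of the file, %T prints its type.
cgroup_mode() {
    local mnt=${1:-/sys/fs/cgroup}
    case $(stat -f -c %T "$mnt" 2>/dev/null) in
        cgroup2fs) echo v2 ;;            # unified hierarchy, which cset does not support
        tmpfs)     echo v1-or-hybrid ;;  # assumed: v1 controllers mounted below a tmpfs
        *)         echo unknown ;;       # path missing or some other filesystem
    esac
}
```

On a default Ubuntu 22.04 installation, cgroup_mode should print v2.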
I created a helper script that sets up an isolated CPU partition and suppresses as many sources of execution fluctuation as possible, for more reliable benchmark runs. It assumes a CPU with a core layout similar to the AMD 7950X3D. That is a hybrid design with 16 cores (32 SMT threads): eight cores have a large L3 cache, and the other eight support higher frequencies. The script isolates physical cores 6,7 and 8,9 for benchmarks. It also takes their SMT siblings, logical CPUs 22-25, offline so that no second hardware thread (hyperthread) competes for those cores during benchmark runs. That gives me two cores of each kind to run benchmarks on.
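You can read sysfs directly, without root, to find which logical CPUs are SMT siblings of the same physical core and which cores sit on the large cache. The sketch below illustrates this and is not part of the script. The names list_cpu_topology and SYSFS_CPU (an override for the sysfs path) are hypothetical. The sketch also assumes that cache index3 is the L3 cache, as is typical on x86.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print each logical CPU, its SMT siblings and L3 size.
# SYSFS_CPU can point elsewhere (e.g. a test fixture); default is real sysfs.
SYSFS_CPU=${SYSFS_CPU:-/sys/devices/system/cpu}

list_cpu_topology() {
    local dir cpu siblings l3
    for dir in "$SYSFS_CPU"/cpu[0-9]*; do
        # Skip entries without topology info (offline CPUs may lack it).
        [[ -r $dir/topology/thread_siblings_list ]] || continue
        cpu=${dir##*/cpu}
        siblings=$(<"$dir/topology/thread_siblings_list")   # e.g. "6,22"
        l3=n/a
        # index3 is assumed to be the L3 cache, as is typical on x86.
        [[ -r $dir/cache/index3/size ]] && l3=$(<"$dir/cache/index3/size")
        printf 'cpu%s siblings=%s l3=%s\n' "$cpu" "$siblings" "$l3"
    done | sort -V   # numeric order: cpu2 before cpu10
}
```

On a 7950X3D, the line for cpu6 should list 6,22 as siblings, which matches the pairing used by the script. The reported L3 size should differ between the two groups of cores.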
The script does the following:
- Disables address space layout randomization (ASLR) to reduce cache coloring effects from memory layouts that change between runs.
- Disables core frequency scaling on all CPUs by selecting the performance governor.
- Turns off cpufreq boost mode, which can cause thermal throttling during benchmarks.
- Allows perf profiling events for users without CAP_SYS_ADMIN (see /proc/sys/kernel/perf_event_paranoid).
- Takes logical CPUs 22-25 (the SMT siblings) offline for exclusive use of cores 6,7 and 8,9.
- Sets up a cgroup v2 CPU partition, shield, with cores 6-9 reserved exclusively.
- Displays system information relevant to the CPU partition.
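Each setting above maps onto a procfs, sysfs or cgroup v2 interface file. The sketch below is not the original script. The function names put, run and shield_on, the DRY_RUN switch and the PERF_PARANOID value of 1 are assumptions chosen for illustration. The undo path (restoring the previous governor, paranoid level and online CPUs) is left out. The sketch must run as root; with DRY_RUN=1 it only prints the writes it would perform.

```bash
#!/usr/bin/env bash
# Sketch only: CPU numbers follow the 7950X3D layout from the text.
BENCH_CPUS=${BENCH_CPUS:-6-9}                  # cores reserved for benchmarks
SMT_SIBLINGS=${SMT_SIBLINGS:-"22 23 24 25"}    # their SMT siblings, taken offline
PERF_PARANOID=${PERF_PARANOID:-1}              # assumed level, see perf_event_paranoid docs
CPU_SYSFS=/sys/devices/system/cpu
CG=/sys/fs/cgroup                              # cgroup v2 mount point

# Run a command, or only print it when DRY_RUN is set.
run() {
    if [[ -n ${DRY_RUN:-} ]]; then printf '%s\n' "$*"; else "$@"; fi
}

# Write a value into a kernel interface file, or only print the write.
put() {
    if [[ -n ${DRY_RUN:-} ]]; then printf '%s > %s\n' "$1" "$2"; else echo "$1" > "$2"; fi
}

shield_on() {
    local f cpu
    put 0 /proc/sys/kernel/randomize_va_space        # 0 = ASLR disabled
    for f in "$CPU_SYSFS"/cpu[0-9]*/cpufreq/scaling_governor; do
        [[ -e $f ]] && put performance "$f"           # no on-demand frequency scaling
    done
    f=$CPU_SYSFS/cpufreq/boost                        # present e.g. with acpi-cpufreq
    if [[ -e $f ]]; then put 0 "$f"; fi               # 0 = boost off
    put "$PERF_PARANOID" /proc/sys/kernel/perf_event_paranoid
    for cpu in $SMT_SIBLINGS; do                      # word splitting is intended
        put 0 "$CPU_SYSFS/cpu$cpu/online"             # take the sibling thread offline
    done
    put +cpuset "$CG/cgroup.subtree_control"          # let child cgroups use cpuset
    run mkdir -p "$CG/shield"
    put "$BENCH_CPUS" "$CG/shield/cpuset.cpus"
    # "root" makes the cgroup a partition that owns these CPUs exclusively;
    # reading the file back shows "root invalid" if the kernel rejected it.
    put root "$CG/shield/cpuset.cpus.partition"
}
```

Source the file into a root shell. Then DRY_RUN=1 shield_on previews the writes, and shield_on applies them. Creating the root partition can fail if sibling cgroups already claim the same CPUs in their cpuset.cpus.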
When called without an argument, the script undoes the above settings. For benchmarking, it is invoked with a process ID, normally the PID of a shell, and moves that process onto the isolated cores. Benchmarks started from within this shell inherit its cgroup and therefore the CPU partition. With the taskset utility, a benchmark can additionally be pinned to a particular core, e.g.:
bash> echo $$
67207
bash> ./benchmarking.sh 67207
[...core isolation...]
bash> taskset -c 9 perf stat -d -r 3 ./my-benchmark
[...stats...]
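Moving the shell takes a single write to the partition's cgroup.procs file, and you can check the result from /proc. The functions below sketch this. They assume the partition lives at /sys/fs/cgroup/shield. The helpers move_to_shield, v2_path and show_placement are hypothetical and not part of the original script.

```bash
#!/usr/bin/env bash
# Assumed partition path ("shield" as in the text); override via SHIELD.
SHIELD=${SHIELD:-/sys/fs/cgroup/shield}

# Move a whole process (all of its threads) into the partition; needs root.
move_to_shield() {
    echo "$1" > "$SHIELD/cgroup.procs"
}

# Filter for /proc/<pid>/cgroup: print the cgroup v2 path from the "0::" line.
v2_path() {
    sed -n 's/^0:://p'
}

# Show where a process runs: its cgroup, the partition's CPUs, its affinity.
show_placement() {
    local pid=$1
    echo "cgroup: $(v2_path < "/proc/$pid/cgroup")"
    echo "cpus:   $(<"$SHIELD/cpuset.cpus.effective")"
    taskset -cp "$pid"
}
```

For example, run move_to_shield 67207 as root and then show_placement 67207. The output should show the shield cgroup and an affinity list limited to the reserved cores, because the kernel restricts a task's affinity to its cpuset.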
This script is not a silver bullet for perfect benchmarking, but it can help minimize execution noise and get more stable results on contemporary CPUs. The number of cores and SMT siblings is easy to adapt with a bit of help from lscpu or similar tools. Feel free to let me know if you have any suggestions or improvements to this script.
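One way to adapt the core numbers is to let lscpu report which logical CPUs share a physical core. The helper below is a hypothetical sketch. It reads the output of lscpu --parse=CPU,CORE on standard input and prints the SMT siblings of the CPUs given as arguments, i.e. the logical CPUs to take offline. Since --parse lists only online CPUs by default, run it before any siblings are disabled.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the SMT siblings of the given logical CPUs.
# Input: "CPU,CORE" lines as produced by lscpu --parse=CPU,CORE.
smt_siblings_of() {
    awk -F, -v want="$*" '
        BEGIN { n = split(want, w, " "); for (i = 1; i <= n; i++) keep[w[i]] = 1 }
        /^#/  { next }                                   # skip lscpu header comments
        { core_of[$1] = $2; cpus[$2] = cpus[$2] " " $1 } # group logical CPUs by core
        END {
            for (c in keep) {
                m = split(cpus[core_of[c]], list, " ")
                for (j = 1; j <= m; j++)
                    if (!(list[j] in keep)) print list[j]  # a sibling, not a kept CPU
            }
        }' | sort -n | uniq
}
```

On the 7950X3D layout described above, lscpu --parse=CPU,CORE | smt_siblings_of 6 7 8 9 should print 22 to 25.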
For more information about benchmarking on Linux, please refer to the following resources: