In particular, we’ve been hacking on the unit tests and tried to get make check invocations to run much faster. To paraphrase Michael C. Feathers from his very interesting book Working Effectively with Legacy Code on unit tests:
Unit tests should run fast – a test taking 1/10th of a second is a slow unit test.
Most tests we had executed during make check took much longer. Beast has some pretty sophisticated test features nowadays, e.g. it can render BSE files to WAV files offline (in a test harness), extract certain audio features from the WAV files and compare those against saved feature sets. In other places, we’re using tests that loop through all possible input/output values of a function in brute force manner and assert correctness over the full value range. On top of that, we have performance tests that may repeatedly call the same functions (often thousands or millions of times) in order to measure their performance and print out measurements.
These kinds of tests are nice to have for broad correctness testing, especially around release time. However, we did run into the problem of make check being less likely to be executed before commits, because running the tests would be too slow to bother with. That of course somewhat defeats the purpose of having a test harness. Another problem that we ran into was the intermixing of correctness/accuracy tests with performance benchmarks. These often sit in the same test program or even the same function and are hard to spot that way in the full output of a
To solve the outlined problems, we changed the Beast tests as follows:
* All makefiles support the (recursive) rules:
report (this is easily implemented by including a common makefile).
* Tests added to TESTS are run as part of check (automake standard).
* Tests added to SLOWTESTS are run as part of
* Tests added to PERFTESTS are run as part of
make report runs all of
perf and captures the output into a file
* We use special test initialization functions (e.g. sfi_init_test(argc,argv)) which do argument parsing to handle
* Performance measurements are always reported by the treport_maximized(perf_testname,amount,unit) function or the treport_minimized() variant thereof, depending on whether the measured quantity is desired to be maximized or minimized. These functions are defined in birnettests.h and print out quantities with a magic prefix that allows grepping for performance results.
make distcheck enforces a successful run of
Together, these changes have allowed us to easily tweak our tests to have faster test loops (if !test_slow) and to conditionalize lengthy performance loops (if test_perf). So make check is pleasingly fast now, while make slowcheck still runs all the brute force and lengthy tests we’ve come up with. Performance results are now readily available via:
    $ make report
    [...]
    $ grep '^#TBENCH=' report.out
    #TBENCH=mini: Direct-AutoLocker: +83.57 nSeconds
    #TBENCH=mini: Birnet-AutoLocker: +104.574 nSeconds
    #TBENCH=maxi: CPU Resampling FPU-Up08M: +260.4562325006 Streams
    #TBENCH=maxi: CPU Resampling FPU-Up16M: +184.19598452754 Streams
    #TBENCH=maxi: CPU Resampling SSE-Up08M: +399.04229848364 Streams
    #TBENCH=maxi: CPU Resampling SSE-Up16M: +338.5240352065 Streams
The results are tailored to be parsable by performance statistics scripts. So writing scripts to present performance report differences and to compare performance reports between releases is now on the TODO list. 😉