Don't worry, it's ...

Probably Fine

Hi! I'm a product developer, currently keeping busy at Unruly. I have an interest in eXtreme Programming, Continuous Delivery, and cutting-edge Agile development techniques - I speak semi-regularly on these topics and co-organise XProLo, a monthly meetup for XP practitioners. I wrote a paper on Mob Programming in LNBIP 2015.

Test-Driving Performance

04 Feb 2016

It's more common to use TDD to drive functionality than it is to drive non-functional requirements such as performance, even though failing to meet a performance requirement can put your system in a worse state than failing to meet a functionality requirement.

OpenJDK provides a harness called JMH for running microbenchmarks - measurements of individual bits of code rather than entire systems - which is what I'll be using in my example.
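To see why a harness is worth having, consider what a hand-rolled measurement looks like. The sketch below is my own illustration (the names `NaiveThroughput` and `measureOpsPerSecond` aren't from JMH or this post) of the naive approach and its pitfalls:

```java
import java.util.function.Supplier;

public class NaiveThroughput {

    // Naive ops/second measurement: run the operation in a loop and divide by
    // elapsed wall-clock time. This ignores JIT warmup, dead-code elimination
    // and run-to-run variance - exactly the problems JMH exists to solve.
    public static double measureOpsPerSecond(Supplier<?> operation, int iterations) {
        Object sink = null; // hold the result so the call is harder to optimise away
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink = operation.get();
        }
        long elapsed = System.nanoTime() - start;
        if (sink == null) {
            System.out.println("operation returned null"); // use the sink so it isn't dead code
        }
        return iterations / (elapsed / 1_000_000_000.0);
    }

    public static void main(String[] args) {
        System.out.printf("~%.0f ops/sec%n", measureOpsPerSecond(() -> Math.sqrt(42.0), 1_000_000));
    }
}
```

The numbers this produces are unstable and optimistic-or-pessimistic depending on what the JIT happened to do - JMH's warmup iterations and Blackhole exist to control exactly these effects.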

The First Test

If your application has performance sensitivity around latency or throughput, then consider writing the first test to enforce performance characteristics that you need.

These tests are your safety nets. Just as unit/acceptance/integration tests catch unwanted changes in behaviour, these will flag up undesirable performance regressions.

An Example

JMH comes with lots of examples of benchmark behaviour, each set up and invoked through a main method.
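For a picture of that style, here's a minimal sketch of a main-method-invoked benchmark (the class and method names are mine, and Math.sqrt stands in for real work):

```java
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

public class MainMethodBenchmark {

    @Benchmark
    public double measure() {
        return Math.sqrt(42.0); // stand-in for the code you actually care about
    }

    // Invoked directly rather than through a test runner, as in JMH's own samples.
    public static void main(String[] args) throws RunnerException {
        Options options = new OptionsBuilder()
                .include(MainMethodBenchmark.class.getSimpleName())
                .forks(1)
                .build();
        new Runner(options).run();
    }
}
```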

We don't have to do it this way, though - we can build a benchmark into a test using the same API.

Let's say we have an ObjectUnderTest and it's crucial that it can handle calling .doTheThing() at least 300 times/second (we have a similar situation at work, where running an auction must complete within a spec-required time threshold).

import java.util.concurrent.TimeUnit;

import org.junit.Test;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.infra.Blackhole;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

import static org.hamcrest.MatcherAssert.assertThat;
import static org.hamcrest.Matchers.greaterThan;
import static org.hamcrest.Matchers.is;

@State(Scope.Benchmark)
public class BasicPerformanceTest {

    private ObjectUnderTest obj = new ObjectUnderTest();

    @Benchmark
    public void measureThroughput(Blackhole blackhole) {
        blackhole.consume(obj.doTheThing());
    }

    @Test
    public void shouldHandle300OperationsPerSecond() throws Exception {
        Options options = new OptionsBuilder()
                .warmupIterations(10)
                .measurementIterations(50)
                .mode(Mode.Throughput)
                .timeUnit(TimeUnit.SECONDS)
                .include(this.getClass().getName())
                .build();

        double opsPerSecond = new Runner(options).runSingle()
                .getPrimaryResult()
                .getScore();

        assertThat(opsPerSecond, is(greaterThan(300.0)));
    }
}

The above class has been wired up as both a JUnit test (using @Test) and a JMH benchmark (using @State and @Benchmark) - running the test method uses the JMH API to fork a fresh JVM and run the @Benchmark method there.

Having one benchmark which measures only throughput makes pulling the result out of the runner easy - .runSingle() exposes a single RunResult whose primary result is the ops/second measurement.

Combining the testing library with a purpose-built performance testing framework like JMH lets us leverage its features - warmup handling, forking explicit test JVMs, thread allocation, CPU burn, and many more - the JMH page has lots of examples.
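As a sketch of those knobs, the same options builder can pin down forks, threads, and warmup time - the values below are arbitrary illustrations, not recommendations:

```java
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;
import org.openjdk.jmh.runner.options.TimeValue;

public class TunedOptions {

    public static Options build() {
        return new OptionsBuilder()
                .include("BasicPerformanceTest")
                .forks(2)                          // run the benchmark in two fresh JVMs
                .threads(4)                        // hammer the benchmark from four threads
                .warmupIterations(10)
                .warmupTime(TimeValue.seconds(1))  // each warmup iteration lasts one second
                .measurementIterations(50)
                .mode(Mode.Throughput)
                .timeUnit(TimeUnit.SECONDS)
                .build();
    }
}
```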

Taking Care of Performance Tests

Just like any tests, these are neither write-once nor appropriate in all cases - we must apply the same level of care and judgement as we do when writing feature tests.

  • It might be okay that a test is failing now - are the assumptions under which the test was written still valid? Can you lift or remove requirements?
  • Experiment with tests that are consistently green - are the thresholds too loose? Can you tighten them to find the limits?

And of course, there are always reasons why these kinds of tests aren't appropriate at all.

  • The test subject might not be suitable - IO-focused performance tests could behave differently on memory storage vs SSDs vs spinning rust.
  • The environment might not be suitable - workstations/build servers aren't guaranteed to have the same setup or hardware as your production machines.
  • Reproducibility is key - a test that fails non-deterministically on identically-built machines isn't particularly useful.
  • A holistic approach might be more appropriate - take a wide view of the system as a whole when a small unit has many different ways it could fail between runs.

To conclude, why not give it a try? You may find that folding performance tests into the test-suite is a great way to ensure that you don't end up neglecting an equally important part of the application - even if those tests aren't part of the deploy-blocking run.

Thanks to Richard Warburton, Chris Sinjakli, and Benji Weber for thoughts and proof-reading.

