Flaky Behavior in Concurrent Tests Caused by Static Fields

Updated on 25 Apr 2024

What’s wrong with this Java code?

public class App {

    private static Repo repo;

    App(Repo repo) {
        this.repo = repo;
    }

    String greet(int id) {
        return repo.getGreeting(id);
    }
}

class Repo {

    String getGreeting(int id) {
        throw new UnsupportedOperationException("not implemented yet");
    }
}

Starting with a unit test

Let’s try to test it. Below is a basic unit test that mocks an unimplemented Repo dependency and executes two tests. These tests expect different greeting messages based on the provided greeting id.

import org.junit.jupiter.api.Test;

import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.when;

class AppTest {

    private final Repo repo = mock(Repo.class);

    private final App app = new App(repo);

    @Test
    void greet_john() {
        String greeting = "Hello, John!";
        when(repo.getGreeting(1)).thenReturn(greeting);
        assertEquals(greeting, app.greet(1));
    }

    @Test
    void greet_mike() {
        String greeting = "Hi, Mike!";
        when(repo.getGreeting(2)).thenReturn(greeting);
        assertEquals(greeting, app.greet(2));
    }
}

Single threaded test execution

Running it with JUnit 5 yields the expected outcome: both tests pass. By default, JUnit creates a new instance of the test class before executing each test method, and the methods run sequentially, as illustrated in Figure 1.

Before executing greet_john, a new instance of AppTest is created. A mock Repo object is then created and assigned to the static field App.repo. During the test, a stub is added to the mock, so that app.greet(1) returns the stubbed value. This value is compared with the expected greeting message, and the test passes.

JUnit then proceeds to greet_mike and follows the same process. It creates a new instance of AppTest, resets the App.repo field to a new mock, adds a stub, executes the test, and verifies the obtained greeting message. This test is green as well.
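The sequential lifecycle can be replayed without JUnit or Mockito. The following is a minimal sketch under stated assumptions: FakeRepo is a hypothetical hand-written stand-in for the mock, StaticApp mirrors the App class with its static field, and runTest imitates what happens for one test method.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the Mockito mock: unstubbed ids return null.
class FakeRepo {
    private final Map<Integer, String> stubs = new HashMap<>();

    void stub(int id, String greeting) {
        stubs.put(id, greeting);
    }

    String getGreeting(int id) {
        return stubs.get(id);
    }
}

// Same shape as App: the dependency lives in a static field.
class StaticApp {
    private static FakeRepo repo;

    StaticApp(FakeRepo repo) {
        StaticApp.repo = repo;
    }

    String greet(int id) {
        return repo.getGreeting(id);
    }
}

class SequentialLifecycleSketch {

    // Imitates one test method executed with a fresh test instance.
    static String runTest(int id, String greeting) {
        FakeRepo repo = new FakeRepo();       // field initializer of the test class
        StaticApp app = new StaticApp(repo);  // overwrites the static field
        repo.stub(id, greeting);              // plays the role of when(...).thenReturn(...)
        return app.greet(id);                 // the value assertEquals would check
    }

    public static void main(String[] args) {
        System.out.println(runTest(1, "Hello, John!"));
        System.out.println(runTest(2, "Hi, Mike!"));
    }
}
```

Because each call finishes before the next one starts, the static field always points at the repo that was just stubbed, and both calls return their own greeting.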

Multi-threaded test execution

Real projects have far more tests, and running them concurrently can shorten the testing phase. Let’s execute the tests above concurrently now. To do so, we set the following properties and provide them to JUnit.

# junit-platform.properties

# Enable concurrent test execution
junit.jupiter.execution.parallel.enabled=true

# Run all test methods within a class concurrently
junit.jupiter.execution.parallel.mode.default=concurrent

The test methods within AppTest are now executed concurrently. If we run the tests multiple times, we may see one of them succeed while the other fails. The outcome can differ from one run to another, but sooner or later you will see one of the following failures.

expected: <Hello, John!> but was: <null>
Expected :Hello, John!
Actual   :null

expected: <Hi, Mike!> but was: <null>
Expected :Hi, Mike!
Actual   :null

It appears that we have entered the realm of flaky tests. To understand why this occurs, let’s examine the sequence of execution and the state of the App.repo field when each of the tests is executed concurrently, as depicted in Figure 2.

JUnit detects the presence of two tests. Initially, it creates an instance of AppTest for greet_john. Consequently, the App.repo static field points to the mock established within the greet_john instance. However, JUnit does not execute the test at this point; instead, it proceeds to create another instance of AppTest for greet_mike. This action overwrites the reference in the App.repo with the one originating from the greet_mike instance. As a result, any modifications made to the greet_john repo mock will not be reflected in the App.repo static field.

With the preparations complete, JUnit executes both tests concurrently. In greet_john, the stub is added to that instance's own mock, but App.repo no longer points to this mock. The call therefore reaches the greet_mike mock, which has no stub for id 1, so the returned greeting is null and the test fails. In greet_mike, the stub is added to the very mock that App.repo references, so the expected greeting is returned and the test passes.
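The interleaving described above can be replayed deterministically in a single thread. This is a minimal sketch, not JUnit's actual scheduling: FakeRepo is a hypothetical hand-written substitute for the mock, StaticApp mirrors App, and the order of statements in replay is one assumed ordering in which both test instances are created before either test body runs.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the Mockito mock: unstubbed ids return null.
class FakeRepo {
    private final Map<Integer, String> stubs = new HashMap<>();

    void stub(int id, String greeting) {
        stubs.put(id, greeting);
    }

    String getGreeting(int id) {
        return stubs.get(id);
    }
}

// Same shape as App: the dependency lives in a static field.
class StaticApp {
    private static FakeRepo repo;

    StaticApp(FakeRepo repo) {
        StaticApp.repo = repo;
    }

    String greet(int id) {
        return repo.getGreeting(id);
    }
}

class InterleavedLifecycleSketch {

    // Returns { result seen by greet_john, result seen by greet_mike }.
    static String[] replay() {
        // Step 1: both test instances are created before any test body runs.
        FakeRepo johnRepo = new FakeRepo();
        StaticApp johnApp = new StaticApp(johnRepo);
        FakeRepo mikeRepo = new FakeRepo();
        StaticApp mikeApp = new StaticApp(mikeRepo); // static field now points to mikeRepo

        // Step 2: each test stubs its own repo.
        johnRepo.stub(1, "Hello, John!");
        mikeRepo.stub(2, "Hi, Mike!");

        // Step 3: both apps read through the same static field.
        return new String[] { johnApp.greet(1), mikeApp.greet(2) };
    }

    public static void main(String[] args) {
        String[] results = replay();
        System.out.println("greet_john saw: " + results[0]);
        System.out.println("greet_mike saw: " + results[1]);
    }
}
```

Swapping the two construction blocks in step 1 flips which side receives null, which corresponds to the two failure messages shown earlier.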

Do you see the issue? Because repo is a static field, it is shared by all instances of App. Whichever AppTest instance assigns its mock to that field last wins, and its mock is then used by every test method on every thread.

How to fix flaky tests

The evident solution would involve refactoring the App code to eliminate the usage of a static field, as shown below.

public class App {

    private final Repo repo;

    App(Repo repo) {
        this.repo = repo;
    }

    String greet(int id) {
        return repo.getGreeting(id);
    }
}
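To see why an instance field removes the interference, here is a minimal sketch that replays the problematic ordering (both objects created first, stubs added afterwards) against an instance field. FakeRepo and InstanceApp are hypothetical names used only for this illustration; InstanceApp mirrors the refactored App.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the Mockito mock: unstubbed ids return null.
class FakeRepo {
    private final Map<Integer, String> stubs = new HashMap<>();

    void stub(int id, String greeting) {
        stubs.put(id, greeting);
    }

    String getGreeting(int id) {
        return stubs.get(id);
    }
}

// Same shape as the refactored App: each instance keeps its own repo.
class InstanceApp {
    private final FakeRepo repo;

    InstanceApp(FakeRepo repo) {
        this.repo = repo;
    }

    String greet(int id) {
        return repo.getGreeting(id);
    }
}

class InstanceFieldSketch {

    // Returns { result seen by greet_john, result seen by greet_mike }.
    static String[] replay() {
        // Both "test instances" are created up front, as in the concurrent run.
        FakeRepo johnRepo = new FakeRepo();
        InstanceApp johnApp = new InstanceApp(johnRepo);
        FakeRepo mikeRepo = new FakeRepo();
        InstanceApp mikeApp = new InstanceApp(mikeRepo); // does not touch johnApp

        johnRepo.stub(1, "Hello, John!");
        mikeRepo.stub(2, "Hi, Mike!");

        return new String[] { johnApp.greet(1), mikeApp.greet(2) };
    }

    public static void main(String[] args) {
        String[] results = replay();
        System.out.println(results[0]);
        System.out.println(results[1]);
    }
}
```

Each app now reads from the repo it was constructed with, so the creation order of the two objects no longer matters.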

However, this refactoring can be hard to achieve, particularly in large codebases with legacy code and complex logic. If the main code cannot be modified directly, it is still possible to adjust the test itself, provided that the benefit outweighs the effort and the added maintenance cost.

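import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.Test;
import org.junit.jupiter.api.TestInstance;
import org.junit.jupiter.api.parallel.Execution;
import org.junit.jupiter.api.parallel.ExecutionMode;

import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.when;

@TestInstance(TestInstance.Lifecycle.PER_CLASS)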
@Execution(ExecutionMode.CONCURRENT)
class AppTest {

    private final Repo repo = mock(Repo.class);

    private final App app = new App(repo);

    @BeforeEach
    void before() {
        when(repo.getGreeting(1)).thenReturn("Hello, John!");
        when(repo.getGreeting(2)).thenReturn("Hi, Mike!");
    }

    @Test
    void greet_john() {
        assertEquals("Hello, John!", app.greet(1));
    }

    @Test
    void greet_mike() {
        assertEquals("Hi, Mike!", app.greet(2));
    }
}

TestInstance.Lifecycle.PER_CLASS instructs JUnit to create a single test instance per test class instead of one per test method. In our scenario, this means that one AppTest instance is shared by both tests, so the App.repo field is assigned only once and never overwritten.

The multi-threaded execution mode should be explicitly configured using ExecutionMode.CONCURRENT when utilizing @TestInstance(PER_CLASS). Failing to do so will result in the sequential execution of test methods within the same thread.

Sharing one mock between threads has a cost, though. Mockito is not thread-safe when a shared mock is stubbed or verified from several threads, as mentioned in the Mockito FAQ:

For healthy scenarios Mockito plays nicely with threads. For instance, you can run tests in parallel to speed up the build. Also, you can let multiple threads call methods on a shared mock to test in concurrent conditions. Check out a timeout() feature for testing concurrency.

However Mockito is only thread-safe in healthy tests, that is tests without multiple threads stubbing/verifying a shared mock. Stubbing or verification of a shared mock from different threads is NOT the proper way of testing because it will always lead to intermittent behavior. In general, mutable state + assertions in multi-threaded environment lead to random results. If you do stub/verify a shared mock across threads you will face occasional exceptions like: WrongTypeOfReturnValue, etc.

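And indeed, if the stubs stay inside the test methods, one or both tests might randomly fail, because two threads would then stub the same shared mock. To address this problem, we move all stubs into the before() method, which is executed prior to each test method, so the test bodies only read from the mock. Note that @BeforeEach still runs once per test, on that test's thread, so the stubbing itself can overlap with another test; with the PER_CLASS lifecycle, a non-static @BeforeAll method that stubs the mock once before any test starts is a stricter alternative. Either way, this approach brings its own set of challenges: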

Ambiguity arises when certain stubs are intended for exclusive use in a single test method but inadvertently influence other tests.
Conflicts emerge when multiple stubs are required for the same input.
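The underlying idea, configure the shared dependency once and only read from it while tests run, can be sketched without Mockito. Everything below is an assumption made for illustration: ReadOnlyRepo is a hypothetical hand-written fake, LegacyApp mirrors the App class that still uses a static field, and a two-thread executor stands in for JUnit's concurrent execution.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical fake whose "stubs" are fixed at construction time.
class ReadOnlyRepo {
    private final Map<Integer, String> greetings;

    ReadOnlyRepo(Map<Integer, String> greetings) {
        this.greetings = Collections.unmodifiableMap(new HashMap<>(greetings));
    }

    String getGreeting(int id) {
        return greetings.get(id);
    }
}

// Same shape as the unrefactored App: the repo is still a static field.
class LegacyApp {
    private static ReadOnlyRepo repo;

    LegacyApp(ReadOnlyRepo repo) {
        LegacyApp.repo = repo;
    }

    String greet(int id) {
        return repo.getGreeting(id);
    }
}

class SharedStubSketch {

    static List<String> runConcurrently() {
        // All stubbing happens once, before any worker thread starts.
        Map<Integer, String> stubs = new HashMap<>();
        stubs.put(1, "Hello, John!");
        stubs.put(2, "Hi, Mike!");
        LegacyApp app = new LegacyApp(new ReadOnlyRepo(stubs));

        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            // The two "tests" only read from the shared repo.
            List<Callable<String>> tests = new ArrayList<>();
            tests.add(() -> app.greet(1));
            tests.add(() -> app.greet(2));

            List<String> results = new ArrayList<>();
            for (Future<String> future : pool.invokeAll(tests)) {
                results.add(future.get()); // futures come back in task order
            }
            return results;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(runConcurrently());
    }
}
```

The sketch also makes the listed drawbacks visible: every greeting any test needs must be known up front, and a single id cannot map to two different greetings for two different tests.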

Summary

Give particular consideration to static fields within the classes being tested. Should you encounter flaky behavior during concurrent test execution, these fields could serve as a solid starting point for investigation.

It is feasible to fix unreliable tests without refactoring the main application logic. However, this process can be intricate and error-prone, albeit potentially acceptable for short-term resolutions. For a more sustainable solution, consider moving towards non-static fields where applicable.