
Flaky tests in TestNG.
Named, fixed, and quarantined.

Flaky TestNG suites are not random. They follow patterns: dependency cascades hiding failures, parallel methods racing on shared instances, DataProvider rows mutating shared state, listeners with global reach, retry analyzers that paper over bugs. Name them, fix them, quarantine what is left.
Your CI stays green.

By Rémy Duthu, Software Engineer, CI Insights · Published

Example PR comment from the Mergify bot detecting a flaky test and quarantining it automatically.

Why TestNG is uniquely flaky

TestNG is the framework that gave Java its first built-in parallel runner, dependency graph, and data provider. Suites are defined declaratively in testng.xml and the runner pulls together @Test-annotated methods across classes. The flexibility is the appeal: groups, dependencies, and parallel modes all compose. The flexibility is also the source of most flakes, because the same composition rules apply at runtime no matter how complex the suite gets.

Two facets do most of the damage. Dependency graphs mean a single broken test marks every dependent test as SKIP, hiding the real failure under a wave of skipped statuses. And parallel="methods" in testng.xml shares the test instance across threads by default, so a @BeforeMethod-set field becomes a race the moment two methods run concurrently.

The patterns are finite. We've seen the same eight on Mergify Test Insights across hundreds of TestNG suites: dependsOnGroups graph collapses on a single failure, parallel=methods racing on @BeforeMethod state, @DataProvider returning shared mutable state, @Listeners with global side effects, IRetryAnalyzer hiding real bugs, ITestContext attribute leakage, hard timeouts that flake on slow runners, and Surefire `rerunFailingTestsCount` hiding real bugs. Each has a clean fix once you can name it.

The 8 patterns behind most flaky suites

Pattern 1

dependsOnGroups graph collapses on a single failure

Symptom. A run reports 1 failure and 47 skipped tests, all of which were waiting on the failing test through a chain of `dependsOnGroups`.

Root cause. @Test(dependsOnGroups = "auth") tells TestNG to skip a test (and everything that depends on it) when an upstream group fails. The intent is to avoid noise from cascading errors, but it also hides regressions: any flake in an upstream test silently disables a chunk of the suite. The pipeline is "green except for one test" when in reality the suite ran 12% of its checks.

public class AuthTest {
    @Test(groups = "auth")
    public void login() {
        assertTrue(authClient.login("user", "pw")); // racy under load
    }
}

public class CheckoutTest {
    @Test(groups = "checkout", dependsOnGroups = "auth")
    public void canCheckout() { ... }

    @Test(groups = "checkout", dependsOnGroups = "auth")
    public void canApplyDiscount() { ... }
    // every checkout test silently skips when login flakes
}

Fix. Use dependsOnGroups only when downstream tests are genuinely meaningless without the upstream (a true precondition, not a convenience). For "tests that share setup", lift the setup into @BeforeClass so each test fails on its own merits when something goes wrong.

public class CheckoutTest {
    @BeforeClass
    public void authenticate() {
        assertTrue(authClient.login("user", "pw"), "auth precondition");
    }

    @Test
    public void canCheckout() { ... }

    @Test
    public void canApplyDiscount() { ... }
}

With Mergify. Test Insights treats SKIPPED tests caused by upstream failures as their own signal, distinct from explicit skips. The dashboard surfaces tests that are skipped on every run that has an upstream flake, so the dependency cascade is visible.

Pattern 2

parallel=methods racing on @BeforeMethod state

Symptom. A test class that ran green for years starts failing when `parallel="methods"` is enabled in `testng.xml`, with assertions seeing values from a sibling test.

Root cause. Under parallel="methods", TestNG runs every method on its own thread but reuses the same test instance across them. A field set in @BeforeMethod is shared between concurrent methods. @BeforeMethod fires before each method, but the threads interleave arbitrarily, so the field can hold a value another thread set milliseconds earlier.

public class CounterTest {
    private int counter; // shared across parallel methods on the same instance

    @BeforeMethod
    public void reset() { counter = 0; }

    @Test public void incrementsToOne() {
        counter++;
        assertEquals(counter, 1); // sometimes 2 when sibling raced ahead
    }

    @Test public void alsoIncrementsToOne() {
        counter++;
        assertEquals(counter, 1);
    }
}

Fix. Prefer per-method local variables over instance fields. When state must persist across methods, switch to a thread-confined data structure (ThreadLocal) or move parallelism up to parallel="classes", which gives each class its own instance per thread.

public class CounterTest {
    @Test public void incrementsToOne() {
        int counter = 0;
        counter++;
        assertEquals(counter, 1);
    }
}
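
When state really must outlive @BeforeMethod, confine it to the thread rather than the instance. A minimal sketch with ThreadLocal, reusing the CounterTest shape from above:

public class CounterTest {
    // one counter per thread; concurrent methods never see each other's value
    private final ThreadLocal<Integer> counter = ThreadLocal.withInitial(() -> 0);

    @BeforeMethod
    public void reset() { counter.set(0); }

    @Test public void incrementsToOne() {
        counter.set(counter.get() + 1);
        assertEquals((int) counter.get(), 1); // reads this thread's copy only
    }
}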

With Mergify. Test Insights tags failures that only appear with `parallel="methods"` and pass under sequential reruns as parallelism-sensitive. The dashboard surfaces the parallel-only signature so the shared-instance root cause is the obvious lead.

Pattern 3

@DataProvider returning shared mutable state

Symptom. A data-driven test passes in CI for some inputs and fails for others, and re-running with the failing input alone passes.

Root cause. @DataProvider returns a 2D array that TestNG iterates over to call the test once per row. If the rows hold mutable objects, every iteration mutates the same instance: row 0 changes a shared User, row 1 sees the post-row-0 state. Under parallel="methods", the rows even race on the same object across threads.

private static final User SHARED = new User("Rémy");

@DataProvider(name = "users")
public Object[][] users() {
    return new Object[][] { {SHARED}, {SHARED}, {SHARED} };
}

@Test(dataProvider = "users")
public void renamesUser(User u) {
    u.setName(u.getName() + "-renamed");
    assertTrue(u.getName().endsWith("-renamed"));
    // row 1 sees "Rémy-renamed-renamed", row 2 sees "...-renamed-renamed-renamed"
}

Fix. Build a fresh instance per row inside the provider. For expensive setup, build a factory and call it per row.

@DataProvider(name = "users")
public Object[][] users() {
    return new Object[][] {
        { new User("Rémy") },
        { new User("Rémy") },
        { new User("Rémy") },
    };
}
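
When row objects are expensive to build, keep the construction in a single factory method but still call it once per row. A short sketch, with newUser() as a hypothetical stand-in for the expensive setup:

// hypothetical factory; stands in for expensive construction or fixture wiring
private User newUser() {
    return new User("Rémy");
}

@DataProvider(name = "users")
public Object[][] users() {
    // the factory is shared, the instances are not: one fresh User per row
    return new Object[][] { { newUser() }, { newUser() }, { newUser() } };
}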

With Mergify. Test Insights groups failures by test method and parameter index. When iteration N of a parameterized method fails consistently and N-1 passed, the dashboard surfaces the iteration-order signature so the shared-state mistake is easy to find.

Pattern 4

@Listeners with global side effects

Symptom. A test passes alone and fails inside the suite, with assertions about shared state (a logger, a metrics counter) holding values the failing test never set, after a custom `@Listeners` was added to an unrelated test class.

Root cause. @Listeners is class-level metadata that registers an ITestListener for the entire suite, not just the annotated class. A listener that mutates static state (a logger, a metrics counter, a thread-local) keeps that state alive for every test in the run, including tests that never opted in. The first test to fail after the listener was added looks unrelated to it.

@Listeners(MetricsListener.class)
public class CheckoutTest { ... }

// MetricsListener.java
public class MetricsListener implements ITestListener {
    public void onTestStart(ITestResult r) {
        Metrics.global().increment("test_started");
    }
}

// elsewhere
public class HealthcheckTest {
    @Test public void metricsEmpty() {
        // expected: 0. actual: N (CheckoutTest's listener fired across the suite)
        assertEquals(Metrics.global().get("test_started"), 0);
    }
}

Fix. Scope listeners explicitly. Use testng.xml <listeners> for genuinely suite-wide listeners and document them. Avoid mutating static state from listeners; if you must collect cross-test data, use a sink the test can read deterministically.
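
One way to keep the listener while dropping the global state, sketched against the MetricsListener above: count on a suite attribute, which lives and dies with the run and which a test can read deterministically through its own context.

public class MetricsListener implements ITestListener {
    public void onTestStart(ITestResult r) {
        ISuite suite = r.getTestContext().getSuite();
        // per-run sink instead of a process-wide static
        synchronized (suite) {
            Integer started = (Integer) suite.getAttribute("test_started");
            suite.setAttribute("test_started", started == null ? 1 : started + 1);
        }
    }
}

A test that actually needs the count reads it from its own ITestContext via getSuite(); tests like HealthcheckTest that never opted in see nothing.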

With Mergify. Test Insights groups the downstream failures by the test that introduced the listener. When five seemingly unrelated tests fail only after a class with a new @Listeners is added, the dashboard surfaces the upstream culprit.

Pattern 5

IRetryAnalyzer hiding real bugs

Symptom. Your suite passes consistently. A user reports a bug that should have been caught by the test that ran three times yesterday before passing on attempt 3.

Root cause. IRetryAnalyzer reruns failing tests up to N times and reports the last result. A real race that loses on attempt 1 and wins on attempt 2 gets reported as PASSED. The bug is still there. The pipeline has decided not to look at it. The retry log line in the report is the only trace, and few teams alert on it.

public class FlakyRetry implements IRetryAnalyzer {
    private int count = 0;
    @Override public boolean retry(ITestResult result) {
        return count++ < 2; // retry at most twice: three attempts total
    }
}

public class CheckoutTest {
    @Test(retryAnalyzer = FlakyRetry.class)
    public void chargesCard() {
        // intermittent timing bug; passes 2 of 3 times
    }
}

Fix. Do not retry at the framework level. When a test is genuinely flaky, fix it. When the fix takes longer than a session, quarantine it instead. That keeps the signal visible without blocking the merge queue.

With Mergify. Test Insights reruns at the CI level with attempt-level result tracking. You see that a test passed on attempt 2 of 3, which is exactly the information IRetryAnalyzer's PASSED status discards. Quarantine kicks in once the pattern is clear.

Pattern 6

ITestContext attribute leakage

Symptom. A test that reads `context.getAttribute("sessionId")` passes alone and fails inside the suite with the wrong session, set by a sibling test.

Root cause. ITestContext attributes are scoped per <test> in testng.xml. A test that sets context.setAttribute("sessionId", id) stores that value for every other test in the same <test> block. Two tests that both rely on a "current session" attribute walk over each other every run.

@Test
public void logsIn(ITestContext context) {
    String sessionId = authClient.login("user").getSessionId();
    context.setAttribute("sessionId", sessionId);
}

@Test
public void buysAThing(ITestContext context) {
    String sessionId = (String) context.getAttribute("sessionId");
    // sometimes the wrong session if a sibling test logged in concurrently
    checkoutClient.buy("widget", sessionId);
}

Fix. Pass dependencies through method parameters or instance fields scoped to the test instance. Reserve ITestContext attributes for read-only metadata that listeners produce.

public class CheckoutTest {
    private String sessionId;

    @BeforeMethod
    public void login() {
        sessionId = authClient.login("user").getSessionId();
    }

    @Test
    public void buysAThing() {
        checkoutClient.buy("widget", sessionId);
    }
}

With Mergify. Test Insights catches the cross-test signature: a test only fails after a specific other test has run, with assertions about session, user, or token that the failing test never set. The dashboard surfaces the predecessor so the misused context is the obvious lead.

Pattern 7

Hard timeouts that flake on slow runners

Symptom. A test annotated `@Test(timeOut = 2000)` passes locally and fails on the slower CI runner with `Method ... didn't finish within the time-out 2000ms`.

Root cause. timeOut on a TestNG method is a wall-clock check. A test that takes 1900ms on a fast laptop runs over 2000ms on a throttled CI worker and fails for being too slow, not for being wrong. The threshold is brittle by design: it cannot tell scheduler jitter from a real hang.

@Test(timeOut = 2000)
public void connectsToRedis() {
    // 1.8s locally; 2.1s on the slowest CI runner
    redisClient.connect();
    assertEquals(redisClient.ping(), "PONG");
}

Fix. Either pick a wall-clock budget that absorbs CI variance (3x the local p95) or replace the timeout with an assertion that the operation made progress. For genuine hang detection, configure CI to kill the JVM at the suite level rather than per test.

@Test
public void connectsToRedis() {
    redisClient.connect();
    assertEquals(redisClient.ping(), "PONG");
}

// JVM-level kill: surefire <forkedProcessTimeoutInSeconds>120</...>

With Mergify. Test Insights links timeout failures to the CI runner type that produced them. When a test only times out on the slowest pool and never on the laptop pool, the dashboard surfaces the resource sensitivity so the wall-clock dependency is the obvious place to look.

Pattern 8

Surefire rerunFailingTestsCount hiding real bugs

Symptom. Your build is green. A user hits a bug that the failed-on-first-attempt test was supposed to catch.

Root cause. Surefire's rerunFailingTestsCount applies to TestNG just like JUnit: failing tests rerun up to N times and the build reports the last result. Combined with TestNG's own IRetryAnalyzer, you can have two layers of retry, each one hiding a different category of bug. The build is green; the production database is not.

<plugin>
  <artifactId>maven-surefire-plugin</artifactId>
  <configuration>
    <rerunFailingTestsCount>3</rerunFailingTestsCount>
    <suiteXmlFiles>
      <suiteXmlFile>testng.xml</suiteXmlFile>
    </suiteXmlFiles>
  </configuration>
</plugin>

Fix. Do not retry at the build level. When a test is genuinely flaky, fix it. When the fix takes longer than a session, quarantine it instead. That keeps the signal visible without blocking the merge queue.

With Mergify. Test Insights reruns at the CI level with attempt-level result tracking. You see that a test passed on attempt 2 of 3, which is exactly the information `rerunFailingTestsCount` and IRetryAnalyzer throw away. Quarantine kicks in once the pattern is clear.

Detection

Catch every TestNG flake in CI

TestNG emits JUnit-compatible XML reports out of the box (Surefire wraps them in its own format too). Point Mergify at the XML output of every CI run and Test Insights builds a confidence score for every test on your default branch. PR runs are compared against that baseline. Anything inconsistent gets flagged in a PR comment before the author merges.

mergify ci
# 1. TestNG + Surefire emit JUnit XML by default. Locate the file:
# Maven Surefire: target/surefire-reports/TEST-*.xml
# TestNG standalone: test-output/junitreports/TEST-*.xml

# 2. Upload the result (once, in CI)
curl -sSL https://get.mergify.com/ci | sh
mergify ci junit upload target/surefire-reports/TEST-*.xml

Prevention

Block flaky TestNG tests at PR time

On every PR, Mergify reruns the tests whose confidence is below threshold, without IRetryAnalyzer or Surefire's `rerunFailingTestsCount` touching your config. The PR gets a comment naming the unreliable tests, their confidence history, and whether the failure on this PR is new or historical noise. Authors fix the real bugs before merge instead of re-running CI until it passes.

Mergify Test Insights Prevention view showing caught flaky TestNG tests per PR

Quarantine

Quarantine without skipping

Once a TestNG test is confirmed flaky, Test Insights quarantines it. The test still runs in the suite, no `@Test(enabled = false)` rewrite required, but its result no longer blocks merges or marks the pipeline red. When the pass rate on main recovers, quarantine lifts automatically and the test goes back to being load-bearing.

Test Insights dashboard listing healthy tests alongside one quarantined flaky test.

Want to see which TestNG tests in your repo are already flaky?

Works with TestNG's built-in XML reports or Surefire's JUnit output. No extra plugins required. Setup takes under five minutes.

Book a discovery call

Frequently asked questions

Why are my TestNG tests flaky in CI but pass locally?
Your laptop and your CI runner differ in CPU count, parallel thread count, and the order TestNG happens to schedule independent methods. Tests that share an instance under `parallel="methods"`, mutate a `@DataProvider` row, or read an `ITestContext` attribute set by a sibling surface those issues only under CI's tighter resource budget. Reproduce locally with `-Dparallel=methods -DthreadCount=8` to expose the failure, then fix the underlying coupling before pushing.
How do I detect flaky TestNG tests?
TestNG alone cannot tell flaky from broken since each run gives one data point per test. You need to run the same commit multiple times and compare results. Mergify Test Insights does that on every PR and on the default branch, scores each test, and surfaces the tests whose pass rate drops below a confidence threshold.
Does IRetryAnalyzer or Surefire's `rerunFailingTestsCount` fix flaky tests?
No, it hides them. A test that fails on attempt 1 and passes on attempt 2 is still broken; you have only decided not to look at the failure. Use IRetryAnalyzer or Surefire's `rerunFailingTestsCount` as a temporary bandage for a test you are actively fixing, never as a permanent policy. For visibility without blocking the merge queue, quarantine instead of retry.
What causes flaky tests in TestNG?
Eight patterns cover most of what we see: dependsOnGroups graph collapses on a single failure, parallel=methods racing on @BeforeMethod state, @DataProvider returning shared mutable state, @Listeners with global side effects, IRetryAnalyzer hiding real bugs, ITestContext attribute leakage, hard timeouts that flake on slow runners, and Surefire `rerunFailingTestsCount` hiding real bugs. Each is covered above with a minimal reproducer.
How do I quarantine a flaky TestNG test without deleting it?
Mergify Test Insights quarantines the test automatically once its confidence score drops. The test still runs in the suite, but a failing result no longer blocks merges and its noise no longer drowns out real signal. When the test stabilizes on main, quarantine lifts automatically. No `@Test(enabled = false)`, no commented-out tests, no orphaned files.
What is the difference between TestNG @BeforeMethod and @BeforeClass under parallel mode?
Under `parallel="methods"`, TestNG reuses one instance of the test class across threads. `@BeforeMethod` fires before each method but mutates the same instance fields, so concurrent methods race. `@BeforeClass` fires once per class and is similarly shared. For per-method isolation, switch to `parallel="classes"` (one instance per thread per class) or use thread-local state for fields that need to survive setup.

Ship your TestNG suite green.

2k+ organizations use Mergify to merge 75k+ pull requests a month without breaking main.