Why jest.retryTimes() is hiding bugs in your test suite
Adding retryTimes(3) feels like fixing flakes. It is not. It is throwing away the data you need to find the bug, then telling CI everything is fine.
Someone on your team has had a long week. CI is failing on a flaky test, the merge queue has been stuck for an hour, the on-call engineer is in a meeting, and the PR has to ship before the demo. They open the Jest setup file (retryTimes has to be called from a test file or a file loaded via setupFilesAfterEnv, not from jest.config.js itself) and add this:
jest.retryTimes(3);
CI goes green. The PR ships. Everyone gets back to work.
This is the moment your test suite started lying to you.
What retryTimes actually does
The Jest docs are honest about it: retryTimes(n) re-runs a failing test up to n times and reports the result of the last attempt. If the test passes on attempt 2 of 4, the suite is green. The first failure is discarded.
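You can watch the discard happen with a toy test. A minimal sketch, assuming the default jest-circus runner (which retryTimes requires) and a module-level counter that makes the first attempt fail on purpose:

// flaky.test.js — green under retryTimes(3) even though attempt 1 always fails
jest.retryTimes(3);

let attempts = 0;

test('only succeeds on the second attempt', () => {
  attempts += 1;
  // Attempt 1 fails here; attempt 2 passes; Jest reports a single green test.
  expect(attempts).toBeGreaterThan(1);
});

Run it and the reporter shows one passing test. Unless you opt into the logErrorsBeforeRetry option that recent Jest versions accept, the failure from attempt 1 is not even printed.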
The implication: every test in your suite is now allowed to fail, as long as it fails fewer than n + 1 times in a row. A bug that has a 25% failure rate (one in four attempts) statistically passes 99.6% of the time with retryTimes(3). You will see it in production. You will not see it in CI.
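The arithmetic generalizes. Assuming each attempt fails independently, a bug with a given failure rate only blocks the merge when it fails all n + 1 attempts:

// Chance a real bug still produces a green run.
// failureRate: probability one attempt fails; retries: the retryTimes value.
const greenRate = (failureRate, retries) => 1 - failureRate ** (retries + 1);

greenRate(0.25, 3); // ≈ 0.996 — the one-in-four race above
greenRate(0.05, 3); // ≈ 0.999994 — a rarer race is effectively invisible in CI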
This is not a hypothetical. We watch it happen in customer suites. The ones that adopted retryTimes as a permanent policy two years ago have CI green rates above 99% and a long tail of “weird user reports” that nobody can reproduce. The ones that did not adopt it have noisier CI and shorter incident lists.
The four categories of flake retryTimes confuses
Calling something “flaky” treats four very different problems as one:
- A real bug with a low failure rate. A race condition, an off-by-one in a date library, a connection pool that occasionally returns a closed connection. The failure is real every time. The retry happens to land in the 75% of attempts where the race resolves favorably.
- Test pollution. A previous test left timers, mocks, or module state behind. The first run of your test sees the polluted world; the retry runs after teardown finished and sees a clean one. The test you wrote is fine. The test that ran before it is the bug.
- Environment instability. A network call to an external service times out. A CI runner with less headroom than your laptop loses a tight setTimeout race. Your test is correct, the environment is the problem.
- A genuinely non-deterministic test. You wrote Math.random() and forgot. Or you used Date.now() for keys. Or you depend on Map iteration order with non-deterministic input. The test is broken even though the code is fine.
Only one of these (3) is the case where retrying is harmless. Two of the four (1 and 2) are bugs you want to find. One (4) is a broken test that needs rewriting. retryTimes treats them all the same: hide the failure, ship the PR.
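Category 2 is the one retries hide most effectively, because the retried attempt often runs against cleaner state than the first. A minimal sketch of that kind of pollution, with invented test names and a Date.now spy that is never restored:

test('renders "just now" for fresh items', () => {
  jest.spyOn(Date, 'now').mockReturnValue(0); // never restored: this is the real bug
  expect(Date.now()).toBe(0);
});

test('stamps records with the real clock', () => {
  // Fails whenever it runs after the test above in the same file,
  // because Date.now is still pinned to 0. The failure belongs to the
  // previous test, not this one.
  expect(Date.now()).toBeGreaterThan(1_000_000_000);
});

Whether a retry rescues the second test depends on what your afterEach hooks and restoreMocks settings undo between attempts; either way, the failure is real and points at the first test.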
What “flaky” should mean instead
A flaky test is a test you cannot trust as a binary signal until you have more data. Not “a test that occasionally fails”. The difference matters because it changes the response.
When the response is “treat each failure as a real bug, and only quarantine after evidence accumulates”, you find bugs early. When the response is “set a retry budget”, you ship them.
The infrastructure version of this is the difference between:
- Quarantine: the test still runs, the result is recorded and surfaced, but a failure does not block the merge. Engineers see the noise and choose when to invest in fixing it.
- Retry: the test still runs, the failure is discarded, and the merge proceeds as if everything is fine.
Quarantine keeps the signal. Retry destroys it.
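If you do not have tooling for this, you can approximate quarantine by hand. A rough sketch, assuming a *.quarantine.test.js naming convention invented here (it is not a Jest feature):

// jest.config.js — split the suite into a blocking project and a quarantine project
module.exports = {
  projects: [
    {
      displayName: 'main',
      testMatch: ['<rootDir>/**/*.test.js'],
      testPathIgnorePatterns: ['\\.quarantine\\.test\\.js$'],
    },
    {
      displayName: 'quarantine',
      testMatch: ['<rootDir>/**/*.quarantine.test.js'],
    },
  ],
};

CI then runs jest --selectProjects main as the required check, and jest --selectProjects quarantine --passWithNoTests as a non-blocking step whose results are still uploaded. The quarantined failures stay visible instead of being discarded.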
The case people make for retryTimes
The honest argument for retryTimes is that engineering time is finite, and a known-flaky test that mostly passes is not worth blocking the queue over. That is a real trade-off.
The dishonest version is “we will fix it later”. retryTimes adds zero pressure to fix anything. The test is now “passing”, which means the ticket gets closed and the bug ages indefinitely. We have customers who turned on retryTimes in 2024 and have not removed a single retry rule since.
If you want the trade-off without losing the signal, use a flaky-test detection layer that:
- Runs the test as written, no retries at the Jest level.
- Records pass/fail per attempt across many runs.
- Quarantines (does not retry) once a confidence threshold is breached.
- Surfaces the quarantine in the dashboard, with a link to the PR or the commit that introduced the noise.
- Lifts quarantine automatically when the test stabilizes.
Mergify Test Insights does this. So do a couple of other tools in the space (we have a comparison page for one of them). The shape is what matters: detection plus quarantine plus visibility, not retry.
When retryTimes is acceptable
There are exactly two cases where adding retryTimes to your config is the right call:
- Active triage of a specific known-broken test. You know the test is broken. You have a fix in flight. You add retryTimes with an inline comment naming the ticket and a removal date. When the date passes and the retry is still there, your linter complains.
- An external-service flake you cannot fix. A third-party API returns 502 once a day. You retry the network call (or better, you mock it) rather than letting the test fail. This is not really a retryTimes use case; you should be retrying inside the test, not retrying the test.
Anything beyond those two is hiding a bug. Probably more than one.
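If you do take route 1, make it self-policing. A sketch of what the triage version looks like, scoped to the one broken file rather than the global setup (the filename, ticket ID, and date are placeholders):

// checkout-totals.test.js
// TODO(FLAKY-123): remove by 2025-07-01. Race in the websocket mock, fix in review.
// A lint rule or a grep in CI should fail the build once the date has passed.
jest.retryTimes(2, { logErrorsBeforeRetry: true });

Scoping the retry to one file keeps the blast radius to the test you are actively triaging, and logErrorsBeforeRetry keeps the discarded failures in the log while you work on the fix.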
What to do this afternoon
If you have retryTimes somewhere in your setup files and you are not sure why, here is a five-minute exercise:
- Remove it.
- Run the suite three times (or however many runs retryTimes was masking).
- Note every test that fails in any of the three runs.
- For each one, decide: is this a real bug, test pollution, environment, or a broken test?
Most of what comes back as “flaky” is one of the first two. Those are the ones you want to fix. The third is rare in well-written suites; the fourth is common but obvious once you look at the test.
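Steps 2 and 3 are easy to script. A rough sketch, assuming npx jest --json runs in your repo; the testResults and assertionResults field names are from Jest's JSON reporter:

// collect-flakes.mjs — run the suite a few times, list every test that failed in any run
import { execSync } from 'node:child_process';

const RUNS = 3;
const failures = new Map(); // full test name -> number of runs it failed in

for (let run = 1; run <= RUNS; run += 1) {
  let raw;
  try {
    raw = execSync('npx jest --json --silent', { encoding: 'utf8', maxBuffer: 64 * 1024 * 1024 });
  } catch (err) {
    // Jest exits non-zero when any test fails; the JSON report is still on stdout.
    raw = err.stdout;
  }
  for (const file of JSON.parse(raw).testResults) {
    for (const result of file.assertionResults) {
      if (result.status === 'failed') {
        failures.set(result.fullName, (failures.get(result.fullName) ?? 0) + 1);
      }
    }
  }
}

for (const [name, count] of failures) {
  console.log(`failed ${count}/${RUNS}: ${name}`);
}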
The retry was hiding the answer to a question you should have been asking.
More patterns where Jest lies to you
The retry-hides-bugs pattern is one of eight Jest flake patterns we see most often on Mergify Test Insights. The others are mostly about state crossing test boundaries: fake timers leaking into the next test, snapshots drifting on shared state, module mocks hoisting above imports, test.concurrent racing on counters.
Same theme: something wasn’t reset. The fix is usually small. The hard part is being willing to look at the failure instead of retrying past it.
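The usual shape of that fix is a blanket reset in a shared setup file (loaded through setupFilesAfterEnv), so no test can leave timers or spies behind for the next one:

// jest.setup.js
afterEach(() => {
  jest.useRealTimers();   // undo fake timers a test forgot to switch back
  jest.restoreAllMocks(); // undo jest.spyOn overrides, like the Date.now example earlier
});

Setting restoreMocks: true in the Jest config covers the second line automatically; the timer reset still has to be explicit.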