Test Quarantine
Quarantine is what you reach for when a test is flaky but you cannot prove the underlying code is wrong. The test keeps running and reporting; it just stops blocking merges while you figure out what is going on. Done right, it is the cheapest unblock you can ship.
In one paragraph
Quarantine removes a test from the required-check set without removing it from the suite. The test runs and the result is recorded, but a failure no longer blocks a merge. The point is to decouple the unblock from the diagnostic. A team that insists on investigating before it quarantines pays the cost twice: once in the flake and once in the PRs that pile up behind it.
What quarantine actually means
Three things can happen to a flaky test:
- Skip: the test stops running. No signal, no failure history, no way to see whether the underlying flake is resolving. This is the worst option, and yet it is what most teams reach for under deadline pressure.
- Delete: the assertion is gone. Faster CI, but you have lost the coverage. Only acceptable when the test was wrong, not just flaky.
- Quarantine: the test still runs and the result is still recorded, but the test no longer participates in the required-check set, so a failure does not block the merge. Coverage and signal both stay intact while other engineers are unblocked.
Quarantine is the only one of the three that preserves the information needed to decide what to do next. A test that goes from 5% flake to 50% flake while quarantined is data. A skipped test gives you nothing.
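As a rough illustration, the gate logic behind quarantine fits in a few lines. This is a minimal sketch, not any CI system's real API: `Result`, `merge_gate`, and the test names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    """One test outcome from a CI run (hypothetical shape)."""
    name: str
    passed: bool


def merge_gate(results, quarantined):
    """Return (blocked, recorded_failures).

    Every failure is recorded, but only failures of tests that are
    NOT quarantined block the merge.
    """
    failures = [r.name for r in results if not r.passed]
    blocking = [name for name in failures if name not in quarantined]
    return bool(blocking), failures


results = [
    Result("test_checkout", passed=True),
    Result("test_payment_retry", passed=False),  # the flaky one
]

# Quarantined: the failure is still recorded, but the merge goes through.
blocked, failures = merge_gate(results, quarantined={"test_payment_retry"})
```

A skip would drop `test_payment_retry` from `results` entirely; quarantine keeps it in the recorded failures and only changes the gate decision.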
When to quarantine
The trigger is failure rate, not theory. A test that fails non-deterministically more than around 2% of the time and that has blocked someone's PR in the past week is a quarantine candidate. The exact threshold depends on the suite, but the principle is fixed: if the test is hurting other engineers more than it is catching real bugs, quarantine it.
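The rule above can be written as a small check. This is a sketch under stated assumptions: the run history covers only runs where the code under test did not change, the 2% default comes from the rough threshold above, and the function names are made up for illustration.

```python
def failure_rate(outcomes):
    """Fraction of failed runs; outcomes is a list of booleans (True = pass)."""
    if not outcomes:
        return 0.0
    return outcomes.count(False) / len(outcomes)


def is_quarantine_candidate(outcomes, blocked_pr_recently, threshold=0.02):
    """A candidate fails more often than the threshold AND has blocked a PR
    recently (for example, in the past week)."""
    return blocked_pr_recently and failure_rate(outcomes) > threshold


# 5 failures in 100 runs: a 5% failure rate.
history = [True] * 95 + [False] * 5
candidate = is_quarantine_candidate(history, blocked_pr_recently=True)
```

Tune `threshold` per suite; the second condition keeps a noisy but harmless test from being quarantined for no benefit.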
The harder call is what not to quarantine. A test that fails because production behavior actually changed is not flaky; it is correct. The same test class can flip between the two over time, which is why detection (rather than reaction) is the part that matters. See the flaky tests guide for the patterns that distinguish noise from real signal, framework by framework.
The graveyard anti-pattern
Quarantine has one failure mode worth naming: it becomes a graveyard. Tests get quarantined, the ticket sits in the backlog, six months pass, the test is still there. The team forgets why. The coverage erodes silently.
The fix is process, not tooling. Every quarantined test gets:
- A ticket, with the failure pattern attached.
- An owner (the original test author by default).
- A maximum stay, usually two to four weeks.
Past the stay, the test is either fixed or formally retired with written reasoning. A dashboard of "tests quarantined for more than N days" surfaces the rot before it becomes the new normal.
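The data behind that dashboard can be very small. The sketch below assumes a hand-maintained list of quarantine records; the field names, ticket IDs, owners, and dates are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class QuarantineEntry:
    test: str
    ticket: str   # ticket with the failure pattern attached
    owner: str    # original test author by default
    since: date   # day the test entered quarantine
    max_stay: timedelta = timedelta(weeks=4)

    def overdue(self, today):
        """True once the test has stayed longer than its maximum stay."""
        return today - self.since > self.max_stay


def overdue_entries(entries, today):
    """Entries past their maximum stay, oldest first."""
    return sorted((e for e in entries if e.overdue(today)), key=lambda e: e.since)


entries = [
    QuarantineEntry("test_a", "FLAKY-101", "alice", date(2024, 1, 1)),
    QuarantineEntry("test_b", "FLAKY-102", "bob", date(2024, 2, 20),
                    max_stay=timedelta(weeks=2)),
]
stale = overdue_entries(entries, today=date(2024, 3, 1))
```

Everything in `stale` is due for a decision: fix the test or retire it with written reasoning.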
FAQ
What is test quarantine?
Test quarantine is the practice of marking a flaky test so it still runs and still reports its result, but is removed from the set of required checks that gate a merge. The test is not deleted, not skipped, and not silently disabled. It simply can no longer block other engineers' PRs while its flake is being investigated.
Quarantine vs skip vs delete: what is the difference?
Skipping marks a test as not-run, so it produces no signal and no failure history. Deleting removes the assertion entirely. Quarantine keeps the test running and reporting, only changing whether its result blocks the merge. Quarantine is the right tool when you suspect the test is flaky but cannot yet rule out a real bug in the underlying code.
When should I quarantine a test?
When a test fails non-deterministically more often than a small threshold (often around 2% of runs) and blocks other people's PRs. Quarantine immediately, then investigate. The point of quarantine is to decouple the diagnostic from the unblock. A team that insists on investigating before it quarantines ends up paying the cost twice: once in the flake, once in the blocked PRs that pile up behind it.
What is the risk of quarantining a test?
Quarantine can mask a real bug. A test that flakes today might be detecting a genuine race condition that breaks in production tomorrow. The mitigation is a quarantine SLA: every quarantined test gets a ticket, an owner, and a maximum stay (usually two to four weeks). Past that, the test is either fixed or formally retired with reasoning.
How does Mergify Test Insights handle quarantine?
Test Insights detects flaky tests automatically (same SHA, different result), surfaces the failure pattern, and exposes a quarantine action that removes the test from the required-check set without touching the test code. The quarantined test keeps running and reporting, so the team can see whether the flake resolves or whether a fix is needed.
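The "same SHA, different result" rule is simple to illustrate. The sketch below is not Mergify's implementation; it assumes run records are available as `(test, sha, passed)` tuples, and `find_flaky` is a hypothetical name.

```python
from collections import defaultdict


def find_flaky(runs):
    """Return the names of tests that both passed and failed on one commit.

    runs: iterable of (test_name, commit_sha, passed) tuples.
    """
    outcomes = defaultdict(set)
    for test, sha, passed in runs:
        outcomes[(test, sha)].add(passed)
    # Two distinct outcomes for the same (test, sha) pair means the code
    # did not change but the result did.
    return sorted({test for (test, _sha), seen in outcomes.items() if len(seen) == 2})


runs = [
    ("test_login", "abc123", True),
    ("test_login", "abc123", False),   # same commit, different result
    ("test_search", "abc123", False),
    ("test_search", "def456", True),   # different commit: not evidence of flake
]
flaky = find_flaky(runs)
```

`test_search` is not flagged: its result changed together with the code, which is what a correct test does.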
Can I quarantine without a tool?
Yes, but it gets expensive. The manual version is a tag (often `@flaky` or `@quarantine`) plus a CI filter that splits required from non-required runs. The hard part is detection: spotting which test is flaky in the first place, then keeping the quarantine list current as the suite changes. Tools earn their keep on detection more than on the quarantine action itself.
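A minimal sketch of the manual version, in plain Python so it is independent of any test framework. The `quarantine` decorator and `split_required` helper are hypothetical. With pytest, the usual equivalent is a custom marker and `-m` selection to split the two runs.

```python
def quarantine(reason):
    """Hypothetical decorator that tags a test function as quarantined."""
    def mark(func):
        func.quarantine_reason = reason
        return func
    return mark


def split_required(tests):
    """Split tests into (required, non_required) based on the tag."""
    required = [t for t in tests if not hasattr(t, "quarantine_reason")]
    non_required = [t for t in tests if hasattr(t, "quarantine_reason")]
    return required, non_required


def test_checkout():
    assert 1 + 1 == 2


@quarantine("intermittent timeout, under investigation")
def test_payment_retry():
    assert True


required, non_required = split_required([test_checkout, test_payment_retry])
```

CI then runs both groups and makes only the `required` run a required check. Both groups still execute, which is the difference from a skip.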
Should quarantined tests block deploys?
Usually no, since the reason to quarantine was that the test was unreliable. But the failure pattern still matters: a quarantined test that goes from 5% flake to 50% flake is signal that something changed. Most teams alert on the trend rather than on individual failures.
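Alerting on the trend can be as simple as comparing two windows of runs. This is a sketch: the window contents and the 20-point `min_increase` default are assumptions, not recommended values.

```python
def flake_rate(outcomes):
    """Fraction of failed runs in a window (True = pass)."""
    return outcomes.count(False) / len(outcomes) if outcomes else 0.0


def trend_alert(previous, recent, min_increase=0.20):
    """Alert when the recent failure rate exceeds the previous window's
    rate by at least min_increase, ignoring individual failures."""
    return flake_rate(recent) - flake_rate(previous) >= min_increase


previous = [True] * 19 + [False]        # 5% failures
recent = [True] * 10 + [False] * 10     # 50% failures
alert = trend_alert(previous, recent)
```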