
Flaky tests in Pest.
Named, fixed, and quarantined.

Flaky Pest suites are not random. They follow patterns: higher-order chains sharing state, beforeEach scope confusion, parallel paratest workers fighting over the database, snapshot races, Carbon clocks left frozen. Name them, fix them, quarantine what is left.
Your CI stays green.

By Rémy Duthu, Software Engineer, CI Insights · Published

mergify[bot] commented · 2 minutes ago
Flaky test detected
checkout flow › settles the pending promise
src/checkout.test.ts:42
Last 3 runs on this commit: ✕ Failed, ✓ Passed, ✓ Passed
Confidence on main: 98% 71% over the last 7 days
Auto-quarantined by Test Insights
This test no longer blocks your merge. Quarantine lifts when stable.
Example PR comment from the Mergify bot detecting a flaky Pest test and quarantining it automatically.

Why Pest is uniquely flaky

Pest is the modern PHP testing framework built on top of PHPUnit. It keeps PHPUnit's runner and trades the class-based syntax for an expressive function-based DSL: test('does the thing', fn () => ...), higher-order chains, datasets, and architectural assertions. The DSL makes specs read like the behaviour they test. The catch is that everything still runs on PHPUnit underneath, so every PHPUnit flake category applies, plus a few that come from Pest's syntax sugar.

Two facets are Pest-specific. Higher-order tests chain expectations on the test instance, which makes it easy to accidentally share state across the chain. Datasets can be defined as closures or arrays, and a closure that captures mutable state outside its scope leaks across every test that uses it. A third problem is inherited rather than Pest-specific: Pest's parallel runner carries over paratest's database-collision problem when used with Laravel's RefreshDatabase.

The patterns are finite. We've seen the same eight on Mergify Test Insights across hundreds of Pest suites: higher-order chains that share state across iterations, beforeEach mutating global state without afterEach cleanup, parallel paratest workers fighting over the Laravel database, dataset() closures capturing mutable outer state, snapshot-driven tests racing in parallel, Carbon test-time leakage between specs, browser tests via Pest Plugin Browser racing the page, and Pest's retry expression hiding real bugs. Each has a clean fix once you can name it.

The 8 patterns behind most flaky suites

Pattern 1

Higher-order chains that share state across iterations

Symptom. A higher-order Pest test that expects multiple values on the same subject passes locally and fails in CI when the subject's state changed between expectations.

Root cause. Higher-order tests chain method calls on the test's expected subject. Each link in the chain operates on the same value, so a chain that includes a method call with side effects (a fluent builder, a state machine transition) carries those side effects into the next assertion. Locally the chain runs fast enough that intermediate state matches; on CI a slower scheduler exposes the order dependence.

it('renames and saves the user')
    ->expect(fn () => User::create(['name' => 'Rémy']))
    ->name->toBe('Rémy')
    ->setName('Rémy Updated') // mutates the expected value
    ->name->toBe('Rémy Updated')
    ->save() // calls a slow DB write
    ->name->toBe('Rémy Updated'); // sometimes sees a stale cached value

Fix. Use higher-order chains for read-only assertions on a single value. When the test mutates the subject, switch to an explicit closure body so each step's side effects are visible and the order is unambiguous.

it('renames and saves the user', function () {
    $user = User::create(['name' => 'Rémy']);
    expect($user->name)->toBe('Rémy');

    $user->setName('Rémy Updated');
    expect($user->name)->toBe('Rémy Updated');

    $user->save();
    expect($user->fresh()->name)->toBe('Rémy Updated');
});

With Mergify. Test Insights groups higher-order test failures by the assertion position in the chain. When the same chain consistently fails at step 3 of 5, the dashboard surfaces the cross-step state dependency so the side-effect-free rewrite is the obvious fix.

Pattern 2

beforeEach mutating global state without afterEach cleanup

Symptom. A test inside a `describe` block changes a Laravel facade fake or a static class property, and the next test in a sibling block fails on an assertion about a value that failing test never set.

Root cause. Pest's beforeEach always fires before every test in scope, but it does not auto-reset what it changed. A hook inside a describe that mutates a static property, registers a permanent event listener, or fakes a facade without a paired afterEach leaks that change into every later test in the file. The file-level beforeEach does not undo it.

// tests/Feature/InvoiceTest.php
describe('with frozen audit log', function () {
    beforeEach(function () {
        AuditLog::$disabled = true; // static property mutation
    });

    test('processes order without audit', function () {
        expect(processOrder())->toBeTrue();
    });
});

// later in the same file, no longer inside the describe
test('writes an audit entry for refunds', function () {
    refund();
    // expects AuditLog::$disabled === false (default)
    // actual: the previous describe set it true and never reset it
    expect(AuditLog::entries())->toHaveCount(1);
});

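Fix. Pair every beforeEach that mutates global state with an afterEach in the same scope so the change is undone before the next test. For Laravel state, lean on what the framework already tears down: the base TestCase rebuilds the application (and with it any facade fakes) for each test, and RefreshDatabase resets the database. Static properties on your own classes get no such help and need the explicit afterEach.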

describe('with frozen audit log', function () {
    beforeEach(fn () => AuditLog::$disabled = true);
    afterEach(fn () => AuditLog::$disabled = false);

    test('processes order without audit', function () {
        expect(processOrder())->toBeTrue();
    });
});

With Mergify. Test Insights groups failures by the describe block that ran before them. When several seemingly unrelated tests start failing only after a specific describe runs, the dashboard surfaces the upstream culprit so the missing afterEach is the obvious lead.

Pattern 3

Parallel paratest workers fighting over the Laravel database

Symptom. A Pest suite that runs green sequentially fails under `pest --parallel` with assertions about rows another worker created or `SQLSTATE: database is locked`.

Root cause. Pest's parallel runner is paratest under the hood. RefreshDatabase wraps each test in a transaction, but every parallel worker hits the same database file or schema. SQLite locks immediately; Postgres/MySQL race on autoincrement IDs. Pest does not auto-recreate per-worker databases the way php artisan test --parallel does unless the runner is configured to.

# CI script
vendor/bin/pest --parallel --processes=4
# Worker 1 starts a transaction, Worker 2 hits SQLITE_BUSY, suite fails.

# .env.testing
DB_CONNECTION=sqlite
DB_DATABASE=database/testing.sqlite

Fix. Use Laravel's php artisan test --parallel wrapper around Pest, which handles per-worker databases automatically. For raw paratest invocations, set the database name to a per-token placeholder.

# CI script
php artisan test --parallel --processes=4

# Or with raw paratest:
vendor/bin/pest --parallel --recreate-databases

# .env.testing for Postgres
DB_CONNECTION=pgsql
DB_DATABASE=test_${TEST_TOKEN}
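
If interpolating the token in .env.testing is not an option, the per-worker name can also be resolved in PHP. The sketch below assumes the TEST_TOKEN environment variable that paratest exports to each worker (the same one the .env.testing line above relies on). The helper name workerDatabaseName and the base name passed to it are illustrative; they are not part of Laravel, Pest, or paratest.

```php
<?php

/**
 * Build a database name that is unique per parallel worker.
 * Hypothetical helper: the name and the fallback behaviour are assumptions.
 */
function workerDatabaseName(string $base, ?string $token = null): string
{
    // paratest exports TEST_TOKEN to every worker; it is absent in a sequential run.
    if ($token === null) {
        $env = getenv('TEST_TOKEN');
        $token = $env === false ? '' : $env;
    }

    // Sequential runs keep the plain name so local workflows do not change.
    if ($token === '') {
        return $base;
    }

    // Keep only characters that are safe in a database identifier.
    $safe = preg_replace('/[^A-Za-z0-9_]/', '_', $token);

    return $base . '_' . $safe;
}
```

A connection entry in config/database.php could then call workerDatabaseName('test') for its database name. Each per-worker database still has to exist before the suite starts; this helper only picks the name.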

With Mergify. Test Insights tags failures that only appear under parallel runs as parallelism-sensitive. The dashboard surfaces the parallel-only signature so the worker-database collision is the obvious root cause.

Pattern 4

dataset() closures capturing mutable outer state

Symptom. A dataset-driven test passes for some inputs and fails for others, and re-running with the failing input alone passes.

Root cause. Pest datasets can be defined as closures that build values per iteration. A closure that captures a mutable object from the surrounding scope returns the same instance on every call, so each iteration sees the previous one's mutations. use ($shared) in a dataset closure does not give the iteration its own copy.

$shared = User::factory()->make(['name' => 'Rémy']);

dataset('plans', function () use ($shared) {
    // use ($shared) copies the object handle, not the object; every yield returns the SAME instance
    yield 'free' => ['free', $shared];
    yield 'pro'  => ['pro',  $shared];
});

it('renames the user', function (string $plan, User $u) {
    $u->setName($u->getName() . '-' . $plan);
    expect($u->name)->toEndWith($plan);
    // row 1 mutates $shared; row 2 sees 'Rémy-free' and renames to 'Rémy-free-pro'
})->with('plans');

Fix. Build fresh values inside the dataset closure on every yield so each iteration gets its own. For expensive setup, factor it into a factory function the dataset calls per row.

dataset('plans', function () {
    yield 'free' => ['free', User::factory()->make(['name' => 'Rémy'])];
    yield 'pro'  => ['pro',  User::factory()->make(['name' => 'Rémy'])];
});
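
The sharing is a property of PHP objects rather than of Pest, so it can be reproduced without the framework. The following is a minimal sketch in plain PHP; Account, sharedRows, freshRows, and renameAll are hypothetical names used only for this illustration.

```php
<?php

final class Account
{
    public function __construct(public string $name) {}
}

// Shared instance: `use ($shared)` copies the object handle, so both rows point at one object.
function sharedRows(): Generator
{
    $shared = new Account('Rémy');
    $rows = function () use ($shared) {
        yield 'free' => ['free', $shared];
        yield 'pro'  => ['pro',  $shared];
    };

    yield from $rows();
}

// Fresh instance: every row builds its own object.
function freshRows(): Generator
{
    yield 'free' => ['free', new Account('Rémy')];
    yield 'pro'  => ['pro',  new Account('Rémy')];
}

// Apply the same mutation the test performs and collect the resulting names.
function renameAll(iterable $rows): array
{
    $names = [];
    foreach ($rows as $label => [$plan, $account]) {
        $account->name .= '-' . $plan;
        $names[$label] = $account->name;
    }

    return $names;
}

// renameAll(sharedRows()) gives ['free' => 'Rémy-free', 'pro' => 'Rémy-free-pro']
// renameAll(freshRows())  gives ['free' => 'Rémy-free', 'pro' => 'Rémy-pro']
```

The second row of the shared version inherits the first row's suffix, which is the order-dependent failure described above.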

With Mergify. Test Insights groups failures by test name and dataset row. When iteration N of a dataset-driven test fails consistently and N-1 passed, the dashboard surfaces the iteration-order signature so the shared-closure mistake is easy to find.

Pattern 5

Snapshot-driven tests racing in parallel

Symptom. A test using `toMatchSnapshot()` writes a snapshot file in CI and the next run fails with a diff that did not exist between commits.

Root cause. Snapshot files in spatie/pest-plugin-snapshots live on disk under tests/__snapshots__. Under --parallel, two workers can write to the same snapshot file simultaneously when two tests share the same descriptor (a parameterised test, two tests with the same name in different describes). The result is one snapshot file with content from whichever write landed last.

it('renders invoice in USD', function () {
    expect(format(1099, 'USD'))->toMatchSnapshot();
});

it('renders invoice in EUR', function () {
    expect(format(1099, 'EUR'))->toMatchSnapshot();
});

# Both tests can run on different workers and write to:
# tests/__snapshots__/InvoiceTest__renders_invoice__1.snap
# (the snapshot key collides when the test names share a prefix)

Fix. Keep snapshot tests in the same file and either run them serially (group them under a single describe with explicit naming) or split snapshot tests into a separate suite that does not run with --parallel.

// tests/Pest.php
uses(MatchesSnapshots::class)->in('Snapshots');

# CI script
vendor/bin/pest --testsuite=Unit --parallel
vendor/bin/pest --testsuite=Snapshots   # sequential

With Mergify. Test Insights detects the snapshot-churn pattern: the same test file produces a different .snap diff on consecutive runs of the same SHA. The dashboard surfaces the file as snapshot-unstable so the parallel-write race is the obvious lead.

Pattern 6

Carbon test-time leakage between specs

Symptom. A test that calls `Carbon::setTestNow('2026-01-01')` passes, and a later spec that touches `Carbon::now()` asserts on a date months in the past.

Root cause. Carbon's setTestNow mutates a static. Pest does not auto-reset it between tests. A spec that froze the clock without resetting in afterEach leaves the clock frozen for every test that runs after on the same worker. The leak is invisible until a later test reads Carbon::now() and fails.

it('expires the invitation', function () {
    Carbon::setTestNow('2026-01-01');
    $invite = Invitation::create();
    Carbon::setTestNow('2026-01-09');
    expect($invite->isExpired())->toBeTrue();
    // missing Carbon::setTestNow();
});

it('mints a session token valid for an hour', function () {
    $token = SessionToken::for($this->user);
    // Carbon::now() is still 2026-01-09; if the token's expiry comes from the real clock, the assertion fails
    expect($token->expiresAt->timestamp)
        ->toEqualWithDelta(Carbon::now()->addHour()->timestamp, 60);
});

Fix. Reset Carbon's test clock in a global afterEach in tests/Pest.php. Or use the closure form of Carbon::withTestNow so the clock auto-restores at the end of the block.

// tests/Pest.php
afterEach(function () {
    Carbon::setTestNow(); // reset to real clock
});

// in a test
it('expires the invitation', function () {
    Carbon::withTestNow('2026-01-01', function () {
        $invite = Invitation::create();
        Carbon::setTestNow('2026-01-09');
        expect($invite->isExpired())->toBeTrue();
    });
});
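
Why the closure form is safer can be shown with a small stand-in. The sketch below is plain PHP, not Carbon: FakeableClock is a hypothetical class that mimics a static test-time override, and its withTestNow illustrates the try/finally shape that restores the clock even when the callback throws.

```php
<?php

// Hypothetical stand-in for a library clock with a static test override (not Carbon itself).
final class FakeableClock
{
    private static ?DateTimeImmutable $testNow = null;

    // Freeze the clock at $time, or unfreeze it when called without an argument.
    public static function setTestNow(?string $time = null): void
    {
        self::$testNow = $time === null ? null : new DateTimeImmutable($time);
    }

    public static function now(): DateTimeImmutable
    {
        return self::$testNow ?? new DateTimeImmutable();
    }

    public static function isFrozen(): bool
    {
        return self::$testNow !== null;
    }

    // Closure form: the finally block unfreezes the clock on return and on exception.
    public static function withTestNow(string $time, callable $callback): mixed
    {
        self::setTestNow($time);

        try {
            return $callback();
        } finally {
            self::setTestNow();
        }
    }
}
```

With the bare setTestNow call, cleanup depends on every test remembering to undo it. With the closure form, cleanup is tied to the block's scope, so a failing assertion inside the callback cannot leave the static frozen for the next test.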

With Mergify. Test Insights catches the cross-test signature: a test only fails after a known time-mutating test, and only when its assertions touch the clock. The dashboard surfaces the ordering so the missed Carbon::setTestNow() is easy to locate.

Pattern 7

Browser tests via Pest Plugin Browser racing the page

Symptom. A `pestphp/pest-plugin-browser` (Playwright-driven) test passes locally and fails in CI with `element not found` on a button the screenshot clearly shows.

Root cause. pest-plugin-browser wraps Playwright and inherits its auto-wait semantics. A test that sleeps for a hardcoded duration before clicking races the page exactly the way bare Playwright does. Locally the wait covers the render; on the slower CI runner the click fires before the element is interactable.

it('opens the modal', function () {
    $page = visit('/dashboard');
    $page->click('button.open-modal');
    sleep(1); // hope the modal is open
    $page->type('input[name=email]', 'user@example.com');
    // CI: input not found, the modal animation is still running
});

Fix. Wait for the actual signal: an assertion that the modal is visible. The Pest browser plugin's assertVisible retries with the configured timeout, so the test is fast when fast and patient when slow.

it('opens the modal', function () {
    $page = visit('/dashboard');
    $page->click('button.open-modal');
    $page->assertVisible('[role=dialog]');
    $page->type('[role=dialog] input[name=email]', 'user@example.com');
});
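
"Fast when fast and patient when slow" comes down to polling against a deadline instead of sleeping for a fixed time. The following plain-PHP sketch shows that idea in isolation; waitUntil is a hypothetical helper written for this illustration and is not the browser plugin's implementation.

```php
<?php

/**
 * Poll $condition until it returns true or $timeoutSeconds elapses.
 * Hypothetical helper for illustration only.
 */
function waitUntil(callable $condition, float $timeoutSeconds = 5.0, int $intervalMs = 50): bool
{
    $deadline = microtime(true) + $timeoutSeconds;

    do {
        if ($condition() === true) {
            return true; // stop as soon as the signal appears
        }

        usleep($intervalMs * 1000); // short pause between checks
    } while (microtime(true) < $deadline);

    return false; // the signal never appeared within the budget
}
```

A fixed sleep(1) always costs one second and still fails when the page needs 1.2 seconds. A polling wait returns after the first successful check and only spends the full timeout when something is actually wrong.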

With Mergify. Test Insights links browser-test failures to their CI runner type. When a Pest browser test only fails on the slower runner pool and never on the laptop pool, the dashboard surfaces the resource sensitivity so the hardcoded sleep is the obvious place to look.

Pattern 8

Pest's retry expression hiding real bugs

Symptom. Your Pest suite is green. A user reports a bug that should have been caught by the test that ran three times yesterday before passing.

Root cause. Pest supports retries via --retry on the CLI or per-test ->retry(). A real race that loses on attempt 1 and wins on attempt 2 gets reported as PASSED. The bug is still there. The pipeline has decided not to look at it.

# CI script (please don't)
vendor/bin/pest --retry

// or per test (please don't)
it('charges the card', function () {
    // intermittent timing bug
})->retry(3);

Fix. Do not retry at the framework level. When a test is genuinely flaky, fix it. When the fix takes longer than a session, quarantine it instead. That keeps the signal visible without blocking the merge queue.

With Mergify. Test Insights reruns at the CI level with attempt-level result tracking. You see that a test passed on attempt 2 of 3, which is exactly the information `--retry` and `->retry()` throw away. Quarantine kicks in once the pattern is clear.

Detection

Catch every Pest flake in CI

Pest emits JUnit-compatible XML reports with the `--log-junit` flag (PHPUnit's option, inherited). Point Mergify at the XML output of every CI run and Test Insights builds a confidence score for every test on your default branch. PR runs are compared against that baseline. Anything inconsistent gets flagged in a PR comment before the author merges.

# 1. Emit JUnit XML on every CI run
vendor/bin/pest --log-junit junit.xml

# Or via Laravel's wrapper:
php artisan test --log-junit=junit.xml

# 2. Upload the result (once, in CI)
curl -sSL https://get.mergify.com/ci | sh
mergify ci junit upload junit.xml
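
To make the idea of comparing several runs of the same commit concrete, here is a rough sketch that reads JUnit XML reports with PHP's SimpleXML extension and computes a pass rate per test. It is an illustration only and not how Test Insights scores tests; passRates and flakeCandidates are hypothetical names, and keying tests by their class and name attributes is an assumption about the report layout.

```php
<?php

/**
 * Compute a per-test pass rate from several JUnit XML reports of the same commit.
 *
 * @param  list<string> $reports raw JUnit XML documents
 * @return array<string, float> pass rate in [0, 1], keyed by "class::name"
 */
function passRates(array $reports): array
{
    $runs = [];
    $passes = [];

    foreach ($reports as $xml) {
        $doc = new SimpleXMLElement($xml);

        // testcase elements may sit at any depth under nested testsuite elements.
        foreach ($doc->xpath('//testcase') as $case) {
            if (isset($case->skipped)) {
                continue; // a skipped test says nothing about stability
            }

            $key = (string) $case['class'] . '::' . (string) $case['name'];
            $failed = isset($case->failure) || isset($case->error);

            $runs[$key] = ($runs[$key] ?? 0) + 1;
            $passes[$key] = ($passes[$key] ?? 0) + ($failed ? 0 : 1);
        }
    }

    $rates = [];
    foreach ($runs as $key => $count) {
        $rates[$key] = (float) $passes[$key] / $count;
    }

    return $rates;
}

/** A test is a flake candidate when the same code produced both outcomes. */
function flakeCandidates(array $rates): array
{
    return array_keys(array_filter($rates, fn (float $rate) => $rate > 0.0 && $rate < 1.0));
}
```

A test at 0.0 is consistently broken and a test at 1.0 is consistently healthy; only the values in between point at flakiness, which is why a single run per commit cannot separate the two.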

Prevention

Block flaky Pest tests at PR time

On every PR, Mergify reruns the tests whose confidence is below threshold, without Pest's `--retry` flag or `->retry()` touching your config. The PR gets a comment naming the unreliable tests, their confidence history, and whether the failure on this PR is new or historical noise. Authors fix the real bugs before merge instead of re-running CI until it passes.

Mergify Test Insights Prevention view showing caught flaky Pest tests per PR

Quarantine

Quarantine without skipping

Once a Pest test is confirmed flaky, Test Insights quarantines it. The test still runs in the suite, no `$this->markTestSkipped()` rewrite required, but its result no longer blocks merges or marks the pipeline red. When the pass rate on main recovers, quarantine lifts automatically and the test goes back to being load-bearing.

renders the invoice line: Healthy
login dispatches the right action: Healthy
checkout flow settles the pending promise: Quarantined
rate limiter rejects after 3 requests: Healthy

Want to see which Pest tests in your repo are already flaky?

Works with Pest's built-in `--log-junit` output. No extra plugins required. Setup takes under five minutes.


Frequently asked questions

Why are my Pest tests flaky in CI but pass locally?
Your laptop and your CI runner differ in CPU count, parallel paratest worker count, and whether the worker shares a database with siblings. Pest tests that share state across higher-order chains, mutate Carbon's clock without resetting, or depend on hardcoded sleeps in browser specs surface those issues only under CI's tighter resource budget. Reproduce locally with `php artisan test --parallel --processes=4` to surface the failure before you push.
How do I detect flaky Pest tests?
Pest alone cannot tell flaky from broken since each run gives one data point per test. You need to run the same commit multiple times and compare results. Mergify Test Insights does that on every PR and on the default branch, scores each test, and surfaces the tests whose pass rate drops below a confidence threshold.
Does Pest's `--retry` flag or `->retry()` fix flaky tests?
No, it hides them. A test that fails on attempt 1 and passes on attempt 2 is still broken; you have only decided not to look at the failure. Use Pest's `--retry` flag or `->retry()` as a temporary bandage for a test you are actively fixing, never as a permanent policy. For visibility without blocking the merge queue, quarantine instead of retry.
What causes flaky tests in Pest?
Eight patterns cover most of what we see: higher-order chains that share state across iterations, beforeEach mutating global state without afterEach cleanup, parallel paratest workers fighting over the Laravel database, dataset() closures capturing mutable outer state, snapshot-driven tests racing in parallel, Carbon test-time leakage between specs, browser tests via Pest Plugin Browser racing the page, and Pest's retry expression hiding real bugs. Each is covered above with a minimal reproducer.
How do I quarantine a flaky Pest test without deleting it?
Mergify Test Insights quarantines the test automatically once its confidence score drops. The test still runs in the suite, but a failing result no longer blocks merges and its noise no longer drowns out real signal. When the test stabilizes on main, quarantine lifts automatically. No `$this->markTestSkipped()`, no commented-out tests, no orphaned files.
What's the difference between Pest higher-order tests and explicit closures?
Higher-order tests chain expectations on the test's expected subject in a fluent style: `it('has a name')->expect(fn () => User::find(1))->name->toBe('Rémy')`. Explicit closures pass an anonymous function as the test body and set up state inline. Higher-order is concise for read-only assertions on a single value; explicit closures are unambiguous when the test mutates state across multiple steps. Mixing the two in one test is the trap, because a fluent chain with side effects is hard to reason about under parallelism.

Ship your Pest suite green.

2k+ organizations use Mergify to merge 75k+ pull requests a month without breaking main.