Why are my PHPUnit tests flaky in CI but pass locally?

Your laptop and your CI runner differ in CPU count, parallel paratest worker count, and whether the worker shares a database with siblings. Tests that mutate static properties, leave Mockery expectations unclosed, or freeze Carbon's clock without resetting surface those issues only under CI's tighter resource budget. Reproduce locally by running with the same worker count as CI, for example `vendor/bin/paratest -p 4`, to bring the failure into the open before pushing.

How do I detect flaky PHPUnit tests?

PHPUnit alone cannot tell flaky from broken since each run gives one data point per test. You need to run the same commit multiple times and compare results. Mergify Test Insights does that on every PR and on the default branch, scores each test, and surfaces the tests whose pass rate drops below a confidence threshold.

Do PHPUnit retry plugins or paratest's `--rerun-failures` fix flaky tests?

No, they hide them. A test that fails on attempt 1 and passes on attempt 2 is still broken; you have only decided not to look at the failure. Use PHPUnit retry plugins or paratest's `--rerun-failures` as a temporary bandage for a test you are actively fixing, never as a permanent policy. For visibility without blocking the merge queue, quarantine instead of retrying.

How do I quarantine a flaky PHPUnit test without deleting it?

Mergify Test Insights quarantines the test automatically once its confidence score drops. The test still runs in the suite, but a failing result no longer blocks merges and its noise no longer drowns out real signal. When the test stabilizes on main, quarantine lifts automatically. No `$this->markTestSkipped()`, no commented-out tests, no orphaned files.

How do I run PHPUnit + Laravel tests in parallel without database conflicts?

Use Laravel's parallel runner, `php artisan test --parallel --recreate-databases`, so each worker gets its own database and no two workers race on the same connection. `RefreshDatabase` then works correctly per worker because each worker runs its transactions against its own database. Laravel appends the worker token to the configured database name (for example `your_db_test_1`). If you call `paratest` directly, build a per-worker database name or SQLite file name from the `TEST_TOKEN` environment variable that paratest sets for each worker.

Flaky tests in PHPUnit.
Named, fixed, and quarantined.

Flaky PHPUnit suites are not random. They follow patterns: static property leakage, paratest workers fighting over the database, dataProvider rows mutating shared state, Mockery expectations unclosed, Carbon clocks left frozen. Name them, fix them, quarantine what is left.
Your CI stays green.

By Rémy Duthu, Software Engineer, CI Insights · Published April 2026

Why PHPUnit is uniquely flaky

PHPUnit runs every test method on a fresh instance of the test class by default, which gives you per-method isolation for free. The catch is that PHP itself does not isolate between tests in the same process: globals, static properties, the autoloader's cache, and any singleton stay alive for the life of the worker. PHPUnit gives you escape hatches (processIsolation, @runInSeparateProcess) but they are off by default because they are slow.

Layer on the Laravel ecosystem, where most PHPUnit suites live in 2026. RefreshDatabase wraps each test in a transaction that rolls back, but parallel paratest workers fight over the same database unless you give each worker its own. Mockery expectations need explicit closing. Carbon's setTestNow mutates a static and stays set unless you reset it.

The patterns are finite. We've seen the same eight on Mergify Test Insights across hundreds of PHPUnit suites: static property leakage between tests, RefreshDatabase against parallel paratest workers, dataProvider returning shared mutable state, Mockery expectations leaking across tests, Carbon::now() without setTestNow, Guzzle MockHandler queues that outlive their test, processIsolation traps for global state, and PHPUnit retry attribute hiding real bugs. Each has a clean fix once you can name it.

The 8 patterns behind most flaky suites

Pattern 1

Static property leakage between tests

Symptom. A test that mutates a `static` property on a service class passes alone and breaks an unrelated test class with a value the second test never set.

Root cause. PHPUnit creates a fresh instance of the test class per method, but PHP static properties live for the life of the worker. A test that calls FeatureFlag::set('NEW_BILLING', true) leaves the flag set for every later test in the run. The autoloader's cache, registry singletons, and any class with a static collection are vulnerable to the same leak.

class FeatureFlag {
    public static array $flags = [];
    public static function set(string $k, bool $v): void { self::$flags[$k] = $v; }
}

class FeatureFlagTest extends TestCase {
    public function testNewBillingPath(): void {
        FeatureFlag::set('NEW_BILLING', true);
        $this->assertTrue(FeatureFlag::$flags['NEW_BILLING']);
    }
}

class LegacyBillingTest extends TestCase {
    public function testCharge(): void {
        // expects FeatureFlag::$flags empty; actual: ['NEW_BILLING' => true] leaked
        $this->assertSame(99, Pricing::for('pro'));
    }
}

Fix. Reset static state in tearDown(), or use the @backupStaticAttributes annotation (renamed backupStaticProperties in PHPUnit 10, and slow under either name). The cleanest fix is to avoid static mutable state in tested classes; pass configuration through a service container instead.

class FeatureFlagTest extends TestCase {
    protected function tearDown(): void {
        FeatureFlag::$flags = [];
        parent::tearDown();
    }

    public function testNewBillingPath(): void {
        FeatureFlag::set('NEW_BILLING', true);
        $this->assertTrue(FeatureFlag::$flags['NEW_BILLING']);
    }
}
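When several classes carry static state, resetting each property by hand gets tedious. Below is a minimal sketch of a snapshot-and-restore helper built on PHP's Reflection API. The `StaticSnapshot` class is hypothetical (it is not part of PHPUnit), and it only handles public static properties; objects are restored by handle, not deep-copied.

```php
<?php
declare(strict_types=1);

// Stand-in for the FeatureFlag class from the example above.
final class FeatureFlag
{
    /** @var array<string, bool> */
    public static array $flags = [];

    public static function set(string $key, bool $value): void
    {
        self::$flags[$key] = $value;
    }
}

/** Hypothetical helper: remembers a class's public static properties and puts them back. */
final class StaticSnapshot
{
    /** @var array<string, mixed> */
    private array $values = [];

    /** @param class-string $class */
    public function __construct(private string $class)
    {
        $properties = (new ReflectionClass($class))->getProperties(ReflectionProperty::IS_STATIC);
        foreach ($properties as $property) {
            // Skip non-public and uninitialized typed properties.
            if ($property->isPublic() && $property->isInitialized()) {
                $this->values[$property->getName()] = $property->getValue();
            }
        }
    }

    public function restore(): void
    {
        $reflection = new ReflectionClass($this->class);
        foreach ($this->values as $name => $value) {
            $reflection->setStaticPropertyValue($name, $value);
        }
    }
}

$snapshot = new StaticSnapshot(FeatureFlag::class); // what setUp() would do
FeatureFlag::set('NEW_BILLING', true);              // what the test body does
$snapshot->restore();                               // what tearDown() would do
```

In a real suite the snapshot would be taken in setUp() and restored in tearDown() of a shared base test case.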

With Mergify. Test Insights catches the cross-test signature: a test only fails after a specific other test has run, with assertions about a value the failing test never set. The dashboard surfaces the predecessor so the leaking static is the obvious lead.

Pattern 2

RefreshDatabase against parallel paratest workers

Symptom. A Laravel test suite that runs green sequentially fails under `paratest` with `SQLSTATE[HY000]: General error: 5 database is locked` or with assertions about rows that another worker created.

Root cause. RefreshDatabase wraps each test in a transaction. With one worker that works fine. paratest spawns N workers, all hitting the same database file or schema. SQLite locks immediately; Postgres/MySQL workers see each other's committed rows and race on autoincrement IDs. Laravel's parallel runner (`php artisan test --parallel`) gives each worker its own database, but a bare `paratest` call does not.

// phpunit.xml
<phpunit>
    <php>
        <env name="DB_CONNECTION" value="sqlite"/>
        <env name="DB_DATABASE" value="database/testing.sqlite"/>
    </php>
</phpunit>

# CI script
vendor/bin/paratest -p 4
# Worker 1 transaction holds, Worker 2 hits SQLITE_BUSY, suite fails.

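Fix. Give each worker its own database (or schema, depending on the driver). With Laravel's parallel runner, `php artisan test --parallel --recreate-databases` creates one database per worker and rebuilds it for the run. If you invoke paratest directly, build the database name from the TEST_TOKEN environment variable that paratest sets for each worker.

# CI script (Laravel parallel runner)
php artisan test --parallel --processes=4 --recreate-databases

# .env.testing (when calling vendor/bin/paratest directly)
DB_CONNECTION=pgsql
DB_DATABASE=test_${TEST_TOKEN}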
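If the database name is resolved in PHP rather than in `.env.testing`, the same per-worker naming could be sketched as below. It assumes paratest's `TEST_TOKEN` environment variable; the `workerDatabase` helper and the `app_test` base name are hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: derive a per-worker database name from the token
 * that paratest exposes as the TEST_TOKEN environment variable.
 */
function workerDatabase(string $base, string|false $token): string
{
    // No token means a plain, single-process phpunit run.
    if ($token === false || $token === '') {
        return $base;
    }

    return $base . '_' . $token;
}

// getenv() returns false when the variable is not set.
$database = workerDatabase('app_test', getenv('TEST_TOKEN'));
```

Falling back to the base name keeps a sequential `vendor/bin/phpunit` run working with the same configuration.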

With Mergify. Test Insights tags failures that only appear under parallel runs as parallelism-sensitive. The dashboard surfaces the parallel-only signature so the worker-database collision is the obvious root cause.

Pattern 3

dataProvider returning shared mutable state

Symptom. A data-driven test passes for some inputs and fails for others, and re-running with the failing input alone passes.

Root cause. @dataProvider returns an iterable of arguments that PHPUnit calls the test with once per row. If the rows hold mutable objects, every iteration mutates the same instance. Row 0 changes a shared User, row 1 sees the post-row-0 state.

class PricingTest extends TestCase {
    public static function plans(): array {
        // providers run before setUpBeforeClass(), so the shared object is built here;
        // one instance is handed to every row
        $sharedUser = new User(name: 'Rémy');
        return [['free', $sharedUser], ['pro', $sharedUser]];
    }

    /** @dataProvider plans */
    public function testRenamesUser(string $plan, User $u): void {
        $u->setName($u->getName() . '-' . $plan);
        // row 'free': name is 'Rémy-free'; passes
        // row 'pro': name is 'Rémy-free-pro' because row 'free' already renamed it; fails
        // run the 'pro' row alone and the name is 'Rémy-pro'; passes
        $this->assertSame('Rémy-' . $plan, $u->getName());
    }
}

Fix. Build a fresh instance for every row inside the provider. PHPUnit calls the provider once and reuses the returned rows, so any object referenced by two rows is shared between them. One object per row keeps each row independent.

public static function plans(): array {
    return [
        ['free', new User(name: 'Rémy')],
        ['pro', new User(name: 'Rémy')],
    ];
}
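The difference between shared and fresh rows can be reproduced without PHPUnit. The following sketch uses a minimal stand-in `User` class and a hypothetical `renameForPlan` function that mimics a test body mutating its argument.

```php
<?php
declare(strict_types=1);

// Minimal stand-in for the User class used above.
final class User
{
    public function __construct(private string $name) {}

    public function getName(): string { return $this->name; }

    public function setName(string $name): void { $this->name = $name; }
}

/** Mimics the test body: append the plan to the user's name. */
function renameForPlan(string $plan, User $user): string
{
    $user->setName($user->getName() . '-' . $plan);

    return $user->getName();
}

// Shared instance: both rows hold the same object handle.
$shared = new User('Rémy');
$sharedRows = [['free', $shared], ['pro', $shared]];

// Fresh instances: each row owns its object.
$freshRows = [['free', new User('Rémy')], ['pro', new User('Rémy')]];

$sharedNames = array_map(fn (array $row) => renameForPlan(...$row), $sharedRows);
$freshNames  = array_map(fn (array $row) => renameForPlan(...$row), $freshRows);
// $sharedNames: ['Rémy-free', 'Rémy-free-pro']
// $freshNames:  ['Rémy-free', 'Rémy-pro']
```

PHP passes objects by handle, so putting the same variable in two rows never copies it.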

With Mergify. Test Insights groups failures by test method and parameter index. When iteration N of a data-provider test fails consistently and N-1 passed, the dashboard surfaces the iteration-order signature so the shared-state mistake is easy to find.

Pattern 4

Mockery expectations leaking across tests

Symptom. A test sets a Mockery expectation, the assertion passes, and a sibling test fails on a method that was never declared in its own scope.

Root cause. Mockery stores expectations on a global container. Mockery::close() in tearDown verifies and clears them, but a test class that never calls it (no tearDown hook, or a tearDown override that forgets parent::tearDown()) leaves expectations behind. The next test sees a mock with stubbed behavior it never declared.

class BillingTest extends TestCase {
    public function testChargesUser(): void {
        $stripe = Mockery::mock(StripeClient::class);
        $stripe->shouldReceive('charge')->andReturn(true);
        // exits without Mockery::close(); expectation persists
        $this->assertTrue(Billing::for($stripe)->chargeUser(42));
    }

    public function testFallback(): void {
        $client = new StripeClient();
        // sees the global Mockery container with the leftover expectation
        $this->assertTrue(Billing::for($client)->fallbackFor(42));
    }
}

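Fix. Use the Mockery\Adapter\Phpunit\MockeryPHPUnitIntegration trait so Mockery::close() runs automatically after every test, even on failure. Laravel's base TestCase already calls Mockery::close() in its own tearDown(), so in a Laravel suite the fix is making sure every tearDown() override calls parent::tearDown().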

use Mockery\Adapter\Phpunit\MockeryPHPUnitIntegration;

class BillingTest extends TestCase {
    use MockeryPHPUnitIntegration; // close() fires in tearDown automatically

    public function testChargesUser(): void {
        $stripe = Mockery::mock(StripeClient::class);
        $stripe->shouldReceive('charge')->andReturn(true);
        $this->assertTrue(Billing::for($stripe)->chargeUser(42));
    }
}

With Mergify. Test Insights catches the cross-test signature: a test only fails when run after a specific other test, and only when the assertion involves a Mockery-managed dependency. The dashboard tags the dependency so you know it's mock leakage, not a real regression.

Pattern 5

Carbon::now() without setTestNow

Symptom. A test that calls `Carbon::setTestNow('2026-01-01')` passes, and the next test that touches `Carbon::now()` fails with a date months in the past or future.

Root cause. Carbon::setTestNow mutates a static on the Carbon class. Without a paired Carbon::setTestNow(null) in tearDown, the frozen time persists for every test that touches Carbon::now() on the same worker. Tests that were green for months go red after the first test that froze time without restoring it.

class InvitationTest extends TestCase {
    public function testExpiresAfterSevenDays(): void {
        Carbon::setTestNow('2026-01-01');
        $invite = Invitation::create();
        Carbon::setTestNow(Carbon::now()->addDays(8));
        $this->assertTrue($invite->isExpired());
        // missing Carbon::setTestNow(null);
    }
}

class SessionTest extends TestCase {
    public function testTokenExpiresInOneHour(): void {
        $token = SessionToken::for($this->user);
        // Carbon::now() is still January 9 2026; assertion fails
        $this->assertEqualsWithDelta(
            Carbon::now()->addHour()->timestamp,
            $token->expiresAt->timestamp,
            60
        );
    }
}

Fix. Reset Carbon's test clock in tearDown() on a base test case so every test starts with the real clock. For one-off freezes, prefer Carbon::withTestNow(), the closure form that restores the real clock when the callback finishes.

abstract class TestCase extends BaseTestCase {
    protected function tearDown(): void {
        Carbon::setTestNow(); // reset to real clock
        parent::tearDown();
    }
}

// or use the closure form:
Carbon::withTestNow('2026-01-01', function () {
    $invite = Invitation::create();
    Carbon::setTestNow(Carbon::now()->addDays(8));
    $this->assertTrue($invite->isExpired());
});
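The reason a closure form is safer can be shown with a toy clock. This sketch is a stand-in written for illustration, not Carbon's actual implementation: the `Clock` class and its methods are hypothetical, and the point is the try/finally that unfreezes time even when the callback throws.

```php
<?php
declare(strict_types=1);

/** Toy clock with a static override; a stand-in, not Carbon's real code. */
final class Clock
{
    private static ?DateTimeImmutable $testNow = null;

    public static function now(): DateTimeImmutable
    {
        return self::$testNow ?? new DateTimeImmutable();
    }

    public static function setTestNow(?DateTimeImmutable $now = null): void
    {
        self::$testNow = $now;
    }

    public static function hasTestNow(): bool
    {
        return self::$testNow !== null;
    }

    /** Freeze the clock for the duration of $callback, then always unfreeze. */
    public static function withTestNow(DateTimeImmutable $now, callable $callback): mixed
    {
        self::setTestNow($now);
        try {
            return $callback();
        } finally {
            self::setTestNow(); // runs even if $callback throws
        }
    }
}

$year = Clock::withTestNow(
    new DateTimeImmutable('2026-01-01'),
    fn (): string => Clock::now()->format('Y')
);
// $year is '2026', and Clock::hasTestNow() is false again at this point.
```

A bare setTestNow call has no such guard, which is why it needs the paired reset in tearDown().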

With Mergify. Test Insights shows the cross-test time signature: a test only fails after a known time-mutating test, and only when its assertions touch the clock. The dashboard surfaces the ordering so the missed setTestNow(null) is easy to locate.

Pattern 6

Guzzle MockHandler queues that outlive their test

Symptom. A test that pushes responses onto a Guzzle MockHandler passes, and a later test fails with a response from the previous test's queue.

Root cause. Guzzle's MockHandler is a queue. A test that calls $handler->append(new Response(200, [], '{}')) three times and only consumes two leaves one queued. If the handler is shared via a service container, the next test that resolves the same client pulls the leftover response instead of making a real call.

class HttpClientTest extends TestCase {
    public function testFetchesUser(): void {
        $handler = new MockHandler([new Response(200, [], json_encode(['name' => 'Rémy']))]);
        $client = new Client(['handler' => HandlerStack::create($handler)]);
        app()->instance(Client::class, $client); // stays bound for later tests that resolve Client
        $this->assertSame('Rémy', User::fetch()->name);
    }

    public function testHandles404(): void {
        // expected: a fresh client; actual: same MockHandler with stale queue
        $this->expectException(NotFoundHttpException::class);
        User::fetch();
    }
}

Fix. Build a fresh handler per test inside setUp and bind it explicitly. Avoid sharing the handler instance across tests through the service container.

class HttpClientTest extends TestCase {
    private MockHandler $handler;

    protected function setUp(): void {
        parent::setUp();
        $this->handler = new MockHandler();
        app()->instance(Client::class, new Client([
            'handler' => HandlerStack::create($this->handler),
        ]));
    }

    public function testFetchesUser(): void {
        $this->handler->append(new Response(200, [], json_encode(['name' => 'Rémy'])));
        $this->assertSame('Rémy', User::fetch()->name);
    }
}

With Mergify. Test Insights groups failures whose only signature is unexpected HTTP responses or network exceptions into a per-suite bucket. The dashboard surfaces the test that first poisoned the shared handler so the fix lands at the source.

Pattern 7

processIsolation traps for global state

Symptom. A test class with `@runTestsInSeparateProcesses` (or a test method with `@runInSeparateProcess`) passes locally and fails in CI with a `serialize()` error or a missing service-container binding.

Root cause. Process isolation starts a fresh PHP process per test for full isolation. PHPUnit serializes test state across the process boundary, but anonymous classes, closures, and database connections cannot cross. A test that captures any of those in setUp fails on the serialize step in subtle ways.

/** @runTestsInSeparateProcesses */
class ConfigTest extends TestCase {
    private \Closure $factory;

    protected function setUp(): void {
        // captures the test instance ($this) which holds a PDO connection
        $this->factory = fn (string $k) => Config::get($k);
    }

    public function testReadsConfig(): void {
        $this->assertSame('value', ($this->factory)('key'));
        // Exception: Serialization of 'Closure' is not allowed
    }
}
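The serialization limit itself is easy to confirm outside PHPUnit. A small sketch, where the `serializationError` helper is a hypothetical name used only for this illustration:

```php
<?php
declare(strict_types=1);

/** Returns null if $value serializes cleanly, otherwise the error message. */
function serializationError(mixed $value): ?string
{
    try {
        serialize($value);

        return null;
    } catch (Throwable $e) {
        return $e->getMessage();
    }
}

$plainData = ['key' => 'value'];
$closure   = fn (string $k): string => strtoupper($k);

$plainError   = serializationError($plainData); // null: arrays of scalars cross fine
$closureError = serializationError($closure);   // "Serialization of 'Closure' is not allowed"
```

Anything stored on the test instance that holds a closure inherits the same restriction.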

Fix. Use @runInSeparateProcess only when you need true global-state isolation (testing autoloader behavior, INI changes). For cross-process tests, build dependencies in the test method body, not in setUp, and avoid storing closures or database handles on $this.

/** @runTestsInSeparateProcesses */
class ConfigTest extends TestCase {
    public function testReadsConfig(): void {
        $factory = fn (string $k) => Config::get($k);
        $this->assertSame('value', $factory('key'));
    }
}

With Mergify. Test Insights groups serialize-related failures distinctly from logic failures. The dashboard surfaces tests that fail with serialization errors so the process-isolation scope decision is the obvious place to look.

Pattern 8

PHPUnit retry attribute hiding real bugs

Symptom. Your CI is green. A user reports a bug your tests should have caught, and the test report shows the failing test passed on attempt 3.

Root cause. Plugins like phpunit-retry add a `@retryAttempts 3` annotation that reruns failing tests up to N times. A real race that loses on attempt 1 and wins on attempt 2 gets reported as PASSED. The bug is still there. The pipeline has decided not to look at it.

use jamesheinrich\phpunit_retry\RetryTrait;

class CheckoutTest extends TestCase {
    use RetryTrait;

    /** @retryAttempts 3 */
    public function testChargesCard(): void {
        // intermittent timing bug; passes 2 of 3 times
    }
}

Fix. Do not retry at the framework level. When a test is genuinely flaky, fix it. When the fix takes longer than a session, quarantine it instead. That keeps the signal visible without blocking the merge queue.
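To make concrete what a pass-on-retry throws away, here is a minimal sketch that keeps the outcome of every attempt instead of only the last one. The `recordAttempts` and `isFlaky` functions are hypothetical and are not taken from any retry plugin or from Mergify.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical runner: executes $test up to $maxAttempts times and records
 * the outcome of each attempt instead of only the last one.
 *
 * @return list<bool> true for a passing attempt, false for a failing one
 */
function recordAttempts(callable $test, int $maxAttempts): array
{
    $outcomes = [];
    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        try {
            $test($attempt);
            $outcomes[] = true;
            break; // like a retry plugin, stop at the first pass
        } catch (Throwable) {
            $outcomes[] = false;
        }
    }

    return $outcomes;
}

/** A test is flaky when the same code both failed and passed. */
function isFlaky(array $outcomes): bool
{
    return in_array(true, $outcomes, true) && in_array(false, $outcomes, true);
}

// Simulated race: loses on attempt 1, wins on attempt 2.
$outcomes = recordAttempts(function (int $attempt): void {
    if ($attempt === 1) {
        throw new RuntimeException('charge timed out');
    }
}, 3);
// $outcomes is [false, true]; reporting only the final pass hides the first failure.
```

The mixed outcome list is the signal worth keeping: it separates a flaky test from one that is consistently green or consistently red.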

With Mergify. Test Insights reruns at the CI level with attempt-level result tracking. You see that a test passed on attempt 2 of 3, which is exactly the information `@retryAttempts` throws away. Quarantine kicks in once the pattern is clear.

Detection

Catch every PHPUnit flake in CI

PHPUnit emits JUnit-compatible XML reports with the `--log-junit` flag. Point Mergify at the XML output of every CI run and Test Insights builds a confidence score for every test on your default branch. PR runs are compared against that baseline. Anything inconsistent gets flagged in a PR comment before the author merges.

mergify ci

# 1. Emit JUnit XML on every CI run
vendor/bin/phpunit --log-junit junit.xml

# Or for paratest:
vendor/bin/paratest --log-junit junit.xml

# 2. Upload the result (once, in CI)
curl -sSL https://get.mergify.com/ci | sh
mergify ci junit upload junit.xml
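For a sense of what comparing runs of the same commit involves, the sketch below parses JUnit XML with SimpleXML and computes a pass rate per test. It is an illustration only, not how Test Insights computes its confidence score; the `readJUnit` and `passRates` functions and the inline sample reports are hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * Collect pass/fail per test from one JUnit XML report.
 *
 * @return array<string, bool> "Class::method" => passed
 */
function readJUnit(string $xml): array
{
    $results = [];
    $report = new SimpleXMLElement($xml);
    // <testcase> elements can sit inside several nested <testsuite> levels.
    foreach ($report->xpath('//testcase') as $case) {
        if (isset($case->skipped)) {
            continue; // a skipped test says nothing about flakiness
        }
        $id = (string) $case['classname'] . '::' . (string) $case['name'];
        $results[$id] = !isset($case->failure) && !isset($case->error);
    }

    return $results;
}

/**
 * @param list<array<string, bool>> $runs results from runs of the same commit
 * @return array<string, float> pass rate per test, between 0.0 and 1.0
 */
function passRates(array $runs): array
{
    $passes = [];
    $totals = [];
    foreach ($runs as $run) {
        foreach ($run as $id => $passed) {
            $totals[$id] = ($totals[$id] ?? 0) + 1;
            $passes[$id] = ($passes[$id] ?? 0) + ($passed ? 1 : 0);
        }
    }

    $rates = [];
    foreach ($totals as $id => $total) {
        $rates[$id] = (float) ($passes[$id] / $total);
    }

    return $rates;
}

$runA = '<testsuites><testsuite name="s"><testcase name="testCharge" classname="BillingTest"/></testsuite></testsuites>';
$runB = '<testsuites><testsuite name="s"><testcase name="testCharge" classname="BillingTest"><failure>boom</failure></testcase></testsuite></testsuites>';

$rates = passRates([readJUnit($runA), readJUnit($runB)]);
// $rates['BillingTest::testCharge'] is 0.5: same commit, mixed results.
```

A rate strictly between 0 and 1 on an unchanged commit is the basic flakiness signal; a single run can never produce it.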

Prevention

Block flaky PHPUnit tests at PR time

On every PR, Mergify reruns the tests whose confidence is below threshold, so you do not need to add PHPUnit retry plugins or paratest's `--rerun-failures` to your config. The PR gets a comment naming the unreliable tests, their confidence history, and whether the failure on this PR is new or historical noise. Authors fix the real bugs before merge instead of re-running CI until it passes.

Mergify Test Insights Prevention view showing caught flaky PHPUnit tests per PR

Quarantine

Quarantine without skipping

Once a PHPUnit test is confirmed flaky, Test Insights quarantines it. The test still runs in the suite, no `$this->markTestSkipped()` rewrite required, but its result no longer blocks merges or marks the pipeline red. When the pass rate on main recovers, quarantine lifts automatically and the test goes back to being load-bearing.

Want to see which PHPUnit tests in your repo are already flaky?

Works with PHPUnit's built-in `--log-junit` output or paratest. No extra plugins required. Setup takes under five minutes.


Frequently asked questions

What causes flaky tests in PHPUnit?

Eight patterns cover most of what we see: static property leakage between tests, RefreshDatabase against parallel paratest workers, dataProvider returning shared mutable state, Mockery expectations leaking across tests, Carbon::now() without setTestNow, Guzzle MockHandler queues that outlive their test, processIsolation traps for global state, and PHPUnit retry attribute hiding real bugs. Each is covered above with a minimal reproducer.

Ship your PHPUnit suite green.

2k+ organizations use Mergify to merge 75k+ pull requests a month without breaking main.

