
Flaky tests in NUnit.
Named, fixed, and quarantined.

Flaky NUnit suites are not random. They follow patterns: parallel scope confusion, static field leakage, SetUpFixture cleanup that never fires, .Result deadlocks in async tests, real-clock assertions. Name them, fix them, quarantine what is left.
Your CI stays green.

By Rémy Duthu, Software Engineer, CI Insights · Published

mergify[bot] commented · 2 minutes ago
Flaky test detected: checkout flow › settles the pending promise (src/checkout.test.ts:42)
Last 3 runs on this commit: ✕ Failed · ✓ Passed · ✓ Passed
Confidence on main: 98% · 71% over the last 7 days
Auto-quarantined by Test Insights. This test no longer blocks your merge. Quarantine lifts when stable.
Example PR comment from the Mergify bot detecting a flaky NUnit test and quarantining it automatically.

Why NUnit is uniquely flaky

NUnit is the original .NET testing framework, predating MSTest and xUnit. By default it creates one instance of each test fixture and reuses it for every test in that fixture (unlike xUnit's fresh-instance-per-test model), and the AppDomain (or AssemblyLoadContext on .NET Core and later) lives for the entire run, which means every static field, singleton, and process-wide configuration survives across tests. NUnit's parallelism is opt-in via [Parallelizable], with scopes (Self, Children, Fixtures, All, None) that compose subtly.

Two mechanisms cause most flakes. First, the parallelism scope rules look obvious until two fixtures with different scopes interact: a fixture marked ParallelScope.Self can run alongside a fixture marked ParallelScope.None, and shared static state turns that parallelism into a race. Second, async/await in test bodies is fine when the test method is async, but calling .Result or .Wait() on a Task whose continuation must resume on a captured synchronization context deadlocks under NUnit's runner.

The patterns are finite. We've seen the same eight on Mergify Test Insights across hundreds of NUnit suites: Parallelizable scope confusion across fixtures, static field state surviving the test run, SetUpFixture lifecycle surprises, TestContext attribute leakage between tests, async tests that .Result-deadlock on the sync context, DateTime.UtcNow without an injected IClock, HttpClient instances shared without per-test handlers, and Retry attribute hiding real bugs. Each has a clean fix once you can name it.

The 8 patterns behind most flaky suites

Pattern 1

Parallelizable scope confusion across fixtures

Symptom. Tests that pass in serial mode start failing intermittently after `[assembly: Parallelizable(ParallelScope.Children)]` is enabled, with assertions about state another fixture mutated.

Root cause. [Parallelizable] takes a scope (Self, Children, Fixtures, All, None). The scopes compose by aggregation: assembly-level Children + fixture-level All means tests in that fixture run alongside tests in every other fixture too. Anything those threads share (a static field, a singleton service, an environment variable) becomes a race.

[assembly: Parallelizable(ParallelScope.Fixtures)]

public class FeatureFlagTests {
    public static bool _enabled;

    [SetUp] public void Setup() { _enabled = true; }

    [Test] public void NewBillingPathIsActive() {
        Assert.That(_enabled, Is.True);
    }
}

public class LegacyBillingTests {
    [Test] public void DefaultsToLegacyPath() {
        // FeatureFlagTests races ahead and sets _enabled = true; assertion fails
        Assert.That(FeatureFlagTests._enabled, Is.False);
    }
}

Fix. Pick parallelism explicitly. Annotate fixtures that share static state with [NonParallelizable], or refactor to remove the static state. For per-thread state, use AsyncLocal<T> or ThreadLocal<T> rather than plain statics.

[NonParallelizable]
public class FeatureFlagTests {
    private bool _enabled;

    [SetUp] public void Setup() { _enabled = true; }

    [Test] public void NewBillingPathIsActive() {
        Assert.That(_enabled, Is.True);
    }
}
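Where ambient state genuinely has to stay globally accessible, the AsyncLocal<T> route mentioned above could look like this sketch (FeatureContext is a hypothetical name, not part of the example above):

```csharp
using System.Threading;

// Sketch: per-async-flow ambient state instead of a plain static.
// FeatureContext is a hypothetical class; each test's async flow
// sees its own copy of the value, so parallel tests cannot race.
public static class FeatureContext {
    private static readonly AsyncLocal<bool> _enabled = new();

    public static bool Enabled {
        get => _enabled.Value;
        set => _enabled.Value = value;
    }
}
```

Each test sets FeatureContext.Enabled in its own [SetUp] and reads back only its own value; the write never crosses into a sibling test's async flow.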

With Mergify. Test Insights tags failures that only appear under parallel runs as parallelism-sensitive. The dashboard surfaces the parallel-only signature so the shared-static root cause is the obvious lead.

Pattern 2

Static field state surviving the test run

Symptom. A test that mutates a static config dictionary passes alone and breaks an unrelated fixture with a value the second test never set.

Root cause. Even without parallelism, static fields live for the AppDomain. NUnit reuses one instance of each test fixture for all of its tests by default, and static state goes further still: it survives across every fixture in the run. (NUnit 3.13 added [FixtureLifeCycle(LifeCycle.InstancePerTestCase)] for per-test fixture instances, but that does nothing for statics.) A test that sets FeatureFlags.Enabled["x"] = true leaves it set for every test that runs after it, regardless of fixture or class.

public static class FeatureFlags {
    public static Dictionary<string, bool> Enabled { get; } = new();
}

public class FeatureFlagTests {
    [Test] public void NewBilling() {
        FeatureFlags.Enabled["NEW_BILLING"] = true;
        Assert.That(FeatureFlags.Enabled["NEW_BILLING"], Is.True);
    }
}

public class PricingTests {
    [Test] public void DefaultPricing() {
        // expects Enabled empty; sees ["NEW_BILLING" => true] from sibling
        Assert.That(Pricing.For("pro"), Is.EqualTo(99));
    }
}

Fix. Reset static state in [TearDown] on a base test class so every test starts clean. Better: refactor the static into an injected service so each test owns its own instance.

public abstract class TestBase {
    [TearDown] public void ResetGlobalState() {
        FeatureFlags.Enabled.Clear();
    }
}

public class FeatureFlagTests : TestBase {
    [Test] public void NewBilling() { ... }
}
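The injected-service version of that refactor might look like this sketch; IFeatureFlags and InMemoryFeatureFlags are hypothetical names:

```csharp
using System.Collections.Generic;

// Sketch: the static dictionary becomes an instance each test owns.
// IFeatureFlags / InMemoryFeatureFlags are hypothetical names.
public interface IFeatureFlags {
    bool IsEnabled(string key);
    void Set(string key, bool value);
}

public sealed class InMemoryFeatureFlags : IFeatureFlags {
    private readonly Dictionary<string, bool> _flags = new();

    public bool IsEnabled(string key) => _flags.TryGetValue(key, out var on) && on;
    public void Set(string key, bool value) => _flags[key] = value;
}
```

Each test news up its own InMemoryFeatureFlags (or resolves one from a per-test DI scope), so there is nothing left to leak into PricingTests.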

With Mergify. Test Insights catches the cross-test signature: a test only fails after a specific other test has run, with assertions about a value the failing test never set. The dashboard surfaces the predecessor so the leaking static is the obvious lead.

Pattern 3

SetUpFixture lifecycle surprises

Symptom. A `[SetUpFixture]` initializes a database container; a test failure leaves the container running because `[OneTimeTearDown]` did not fire.

Root cause. [SetUpFixture] runs once per namespace before any test in that namespace, with paired [OneTimeSetUp] and [OneTimeTearDown] hooks. Teardown only fires on graceful shutdown; if the test runner crashes, the host process is killed, or a previous teardown threw, external resources stay open. CI runs that container forever until the next pipeline kills it, and meanwhile the next run hits "port already allocated".

[SetUpFixture]
public class GlobalFixtures {
    private static IContainer _postgres;

    [OneTimeSetUp]
    public async Task Setup() {
        _postgres = new PostgreSqlBuilder().Build();
        await _postgres.StartAsync();
    }

    [OneTimeTearDown]
    public async Task TearDown() {
        // never runs if the host process is killed mid-suite
        await _postgres.DisposeAsync();
    }
}

Fix. Wrap teardown in a defensive try/catch and register a synchronous AppDomain.ProcessExit handler that blocks on async disposal, so cleanup runs even when the runner exits without calling [OneTimeTearDown]. ProcessExit still never fires on a hard kill, so for containers also enable Testcontainers' .WithCleanUp(true); its resource reaper lets the Docker daemon GC the container even when your handler never runs.

[OneTimeSetUp]
public async Task Setup() {
    _postgres = new PostgreSqlBuilder().WithCleanUp(true).Build();
    await _postgres.StartAsync();
    AppDomain.CurrentDomain.ProcessExit += OnProcessExit;
}

private static void OnProcessExit(object? sender, EventArgs e) {
    if (_postgres is null) return;
    try {
        _postgres.DisposeAsync().AsTask().GetAwaiter().GetResult();
    }
    catch {
        // best-effort during process shutdown
    }
}

With Mergify. Test Insights notices that the first test of a CI run fails with `port already allocated` only when the previous run's `[OneTimeTearDown]` did not log its completion. The dashboard surfaces the cross-run signature so the missing cleanup is the obvious place to look.

Pattern 4

TestContext attribute leakage between tests

Symptom. A test that reads `TestContext.CurrentContext.Test.Properties["sessionId"]` passes alone and fails inside the suite, with the wrong session set by a sibling fixture.

Root cause. TestContext.CurrentContext.Test.Properties stores attributes per test, but TestContext.Out and TestContext.WriteLine share the test runner's writer. Tests that mutate context-level properties via reflection or rely on inherited [Property] attributes from a base class can see one another's values when fixture order changes.

[TestFixture]
public class CheckoutTests {
    [Test, Property("sessionId", "sess-1")]
    public void BuysFromUserOne() {
        var id = TestContext.CurrentContext.Test.Properties.Get("sessionId");
        // Test runner shares property bag during parallel runs;
        // sibling test's property sometimes wins
        AssertCheckout(id);
    }
}

Fix. Pass session-like values through method arguments (TestCase data) or instance fields scoped to the fixture instance. Reserve TestContext.Properties for read-only metadata declared statically on the test.

public class CheckoutTests {
    private string _sessionId;

    [SetUp] public async Task SignIn() {
        _sessionId = await Auth.LoginAsync("user");
    }

    [Test] public void BuysFromUserOne() {
        AssertCheckout(_sessionId);
    }
}

With Mergify. Test Insights catches the cross-test signature: a test only fails after a specific other test has run, with assertions about a value the failing test never set. The dashboard surfaces the predecessor so the misused TestContext is the obvious lead.

Pattern 5

Async tests that .Result-deadlock on the sync context

Symptom. A test that calls `someAsyncOperation.Result` passes locally and hangs in CI until the runner kills it as a timeout.

Root cause. UI frameworks (WPF, WinForms) and classic ASP.NET install synchronization contexts that resume continuations on the captured context. Calling .Result on a Task blocks the calling thread; if the Task's continuation needs that same context to complete, the two wait on each other forever. NUnit's runner does not install a sync context by default, but tests that pull one in (UI tests, a custom SynchronizationContext set in [SetUp] or the test body) trigger the deadlock under load.

[Test]
public void GetsUserSynchronously() {
    SynchronizationContext.SetSynchronizationContext(new TestSyncContext());
    var user = userService.GetAsync(1).Result; // deadlocks
    Assert.That(user.Name, Is.EqualTo("Rémy"));
}

Fix. Make the test method async Task and await the call. NUnit awaits async test methods natively. Avoid mixing sync waits with async code in tests.

[Test]
public async Task GetsUser() {
    var user = await userService.GetAsync(1);
    Assert.That(user.Name, Is.EqualTo("Rémy"));
}

With Mergify. Test Insights groups timeout failures by their cause. When a test consistently times out at exactly the runner's wall-clock limit and never produces output, the dashboard tags it as a deadlock candidate so the .Result call is the obvious place to look.

Pattern 6

DateTime.UtcNow without an injected IClock

Symptom. A test that asserts an event happened "within the last second" passes locally and fails on the slower CI runner with a timestamp 1.2 seconds old.

Root cause. Calling DateTime.UtcNow directly inside production code reads the system clock. A test that asserts on a freshly created timestamp races against scheduler jitter. Locally the assertion fires within a millisecond; on CI the same code runs after a longer pause and the assertion's tolerance window is too tight.

public class AuditLogger {
    public void Log(string evt) {
        store[evt] = DateTime.UtcNow; // direct clock read
    }
}

[Test]
public void AuditTimestampIsRecent() {
    auditLogger.Log("checkout");
    var logged = store["checkout"];
    Assert.That(logged, Is.EqualTo(DateTime.UtcNow).Within(TimeSpan.FromSeconds(1)));
    // racy under CI scheduler jitter
}

Fix. Inject TimeProvider (built-in since .NET 8) into the production class so tests can swap in a fixed clock. Production wires TimeProvider.System; tests use FakeTimeProvider from the Microsoft.Extensions.TimeProvider.Testing package, or a small custom subclass of TimeProvider when adding the dependency is overkill.

public class AuditLogger {
    private readonly TimeProvider _clock;
    public AuditLogger(TimeProvider clock) { _clock = clock; }

    public void Log(string evt) {
        store[evt] = _clock.GetUtcNow();
    }
}

[Test]
public void AuditTimestamp() {
    var fakeTime = new FakeTimeProvider(DateTimeOffset.Parse("2026-01-01T00:00:00Z"));
    var logger = new AuditLogger(fakeTime);
    logger.Log("checkout");
    Assert.That(store["checkout"], Is.EqualTo(fakeTime.GetUtcNow()));
}

With Mergify. Test Insights links timing failures to their CI runner type. When a test only fails on the slower runner pool and never on the laptop pool, the dashboard surfaces the resource sensitivity so the real-clock dependency is the obvious place to look.

Pattern 7

HttpClient instances shared without per-test handlers

Symptom. A test that mocks an HTTP response passes alone and fails inside the suite when a sibling test sees the cached mock response from the previous test.

Root cause. HttpClient is designed to be reused rather than created per call, since many short-lived instances risk socket exhaustion. A common pattern is therefore to register a single HttpClient in DI with a custom DelegatingHandler. Tests that swap that shared handler for a mock without resetting it afterward leak the mock into the next test.

public class HttpClientTests {
    private static readonly MockHandler Handler = new();
    private static readonly HttpClient Shared = new(Handler);

    [Test] public async Task UsesMockResponse() {
        Handler.EnqueueResponse("{}");
        var resp = await Shared.GetAsync("https://api.test/me");
        Assert.That(await resp.Content.ReadAsStringAsync(), Is.EqualTo("{}"));
    }

    [Test] public async Task ReturnsRealValue() {
        // expects unmocked. actual: mock from previous test still set.
        var resp = await Shared.GetAsync("https://api.test/me");
        Assert.That(await resp.Content.ReadAsStringAsync(), Is.Not.EqualTo("{}"));
    }
}

Fix. Build a fresh HttpClient with a per-test handler in [SetUp]. For shared infrastructure, use IHttpClientFactory with named clients so each test asks for its own.

public class HttpClientTests {
    private MockHandler _handler;
    private HttpClient _client;

    [SetUp] public void SetUp() {
        _handler = new MockHandler();
        _client = new HttpClient(_handler);
    }

    [TearDown] public void TearDown() => _client.Dispose();

    [Test] public async Task UsesMockResponse() {
        _handler.EnqueueResponse("{}");
        var resp = await _client.GetAsync("https://api.test/me");
        Assert.That(await resp.Content.ReadAsStringAsync(), Is.EqualTo("{}"));
    }
}

With Mergify. Test Insights groups failures whose only signature is unexpected HTTP responses or mismatched assertion strings into a per-suite bucket. The dashboard surfaces the test that first poisoned the shared client so the fix lands at the source.

Pattern 8

Retry attribute hiding real bugs

Symptom. Your suite is green. A user reports a bug that should have been caught by the test that ran three times yesterday before passing on attempt 3.

Root cause. [Retry(3)] reruns failing tests up to N times and reports the last result. A real race that loses on attempt 1 and wins on attempt 2 gets reported as PASSED. The bug is still there. The pipeline has decided not to look at it.

[Test, Retry(3)]
public async Task ChargesCard() {
    // intermittent timing bug; passes 2 of 3 times
    var result = await billingService.ChargeAsync(42);
    Assert.That(result.Success, Is.True);
}

Fix. Do not retry at the framework level. When a test is genuinely flaky, fix it. When the fix takes longer than a session, quarantine it instead. That keeps the signal visible without blocking the merge queue.

With Mergify. Test Insights reruns at the CI level with attempt-level result tracking. You see that a test passed on attempt 2 of 3, which is exactly the information `[Retry]` throws away. Quarantine kicks in once the pattern is clear.

Detection

Catch every NUnit flake in CI

NUnit's `dotnet test` runner emits TRX or JUnit XML reports depending on the logger. Configure the JUnit logger and upload the resulting XML to Mergify with a one-line CLI call. Test Insights builds a confidence score for every test on your default branch. PR runs are compared against that baseline. Anything inconsistent gets flagged in a PR comment before the author merges.

# 1. Add the JUnit logger
dotnet add package JUnitTestLogger

# 2. Emit JUnit XML on every CI run
dotnet test --logger "junit;LogFilePath=junit.xml"

# 3. Upload the result (once, in CI)
curl -sSL https://get.mergify.com/ci | sh
mergify ci junit upload junit.xml

Prevention

Block flaky NUnit tests at PR time

On every PR, Mergify reruns the tests whose confidence is below threshold, without [Retry] touching your config. The PR gets a comment naming the unreliable tests, their confidence history, and whether the failure on this PR is new or historical noise. Authors fix the real bugs before merge instead of re-running CI until it passes.

Mergify Test Insights Prevention view showing caught flaky NUnit tests per PR

Quarantine

Quarantine without skipping

Once a NUnit test is confirmed flaky, Test Insights quarantines it. The test still runs in the suite, no `[Ignore]` rewrite required, but its result no longer blocks merges or marks the pipeline red. When the pass rate on main recovers, quarantine lifts automatically and the test goes back to being load-bearing.

renders the invoice line: Healthy
login dispatches the right action: Healthy
checkout flow settles the pending promise: Quarantined
rate limiter rejects after 3 requests: Healthy

Want to see which NUnit tests in your repo are already flaky?

Works with the JUnit logger or any JUnit-compatible NUnit reporter. Setup takes under five minutes.

Book a discovery call

Frequently asked questions

Why are my NUnit tests flaky in CI but pass locally?
Your laptop and your CI runner differ in CPU count, parallel scope settings, and whether a sibling test mutated a static field before yours ran. NUnit tests that share static state across fixtures, deadlock on .Result inside a sync context, or read DateTime.UtcNow surface those issues only under CI's tighter resource budget. Reproduce locally with the same NUnit parallelism settings your CI uses (`LevelOfParallelism` or `NumberOfTestWorkers`, configured via assembly attributes or `.runsettings`) to expose the failure, then fix the underlying coupling before pushing.
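As a sketch, the assembly-level attributes to mirror CI might look like this (the values are examples; copy whatever your CI actually configures):

```csharp
using NUnit.Framework;

// Example only: pin the same parallelism locally that CI uses.
// LevelOfParallelism sets NUnit's worker-thread count, the same
// knob .runsettings exposes as NumberOfTestWorkers.
[assembly: Parallelizable(ParallelScope.Fixtures)]
[assembly: LevelOfParallelism(4)]
```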
How do I detect flaky NUnit tests?
NUnit alone cannot tell flaky from broken since each run gives one data point per test. You need to run the same commit multiple times and compare results. Mergify Test Insights does that on every PR and on the default branch, scores each test, and surfaces the tests whose pass rate drops below a confidence threshold.
Does [Retry] fix flaky tests?
No, it hides them. A test that fails on attempt 1 and passes on attempt 2 is still broken; you have only decided not to look at the failure. Use [Retry] as a temporary bandage for a test you are actively fixing, never as a permanent policy. For visibility without blocking the merge queue, quarantine instead of retry.
What causes flaky tests in NUnit?
Eight patterns cover most of what we see: Parallelizable scope confusion across fixtures, static field state surviving the test run, SetUpFixture lifecycle surprises, TestContext attribute leakage between tests, async tests that .Result-deadlock on the sync context, DateTime.UtcNow without an injected IClock, HttpClient instances shared without per-test handlers, and Retry attribute hiding real bugs. Each is covered above with a minimal reproducer.
How do I quarantine a flaky NUnit test without deleting it?
Mergify Test Insights quarantines the test automatically once its confidence score drops. The test still runs in the suite, but a failing result no longer blocks merges and its noise no longer drowns out real signal. When the test stabilizes on main, quarantine lifts automatically. No `[Ignore]`, no commented-out tests, no orphaned files.
What's the difference between NUnit ParallelScope.Self and Children?
`Self` allows the annotated test or fixture to run in parallel with siblings at the same level. `Children` allows the annotated container's children to run in parallel with each other. They compose: a fixture marked `Self` whose tests are marked `Self` runs the fixture in parallel with siblings AND runs the tests in parallel within the fixture. For a fixture that holds shared state, mark it `[NonParallelizable]` to opt out entirely.
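A side-by-side sketch of the common annotations (fixture names are hypothetical):

```csharp
using NUnit.Framework;

// Self: this fixture may run alongside sibling fixtures,
// but its own tests still run one at a time.
[Parallelizable(ParallelScope.Self)]
public class InvoiceTests { /* ... */ }

// Children: the tests inside this fixture run in parallel
// with each other.
[Parallelizable(ParallelScope.Children)]
public class PricingTests { /* ... */ }

// Opt out entirely for fixtures that touch shared state.
[NonParallelizable]
public class SharedStateTests { /* ... */ }
```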

Ship your NUnit suite green.

2k+ organizations use Mergify to merge 75k+ pull requests a month without breaking main.