
Flaky tests in MSTest.
Named, fixed, and quarantined.

Flaky MSTest suites are not random. They follow patterns: AssemblyInitialize coupling to discovery order, ClassInitialize racing on static state, DeploymentItem path collisions, async void test signatures, real-clock assertions. Name them, fix them, quarantine what is left.
Your CI stays green.

By Rémy Duthu, Software Engineer, CI Insights · Published

Example PR comment from the Mergify bot detecting a flaky MSTest test and quarantining it automatically.

Why MSTest is uniquely flaky

MSTest is Microsoft's first-party .NET test framework. It ships with Visual Studio and the .NET SDK, so most enterprise .NET teams use it by default. The default model runs every test against a fresh instance of the test class but keeps the host process (and, on .NET Framework, the AppDomain) alive across tests, which means static state, the system clock, and any singleton survive the entire run.
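That lifetime split is easy to demonstrate. A minimal sketch (class and field names are illustrative, not from any real suite):

```csharp
[TestClass]
public class LifetimeDemo {
    private int _instanceCounter;      // fresh per test: starts at 0 every time
    private static int _staticCounter; // survives the whole run: accumulates

    [TestMethod]
    public void FirstTest() {
        _instanceCounter++; _staticCounter++;
        Assert.AreEqual(1, _instanceCounter); // new instance, so always 1
    }

    [TestMethod]
    public void SecondTest() {
        _instanceCounter++; _staticCounter++;
        Assert.AreEqual(1, _instanceCounter); // again 1: fresh instance
        // _staticCounter is 2 if FirstTest already ran; static state is
        // exactly what leaks between MSTest tests
    }
}
```

Instance fields are safe by construction; anything static is shared for the duration of the run.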

Two features cause most flakes. [AssemblyInitialize] runs once per test assembly with no guarantees about which test it precedes; if it depends on test discovery order, an unrelated change to test selection breaks it. And parallelization is configured in .runsettings with subtle interactions: MSTest still creates a fresh test class instance per test method, but parallel execution exposes races through the assembly's static state, singletons, shared files or databases, and other process-wide configuration.

The patterns are finite. We've seen the same eight on Mergify Test Insights across hundreds of MSTest suites: AssemblyInitialize racing test discovery, ClassInitialize ordering surprises across parallel classes, DeploymentItem races on shared files, TestContext mutated across tests, parallel execution settings interacting with shared state, async test signatures missing the Task return, DateTime.UtcNow without TimeProvider, and TestProperty rerun configuration hiding real bugs. Each has a clean fix once you can name it.

The 8 patterns behind most flaky suites

Pattern 1

AssemblyInitialize racing test discovery

Symptom. A test that depends on a database container started in `[AssemblyInitialize]` fails on the first run after the test runner discovered tests in a different order.

Root cause. [AssemblyInitialize] fires once per assembly, before any test in that assembly runs. It receives a TestContext tied to assembly initialization, not to a specific test, so any code that pulls per-test data from it (deployment items, test categories, properties from a particular test method) reads the wrong values or none at all. The hazard is the global initialization itself: ordering relative to [ClassInitialize], side effects on shared resources before tests start, and brittle assumptions about what the runner has set up by then.

[TestClass]
public class GlobalSetup {
    [AssemblyInitialize]
    public static async Task Init(TestContext ctx) {
        // assumes the first test brings the right deployment items;
        // breaks when discovery picks a different test first
        var data = ctx.DeploymentDirectory + "/seed.json";
        await Database.SeedAsync(File.ReadAllText(data));
    }
}

Fix. Make [AssemblyInitialize] independent of which test runs first. Hardcode resource paths or load fixtures from a known absolute location. For per-test deployment items, use [DeploymentItem] on the test method itself and read inside [TestInitialize].

[TestClass]
public class GlobalSetup {
    [AssemblyInitialize]
    public static async Task Init(TestContext ctx) {
        await Database.SeedAsync(EmbeddedResources.Read("seed.json"));
    }
}

[TestClass]
public class CheckoutTests {
    [DeploymentItem("Resources/checkout-fixtures.json")]
    [TestMethod] public async Task PlacesOrder() { ... }
}

With Mergify. Test Insights tags failures whose only signature is `AssemblyInitialize threw an exception` and groups them by the file the failing test sits in. The dashboard surfaces the discovery-order coupling so the brittle assumption is the obvious lead.

Pattern 2

ClassInitialize ordering surprises across parallel classes

Symptom. Tests that depend on `[ClassInitialize]` setup pass alone and fail under `<Parallelize Scope="ClassLevel">` with assertions about state another class initialized.

Root cause. [ClassInitialize] fires once per class. With Scope="ClassLevel" in .runsettings, two classes can run their [ClassInitialize] hooks concurrently. If they both touch a shared static (a connection pool, a config dictionary, a mocking framework's global registry), they race.

[TestClass]
public class CheckoutTests {
    [ClassInitialize]
    public static void Init(TestContext _) {
        Bootstrap.Configure(env: "checkout"); // mutates a static AppConfig
    }
}

[TestClass]
public class InventoryTests {
    [ClassInitialize]
    public static void Init(TestContext _) {
        Bootstrap.Configure(env: "inventory"); // races with CheckoutTests
    }
}

// runsettings: <Parallelize><Scope>ClassLevel</Scope></Parallelize>

Fix. Avoid mutating shared static state in [ClassInitialize]. Refactor configuration into per-test instance scope, or wrap the mutating section in a lock with explicit ordering. For test classes that genuinely conflict, mark them [DoNotParallelize].

[TestClass]
[DoNotParallelize]
public class CheckoutTests {
    [ClassInitialize]
    public static void Init(TestContext _) {
        Bootstrap.Configure(env: "checkout");
    }
}
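When [DoNotParallelize] is too coarse, the lock variant mentioned above can look like this. A sketch, assuming Bootstrap is the hypothetical helper from the example; the key point is that the lock object must be shared by every class that calls Configure:

```csharp
// One lock shared by all classes that mutate the bootstrap config,
// so concurrent [ClassInitialize] hooks serialize instead of racing.
public static class Bootstrap {
    public static readonly object ConfigLock = new();
    public static void Configure(string env) { /* mutates a static AppConfig */ }
}

[TestClass]
public class CheckoutTests {
    [ClassInitialize]
    public static void Init(TestContext _) {
        lock (Bootstrap.ConfigLock) {
            Bootstrap.Configure(env: "checkout");
            // snapshot anything the tests read later while still inside the
            // lock; once it is released, a sibling class may reconfigure
            // the shared state at any time
        }
    }
}
```

Note the caveat in the comment: locking serializes initialization, but tests that keep reading the shared config after the lock is released can still observe a sibling class's reconfiguration. That residual hazard is why [DoNotParallelize] is usually the cleaner answer.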

With Mergify. Test Insights tags failures that only appear with parallel runsettings as parallelism-sensitive. The dashboard surfaces the parallel-only signature so the [DoNotParallelize] decision is straightforward.

Pattern 3

DeploymentItem races on shared files

Symptom. A test that copies a fixture file via `[DeploymentItem]` passes alone and fails when run alongside another test that writes to the same path.

Root cause. [DeploymentItem] copies a file or directory into TestContext.DeploymentDirectory before the test runs. The deployment directory is shared across tests in the same run by default. If two parallel tests both write to DeploymentDirectory/output.json, they race; the loser sees the wrong content or a partially written file.

[TestClass]
public class ExporterTests {
    public TestContext TestContext { get; set; }

    [TestMethod]
    [DeploymentItem("Fixtures/template.json")]
    public void ExportsOrder() {
        var output = Path.Combine(TestContext.DeploymentDirectory, "output.json");
        File.WriteAllText(output, exporter.Render(order));
        Assert.IsTrue(File.Exists(output));
    }

    [TestMethod]
    [DeploymentItem("Fixtures/template.json")]
    public void ExportsInvoice() {
        var output = Path.Combine(TestContext.DeploymentDirectory, "output.json");
        File.WriteAllText(output, exporter.Render(invoice));
        // sometimes asserts on content from ExportsOrder under parallel
        StringAssert.Contains(File.ReadAllText(output), "INV-");
    }
}

Fix. Write to a per-test path derived from TestContext.TestName or Path.GetTempFileName(). Reserve DeploymentDirectory for read-only fixtures.

[TestMethod]
public void ExportsOrder() {
    var output = Path.Combine(Path.GetTempPath(), $"{TestContext.TestName}.json");
    File.WriteAllText(output, exporter.Render(order));
    Assert.IsTrue(File.Exists(output));
}
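Per-test temp files accumulate over a long run; a [TestCleanup] hook can remove them. A sketch, assuming the TestContext property is declared on the class:

```csharp
public TestContext TestContext { get; set; }

[TestCleanup]
public void DeleteTempOutput() {
    // remove the file this test wrote; skip silently if the test
    // failed before producing it
    var path = Path.Combine(Path.GetTempPath(), $"{TestContext.TestName}.json");
    if (File.Exists(path)) File.Delete(path);
}
```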

With Mergify. Test Insights groups failures whose only signature is wrong file content or `IOException: file in use` into a per-suite bucket. The dashboard surfaces the test that first wrote to the shared path so the per-test path fix lands at the source.

Pattern 4

TestContext mutated across tests

Symptom. A test that reads `TestContext.Properties["tenantId"]` passes alone and fails inside the suite with a tenant set by a sibling.

Root cause. TestContext.Properties is a per-test bag, but tests that mutate it in [TestInitialize] via reflection or shared helpers can leak the mutation across tests when those helpers cache the context. Combined with parallel runs, two tests scheduled on the same worker can observe each other's properties.

[TestClass]
public class TenantTests {
    public TestContext TestContext { get; set; }

    [TestInitialize]
    public void Setup() {
        TestContext.Properties["tenantId"] = TenantHelper.PickRandom().Id;
    }

    [TestMethod] public void ProcessesOrder() {
        var id = (string)TestContext.Properties["tenantId"];
        // sometimes the wrong tenant under parallel runs
        Assert.IsNotNull(orderService.For(id).Process());
    }
}

Fix. Pass per-test values through instance fields, not TestContext.Properties. TestContext is meant for run metadata (output, deployment paths), not for state your tests need to read deterministically.

[TestClass]
public class TenantTests {
    private string _tenantId;

    [TestInitialize]
    public void Setup() {
        _tenantId = TenantHelper.PickRandom().Id;
    }

    [TestMethod] public void ProcessesOrder() {
        Assert.IsNotNull(orderService.For(_tenantId).Process());
    }
}

With Mergify. Test Insights catches the cross-test signature: a test only fails after a specific other test has run, with assertions about a value the failing test never set. The dashboard surfaces the predecessor so the misused TestContext is the obvious lead.

Pattern 5

Parallel execution settings interacting with shared state

Symptom. A test suite with `<MSTest><Parallelize><Workers>4</Workers></Parallelize></MSTest>` in `.runsettings` fails with assertions about a static counter another test incremented.

Root cause. MSTest v2's parallel scope (MethodLevel or ClassLevel) and worker count come from .runsettings. With MethodLevel, test methods can run concurrently, but MSTest still creates a separate test class instance per test method. The race is in shared process-wide state: static fields, registered services, a mocking framework's verifier, and any external resource the methods touch.

<!-- .runsettings -->
<RunSettings>
  <MSTest>
    <Parallelize>
      <Workers>4</Workers>
      <Scope>MethodLevel</Scope>
    </Parallelize>
  </MSTest>
</RunSettings>

[TestClass]
public class CounterTests {
    private static int _counter;

    [TestInitialize] public void Reset() { _counter = 0; }

    [TestMethod] public void IncrementsToOne() {
        _counter++;
        Assert.AreEqual(1, _counter); // sometimes 2 under parallel methods
    }
}

Fix. Pick parallelism that matches the suite's isolation. Method-level requires every test to be a pure function of its inputs. For class-level (one thread per class), keep the static state but ensure each class owns it. For per-test isolation, prefer instance fields and local variables.

<!-- .runsettings -->
<RunSettings>
  <MSTest>
    <Parallelize>
      <Workers>4</Workers>
      <Scope>ClassLevel</Scope>
    </Parallelize>
  </MSTest>
</RunSettings>

With Mergify. Test Insights tags failures that only appear with method-level parallelism and pass under class-level reruns as parallelism-sensitive. The dashboard surfaces the parallel-only signature so shared static or other process-wide mutable state is the obvious lead.

Pattern 6

Async test signatures missing the Task return

Symptom. An async test marked `[TestMethod] public async void` passes locally and produces false positives in CI: a test that should have failed is reported as PASSED.

Root cause. MSTest awaits async test methods only when they return Task or ValueTask. An async void test method returns control to the runner at its first await; the runner marks it PASSED before the async body completes. Any assertion failure inside the body is thrown on the synchronization context or thread pool, where MSTest never sees it.

[TestMethod]
public async void ChargesUser() {
    var ok = await billingService.ChargeAsync(42);
    Assert.IsTrue(ok); // never observed; method returns void immediately
}

Fix. Always return Task from async test methods. Treat async void in test code as a bug; some MSTest versions warn but most pass it through.

[TestMethod]
public async Task ChargesUser() {
    var ok = await billingService.ChargeAsync(42);
    Assert.IsTrue(ok);
}
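The pattern can also be banned mechanically. Assuming the Microsoft.VisualStudio.Threading.Analyzers package is referenced by the test project, its async-void rule can be promoted to a build error in .editorconfig:

```ini
# .editorconfig
# VSTHRD100: "Avoid async void methods"
# (from the Microsoft.VisualStudio.Threading.Analyzers package)
[*.cs]
dotnet_diagnostic.VSTHRD100.severity = error
```

With this in place, an `async void` test method fails the build instead of silently passing in CI.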

With Mergify. Test Insights catches false positives by comparing the test's reported duration to assertions logged. A test that reports PASSED with a sub-millisecond duration despite calling async code is the fingerprint, and the dashboard flags it as suspect-pass.

Pattern 7

DateTime.UtcNow without TimeProvider

Symptom. A test that asserts an event happened "within the last second" passes locally and fails on the slower CI runner with a timestamp 1.2 seconds old.

Root cause. Calling DateTime.UtcNow directly inside production code reads the system clock. A test that asserts on a freshly created timestamp races against scheduler jitter. Locally the assertion fires within a millisecond; on CI the same code runs after a longer pause and the assertion's tolerance window is too tight.

public class AuditLogger {
    public void Log(string evt) => store[evt] = DateTime.UtcNow;
}

[TestMethod]
public void AuditTimestampIsRecent() {
    auditLogger.Log("checkout");
    var logged = store["checkout"];
    Assert.IsTrue((DateTime.UtcNow - logged).TotalMilliseconds < 1000);
    // racy under CI scheduler jitter
}

Fix. Inject TimeProvider (built-in since .NET 8) into the production class so tests can swap in a fixed clock. Production wires TimeProvider.System; tests use FakeTimeProvider from the Microsoft.Extensions.TimeProvider.Testing package, or a small custom subclass when adding the dependency is overkill.

public class AuditLogger {
    private readonly TimeProvider _clock;
    public AuditLogger(TimeProvider clock) { _clock = clock; }
    public void Log(string evt) => store[evt] = _clock.GetUtcNow();
}

[TestMethod]
public void AuditTimestamp() {
    var fake = new FakeTimeProvider(DateTimeOffset.Parse("2026-01-01T00:00:00Z"));
    var logger = new AuditLogger(fake);
    logger.Log("checkout");
    Assert.AreEqual(fake.GetUtcNow(), store["checkout"]);
}
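When adding the Microsoft.Extensions.TimeProvider.Testing dependency is overkill, the custom subclass mentioned above can be this small. A sketch; FixedTimeProvider is an illustrative name:

```csharp
// Minimal fixed-clock TimeProvider: only GetUtcNow is overridden,
// so timers and timestamps fall back to the base implementation.
public sealed class FixedTimeProvider : TimeProvider {
    private readonly DateTimeOffset _now;
    public FixedTimeProvider(DateTimeOffset now) => _now = now;
    public override DateTimeOffset GetUtcNow() => _now;
}

// usage in a test:
// var clock = new FixedTimeProvider(DateTimeOffset.Parse("2026-01-01T00:00:00Z"));
// var logger = new AuditLogger(clock);
```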

With Mergify. Test Insights links timing failures to their CI runner type. When a test only fails on the slower runner pool and never on the laptop pool, the dashboard surfaces the resource sensitivity so the real-clock dependency is the obvious place to look.

Pattern 8

TestProperty rerun configuration hiding real bugs

Symptom. Your build is green. A user hits a bug that should have been caught by a test that ran four times yesterday before passing.

Root cause. Retry plugins for MSTest (and pipeline-level rerun flags) add rerun-on-failure behavior. A real race that loses on attempt 1 and wins on attempt 2 gets reported as PASSED. The bug is still there. The pipeline has just decided not to look at it.

<!-- .runsettings (please don't) -->
<RunSettings>
  <RunConfiguration>
    <TestSessionTimeout>3600000</TestSessionTimeout>
  </RunConfiguration>
  <MSTestRetry>
    <RetryCount>3</RetryCount>
  </MSTestRetry>
</RunSettings>

Fix. Do not retry at the framework level. When a test is genuinely flaky, fix it. When the fix takes longer than a session, quarantine it instead. That keeps the signal visible without blocking the merge queue.

With Mergify. Test Insights reruns at the CI level with attempt-level result tracking. You see that a test passed on attempt 2 of 3, which is exactly the information `MSTestRetry` configurations throw away. Quarantine kicks in once the pattern is clear.

Detection

Catch every MSTest flake in CI

MSTest's `dotnet test` runner emits TRX or JUnit XML reports depending on the logger. Configure the JUnit logger and upload the resulting XML to Mergify with a one-line CLI call. Test Insights builds a confidence score for every test on your default branch. PR runs are compared against that baseline. Anything inconsistent gets flagged in a PR comment before the author merges.

# 1. Add the JUnit logger
dotnet add package JUnitTestLogger

# 2. Emit JUnit XML on every CI run
dotnet test --logger "junit;LogFilePath=junit.xml"

# 3. Upload the result (once, in CI)
curl -sSL https://get.mergify.com/ci | sh
mergify ci junit upload junit.xml

Prevention

Block flaky MSTest tests at PR time

On every PR, Mergify reruns the tests whose confidence is below threshold, without MSTest retry plugins touching your config. The PR gets a comment naming the unreliable tests, their confidence history, and whether the failure on this PR is new or historical noise. Authors fix the real bugs before merge instead of re-running CI until it passes.

Mergify Test Insights Prevention view showing caught flaky MSTest tests per PR

Quarantine

Quarantine without skipping

Once an MSTest test is confirmed flaky, Test Insights quarantines it. The test still runs in the suite, no `[Ignore]` rewrite required, but its result no longer blocks merges or marks the pipeline red. When the pass rate on main recovers, quarantine lifts automatically and the test goes back to being load-bearing.

renders the invoice line: Healthy
login dispatches the right action: Healthy
checkout flow settles the pending promise: Quarantined
rate limiter rejects after 3 requests: Healthy

Want to see which MSTest tests in your repo are already flaky?

Works with the JUnit logger or any JUnit-compatible MSTest reporter. Setup takes under five minutes.

Book a discovery call

Frequently asked questions

Why are my MSTest tests flaky in CI but pass locally?
Your laptop and your CI runner differ in CPU count, parallel scope settings, and which test the runner happened to discover first. MSTest tests that share static state across classes, race on DeploymentItem paths, or read DateTime.UtcNow surface those issues only under CI's tighter resource budget. Reproduce locally with `dotnet test --settings ci.runsettings` to apply the same parallel and discovery configuration as CI, then fix the underlying coupling before pushing.
How do I detect flaky MSTest tests?
MSTest alone cannot tell flaky from broken since each run gives one data point per test. You need to run the same commit multiple times and compare results. Mergify Test Insights does that on every PR and on the default branch, scores each test, and surfaces the tests whose pass rate drops below a confidence threshold.
Do MSTest retry plugins fix flaky tests?
No, they hide them. A test that fails on attempt 1 and passes on attempt 2 is still broken; you have only decided not to look at the failure. Use MSTest retry plugins as a temporary bandage for a test you are actively fixing, never as a permanent policy. For visibility without blocking the merge queue, quarantine instead of retry.
What causes flaky tests in MSTest?
Eight patterns cover most of what we see: AssemblyInitialize racing test discovery, ClassInitialize ordering surprises across parallel classes, DeploymentItem races on shared files, TestContext mutated across tests, parallel execution settings interacting with shared state, async test signatures missing the Task return, DateTime.UtcNow without TimeProvider, and TestProperty rerun configuration hiding real bugs. Each is covered above with a minimal reproducer.
How do I quarantine a flaky MSTest test without deleting it?
Mergify Test Insights quarantines the test automatically once its confidence score drops. The test still runs in the suite, but a failing result no longer blocks merges and its noise no longer drowns out real signal. When the test stabilizes on main, quarantine lifts automatically. No `[Ignore]`, no commented-out tests, no orphaned files.
What's the difference between MSTest method-level and class-level parallelism?
Method-level (`<Scope>MethodLevel</Scope>`) parallelizes individual test methods; class-level (`<Scope>ClassLevel</Scope>`) parallelizes test classes. MSTest still creates a separate test class instance per test method either way, so the main flakiness risk in either mode is shared static state or process-wide resources (files, environment variables, clocks, deployment paths). Method-level is often faster for pure tests; class-level can reduce contention when tests in the same class touch the same shared resources. Pick based on what your tests actually share, and mark conflicting classes `[DoNotParallelize]`.

Ship your MSTest suite green.

2k+ organizations use Mergify to merge 75k+ pull requests a month without breaking main.