
Flaky tests in Go.
Named, fixed, and quarantined.

Flaky Go suites are not random. They follow patterns: t.Parallel races on package state, leaked goroutines, map iteration assumptions, real-clock timing, leaked httptest servers. Name them, fix them, quarantine what is left.
Your CI stays green.

By Rémy Duthu, Software Engineer, CI Insights · Published

Example PR comment from the Mergify bot detecting a flaky Go test and quarantining it automatically.

Why Go is uniquely flaky

Go's testing model is intentionally minimal: a flat _test.go file in each package, a single go test ./... command, parallelism opt-in via t.Parallel(). The minimalism is the appeal. It is also why every flake category boils down to "shared mutable state that the test author did not see as shared."

Three Go-specific facets do most of the damage. Package-level var declarations are global by language design and survive across every test in the package. Goroutines started by a test outlive the test unless the author explicitly waits for them. Maps iterate in unspecified order, and the runtime deliberately randomizes that order to keep developers honest, which means an assertion on the first key of a map becomes a flake the moment the map grows.

The patterns are finite. We've seen the same eight on Mergify Test Insights across hundreds of Go suites: t.Parallel() racing on shared package state, goroutine leaks across test boundaries, map iteration order assumptions, time-based assertions without testing/synctest, httptest.Server connection leaks, TestMain teardown bypassed by os.Exit, subtest plus t.Parallel interleaving surprises, and CI retries hiding flaky Go test failures. Each has a clean fix once you can name it.

The 8 patterns behind most flaky suites

Pattern 1

t.Parallel() racing on shared package state

Symptom. A test that has been green for months starts failing intermittently after another test in the same package adopts t.Parallel(), and the failure mentions a value the failing test never set.

Root cause. Calling t.Parallel() tells go test to run that test alongside the next parallel test in the same package. Anything they share (a package-level var, a singleton client, a shared temp file path, an environment variable read at init) becomes a race. Tests that worked sequentially fail under parallelism without any code change.

var lastUserID string // package-level state

func TestCreateUser(t *testing.T) {
	t.Parallel()
	lastUserID = createUser(t).ID
	if lastUserID == "" {
		t.Fatal("expected a user ID")
	}
}

func TestNotifyLastUser(t *testing.T) {
	t.Parallel()
	// races with TestCreateUser; sometimes "" sometimes a real ID
	notify(lastUserID)
}

Fix. Treat package-level mutable state as a smell in tested packages. Pass values through *testing.T helpers or local variables so each parallel test owns what it touches. For unavoidable shared external state (a database, a temp directory), give each parallel test its own isolated slice of that state: a directory from t.TempDir(), a row scoped by t.Name(), or a transactional fixture.

func TestCreateUser(t *testing.T) {
	t.Parallel()
	id := createUser(t).ID
	if id == "" {
		t.Fatal("expected a user ID")
	}
}

func TestNotifyUser(t *testing.T) {
	t.Parallel()
	id := createUser(t).ID
	notify(id)
}
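
The same isolation idea applies when the shared state is a file rather than a variable. A minimal sketch, assuming file-based config is the contested resource (TestWriteConfig and config.json are illustrative names; imports are os and path/filepath):

func TestWriteConfig(t *testing.T) {
	t.Parallel()
	dir := t.TempDir() // unique per test, removed automatically after the test ends
	path := filepath.Join(dir, "config.json")
	if err := os.WriteFile(path, []byte(`{"plan":"pro"}`), 0o600); err != nil {
		t.Fatal(err)
	}
	// Every other parallel test gets its own t.TempDir(), so no path is ever shared.
}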

With Mergify. Test Insights catches the parallel-only signature: a test only fails when run with -parallel > 1 and passes under -parallel=1 reruns. The dashboard tags it as parallelism-sensitive so the shared-state root cause is the obvious lead.

Pattern 2

Goroutine leaks across test boundaries

Symptom. A test passes, the next test fails with `connection refused` or a panic from a goroutine that the failing test never started.

Root cause. Tests start goroutines (a worker, a server, a poller) and finish before those goroutines do. The runtime keeps them running into the next test, where they continue to push to a channel, write to a file, or hit a network endpoint that the new test owns. The new test sees behavior it cannot explain.

func TestPollMetrics(t *testing.T) {
	go func() {
		for {
			time.Sleep(10 * time.Millisecond)
			metrics.Push() // still running after t returns
		}
	}()
	// no cleanup
}

func TestMetricsAreEmpty(t *testing.T) {
	if got := metrics.Len(); got != 0 {
		t.Fatalf("expected empty, got %d", got) // sometimes fails
	}
}

Fix. Use a context.WithCancel tied to t.Cleanup so every goroutine the test starts is cancelled when the test ends. Add go.uber.org/goleak in TestMain to fail loudly when a leak ships.

func TestPollMetrics(t *testing.T) {
	ctx, cancel := context.WithCancel(context.Background())
	t.Cleanup(cancel)
	go func() {
		ticker := time.NewTicker(10 * time.Millisecond)
		defer ticker.Stop()
		for {
			select {
			case <-ctx.Done():
				return
			case <-ticker.C:
				metrics.Push()
			}
		}
	}()
}
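
For the goleak half of the fix, the wiring is a single call in TestMain. A minimal sketch, assuming the package does not already define a TestMain:

import (
	"testing"

	"go.uber.org/goleak"
)

func TestMain(m *testing.M) {
	// Runs the suite, then fails the package if any goroutine started
	// by a test is still alive once the tests finish.
	goleak.VerifyTestMain(m)
}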

With Mergify. Test Insights surfaces the cross-test signature: test B fails consistently when run after test A and never alone. The dashboard groups failures by their predecessor so the leaking-goroutine pattern is easy to find.

Pattern 3

Map iteration order assumptions

Symptom. A test passes hundreds of times locally and fails once a week on CI with an assertion against the first element of a slice built from a map.

Root cause. Go intentionally randomizes map iteration order to discourage code that depends on it. A test that iterates a map and asserts on the first value implicitly assumes an order Go does not promise. The order is re-randomized on every iteration, so which runs hit the unhappy ordering is pure chance: the test can stay green for a long stretch and then fail on the one run that matters.

func TestNamesContainsAlice(t *testing.T) {
	users := map[string]int{"alice": 1, "bob": 2, "carol": 3}
	first := ""
	for name := range users {
		first = name
		break
	}
	if first != "alice" {
		t.Fatalf("expected alice, got %q", first) // fails ~66% of the time
	}
}

Fix. Sort before asserting. Build a slice from the map's keys (or values), sort it (slices.Sorted(maps.Keys(m)) on Go 1.23+, sort.Strings on older versions), then compare. For "the map contains X" tests, prefer an explicit lookup over iterating.

func TestNamesContainsAlice(t *testing.T) {
	users := map[string]int{"alice": 1, "bob": 2, "carol": 3}
	if _, ok := users["alice"]; !ok {
		t.Fatal("expected alice in users")
	}
}

// or, when order matters, sort:
func TestNamesSorted(t *testing.T) {
	users := map[string]int{"alice": 1, "bob": 2, "carol": 3}
	names := slices.Sorted(maps.Keys(users))
	if !slices.Equal(names, []string{"alice", "bob", "carol"}) {
		t.Fatalf("got %v", names)
	}
}
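
On Go versions that predate the slices and maps packages, the same fix can be written with sort and reflect from the standard library; a sketch:

func TestNamesSortedLegacy(t *testing.T) {
	users := map[string]int{"alice": 1, "bob": 2, "carol": 3}
	names := make([]string, 0, len(users))
	for name := range users {
		names = append(names, name)
	}
	sort.Strings(names) // deterministic order before asserting
	if !reflect.DeepEqual(names, []string{"alice", "bob", "carol"}) {
		t.Fatalf("got %v", names)
	}
}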

With Mergify. Test Insights tracks pass rates per test on the default branch. A test that fails once every few hundred runs with a string-comparison error against a known map iteration is the textbook signature; the dashboard surfaces it under map-iteration confidence drops.

Pattern 4

Time-based assertions without testing/synctest

Symptom. A test that asserts something happens "after 100ms" passes locally and fails on the slow CI runner with a tick that arrived a millisecond late.

Root cause. time.Sleep and time.NewTimer use the real clock. A test that sleeps 100ms and asserts a side effect happened is racing the scheduler. Locally the spare CPU finishes in 102ms; on CI the throttled runner finishes in 105ms and the assertion fires before the side effect lands.

func TestRefreshFiresEvery100ms(t *testing.T) {
	count := 0
	stop := startRefresher(func() { count++ }, 100*time.Millisecond)
	defer stop()
	time.Sleep(250 * time.Millisecond)
	if count < 2 {
		t.Fatalf("expected at least 2 ticks, got %d", count) // sometimes 1 on CI
	}
}

Fix. Use testing/synctest to virtualize time inside the test: time advances when every goroutine in the bubble is blocked on something time-dependent, so a 100ms sleep is instant and exact. The package shipped as an experiment in Go 1.24 (enable it with GOEXPERIMENT=synctest) and is stable from Go 1.25. For older Go, inject a Clock interface and substitute a fake.

func TestRefreshFiresEvery100ms(t *testing.T) {
	synctest.Run(func() {
		count := 0
		stop := startRefresher(func() { count++ }, 100*time.Millisecond)
		defer stop()
		time.Sleep(250 * time.Millisecond) // virtual time, deterministic
		if count != 2 {
			t.Fatalf("expected exactly 2 ticks, got %d", count)
		}
	})
}
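
For codebases that cannot use testing/synctest yet, the usual substitute is a small clock abstraction; a minimal sketch, where Clock, realClock, and the assumption that startRefresher accepts one are all illustrative:

// Clock is the only time dependency the code under test sees.
type Clock interface {
	Now() time.Time
	After(d time.Duration) <-chan time.Time
}

type realClock struct{}

func (realClock) Now() time.Time { return time.Now() }
func (realClock) After(d time.Duration) <-chan time.Time { return time.After(d) }

// In tests, a fake Clock returns a channel the test fires on demand,
// so no assertion ever waits on the real scheduler.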

With Mergify. Test Insights links the failure to its CI runner type. When a timing test only fails on the slower runner pool and never on the laptop pool, the dashboard surfaces the resource sensitivity so the real-clock dependency is the obvious culprit.

Pattern 5

httptest.Server connection leaks

Symptom. A long-running test suite eventually fails with `dial tcp: too many open files` or a flaky test panics with `address already in use`.

Root cause. httptest.NewServer binds a port and starts a goroutine. Forgetting to call srv.Close() leaks both. After enough tests, the worker hits the file-descriptor limit. The first test to run after that point fails with an unrelated socket error.

func TestUploadHandler(t *testing.T) {
	srv := httptest.NewServer(uploadHandler())
	// no defer srv.Close()
	resp, _ := http.Post(srv.URL, "application/octet-stream", strings.NewReader("hi"))
	if resp.StatusCode != 200 {
		t.Fatalf("got %d", resp.StatusCode)
	}
}

Fix. Always defer srv.Close(), or register it with t.Cleanup so the cleanup survives early returns. Pair with resp.Body.Close() for every HTTP response to release the underlying connection.

func TestUploadHandler(t *testing.T) {
	srv := httptest.NewServer(uploadHandler())
	t.Cleanup(srv.Close)
	resp, err := http.Post(srv.URL, "application/octet-stream", strings.NewReader("hi"))
	if err != nil {
		t.Fatal(err)
	}
	defer resp.Body.Close()
	if resp.StatusCode != 200 {
		t.Fatalf("got %d", resp.StatusCode)
	}
}
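
Many suites centralize the pattern in a helper so no call site can forget the cleanup; a small sketch (newTestServer is a hypothetical name):

func newTestServer(t *testing.T, h http.Handler) *httptest.Server {
	t.Helper()
	srv := httptest.NewServer(h)
	t.Cleanup(srv.Close) // tied to the test's lifetime, survives early returns
	return srv
}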

With Mergify. Test Insights groups failures whose only signature is `too many open files` or `address already in use` into a single fd-exhaustion bucket. The dashboard surfaces the test that first triggered the leak rather than the random victim that ran when fds ran out.

Pattern 6

TestMain teardown bypassed by os.Exit

Symptom. A test suite that uses TestMain to spin up a database container leaves the container running after a failure, and the next CI run fails with `port already allocated`.

Root cause. TestMain patterns that defer teardown and then call os.Exit(code) never run the deferred function: os.Exit terminates the process without unwinding the stack. The same trap shows up for code under test that calls log.Fatal or os.Exit directly, or for a goroutine that panics outside any test function and crashes the whole process before m.Run() returns.

func TestMain(m *testing.M) {
	pool := startTestDB()
	defer pool.Close() // SKIPPED: os.Exit never runs deferred funcs
	os.Exit(m.Run())
}

Fix. Move os.Exit into a wrapper that returns the exit code from a function whose deferred teardown runs first. Inside test bodies, replace log.Fatal with t.Fatal so the runner can clean up; for goroutines started in tests, recover and report through t.Errorf instead of letting the panic crash the process.

func TestMain(m *testing.M) {
	os.Exit(run(m))
}

func run(m *testing.M) int {
	pool := startTestDB()
	defer pool.Close() // runs before os.Exit
	return m.Run()
}
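
For the goroutine half of that advice, recovering inside the goroutine and reporting through t.Errorf keeps a panic from killing the process mid-suite; a minimal sketch, with runWorker standing in for whatever the goroutine does:

func TestWorkerSurvivesPanic(t *testing.T) {
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		defer func() {
			if r := recover(); r != nil {
				// t.Errorf is safe to call from other goroutines; t.Fatal is not.
				t.Errorf("worker panicked: %v", r)
			}
		}()
		runWorker()
	}()
	wg.Wait()
}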

With Mergify. Test Insights notices that the first test of a CI run fails for the same `port already allocated` reason and the failure correlates with a panic on the previous run. The dashboard surfaces the cross-run signature so the missing teardown is easy to spot.

Pattern 7

Subtest plus t.Parallel interleaving surprises

Symptom. A table-driven test with `t.Parallel()` inside the subtest passes locally and fails on CI with subtests sharing the same loop-variable state.

Root cause. t.Run with t.Parallel() inside delays the subtest until the parent finishes its body, then runs all parallel subtests together. Pre-Go 1.22 this combined with loop variable capture to give every subtest the same value of the loop variable. Go 1.22+ fixed the loop-var semantics but the parent-finishes-first ordering still surprises people who assume subtests run in declaration order.

func TestPricing(t *testing.T) {
	cases := []struct{ plan string; want int }{
		{"free", 0}, {"pro", 99}, {"enterprise", 999},
	}
	for _, tc := range cases {
		t.Run(tc.plan, func(t *testing.T) {
			t.Parallel()
			got := priceFor(tc.plan)
			if got != tc.want {
				// pre-Go 1.22: tc was the last loop value for every subtest
				t.Fatalf("plan %q: want %d, got %d", tc.plan, tc.want, got)
			}
		})
	}
}

Fix. Stay on Go 1.22 or newer where loop variables are scoped per iteration. If you cannot upgrade, capture the loop variable explicitly (tc := tc) before the subtest body. For tests that need to share setup across subtests, run setup in the parent and pass it explicitly through closure.

func TestPricing(t *testing.T) {
	cases := []struct{ plan string; want int }{
		{"free", 0}, {"pro", 99}, {"enterprise", 999},
	}
	for _, tc := range cases {
		// Go 1.22+: tc is per-iteration, no manual shadow needed
		t.Run(tc.plan, func(t *testing.T) {
			t.Parallel()
			if got := priceFor(tc.plan); got != tc.want {
				t.Fatalf("plan %q: want %d, got %d", tc.plan, tc.want, got)
			}
		})
	}
}
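
If the suite cannot move off pre-1.22 Go yet, the classic shadow keeps the same structure working; a sketch of the loop body only:

	for _, tc := range cases {
		tc := tc // capture a per-iteration copy before the parallel subtest starts
		t.Run(tc.plan, func(t *testing.T) {
			t.Parallel()
			if got := priceFor(tc.plan); got != tc.want {
				t.Fatalf("plan %q: want %d, got %d", tc.plan, tc.want, got)
			}
		})
	}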

With Mergify. Test Insights groups failing subtests by their parent. When every subtest of TestPricing fails with the same wrong value, the dashboard surfaces the loop-variable signature so the upgrade or shadow fix lands once.

Pattern 8

CI retries hiding flaky Go test failures

Symptom. Your first CI run fails, a retry passes, and the pipeline reports green. A user still hits the race in production.

Root cause. go test -count=N reruns tests N times and exits non-zero if any run fails, so it tends to expose flakes rather than mask them. The masking happens when CI or shell scripts retry the whole suite and only keep the final result. That turns "failed once, passed later" into "green" even though the underlying race or order dependency is still there. Cached results are a separate concern; use -count=1 when you need a fresh run.

# CI script (please don't)
go test ./... || go test ./... || go test ./...

Fix. Do not hide failures behind blind suite retries. Let the first failing attempt stay visible, then fix the flaky test or quarantine it explicitly while you work on the root cause. If you need a fresh run instead of cached results, use go test -count=1 without treating later retries as success.

With Mergify. Test Insights reruns at the CI level with attempt-by-attempt result tracking. You can see that a test failed on attempt 1 and passed on attempt 2, which is exactly the signal that naive shell retries would hide. Quarantine kicks in once the pattern is clear.

Detection

Catch every Go flake in CI

Add gotestsum (or any JUnit-emitting Go test runner) to your CI script, point it at your packages, and upload the resulting XML to Mergify with a one-line CLI call. Test Insights builds a confidence score for every test on your default branch. PR runs are compared against that baseline. Anything inconsistent gets flagged in a PR comment before the author merges.

# 1. Install gotestsum (one-time)
go install gotest.tools/gotestsum@latest

# 2. Emit JUnit XML on every CI run
gotestsum --junitfile junit.xml -- -race -shuffle=on ./...

# 3. Upload the result (once, in CI)
curl -sSL https://get.mergify.com/ci | sh
mergify ci junit upload junit.xml

Prevention

Block flaky Go tests at PR time

On every PR, Mergify reruns the tests whose confidence is below threshold, without you having to add `go test -count=N` to your CI config. The PR gets a comment naming the unreliable tests, their confidence history, and whether the failure on this PR is new or historical noise. Authors fix the real bugs before merge instead of re-running CI until it passes.

Mergify Test Insights Prevention view showing caught flaky Go tests per PR

Quarantine

Quarantine without skipping

Once a Go test is confirmed flaky, Test Insights quarantines it. The test still runs in the suite, no `t.Skip()` rewrite required, but its result no longer blocks merges or marks the pipeline red. When the pass rate on main recovers, quarantine lifts automatically and the test goes back to being load-bearing.

Example Test Insights quarantine view: "checkout flow settles the pending promise" is Quarantined, while "renders the invoice line", "login dispatches the right action", and "rate limiter rejects after 3 requests" stay Healthy.

Want to see which Go tests in your repo are already flaky?

Works with gotestsum or any JUnit-compatible Go test runner. Setup takes under five minutes.

Book a discovery call

Frequently asked questions

Why are my Go tests flaky in CI but pass locally?
Your laptop and your CI runner differ in CPU count, parallelism, and the random seed Go uses for map iteration and goroutine scheduling. Tests that race on package-level state, leak goroutines into the next test, or assume a map iteration order surface those issues only under CI's tighter resource budget. Reproduce locally with `go test -race -count=10 -shuffle=on ./...` and let the race detector and shuffler push the failure into the open before you push.
How do I detect flaky Go tests?
Go alone cannot tell flaky from broken since each run gives one data point per test. You need to run the same commit multiple times and compare results. Mergify Test Insights does that on every PR and on the default branch, scores each test, and surfaces the tests whose pass rate drops below a confidence threshold.
Does `go test -count=N` fix flaky tests?
No. Running each test N times with `-count=N` is a diagnostic: the run fails if any attempt fails, so it exposes flakes rather than fixing them. The hiding comes from blindly retrying the whole suite in CI and keeping only the final result. A test that fails on attempt 1 and passes on attempt 2 is still broken; you have only decided not to look at the failure. Use reruns while you are actively fixing a test, never as a permanent policy, and for visibility without blocking the merge queue, quarantine instead of retry.
What causes flaky tests in Go?
Eight patterns cover most of what we see: t.Parallel() racing on shared package state, goroutine leaks across test boundaries, map iteration order assumptions, time-based assertions without testing/synctest, httptest.Server connection leaks, TestMain teardown bypassed by os.Exit, subtest plus t.Parallel interleaving surprises, and CI retries hiding flaky Go test failures. Each is covered above with a minimal reproducer.
How do I quarantine a flaky Go test without deleting it?
Mergify Test Insights quarantines the test automatically once its confidence score drops. The test still runs in the suite, but a failing result no longer blocks merges and its noise no longer drowns out real signal. When the test stabilizes on main, quarantine lifts automatically. No `t.Skip()`, no commented-out tests, no orphaned files.
Why does my Go test pass with -parallel=1 but fail without it?
Because `t.Parallel()` runs that test alongside any other parallel test in the same package, and they share package-level state by default. A package-level `var`, an `init()`-time singleton, an environment variable, or a shared external resource (a file, a port, a database row) becomes a race the moment two parallel tests touch it. Move the state inside the test body (`t.TempDir()`, `t.Setenv()`, a per-test row scoped by `t.Name()`) and the parallel mode stops mattering.

Ship your Go suite green.

2k+ organizations use Mergify to merge 75k+ pull requests a month without breaking main.