Flaky tests in RSpec.
Named, fixed, and quarantined.

Flaky RSpec suites are not random. They follow patterns: order dependencies, database_cleaner mismatches, lazy let surprises, Timecop freezes, constant leakage. Name them, fix them, quarantine what is left.
Your CI stays green.

By Rémy Duthu, Software Engineer, CI Insights · Published April 2026

Why RSpec is uniquely flaky

RSpec gives Ruby teams a beautifully expressive DSL: let, subject, shared_examples, before blocks at every nesting level, hooks for the suite and the example. Every one of those primitives makes the happy spec path readable. Each one also creates a place where state can persist longer than you intended.

Ruby itself amplifies the surface. Constants and class variables are global by default. Loading a gem can monkey-patch core classes. database_cleaner has three strategies that interact differently with Capybara's separate connection in a JS-driven browser session. RSpec's random ordering exists precisely because the language makes hidden coupling so easy to introduce, and the random seed surfaces it.

The patterns are finite. We've seen the same eight on Mergify Test Insights across hundreds of RSpec suites: hidden order dependencies under --order random, database_cleaner strategy mismatches with JS-driven specs, let vs let! lazy memoization surprises, Timecop freezes that forgot to reset, class variable and constant leakage between specs, shared_examples coupling that hides state, Capybara hardcoded sleeps that race the page, and rspec-retry hiding real bugs. Each has a clean fix once you can name it.

The 8 patterns behind most flaky suites

Pattern 1

Hidden order dependencies under --order random

Symptom. A spec that has passed for months suddenly fails on a green main commit, with a stack trace pointing at a model the failing spec never touches.

Root cause. When RSpec runs in random order (config.order = :random in spec_helper.rb or --order random in .rspec, which most teams enable), each CI run uses a different seed, and some seeds expose hidden dependencies. A spec that mutates a class variable, registers a Sidekiq worker, or stubs a constant without resetting it leaves that mutation in place for whatever spec runs next. With a different seed, the dependent pair runs in a different order and the failure jumps to a new spec.

# spec/models/user_spec.rb
RSpec.describe User do
  it "registers the welcome callback" do
    User.register_callback(:welcome) { |u| WelcomeMailer.deliver(u) }
    # never unregistered
  end
end

# spec/models/order_spec.rb
RSpec.describe Order do
  it "creates the order" do
    create(:user) # User.callbacks now contains :welcome from the previous file
    # Sidekiq.inline! triggers WelcomeMailer in a context that lacks the deps
  end
end
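
Replaying the seed reproduces the failure because a seeded shuffle is deterministic: the same seed always yields the same order. The sketch below illustrates that idea with plain Ruby's Array#shuffle and a seeded Random. It is not RSpec's internal ordering code, and the file names and the run_order helper are made up for the example.

```ruby
# Illustration only: a seeded shuffle, not RSpec's internal ordering algorithm.
# The spec file names and run_order are hypothetical.
SPEC_FILES = %w[
  spec/models/user_spec.rb
  spec/models/order_spec.rb
  spec/models/invoice_spec.rb
  spec/models/cart_spec.rb
].freeze

# Returns the files in an order fully determined by the seed.
def run_order(files, seed)
  files.shuffle(random: Random.new(seed))
end

first_run  = run_order(SPEC_FILES, 12345)
second_run = run_order(SPEC_FILES, 12345)

# Same seed, same order: the failing sequence can be replayed on a laptop.
puts first_run == second_run
```

A different seed will usually produce a different order, which is why the failure appears to move between specs from one CI run to the next.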

Fix. Reproduce with the failing seed (rspec --seed 12345), then run rspec --seed 12345 --bisect to find the minimal pair. Once you have the dependency, fix the leak: a class-level reset hook, a DatabaseCleaner strategy that covers the mutated table, or a stub_const instead of a permanent reassignment.

# spec/spec_helper.rb
RSpec.configure do |config|
  config.before(:each) { User.callbacks.clear }
end

# Or in the offending spec, scope the mutation
it "registers the welcome callback" do
  original = User.callbacks.dup
  User.register_callback(:welcome) { |u| WelcomeMailer.deliver(u) }
  expect(User.callbacks).to include(:welcome)
ensure
  User.callbacks.replace(original)
end

With Mergify. Test Insights records the seed for every run and groups failures that share a seed range. When a spec fails only under specific seeds, the dashboard surfaces the ordering signature so the dependency is visible without manual bisect.

Pattern 2

database_cleaner strategy mismatches with JS-driven specs

Symptom. Feature specs with `js: true` see a half-empty database: records the test created vanish before assertions run, or records from previous specs appear when the suite hits a JS-driven spec.

Root cause. The transaction strategy is fast and isolated for unit specs because each example runs inside a transaction that rolls back. With a JS driver (Selenium, Cuprite, Playwright-Ruby), the browser talks to an app server that Capybara runs in a separate thread, and that server can use a different database connection from the test. A separate connection cannot see uncommitted data from the test's transaction. The spec writes a record, the browser cannot find it, and the assertion fails. Or the suite uses truncation globally and pays the deletion cost on every unit spec for the few JS specs that need it.

# rails_helper.rb
RSpec.configure do |config|
  config.before(:suite) { DatabaseCleaner.strategy = :transaction }
end

# spec/features/checkout_spec.rb
feature "checkout", js: true do
  scenario "user buys a thing" do
    user = create(:user) # uncommitted: lives only inside the test transaction
    visit "/login"
    fill_in :email, with: user.email # browser cannot see this user
    click_button "Sign in" # fails with "Invalid credentials"
  end
end

Fix. Pick a strategy per spec type. Use transaction for unit and request specs, truncation or deletion for any spec tagged js: true. RSpec's metadata-filtered hooks let you switch the Database Cleaner strategy per example.

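RSpec.configure do |config|
  # Database Cleaner's docs recommend turning off rspec-rails' transactional
  # fixtures so the two do not both manage transactions.
  config.use_transactional_fixtures = false

  config.before(:suite) { DatabaseCleaner.clean_with(:truncation) }
  config.before(:each) { DatabaseCleaner.strategy = :transaction }
  config.before(:each, js: true) { DatabaseCleaner.strategy = :truncation }
  config.before(:each) { DatabaseCleaner.start }
  config.append_after(:each) { DatabaseCleaner.clean }
end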

With Mergify. Test Insights catches the signature: a feature spec that fails consistently under JS but passes when re-run alone (the truncation from the previous run cleared the connection issue). The dashboard tags the affected specs by their `:js` metadata so the strategy mismatch is the obvious culprit.

Pattern 3

let vs let! lazy memoization surprises

Symptom. A spec that asserts on a side effect of `create(:user)` passes when run alone and fails when run inside a `describe` block that also references `user` lazily.

Root cause. let is lazy: the block runs the first time the helper is called inside an example, not before. let! runs in a before hook, eagerly. A spec that asserts on a count of users without ever referencing user directly will see zero users, because the lazy let(:user) never fires. The memoized value is reset for each example, but a row that create wrote is not, unless the database is cleaned between examples. Add a sibling spec that does reference user, let its record outlive the example, and the count is suddenly one.

RSpec.describe UserCounter do
  let(:user) { create(:user) } # lazy: only runs if 'user' is called

  it "counts users" do
    # never references 'user'; create never fires
    expect(UserCounter.count).to eq(0) # passes here
  end

  it "creates a counter for the user" do
    expect(user.counter).to be_present # 'user' is called, create fires
  end
  # If this spec runs first and its user row is not cleaned up, the 'count' assertion fails
end
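
To make the lazy-versus-eager distinction concrete, here is a minimal plain-Ruby sketch of the mechanism. TinyExample is a hypothetical stand-in written for this illustration. It is not RSpec's implementation, only the same idea: let stores a block and runs it on first access, let! forces it before the body.

```ruby
# Illustration only: a tiny stand-in for let / let!, not RSpec's implementation.
class TinyExample
  attr_reader :log

  def initialize
    @definitions = {} # name => block
    @memo = {}        # name => computed value for this example
    @eager = []       # names that must run before the example body
    @log = []
  end

  # Lazy: store the block and run it on first access only.
  def let(name, &block)
    @definitions[name] = block
    define_singleton_method(name) do
      @memo.fetch(name) { @memo[name] = instance_exec(&@definitions[name]) }
    end
  end

  # Eager: same as let, but also forced before the body runs.
  def let!(name, &block)
    let(name, &block)
    @eager << name
  end

  def run(&body)
    @eager.each { |name| public_send(name) }
    instance_exec(&body)
  end
end

lazy = TinyExample.new
lazy.let(:user) { log << :user_created; "alice" }
lazy.run { } # the body never calls user, so the block never runs
p lazy.log   # []

eager = TinyExample.new
eager.let!(:user) { log << :user_created; "alice" }
eager.run { } # the block ran before the body
p eager.log   # [:user_created]
```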

Fix. Use let! when the side effect of the factory is part of what you are asserting on. Reserve plain let for values you need to reference inside the example body. If both specs in the same context disagree on whether the user should exist, split them into separate describe blocks.

RSpec.describe UserCounter do
  context "with no users" do
    it "counts zero" do
      expect(UserCounter.count).to eq(0)
    end
  end

  context "with one user" do
    let!(:user) { create(:user) } # eager

    it "counts one" do
      expect(UserCounter.count).to eq(1)
    end

    it "creates a counter" do
      expect(user.counter).to be_present
    end
  end
end

With Mergify. Test Insights spots the pair-dependent failure pattern: spec A only fails when run after spec B in the same describe context. The dashboard groups by enclosing context so the let-vs-let! mistake is easy to find.

Pattern 4

Timecop freezes that forgot to reset

Symptom. A spec that calls Timecop.freeze passes, and the next spec that touches `Time.now` fails with a date months in the past or future.

Root cause. Timecop.freeze without a paired Timecop.return leaves the global clock frozen for every subsequent example in the worker. A spec that asserts on a token's expiry, a cron-like trigger, or a "now is between X and Y" check will see a stale clock and fail in ways that look completely unrelated.

it "expires invitations after 7 days" do
  Timecop.freeze(Date.new(2026, 1, 1)) # no block: clock stays frozen
  invitation = create(:invitation)
  Timecop.travel(8.days)
  expect(invitation).to be_expired
  # missing Timecop.return
end

it "creates a session token valid for an hour" do
  token = SessionToken.new(user)
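  # Time.now still reports early January 2026 because the previous example never reset it,
  # so anything checked against a real clock (a database NOW(), a signed token's exp) is off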
  expect(token.expires_at).to be_within(1.minute).of(1.hour.from_now)
end
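
The difference between a bare freeze and the block form comes down to who is responsible for restoring the clock. The sketch below uses a hypothetical FakeClock module, written only for this illustration and unrelated to Timecop's internals, to show how an ensure clause restores state when the block exits while a bare call leaks.

```ruby
# Illustration only: a hypothetical FakeClock, not Timecop's implementation.
module FakeClock
  @frozen_at = nil

  class << self
    # Returns the frozen time if one is set, otherwise the real clock.
    def now
      @frozen_at || Time.now
    end

    # Without a block, the freeze stays in place until reset is called.
    # With a block, ensure restores the previous state even if the block raises.
    def freeze_at(time)
      previous = @frozen_at
      @frozen_at = time
      return unless block_given?

      begin
        yield
      ensure
        @frozen_at = previous
      end
    end

    def reset
      @frozen_at = nil
    end

    def frozen?
      !@frozen_at.nil?
    end
  end
end

FakeClock.freeze_at(Time.utc(2026, 1, 1)) # no block: leaks into whatever runs next
leaked = FakeClock.frozen?
FakeClock.reset

FakeClock.freeze_at(Time.utc(2026, 1, 1)) { FakeClock.now } # block form
scoped = FakeClock.frozen?

puts "frozen after bare call: #{leaked}, after block form: #{scoped}"
# prints "frozen after bare call: true, after block form: false"
```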

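Fix. Always pair Timecop.freeze with Timecop.return, ideally via the block form, which restores the clock when the block exits. For Rails 5.1+, prefer ActiveSupport::Testing::TimeHelpers: the block form of travel_to restores the clock on exit, and a travel_back in an after hook covers any call made without a block.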

# rails_helper.rb
RSpec.configure do |config|
  config.include ActiveSupport::Testing::TimeHelpers
  config.after(:each) { travel_back } # belt-and-braces
end

it "expires invitations after 7 days" do
  travel_to Date.new(2026, 1, 1) do
    invitation = create(:invitation)
    travel 8.days
    expect(invitation).to be_expired
  end
end

With Mergify. Test Insights shows the cross-spec time signature: a spec fails only when run after a known time-mutating spec, and only when assertions touch the clock. The dashboard surfaces the ordering so the missed Timecop.return is easy to locate.

Pattern 5

Class variable and constant leakage

Symptom. A spec that stubs a constant or assigns to a class variable passes, then a spec ten files away fails with a value it could not have produced.

Root cause. In Ruby, reassigning a constant (FOO = bar) changes it for the whole process and triggers an "already initialized constant" warning, not an error. Class variables (@@cache) live for the life of the process. A spec that reassigns either keeps the change for every spec the worker runs after it. remove_const and manual reset are easy to forget.

RSpec.describe Pricing do
  it "discounts in the test environment" do
    Pricing::DISCOUNT = 0.5 # warning, not an error
    expect(Pricing.for(:pro)).to eq(49)
  end
end

# spec/models/order_spec.rb (loaded later in the same worker)
RSpec.describe Order do
  it "totals at list price" do
    expect(Order.new(plan: :pro).total).to eq(99)
    # Pricing::DISCOUNT is still 0.5; total is 49
  end
end

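Fix. Use stub_const for constants; RSpec restores the original value at the end of the example. RSpec has no built-in equivalent for class variables, so save the original and put it back in an after or around hook (class_variable_get and class_variable_set), or wrap that in a small helper.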

it "discounts in the test environment" do
  stub_const("Pricing::DISCOUNT", 0.5)
  expect(Pricing.for(:pro)).to eq(49)
  # constant reverts at end of example
end
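
For class variables, one option is a small save-and-restore helper. The sketch below is only an illustration: with_class_variable and PriceCache are hypothetical names invented here, while class_variable_get, class_variable_set, and class_variable_defined? are standard Ruby Module methods.

```ruby
# Hypothetical helper: temporarily override a class variable and always restore it.
def with_class_variable(klass, name, value)
  original = klass.class_variable_get(name)
  klass.class_variable_set(name, value)
  yield
ensure
  # Runs even when the block raises, for example on a failed expectation.
  klass.class_variable_set(name, original) if klass.class_variable_defined?(name)
end

# Stand-in for application code that keeps state in a class variable.
class PriceCache
  @@cache = { pro: 99 }

  def self.lookup(plan)
    @@cache.fetch(plan)
  end
end

inside = with_class_variable(PriceCache, :@@cache, { pro: 49 }) { PriceCache.lookup(:pro) }
after  = PriceCache.lookup(:pro)
puts "inside: #{inside}, after: #{after}" # prints "inside: 49, after: 99"
```

In a spec, a helper like this could be called from an around hook that wraps example.run, so the restore happens no matter how the example ends.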

With Mergify. Test Insights groups the downstream failures by the spec they all follow. When five seemingly unrelated specs fail only after a specific constant-mutating spec runs first, the dashboard surfaces the upstream culprit.

Pattern 6

shared_examples coupling that hides state

Symptom. A `shared_examples` block that worked in two contexts breaks when included in a third, with errors that mention `let` helpers the new context never defined.

Root cause. shared_examples is included into the calling context with full access to its let definitions. A shared block that calls user.email assumes every including context defines a user helper. The first two contexts happened to. The third does not, or defines it with a different shape, and the failure looks like a bug in the shared block when it is really an undeclared dependency.

shared_examples "an authorized request" do
  it "returns 200" do
    sign_in user # depends on a 'user' let
    get path
    expect(response).to have_http_status(:ok)
  end
end

RSpec.describe "GET /admin" do
  let(:user) { create(:admin) }
  let(:path) { "/admin" }
  it_behaves_like "an authorized request" # works
end

RSpec.describe "GET /reports" do
  # forgot to define 'user'
  let(:path) { "/reports" }
  it_behaves_like "an authorized request" # NameError: undefined local variable or method 'user'
end

Fix. Make the shared block take its dependencies as parameters, or document the required let helpers at the top of the block. shared_examples_for with explicit parameters surfaces the contract.

shared_examples "an authorized request" do |user_factory:|
  let(:request_user) { create(user_factory) }

  it "returns 200" do
    sign_in request_user
    get path
    expect(response).to have_http_status(:ok)
  end
end

it_behaves_like "an authorized request", user_factory: :admin

With Mergify. Failures inside shared examples carry the including context's location in the trace. Test Insights groups them by the shared block's name so a single drift fix lands in the right place instead of N different specs.

Pattern 7

Capybara hardcoded sleeps that race the page

Symptom. A feature spec passes locally on a fast machine and fails in CI with `ElementNotFound`, even though the element is clearly there in the screenshot.

Root cause. Capybara's finders and matchers (find, have_content) already retry up to Capybara.default_max_wait_time. Sprinkling sleep 1 in front of an action is a sign the spec author lost the wait-vs-action argument. Under CI's tighter resource budget, the manual sleep finishes before the page is ready and the next click misses.

feature "user signs up" do
  scenario "via the modal" do
    visit "/"
    click_link "Sign up"
    sleep 1 # wait for the modal
    fill_in :email, with: "user@example.com"
    # In CI the modal sometimes is not open yet; fill_in raises ElementNotFound
  end
end

Fix. Let Capybara wait. Use find on the action target so Capybara polls until the element appears, or assert on the modal first so the built-in wait does the work.

feature "user signs up" do
  scenario "via the modal" do
    visit "/"
    click_link "Sign up"
    expect(page).to have_selector("[role=dialog]") # waits, fails fast if missing
    within "[role=dialog]" do
      fill_in :email, with: "user@example.com"
    end
  end
end
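
The reason polling beats a fixed sleep is that it re-checks the condition until a deadline instead of guessing one duration up front. The sketch below is a generic polling helper in plain Ruby, written for illustration. wait_until and WaitTimeout are hypothetical names, and this is not Capybara's internal code.

```ruby
# Illustration only: a generic polling helper, not Capybara's internal code.
class WaitTimeout < StandardError; end

# Re-checks the block until it returns a truthy value or the deadline passes.
def wait_until(timeout: 2.0, interval: 0.05)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    result = yield
    return result if result

    if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
      raise WaitTimeout, "condition not met within #{timeout}s"
    end
    sleep interval
  end
end

# Hypothetical page: the modal only reports open on the third look.
checks = 0
modal_open = -> { (checks += 1) >= 3 }

single_look = modal_open.call # roughly what "sleep, then act once" amounts to
polled = wait_until(timeout: 1.0, interval: 0.01) { modal_open.call }

puts "single look: #{single_look}, polled: #{polled}, checks: #{checks}"
# prints "single look: false, polled: true, checks: 3"
```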

With Mergify. Test Insights links the failure to its CI runner type. When a spec only fails on the slower runner pool and never on the laptop pool, the dashboard surfaces the resource sensitivity so the timing assumption is the obvious place to look.

Pattern 8

rspec-retry hiding real bugs

Symptom. Your suite is green. A user reports a bug that your specs were supposed to catch.

Root cause. rspec-retry with retry: 3 runs a failing example up to three times in total and reports the last result. A real race that loses on attempt 1 and wins on attempt 2 gets reported as green. The bug is still there. The pipeline has decided not to look at it.

# spec/spec_helper.rb (please don't)
require "rspec/retry"

RSpec.configure do |config|
  config.verbose_retry = true
  config.default_retry_count = 3
end

Fix. Do not retry at the framework level. When a spec is genuinely flaky, fix it. When the fix takes longer than a session, quarantine it instead. That keeps the signal visible without blocking the merge queue.
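
To see what gets lost when only the final result is reported, consider keeping every attempt's outcome instead. The sketch below is a plain-Ruby illustration under stated assumptions: run_with_attempts, final_status, and classify are hypothetical helpers, not part of rspec-retry or Mergify, and the "race" is simulated by failing on the first attempt.

```ruby
# Illustration only: hypothetical helpers, not rspec-retry's or Mergify's code.

# Runs the block up to max_attempts times and keeps every outcome,
# stopping at the first pass the way a retry wrapper would.
def run_with_attempts(max_attempts)
  outcomes = []
  max_attempts.times do |i|
    passed = begin
      yield(i + 1)
      true
    rescue StandardError
      false
    end
    outcomes << (passed ? :passed : :failed)
    break if passed
  end
  outcomes
end

# What a retry wrapper reports: only the last outcome.
def final_status(outcomes)
  outcomes.last
end

# What the full attempt history says.
def classify(outcomes)
  return :stable if outcomes == [:passed]

  outcomes.include?(:passed) ? :flaky : :broken
end

# A simulated race that loses on attempt 1 and wins on attempt 2.
outcomes = run_with_attempts(3) { |attempt| raise "lost the race" if attempt == 1 }
puts "attempts: #{outcomes.inspect}, reported: #{final_status(outcomes)}, actual: #{classify(outcomes)}"
# prints "attempts: [:failed, :passed], reported: passed, actual: flaky"
```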

With Mergify. Test Insights reruns at the CI level with attempt-level result tracking. You see that a spec passed on attempt 2 of 3, which is exactly the information rspec-retry throws away. Quarantine kicks in once the pattern is clear.

Detection

Catch every RSpec flake in CI

Mergify ships a native RSpec gem. Add it to your Gemfile and Test Insights builds a confidence score for every spec on your default branch. PR runs are compared against that baseline. Anything inconsistent gets flagged in a PR comment before the author merges.

Gemfile

# Add to your Gemfile (test group)
gem "rspec-mergify", group: :test

# Full setup, CI auth, and configuration:
# https://docs.mergify.com/ci-insights/test-frameworks/rspec

Prevention

Block flaky RSpec tests at PR time

On every PR, Mergify reruns the tests whose confidence is below threshold, without rspec-retry touching your config. The PR gets a comment naming the unreliable tests, their confidence history, and whether the failure on this PR is new or historical noise. Authors fix the real bugs before merge instead of re-running CI until it passes.

[Figure: Mergify Test Insights Prevention view showing caught flaky RSpec tests per PR]

Quarantine

Quarantine without skipping

Once an RSpec test is confirmed flaky, Test Insights quarantines it. The test still runs in the suite, no `skip` rewrite required, but its result no longer blocks merges or marks the pipeline red. When the pass rate on main recovers, quarantine lifts automatically and the test goes back to being load-bearing.

Want to see which RSpec specs in your repo are already flaky?

Native `rspec-mergify` gem, no JUnit XML wrangling. Setup takes under five minutes.

Frequently asked questions

Why are my RSpec tests flaky in CI but pass locally?

Your laptop and your CI runner differ in CPU, parallelism, and the random seed RSpec picked for the run. Specs that depend on test order, hold onto class-level state, or race against Capybara's implicit waits surface those issues only under CI's tighter budget. Reproduce with the exact failing seed (`rspec --seed 12345`) and `rspec --bisect` to find the minimal reproducing pair, then fix the underlying coupling before pushing.

How do I detect flaky RSpec tests?

RSpec alone cannot tell flaky from broken since each run gives one data point per test. You need to run the same commit multiple times and compare results. Mergify Test Insights does that on every PR and on the default branch, scores each test, and surfaces the tests whose pass rate drops below a confidence threshold.

Does rspec-retry fix flaky tests?

No, it hides them. A test that fails on attempt 1 and passes on attempt 2 is still broken; you have only decided not to look at the failure. Use rspec-retry as a temporary bandage for a test you are actively fixing, never as a permanent policy. For visibility without blocking the merge queue, quarantine instead of retry.

What causes flaky tests in RSpec?

Eight patterns cover most of what we see: hidden order dependencies under --order random, database_cleaner strategy mismatches with JS-driven specs, let vs let! lazy memoization surprises, Timecop freezes that forgot to reset, class variable and constant leakage between specs, shared_examples coupling that hides state, Capybara hardcoded sleeps that race the page, and rspec-retry hiding real bugs. Each is covered above with a minimal reproducer.

How do I quarantine a flaky RSpec test without deleting it?

Mergify Test Insights quarantines the test automatically once its confidence score drops. The test still runs in the suite, but a failing result no longer blocks merges and its noise no longer drowns out real signal. When the test stabilizes on main, quarantine lifts automatically. No `skip`, no commented-out tests, no orphaned files.

Why do my RSpec tests pass on my machine but fail in CI?

RSpec picks a new random seed per run, so a spec that fails in CI may have been ordered next to a spec it implicitly depends on. Re-run locally with the failing seed (`rspec --seed N`) to reproduce, then `rspec --bisect --seed N` to narrow it to the minimal pair. CI's tighter resource budget also exposes Capybara timing assumptions and database_cleaner strategy mismatches that finish in time on a fast laptop.

Ship your RSpec suite green.

2k+ organizations use Mergify to merge 75k+ pull requests a month without breaking main.
