Rémy Duthu
April 10, 2026 · 5 min read

Diving into pytest Finalizers

While building pytest-mergify, we hit a wall with fixture teardown during test reruns. The fix was two helper functions and a trick borrowed from pytest-rerunfailures.

I thought rerunning a pytest test would be easy. We’re building pytest-mergify, a plugin that can rerun a single test multiple times in a row. pytest already has a clean test protocol with three steps: setup, call, and teardown, so why not just run the call phase multiple times?

Something like:

for _ in range(retries):
    _pytest.runner.call_and_report(item=item, when="call", log=True)

That actually works fine for simple tests.

But then fixtures show up, and everything breaks.

Say you have a fixture that creates a temporary database:

@pytest.fixture
def _create_test_db():
    conn = _connect_to_db()
    yield conn
    conn.drop_database()

During the first run, it works perfectly. On the second run, pytest reuses the setup state from the first run, so the fixture isn’t reinitialized. Now the test tries to recreate a DB that already exists.

At that point, I knew I had to dig into how pytest handles fixture scopes and teardowns.

Understanding how pytest actually tears things down

pytest keeps track of all fixtures that need cleanup in a private attribute. This attribute can be accessed from an item with: item.session._setupstate.stack.

It’s a dictionary that maps nodes (like the test item, module, or session) to their teardown callables (and also exception information).

It looks roughly like this:

{
    <Function test_example>: ([finalizer_fn, ...], None),
    <Module test_file.py>: ([finalizer_fn, ...], None),
    <Session>: ([finalizer_fn, ...], None),
}

Each entry pairs a list of finalizers with any exception captured during setup.
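If you want to see this for yourself, a throwaway hook in conftest.py can dump the stack. Keep in mind _setupstate is private pytest API, so this can break between versions; it's an inspection aid, nothing more.

```python
# Throwaway debugging hook for conftest.py. _setupstate is private pytest
# API and may change between versions -- for inspection only.
def pytest_runtest_teardown(item, nextitem):
    for node, entry in item.session._setupstate.stack.items():
        print(node, "->", entry)
```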

When pytest finishes a test, it calls teardown_exact() (you can see it in the source) to decide what to clean up. That function compares the current test item with the next one (nextitem). If there is no next item, pytest assumes the session is ending and tears down everything.
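Stripped down, with string stand-ins for nodes (a sketch of the idea, not pytest's actual code), the decision looks roughly like this: pop finalizers off the top of the stack until what remains is a prefix of the next item's collector chain; with no next item, unwind everything.

```python
# Simplified model of teardown_exact's decision (not pytest's actual code).
def teardown_exact(stack, nextitem_chain):
    """Pop and run finalizers until the stack is a prefix of the next
    item's collector chain. An empty chain means the session is ending."""
    torn_down = []
    while stack:
        keys = list(stack)
        if keys == nextitem_chain[: len(keys)]:
            break  # Everything left is still needed by the next item.
        node, finalizers = stack.popitem()  # LIFO: innermost scope first.
        for fin in reversed(finalizers):
            fin()
        torn_down.append(node)
    return torn_down

# Session and module stay alive when the next test is in the same module:
stack = {"session": [], "module": [], "test_a": []}
print(teardown_exact(dict(stack), ["session", "module", "test_b"]))
# With no next item, the whole stack unwinds:
print(teardown_exact(dict(stack), []))
```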

That behavior makes sense in normal test runs, but not when you’re trying to rerun the same test multiple times in the same session. Calling _pytest.runner.runtestprotocol(item=item, nextitem=nextitem, log=True) resets fixtures that should stay alive, like session- or module-scoped ones.

At that point, I needed to either replicate the teardown logic myself (bad idea) or find a way to temporarily stop pytest from tearing down the wrong fixtures.

The small trick hiding in pytest-rerunfailures

I started reading how other plugins deal with this. The most interesting one was pytest-rerunfailures.

It reruns failed tests, so it faces the same problem: how to isolate test retries without tearing down the world every time.

Their approach is clever. They use what they call suspended finalizers.

Here’s the idea: between retries, they temporarily remove higher-scoped finalizers (class, module, session) from the stack so only function-scoped fixtures get cleaned up and recreated. When the last retry finishes, they restore everything and let pytest do its normal teardown.

A minimal version looks like this:

_suspended_finalizers = {}

def suspend_item_finalizers(item):
    for stacked_item in list(item.session._setupstate.stack.keys()):
        if stacked_item == item:
            continue  # Keep function-scoped finalizers.

        _suspended_finalizers[stacked_item] = item.session._setupstate.stack.pop(stacked_item)

def restore_item_finalizers(item):
    item.session._setupstate.stack.update(_suspended_finalizers)
    _suspended_finalizers.clear()

In your plugin, you can call these from a pytest_runtest_teardown hook, choosing one or the other depending on whether this is the last retry:

def pytest_runtest_teardown(item):
    if is_last_execution(item):
        restore_item_finalizers(item)
    else:
        suspend_item_finalizers(item)

Two short helpers and one conditional hook.

During intermediate retries, pytest will tear down only the function-scoped fixtures. At the end, the full teardown sequence runs normally.
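The is_last_execution helper is whatever retry bookkeeping your plugin keeps; it isn't shown above. As a purely hypothetical sketch (the names and the counting scheme are made up, and a real plugin would key by item.nodeid), it could count executions per test:

```python
# Hypothetical bookkeeping behind is_last_execution (not part of pytest or
# pytest-rerunfailures); a real plugin would key this by item.nodeid.
MAX_EXECUTIONS = 3
_execution_counts: dict[str, int] = {}

def record_execution(test_id):
    _execution_counts[test_id] = _execution_counts.get(test_id, 0) + 1

def is_last_execution(test_id):
    return _execution_counts.get(test_id, 0) >= MAX_EXECUTIONS

for _ in range(MAX_EXECUTIONS):
    record_execution("tests/test_db.py::test_example")
print(is_last_execution("tests/test_db.py::test_example"))  # True
```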

Why I liked this solution

It feels like working with pytest instead of against it. The internal setup stack stays intact, the teardown order is preserved, and other plugins continue to behave normally.

It’s also surprisingly minimal. You don’t need to copy any pytest logic; you just adjust when certain finalizers are visible.

Once you get used to reading pytest’s source, you start understanding why the behavior is the way it is. That understanding is worth more than the fix itself.

If you’re interested in how CI failures affect developer focus or why CI feels broken in general, we’ve written about those too.


PS: I did try asking AI for help while figuring this out. It didn’t mention finalizers once. Even when I added specific context about the setup stack. Turns out, this is one of those cases where reading the source code is still the fastest path to clarity.
