A merge queue is critical infrastructure. Build it accordingly.
GitHub’s recent merge queue incident shows why: the failure mode is structural, and avoiding it takes building a merge queue at the level of critical infrastructure.
On April 23, GitHub’s merge queue silently reverted merged code for about four and a half hours. Pull requests that passed CI ended up on main with the wrong contents. Branch protection ran, and the PR pages showed merged. Engineers found out hours later, when something on main started behaving in ways that didn’t match the diff.
This is the worst kind of failure a merge queue can produce. There’s no outage banner and the SLA monitors stay green. The audit trail looks clean too. The only signal is that the code on main no longer matches the code your team thought it merged.
How a merge queue can lie
A merge queue exists to solve one problem. When many engineers land many PRs, you can’t trust that each one will still pass when combined with everything ahead of it in the queue. The queue tests PRs in the order they will land, with their predecessors applied, and only merges them if the combined state stays green.
That works only if one invariant holds: the commit you tested is the commit you land. Same SHA, same bytes. The replay-after-CI strategy, where the queue runs CI against a temporary commit and constructs the actual merge commit at land time, produces a different commit even when the diff is identical. A different parent or a slightly different rebase moment is enough.
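Why does an identical diff still produce a different commit? Because a git commit SHA hashes the whole commit object, parents included. A minimal sketch (for SHA-1 repositories, with hypothetical placeholder hashes) that computes commit SHAs the way git does, over a "commit <size>\0" header plus the object body:

```python
import hashlib

def git_commit_sha(tree, parent, author_line, committer_line, message):
    # A git commit SHA is sha1 over "commit <size>\0" plus the commit
    # object body, which names the tree, the parent(s), and the author.
    body = (
        f"tree {tree}\n"
        f"parent {parent}\n"
        f"author {author_line}\n"
        f"committer {committer_line}\n"
        f"\n{message}"
    ).encode()
    return hashlib.sha1(b"commit %d\x00" % len(body) + body).hexdigest()

# Same tree hash means byte-identical contents.
tree = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"
who = "Dev <dev@example.com> 1713888000 +0000"

# CI tested this commit...
tested = git_commit_sha(tree, "a" * 40, who, who, "Add feature")
# ...but the queue reconstructed the merge on a different parent at land time.
landed = git_commit_sha(tree, "b" * 40, who, who, "Add feature")

assert tested != landed  # identical contents, different commit SHA
```

The parent line alone is enough to change the SHA, which is exactly the drift the replay-after-CI strategy introduces.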
Branch protection doesn’t catch the drift, because branch protection runs on the temporary commit. The audit trail looks clean, because the PR went through every gate. The only way to know is to compare what main actually has to what CI signed off on, byte for byte. Most teams don’t do that.
That is the bug class GitHub hit on April 23.
Why it shows up at platform scale
A merge queue is one of those features where every edge case is a different code path. Squash, rebase, merge commits, force-pushes, conflicts that resolve mid-queue, status checks that go red between staging and landing, autoqueue, manual queue, branch protection rules that mutate the merge plan. Each of these has its own variant of the integrity invariant. Each can drift independently.
Inside a large platform, the merge queue is one square on a roadmap. The team that owns it ships a regression in one mode (say, squash combined with rebase) while tuning something adjacent in the staging step. The change passes review, it ships behind a feature flag, the flag rolls out, and a few hours later some PRs on main have the wrong contents.
This is the failure mode of treating a merge queue as a feature inside a larger product. The architecture is unforgiving regardless of how careful the team is. The blast radius of a wrong call is the main branch of every customer using the feature, and the invariants that make the feature safe live deep in the structure of the code, below the level a roadmap operates at.
What a product mindset looks like
When a merge queue is the product, the engineering trade-offs change.
You write down the invariants. You don’t ship a change that touches the merge path without a reviewer who has seen what happens when each invariant breaks. You build the architecture so the dangerous invariants are protected by structure: there is no path through the system that lands a SHA other than the one CI tested, because there is no reconstruction step at land time. The commit you merged is the commit you tested, by construction.
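One way to make that structural, sketched locally with a hypothetical helper name (a real queue would enforce the same rule when pushing to the remote): the land step accepts only the tested SHA and fast-forwards the branch ref to it, so there is no code path that constructs a new commit.

```python
import subprocess

def land_tested_commit(repo: str, branch: str, tested_sha: str) -> None:
    """Land exactly the commit CI tested, by construction.

    Instead of re-creating a merge commit at land time, fast-forward
    the branch ref to the tested SHA. Anything that is not a pure
    fast-forward of the validated chain is refused.
    """
    # `git merge-base --is-ancestor` exits non-zero (raising here, via
    # check=True) unless the branch tip is an ancestor of the tested
    # commit, i.e. the update is a pure fast-forward.
    subprocess.run(
        ["git", "-C", repo, "merge-base", "--is-ancestor", branch, tested_sha],
        check=True,
    )
    # Move the ref to the tested SHA itself. No new commit is made.
    subprocess.run(
        ["git", "-C", repo, "update-ref", f"refs/heads/{branch}", tested_sha],
        check=True,
    )
```

The point of the design is what’s absent: there is no merge, rebase, or squash call in the land path for drift to hide in.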
When we built our merge queue, we eventually verified the core algorithm with TLA+. That’s the kind of investment you make when the cost of being wrong is “the customer’s main branch is broken and they don’t know which commits to trust.” For a feature on a roadmap, formal verification is overkill. For a product whose failure mode is silent corruption, it’s the floor.
If you got bitten
For anyone who used GitHub’s merge queue during the window, the audit looks roughly like this. For each merge commit on main between 16:05 and 20:43 UTC on April 23, get the tree hash of the merge commit. Compare it to the tree hash of the PR’s tested commit, modulo whatever the merge mode normalizes. Anything that doesn’t match is a candidate for review.
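The loop can be sketched like this. `tested_tree_for` is a hypothetical mapping from merge commit to the tree hash of the PR’s tested commit; reconstructing it from your CI logs is the hard, team-specific part.

```python
import subprocess

def merge_commits_in_window(repo, branch, since, until):
    # `git log --merges --format="%H %T"` emits each merge commit's
    # SHA (%H) and tree hash (%T); the tree hash identifies the exact
    # contents that landed.
    out = subprocess.run(
        ["git", "-C", repo, "log", branch, "--merges",
         f"--since={since}", f"--until={until}", "--format=%H %T"],
        check=True, capture_output=True, text=True,
    ).stdout.split()
    return list(zip(out[::2], out[1::2]))

def flag_mismatches(commit_tree_pairs, tested_tree_for):
    # Any merge whose landed tree differs from the tree CI signed off
    # on is a candidate for review. Commits with no CI record are
    # skipped rather than flagged.
    return [
        commit
        for commit, tree in commit_tree_pairs
        if tested_tree_for.get(commit) not in (None, tree)
    ]
```

For the window in question, that would be `merge_commits_in_window(".", "main", "2024-04-23T16:05Z", "2024-04-23T20:43Z")` fed into `flag_mismatches`, with whatever normalization your merge mode applies handled before comparing.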
This is not a fast script. The reason you can’t just check “what was the SHA CI signed off on” is that, for the bug class we’re describing, that SHA isn’t on the branch anymore.
If you have a merge queue, regardless of who runs it, you should know how to detect this class of bug. The audit doesn’t work after the fact unless you’ve kept enough trail to reconstruct what was tested. Most teams haven’t.
Build it accordingly
Merge queues sit on the critical path of every release. When they’re wrong, the wrongness is silent and the recovery is hard.
A feature on a roadmap is built to work well. A product whose failure mode is silent corruption is built so that a specific class of failure cannot happen at all.
We make a merge queue. We’re up at night about it.