When GitHub Webhooks Lie: How an Empty Array Broke Our Merge Queue
GitHub webhooks can deliver structurally valid payloads with stale data. We traced a customer incident to out-of-order delivery and built action-aware upserts to protect against it.
A review_request_removed webhook arrived with an empty labels array. Seven minutes later, a customer was manually re-queuing their PR because our merge queue had ejected it, even though nothing about the PR’s labels had changed.
The problem
If you build on GitHub webhooks, you probably do what we do: receive the event, validate the payload, and upsert the data into your database. GitHub sends you the current state of the object, you store it, and your system reacts to the new state.
The GitHub webhook documentation says the pull_request event includes a labels array. What it doesn’t tell you is that sometimes that array is empty, even when the PR has four labels on it. We found this the hard way when a customer’s PR was ejected mid-pipeline from our merge queue, with the error message: “The pull request rule doesn’t match anymore.”
The PR’s merge rules required specific labels like ready-to-merge and priority/high. All four labels were present on the PR. GitHub’s own UI showed them correctly. But our database said the PR had zero labels, because a review_request_removed webhook had just told us so.
This class of bug is insidious because the webhook isn’t malformed. It’s structurally valid JSON with correct types. It’s just incomplete. And if your system treats webhooks as the source of truth, an incomplete payload is indistinguishable from an intentional state change.
The timeline
Here’s exactly what happened, with timestamps from our logs:
| Time (UTC) | Event |
|---|---|
| 17:00:34 | PR queued, all four label conditions matched |
| 17:24:02 | GitHub sent review_request_removed webhook with empty labels array |
| 17:24:16 | All 4 labels (ready-to-merge, priority/high, team/backend, reviewed) evaluated as missing |
| 17:24:18 | PR dequeued: “The pull request rule doesn’t match anymore” |
| 17:31:38 | Customer manually re-queued via @mergifyio queue |
Twenty-four minutes of queue time wasted, a developer interrupted, and a merge rule that appeared to break for no reason.
How the upsert worked (and why it was fragile)
Our PR model stores the full pull request state from GitHub in PostgreSQL. When a webhook arrives, we run a straightforward INSERT ... ON CONFLICT DO UPDATE:
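```python
from sqlalchemy.dialects import postgresql

# Runs inside a classmethod of the PR model: cls is the mapped class and
# validated_data is the parsed webhook payload.
fields = {
    "id": validated_data["id"],
    "number": validated_data["number"],
    "title": validated_data["title"],
    "labels": validated_data["labels"],
    "draft": validated_data["draft"],
    # ... 20+ more fields
}

stmt = (
    postgresql.insert(cls)
    .values(**fields)
    .on_conflict_do_update(
        # Conflict target: one row per (repository, PR number).
        index_elements=[cls.base_repository_id, cls.number],
        # Wholesale update: every field from the payload overwrites the row.
        set_=fields,
    )
)
```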
This is a wholesale field upsert. Every field from the webhook payload overwrites the database row, every time, regardless of which webhook action triggered it. For most fields this is correct: the webhook is the authoritative source of truth for the PR’s title, state, and head SHA.
But labels is different. A review_request_removed event has nothing to do with labels. GitHub includes the labels array because the pull request schema always includes it. The question is: can you trust the value?
It’s not (just) a bug. It’s out-of-order delivery
Here’s what makes this deeper than a GitHub quirk: webhooks arrive unordered. GitHub doesn’t guarantee delivery order. A user adds a label and removes a reviewer within seconds. GitHub fires two webhooks: labeled with the full label set, then review_request_removed with whatever the PR state was at the time that event was generated. If the second webhook was built from a snapshot taken before the label was applied, its labels array is legitimately empty from GitHub’s perspective.
Now your system processes them. If review_request_removed arrives second (which it will, most of the time), it overwrites the correct labels from the labeled event with its stale snapshot. No bug in GitHub. No malformed payload. Just two valid events, processed in delivery order, producing a corrupted state.
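The following minimal in-memory sketch makes the failure mode concrete. It assumes that a plain dict stands in for the PostgreSQL table and that each payload carries the full PR snapshot. The `wholesale_upsert` helper and the trimmed payloads are hypothetical simplifications, not production code. The example also uses one label instead of four.

```python
from typing import Any

Row = dict[str, Any]

# Hypothetical stand-in for the pull request table, keyed by (repo_id, number).
table: dict[tuple[int, int], Row] = {}


def wholesale_upsert(payload: Row) -> None:
    """Mimic INSERT ... ON CONFLICT DO UPDATE that writes every field."""
    key = (payload["base_repository_id"], payload["number"])
    fields = {k: v for k, v in payload.items() if k != "action"}
    # Every field overwrites the stored row, whatever the action was.
    table[key] = {**table.get(key, {}), **fields}


# Two valid events for the same PR. The second was generated from a
# snapshot taken before the label was applied, so its labels are empty.
labeled = {
    "action": "labeled",
    "base_repository_id": 1,
    "number": 42,
    "labels": ["ready-to-merge"],
}
review_request_removed = {
    "action": "review_request_removed",
    "base_repository_id": 1,
    "number": 42,
    "labels": [],
}

for event in (labeled, review_request_removed):  # delivery order
    wholesale_upsert(event)

print(table[(1, 42)]["labels"])  # prints []
```

If the two events are swapped, the labels survive. That is why this kind of corruption only shows up some of the time.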
This isn’t hypothetical. We already had a defense for this exact pattern, but only for the head.sha field. Our code checks whether a webhook’s head SHA has already been superseded by a synchronize event, and skips the update if so. We’d recognized that out-of-order delivery corrupts head SHAs. We just hadn’t generalized the lesson to other fields.
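Below is a hypothetical sketch of what a head SHA check like that could look like; it is not the production code. It assumes that synchronize payloads carry the previous and new head SHAs, as GitHub's `before` and `after` fields do. The `superseded_shas` store and both helper names are invented for illustration, and the store is in-memory only.

```python
# Hypothetical bookkeeping: for each (repo_id, pr_number), the head SHAs
# that a synchronize event has already replaced.
superseded_shas: dict[tuple[int, int], set[str]] = {}


def record_synchronize(key: tuple[int, int], before: str, after: str) -> None:
    """A synchronize event moved the PR head from `before` to `after`."""
    shas = superseded_shas.setdefault(key, set())
    shas.add(before)
    # If a force-push returns the branch to an older SHA, it is current again.
    shas.discard(after)


def should_apply_head_sha(key: tuple[int, int], incoming_sha: str) -> bool:
    """False when the payload's head SHA was already replaced by a push."""
    return incoming_sha not in superseded_shas.get(key, set())


record_synchronize((1, 42), before="aaa111", after="bbb222")
# A late payload that still carries head.sha == "aaa111" is skipped:
print(should_apply_head_sha((1, 42), "aaa111"))  # prints False
```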
The uncomfortable truth is that any field in a webhook payload can be stale if the event was generated from a pre-mutation snapshot. The action field tells you what actually changed. Everything else is context that may or may not reflect the current state.
The fix: action-aware updates
The fix is conceptually simple: don’t overwrite labels unless the webhook action is actually about labels.
```python
LABEL_CHANGING_ACTIONS = {"labeled", "unlabeled", "opened"}

if (
    old_pull_request is not None
    and validated_data["action"] not in LABEL_CHANGING_ACTIONS
    and not validated_data["labels"]
    and old_pull_request.labels
):
    # Likely stale or incomplete payload: keep the labels already stored.
    validated_data["labels"] = old_pull_request.labels
```
Four conditions, all required:
- The PR already exists in our database: this is an update, not a creation.
- The webhook action doesn’t change labels: only labeled, unlabeled, and opened events legitimately modify the labels array.
- The incoming labels array is empty: we’re not overriding a non-empty payload, only protecting against empty ones.
- The database has labels: there’s something worth preserving.
When all four conditions are met, we keep the database labels instead of trusting the webhook. The fix sits in process_github_data(), the validation layer above the upsert, so the upsert itself stays clean and generic.
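To try the guard in isolation, here is a standalone sketch of the same four conditions. The function name `preserve_labels` and its plain-list `old_labels` parameter are hypothetical. The real check reads `old_pull_request.labels` inside `process_github_data()`.

```python
from typing import Any, Optional

# Same set as the fix above: the actions that legitimately set labels.
LABEL_CHANGING_ACTIONS = frozenset({"labeled", "unlabeled", "opened"})


def preserve_labels(
    validated_data: dict[str, Any], old_labels: Optional[list[str]]
) -> dict[str, Any]:
    """Return a copy of validated_data, keeping stored labels on a stale empty payload.

    old_labels is None when the PR is not in the database yet.
    """
    data = dict(validated_data)
    if (
        old_labels is not None  # an update, not a creation
        and data["action"] not in LABEL_CHANGING_ACTIONS  # not a label event
        and not data["labels"]  # incoming array is empty
        and old_labels  # stored labels worth keeping
    ):
        data["labels"] = list(old_labels)
    return data


stale = {"action": "review_request_removed", "labels": []}
print(preserve_labels(stale, ["ready-to-merge", "priority/high"])["labels"])
# prints ['ready-to-merge', 'priority/high']

real_removal = {"action": "unlabeled", "labels": []}
print(preserve_labels(real_removal, ["ready-to-merge"])["labels"])  # prints []
```

This guard only targets the empty-array pattern. A non-label event that carries a non-empty but outdated labels array still overwrites the stored labels.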
Why not just diff every field?
The tempting over-engineering is to diff every field and only update what changed. We rejected it. For most fields, the webhook really is authoritative. A diff also only tells you that a value changed, not whether the change is real or stale. Telling those apart would require knowing which fields each action is supposed to update, a per-action schema that GitHub doesn’t document. The empty-array pattern, by contrast, is specific and detectable. We’re not guessing whether data is stale. We’re detecting a known pattern and handling it. A scalpel, not a shotgun.
This wasn’t our first incomplete payload
When we dug into the issue, we found that we’d already hit a similar bug months earlier: pull_request_review_thread events were arriving with incomplete payloads too, and we’d added a workaround in the webhook handler layer. That fix was buried in a different file with a different approach, a classic case of the same bug class being independently discovered and independently patched.
This time, we placed the fix at the model layer where the upsert happens, which is architecturally cleaner: every webhook action flows through the same process_github_data() method, so the protection applies universally.
What this means for your webhook consumers
If you consume GitHub webhooks (or any third-party webhooks), here’s what we took away from this:
Webhooks are not ordered snapshots. A webhook tells you what changed, but the full object it includes may be stale. Not because the provider has a bug, but because out-of-order delivery means you can process an older snapshot after a newer one. The action field tells you what actually happened. Use it to decide which fields to trust.
Wholesale upserts are fragile. An INSERT ... ON CONFLICT DO UPDATE that writes every column is the simplest pattern, but it assumes every field in every event is both authoritative and current. Out-of-order delivery and incomplete payloads both break that assumption, and your database state corrupts silently.
Watch for empty values that look legitimate. Our labels disappeared without any error. The empty array was valid JSON, valid schema, valid type, so there was nothing to catch in validation. The only signal was a behavioral anomaly: labels vanishing on a non-label event. If you can, add logging when a non-empty field is about to be overwritten with an empty value. It’s a cheap canary.
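Here is a minimal sketch of that canary. It assumes the stored row and the incoming fields are available as plain dicts. The `warn_on_emptying` helper and the logger name are hypothetical. The helper only observes and logs; it leaves the write unchanged.

```python
import logging
from typing import Any

# Hypothetical logger name; use whatever your service already configures.
logger = logging.getLogger("webhooks.canary")


def _is_empty(value: Any) -> bool:
    """True for None and zero-length strings/containers; 0 and False are data."""
    if value is None:
        return True
    if isinstance(value, (str, list, tuple, dict, set)):
        return len(value) == 0
    return False


def warn_on_emptying(
    action: str, old_row: dict[str, Any], new_fields: dict[str, Any]
) -> list[str]:
    """Log each field where a stored non-empty value is about to become empty.

    Only observes; the write itself is left unchanged. Returns the field names.
    """
    emptied = []
    for name, new_value in new_fields.items():
        old_value = old_row.get(name)
        if not _is_empty(old_value) and _is_empty(new_value):
            emptied.append(name)
            logger.warning(
                "%s webhook would overwrite non-empty %s (%r) with an empty value",
                action,
                name,
                old_value,
            )
    return emptied


flagged = warn_on_emptying(
    "review_request_removed",
    old_row={"labels": ["ready-to-merge", "priority/high"], "draft": False},
    new_fields={"labels": [], "draft": False},
)
print(flagged)  # prints ['labels']
```

Treating 0 and False as real values rather than "empty" keeps boolean fields such as draft from triggering false alarms.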
Check your incident history for pattern recurrence. We’d fixed this class of bug once before in a different layer. If a bug class appears twice, it’s a design gap, not a coincidence. That’s your signal to fix it at the right abstraction level.
The full fix was 24 lines of code and 57 lines of tests. The investigation, the timeline reconstruction, and the confidence that we weren’t breaking something else? That took the real time.