Stop Using Labels to Control CI in GitHub Actions

Gating CI on a PR label feels clean until you learn how GitHub fires label events. Any label change re-runs the whole workflow from scratch, and a required check that's skipped when the label is absent counts as a pass, so PRs merge without ever running it. Here's why the pattern breaks and what to do instead.

Somewhere in a lot of GitHub Actions workflows there’s a line that looks like this:

if: contains(github.event.pull_request.labels.*.name, 'run-e2e')

The idea is reasonable. You have an expensive test suite, end-to-end tests that take an hour, and you don’t want to run them on every push. So you put them behind a label. Add run-e2e when you want the slow suite, skip it the rest of the time. A human decides, the CI obeys. Clean.

It isn’t clean. It’s one of the most expensive mistakes you can quietly bake into a pipeline, and the reason has nothing to do with your tests and everything to do with how GitHub fires label events.

How the label gate actually works

To make a label control a workflow, you have to react to label changes. That means adding labeled and unlabeled to your event types:

on:
  pull_request:
    types: [opened, synchronize, reopened, labeled, unlabeled]

jobs:
  e2e:
    if: contains(github.event.pull_request.labels.*.name, 'run-e2e')
    runs-on: ubuntu-latest
    steps:
      - run: ./run-e2e-tests.sh   # ~60 minutes

When someone adds the run-e2e label, GitHub emits a pull_request event with action: labeled. The workflow starts, the if condition sees the label, and the slow job runs. Remove the label, you get an unlabeled event, the condition is false, the job is skipped.

flowchart LR
    A[Add label run-e2e] --> B[pull_request: labeled event]
    B --> C[Workflow run starts]
    C --> D{if contains run-e2e?}
    D -->|true| E[Run 60-min suite]
    D -->|false| F[Job skipped]

So far it does what you wanted. The problem is everything else that also fires that event.

You can’t filter on the label name

Here’s the part that surprises people: GitHub lets you filter the pull_request trigger by activity type (labeled, unlabeled), but not by label name. There is no types: [labeled:run-e2e]. You subscribe to “a label changed,” full stop. Which label changed is something you only find out after the workflow has already started, inside the if.

That distinction is the whole problem. Every label on the PR runs through the same door. A teammate triaging the PR adds bug. Full workflow run. Your project board automation flips priority:high. Full workflow run. Someone removes needs-review after approving. Full workflow run. None of those have anything to do with your test suite, and every one of them creates a brand-new run of the entire workflow.
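The closest you can get to filtering on the label name is a job-level condition on the event payload. Here is a minimal sketch, assuming the same run-e2e label and script as the workflow above. On labeled and unlabeled events, the payload exposes the label that changed as github.event.label.

```yaml
on:
  pull_request:
    types: [labeled]

jobs:
  e2e:
    # github.event.label.name is the label that was just added.
    # This condition is evaluated after the workflow run has been created,
    # so adding "bug" still starts a run; it only skips this one job.
    if: github.event.label.name == 'run-e2e'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: ./run-e2e-tests.sh   # script name taken from the example above
```

This narrows what the job reacts to, but the run still exists, and any other job in the workflow still executes.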

Put the mental model next to the machine. You picture a gate, the label deciding whether CI runs at all:

flowchart LR
    A[Touch a label] --> B{run-e2e?}
    B -->|yes| C[CI runs]
    B -->|no| D[Nothing happens]

What you actually built fires the whole workflow on any label, and runs every job that doesn’t carry the if:

flowchart LR
    A[Touch any label] --> W[Whole workflow fires]
    W --> L[lint runs]
    W --> U[unit tests run]
    W --> Bld[build runs]
    W --> G{run-e2e?}
    G -->|yes| E[e2e runs]
    G -->|no| S[e2e skipped]

The if you wrote gates one job, not the trigger. Adding bug to triage a PR doesn’t do nothing: it re-runs your whole pipeline and skips exactly one job. On a busy PR, labels move around a lot, and each move is a fresh run of everything.

Every run starts from zero

You might think a re-run is cheap when nothing changed. It isn’t. GitHub Actions has no memory that the suite was green five minutes ago on the same commit. There is no “already passed, skip it” at the workflow level. A new event means a new run, and a new run means the 60-minute suite runs for 60 minutes again.

Picture a PR that already carries run-e2e, where the code is done and people are just tidying the other labels before merge. The commit SHA never changes. The tests never change. The result is never going to be different. And yet:

sequenceDiagram
    actor Dev as Teammate
    participant GH as GitHub
    participant CI as e2e suite
    Note over GH: commit abc123, never changes
    Dev->>GH: add label "bug"
    GH->>CI: trigger run
    CI-->>GH: green after 60 min
    Dev->>GH: set label "priority:high"
    GH->>CI: trigger run
    CI-->>GH: green after 60 min
    Dev->>GH: remove label "triage"
    GH->>CI: trigger run
    CI-->>GH: green after 60 min
    Note over Dev,CI: 3 hours of compute, one unchanged result

Three labels touched, not one of them run-e2e, and the suite still ran three times on identical code.

If your workflow cancels in-progress runs on new events (a concurrency group with cancel-in-progress), you trade the wasted minutes for a different problem: a label edit now kills a test run that was almost finished, and you start the hour over. Either way the label, not the code, is driving your compute bill.
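For reference, the cancellation setup usually looks something like this. It is a sketch, and the group name is an assumption; any key that is stable per PR behaves the same way.

```yaml
# Workflow-level setting: at most one active run per pull request.
concurrency:
  # Hypothetical group name, keyed on the PR number.
  group: e2e-${{ github.event.pull_request.number }}
  # Any new pull_request event on the same PR, including labeled and
  # unlabeled, lands in this group and cancels the run in progress.
  cancel-in-progress: true
```

The group cannot tell a new commit from a label edit, so both cancel the in-flight suite.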

The required-check trap

The cost is the obvious problem. The correctness one is quieter and worse, and it shows up the moment you make that label-gated job a required status check.

You’d assume that requiring e2e means nothing merges without it. It doesn’t. When the run-e2e label is absent, the job is skipped, and a skipped job isn’t a failure. Branch protection and rulesets treat a skipped required check as satisfied, exactly as if it had passed. So every PR that doesn’t carry the label merges with e2e counting as green and the suite never having run.

flowchart TD
    P[PR without the run-e2e label] --> S[e2e job is skipped]
    S --> G[Branch protection treats skipped as satisfied]
    G --> M[Required check: green]
    M --> X[Merge with zero e2e coverage]

This inverts what the gate was for. The label was meant to let you opt in to the expensive suite. What it actually does is let every unlabeled PR opt out of a required check while branch protection reports all-clear. Nobody had to defeat the gate with a clever workaround. The default path is the bypass: you merge code that never ran the test you marked mandatory, and the checkmark says it’s fine.

A control that’s green when it should be red is more dangerous than no control, because people stop checking the thing they think is being checked for them.

Don’t just swap the label for a path filter

The obvious next move is to trigger on paths instead of a label, so the decision comes from the diff rather than from a human:

on:
  pull_request:
    paths:
      - "src/**"
      - "package.json"

It’s the same trap in a quieter costume. A required check that doesn’t run never reports a status, so a docs-only PR sits unmergeable, and the usual fix is a second always-green workflow that satisfies the check without testing anything. Worse, a check that’s green because it didn’t run is indistinguishable from one that didn’t run because the event was dropped or GitHub Actions was down. You’re back to trusting the absence of a signal, which is the one thing an outage is guaranteed to produce.
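The always-green companion workflow typically looks like the sketch below. The workflow name and echo message are placeholders. The job name is what matters, because it has to match the required check.

```yaml
# Hypothetical stand-in workflow: runs only when the real one does not.
name: e2e (skipped)

on:
  pull_request:
    # Mirror image of the real workflow's paths filter.
    paths-ignore:
      - "src/**"
      - "package.json"

jobs:
  # Same job name as the real check, so branch protection accepts it.
  e2e:
    runs-on: ubuntu-latest
    steps:
      - run: echo "No e2e run needed for this change"
```

It reports success for the required check on every PR the real workflow ignores, without executing a single test.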

Path filters earn their keep on workflows that don’t gate a merge, where skipping only saves minutes: a preview deploy, an advisory lint. Keep them off anything branch protection requires. There’s a full breakdown of why path filters can’t be a gate in its own post.

What to do instead

The instinct behind the label gate is fine: don’t run expensive work when it isn’t needed. The problem is what you let a merge gate stand on. A required check should be a positive signal, proof that the work which needed to run actually ran and passed. Stop deriving “safe to merge” from whether a trigger happened to fire, and put the decision where the merge actually happens.

Make genuinely manual runs actually manual

If a job really is something a human kicks off on demand, like a load test against a staging environment, that’s what workflow_dispatch is for:

on:
  workflow_dispatch:
    inputs:
      target:
        description: "Environment to load-test"
        required: true

A workflow_dispatch run is explicit and isolated. It doesn’t ride on pull_request events, so unrelated label edits can’t trigger it, and it never sits in branch protection pretending to be a gate.
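Filled out into a complete workflow, that might look like the following sketch. The job name and run-load-test.sh are hypothetical stand-ins for whatever your load test actually runs.

```yaml
on:
  workflow_dispatch:
    inputs:
      target:
        description: "Environment to load-test"
        required: true

jobs:
  load-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Pass the input through an environment variable rather than
      # interpolating it into the script text.
      - run: ./run-load-test.sh "$TARGET"   # hypothetical script
        env:
          TARGET: ${{ inputs.target }}
```

Nothing in this file subscribes to pull_request, so no label, push, or review activity can start it.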

You could try to force it into one, and the idea is tempting: pass a PR number to the workflow, check out its head commit, run the suite, then post a commit status back to that exact SHA with the name branch protection requires. It works. It also rebuilds by hand the one thing a merge queue does for you, tying a result to a specific commit so it can gate the merge. And it’s brittle, because the status is pinned to the SHA you ran against, so the next push leaves the head commit with no status and someone has to remember to trigger it again. By the time you’ve made it reliable, you’ve written a worse merge queue out of a gh api call and a sticky note.
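To make that brittleness concrete, here is roughly what the hand-rolled version involves. It is a sketch only: the job name, input name, and status context are assumptions, and it is written for PRs opened from branches in the same repository.

```yaml
on:
  workflow_dispatch:
    inputs:
      pr_number:
        description: "Pull request to test"
        required: true

permissions:
  contents: read
  pull-requests: read
  statuses: write

jobs:
  run-e2e:
    runs-on: ubuntu-latest
    env:
      GH_TOKEN: ${{ github.token }}
    steps:
      - name: Resolve the PR head commit
        id: pr
        env:
          PR_NUMBER: ${{ inputs.pr_number }}
        run: |
          sha=$(gh pr view "$PR_NUMBER" --repo "$GITHUB_REPOSITORY" --json headRefOid --jq .headRefOid)
          echo "sha=$sha" >> "$GITHUB_OUTPUT"
      - uses: actions/checkout@v4
        with:
          ref: ${{ steps.pr.outputs.sha }}
      - run: ./run-e2e-tests.sh
      - name: Post a commit status named "e2e" on that exact SHA
        if: always()
        env:
          SHA: ${{ steps.pr.outputs.sha }}
          STATE: ${{ job.status == 'success' && 'success' || 'failure' }}
        run: |
          gh api "repos/$GITHUB_REPOSITORY/statuses/$SHA" -f state="$STATE" -f context=e2e
```

The status is attached to one SHA. The next push produces a new head commit with no e2e status, and nothing here re-triggers the run.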

Run only what changed, and still know it ran

The real monorepo case is the hard one: you want to run only the tests for the parts of the repo a PR touched, and that result has to gate the merge. Native GitHub Actions can’t give you both. Filter the trigger and a skipped suite leaves you blind. Branch inside one run with dorny/paths-filter and the skipped job counts as satisfied without having run, which is the label gate’s bypass again. Hand-roll an aggregating gate on top and you have the dummy job again with extra steps.
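For illustration, the in-run branching plus a hand-rolled aggregating gate tends to look like this sketch. The backend filter, the job names, and run-backend-tests.sh are hypothetical.

```yaml
on:
  pull_request:

jobs:
  changes:
    runs-on: ubuntu-latest
    permissions:
      pull-requests: read
    outputs:
      backend: ${{ steps.filter.outputs.backend }}
    steps:
      - uses: dorny/paths-filter@v3
        id: filter
        with:
          filters: |
            backend:
              - 'backend/**'

  backend-tests:
    needs: changes
    if: needs.changes.outputs.backend == 'true'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: ./run-backend-tests.sh   # hypothetical script

  gate:
    # always() so the gate reports even when upstream jobs were skipped or failed.
    if: always()
    needs: [changes, backend-tests]
    runs-on: ubuntu-latest
    steps:
      - name: Fail if any upstream job failed or was cancelled
        if: contains(needs.*.result, 'failure') || contains(needs.*.result, 'cancelled')
        run: exit 1
      - run: echo "Upstream jobs passed or were skipped"
```

The gate job is the one you would mark as required. It still cannot tell a deliberate skip from one caused by a wrong filter; it only knows that nothing failed.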

This decision belongs to whatever owns the merge. Mergify’s monorepo CI does it with scopes: named slices of the repo mapped to file patterns in .mergify.yml. It detects which scopes a PR actually touches, runs only the CI for those, and the merge queue reuses the same scopes to decide eligibility and build batches. “The backend tests didn’t run” becomes a deliberate, recorded outcome instead of a check that’s silently absent, so a non-run can never be mistaken for a pass.

Run the expensive suite when the PR is actually merging

The hardest case to solve with native GitHub Actions is the real reason most people reach for labels: an expensive suite you genuinely don’t want on every push, but absolutely want before code lands. Labels are a bad answer to a real question. A merge queue is a good one.

The model is simple. Cheap, fast checks run on every push, the way they should. The hour-long suite runs once, when the PR reaches the front of the queue and is about to merge, against the exact state it would merge into. You stop paying for the slow suite on every intermediate commit and every label edit, and you only pay when the result actually gates a merge.

With Mergify, the queue decides what to run and when, and it evaluates merge conditions itself rather than leaning on a skipped required check that may or may not report:

queue_rules:
  - name: default
    queue_conditions:
      - check-success = lint
      - "#approved-reviews-by >= 1"
    merge_conditions:
      - check-success = e2e

The fast lint check decides whether a PR is allowed into the queue. The slow e2e suite is a merge_condition, so it runs as part of the merge process, not on every push and not when someone retags the PR. The trigger is “this is about to merge,” which is exactly the condition you were trying, and failing, to encode with a label.
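On the GitHub Actions side, one way to keep the e2e job off ordinary pushes is to run it only on the branches the queue creates. This is a sketch under a stated assumption: that your Mergify queue branches use the mergify/merge-queue/ prefix. Confirm that against your own configuration before relying on it.

```yaml
on:
  pull_request:
    branches:
      - main

jobs:
  e2e:
    # Assumption: Mergify's temporary queue branches start with this prefix.
    # On an ordinary PR the job is skipped; the queue's own PR runs it.
    if: startsWith(github.head_ref, 'mergify/merge-queue/')
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: ./run-e2e-tests.sh   # ~60 minutes, once per queue run
```

The skip on ordinary PRs is harmless here only because the merge decision is the queue’s check-success = e2e condition, not a branch-protection rule that accepts a skipped job.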

flowchart TD
    Push[Push to PR] --> Fast[Fast checks run on every push: lint]
    Fast --> Cond{lint green and approved?}
    Cond -->|no| Out[Stays out of the queue]
    Cond -->|yes| Q[Enters the merge queue]
    Q --> Front[At the front: run e2e once, against the merge state]
    Front --> Res{e2e green?}
    Res -->|yes| Merge[Merge]
    Res -->|no| Kick[Dequeued, author notified]

The expensive suite runs once per merge, on the code as it will actually land, and never because someone touched a label. And when several PRs are ready together, the queue proves them as one batch, so a single e2e run can clear five PRs instead of five runs clearing one each.

Labels describe, they don’t drive

A label is a note for humans. It says “this is a bug” or “this needs design review.” It’s a description of state. The moment you wire a label to execution, you’ve made a passive annotation into an active trigger, and GitHub’s event model makes you pay for that on every unrelated edit, while branch protection quietly stops meaning what you think it means.

A merge gate should be a positive signal: the work that needed to run, ran, and passed. A label-gated check can’t give you that, because a skipped job still reports green without testing a thing. A path filter can’t either, because a workflow that never runs reports nothing at all. Put the decision where the merge actually happens, so “this didn’t need to run” is a recorded choice instead of a green checkmark that means nothing.

If you’re reaching for a label to control CI, what you probably want is a merge queue.