Batching

Testing each PR with its own CI run means one full pipeline per PR. With ten PRs in the queue, that is ten pipelines. Batching combines several PRs into a single CI run, so a batch of four costs one pipeline instead of four.

flowchart LR
  PR1["PR #1"] --> Batch["Combined<br/>batch of 4"]
  PR2["PR #2"] --> Batch
  PR3["PR #3"] --> Batch
  PR4["PR #4"] --> Batch
  Batch --> CI["1 CI run"]
  CI -->|pass| M["All 4 merge ✓"]
  CI -->|fail| B["Bisect to find<br/>the bad PR"]

  style M fill:#E6F8F2,stroke:#1CB893,color:#1A1D24
  style B fill:#FFF4E5,stroke:#F27B2A,color:#1A1D24

One CI run instead of four. If the run fails, the queue bisects to find which PR caused it.

The cost math

            Without batching    With batching of 4
PRs         4                   4
CI runs     4                   1
Cost        4x                  1x

With 20 PRs per day and a 30-minute CI pipeline, going from no batching to batches of 4 takes you from 20 runs (10 hours of CI time) to 5 runs (2.5 hours), assuming every batch passes. That is a 75% reduction in CI resource usage. For teams paying for CI by the minute, the savings show up in the next invoice.
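The arithmetic above can be reproduced with a few lines of Python. This is a minimal sketch: the function name `ci_hours` is hypothetical, and it assumes every batch passes, so no bisection runs are counted.

```python
import math


def ci_hours(prs_per_day: int, pipeline_minutes: float, batch_size: int = 1) -> float:
    """Daily CI time in hours, assuming every batch passes (no bisection runs)."""
    runs = math.ceil(prs_per_day / batch_size)  # one pipeline per batch
    return runs * pipeline_minutes / 60


unbatched = ci_hours(20, 30)               # 20 runs of 30 minutes
batched = ci_hours(20, 30, batch_size=4)   # 5 runs of 30 minutes
saving = 1 - batched / unbatched

print(unbatched, batched, f"{saving:.0%}")  # 10.0 2.5 75%
```

Failed batches add bisection runs on top of this figure, so treat the result as a best case.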

How it works

  1. Multiple PRs enter the queue.
  2. The merge queue combines them into a single test branch.
  3. CI runs once against the combined changes.
  4. If it passes, all PRs in the batch merge together.
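The four steps above can be sketched in Python. This is an illustration, not how any particular merge queue is implemented: `form_batches`, `process_batch`, and the `run_ci` callback are hypothetical names, and the CI stub simply reports success.

```python
from typing import Callable, List, Sequence


def form_batches(queue: Sequence[int], batch_size: int) -> List[List[int]]:
    """Split queued PR numbers into consecutive batches, preserving queue order."""
    return [list(queue[i:i + batch_size]) for i in range(0, len(queue), batch_size)]


def process_batch(batch: List[int], run_ci: Callable[[List[int]], bool]) -> List[int]:
    """Run CI once against the combined batch and return the PRs that merge."""
    # A passing run merges the whole batch; a failing run merges nothing yet.
    return list(batch) if run_ci(batch) else []


def always_green(prs: List[int]) -> bool:
    """Stand-in for a real CI pipeline (assumption: every run passes)."""
    return True


batches = form_batches(range(1, 11), batch_size=4)
merged = [pr for batch in batches for pr in process_batch(batch, always_green)]
print(batches)  # [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10]]
```

Ten queued PRs produce three CI runs here instead of ten.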

Handling failures: bisection

If the batch fails, the queue needs to identify which PR caused the failure. The standard approach is speculative bisection: test overlapping subsets in parallel.

Batch [1,2,3,4] fails
  → Test [1,2] and [1,2,3] in parallel
  → [1,2] passes      → merge PRs #1 and #2
  → [1,2,3] fails     → PR #3 is the culprit
  → Remove PR #3
  → Put PR #4 back in the queue

The passing PRs still merge. The bad PR is removed from the queue and its author is notified. Bisection costs a couple of extra CI runs to isolate the culprit; that is the trade-off batching makes.
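One way to sketch this in Python is to test prefixes of the failed batch concurrently. The sketch makes several assumptions that go beyond the example: it tests every proper prefix (a superset of the two subsets shown), it assumes a prefix that contains the bad PR always fails, and `bisect_failed_batch`, `BisectResult`, and `run_ci` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, NamedTuple


class BisectResult(NamedTuple):
    merged: List[int]    # PRs before the culprit, safe to merge
    culprit: int         # first PR whose inclusion makes CI fail
    requeued: List[int]  # PRs after the culprit, returned to the queue


def bisect_failed_batch(batch: List[int], run_ci: Callable[[List[int]], bool]) -> BisectResult:
    """Isolate the culprit in a batch that has already failed as a whole."""
    # Proper prefixes only: the full batch is already known to fail.
    prefixes = [batch[:k] for k in range(1, len(batch))]
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(run_ci, prefixes))  # speculative, parallel runs
    # Index of the shortest failing prefix; if none fails, the last PR is the culprit.
    first_fail = next((i for i, ok in enumerate(results) if not ok), len(batch) - 1)
    return BisectResult(batch[:first_fail], batch[first_fail], batch[first_fail + 1:])


def ci_breaks_on_pr3(prs: List[int]) -> bool:
    """Stand-in for CI: fails whenever PR #3 is included."""
    return 3 not in prs


result = bisect_failed_batch([1, 2, 3, 4], ci_breaks_on_pr3)
print(result)  # BisectResult(merged=[1, 2], culprit=3, requeued=[4])
```

Running the prefixes in parallel trades extra CI runs for lower latency, since the culprit is known after one pipeline duration rather than several sequential ones.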

Choosing your batch size

The right batch size depends on your queue failure rate and CI duration:

  • Low failure rate (under 2%): larger batches (5 to 10 PRs) work well, since bisection is rare.
  • Medium failure rate (2 to 5%): moderate batches (3 to 5 PRs) balance efficiency and recovery time.
  • High failure rate (over 5%): small batches (2 to 3 PRs), and prioritize fixing your flaky tests first.

A useful heuristic: if bisection happens more than once a day, your batch size is too large or your test stability needs work.
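To see why larger batches need a lower failure rate, it helps to estimate how often a batch fails. The sketch below assumes each PR fails independently with the same probability, which is a simplification (flaky tests and conflicting changes break it), and `batch_failure_probability` is a hypothetical helper name.

```python
def batch_failure_probability(pr_failure_rate: float, batch_size: int) -> float:
    """Chance that at least one PR in the batch fails, assuming independent failures."""
    return 1 - (1 - pr_failure_rate) ** batch_size


# Compare a few combinations of per-PR failure rate and batch size.
for rate, size in [(0.02, 10), (0.05, 5), (0.05, 3), (0.10, 3)]:
    p = batch_failure_probability(rate, size)
    print(f"failure rate {rate:.0%}, batch of {size}: {p:.1%} of batches fail")
```

Under these assumptions, a 2% per-PR failure rate with batches of 10 means roughly 18% of batches need bisection, which is why the batch size should shrink as the failure rate rises.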

Tradeoffs to know about

Batching is a cost play, but it has real downsides. One failure affects the whole batch. Bisection adds latency. You may need larger CI runners to handle combined diffs. The math wins for most teams, but not all.

Related features

See how to enable batching in Mergify (set batch_max_size on a queue rule).

Stop running CI per PR.

Mergify batches PRs in your queue and bisects on failure automatically. No config beyond setting a batch size.