Batching
Testing each PR with its own CI run means one full pipeline per PR. With ten PRs in the queue, that is ten pipelines. Batching combines several PRs into a single CI run, so CI cost scales with the number of batches rather than the number of PRs.
flowchart LR
    PR1["PR #1"] --> Batch["Combined<br/>batch of 4"]
    PR2["PR #2"] --> Batch
    PR3["PR #3"] --> Batch
    PR4["PR #4"] --> Batch
    Batch --> CI["1 CI run"]
    CI -->|pass| M["All 4 merge ✓"]
    CI -->|fail| B["Bisect to find<br/>the bad PR"]
    style M fill:#E6F8F2,stroke:#1CB893,color:#1A1D24
    style B fill:#FFF4E5,stroke:#F27B2A,color:#1A1D24
One CI run instead of four. If the run fails, the queue bisects to find which PR caused it.
The cost math
| | Without batching | With batching of 4 |
|---|---|---|
| PRs | 4 | 4 |
| CI runs | 4 | 1 |
| Cost | 4x | 1x |
With 20 PRs per day and a 30-minute CI pipeline, going from no batching to batches of 4 takes you from 20 runs to 5, or from 10 hours of CI time to 2.5 hours, as long as every batch passes. That is a 75% reduction in CI resource usage. For teams paying for CI by the minute, the savings show up in the next invoice.
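The arithmetic above can be reproduced with a few lines of Python. This is a minimal sketch that assumes every batch passes, so no bisection runs are counted; the function name `ci_minutes` is made up for illustration.

```python
import math


def ci_minutes(prs_per_day: int, pipeline_minutes: int, batch_size: int = 1) -> int:
    """Daily CI minutes, assuming every batch passes (no bisection runs)."""
    runs = math.ceil(prs_per_day / batch_size)  # a partial last batch still costs one run
    return runs * pipeline_minutes


unbatched = ci_minutes(20, 30)               # 20 runs of 30 minutes
batched = ci_minutes(20, 30, batch_size=4)   # 5 runs of 30 minutes
saving = 1 - batched / unbatched

print(f"{unbatched / 60:.1f} h -> {batched / 60:.1f} h ({saving:.0%} less)")
# prints "10.0 h -> 2.5 h (75% less)"
```

Failed batches add bisection runs on top of this figure, so the real saving depends on how often the queue fails.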
How it works
- Multiple PRs enter the queue.
- The merge queue combines them into a single test branch.
- CI runs once against the combined changes.
- If it passes, all PRs in the batch merge together.
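The steps above can be sketched as follows. This is an illustration only, not how any particular merge queue is implemented: `make_batches`, `process_batch`, and the `run_ci` callback are hypothetical names, and PRs are represented by their numbers.

```python
from typing import Callable, Iterator, List


def make_batches(queue: List[int], batch_size: int) -> Iterator[List[int]]:
    """Yield consecutive groups of up to batch_size PR numbers, in queue order."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(queue), batch_size):
        yield queue[start:start + batch_size]


def process_batch(batch: List[int], run_ci: Callable[[List[int]], bool]) -> List[int]:
    """Run CI once on the combined batch; merge all of it on success, none on failure."""
    return list(batch) if run_ci(batch) else []


queue = [101, 102, 103, 104, 105, 106]
always_green = lambda batch: True  # stand-in for a real CI run (assumption)

for batch in make_batches(queue, batch_size=4):
    merged = process_batch(batch, always_green)
    print(f"batch {batch}: merged {merged}")
```

The all-or-nothing return value is what makes a failure expensive: a single bad PR blocks the whole batch until the queue works out which one it was.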
Handling failures: bisection
If the batch fails, the queue needs to identify which PR caused the failure. The standard approach is speculative bisection: test overlapping subsets in parallel.
Batch [1,2,3,4] fails
→ Test [1,2] and [1,2,3] in parallel
→ [1,2] passes → merge PRs #1 and #2
→ [1,2,3] fails → PR #3 is the culprit
→ Remove PR #3
→ Put PR #4 back in the queue
The good PRs still merge. The bad PR is removed from the queue and its author is notified. Bisection adds a couple of extra CI runs to isolate the culprit, which is the price you pay for batching.
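The walkthrough above can be sketched in Python. This is a simplified illustration under stated assumptions, not a description of Mergify's implementation: it tests every proper prefix of the batch (the example tests only two of them), it runs the checks sequentially where a real queue would run them in parallel, and it assumes CI is deterministic with a single culprit. The names `resolve_failed_batch` and `passes` are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


def resolve_failed_batch(
    batch: Sequence[int],
    passes: Callable[[Sequence[int]], bool],
) -> Tuple[List[int], int, List[int]]:
    """Split a batch that is known to fail into (merged, culprit, requeued).

    Each proper prefix of the batch is tested. The longest passing prefix is
    safe to merge, the PR right after it is the culprit, and everything
    behind the culprit goes back into the queue.
    """
    # A real queue would launch these runs in parallel (assumption: sequential here).
    results = {n: passes(batch[:n]) for n in range(1, len(batch))}

    good = 0
    for n in range(1, len(batch)):
        if not results[n]:
            break
        good = n

    merged = list(batch[:good])
    culprit = batch[good]
    requeued = list(batch[good + 1:])
    return merged, culprit, requeued


# Stand-in for CI: any subset containing PR #3 fails (assumption for the demo).
ci_passes = lambda subset: 3 not in subset

print(resolve_failed_batch([1, 2, 3, 4], ci_passes))
# prints ([1, 2], 3, [4])
```

Testing prefixes rather than arbitrary subsets keeps the queue order intact: whatever merges is always the front of the queue, so the PRs behind the culprit can be re-batched without reordering.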
Choosing your batch size
The right batch size depends on your queue failure rate and CI duration:
- Low failure rate (under 2%): larger batches (5 to 10 PRs) work well, since bisection is rare.
- Medium failure rate (2 to 5%): moderate batches (3 to 5 PRs) balance efficiency and recovery time.
- High failure rate (over 5%): small batches (2 to 3 PRs), and prioritize fixing your flaky tests first.
A useful heuristic: if bisection happens more than once a day, your batch size is too large or your test stability needs work.
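To see why larger batches need a lower failure rate, it helps to estimate how often a batch fails. The sketch below treats the failure rate as a per-PR probability and assumes PRs fail independently of each other; both are simplifying assumptions, and the function names are made up for illustration.

```python
import math


def batch_failure_probability(per_pr_failure_rate: float, batch_size: int) -> float:
    """Chance that at least one PR in the batch fails, assuming independent failures."""
    return 1 - (1 - per_pr_failure_rate) ** batch_size


def expected_bisections_per_day(prs_per_day: int, per_pr_failure_rate: float, batch_size: int) -> float:
    """Expected number of failed batches per day, each of which triggers a bisection."""
    batches = math.ceil(prs_per_day / batch_size)
    return batches * batch_failure_probability(per_pr_failure_rate, batch_size)


for rate, size in [(0.02, 10), (0.05, 5), (0.10, 3)]:
    p = batch_failure_probability(rate, size)
    per_day = expected_bisections_per_day(20, rate, size)
    print(f"rate {rate:.0%}, batch of {size}: {p:.1%} of batches fail, "
          f"about {per_day:.2f} bisections/day at 20 PRs/day")
```

Comparing the last figure against the once-a-day heuristic gives a rough check on whether a given batch size suits your queue.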
Tradeoffs to know about
Batching is a cost play, but it has real downsides. One failure affects the whole batch. Bisection adds latency. You may need larger CI runners to handle combined diffs. The math wins for most teams, but not all.
Related features
- Speculative checks: test batches in parallel for maximum throughput.
- Two-step CI: ensure PRs pass lightweight checks before entering a batch.
See how to enable batching in Mergify (set batch_max_size on a queue rule).
Stop running CI per PR.
Mergify batches PRs in your queue and bisects on failure automatically. No config beyond setting a batch size.