How to Cut Your GitHub Actions CI Bill (Without Compromising Tests)
Five concrete levers that move the GitHub Actions bill: runner sizing, caching, test selection, batching, and two-step CI. The math, the tradeoffs, and how to pick the two that actually fit your codebase.
Your GitHub Actions bill went from $200 to $2,000 in eighteen months. The team did not double in size. Engineers shipped more PRs, the test suite grew, and the runner minutes piled up faster than any other line item in your cloud spend.
You can fix this without firing your test suite.
The most common failure modes are shrugging at the bill until finance asks, or reaching for the wrong lever first. The five levers below cover what actually moves the number, with the tradeoffs that matter for each.
First, find where the money is going
Before reaching for a tool, look at where minutes are being spent. The GitHub billing page tells you the total per repo and per workflow. That is enough to see which workflows dominate. Open the worst offender and look at the job durations: a single 30-minute integration job usually costs more than fifty 30-second lint runs.
The diagnostic question is which job runs the most minutes per week. If your top workflow is integration tests, lever 2 (caching) and lever 5 (two-step CI) move the most. If your top workflow is build, lever 1 (runner sizing) wins. If your top is a full matrix run on every PR, lever 3 (test selection) is mandatory.
Skip the levers that do not match what your bill actually shows. Resist the urge to do all five at once.
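If you want the same numbers without clicking through the UI, you can script a rough audit against the REST API. This is a sketch, not a turnkey tool: the workflow name is a placeholder, and it assumes the per-run timing endpoint (/actions/runs/{id}/timing) is available on your plan.

# Hypothetical audit workflow: run it manually to print billable minutes per workflow
name: ci-minutes-audit
on:
  workflow_dispatch:
jobs:
  audit:
    runs-on: ubuntu-latest
    permissions:
      actions: read
    steps:
      - name: Sum billable minutes over the last 100 runs
        env:
          GH_TOKEN: ${{ github.token }}
        run: |
          gh api "repos/${{ github.repository }}/actions/runs?per_page=100" \
            --jq '.workflow_runs[] | [.id, .name] | @tsv' |
          while IFS=$'\t' read -r id name; do
            # One timing call per run; fine for 100 runs, mind rate limits beyond that
            ms=$(gh api "repos/${{ github.repository }}/actions/runs/$id/timing" \
              --jq '[.billable[].total_ms] | add // 0')
            printf '%s\t%s\n' "$((ms / 60000))" "$name"
          done |
          awk -F'\t' '{sum[$2] += $1} END {for (w in sum) printf "%7d min  %s\n", sum[w], w}' |
          sort -rn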
Lever 1: Runner sizing
The default ubuntu-latest runner on private repos is 2 vCPU and 7 GB RAM. For most jobs, that is the right size. The waste comes from over-provisioning: if your build needs 4 vCPU but you booked an 8-core runner, you paid double for nothing.
jobs:
  build:
    # Default: fine for most jobs
    runs-on: ubuntu-latest
  build-heavy:
    # Larger runner: only worth it if you measured a >1.5x speedup
    runs-on: ubuntu-latest-4-cores
Run a baseline: trigger a workflow on each runner size you are considering and compare wall-clock time. If the larger runner is not at least 1.5x faster, the smaller one wins on cost. Per-minute pricing roughly doubles at each step up in size (2, 4, 8, 16 cores), so you only profit when runtime drops close to proportionally. The full price ladder lives on GitHub's Actions billing page.
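The quickest way to get that baseline is a throwaway matrix workflow that runs your real build once per runner size. A minimal sketch, assuming your org has a larger-runner label named ubuntu-latest-4-cores and an npm build; swap in your own labels and build command.

name: runner-size-baseline
on:
  workflow_dispatch:
jobs:
  build:
    strategy:
      matrix:
        # One job per runner size; compare wall-clock times in the run summary
        runner: [ubuntu-latest, ubuntu-latest-4-cores]
    runs-on: ${{ matrix.runner }}
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      - run: npm run build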
Two specific cases worth calling out. Large runners are clearly worth it for builds with parallelizable steps, like image builds and frontend bundling. They are usually a waste for single-threaded test suites that peg only one core at a time.
Lever 2: Caching
actions/cache is the single biggest free lever, and most repos under-use it. Typical setups cache the npm or Cargo registry downloads but not the build outputs that actually take time. A Node project with thousands of dependencies might spend three minutes installing them on every run. Cached, that drops to thirty seconds.
- uses: actions/cache@v4
  with:
    path: |
      ~/.npm
      node_modules
    key: ${{ runner.os }}-node-${{ hashFiles('**/package-lock.json') }}
    restore-keys: |
      ${{ runner.os }}-node-
The trap is cache invalidation. A cache key that changes too often costs the same as no cache. A key that changes too rarely serves stale artifacts and breaks builds. The reliable rule: hash the lockfile for dependencies, hash a config file for build outputs, and version the key when you change the toolchain.
Useful targets to cache that most repos miss include built node_modules with native modules, Rust target directories per profile, Docker layer caches via buildx, Bazel disk cache, and the Vite or webpack module graph.
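For one concrete example from that list, Docker layer caching via buildx takes two lines once you point it at the GitHub Actions cache backend. A sketch assuming docker/build-push-action; the build context and push settings are placeholders for your own image.

- uses: docker/setup-buildx-action@v3
- uses: docker/build-push-action@v6
  with:
    context: .
    push: false
    # Reuse image layers across runs via the Actions cache backend
    cache-from: type=gha
    cache-to: type=gha,mode=max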
Lever 3: Test selection
Run only what is affected by the diff. This is the highest-impact lever for monorepos and the lowest-impact for small repos. If your monorepo has a frontend app, a backend service, and shared infrastructure, a documentation change should not run the full E2E suite.
The fastest version of this is a paths: filter on the workflow trigger:
on:
  pull_request:
    paths:
      - 'apps/frontend/**'
      - 'packages/ui/**'
Tools that do this with real dependency-graph awareness: Bazel knows your dependency graph natively, Nx and Turborepo work for JavaScript monorepos, and Pants does it for polyglot setups. The paths: filter is the 80/20 starting point; the dependency-graph tools are where you go when the filter rules become impossible to maintain.
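What those tools buy you is a one-line invocation instead of a pile of path globs. A sketch with Turborepo, assuming a turbo-based monorepo with a test task defined; the filter selects packages changed since origin/main plus everything that depends on them.

- uses: actions/checkout@v4
  with:
    fetch-depth: 0   # turbo needs git history to diff against origin/main
- run: npx turbo run test --filter='...[origin/main]'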
The tradeoff: when test selection is wrong, you skip a test that should have run, and main breaks. The mitigation is to run the full suite on a schedule (nightly or pre-release) so drift gets caught even if no PR touched the affected paths.
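That scheduled safety net is a few lines on the full-suite workflow's trigger; the cron time here is an arbitrary example.

on:
  schedule:
    # Full suite nightly, regardless of what the path filters skipped during the day
    - cron: '0 3 * * *'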
We wrote about this approach in detail in Monorepo CI for GitHub Actions, including the implementation we ship at Mergify.
Lever 4: Batching merges
Run CI once for several PRs instead of once per PR. With ten PRs queued for merge and a 30-minute pipeline, that is the difference between five hours of CI minutes and thirty minutes.
This requires a merge queue that supports batching. GitHub’s native merge queue does not. Mergify’s merge queue batches by default, with bisect-on-failure so a single bad PR does not invalidate the whole batch.
# .mergify.yml
queue_rules:
  - name: default
    batch_max_size: 5
    batch_max_wait_time: 30s
The math is straightforward: a batch of N PRs uses one CI run instead of N. The cost per merged PR drops nearly N-fold. The catch is that when a batch fails, bisection adds one or two extra runs to find which PR caused the failure. For a healthy codebase with under 5% failure rate, that overhead is far smaller than the savings.
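To put illustrative numbers on it: 100 merged PRs a week through a 30-minute pipeline is 3,000 minutes unbatched. With batches of 5, that drops to 20 runs, or 600 minutes. Even one failed batch per day costing two extra bisection runs adds back only about 420 minutes, still roughly a 66% cut.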
The sweet spot for batch size depends on your test reliability: 4 to 6 for stable suites, 2 to 3 for flaky ones. If bisection runs more than once per day, your batch size is too large or your tests need work. The full configuration reference lives in the Mergify queue rules docs.
Lever 5: Two-step CI
The expensive checks (E2E browser tests, integration suites that spin up databases, performance benchmarks) only need to pass on PRs that are actually about to merge. Running them on every push is paying for tests on PRs that will be force-pushed, abandoned, or rebased twenty more times before they land.
Two-step CI splits the work: lightweight checks like lint and unit tests run on every PR push. The full expensive suite runs only when a PR enters the merge queue. For a team with 50 PRs per week where 30 reach the queue, that is 50 cheap runs plus 30 full runs, instead of 50 full runs.
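With illustrative durations, say five minutes for the light checks and thirty for the full suite, that is 50 × 30 = 1,500 minutes a week before the split and 50 × 5 + 30 × 30 = 1,150 after. The gap widens as the expensive suite grows or as more PRs die before reaching the queue.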
The trigger split is just two events on the workflow:
# .github/workflows/pr-ci.yml: fast checks on every push
on:
  pull_request:

# .github/workflows/queue-ci.yml: full suite only when the PR enters the queue
on:
  merge_group:
Setup requires a merge queue, the same way batching does. The pattern works with GitHub’s merge_group event directly, but you have to write the workflow split yourself; Mergify ships it as a configuration option.
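In practice the queue-side workflow is an ordinary job hanging off the merge_group event. A minimal sketch; the job name and test command are placeholders for your own expensive suite. Remember to add the queue job to the branch's required status checks, or the queue will not wait on it.

# .github/workflows/queue-ci.yml
name: queue-ci
on:
  merge_group:
jobs:
  e2e:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      # The expensive suite runs here and nowhere else
      - run: npm run test:e2e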
The realistic math
Most teams end up running two of these five levers, not all five. Picking the right two for your codebase is more important than picking all of them.
Two patterns we see most often. A monorepo with a slow integration suite benefits most from test selection (lever 3) plus two-step CI (lever 5), with a typical 50 to 70% bill reduction. A polyglot repo with mixed runner needs gets the most from a runner-sizing audit (lever 1) plus aggressive caching (lever 2), often 30 to 50% with no changes to merge logic.
After picking your two, measure the bill at one month and three months. If the change is less than 20%, the diagnosis was wrong. Open the billing page again and look harder at where the minutes went.
What this is not
This post does not cover self-hosted runners, which are a different shape of decision (hardware ownership and ops cost vs minute cost) and worth their own analysis. It also does not cover GitLab CI or CircleCI, though levers 2, 3, and 4 translate directly. Lever 5 needs a queue feature your CI platform may or may not have.
The point is to look at the bill, find where the minutes actually go, and pick the levers that move that number. A 60% reduction is normal for teams that have not tuned this. A 90% reduction is achievable but rare and usually means the original setup was significantly broken.