Continuous Integration
Continuous integration is the practice of merging every engineer's work into one shared branch many times a day, with an automated build and test on every change. The mechanics are well known. What separates teams that get value from CI from teams that just pay for it is what they do with the signal.
In one paragraph
Every change an engineer makes gets pushed, built, and tested automatically, many times a day. A green pipeline means the change is safe to integrate. A red pipeline blocks the merge. The point is to surface integration problems immediately, when one engineer can fix them in an hour, instead of at the end of a release cycle when the whole team has to. CI is the foundation that trunk-based development, continuous delivery, and modern merge queues are built on.
The problem CI solves
flowchart LR
Dev1["Engineer A<br/>2 weeks of work"]
Dev2["Engineer B<br/>2 weeks of work"]
Dev3["Engineer C<br/>2 weeks of work"]
IntegrationDay["Integration<br/>day"]
Broken["main<br/>BROKEN<br/>for days"]
Dev1 --> IntegrationDay
Dev2 --> IntegrationDay
Dev3 --> IntegrationDay
IntegrationDay --> Broken
style Broken fill:#FDECEA,stroke:#E53935,color:#1A1D24
style IntegrationDay fill:#FFF4E5,stroke:#F27B2A,color:#1A1D24
Pre-CI integration: three engineers work in isolation for two weeks, then try to combine the work in one go. The conflicts are not in the code but in the assumptions each engineer made about a codebase that has since changed underneath them.
Before CI, teams shipped on a calendar. Engineers worked on long-lived branches for weeks at a time. At the end of the cycle, somebody (the unlucky release engineer) merged all the branches back into one main branch and spent the next several days debugging the result. The phrase "integration hell" is older than most engineers using it now. It described a real week on a real calendar.
The cost of integration does not grow linearly with branch lifetime; it grows much faster. A two-day branch is a regular merge. A two-week branch is a project. A two-month branch is the kind of thing teams quietly abandon. The reason is that integration means reconciling two diverging worlds: every day the branch stays open, both the branch and main change, and the surface area of the eventual merge grows with the changes on each side.
Continuous integration cuts the problem at its source. Branches do not survive long enough for the world around them to change. The integration cost stays small because integration happens constantly, not on a schedule.
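A toy model makes the growth concrete. Assume, purely for illustration, that the branch and main each accumulate changes at a steady daily rate, and that every pair of (branch change, main change) is a potential interaction the eventual merge has to reconcile. The function `conflict_surface` and its default rates are hypothetical, not measurements. Even this crude model grows faster than linearly: a branch kept open five times longer produces 25 times as many pairs.

```python
def conflict_surface(days_open: int,
                     branch_changes_per_day: int = 5,
                     main_changes_per_day: int = 20) -> int:
    """Count potentially interacting change pairs after `days_open` days.

    Toy model (an assumption): every change on the branch can interact with
    every change that landed on main while the branch was open.
    """
    branch_changes = branch_changes_per_day * days_open
    main_changes = main_changes_per_day * days_open
    return branch_changes * main_changes


if __name__ == "__main__":
    # Roughly two days, two weeks, and two months of working days.
    for days in (2, 10, 40):
        print(f"{days:>2} days open -> {conflict_surface(days):>6} pairs to reconcile")
```

Real merges are not this uniform, but the shape holds: both sides keep moving, so the work to reconcile them multiplies rather than adds.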
How CI works in practice
flowchart LR
Push["Push to branch"]
Build["Build"]
Test["Run tests"]
Signal{"Pass?"}
Green["Mergeable ✓"]
Red["Blocked<br/>Fix the branch"]
Push --> Build --> Test --> Signal
Signal -->|yes| Green
Signal -->|no| Red
style Green fill:#E6F8F2,stroke:#1CB893,color:#1A1D24
style Red fill:#FDECEA,stroke:#E53935,color:#1A1D24
The CI loop: every change is built and tested before it can be considered mergeable.
The mechanics are unremarkable. An engineer pushes a commit to a branch. A CI tool picks up the push, runs a pipeline, and reports the result back as a status check on the pull request. The pipeline almost always contains the same steps in roughly the same order: clone the repository, install dependencies, build the code, run linters, run tests, report.
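The loop can be sketched in a few lines of Python. This is a simplified model, not how any particular CI tool is implemented: each step is a command, steps run in order, and the first non-zero exit code fails the run. `Step` and `run_pipeline` are hypothetical names, and the commands are placeholders standing in for install, build, lint, and test.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class Step:
    name: str
    command: list[str]  # argv list passed to subprocess.run


def run_pipeline(steps: list[Step]) -> tuple[str, list[str]]:
    """Run steps in order and stop at the first failure.

    Returns (status, log), where status is "success" or "failure": the
    value a CI tool would report back to the pull request as a check.
    """
    log = []
    for step in steps:
        result = subprocess.run(step.command, capture_output=True, text=True)
        if result.returncode != 0:
            log.append(f"{step.name}: failed (exit {result.returncode})")
            return "failure", log
        log.append(f"{step.name}: ok")
    return "success", log


if __name__ == "__main__":
    py = sys.executable
    # Placeholder commands; a real pipeline would clone the repo, then call
    # the package manager, compiler, linters, and test runner here.
    steps = [
        Step("install", [py, "-c", "print('deps installed')"]),
        Step("build", [py, "-c", "print('built')"]),
        Step("lint", [py, "-c", "print('lint clean')"]),
        Step("test", [py, "-c", "raise SystemExit(1)"]),  # simulate a failing test
    ]
    status, log = run_pipeline(steps)
    print(status)
    print("\n".join(log))
```

Stopping at the first failure keeps feedback fast: there is no point running the test suite against code that did not build.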
The pipeline is defined as code, usually in a YAML file checked into the repo (.github/workflows/ci.yml, .gitlab-ci.yml, .circleci/config.yml, and so on). That choice matters: the pipeline lives in the same git history as the code, which means changing the build is a regular pull request like any other.
A trustworthy signal
For CI to be useful, a green pipeline has to mean something. If half the failures on main are flaky tests that pass on retry, engineers learn to ignore the signal. Once they ignore it, the signal is gone, and CI becomes a tax that everyone pays without anyone benefiting. Test Insights exist as a product category specifically because this failure mode is universal.
Fast enough to use
If the pipeline takes 45 minutes, nobody is going to push four times a day. The pipeline has to fit inside the cycle of "make a change, push, wait, react." Ten to fifteen minutes is the upper bound where most teams stay productive. Beyond that, the levers are well known: parallelize the test suite, run only the tests affected by the change (selective testing), and split heavy checks into a second stage that only runs at merge time.
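Of the three levers, selective testing is the least obvious. A minimal sketch, assuming a precomputed map from each test to the source files it depends on: `select_tests`, `TEST_DEPS`, and the file paths are hypothetical. Build tools such as Bazel derive the equivalent information from the dependency graph.

```python
def select_tests(changed_files: set[str], test_deps: dict[str, set[str]]) -> list[str]:
    """Pick the tests whose known dependencies overlap the changed files.

    Falls back to the full suite when a changed file is not covered by the
    dependency map (a build config, a newly added file), since the map
    cannot say what that change affects.
    """
    known = set().union(*test_deps.values())
    if not changed_files <= known:
        return sorted(test_deps)
    return sorted(test for test, deps in test_deps.items() if deps & changed_files)


# Hypothetical dependency map; real setups derive it from a build graph
# or from per-test coverage data.
TEST_DEPS = {
    "test_billing": {"billing/invoice.py", "billing/tax.py"},
    "test_auth": {"auth/session.py"},
    "test_checkout": {"billing/invoice.py", "cart/cart.py"},
}

if __name__ == "__main__":
    print(select_tests({"billing/invoice.py"}, TEST_DEPS))
    print(select_tests({"ci/config.yml"}, TEST_DEPS))
```

The fallback is the important design choice: when the map cannot vouch for a change, running everything is slower but keeps the green signal trustworthy.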
Required on the way to main
A passing pipeline is only useful if it is enforced. Most teams set CI as a required check, so a pull request cannot merge until the pipeline is green. Without that enforcement, "CI is broken on main" becomes a regular condition rather than an incident, and the whole loop unravels.
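Forges implement this as a repository setting (on GitHub, required status checks in branch protection) rather than as code, but the rule itself is small. A sketch of the logic, with hypothetical names `CheckState` and `can_merge`:

```python
from enum import Enum


class CheckState(Enum):
    SUCCESS = "success"
    FAILURE = "failure"
    PENDING = "pending"


def can_merge(required: set[str], results: dict[str, CheckState]) -> bool:
    """A PR may merge only if every required check has reported success.

    A required check that has not reported at all counts as not passed,
    so a pipeline that never ran cannot be mistaken for a green one.
    """
    return all(results.get(name) is CheckState.SUCCESS for name in required)


if __name__ == "__main__":
    required = {"ci"}
    print(can_merge(required, {"ci": CheckState.SUCCESS}))
    print(can_merge(required, {"ci": CheckState.PENDING}))
    print(can_merge(required, {"lint": CheckState.SUCCESS}))  # "ci" is missing
```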
CI, continuous delivery, continuous deployment
Three terms, often used interchangeably, that describe three different things.
| Practice | What it covers | Who runs it |
|---|---|---|
| Continuous integration | Every change is merged into main frequently and built and tested automatically. | Every engineering team that uses a CI tool. |
| Continuous delivery | Main is always in a state that could be deployed to production. Deployment itself is triggered manually, often with a single click. | Most production SaaS. |
| Continuous deployment | Every green commit is deployed to production automatically. | A smaller set: companies with strong test coverage, feature flags, and on-call discipline. |
Continuous integration is the prerequisite. Without an always-green main, there is nothing safe to deliver and certainly nothing safe to deploy. The other two practices are about what the team decides to do once that prerequisite is in place.
Where CI quietly stops working
Most teams that have CI do not have a CI problem in the obvious sense. The pipelines run, the YAML is checked in, the green ticks show up on pull requests. The problem is that the signal has gradually decayed and nobody noticed.
Flaky tests
Tests that fail on one run and pass on the next destroy trust faster than anything else. After a week of retries, engineers learn that red is not necessarily red, and they start clicking "rerun" by reflex. Once that habit forms, every real failure goes through "is this real, or is this flaky?" first, and CI is no longer a signal; it is a question.
The fix is to detect flakes systematically, quarantine them out of the required check, and assign the cleanup. See the flaky tests guide for the framework-specific patterns and the Test Insights product for what it looks like to automate this.
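One common heuristic for systematic detection, shown here as a minimal sketch rather than the method any particular product uses: a test that both passes and fails on the same commit is flaky, because the code did not change between the two runs. `find_flaky`, `gating_tests`, and the record format are hypothetical.

```python
from collections import defaultdict


def find_flaky(runs):
    """Flag tests that both passed and failed on the same commit.

    `runs` is an iterable of (commit_sha, test_name, passed) tuples. Same code
    with different outcomes means the test, not the change, is nondeterministic.
    A test that passes on one commit and fails on another is not flagged:
    that may be a genuine regression.
    """
    outcomes = defaultdict(set)
    for sha, test, passed in runs:
        outcomes[(sha, test)].add(passed)
    return sorted({test for (_, test), seen in outcomes.items() if seen == {True, False}})


def gating_tests(all_tests, quarantined):
    """Quarantined tests keep running for data but no longer block the merge."""
    return sorted(set(all_tests) - set(quarantined))


if __name__ == "__main__":
    history = [
        ("abc123", "test_login", True),
        ("abc123", "test_login", False),   # rerun on the same commit flipped
        ("abc123", "test_export", True),
        ("def456", "test_export", False),  # different commit: maybe a real bug
    ]
    flaky = find_flaky(history)
    print(flaky)
    print(gating_tests(["test_login", "test_export"], flaky))
```

Quarantine removes a test from the required check, not from the suite: it keeps producing data until someone fixes it.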
A slow pipeline
When CI takes 30 minutes, pushing a change costs 30 minutes of attention. Engineers context-switch, lose the thread, come back to a failure and have to remember what they were doing. The healthy ceiling is around 10 to 15 minutes. Past that, parallelization, selective testing, and two-step CI are the levers.
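Parallelization usually means splitting the suite across several machines. A common approach, sketched here under the assumption that per-test timings from earlier runs are available, is to balance shards by total duration rather than by test count. `shard_tests` and the timings are hypothetical; some CI tools and test runners offer their own timing-based splitting.

```python
import heapq


def shard_tests(durations: dict[str, float], shards: int) -> list[list[str]]:
    """Split tests into `shards` groups with roughly equal total duration.

    Greedy longest-first: sort tests by duration, slowest first, and give each
    one to the shard with the smallest total so far.
    """
    heap = [(0.0, i) for i in range(shards)]  # (seconds assigned, shard index)
    assignment: list[list[str]] = [[] for _ in range(shards)]
    for test, secs in sorted(durations.items(), key=lambda kv: kv[1], reverse=True):
        total, idx = heapq.heappop(heap)
        assignment[idx].append(test)
        heapq.heappush(heap, (total + secs, idx))
    return assignment


if __name__ == "__main__":
    # Hypothetical per-test timings (seconds) from previous runs.
    timings = {"test_e2e": 300, "test_api": 200, "test_db": 150, "test_unit": 50}
    for i, shard in enumerate(shard_tests(timings, 2)):
        print(i, shard, sum(timings[t] for t in shard))
```

With these example timings, each of the two shards gets 350 seconds of work, so wall-clock time is half of the 700 seconds the suite takes serially.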
Two green PRs that break main together
This one surprises teams the most. Two pull requests pass CI in isolation, both green, both reviewed. They land on main back to back. Main is red the moment the second one merges, because the two changes were tested against an older version of main, not against each other. A function got renamed in PR #1 while a caller was added in PR #2. Neither pipeline saw both changes at once. The fix is a merge queue that tests each PR against the actual future state of main before it merges. CI alone cannot catch this, by design.
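The failure mode can be reproduced with a toy model in which the "codebase" is just a set of defined function names plus a set of called names, and the "test suite" checks that every call resolves. Everything here (`Codebase`, `PR`, `apply`, `ci_passes`, `merge_queue`) is hypothetical and heavily simplified; many real merge queues also batch PRs and run speculative checks in parallel, which this sequential sketch leaves out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Codebase:
    defined: frozenset  # function names that exist
    calls: frozenset    # function names that something calls


@dataclass(frozen=True)
class PR:
    name: str
    rename: Optional[tuple] = None  # (old_name, new_name)
    add_calls: frozenset = frozenset()


def apply(base: Codebase, pr: PR) -> Codebase:
    defined, calls = set(base.defined), set(base.calls)
    if pr.rename:
        old, new = pr.rename
        defined.discard(old)
        defined.add(new)
        if old in calls:  # update the callers already present in `base`
            calls.discard(old)
            calls.add(new)
    calls |= pr.add_calls
    return Codebase(frozenset(defined), frozenset(calls))


def ci_passes(code: Codebase) -> bool:
    """Stand-in for the test suite: every called function must exist."""
    return code.calls <= code.defined


def merge_queue(main: Codebase, queue: list) -> tuple:
    """Test each PR against main plus every PR accepted ahead of it."""
    merged = []
    for pr in queue:
        candidate = apply(main, pr)
        if ci_passes(candidate):
            main, merged = candidate, merged + [pr.name]
        # Otherwise the PR is sent back to its author and main stays green.
    return main, merged


if __name__ == "__main__":
    main = Codebase(frozenset({"get_user"}), frozenset({"get_user"}))
    pr1 = PR("#1", rename=("get_user", "fetch_user"))
    pr2 = PR("#2", add_calls=frozenset({"get_user"}))
    print(ci_passes(apply(main, pr1)), ci_passes(apply(main, pr2)))  # each green alone
    print(ci_passes(apply(apply(main, pr1), pr2)))  # merged back to back: red
    print(merge_queue(main, [pr1, pr2])[1])  # the queue accepts only #1
```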
No visibility on what is failing
CI emits a green or red tick. That is not enough to run a real engineering org. Teams that take CI seriously look at retry rate, time to green, flake rate per test, slowest tests, and where in the pipeline failures concentrate. Without that data, "CI is slow" and "CI is flaky" stay as vague complaints in retros and nothing gets fixed. CI Insights exist for this, but even a basic dashboard built on top of the CI tool's API is better than guessing.
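Several of these numbers fall out of the run history that most CI tools expose through their APIs. A minimal sketch, assuming the history has already been fetched into a list of records: the `Run` shape and its fields are hypothetical, not any tool's actual API schema.

```python
from collections import Counter
from dataclasses import dataclass
from statistics import median
from typing import Optional


@dataclass
class Run:
    pr: int
    attempt: int                # 1 for the first run, 2+ for reruns
    passed: bool
    failed_step: Optional[str]  # None when the run passed
    started_min: float          # minutes since the PR's first push
    finished_min: float


def retry_rate(runs: list) -> float:
    """Share of PRs whose pipeline was rerun at least once."""
    prs = {r.pr for r in runs}
    retried = {r.pr for r in runs if r.attempt > 1}
    return len(retried) / len(prs) if prs else 0.0


def median_time_to_green(runs: list) -> Optional[float]:
    """Median minutes from a PR's first run starting to its first passing run
    finishing. PRs that never went green are excluded."""
    durations = []
    for pr in {r.pr for r in runs}:
        pr_runs = [r for r in runs if r.pr == pr]
        greens = [r.finished_min for r in pr_runs if r.passed]
        if greens:
            durations.append(min(greens) - min(r.started_min for r in pr_runs))
    return median(durations) if durations else None


def failures_by_step(runs: list) -> Counter:
    """Where in the pipeline failures concentrate."""
    return Counter(r.failed_step for r in runs if not r.passed)


if __name__ == "__main__":
    runs = [
        Run(1, 1, True, None, 0, 8),
        Run(2, 1, False, "test", 0, 9),
        Run(2, 2, True, None, 10, 19),
        Run(3, 1, False, "lint", 0, 2),
    ]
    print(retry_rate(runs), median_time_to_green(runs), failures_by_step(runs))
```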
The CI tools landscape
For most teams the choice of CI tool is settled by the choice of forge:
- GitHub Actions. The default for new projects on GitHub, and the right choice unless something pulls you elsewhere. Tight integration with pull requests, a large marketplace of actions, scales to most workloads.
- GitLab CI. The native pipeline runner for GitLab. Mature, opinionated, well integrated with merge requests and GitLab's wider DevOps surface.
- CircleCI. Independent CI provider. Strong macOS support, good caching, used heavily by mobile and cross-platform teams.
- Buildkite. Self-hosted runners with a hosted control plane. The pattern of choice for teams that want CI on their own hardware (GPU, large machines, on-prem secrets) without running the orchestration themselves.
- Jenkins. The oldest of the modern CI tools. Still pervasive in enterprise. Powerful and extensible. Operationally heavy in a way that newer tools are not.
- Bazel and other build-tool-driven setups. Inside large monorepos (Google, Meta, Uber, Stripe), CI is often a thin shell around a build tool that does selective testing, remote caching, and dependency tracking. The CI tool runs Bazel; Bazel decides what to run.
Once the CI tool is picked, the interesting questions are not about the tool. They are about the pipeline (what does it run, how fast, how reliably) and about the surrounding system (merge queue, flake detection, observability) that turns the pipeline output into something useful.
If you are setting CI up for the first time
A reasonable order to get to a useful baseline without overbuilding.
- Pick the CI tool that matches your forge. GitHub Actions for GitHub, GitLab CI for GitLab. Switching later is annoying but possible, so do not over-evaluate.
- Start with build plus linters plus unit tests. A green pipeline should mean "the code compiles, the formatters and linters are happy, and the fast tests pass." That is enough to start trusting the signal.
- Make it a required check. Block merges to main until the pipeline is green. If you skip this step, the signal will atrophy within a few weeks.
- Add integration and end-to-end tests next, in a slower stage. Many teams discover too late that running Playwright on every push costs more in time and money than they can afford. Separate fast checks (run on every PR push) from heavy checks (run once at merge time). That is the basic shape of two-step CI.
- Add a merge queue as soon as you have two engineers merging in parallel. The cost of waiting until main breaks is higher than the cost of setting it up early.
- Instrument the pipeline. Failure rates, retry rates, time to green, flake rate per test. You need numbers to know which step to speed up next.
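The fast/heavy split described in the fourth step can be sketched as a function from the triggering event to the checks it runs. The event names, check names, and durations are hypothetical; in practice the split lives in the CI tool's own configuration, not in Python. The cost function shows why the split pays off when a PR is pushed several times before it merges.

```python
# Hypothetical check names and durations (minutes).
FAST_CHECKS = {"build": 3, "lint": 1, "unit-tests": 4}
HEAVY_CHECKS = {"integration-tests": 12, "e2e": 25}


def checks_for(event: str) -> dict:
    """Return the checks to run for a triggering event."""
    if event == "pr_push":
        return dict(FAST_CHECKS)
    if event == "merge":
        # Design choice: re-run the fast checks too, since the code under
        # test at merge time is the combination of the PR and latest main.
        return {**FAST_CHECKS, **HEAVY_CHECKS}
    raise ValueError(f"unknown event: {event!r}")


def ci_minutes_per_pr(pushes: int, two_step: bool = True) -> int:
    """Total CI minutes for one PR that is pushed `pushes` times, then merged."""
    if two_step:
        return pushes * sum(checks_for("pr_push").values()) + sum(checks_for("merge").values())
    # Single stage: every check on every push.
    return pushes * sum({**FAST_CHECKS, **HEAVY_CHECKS}.values())


if __name__ == "__main__":
    print(ci_minutes_per_pr(4, two_step=False), ci_minutes_per_pr(4))
```

With these made-up numbers, four pushes cost 180 CI minutes when every check runs every time, and 77 with the split.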
FAQ
What is continuous integration?
Continuous integration is the practice of merging every engineer's work into one shared branch many times a day, with an automated build and test running on every change. The point is to find integration problems immediately, when they are still cheap to fix, instead of accumulating them until release week.
What is the difference between continuous integration and continuous delivery?
Continuous integration is about merging code: every change goes into main, every change is built and tested. Continuous delivery is about shipping it: main is always in a state that could be deployed to production at any time. Continuous deployment goes one step further and actually pushes every green commit to production automatically. CI is the foundation the other two are built on.
What is a CI pipeline?
A CI pipeline is the sequence of automated steps that runs on every change: typically clone, install dependencies, build, run linters and tests, report results back to the pull request. Most pipelines are defined as code in a file like .github/workflows/ci.yml, .gitlab-ci.yml, or .circleci/config.yml.
What is a CI tool?
A CI tool is the system that executes the pipeline. The common ones today are GitHub Actions, GitLab CI, CircleCI, Buildkite, Jenkins, and (still) Bazel-driven custom setups inside large monorepos. The tool runs the steps. The discipline of integrating constantly is what makes it continuous.
Why is continuous integration important?
Because integration cost compounds. A two-day branch is roughly a regular merge. A two-week branch is a project. Without CI, integration is the work nobody does until it has to be done all at once, usually at the worst possible time. CI moves that cost from a periodic crisis to a constant background hum.
Does CI require automated tests?
In a meaningful sense, yes. The whole point of CI is a fast, trustworthy signal on every change. A CI pipeline that only builds does not give you that. The minimum is a build plus enough automated tests that a green pipeline actually means something. Adding tests over time is a normal part of CI adoption.
Does continuous integration replace code review?
No. CI verifies that the code builds and the tests pass. Code review verifies that a human thinks the change is correct, well structured, and consistent with the rest of the codebase. They check different things and most engineering orgs require both before merge.
How does CI work with pull requests?
Every PR triggers the pipeline against the PR's commits. The result is reported back as a check on the PR (green tick or red cross). Most teams set the green CI as a required check, so a PR cannot merge until the pipeline passes. Once it merges, CI runs again on main to confirm the merged result is still green.
CI gives you a signal. Mergify is what you do with it.
CI Insights surfaces flakes, slow tests, and retry storms. The merge queue is how every green PR stays green when it lands on main. The two together are what makes continuous integration actually useful.