CI/CD promises speed, automation, and confidence. But if you’ve ever stared at a red pipeline for hours, battled flaky tests, or deployed a “fully tested” build that still broke production—you know there’s a lot more to it.
This post is a look behind the scenes. Not at the ideal world of CI/CD, but at the messy reality of building and maintaining fast pipelines in a real product team. We’re talking broken builds, team burnout, and the slow path to stability. These are the trenches.
1. Why We Went All-In on CI/CD
When we started, our goal was simple: deploy to production multiple times a day with confidence. We were shipping a developer tool—so fast iteration was critical. Manual deploys weren’t cutting it, and delays between code and customer feedback were slowing us down.
We set up a full pipeline:
Linting and formatting
Unit, integration, and e2e tests
Preview deployments on PRs
Auto-merge and auto-deploy on green checks
On paper, it looked great. But the real story was what came next.
2. When the Pipeline Becomes a Bottleneck
The Myth of Green Builds
It didn’t take long before confidence in the pipeline dropped. Tests would fail at random, then pass on a re-run. Engineers started saying “just re-run it” as a default. Trust was eroding.
We traced a lot of this to:
Flaky tests that depended on timing or shared state
Long build times that encouraged people to cut corners
Parallel jobs clashing in unexpected ways
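The timing-dependent kind was the most common. A hypothetical but representative example: a test that sleeps for a fixed interval and hopes a background job has finished, versus one that polls for the condition it actually cares about.

```python
import threading
import time

def slow_job(result):
    # Simulates background work whose duration varies run to run.
    time.sleep(0.05)
    result["done"] = True

def wait_until(predicate, timeout=2.0, interval=0.01):
    """Poll until predicate() is true or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(interval)
    return False

# Flaky version: a fixed sleep races the job on a slow CI runner.
#   time.sleep(0.04); assert result["done"]
# Stable version: wait for the condition itself, not the clock.
result = {"done": False}
threading.Thread(target=slow_job, args=(result,)).start()
assert wait_until(lambda: result["done"])
```

Waiting on the condition instead of the clock makes the test deterministic on fast machines and merely slower on loaded ones.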
Merge Queues From Hell
We introduced a merge queue to avoid last-minute breakages, but it quickly turned into a traffic jam. PRs would wait hours to merge because one flaky test at the front would block the entire queue. Productivity dropped. Morale followed.

3. The Hidden Cost: Humans in the Loop
CI/CD is often talked about in terms of infrastructure and tooling. But we learned the real constraint was team psychology.
Friday Deploys (And Regrets)
We deployed on Fridays. Until we didn’t. Even with a “safe” pipeline, rollback wasn’t always smooth. And if a production issue appeared late on Friday… well, it ruined weekends. We shifted to a “No Friday deploys after 2pm” rule—less elegant, more humane.
Debugging ≠ Building
When engineers spend more time debugging pipelines than writing features, something’s wrong. Our team was drowning in red pipelines, noisy Slack alerts, and unclear error messages. It didn’t feel like we were moving fast. It felt like we were fighting the tools meant to help us.
4. Digging Ourselves Out
We Quarantined Flaky Tests
Instead of fixing every flaky test immediately (often hard to reproduce), we started quarantining them. They’d still run and report, but wouldn’t block the pipeline. We tracked their flakiness rate and reviewed them weekly. This immediately improved trust and helped us prioritize the worst offenders.
We Added Context to Failures
CI logs are often walls of noise. We improved this by:
Grouping related logs
Adding links to related PRs or incidents
Labeling common failure patterns with suggestions (e.g., “Likely network timeout – retry with --retry”)
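The pattern-labeling part is just a lookup table over the log text. A minimal sketch, assuming you can post-process logs; the patterns and suggestions here are illustrative, not from any specific CI provider:

```python
import re

# Hypothetical failure patterns -> suggested next step.
FAILURE_PATTERNS = [
    (re.compile(r"ETIMEDOUT|Connection timed out", re.I),
     "Likely network timeout - retry with --retry"),
    (re.compile(r"ENOSPC|No space left on device", re.I),
     "Runner disk full - clear caches or use a bigger runner"),
    (re.compile(r"Snapshot .* mismatch", re.I),
     "UI snapshot drift - review and update snapshots"),
]

def label_failure(log_text):
    """Return the first matching suggestion, or None if nothing matches."""
    for pattern, suggestion in FAILURE_PATTERNS:
        if pattern.search(log_text):
            return suggestion
    return None
```

Even a dozen patterns covered a surprising share of our failures, because flaky infrastructure fails in repetitive ways.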
It sounds simple, but it reduced time-to-debug dramatically.
We Replaced Auto-Deploy with Controlled Triggers
We moved from “every green PR deploys” to “every green PR merges, but only specific branches deploy.” This added a buffer to catch last-minute regressions or bundle-size spikes. It also helped the team mentally separate “merge confidence” from “production readiness.”
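In GitHub Actions terms (assuming that is your CI; the branch and script names below are placeholders), the split looks roughly like this: CI runs on every PR, while the deploy workflow triggers only on pushes to the deploy branch.

```yaml
# deploy.yml -- runs only when main is updated, not on every green PR
on:
  push:
    branches: [main]

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: ./scripts/deploy.sh   # placeholder deploy step
```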
5. CI/CD Culture > CI/CD Tools
You can have the best tooling in the world, but if the team treats the pipeline like an annoying gatekeeper instead of a shared asset, you’re doomed.
We Made CI/CD Everyone’s Job
Instead of blaming “DevOps,” we built habits:
Every engineer owns the test and build quality of their code
If your PR breaks the pipeline, you fix it fast—or pair with someone who can
Weekly CI health check where we review test flakiness, queue length, and build duration
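For the weekly health check, even a tiny script over exported run records is enough. A sketch, assuming you can dump per-run results as (test name, passed) pairs; the field names are ours:

```python
from collections import defaultdict

def flakiness_rates(runs):
    """runs: iterable of (test_name, passed) across many pipeline runs.
    Returns {test_name: failure_rate} sorted worst-first."""
    totals = defaultdict(lambda: [0, 0])  # name -> [failures, total runs]
    for name, passed in runs:
        totals[name][1] += 1
        if not passed:
            totals[name][0] += 1
    rates = {n: f / r for n, (f, r) in totals.items()}
    return dict(sorted(rates.items(), key=lambda kv: kv[1], reverse=True))

runs = [("test_login", True), ("test_login", False),
        ("test_login", True), ("test_search", True)]
# test_login fails 1 of 3 runs; test_search never fails.
```

Sorting worst-first gives the weekly review an agenda for free: start at the top of the list and work down.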
We Documented Pipeline Expectations
This included:
What “green” means
What to do when builds fail
How merge queues work
When it’s safe (and not safe) to deploy
This reduced Slack noise and made onboarding smoother.

6. It’s Never ‘Done’—And That’s Okay
CI/CD isn’t a switch you flip—it’s a system you nurture. You’ll always have edge cases, occasional broken builds, and moments of frustration. That’s normal.
What matters is how your team handles it:
Do you own it together?
Do you learn and improve?
Do you balance speed with safety?
If so, you’re doing it right—even when it feels like you’re still in the trenches.
Wrap-Up: Your Trench Is Someone Else’s Blueprint
If your pipeline feels chaotic, you’re not alone. We’ve been there. Most teams that look like they have perfect CI/CD are just better at hiding the scars.
So share your learnings. Swap horror stories. And don’t forget to celebrate when the pipeline runs green on the first try—because in the trenches, that’s a small miracle.