We put Jinja2 in our config for flexibility. Users wanted one variable.
We gave users a full templating engine in their config, then scanned 4,629 of them to see what they did with it. Across 6,873 label values, exactly one held a template. Here's what that taught us about shipping flexibility.
We gave users a full templating engine inside their YAML config. After years of watching what they actually did with it, I scanned every config to count the dynamic values. The label fields, 6,873 values in all, held a single template, and wherever people did reach for a variable, it was almost always the same one: the PR author.
The power we handed out
Mergify config lives in a .mergify.yml file in each repo. For a long time, many fields in that file accepted Jinja2 templates. You could write {{ author }} to drop in the pull request author, or something fancier with filters and loops. The idea was generous: give people a real templating language and let them build whatever automation they wanted.
That generosity has a price, and most of it is paid in security. A template engine that renders strings pulled straight from a user’s repo is an execution surface you own forever. It runs inside a sandbox we’ve had to harden ourselves, and every new piece of context we expose to a template means another round of careful review about what someone could do with it. We get a steady stream of HackerOne reports poking at exactly this. The thing works, and it is still the kind of code you never fully relax about.
So we kept coming back to one question: was anyone actually using the power we’d handed out? If the answer was “lots of people, in ways we can’t reproduce without templates,” we were stuck maintaining the engine and its sandbox indefinitely. If the answer was “almost nobody,” we could replace the few real uses with something safe and delete the engine.
What the data said
We scanned a snapshot of 4,629 config files across 1,262 organizations and counted, per field, how many values contained a template versus a plain string. The label action was the clearest case:
| Field | Uses | Orgs | Templated |
|---|---|---|---|
| label.add | 3,486 | 293 | 1 |
| label.remove | 2,870 | 151 | 0 |
| label.toggle | 517 | 75 | 0 |
Across all 6,873 label values, one used a template: branch:{{ base }}, from a single org embedding the base branch name into a label. Every other value was a constant string.
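The per-field count can be sketched in a few lines. This is a minimal illustration, not the scanner we ran: it assumes a value counts as templated when it contains any of Jinja2's default delimiters, and `is_templated`, `count_templated`, and the sample values are names and data made up for the example.

```python
import re

# Assumption: "templated" means the string contains a default Jinja2
# delimiter, i.e. an expression {{ }}, a statement {% %}, or a comment {# #}.
JINJA_MARKER = re.compile(r"\{\{|\{%|\{#")


def is_templated(value: str) -> bool:
    """Return True if the value contains a Jinja2 delimiter."""
    return bool(JINJA_MARKER.search(value))


def count_templated(values_by_field: dict[str, list[str]]) -> dict[str, tuple[int, int]]:
    """Map each field to (total uses, templated uses)."""
    return {
        field: (len(values), sum(is_templated(v) for v in values))
        for field, values in values_by_field.items()
    }


# Made-up sample values, not the real data set.
sample = {
    "label.add": ["needs-review", "branch:{{ base }}", "ready"],
    "label.remove": ["wip"],
}
counts = count_templated(sample)
```

A substring check like this over-counts any literal that happens to contain `{{`, which is acceptable for a first pass because every hit can be inspected by hand.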
The assignee and bot-account fields told a slightly different version of the same story. People did reach for a variable there, but almost always the same one. Out of 551 uses of the assignee fields, 531 (96%) were exactly {{ author }}. The most interesting of the stragglers was one org computing a contributor handle from merged_by through a filter chain.
The real shape of demand was static strings plus a single variable, the PR author. Past that, near silence. We had shipped a programming language to deliver what turned out to be a dropdown with one useful option.
What the open engine taught us
Here’s the part I’ll defend even though it sounds backwards. Shipping the full engine was the right call, and I’d do it again.
When you don’t know what people need from a config format, you have two ways to find out. Guess the small set of knobs they’ll want and build only those, or give them something open-ended and watch what they reach for. We went open-ended. Years of real configs are a far better requirements document than any whiteboard session. The scan justifies the cleanup, and it also answers “what should the safe, declarative version actually support?” We know, because we watched.
I won’t pretend users surprised us with some clever trick we’d never have imagined. The opposite happened. All that freedom mostly revealed how little dynamism anyone wanted, and that’s still a result. It told us we could collapse a general engine down to a handful of named values and lose almost nothing real.
The obvious objection: maybe people left the label fields static because they never discovered they could template them. Then the low count would just mean our docs failed to surface the feature, and it would tell us nothing about real demand. I don’t buy it. Templating is documented in depth, and in the fields where dynamism pays off, like commit messages and PR bodies, people write real Jinja2, for loops and filters included. They found the feature, and they lean on it hard where it earns its place. They just never needed it to build a label.
That also answers the allowlist objection, which says a short list of named variables would have taught us the same thing with none of the risk. An allowlist only measures the variables you already thought to offer. When you see no demand for something you never shipped, you can’t tell whether nobody wanted it or you never gave them the chance. The open engine had no such ceiling, and the one org that pushed an assignee field to its limit (that merged_by chain) proves the ceiling was reachable. In the label and assignee fields, almost nobody reached past {{ author }}, and that silence is the measurement.
I should be clear that the measurement wasn’t the plan. We shipped the engine to give people a lot of flexibility fast, without building a dedicated feature for every case, and years later the scan turned that into a requirements document. The bet’s real downside was security, and it bit hardest in the early years, before we had AI helping us find the holes that a template engine on untrusted input always develops. If the scan had come back showing hundreds of orgs running real automation through templates, this would be a different post, and we’d carry that risk for years more. The demand was shallow, so the cleanup is cheap. It was still a bet.
Removing an engine without breaking a config
Knowing what to keep is half the job. The other half is getting from here to there without breaking configs that are already running in production.
We go field by field, driven by the usage scan instead of a guess. Each field we want to wean off Jinja2 gets one piece of metadata:
DeprecatedJinja2(allowed_onpremise=..., allowed_saas=..., allowed_literals=...)
From those flags, a single place derives the JSON schema the dashboard reads, the validation that accepts or rejects a templated value, the wording of the deprecation notice, and the named choices we surface in the docs. The schema always presents the field as a plain string. It never advertises templating, even while we keep accepting it through the deprecation window.
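To make the "single place" idea concrete, here is a rough sketch of how one annotation could drive the schema, the validation, and the notice. Only the name `DeprecatedJinja2` and its three flag names come from the text above. The flag types, their meaning (whether templated values are still accepted on each deployment), the method names, and the `assignees` example are all assumptions for illustration.

```python
import re
from dataclasses import dataclass

JINJA_MARKER = re.compile(r"\{\{|\{%|\{#")


@dataclass(frozen=True)
class DeprecatedJinja2:
    # Flag names are from the post; types and semantics are assumed.
    allowed_onpremise: bool
    allowed_saas: bool
    allowed_literals: frozenset[str] = frozenset()

    def json_schema(self) -> dict:
        # The field is always presented as a plain string, never as a template.
        return {"type": "string"}

    def accepts(self, value: str, *, saas: bool) -> bool:
        # Blessed literals and plain strings always pass.
        if value in self.allowed_literals or not JINJA_MARKER.search(value):
            return True
        # Anything else is a real template: the deployment flag decides.
        return self.allowed_saas if saas else self.allowed_onpremise

    def notice(self, field: str) -> str:
        choices = ", ".join(sorted(self.allowed_literals)) or "none"
        return f"Templates in `{field}` are deprecated; named values: {choices}."


# Hypothetical annotation for an assignee-style field.
assignees = DeprecatedJinja2(
    allowed_onpremise=True,
    allowed_saas=False,
    allowed_literals=frozenset({"{{ author }}"}),
)
```

Keeping the three outputs on one object means a flag change cannot update the validator while leaving the docs or the schema behind.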
For the one variable people actually use, we “bless” {{ author }} as a static literal. We recognize it when we parse the config and resolve it straight to the PR author when a rule runs, without ever entering the template engine. Users keep the exact syntax they already wrote. A value that used to require the whole sandbox now resolves through a direct attribute lookup.
The match is deliberately narrow: only the bare {{ author }} resolves this way, after a migration normalizes the spacing variants. Write {{ author | lower }} or anything with a filter and it stays a real template, deprecated like the rest. We bless one exact string and nothing more.
flowchart LR
A["{{ author }} in config"]
A -->|before| B["Jinja2 engine + sandbox"]
B --> C["author login"]
A -->|after| D["parse-time literal match"]
D --> E["direct attribute lookup"]
E --> C
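The "after" path can be sketched as an exact string comparison followed by an attribute read. This is a simplified illustration under assumptions: `PullRequest`, `resolve`, and `render_with_engine` are hypothetical names, and the sketch folds the parse-time match and the run-time lookup into one function for brevity.

```python
from dataclasses import dataclass

# The one exact string that is blessed, per the text above.
BLESSED_AUTHOR = "{{ author }}"


@dataclass(frozen=True)
class PullRequest:
    # Hypothetical stand-in for the real pull request context.
    author: str


def render_with_engine(value: str, pr: PullRequest) -> str:
    # Placeholder for the sandboxed Jinja2 path; deliberately not implemented.
    raise NotImplementedError("would go through the template engine")


def resolve(value: str, pr: PullRequest) -> str:
    if value == BLESSED_AUTHOR:
        # Direct attribute lookup, no template engine involved.
        return pr.author
    if "{{" in value or "{%" in value:
        # Filters, loops, or embedded variables stay real (deprecated) templates.
        return render_with_engine(value, pr)
    # Plain static string.
    return value
```

Because the comparison is on the whole value, `{{ author | lower }}` and `branch:{{ base }}` both fall through to the engine path, which matches the narrow match described above.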
For these fields, the deprecation is warn-only first. Nothing breaks the day we ship the annotation, and templated values keep working with a notice attached. After the cutoff (2026-09-30) we start refusing templates on the fields where we can do it without breaking live configs, and we hold off on the ones where a hard rejection still would.
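The two-phase rollout reduces to a small decision function. This is a sketch under assumptions: the cutoff date is the one stated above, but `Outcome`, `check_template`, and the `safe_to_reject` flag are invented for the example, and treating the cutoff day itself as still warn-only is a guess at the boundary.

```python
from datetime import date
from enum import Enum

CUTOFF = date(2026, 9, 30)  # cutoff date stated in the post


class Outcome(Enum):
    ACCEPT = "accept"
    WARN = "warn"
    REJECT = "reject"


def check_template(today: date, *, templated: bool, safe_to_reject: bool) -> Outcome:
    """Decide what to do with a value on a field being weaned off templates."""
    if not templated:
        return Outcome.ACCEPT
    # Assumption: "after the cutoff" means strictly later than the cutoff day.
    if today > CUTOFF and safe_to_reject:
        return Outcome.REJECT
    # Before the cutoff, or on fields where rejection would break live configs.
    return Outcome.WARN
```

The `safe_to_reject` input is where the usage scan feeds back in: a field only flips to rejection once no live config would break.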
Telling developers without nagging them
A deprecation only matters if someone reads it. Our users are developers, and in my experience they rarely read changelog emails. They do read pull requests.
So the migration meets them where they already are. We open PRs against their repos that normalize the config, for example rewriting {{author}} and other spacing variants into the canonical {{ author }} so the literal match stays clean. A PR in your own repo, with a diff you can scan in ten seconds, is the one deprecation notice a developer will not miss. These are ordinary pull requests. The maintainer reads the diff and merges it, or closes it and migrates by hand.
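The spacing normalization in those PRs is small enough to sketch with a regular expression. This is an illustration, not the migration code: `normalize_author` is a hypothetical name, and the sketch assumes only whitespace inside the braces varies (it does not handle Jinja2 whitespace-control forms such as `{{- author -}}`).

```python
import re

# Matches a value that is only the author variable, with any amount of
# whitespace inside the braces, including none.
AUTHOR_VARIANT = re.compile(r"\{\{\s*author\s*\}\}")

CANONICAL = "{{ author }}"


def normalize_author(value: str) -> str:
    """Rewrite spacing variants of the bare author template to the canonical form."""
    # fullmatch ensures filters or surrounding text leave the value untouched.
    if AUTHOR_VARIANT.fullmatch(value):
        return CANONICAL
    return value
```

Using `fullmatch` instead of a substitution keeps the rewrite from touching values where the variable sits inside a longer string, so the diff in each PR stays limited to the cases that can be blessed.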
Not every case is clean, and I’d rather say so. The lone templated label, branch:{{ base }}, buries the variable inside a larger string, so it can’t be blessed as a whole-value literal the way a bare {{ author }} can. That one merged_by filter chain is the same kind of holdout. We don’t just cut them. We look at what each config actually does and decide, case by case, whether the pattern is worth a safe replacement. When it is, we build one. When it isn’t, the value is deprecated like any other template. Until then everything keeps working under a notice, so nobody wakes up to a broken config at the deadline.
What I’d take away from this
An open-ended feature is also a measurement instrument. We shipped Jinja2 to give people flexibility, and as a side effect the configs they wrote became the spec for the safe version. You stop guessing which five percent matters because you can count it. It only works if you can keep the permissive version safe while it runs, and a template engine on untrusted input is about the most expensive way to make that promise.
We’re keeping {{ author }}. The static fields were the cheap part, and they’re nearly done. Full removal is the goal, and the hard part is still ahead: the message and body fields where people write real templates, for loops and all, each one a use case we have to weigh and maybe rebuild before the engine can go. The reward at the end is deleting the sandbox we never stopped worrying about.