Recce | Blog

Session Base per PR: Why Data Reviews Lie

Written by Even Wei | May 7, 2026 11:30:57 PM

The PR looked fine. The data did not.

The PR was small. A two-line change on an incremental model, cleanly reviewed in GitHub. Then the automated data review ran. It built the model on both sides, base and current, diffed the results, and a Recce agent analyzed the impact and posted a summary on the PR. One row sat at the top: row count, base 10,000,000, current 100. Variance: 99.999%.

That is exactly the kind of row the summary is supposed to surface. The question is what that gap means.

In this case it meant nothing about the PR. The base was a production build with five years of history. The current was a CI build of the same model from a one-hour window. Both sides ran exactly as configured. The diff was reporting the gap between two environments that had been built for different jobs, not the gap the PR introduced.

This is a post about the difference between a flagged issue and a real diagnosis. The variance was a legitimate thing to flag, and ignoring it would be the wrong instinct. The diagnosis, though, was the reviewer's to make, and to make it well they needed both sides built the same way.

What we have learned the hard way is that a data review whose flags keep resolving to environment artifacts is worse than no review at all. We call these "false alarms": flagged issues that demand attention but resolve to environment artifacts rather than real defects in the PR. As one customer put it, the next PR may not be on an incremental model, but by then the reviewer has already learned to scroll past row-count variance because the last one was off. The fix lives in the environment work you do before the comparison runs.


What code review misses

Reading the SQL diff does not tell you the data diff. Rename a column in a staging model, and every mart that references it inherits the rename. The code diff shows one file touched; the data diff shows whatever fraction of the warehouse depends on that column.

Reviewers know this in the abstract. In practice, it shows up as the PR author writing "should be safe, it's just a rename" and the reviewer approving on vibes, because the only way to actually check would be to build both versions of the warehouse and compare them.
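To make the rename concrete, here is a hypothetical pair of models (the names are illustrative, not taken from any real PR):

  -- stg_orders.sql: the "safe" two-line rename
  select
    order_id,
    order_date,
    amount as order_amount  -- was: amount
  from {{ source('shop', 'orders') }}

  -- fct_daily_revenue.sql: never touched by the PR, but every row
  -- it emits now depends on the renamed column
  select
    order_date,
    sum(order_amount) as revenue
  from {{ ref('stg_orders') }}
  group by order_date

The code diff shows one file changed. The data diff has to account for fct_daily_revenue and every other model downstream of the rename.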

A reasonable next move is to point a general-purpose agent at the warehouse and ask it what changed.


The general-agent gap

The pitch writes itself. Give an LLM SQL access to the warehouse, point it at the PR, let it query before and after, summarize the diff. What could be simpler?

The problem is that "after" does not exist yet. The PR's SQL has not been run anywhere. For the agent to produce an "after" to compare against, it has to do the following, end to end, for every PR:

  1. Read the new SQL out of the diff.
  2. Resolve all of the Jinja so the SQL is something a warehouse can execute: expand macros, resolve every ref() into the right physical table name in the right schema, fill in every variable.
  3. Figure out the right environment to run against: which schema, which credentials, which target-dependent branches the model takes.
  4. Execute the resulting SQL against production data without writing somewhere it should not write.
  5. Compare the result to the current materialized table.

That is really complicated. None of those steps are optional, and getting any one of them wrong produces a comparison that is confidently wrong rather than visibly broken.
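A sketch of what step 2 alone involves. The model below is hypothetical, and the resolved schema name is an assumption about one possible project configuration:

  -- what the agent reads in the PR diff
  select *
  from {{ ref('stg_orders') }}
  where amount >= {{ var('min_amount', 0) }}

  -- what the warehouse can actually execute, after the Jinja resolves
  select *
  from analytics_ci.stg_orders
  where amount >= 0

Every ref, macro, and variable has exactly one correct resolution for a given target, and the agent has to reproduce all of them before it has an "after" worth comparing.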

The point is not that this is impossible. It's that materializing the PR's version of the data is not optional, and materialization is environment work no matter who runs it. The work doesn't disappear when you hand it to an agent; it just moves. Which puts the whole review on the shoulders of how those two environments get built.


The environment question

Once both sides must be materialized, the question is how, and against what reference. In Recce's language, that is base (the reference) versus current (the PR). Base-vs-current is not a configuration detail. It is the lever that decides whether the comparison produces truth or noise.

The customer mentioned at the beginning of this post ran into exactly this. False alarms come from the two sides being built differently, and the Recce agent's summary at the top of the diff is what makes them visible. The summary is doing its job: it weighs the largest variances first so a reviewer sees the candidates worth a second look. When the largest variances are themselves environment artifacts, the surfacing is honest and the diagnosis is still wrong.

The harm was not the individual error. It was what the false alarms trained the team to do. After a few PRs, reviewers started scrolling past row-count variance on incremental models.

The fix is not smarter summaries. It is building both sides the same way.


Same Jinja, different targets

False alarms come from one place. The two environments evaluate the same Jinja differently, so the SQL that actually runs against the warehouse is not the SQL the reader sees in the file. Once that is true, the diff is reporting the gap between two queries, not the gap the PR introduced. The pattern shows up in two flavors.

Incremental model mismatch. Base has five years of history from production's incremental builds. Current has a few hours of CI-window data. The diff reports an enormous row-count gap. Neither side is wrong; they are just at different points on the same conveyor belt. The reason the two sides diverge is not the incremental materialization itself. In a fresh CI build where neither side has a prior incremental table, is_incremental() evaluates to false on both sides, and both sides do a full refresh from scratch. The split happens because the two builds were targeted differently, and the model's logic branches on the target.
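A minimal sketch of the incremental shape in question (model and column names are illustrative):

  {{ config(materialized='incremental') }}

  select * from {{ ref('stg_events') }}
  {% if is_incremental() %}
  -- taken only when the target table already exists, as it does in
  -- production's long-running build; a fresh CI schema skips this
  -- filter and full-refreshes instead
  where event_at > (select max(event_at) from {{ this }})
  {% endif %}

Run this against two targets at different points in their build history and the row counts diverge, even though the model's logic is identical on both sides.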

Time-based data drift. Production updates continuously. The dev or CI snapshot was taken at a specific moment. In any model that windows on the current period, the newest rows in base look like additions that do not exist in current, or vice versa. This is unavoidable whenever base is live and current is a snapshot. The mechanism underneath it is the same one as the incremental case: current_date() and current_timestamp() resolve to different values in the two environments, so the where-clauses that depend on them do not match.
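The drift is easiest to see in the where-clause itself. The dates below are illustrative:

  -- as written in the model
  where event_date >= current_date - interval '30 days'

  -- what base effectively filtered on (built May 1)
  where event_date >= date '2026-04-01'

  -- what current effectively filtered on (built May 7)
  where event_date >= date '2026-04-07'

Same SQL, two different answers, and neither side is wrong.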

The shared root cause is non-deterministic SQL: Jinja and SQL that resolve to different values depending on when and where the model is built. The three common culprits are target.name, current_date(), and current_timestamp(). A model that branches on the target looks like this:


  {% if target.name == 'prod' %}
  -- full history
  where order_date >= '2021-01-01'
  {% else %}
  -- recent window only, to keep CI builds fast
  where order_date >= current_date - interval '7 days'
  {% endif %}

When the base is a production build, that model takes the first branch. When the current is a CI build, it takes the second. The two environments are running different queries against different windows of data, and the diff is reporting the consequence of that, not the consequence of the PR.

This is the clue. Recce's own use-case buckets sort along the same axis: teams that build only views, or that do full refreshes, rarely see any of these. Teams with small data slices, incremental models, or advanced transformation patterns hit both flavors at once, because both flavors share the same underlying mechanism. Any fix that only handles incremental models will miss current_date() drift, dev-only filters, and every other branch keyed on the target.

If the mismatch is caused by the two environments evaluating Jinja differently, the fix is to stop comparing a prod-shaped build to a PR-shaped build. Build both sides the same way.


Isolated PR base

Session base per PR is the pattern that comes out of this. For each PR, build two environments using the same CI target, so that Jinja resolves the same way on both sides.

Two builds per PR:

  1. PR base. Check out the merge-base commit (the point where the PR branched off main) and build with --target pr_base.
  2. PR current. Check out the PR head and build with --target pr_current.

jobs:
  build-pr-base:
    steps:
      - uses: actions/checkout@v4
        with:
          ref: ${{ github.event.pull_request.base.sha }}
      - run: dbt build --target pr_base
  build-pr-current:
    steps:
      - uses: actions/checkout@v4
        with:
          ref: ${{ github.event.pull_request.head.sha }}
      - run: dbt build --target pr_current
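In profiles.yml terms, the two targets differ only in where they write. This is a sketch; the adapter, project name, and schema names are assumptions, not a prescribed setup:

  # profiles.yml
  analytics:
    target: pr_base
    outputs:
      pr_base:
        type: snowflake
        schema: ci_pr_base
        # connection details identical to pr_current
      pr_current:
        type: snowflake
        schema: ci_pr_current
        # connection details identical to pr_base

Because neither target is named prod, a branch like {% if target.name == 'prod' %} resolves the same way on both sides, and current_date() is evaluated minutes apart rather than years.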

Both targets are CI targets. Both take the same branches through every