
I let Claude Code build my dbt models. The interesting part wasn't the code.

Dori Wilson

I've been building out our analytics warehouse at Recce. We recently had our devs dump our production databases into S3, then I set up the pipeline from S3 into Snowflake with dbt-core on top. We follow the usual star schema and medallion architecture: staging, intermediate, marts. The repo already has conventions, existing models (including a dim_dates table), Meltano for orchestration, and some Slack chart scripts. This wasn't a pure greenfield build, but it was close. The rest of the data came from Amplitude and from some scraping we do of our own internal conversations.

Like many folks, I've been playing around with LLMs and wanted to see what would happen if I gave Claude Code the full end-to-end run from Snowflake ingestion on down. This was definitely not "write me a staging model." It really was creating Snowflake tables, loading data from S3, and building the sources, staging, intermediate models, dim and fact tables. It also needed to test and verify everything that it built.

Before I let it go, though, I needed to spend time on setup. And that's where this gets interesting.

The setup is the work

I opened up Claude, went into /plan mode, and told it about everything I was trying to do. Previously, our engineering team had created Notion docs detailing every production table and key fields. We had gone back and forth about the table schemas, what I actually needed from the analytics, and what's possible to pull from production. That part was collaborative by design. I wanted the devs to understand the why behind what I was pulling, not hand me a dump and walk away. I uploaded those docs along with other context about what we're building towards.

With Claude's help I built custom skills: naming conventions, primary key patterns, model structure, dev environment commands, and what I expected data models to look like. I integrated the dbt MCP and Recce's MCP (ours is free and open source, handles row count diffs, schema comparison, profiling). I also created golden scenarios so Claude knew what good output looks like for our models and how I wanted things filtered through various levels of warehouse and validation tests. I've found incorporating MCPs as part of customized skills and workflows is a great way to get better outcomes from Claude.

All of that happened before the one-shot. That's where the interesting and most thought-intensive work was.

When the skills felt ready, I cleared context and let Claude go. Unfortunately, I started it near the end of my workday, so it couldn't finish in one sitting. I shut my laptop, came back the next morning, and Claude automatically picked up where it left off. It built everything. Sources, staging, intermediates, marts. It followed my naming conventions. Used CTEs, which is exactly what I'd wanted. Named things and created folders similarly to how I would. It even made certain intermediate tables incremental without being told. I liked that. It inferred from the data pattern that these production tables were appending rows, so incremental made sense. Smart call, and I hadn't been explicit about it.

It also created its own verification plan. Row count diffs between dev and prod, profiling (min, max, average, distinct counts), histograms for distributions. It used the Recce MCP to compare environments and ran dbt tests. Setting that up manually takes real time, and Claude did it as part of the workflow because the tools and skills were available.
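The profiling half of that verification plan is easy to sketch. Here's a minimal illustration of the same idea in Python, using an in-memory SQLite database as a stand-in for Snowflake dev and prod schemas (the table names and columns are hypothetical; in the real workflow the Recce MCP and dbt run these comparisons against the warehouse):

```python
import sqlite3

# Stand-in for two environments; in practice these would be
# Snowflake dev and prod schemas compared via the Recce MCP.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dev_orders (id INTEGER, amount REAL);
    CREATE TABLE prod_orders (id INTEGER, amount REAL);
    INSERT INTO dev_orders VALUES (1, 10.0), (2, 25.5), (3, 25.5);
    INSERT INTO prod_orders VALUES (1, 10.0), (2, 25.5), (3, 25.5), (4, 99.0);
""")

def profile(table):
    # One pass per table: row count plus min/max/avg/distinct on a column.
    row = conn.execute(
        f"SELECT COUNT(*), MIN(amount), MAX(amount), AVG(amount), "
        f"COUNT(DISTINCT amount) FROM {table}"
    ).fetchone()
    return dict(zip(["rows", "min", "max", "avg", "distinct"], row))

dev, prod = profile("dev_orders"), profile("prod_orders")
row_diff = prod["rows"] - dev["rows"]
print(dev, prod, row_diff)
```

The point isn't the snippet itself; it's that Claude assembled checks like these unprompted because the tools were wired in as skills.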

Where I wouldn't trust it

The code worked. But "it ran" and "I'd ship this" are different things.

Inner joins where I'd want left joins. Most of the time it's fine because the tables are constructed from the same production source. But silently dropping rows on edge cases is the kind of thing that bites you six months later. I want left joins as the default and an explicit reason when I don't.
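The failure mode is easy to demonstrate. A toy sketch in Python with SQLite (hypothetical `orgs` and `subscriptions` tables, standing in for models built from the same production source):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orgs (org_id INTEGER, name TEXT);
    CREATE TABLE subscriptions (org_id INTEGER, plan TEXT);
    INSERT INTO orgs VALUES (1, 'acme'), (2, 'globex'), (3, 'initech');
    -- initech has no subscription row yet: the edge case.
    INSERT INTO subscriptions VALUES (1, 'pro'), (2, 'free');
""")

inner = conn.execute(
    "SELECT o.name, s.plan FROM orgs o JOIN subscriptions s USING (org_id)"
).fetchall()
left = conn.execute(
    "SELECT o.name, s.plan FROM orgs o LEFT JOIN subscriptions s USING (org_id)"
).fetchall()

print(len(inner))  # 2 -- initech silently dropped
print(len(left))   # 3 -- initech kept, with a NULL plan you can see and test for
```

With the inner join the missing org simply vanishes from the mart; with the left join it shows up as a NULL you can catch in a dbt test.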

It ignored the existing dim_dates table and rebuilt date logic from scratch. We already have that. It's right there in the repo. That's a convention I'll add to the skills: don't reconstruct what exists.

Descriptions weren't contextual enough. For basic models that's whatever, but the next thing I'm building is a semantic layer. Descriptions matter for that. I need them to reflect what a field means in our business, not a generic "timestamp of creation."

The one that bothered me most: it filtered out rows with missing org_ids. An org_id should never be missing. If it is, that's potentially a production bug. Claude made a silent data quality decision that should have been flagged, not handled. That kind of thing should be documented in the YAML as a conscious choice with a note about why. Not an AI deciding on its own that missing data isn't important.
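The behavior I wanted is "surface, don't filter." A minimal sketch, again in Python with SQLite and a hypothetical `events` table (in the real project this would be a dbt `not_null` test on `org_id`, plus a YAML note explaining any deliberate filter):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (event_id INTEGER, org_id INTEGER)")
conn.executemany("INSERT INTO events VALUES (?, ?)",
                 [(1, 10), (2, None), (3, 11)])

# Count the bad rows instead of silently dropping them with
# a WHERE org_id IS NOT NULL buried in a staging model.
missing = conn.execute(
    "SELECT COUNT(*) FROM events WHERE org_id IS NULL"
).fetchone()[0]

if missing:
    # Fail loudly: a missing org_id may be a production bug upstream.
    print(f"WARNING: {missing} event(s) with missing org_id")
```

Either the build fails and a human looks at it, or the filter exists as a documented, deliberate choice. Both beat an invisible `WHERE` clause.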

Some mart models pulled from staging instead of the incremental intermediate tables Claude itself had created. The row number logic for getting the first org row would miss multiple warehouse connections per org. Unusual edge case, but I want that documented as a known limitation, not silently shipped.
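To make the edge case concrete: here's a toy version of that row-number pattern in Python with SQLite (hypothetical `warehouse_connections` table; the real models live in dbt). Picking `rn = 1` per org works fine until an org has more than one connection:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE warehouse_connections (org_id INTEGER, warehouse TEXT, created_at TEXT);
    INSERT INTO warehouse_connections VALUES
        (1, 'snowflake', '2024-01-01'),
        (1, 'bigquery',  '2024-02-01'),  -- second connection, same org
        (2, 'snowflake', '2024-01-15');
""")

# Keep only the first connection per org, ordered by creation time.
first_rows = conn.execute("""
    SELECT org_id, warehouse FROM (
        SELECT org_id, warehouse,
               ROW_NUMBER() OVER (PARTITION BY org_id ORDER BY created_at) AS rn
        FROM warehouse_connections
    ) WHERE rn = 1
""").fetchall()

print(first_rows)  # org 1's bigquery connection never reaches the mart
```

That's fine as a deliberate grain decision, but it belongs in the model's documentation as a known limitation, not as an unstated assumption.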

What it actually saved me

The Snowflake configuration. I'd been having issues and getting fed up troubleshooting. I had Claude diagnose and fix it instead of me spinning my wheels for an afternoon. And the iteration loop of "build, review, catch problems, discuss" was faster than doing everything from a blank file.

The manual verification still has to happen regardless. I'm going to query things in Hex notebooks, run Recce sessions, compare to what I know about the data.
All in all, the time savings were modest for this data ingestion. I could have copied and pasted things, written the simple code myself, and avoided the mistakes Claude introduced. I ended up having to make manual SQL changes anyway.

The real value is the iteration + documentation loop

Every bad join becomes a rule. Every ignored existing model becomes a convention. Every silent data quality decision becomes a guardrail in the skills file.

And Claude can update those skills itself. I built a custom /handsoff skill that updates memory, status, and other skills based on a session's discussion and output (I'll write more about that soon). I don't have to go write out "always use left joins unless you have a documented reason." I have the conversation about why the inner join was wrong, and Claude adds the rule.

The skills get better every time. The next run on a different set of production tables will be tighter because of everything I caught on this one. My teammates won't need as much deep contextual knowledge to get the same quality output.

AI-assisted analytics engineering isn't a prompting problem. It's an infrastructure problem. The skills, the MCP configs, the schema conventions, the guardrails. That's the actual work. The generation is the easy part, and it's the part that still needs a human reviewing every decision.

I'm not sure yet how much this changes my day-to-day versus making parts of it faster. The Snowflake debugging saved me an afternoon. The iteration loop was faster than starting from blank files. The one-shot itself was impressive to watch but required every bit as much review as if I'd written it and gotten distracted halfway through. Maybe more, because at least when I make a bad join, I know I made it.

That's the thing about AI writing code for you. The decisions that matter are still yours.

Try Automated Data Validation For Yourself

Try our Data Review Agent on your data projects: sign up for Recce Cloud and tell us what works and what breaks.

Here are docs that can help, and we're more than happy to help you directly, too.

We’d love to hear from you. If you can spare 30 minutes to chat, we’ll send you a $25 gift card as a thank you. 💬 Join the feedback panel.
