
We Built Something Data Teams Wanted, But Couldn't Set Up

The Painful Realization
"This is exactly what we need," said the data analyst, her eyes lighting up as I walked through Recce's Impact Radius concept. "We waste so much time running tests on everything, or worse, missing something critical because we didn't know it was impacted."
Two weeks later: "I tried to set it up, but I got stuck on the automation scripts. I don't really know how to write CI/CD pipelines... I still think this is exactly what we need. I just can't make it work."
Unfortunately, she wasn't alone. We'd see analytics engineers install Recce, run it a few times during development, then... nothing. When we reached out, we'd discover they had given up. "Yeah, I used it a few times during dev, but the hassle of running it manually every time was just too much. I needed something that actually saved me time, not another thing to remember. So I gave up and moved on."
Then there were the "successful" adoptions that used only a fraction of Recce's capabilities. These users actually figured out the automation, but only in their CI/CD pipeline after creating PRs. "We love Recce," a team lead explained, "but we only use it once the PR is up. I tried using it during development like you showed in the demo, but keeping the local version in sync with our prod environment was too much hassle. So now it just runs automatically when we open a PR. It's good enough for us."
These weren't edge cases. We were building something users clearly wanted, yet somehow couldn't use Recce in the way we intended. The gap between "brilliant solution" and "daily reality" was becoming our biggest problem.
Where We Got the Adoption Sequence Wrong
We understood data teams' workflows perfectly. The development cycle was clear: develop data models locally → test with sample data → create PR → merge to production. We designed Recce to align seamlessly with this flow, letting users validate locally during development, then share results during PR collaboration.
We assumed the obvious adoption path: start local, then add automation. Use Recce locally first to see immediate value, then set up CI/CD to make it even better. This felt logical from a product perspective.
(The suggested adoption path: Recce aligned with the workflow sequence.)
However, actual user feedback told a different story entirely. Users consistently struggled with setup complexity, and even technically minded users called the automation a "significant effort."
The reality: users would install Recce locally, hit the automation complexity wall, then abandon local usage entirely. They defaulted to PR-only usage because the CI/CD integration there ran automatically, with no manual setup required on their part.
(In reality, most users adopted Recce locally, then hit the automation complexity wall.)
The cruel irony? This PR-only usage pattern was a big win for them. Coming from a world where they had nothing but ad-hoc SQL queries and Python notebooks, being able to see "what changed and what should I validate" during PR review was a huge step forward. That's where the validation problem hurt most.
They were celebrating solving 30% of their validation problems, while we had built something that could solve 80%. Issues they were catching at PR time with larger datasets could have been caught locally in seconds with smaller dev data. Preventable problems still slipped through even though Recce could have flagged those during development, when fixes are cheaper and faster.
Our users were working harder than they needed to, missing opportunities to catch issues earlier, all because our setup complexity was blocking them from experiencing the full power of what we'd built.
Our First Solution: Leverage GitHub Codespaces
Faced with users struggling to install and configure Recce locally, we thought we'd found a clever workaround. Instead of fighting local environment issues, why not move Recce entirely to the cloud with GitHub Codespaces? All of our users already had GitHub accounts, so Codespaces was an easy way to get a hosted cloud environment without building one ourselves.
We designed a three-part cloud solution:
- CI/CD automation: automatically triggered dbt runs on base and PR branches, generated metadata, and uploaded artifacts
- Cloud storage: stored all the metadata artifacts that GitHub Actions generated
- Cloud container: provided a consistent browser-based environment to launch Recce and review the stored results
GitHub Codespaces felt like the perfect fit for the cloud container. GitHub Actions would handle the automation, triggering dbt runs on base and PR branches, and we'd manage the metadata storage layer. From a security perspective, this kept data warehouse connections and credentials entirely within teams' GitHub accounts while giving us a controlled storage layer for the metadata.
Our architecture became:
- CI/CD automation via GitHub Actions
- Cloud storage via Recce's backend
- Cloud container via GitHub Codespaces
(Legacy cloud: launching Recce in a GitHub Codespace.)
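To make that concrete, the automation piece amounted to something like the GitHub Actions job below. This is a minimal sketch assuming a dbt project on Snowflake; the adapter, secret name, and artifact handling are illustrative assumptions, not Recce's official workflow.

```yaml
# .github/workflows/dbt-metadata.yml -- illustrative sketch only; adapter, secrets,
# and artifact names are assumptions, not a prescribed Recce setup.
name: Generate dbt metadata for PR review

on:
  pull_request:

jobs:
  pr-metadata:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"

      # dbt plus whichever warehouse adapter the team uses (Snowflake assumed here)
      - run: pip install dbt-core dbt-snowflake

      # Build the PR branch and generate manifest/catalog metadata;
      # a similar job runs against the base branch to produce the comparison artifacts
      - run: |
          dbt deps
          dbt run
          dbt docs generate
        env:
          SNOWFLAKE_PASSWORD: ${{ secrets.SNOWFLAKE_PASSWORD }}

      # Hand the metadata off for review (in our case, uploaded to the storage layer)
      - uses: actions/upload-artifact@v4
        with:
          name: dbt-artifacts
          path: target/
```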
We thought we'd eliminated the local setup friction. Yet data teams' feedback revealed we'd simply shifted the complexity elsewhere.
The automation layer worked. GitHub Actions reliably generated metadata and posted PR summaries. But now the data teams needed to create .devcontainer folders, write Dockerfiles that installed Recce, dbt, and their specific data warehouse adapters, configure GitHub secrets for database credentials, and understand port forwarding. Instead of learning Python package management locally, users now needed to understand Docker, DevContainers, and GitHub secrets management.
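To put that burden in concrete terms, the configuration we were effectively asking an analyst to author looked roughly like this. It's a sketch only: the base image, adapter, port, and secret names are assumptions rather than a prescribed setup, and real teams usually maintained a custom Dockerfile on top of it.

```jsonc
// .devcontainer/devcontainer.json -- a rough sketch of the setup burden;
// the image, adapter, port, and secret names are illustrative assumptions.
{
  "name": "recce-pr-review",
  // Teams typically built a custom Dockerfile; a stock Python image keeps this sketch short
  "image": "mcr.microsoft.com/devcontainers/python:3.11",
  // Install Recce, dbt, and the team's specific warehouse adapter
  "postCreateCommand": "pip install recce dbt-core dbt-snowflake",
  // Forward the port the Recce server listens on so the browser UI is reachable
  "forwardPorts": [8000],
  // Warehouse credentials wired in from Codespaces secrets or local env vars
  "remoteEnv": {
    "SNOWFLAKE_ACCOUNT": "${localEnv:SNOWFLAKE_ACCOUNT}",
    "SNOWFLAKE_PASSWORD": "${localEnv:SNOWFLAKE_PASSWORD}"
  }
}
```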
Even worse, most GitHub users had never touched Codespaces before. What felt like a familiar GitHub feature to us was actually yet another new tool they had to learn just to try ours.
Looking back, Codespaces was solving the wrong problem entirely. We weren't addressing the core issue.
The Actual Core Problems
After watching users struggle with both our local setup and our Codespaces "solution," we realized we needed to dig deeper. User behavior revealed we weren't fighting one problem, but three fundamental barriers compounding each other:
Problem #1: Setup Complexity Blocking Adoption
We were asking data practitioners to become experts in tools and concepts completely outside their domain: Docker containers, GitHub secrets, CI/CD pipelines, metadata management, and file synchronization. Each of these concepts made perfect sense to us as developers, but they represented entirely new learning curves for our users.
The technical complexity wasn't just about installation or configuration: it was about the cognitive load we were placing on the data teams who just wanted to validate their data changes.
Problem #2: Wrong Adoption Sequence
We assumed the logical adoption path: start with local development validation, then add CI/CD automation to make it even better. This felt intuitive from a product perspective.
But our users consistently followed a different pattern: try to use Recce locally, hit the complexity wall, abandon the local workflow entirely, and default to PR-only usage where automation handled everything automatically. The proactive, continuous validation we were designing for became an aspiration rather than reality.
Problem #3: The Hidden Inefficiency of Validating Twice
During dev, data teams had workarounds: run spot checks, compare row counts, maybe write a quick query to verify logic. But during PR review? That's where the real validation gap lived. Reviewers couldn't easily see what changed. They had no systematic way to validate impacts. They were flying blind on whether changes were safe to merge. Developers had to validate again with production-scale data and formal documentation requirements.
Users dealt with significant setup complexity to get Recce working in PRs not because it was easier, but because that's where they desperately needed a solution. They focused on solving their biggest pain point first, leaving no bandwidth to optimize ad-hoc dev-time validation.
The Deeper Revelation
All three problems pointed to a fundamental misalignment. Data teams wanted to focus on data validation, not become Recce/Docker/GitHub Codespaces/CI/CD experts. The question they cared about wasn't "how do I configure Recce?" but "what do I actually need to validate to ensure my data is right?"
We were asking them to learn our architecture when they just wanted to be better at their jobs. They didn't want to become DevOps engineers; they wanted to catch data issues before their stakeholders did.
The Mindset Change: From "Our Requirements" to "User Value"
Honestly, this feedback was crushing. We'd spent months building what we thought was the perfect solution, only to watch our users struggle with basic setup. The worst part? They loved the core idea but couldn't actually use it. It forced us to confront some uncomfortable truths about how we were thinking about our users.
Looking back, we had all the signals earlier but missed them. In our team Slack, we'd see messages like "another user stuck on setup" or "they loved the demo but couldn't get it working locally." We kept treating these as isolated incidents rather than systemic feedback. It took accumulating dozens of these conversations before we realized: this wasn't a documentation problem or a user education problem. This was a fundamental product strategy problem.
The Old Mindset: "It's Just a Prerequisite"
Before these insights, our attitude was essentially: "Look, you need to set up your environment. It's a prerequisite, it's outside our product scope. We can't help with that."
We'd point data teams to documentation, maybe provide some example configurations, but fundamentally we saw setup complexity as their problem to solve. This mindset created a subtle but toxic dynamic. Every time a user struggled, we'd internally categorize it as a "them" problem rather than an "us" problem. They needed better technical skills, more DevOps knowledge, more time to read our docs.
The New Mindset: "These Are Our Users"
The research forced us to flip this entirely. These weren't "not technical enough" users failing to meet our requirements. WE were failing to meet our users' needs.
The fundamental shift was moving from "What do we need users to SET UP?" to "What do data teams actually want, and how can we DELIVER VALUE?"
When we asked that second question, the answers were remarkably simple. Data teams wanted to know what changed in their data models. They wanted to understand what they should validate. They wanted to catch issues before their stakeholders did. They wanted to do their jobs better.
None of those goals required understanding Docker or writing automation scripts.
(What do data teams actually want, and how can we DELIVER VALUE?)
What We're Learning (And What We've Built)
This “data teams’ value first” mindset shift was just the beginning. The technical breakthroughs that followed changed everything about how we approach data validation, and we think we've cracked the code on that "zero-setup" experience data/analytics teams kept asking for.
Here's what we built from these painful lessons: Data teams can now launch Recce with just two metadata files. No Docker. No local environment setup. No CI/CD expertise required. Just click a link and see their data changes instantly. We're calling it our "value-first" approach: show the power first, then let teams decide if they want to integrate deeper.
We're sharing this journey because these problems aren't unique to Recce. Every tool in the modern data stack faces the "setup complexity vs. power user features" tension. If you're building dev tools, you've probably hit similar walls.
We need your help: We're launching this new approach, and we want to get it right. Are we solving the right problem? Does this "metadata-only" launch actually feel like zero setup to you? What would make you try a data validation tool that you wouldn't have tried before?
Try what we built: cloud.reccehq.com
The technical details of how we pulled this off are coming next, and the breakthrough might surprise you as much as it surprised us.