In our previous article, we eliminated the biggest adoption barrier: now anyone can sign up and launch Recce easily. As more people signed up, we realized we had a new problem.
When a data engineer or their stakeholder opens a review session in Recce, they immediately see the lineage diff. At a glance they can see how their one-line change upstream now impacts 5+ models. But now what? Where and how should they start validating their datasets?
Lineage diff: what's the next step?
Our first reaction was to build an in-app guide that would walk users through which columns and models to check. It ended up as a three-step bubble guide telling users what to do next in-app. We thought it was great, until we watched users dismiss the modal immediately, without giving it even a one-second glance. That hurt! 😓 But we learned something important: users want to solve their problems, not learn your interface.
Three-step bubble guide
We briefly considered other solutions: rule-based suggestions, user preference settings, template-based checklists. But we could see the effort required would be massive: building all that logic, handling edge cases, and still ending up with something rigid that wouldn't adapt to each user's unique preferences or context.
As we reviewed our user interviews, we realized that users don't start in Recce; they start in the pull request or merge request.
Even our open source users have hacked together ways to get the Recce Summary set up every time they open a new PR (see example). The Recce Summary is a comment posted to PRs that shows which checks were run and their results. Reviewers can read it and decide whether the PR can be merged or whether they should do more validation in Recce.
Recce Summary
Here's the problem: the Recce Summary shows results only after someone has already done the work. But users need guidance before they start: what should they check, and why? What are the unknown unknowns they missed validating during development?
As we were thinking about where Recce enters a user's workflow, a potential customer specifically requested an AI summary feature: a comment on every pull request describing the "so what" of the data changes behind that code change. This gave us an idea for how to improve the Recce Summary, and land a new customer at the same time.
We could turn the Recce Summary into an intelligent, AI-assisted summary that tells reviewers what changed and why it matters. ✨
Our first approach was straightforward: give an LLM the PR context and ask it to summarize what changed.
We fed it the PR context, then prompted it to generate a summary for reviewers.
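For illustration, here's a minimal sketch of that first approach, assuming the OpenAI Python SDK and a hypothetical `gather_pr_context()` helper; the model name and the fields pulled from the PR are assumptions, not Recce's actual implementation.

```python
from openai import OpenAI

client = OpenAI()

def gather_pr_context(pr) -> str:
    # Hypothetical helper: bundle the PR description, code diff, and
    # lineage diff into one text blob for the prompt.
    return "\n\n".join([pr.description, pr.code_diff, pr.lineage_diff])

def summarize_pr(pr) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",  # assumed model, not necessarily what Recce uses
        messages=[
            {"role": "system",
             "content": "Summarize this data PR for reviewers in 3 short bullets."},
            {"role": "user", "content": gather_pr_context(pr)},
        ],
    )
    return response.choices[0].message.content
```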
First attempt at an AI-assisted summary
We showed this to our potential customer. They liked that the AI summary was "to the point" with 3 clear bullets. But the content wasn't actionable enough.
Here's what he told us:
This summary is basic and is more of a description than something that is helpful for data review.
He manages multiple data analysts, so he needs something that quickly gives him context on what data is changing in a PR and why it may matter. He wants the summary to highlight the things he should check, so he can spend less time reviewing PRs.
Our first reaction was to make the summary more specific and have the agent perform data diffing. But as we optimized the summary output and dogfooded it internally, we realized something messier: there isn't just one preferred way to summarize a PR. There are many, depending on who is reviewing it and why. For example:
Team maturity impacts what information reviewers want surfaced:
Is this someone reviewing their own data change, or someone reviewing a teammate's work?
There were actually several personas with completely different needs. To be useful, the summary needed to understand the change (what data changed, why, and its impact) and run checks (profile diffs, etc.).
This is where we shifted from prompt engineering to context engineering.
Prompt engineering gave the LLM the raw PR context and an instruction to summarize it. Context engineering gave the agent structured context about the change, the team's preset checks, and tools to run checks on its own.
The agent could now do the checks instead of just suggesting them, for example:
Anomalies detected
With the right context, the agent could be specific, data-driven and actionable.
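To make the shift concrete, here's a rough sketch of what context engineering looks like in practice: the agent gets structured context plus a tool it can call to run a check itself. The tool name `profile_diff`, its return shape, and the agent loop below are illustrative assumptions, not Recce's internals.

```python
import json
from openai import OpenAI

client = OpenAI()

def profile_diff(model: str) -> dict:
    """Placeholder check: compare column-level stats (null counts, min/max,
    distinct values) for `model` between the base and PR environments."""
    return {"model": model, "changed_columns": [], "anomalies": []}

TOOLS = [{
    "type": "function",
    "function": {
        "name": "profile_diff",
        "description": "Compare column statistics for one model between base and PR.",
        "parameters": {
            "type": "object",
            "properties": {"model": {"type": "string"}},
            "required": ["model"],
        },
    },
}]

def review_pr(context: dict) -> str:
    # `context` would hold the lineage diff, preset checks, and reviewer persona.
    messages = [
        {"role": "system",
         "content": "You review data PRs. Run checks on impacted models before summarizing."},
        {"role": "user", "content": json.dumps(context)},
    ]
    while True:
        resp = client.chat.completions.create(
            model="gpt-4o", messages=messages, tools=TOOLS)
        msg = resp.choices[0].message
        if not msg.tool_calls:
            return msg.content  # final summary, backed by real check results
        messages.append(msg)
        for call in msg.tool_calls:
            args = json.loads(call.function.arguments)
            result = profile_diff(**args)  # the agent runs the check itself
            messages.append({"role": "tool", "tool_call_id": call.id,
                             "content": json.dumps(result)})
```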
Once we had this foundation, we explored different outputs:
Visual summaries:
Generate a Mermaid graph, left to right, that displays the lineage diff with the impact radius and highlights the transformation type of the impacted columns
Executive decisions:
Just give me a YES/NO suggestion on whether I can merge this PR; if no, give reasons in 3 sentences with data as evidence (see the sketch after this list)
Checks summary:
Give me a checks summary, including preset checks and suggested checks
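As one example, the "executive decision" output can be forced into a machine-checkable shape. This is a hedged sketch, again assuming the OpenAI SDK; the prompt wording and JSON shape are for illustration, not the exact ones shipped in Recce Cloud.

```python
import json
from openai import OpenAI

client = OpenAI()

def merge_verdict(check_results: list[dict]) -> dict:
    # Ask for a strict YES/NO verdict with evidence, as JSON rather than prose,
    # so the PR comment stays short and consistent.
    prompt = (
        'Reply with JSON of the form {"merge": "YES" | "NO", '
        '"reasons": [up to 3 short sentences citing the data below as evidence]}.\n\n'
        "Check results:\n" + json.dumps(check_results, indent=2)
    )
    resp = client.chat.completions.create(
        model="gpt-4o",
        response_format={"type": "json_object"},  # constrains output to valid JSON
        messages=[{"role": "user", "content": prompt}],
    )
    return json.loads(resp.choices[0].message.content)
```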
We also made up some problematic PRs to see how the AI would detect issues and suggest checks, e.g. one PR with a SQL error and another with an incorrect description.
Some experiments worked brilliantly. Others produced results that were redundant, too long, and cluttered the PR comments. This raised a few challenges:
We don't have perfect results yet. We're still experimenting, tweaking our assumptions about what helps users, and asking real users for feedback.
We shipped the AI summary in Recce Cloud, and you can try it on your own PRs. We're still working on defining which outputs fit which scenarios, extending preset checks to accumulate knowledge, and making the validation journey seamless.
We're learning as we go and take feedback seriously. Your feedback helps us build something that actually solves real problems, not just something that leverages cool technology. If you have 30 minutes, tell us your thoughts. We'll give you a $25 gift card for your trouble.