Data quality is often used as a catch-all term for rating ‘good data’, but in practice it is more concerned with the structural integrity and completeness of data than with its correctness.
Data correctness refers to whether the data is ‘right,’ meaning it is fit for purpose in the context of real-world requirements and business processes. It’s possible for data to be of high quality (structurally complete) yet fundamentally incorrect.
It’s a scary prospect that data can pass all of its tests and still be ‘wrong’. This is how silent errors make it into production, with no one noticing until it’s too late.
A proactive approach is required to ensure that data remains semantically correct. ‘Remains’ is the key word here. We generally trust historical data because it has stood the test of time: the numbers have already been scrutinized and accepted as correct. A good benchmark for data correctness, therefore, is that historical metrics do not change following a data-modeling update.
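As a rough illustration, here is a minimal sketch of that benchmark in Python with pandas, assuming the same monthly metric has been computed from both the current production model and the updated dev build (the metric, figures, and column names are hypothetical):

```python
# Minimal sketch: flag historical periods whose metric value changed after a
# data-modeling update. Assumes the prod and dev results are already loaded
# as pandas DataFrames; names and figures below are illustrative only.
import pandas as pd

def changed_historical_metrics(prod: pd.DataFrame, dev: pd.DataFrame,
                               key: str = "month", metric: str = "revenue",
                               tolerance: float = 0.0) -> pd.DataFrame:
    """Return historical periods where the metric drifted beyond the tolerance."""
    merged = prod.merge(dev, on=key, suffixes=("_prod", "_dev"))
    drift = (merged[f"{metric}_dev"] - merged[f"{metric}_prod"]).abs()
    return merged[drift > tolerance]

# Example inputs: the 2023-02 figure drifts after the model change.
prod = pd.DataFrame({"month": ["2023-01", "2023-02", "2023-03"],
                     "revenue": [120_000.0, 98_500.0, 134_200.0]})
dev = pd.DataFrame({"month": ["2023-01", "2023-02", "2023-03"],
                    "revenue": [120_000.0, 97_100.0, 134_200.0]})

changed = changed_historical_metrics(prod, dev)
if not changed.empty:
    print("Historical metrics changed after the update:")
    print(changed)
```

If closed historical periods show up in that report, the change needs an explanation before it ships.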
Checking data only for structural integrity will come back to bite you.
Here’s a real-world example from a team that now knows the value of ensuring data correctness.
The data team pushed incorrect data into a reverse-ETL model that was powering their marketing automation. The data came from a core model that was technically “tested,” but the bug was a logical one, something a standard data test, such as a schema or not-null check, would not have caught. So it slipped through unnoticed.
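To make that failure mode concrete, here is a hypothetical Python sketch (not the team’s actual model) of a logical bug that passes the usual structural tests:

```python
# Hypothetical example: a "recognized revenue" calculation that is meant to
# include only completed orders, but the filter excludes the wrong status.
# Every structural test still passes, yet the resulting metric is wrong.
import pandas as pd

orders = pd.DataFrame({
    "order_id": [1, 2, 3, 4],
    "status":   ["completed", "completed", "refunded", "completed"],
    "amount":   [100.0, 250.0, 80.0, 40.0],
})

# Logical bug: should keep status == "completed", but only drops "canceled",
# so the refunded order is still counted.
recognized = orders[orders["status"] != "canceled"]

assert recognized["order_id"].is_unique         # uniqueness test: passes
assert recognized["amount"].notna().all()       # not-null test: passes
assert recognized["amount"].dtype == "float64"  # type/schema test: passes

print(recognized["amount"].sum())  # 470.0, when the correct figure is 390.0
```

No schema, null, or uniqueness check will flag this; only comparing the result against a known-good number will.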
The corrupted data reached the experimentation platform and wasn’t discovered until almost a full week later; by that time, the bad numbers had already made their way into key reports.
The process that followed was the stuff of nightmares.
What made the issue tricky was that nothing looked obviously broken at a glance. It only became apparent when they started calculating metrics and noticed that patterns didn’t make sense.
During this whole process, the data experimentation team was blocked and couldn’t continue their work. The monetary cost to the business was significant, but for the data team, the loss of trust was immeasurable. The whole situation could have been avoided simply by cross-referencing the results against known-good production metrics before shipping.
Structural quality and data completeness should never be overlooked; they are the foundation of any data quality framework. But if data correctness issues slip through, they undermine business decisions at the core. In data-critical applications the consequences can be devastating.
To catch these issues before they reach production, data teams need a proactive workflow that goes beyond traditional testing: validate changes against known-good production data, and confirm that historical metrics remain unchanged before shipping.
These practices form the basis for ensuring data correctness, and they should be an integral part of your overall data quality assurance strategy.
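As one concrete (and deliberately simple) example of what such a workflow can look like, the sketch below profiles a dev build of a model against its production counterpart and surfaces row-count and column-level differences for a human to review. The data, column names, and pandas-based approach are illustrative assumptions, not a prescribed implementation:

```python
# Minimal sketch of a pre-merge data diff: compare the prod and dev builds of
# the same model and report differences a reviewer should look at before
# shipping. All names and data below are illustrative.
import pandas as pd

def profile(df: pd.DataFrame) -> pd.DataFrame:
    """Per-column profile: row count, null count, and distinct count."""
    return pd.DataFrame({
        "rows": len(df),
        "nulls": df.isna().sum(),
        "distinct": df.nunique(),
    })

def diff_profiles(prod: pd.DataFrame, dev: pd.DataFrame) -> pd.DataFrame:
    """Side-by-side profiles of the prod and dev builds."""
    return profile(prod).join(profile(dev), lsuffix="_prod", rsuffix="_dev")

# Example: the dev build gains a duplicate user and a null plan value.
prod = pd.DataFrame({"user_id": [1, 2, 3], "plan": ["free", "pro", "pro"]})
dev = pd.DataFrame({"user_id": [1, 2, 3, 3], "plan": ["free", "pro", "pro", None]})

print(diff_profiles(prod, dev))
```

The point is not the specific tooling but the habit: every change gets compared against a trusted baseline, and unexpected differences are explained before they reach production.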
Recce provides a proactive solution to data correctness by facilitating mindful, contextual workflows for data-change validation. With Recce you can compare the results of a modeling change against known-good production data and confirm that the metrics you already trust remain unchanged before you ship.
Recce helps ensure your data is correct by catching silent errors before deployment. With Recce in your toolkit, your data team can ship changes with confidence while maintaining velocity.