I've been running financial pipelines on Bronze/Silver/Gold for a while now, and the real headache isn't cleaning data - it's figuring out what level of quality the business actually needs. Everyone talks about validation rules, but the decision to block the entire pipeline vs. just raising a warning is where most teams screw up.
In my setup, some rules are non-negotiable: inconsistent balances, duplicate accounting entries, invalid dates. Those kill the pipeline immediately. But a lot of other stuff? You can tolerate it temporarily, depending on downstream impact.
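To make the block-vs-warn split concrete, here's a minimal Python sketch. It's illustrative, not my actual code: the names (`Severity`, `Rule`, `run_rules`, `PipelineBlocked`), the row fields (`entry_id`, `amount_cents`, `memo`) and the "missing memo" warning rule are all made up for the example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Severity(Enum):
    BLOCK = "block"  # non-negotiable: stop the pipeline
    WARN = "warn"    # tolerable for now: report and keep going


@dataclass(frozen=True)
class Rule:
    name: str
    severity: Severity
    check: Callable[[list[dict]], bool]  # returns True when the rows pass


class PipelineBlocked(Exception):
    """Raised when at least one BLOCK rule fails."""


# Hypothetical checks; the field names are assumptions for this sketch.
def no_duplicate_entries(rows: list[dict]) -> bool:
    ids = [r["entry_id"] for r in rows]
    return len(ids) == len(set(ids))


def balances_consistent(rows: list[dict]) -> bool:
    # Signed amounts in integer cents: debits and credits must net to zero.
    return sum(r["amount_cents"] for r in rows) == 0


def memo_present(rows: list[dict]) -> bool:
    return all(r.get("memo") for r in rows)


RULES = [
    Rule("no_duplicate_entries", Severity.BLOCK, no_duplicate_entries),
    Rule("balances_consistent", Severity.BLOCK, balances_consistent),
    Rule("memo_present", Severity.WARN, memo_present),
]


def run_rules(rows: list[dict], rules: list[Rule]) -> list[str]:
    """Run every rule; raise on any BLOCK failure, else return WARN failures."""
    blocking, warnings = [], []
    for rule in rules:
        if rule.check(rows):
            continue
        if rule.severity is Severity.BLOCK:
            blocking.append(rule.name)
        else:
            warnings.append(rule.name)
    if blocking:
        raise PipelineBlocked("blocking rules failed: " + ", ".join(blocking))
    return warnings
```

The point of evaluating every rule before raising is that one failed run reports all the blocking problems at once, instead of making you fix them one rerun at a time.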
A colleague made a solid point: if you treat warnings as anything less than errors, they become noise. Nobody acts on them, alert fatigue sets in, and you might as well not have them. But if you treat them as seriously as errors, why not just make them errors? The key is negotiating with stakeholders - what's acceptable degradation for their SLAs? We run a T-2 SLA by default, meaning data for day T-1 isn't guaranteed to be there, because the overnight jobs may have been blocked. If that doesn't work for a team, we sit down and agree on which exceptions are tolerable and build those into the tests.
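One way to write those negotiated agreements down is a per-consumer policy that the tests read from. Again, this is a rough sketch under my own assumptions: `ConsumerPolicy`, `dataset_ok_for` and the consumer and rule names are hypothetical, and "lag" here just means calendar days between today and the latest business date loaded.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ConsumerPolicy:
    consumer: str
    max_lag_days: int = 2                         # T-2 default
    tolerated_rules: frozenset[str] = frozenset()  # failures this consumer accepts


def dataset_ok_for(
    policy: ConsumerPolicy,
    latest_loaded: date,
    today: date,
    failed_rules: set[str],
) -> bool:
    """True if freshness and failed rules are within what the consumer agreed to."""
    lag_days = (today - latest_loaded).days
    if lag_days > policy.max_lag_days:
        return False
    # Every failed rule must be one the consumer explicitly signed off on.
    return failed_rules <= policy.tolerated_rules


# Hypothetical consumers: one on the default, one that negotiated T-1.
reporting = ConsumerPolicy("reporting", tolerated_rules=frozenset({"memo_present"}))
treasury = ConsumerPolicy("treasury", max_lag_days=1)
```

With data loaded through two days ago and only `memo_present` failing, `reporting` would be fine and `treasury` would not. The same dataset state gives a different answer per consumer, which is kind of the whole argument.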
The mistake I see is engineers picking thresholds on their own. You need to force the business to define what "trustworthy" means for each dataset. Otherwise you overengineer the Silver layer with rigid rules that block everything, or you underengineer it and nobody trusts the numbers.
What's your framework for classifying severity? I'm trying to turn this into a systematic policy - which rules make the dataset untrustworthy, which represent acceptable degradation, and how much depends on the consumer.