Not every audit reaches the same level of confidence. Clash Watchdog AI classifies every audit into one of three evidence tiers — Provisional, Verified, or Gold — based on the amount of data examined, the consistency across sources, and the duration of observation. This article explains the three tiers, what each one means, and why more data buys a tighter threshold.
Why isn't every audit the same strength?
Because the strength of a statistical conclusion depends on the amount of evidence behind it.
An audit based on 1,000 rounds of data from a single source can tell you something — but it cannot tell you much with high confidence. An audit based on 100,000 rounds from three independent sources can tell you a great deal with very high confidence. The mathematics of statistical inference do not allow the same conclusion from both datasets.
Rather than pretending all audits are equal, Clash Watchdog AI explicitly classifies each audit into one of three evidence tiers that communicate exactly how much data is behind the conclusion. This is not common practice in gambling auditing — most regulatory audits produce a binary pass/fail with no indication of the underlying confidence level. We believe this obscures important information.
The tier system serves two audiences:
For players: The tier tells you how seriously to take the verdict. A Tier 1 Provisional Whitelist means "we have checked and so far it looks fine, but we do not have enough data to be highly confident." A Tier 3 Gold Whitelist means "we have checked extensively and are highly confident." Both are Whitelists, but they carry different weight.
For operators: The tier tells you what to expect from the next audit. A Tier 1 classification is a starting point, not a destination. Games that want to be taken seriously by regulators, journalists, and players should aim for Gold — and the path to Gold is transparent.
What is the Provisional tier?
Tier 1 — Provisional is the entry-level classification. It means we have performed an initial audit with limited data and found no disqualifying issues — but the dataset is not large enough to rule out subtle deviations.
Requirements:
| Parameter | Threshold |
|---|---|
| Minimum rounds | 1,000–10,000 |
| Data sources required | At least 1 (typically operator data + self-proxy) |
| RTP deviation tolerance | ±2.5% from declared |
| Distribution test | Chi-squared p > 0.01 |
| Hash verification (if provably fair) | 100% of sampled rounds pass |
What Provisional means:
A Provisional Whitelist classification means: "Based on the data we have examined, this game's observable behavior is consistent with its declared parameters. The sample is too small to detect deviations below 2.5%. We will continue collecting data."
A Provisional Watchlist classification means: "We have found something worth monitoring — a distribution anomaly, an RTP deviation, or a data-source disagreement — but we do not have enough data to determine whether it is meaningful or within the range of normal variance."
Why the thresholds are wide: With 1,000 rounds, the 95% confidence interval for RTP is roughly ±3% (the exact width depends on the game's payout variance). A game with a true RTP of 97% could therefore show an observed RTP anywhere from about 94% to 100% in a 1,000-round sample, purely from random variation. Setting the threshold much tighter than the confidence interval would produce false positives: fair games flagged as suspicious because of normal variance.
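The link between sample size and interval width can be sketched with the usual normal approximation, where the half-width is z · σ / √n. This is an illustration only, not the procedure used to set the tier thresholds. The per-round payout standard deviation `ASSUMED_SIGMA` is a hypothetical value, chosen so that 1,000 rounds reproduce the roughly ±3% figure quoted above. Real games, especially those with rare large payouts, can have a much larger σ.

```python
import math


def rtp_ci_half_width(sigma: float, rounds: int, z: float = 1.96) -> float:
    """Half-width of the normal-approximation confidence interval for mean RTP.

    sigma  : standard deviation of the per-round payout, in units of stake
    rounds : number of rounds in the sample
    z      : critical value (1.96 corresponds to a 95% interval)
    """
    if rounds <= 0:
        raise ValueError("rounds must be positive")
    return z * sigma / math.sqrt(rounds)


# ASSUMPTION: illustrative per-round payout standard deviation, back-derived
# from the article's "roughly ±3% at 1,000 rounds". Not a measured value.
ASSUMED_SIGMA = 0.484

for n in (1_000, 10_000, 50_000):
    half_width = rtp_ci_half_width(ASSUMED_SIGMA, n)
    print(f"{n:>6} rounds: ±{half_width:.2%}")
```

Because the half-width shrinks with the square root of the sample size, ten times more data narrows the interval by a factor of about 3.2, not 10. That is why each tier needs substantially more rounds for a modestly tighter tolerance.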
What is the Verified tier?
Tier 2 — Verified means we have performed a substantive audit with a meaningful dataset and found the game's behavior to be consistent with its declared parameters within tighter tolerances.
Requirements:
| Parameter | Threshold |
|---|---|
| Minimum rounds | 10,000–50,000 |
| Data sources required | At least 2 (must include community or self-proxy) |
| RTP deviation tolerance | ±1.0% from declared |
| Distribution test | Chi-squared p > 0.05 |
| Hash verification (if provably fair) | 100% of sampled rounds pass |
| Serial correlation test | No significant autocorrelation at any lag 1–20 |
What Verified means:
A Verified Whitelist means: "Based on a substantial dataset from multiple independent sources, this game's RTP is within 1% of its declared value, its distribution matches the theoretical shape, its rounds show no serial correlation, and its hash chain (if applicable) verifies completely. We have moderate-to-high confidence in this assessment."
Verified is the threshold at which we consider an audit result reliable enough to cite in public communications and to use as a basis for listing recommendations.
The two-source requirement: Tier 2 requires at least two independent data sources. This is the level at which single-source manipulation becomes detectable. If operator data and community data agree, or if operator data and self-proxy data agree, the probability that both are being manipulated in the same way drops significantly.
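The serial correlation requirement in the Verified table could be checked along the following lines. This is a minimal sketch: the article does not specify which test is used, so the approximate ±1.96/√n white-noise band below is an assumption, and the function names are hypothetical.

```python
import math


def autocorrelation(values, lag):
    """Sample autocorrelation of `values` at a positive integer lag."""
    n = len(values)
    if not 0 < lag < n:
        raise ValueError("lag must be between 1 and len(values) - 1")
    mean = sum(values) / n
    centered = [v - mean for v in values]
    denom = sum(c * c for c in centered)
    if denom == 0:
        raise ValueError("series has zero variance")
    num = sum(centered[i] * centered[i + lag] for i in range(n - lag))
    return num / denom


def significant_lags(values, max_lag=20, z=1.96):
    """Lags in 1..max_lag whose autocorrelation falls outside ±z/sqrt(n).

    ASSUMPTION: the band is the usual large-sample approximation for an
    independent series; it is not taken from the audit methodology.
    """
    bound = z / math.sqrt(len(values))
    return [
        lag
        for lag in range(1, max_lag + 1)
        if abs(autocorrelation(values, lag)) > bound
    ]


# A strictly alternating series is an extreme case of serial dependence.
alternating = [1.0 if i % 2 == 0 else -1.0 for i in range(1000)]
print(significant_lags(alternating)[:3])
```

Checking 20 lags at a 5% level each means a genuinely independent series will still trip about one lag on average. A production check would need a multiple-comparison correction or a joint test across lags.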
What is the Gold tier?
Tier 3 — Gold is the highest classification. It represents the maximum empirical confidence our methodology can provide. Gold is the standard we recommend for regulatory citations, academic references, and journalist reporting.
Requirements:
| Parameter | Threshold |
|---|---|
| Minimum rounds | 50,000+ |
| Data sources required | All 3 (operator + community + self-proxy) |
| RTP deviation tolerance | ±0.5% from declared |
| Distribution test | Chi-squared p > 0.05, KS test p > 0.05 |
| Hash verification (if provably fair) | 100% of all rounds in sample |
| Serial correlation test | No significant autocorrelation at any lag 1–100 |
| Rotation analysis (if provably fair) | No significant correlation between rotations and player events |
| Observation duration | Minimum 90 days of data collection |
What Gold means:
A Gold Whitelist means: "Based on an extensive dataset from three independent sources collected over at least 90 days, this game's behavior matches its declared parameters within 0.5% across all measured dimensions. We have high confidence that the game is operating as advertised, and we have found no evidence of systemic manipulation."
Gold is deliberately difficult to achieve. The three-source requirement, the large sample size, the 90-day observation window, and the tight tolerances all serve the same purpose: making it very expensive for an operator to fake a pass. An operator who wants to manipulate their game while maintaining a Gold classification would need to sustain behavior consistent with the declared parameters across 50,000+ rounds, across three independent observation channels, for 90+ days. We consider this operationally infeasible for any manipulation that produces a meaningful financial benefit.
The rotation analysis requirement: Gold-tier audits of provably fair games must include rotation analysis — testing whether server seed rotations correlate with player events. This test is unique to our methodology and addresses an attack vector that standard provably fair verification cannot detect.
Can a game move from Provisional to Gold?
Yes. Games start at Provisional and move up as more data becomes available, provided the data continues to support the declared parameters. Progression is upward by design, but it is not irreversible: new data that contradicts a previous classification triggers a downgrade.
The typical progression:
Provisional → Verified: Accumulate 10,000+ rounds from at least two sources. If the data remains consistent with the declared parameters at the tighter Verified thresholds, upgrade. If the data reveals anomalies that were not visible in the Provisional sample, the game may stay at Provisional or move to the Watchlist.
Verified → Gold: Accumulate 50,000+ rounds from all three sources over 90+ days. Run the full Gold test suite including rotation analysis. If everything passes, upgrade. If rotation analysis reveals suspicious patterns, the game stays at Verified pending investigation.
Downgrade: If new data contradicts a previous classification, the game is downgraded. Downgrades trigger the due process procedure described in MUST_READ §11.2: the operator is notified, given 30 days to respond with counter-evidence, and the operator's response is published alongside the updated audit report.
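The quantitative part of this progression can be sketched as a lookup over the requirement tables. The sketch below covers only rounds, source count, RTP deviation, and observation duration. It deliberately omits the distribution, hash, serial correlation, and rotation tests, and it assumes the RTP tolerances are absolute percentage points, which the tables do not state explicitly. `TierRequirement` and `highest_tier` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TierRequirement:
    name: str
    min_rounds: int
    min_sources: int
    rtp_tolerance: float  # ASSUMPTION: absolute deviation, as a fraction
    min_days: int


# Values copied from the requirement tables, strongest tier first.
TIERS = (
    TierRequirement("Gold", 50_000, 3, 0.005, 90),
    TierRequirement("Verified", 10_000, 2, 0.010, 0),
    TierRequirement("Provisional", 1_000, 1, 0.025, 0),
)


def highest_tier(rounds: int, sources: int, declared_rtp: float,
                 observed_rtp: float, days: int) -> Optional[str]:
    """Return the strongest tier whose numeric requirements are all met.

    Returns None when even the Provisional requirements are not met.
    """
    deviation = abs(observed_rtp - declared_rtp)
    for tier in TIERS:
        if (rounds >= tier.min_rounds
                and sources >= tier.min_sources
                and deviation <= tier.rtp_tolerance
                and days >= tier.min_days):
            return tier.name
    return None


# 60,000 rounds from three sources, but only 30 days of observation:
print(highest_tier(60_000, 3, 0.97, 0.972, 30))  # Verified
```

In this example the dataset is large enough for Gold, but the 90-day window has not elapsed, so the game stays at Verified until it has.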
What does tier have to do with the Whitelist and Blacklist thresholds?
The tier determines the confidence of the classification, not the classification itself. A game can be Whitelisted at any tier — a Tier 1 Whitelist is a less confident endorsement than a Tier 3 Whitelist, but both mean the game has passed the relevant thresholds for its tier.
Similarly, a game can be Blacklisted, but only at Tier 2 or above: the bar for Blacklisting is deliberately higher. We are more cautious about condemning a game than about approving one, because a false Blacklist harms an honest operator. The asymmetry:
| Action | Minimum Tier | Rationale |
|---|---|---|
| Whitelist | Tier 1+ | Low risk of harm from false positive |
| Watchlist | Any | Watchlist is informational, not punitive |
| Blacklist | Tier 2+ | High risk of harm from false positive; requires stronger evidence |
A game cannot be Blacklisted at Tier 1. If Provisional data suggests a problem, the game is placed on the Watchlist and data collection continues until a Tier 2 conclusion is possible.
This asymmetry is a deliberate design choice. We accept the risk that a manipulated game might be Whitelisted at Tier 1 (and caught at Tier 2) in exchange for never Blacklisting an honest game on insufficient evidence. We judge the cost of a false Blacklist (reputational damage to an honest operator) to be higher than the cost of a temporary false Whitelist (players using a manipulated game until a later audit cycle catches it).
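The minimum-tier rule in the table above can be expressed as a small gate. This is a sketch with hypothetical names (`MIN_TIER_FOR_ACTION`, `resolve_action`), with tiers represented as the integers 1 to 3. It encodes only the rule that a Blacklist finding at Tier 1 falls back to the Watchlist.

```python
# Minimum tier at which each action may be assigned, per the table above.
# "Any" tier for the Watchlist is represented as 1, the lowest tier.
MIN_TIER_FOR_ACTION = {
    "Whitelist": 1,
    "Watchlist": 1,
    "Blacklist": 2,
}


def resolve_action(proposed: str, tier: int) -> str:
    """Apply the minimum-tier rule to a proposed classification.

    If the evidence tier is too low for the proposed action, the game is
    placed on the Watchlist while data collection continues.
    """
    if tier not in (1, 2, 3):
        raise ValueError("tier must be 1, 2, or 3")
    if proposed not in MIN_TIER_FOR_ACTION:
        raise ValueError(f"unknown action: {proposed}")
    if tier < MIN_TIER_FOR_ACTION[proposed]:
        return "Watchlist"
    return proposed


print(resolve_action("Blacklist", 1))  # Watchlist
print(resolve_action("Blacklist", 2))  # Blacklist
```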
For the full methodology, including the exact statistical tests, confidence levels, and decision procedures, see our methodology page. For which games are at which tier, see our game listings.