
What Makes an Audit Conclusive: Evidence Tiers Explained

Published 2026-06-11 · 10 min read

Not every audit reaches the same level of confidence. Clash Watchdog AI classifies every audit into one of three evidence tiers — Provisional, Verified, or Gold — based on the amount of data examined, the consistency across sources, and the duration of observation. This article explains the three tiers, what each one means, and why more data buys a tighter threshold.

Why isn't every audit the same strength?

Because the strength of a statistical conclusion depends on the amount of evidence behind it.

An audit based on 1,000 rounds of data from a single source can tell you something — but it cannot tell you much with high confidence. An audit based on 100,000 rounds from three independent sources can tell you a great deal with very high confidence. The mathematics of statistical inference do not allow the same conclusion from both datasets.

Rather than pretending all audits are equal, Clash Watchdog AI explicitly classifies each audit into one of three evidence tiers that communicate exactly how much data is behind the conclusion. This is not common practice in gambling auditing — most regulatory audits produce a binary pass/fail with no indication of the underlying confidence level. We believe this obscures important information.

The tier system serves two audiences:

For players: The tier tells you how seriously to take the verdict. A Tier 1 Provisional whitelist means "we have checked and so far it looks fine, but we do not have enough data to be highly confident." A Tier 3 Gold whitelist means "we have checked extensively and are highly confident." Both are whitelists, but they carry different weight.

For operators: The tier tells you what to expect from the next audit. A Tier 1 classification is a starting point, not a destination. Games that want to be taken seriously by regulators, journalists, and players should aim for Gold — and the path to Gold is transparent.

What is the Provisional tier?

Tier 1 — Provisional is the entry-level classification. It means we have performed an initial audit with limited data and found no disqualifying issues — but the dataset is not large enough to rule out subtle deviations.

Requirements:

| Parameter | Threshold |
| --- | --- |
| Minimum rounds | 1,000–10,000 |
| Data sources required | At least 1 (typically operator data + self-proxy) |
| RTP deviation tolerance | ±2.5% from declared |
| Distribution test | Chi-squared p > 0.01 |
| Hash verification (if provably fair) | 100% of sampled rounds pass |

What Provisional means:

A Provisional Whitelist classification means: "Based on the data we have examined, this game's observable behavior is consistent with its declared parameters. The sample is too small to detect deviations below 2.5%. We will continue collecting data."

A Provisional Watchlist classification means: "We have found something worth monitoring — a distribution anomaly, an RTP deviation, or a data-source disagreement — but we do not have enough data to determine whether it is meaningful or within the range of normal variance."
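As an illustration of the distribution test in the requirements above, here is a minimal sketch of a chi-squared goodness-of-fit check. It is not our production code: the function names and the payout buckets are hypothetical, and the p-value uses the Wilson-Hilferty normal approximation so that the example needs only the Python standard library. A real audit would use an exact chi-squared distribution.

```python
import math
from statistics import NormalDist


def chi_squared_statistic(observed, expected):
    """Pearson chi-squared statistic over matching outcome buckets."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi_squared_p_value(statistic, dof):
    """Upper-tail p-value via the Wilson-Hilferty approximation (an assumption
    made here to avoid third-party dependencies; it is not exact)."""
    if statistic <= 0:
        return 1.0
    k = 2.0 / (9.0 * dof)
    z = ((statistic / dof) ** (1.0 / 3.0) - (1.0 - k)) / math.sqrt(k)
    return 1.0 - NormalDist().cdf(z)


def passes_distribution_test(observed, expected, alpha):
    """True when the observed bucket counts are consistent with the expected ones."""
    dof = len(observed) - 1
    statistic = chi_squared_statistic(observed, expected)
    return chi_squared_p_value(statistic, dof) > alpha


# Hypothetical example: 1,000 rounds sorted into four payout buckets.
expected_counts = [500, 250, 125, 125]
close_to_expected = [510, 240, 130, 120]
far_from_expected = [580, 200, 110, 110]

print(passes_distribution_test(close_to_expected, expected_counts, alpha=0.01))
print(passes_distribution_test(far_from_expected, expected_counts, alpha=0.01))
```

The `alpha` argument corresponds to the tier's p-value threshold: 0.01 for Provisional, 0.05 for Verified and Gold.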

Why the thresholds are wide: With 1,000 rounds, the 95% confidence interval for observed RTP is roughly ±3% (the exact width depends on the game's payout variance). A game with a true RTP of 97% could therefore show an observed RTP anywhere from about 94% to 100% in a 1,000-round sample, purely from random variation. A tolerance set much tighter than that sampling noise would produce false positives, flagging fair games as suspicious because of normal variance.
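The sample-size argument can be made concrete with a short calculation. The sketch below uses the normal approximation for the mean of per-round payouts. The per-round payout standard deviation of 0.5 (in units of stake) is an assumption chosen for illustration only. Real games differ, and high-variance games produce much wider intervals.

```python
import math

Z_95 = 1.96  # two-sided 95% quantile of the standard normal distribution

# ASSUMPTION: hypothetical per-round payout standard deviation, in units of stake.
ASSUMED_PAYOUT_STD = 0.5


def rtp_ci_half_width(n_rounds, payout_std):
    """Approximate 95% confidence-interval half-width for observed RTP,
    expressed as a fraction of stake (0.03 means ±3%)."""
    return Z_95 * payout_std / math.sqrt(n_rounds)


for n in (1_000, 10_000, 50_000, 100_000):
    half_width = rtp_ci_half_width(n, ASSUMED_PAYOUT_STD)
    print(f"{n:>7} rounds: ±{half_width * 100:.2f}%")
```

Because the half-width shrinks with the square root of the sample size, a tenfold increase in rounds narrows the interval by a factor of about 3.16. That is why each tier can apply a tighter RTP tolerance than the one below it.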

What is the Verified tier?

Tier 2 — Verified means we have performed a substantive audit with a meaningful dataset and found the game's behavior to be consistent with its declared parameters within tighter tolerances.

Requirements:

| Parameter | Threshold |
| --- | --- |
| Minimum rounds | 10,000–50,000 |
| Data sources required | At least 2 (must include community or self-proxy) |
| RTP deviation tolerance | ±1.0% from declared |
| Distribution test | Chi-squared p > 0.05 |
| Hash verification (if provably fair) | 100% of sampled rounds pass |
| Serial correlation test | No significant autocorrelation at any lag 1–20 |

What Verified means:

A Verified Whitelist means: "Based on a substantial dataset from multiple independent sources, this game's RTP is within 1% of its declared value, its distribution matches the theoretical shape, its rounds show no serial correlation, and its hash chain (if applicable) verifies completely. We have moderate-to-high confidence in this assessment."

Verified is the threshold at which we consider an audit result reliable enough to cite in public communications and to use as a basis for listing recommendations.

The two-source requirement: Tier 2 requires at least two independent data sources. This is the level at which single-source manipulation becomes detectable. If operator data and community data agree, or if operator data and self-proxy data agree, the probability that both are being manipulated in the same way drops significantly.
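To illustrate the serial correlation requirement for this tier, the sketch below computes the sample autocorrelation of a sequence of round outcomes at lags 1 through 20 and flags any lag that exceeds a significance bound. The function names are hypothetical, and the significance rule (a normal bound with a Bonferroni correction across the lags tested) is an assumption made for this example rather than a statement of the exact test we run.

```python
import math
from statistics import NormalDist


def autocorrelation(values, lag):
    """Sample autocorrelation of `values` at the given lag."""
    n = len(values)
    mean = sum(values) / n
    denom = sum((v - mean) ** 2 for v in values)
    if denom == 0:
        return 0.0  # constant series: treat as uncorrelated
    num = sum((values[i] - mean) * (values[i + lag] - mean) for i in range(n - lag))
    return num / denom


def significant_lags(values, max_lag, alpha=0.05):
    """Lags in 1..max_lag whose autocorrelation exceeds the bound.

    ASSUMPTION: under independence each coefficient is roughly N(0, 1/n);
    alpha is split across all lags tested (Bonferroni) to limit false alarms.
    """
    z_crit = NormalDist().inv_cdf(1 - alpha / (2 * max_lag))
    bound = z_crit / math.sqrt(len(values))
    return [
        lag
        for lag in range(1, max_lag + 1)
        if abs(autocorrelation(values, lag)) > bound
    ]


# A deliberately patterned sequence (strict alternation) should be flagged.
alternating = [i % 2 for i in range(1000)]
print(significant_lags(alternating, max_lag=20))
```

Without a correction for multiple lags, testing 20 lags at the 5% level would flag at least one lag in a large share of perfectly fair games, which is why the sketch widens the bound.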

What is the Gold tier?

Tier 3 — Gold is the highest classification. It represents the maximum empirical confidence our methodology can provide. Gold is the standard we recommend for regulatory citations, academic references, and journalist reporting.

Requirements:

| Parameter | Threshold |
| --- | --- |
| Minimum rounds | 50,000+ |
| Data sources required | All 3 (operator + community + self-proxy) |
| RTP deviation tolerance | ±0.5% from declared |
| Distribution test | Chi-squared p > 0.05, KS test p > 0.05 |
| Hash verification (if provably fair) | 100% of all rounds in sample |
| Serial correlation test | No significant autocorrelation at any lag 1–100 |
| Rotation analysis (if provably fair) | No significant correlation between rotations and player events |
| Observation duration | Minimum 90 days of data collection |

What Gold means:

A Gold Whitelist means: "Based on an extensive dataset from three independent sources collected over at least 90 days, this game's behavior matches its declared parameters within 0.5% across all measured dimensions. We have high confidence that the game is operating as advertised, and we have found no evidence of systemic manipulation."

Gold is deliberately difficult to achieve. The three-source requirement, the large sample size, the 90-day observation window, and the tight tolerances all serve the same purpose: making it very expensive for an operator to fake a pass. An operator who wants to manipulate their game while maintaining a Gold classification would need to sustain consistent behavior across 50,000+ rounds, across three independent observation channels, for 90+ days. This is operationally infeasible for any manipulation that produces a meaningful financial benefit.

The rotation analysis requirement: Gold-tier audits of provably fair games must include rotation analysis — testing whether server seed rotations correlate with player events. This test is unique to our methodology and addresses an attack vector that standard provably fair verification cannot detect.

Can a game move from Provisional to Gold?

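Yes. Tiers are designed to progress upward: most games start at Provisional and move up as more data becomes available and as that data continues to support the declared parameters. The progression is not irreversible, though. New data that contradicts an earlier classification triggers a downgrade.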

The typical progression:

Provisional → Verified: Accumulate 10,000+ rounds from at least two sources. If the data remains consistent with the declared parameters at the tighter Verified thresholds, upgrade. If the data reveals anomalies that were not visible in the Provisional sample, the game may stay at Provisional or move to the Watchlist.

Verified → Gold: Accumulate 50,000+ rounds from all three sources over 90+ days. Run the full Gold test suite including rotation analysis. If everything passes, upgrade. If rotation analysis reveals suspicious patterns, the game stays at Verified pending investigation.
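The quantitative parts of this progression can be sketched as a small lookup. This is an illustrative model only: the names `TierRequirements`, `data_tier`, and `rtp_within_tolerance` are hypothetical, and the sketch covers just the round count, source count, observation duration, and RTP tolerance. The distribution, hash, serial correlation, and rotation tests are left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TierRequirements:
    name: str
    min_rounds: int
    min_sources: int
    min_days: int
    rtp_tolerance_pct: float  # allowed deviation from declared RTP, in percentage points


# Ordered strictest first so the first match is the highest tier.
TIERS = [
    TierRequirements("Gold", 50_000, 3, 90, 0.5),
    TierRequirements("Verified", 10_000, 2, 0, 1.0),
    TierRequirements("Provisional", 1_000, 1, 0, 2.5),
]


def data_tier(rounds: int, sources: int, days_observed: int) -> Optional[TierRequirements]:
    """Highest tier whose data-sufficiency requirements are met, or None."""
    for tier in TIERS:
        if (
            rounds >= tier.min_rounds
            and sources >= tier.min_sources
            and days_observed >= tier.min_days
        ):
            return tier
    return None


def rtp_within_tolerance(tier: TierRequirements, rtp_deviation_pct: float) -> bool:
    """Whether the observed RTP deviation is inside the tier's tolerance."""
    return abs(rtp_deviation_pct) <= tier.rtp_tolerance_pct


tier = data_tier(rounds=60_000, sources=3, days_observed=120)
print(tier.name, rtp_within_tolerance(tier, rtp_deviation_pct=0.3))
```

Data sufficiency and the tolerance check are kept separate on purpose: the amount of data decides which tier's thresholds apply, and the thresholds then decide whether the game passes at that tier.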

Downgrade: If new data contradicts a previous classification, the game is downgraded. Downgrades trigger the due process procedure described in MUST_READ §11.2: the operator is notified, given 30 days to respond with counter-evidence, and the operator's response is published alongside the updated audit report.

What does tier have to do with the Whitelist and Blacklist thresholds?

The tier determines the confidence of the classification, not the classification itself. A game can be Whitelisted at any tier — a Tier 1 Whitelist is a less confident endorsement than a Tier 3 Whitelist, but both mean the game has passed the relevant thresholds for its tier.

Similarly, a game can be Blacklisted at Tier 2 or Tier 3, but the bar for Blacklisting is deliberately higher than for Whitelisting. We are more cautious about condemning a game than about approving one, because a false Blacklist harms an honest operator. The asymmetry:

| Action | Minimum Tier | Rationale |
| --- | --- | --- |
| Whitelist | Tier 1+ | Low risk of harm from false positive |
| Watchlist | Any | Watchlist is informational, not punitive |
| Blacklist | Tier 2+ | High risk of harm from false positive; requires stronger evidence |

A game cannot be Blacklisted at Tier 1. If Provisional data suggests a problem, the game is placed on the Watchlist and data collection continues until a Tier 2 conclusion is possible.
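The gating rule can be expressed as a short function. This is a sketch of the rule as described here, not our decision procedure: the function name and the three finding labels ("pass", "anomaly", "fail") are hypothetical simplifications of what an audit actually produces.

```python
def resolve_listing(tier: int, finding: str) -> str:
    """Map an audit finding to a listing, applying the Tier 2+ bar for Blacklisting.

    tier:    1 (Provisional), 2 (Verified), or 3 (Gold)
    finding: "pass", "anomaly", or "fail" (hypothetical labels for this sketch)
    """
    if tier not in (1, 2, 3):
        raise ValueError(f"unknown tier: {tier}")
    if finding == "pass":
        return "Whitelist"  # allowed at any tier
    if finding == "anomaly":
        return "Watchlist"  # informational, allowed at any tier
    if finding == "fail":
        # A failing result at Tier 1 is not enough evidence to Blacklist,
        # so the game goes to the Watchlist while data collection continues.
        return "Blacklist" if tier >= 2 else "Watchlist"
    raise ValueError(f"unknown finding: {finding}")


for tier in (1, 2, 3):
    print(tier, resolve_listing(tier, "fail"))
```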

This asymmetry is a deliberate design choice. We accept the risk that a manipulated game might be Whitelisted at Tier 1 (and caught at Tier 2) in exchange for never Blacklisting an honest game on insufficient evidence. The cost of a false Blacklist — reputational damage to an honest operator — is higher than the cost of a temporary false Whitelist — players using a game that will be caught in the next audit cycle.

For the full methodology, including the exact statistical tests, confidence levels, and decision procedures, see our methodology page. For which games are at which tier, see our game listings.


Frequently Asked Questions

Why are the thresholds looser at Tier 1?

Because smaller samples produce wider confidence intervals. With 1,000 rounds, an observed RTP of 96.5% could be consistent with a true RTP of 97%, since the deviation is within the expected range of random variation for that sample size. With 100,000 rounds, the same 96.5% would be statistically incompatible with a true 97%. Looser thresholds at Tier 1 are not about being less careful. They are about being mathematically honest about what smaller datasets can and cannot prove.

Can a game be classified as Gold on its first audit?

Yes, if the data requirements are met. There is no mandatory waiting period between tiers. If a game has 100,000+ rounds of three-source data spanning at least 90 days available from day one (because it has been operating for years and the data is publicly accessible), it can be classified as Gold on the first audit. The tiers are about data sufficiency, not about how long we have been auditing the game. In practice, most first audits are Tier 1 because collecting three-source data takes time.

Does Gold guarantee that a game is fair?

No. Gold means we have high confidence that the game's observable behavior matches its declared parameters based on a large, multi-source dataset. It does not guarantee future behavior, because the operator could change parameters tomorrow. It does not cover aspects we cannot measure (internal operational practices, employee conduct, server security). Gold is the highest level of empirical confidence we can provide. It is not a guarantee of fairness in perpetuity.

How long does it take to reach Gold?

It depends on data availability, not on our schedule. A high-volume game like Stake Crash, where millions of rounds are played daily and the provably fair system provides complete data, could theoretically reach Gold within months. A lower-volume game or one without provably fair transparency could take a year or more to accumulate sufficient three-source data.

What happens when a game is downgraded?

If new data contradicts a previous classification (for example, if a Verified game's latest data shows an RTP deviation that exceeds the Verified threshold), the game is downgraded. The downgrade is published as a new audit report with a clear explanation of what changed. The previous audit report remains available for reference (audit reports are immutable). The operator is also notified and has 30 days to respond before the new classification goes live on the listing pages.

