🔬 Clash Watchdog AI – Audit Methodology
Version: v1.0.0 – RATIFIED
Status: ✅ Ratified by founding team, 2026-04-16
Parent Document: ../MUST_READ.md
Sister Document: ../architecture/agentic-audit-v1.md – defines how this methodology is executed by AI agents
Governance: See MUST_READ Section 15 (Amendment Process)
Amendment 2026-04-16 (at ratification): All audit processes specified in this document are executed under the Agentic Audit Architecture defined in the sister document above. The statistical core remains fully deterministic; only orchestration, scheduling, and draft-report assembly are agentic. The 5-year Reproducibility Promise (§7.5) is preserved because the statistical functions are version-pinned, and agent actions are logged in the provenance ledger (§2.4).
0. Preamble
This document is the technical constitution of Clash Watchdog AI's audit operations. It answers one question:
Given a crash game and N rounds of its historical data, how do we determine whether it is fair, suspicious, or unfair?
Every audit we ever publish must be traceable to a specific version of this document. If we change how we measure fairness, the version of this document changes with it – and every historical audit remains anchored to the version under which it was produced.
This is the single most important sentence in this document:
Any audit conclusion must be reproducible five years from now, by any third party, using the methodology version and the raw data as they existed at the time of publication, and must yield the same result.
If we cannot promise that, we cannot publish the conclusion.
0.1 Relationship to MUST_READ
This document inherits and operationalizes the following from MUST_READ.md:
- Section 2 – Six Constitutional Laws (especially Law 4: No algorithmic mystification)
- Section 4 – Product Architecture (Game State × Player State data engine)
- Section 11 – Legal & Risk Posture (due process requirements)
- Section 15 – Amendment Process (how this document can be changed)
Any conflict between this document and MUST_READ.md is resolved in favor of MUST_READ.md.
1. The Three-Column Definition of Fairness
We reject any single-axis definition of fairness. A crash game can be "mathematically fair on average" and still be "unfair in experience." A game can be "statistically well-distributed" and still be "cryptographically opaque." We therefore evaluate every game along three independent columns, and only a game that passes enough columns with enough evidence earns a public whitelist.
1.1 Column A – RTP Fairness (Allocative Fairness)
"Does the game's long-run return match what its operator claims it returns?"
Measured property: The long-run return-to-player percentage, its confidence interval, and its house-edge stability over time.
Primary statistic: Observed RTP vs. declared RTP. Deviation measured as absolute percentage points.
Underlying assumption: The operator makes a public claim (e.g., "RTP = 97%") which we take as the null hypothesis. We test whether the data rejects that null.
Applies to: Every crash game, regardless of technology. This is the most universally comparable column.
1.2 Column B – Distributional Fairness (Procedural Fairness)
"Does the game's round-by-round behavior look like what its declared RNG should produce?"
Measured properties: The full distribution of multipliers (crash points), the independence of successive rounds, the absence of anomalous patterns in streak length, reversal density, and autocorrelation.
Primary statistics: Chi-square goodness-of-fit, Kolmogorov-Smirnov, Anderson-Darling, Runs test, autocorrelation at lags 1 through 50, Benford's law on multiplier first digits.
Underlying assumption: Crash games with honest RNGs produce multipliers that follow a known theoretical distribution (typically a truncated geometric / power-law of the form M ~ 1/(1-U) where U is uniform [0, 1-house_edge]). We test whether observed data fits this distribution within statistical noise.
Applies to: Every crash game where we have sufficient round-level data (≥2,000 rounds for the minimum threshold).
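The reference model above can be made concrete with a short simulation. This is an illustrative sketch only: the 3% house edge and the flooring to cents are assumptions for the example, and the exact derivation varies by operator.

```python
import random

HOUSE_EDGE = 0.03  # assumed 3% edge (declared RTP 97%) -- illustrative only

def simulate_multiplier(rng: random.Random, house_edge: float = HOUSE_EDGE) -> float:
    """Draw one crash multiplier from the idealized model M = 1/(1 - U),
    with U uniform on [0, 1 - house_edge], floored to cents as games display it."""
    u = rng.uniform(0.0, 1.0 - house_edge)
    m = 1.0 / (1.0 - u)
    return max(1.0, int(m * 100) / 100)

def theoretical_cdf(m: float, house_edge: float = HOUSE_EDGE) -> float:
    """P(M <= m) under the same model; the reference curve for goodness-of-fit tests."""
    if m < 1.0:
        return 0.0
    return min(1.0, (1.0 - 1.0 / m) / (1.0 - house_edge))

rng = random.Random(42)
sample = [simulate_multiplier(rng) for _ in range(10_000)]
share_below_2x = sum(1 for m in sample if m <= 2.0) / len(sample)
```

Under this parameterization roughly half of all rounds crash at or below 2.00× (theoretical CDF at 2.00 is 0.5/0.97 ≈ 0.515), which is the tail shape Column B's goodness-of-fit tests check against.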
1.3 Column C – Cryptographic Fairness (Provable Fairness)
"Can we mathematically verify that every round result is exactly what the game operator committed to produce, before the round began?"
Measured properties:
- Hash-chain integrity across the provably-fair seed lifecycle.
- Consistency of revealed server seeds with the pre-committed hashes.
- Correctness of HMAC(server_seed, client_seed || nonce) derivations.
- Rotation patterns of server seeds over time (our original contribution – see §4.3).
Primary check: SHA-256 verification of every committed hash against its revealed pre-image. HMAC-SHA-256 derivation of the multiplier for every observable round.
Underlying assumption: A provably-fair game commits to a server seed (by publishing its SHA-256 hash) before any round is played under it. When the seed is later revealed, any observer can recompute every round under that seed and verify that nothing was tampered with.
Applies to: Only games that claim provably-fair operation (currently: Stake Crash, BC.Game Crash, Roobet Crash, and a few others). Not applicable to traditional server-RNG games (Aviator, JetX, Spaceman, etc.), which receive an N/A in this column – not a negative score.
1.4 Combined Verdict Matrix
A game's overall classification depends on how many columns return positive, negative, or inconclusive results. Any N/A (column not applicable) is excluded from the denominator.
| Columns Positive | Columns Negative | Columns N/A or Inconclusive | Verdict (subject to Evidence Tier – §5) |
|---|---|---|---|
| 3 of 3 | 0 | 0 | 🛡️ WHITELIST candidate |
| 2 of 2 | 0 | 1 (N/A) | 🛡️ WHITELIST candidate |
| 2 of 3 | 0 | 1 (inconclusive) | ⚠️ WATCHLIST (more data needed) |
| Any 1 | 1 | Any | ⚠️ WATCHLIST (contradictions being investigated) |
| Any | 2 or more | Any | 🚫 BLACKLIST candidate |
"Candidate" status becomes final only after passing the Due Process workflow in §6.
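The matrix can be expressed as a small decision function. This is an illustrative sketch (the result labels are hypothetical names, not a production schema), and "candidate" status still requires the §6 due-process workflow before going live.

```python
def combined_verdict(columns: dict) -> str:
    """Apply the verdict matrix: `columns` maps 'A'/'B'/'C' to one of
    'positive', 'negative', 'inconclusive', 'n/a'. N/A columns are
    excluded from the denominator, so 2-of-2 positive with one N/A is
    still a whitelist candidate."""
    results = [r for r in columns.values() if r != "n/a"]
    negative = sum(1 for r in results if r == "negative")
    positive = sum(1 for r in results if r == "positive")
    if negative >= 2:
        return "BLACKLIST candidate"
    if negative == 1:
        return "WATCHLIST"
    if positive == len(results) and positive >= 2:
        return "WHITELIST candidate"
    return "WATCHLIST"
```

For example, `{"A": "positive", "B": "positive", "C": "n/a"}` maps to a whitelist candidacy, while a single inconclusive column drops the game to the Watchlist.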
2. Data Collection & Source Independence
No single data source is trusted alone. Every audit is built from three independent sources, and conclusions are published only when the required sources agree.
2.1 The Three Sources
Source 1 – Official / Public Disclosure. The game operator's own publicly available data. Examples:
- Provably-fair public history pages (Stake, BC.Game, Roobet expose round-by-round hash logs).
- Regulatory disclosures (Brazilian Lei 14.790 requires licensed operators to publish RTP metrics).
- Public APIs exposed by the game client.
When an operator changes or deletes this data after we have observed it, that event itself becomes a recorded data point – we keep immutable snapshots of all Source 1 observations.
Source 2 – Community Crowd-Sourced. Data contributed by users via the Clash Watchdog AI browser extension, manual input forms, and optional screenshot OCR. Each contribution is:
- Tagged with a pseudonymous contributor identifier (never linked to real identity).
- Timestamped server-side at the moment of submission.
- Cross-checked against other contributions from different users for the same round window.
- Discarded from the audit if any contribution appears to be a poisoning attempt (an outlier beyond 6σ from other sources on the same round range is flagged and quarantined).
Source 3 – Self-Operated Proxy Accounts. Clash Watchdog AI operates its own audited accounts on each target game, placing minimum stakes purely to observe rounds while playing with real value at risk. This is the "mystery shopper" approach used in consumer protection and financial audit.
- Each proxy account is funded with a small standing balance (per-game budget TBD, see §9).
- Accounts are rotated and de-correlated to avoid detection / differential treatment.
- Every action is logged with timestamps and network-level captures.
- No proxy account ever attempts to "win" – the goal is observation, not profit.
2.2 The Cross-Validation Rule
```
            +------------+
            |  Source 1  |
            |  Official  |
            +------------+
              /        \
       cross-check   cross-check
              \        /
   +------------+    +-------------+
   |  Source 2  |    |  Source 3   |
   |  Community |    | Self-Proxy  |
   +------------+    +-------------+
          \                /
           +--cross-check-+
```
Publication rules:
- All three sources agree within tolerance → conclusion may be published with full confidence.
- Two sources agree, the third is missing → conclusion may be published with a "partial-source" caveat.
- Two sources agree, the third disagrees → WATCHLIST, investigation opened.
- Source 1 disagrees with Sources 2 and 3 (i.e., the operator is reporting different data than what users and our own accounts observed) → this is the single strongest possible signal of manipulation and triggers immediate Emergency Blacklist review (see §6.3).
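The publication rules above can be sketched as a decision function. The single-scalar comparison and the 0.5-point tolerance are illustrative simplifications – real comparisons run per round window, not on one aggregate number.

```python
def publication_decision(source1, source2, source3, tolerance=0.5):
    """Apply the cross-validation rule to one scalar metric (e.g. observed
    RTP in percentage points) per source; None marks a missing source."""
    def agree(a, b):
        return abs(a - b) <= tolerance
    present = [v for v in (source1, source2, source3) if v is not None]
    if len(present) == 3:
        if agree(source1, source2) and agree(source1, source3) and agree(source2, source3):
            return "publish: full confidence"
        if agree(source2, source3) and not agree(source1, source2) and not agree(source1, source3):
            # operator's own data diverges from both users and our proxies
            return "emergency blacklist review"
        return "watchlist: investigation opened"
    if len(present) == 2:
        if agree(present[0], present[1]):
            return "publish: partial-source caveat"
        return "watchlist: investigation opened"
    return "insufficient sources"
```

For example, `publication_decision(99.5, 96.9, 96.8)` – operator reporting high while community and proxy data agree with each other – routes to emergency review.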
2.3 Minimum Sample Sizes (tentative v1.0)
These are the minimum round counts required for a column to "open its mouth." Below these thresholds, the column returns INCONCLUSIVE regardless of what the data looks like.
| Column | Minimum (inconclusive below) | Recommended | Ideal |
|---|---|---|---|
| A – RTP | 5,000 rounds | 20,000 | 100,000 |
| B – Distribution | 2,000 rounds | 10,000 | 50,000 |
| C – Provably-Fair | 1,000 consecutive verified hashes | 10,000 | Continuous |
Note – tentative numbers. These thresholds are derived from power analyses targeting the ability to detect a 1% RTP deviation at α = 0.05, β = 0.10. Final values will be confirmed by the technical appendix before v1.0.0 is ratified.
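The power analysis behind these numbers can be sketched with a standard two-sided z-test sample-size formula. The per-round standard deviation used below (70 percentage points of stake, roughly what a fixed 1.5× cash-out strategy produces) is an assumed figure for illustration; crash returns are heavy-tailed, and the required sample size is highly sensitive to this assumption.

```python
from statistics import NormalDist

def rounds_needed(delta_pp: float, per_round_std_pp: float,
                  alpha: float = 0.05, power: float = 0.90) -> int:
    """Sample size for detecting a shift of `delta_pp` percentage points
    in mean RTP with a two-sided z-test at significance `alpha` and the
    given power (power = 1 - beta)."""
    nd = NormalDist()
    z_alpha = nd.inv_cdf(1 - alpha / 2)
    z_beta = nd.inv_cdf(power)
    return int(((z_alpha + z_beta) * per_round_std_pp / delta_pp) ** 2) + 1

# Assumed per-round return std of 70 pp, targeting a 1 pp RTP deviation:
n_required = rounds_needed(delta_pp=1.0, per_round_std_pp=70.0)
```

Because the result moves with the assumed variance, this is exactly the calculation the technical appendix must pin down before the thresholds stop being tentative.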
2.4 Data Provenance Ledger
Every raw data record entering our audit pipeline is tagged with:
- source_id – which of the three sources
- collection_timestamp (UTC, ISO 8601)
- observation_timestamp (UTC, ISO 8601) – when the round actually happened
- collector_pseudonym (for Source 2) or proxy_account_id (for Source 3)
- raw_payload_hash (SHA-256 of the unmodified raw payload)
- pipeline_version (commit hash of the ingestion code that processed it)
This ledger is append-only. Records are never deleted – if a record is later disqualified (e.g., poisoning detected), it is flagged but retained for the audit trail.
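A minimal sketch of one ledger record, using the field names above. The `PIPELINE_VERSION` constant and the `disqualified` flag shape are placeholders for illustration; in production the version is the ingestion code's real commit hash.

```python
import hashlib
import json
from datetime import datetime, timezone

PIPELINE_VERSION = "0000000"  # placeholder for the ingestion code's commit hash

def make_ledger_record(source_id: str, raw_payload: bytes,
                       observation_timestamp: str, extra: dict = None) -> dict:
    """Build one provenance record. Records are appended, never deleted;
    disqualification only flips a flag on the retained record."""
    record = {
        "source_id": source_id,
        "collection_timestamp": datetime.now(timezone.utc).isoformat(),
        "observation_timestamp": observation_timestamp,
        "raw_payload_hash": hashlib.sha256(raw_payload).hexdigest(),
        "pipeline_version": PIPELINE_VERSION,
        "disqualified": False,
    }
    record.update(extra or {})
    return record

payload = json.dumps({"round": 12345, "multiplier": 2.31}).encode()
rec = make_ledger_record("source_3", payload, "2026-06-15T12:00:00+00:00",
                         {"proxy_account_id": "proxy-07"})
```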
3. Layered Statistical Suite
Every audit produces output at two layers simultaneously. The layers must always agree: if the public layer says "pass" but the technical appendix says "fail," the audit is invalid and must be re-run.
3.1 Public Layer (for players, journalists, regulators)
Short, comprehensible, fits on one screen. Uses plain language and a small number of headline metrics.
Every audit report's public layer contains:
- Column A headline: RTP: observed 96.82% (±0.18%) vs. declared 97.00% – pass/fail badge.
- Column B headline: Distribution fit p-value: 0.31 – pass/fail badge, with a one-sentence plain-language interpretation.
- Column C headline: Provably-fair hash chain: verified for 12,430 consecutive rounds – pass/fail badge, or Not applicable.
- Overall verdict: one of 🛡️ WHITELIST, ⚠️ WATCHLIST, 🚫 BLACKLIST, along with the Evidence Tier (§5) the conclusion is based on.
- One sentence of plain-language interpretation per column.
- Methodology version under which the audit was run.
- Next review date.
3.2 Technical Appendix Layer (for statisticians, academics, regulators)
Full statistical workup. Published alongside the public report, as a Jupyter notebook on GitHub, under clashwatchdog-ai/audit-reports/<game-slug>/<date>/notebook.ipynb.
The technical appendix for every audit contains:
For Column A (RTP):
- Observed mean RTP with 95% CI via bootstrap (10,000 resamples)
- Kolmogorov-Smirnov test comparing observed RTP rolling windows to declared RTP
- Variance decomposition (within-session vs. between-session)
- Trend analysis: linear regression of RTP over time to detect drift
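The first Column A item – the percentile bootstrap CI for mean RTP – can be sketched in a few lines. This is an outline of the approach; the production suite pins library versions and resample seeds, and the simulated win rate below is a hypothetical illustration, not real game data.

```python
import random

def bootstrap_rtp_ci(returns, n_resamples=10_000, seed=0):
    """95% percentile-bootstrap confidence interval for mean RTP.
    `returns` is the per-round return as a percentage of stake."""
    rng = random.Random(seed)
    n = len(returns)
    means = []
    for _ in range(n_resamples):
        means.append(sum(returns[rng.randrange(n)] for _ in range(n)) / n)
    means.sort()
    point = sum(returns) / n
    return point, means[int(0.025 * n_resamples)], means[int(0.975 * n_resamples)]

# Illustrative data: a fixed 1.5x cash-out strategy winning ~65% of rounds
rng = random.Random(1)
returns = [150.0 if rng.random() < 0.65 else 0.0 for _ in range(1_000)]
point, lo, hi = bootstrap_rtp_ci(returns, n_resamples=1_000)
```

The resulting interval is what the public layer reports next to the declared RTP, per the headline format in §3.1.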
For Column B (Distribution):
- Chi-square goodness-of-fit with bin count ≥ 20
- Kolmogorov-Smirnov against the theoretical crash distribution
- Anderson-Darling (upweights tail deviations – critical for crash games)
- Runs test for serial independence
- Autocorrelation plot for lags 1 through 50
- Benford's law check on multiplier first digits
- NIST SP 800-22 selected tests (Phase 2+)
- Multiple testing correction: Benjamini-Hochberg with FDR = 0.05 across all tests reported
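The multiple-testing correction in the last bullet can be sketched with the standard library. This is an illustrative stdlib implementation; the toolchain in §3.3 uses statsmodels' `multipletests` in production, and the p-values below are hypothetical.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Benjamini-Hochberg step-up procedure across all reported tests.
    Returns a parallel list of booleans: True = null rejected at the FDR."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    max_rank = 0
    # find the largest rank k with p_(k) <= fdr * k / m
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= fdr * rank / m:
            max_rank = rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[idx] = True
    return rejected

pvals = [0.001, 0.008, 0.039, 0.041, 0.09, 0.21, 0.46, 0.78]
flags = benjamini_hochberg(pvals)
```

With these hypothetical p-values only the two smallest survive the FDR = 0.05 correction, which is why a Gold-tier verdict requires all BH-corrected tests to pass, not raw p-values.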
For Column C (Provably-Fair):
- Full SHA-256 verification of every revealed server seed against its committed hash
- HMAC-SHA-256 recomputation of the multiplier for every round in the verified window
- Rotation analysis: distribution of seed lifetimes, correlation of rotations with user-level events (see §4.3)
- Nonce continuity check: detection of missed or duplicated nonces
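The nonce continuity check is simple enough to sketch directly. A minimal illustration (the dict-shaped report is an assumed format, not the production schema):

```python
def nonce_continuity(nonces):
    """Given the nonces observed within one seed lifetime, report missing
    and duplicated values. A gap suggests rounds were played but not
    publicly recorded; a duplicate suggests a replayed round."""
    seen = sorted(nonces)
    duplicated = sorted({n for i, n in enumerate(seen[1:], 1) if n == seen[i - 1]})
    missing = []
    if seen:
        missing = sorted(set(range(seen[0], seen[-1] + 1)) - set(seen))
    return {"missing": missing, "duplicated": duplicated}

report = nonce_continuity([0, 1, 2, 4, 5, 5, 7])
```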
Reproducibility artifacts:
- The Jupyter notebook itself (executable)
- The raw data CSV (or a reference to the immutable snapshot in our data lake)
- The methodology commit hash
- Environment specification (requirements.txt or pyproject.toml)
- A single-command reproduction script (make reproduce or python reproduce.py)
3.3 Toolchain
Phase 1 reference implementation:
| Purpose | Library | Notes |
|---|---|---|
| Statistics / numerics | SciPy, NumPy, Pandas | Python ecosystem |
| Distribution fitting | SciPy stats | Built-in chi-square, K-S, A-D |
| Multiple testing | statsmodels.stats.multitest | Benjamini-Hochberg |
| Visualization | Plotly, Matplotlib | Q-Q plots, histograms, autocorrelation |
| Hash verification | Python hashlib, Node crypto | SHA-256, HMAC-SHA-256 |
| Notebooks | Jupyter | Published as HTML + source |
| Reproducibility | uv or pdm | Environment pinning |
3.4 Phased Build
We do not implement every test from day one. The statistical suite is built in phases that map to the company's SEO/content strategy – each new capability also becomes a published methodology-update article.
| Phase | Months | Column A | Column B | Column C |
|---|---|---|---|---|
| 1 | 0–3 | Mean + bootstrap CI | Chi-square + K-S | SHA-256 chain verification |
| 2 | 3–6 | + variance decomposition | + Anderson-Darling, Runs test, autocorrelation | + Full rotation analysis (§4.3) |
| 3 | 6–12 | + drift detection | + Benford, NIST selected tests | + Multi-operator comparison |
Each phase upgrade is a MINOR version bump (v1.0 → v1.1 → v1.2), publicly changelogged, and announced with a 14-day pre-notice window (see §6.4).
4. Provably-Fair Deep Verification
This section details Column C and is the most technically intricate part of the methodology. It is also the part where Clash Watchdog AI produces an original research contribution: the Rotation Analysis framework in §4.3.
4.1 Provably-Fair Primer (for the record)
A typical provably-fair crash game works as follows:
- The operator generates a random 256-bit server_seed.
- Before any round is played, the operator publishes SHA-256(server_seed) as a public commitment.
- For round n under this seed, the multiplier is derived as follows (the exact formula varies by operator – our pipeline records the operator-specific derivation):

  ```
  hmac = HMAC-SHA-256(key=server_seed, message=client_seed || nonce_n)
  hex  = first 52 bits of hmac, as a big-endian integer
  if hex == 0:
      multiplier = high_cap
  else:
      multiplier = max(1.00, floor((2^52 * house_edge) / hex * 100) / 100)
  ```

- After some number of rounds (typically 1,000 to 10,000, or 24 hours), the operator reveals server_seed. At that moment, anyone can compute SHA-256(revealed_seed) and compare it to the original commitment. If they match, the seed was honestly committed before the rounds were played.
- A new server_seed is generated, a new commitment is published, and the cycle repeats.
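The primer above can be sketched as runnable code. Note the assumptions: the `:`-separated message concatenation, the `house_factor` constant (0.97, i.e. 1 minus a 3% edge), and the cap value are illustrative stand-ins – each operator's exact derivation differs and is recorded by the pipeline.

```python
import hashlib
import hmac
import math

def verify_commitment(revealed_seed: bytes, committed_hash_hex: str) -> bool:
    """The reveal-time check: the revealed seed must hash to the prior commitment."""
    return hashlib.sha256(revealed_seed).hexdigest() == committed_hash_hex

def derive_multiplier(server_seed: bytes, client_seed: bytes, nonce: int,
                      house_factor: float = 0.97, high_cap: float = 1e6) -> float:
    """Recompute one round's multiplier following the pseudocode above."""
    digest = hmac.new(server_seed,
                      client_seed + b":" + str(nonce).encode(),
                      hashlib.sha256).digest()
    h = int.from_bytes(digest[:7], "big") >> 4  # first 52 bits, big-endian
    if h == 0:
        return high_cap
    return max(1.00, math.floor((2 ** 52 * house_factor) / h * 100) / 100)

seed = b"example-server-seed"
commitment = hashlib.sha256(seed).hexdigest()
m0 = derive_multiplier(seed, b"example-client-seed", nonce=0)
```

Determinism is the whole point: given the same seed pair and nonce, any observer recomputes the same multiplier, which is what the full chain verification in §4.2 exploits.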
4.2 Full Chain Verification (Phase 1 capability)
For every completed seed cycle on every monitored game, we:
- Pull the original commitment hash (from Source 1 + Source 3 independently).
- Pull the revealed server seed.
- Compute SHA-256(revealed_seed) and compare to the original commitment.
- For every round n in [0, seed_lifetime - 1]:
  - Pull the client_seed and nonce as recorded by the operator and by our proxy account.
  - Recompute the multiplier using the operator's documented derivation.
  - Compare against the observed multiplier from Sources 1, 2, and 3.
- Record the result in the Provenance Ledger (§2.4).
Any discrepancy at any step is a potential hash mismatch, which triggers immediate WATCHLIST placement and an investigation. A confirmed discrepancy (reproducible, cross-source) is an immediate Emergency Blacklist.
4.3 Rotation Analysis (Phase 2 capability โ Clash Watchdog AI original contribution)
Hash-chain verification, by itself, can only confirm that the operator did not violate a commitment they already made. It cannot detect a more subtle attack: the operator choosing when to rotate the server seed in a way that is correlated with outcomes.
For example: an operator could honestly commit server_seed_1, honestly play 1,000 rounds under it, honestly reveal it – and yet still tilt the game in their favor by rotating to server_seed_2 (pre-committed) precisely at a moment when rotating is favorable to them. The rotation timing becomes the attack surface.
Rotation Analysis examines the distribution of seed lifetimes and rotation events and asks:
- Lifetime distribution. Are seeds rotated at approximately regular intervals, as the operator claims? We fit the distribution of observed seed lifetimes to the declared policy and compute a K-S goodness-of-fit.
- Event correlation. Are rotations correlated with user-level events we can observe from Source 3?
  - Rotation just after a large proxy-account win?
  - Rotation just before a proxy-account bet that would have won at an unusually high multiplier?
  - Rotation timing correlated with daily peak traffic?
- Pre-rotation / post-rotation distribution shift. Do multiplier distributions in the 100 rounds before a rotation differ statistically from the 100 rounds after? Tested via a two-sample Kolmogorov-Smirnov.
- Nonce continuity. Within a single seed's lifetime, are all nonces accounted for? A gap in the nonce sequence suggests rounds were played that were not publicly recorded.
- Commit-ahead verification. A secure provably-fair operator commits to a chain of future seed hashes (commit-ahead). If the operator does not commit ahead, the rotation window becomes an attack vector. We record and publish commit-ahead depth for every audited game.
Rotation Analysis outputs:
- A rotation timeline chart (public layer).
- Statistical test results for each of the five checks (technical appendix).
- An overall "Rotation Risk Score" (Clean / Low / Medium / High / Critical).
A Medium or worse Rotation Risk Score downgrades the game's Column C from positive to inconclusive, even if basic hash verification passes.
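The pre-/post-rotation shift check (check 3 above) reduces to a two-sample Kolmogorov-Smirnov comparison. A stdlib sketch using the large-sample critical-value approximation – the production suite uses scipy.stats.ks_2samp, and the synthetic pre/post samples here are illustrative, not real rotation windows:

```python
import bisect
import math
import random

def ks_two_sample(pre, post, alpha=0.05):
    """Two-sample K-S: returns the D statistic and whether it exceeds the
    large-sample critical value c(alpha) * sqrt((n + m) / (n * m))."""
    xs, ys = sorted(pre), sorted(post)
    d = 0.0
    for v in xs + ys:
        d = max(d, abs(bisect.bisect_right(xs, v) / len(xs)
                       - bisect.bisect_right(ys, v) / len(ys)))
    c_alpha = {0.10: 1.224, 0.05: 1.358, 0.01: 1.628}[alpha]
    critical = c_alpha * math.sqrt((len(xs) + len(ys)) / (len(xs) * len(ys)))
    return d, d > critical

# Synthetic example: 100 rounds before vs. after a simulated rotation
rng = random.Random(7)
pre = [rng.random() for _ in range(100)]
post_shifted = [rng.random() + 0.6 for _ in range(100)]
post_same = [rng.random() for _ in range(100)]
d_shift, shift_detected = ks_two_sample(pre, post_shifted)
d_same, _ = ks_two_sample(pre, post_same)
```

A clear location shift produces a large D and a rejection; two draws from the same distribution typically do not, which is what separates a clean rotation from a suspicious one.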
4.4 Continuous Monitoring Mode
For Gold Tier audits (§5), Rotation Analysis runs continuously. Data is ingested in near-real-time from Sources 1 and 3, and the Rotation Risk Score is updated daily. Any material change in the score within the monitoring window triggers a review cycle.
5. Evidence-Tiered Classification Thresholds
We classify games not on a single line in the sand, but on a tiered ladder where the evidence burden scales with the sample size. More data buys a tighter threshold.
5.1 The Three Tiers
| Tier | Round Count Range | Label on Reports |
|---|---|---|
| Tier 1 | 1,000 – 5,000 | Provisional |
| Tier 2 | 5,000 – 20,000 | Verified |
| Tier 3 | > 20,000, plus ≥3 months of continuous monitoring and zero incidents in window | Gold |
Below Tier 1's minimum, no conclusion is published โ the game is marked Insufficient Data on the Watchlist.
5.2 Whitelist Thresholds (a game is WHITELIST if all applicable rules below pass)
| Tier | RTP Deviation | Column B p-value | Column C Requirement |
|---|---|---|---|
| 1 Provisional | < ±1.5 percentage points | > 0.10 | Hash chain valid (if applicable) |
| 2 Verified | < ±0.8 percentage points | > 0.15 | + Rotation analysis clean |
| 3 Gold | < ±0.4 percentage points | > 0.20 on all Phase-2 tests (BH-corrected) | + ≥3 months continuous monitoring, zero incidents |
5.3 Blacklist Thresholds (a game is BLACKLIST if any applicable rule below triggers AND all three sources confirm)
| Tier | RTP Deviation | Column B p-value | Column C |
|---|---|---|---|
| 1 Provisional | > ±4.0 percentage points | < 0.0001 | Confirmed hash mismatch |
| 2 Verified | > ±2.5 percentage points | < 0.001 | Hash mismatch OR critical rotation anomaly |
| 3 Gold | > ±1.5 percentage points | < 0.01 (BH-corrected, multiple tests failing) | Persistent hash or rotation anomalies |
5.4 Watchlist
Anything between the Whitelist and Blacklist thresholds. Watchlist is not a failure โ it is a state of ongoing observation. Games on the Watchlist are listed with a human-readable explanation of what is missing or inconclusive.
5.5 Incentive Alignment (business-strategic note)
Because Whitelist thresholds tighten as the Tier rises, an operator who wishes to move up (Provisional → Verified → Gold) must provide or allow collection of more data. This means:
- More user traffic to our analyzer pages on their games (our SEO gain).
- More transparency from them (our data moat deepens).
- More continuous observation opportunities (our rotation analysis gets stronger).
The methodology itself becomes an incentive-alignment mechanism. See MUST_READ §5.2 for the commercial implications.
6. Due Process
No audit conclusion is published without passing the Due Process workflow appropriate to its verdict type. This section governs the transition from internal candidate status to publicly visible listing.
6.1 Positive Transitions (WATCHLIST → WHITELIST, or upward Tier moves)
- Window: 0 days (publish immediately).
- Notification: Operator notified after publication, not before.
- Rationale: Positive transitions improve a game's public status and do not carry defamation risk. Requiring a notice window on good news would slow down trust-building without benefit.
6.2 Negative Transitions (WHITELIST/WATCHLIST → BLACKLIST)
- Window: 30 days.
- Workflow:
- Internal audit complete, Blacklist candidate status set.
- Two-founder review and sign-off (MUST_READ §9).
- Notification package sent privately to the operator, containing:
- The complete draft public report and technical appendix.
- All raw data used.
- A link to the methodology version (§7).
- An invitation to respond within 30 days.
- Operator response options during the 30-day window:
- Submit counter-evidence. If the operator provides data showing our conclusion is incorrect (e.g., we missed a data source, used the wrong derivation formula), we re-run the audit. If the re-run confirms the operator is right, we withdraw the candidate status, issue a public correction on our next transparency report, and the game stays on its previous listing.
- Dispute methodology. If the operator disputes our methodology but not our data, their dispute text is published side-by-side with our report when the Blacklist goes live.
- Do nothing. After 30 days, the Blacklist status automatically takes effect, and a note is attached to the report indicating the operator declined to respond.
- Fix the game. If the operator materially changes the game in response to our findings and asks for re-audit, we treat this as a new audit cycle. The original Blacklist report remains in the public archive as a historical record.
- At T+30 days: Blacklist goes live with the operator's response (or note of non-response) attached.
6.3 Emergency Blacklist
Reserved for situations where a 30-day window would cause ongoing harm to players. Triggers:
- An active exploit is observed harming players in real time.
- Source 1 (the operator's own disclosures) diverges from Sources 2 and 3 – indicating the operator is publishing false data.
- Hash chain breakage with no plausible innocent explanation.
- Operator is actively destroying the evidence that would underlie our audit.
Workflow:
- Window: 72 hours.
- Required sign-offs: Both founders + one external advisor (journalist, academic, or former regulator – maintained as a standing panel of 3, of whom any one must sign).
- Publication format: A brief Urgent Safety Notice is published on the front page of Clash Watchdog AI immediately, linking to the full emergency report when the 72-hour window closes.
- Parallel notification: Emergency reports are shared simultaneously with relevant regulators, consumer-protection NGOs, and at least two established gambling-industry journalists.
Emergency Blacklist is a rare and heavy action. We expect to use it at most 1–2 times in the first 24 months of operation.
6.4 Methodology Upgrade Re-Classification
When a MINOR or MAJOR methodology version change would cause any existing listing to change state:
- Pre-notice window: 14 days.
- What we publish during pre-notice:
- The new methodology version and changelog.
- A list of which games will change state under the new methodology, and how.
- An explanation of the upgrade's rationale.
- After 14 days: The new methodology takes effect. Old reports remain anchored to their original methodology version (§7.3).
6.5 The Two-Founder Rule (v1.0 only)
For the first 24 months of operation, every public listing change – other than routine positive Watchlist → Watchlist updates – requires the explicit written sign-off of both founders. This is a manual discipline layer that slows us down enough to catch mistakes before they become public. When the team grows beyond two, this rule is replaced by a formal review board.
7. Versioning & Reproducibility
This methodology document follows Semantic Versioning 2.0.0. Every public audit report is permanently anchored to the methodology version under which it was produced.
7.1 Version Number Semantics
| Change Type | Version Bump | Examples |
|---|---|---|
| MAJOR (vX.0.0) | Threshold changes, test suite removals, new columns, anything that could cause a currently-published verdict to flip | Adding Column D, changing Whitelist RTP tolerance from ±0.8% to ±0.5%, removing A-D test |
| MINOR (v1.X.0) | New additive capability, new test, new data source, no conclusion can flip | Adding NIST tests, enabling continuous monitoring, adding new operators to the schema |
| PATCH (v1.0.X) | Clarifications, typo fixes, worked-example additions, no computational change | Rewriting a paragraph, adding a glossary term |
7.2 Approval Rules
- PATCH: Either founder may commit.
- MINOR: Both founders must sign.
- MAJOR: Both founders + one external advisor must sign. Followed by 14-day pre-notice per §6.4.
7.3 Conclusion Anchoring
Every audit report header includes:
game: stake-crash
audit_date: 2026-06-15
methodology_version: 1.1.0
evidence_tier: 2
verdict: WHITELIST
data_snapshot_hash: <SHA-256 of the full raw-data bundle>
notebook_commit: <GitHub commit hash of the reproduction notebook>
When we later upgrade the methodology from v1.1 to v1.2, this report keeps its v1.1.0 anchor. It does not get silently re-run. The current state of the listing may be overridden by a new audit under the new methodology, but the original report never mutates.
7.4 Archive Policy
All past methodology versions are preserved under /methodology/archive/v1.0.0, /methodology/archive/v1.1.0, etc. Old versions are never deleted. Anyone auditing our history must be able to verify, five years from now, what rules were in force on any given date.
7.5 The Reproducibility Promise (the single most important commitment)
Any audit report published by Clash Watchdog AI must be reproducible by any third party, five years after publication, using only: (a) the raw data snapshot referenced in the report header, (b) the methodology document at the version pinned in the report header, (c) the notebook at the commit hash pinned in the report header. The reproduction must yield the same verdict. If it does not, the report was not valid when published, and we will retract it and publish a correction in our next transparency report.
This promise is not aspirational. It is a functional requirement of every artifact we produce.
8. Legal Language Rules
Our methodology is technical. Our words must also be technical. Loose language is a legal attack surface we refuse to expose.
8.1 Words We Use
- "In N observations, Game X's mean RTP was measured at Y%, which deviates from the operator's declared Z% by W percentage points."
- "Game X's multiplier distribution failed a chi-square goodness-of-fit test at p = A against a theoretical distribution derived from the operator's stated parameters."
- "Game X is currently on our Watchlist pending additional sample collection."
- "Game X's provably-fair hash chain could not be verified for N rounds within the observation window."
- "Our sample is insufficient to support a conclusion at this time."
8.2 Words We Never Use
- "Game X is cheating."
- "Game X is rigged."
- "Game X is a scam."
- "Don't play Game X."
- "Game X has been caught stealing from players."
- "We predict that Game X will crash at Y."
- Any imperative or prophetic language whatsoever.
8.3 Tense and Mood
- All factual claims in past indicative: "was measured," "failed," "deviated."
- All current-status statements in present descriptive: "is on our Watchlist," "is pending review."
- No future tense about game outcomes. Ever. The only future tense we use is about our own actions: "will be re-audited on date X."
- No subjunctive mood about operators' intent. We describe data, not motivation.
8.4 Numerical Precision Rules
- Percentages to one decimal place in public layer, three decimals in technical appendix.
- Confidence intervals always explicit (never a point estimate without a range).
- p-values to two significant figures.
- Sample sizes always visible in the same viewport as the conclusion.
9. Deferred & Open Items
These decisions are flagged for resolution before v1.0.0 is ratified. They are documented here so that no one can pretend they were forgotten.
| # | Item | Current Placeholder | Decision Required By |
|---|---|---|---|
| 1 | Real-money budget for Source 3 proxy accounts, first 6 months | $3,000–5,000 total, $600–1,000 per game across 3–5 games | Pre-ratification |
| 2 | Minimum sample sizes (Column A = 5,000; B = 2,000; C = 1,000 hash) | As shown in §2.3 | Pre-ratification, to be confirmed via power analysis |
| 3 | Exact theoretical distribution formula for Aviator-style games | TBD – most operators publish approximate RTP but not the exact RNG parameterization | Before first non-crypto audit |
| 4 | Choice of external advisor panel for Emergency Blacklist sign-off | TBD – need to identify 3 candidates | Before first audit publication |
| 5 | Hosting location for raw data snapshots (immutability requirement) | Candidates: S3 with object lock; IPFS; Arweave; tiered approach | Before first data collection |
| 6 | Company jurisdiction (also open in MUST_READ ยง13) | Candidates: Delaware, UK, Netherlands, Estonia | Before first paid audit |
| 7 | Legal review of the Legal Language Rules (§8) | TBD – engage a media-law attorney | Before first Blacklist publication |
| 8 | Statistics background and onboarding plan for the founding team | TBD – see §11 | Within 30 days of this document being ratified |
10. Phased Build Roadmap (binding)
This is the build order for the audit system itself. It is binding in the sense that we will not ship audit conclusions using a capability before the capability has been built and tested.
Phase 1 – Months 0–3 (minimum viable audit)
- Column A: mean RTP + bootstrap CI
- Column B: chi-square + Kolmogorov-Smirnov
- Column C: SHA-256 hash chain verification (full chain, no rotation analysis)
- Data sources: Source 1 + Source 3 only (Source 2 community ingest deferred to Phase 2)
- Tiers: Only Tier 1 (Provisional) verdicts are published
- Publications expected: 3 Provisional audits by end of Phase 1 (Stake Crash, BC.Game Crash, Roobet Crash)
- Build artifacts: one reusable Jupyter notebook template, one data ingestion pipeline, the /methodology/audit-v1.md document live on the site
Phase 2 – Months 3–6 (full-column capability)
- Column A: + variance decomposition, + drift detection
- Column B: + Anderson-Darling, + Runs test, + autocorrelation at lags 1–50
- Column C: + Rotation Analysis (full §4.3 capability)
- Data sources: Source 2 community ingest opens (browser extension beta)
- Tiers: Tier 2 (Verified) verdicts become possible
- Publications expected: 5 additional audits, first Verified upgrades for Phase 1 games
- The first non-crypto game audit (Aviator, via Sources 1 + 2, no Column C) is also expected in this phase
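Of the Phase 2 additions, the autocorrelation scan is the easiest to show compactly. A minimal sketch (our own helper, not the production pipeline) computing sample autocorrelation at lags 1 through `max_lag`; for i.i.d. rounds, each coefficient should fall within roughly ±1.96/√n:

```python
def autocorrelation(xs, max_lag=50):
    """Sample autocorrelation of a sequence at lags 1..max_lag
    (requires max_lag < len(xs) and a non-constant sequence)."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs)  # unnormalized variance
    acf = []
    for lag in range(1, max_lag + 1):
        cov = sum((xs[i] - mean) * (xs[i + lag] - mean) for i in range(n - lag))
        acf.append(cov / var)
    return acf
```

An alternating sequence like 1, 0, 1, 0, ... produces a strongly negative lag-1 coefficient, exactly the kind of short-range structure a fair RNG must not show.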
Phase 3 – Months 6–12 (continuous monitoring + academic rigor)
- Column B: + Benford, + NIST SP 800-22 selected tests, + Benjamini-Hochberg correction
- Column C: + continuous monitoring mode, + multi-operator rotation comparison
- Data sources: all three sources mature
- Tiers: Tier 3 (Gold) verdicts become possible for the longest-running audits
- Publications expected: continuous updates, 10 total audited games, first Gold whitelist
- First external academic review engagement
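Phase 3 introduces the Benjamini-Hochberg correction because running many Column B tests per game inflates the false-positive rate. The step-up procedure itself is short; a minimal sketch under the usual FDR formulation (the function name is ours):

```python
def benjamini_hochberg(pvalues, q=0.05):
    """Boolean reject flag per p-value, controlling the false discovery
    rate at level q via the Benjamini-Hochberg step-up procedure."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * q / m:
            k_max = rank  # largest rank whose p-value clears its threshold
    reject = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= k_max:
            reject[i] = True
    return reject
```

With p-values [0.01, 0.04, 0.03, 0.2] at q = 0.05, only the first survives the correction, even though 0.03 and 0.04 would each pass an uncorrected 0.05 threshold.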
11. Team Onboarding (Statistics)
Because the methodology depends on statistical literacy and neither founder has a formal statistics background, the founding team commits to the following within 30 days of document ratification:
Required reading / working-through:
- Knuth, The Art of Computer Programming, Volume 2: Seminumerical Algorithms – chapter 3 (Random Numbers). This is the canonical treatment of RNG testing.
- NIST Special Publication 800-22 Rev. 1a – A Statistical Test Suite for Random and Pseudorandom Number Generators for Cryptographic Applications. Public domain.
- Wasserman, All of Statistics – chapters 9 (Hypothesis Testing), 10 (Likelihood), 13 (Nonparametric).
- Gentle, Computational Statistics – chapters on the bootstrap and Monte Carlo methods.
- Lehmann & Romano, Testing Statistical Hypotheses – chapter 15 (Nonparametric) for goodness-of-fit theory.
Practical onboarding:
- Reproduce an existing published audit of a random number generator (NIST has published example reports).
- Build a toy end-to-end audit of a simulated "honest" crash game and a simulated "slightly biased" crash game. Verify that the methodology correctly classifies both.
- Schedule a paid 2-hour consultation with a statistician before the first public audit is published.
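The second onboarding exercise (honest vs. slightly biased simulated game) can be prototyped in a few lines. This sketch assumes a common toy parameterization of crash points, crash = max(1.00, (1 - house_edge) / U) with U uniform on (0, 1]; real operators use their own formulas, and the five-standard-error cutoff is illustrative, not a ratified threshold:

```python
import random

def simulate_crash_rounds(n, house_edge, seed):
    """Toy crash-point model: crash = max(1.00, (1 - house_edge) / U),
    with U drawn from (0, 1]. Real operators use their own formulas."""
    rng = random.Random(seed)
    return [max(1.0, (1 - house_edge) / (1.0 - rng.random())) for _ in range(n)]

def classify(crashes, claimed_rtp, cashout=2.0, z=5.0):
    """Flag a game when the empirical RTP of a fixed cash-out strategy
    deviates from the claimed RTP by more than z standard errors."""
    n = len(crashes)
    returns = [cashout if c >= cashout else 0.0 for c in crashes]
    mean = sum(returns) / n
    var = sum((r - mean) ** 2 for r in returns) / (n - 1)
    se = (var / n) ** 0.5
    return "fair" if abs(mean - claimed_rtp) <= z * se else "suspicious"
```

Under this model the fixed-target strategy has expected return 1 - house_edge regardless of the cash-out point, so with enough rounds a simulated 1%-edge game should classify as fair against a 99% RTP claim while a 5%-edge game should not.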
If within 30 days we determine that one of us must focus full-time on statistics while the other handles engineering and content, we update MUST_READ §9 (Team Structure) accordingly.
12. Glossary
Short definitions of terms used throughout this document. Expanded definitions with examples are maintained in the public Learn cluster on the website.
- Client seed – An input that the player can optionally customize to influence (but not choose) the outcome of a provably-fair round. Combined with `server_seed` and `nonce` via HMAC.
- Crash game – A gambling game in which a multiplier rises from 1.00× at a variable rate, and the player must "cash out" before the multiplier "crashes" at a randomly determined point.
- Due Process – Our workflow for transitioning an internal verdict to a publicly visible listing; see §6.
- Evidence Tier – One of Provisional, Verified, or Gold; determines how strict the threshold for Whitelist/Blacklist classification is; see §5.
- Hash chain – A cryptographic sequence in which each server seed is pre-committed via SHA-256 before any round under it is played.
- HMAC – Keyed-hash message authentication code. Used in provably-fair derivations to bind a result to both the server's secret and the player's visible inputs.
- Nonce – A counter that increments by one for each round within a single server seed's lifetime. Combined with `client_seed` in the HMAC input.
- Provably fair – A family of techniques that allow players to mathematically verify, after the fact, that no round result was modified from what was committed to before the round began.
- Rotation Risk Score – Our internal score (Clean / Low / Medium / High / Critical) that summarizes the §4.3 rotation analysis for a game.
- RTP (Return to Player) – The percentage of all bets made that is paid back to players over the long run. The complement of house edge.
- Server seed – A random 256-bit value generated by the game operator, committed via SHA-256 hash before play, and revealed after all rounds under that seed have been played.
- Three-Source Cross-Validation – Our data collection doctrine: official disclosure + community crowd-sourcing + self-operated proxy accounts, triangulated against each other.
- Watchlist – Our public state for games under active audit but without enough evidence (or with conflicting evidence) to assign a Whitelist or Blacklist verdict.
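The glossary's provably-fair terms fit together in one short verification flow: check the SHA-256 commitment of the revealed server seed, then recompute each round's HMAC. This sketch is illustrative only; in particular, the `"client_seed:nonce"` message layout and the mapping from digest to crash multiplier vary by operator and are assumptions here:

```python
import hashlib
import hmac

def commitment(server_seed_hex):
    """SHA-256 hash the operator must publish before any round is played."""
    return hashlib.sha256(bytes.fromhex(server_seed_hex)).hexdigest()

def verify_commitment(revealed_server_seed_hex, committed_hash_hex):
    """Check that the revealed server seed matches its pre-play commitment."""
    return hmac.compare_digest(commitment(revealed_server_seed_hex),
                               committed_hash_hex)

def round_digest(server_seed_hex, client_seed, nonce):
    """HMAC-SHA256 over an illustrative "client_seed:nonce" message; the
    exact layout is operator-specific."""
    msg = f"{client_seed}:{nonce}".encode()
    return hmac.new(bytes.fromhex(server_seed_hex), msg, hashlib.sha256).hexdigest()
```

If `verify_commitment` fails, the operator changed the seed after play; if it passes, every round under that seed can be re-derived deterministically from the revealed inputs.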
13. Next Actions (before v1.0.0 ratification)
- Resolve all Deferred & Open Items (§9).
- Complete the 30-day statistics onboarding plan (§11).
- Write and test the Phase 1 reference notebook on a simulated honest/biased pair.
- Engage a media-law attorney for a one-hour review of §8 (Legal Language Rules).
- Register at least one Source 3 proxy account on Stake Crash and begin ingestion.
- Stand up the data provenance ledger table in the Phase 1 database.
- Sign this document (both founders) and move its status from `DRAFT` to `RATIFIED`.
Draft authored by the founding team, 2026-04-15. Version 1.0.0-DRAFT – "The Scale Before the First Weight."