🔬 Clash Watchdog AI – Audit Methodology
Version: v1.0.0 – RATIFIED
Status: ✅ Ratified by founding team, 2026-04-16
Parent Document: ../MUST_READ.md
Sister Document: ../architecture/agentic-audit-v1.md – defines how this methodology is executed by AI agents
Governance: See MUST_READ Section 15 (Amendment Process)
Amendment 2026-04-16 (at ratification): All audit processes specified in this document are executed under the Agentic Audit Architecture defined in the sister document above. The statistical core remains fully deterministic; only orchestration, scheduling, and draft-report assembly are agentic. The 5-year Reproducibility Promise (§7.5) is preserved because the statistical functions are version-pinned, and agent actions are logged in the provenance ledger (§2.4).
0. Preamble
This document is the technical constitution of Clash Watchdog AI's audit operations. It answers one question:
Given a crash game and N rounds of its historical data, how do we determine whether it is fair, suspicious, or unfair?
Every audit we ever publish must be traceable to a specific version of this document. If we change how we measure fairness, the version of this document changes with it – and every historical audit remains anchored to the version under which it was produced.
This is the single most important sentence in this document:
Any audit conclusion must be reproducible five years from now, by any third party, using the methodology version and the raw data as they existed at the time of publication, and must yield the same result.
If we cannot promise that, we cannot publish the conclusion.
0.1 Relationship to MUST_READ
This document inherits and operationalizes the following from MUST_READ.md:
- Section 2 – Six Constitutional Laws (especially Law 4: No algorithmic mystification)
- Section 4 – Product Architecture (Game State × Player State data engine)
- Section 11 – Legal & Risk Posture (due process requirements)
- Section 15 – Amendment Process (how this document can be changed)
Any conflict between this document and MUST_READ.md is resolved in favor of MUST_READ.md.
1. The Three-Column Definition of Fairness
We reject any single-axis definition of fairness. A crash game can be "mathematically fair on average" and still be "unfair in experience." A game can be "statistically well-distributed" and still be "cryptographically opaque." We therefore evaluate every game along three independent columns, and only a game that passes enough columns with enough evidence earns a public whitelist.
1.1 Column A – RTP Fairness (Allocative Fairness)
"Does the game's long-run return match what its operator claims it returns?"
Measured property: The long-run return-to-player percentage, its confidence interval, and its house-edge stability over time.
Primary statistic: Observed RTP vs. declared RTP. Deviation measured as absolute percentage points.
Underlying assumption: The operator makes a public claim (e.g., "RTP = 97%") which we take as the null hypothesis. We test whether the data rejects that null.
Applies to: Every crash game, regardless of technology. This is the most universally comparable column.
1.2 Column B – Distributional Fairness (Procedural Fairness)
"Does the game's round-by-round behavior look like what its declared RNG should produce?"
Measured properties: The full distribution of multipliers (crash points), the independence of successive rounds, the absence of anomalous patterns in streak length, reversal density, and autocorrelation.
Primary statistics: Chi-square goodness-of-fit, Kolmogorov-Smirnov, Anderson-Darling, Runs test, autocorrelation at lags 1 through 50, Benford's law on multiplier first digits.
Underlying assumption: Crash games with honest RNGs produce multipliers that follow a known theoretical distribution (typically a truncated geometric / power-law of the form M ~ 1/(1-U) where U is uniform [0, 1-house_edge]). We test whether observed data fits this distribution within statistical noise.
Applies to: Every crash game where we have sufficient round-level data (≥2,000 rounds for the minimum threshold).
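The reference model above can be made concrete with a short simulation. This is an illustrative sketch only: the 3% house edge and the flooring to cents are assumptions for the example, and the exact derivation varies by operator.

```python
import random

HOUSE_EDGE = 0.03  # assumed 3% edge (declared RTP 97%) -- illustrative only

def simulate_multiplier(rng: random.Random, house_edge: float = HOUSE_EDGE) -> float:
    """Draw one crash multiplier from the idealized model M = 1/(1 - U),
    with U uniform on [0, 1 - house_edge], floored to cents as games display it."""
    u = rng.uniform(0.0, 1.0 - house_edge)
    m = 1.0 / (1.0 - u)
    return max(1.0, int(m * 100) / 100)

def theoretical_cdf(m: float, house_edge: float = HOUSE_EDGE) -> float:
    """P(M <= m) under the same model; the reference curve for goodness-of-fit tests."""
    if m < 1.0:
        return 0.0
    return min(1.0, (1.0 - 1.0 / m) / (1.0 - house_edge))

rng = random.Random(42)
sample = [simulate_multiplier(rng) for _ in range(10_000)]
share_below_2x = sum(1 for m in sample if m <= 2.0) / len(sample)
```

Under this parameterization roughly half of all rounds crash at or below 2.00× (theoretical CDF at 2.00 is 0.5/0.97 ≈ 0.515), which is the tail shape Column B's goodness-of-fit tests check against.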
1.3 Column C – Cryptographic Fairness (Provable Fairness)
"Can we mathematically verify that every round result is exactly what the game operator committed to produce, before the round began?"
Measured properties:
- Hash-chain integrity across the provably-fair seed lifecycle.
- Consistency of revealed server seeds with the pre-committed hashes.
- Correctness of HMAC(server_seed, client_seed || nonce) derivations.
- Rotation patterns of server seeds over time (our original contribution – see §4.3).
Primary check: SHA-256 verification of every committed hash against its revealed pre-image. HMAC-SHA-256 derivation of the multiplier for every observable round.
Underlying assumption: A provably-fair game commits to a server seed (by publishing its SHA-256 hash) before any round is played under it. When the seed is later revealed, any observer can recompute every round under that seed and verify that nothing was tampered with.
Applies to: Only games that claim provably-fair operation (currently: Stake Crash, BC.Game Crash, Roobet Crash, and a few others). Not applicable to traditional server-RNG games (Aviator, JetX, Spaceman, etc.), which receive an N/A in this column – not a negative score.
1.4 Combined Verdict Matrix
A game's overall classification depends on how many columns return positive, negative, or inconclusive results. Any N/A (column not applicable) is excluded from the denominator.
| Columns Positive | Columns Negative | Columns N/A or Inconclusive | Verdict (subject to Evidence Tier – §5) |
|---|---|---|---|
| 3 of 3 | 0 | 0 | 🛡️ WHITELIST candidate |
| 2 of 2 | 0 | 1 (N/A) | 🛡️ WHITELIST candidate |
| 2 of 3 | 0 | 1 (inconclusive) | ⚠️ WATCHLIST (more data needed) |
| Any 1 | 1 | Any | ⚠️ WATCHLIST (contradictions being investigated) |
| Any | 2 or more | Any | 🚫 BLACKLIST candidate |
"Candidate" status becomes final only after passing the Due Process workflow in §6.
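The matrix can be expressed as a small decision function. This is an illustrative sketch (the result labels are hypothetical names, not a production schema), and "candidate" status still requires the §6 due-process workflow before going live.

```python
def combined_verdict(columns: dict) -> str:
    """Apply the verdict matrix: `columns` maps 'A'/'B'/'C' to one of
    'positive', 'negative', 'inconclusive', 'n/a'. N/A columns are
    excluded from the denominator, so 2-of-2 positive with one N/A is
    still a whitelist candidate."""
    results = [r for r in columns.values() if r != "n/a"]
    negative = sum(1 for r in results if r == "negative")
    positive = sum(1 for r in results if r == "positive")
    if negative >= 2:
        return "BLACKLIST candidate"
    if negative == 1:
        return "WATCHLIST"
    if positive == len(results) and positive >= 2:
        return "WHITELIST candidate"
    return "WATCHLIST"
```

For example, `{"A": "positive", "B": "positive", "C": "n/a"}` maps to a whitelist candidacy, while a single inconclusive column drops the game to the Watchlist.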
2. Data Collection & Source Independence
No single data source is trusted alone. Every audit is built from three independent sources, and conclusions are published only when the required sources agree.
2.1 The Three Sources
Source 1 – Official / Public Disclosure. The game operator's own publicly available data. Examples:
- Provably-fair public history pages (Stake, BC.Game, Roobet expose round-by-round hash logs).
- Regulatory disclosures (Brazilian Lei 14.790 requires licensed operators to publish RTP metrics).
- Public APIs exposed by the game client.
When an operator changes or deletes this data after we have observed it, that event itself becomes a recorded data point – we keep immutable snapshots of all Source 1 observations.
Source 2 – Community Crowd-Sourced. Data contributed by users via the Clash Watchdog AI browser extension, manual input forms, and optional screenshot OCR. Each contribution is:
- Tagged with a pseudonymous contributor identifier (never linked to real identity).
- Timestamped server-side at the moment of submission.
- Cross-checked against other contributions from different users for the same round window.
- Discarded from the audit if any contribution appears to be a poisoning attempt (an outlier beyond 6σ from other sources on the same round range is flagged and quarantined).
Source 3 – Self-Operated Proxy Accounts. Clash Watchdog AI operates its own audited accounts on each target game, placing minimum stakes purely to observe rounds while playing with real value at risk. This is the "mystery shopper" approach used in consumer protection and financial audit.
- Each proxy account is funded with a small standing balance (per-game budget TBD, see §9).
- Accounts are rotated and de-correlated to avoid detection / differential treatment.
- Every action is logged with timestamps and network-level captures.
- No proxy account ever attempts to "win" – the goal is observation, not profit.
2.2 The Cross-Validation Rule
```
            +------------+
            |  Source 1  |
            |  Official  |
            +------------+
              /        \
       cross-check   cross-check
              \        /
   +------------+    +-------------+
   |  Source 2  |    |  Source 3   |
   |  Community |    | Self-Proxy  |
   +------------+    +-------------+
          \                /
           +--cross-check-+
```
Publication rules:
- All three sources agree within tolerance → conclusion may be published with full confidence.
- Two sources agree, the third is missing → conclusion may be published with a "partial-source" caveat.
- Two sources agree, the third disagrees → WATCHLIST, investigation opened.
- Source 1 disagrees with Sources 2 and 3 (i.e., the operator is reporting different data than what users and our own accounts observed) → this is the single strongest possible signal of manipulation and triggers immediate Emergency Blacklist review (see §6.3).
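The publication rules above can be sketched as a decision function. The single-scalar comparison and the 0.5-point tolerance are illustrative simplifications – real comparisons run per round window, not on one aggregate number.

```python
def publication_decision(source1, source2, source3, tolerance=0.5):
    """Apply the cross-validation rule to one scalar metric (e.g. observed
    RTP in percentage points) per source; None marks a missing source."""
    def agree(a, b):
        return abs(a - b) <= tolerance
    present = [v for v in (source1, source2, source3) if v is not None]
    if len(present) == 3:
        if agree(source1, source2) and agree(source1, source3) and agree(source2, source3):
            return "publish: full confidence"
        if agree(source2, source3) and not agree(source1, source2) and not agree(source1, source3):
            # operator's own data diverges from both users and our proxies
            return "emergency blacklist review"
        return "watchlist: investigation opened"
    if len(present) == 2:
        if agree(present[0], present[1]):
            return "publish: partial-source caveat"
        return "watchlist: investigation opened"
    return "insufficient sources"
```

For example, `publication_decision(99.5, 96.9, 96.8)` – operator reporting high while community and proxy data agree with each other – routes to emergency review.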
2.3 Minimum Sample Sizes (tentative v1.0)
These are the minimum round counts required for a column to "open its mouth." Below these thresholds, the column returns INCONCLUSIVE regardless of what the data looks like.
| Column | Minimum (inconclusive below) | Recommended | Ideal |
|---|---|---|---|
| A – RTP | 5,000 rounds | 20,000 | 100,000 |
| B – Distribution | 2,000 rounds | 10,000 | 50,000 |
| C – Provably-Fair | 1,000 consecutive verified hashes | 10,000 | Continuous |
Note – tentative numbers. These thresholds are derived from power analyses targeting the ability to detect a 1% RTP deviation at α = 0.05, β = 0.10. Final values will be confirmed by the technical appendix before v1.0.0 is ratified.
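The power analysis behind these numbers can be sketched with a standard two-sided z-test sample-size formula. The per-round standard deviation used below (70 percentage points of stake, roughly what a fixed 1.5× cash-out strategy produces) is an assumed figure for illustration; crash returns are heavy-tailed, and the required sample size is highly sensitive to this assumption.

```python
from statistics import NormalDist

def rounds_needed(delta_pp: float, per_round_std_pp: float,
                  alpha: float = 0.05, power: float = 0.90) -> int:
    """Sample size for detecting a shift of `delta_pp` percentage points
    in mean RTP with a two-sided z-test at significance `alpha` and the
    given power (power = 1 - beta)."""
    nd = NormalDist()
    z_alpha = nd.inv_cdf(1 - alpha / 2)
    z_beta = nd.inv_cdf(power)
    return int(((z_alpha + z_beta) * per_round_std_pp / delta_pp) ** 2) + 1

# Assumed per-round return std of 70 pp, targeting a 1 pp RTP deviation:
n_required = rounds_needed(delta_pp=1.0, per_round_std_pp=70.0)
```

Because the result moves with the assumed variance, this is exactly the calculation the technical appendix must pin down before the thresholds stop being tentative.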
2.4 Data Provenance Ledger
Every raw data record entering our audit pipeline is tagged with:
- source_id – which of the three sources
- collection_timestamp (UTC, ISO 8601)
- observation_timestamp (UTC, ISO 8601) – when the round actually happened
- collector_pseudonym (for Source 2) or proxy_account_id (for Source 3)
- raw_payload_hash (SHA-256 of the unmodified raw payload)
- pipeline_version (commit hash of the ingestion code that processed it)
This ledger is append-only. Records are never deleted – if a record is later disqualified (e.g., poisoning detected), it is flagged but retained for the audit trail.
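A minimal sketch of one ledger record, using the field names above. The `PIPELINE_VERSION` constant and the `disqualified` flag shape are placeholders for illustration; in production the version is the ingestion code's real commit hash.

```python
import hashlib
import json
from datetime import datetime, timezone

PIPELINE_VERSION = "0000000"  # placeholder for the ingestion code's commit hash

def make_ledger_record(source_id: str, raw_payload: bytes,
                       observation_timestamp: str, extra: dict = None) -> dict:
    """Build one provenance record. Records are appended, never deleted;
    disqualification only flips a flag on the retained record."""
    record = {
        "source_id": source_id,
        "collection_timestamp": datetime.now(timezone.utc).isoformat(),
        "observation_timestamp": observation_timestamp,
        "raw_payload_hash": hashlib.sha256(raw_payload).hexdigest(),
        "pipeline_version": PIPELINE_VERSION,
        "disqualified": False,
    }
    record.update(extra or {})
    return record

payload = json.dumps({"round": 12345, "multiplier": 2.31}).encode()
rec = make_ledger_record("source_3", payload, "2026-06-15T12:00:00+00:00",
                         {"proxy_account_id": "proxy-07"})
```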
3. Layered Statistical Suite
Every audit produces output at two layers simultaneously. The layers must always agree: if the public layer says "pass" but the technical appendix says "fail," the audit is invalid and must be re-run.
3.1 Public Layer (for players, journalists, regulators)
Short, comprehensible, fits on one screen. Uses plain language and a small number of headline metrics.
Every audit report's public layer contains:
- Column A headline: RTP: observed 96.82% (±0.18%) vs. declared 97.00% – pass/fail badge.
- Column B headline: Distribution fit p-value: 0.31 – pass/fail badge, with a one-sentence plain-language interpretation.
- Column C headline: Provably-fair hash chain: verified for 12,430 consecutive rounds – pass/fail badge, or Not applicable.
- Overall verdict: one of 🛡️ WHITELIST, ⚠️ WATCHLIST, 🚫 BLACKLIST, along with the Evidence Tier (§5) the conclusion is based on.
- One sentence of plain-language interpretation per column.
- Methodology version under which the audit was run.
- Next review date.
3.2 Technical Appendix Layer (for statisticians, academics, regulators)
Full statistical workup. Published alongside the public report, as a Jupyter notebook on GitHub, under clashwatchdog-ai/audit-reports/<game-slug>/<date>/notebook.ipynb.
The technical appendix for every audit contains:
For Column A (RTP):
- Observed mean RTP with 95% CI via bootstrap (10,000 resamples)
- Kolmogorov-Smirnov test comparing observed RTP rolling windows to declared RTP
- Variance decomposition (within-session vs. between-session)
- Trend analysis: linear regression of RTP over time to detect drift
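The first Column A item – the percentile bootstrap CI for mean RTP – can be sketched in a few lines. This is an outline of the approach; the production suite pins library versions and resample seeds, and the simulated win rate below is a hypothetical illustration, not real game data.

```python
import random

def bootstrap_rtp_ci(returns, n_resamples=10_000, seed=0):
    """95% percentile-bootstrap confidence interval for mean RTP.
    `returns` is the per-round return as a percentage of stake."""
    rng = random.Random(seed)
    n = len(returns)
    means = []
    for _ in range(n_resamples):
        means.append(sum(returns[rng.randrange(n)] for _ in range(n)) / n)
    means.sort()
    point = sum(returns) / n
    return point, means[int(0.025 * n_resamples)], means[int(0.975 * n_resamples)]

# Illustrative data: a fixed 1.5x cash-out strategy winning ~65% of rounds
rng = random.Random(1)
returns = [150.0 if rng.random() < 0.65 else 0.0 for _ in range(1_000)]
point, lo, hi = bootstrap_rtp_ci(returns, n_resamples=1_000)
```

The resulting interval is what the public layer reports next to the declared RTP, per the headline format in §3.1.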
For Column B (Distribution):
- Chi-square goodness-of-fit with bin count ≥ 20
- Kolmogorov-Smirnov against the theoretical crash distribution
- Anderson-Darling (upweights tail deviations – critical for crash games)
- Runs test for serial independence
- Autocorrelation plot for lags 1 through 50
- Benford's law check on multiplier first digits
- NIST SP 800-22 selected tests (Phase 2+)
- Multiple testing correction: Benjamini-Hochberg with FDR = 0.05 across all tests reported
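The multiple-testing correction in the last bullet can be sketched with the standard library. This is an illustrative stdlib implementation; the toolchain in §3.3 uses statsmodels' `multipletests` in production, and the p-values below are hypothetical.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Benjamini-Hochberg step-up procedure across all reported tests.
    Returns a parallel list of booleans: True = null rejected at the FDR."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    max_rank = 0
    # find the largest rank k with p_(k) <= fdr * k / m
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= fdr * rank / m:
            max_rank = rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[idx] = True
    return rejected

pvals = [0.001, 0.008, 0.039, 0.041, 0.09, 0.21, 0.46, 0.78]
flags = benjamini_hochberg(pvals)
```

With these hypothetical p-values only the two smallest survive the FDR = 0.05 correction, which is why a Gold-tier verdict requires all BH-corrected tests to pass, not raw p-values.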
For Column C (Provably-Fair):
- Full SHA-256 verification of every revealed server seed against its committed hash
- HMAC-SHA-256 recomputation of the multiplier for every round in the verified window
- Rotation analysis: distribution of seed lifetimes, correlation of rotations with user-level events (see §4.3)
- Nonce continuity check: detection of missed or duplicated nonces
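The nonce continuity check is simple enough to sketch directly. A minimal illustration (the dict-shaped report is an assumed format, not the production schema):

```python
def nonce_continuity(nonces):
    """Given the nonces observed within one seed lifetime, report missing
    and duplicated values. A gap suggests rounds were played but not
    publicly recorded; a duplicate suggests a replayed round."""
    seen = sorted(nonces)
    duplicated = sorted({n for i, n in enumerate(seen[1:], 1) if n == seen[i - 1]})
    missing = []
    if seen:
        missing = sorted(set(range(seen[0], seen[-1] + 1)) - set(seen))
    return {"missing": missing, "duplicated": duplicated}

report = nonce_continuity([0, 1, 2, 4, 5, 5, 7])
```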
Reproducibility artifacts:
- The Jupyter notebook itself (executable)
- The raw data CSV (or a reference to the immutable snapshot in our data lake)
- The methodology commit hash
- Environment specification (requirements.txt or pyproject.toml)
- A single-command reproduction script (make reproduce or python reproduce.py)
3.3 Toolchain
Phase 1 reference implementation:
| Purpose | Library | Notes |
|---|---|---|
| Statistics / numerics | SciPy, NumPy, Pandas | Python ecosystem |
| Distribution fitting | SciPy stats | Built-in chi-square, K-S, A-D |
| Multiple testing | statsmodels.stats.multitest | Benjamini-Hochberg |
| Visualization | Plotly, Matplotlib | Q-Q plots, histograms, autocorrelation |
| Hash verification | Python hashlib, Node crypto | SHA-256, HMAC-SHA-256 |
| Notebooks | Jupyter | Published as HTML + source |
| Reproducibility | uv or pdm | Environment pinning |
3.4 Phased Build
We do not implement every test from day one. The statistical suite is built in phases that map to the company's SEO/content strategy – each new capability also becomes a published methodology-update article.
| Phase | Months | Column A | Column B | Column C |
|---|---|---|---|---|
| 1 | 0–3 | Mean + bootstrap CI | Chi-square + K-S | SHA-256 chain verification |
| 2 | 3–6 | + variance decomposition | + Anderson-Darling, Runs test, autocorrelation | + Full rotation analysis (§4.3) |
| 3 | 6–12 | + drift detection | + Benford, NIST selected tests | + Multi-operator comparison |
Each phase upgrade is a MINOR version bump (v1.0 → v1.1 → v1.2), publicly changelogged, and announced with a 14-day pre-notice window (see §6.4).
4. Provably-Fair Deep Verification
This section details Column C and is the most technically intricate part of the methodology. It is also the part where Clash Watchdog AI produces an original research contribution: the Rotation Analysis framework in §4.3.
4.1 Provably-Fair Primer (for the record)
A typical provably-fair crash game works as follows:
- The operator generates a random 256-bit server_seed.
- Before any round is played, the operator publishes SHA-256(server_seed) as a public commitment.
- For round n under this seed, the multiplier is derived as follows (the exact formula varies by operator – our pipeline records the operator-specific derivation):

  ```
  hmac = HMAC-SHA-256(key=server_seed, message=client_seed || nonce_n)
  hex  = first 52 bits of hmac, as a big-endian integer
  if hex == 0:
      multiplier = high_cap
  else:
      multiplier = max(1.00, floor((2^52 * house_edge) / hex * 100) / 100)
  ```

- After some number of rounds (typically 1,000 to 10,000, or 24 hours), the operator reveals server_seed. At that moment, anyone can compute SHA-256(revealed_seed) and compare it to the original commitment. If they match, the seed was honestly committed before the rounds were played.
- A new server_seed is generated, a new commitment is published, and the cycle repeats.
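The primer above can be sketched as runnable code. Note the assumptions: the `:`-separated message concatenation, the `house_factor` constant (0.97, i.e. 1 minus a 3% edge), and the cap value are illustrative stand-ins – each operator's exact derivation differs and is recorded by the pipeline.

```python
import hashlib
import hmac
import math

def verify_commitment(revealed_seed: bytes, committed_hash_hex: str) -> bool:
    """The reveal-time check: the revealed seed must hash to the prior commitment."""
    return hashlib.sha256(revealed_seed).hexdigest() == committed_hash_hex

def derive_multiplier(server_seed: bytes, client_seed: bytes, nonce: int,
                      house_factor: float = 0.97, high_cap: float = 1e6) -> float:
    """Recompute one round's multiplier following the pseudocode above."""
    digest = hmac.new(server_seed,
                      client_seed + b":" + str(nonce).encode(),
                      hashlib.sha256).digest()
    h = int.from_bytes(digest[:7], "big") >> 4  # first 52 bits, big-endian
    if h == 0:
        return high_cap
    return max(1.00, math.floor((2 ** 52 * house_factor) / h * 100) / 100)

seed = b"example-server-seed"
commitment = hashlib.sha256(seed).hexdigest()
m0 = derive_multiplier(seed, b"example-client-seed", nonce=0)
```

Determinism is the whole point: given the same seed pair and nonce, any observer recomputes the same multiplier, which is what the full chain verification in §4.2 exploits.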
4.2 Full Chain Verification (Phase 1 capability)
For every completed seed cycle on every monitored game, we:
- Pull the original commitment hash (from Source 1 + Source 3 independently).
- Pull the revealed server seed.
- Compute SHA-256(revealed_seed) and compare to the original commitment.
- For every round n in [0, seed_lifetime - 1]:
  - Pull the client_seed and nonce as recorded by the operator and by our proxy account.
  - Recompute the multiplier using the operator's documented derivation.
  - Compare against the observed multiplier from Sources 1, 2, and 3.
- Record the result in the Provenance Ledger (§2.4).
Any discrepancy at any step is a potential hash mismatch, which triggers immediate WATCHLIST placement and an investigation. A confirmed discrepancy (reproducible, cross-source) is an immediate Emergency Blacklist.
4.3 Rotation Analysis (Phase 2 capability โ Clash Watchdog AI original contribution)
Hash-chain verification, by itself, can only confirm that the operator did not violate a commitment they already made. It cannot detect a more subtle attack: the operator choosing when to rotate the server seed in a way that is correlated with outcomes.
For example: an operator could honestly commit server_seed_1, honestly play 1,000 rounds under it, honestly reveal it – and yet still tilt the game in their favor by rotating to server_seed_2 (pre-committed) precisely at a moment when rotating is favorable to them. The rotation timing becomes the attack surface.
Rotation Analysis examines the distribution of seed lifetimes and rotation events and asks:
- Lifetime distribution. Are seeds rotated at approximately regular intervals, as the operator claims? We fit the distribution of observed seed lifetimes to the declared policy and compute a K-S goodness-of-fit.
- Event correlation. Are rotations correlated with user-level events we can observe from Source 3?
  - Rotation just after a large proxy-account win?
  - Rotation just before a proxy-account bet that would have won at an unusually high multiplier?
  - Rotation timing correlated with daily peak traffic?
- Pre-rotation / post-rotation distribution shift. Do multiplier distributions in the 100 rounds before a rotation differ statistically from the 100 rounds after? Tested via a two-sample Kolmogorov-Smirnov.
- Nonce continuity. Within a single seed's lifetime, are all nonces accounted for? A gap in the nonce sequence suggests rounds were played that were not publicly recorded.
- Commit-ahead verification. A secure provably-fair operator commits to a chain of future seed hashes (commit-ahead). If the operator does not commit ahead, the rotation window becomes an attack vector. We record and publish commit-ahead depth for every audited game.
Rotation Analysis outputs:
- A rotation timeline chart (public layer).
- Statistical test results for each of the five checks (technical appendix).
- An overall "Rotation Risk Score" (Clean / Low / Medium / High / Critical).
A Medium or worse Rotation Risk Score downgrades the game's Column C from positive to inconclusive, even if basic hash verification passes.
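The pre-/post-rotation shift check (check 3 above) reduces to a two-sample Kolmogorov-Smirnov comparison. A stdlib sketch using the large-sample critical-value approximation – the production suite uses scipy.stats.ks_2samp, and the synthetic pre/post samples here are illustrative, not real rotation windows:

```python
import bisect
import math
import random

def ks_two_sample(pre, post, alpha=0.05):
    """Two-sample K-S: returns the D statistic and whether it exceeds the
    large-sample critical value c(alpha) * sqrt((n + m) / (n * m))."""
    xs, ys = sorted(pre), sorted(post)
    d = 0.0
    for v in xs + ys:
        d = max(d, abs(bisect.bisect_right(xs, v) / len(xs)
                       - bisect.bisect_right(ys, v) / len(ys)))
    c_alpha = {0.10: 1.224, 0.05: 1.358, 0.01: 1.628}[alpha]
    critical = c_alpha * math.sqrt((len(xs) + len(ys)) / (len(xs) * len(ys)))
    return d, d > critical

# Synthetic example: 100 rounds before vs. after a simulated rotation
rng = random.Random(7)
pre = [rng.random() for _ in range(100)]
post_shifted = [rng.random() + 0.6 for _ in range(100)]
post_same = [rng.random() for _ in range(100)]
d_shift, shift_detected = ks_two_sample(pre, post_shifted)
d_same, _ = ks_two_sample(pre, post_same)
```

A clear location shift produces a large D and a rejection; two draws from the same distribution typically do not, which is what separates a clean rotation from a suspicious one.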
4.4 Continuous Monitoring Mode
For Gold Tier audits (§5), Rotation Analysis runs continuously. Data is ingested in near-real-time from Sources 1 and 3, and the Rotation Risk Score is updated daily. Any material change in the score within the monitoring window triggers a review cycle.
5. Evidence-Tiered Classification Thresholds
We classify games not on a single line in the sand, but on a tiered ladder where the evidence burden scales with the sample size. More data buys a tighter threshold.
5.1 The Three Tiers
| Tier | Round Count Range | Label on Reports |
|---|---|---|
| Tier 1 | 1,000 – 5,000 | Provisional |
| Tier 2 | 5,000 – 20,000 | Verified |
| Tier 3 | > 20,000, plus ≥3 months of continuous monitoring and zero incidents in window | Gold |
Below Tier 1's minimum, no conclusion is published โ the game is marked Insufficient Data on the Watchlist.
5.2 Whitelist Thresholds (a game is WHITELIST if all applicable rules below pass)
| Tier | RTP Deviation | Column B p-value | Column C Requirement |
|---|---|---|---|
| 1 Provisional | < ±1.5 percentage points | > 0.10 | Hash chain valid (if applicable) |
| 2 Verified | < ±0.8 percentage points | > 0.15 | + Rotation analysis clean |
| 3 Gold | < ±0.4 percentage points | > 0.20 on all Phase-2 tests (BH-corrected) | + ≥3 months continuous monitoring, zero incidents |
5.3 Blacklist Thresholds (a game is BLACKLIST if any applicable rule below triggers AND all three sources confirm)
| Tier | RTP Deviation | Column B p-value | Column C |
|---|---|---|---|
| 1 Provisional | > ±4.0 percentage points | < 0.0001 | Confirmed hash mismatch |
| 2 Verified | > ±2.5 percentage points | < 0.001 | Hash mismatch OR critical rotation anomaly |
| 3 Gold | > ±1.5 percentage points | < 0.01 (BH-corrected, multiple tests failing) | Persistent hash or rotation anomalies |
5.4 Watchlist
Anything between the Whitelist and Blacklist thresholds. Watchlist is not a failure โ it is a state of ongoing observation. Games on the Watchlist are listed with a human-readable explanation of what is missing or inconclusive.
5.5 Incentive Alignment (business-strategic note)
Because Whitelist thresholds tighten as the Tier rises, an operator who wishes to move up (Provisional → Verified → Gold) must provide or allow collection of more data. This means:
- More user traffic to our analyzer pages on their games (our SEO gain).
- More transparency from them (our data moat deepens).
- More continuous observation opportunities (our rotation analysis gets stronger).
The methodology itself becomes an incentive-alignment mechanism. See MUST_READ §5.2 for the commercial implications.
6. Due Process
No audit conclusion is published without passing the Due Process workflow appropriate to its verdict type. This section governs the transition from internal candidate status to publicly visible listing.
6.1 Positive Transitions (WATCHLIST → WHITELIST, or upward Tier moves)
- Window: 0 days (publish immediately).
- Notification: Operator notified after publication, not before.
- Rationale: Positive transitions improve a game's public status and do not carry defamation risk. Requiring a notice window on good news would slow down trust-building without benefit.
6.2 Negative Transitions (WHITELIST/WATCHLIST → BLACKLIST)
- Window: 30 days.
- Workflow:
- Internal audit complete, Blacklist candidate status set.
- Two-founder review and sign-off (MUST_READ §9).
- Notification package sent privately to the operator, containing:
- The complete draft public report and technical appendix.
- All raw data used.
- A link to the methodology version (§7).
- An invitation to respond within 30 days.
- Operator response options during the 30-day window:
- Submit counter-evidence. If the operator provides data showing our conclusion is incorrect (e.g., we missed a data source, used the wrong derivation formula), we re-run the audit. If the re-run confirms the operator is right, we withdraw the candidate status, issue a public correction on our next transparency report, and the game stays on its previous listing.
- Dispute methodology. If the operator disputes our methodology but not our data, their dispute text is published side-by-side with our report when the Blacklist goes live.
- Do nothing. After 30 days, the Blacklist status automatically takes effect, and a note is attached to the report indicating the operator declined to respond.
- Fix the game. If the operator materially changes the game in response to our findings and asks for re-audit, we treat this as a new audit cycle. The original Blacklist report remains in the public archive as a historical record.
- At T+30 days: Blacklist goes live with the operator's response (or note of non-response) attached.
6.3 Emergency Blacklist
Reserved for situations where a 30-day window would cause ongoing harm to players. Triggers:
- An active exploit is observed harming players in real time.
- Source 1 (the operator's own disclosures) diverges from Sources 2 and 3 – indicating the operator is publishing false data.
- Hash chain breakage with no plausible innocent explanation.
- Operator is actively destroying the evidence that would underlie our audit.
Workflow:
- Window: 72 hours.
- Required sign-offs: Both founders + one external advisor (journalist, academic, or former regulator – maintained as a standing panel of 3, of whom any one must sign).
- Publication format: A brief Urgent Safety Notice is published on the front page of Clash Watchdog AI immediately, linking to the full emergency report when the 72-hour window closes.
- Parallel notification: Emergency reports are shared simultaneously with relevant regulators, consumer-protection NGOs, and at least two established gambling-industry journalists.
Emergency Blacklist is a rare and heavy action. We expect to use it at most 1–2 times in the first 24 months of operation.
6.4 Methodology Upgrade Re-Classification
When a MINOR or MAJOR methodology version change would cause any existing listing to change state:
- Pre-notice window: 14 days.
- What we publish during pre-notice:
- The new methodology version and changelog.
- A list of which games will change state under the new methodology, and how.
- An explanation of the upgrade's rationale.
- After 14 days: The new methodology takes effect. Old reports remain anchored to their original methodology version (§7.3).
6.5 The Two-Founder Rule (v1.0 only)
For the first 24 months of operation, every public listing change – other than routine positive Watchlist → Watchlist updates – requires the explicit written sign-off of both founders. This is a manual discipline layer that slows us down enough to catch mistakes before they become public. When the team grows beyond two, this rule is replaced by a formal review board.
7. Versioning & Reproducibility
This methodology document follows Semantic Versioning 2.0.0. Every public audit report is permanently anchored to the methodology version under which it was produced.
7.1 Version Number Semantics
| Change Type | Version Bump | Examples |
|---|---|---|
| MAJOR (vX.0.0) | Threshold changes, test suite removals, new columns, anything that could cause a currently-published verdict to flip | Adding Column D, changing Whitelist RTP tolerance from ±0.8% to ±0.5%, removing A-D test |
| MINOR (v1.X.0) | New additive capability, new test, new data source, no conclusion can flip | Adding NIST tests, enabling continuous monitoring, adding new operators to the schema |
| PATCH (v1.0.X) | Clarifications, typo fixes, worked-example additions, no computational change | Rewriting a paragraph, adding a glossary term |
7.2 Approval Rules
- PATCH: Either founder may commit.
- MINOR: Both founders must sign.
- MAJOR: Both founders + one external advisor must sign. Followed by 14-day pre-notice per §6.4.
7.3 Conclusion Anchoring
Every audit report header includes:
game: stake-crash
audit_date: 2026-06-15
methodology_version: 1.1.0
evidence_tier: 2
verdict: WHITELIST
data_snapshot_hash: <SHA-256 of the full raw-data bundle>
notebook_commit: <GitHub commit hash of the reproduction notebook>
When we later upgrade the methodology from v1.1 to v1.2, this report keeps its v1.1.0 anchor. It does not get silently re-run. The current state of the listing may be overridden by a new audit under the new methodology, but the original report never mutates.
7.4 Archive Policy
All past methodology versions are preserved under /methodology/archive/v1.0.0, /methodology/archive/v1.1.0, etc. Old versions are never deleted. Anyone auditing our history must be able to verify, five years from now, what rules were in force on any given date.
7.5 The Reproducibility Promise (the single most important commitment)
Any audit report published by Clash Watchdog AI must be reproducible by any third party, five years after publication, using only: (a) the raw data snapshot referenced in the report header, (b) the methodology document at the version pinned in the report header, (c) the notebook at the commit hash pinned in the report header. The reproduction must yield the same verdict. If it does not, the report was not valid when published, and we will retract it and publish a correction in our next transparency report.
This promise is not aspirational. It is a functional requirement of every artifact we produce.
8. Legal Language Rules
Our methodology is technical. Our words must also be technical. Loose language is a legal attack surface we refuse to expose.
8.1 Words We Use
- "In N observations, Game X's mean RTP was measured at Y%, which deviates from the operator's declared Z% by W percentage points."
- "Game X's multiplier distribution failed a chi-square goodness-of-fit test at p = A against a theoretical distribution derived from the operator's stated parameters."
- "Game X is currently on our Watchlist pending additional sample collection."
- "Game X's provably-fair hash chain could not be verified for N rounds within the observation window."
- "Our sample is insufficient to support a conclusion at this time."
8.2 Words We Never Use
- "Game X is cheating."
- "Game X is rigged."
- "Game X is a scam."
- "Don't play Game X."
- "Game X has been caught stealing from players."
- "We predict that Game X will crash at Y."
- Any imperative or prophetic language whatsoever.
8.3 Tense and Mood
- All factual claims in past indicative: "was measured," "failed," "deviated."
- All current-status statements in present descriptive: "is on our Watchlist," "is pending review."
- No future tense about game outcomes. Ever. The only future tense we use is about our own actions: "will be re-audited on date X."
- No subjunctive mood about operators' intent. We describe data, not motivation.
8.4 Numerical Precision Rules
- Percentages to one decimal place in public layer, three decimals in technical appendix.
- Confidence intervals always explicit (never a point estimate without a range).
- p-values to two significant figures.
- Sample sizes always visible in the same viewport as the conclusion.
9. Deferred & Open Items
These decisions are flagged for resolution before v1.0.0 is ratified. They are documented here so that no one can pretend they were forgotten.
| # | Item | Current Placeholder | Decision Required By |
|---|---|---|---|
| 1 | Real-money budget for Source 3 proxy accounts, first 6 months | $3,000–5,000 total, $600–1,000 per game across 3–5 games | Pre-ratification |
| 2 | Minimum sample sizes (Column A = 5,000; B = 2,000; C = 1,000 hash) | As shown in §2.3 | Pre-ratification, to be confirmed via power analysis |
| 3 | Exact theoretical distribution formula for Aviator-style games | TBD – most operators publish approximate RTP but not the exact RNG parameterization | Before first non-crypto audit |
| 4 | Choice of external advisor panel for Emergency Blacklist sign-off | TBD – need to identify 3 candidates | Before first audit publication |
| 5 | Hosting location for raw data snapshots (immutability requirement) | Candidates: S3 with object lock; IPFS; Arweave; tiered approach | Before first data collection |
| 6 | Company jurisdiction (also open in MUST_READ ยง13) | Candidates: Delaware, UK, Netherlands, Estonia | Before first paid audit |
| 7 | Legal review of the Legal Language Rules (§8) | TBD – engage a media-law attorney | Before first Blacklist publication |
| 8 | Statistics background and onboarding plan for the founding team | TBD – see §11 | Within 30 days of this document being ratified |
10. Phased Build Roadmap (binding)
This is the build order for the audit system itself. It is binding in the sense that we will not ship audit conclusions using a capability before the capability has been built and tested.
Phase 1 – Months 0–3 (minimum viable audit)
- Column A: mean RTP + bootstrap CI
- Column B: chi-square + Kolmogorov-Smirnov
- Column C: SHA-256 hash chain verification (full chain, no rotation analysis)
- Data sources: Source 1 + Source 3 only (Source 2 community ingest deferred to Phase 2)
- Tiers: Only Tier 1 (Provisional) verdicts are published
- Publications expected: 3 Provisional audits by end of Phase 1 (Stake Crash, BC.Game Crash, Roobet Crash)
- Build artifacts: one reusable Jupyter notebook template, one data ingestion pipeline, the /methodology/audit-v1.md document live on the site
Phase 2 – Months 3–6 (full-column capability)
- Column A: + variance decomposition, + drift detection
- Column B: + Anderson-Darling, + Runs test, + autocorrelation at lags 1–50
- Column C: + Rotation Analysis (full §4.3 capability)
- Data sources: Source 2 community ingest opens (browser extension beta)
- Tiers: Tier 2 (Verified) verdicts become possible
- Publications expected: 5 additional audits, first Verified upgrades for Phase 1 games
- The first non-crypto game audit (Aviator, via Sources 1 + 2, no Column C) is also expected in this phase
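Of the Phase 2 additions, the autocorrelation scan is the easiest to show compactly. A minimal sketch (our own helper, not the production pipeline) computing sample autocorrelation at lags 1 through `max_lag`; for i.i.d. rounds, each coefficient should fall within roughly ±1.96/√n:

```python
def autocorrelation(xs, max_lag=50):
    """Sample autocorrelation of a sequence at lags 1..max_lag
    (requires max_lag < len(xs) and a non-constant sequence)."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs)  # unnormalized variance
    acf = []
    for lag in range(1, max_lag + 1):
        cov = sum((xs[i] - mean) * (xs[i + lag] - mean) for i in range(n - lag))
        acf.append(cov / var)
    return acf
```

An alternating sequence like 1, 0, 1, 0, ... produces a strongly negative lag-1 coefficient, exactly the kind of short-range structure a fair RNG must not show.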
Phase 3 – Months 6–12 (continuous monitoring + academic rigor)
- Column B: + Benford, + NIST SP 800-22 selected tests, + Benjamini-Hochberg correction
- Column C: + continuous monitoring mode, + multi-operator rotation comparison
- Data sources: all three sources mature
- Tiers: Tier 3 (Gold) verdicts become possible for the longest-running audits
- Publications expected: continuous updates, 10 total audited games, first Gold whitelist
- First external academic review engagement
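Phase 3 introduces the Benjamini-Hochberg correction because running many Column B tests per game inflates the false-positive rate. The step-up procedure itself is short; a minimal sketch under the usual FDR formulation (the function name is ours):

```python
def benjamini_hochberg(pvalues, q=0.05):
    """Boolean reject flag per p-value, controlling the false discovery
    rate at level q via the Benjamini-Hochberg step-up procedure."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * q / m:
            k_max = rank  # largest rank whose p-value clears its threshold
    reject = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= k_max:
            reject[i] = True
    return reject
```

With p-values [0.01, 0.04, 0.03, 0.2] at q = 0.05, only the first survives the correction, even though 0.03 and 0.04 would each pass an uncorrected 0.05 threshold.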
11. Team Onboarding (Statistics)
Because the methodology depends on statistical literacy and neither founder has a formal statistics background, the founding team commits to the following within 30 days of document ratification:
Required reading / working-through:
- Knuth, The Art of Computer Programming, Volume 2: Seminumerical Algorithms – chapter 3 (Random Numbers). This is the canonical treatment of RNG testing.
- NIST Special Publication 800-22 Rev. 1a – A Statistical Test Suite for Random and Pseudorandom Number Generators for Cryptographic Applications. Public domain.
- Wasserman, All of Statistics – chapters 9 (Hypothesis Testing), 10 (Likelihood), 13 (Nonparametric).
- Gentle, Computational Statistics – chapters on the bootstrap and Monte Carlo methods.
- Lehmann & Romano, Testing Statistical Hypotheses – chapter 15 (Nonparametric) for goodness-of-fit theory.
Practical onboarding:
- Reproduce an existing published audit of a random number generator (NIST has published example reports).
- Build a toy end-to-end audit of a simulated "honest" crash game and a simulated "slightly biased" crash game. Verify that the methodology correctly classifies both.
- Schedule a paid 2-hour consultation with a statistician before the first public audit is published.
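The second onboarding exercise (honest vs. slightly biased simulated game) can be prototyped in a few lines. This sketch assumes a common toy parameterization of crash points, crash = max(1.00, (1 - house_edge) / U) with U uniform on (0, 1]; real operators use their own formulas, and the five-standard-error cutoff is illustrative, not a ratified threshold:

```python
import random

def simulate_crash_rounds(n, house_edge, seed):
    """Toy crash-point model: crash = max(1.00, (1 - house_edge) / U),
    with U drawn from (0, 1]. Real operators use their own formulas."""
    rng = random.Random(seed)
    return [max(1.0, (1 - house_edge) / (1.0 - rng.random())) for _ in range(n)]

def classify(crashes, claimed_rtp, cashout=2.0, z=5.0):
    """Flag a game when the empirical RTP of a fixed cash-out strategy
    deviates from the claimed RTP by more than z standard errors."""
    n = len(crashes)
    returns = [cashout if c >= cashout else 0.0 for c in crashes]
    mean = sum(returns) / n
    var = sum((r - mean) ** 2 for r in returns) / (n - 1)
    se = (var / n) ** 0.5
    return "fair" if abs(mean - claimed_rtp) <= z * se else "suspicious"
```

Under this model the fixed-target strategy has expected return 1 - house_edge regardless of the cash-out point, so with enough rounds a simulated 1%-edge game should classify as fair against a 99% RTP claim while a 5%-edge game should not.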
If within 30 days we determine that one of us must focus full-time on statistics while the other handles engineering and content, we update MUST_READ §9 (Team Structure) accordingly.
12. Glossary
Short definitions of terms used throughout this document. Expanded definitions with examples are maintained in the public Learn cluster on the website.
- Client seed – An input that the player can optionally customize to influence (but not choose) the outcome of a provably-fair round. Combined with `server_seed` and `nonce` via HMAC.
- Crash game – A gambling game in which a multiplier rises from 1.00× at a variable rate, and the player must "cash out" before the multiplier "crashes" at a randomly determined point.
- Due Process – Our workflow for transitioning an internal verdict to a publicly visible listing; see §6.
- Evidence Tier – One of Provisional, Verified, or Gold; determines how strict the threshold for Whitelist/Blacklist classification is; see §5.
- Hash chain – A cryptographic sequence in which each server seed is pre-committed via SHA-256 before any round under it is played.
- HMAC – Keyed-hash message authentication code. Used in provably-fair derivations to bind a result to both the server's secret and the player's visible inputs.
- Nonce – A counter that increments by one for each round within a single server seed's lifetime. Combined with `client_seed` in the HMAC input.
- Provably fair – A family of techniques that allow players to mathematically verify, after the fact, that no round result was modified from what was committed to before the round began.
- Rotation Risk Score – Our internal score (Clean / Low / Medium / High / Critical) that summarizes the §4.3 rotation analysis for a game.
- RTP (Return to Player) – The percentage of all bets made that is paid back to players over the long run. The complement of house edge.
- Server seed – A random 256-bit value generated by the game operator, committed via SHA-256 hash before play, and revealed after all rounds under that seed have been played.
- Three-Source Cross-Validation – Our data collection doctrine: official disclosure + community crowd-sourcing + self-operated proxy accounts, triangulated against each other.
- Watchlist – Our public state for games under active audit but without enough evidence (or with conflicting evidence) to assign a Whitelist or Blacklist verdict.
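The glossary's provably-fair terms fit together in one short verification flow: check the SHA-256 commitment of the revealed server seed, then recompute each round's HMAC. This sketch is illustrative only; in particular, the `"client_seed:nonce"` message layout and the mapping from digest to crash multiplier vary by operator and are assumptions here:

```python
import hashlib
import hmac

def commitment(server_seed_hex):
    """SHA-256 hash the operator must publish before any round is played."""
    return hashlib.sha256(bytes.fromhex(server_seed_hex)).hexdigest()

def verify_commitment(revealed_server_seed_hex, committed_hash_hex):
    """Check that the revealed server seed matches its pre-play commitment."""
    return hmac.compare_digest(commitment(revealed_server_seed_hex),
                               committed_hash_hex)

def round_digest(server_seed_hex, client_seed, nonce):
    """HMAC-SHA256 over an illustrative "client_seed:nonce" message; the
    exact layout is operator-specific."""
    msg = f"{client_seed}:{nonce}".encode()
    return hmac.new(bytes.fromhex(server_seed_hex), msg, hashlib.sha256).hexdigest()
```

If `verify_commitment` fails, the operator changed the seed after play; if it passes, every round under that seed can be re-derived deterministically from the revealed inputs.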
13. Next Actions (before v1.0.0 ratification)
- Resolve all Deferred & Open Items (§9).
- Complete the 30-day statistics onboarding plan (§11).
- Write and test the Phase 1 reference notebook on a simulated honest/biased pair.
- Engage a media-law attorney for a one-hour review of §8 (Legal Language Rules).
- Register at least one Source 3 proxy account on Stake Crash and begin ingestion.
- Stand up the data provenance ledger table in the Phase 1 database.
- Sign this document (both founders) and move its status from `DRAFT` to `RATIFIED`.
Draft authored by the founding team, 2026-04-15. Version 1.0.0-DRAFT – "The Scale Before the First Weight."