Every audit published by Clash Watchdog AI is reproducible by any third party, five years after publication, using only three things: the raw data snapshot, the methodology version at publication, and the Jupyter notebook commit pinned in the report. This is not marketing — it's a structural requirement built into every layer of how the audit pipeline is designed. This article explains why, how, and what it means for trust.
What does "reproducible five years later" mean?
It means that any person — a journalist, an academic, a regulator, a competitor, a skeptic — can take our published audit report, follow the links in its header, obtain the exact data we used, run the exact code we ran, and get the exact same results we published.
Not "similar" results. Not "consistent" results. The exact same numbers, to the decimal.
This is possible because every component of the audit is versioned, pinned, and archived:
- The data is a snapshot — a frozen, hashed archive of the round data at the time of analysis. The hash is published in the report header. If anyone modifies the data, the hash will not match.
- The methodology is version-pinned. The report says "conducted under Methodology v1.0.0" and links to the permanent URL of that version. Even if we later publish v2.0.0 with different thresholds, the v1.0.0 document remains available at its original URL.
- The code is a Jupyter notebook committed to a Git repository. The report links to the exact commit hash. Even if we later modify the notebook, the old commit remains accessible.
Five years from now, all three components will still be accessible. The data snapshot is archived. The methodology version is permanent. The notebook commit is immutable in Git history.
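The three pinned components can be pictured as a small structure. This is a sketch only; the field names, URLs, and placeholder values below are hypothetical illustrations, not the actual report schema:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ReproducibilityBlock:
    """Hypothetical shape of the pinned references in a report header."""
    data_snapshot_url: str
    data_snapshot_sha256: str   # published hash of the frozen data
    methodology_version: str    # e.g. "v1.0.0", a permanent URL
    notebook_commit: str        # immutable Git commit hash

block = ReproducibilityBlock(
    data_snapshot_url="https://example.org/snapshots/2026-06-15.tar.gz",
    data_snapshot_sha256="0" * 64,   # placeholder for a real SHA-256 digest
    methodology_version="v1.0.0",
    notebook_commit="0" * 40,        # placeholder for a real commit hash
)
```

Because the structure is frozen, the references cannot be mutated after the report is assembled, mirroring the immutability of the artifacts themselves.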
Why do we care about a five-year horizon?
Three reasons:
Trust requires falsifiability. A claim that cannot be tested is not a claim — it is an assertion. Trust institutions — from scientific journals to credit rating agencies — derive their authority from the testability of their conclusions. If our audits were sealed reports with no way to verify them, they would carry exactly as much credibility as the operator's own claims: trust-me authority. Reproducibility converts our claims from "trust us" to "test us."
The industry needs a standard. As of 2026, there is no widely adopted standard for crash game auditing. RNG certification by testing labs (GLI, eCOGRA, BMM) is an established practice for traditional casino games, but these certifications are opaque — the methodology is not published, the data is not shared, and the reports are not reproducible by third parties. We believe the crash-game sector deserves an auditing standard that is as transparent as the provably fair technology it evaluates.
Our conclusions may be wrong. We are a two-person team in year one. We will make mistakes — in data collection, in statistical methodology, in interpretation. Reproducibility is our mechanism for catching those mistakes. If a statistician downloads our data, runs our notebook, and finds an error in our chi-squared test, we want to know. The alternative — publishing sealed conclusions and hoping they are right — is not a standard we are willing to set.
How do we technically guarantee reproducibility?
Four structural decisions, each addressing a specific reproducibility threat:
Decision 1: Pinned dependencies. Every Jupyter notebook includes a requirements.txt with exact version pins for every Python package used (e.g., numpy==1.24.3, not numpy>=1.24). This ensures that dependency updates do not change numerical results. The reproducibility environment is also containerized — a Dockerfile is provided that builds the exact environment used in the analysis.
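As a minimal sketch of what "exact pins" means in practice, a check like the following (a hypothetical helper, not part of the published notebooks) flags any requirement that is not pinned with `==`:

```python
def check_exact_pins(requirements_text):
    """Return requirement lines that are not pinned to an exact version."""
    loose = []
    for line in requirements_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        if "==" not in line:
            loose.append(line)  # range pins like >= can drift over time
    return loose

print(check_exact_pins("numpy==1.24.3\nscipy>=1.10"))  # -> ['scipy>=1.10']
```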
Decision 2: Deterministic random seeds. Any step in the analysis that involves randomness (bootstrap sampling, Monte Carlo simulations) uses a fixed random seed that is documented in the notebook. This ensures that repeated runs produce identical results.
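A seeded bootstrap might look like the following sketch; the seed value and the data are illustrative, and the real notebooks document their own seeds:

```python
import random

SEED = 20260615  # hypothetical seed; the actual value is documented in the notebook

def bootstrap_mean(data, n_resamples=1000, seed=SEED):
    """Bootstrap estimate of the mean using a local, seeded generator."""
    rng = random.Random(seed)  # local generator: no shared global state
    n = len(data)
    means = [sum(rng.choice(data) for _ in range(n)) / n
             for _ in range(n_resamples)]
    return sum(means) / n_resamples

crash_points = [1.97, 2.10, 1.50, 3.20, 1.08]
# Identical across runs, machines, and people -- the point of the fixed seed.
assert bootstrap_mean(crash_points) == bootstrap_mean(crash_points)
```

Using a local `random.Random(seed)` rather than the module-level functions also keeps the result independent of whatever other code touches the global generator.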
Decision 3: Data snapshot hashing. The raw data snapshot is hashed with SHA-256 before analysis. The hash is recorded in the notebook output and in the published report. Before running the analysis, the notebook verifies the hash of the loaded data against the expected hash. If they do not match, the notebook halts with an error.
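The hash check reduces to a few lines of `hashlib`; the snapshot bytes below are a toy stand-in for a real data archive:

```python
import hashlib

def verify_snapshot(data_bytes, expected_hash):
    """Halt before any analysis runs if the data does not match the report."""
    actual = hashlib.sha256(data_bytes).hexdigest()
    if actual != expected_hash:
        raise RuntimeError(f"snapshot hash mismatch: {actual} != {expected_hash}")

snapshot = b"round_id,crash_point\n1,1.97\n2,4.20\n"
published_hash = hashlib.sha256(snapshot).hexdigest()  # as printed in the report
verify_snapshot(snapshot, published_hash)              # passes: data is intact
```

Any single-byte change to the snapshot produces a completely different digest, so a silent substitution of the data cannot go unnoticed.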
Decision 4: Version-pinned methodology. The notebook imports the statistical thresholds (confidence levels, minimum sample sizes, pass/fail criteria) from the methodology document, referenced by version number. If the methodology version is v1.0.0, the notebook uses v1.0.0 thresholds, even if v2.0.0 is available. Threshold changes require a new audit under the new methodology — they do not retroactively change old results.
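Version pinning of thresholds can be sketched as a lookup keyed by methodology version; the version numbers and threshold values below are invented for illustration:

```python
# Hypothetical threshold registry; the real values live in the methodology document.
THRESHOLDS = {
    "v1.0.0": {"confidence": 0.99, "min_rounds": 100_000},
    "v2.0.0": {"confidence": 0.999, "min_rounds": 1_000_000},
}

def thresholds_for(methodology_version):
    """An audit always uses the version its report was published under."""
    return THRESHOLDS[methodology_version]

# A v1.0.0 audit keeps v1.0.0 thresholds even after v2.0.0 is published.
assert thresholds_for("v1.0.0")["min_rounds"] == 100_000
```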
What happens if a reproduction ever fails?
A reproduction failure means one of three things:
- Our code has a bug. The notebook produces different results when run on the same data by a different person. This is the most common cause of reproduction failures in scientific computing, usually traced to floating-point precision differences, platform-specific behavior, or undocumented environmental dependencies.
- Our data was corrupted or mislabeled. The data snapshot linked in the report does not match the data we actually analyzed, perhaps due to a copy error or a labeling mistake.
- Our published results are wrong. The notebook runs correctly on the correct data, but the numbers in the published report do not match the notebook output — a transcription or formatting error.
In any of these cases, the response is the same:
- Acknowledge the failure publicly.
- Identify the root cause.
- Fix the error.
- Publish a corrected audit report (new report, not a modification of the old one).
- Document the incident in the next transparency report.
We do not delete or modify published reports. The old report remains as a record of what we originally concluded. The new report supersedes it with a correction note. This is standard practice in scientific publishing (errata, corrigenda) and we adopt it directly.
Why don't other audit systems make this promise?
Most gambling audit systems operate under a different trust model: institutional authority. GLI, eCOGRA, and BMM are trusted because they are established institutions with regulatory accreditation. Their reports are accepted by regulators because the regulators trust the institution, not because the regulators verify the work.
This model works well when the institutional authority is robust — when the lab has a long track record, financial incentives aligned with accuracy, and regulatory oversight of its own processes. For traditional casino games with decades of regulatory history, institutional authority is a reasonable trust model.
For crash games — a new category, in lightly regulated markets, with operators in offshore jurisdictions — institutional authority is less available. Many crash game operators are not subject to the regulatory frameworks that make traditional lab certification meaningful. And the operators who most need auditing are precisely the ones least likely to seek it voluntarily.
Our model is different: evidentiary authority. We are not asking anyone to trust us as an institution. We are asking them to verify our work. The trust comes not from who we are but from what we show. Our methods are public. Our data is public. Our code is public. Our results are testable.
This is a higher bar than institutional authority. It is also a more appropriate bar for a new organization that has not yet earned institutional trust. We will earn it — through a track record of accurate, reproducible audits — but until then, reproducibility is our substitute for reputation.
How can you reproduce a report today?
The steps, once an audit report is published:
- Open the audit report (e.g., /games/stake-crash/audits/2026-06-15).
- Find the reproducibility block in the report header. It contains three links: data snapshot URL, methodology version URL, notebook commit URL.
- Download the data snapshot. Verify its SHA-256 hash against the hash in the report.
- Clone the methodology repository. Check out the specific commit linked in the report.
- Install the environment. Use the provided requirements.txt or Dockerfile.
- Run the notebook. Load the data snapshot, execute all cells.
- Compare output to the published report. Every number, chart, and conclusion should match.
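The final comparison is an exact equality check, not a tolerance check. A sketch, with invented metric names and numbers:

```python
def compare_results(published, reproduced):
    """Exact match, to the decimal: any difference is a reproduction failure."""
    return {key: (published[key], reproduced.get(key))
            for key in published if published[key] != reproduced.get(key)}

# Hypothetical published figures for illustration.
published = {"mean_multiplier": 1.9702, "chi2_p_value": 0.4821}

assert compare_results(published, dict(published)) == {}          # clean reproduction
assert compare_results(published, {**published, "chi2_p_value": 0.4822})  # flagged
```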
If anything does not match, document the discrepancy and contact us. We will investigate publicly and publish our findings.
This process takes about 30 minutes for someone with basic Python and Jupyter experience. It does not require special access, special tools, or special permissions. Everything is public.
That is the point. Transparency that requires special access is not transparency. Reproducibility that requires special tools is not reproducibility. Our standard is: anyone with a laptop and an internet connection can verify our work. That is the standard we hold ourselves to, and it is the standard against which we ask to be judged.
See our methodology for the full specification. See our methodology archive for all historical versions. See our Trust page for our broader commitments to independence and transparency.