
Why We Use Three Data Sources for Every Audit

Published 2026-06-04 · 11 min read

Clash Watchdog AI audits every game using three independent data sources: the operator's own public disclosures, community-contributed data, and data from our self-operated proxy accounts. We publish a conclusion only when sources agree, and we flag disagreement as the single strongest signal of potential manipulation. This article explains why the three-source doctrine is essential and what each source can and can't detect.

Why isn't one data source enough?

Because every single data source has a failure mode that allows manipulation to go undetected.

If you rely only on operator data, the operator controls what you see. They could publish sanitized data, exclude outlier rounds, or present data from a test environment rather than the live game. Operator-published data is useful — it is often the largest and most complete dataset available — but it is self-reported, and self-reported data has an obvious conflict of interest.

If you rely only on community-contributed data, you are vulnerable to selection bias and poisoning. Players who contribute data may disproportionately share unusual sessions (extremely bad or extremely good), which skews the sample. Malicious actors could submit fabricated data to influence an audit outcome in either direction. Community data is valuable for its independence, but it lacks the quality controls of systematic collection.

If you rely only on self-proxy data (your own accounts playing the game), your sample may not be representative. The operator could detect and flag your accounts. Your play patterns might not reflect the full range of conditions (different times of day, different bet sizes, different client seeds). Self-proxy data is the most controlled, but it is limited in volume and potentially compromised by detection.

Each source, alone, is insufficient. Together, they form a triangulation that is dramatically harder to fool. An operator who wants to pass a three-source audit would need to simultaneously:

  1. Publish accurate data on their own platform (compromising their ability to hide manipulation)
  2. Ensure that community contributors see the same data (requiring consistent behavior across all players)
  3. Serve our proxy accounts the same data (requiring no account-level targeting)

If all three sources agree, the game is operating consistently for all players. If they disagree, the disagreement itself is the finding — and disagreement between independent sources is the strongest signal we have.

What does each of the three sources cover?

Source 1 — Operator Public Data

What it is: Round outcomes published by the operator through their platform — game history feeds, public APIs, provably fair hash chains, and any other data the operator makes available to all players.

What it can detect:

  • Whether the published hash chain is internally consistent (hashes link correctly; see the sketch after this list)
  • The declared RTP and whether the published round data is consistent with it
  • Patterns in the published data (distribution shape, streak characteristics, instant-crash frequency)
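
The first of these checks is mechanical enough to sketch. A minimal example in Python, assuming the common crash-game scheme in which the operator pre-generates a SHA-256 hash chain and reveals it in reverse, so each round's seed must hash to the seed of the round before it (exact schemes vary by operator and the audit follows each operator's published rules):

```python
import hashlib

def verify_hash_chain(seeds: list[str]) -> bool:
    """Check that each round's revealed seed hashes to the previous round's seed.

    Assumes a pre-generated SHA-256 chain revealed in reverse order:
    for consecutive rounds, sha256(current seed) must equal the
    previous round's seed. Schemes vary by operator.
    """
    for prev_seed, curr_seed in zip(seeds, seeds[1:]):
        if hashlib.sha256(curr_seed.encode()).hexdigest() != prev_seed:
            return False
    return True
```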

What it cannot detect:

  • Whether the published data matches what players actually experienced
  • Whether the operator selectively publishes favorable data and withholds unfavorable data
  • Whether the operator serves different outcomes to different players

Our process: We scrape or API-pull operator data at regular intervals; every pull is timestamped and archived. Once collected, data is immutable — we never modify it, and the collection timestamps are part of the audit record.
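
A minimal sketch of that collection discipline. The directory layout and metadata fields here are hypothetical, but the principle is that the content hash is computed at collection time, so any later modification is detectable:

```python
import hashlib
import json
import time
from pathlib import Path

ARCHIVE_DIR = Path("archive")  # hypothetical local archive location

def archive_snapshot(payload: bytes, source_url: str) -> str:
    """Store one collected snapshot with its timestamp and content hash.

    Embedding the SHA-256 digest in the file name makes later
    modification detectable: re-hashing the file must reproduce it.
    """
    digest = hashlib.sha256(payload).hexdigest()
    collected_at = int(time.time())
    ARCHIVE_DIR.mkdir(exist_ok=True)
    (ARCHIVE_DIR / f"{collected_at}-{digest}.json").write_bytes(payload)
    # Record collection metadata alongside the raw bytes.
    meta = {"url": source_url, "collected_at": collected_at, "sha256": digest}
    (ARCHIVE_DIR / f"{collected_at}-{digest}.meta.json").write_text(json.dumps(meta))
    return digest
```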

Source 2 — Community-Contributed Data

What it is: Round outcomes reported by independent players who choose to share their game histories with Clash Watchdog AI. Contributors are anonymous; we do not require accounts or personal information.

What it can detect:

  • Whether the data individual players see matches the data the operator publishes
  • Whether different players see different outcomes for the same rounds (a strong manipulation signal; see the sketch after this list)
  • Whether the overall distribution experienced by diverse players matches the theoretical distribution
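
The second check reduces to grouping reported outcomes by round. A sketch, assuming a hypothetical (contributor, round, outcome) input shape:

```python
from collections import defaultdict

def conflicting_rounds(contributions):
    """Return rounds for which contributors report different outcomes.

    `contributions` is an iterable of (contributor_id, round_id, outcome)
    tuples -- a hypothetical input shape. Any round with more than one
    distinct reported outcome is itself a manipulation signal.
    """
    outcomes_by_round = defaultdict(set)
    for _contributor, round_id, outcome in contributions:
        outcomes_by_round[round_id].add(outcome)
    return {r: o for r, o in outcomes_by_round.items() if len(o) > 1}
```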

What it cannot detect:

  • Whether individual contributors are reporting accurately (they could make errors or fabricate data)
  • Whether the sample of contributors is representative of all players
  • Whether a coordinated group of contributors is poisoning the dataset

Our safeguards: We require a minimum of 5 independent contributors per audit. We apply statistical outlier detection to flag data that deviates significantly from the other two sources. Community data is never the sole basis for any audit conclusion — it must be corroborated.

Source 3 — Self-Proxy Accounts

What it is: Clash Watchdog AI operates its own player accounts on the platforms we audit. These accounts are funded with real money (at minimum bet sizes), play real rounds, and collect real data under conditions we fully control.

What it can detect:

  • The exact experience of a real player, with complete control over variables (client seed, bet size, timing)
  • Whether provably fair verification passes on rounds we actually played
  • Whether the game's behavior changes based on bet patterns, time of day, or account age

What it cannot detect:

  • Whether the operator has identified our accounts and is serving them different outcomes
  • Whether our play patterns (minimum bets, specific times) produce different data than high-volume players
  • Whether conditions during our data collection window are representative of long-term behavior

Our safeguards: We rotate accounts periodically. We vary play patterns (bet sizes, times, sessions). We never reveal which accounts are ours until after the audit is published. Self-proxy data is always cross-validated against the other two sources.

What happens when the three sources disagree?

Disagreement is the most important finding an audit can produce. It means that at least one of the following is true:

  1. The operator is serving different data to different players. This is the most serious finding — it means the game is not operating uniformly, which is a prerequisite for fairness.

  2. The operator is publishing inaccurate data. The public data does not match what players actually experience.

  3. Community data is contaminated. One or more contributors are submitting inaccurate or fabricated data.

  4. Our self-proxy data is compromised. The operator has identified our accounts and is treating them differently.

Our resolution process:

First, we identify which source is the outlier. If two sources agree and one disagrees, the disagreeing source is investigated.

If operator data is the outlier, the game is placed on the Watchlist pending investigation. Operator data disagreeing with both community and self-proxy data is a strong signal of data manipulation.

If community data is the outlier, we increase the contributor threshold and apply stricter outlier detection. We do not penalize the game for community data quality issues.

If self-proxy data is the outlier, we rotate accounts and recollect. Self-proxy data being the outlier is consistent with account-level targeting, which we treat as a serious finding if confirmed.

If all three sources disagree with each other, the audit is classified as Inconclusive and the game remains on the Watchlist until additional data resolves the discrepancy.
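
The first step, identifying the outlier, can be expressed compactly. A sketch under two simplifying assumptions of our own: each source reduces to a single comparable summary statistic, and "agreement" means falling within a fixed tolerance (the pinned methodology defines the actual comparison):

```python
def identify_outlier(summaries: dict[str, float], tol: float = 0.02):
    """Identify which of three sources disagrees with the other two.

    `summaries` maps source name to one comparable summary statistic
    (e.g. observed mean multiplier); `tol` is a hypothetical agreement
    tolerance. Returns None on full agreement, the lone disagreeing
    source name, or "inconclusive" when no clean outlier exists.
    """
    names = list(summaries)
    agreeing_pairs = [
        (a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if abs(summaries[a] - summaries[b]) <= tol
    ]
    if len(agreeing_pairs) == 3:
        return None  # all three sources agree
    if len(agreeing_pairs) == 1:  # one pair agrees; the third source is the outlier
        a, b = agreeing_pairs[0]
        return next(n for n in names if n not in (a, b))
    return "inconclusive"  # maps to the Inconclusive classification above
```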

How do we prevent community data from being poisoned?

Data poisoning — the deliberate submission of false data to influence an audit outcome — is a real threat. An operator who anticipates a negative audit might submit fabricated positive data. A competitor might submit fabricated negative data.

Our defenses are layered:

Statistical filtering. Every contributed dataset is tested against the expected distribution for the game's declared parameters. Data that deviates by more than 3 standard deviations from the expected mean is flagged for review. This catches gross fabrication but not subtle manipulation.
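
One way to apply the 3-standard-deviation rule at the dataset level, treating the threshold as three standard errors of the contributed sample's mean (our interpretation for illustration, not a quote from the methodology):

```python
import math

def is_implausible(sample_mean: float, n: int,
                   expected_mean: float, expected_sd: float) -> bool:
    """Flag a contributed dataset whose sample mean is implausible.

    Compares the contributor's sample mean against the distribution
    implied by the game's declared parameters. The standard error of
    the mean shrinks as the sample grows, so larger contributions are
    held to a proportionally tighter band.
    """
    standard_error = expected_sd / math.sqrt(n)
    z_score = (sample_mean - expected_mean) / standard_error
    return abs(z_score) > 3  # the 3-sigma review threshold
```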

Source diversity requirement. A minimum of 5 independent contributors is required before community data is included in an audit. A single contributor cannot move the aggregate, and coordinating 5 independent fabrications that are internally consistent and statistically plausible is operationally difficult.

Cross-source validation. Community data is never the sole basis for any finding. It must be corroborated by either operator data or self-proxy data. A finding that appears only in community data and not in the other sources is treated as a data quality issue, not a game issue.

Contributor reputation (Phase 2+). In future phases, we plan to implement a reputation system where contributors who consistently provide data that aligns with verified outcomes earn higher weighting. New contributors start with lower weighting. This creates an incentive for accuracy and a cost for fabrication.

Why do we run our own proxy accounts?

Self-proxy accounts are the most expensive and operationally complex data source. They require real money, real play, and real time. Why not just use the other two sources?

Because self-proxy accounts are the only source where we control every variable. We choose the client seed. We choose the bet size. We choose when to play. We record every round with a timestamp and a full verification chain. No other source provides this level of control.

Self-proxy data also enables specific tests that the other sources cannot:

Client seed manipulation testing. We set known client seeds and verify that the server's HMAC derivation is consistent. This catches operators who select favorable client seeds for automated accounts.
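
A sketch of the consistency check, assuming an HMAC-SHA256 derivation with a hypothetical `"{client_seed}:{nonce}"` message format (each operator documents its own formula, and the audit uses that):

```python
import hashlib
import hmac

def derived_digest(server_seed: str, client_seed: str, nonce: int) -> str:
    """Re-derive a round digest from the seed pair, HMAC-SHA256 style.

    The `"{client_seed}:{nonce}"` message format is an assumption for
    illustration; the audit uses the operator's published formula.
    """
    message = f"{client_seed}:{nonce}".encode()
    return hmac.new(server_seed.encode(), message, hashlib.sha256).hexdigest()

def derivation_is_consistent(rounds) -> bool:
    """Check every played round against our own re-derivation.

    `rounds` is an iterable of (server_seed, client_seed, nonce, digest)
    tuples recorded by a self-proxy account with a known client seed.
    """
    return all(derived_digest(s, c, n) == d for s, c, n, d in rounds)
```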

Timing-correlated behavior testing. We play at different times and days and test whether the game's distribution varies by time period. A fair game produces the same distribution regardless of when you play. A manipulated game might perform differently during high-traffic periods.

Account-level treatment testing. We create accounts with different profiles (new accounts, aged accounts, high-volume accounts, low-volume accounts) and test whether they experience the same distribution. Equal treatment across accounts is a basic fairness requirement.
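
Both the timing test and the account-level test reduce to the same statistical question: do two samples collected under different conditions come from the same distribution? A sketch using SciPy's two-sample Kolmogorov-Smirnov test (the choice of test here is illustrative; the pinned methodology specifies the actual tests and parameters):

```python
from scipy import stats

def same_distribution(sample_a, sample_b, alpha: float = 0.01) -> bool:
    """Test whether two samples of outcomes share one distribution.

    `sample_a` and `sample_b` are sequences of round multipliers
    collected under different conditions (time windows, account
    profiles). A fair game should pass this test for any split.
    """
    statistic, p_value = stats.ks_2samp(sample_a, sample_b)
    return p_value >= alpha
```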

These tests are possible only because we control the inputs. Community contributors cannot coordinate this level of experimental design, and operator data does not provide the per-account granularity.

How can you independently verify our findings?

Every audit report published by Clash Watchdog AI includes:

  • The methodology version used (pinned, not "latest")
  • The raw data snapshot hash (so you can verify you are working with the same data)
  • The Jupyter notebook commit hash (so you can run the exact same analysis)
  • The statistical test parameters (confidence level, sample size, test type)

To reproduce an audit, you need three things, each linked in the report: the raw data, the notebook, and the methodology version. Clone the repository, install the dependencies, run the notebook, and compare your output to our published results.
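
Before comparing results, confirm you are analyzing the byte-identical snapshot. A minimal check against the report's published hash:

```python
import hashlib
from pathlib import Path

def snapshot_matches(path: str, published_sha256: str) -> bool:
    """Confirm a downloaded data snapshot matches the report's hash.

    If this returns False, you are not analyzing the data we analyzed,
    and any difference in results is expected rather than a finding.
    """
    digest = hashlib.sha256(Path(path).read_bytes()).hexdigest()
    return digest == published_sha256
```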

If your reproduction produces different results, that is a finding about our audit, not about the game. We take reproduction failures seriously — they are documented in the next audit cycle and the discrepancy is investigated.

This commitment to reproducibility is not common in gambling auditing. Most regulatory audits produce a pass/fail result with a sealed report. Our reports are open, our data is open, our methods are open. The reason is simple: we are trying to build trust in an industry where trust is scarce. Open methods are harder to fake than sealed ones. See Reproducibility for the full philosophy.


Frequently Asked Questions

Why three sources instead of two or four?

Three is the minimum for meaningful triangulation. With two sources, a disagreement tells you something is wrong but not which source is wrong. With three, you can identify the outlier. Four or more would increase confidence but also increase cost and operational complexity beyond what is necessary for the statistical power we need. If we later find that three sources are insufficient for specific game architectures, we will add more — but the three-source minimum is binding.

Can an operator fool your self-proxy accounts?

In theory, yes — if the operator identifies our accounts and serves them different outcomes. In practice, this requires the operator to detect which accounts are ours (we use standard player profiles with realistic behavior patterns) and to maintain a separate outcome pipeline for those accounts without breaking their provably fair commitments. For provably fair games, serving different outcomes to specific accounts would cause hash verification failures. For non-provably-fair games, the risk is real, which is why self-proxy data is never used alone.

How do you keep poisoned community data out of your conclusions?

Three mechanisms: (1) Statistical outlier detection — contributed data that deviates significantly from both operator data and self-proxy data is flagged and excluded from the primary analysis. (2) Source diversity — we require community data from at least 5 independent contributors before including it. A single contributor cannot move the aggregate. (3) Cross-validation — community data is never the sole basis for any conclusion. It must be corroborated by at least one other source. Poisoned community data would need to compromise multiple independent contributors simultaneously.

What happens if one of the three sources is unavailable?

The audit proceeds but cannot achieve Gold Tier classification. Two-source audits are capped at Tier 2 (Verified) regardless of sample size. The limitation is noted in the audit report, and the missing source is identified. For games where community data is unavailable (new or niche games), we rely on operator data and self-proxy data. For games where self-proxy accounts are not feasible (geo-restricted platforms), we rely on operator data and community data.

How much does three-source auditing cost?

More than single-source auditing, but less than you might expect. Operator data is free (publicly available APIs or game histories). Community data is crowdsourced. Self-proxy accounts require real deposits and real play, which costs real money — but we play at minimum bet sizes and treat the cost as a research expense. The total cost per audit is disclosed in our annual transparency report at /trust/transparency-reports. We believe this cost is justified because the alternative — single-source auditing — is insufficient for the trust standard we are trying to establish.
